AI Infrastructure Architect for Multi-Model Router and Inference Cost Optimization

Posted 3 days ago

Only freelancers located in the U.S. may apply.

Summary

We are building a confidential AI platform and need a senior architect to lead the design and build of a multi-model routing layer. The router sits in front of multiple AI systems and decides, for every request (text, image, video), the cheapest path that still meets quality: cache, reuse, small or local model, open-source model, fine-tuned model, or a higher-cost fallback. The north-star metric is the percentage of requests served without hitting an expensive frontier GPU, and the resulting cost reduction on a representative workload.

This is not a chatbot project. You will own the architecture, define the routing logic, and lead execution alongside a separate model-training team. We need someone who thinks in systems, not individual model calls.

What You Will Own

- Router architecture: request intake and normalization, classification, routing taxonomy, model-selection rules, fallback logic, cache and reuse rules, logging and telemetry, and the evaluation feedback loop.
- The interface must be model-agnostic, so fine-tuned and open-source models can be swapped in and out without rearchitecting.
- Cost optimization: reduce unnecessary GPU usage while preserving quality, using exact and semantic cache, existing output reuse, lightweight model routing, batching, local or edge execution where appropriate, and a clear fallback hierarchy.
- Caching text is straightforward, but caching generative image and video is not, since the same prompt should produce variation rather than an identical result. You need a credible approach to reuse at the asset or component level, not just for text.
- Evaluation loop: a framework that scores output quality by content domain and flags weakness, so the training team can target improvements instead of retraining broadly. Track output quality against intent, failure modes, cost per route, latency per route, cache hit rate, fallback rate, and regeneration rate.
- Execution plan and leadership: an architecture diagram, recommended POC scope, milestones, infrastructure assumptions, and risks that leadership can review, plus hands-on architecture review and task breakdown to guide the engineering team through the build.

Ideal Background

- You have led or architected production AI infrastructure involving several of the following: multi-model orchestration and LLM routing, multimodal AI, model serving, inference cost optimization, GPU cost reduction, open-source and fine-tuned model deployment, evaluation pipelines, semantic caching, and AI observability.
- Strong candidates may have experience with Python and a modern serving and infrastructure stack such as vLLM, Triton, Ray Serve, BentoML, Kubernetes, Docker, vector databases, Redis or similar caching layers, and evaluation tracking with MLflow or Weights and Biases. Specific tools matter less than the ability to architect the system correctly and lead execution.

We are not looking for someone who only builds basic chatbot workflows, only uses hosted APIs without understanding the underlying infrastructure, or works as a prompt engineer alone.

Deliverables

- The initial engagement should produce a router architecture blueprint, a routing taxonomy, a POC execution plan with milestones and success criteria, and an evaluation and feedback framework, followed by technical leadership through the POC build.

Screening Questions

1. Describe the most relevant AI routing, model-serving, or inference infrastructure system you have personally designed or built. What was routed, what models or execution paths were involved, and what role did you own?
2. How would you design a router that decides whether a request should use cache/reuse, a smaller model, an open-source or fine-tuned model, or a higher-cost fallback?
3. For generative image or video requests, how would you approach caching or reuse when the same prompt should still allow variation? Please be specific.
4. What metrics and evaluation loop would you use to prove the router is reducing cost without degrading output quality, and to help a separate model-training team identify weaknesses?

To Apply

Answer the questions above to the best of your ability. Summarize your most relevant routing or inference-infrastructure work, link any repos or examples, give your high-level approach to a router that cuts GPU usage while preserving quality, and note your availability.
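
For orientation only, the cheapest-path-first hierarchy and the north-star metric described in the Summary could be sketched roughly as below. This is a minimal illustration, not the client's design: the names `Route`, `Router`, `quality_ok`, and `share_avoiding`, the relative costs, and the stub models are all hypothetical, and the exact-match cache shown here would suit text requests only, since the posting notes that image and video need reuse at the asset or component level.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Route:
    name: str
    cost: float  # hypothetical relative cost per request
    handler: Callable[[str], Optional[str]]  # returns None if it cannot serve


@dataclass
class Router:
    routes: list  # candidate Route objects, in any order
    quality_ok: Callable[[str, str], bool]  # assumed quality gate
    cache: dict = field(default_factory=dict)
    stats: dict = field(default_factory=dict)

    def _count(self, name: str) -> None:
        self.stats[name] = self.stats.get(name, 0) + 1

    def handle(self, prompt: str):
        # Normalize so trivially different prompts share one cache key.
        key = " ".join(prompt.lower().split())
        if key in self.cache:
            self._count("cache")
            return "cache", self.cache[key]
        # Try routes from cheapest to most expensive; stop at the first
        # output that passes the quality gate.
        for route in sorted(self.routes, key=lambda r: r.cost):
            output = route.handler(prompt)
            if output is not None and self.quality_ok(prompt, output):
                self.cache[key] = output
                self._count(route.name)
                return route.name, output
        raise RuntimeError("no route produced an acceptable output")

    def share_avoiding(self, expensive: str) -> float:
        # Fraction of requests served without the named expensive route.
        total = sum(self.stats.values())
        return 1 - self.stats.get(expensive, 0) / total if total else 0.0


# Stub models standing in for real serving backends (assumptions).
def small_model(prompt: str) -> Optional[str]:
    return "short answer" if len(prompt) < 40 else None


def frontier_model(prompt: str) -> Optional[str]:
    return "detailed answer"


router = Router(
    routes=[Route("frontier", 1.0, frontier_model),
            Route("small", 0.05, small_model)],
    quality_ok=lambda prompt, output: bool(output),
)
```

In this sketch the per-route counters in `stats` are what would feed cache hit rate and fallback rate; a real system would also record latency, cost, and quality scores per route.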

  • Hourly: More than 30 hrs/week
  • Duration: 1-3 months
  • Experience Level: Expert
  • Remote Job
  • Project Type: Ongoing project

Contract-to-hire opportunity

This lets talent know that this job could become full time.
Skills and Expertise
Mandatory skills
Machine Learning
MLOps
Activity on this job
  • Proposals: 20 to 50
  • Last viewed by client: yesterday
  • Interviewing: 1
  • Invites sent: 0
  • Unanswered invites: 0
About the client
Member since Aug 2, 2015
  • United States
    Alpharetta, 5:07 AM
  • $25K total spent
    72 hires, 5 active
  • 619 hours
  • Tech & IT
    Individual client
