Sovereign AI Routing Control Plane
Only freelancers located in the U.S. may apply.
We need a senior architect to design and build a multi-model routing control plane, then lead a small senior team through the build. The control plane sits in front of a family of AI systems and decides, per request (text, image, video), the optimal path across cost, quality, latency, business value, and sovereignty (data residency, rights, and cultural fit): cache and reuse, a small or on-device model, an open-weight, fine-tuned, or sovereign model, or a higher-cost frontier fallback. It also routes across compute: CPU, GPU, inference accelerators, on-device, and edge.

Core KPIs: the share of eligible workload kept off frontier accelerators, the resulting cost reduction on a representative workload, and sovereignty compliance, all with no quality regression.

This is not a chatbot and not a wrapper over hosted APIs. You own the architecture, define the routing logic, and lead execution. You think in systems, not individual model calls.

Context

The router is one component of a larger AI platform. It must be model-agnostic: open-weight, fine-tuned, and proprietary models swap in and out behind a stable interface without rearchitecting. A separate team owns the models you route to. The engagement is a 60- to 90-day POC with a working router demo (text-first, with a defined path to image and video), followed by technical leadership through the build.

What you'll own

Control plane: intake and normalization, classification, routing taxonomy, model-selection logic, fallback hierarchy, cache and reuse rules, telemetry, and the eval feedback loop.

Routing that is learned and calibrated, not just static rules: predict per-query difficulty and expected quality, and escalate when confidence falls below a threshold. Comfort with cascades and speculative decoding is expected.

Routing across cost, quality, latency, and policy: in constrained environments, some requests must stay local regardless of cost.
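The confidence-threshold escalation described above can be illustrated with a minimal Python sketch. It is hypothetical, not a specification: `Tier`, `Decision`, `route`, the tier names, costs, and thresholds are all illustrative assumptions; the cache is an exact-match dictionary keyed on the raw prompt (semantic caching is not shown); and each tier is assumed to return its own calibrated confidence score.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Tier:
    """One hypothetical execution path in the fallback hierarchy."""
    name: str
    local: bool  # True if the tier keeps data inside the sovereign boundary
    cost: float  # assumed relative cost per request
    generate: Callable[[str], Tuple[str, float]]  # returns (output, confidence in [0, 1])
    threshold: float  # minimum confidence needed to accept this tier's output


@dataclass
class Decision:
    tier: str
    output: str
    confidence: float
    spent: float  # total cost of every tier tried for this request


def route(prompt: str, tiers: List[Tier], must_stay_local: bool,
          cache: Dict[str, str]) -> Decision:
    # 1. Exact cache: reuse a previously accepted output for an identical prompt.
    if prompt in cache:
        return Decision("cache", cache[prompt], 1.0, 0.0)

    # 2. Sovereignty filter: local-only requests never reach non-local tiers.
    eligible = sorted(
        (t for t in tiers if t.local or not must_stay_local),
        key=lambda t: t.cost,
    )
    if not eligible:
        raise ValueError("no tier satisfies the sovereignty constraint")

    # 3. Cascade: try tiers cheapest-first and escalate while confidence is low.
    spent = 0.0
    best: Optional[Decision] = None
    for tier in eligible:
        output, confidence = tier.generate(prompt)
        spent += tier.cost
        if best is None or confidence > best.confidence:
            best = Decision(tier.name, output, confidence, spent)
        if confidence >= tier.threshold:
            cache[prompt] = output  # only cache outputs that cleared a threshold
            return Decision(tier.name, output, confidence, spent)

    # 4. No tier cleared its threshold: return the most confident output seen,
    #    charged with the total cost of every attempt, and leave it uncached.
    assert best is not None
    return Decision(best.tier, best.output, best.confidence, spent)


if __name__ == "__main__":
    # Toy tiers: the local model is only confident on short prompts.
    small = Tier("small_local", True, 1.0,
                 lambda p: ("short answer", 0.9 if len(p) < 20 else 0.4), 0.7)
    frontier = Tier("frontier", False, 20.0,
                    lambda p: ("detailed answer", 0.95), 0.7)
    cache: Dict[str, str] = {}
    for prompt, local_only in [("easy q", False),
                               ("a much harder question", False),
                               ("another hard local-only question", True),
                               ("easy q", False)]:
        d = route(prompt, [frontier, small], local_only, cache)
        print(prompt, "->", d.tier, d.confidence, d.spent)
```

Filtering for sovereignty before the cascade makes "stay local" a hard constraint rather than a cost trade-off. A learned router would replace the fixed cheapest-first order with a predicted per-query difficulty score; that part is not shown.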
Model-agnostic interface: clean, stable contracts so models and execution paths swap without rework, and the separate model team can work independently of the routing layer.

Cost optimization across compute: exact and semantic cache, prefix/KV cache reuse, output reuse, batching, small-model routing, CPU offload, and on-device/edge execution, with a clear fallback hierarchy. The goal is to move most eligible workload off frontier accelerators without degrading output.

Generative caching and reuse: caching text is easy; caching image and video is not, because the same prompt should produce variation rather than an identical result. We need credible reuse at the asset or component level, not just for text.

Eval loop: score output quality by domain and flag weaknesses so the training team can target fixes instead of retraining broadly. Track quality vs. intent, failure modes, cost per route, latency per route, cache hit rate, fallback rate, and regeneration rate.

Execution and leadership: an architecture blueprint, POC scope, milestones, infra assumptions, and risks that leadership can review, plus hands-on architecture review and task breakdown. You'll lead a small senior team, and one of your first deliverables is recommending its exact composition (see screening questions).

Ideal background

- Led or architected production AI infrastructure across several of: multi-model orchestration and LLM routing, multimodal systems, model serving, inference cost and GPU reduction, CPU and on-device inference, open-source and fine-tuned deployment, cascades and speculative decoding, semantic and prefix caching, eval pipelines, and AI observability.
- Deployed in at least one constrained environment: on-prem, self-hosted, air-gapped, or data-residency-restricted. You know what breaks when you can't lean on a single cloud.
- Can lead: set architecture, break down work, review the team's output, and keep the build on track.

Tools matter less than the ability to architect the system correctly and lead execution.
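The eval-loop metrics listed above (cost and latency per route, cache hit rate, fallback rate, regeneration rate, quality by domain) could be rolled up from per-request telemetry roughly as follows. This is a minimal Python sketch under stated assumptions: `RouteEvent`, `summarize`, the route name `"frontier"`, and the 0.7 weakness threshold are hypothetical, and `quality` is assumed to be a score in [0, 1] produced by some upstream evaluator.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, FrozenSet, List


@dataclass
class RouteEvent:
    """One served request, as logged by a hypothetical telemetry pipeline."""
    route: str         # tier that produced the final output, e.g. "cache"
    domain: str        # request domain used to group quality scores
    quality: float     # assumed evaluator score in [0, 1]
    cost: float        # assumed cost units for the request
    latency_ms: float
    fell_back: bool    # True if the request escalated past its first tier
    regenerated: bool  # True if the output had to be regenerated


def summarize(events: List[RouteEvent],
              frontier_routes: FrozenSet[str] = frozenset({"frontier"}),
              weak_below: float = 0.7) -> Dict[str, object]:
    if not events:
        return {}
    n = len(events)
    by_route: Dict[str, List[RouteEvent]] = defaultdict(list)
    by_domain: Dict[str, List[float]] = defaultdict(list)
    for e in events:
        by_route[e.route].append(e)
        by_domain[e.domain].append(e.quality)
    quality_by_domain = {d: mean(qs) for d, qs in by_domain.items()}
    return {
        # Share of logged requests served off frontier routes; restricting the
        # denominator to "eligible" requests is left to the caller.
        "off_frontier_share": sum(e.route not in frontier_routes for e in events) / n,
        "cache_hit_rate": sum(e.route == "cache" for e in events) / n,
        "fallback_rate": sum(e.fell_back for e in events) / n,
        "regeneration_rate": sum(e.regenerated for e in events) / n,
        "cost_per_route": {r: mean(e.cost for e in es) for r, es in by_route.items()},
        "latency_ms_per_route": {r: mean(e.latency_ms for e in es) for r, es in by_route.items()},
        "quality_by_domain": quality_by_domain,
        # Domains scoring below the threshold become targets for the training team.
        "weak_domains": sorted(d for d, q in quality_by_domain.items() if q < weak_below),
    }
```

Comparing these figures against an all-frontier baseline on the same representative workload is one way to show cost falling while per-domain quality holds; the baseline run itself is not shown.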
Not a fit: basic chatbot workflows, hosted-API-only work, or prompt engineering alone.

Deliverables

Control plane blueprint, routing taxonomy, POC plan with milestones and success criteria, and an eval/feedback framework, with a working router demo as the 60- to 90-day target, followed by technical leadership of a small team through the build.

Screening questions

- What is the most relevant AI routing, model-serving, or inference infrastructure system you personally designed or built? What was routed, which models or execution paths were involved, and what role did you own?
- How would you design a router that chooses between cache/reuse, a smaller or local model, an open-weight or fine-tuned model, or a frontier fallback, across CPU and GPU? Where do learned routing, cascades, or speculative decoding fit?
- For generative image or video requests, how would you approach caching or reuse when the same prompt should still allow variation? Be specific.
- What metrics and eval loop would you use to prove the router cuts cost without degrading quality, and to help a separate training team find weaknesses?
- Beyond yourself, what team would you staff to hit these deliverables in 8 weeks? Give the roles, seniority, and headcount, explain how you'd split the work, and flag any deliverable that 8 weeks and a team of roughly 4 engineers can't realistically cover.

To apply

Answer the five questions, summarize your most relevant routing or inference-infrastructure work (repos, writeups, talks, or architecture you can describe), and give your high-level approach to a control plane that routes across cost, quality, and sovereignty while preserving quality. Note your availability, your rate, whether you've led a small engineering team before, and the team you'd staff to hit the deliverables in 8 weeks.
- Hourly: less than 30 hrs/week
- Duration: 1-3 months
- Experience level: Expert
- Remote job
- Project type: Ongoing project
Activity on this job
- Proposals: 15 to 20
- Interviewing: 0
- Invites sent: 0
- Unanswered invites: 0
About the client
- Location: Alpharetta, United States (local time 7:16 PM)
- Total spent: $26K; 73 hires, 5 active
- 740 hours
- Industry: Tech & IT; individual client