Search Freelance Jobs on Upwork

Posted last week

Senior AI Infrastructure Architect - Multi-Model Routing Control Plane

Hourly
Expert
Est. time: 1 to 3 months, Less than 30 hrs/week

We need a senior architect to lead the design and build of a multi-model routing control plane, then guide a small senior team through the build. The control plane sits in front of a family of AI systems and decides, for every request (text, image, video), the cheapest path that still meets quality: cache, reuse, a small or local model, an on-device model, an open-weight model, a fine-tuned model, or a higher-cost frontier fallback. It must route not just across models but across compute: CPU, GPU, on-device, and edge.

The north-star metric is the share of requests served without touching an expensive frontier GPU, and the resulting cost reduction on a representative workload. The ambition is to move the majority of eligible workload off frontier GPUs onto cheaper paths without degrading output.

This is not a chatbot project and it is not a thin wrapper over hosted APIs. You will own the architecture, define the routing logic, and lead execution. We need someone who thinks in systems, not individual model calls.

Context (so you understand what we need delivered)

The router is one component of a larger AI platform, not a standalone product. It must be model-agnostic: open-weight, fine-tuned, and proprietary models get swapped in and out behind a stable interface without rearchitecting. You will coordinate with a separate team that owns the models you route to. The initial engagement is a 60 to 90 day POC with a working demo of the router as the goal, followed by technical leadership through the build.

What You Will Own

- Control plane architecture: request intake and normalization, classification, routing taxonomy, model-selection rules, fallback logic, cache and reuse rules, logging and telemetry, and the evaluation feedback loop.
- Model-agnostic interface: clean, stable contracts so models and execution paths swap in and out without rework, and so the separate team that owns the models can work independently of the routing layer.
- Cost optimization across compute, not just models: reduce unnecessary GPU usage while preserving quality, using exact and semantic cache, existing output reuse, lightweight and small-model routing, batching, CPU offload, on-device and edge execution where appropriate, and a clear fallback hierarchy. The explicit goal is to shift a large share of workload off frontier GPUs.
- Generative caching and reuse: caching text is straightforward. Caching generative image and video is not, since the same prompt should produce variation rather than an identical result. We need a credible approach to reuse at the asset or component level, not just for text.
- Evaluation loop: a framework that scores output quality by content domain and flags weakness, so the training team can target improvements instead of retraining broadly. Track output quality against intent, failure modes, cost per route, latency per route, cache hit rate, fallback rate, and regeneration rate.
- Execution plan and technical leadership: an architecture diagram, recommended POC scope, milestones, infrastructure assumptions, and risks that leadership can review, plus hands-on architecture review and task breakdown. You will lead a small senior team (up to 4 engineers) through the POC build.

Ideal Background

- You have led or architected production AI infrastructure involving several of the following: multi-model orchestration and LLM routing, multimodal AI, model serving, inference cost optimization, GPU cost reduction, CPU and on-device inference, open-source and fine-tuned model deployment, evaluation pipelines, semantic caching, and AI observability.
- You have deployed in at least one constrained environment: on-prem, self-hosted, air-gapped, or data-residency-restricted. You know what breaks when you cannot lean on a single cloud.
- You can lead. This is a technical lead role, so you will set architecture, break down work, review the team's output, and keep the build on track.

Specific tools matter less than the ability to architect the system correctly and lead execution. We are not looking for someone who only builds basic chatbot workflows, only uses hosted APIs without understanding the underlying infrastructure, or works as a prompt engineer alone.

Deliverables

- The initial engagement should produce a control plane architecture blueprint, a routing taxonomy, a POC execution plan with milestones and success criteria, and an evaluation and feedback framework, with a working router demo as the 60 to 90 day target, followed by technical leadership of a small team through the build.

Screening Questions

- Describe the most relevant AI routing, model-serving, or inference infrastructure system you have personally designed or built. What was routed, what models or execution paths were involved, and what role did you own?
- How would you design a router that decides whether a request should use cache/reuse, a smaller or local model, an open-weight or fine-tuned model, or a higher-cost frontier fallback, across both CPU and GPU?
- For generative image or video requests, how would you approach caching or reuse when the same prompt should still allow variation? Please be specific.
- What metrics and evaluation loop would you use to prove the router is reducing cost without degrading output quality, and to help a separate model-training team identify weaknesses?

To Apply

Answer the questions above to the best of your ability. Summarize your most relevant routing or inference-infrastructure work, link any repos or examples, give your high-level approach to a control plane that cuts GPU usage while preserving quality, and note your availability and whether you have led a small engineering team before.
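
As a rough illustration of the "cheapest path that still meets quality" idea in this posting, the fallback hierarchy can be pictured as a cost-ordered list of routes tried in turn. This is only a sketch under stated assumptions: `Route`, `route_request`, the stub handlers, the cost figures, and the quality scores are all hypothetical and do not come from the posting, and the quality number stands in for whatever evaluator a real control plane would use.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Every name and number here is hypothetical, chosen only for illustration.
Handler = Callable[[str], Optional[Tuple[str, float]]]  # (output, quality 0..1) or None


@dataclass
class Route:
    name: str
    cost: float      # assumed relative cost of one attempt
    handler: Handler


def route_request(prompt: str, routes: List[Route], min_quality: float) -> dict:
    """Try routes cheapest-first and return the first result that meets the bar."""
    attempted: List[str] = []
    spent = 0.0
    for route in sorted(routes, key=lambda r: r.cost):
        attempted.append(route.name)
        spent += route.cost
        result = route.handler(prompt)
        if result is None:          # e.g. a cache miss
            continue
        output, quality = result
        if quality >= min_quality:
            return {"route": route.name, "output": output,
                    "spent": spent, "attempted": attempted}
    raise RuntimeError(f"no route met quality {min_quality}; tried {attempted}")


# Stub execution paths standing in for a cache, a small model, and a frontier model.
CACHE = {"hello": "cached greeting"}


def cache_lookup(prompt: str) -> Optional[Tuple[str, float]]:
    hit = CACHE.get(prompt)
    return (hit, 1.0) if hit is not None else None


def small_model(prompt: str) -> Optional[Tuple[str, float]]:
    # Pretend the small model handles short prompts well and long ones poorly.
    quality = 0.9 if len(prompt) < 40 else 0.5
    return (f"small:{prompt}", quality)


def frontier_model(prompt: str) -> Optional[Tuple[str, float]]:
    return (f"frontier:{prompt}", 0.99)


ROUTES = [
    Route("frontier", 10.0, frontier_model),
    Route("cache", 0.0, cache_lookup),
    Route("small", 1.0, small_model),
]

if __name__ == "__main__":
    for prompt in ["hello", "short question", "x" * 50]:
        decision = route_request(prompt, ROUTES, min_quality=0.8)
        print(decision["route"], decision["attempted"], decision["spent"])
```

Recording `attempted` and `spent` per request is one simple way to derive the fallback rate and cost per route that the posting asks to track.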

Posted last week

Hermes Agent & AI DevOps Expert - MCP & Playwright Orchestration

Fixed price
Expert
Est. budget: $5,000.00

We are looking for an expert backend developer and automation engineer to extend an existing, production-grade Model Context Protocol (MCP) server and overhaul its orchestration layer.

The headline correction for this project: the existing Lawfather MCP is to be retained and extended, not rebuilt. It already exposes deterministic, parameterized Playwright tools for every required county portal (District Clerk, HCSO, HCDAO) and a client database. Those backend tools are the reliable layer and are not the source of the instability this project exists to fix. The instability lives entirely in the orchestration layer, the model-driven layer that decides when and how to call the tools. The fix is to move deterministic control out of model-followed prose and into code, and to host the agent on an always-on machine with persistent memory.

Core Project Principles

• Extend, Don't Rebuild: Retain and extend the existing MCP; do not re-implement portal scrapers from scratch.
• Code Over Prompts: Deterministic logic lives strictly in tool code, never in instructions the model must remember each session.
• No Caller Loops: Batch operations must run to completion server-side. No operation may require the caller (model) to loop.
• Agnostic Architecture: The system must remain model-agnostic and host-agnostic. No single provider (Anthropic, OpenAI, Z.ai/GLM, or Nous) may be a hard dependency.
• Privilege First: Client data stays on owned hardware; the model is never the gatekeeper of which case a file belongs to.

Existing Tool Inventory (To Be Inherited As-Is)

The following tools already exist on the production MCP (containerized on a local Synology NAS) and are in daily use. Re-deriving their behavior is completely out of scope:

• hcdc_get_docket: Court settings by date range + bar number (District Clerk).
• hcdc_check_filings: Per case: standard defense filings present vs. missing.
• hcdc_download_filings: Images-tab documents: bulk OR selective by filters; dest_subfolder; dry_run.
  Note: The parameterized download tools already cover most retrieval requests. "All filings," "this filing," "all subpoenas," "all resets," and "everything filed that day" are argument combinations on this tool, not separate features.
• hcso_locate: Defendant custody location (facility / floor / pod) by SPN.
• hcdao_grab_file: Download a single named file from the DA portal Files tab.
• hcdao_download_discovery: Batch / delta discovery download from the DA portal.
• hcdao_download_media_alert: Batch-download files listed in a 'New Media Available' portal email.
• hcdao_case_summary: Scrape the Case Jacket quick summary / DAO narrative.
• hcdao_plea_offer: Scrape current plea offer + full offer history.
• hcdao_assigned_ada: Assigned ADA name / email / phone on a case.
• lookup_client / list_clients: Resolve / list clients from the shared client database.

Scoped Work (Paid Deliverables)

1. County Case Resolver (New Tool): Find a case from partial identifiers, meaning any subset of (name, SPN, DOB, court, cause). Searches county systems (not just the local client DB). MUST return a ranked candidate list for the user to choose from; MUST NEVER auto-select. Wrong-defendant selection is a privilege failure, not a cosmetic bug.
2. Latest-Version Retrieval: Add scope=latest to hcdao_grab_file so 'most recent' selects the newest among supplements instead of the first match.
3. Async Transcribe Tool (Skill-to-Tool Promotion): Build a deterministic MCP tool using Gemini 3.1 Pro Preview for transcription, followed by a second pass that sends the transcript back with case context for cleanup (speaker mapping, defense-moment preamble). Long-running: implement as an async job (submit, receive a job id, poll, fetch), NOT a synchronous call.
4. OCR Tool (Skill-to-Tool Promotion): Implement a readability check on ingest. If a document is not cleanly readable, FLAG it and ASK before sending to Gemini 3.1 Pro Preview for OCR. OCR must be gated and confirmed, never automatic.
5. Server-Side Batch Jobs: Move all chunk, loop, delta, and throttle logic OFF the caller and INTO the tool code. One call runs the batch to completion.
6. Queued HCDAO Fixes: For hcdao_download_discovery, add a portal_ids filter for targeted single-file pulls and a custom output-path / Drive-folder destination feature.

Known Portal Quirks to Handle from Day One

• hcdc_get_docket returns a broader date range than requested; results must be filtered to the requested window.
• hcdao_download_discovery delta detection is blind to files organized into dated subfolders; this case must be explicitly handled.
• Court DG7 does not surface through standard bar-number docket lookup and requires separate handling.
• The Playwright Node.js driver subprocess can die silently while database tools still respond; you must health-check the driver proactively.

Orchestration, Host Layer, & Deployment Topology

• Target Host: Hermes Agent (Nous Research) running as the persistent shell, providing persistent memory, the scheduler, and messaging surfaces. The MCP server will plug directly into it.
• Agnostic LLM Routing: Default the agent/dispatch role to the most reliable tool-calling model (currently Claude Opus). Route bulk, non-critical generations (draft summaries, transcript cleanup) to a cheaper model (e.g., GLM-5.2). No provider may be hard-wired. Per-tool pins are allowed strictly for transcription/OCR tasks (pinned to Gemini 3.1 Pro Preview).
• Memory Fencing: Hermes's persistent memory and learning loops must remain enabled to accumulate facts and user preferences. However, the agent must be strictly fenced from self-editing or rewriting its own mechanical execution paths (portals, downloads, filings), which must remain frozen in MCP tool code.
• Hardware Deployment Infrastructure:
  • Always-on Brain: M1 Pro MacBook Pro (16 GB, mains-powered, lid open) running the Hermes gateway, Messages.app, and a BlueBubbles iMessage bridge. Must be fully automated via launchd services to handle headless crash recovery, auto-login, and sleep prevention (pmset autorestart / caffeinate).
  • Tools and Storage: Synology NAS (10.0.0.149) hosting the Lawfather MCP container, local client folders, and Drive sync.
  • Private Network: Tailscale mesh across all devices for secure remote access without open inbound ports.

Acceptance Criteria for Sign-Off

• No batch operation requires the caller to iterate.
• The case resolver returns ranked candidates and never auto-selects.
• Transcription runs as an async two-stage job that survives multi-hour files without timing out.
• OCR never fires automatically on low-readability files without gated confirmation.
• Zero regressions on the existing MCP tool inventory.
• The Resiliency Test: The full stack restarts completely unattended after a host reboot or simulated power loss, and is reachable via iMessage/SMS immediately after.
• Self-editing is fenced on mechanical download/filing paths.

Hard Guardrails

• Privilege: Downloads route strictly to the correct client folder; a wrong-case match is treated as a severe defect, not a warning. Privileged audio/discovery data stays on owned hardware where the chosen model allows.
• Determinism: Repeatable steps live entirely in tool code, never in prompts.
• Agnosticism: Model and host layers must remain fully swappable without modifying the core MCP tools.

Before quoting "done," you will be expected to confirm live portal behaviors regarding District Clerk document labels, DA portal stable identifiers, and county search surfaces.

How to Apply

Please submit a proposal detailing your specific experience with MCP architectures, Playwright browser automation, and macOS/Docker DevOps automation.

Anti-Bot Filtering: To prove you read this entire scope, please start your application with the phrase "PROTECT THE LAW" in all caps. Automated or generic copy-paste applications will be instantly rejected.
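
The async job shape this posting asks for in the transcription tool (submit, get a job id, poll, fetch) might look roughly like the sketch below. It is a minimal in-memory illustration using only the Python standard library, not the Lawfather MCP's actual design: `JobStore` and its method names are hypothetical, and a real server would presumably persist job state so it survives restarts.

```python
import threading
import uuid
from typing import Any, Callable, Dict, Optional


class JobStore:
    """Hypothetical in-memory job registry illustrating submit / poll / fetch."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, Any]] = {}
        self._threads: Dict[str, threading.Thread] = {}
        self._lock = threading.Lock()

    def submit(self, work: Callable[[], Any]) -> str:
        """Start the work in the background and return a job id immediately."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "running", "result": None, "error": None}
        thread = threading.Thread(target=self._run, args=(job_id, work), daemon=True)
        self._threads[job_id] = thread
        thread.start()
        return job_id

    def _run(self, job_id: str, work: Callable[[], Any]) -> None:
        try:
            update = {"status": "done", "result": work()}
        except Exception as exc:  # record the failure instead of losing it
            update = {"status": "failed", "error": str(exc)}
        with self._lock:
            self._jobs[job_id].update(update)

    def poll(self, job_id: str) -> str:
        """Return 'running', 'done', or 'failed' without blocking."""
        with self._lock:
            return self._jobs[job_id]["status"]

    def fetch(self, job_id: str) -> Any:
        """Return the result of a finished job; refuse if it is not done."""
        with self._lock:
            job = dict(self._jobs[job_id])
        if job["status"] != "done":
            raise RuntimeError(f"job {job_id} is {job['status']}")
        return job["result"]

    def wait(self, job_id: str, timeout: Optional[float] = None) -> None:
        """Demo helper: block until the worker thread ends or the timeout passes."""
        self._threads[job_id].join(timeout)


if __name__ == "__main__":
    store = JobStore()
    job_id = store.submit(lambda: "transcript text")  # stand-in for a long transcription
    store.wait(job_id, timeout=5)
    print(store.poll(job_id), store.fetch(job_id))
```

Because `submit` returns at once, the caller never holds a connection open for the length of a multi-hour file; it only makes short `poll` and `fetch` calls.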

Posted 4 weeks ago

Technical Architecture Design for AI/Data Platform - PDF Deliverable

Hourly: $50.00 - $75.00
Expert
Est. time: Less than 1 month, Less than 30 hrs/week

We're looking for a Principal-level engineer to design the end-to-end technical architecture for an AI-powered platform and deliver it as a polished PDF document. The deliverable should cover system architecture (data pipelines, backend APIs, GenAI/RAG components), multi-cloud infrastructure (AWS/GCP), data flow and integration patterns, scalability and reliability strategy, security/compliance considerations (HIPAA, SOC 2), and clear architecture diagrams. We need a clear, well-structured document that engineering and stakeholders can use as the blueprint for implementation.

Scope / Deliverable: A single, professional PDF document (architecture overview, diagrams, component breakdown, tech-stack recommendations, and a phased implementation plan).

Posted 3 quarters ago

Software Developer - GenAI

Hourly: $70.00 - $85.00
Expert
Est. time: More than 6 months, 30+ hrs/week

Company Overview

Pay Ready is a leading provider of innovative payment solutions tailored for the property management industry. We help property owners and managers streamline financial processes and accounts receivable functions, including processing current and post-resident rent payments and recoveries. As we integrate Generative AI (GenAI) across our operations, we're seeking a Senior Software Developer to drive the development of AI-powered solutions that enhance both internal workflows and customer-facing applications.

Position Overview

Joining our team as a Senior Software Developer – Generative AI means being at the forefront of innovation, working on cutting-edge projects that are shaping the future of AI and machine learning. You'll have the opportunity to collaborate with top experts in the field, contributing to groundbreaking research and development that has real-world impact. We offer a dynamic and collaborative work environment where your ideas and contributions are valued, and where you'll have the resources and support needed to bring your vision to life. Being part of our team means embracing a culture that fosters continuous learning and professional growth, with access to ongoing training and development opportunities. You'll work on diverse and challenging projects, gaining valuable experience and expertise that will set you apart in your career.

Key Responsibilities

- Design and develop AI-driven applications that address both internal operational needs and external client requirements.
- Utilize frameworks such as LangGraph and LangSmith to build, orchestrate, and monitor AI workflows.
- Implement solutions that integrate seamlessly with existing systems, ensuring reliability and scalability.
- Work in tandem with project managers and product owners to understand project scopes, timelines, and deliverables.
- Participate in sprint planning, code reviews, and team meetings to ensure alignment and timely delivery of projects.
- Provide technical insights and recommendations during the planning and execution phases.
- Develop and refine AI models, ensuring they meet performance and accuracy benchmarks.
- Monitor and analyze AI application performance, making necessary adjustments to optimize outcomes.
- Stay updated with the latest advancements in AI and machine learning to incorporate best practices into development processes.

Posted 2 weeks ago

GenAI Engineer for Client

Hourly
Intermediate
Est. time: More than 6 months, 30+ hrs/week

We are seeking a skilled GenAI engineer to work with our client in a remote or Chicago-based capacity. The ideal candidate will have experience in developing and implementing AI solutions, with a strong understanding of machine learning and data analysis. Responsibilities include designing AI models, integrating AI into existing systems, and collaborating with cross-functional teams to enhance AI capabilities. If you have a passion for AI and a proven track record in delivering innovative solutions, we would love to hear from you.

Posted 3 weeks ago

Fix ML Model Inference Bug + Improve FastAPI Prediction Endpoint

Fixed price
Expert
Est. budget: $200.00

We have a small Python-based machine learning inference service built with FastAPI and scikit-learn. The model was trained on structured tabular data, but our prediction endpoint is currently failing because of feature mismatch errors between the training pipeline and incoming API payloads. We need an experienced ML/MLOps engineer to quickly debug the issue, clean up the preprocessing logic, and make the `/predict` endpoint work reliably again. The goal is not to retrain the full model or build a large system. We only need a focused fix: review the existing model artifact, inspect the expected feature columns, update the API preprocessing code, and provide a short explanation of what was wrong. Bonus if you can also add a simple test request example or basic validation for missing fields. This should be a quick one-time task for someone comfortable with Python, scikit-learn, Pandas, FastAPI, and ML deployment workflows.
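
A feature mismatch of the kind described in this posting is often a matter of column names and order differing between training and the API payload. The sketch below shows one possible validation step in plain Python; it is not the client's code. `align_features` and the column names are hypothetical. In a real service the expected list could be read from the fitted estimator's `feature_names_in_` attribute, assuming the model was fitted on a pandas DataFrame.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical training-time column order; in practice this might come from
# model.feature_names_in_ on the loaded scikit-learn artifact.
EXPECTED_COLUMNS = ["sqft", "bedrooms", "zip_code"]


def align_features(payload: Dict[str, Any], expected: Sequence[str]) -> List[Any]:
    """Return payload values in training order, rejecting payloads with missing fields."""
    missing = [name for name in expected if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # Keys not seen in training are dropped; the rest are reordered to match.
    return [payload[name] for name in expected]


if __name__ == "__main__":
    request_body = {"zip_code": "60601", "bedrooms": 2, "sqft": 900, "extra": 1}
    print(align_features(request_body, EXPECTED_COLUMNS))
```

Raising on missing fields, instead of silently filling defaults, surfaces payload problems at the endpoint boundary before they reach the model.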

Posted 3 months ago

Expert AI Developer Needed

Hourly: $30.00 - $50.00
Expert
Est. time: 3 to 6 months, 30+ hrs/week

AI Developer Needed – Build Us a Marketing AI Agent

We need a skilled developer to build an AI-powered Marketing Assistant for our business.

**Core Tasks the Agent Will Handle:**
- Appointment setting & lead qualification
- Copywriting (emails, ads, social content)
- Automated follow-up sequences
- Lead research and CRM updates

**Requirements:**
- Experience with AI agent frameworks (LangChain, CrewAI, AutoGen, etc.)
- Strong prompt engineering skills
- Ability to integrate with our existing tools (CRM, calendar, email)
- Past projects to show us – links or demos preferred

**Budget:** Open to discussion based on scope

**Timeline:** Looking to kick off within 1–2 weeks

Posted 2 weeks ago

Senior AI Developer

Hourly: $30.00 - $150.00
Expert
Est. time: More than 6 months, Less than 30 hrs/week

We are seeking a senior AI developer to build and enhance AI models for our business. The role involves developing, testing, and deploying AI solutions, as well as improving existing models to increase accuracy and performance. The ideal candidate should have strong experience in AI development and be able to work independently on complex projects.

Posted 3 weeks ago

Sr AI Engineer

Fixed price
Expert
Est. budget: $2,000.00

We are hiring an AI Engineer with strong hands-on experience building and shipping real AI products.

Requirement: If you don't have a GitHub profile to share, this role is not a fit.

What we're looking for:
• Strong experience in AI/ML engineering
• Ability to build, test, and deploy production-ready AI systems
• Practical experience working on real-world AI projects

To apply: Please share your portfolio, past AI projects, and relevant work samples. Applicants without a portfolio will be ignored.
