Search Freelance Jobs on Upwork

Posted 2 weeks ago

Consult on Model Design

Hourly
Intermediate
Est. time: Less than 1 month, Less than 30 hrs/week

I am seeking an experienced ML engineer to provide insights on the design of a model I am planning to build. Your expertise in model design and architecture will be invaluable in helping me make informed decisions.

Posted last week

Senior AI solutions Architect & lead engineer - multi-agent intelligence platform

Hourly: $50.00 - $150.00
Expert
Est. time: 1 to 3 months, Less than 30 hrs/week

I want to build a private multi-model RAG-based Opportunity Intelligence Agent. It should support document ingestion, opportunity-specific workspaces, vector search, source citations, multi-model routing across OpenAI, Claude, Perplexity, and possibly DeepSeek, and generate strategic recommendations from both uploaded files and live web research. This is intended to become a reusable base agent capable of knowledge retrieval, web research, multi-model orchestration, document analysis, citation generation, and agent cloning and configuration. It will be used for analyzing project opportunities and developing strategy for them, responding to RFPs, and assisting with proposals, as well as other applications.

Posted 2 weeks ago

Senior AI Developer

Hourly: $30.00 - $150.00
Expert
Est. time: More than 6 months, Less than 30 hrs/week

We are seeking a senior AI developer to build and enhance AI models for our business. The role involves developing, testing, and deploying AI solutions, as well as improving existing models to increase accuracy and performance. The ideal candidate should have strong experience in AI development and be able to work independently on complex projects.

Posted last week

Senior AI Infrastructure Architect - Multi-Model Routing Control Plane

Hourly
Expert
Est. time: 1 to 3 months, Less than 30 hrs/week

We need a senior architect to lead the design and build of a multi-model routing control plane, then guide a small senior team through the build. The control plane sits in front of a family of AI systems and decides, for every request (text, image, video), the cheapest path that still meets quality: cache, reuse, a small or local model, an on-device model, an open-weight model, a fine-tuned model, or a higher-cost frontier fallback. It must route not just across models but across compute: CPU, GPU, on-device, and edge. The north-star metric is the share of requests served without touching an expensive frontier GPU, and the resulting cost reduction on a representative workload. The ambition is to move the majority of eligible workload off frontier GPUs onto cheaper paths without degrading output. This is not a chatbot project and it is not a thin wrapper over hosted APIs. You will own the architecture, define the routing logic, and lead execution. We need someone who thinks in systems, not individual model calls.

Context (so you understand what we need delivered)
The router is one component of a larger AI platform, not a standalone product. It must be model-agnostic: open-weight, fine-tuned, and proprietary models get swapped in and out behind a stable interface without rearchitecting. You will coordinate with a separate team that owns the models you route to. The initial engagement is a 60 to 90 day POC with a working demo of the router as the goal, followed by technical leadership through the build.

What You Will Own
- Control plane architecture: request intake and normalization, classification, routing taxonomy, model-selection rules, fallback logic, cache and reuse rules, logging and telemetry, and the evaluation feedback loop.
- Model-agnostic interface: clean, stable contracts so models and execution paths swap in and out without rework, and so the separate team that owns the models can work independently of the routing layer.
- Cost optimization across compute, not just models: reduce unnecessary GPU usage while preserving quality, using exact and semantic cache, existing output reuse, lightweight and small-model routing, batching, CPU offload, on-device and edge execution where appropriate, and a clear fallback hierarchy. The explicit goal is to shift a large share of workload off frontier GPUs.
- Generative caching and reuse: caching text is straightforward. Caching generative image and video is not, since the same prompt should produce variation rather than an identical result. We need a credible approach to reuse at the asset or component level, not just for text.
- Evaluation loop: a framework that scores output quality by content domain and flags weakness, so the training team can target improvements instead of retraining broadly. Track output quality against intent, failure modes, cost per route, latency per route, cache hit rate, fallback rate, and regeneration rate.
- Execution plan and technical leadership: an architecture diagram, recommended POC scope, milestones, infrastructure assumptions, and risks that leadership can review, plus hands-on architecture review and task breakdown. You will lead a small senior team (up to 4 engineers) through the POC build.

Ideal Background
- You have led or architected production AI infrastructure involving several of the following: multi-model orchestration and LLM routing, multimodal AI, model serving, inference cost optimization, GPU cost reduction, CPU and on-device inference, open-source and fine-tuned model deployment, evaluation pipelines, semantic caching, and AI observability.
- You have deployed in at least one constrained environment: on-prem, self-hosted, air-gapped, or data-residency-restricted. You know what breaks when you cannot lean on a single cloud.
- You can lead. This is a technical lead role, so you will set architecture, break down work, review the team's output, and keep the build on track.

Specific tools matter less than the ability to architect the system correctly and lead execution. We are not looking for someone who only builds basic chatbot workflows, only uses hosted APIs without understanding the underlying infrastructure, or works as a prompt engineer alone.

Deliverables
- The initial engagement should produce a control plane architecture blueprint, a routing taxonomy, a POC execution plan with milestones and success criteria, and an evaluation and feedback framework, with a working router demo as the 60 to 90 day target, followed by technical leadership of a small team through the build.

Screening Questions
- Describe the most relevant AI routing, model-serving, or inference infrastructure system you have personally designed or built. What was routed, what models or execution paths were involved, and what role did you own?
- How would you design a router that decides whether a request should use cache/reuse, a smaller or local model, an open-weight or fine-tuned model, or a higher-cost frontier fallback, across both CPU and GPU?
- For generative image or video requests, how would you approach caching or reuse when the same prompt should still allow variation? Please be specific.
- What metrics and evaluation loop would you use to prove the router is reducing cost without degrading output quality, and to help a separate model-training team identify weaknesses?

To Apply
Answer the questions above to the best of your ability. Summarize your most relevant routing or inference-infrastructure work, link any repos or examples, give your high-level approach to a control plane that cuts GPU usage while preserving quality, and note your availability and whether you have led a small engineering team before.
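As a rough illustration of the "cheapest path that still meets quality" idea in this posting, here is a minimal Python sketch. It is not the client's design: the `Route` and `Router` names, the relative costs, the `min_quality` threshold, and the three stub handlers are all hypothetical, and a real control plane would replace the stubs with a cache lookup, model calls, and a proper quality estimator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A handler returns (output, estimated_quality) or None if it cannot serve the request.
Handler = Callable[[str], Optional[Tuple[str, float]]]


@dataclass
class Route:
    name: str
    cost: float                 # hypothetical relative cost per request
    handler: Handler
    is_frontier: bool = False   # the last-resort, most expensive path


@dataclass
class Router:
    routes: List[Route]
    min_quality: float = 0.8    # assumed acceptance threshold
    served_by: Dict[str, int] = field(default_factory=dict)

    def handle(self, request: str) -> Tuple[str, str]:
        """Try routes from cheapest to most expensive; return (route_name, output)."""
        for route in sorted(self.routes, key=lambda r: r.cost):
            result = route.handler(request)
            if result is None:
                continue
            output, quality = result
            # The frontier fallback is accepted unconditionally as the last resort.
            if quality >= self.min_quality or route.is_frontier:
                self.served_by[route.name] = self.served_by.get(route.name, 0) + 1
                return route.name, output
        raise RuntimeError("no route could serve the request")

    def non_frontier_share(self) -> float:
        """Share of requests served without the frontier path (the north-star metric)."""
        total = sum(self.served_by.values())
        if total == 0:
            return 0.0
        frontier = {r.name for r in self.routes if r.is_frontier}
        cheap = sum(n for name, n in self.served_by.items() if name not in frontier)
        return cheap / total


# Stub handlers, purely for demonstration.
_cache = {"hello": "cached hi"}


def cache_lookup(request: str) -> Optional[Tuple[str, float]]:
    return (_cache[request], 1.0) if request in _cache else None


def small_model(request: str) -> Optional[Tuple[str, float]]:
    # Pretend the small model is only reliable on short requests.
    return f"small:{request}", 0.9 if len(request) < 20 else 0.5


def frontier_model(request: str) -> Optional[Tuple[str, float]]:
    return f"frontier:{request}", 0.99


router = Router([
    Route("frontier", 100.0, frontier_model, is_frontier=True),
    Route("cache", 0.0, cache_lookup),
    Route("small", 1.0, small_model),
])
```

Calling `router.handle(...)` on a cached request, a short request, and a long request exercises all three paths, and `router.non_frontier_share()` then reports the fraction that avoided the frontier route.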

Posted 6 days ago

LLM / RAG Architecture Consultant — Review & Advise on-Prem Local AI Build

Hourly
Expert
Est. time: Less than 1 month, Less than 30 hrs/week

We're building an internal AI system that runs entirely on our own hardware (no cloud inference) against our own company data. We have a working proof-of-concept and want to get the architecture right. We need an experienced consultant to review what we've built, pressure-test our decisions, and tell us where we're wrong. This is an advisory/validation role first: we have someone doing the hands-on work; what we want is a senior second opinion to make sure we're building this the right way.

What we're running today:
- Inference: RTX 5090 (32GB, Blackwell), Ubuntu 24.04, running llama-server (llama.cpp + CUDA) serving Gemma 4 31B-it (Q4_K_M GGUF) at a 262,144 context window. Also hosts our MCP retrieval server, PostgreSQL, and Qdrant.
- Embeddings: separate machine with an RTX 3060 running vLLM serving Qwen3-Embedding-4B.
- RAG: hybrid retrieval (Postgres full-text search + Qdrant semantic search with RRF fusion), exposed through a custom MCP server with tool-calling.
- Data: ingesting our own internal operational data into Postgres + Qdrant.

Planned stack: LiteLLM for model routing, n8n for automation, Open WebUI for the interface, Langfuse for observability, Vault or Infisical for secrets, Keycloak/Azure AD for SSO.

What we need help with:
- Validating our two-machine split (inference vs. embeddings) and whether our VRAM/context budget holds up under real load, specifically whether a 256K context window is real and performant on a single 32GB card or just nominal.
- Model selection and routing strategy: which open-weight models for which tasks, and how to structure LiteLLM routes.
- RAG quality: chunking, embedding dimensionality, hybrid search tuning, reranking; making retrieval actually accurate on messy real-world data.
- Sanity-checking our overall architecture and telling us our blind spots.

You should have done:
- Stood up local LLM inference in production: llama.cpp/llama-server and vLLM, not just Ollama on a laptop. You understand GGUF quantization (Q4_K_M, IQ-series), KV cache, KV-cache quantization, and how context length maps to actual VRAM consumption.
- Real fluency in GPU sizing math: given a model, a quant, and a context window, you can tell us whether it fits on a given card and what throughput to expect. Bonus if you've worked with Blackwell / sm_120a.
- Built production RAG: vector DBs (Qdrant, pgvector), hybrid search, RRF fusion, embedding model selection, reranking, evaluation.
- Worked with agentic/tool-calling systems and ideally MCP servers.
- Know the open-weight model landscape (Gemma, Qwen, Llama, Mistral, Phi, Nemotron, Hermes) and their licenses well enough to advise.
- Production ops: systemd, Docker, model gateways (LiteLLM or similar), observability (Langfuse), secrets management, SSO.
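Two of the calculations this posting mentions can be sketched in a few lines of Python: the KV-cache sizing arithmetic and reciprocal rank fusion (RRF). This is only an illustration under stated assumptions. The layer count, KV-head count, head dimension, weight size, and overhead used in the example are hypothetical placeholders, not the real figures for the model named above, and the sizing formula assumes full attention in every layer (sliding-window or hybrid layers would need less). The RRF constant `k=60` is the conventional default, not something the posting specifies.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: float = 2.0) -> int:
    """KV-cache size for one sequence, assuming full attention in every layer.

    The leading 2 accounts for storing both a K and a V tensor per layer.
    bytes_per_elem is 2.0 for fp16; lower values model KV-cache quantization.
    """
    return int(2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem)


def fits(weights_gib: float, kv_bytes: int, vram_gib: float,
         overhead_gib: float = 1.5) -> bool:
    """Rough fit check: weights + KV cache + assumed runtime overhead vs. VRAM."""
    return weights_gib + kv_bytes / 2**30 + overhead_gib <= vram_gib


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical model shape (NOT the real configuration of any named model).
kv = kv_cache_bytes(n_layers=48, n_kv_heads=8, head_dim=128, context_len=262_144)
kv_gib = kv / 2**30

# Hypothetical result lists from full-text search and vector search.
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "a"]])
```

With these placeholder numbers the fp16 KV cache alone comes to `kv_gib == 48.0`, which is the kind of result that prompts the "real or just nominal" question; plugging in the actual model configuration and KV-cache quantization level is the part that needs the consultant's knowledge.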

Posted 2 weeks ago

Computer Vision Engineer for Sports Card Recognition

Hourly
Intermediate
Est. time: 1 to 3 months, Less than 30 hrs/week

We are seeking a US-based computer vision and full stack developer to build a platform for sports card recognition. The project includes developing subscription management, dashboards, and user account features. The ideal candidate will have experience creating scalable applications and integrating computer vision capabilities into a user-friendly platform.

Hiring: Computer Vision + Full Stack Developer for Sports Card Live Auction Overlay App (SaaS)

📌 Overview
I’ve built an MVP of a real-time sports trading card scanning and comping overlay tool using Loveable.dev. The product helps buyers gain an edge during live auctions by instantly identifying cards and showing real-time market comps. Now I’m looking for a U.S.-based developer (or strong US-aligned freelancer) to take this from MVP → production SaaS. This is a subscription-based product, so I need someone who can help build something fast, accurate, scalable, and hard to replicate.

🧠 What the product does
Users can:
- Capture or upload sports trading card images during live auctions (mobile + desktop)
- Instantly identify:
  - Player
  - Year / set
  - Parallel / serial number
- Pull live market comps
- Display a real-time “buy / avoid / fair price” overlay
The goal is speed + accuracy in live buying situations (seconds matter).

⚙️ What I already have
- MVP built in Loveable.dev
- Basic overlay + UI flow
- Initial comp logic concept
- Subscription idea (not yet fully implemented)

🛠️ What I need help building (Phase 1 → Scale)
I’m looking for someone to help rebuild and harden the system into a real SaaS product:

1. Computer Vision / OCR Layer
- Card detection from images (mobile + desktop)
- OCR extraction (player name, set, serial numbers)
- Image recognition / matching to known cards
- Confidence scoring (very important: must avoid wrong matches)

2. Comp Engine (Core Value)
Integrate or build system for:
- eBay sold listings
- 130point or similar comp sources
- Card Ladder / ALT-style pricing logic
Return:
- last sale
- average comp
- trend direction
- liquidity estimate

3. Real-Time Overlay System
- Lightweight overlay that works during live auctions
- Low latency (fast lookup is critical)
- Works on mobile + desktop workflows

4. SaaS Infrastructure
- User accounts + authentication
- Subscription billing (Stripe)
- Usage tracking / rate limiting
- Admin dashboard

5. Scaling / Production Hardening
- API architecture improvements
- Database structure
- Performance optimization for real-time use
- Error handling for imperfect images

💡 Ideal candidate
You should have experience with:
- Computer vision (OpenCV, YOLO, or similar)
- OCR pipelines
- AI image classification or similarity matching
- Full-stack SaaS development
- Stripe subscriptions
- API design (Node.js / Python / Next.js preferred)
Huge plus if you have:
- Sports card / collectibles knowledge
- Experience with marketplaces or scraping pricing data
- Real-time / low-latency systems

🎯 Why this is interesting
This is not a generic app. It’s:
- A real-time decision engine for high-value collectibles
- Built for a passionate, high-spend niche (sports cards)
- Subscription-based with strong monetization potential
- Designed for speed advantage in live auctions

📍 Requirements
- Must be U.S.-based (preferred for communication/time zone alignment)
- Must be able to work independently
- Must have strong GitHub/code examples
- Bonus if you’ve built AI or vision-based SaaS tools before

💰 Budget
Open to: Hourly or fixed project

📩 To apply, please include:
- Relevant CV / GitHub
- Past AI / computer vision projects
- Any SaaS or startup experience
- Your approach to building a real-time image → comp system
- Availability per week
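The four values the comp engine is asked to return (last sale, average comp, trend direction, liquidity estimate) could be computed along these lines. This is a minimal Python sketch, not the poster's comp logic: the `Sale` record, the 90-day window, the 5% trend threshold, and the "sales per 30 days" liquidity measure are all assumptions made for illustration, and fetching sold listings from any marketplace is out of scope here.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Sale:
    sold_on: date
    price: float


def summarize_comps(sales: List[Sale], today: date, window_days: int = 90,
                    trend_threshold: float = 0.05
                    ) -> Optional[Dict[str, Union[float, str]]]:
    """Summarize sold listings inside the window; None if there are no recent sales."""
    recent = sorted(
        (s for s in sales if 0 <= (today - s.sold_on).days <= window_days),
        key=lambda s: s.sold_on,
    )
    if not recent:
        return None
    prices = [s.price for s in recent]

    # Trend: compare the mean of the newer half of sales against the older half.
    half = len(prices) // 2
    trend = "flat"
    if half > 0:
        older, newer = mean(prices[:half]), mean(prices[half:])
        if older > 0:
            change = (newer - older) / older
            if change > trend_threshold:
                trend = "up"
            elif change < -trend_threshold:
                trend = "down"

    return {
        "last_sale": prices[-1],
        "average_comp": mean(prices),
        "trend": trend,
        # Liquidity proxy: how many sales per 30 days within the window.
        "sales_per_30_days": len(prices) * 30 / window_days,
    }


# Made-up example data.
example_sales = [
    Sale(date(2024, 1, 10), 100.0),
    Sale(date(2024, 2, 1), 110.0),
    Sale(date(2024, 2, 20), 130.0),
    Sale(date(2024, 3, 1), 140.0),
]
summary = summarize_comps(example_sales, today=date(2024, 3, 10))
```

A production version would also need outlier handling and a match-confidence gate before any comp is shown, since the posting stresses avoiding wrong matches.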

Posted 2 weeks ago

AWS Rekognition Custom Labels expert needed to improve plumbing parts image classification model.

Fixed price
Expert
Est. budget: $1,500.00

Need an AWS Rekognition Custom Labels expert to improve an image classification model for identifying plumbing parts. Current model accuracy is approximately 55%. The dataset consists of 300+ images per item, captured with a Foldio turntable.

Need assistance with:
- Dataset review
- Training strategy
- Classification vs object detection recommendations
- Improving model accuracy to 90%+
- AWS Rekognition Custom Labels implementation

Experience with computer vision and AWS Rekognition required.

Deliverables:
- Review the existing dataset
- Create a new image capture strategy
- Train the model
- Test the model
- Document the entire process
- 2 hours of screen-sharing sessions explaining everything

Posted 3 months ago

Extraordinary AI developer to manage our Base44 projects across apps and sales channels

Fixed price
Expert
Est. budget: $100,000.00

We’re hiring an extraordinary developer to own and grow our Base44 apps and sales products around the future of AI discovery.

1. Future of AI Discovery Core Demo – https://pull-discovery-core.base44.app/
You’ll evolve https://pull-discovery-core.base44.app/ into a beautiful, fluid, high-performance, fully functional future of AI discovery demo following our advanced and sophisticated technical blueprint.
- Integrate and orchestrate AI models incorporating LLMs, Search and World Models into a seamless experience with no visible seams between UX and intelligence.
- Own front-end performance, responsiveness, and micro-interactions: animations, transitions, and state changes should feel intentional and “alive,” not bolted on.
- Implement robust logging and analytics to understand how users explore, where they get stuck, and how the discovery engine can adapt dynamically.

2. Book Sales Engine – Six-Channel Publishing System
The second current Base44 project is a system that operationalizes our comprehensive sales plan across six channels. SEE THE COMPREHENSIVE BOOKSALES PLAN ATTACHMENT UNDERNEATH THIS POSTING
You will:
- Translate a detailed multi-channel publishing strategy (KDP optimization, physical bookstores via IngramSpark, other digital platforms, libraries, bulk institutional sales, and authority-engine content marketing) into concrete workflows, tools, and dashboards.
- Build internal interfaces and automations to:
  - Track metadata, pricing, and promotions across Amazon KDP and other platforms.
  - Monitor campaigns across TikTok, Meta, LinkedIn, YouTube, newsletters, and partnerships.
  - Surface KPIs like BSR, review velocity, ad spend, email growth, library adoptions, and bulk orders in a single, coherent view.
- Design light internal UIs that make it easy for non-technical team members to update copy, add titles, trigger campaigns, and view performance without breaking anything.
- Implement robust, testable integrations between Base44, external APIs, and data sources to keep everything in sync as we scale from 8 to 22+ titles and beyond.

Who You Are
We’re not looking for a generic “full-stack dev.” We’re looking for an unusual combination of visionary and doer:
- Creative technologist mindset – You think in systems and interfaces at the same time. You care deeply about how a product feels as well as how it works.
- Obsessed with execution – You’re disciplined, structured, and relentless about shipping. You break ambiguity into sprints, reduce complexity into tickets, and never let projects stall.
- Proactive owner – You don’t wait for instructions. You propose better ways to do things, flag risks early, and bring options, not problems, to every conversation.
- Strong product sense – You can balance ideal UX with realistic constraints and understand when to ship v1 vs. when to invest in polish.
- Comfortable with complexity – Multi-channel distribution, layered data flows, and evolving requirements don’t scare you; they energize you.

Ideal Skills & Experience
You don’t need all of these, but you should recognize yourself in most:
- 5+ years building production web applications, ideally with a strong front-end/UI focus.
- Deep experience with modern web stacks (React/Vue/Svelte or similar) and TypeScript, plus comfort with Node or comparable back-end runtimes.
- Strong visual/UI instincts: experience collaborating with designers or owning design yourself for data-rich interfaces and dashboards.
- Experience integrating AI/LLM APIs and retrieval systems into real products (RAG flows, multi-step tool use, chat-like interfaces, recommendation engines).
- Experience with analytics and experimentation: event tracking, funnel analysis, A/B testing.
- Familiarity with publishing, ecommerce, or multi-channel marketing systems is a plus (KDP, IngramSpark, email platforms, ad platforms, analytics).
- Prior work in environments like Base44 or other low-code/agentic platforms is a strong plus, but not required if you learn fast.

Posted last week

AI Engineer

Hourly: $5.00 - $10.00
Intermediate
Est. time: 1 to 3 months, Less than 30 hrs/week

I’m looking for an AI Engineer to help build an automated red-teaming product based on open-source models. This is a short-term, hands-on project for around 2 months, with an expected commitment of about 20 hours per week. The goal is to build a specialized red-teaming engine that can generate adversarial prompts across different risk domains, severity levels, and attack strategies, then automatically run those prompts against target AI models to identify bad cases, failure patterns, and safety gaps.

🔍 What you’ll work on
- Build red-teaming systems on top of open-source LLMs, including fine-tuning, prompt optimization, evaluation pipelines, and model orchestration.
- Design automated prompt generation workflows across risk domains such as self-harm, hate, violence, sexual safety, misinformation, fraud, cyber, and other high-risk areas.
- Generate prompts across different harm levels, from benign edge cases to policy-borderline and clearly unsafe scenarios, while maintaining structured taxonomies and evaluation criteria.
- Run automated tests against target models such as Gemma, Llama, Qwen, or other open-source / closed-source models to surface jailbreak patterns, over-refusal, under-refusal, and policy inconsistencies.
- Build feedback loops that turn model failures into stronger red-team prompts, improved eval sets, remediation recommendations, and continuous safety testing.

🧠 What I’m looking for
- Hands-on experience with open-source LLMs, fine-tuning, LoRA / QLoRA, RAG, model evaluation, and LLM inference pipelines.
- Familiarity with AI safety, red teaming, adversarial prompting, jailbreaks, safety evals, or trust & safety systems.
- Ability to build end-to-end systems, including data pipelines, model serving, eval harnesses, scoring, dashboards, and automation workflows.
- Bonus if you’ve worked on model safety, content moderation, policy evaluation, agentic testing, or automated eval infrastructure.

⏳ Project setup
- Duration: around 2 months
- Time commitment: about 20 hours per week
- Format: flexible / remote-friendly
- Stage: early-stage build, from 0 to 1

🚀 Why this is interesting
This is not about manually writing red-team prompts one by one. The goal is to build a scalable system that can continuously generate, test, categorize, and learn from model failures, helping teams understand where AI models break, why they break, and how to improve them. If you enjoy working with open-source models, AI safety, red teaming, and fast 0-to-1 product building, I’d love to chat. Feel free to DM me if this sounds like you, or if you know someone who might be a good fit.
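The scoring side of such a harness, sorting test results into over-refusal and under-refusal per risk domain, might look roughly like the Python sketch below. It is only an assumption-laden outline: the `Case` record, the two boolean labels, and the category names are hypothetical, and deciding whether a real model response counts as a refusal is a separate problem this sketch does not address.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Case:
    domain: str          # risk domain from the taxonomy, e.g. "fraud"
    should_refuse: bool  # expected behaviour according to policy labels
    did_refuse: bool     # observed behaviour of the target model


def classify(case: Case) -> str:
    """Label one result as under-refusal, over-refusal, or ok."""
    if case.should_refuse and not case.did_refuse:
        return "under_refusal"   # unsafe request was answered
    if not case.should_refuse and case.did_refuse:
        return "over_refusal"    # benign request was refused
    return "ok"


def failure_report(cases: Iterable[Case]) -> Dict[str, Counter]:
    """Count outcomes per risk domain so weak domains stand out."""
    report: Dict[str, Counter] = {}
    for case in cases:
        report.setdefault(case.domain, Counter())[classify(case)] += 1
    return report


# Made-up results for illustration.
results = [
    Case("fraud", should_refuse=True, did_refuse=False),
    Case("fraud", should_refuse=True, did_refuse=True),
    Case("cyber", should_refuse=False, did_refuse=True),
]
report = failure_report(results)
```

Per-domain counts like these are one simple input to the feedback loop the posting describes, since they show which domains deserve more test cases.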

Posted 3 weeks ago

AI Agent Infrastructure Consultant — Parallel Agents Optimization (Python/Node)

Fixed price
Expert
Est. budget: $150.00

**Overview**

We are a fast-growing SaaS company with a lean engineering team (~10 devs) utilizing a modern Python (FastAPI/Django) and Node.js backend, React frontend, and PostgreSQL stack. We have already deployed an initial multi-model agent stack: Claude Code, LiteLLM gateway, Git worktrees, and MCP integrations. We need an expert to run an intensive architecture review and optimization session for our current infrastructure.

We are not looking for someone to build a full-time, weeks-long project from scratch. Instead, we need a seasoned engineer who has shipped this exact type of infrastructure end-to-end to audit our setup, identify architectural gaps, and guide our team on hardened implementation.

This project must move fast. If your timeline is measured in weeks, please do not apply. We want someone who looks at this scope, jumps into a review session, and delivers actionable architectural guidance in days.

This starts as a focused, urgent consultation. However, we expect ongoing advisory work (follow-ups, architecture adjustments, and enhancement reviews) as the AI tooling landscape shifts. For the right engineer, this will turn into a recurring relationship. We are completely open to a fixed price per milestone or an hourly structure.

**What You Need to Have Actually Shipped and Can Review (Not Just Read About)**

* **Full Agentic Coding Harnesses:** The entire loop: orchestrator → subagent → CI gate → merge loops.
* **Isolation Layers:** Configured execution layers (such as E2B, Modal, or secure Docker runtimes) as isolated sandboxes for AI-generated code.
* **Parallel Claude Code Sessions:** Managed multiple simultaneous subagents on scoped tasks via Git worktrees.
* **Self-Hosted LiteLLM Gateways:** Routing to multiple models (Claude, GPT, Gemini, DeepSeek).
* **MCP Server Infrastructure:** Connected file system, PostgreSQL, Atlassian, and Slack tool layers for active agents.
* **Agent Framework Structures:** Used CLAUDE.md, COMMON\_MISTAKES.md, subagent role definitions, hook scripts, and settings.json.
* **Human-in-the-Loop Orchestration:** Built Plan Mode or equivalent approval gates before agent execution.
* **Multi-Agent Frameworks:** 7-agent feature factory patterns or frameworks like LangGraph, CrewAI, or Autogen.
* **Durable Workflow Engines:** Applied Temporal, n8n, or similar tools for long-running agent workflow execution.
* **Mechanical Quality Gates:** Treating CI green as the ultimate gate for agent output quality. \[[1](https://manveerc.substack.com/p/ai-agent-sandboxing-guide)\]

**Our Current Stack (What you are reviewing)**

* **Backend:** Python (FastAPI / Django) & Node.js (TypeScript)
* **Frontend:** React (Next.js)
* **Database & ORM:** PostgreSQL / Prisma / SQLAlchemy
* **Infrastructure:** Docker Compose, AWS (ECS/EKS)
* **CI/CD:** GitHub Actions / GitLab CI
* **AI Layer:** Claude Code with shared `.claude/` directory, CLAUDE.md, and LiteLLM gateway in Docker
* **MCP:** Atlassian (Jira/Confluence), custom PostgreSQL MCP server, Slack
* **Workflow Automation:** Temporal / n8n
* **QA Automation:** Playwright / Autonoma

**Scope of Work (Review & Advisory Only)**

1. **Comprehensive Audit:** Audit our current agent harness and identify architectural gaps against a production-grade standard.
2. **Sandbox Strategy Consultation:** Review our environment strategy to ensure highly secure, isolated execution runtimes for agent code runs.
3. **Workflow Hardening Review:** Evaluate our parallel agent workflow setup (Git worktrees, subagent role configs, hook scripts, and settings lockdown).
4. **CI Pipeline Integration Strategy:** Advise on wiring our sandbox execution layer into the existing CI pipeline so agent-executed code runs in clean snapshots, not live infra.
5. **Architectural Runbook:** Deliver an optimization report / documented standard that our backend lead can easily own and execute going forward.

**How to Apply**

Skip the generic pitch. Show us something real to be considered:

1. A GitHub repo, architecture diagram, or Loom walkthrough of an agentic harness you have actually shipped.
2. Specific tools from our stack you have personally configured (E2B, LiteLLM, Claude Code, etc.).
3. One sentence explaining the hardest problem you solved to get full agent loops running reliably.
4. Your availability to conduct this high-impact architectural review session this week.
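For readers unfamiliar with the "one Git worktree per scoped task" pattern this posting refers to, here is a minimal Python sketch that only plans the commands rather than running them. The `agent/<slug>` branch naming, the `../agent-worktrees` root, and the `main` base branch are hypothetical choices; the only real CLI form relied on is `git worktree add -b <new-branch> <path> <commit-ish>`.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, List, Union


def slugify(task: str) -> str:
    """Turn a task title into a lowercase, hyphen-separated slug."""
    return re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")


def plan_worktrees(tasks: List[str], root: str = "../agent-worktrees",
                   base: str = "main") -> List[Dict[str, Union[str, List[str]]]]:
    """Plan one isolated worktree and branch per task; nothing is executed here."""
    plans: List[Dict[str, Union[str, List[str]]]] = []
    seen = set()
    for task in tasks:
        slug = slugify(task)
        if not slug or slug in seen:
            # Two agents must never share a worktree or branch.
            raise ValueError(f"task does not map to a unique slug: {task!r}")
        seen.add(slug)
        path = str(PurePosixPath(root) / slug)
        branch = f"agent/{slug}"
        plans.append({
            "task": task,
            "path": path,
            "branch": branch,
            # Equivalent shell: git worktree add -b <branch> <path> <base>
            "command": ["git", "worktree", "add", "-b", branch, path, base],
        })
    return plans


plans = plan_worktrees(["Fix login bug!", "Add rate limiting"])
```

Each planned `command` list could be passed to `subprocess.run(..., check=True)` from inside the repository, after which every subagent works in its own directory and branch and only merges once the CI gate passes.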
