AI Infrastructure Engineer for RAG Systems

Posted 4 weeks ago

Worldwide

Summary

We are seeking an AI Infrastructure Engineer to deploy, optimize, and maintain production-grade AI systems with a strong focus on Retrieval-Augmented Generation (RAG). The ideal candidate will have experience in cloud infrastructure, containerization, and CI/CD pipelines. You will work closely with our team to ensure the reliability and scalability of our AI systems. You will own the full stack: from GPU inference serving and vector database integration to production API endpoints. The ideal candidate has already shipped LLM infrastructure in production and understands the real-world challenges of GPU memory limits, multi-user concurrency, retrieval accuracy, and system reliability.

Responsibilities

  • Deploy and optimize RAG pipelines end-to-end, from document ingestion and chunking to embedding, vector retrieval, and LLM response generation
  • Configure and run production inference servers (vLLM, llama.cpp, Ollama, TGI, TensorRT-LLM, or SGLang) for open-source LLMs
  • Integrate deployed LLMs with existing RAG backends and vector databases (Pinecone, Qdrant, Chroma, Weaviate, or similar)
  • Optimize GPU inference for VRAM usage, token latency, concurrency throughput, and inference speed
  • Expose production-ready, OpenAI-compatible API endpoints for internal or external consumption
  • Handle CUDA driver setup, GPU scaling decisions, and quantization strategies (GGUF, AWQ, GPTQ, EXL2)
  • Build and maintain embedding pipelines, chunking strategies, and re-ranking workflows
  • Set up monitoring, logging, alerting, and automatic restart/recovery for AI services
  • Containerize AI workloads using Docker and deploy via CI/CD pipelines
  • Advise on hardware upgrades, multi-GPU setups, and cost/performance tradeoffs
  • Document deployment architecture and provide clear handover notes

Skills

  • Strong, proven experience deploying RAG pipelines in production environments
  • Hands-on experience with open-source LLMs: Gemma, Llama, Mistral, Qwen, or similar
  • Proficiency with inference frameworks: vLLM, llama.cpp, Ollama, TGI, or TensorRT-LLM
  • Experience with vector databases and embedding models (semantic search, context window optimization, metadata filtering)
  • CUDA GPU optimization and Linux server environment experience
  • Model quantization experience: GGUF, AWQ, GPTQ, or EXL2
  • Python proficiency, including backend APIs built with FastAPI, Flask, or equivalent
  • Docker and containerized deployment experience
  • Cloud infrastructure experience: AWS, GCP, or Azure (compute, storage, networking)
  • Strong debugging skills across GPU, inference, and API layers

  • Hours: More than 30 hrs/week (hourly)
  • Duration: 6+ months
  • Experience Level: Intermediate
  • Hourly rate: $15.00 - $20.00
  • Remote Job
  • Project Type: Complex project
Skills and Expertise
Mandatory skills
Embedded System
Eagle
Nice-to-have skills
Microcontroller Programming
Activity on this job
  • Proposals: 15 to 20
  • Last viewed by client: 3 weeks ago
  • Interviewing: 4
  • Invites sent: 1
  • Unanswered invites: 1
About the client
Member since Oct 1, 2025
  • IND, Delhi, 5:56 PM
  • $359 total spent (2 hires, 1 active)
  • HR & Business Services, small company (2-9 people)
