You will get Production-Grade RAG System for Your Documents. Full Cycle

Victor M.

Project details

You get a complete, ready-to-use RAG system: not a prototype, not a demo,
but a real product deployed to your cloud and ready to grow with your business.

Full-cycle delivery: chat UI with source citations, enterprise backend with
swappable components, and Infrastructure as Code deployed to your cloud
with one-command deploys.

Your documents never leave your infrastructure. Every answer comes with
citations you can verify. Hybrid search (semantic + keyword) catches what
pure vector search misses. Role-based access controls who sees what. Rate
limits keep AI costs predictable.
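To make the hybrid search claim concrete, here is a minimal sketch of one common way to merge a semantic ranking with a keyword ranking: reciprocal rank fusion (RRF). The listing does not say which fusion method the delivered system uses, so RRF, the function name, the constant k=60, and the chunk IDs are all assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk IDs into a single ranking.

    Each chunk earns 1 / (k + rank) for every list it appears in
    (rank starts at 1), so chunks found by both retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by chunk ID so the order is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from the two retrievers (not real data).
semantic_hits = ["chunk-7", "chunk-2", "chunk-9"]
keyword_hits = ["chunk-2", "chunk-4", "chunk-7"]
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
```

Chunks that only one retriever finds (here chunk-4 and chunk-9) still appear in the fused list, which is the behavior "catches what pure vector search misses" refers to.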

Stack: Python 3.14, FastAPI (async), PostgreSQL 16 with pgvector + full-text
search (one database instead of three systems). React 19 + TypeScript
frontend. Terraform IaC, Docker, CI/CD on push. OpenAI, Claude, and Gemini
supported out of the box.

AI Algorithms

Large Language Model

AI Applications

AI Chatbot, AI Content Creation, AIOps, Natural Language Generation, Natural Language Understanding

AI Development Language

Python

AI Models

ChatGPT, GPT-4

What's included $3,000

These options are included with the project scope.

$3,000

Delivery Time 7 days
- AI Model Integration
- Database Integration
- Detailed Code Comments
- Setup File
- Source Code

About Victor

Senior Python Backend Engineer | Production RAG & AI Agents | GCP

89% Job Success

Bishkek, Kyrgyzstan

I build production-grade RAG systems and LangGraph agents that survive real
traffic. Not demos that break when the first client uploads a malformed PDF.
15+ years as a Senior Python backend engineer, with the last year focused on
retrieval pipelines, evals, and agentic workflows deployed on Google Cloud.
I'm looking for long-term partners, not one-off gigs. Most of my career has
been multi-year relationships with teams who needed someone to own the
system, not just close tickets. My longest previous engagement on this
platform was 3,999 hours with a single client.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
MOST RECENT PROJECT — ChatAnyFile (live on GCP)
A self-hosted RAG application I built end-to-end as a reference
implementation. Running in production on Google Cloud Run, FastAPI backend,
React frontend, full Terraform IaC, automated CI/CD on push to main.
What's inside:
• Hybrid retrieval — pgvector (HNSW, cosine) + PostgreSQL full-text search.
One database instead of three separate systems. Lower cost, lower ops.
• Adaptive chunking and citation-grounded answers. Every response shows
document name, page number, and the exact quote — no more "the AI said
something but I can't verify it."
• Async document processing pipeline (Taskiq + Redis). Upload and forget,
query when ready.
• Swappable LLM providers (OpenAI, Anthropic, Gemini) via Strategy pattern.
Changing providers is one config line, not a rewrite.
• Multi-layer architecture — Routers → Services → Repositories → Unit of
Work. Pydantic v2 at every boundary. Three-level auth with RBAC
middleware.
• Three-tier test pyramid — unit (mocked repos) → integration (real DB with
rollback) → e2e (full HTTP flows with auth). Schema built via Alembic
migrations, not create_all() — tests mirror production exactly.
• Multi-stack Terraform: shared network → VM instance group → application.
Spot VM with MIG auto-healing, stateless boot disk, persistent data disk.
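As a rough illustration of the citation bullet above (document name, page number, exact quote), a citation can be modeled as a small record and checked against the stored page text. This is a sketch under assumptions, not the ChatAnyFile code: the `Citation` class, `is_verifiable`, the `pages` lookup, and the sample text are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    document: str  # source file name
    page: int      # page number within that file
    quote: str     # exact quote the answer relies on


def normalize(text: str) -> str:
    """Collapse whitespace and ignore case so line breaks do not break matching."""
    return " ".join(text.split()).casefold()


def is_verifiable(citation: Citation, pages: dict[tuple[str, int], str]) -> bool:
    """Return True only if the quoted text really appears on the cited page."""
    page_text = pages.get((citation.document, citation.page))
    if page_text is None:
        return False
    return normalize(citation.quote) in normalize(page_text)


# Hypothetical extracted page text, keyed by (document, page).
pages = {("handbook.pdf", 3): "Refunds are issued\nwithin 14 days."}
good = Citation("handbook.pdf", 3, "issued within 14 days")
bad = Citation("handbook.pdf", 3, "issued within 30 days")
```

A check like this can run before an answer is shown, so a quote the model invented is dropped instead of being presented as a source.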
Live at chat-any-file.merkushov.dev. Source code and the formal
architecture spec available on request.
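The "one config line" provider swap described in the list above can be sketched with the Strategy pattern roughly as follows. The class names, the `PROVIDERS` registry, and `get_provider` are hypothetical, and the `complete` methods return placeholder strings where a real implementation would call the vendor SDK.

```python
from typing import Callable, Protocol


class LLMProvider(Protocol):
    """Common interface every provider strategy implements."""

    def complete(self, prompt: str) -> str: ...


class OpenAIProvider:
    def complete(self, prompt: str) -> str:
        # Placeholder: a real implementation would call the OpenAI SDK here.
        return f"[openai] {prompt}"


class AnthropicProvider:
    def complete(self, prompt: str) -> str:
        # Placeholder: a real implementation would call the Anthropic SDK here.
        return f"[anthropic] {prompt}"


class GeminiProvider:
    def complete(self, prompt: str) -> str:
        # Placeholder: a real implementation would call the Gemini SDK here.
        return f"[gemini] {prompt}"


# Registry mapping a config value to a provider factory.
PROVIDERS: dict[str, Callable[[], LLMProvider]] = {
    "openai": OpenAIProvider,
    "anthropic": AnthropicProvider,
    "gemini": GeminiProvider,
}


def get_provider(name: str) -> LLMProvider:
    """Build the provider named in configuration, failing loudly on typos."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"Unknown LLM provider: {name!r}") from None


# The rest of the code depends only on LLMProvider, so this is the one line to change.
provider = get_provider("anthropic")
answer = provider.complete("Summarize the contract.")
```

Because callers only see the `LLMProvider` interface, adding a fourth provider means one new class and one registry entry.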
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
WHAT I BUILD FOR CLIENTS
▸ Production RAG pipelines
FastAPI + pgvector + OpenAI/Claude. Hybrid search, citation-grounded
answers, Langfuse observability, and an eval harness that measures
groundedness and retrieval precision, because "it seemed to work on my
three test questions" is not a production signal. Your documents stay in
your infrastructure.
▸ LangGraph multi-agent systems
Multi-step workflows with explicit state tracking, human-in-the-loop
checkpoints for high-stakes decisions, tool calling, and retry logic.
I write agents you can debug six months later, not black boxes that
work until they don't.
▸ GCP-native Python backends
Cloud Run, Cloud SQL, Secret Manager, Cloud Build, Artifact Registry,
Compute Engine, Terraform IaC. 4+ years running production workloads on
GCP as my primary cloud, including an Airflow migration from Cloud
Composer to Compute Engine.
▸ Rescuing stuck AI projects
Inherited a messy LangChain/LangGraph codebase from a previous
developer? Token costs spiraling? Hallucinations? I do architecture
audits with concrete fix plans, not generic "refactor everything"
advice.
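For the retrieval precision metric mentioned under production RAG pipelines above, a minimal version is precision@k: the share of the top k retrieved chunks that a human-labeled answer key marks as relevant. The function name, chunk IDs, and labels below are illustrative assumptions, and a groundedness score would need a separate check that is not shown here.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved chunk IDs that are labeled relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved[:k]
    hits = sum(1 for chunk_id in top_k if chunk_id in relevant)
    # Divide by k (not len(top_k)) so returning too few results is penalized.
    return hits / k


# Hypothetical eval case: what the retriever returned vs. the labeled answer key.
retrieved = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
score = precision_at_k(retrieved, relevant, k=3)
```

Averaging this score over a fixed question set gives a number that can be tracked across releases instead of relying on a few spot checks.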
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STACK
Python 3.14 · FastAPI (async) · SQLAlchemy 2.0 · Pydantic v2 · PostgreSQL ·
pgvector · LangChain · LangGraph · Langfuse · OpenAI API · Anthropic Claude ·
Taskiq · Redis · Docker · Terraform · GCP (Cloud Run, Cloud SQL, Cloud Build,
Secret Manager, Compute Engine, Cloud DNS) · Caddy · Nginx · GitHub Actions
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
HOW I WORK
• Spec before code. Every project starts with a written technical
specification. Fewer surprises, predictable scope, faster delivery.
• Weekly demos so you see actual progress, not commit hashes.
• Tests as a default, not an afterthought. 70-80% coverage on my recent
MVP work.
• Scope I can keep. I will tell you honestly if your timeline or budget
is unrealistic before we sign, not after.
• Bishkek (GMT+6). Comfortable overlap with EU mornings and US afternoons.
Async-first for the rest.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
BACKGROUND
15+ years building production backends. Before specializing in RAG and
LLM systems, I shipped code in fintech, real estate, advertising data,
social media analytics, and enterprise systems.
Recent full-time roles:
• LeadPlus (Japan, remote) — Senior Backend Engineer on GCP + Airflow,
4+ years. Refactored 70-140 data pipelines across 7-10 data providers,
eliminated incidents, migrated Airflow from Cloud Composer to Compute
Engine.
• Commercial real estate MVP (3 months, 2025-2026) — end-to-end backend
and infrastructure for a platform with chatbot-driven lead processing.
60k LoC, 78% test coverage, PostgreSQL full-text search across 6+
tables, one-command release pipeline. Delivered on schedule.
• WaveMetrix (California, USA, remote) — 2.5 years on

Steps for completing your project

After purchasing the project, send requirements so Victor can start the project.

Delivery time starts when Victor receives requirements from you.

Victor works on your project following the steps below.

Revisions may occur after the delivery date.

Discovery & Scoping

Understand your documents, users, and use cases. Agree on features, cloud provider, and timeline.

Backend & RAG Pipeline

Build the API, document processing pipeline, hybrid search (vector + full-text), and LLM integration.

Review the work, release payment, and leave feedback for Victor.

What's included $3,000

RAG chatbot over your documents

Chat UI with source citations. Hybrid search. Deployed to your cloud

7 days delivery — Jul 8, 2026

Revisions may occur after this date.

Upwork Payment Protection

Fund the project upfront. Victor gets paid once you are satisfied with the work.
