Re-architect & Build a Lean LLM Conversation Worker

Posted 2 weeks ago

Worldwide

Summary

# Help me rebuild the brain of an AI that sits with people in their hardest moments: lean, fast, and right

Hi, and thank you for actually reading this.

I'm a solo founder. For the last few months I've been building something I believe in more than anything I've worked on: an AI companion that meets people in the moments most apps ignore, at 3am when the head won't switch off, when something heavy is being carried and there's no one to call. It's not a chatbot and it's not a wellness tracker. It's a space that listens, remembers you across conversations, and responds like someone who actually knows you.

It's live. Real people use it. On its best days, it does something rare. But I've hit a wall I can't get past alone, and I'm looking for someone who has genuinely built this kind of system before.

## How I built it

A few honest words on the approach, because it shapes the job. I built this **solo and fast, with an AI pair-programmer**, iterating and stacking ideas as I learned. But the core bets were deliberate, not random:

- **Two parallel memory systems.** A shared clinical **Knowledge RAG** (vetted frameworks) with **corrective grading (CRAG)** so the AI grounds itself instead of making things up, plus a **per-user Memory RAG** that learns the individual over time: their themes, the people who matter, the patterns underneath what they say.
- **A "perception-first" design.** Before the main model answers, a smaller, faster model reads the emotional state of the message. The idea is to behave like a good listener who reads the room before speaking.
- **Distilled memory, not transcripts.** After each conversation I extract what *mattered* (themes, patterns, what landed) instead of storing raw text. More private, and sharper for recall.
- **Owned infrastructure, one model family.** Supabase + Cloudflare so I control the data, and Anthropic Claude for the voice, because the conversational quality is the whole product.

The intent behind all of it: a warm, attentive presence that genuinely remembers you. Every piece was in service of that feeling.

## What it is (the stack, and what runs each turn)

- **The product:** an AI mental-health **depth-conversation companion**, an *adjunct to human care, never a replacement for therapy*. In production, with real users.
- **The stack:** Cloudflare Workers (the orchestration "brain"), Supabase + pgvector (memory and vectors), Anthropic Claude (Sonnet for the voice, Haiku for background work), OpenAI embeddings, a vanilla-JS PWA.
- **What runs on every single turn, before the reply streams:** a crisis check → a "perception" pass that reads the emotional state → a second classifier that decides how memory is used → a four-stream retrieval over the user's stored memory → a clinical knowledge retrieval with self-correction (CRAG) → all of it rendered into a ~23,000-character context block → and *only then* the model answers. **Four model calls, ~35,000 tokens, ~7–8 seconds per message.**

## Where I think the bottleneck is

I'll give you my honest diagnosis, and I want you to confirm it or tear it apart.

**I believe the bottleneck is the worker itself**: the single orchestration layer that does everything. There are two sides to it:

- **It does too much, synchronously, before it speaks.** That long, mostly-*sequential* chain above (perception, a classifier, multi-stream retrieval, a structured-memory call, knowledge + CRAG, then assembling a 23K-character block) all runs *before* the first word streams. That's the ~7–8 seconds, and it's the most visible failure.
- **It computes intelligence it then throws away.** The perception pass reads the moment *accurately*, and the reply often ignores it. So I pay for that work in latency and cost and don't get the benefit. The context block has also grown bloated and self-contradicting, which makes the model behave inconsistently.

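To make the sequential-versus-parallel point concrete, here is a rough TypeScript sketch of the two shapes the pre-reply chain could take. It is an illustration under assumptions, not the actual worker: the stage names, the `Stage` type, the helper functions, and the millisecond values are all hypothetical, and it assumes the stages do not need each other's output, which may not hold for every step described above (for example, if the memory classifier depends on the perception result).

```typescript
// Hypothetical stage: stands in for a model call or a vector query.
// Names and latencies are illustrative, not measurements from the real worker.
type Stage = { name: string; ms: number };

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Simulates one unit of pre-reply work by waiting, then returning a label.
async function runStage(stage: Stage): Promise<string> {
  await sleep(stage.ms);
  return `${stage.name}:done`;
}

// Sequential shape: each stage waits for the previous one to finish,
// so total latency is roughly the sum of all stage latencies.
async function runSequential(stages: Stage[]): Promise<string[]> {
  const results: string[] = [];
  for (const stage of stages) {
    results.push(await runStage(stage));
  }
  return results;
}

// Parallel shape: independent stages start together, so total latency
// is roughly the slowest single stage. Promise.all keeps input order.
async function runParallel(stages: Stage[]): Promise<string[]> {
  return Promise.all(stages.map((stage) => runStage(stage)));
}

// Small timing helper for comparing the two shapes.
async function timed<T>(
  fn: () => Promise<T>
): Promise<{ value: T; elapsedMs: number }> {
  const start = Date.now();
  const value = await fn();
  return { value, elapsedMs: Date.now() - start };
}

// Hypothetical stand-ins for the steps that run before the reply streams.
const preReplyStages: Stage[] = [
  { name: "crisisCheck", ms: 40 },
  { name: "perception", ms: 60 },
  { name: "memoryRetrieval", ms: 80 },
  { name: "knowledgeRetrieval", ms: 70 },
];
```

Calling `timed(() => runSequential(preReplyStages))` and `timed(() => runParallel(preReplyStages))` returns the same results in the same order, with the parallel run taking about as long as its slowest stage. Whether the real stages can be decoupled this way is exactly the design question the diagnosis above raises.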
In short: **the system *knows* more than it *uses*, all in one heavy place.** That's where I think it's broken. You may see it differently, and that's exactly what I want to find out.

## What I need you to do

Not an audit. Not a report. **I want you to design and build the lean, smart version**: a worker, or a genuinely different structure, that fixes this at the root. If you believe the right answer is a **completely different mental model** than my stack-of-classifiers-plus-giant-context-block approach, I want to hear it, and build toward it.

What "fixed" looks like:

- **Fast:** first word in a conversational range, not 7–8 seconds.
- **Lean:** far fewer tokens per turn, a context layer that's tight and doesn't argue with itself.
- **Coherent:** the AI's read of the moment actually shapes what it says.
- **Right:** it recalls the correct thing, in the correct context, with a real sense of time (right now it can treat something you *just said you're about to do* as already done).

I'm open to phasing it: a target design and a small proof-of-concept first, then the full build.

## Who I'm hoping to find

- Someone who has **designed and shipped a real production LLM system (persistent memory, RAG, multi-step / agentic orchestration) in the last year**, and can show it.
- Fluent in **latency engineering for LLM apps** (parallelizing the hot path, streaming, prompt caching, cutting synchronous work).
- Real opinions about **context engineering**: knows when a big injected block is the disease, not the cure.
- Comfortable with Cloudflare Workers + Supabase/pgvector + Anthropic (or clearly equivalent).

## How to apply (I'll be honest about what gets a reply)

I read every word, and I'll only respond to applications that:

1. **Show me the production LLM memory/RAG/agent system you built in the last 12 months**: a link, a repo, or a real write-up.
2. **Speak to *my* situation, not a template**: one concrete thing you'd rebuild here and why, and whether you'd keep my approach or replace it.
3. **Back it up.** I'm looking for someone who has actually done this.

Generic, copy-pasted proposals won't get a response. No hard feelings, I just won't. Also, I will be getting a lot of proposals, so at a certain point there is just that level of over-saturation. I will try my best to answer all, but usually it's first come, first served. Thank you!

## How we'd work

I'll share the architecture and the problems openly (most of it is above). Deeper codebase access comes after we've had a deeper conversation. The one piece I protect at first is the "voice," the part that makes it feel human.

  • Less than 30 hrs/week
    Hourly
  • 1-3 months
    Duration
  • Intermediate
    Experience Level
  • $15.00 - $35.00
    Hourly
  • Remote Job
  • Ongoing project
    Project Type
Skills and Expertise
Mandatory skills
JavaScript
PHP
Nice-to-have skills
Android App Development
Activity on this job
  • Proposals: 50+
  • Last viewed by client: 2 weeks ago
  • Hires:
    1
  • Interviewing:
    1
  • Invites sent:
    0
  • Unanswered invites:
    0
About the client
Member since Mar 10, 2016
  • Germany
    Gundelsheim 7:07 AM
  • $2.7K total spent
    10 hires, 3 active
  • 82 hours
