Senior Agentic AI Engineer
Worldwide
We're hiring a senior Agentic AI Engineer on a project-based contract to audit and harden our production personalization engine. You'll work directly with our Head of Product (Chris) and engineering team to take working prototypes to production-ready quality before November 2026.

This is a senior, project-based engagement (not full-time). Estimated 30–60 hours per month, ~3 months, with the possibility of extension.

We're building a career-intelligence and upskilling platform serving learners across MENA and Africa. We deliver outcomes (completed cohorts, secured placements, career progression) for government training contracts, university partnerships, and large-employer partnerships.

What you'll do

We've prototyped a personalization engine on top of our new Learn app. The basic framework exists to validate the concept; we want a senior engineer to make it production-grade. Specifically:

1. Architecture audit (Weeks 1–3)

Review the personalization engine end-to-end:
- Zone 1, Surfaces: homepage canvas, in-course chat, events / jobs / comms cards
- Zone 2, Agents: LangGraph supervisor + vertical agents (Courses, Events, Jobs, Comms)
- Zone 3, Backends: MongoDB Atlas vector store, course content + transcript ingestion, employer pipeline, PostHog telemetry
- Zone 4, Self-improvement loop: scoring agent → user.md → tuned routing

Output: written assessment of what's load-bearing, what scales, what needs to change before COP32 onboarding (~10K learners, Nov 2026).

2. RAG / retrieval design review (Weeks 2–5)
- Chunking strategy for video transcripts + Markdown lessons
- Hybrid retrieval (dense + sparse) recommendations
- Reranking strategy
- Per-user scope enforcement (no cross-tenant leakage)
- Multilingual retrieval: Arabic + English minimum; Arabic word-error-rate is real
- Vector store choice review: MongoDB Atlas today; pgvector under evaluation

3. Prompt + eval system (Weeks 4–8)
- Supervisor routing prompts
- Vertical-agent prompts (Courses, Jobs, Comms)
- Structured-output validation
- Regression eval set design + CI integration
- Failure-mode catalog

4. Cost discipline (Weeks 6–10)
- Per-feature + per-organization token budgets with enforcement (we bill at org level)
- Cache strategy (we already cache canvas cards by content version)
- Multi-tier model routing: frontier (Sonnet / GPT-4o) for paid cohorts, mid-tier for general learners, cheap-tier or self-hosted for unverified
- Anti-abuse limits: topical-relevance classification, per-user daily caps
- Cost reporting to PostHog dashboard

Our current stack
- LLMs: OpenAI + Anthropic (multi-provider posture)
- Orchestration: LangChain.js + LangGraph (supervisor + sub-agent pattern)
- Vector store: MongoDB Atlas (pgvector swap under evaluation)
- Backend: Node.js, Express, BullMQ workers, MySQL (Aurora)
- Frontend: Next.js 15 App Router, React, Tailwind
- Eval / observability: PostHog (in-flight); LangSmith or Helicone under evaluation

What success looks like

After 3 months we should have:
- Architecture assessment
- Working RAG/retrieval pass with documented quality metrics on a fixture eval set
- Production-ready prompt + eval pipeline in CI
- Adaptive AI framework that will improve based on learners' interactions
- Scaffolding for evaluations / quality control
- Cost projection for ~10K learners with cap + cache + tier strategy locked

Who you are

Required:
- Built production agentic systems before, not just chat wrappers around an LLM API
- Strong production RAG experience: chunking, retrieval quality, eval discipline
- Comfortable in JavaScript / TypeScript (Node + Next.js)
- LangChain.js / LangGraph experience, or strong opinions on alternatives you can defend
- Cost-aware: you've watched LLM bills explode and have systems-level opinions about budgets, caches, multi-tier routing
- Able to teach: engineers on the team are learning; you'll be expected to pair, write, and explain

Strongly preferred:
- Multilingual retrieval (especially Arabic)
- Eval framework experience (LangSmith, Helicone, custom)
- Vector store experience beyond Mongo (pgvector, Qdrant, Pinecone)
- Worked on platforms (not just internal tools): you've shipped to real users

Engagement details
- Duration: ~3 months starting June 2026
- Time commitment: project-scoped; estimate 30–60 hours per month
- Rate: senior contractor; competitive, scoped per project
- Location: anywhere, async-first
- Communication: weekly sync with Chris; ad-hoc Slack with team

How to apply

Send a short note about a production agentic system you built or significantly improved
- Hourly: Less than 30 hrs/week
- Duration: 1-3 months
- Experience Level: Expert
- Hourly: $30.00 - $60.00
- Remote Job
- Project Type: Ongoing project
Activity on this job
- Proposals: 50+
- Last viewed by client: 3 weeks ago
- Interviewing: 4
- Invites sent: 0
- Unanswered invites: 0
About the client
- United States, Kings County, 12:22 AM
- $57K total spent; 73 hires, 0 active
- 1,276 hours