Senior Agentic AI Engineer

Posted 3 weeks ago

Worldwide

Summary

We're hiring a senior Agentic AI Engineer on a project-based contract to audit and harden our production personalization engine. You'll work directly with our Head of Product (Chris) and engineering team to take working prototypes to production-ready quality before November 2026.

This is a senior, project-based engagement (not full-time). Estimated 30–60 hours per month, ~3 months, with the possibility of extension.

We're building a career-intelligence and upskilling platform serving learners across MENA and Africa. We deliver outcomes (completed cohorts, secured placements, career progression) for government training contracts, university partnerships, and large-employer partnerships.

What you'll do

We've prototyped a personalization engine on top of our new Learn app. The basic framework exists to validate the concept; we want a senior engineer to make it production-grade. Specifically:

1. Architecture audit (Weeks 1–3)

Review the personalization engine end-to-end:
- Zone 1 (Surfaces): homepage canvas, in-course chat, events / jobs / comms cards
- Zone 2 (Agents): LangGraph supervisor + vertical agents (Courses, Events, Jobs, Comms)
- Zone 3 (Backends): MongoDB Atlas vector store, course content + transcript ingestion, employer pipeline, PostHog telemetry
- Zone 4 (Self-improvement loop): scoring agent → user.md → tuned routing

Output: written assessment of what's load-bearing, what scales, and what needs to change before COP32 onboarding (~10K learners, Nov 2026).

2. RAG / retrieval design review (Weeks 2–5)
- Chunking strategy for video transcripts + Markdown lessons
- Hybrid retrieval (dense + sparse) recommendations
- Reranking strategy
- Per-user scope enforcement (no cross-tenant leakage)
- Multilingual retrieval: Arabic + English minimum; Arabic word-error-rate is real
- Vector store choice review: MongoDB Atlas today; pgvector under evaluation

3. Prompt + eval system (Weeks 4–8)
- Supervisor routing prompts
- Vertical-agent prompts (Courses, Jobs, Comms)
- Structured-output validation
- Regression eval set design + CI integration
- Failure-mode catalog

4. Cost discipline (Weeks 6–10)
- Per-feature + per-organization token budgets with enforcement (we bill at org level)
- Cache strategy (we already cache canvas cards by content version)
- Multi-tier model routing: frontier (Sonnet / GPT-4o) for paid cohorts, mid-tier for general learners, cheap-tier or self-hosted for unverified
- Anti-abuse limits: topical-relevance classification, per-user daily caps
- Cost reporting to PostHog dashboard

Our current stack
- LLMs: OpenAI + Anthropic (multi-provider posture)
- Orchestration: LangChain.js + LangGraph (supervisor + sub-agent pattern)
- Vector store: MongoDB Atlas (pgvector swap under evaluation)
- Backend: Node.js, Express, BullMQ workers, MySQL (Aurora)
- Frontend: Next.js 15 App Router, React, Tailwind
- Eval / observability: PostHog (in-flight); LangSmith or Helicone under evaluation

What success looks like

After 3 months we should have:
- Architecture assessment
- Working RAG/retrieval pass with documented quality metrics on a fixture eval set
- Production-ready prompt + eval pipeline in CI
- Adaptive AI framework that will improve based on learners' interactions
- Scaffolding for evaluations / quality control
- Cost projection for ~10K learners with cap + cache + tier strategy locked

Who you are

Required:
- Built production agentic systems before, not just chat wrappers around an LLM API
- Strong production RAG experience: chunking, retrieval quality, eval discipline
- Comfortable in JavaScript / TypeScript (Node + Next.js)
- LangChain.js / LangGraph experience, or strong opinions on alternatives you can defend
- Cost-aware: you've watched LLM bills explode and have systems-level opinions about budgets, caches, multi-tier routing
- Able to teach: engineers on the team are learning; you'll be expected to pair, write, and explain

Strongly preferred:
- Multilingual retrieval (especially Arabic)
- Eval framework experience (LangSmith, Helicone, custom)
- Vector store experience beyond Mongo (pgvector, Qdrant, Pinecone)
- Worked on platforms (not just internal tools); you've shipped to real users

Engagement details
- Duration: ~3 months starting June 2026
- Time commitment: project-scoped; estimate 30–60 hours per month
- Rate: senior contractor; competitive, scoped per project
- Location: anywhere, async-first
- Communication: weekly sync with Chris; ad-hoc Slack with team

How to apply

Send a short note about a production agentic system you built or significantly improved
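For applicants who want a concrete picture of the "Cost discipline" scope above, here is a minimal sketch of how per-organization token budgets, per-user daily caps, and multi-tier model routing might fit together. Every name in it (`LearnerTier`, `OrgBudget`, `routeRequest`, the reason strings) is hypothetical and not taken from the actual codebase; the real enforcement points, units, and tier mapping are exactly what the engagement is meant to decide.

```typescript
// Hypothetical sketch only: all types and names are illustrative assumptions.
type LearnerTier = "paid_cohort" | "general" | "unverified";
type ModelTier = "frontier" | "mid" | "cheap";

interface OrgBudget {
  monthlyTokenCap: number; // assumed unit: total tokens per billing month
  tokensUsed: number; // tokens already consumed this month
}

interface RouteDecision {
  allowed: boolean;
  modelTier: ModelTier | null;
  reason: "ok" | "user_daily_cap" | "org_budget_exceeded";
}

// Tier mapping as described in the posting: frontier for paid cohorts,
// mid-tier for general learners, cheap-tier for unverified.
const TIER_BY_LEARNER: Record<LearnerTier, ModelTier> = {
  paid_cohort: "frontier",
  general: "mid",
  unverified: "cheap",
};

function routeRequest(
  learner: LearnerTier,
  estimatedTokens: number,
  org: OrgBudget,
  userRequestsToday: number,
  userDailyCap: number
): RouteDecision {
  // Anti-abuse: reject once the user has hit the daily request cap.
  if (userRequestsToday >= userDailyCap) {
    return { allowed: false, modelTier: null, reason: "user_daily_cap" };
  }
  // Org-level enforcement: reject if this call would exceed the monthly cap.
  if (org.tokensUsed + estimatedTokens > org.monthlyTokenCap) {
    return { allowed: false, modelTier: null, reason: "org_budget_exceeded" };
  }
  return { allowed: true, modelTier: TIER_BY_LEARNER[learner], reason: "ok" };
}
```

A call such as `routeRequest("general", 1000, { monthlyTokenCap: 10000, tokensUsed: 9500 }, 0, 50)` would be rejected with `org_budget_exceeded`. Whether an over-budget request should be rejected outright, as here, or downgraded to a cheaper tier is a design choice left open by the posting.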

Hourly: Less than 30 hrs/week
Duration: 1-3 months
Experience Level: Expert
Hourly rate: $30.00 - $60.00
Remote Job
Project Type: Ongoing project

Skills and Expertise

Mandatory skills

Microcontroller Programming

Nice-to-have skills

Reverse Engineering

Embedded C

Activity on this job

Proposals: 50+
Last viewed by client: 3 weeks ago
Interviewing: 4
Invites sent: 0
Unanswered invites: 0

About the client

Member since Apr 27, 2019

United States
Kings County, 12:22 AM
$57K total spent
73 hires, 0 active
1,276 hours
