Claude Code & Prompt Engineering Expert — multi-agent AI systems
Worldwide
We run a production multi-agent AI system: a fleet of agents that do real marketing work for real clients (research, creative, social, paid-media analysis, presentations). The platform agents run on GPT-5-class models; internally we build, orchestrate and automate heavily with Claude Code (skills, subagents, MCP servers, hooks, gates). It's live and used daily, so quality and reliability are everything.

THE ROLE
You'll own AI quality and agent engineering across both worlds. Day to day:
- Prompt engineering & design: write, refactor and optimise agent prompts so they behave reliably: call the right tool at the right moment, follow the workflow, hit the gates, and never leak internal/technical detail to the end user. This is craft, and you're expert at it.
- Claude Code engineering: build and improve agentic workflows in Claude Code: authoring and hardening skills, wiring MCP tools, designing subagent orchestration, hooks, and gates (deterministic checks that stop an agent before it does the wrong thing). You know this environment cold.
- Behaviour debugging: when an agent hallucinates, repeats itself, picks the wrong tool, or surfaces something it shouldn't, you read the logs and execution traces, find the real root cause, and fix it at the prompt / tool / gate level.
- Evals & regression testing: build and maintain evals so a change that fixes one thing doesn't quietly break another. Measure before/after with evidence, not vibes.
- Tool / MCP wiring: slot new data sources and tools into agents and tune prompts so they use them efficiently (fewer wasted or failed tool calls).

WHO YOU ARE
This is a senior, high-ownership role. "Can write a prompt" is not enough; we need someone who owns outcomes.
- Absolute Claude Code expert. You live in it: skills, subagents, MCP, hooks, gates, agentic orchestration. You've built real, reliable systems with it, not toy demos.
- Expert prompt engineer with genuine craft and a strong mental model of why agents behave the way they do: tool use, multi-step workflows, context, gating.
- Deep understanding of AI models. You know the current Claude and GPT families, their strengths, failure modes, and how to get the best out of each. You pick the right model and reasoning settings for the job.
- You judge the output, not the prompt. If a result looks wrong, it's wrong, whatever the instructions say. You trust your eye, push back, and take ownership of quality. We don't want someone who follows the prompt literally and ships a bad result.
- High initiative, low hand-holding. You spot problems before we flag them, dig to the real root cause, and prove your work moved the needle.
- Comfortable reading logs, traces and SQL (Postgres/Supabase). Python for scripting, evals and glue.

NICE TO HAVE
- Background in marketing / advertising / creative tooling.
- Experience with eval frameworks (OpenAI Evals or similar).
- Image / video generation pipelines and routing.

HOW WE WORK
Async-friendly and fast-moving, with some UK-hours overlap helpful. We use Linear for tickets and GitHub for code, and we value plain-English communication: explain things simply, no jargon walls.

TO APPLY
In your proposal, tell us briefly about:
1) A real agentic system you built or fixed in Claude Code (skills / MCP / subagents / gates) and what made it reliable.
2) A time you diagnosed a misbehaving LLM agent from logs or traces, and how you proved your fix actually worked.
3) How you decide which AI model and settings to use for a task, and how you design gates to stop an agent doing the wrong thing.
A short Loom or a tight paragraph is perfect; we care about how you think, not cover-letter polish.
- Less than 30 hrs/week (hourly)
- Duration: 6+ months
- Experience level: Expert
- Hourly rate: $40.00-$70.00
- Remote job
- Project type: Ongoing project
Activity on this job
- Proposals: 50+
- Last viewed by client: 7 hours ago
- Interviewing: 10
- Invites sent: 30
- Unanswered invites: 15
About the client
- Guildford, United Kingdom, 6:28 PM
- $267K total spent; 139 hires, 21 active
- 6,108 hours