Claude Code & Prompt Engineering Expert — multi-agent AI systems

Posted 2 days ago

Worldwide

Summary

We run a production multi-agent AI system: a fleet of agents that do real marketing work for real clients (research, creative, social, paid-media analysis, presentations). The platform agents run on GPT-5-class models; internally we build, orchestrate and automate heavily with Claude Code (skills, subagents, MCP servers, hooks, gates). It's live and used daily, so quality and reliability are everything.

THE ROLE

You'll own AI quality and agent engineering across both worlds. Day to day:

- Prompt engineering & design: write, refactor and optimise agent prompts so they behave reliably: call the right tool at the right moment, follow the workflow, hit the gates, and never leak internal/technical detail to the end user. This is craft, and you're expert at it.
- Claude Code engineering: build and improve agentic workflows in Claude Code, including authoring and hardening skills, wiring MCP tools, and designing subagent orchestration, hooks, and gates (deterministic checks that stop an agent before it does the wrong thing). You know this environment cold.
- Behaviour debugging: when an agent hallucinates, repeats itself, picks the wrong tool, or surfaces something it shouldn't, you read the logs and execution traces, find the real root cause, and fix it at the prompt / tool / gate level.
- Evals & regression testing: build and maintain evals so a change that fixes one thing doesn't quietly break another. Measure before/after with evidence, not vibes.
- Tool / MCP wiring: slot new data sources and tools into agents and tune prompts so they use them efficiently (fewer wasted or failed tool calls).

WHO YOU ARE

This is a senior, high-ownership role. "Can write a prompt" is not enough; we need someone who owns outcomes.

- Absolute Claude Code expert. You live in it: skills, subagents, MCP, hooks, gates, agentic orchestration. You've built real, reliable systems with it, not toy demos.
- Expert prompt engineer with genuine craft and a strong mental model of why agents behave the way they do: tool use, multi-step workflows, context, gating.
- Deep understanding of AI models. You know the current Claude and GPT families, their strengths, failure modes, and how to get the best out of each. You pick the right model and reasoning settings for the job.
- You judge the output, not the prompt. If a result looks wrong, it's wrong, whatever the instructions say. You trust your eye, push back, and take ownership of quality. We don't want someone who follows the prompt literally and ships a bad result.
- High initiative, low hand-holding. You spot problems before we flag them, dig to the real root cause, and prove your work moved the needle.
- Comfortable reading logs, traces and SQL (Postgres/Supabase). Python for scripting, evals and glue.

NICE TO HAVE

- Background in marketing / advertising / creative tooling.
- Experience with eval frameworks (OpenAI Evals or similar).
- Image / video generation pipelines and routing.

HOW WE WORK

Async-friendly and fast-moving, with some UK-hours overlap helpful. We use Linear for tickets and GitHub for code, and we value plain-English communication: explain things simply, no jargon walls.

TO APPLY

In your proposal, tell us briefly about:

1) A real agentic system you built or fixed in Claude Code (skills / MCP / subagents / gates) and what made it reliable.
2) A time you diagnosed a misbehaving LLM agent from logs or traces, and how you proved your fix actually worked.
3) How you decide which AI model and settings to use for a task, and how you design gates to stop an agent doing the wrong thing.

A short Loom or a tight paragraph is perfect. We care about how you think, not cover-letter polish.

Hourly: Less than 30 hrs/week
Duration: 6+ months
Experience Level: Expert
Hourly rate: $40.00 - $70.00
Remote Job
Project Type: Ongoing project

Skills and Expertise

Mandatory skills

AI Agent Development

API Integration

Activity on this job

Proposals: 50+
Last viewed by client: 7 hours ago
Interviewing: 10
Invites sent: 30
Unanswered invites: 15

About the client

Member since Sep 15, 2015

United Kingdom
Guildford, 6:28 PM
$267K total spent
139 hires, 21 active
6,108 hours
