Claude Code & Prompt Engineering Expert — multi-agent AI systems

Posted 2 days ago

Worldwide

Summary

We run a production multi-agent AI system: a fleet of agents that do real marketing work for real clients (research, creative, social, paid-media analysis, presentations). The platform agents run on GPT-5-class models; internally we build, orchestrate and automate heavily with Claude Code (skills, subagents, MCP servers, hooks, gates). It's live and used daily, so quality and reliability are everything.

THE ROLE

You'll own AI quality and agent engineering across both worlds. Day to day:

- Prompt engineering & design: write, refactor and optimise agent prompts so they behave reliably: call the right tool at the right moment, follow the workflow, hit the gates, and never leak internal/technical detail to the end user. This is craft, and you're expert at it.
- Claude Code engineering: build and improve agentic workflows in Claude Code: authoring and hardening skills, wiring MCP tools, designing subagent orchestration, hooks, and gates (deterministic checks that stop an agent before it does the wrong thing). You know this environment cold.
- Behaviour debugging: when an agent hallucinates, repeats itself, picks the wrong tool, or surfaces something it shouldn't, you read the logs and execution traces, find the real root cause, and fix it at the prompt / tool / gate level.
- Evals & regression testing: build and maintain evals so a change that fixes one thing doesn't quietly break another. Measure before/after with evidence, not vibes.
- Tool / MCP wiring: slot new data sources and tools into agents and tune prompts so they use them efficiently (fewer wasted or failed tool calls).

WHO YOU ARE

This is a senior, high-ownership role. "Can write a prompt" is not enough; we need someone who owns outcomes.

- Absolute Claude Code expert. You live in it: skills, subagents, MCP, hooks, gates, agentic orchestration. You've built real, reliable systems with it, not toy demos.
- Expert prompt engineer with genuine craft and a strong mental model of why agents behave the way they do: tool-use, multi-step workflows, context, gating.
- Deep understanding of AI models. You know the current Claude and GPT families, their strengths, failure modes, and how to get the best out of each. You pick the right model and reasoning settings for the job.
- You judge the output, not the prompt. If a result looks wrong, it's wrong, whatever the instructions say. You trust your eye, push back, and take ownership of quality. We don't want someone who follows the prompt literally and ships a bad result.
- High initiative, low hand-holding. You spot problems before we flag them, dig to the real root cause, and prove your work moved the needle.
- Comfortable reading logs, traces and SQL (Postgres/Supabase). Python for scripting, evals and glue.

NICE TO HAVE

- Background in marketing / advertising / creative tooling.
- Experience with eval frameworks (OpenAI Evals or similar).
- Image / video generation pipelines and routing.

HOW WE WORK

Async-friendly and fast-moving, with some UK-hours overlap helpful. We use Linear for tickets and GitHub for code, and we value plain-English communication: explain things simply, no jargon walls.

TO APPLY

In your proposal, tell us briefly about:

1) A real agentic system you built or fixed in Claude Code (skills / MCP / subagents / gates) and what made it reliable.
2) A time you diagnosed a misbehaving LLM agent from logs or traces, and how you proved your fix actually worked.
3) How you decide which AI model and settings to use for a task, and how you design gates to stop an agent doing the wrong thing.

A short Loom or a tight paragraph is perfect; we care about how you think, not cover-letter polish.

  • Less than 30 hrs/week
    Hourly
  • 6+ months
    Duration
  • Expert
    Experience Level
  • $40.00 - $70.00
    Hourly
  • Remote Job
  • Ongoing project
    Project Type
Skills and Expertise
Mandatory skills
AI Agent Development
API Integration
Activity on this job
  • Proposals: 50+
  • Last viewed by client: 7 hours ago
  • Interviewing: 10
  • Invites sent: 30
  • Unanswered invites: 15
About the client
Member since Sep 15, 2015
  • United Kingdom
    Guildford 6:28 PM
  • $267K total spent
    139 hires, 21 active
  • 6,108 hours
