QA Engineer — Break the AI Operating System Before Our Clients Do

Posted 4 weeks ago

Worldwide

Summary

About This Role

We're a transformation design studio building CXROS, an AI-powered operating system we run internally and install for clients. It's built with Claude, MCP (Model Context Protocol), and direct API integrations. No Zapier, no n8n, no Make.

Our engineers are shipping fast. We need a dedicated QA person sitting alongside them: someone who reads the code well enough to know what should happen, then proves whether it actually does. You're not clicking buttons on a finished product. You're testing automations, MCP integrations, and AI agents that have to run unattended without breaking.

This is QA for people who can debug. If you can only report "it doesn't work," this isn't the role. We need "it fails here, on this input, because of this, and here's how to reproduce it."

What You're Testing

You'll be the quality layer under the Outcomes Machine, the operational backbone of CXROS:

  • Transcript Intelligence: transcripts in, AI summaries out, filed to Notion. Does it file to the right place every time?
  • Communications Intelligence: a Missive bridge unifying email, Slack, and Teams with AI-drafted replies. Does sync hold under real volume? Do drafts ever auto-send when they shouldn't?
  • Routing Agent: AI content auto-filed by type, entity, and project. Where does it misroute, and why?
  • Control Tower: operational health dashboard. Are the signals accurate or stale?
  • PM Infrastructure: Notion databases for projects, tasks, time, and meetings.

Detailed specs exist for all of it. Your job is to test against the spec, find where reality diverges, and hand the engineer a reproducible case.

What We're Looking For

Must Have
  • Reads Python or Node well enough to understand what's being tested
  • Systematic test design
  • Reproducible bug reports (steps, input, expected vs. actual)
  • Debugging: isolates root cause, not just symptoms
  • API testing (Postman, curl, or similar)

Strong Signal
  • Has tested AI/LLM or other non-deterministic output
  • Integration testing across multiple APIs
  • MCP or Claude familiarity
  • Regression discipline
  • Self-directed

Valuable
  • Reading logs and traces to localize a failure
  • OAuth / token-refresh edge cases
  • Clear written test docs
  • Pushes back when a spec is ambiguous

Nice to Have
  • Light scripting to automate repetitive test runs
  • Webhook / scheduled-job testing
  • CI familiarity

The must-haves are a hard filter. We'd rather have a sharp debugger who's new to MCP than a checklist tester who's seen it.

How to Apply

Answer these (concise answers only):

  • Walk us through a bug you found that others had missed. How did you isolate the root cause?
  • You're testing an AI automation that drafts email replies, and the output is different on every run. How do you decide whether a result is a pass or a fail?
  • Show us how you'd write a bug report a developer can act on without asking you a single follow-up question. Use any real example.

Project Details

  • Rate: $10–$13/hr
  • Hours: Part-time, minimum 15 hrs/week, ongoing
  • Working with: Directly alongside the CXROS build engineers, under our Technical Lead pair.
  • Location: Latin America and Pakistan preferred. Strong applicants in compatible timezones are welcome.
  • Growth: More scope as more CXROS modules ship.

About Us

codeswitcher builds CXROS for ourselves and installs it for clients. Claude Code is our primary development environment. We write specs, build to them, and ship things that run without someone watching. Not demos, operations. QA is how we keep it that way.

  • Less than 30 hrs/week
    Hourly
  • 1-3 months
    Duration
  • Intermediate
    Experience Level
  • $10.00 - $13.00
    Hourly
  • Remote Job
  • One-time project
    Project Type
Skills and Expertise
Mandatory skills
Annotated Screenshot
QA Engineering
Activity on this job
  • Proposals: 20 to 50
  • Last viewed by client: 2 weeks ago
  • Interviewing: 0
  • Invites sent: 0
  • Unanswered invites: 0
About the client
Member since Aug 14, 2024
  • United States
    Santa Monica, 3:14 AM
  • $128K total spent
    79 hires, 33 active
  • 7,888 hours
