Terminal Bench Expert

Posted 3 weeks ago

Worldwide

Summary

Role: Terminal Bench Expert
Employment type: Remote, full-time. 40 hours per week with an overlap of 4 hours with PST.
Experience: 3–10 years in software engineering or relevant domains.

What the day-to-day looks like:
  • Design high-quality Terminal-Bench task ideas and specifications.
  • Develop complex tasks requiring reasoning, investigation, and debugging.
  • Write clear task descriptions, solution approaches, and verification logic.
  • Define deterministic, outcome-based evaluation criteria.
  • Identify realistic failure modes, edge cases, and operational constraints.
  • Create tasks that challenge AI systems while remaining solvable by experts.
  • Collaborate with reviewers to refine task quality and difficulty.
  • Contribute expertise across one or more specialized domains.

Required skills:
  • 3–10 years of experience in software engineering or relevant domains.
  • Strong debugging, reasoning, and analytical skills.
  • Good understanding of system design, workflows, and dependencies.
  • Ability to analyze complex systems across multiple layers.
  • Experience with production systems, pipelines, or large-scale workflows.
  • Strong technical writing and documentation skills.
  • Exposure to LLMs, agentic systems, or AI evaluation frameworks.
  • Experience reviewing technical specifications or designing validation logic.
Domains (any of the following):
  • Software Engineering & Code Operations
  • Debugging & Codebase Navigation
  • System Administration & Shell Workflows
  • File & Text Processing Pipelines
  • Data Engineering (ETL & Data Pipelines)
  • Database & SQL Operations
  • Machine Learning Pipelines & MLOps
  • Post-training & Model Finetuning Workflows
  • AI Evaluation & Benchmarking Systems
  • Retrieval, Search & Ranking Systems
  • GPU / Systems Performance Optimization
  • Distributed Systems & Infrastructure
  • Cloud & Platform Engineering
  • DevOps & CI/CD Systems
  • Build & Dependency Management
  • Scientific & Numerical Computing
  • Simulation & Optimization Systems
  • Formal Methods & Theorem Proving
  • Document & Structured Data Processing (PDFs, Excel, etc.)
  • Media Processing (Video, Audio, Images via CLI tools)
  • Programmatic Graphics & Design (SVG, layout, rendering)
  • Data Visualization & Reporting Workflows
  • Geospatial & Spatial Data Processing
  • Time-series & Forecasting Systems
  • Security, Forensics & Reverse Engineering
  • Cybersecurity & Vulnerability Analysis
  • Networking & API Integration Workflows
  • Automation & Multi-step Toolchain Orchestration
  • CLI Tooling & Developer Tool Workflows
  • Version Control & Git Workflows
  • Observability, Logging & Monitoring
  • Storage Systems & File Systems
  • Finance & Accounting Workflows
  • Quantitative Finance & Risk Modeling
  • Legal & Compliance Workflows
  • Healthcare & Clinical Data Processing
  • Supply Chain & Logistics Operations
  • Marketing & Growth Analytics
  • CRM & Sales Operations
  • HR & Recruiting Analytics
  • Consulting & Strategy Modeling
  • Investment Workflows
  • Operations Research & Decision Optimization
  • Benchmark Infrastructure, Adapters & Harness

  • Fixed-price: $1,052.00
  • Experience level: Expert
  • Remote job
  • Project type: Ongoing project

Contract-to-hire opportunity

This lets talent know that this job could become full time.
Skills and Expertise
Mandatory skills
LLMs
agentic systems
AI evaluation frameworks
Activity on this job
  • Proposals: 5 to 10
  • Last viewed by client: 3 weeks ago
  • Interviewing: 5
  • Invites sent: 20
  • Unanswered invites: 14
About the client
Member since Jun 5, 2026
  • India
    12:16 PM
