Terminal Bench Expert
Worldwide
Role: Terminal Bench Expert
Employment Type: Remote, full-time. 40 hours per week with an overlap of 4 hours with PST.
Experience: 3–10 years in software engineering or relevant domains.

What does day-to-day look like:
• Design high-quality Terminal-Bench task ideas and specifications.
• Develop complex tasks requiring reasoning, investigation, and debugging.
• Write clear task descriptions, solution approaches, and verification logic.
• Define deterministic, outcome-based evaluation criteria.
• Identify realistic failure modes, edge cases, and operational constraints.
• Create tasks that challenge AI systems while remaining solvable by experts.
• Collaborate with reviewers to refine task quality and difficulty.
• Contribute expertise across one or more specialized domains.

Required Skills:
• 3–10 years of experience in software engineering or relevant domains.
• Strong debugging, reasoning, and analytical skills.
• Good understanding of system design, workflows, and dependencies.
• Ability to analyze complex systems across multiple layers.
• Experience with production systems, pipelines, or large-scale workflows.
• Strong technical writing and documentation skills.
• Exposure to LLMs, agentic systems, or AI evaluation frameworks.
• Experience reviewing technical specifications or designing validation logic.

Domains (any of the following):
• Software Engineering & Code Operations
• Debugging & Codebase Navigation
• System Administration & Shell Workflows
• File & Text Processing Pipelines
• Data Engineering (ETL & Data Pipelines)
• Database & SQL Operations
• Machine Learning Pipelines & MLOps
• Post-training & Model Finetuning Workflows
• AI Evaluation & Benchmarking Systems
• Retrieval, Search & Ranking Systems
• GPU / Systems Performance Optimization
• Distributed Systems & Infrastructure
• Cloud & Platform Engineering
• DevOps & CI/CD Systems
• Build & Dependency Management
• Scientific & Numerical Computing
• Simulation & Optimization Systems
• Formal Methods & Theorem Proving
• Document & Structured Data Processing (PDFs, Excel, etc.)
• Media Processing (Video, Audio, Images via CLI tools)
• Programmatic Graphics & Design (SVG, layout, rendering)
• Data Visualization & Reporting Workflows
• Geospatial & Spatial Data Processing
• Time-series & Forecasting Systems
• Security, Forensics & Reverse Engineering
• Cybersecurity & Vulnerability Analysis
• Networking & API Integration Workflows
• Automation & Multi-step Toolchain Orchestration
• CLI Tooling & Developer Tool Workflows
• Version Control & Git Workflows
• Observability, Logging & Monitoring
• Storage Systems & File Systems
• Finance & Accounting Workflows
• Quantitative Finance & Risk Modeling
• Legal & Compliance Workflows
• Healthcare & Clinical Data Processing
• Supply Chain & Logistics Operations
• Marketing & Growth Analytics
• CRM & Sales Operations
• HR & Recruiting Analytics
• Consulting & Strategy Modeling
• Investment Workflows
• Operations Research & Decision Optimization
• Benchmark Infrastructure, Adapters & Harness

$1,052.00 (Fixed-price)
- Experience Level: Expert
- Remote Job
- Project Type: Ongoing project
Activity on this job
- Proposals: 5 to 10
- Last viewed by client: 3 weeks ago
- Interviewing: 5
- Invites sent: 20
- Unanswered invites: 14
About the client
- India, 12:16 PM