ML experts needed to run GPU benchmarks
Worldwide
We’re looking for an experienced ML engineer or GPU benchmarking specialist to help design and run practical benchmark tests for cloud GPU infrastructure. This project is focused on real-world AI workloads, not just synthetic hardware tests. We want to benchmark GPUs across use cases such as LLM inference, RAG workloads, long-context prompts, batch document processing, fine-tuning, and possibly image/video or TTS workloads. The goal is to produce reliable, reproducible benchmark data that can be used in public-facing technical content, comparison pages, and research-style reports.

What You’ll Help With:
- Design a clear benchmark methodology for cloud GPUs.
- Run LLM inference tests using tools such as vLLM, SGLang, TGI, GenAI-Perf, or similar.
- Measure practical metrics like TTFT, TPOT, P95/P99 latency, throughput, GPU utilization, VRAM usage, cost per 1M tokens, and cost per completed task.
- Help create benchmark workflows for real workloads such as RAG assistants, batch summarization, agent-style tasks, and fine-tuning.
- Capture clean run metadata: GPU model, driver/CUDA version, runtime version, model settings, pricing assumptions, region/provider details, and failed runs.
- Package results into clean CSV/JSON outputs with notes that a non-ML audience can understand.

Ideal Candidate:
You should have hands-on experience with at least some of the following:
- GPU benchmarking for LLMs or AI workloads
- vLLM, SGLang, TGI, TensorRT-LLM, or similar serving frameworks
- NVIDIA GPUs, CUDA, `nvidia-smi`, DCGM, PyTorch
- LLM inference benchmarking, fine-tuning, or RAG evaluation
- Benchmark methodology, reproducibility, and performance analysis
- Cloud GPU platforms or distributed GPU environments

You do not need to write polished marketing content, but you should be able to explain results clearly and help us avoid misleading or unfair benchmark claims.
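To make the metric list above concrete, here is a minimal Python sketch of how TTFT, TPOT, tail latency, and cost per 1M tokens could be derived from per-request timestamps. It is an illustration under stated assumptions, not a required methodology: the `RequestTiming` fields and all function names are hypothetical, the percentile uses the nearest-rank method (other conventions interpolate), and the cost formula assumes a single GPU billed by the hour at a sustained output throughput.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Timestamps in seconds for one request (hypothetical field names)."""
    sent_at: float          # when the request was sent
    first_token_at: float   # when the first output token arrived
    finished_at: float      # when the last output token arrived
    output_tokens: int      # number of generated tokens


def ttft(r: RequestTiming) -> float:
    """Time to first token: delay between sending and the first token."""
    return r.first_token_at - r.sent_at


def tpot(r: RequestTiming) -> float:
    """Time per output token: average gap between tokens after the first."""
    if r.output_tokens < 2:
        raise ValueError("TPOT needs at least two output tokens")
    return (r.finished_at - r.first_token_at) / (r.output_tokens - 1)


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=95 for P95."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def cost_per_million_tokens(gpu_price_per_hour: float,
                            tokens_per_second: float) -> float:
    """Cost of 1M output tokens, assuming one GPU at sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_price_per_hour / tokens_per_hour * 1_000_000
```

In practice the percentile would be taken over the TTFT or end-to-end latency of many requests at a fixed concurrency, and the pricing assumption (on-demand, reserved, or spot) would be recorded next to every cost figure.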
Deliverables:
- Recommended benchmark methodology
- Benchmark scripts or clear runbooks
- Raw benchmark outputs in CSV/JSON
- Summary tables and key findings
- Notes on limitations, anomalies, and reproducibility
- Optional: recommendations for future benchmark tests

This is a hands-on technical project with potential for ongoing work. Please apply with examples of previous GPU, ML, LLM, or infrastructure benchmarking work, and mention which tools you would recommend for this type of project.
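As a rough picture of the CSV/JSON deliverable, the sketch below shows one possible per-run record and two small writers using only the Python standard library. The `RunRecord` class and its field names are hypothetical, chosen to mirror the run metadata listed earlier in this posting; a real schema would likely carry more fields (model settings, batch size, concurrency). Failed runs are kept as rows with empty metrics rather than dropped.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class RunRecord:
    """One benchmark run, including its environment (hypothetical schema)."""
    provider: str
    region: str
    gpu_model: str
    driver_version: str
    cuda_version: str
    runtime: str                        # serving framework name and version
    model: str
    price_per_hour_usd: float           # pricing assumption used for cost figures
    status: str                         # "ok" or "failed"
    ttft_p95_s: Optional[float]         # None when the run failed
    tokens_per_second: Optional[float]  # None when the run failed
    notes: str = ""                     # plain-language remarks, anomalies


def to_json_lines(records: list[RunRecord]) -> str:
    """Serialize records as JSON Lines, one run per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def to_csv(records: list[RunRecord]) -> str:
    """Serialize records as CSV with a header row; None becomes an empty cell."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(RunRecord)])
    writer.writeheader()
    for r in records:
        writer.writerow(asdict(r))
    return buf.getvalue()
```

Keeping the environment fields on every row makes each line of the output self-describing, which helps when results from different providers or driver versions are later merged into one comparison table.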
- Hourly: Less than 30 hrs/week
- Duration: 1-3 months
- Experience Level: Intermediate
- $15.00 - $35.00 hourly
- Remote Job
- Project Type: Ongoing project
Activity on this job
- Proposals: 20 to 50
- Last viewed by client: 2 weeks ago
- Interviewing: 18
- Invites sent: 0
- Unanswered invites: 0
About the client
- Malaysia, Kuala Lumpur, 7:45 PM
- $12K total spent; 44 hires, 6 active
- 429 hours
- Sales & Marketing; Individual client