ML experts needed to run GPU benchmarks

Posted 2 weeks ago

Worldwide

Summary

We’re looking for an experienced ML engineer or GPU benchmarking specialist to help design and run practical benchmark tests for cloud GPU infrastructure. This project is focused on real-world AI workloads, not just synthetic hardware tests. We want to benchmark GPUs across use cases such as LLM inference, RAG workloads, long-context prompts, batch document processing, fine-tuning, and possibly image/video or TTS workloads. The goal is to produce reliable, reproducible benchmark data that can be used in public-facing technical content, comparison pages, and research-style reports.

What You’ll Help With:
- Design a clear benchmark methodology for cloud GPUs.
- Run LLM inference tests using tools such as vLLM, SGLang, TGI, GenAI-Perf, or similar.
- Measure practical metrics like TTFT, TPOT, P95/P99 latency, throughput, GPU utilization, VRAM usage, cost per 1M tokens, and cost per completed task.
- Help create benchmark workflows for real workloads such as RAG assistants, batch summarization, agent-style tasks, and fine-tuning.
- Capture clean run metadata: GPU model, driver/CUDA version, runtime version, model settings, pricing assumptions, region/provider details, and failed runs.
- Package results into clean CSV/JSON outputs with notes that a non-ML audience can understand.

Ideal Candidate:
You should have hands-on experience with at least some of the following:
- GPU benchmarking for LLMs or AI workloads
- vLLM, SGLang, TGI, TensorRT-LLM, or similar serving frameworks
- NVIDIA GPUs, CUDA, `nvidia-smi`, DCGM, PyTorch
- LLM inference benchmarking, fine-tuning, or RAG evaluation
- Benchmark methodology, reproducibility, and performance analysis
- Cloud GPU platforms or distributed GPU environments

You do not need to write polished marketing content, but you should be able to explain results clearly and help us avoid misleading or unfair benchmark claims.

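For applicants who want a concrete reference point, the latency and cost metrics named above could be derived from per-request timestamps roughly as follows. This is a minimal sketch, not a required method: the `RequestTiming` record, the `summarize` helper, the nearest-rank percentile, and the choice to count only output tokens in the cost figure are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical per-request record; times are seconds on one shared clock."""
    start_s: float        # request sent
    first_token_s: float  # first output token received
    end_s: float          # last output token received
    output_tokens: int


def percentile(values, pct):
    """Nearest-rank percentile (an assumed convention; others exist)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(timings, gpu_hourly_usd):
    """Summarize one benchmark run given an assumed hourly GPU price in USD."""
    ttft = [t.first_token_s - t.start_s for t in timings]
    # TPOT: decode time per output token, excluding the first token.
    tpot = [
        (t.end_s - t.first_token_s) / (t.output_tokens - 1)
        for t in timings
        if t.output_tokens > 1
    ]
    latency = [t.end_s - t.start_s for t in timings]
    wall_s = max(t.end_s for t in timings) - min(t.start_s for t in timings)
    total_tokens = sum(t.output_tokens for t in timings)
    throughput = total_tokens / wall_s  # output tokens per second
    return {
        "mean_ttft_s": sum(ttft) / len(ttft),
        "mean_tpot_s": sum(tpot) / len(tpot) if tpot else None,
        "p95_latency_s": percentile(latency, 95),
        "p99_latency_s": percentile(latency, 99),
        "throughput_tok_s": throughput,
        # Cost counts output tokens only; state this assumption in any report.
        "cost_per_1m_tokens_usd": gpu_hourly_usd / 3600 / throughput * 1_000_000,
    }
```

Whatever definitions are chosen, they should be written into the methodology so that figures from different tools and providers remain comparable.
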
Deliverables:
- Recommended benchmark methodology
- Benchmark scripts or clear runbooks
- Raw benchmark outputs in CSV/JSON
- Summary tables and key findings
- Notes on limitations, anomalies, and reproducibility
- Optional: recommendations for future benchmark tests

This is a hands-on technical project with potential for ongoing work. Please apply with examples of previous GPU, ML, LLM, or infrastructure benchmarking work, and mention which tools you would recommend for this type of project.
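
As an illustration of the raw CSV/JSON outputs and run metadata described above, the sketch below records one row per run, including failed runs, and writes both formats. It is only one possible shape: the file names `runs.json` and `runs.csv`, the record fields, and the helper names are hypothetical, and the GPU query assumes `nvidia-smi` is on the PATH (it returns an empty list otherwise).

```python
import csv
import json
import shutil
import subprocess
from pathlib import Path

# Fields requested from nvidia-smi; extend as the methodology requires.
GPU_FIELDS = ["name", "driver_version", "memory.total"]


def collect_gpu_info():
    """Return one dict per visible GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={','.join(GPU_FIELDS)}",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=30,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return []
    return [
        dict(zip(GPU_FIELDS, (value.strip() for value in line.split(","))))
        for line in out.strip().splitlines()
        if line.strip()
    ]


def write_results(records, out_dir):
    """Write the same run records as JSON and as a flat CSV table."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    json_path = out_dir / "runs.json"  # hypothetical file name
    csv_path = out_dir / "runs.csv"    # hypothetical file name
    json_path.write_text(json.dumps(records, indent=2))
    # Union of keys, so failed runs with extra or missing fields still fit.
    fieldnames = sorted({key for record in records for key in record})
    with csv_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for record in records:
            # Nested values (e.g. the GPU list) are stored as JSON strings.
            writer.writerow({
                key: json.dumps(value) if isinstance(value, (list, dict)) else value
                for key, value in record.items()
            })
    return json_path, csv_path
```

Keeping failed runs in the same table, with a status column and an error note, makes it harder to report only the runs that happened to succeed.
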

Hourly: Less than 30 hrs/week
Duration: 1-3 months
Experience Level: Intermediate
Hourly rate: $15.00 - $35.00
Remote Job
Project Type: Ongoing project

Skills and Expertise

Mandatory skills

MLOps

Model Testing & Optimization

Activity on this job

Proposals: 20 to 50
Last viewed by client: 2 weeks ago
Interviewing: 18
Invites sent: 0
Unanswered invites: 0

About the client

Member since Feb 22, 2016

Malaysia
Kuala Lumpur, 7:45 PM local time
$12K total spent
44 hires, 6 active
429 hours
Sales & Marketing
Individual client
