Off-Robot Inference Engineer

Posted 3 weeks ago

Worldwide

Summary

The Role

Dubai-based robotics company deploying a robot fleet into commercial super-user sites starting this summer. We run a tiered inference architecture. Low-latency work (motor control, SLAM, obstacle avoidance) runs on-robot. The heavy layer (VLA, LLM, speech, multi-camera reasoning) runs on a near-edge NVIDIA DGX Spark we're standing up now. This hire supports the off-robot side of that split. You will build the inference stack, benchmark the model layer on real hardware, and put structured numbers on the table so the team stops estimating and starts measuring.

What You'll Actually Do

Stand up DGX Spark as a production inference server. Ubuntu on Grace Blackwell, CUDA, TensorRT-LLM or Triton. Multi-model serving, KV-cache budgeting, health checks, metrics.

Benchmark the Phase 1 model stack against real load. UnifoLM 7.5B, GR00T N1.5 3B, Qwen 2.5 3B, Whisper large-v3, YOLO26m-seg. Precision sweeps (FP16, INT8, FP4). Tokens per second, latency distribution at p50/p95/p99, memory headroom under concurrent load. Numbers end with units.

Build the routing layer between the robots and the near-edge box. Decide what runs where, handle cloud fallback for the full 7B UnifoLM when connectivity is there, structure the telemetry we capture off every deployment. This scaffolding will be owned and extended by the core team; you build it clean and documented.

Validate Isaac Sim on ARM. PhysX GPU is broken on GB10 Blackwell. Newton physics is the documented workaround. Get sim-to-real working for our simplest behavior before the humanoid arrives.

Produce technical briefs when the team needs a call made. Gemini Robotics-ER 1.6 as the fleet supervisor layer. OpenMind OM1 as tool or threat. Qwen 2.5 vs Llama 3.2 for Arabic deployment. Briefs end in measurements and a recommendation, not summaries.

What We Need

Strong Python. C++ for ROS 2 nodes and latency-sensitive paths.

Linux, CUDA, Docker, ROS 2.

Production model-serving experience. vLLM, TensorRT-LLM, Triton, or equivalent. Not a tutorial. You've served multiple models concurrently, debugged OOM under load, owned a p99 latency target.

Benchmarking discipline. You measure, you don't estimate. If the table has LOW or MED confidence on a line item with published weights, you treat that as a bug to fix.

Writing that respects the reader's time. No filler.

You notice CC BY-NC-SA 4.0 and flag it before it becomes a commercial problem.

Nice to Have

Isaac Sim hands-on (x86 is fine; ARM quirks are documented).

NVIDIA Jetson hands-on (Orin NX, AGX Thor).

Unitree SDK (Go2, G1).

Published or open-source work in VLA, model serving, or robotics infrastructure.

Do Not Apply If You Want

A firmware role.

A full-stack web role.

A PhD research seat where you pick your own problem.

Commitment

10-15 hours per week. Fully remote, any timezone. Path to expanded scope, higher rate, etc.

First Deliverable

Before hardware lands, produce a benchmark plan and harness for the Phase 1 model stack on DGX Spark. Precision sweeps, load profiles, memory accounting. One page of methodology, one page of expected numbers based on published benchmarks, one page of what we'll actually measure when the box boots. This is the filter.
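
For illustration only, the metrics named above (tokens per second, latency at p50/p95/p99) could be summarized along these lines. This is a minimal sketch, not the client's harness: the names `RequestSample`, `percentile`, and `summarize` are hypothetical, the nearest-rank percentile definition is an assumption, and the sample numbers are synthetic rather than measured on any hardware.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One completed inference request (hypothetical record layout)."""
    latency_ms: float   # end-to-end latency for the request, in milliseconds
    tokens_out: int     # number of tokens generated for the request


def percentile(values, q):
    """Nearest-rank percentile: the smallest sample such that at least
    q percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples to summarize")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples, wall_time_s):
    """Aggregate per-request samples into throughput and tail latency.

    wall_time_s is the duration of the whole load run in seconds, so
    tokens_per_s reflects aggregate throughput under concurrent load.
    """
    latencies = [s.latency_ms for s in samples]
    total_tokens = sum(s.tokens_out for s in samples)
    return {
        "requests": len(samples),
        "tokens_per_s": total_tokens / wall_time_s,
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
    }


if __name__ == "__main__":
    # Synthetic data: 100 requests with latencies 1..100 ms, 10 tokens each.
    demo = [RequestSample(latency_ms=float(i), tokens_out=10) for i in range(1, 101)]
    for key, value in summarize(demo, wall_time_s=20.0).items():
        print(f"{key}: {value}")
```

Every reported figure carries its unit in the key name (`_ms`, `_per_s`), in keeping with "numbers end with units". A real harness would also record precision mode, concurrency level, and memory headroom per run.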

Hourly: More than 30 hrs/week
Duration: 6+ months
Experience Level: Intermediate
Hourly: $25.00 - $40.00
Remote Job
Project Type: Ongoing project

Contract-to-hire opportunity

This lets talent know that this job could become full time.

Skills and Expertise

Mandatory skills

C++

ROS 2

Linux

Docker

Python

Activity on this job

Proposals: 10 to 15
Last viewed by client: last week
Hires: 1
Interviewing: 2
Invites sent: 0
Unanswered invites: 0

About the client

Member since Mar 24, 2026

ARE
Dubai 5:34 PM
$7.7K total spent
2 hires, 2 active
245 hours
