Off-Robot Inference Engineer

Posted 3 weeks ago

Worldwide

Summary

The Role

Dubai-based robotics company deploying a robot fleet into commercial super-user sites starting this summer. We run a tiered inference architecture. Low-latency work (motor control, SLAM, obstacle avoidance) runs on-robot. The heavy layer (VLA, LLM, speech, multi-camera reasoning) runs on a near-edge NVIDIA DGX Spark we're standing up now.

This hire supports the off-robot side of that split. You will build the inference stack, benchmark the model layer on real hardware, and put structured numbers on the table so the team stops estimating and starts measuring.

What You'll Actually Do

  • Stand up DGX Spark as a production inference server. Ubuntu on Grace Blackwell, CUDA, TensorRT-LLM or Triton. Multi-model serving, KV-cache budgeting, health checks, metrics.
  • Benchmark the Phase 1 model stack against real load. UnifoLM 7.5B, GR00T N1.5 3B, Qwen 2.5 3B, Whisper large-v3, YOLO26m-seg. Precision sweeps (FP16, INT8, FP4). Tokens per second, latency distribution at p50/p95/p99, memory headroom under concurrent load. Numbers end with units.
  • Build the routing layer between the robots and the near-edge box. Decide what runs where, handle cloud fallback for the full 7B UnifoLM when connectivity is there, structure the telemetry we capture off every deployment. This scaffolding will be owned and extended by the core team; you build it clean and documented.
  • Validate Isaac Sim on ARM. PhysX GPU is broken on GB10 Blackwell. Newton physics is the documented workaround. Get sim-to-real working for our simplest behavior before the humanoid arrives.
  • Produce technical briefs when the team needs a call made. Gemini Robotics-ER 1.6 as the fleet supervisor layer. OpenMind OM1 as tool or threat. Qwen 2.5 vs Llama 3.2 for Arabic deployment. Briefs end in measurements and a recommendation, not summaries.

What We Need

  • Strong Python. C++ for ROS 2 nodes and latency-sensitive paths.
  • Linux, CUDA, Docker, ROS 2.
  • Production model-serving experience. vLLM, TensorRT-LLM, Triton, or equivalent. Not a tutorial. You've served multiple models concurrently, debugged OOM under load, owned a p99 latency target.
  • Benchmarking discipline. You measure, you don't estimate. If the table has LOW or MED confidence on a line item with published weights, you treat that as a bug to fix.
  • Writing that respects the reader's time. No filler.
  • You notice CC BY-NC-SA 4.0 and flag it before it becomes a commercial problem.

Nice to Have

  • Isaac Sim hands-on (x86 is fine; ARM quirks are documented).
  • NVIDIA Jetson hands-on (Orin NX, AGX Thor).
  • Unitree SDK (Go2, G1).
  • Published or open-source work in VLA, model serving, or robotics infrastructure.

Do Not Apply If You Want

  • A firmware role.
  • A full-stack web role.
  • A PhD research seat where you pick your own problem.

Commitment

10-15 hours per week. Fully remote, any timezone. Path to expanded scope, higher rate, etc.

First Deliverable

Before hardware lands, produce a benchmark plan and harness for the Phase 1 model stack on DGX Spark. Precision sweeps, load profiles, memory accounting. One page of methodology, one page of expected numbers based on published benchmarks, one page of what we'll actually measure when the box boots. This is the filter.
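For a sense of what "numbers end with units" could look like in a harness, here is a minimal sketch of the reporting step only. It assumes request latencies and output token counts have already been collected by some load generator. The `RequestSample` record, the `summarize` helper, and the nearest-rank percentile method are illustrative choices, not something the posting specifies.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One completed inference request (hypothetical record shape)."""
    latency_ms: float   # end-to-end latency in milliseconds
    output_tokens: int  # tokens generated for this request


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list, pct in (0, 100]."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(samples, wall_clock_s):
    """Summarize one load run; every key carries its unit."""
    latencies = sorted(s.latency_ms for s in samples)
    total_tokens = sum(s.output_tokens for s in samples)
    return {
        "requests": len(samples),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "tokens_per_s": total_tokens / wall_clock_s,
    }


if __name__ == "__main__":
    # Synthetic data for illustration only; not a measurement of any model.
    demo = [RequestSample(latency_ms=float(i), output_tokens=10) for i in range(1, 101)]
    print(summarize(demo, wall_clock_s=10.0))
```

Putting the unit in each key name (`p99_ms`, `tokens_per_s`) keeps it attached to the number when results are copied into a table or brief. A real harness would add one such summary per model, precision, and concurrency level, plus memory headroom from the serving stack's own metrics.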

  • More than 30 hrs/week
    Hourly
  • 6+ months
    Duration
  • Intermediate
    Experience Level
  • $25.00 - $40.00
    Hourly
  • Remote Job
  • Ongoing project
    Project Type

Contract-to-hire opportunity

Skills and Expertise
Mandatory skills
C++
ROS 2
Linux
Docker
Python
Activity on this job
  • Proposals: 10 to 15
  • Last viewed by client: last week
  • Hires: 1
  • Interviewing: 2
  • Invites sent: 0
  • Unanswered invites: 0
About the client
Member since Mar 24, 2026
  • ARE
    Dubai 5:34 PM
  • $7.7K total spent
    2 hires, 2 active
  • 245 hours
