Senior Voice AI & GPU Engineer
Worldwide
Job Description:

We are looking for a world-class Voice AI Engineer with deep NVIDIA GPU optimization experience to help us build and tune an ultra-low-latency conversational voice agent. This isn't a job for someone who just hooks up standard wrapper APIs. We need a specialist who understands audio streaming protocols, knows how to squeeze every millisecond out of a GPU pipeline, and can gracefully handle complex conversational problems like user interruptions (barge-ins) and awkward silences.

If you have built end-to-end voice bots combining Speech-to-Text (STT) ➔ LLM ➔ Text-to-Speech (TTS) and deployed them on bare-metal or cloud GPUs, we want to talk to you.

What You’ll Be Doing:
- End-to-End Voice Architecture: Architecting and deploying real-time voice agents using pipelines like Whisper/Deepgram (STT), open-source or API LLMs, and ElevenLabs/Cartesia/Kokoro (TTS).
- GPU Infrastructure & Performance: Setting up and optimizing models on NVIDIA hardware using frameworks like TensorRT-LLM, vLLM, or Triton to minimize Time-to-First-Token (TTFT).
- Streaming Audio Optimization: Handling WebRTC, LiveKit, or custom WebSocket pipelines to manage real-time bidirectional audio streams.
- Prompt Engineering & Conversational Tuning: Crafting bulletproof system prompts and logic engines to prevent hallucinations, reduce token waste, and keep the bot strictly on-script.

Technical Stack We're Looking For:
- Languages: Python (Advanced) and C++ (for custom audio/inference bindings).
- GPU Architecture: Deep familiarity with the NVIDIA CUDA ecosystem, quantization techniques (AWQ, GPTQ, GGUF), and vLLM orchestration.
- Audio Engineering: Real-time VAD (Voice Activity Detection), and handling latency, packet loss, and user interruptions natively.

⚠️ HOW TO APPLY (Strict Screening Requirements):
Generic, copy-pasted templates will be declined immediately. Please start your application with the phrase "Low Latency Audio" so we know you read this.
To be considered, your proposal must answer these 4 questions regarding a project you have previously built:
- Show Us Your Work: Provide a brief overview of an advanced Voice AI project you’ve completed. Please attach/link a GitHub repo, a video demo, or a detailed technical breakdown of your code.
- The Blueprint: What exact models/engines did you choose for STT, LLM, and TTS in that project, and why?
- Prompting Strategy: How did you optimize your system prompts and structural logic to force the LLM to reply concisely (saving tokens and lowering latency) while keeping a natural verbal cadence?
- The Battle Scars: How did you solve the following three challenges in your build?
  - Latency: What exact tricks did you use to drop the turnaround time to human-like levels?
  - Pausing & Interruptions: How did you handle user barge-in so the bot stops speaking the moment the human cuts in?
  - GPU Bottlenecks: What hardware/VRAM bottlenecks did you hit while streaming the audio pipeline, and how did you resolve them?
- Hourly: More than 30 hrs/week
- Duration: 6+ months
- Experience Level: Expert
- Remote Job
- Project Type: Complex project
Activity on this job
- Proposals: 50+
- Last viewed by client: 4 weeks ago
- Hires: 2
- Interviewing: 35
- Invites sent: 71
- Unanswered invites: 21
About the client
- Panama, Panama City, 4:56 AM
- $11K total spent; 71 hires, 22 active
- 39 hours
- Tech & IT; small company (2-9 people)