AI Voice Agent Engineer for Initial Setup
Worldwide
We are looking for an elite AI infrastructure engineer to build, configure, and aggressively optimize a blazing-fast, fully open-source voice agent stack. This is an initial paid skill test. We are building advanced, ultra-low-latency voice agents for high-volume outreach and dynamic sales scripts. If you can solve this hybrid streaming challenge and hit our latency targets, there is no place to go but up: you will secure a long-term role as a core engineer on the next phase of our infrastructure build-out.

The Objective
Configure a hybrid, fully open-source architecture that achieves a total end-to-end conversational latency of under 600 ms. Everything must stream natively, without awkward pauses, buffering, or overlapping speech.

Tech Stack & Architecture
- Speech-to-Text (STT): a multi-stream STT setup (e.g., Moonshine).
- LLM engine: Hermes 3 8B and Hermes 4 14B.
- Text-to-Speech (TTS): Voxtrial TTS, configured for seamless, continuous streaming.
- Orchestration & streaming: LiveKit.
- Hardware split: a true hybrid deployment. LiveKit must run lean on the CPU, while all heavy inference services (STT, LLM, TTS) run on local GPUs (expect to work with high-end hardware such as NVIDIA L40s and H100s).

Key Deliverables for the Skill Test
- Installation & configuration: deploy the full open-source stack from scratch.
- Native streaming: ensure all components communicate via native streams rather than chunked, blocking API calls.
- Latency optimization: fix inherent LiveKit latency bottlenecks and resolve any speech truncation or interruption bugs. Total round-trip time, measured from the moment the user stops speaking to the moment the agent replies, must be optimized to under 600 ms.

Why Apply?
This is not a one-off, low-impact gig. We have the enterprise hardware and the vision for a large-scale deployment.
We are simply looking for the right engineering talent who knows how to squeeze every millisecond out of these open-source models.

To Apply
Please start your proposal by briefly outlining your experience with LiveKit and local LLM/TTS streaming, and exactly how you plan to meet the sub-600 ms end-to-end latency requirement.
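The "native streaming" and "interruption" requirements above come down to pipelining: the TTS should start speaking the first sentence while the LLM is still generating the rest, and a user interruption should cancel the in-flight turn cleanly. The sketch below simulates that pattern in plain Python asyncio under stated assumptions. `fake_llm_stream`, `fake_tts_stream`, `sentence_chunks`, `respond`, and `interrupted` are hypothetical stand-ins; no LiveKit, Moonshine, Hermes, or Voxtrial API is used or implied. The sleep durations are arbitrary simulated delays, not measured figures, and sentence-level chunking is one common way to feed a streaming TTS, not necessarily what this stack requires.

```python
import asyncio
import time
from typing import AsyncIterator, Optional

REPLY = "Hi there. Thanks for calling. How can I help you today?"


async def fake_llm_stream(reply: str, per_token_s: float) -> AsyncIterator[str]:
    """Hypothetical LLM stand-in: yields the reply word by word, like streamed tokens."""
    for word in reply.split():
        await asyncio.sleep(per_token_s)  # simulated per-token generation time
        yield word + " "


async def fake_tts_stream(text: str, first_chunk_s: float) -> AsyncIterator[bytes]:
    """Hypothetical TTS stand-in: yields three placeholder audio chunks per text segment."""
    await asyncio.sleep(first_chunk_s)  # simulated time-to-first-audio
    for _ in range(3):
        yield b"\x00" * 320  # 10 ms of 16 kHz, 16-bit mono silence


async def sentence_chunks(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Group streamed tokens into sentences so TTS can start before the LLM finishes."""
    buf = ""
    async for tok in tokens:
        buf += tok
        if buf.rstrip().endswith((".", "?", "!")):
            yield buf.strip()
            buf = ""
    if buf.strip():  # flush a trailing fragment with no final punctuation
        yield buf.strip()


async def respond(reply: str, end_of_speech: float,
                  per_token_s: float = 0.02, tts_first_chunk_s: float = 0.02) -> dict:
    """Run LLM and TTS concurrently and report latencies relative to end_of_speech."""
    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    llm_done_at = 0.0

    async def producer() -> None:
        nonlocal llm_done_at
        async for sentence in sentence_chunks(fake_llm_stream(reply, per_token_s)):
            await queue.put(sentence)  # hand off each sentence as soon as it is complete
        llm_done_at = time.perf_counter()
        await queue.put(None)  # sentinel: no more text for this turn

    producer_task = asyncio.create_task(producer())
    first_audio_at: Optional[float] = None
    chunks = 0
    try:
        while (sentence := await queue.get()) is not None:
            async for _chunk in fake_tts_stream(sentence, tts_first_chunk_s):
                if first_audio_at is None:
                    first_audio_at = time.perf_counter()
                chunks += 1  # a real agent would push this chunk to the audio track
        await producer_task
    finally:
        # On barge-in (task cancellation), stop generating text for the abandoned turn.
        producer_task.cancel()
    return {
        "first_audio_ms": (first_audio_at - end_of_speech) * 1000,
        "llm_done_ms": (llm_done_at - end_of_speech) * 1000,
        "chunks": chunks,
    }


async def main() -> dict:
    # In a real agent this timestamp would come from the VAD / end-of-turn detector.
    end_of_speech = time.perf_counter()
    return await respond(REPLY, end_of_speech)


async def interrupted() -> bool:
    """Simulate the user speaking again mid-reply: cancel the turn cleanly."""
    task = asyncio.create_task(respond(REPLY, time.perf_counter()))
    await asyncio.sleep(0.05)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return True
    return False


if __name__ == "__main__":
    print(asyncio.run(main()))
    print("interrupted cleanly:", asyncio.run(interrupted()))
```

With these simulated delays the first audio chunk arrives well before the LLM has finished the whole reply, which is the property that keeps "user stops speaking" to "agent replies" latency low. A queue between the text producer and the audio consumer lets both run concurrently; consuming the sentence generator directly inside the TTS loop would instead pause generation while each sentence is being synthesized.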
$100.00
- Fixed price
- Experience level: Intermediate
- Remote job
- Project type: Ongoing project
Activity on this job
- Proposals: 20 to 50
- Last viewed by client: 4 weeks ago
- Hires: 1
- Interviewing: 0
- Invites sent: 1
- Unanswered invites: 0
About the client
- Location: Panama City, Panama
- $11K total spent; 71 hires, 22 active
- 39 hours
- Industry: Tech & IT
- Company size: Small company (2-9 people)