AI Voice Agent Engineer for Initial Setup

Posted 4 weeks ago

Worldwide

Summary

We are looking for an elite AI infrastructure engineer to build, configure, and aggressively optimize a blazing-fast, open-source voice agent stack. This is an initial paid skill test. We are building advanced, ultra-low-latency voice agents for high-volume outreach and dynamic sales scripts. If you can crack this hybrid streaming challenge and hit our latency targets, there is no place to go but up: you will secure a long-term role as a core engineer on the massive next phase of our infrastructure build-out.

The Objective

Your goal is to configure a hybrid, fully open-source architecture that achieves a total end-to-end conversational latency of under 600ms. Everything must stream natively without awkward pauses, buffering, or speech overlap issues.

Tech Stack & Architecture

Speech-to-Text (STT): Multiple-stream STT setup (e.g., Moonshine).
LLM Engine: Hermes 3 8B and Hermes 4 14B.
Text-to-Speech (TTS): Voxtrial TTS (must be configured for seamless, continuous streaming).
Orchestration & Streaming: LiveKit.
Hardware Split: True hybrid deployment. LiveKit must be configured to run lean on the CPU, while all heavy inference services (STT, LLM, TTS) run on local GPUs (expect to work with high-end hardware like NVIDIA L40s and H100s).

Key Deliverables for the Skill Test

Installation & Configuration: Deploy the full open-source stack from scratch.
Native Streaming: Ensure all components communicate via native streams rather than chunked, blocking API calls.
Latency Optimization: Fix inherent LiveKit latency bottlenecks and resolve any speech truncation or interruption bugs. Total round-trip time (from the moment the user stops speaking to the moment the agent replies) must be under 600ms.

Why Apply?

This is not a one-off, low-impact gig. We have the enterprise hardware and the vision for a massive deployment. We are simply looking for the right engineering talent who knows how to squeeze every millisecond out of these open-source models.

To Apply

Please start your proposal by briefly outlining your experience with LiveKit, local LLM/TTS streaming, and exactly how you plan to tackle the sub-600ms end-to-end latency requirement.
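One way a proposal might frame the sub-600ms requirement is as a per-stage latency budget. The sketch below is illustrative only: the stage names, the millisecond allowances, and the helper functions are all assumptions made for the example, not measurements or requirements from this posting. Real figures would have to come from profiling the deployed stack.

```python
from dataclasses import dataclass

TARGET_MS = 600  # end-to-end target stated in the posting


@dataclass(frozen=True)
class Stage:
    name: str
    budget_ms: float  # hypothetical allowance, not a measured value


# Illustrative split only; every number here is an assumption.
BUDGET = [
    Stage("end-of-turn detection", 150),
    Stage("STT final transcript", 80),
    Stage("LLM time to first token", 150),
    Stage("TTS time to first audio", 120),
    Stage("transport and playout", 60),
]


def total_ms(stages):
    """Sum the per-stage allowances."""
    return sum(s.budget_ms for s in stages)


def headroom_ms(stages, target_ms=TARGET_MS):
    """Milliseconds left under the target; negative means over budget."""
    return target_ms - total_ms(stages)


def over_budget(measured, stages):
    """Return names of stages whose measured latency exceeds its allowance."""
    allowed = {s.name: s.budget_ms for s in stages}
    return [name for name, ms in measured.items() if ms > allowed.get(name, 0)]


if __name__ == "__main__":
    print(f"budgeted: {total_ms(BUDGET)} ms, headroom: {headroom_ms(BUDGET)} ms")
```

Measured timings from a test call can be passed to `over_budget` as a dict keyed by stage name to see which stage to optimize first.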

Fixed-price: $100.00
Experience Level: Intermediate
Remote Job
Project Type: Ongoing project

Skills and Expertise

Mandatory skills

AI Development

AI Instruction

Activity on this job

Proposals: 20 to 50
Last viewed by client: 4 weeks ago
Hires: 1
Interviewing: 0
Invites sent: 1
Unanswered invites: 0

About the client

Member since Sep 19, 2020

Panama
Panama City, 8:45 AM
$11K total spent
71 hires, 22 active
39 hours
Tech & IT
Small company (2-9 people)
