AI Voice Agent Engineer for Initial Setup

Posted 4 weeks ago

Worldwide

Summary

We are looking for an elite AI infrastructure engineer to build, configure, and aggressively optimize a blazing-fast, open-source voice agent stack. This is an initial paid skill test. We are building advanced, ultra-low-latency voice agents for high-volume outreach and dynamic sales scripts. If you can crack this hybrid streaming challenge and hit our latency targets, there is no place to go but up: you will secure a long-term role as a core engineer on the massive next phase of our infrastructure build-out.

The Objective

Your goal is to configure a hybrid, fully open-source architecture that achieves a total, end-to-end conversational latency of under 600ms. Everything must stream natively without awkward pauses, buffering, or speech overlap issues.

Tech Stack & Architecture

  • Speech-to-Text (STT): Multiple-stream STT setup (e.g., Moonshine).
  • LLM Engine: Hermes 3 8B and Hermes 4 14B.
  • Text-to-Speech (TTS): Voxtrial TTS (must be configured for seamless, continuous streaming).
  • Orchestration & Streaming: LiveKit.
  • Hardware Split: True hybrid deployment. LiveKit must be configured to run lean on the CPU, while all heavy inference services (STT, LLM, TTS) are routed to run on local GPUs (expect to work with high-end hardware like NVIDIA L40s and H100s).

Key Deliverables for the Skill Test

  • Installation & Configuration: Deploy the full open-source stack from scratch.
  • Native Streaming: Ensure all components are communicating via native streams rather than chunked, blocking API calls.
  • Latency Optimization: Fix inherent LiveKit latency bottlenecks and resolve any speech truncation or interruption bugs. Total round-trip time (from the moment the user stops speaking to the moment the agent replies) must be heavily optimized to sub-600ms.

Why Apply?

This is not a one-off, low-impact gig. We have the enterprise hardware and the vision for a massive deployment. We are simply looking for the right engineering talent who knows how to squeeze every millisecond out of these open-source models.

To Apply

Please start your proposal by briefly outlining your experience with LiveKit, local LLM/TTS streaming, and exactly how you plan to tackle the sub-600ms end-to-end latency requirement.
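
As an illustration of how the sub-600ms round-trip target could be checked, the sketch below splits one conversational turn into per-stage latencies and compares the total against the budget. It is only a sketch under stated assumptions: the `TurnTimestamps` fields and helper names are hypothetical, the timestamps are assumed to come from a monotonic clock at each stage boundary, and nothing here is LiveKit's or any model server's actual API.

```python
from dataclasses import dataclass

TARGET_MS = 600.0  # end-to-end budget stated in the posting


@dataclass
class TurnTimestamps:
    """Monotonic timestamps (seconds) for one turn. Field names are hypothetical."""
    user_speech_end: float   # moment the user stops speaking
    stt_final: float         # final transcript available
    llm_first_token: float   # first LLM token streamed
    tts_first_audio: float   # first synthesized audio frame sent back


def stage_latencies_ms(t: TurnTimestamps) -> dict:
    """Per-stage latency in milliseconds, measured between stage boundaries."""
    return {
        "stt": (t.stt_final - t.user_speech_end) * 1000.0,
        "llm": (t.llm_first_token - t.stt_final) * 1000.0,
        "tts": (t.tts_first_audio - t.llm_first_token) * 1000.0,
    }


def total_latency_ms(t: TurnTimestamps) -> float:
    """Round trip: end of user speech to first audio of the agent's reply."""
    return (t.tts_first_audio - t.user_speech_end) * 1000.0


def within_budget(t: TurnTimestamps, target_ms: float = TARGET_MS) -> bool:
    """True when the turn's round trip is strictly under the target."""
    return total_latency_ms(t) < target_ms


if __name__ == "__main__":
    # Made-up timestamps purely for demonstration, not measured results.
    turn = TurnTimestamps(10.00, 10.15, 10.40, 10.55)
    for stage, ms in stage_latencies_ms(turn).items():
        print(f"{stage}: {ms:.0f} ms")
    print("within budget:", within_budget(turn))
```

Recording the per-stage split alongside the total makes it easier to see which component (STT, LLM, or TTS) consumes most of the budget before tuning anything.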

  • $100.00

    Fixed-price
  • Intermediate
    Experience Level
  • Remote Job
  • Ongoing project
    Project Type
Skills and Expertise
Mandatory skills
AI Development
AI Instruction
Activity on this job
  • Proposals: 20 to 50
  • Last viewed by client: 4 weeks ago
  • Hires: 1
  • Interviewing: 0
  • Invites sent: 1
  • Unanswered invites: 0
About the client
Member since Sep 19, 2020
  • Panama
    Panama City, 8:45 AM
  • $11K total spent
    71 hires, 22 active
  • 39 hours
  • Tech & IT
    Small company (2-9 people)
