Real-Time Voice Conversion & Low-Latency Streaming

Posted last month

Worldwide

Summary

## Overview

We are looking for a highly skilled AI Audio Engineer to take full ownership of the design, implementation, and deployment of a production-grade real-time voice conversion system.

The objective is to enable a speaker's voice to be transformed in real time into a target voice during professional online meetings, while maintaining natural conversation flow, low latency, and high reliability.

This is not a research project. The goal is to build a robust solution that can be used regularly in production environments.

The final system should:

- Convert Speaker A's voice into Voice X in real time.
- Work at the operating system level on Windows, using the computer's microphone and audio devices, so that it operates with Microsoft Teams, Google Meet, virtual machines, and any other application that uses those devices.
- Be reliable enough for repeated professional use.
- Maintain natural conversation quality.
- Keep latency below approximately 300 ms whenever possible.
- Require minimal intervention from the end user.

## Existing Assets

* Several hours of high-quality recordings of the target voice are available.
* The target language is English.
* The primary environment is Windows.
* One user will operate the system.

## Scope of Work

The selected freelancer will be expected to:

### Architecture & Technical Design

* Evaluate existing voice cloning and voice conversion technologies.
* Recommend the most suitable architecture.
* Identify technical risks, limitations, and mitigation strategies.
* Define the fastest path toward a production-ready solution.

### Prototype Development

* Build a proof of concept demonstrating real-time voice conversion.
* Measure latency, stability, and voice quality.
* Test compatibility with Microsoft Teams inside a virtual machine.

### Production Implementation

* Improve reliability and audio quality.
* Optimize latency.
* Implement all required audio routing and virtual device configurations.
* Deliver a solution suitable for regular professional usage.

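
To make the roughly 300 ms latency target and the "measure latency" task concrete, here is a minimal sketch of an end-to-end latency budget. It is an illustration only: the stage names, the 48 kHz sample rate, the block sizes, and the inference and routing figures are all assumed placeholder values, not measurements or requirements from this posting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of the audio path and its estimated latency contribution."""
    name: str
    latency_ms: float


def block_latency_ms(frames: int, sample_rate_hz: int) -> float:
    """Time needed to fill one audio block of `frames` samples."""
    return 1000.0 * frames / sample_rate_hz


def total_latency_ms(stages: list[Stage]) -> float:
    """Sum of all per-stage latencies (stages are assumed to run in series)."""
    return sum(stage.latency_ms for stage in stages)


def within_budget(stages: list[Stage], budget_ms: float = 300.0) -> bool:
    """True if the summed latency fits the budget (300 ms is the posting's target)."""
    return total_latency_ms(stages) <= budget_ms


# All numbers below are hypothetical placeholders for illustration.
SAMPLE_RATE_HZ = 48_000

stages = [
    Stage("capture block", block_latency_ms(4_800, SAMPLE_RATE_HZ)),   # 100 ms
    Stage("model inference", 80.0),                                    # assumed
    Stage("output buffer", block_latency_ms(2_400, SAMPLE_RATE_HZ)),   # 50 ms
    Stage("virtual device and meeting app", 40.0),                     # assumed
]

if __name__ == "__main__":
    for stage in stages:
        print(f"{stage.name:32s} {stage.latency_ms:6.1f} ms")
    print(f"{'total':32s} {total_latency_ms(stages):6.1f} ms")
    print("within 300 ms budget:", within_budget(stages))
```

In a real prototype the placeholder figures would be replaced by measured values, since block size and model inference time tend to dominate the total and trade off against conversion quality.
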
### Troubleshooting & Ownership

We are specifically looking for someone who enjoys solving difficult technical problems and taking ownership.

The ideal candidate should be comfortable dealing with:

* Audio routing challenges
* Windows audio stack issues
* Virtual microphone configurations
* Real-time streaming constraints
* Latency optimization
* Voice model tuning
* Unexpected production issues

The expectation is not simply to write code or find the right technology, but to make the project successful regardless of obstacles encountered along the way.

## Required Experience

Strong experience with some of the following:

* Real-time voice conversion
* Voice cloning
* Speech-to-speech AI systems
* Audio DSP
* WebRTC
* Low-latency streaming systems
* Windows audio systems
* Python (if required)
* Machine Learning (if required)
* LLM and speech technologies

Experience with production-grade audio applications is highly preferred.

## Deliverables

* Working prototype
* Production-ready implementation
* Installation and deployment instructions
* Technical documentation
* Risk assessment and recommendations
* Ongoing support during stabilization phase

## Ideal Profile

We are looking for a builder and troubleshooter rather than a researcher. Someone who:

* Takes full ownership.
* Is highly pragmatic.
* Finds solutions instead of blockers.
* Can challenge assumptions and provide honest recommendations.
* Is comfortable making difficult projects work in real-world conditions.

  • Hourly: Less than 30 hrs/week
  • Duration: 1-3 months
  • Experience Level: Expert
  • Remote Job
  • Project Type: Ongoing project

Contract-to-hire opportunity

This lets talent know that this job could become full time.
Skills and Expertise
Mandatory skills
AI Model Integration
Audio Engineering
Activity on this job
  • Proposals: 20 to 50
  • Last viewed by client: 4 weeks ago
  • Interviewing: 33
  • Invites sent: 37
  • Unanswered invites: 7
About the client
Member since Feb 11, 2023
  • Morocco
    Rabat, 5:40 PM
  • $20K total spent
    77 hires, 16 active
  • 150 hours
  • Tech & IT
    Small company (2-9 people)
