AI Voice Model Consultant with Real-World Experience in Voice AI & Audio Synthesis

Posted 2 weeks ago

Worldwide

Summary

We are a forward-looking team building the next generation of creative AI audio tools. Our mission is to create personalized, high-fidelity AI voice experiences that empower artistic expression. We are seeking a seasoned AI Voice Model Expert to architect and build a groundbreaking system for AI-generated singing and voice interaction.

The Core Project:
Your primary mission will be to design and develop a system that allows a user to sing any song in their own voice. This involves capturing a user's unique vocal tone, timbre, and singing style from recordings and applying it to generate high-quality vocal covers for any chosen song.

Key Responsibilities:
You will be responsible for the end-to-end development lifecycle, from research and prototyping to deployment and MLOps.

1. Voice Cloning & Singing Voice Conversion (SVC/RVC):
-   Research, adapt, and implement state-of-the-art models for voice cloning and singing voice conversion (e.g., RVC, So-VITS-SVC, DiffSinger, VoiceLab).
-   Build robust pipelines for dataset preprocessing, feature extraction (e.g., F0, HuBERT features), and model training on user-provided voice data.
-   Focus on achieving exceptional quality, capturing nuances like vocal inflection, vibrato, and emotional delivery.
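
To make the F0 feature-extraction step concrete, here is a minimal sketch of pitch estimation by plain autocorrelation on a single audio frame. It is an illustration only, not the method this project prescribes: the function names `synth_sine` and `estimate_f0` and all parameter values are assumptions, and a production SVC pipeline would more likely rely on a dedicated pitch tracker.

```python
import math


def synth_sine(freq_hz, sample_rate, num_samples):
    """Generate a pure sine tone to stand in for a recorded vocal frame."""
    step = 2 * math.pi * freq_hz / sample_rate
    return [math.sin(step * i) for i in range(num_samples)]


def estimate_f0(frame, sample_rate, fmin=120.0, fmax=1000.0):
    """Estimate F0 in Hz from one frame, or return None if no estimate is possible.

    Picks the lag with the highest mean autocorrelation inside the lag range
    implied by [fmin, fmax]. Plain autocorrelation cannot tell a period from
    its multiples, so fmin should be chosen so that fewer than two periods of
    the expected pitch fit inside the search range.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    energy = sum(x * x for x in frame)
    if energy == 0.0 or max_lag <= min_lag:
        return None  # silent frame, or frame too short for the lag range

    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        overlap = len(frame) - lag
        # Mean product of the frame with a copy of itself shifted by `lag`.
        score = sum(frame[i] * frame[i + lag] for i in range(overlap)) / overlap
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Illustrative usage: a 200 Hz tone sampled at 16 kHz (period of 80 samples).
frame = synth_sine(200.0, 16000, 800)
print(estimate_f0(frame, 16000))  # prints 200.0
```

Real singing has harmonics, noise, and unvoiced segments, so a sketch like this would need voicing detection and octave-error handling before its output could feed a conversion model.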

2. Text-to-Speech & Speech-to-Text (TTS/STT):
-   Integrate or develop TTS components for voice agent functionalities, ensuring natural prosody and alignment with the user's cloned voice characteristics.
-   Utilize STT systems for potential lyric alignment, transcription, or interactive voice commands.

3. Voice Agent & Interactive Systems:
-   Design the architecture for a voice agent that can handle user requests (e.g., "generate a cover of Song X," "make the voice sound more powerful").
-   Create a seamless workflow for song input, lyric synchronization, and vocal generation.

4. Music & Lyrics AI:
-   Explore and integrate models for music source separation (e.g., Demucs) to isolate instrumental backing tracks and original vocals.
-   Investigate AI for lyrical analysis, alignment, and potentially even lyric generation or style transfer to match a user's style.

5. MLOps & Engineering Excellence:
-   Architect, build, and maintain scalable ML pipelines for training, fine-tuning, and inference.
-   Implement model versioning, monitoring, and automated retraining pipelines.
-   Containerize models (Docker) and deploy them on scalable cloud infrastructure (e.g., AWS, GCP, Azure).
-   Ensure the entire system is reliable, efficient, and maintainable.
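
As a rough illustration of the model-versioning item above, the sketch below keeps a per-user version history keyed by a checksum of the serialized weights. Everything here is hypothetical: `ModelRegistry`, `ModelVersion`, and the in-memory storage are assumptions made for the example, and a real deployment would more likely use an established tracking tool such as those named under Must-Have.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ModelVersion:
    user_id: str
    version: int
    checksum: str  # SHA-256 hex digest of the serialized weights


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry of voice-model versions per user."""

    _versions: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, user_id: str, weights: bytes) -> ModelVersion:
        checksum = hashlib.sha256(weights).hexdigest()
        history = self._versions.setdefault(user_id, [])
        # Re-registering identical weights returns the existing entry
        # instead of creating a duplicate version.
        for existing in history:
            if existing.checksum == checksum:
                return existing
        entry = ModelVersion(user_id, len(history) + 1, checksum)
        history.append(entry)
        return entry

    def latest(self, user_id: str) -> Optional[ModelVersion]:
        history = self._versions.get(user_id, [])
        return history[-1] if history else None


# Illustrative usage with placeholder byte strings instead of real weights.
registry = ModelRegistry()
first = registry.register("founder", b"weights-after-initial-training")
second = registry.register("founder", b"weights-after-fine-tuning")
print(first.version, second.version)  # prints 1 2
```

Keying versions on a content hash is one possible design choice; it makes retraining runs that produce identical weights idempotent, at the cost of hashing each artifact on upload.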

Deliverables & Outcomes:
-   A scalable, cloud-based platform for user voice model training and inference.
-   A well-documented API and/or a simple, user-friendly interface for end-users to generate new vocals.
-   A library of trained, high-quality voice models, starting with the founder's voice.
-   A robust MLOps foundation for continuous improvement and scaling of the AI capabilities.

Must-Have:
-   5+ years of experience in ML engineering with a strong focus on generative AI and deep learning.
-   Proven expertise in digital signal processing (DSP) and audio/music processing.
-   Hands-on experience with SVC/RVC frameworks and a deep understanding of the underlying architectures (e.g., GANs, VAEs, Diffusion Models).
-   Strong proficiency in Python and core ML libraries (PyTorch, TensorFlow).
-   Solid experience with MLOps tools (e.g., MLflow, Kubeflow, Weights & Biases) and cloud deployment.
-   A strong portfolio or examples of past projects in AI voice synthesis, music generation, or a closely related field.

Highly Desirable:
-   Experience with TTS/STT systems (e.g., Tacotron, WaveNet, Whisper, VALL-E).
-   Knowledge of music information retrieval (MIR) and lyrics processing.
-   Experience building interactive voice/AI agents.
-   Understanding of ethical AI principles, especially concerning voice cloning and deepfakes.

How to Apply:
Please submit the following:
1.  Your resume/CV.
2.  A link to your portfolio, GitHub, or demos of relevant work (e.g., AI-generated vocals, SVC models, deployed ML systems).
3.  A brief cover letter describing your specific approach to building a custom singing voice conversion system, including the tools and models you would prioritize and why.
4.  [Optional] Any open-source contributions or research papers in related fields.

$5.00, Fixed-price
Experience Level: Intermediate
Remote Job
Project Type: One-time project

Skills and Expertise

Mandatory skills: Voice Acting, Voice-Over

Nice-to-have skills: Deep Learning, Model Deployment

Tools: Eleven Labs, Google Cloud AI

Activity on this job

Proposals: Less than 5
Last viewed by client: 2 weeks ago
Hires: 1
Interviewing: 0
Invites sent: 0
Unanswered invites: 0

About the client

Member since Aug 31, 2025

United States
Bluefield, 9:50 AM
$1.5K total spent
132 hires, 13 active
