AI Voice Model Consultant with Real-World Experience in Voice AI & Audio Synthesis
Worldwide
We are a forward-looking team building the next generation of creative AI audio tools. Our mission is to create personalized, high-fidelity AI voice experiences that empower artistic expression. We are seeking a seasoned AI Voice Model Expert to architect and build a groundbreaking system for AI-generated singing and voice interaction.
The Core Project:
Your primary mission will be to design and develop a system that allows a user to sing any song in their own voice. This involves capturing a user's unique vocal tone, timbre, and singing style from recordings and applying it to generate high-quality vocal covers for any chosen song.
Key Responsibilities:
You will be responsible for the end-to-end development lifecycle, from research and prototyping to deployment and MLOps.
1. Voice Cloning & Singing Voice Conversion (SVC/RVC):
- Research, adapt, and implement state-of-the-art models for voice cloning and singing voice conversion (e.g., RVC, So-VITS-SVC, DiffSinger, DDiFFSinger, VoiceLab).
- Build robust pipelines for dataset preprocessing, feature extraction (e.g., F0 contours, HuBERT features), and model training on user-provided voice data.
- Focus on achieving exceptional quality, capturing nuances like vocal inflection, vibrato, and emotional delivery.
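As a rough illustration of the feature-extraction step, the sketch below estimates the F0 of a single audio frame by autocorrelation. It assumes mono float samples in a NumPy array. The function name `estimate_f0` and the 60 to 1000 Hz search range are illustrative assumptions. Production SVC pipelines generally rely on more robust pitch trackers, and HuBERT content features come from a pretrained model that is not shown here.

```python
import numpy as np


def estimate_f0(frame: np.ndarray, sr: int,
                fmin: float = 60.0, fmax: float = 1000.0) -> float:
    """Estimate the fundamental frequency (Hz) of one frame; 0.0 means unvoiced/silent."""
    frame = np.asarray(frame, dtype=np.float64)
    frame = frame - np.mean(frame)  # remove DC offset
    if not np.any(frame):
        return 0.0
    # Autocorrelation for non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Restrict the search to lags that correspond to plausible singing pitches.
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(ac) - 1)
    if lag_max <= lag_min:
        return 0.0
    # The strongest peak in range is taken as one pitch period.
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag
```

Running this over successive overlapping frames yields the F0 contour that conversion models condition on; the integer-lag resolution here is coarse, which is one reason dedicated pitch trackers are preferred in practice.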
2. Text-to-Speech & Speech-to-Text (TTS/STT):
- Integrate or develop TTS components for voice agent functionalities, ensuring natural prosody and alignment with the user's cloned voice characteristics.
- Utilize STT systems for potential lyric alignment, transcription, or interactive voice commands.
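For lyric alignment, one simple approach is to match the known lyrics against word-level timestamps produced by an STT model. The sketch below assumes the transcript is already available as a list of hypothetical `Word` records (text, start, end in seconds) and uses Python's `difflib` to tolerate misheard or missing words. The names `Word` and `align_lyrics` are illustrative, not part of any STT library.

```python
import difflib
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    """One transcribed word with times in seconds (hypothetical STT output format)."""
    text: str
    start: float
    end: float


def _norm(token: str) -> str:
    # Lowercase and drop punctuation so "Moon," matches "moon".
    return re.sub(r"[^\w']", "", token.lower())


def align_lyrics(lines: List[str], words: List[Word]
                 ) -> List[Tuple[str, Optional[float], Optional[float]]]:
    """Return (line, start, end) per lyric line; times are None if nothing matched."""
    lyric_tokens: List[str] = []
    line_of: List[int] = []  # which lyric line each token came from
    for i, line in enumerate(lines):
        for tok in line.split():
            t = _norm(tok)
            if t:
                lyric_tokens.append(t)
                line_of.append(i)

    heard = [_norm(w.text) for w in words]
    starts: List[Optional[float]] = [None] * len(lines)
    ends: List[Optional[float]] = [None] * len(lines)

    # difflib finds blocks of identical tokens in order; misheard words just leave gaps.
    matcher = difflib.SequenceMatcher(a=lyric_tokens, b=heard, autojunk=False)
    for a0, b0, size in matcher.get_matching_blocks():
        for k in range(size):
            li, w = line_of[a0 + k], words[b0 + k]
            starts[li] = w.start if starts[li] is None else min(starts[li], w.start)
            ends[li] = w.end if ends[li] is None else max(ends[li], w.end)

    return list(zip(lines, starts, ends))
```

A line whose words were all misrecognized comes back with `None` times, which a real pipeline would need to interpolate from its neighbours.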
3. Voice Agent & Interactive Systems:
- Design the architecture for a voice agent that can handle user requests (e.g., "generate a cover of Song X," "make the voice sound more powerful").
- Create a seamless workflow for song input, lyric synchronization, and vocal generation.
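To make the agent requests above concrete, here is a deliberately simple rule-based intent parser that maps the two example phrasings to structured intents. It is a stand-in assumption: a production agent would more likely use an NLU model or an LLM, and the intent names (`generate_cover`, `adjust_voice`) and slot names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Intent:
    name: str
    slots: Dict[str, str] = field(default_factory=dict)


# Ordered (intent name, pattern) rules; the first matching rule wins.
_RULES = [
    ("generate_cover",
     re.compile(r"\b(?:generate|make|create)\s+(?:a\s+)?cover\s+of\s+(?P<song>.+?)[.!?]*$", re.I)),
    ("adjust_voice",
     re.compile(r"\bmake\s+(?:the\s+)?voice\s+(?:sound\s+)?(?:more\s+)?(?P<quality>\w+)", re.I)),
]


def parse_request(text: str) -> Intent:
    """Map a free-text user request to an Intent, or 'unknown' if no rule matches."""
    text = text.strip()
    for name, pattern in _RULES:
        m = pattern.search(text)
        if m:
            return Intent(name, {k: v.strip() for k, v in m.groupdict().items()})
    return Intent("unknown")
```

The resulting intent would then be dispatched to the relevant pipeline stage (separation, conversion, or a style adjustment on an existing render).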
4. Music & Lyrics AI:
- Explore and integrate models for music source separation (e.g., Demucs) to isolate instrumental backing tracks and original vocals.
- Investigate AI for lyrical analysis, alignment, and potentially even lyric generation or style transfer to match a user's style.
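Once source separation has produced an instrumental stem and the original vocals have been converted to the user's voice, the two must be recombined. The sketch below shows that final mixing step, assuming both stems are mono float arrays at the same sample rate; the `remix` name, the dB gain parameter, and the 0.99 peak ceiling are illustrative choices, not part of Demucs or any SVC framework.

```python
import numpy as np


def remix(instrumental: np.ndarray, converted_vocals: np.ndarray,
          vocal_gain_db: float = 0.0, peak: float = 0.99) -> np.ndarray:
    """Mix a separated instrumental with converted vocals (mono, same sample rate)."""
    inst = np.asarray(instrumental, dtype=np.float64)
    voc = np.asarray(converted_vocals, dtype=np.float64)
    # Zero-pad the shorter stem so both have the same length.
    n = max(len(inst), len(voc))
    inst = np.pad(inst, (0, n - len(inst)))
    voc = np.pad(voc, (0, n - len(voc)))
    # Convert the dB gain to a linear factor and sum the stems.
    mix = inst + voc * 10.0 ** (vocal_gain_db / 20.0)
    # Scale the whole mix down only if it would exceed the peak level (avoids clipping).
    max_abs = float(np.max(np.abs(mix))) if n else 0.0
    if max_abs > peak:
        mix *= peak / max_abs
    return mix
```

Simple peak scaling keeps the example short; a real mastering step would more likely use loudness normalization and a limiter.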
5. MLOps & Engineering Excellence:
- Architect, build, and maintain scalable ML pipelines for training, fine-tuning, and inference.
- Implement model versioning, monitoring, and automated retraining pipelines.
- Containerize models (Docker) and deploy them on scalable cloud infrastructure (e.g., AWS, GCP, Azure).
- Ensure the entire system is reliable, efficient, and maintainable.
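Model versioning for per-user voice models can be illustrated with a small content-addressed registry: each trained model is identified by a hash of its weights, and re-registering identical weights and config returns the existing version instead of creating a duplicate. This is an in-memory sketch with hypothetical names (`ModelRegistry`, `ModelVersion`); tools such as MLflow or Weights & Biases provide real registries with persistence and their own APIs.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelVersion:
    user_id: str
    version: int
    weights_sha256: str
    config: Dict[str, object]


class ModelRegistry:
    """In-memory stand-in for a model registry (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[ModelVersion]] = {}

    def register(self, user_id: str, weights: bytes,
                 config: Dict[str, object]) -> ModelVersion:
        digest = hashlib.sha256(weights).hexdigest()
        history = self._versions.setdefault(user_id, [])
        # Identical weights and config: reuse the existing version.
        for v in history:
            if v.weights_sha256 == digest and v.config == config:
                return v
        mv = ModelVersion(user_id, len(history) + 1, digest, dict(config))
        history.append(mv)
        return mv

    def latest(self, user_id: str) -> ModelVersion:
        """Return the most recently registered version for a user."""
        return self._versions[user_id][-1]
```

Hashing the weights makes versions reproducible to identify, so an automated retraining job can skip publishing a model that did not actually change.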
Deliverables & Outcomes:
- A scalable, cloud-based platform for user voice model training and inference.
- A well-documented API and/or a simple, user-friendly interface for end-users to generate new vocals.
- A library of trained, high-quality voice models, starting with the founder's voice.
- A robust MLOps foundation for continuous improvement and scaling of the AI capabilities.
Must-Have:
- 5+ years of experience in ML engineering with a strong focus on generative AI and deep learning.
- Proven expertise in digital signal processing (DSP) and audio/music processing.
- Hands-on experience with SVC/RVC frameworks and a deep understanding of the underlying architectures (e.g., GANs, VAEs, Diffusion Models).
- Strong proficiency in Python and core ML libraries (PyTorch, TensorFlow).
- Solid experience with MLOps tools (e.g., MLflow, Kubeflow, Weights & Biases) and cloud deployment.
- A strong portfolio or examples of past projects in AI voice synthesis, music generation, or a closely related field.
Highly Desirable:
- Experience with TTS/STT systems (e.g., Tacotron, WaveNet, Whisper, VALL-E).
- Knowledge of music information retrieval (MIR) and lyrics processing.
- Experience building interactive voice/AI agents.
- Understanding of ethical AI principles, especially concerning voice cloning and deepfakes.
How to Apply:
Please submit the following:
1. Your resume/CV.
2. A link to your portfolio, GitHub, or demos of relevant work (e.g., AI-generated vocals, SVC models, deployed ML systems).
3. A brief cover letter describing your specific approach to building a custom singing voice conversion system, including the tools and models you would prioritize and why.
4. [Optional] Any open-source contributions or research papers in related fields.
Budget: $5.00 (fixed-price)
- Experience level: Intermediate
- Remote job
- Project type: One-time project
Activity on this job
- Proposals: Less than 5
- Last viewed by client: 2 weeks ago
- Hires: 1
- Interviewing: 0
- Invites sent: 0
- Unanswered invites: 0
About the client
- Location: Bluefield, United States (local time 9:50 AM)
- $1.5K total spent; 132 hires, 13 active