Senior Audio ML Engineer (Long-Term Contract)

Posted 4 weeks ago

Worldwide

Summary

We're a software consulting firm staffing a specialized audio ML role for one of our clients: a profitable consumer audio product offering creative tools for musicians and content creators (voice conversion, AI music generation, and stem/vocal separation). The product is built entirely on widely used open-source models the client deploys and operates themselves, and serves tens of thousands of paying users with seven-figure ARR. All voice features are consent-based and user-driven: creators build and use their own voice models for music production.

What's missing is engineering maturity in the AI layer: no test suite, no evaluation harness, no objective quality measurement. That's the gap this role closes. You'd be embedded as the senior AI authority on the client's audio stack. The work ships to live customers weekly; this is not research-only, and not a junior role.

Scope of the engagement

The roadmap is already concrete: resolve known audio-quality issues, stand up evaluation infrastructure and a first test suite, run an open-source stack benchmark (voice and music-generation alternatives), and drive a build-vs-buy decision on training a proprietary voice model from the client's existing consented user corpus.

You'll be a strong fit if you can say yes to most of these

- 5+ years shipping production ML, with 2+ years in audio/speech specifically
- Deep familiarity with the voice-conversion model family: F0 extractors (RMVPE / PM / Crepe), the index_rate and protect tradeoffs, what a clean 30-minute training run looks like
- Hands-on with Meta's AudioCraft / MusicGen, including the continuation API and melody variant
- Source separation at a working level (Demucs / UVR ecosystem)
- Serverless GPU inference in production: Modal ideally, or RunPod / Replicate / Beam / Banana and ready to pick up Modal fast
- Python is a genuine strength: FastAPI, PyTorch, asyncio, and the audio stack (librosa, torchaudio, soundfile)
- You've built evaluation rigs before: FAD, SI-SDR, speaker similarity, MOS panels
- You've cut GPU costs through batching, keep-warm tuning, and cold-start mitigation
- Give you a paper and a repo and you'll have a working benchmark inside a week

Bonus points

- You've trained a foundation audio model, not just fine-tuned one (VITS family, HuBERT, EnCodec)
- LoRA / PEFT for audio domain adaptation
- Enough React / Next.js to trace a bug through the whole customer flow
- Comfort around Cloudflare R2, Supabase, and Stripe billing
- A real audio-engineering ear: actual DAW work, not just ML
- Open-source contributions to any of the projects above

Hours: More than 30 hrs/week (hourly)
Duration: 3-6 months
Experience level: Expert
Rate: $20.00 - $30.00 hourly
Location: Remote job
Project type: Ongoing project

Contract-to-hire opportunity


Skills and Expertise

Mandatory skills

Machine Learning

Deep Learning

Activity on this job

Proposals: 20 to 50
Last viewed by client: 4 weeks ago
Hires: 1
Interviewing: 1
Invites sent: 0
Unanswered invites: 0

About the client

Member since Mar 18, 2026

USA
Valrico, 7:19 AM
$472 total spent
5 hires, 0 active
2 hours
