Senior Audio ML Engineer (Long Term Contract)
Worldwide
We're a software consulting firm staffing a specialized audio ML role for one of our clients: a profitable consumer audio product offering creative tools for musicians and content creators, including voice conversion, AI music generation, and stem/vocal separation. The product is built entirely on widely used open-source models the client deploys and operates themselves, and serves tens of thousands of paying users with seven-figure ARR. All voice features are consent-based and user-driven: creators build and use their own voice models for music production.

What's missing is engineering maturity in the AI layer: no test suite, no evaluation harness, no objective quality measurement. That's the gap this role closes. You'd be embedded as the senior AI authority on the client's audio stack. The work ships to live customers weekly; this is not research-only, and not a junior role.

Scope of the engagement
The roadmap is already concrete: resolve known audio-quality issues, stand up evaluation infrastructure and a first test suite, run an open-source stack benchmark (voice and music-generation alternatives), and drive a build-vs-buy decision on training a proprietary voice model from the client's existing consented user corpus.

You'll be a strong fit if you can say yes to most of these
- 5+ years shipping production ML, with 2+ years in audio/speech specifically
- Deep familiarity with the voice-conversion model family: F0 extractors (RMVPE / PM / Crepe), the index_rate and protect tradeoffs, what a clean 30-minute training run looks like
- Hands-on with Meta's AudioCraft / MusicGen, including the continuation API and melody variant
- Source separation at a working level (Demucs / UVR ecosystem)
- Serverless GPU inference in production: Modal ideally, or RunPod / Replicate / Beam / Banana and ready to pick up Modal fast
- Python is a genuine strength: FastAPI, PyTorch, asyncio, and the audio stack (librosa, torchaudio, soundfile)
- You've built evaluation rigs before: FAD, SI-SDR, speaker similarity, MOS panels
- You've cut GPU costs through batching, keep-warm tuning, and cold-start mitigation
- Give you a paper and a repo and you'll have a working benchmark inside a week

Bonus points
- You've trained a foundation audio model, not just fine-tuned one (VITS family, HuBERT, EnCodec)
- LoRA / PEFT for audio domain adaptation
- Enough React / Next.js to trace a bug through the whole customer flow
- Comfort around Cloudflare R2, Supabase, and Stripe billing
- A real audio-engineering ear: actual DAW work, not just ML
- Open-source contributions to any of the projects above

- Hourly: More than 30 hrs/week
- Duration: 3-6 months
- Experience Level: Expert
- Rate: $20.00 - $30.00 hourly
- Remote Job
- Project Type: Ongoing project
Activity on this job
- Proposals: 20 to 50
- Last viewed by client: 4 weeks ago
- Hires:1
- Interviewing:1
- Invites sent:0
- Unanswered invites:0
About the client
- USA, Valrico
- $472 total spent
- 5 hires, 0 active
- 2 hours