Cinematic Audio Producer (AI Voices + RVC + Emotional FX + Sound Design) Long-Term
Worldwide
We are producing cinematic, multi-character dark-romance audiobooks for a premium entertainment platform.
We are hiring one specialist OR a small team who can create high-quality, emotionally immersive audio using a combination of:
AI TTS
RVC (Retrieval-based Voice Conversion)
Emotional reaction FX
Cinematic ambience & SFX
Professional mixing
This role requires no human recording.
All voices must be created using AI + RVC.
This job is long-term, with guaranteed monthly work.
MANDATORY TOOLSET
1. One Clean TTS Tool (for generating all base dialogue)
You must use ONE of the following:
Fish Audio
Revoicer
ElevenLabs (allowed, but be aware of line restrictions)
or any similar clean TTS tool that outputs non-distorted speech
2. RVC (Retrieval-based Voice Conversion, v2)
Used to convert all base TTS lines into:
the correct character voice
the correct tone
the correct emotional delivery
consistent voice identity across all chapters
3. Audio Editing Software
Use one of:
Adobe Audition
Audacity
For trimming, noise removal, and preparing dialogue.
4. Mixing Software
Use one of:
DaVinci Resolve
Adobe Audition
Adobe Premiere Pro
For full cinematic mixing.
5. SFX Library
Use Soundly or your own high-quality SFX library for:
ambience (subtle)
environment
footsteps
fabric/bed movement
keys, doors
atmosphere layers
VOICE MODELS REQUIREMENT
Character Voice Models (RVC)
We provide:
character descriptions
tone
emotion style
You must create or select RVC models that match each character.
Once created, these voices are reused across many books. After the first few audiobooks, once most of the cast exists (several male and female voices), new models will rarely be needed. We have 48+ books to turn into audiobooks in the coming year.
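Because the same voices are reused across books, it may help to keep a small cast registry and check each new script against it. The following is a minimal sketch, not our required tooling: the `VoiceModel` fields, the character names, the model paths, and the pitch values are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceModel:
    character: str    # character name as it appears in the script
    model_file: str   # hypothetical path to the trained RVC model
    pitch_shift: int  # semitone shift used at conversion time (assumed setting)


# Hypothetical cast: names, paths, and pitch values are placeholders.
CAST = {
    "Narrator": VoiceModel("Narrator", "models/narrator.pth", 0),
    "Male Lead": VoiceModel("Male Lead", "models/male_lead.pth", -2),
    "Female Lead": VoiceModel("Female Lead", "models/female_lead.pth", 3),
}


def missing_voices(script_characters, cast=CAST):
    """Return the characters in a script that have no registered voice model yet."""
    return sorted(set(script_characters) - set(cast))
```

Calling `missing_voices` on the character list of a new book shows which voices still need to be created; for later books the result should usually be empty.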
Reaction Voice Models (RVC)
For emotional realism, you must use reaction-focused RVC models to convert safe reaction FX such as:
breaths
gasps
sighs
soft vocal reactions
strained / tension-based vocalizations
emotional vocal textures
These must be layered under dialogue for cinematic performance.
BEST WORKFLOW FOR REACTIONS (MANDATORY)
To keep production fast and realistic, you must:
1. Take emotional FX (breaths, gasps, sighs, tension sounds, emotional vocalizations) from a high-quality FX library
2. Run them through the character’s RVC model
3. Layer them under the dialogue during mixing
This produces:
Character-style breaths, emotional gasps, tension exhales, etc.,
all matching the character’s voice identity.
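The layering step (3) can be illustrated in a few lines. This is only a sketch of the idea, assuming both clips are already decoded to lists of float samples in [-1.0, 1.0] at the same sample rate; the function names and the -12 dB default are assumptions, and in practice this is done in the mixing software.

```python
def db_to_gain(db):
    """Convert a level in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def layer_under(dialogue, reaction, reaction_db=-12.0, offset=0):
    """Mix `reaction` under `dialogue`, starting `offset` samples in.

    `offset` is assumed to be non-negative. The -12 dB default is an
    illustrative value, not a required level.
    """
    gain = db_to_gain(reaction_db)
    length = max(len(dialogue), offset + len(reaction))
    mixed = [0.0] * length
    for i, sample in enumerate(dialogue):
        mixed[i] += sample
    for i, sample in enumerate(reaction):
        mixed[offset + i] += sample * gain
    # Hard-limit to the valid range so the summed signal cannot exceed full scale.
    return [max(-1.0, min(1.0, s)) for s in mixed]
```

The reaction is attenuated before summing so that it sits beneath the line instead of competing with it; the right level depends on the scene.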
MANDATORY HARDWARE
Windows PC
NVIDIA GPU (RTX 2060 minimum)
Good monitoring headphones
Stable internet
WORKFLOW (Step-by-Step)
1. Base Line Generation (TTS)
Generate each line of dialogue from the script.
2. Convert to Character Voice (RVC)
Every line must be processed through the correct character model.
3. Add Emotional FX (RVC Reaction Layer)
Use safe emotional FX → convert in RVC → layer under main dialogue.
4. Cinematic SFX + Ambience
Add room tone, subtle ambience, movement, object interactions.
5. Mix & Master
Blend all layers
Add EQ, reverb, compression
Export polished chapter audio.
6. Deliver Chapter-by-Chapter
Not all at once.
Ensures perfect quality control.
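Since delivery is chapter-by-chapter, tracking which of the six steps each chapter has completed can make quality control easier. Below is a minimal sketch of such a tracker; the stage identifiers and the `progress` layout are hypothetical names chosen for illustration, not part of any tool listed above.

```python
# Stage identifiers mirror steps 1-6 above; the names themselves are assumptions.
STAGES = ["tts", "rvc", "reaction_fx", "sfx_ambience", "mix_master", "deliver"]


def next_stage(completed):
    """Return the first stage not yet completed, enforcing the order above."""
    for i, stage in enumerate(STAGES):
        if stage not in completed:
            # A later stage marked done means this one was skipped.
            skipped = [s for s in STAGES[i + 1:] if s in completed]
            if skipped:
                raise ValueError(f"{stage} skipped before {skipped[0]}")
            return stage
    return None  # every stage is done, the chapter has been delivered


def chapters_ready_for(stage, progress):
    """List chapter ids whose next pending stage is `stage`."""
    return [ch for ch, done in sorted(progress.items()) if next_stage(done) == stage]
```

For example, `chapters_ready_for("rvc", progress)` lists the chapters whose base TTS lines exist but have not yet been converted.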
TIMELINE REQUIREMENT
Our production pace requires a team or a solo freelancer with colleagues.
You must deliver:
2 complete audiobooks every 10 days
Each audiobook:
6–7 hours
40–90 chapters
40k–70k words
fully parsed and ready for production
Monthly pace = 4 audiobooks (minimum 4 guaranteed).
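To gauge whether this pace is realistic for your setup, the figures above can be converted into a daily workload. The sketch below only restates the numbers in this section; the helper name `per_day` is an assumption.

```python
BOOKS_PER_CYCLE = 2  # audiobooks per delivery cycle, from this section
CYCLE_DAYS = 10      # length of one delivery cycle in days


def per_day(low, high, books=BOOKS_PER_CYCLE, days=CYCLE_DAYS):
    """Scale a per-book (low, high) range to a per-day range."""
    return (low * books / days, high * books / days)


hours = per_day(6, 7)            # finished audio hours per day
chapters = per_day(40, 90)       # chapters per day
words = per_day(40_000, 70_000)  # script words per day
books_in_30_days = BOOKS_PER_CYCLE * 30 // CYCLE_DAYS

print(hours, chapters, words, books_in_30_days)
```

This works out to 1.2 to 1.4 finished hours, 8 to 18 chapters, and 8,000 to 14,000 words per day. Sustained for 30 days, the 10-day cycle yields 6 audiobooks, which is above the guaranteed minimum of 4.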
BUDGET
$400 per completed audiobook
Paid per approved audiobook or per chapter-batch milestone, whichever you prefer.
Guaranteed Minimum: 4 audiobooks/month = $1,600
(Teams who deliver faster can earn more.)
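For planning, the rate can be expressed per month and per finished hour of audio. This is a small sketch using only the figures in this posting (the $400 rate, the 4-book minimum, and the 6 to 7 hour length); the function names are assumptions.

```python
RATE_PER_BOOK = 400   # USD per completed audiobook, from this posting
GUARANTEED_BOOKS = 4  # minimum audiobooks per month, from this posting


def monthly_earnings(books_delivered):
    """Total monthly pay in USD at the starting rate."""
    return RATE_PER_BOOK * books_delivered


def rate_per_finished_hour(book_hours):
    """Effective pay in USD per finished hour of audio for one book."""
    return RATE_PER_BOOK / book_hours


print(monthly_earnings(GUARANTEED_BOOKS))
print(round(rate_per_finished_hour(7), 2), round(rate_per_finished_hour(6), 2))
```

At the guaranteed minimum this is $1,600 per month, or roughly $57 to $67 per finished hour depending on book length.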
Budget Increase After 3 Months
After three months of consistent, high-quality delivery,
the rate per audiobook will be increased significantly as part of our long-term collaboration.
NDA REQUIREMENT
Before receiving any:
scripts
character references
voice models
production assets
ALL members of your team must sign an NDA.
All materials are confidential and are the intellectual property of our company.
TO APPLY, PLEASE SEND:
1. Audio samples showing AI TTS → RVC conversion
2. Proof of your mixing ability (cinematic style)
3. Description of your workflow
4. Confirmation you use ALL mandatory tools
5. Hardware specs (GPU required)
6. Your availability per month
7. Whether you are a team or solo with colleagues
8. Confirmation you accept $400/book for the first 4 audiobooks + guaranteed monthly work
Budget: $1,600.00 (fixed-price)
Experience level: Expert
Remote job
Project type: Complex project
About the client
- Location: Bergen op Zoom, Netherlands
- $3.8K total spent; 21 hires, 2 active
- Industry: Media & Entertainment; small company (2-9 people)