Cinematic Audio Producer (AI Voices + RVC + Emotional FX + Sound Design)

Posted 1 hour ago

Worldwide

Summary

Long-Term — Premium Dark-Romance Audiobooks

We are producing cinematic, multi-character dark-romance audiobooks for a premium entertainment platform.
We are hiring one specialist OR a small team who can create high-quality, emotionally immersive audio using a combination of:

AI TTS

RVC (Retrieval-based Voice Conversion)

Emotional reaction FX

Cinematic ambience & SFX

Professional mixing


This role requires no human recording.
All voices must be created using AI + RVC.

This job is long-term, with guaranteed monthly work.


---

MANDATORY TOOLSET

1. Clean TTS Tool (for generating base dialogue)

You must use ONE of the following:

Fish Audio

Revoicer

ElevenLabs (allowed, but be aware of its line restrictions)

Or any similar clean TTS tool with non-distorted speech


2. RVC (Retrieval-based Voice Conversion, v2)

Used to convert all base TTS lines into:

the correct character voice

the correct tone

the correct emotional delivery

consistent identity across all chapters


3. Audio Editing Software

Use one of:

Adobe Audition

Audacity


4. Mixing Software

Use one of:

DaVinci Resolve

Adobe Audition

Adobe Premiere Pro


5. SFX Library

Use Soundly or your own high-quality SFX library for:

ambience (subtle)

environment

footsteps

fabric/bed movement

keys, doors

atmosphere layers



---

VOICE MODEL REQUIREMENTS

Character Voice Models (RVC)

We provide:

character descriptions

tone

emotion style


You must create or select RVC models that match each character.

Once created, these voices are reused across many books. We have 48+ books in the production pipeline, so after the first few audiobooks we should no longer need to create new models.

Reaction Voice Models (RVC)

For emotional realism, you must convert safe reaction FX such as:

breaths

gasps

sighs

soft vocal reactions

strained/tension sounds

emotional vocal textures


These must be layered under the dialogue.


---

MANDATORY WORKFLOW FOR REACTIONS

1. Take emotional FX (breaths, gasps, tension sounds, etc.)


2. Run them through the character’s RVC model


3. Layer them under the dialogue



This ensures:

character-style breaths

emotional realism

matching vocal identity
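
For applicants who want to prototype step 3 before moving into an editor, the layering can be sketched in a few lines. This is only an illustration under stated assumptions: both clips are already decoded to lists of float samples in the range -1.0 to 1.0 at the same sample rate, and the function names, the -12 dB default, and the hard clip are hypothetical choices, not values from this posting.

```python
def db_to_gain(db: float) -> float:
    """Convert a decibel offset to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)


def layer_under(dialogue, reaction, offset, reaction_db=-12.0):
    """Mix an RVC-converted reaction clip underneath a dialogue clip.

    dialogue, reaction: lists of float samples in [-1.0, 1.0] (assumed format).
    offset: sample index in the dialogue where the reaction starts.
    reaction_db: attenuation applied to the reaction (assumed default).
    """
    if offset < 0:
        raise ValueError("offset must be zero or positive")
    gain = db_to_gain(reaction_db)
    # Extend the output if the reaction runs past the end of the dialogue.
    length = max(len(dialogue), offset + len(reaction))
    mixed = list(dialogue) + [0.0] * (length - len(dialogue))
    for i, sample in enumerate(reaction):
        mixed[offset + i] += sample * gain
    # Hard clip as a last-resort guard; a real mix would use a limiter.
    return [max(-1.0, min(1.0, s)) for s in mixed]


if __name__ == "__main__":
    line = [0.5, 0.5, 0.5]
    breath = [1.0, 1.0]
    print(layer_under(line, breath, offset=2, reaction_db=-20.0))
```

In practice the same operation is a volume-adjusted clip on a second track in Audition, Audacity, or Resolve; the sketch only shows what "layered under" means numerically.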



---

MANDATORY HARDWARE

Windows PC

NVIDIA GPU (RTX 2060 minimum)

Good monitoring headphones

Stable internet
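
Since the application asks for hardware specs, a small script can collect the GPU details. The sketch below assumes the NVIDIA driver is installed so that the `nvidia-smi` command-line tool is on the PATH; the helper names are hypothetical, and the script returns an empty list rather than failing when the tool is missing.

```python
import shutil
import subprocess


def parse_gpu_lines(text):
    """Parse 'name, memory' CSV lines as printed by nvidia-smi (noheader)."""
    gpus = []
    for line in text.strip().splitlines():
        name, _, memory = line.partition(",")
        if name.strip():
            gpus.append({"name": name.strip(), "memory": memory.strip()})
    return gpus


def list_nvidia_gpus():
    """Return detected NVIDIA GPUs, or an empty list if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_lines(result.stdout)


if __name__ == "__main__":
    for gpu in list_nvidia_gpus():
        print(f"{gpu['name']} ({gpu['memory']})")
```

The script only reports what is installed; comparing the result against the RTX 2060 minimum is left to the reader.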



---

WORKFLOW (Step-by-Step)

1. Base Line Generation (TTS)


2. Convert to Character Voice (RVC)


3. Add Emotional FX (RVC Reaction Layer)


4. Cinematic SFX + Ambience


5. Mix & Master


6. Deliver Chapter-by-Chapter (for quality control)
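
With 40 to 90 chapters per book, it can help to track which of the six steps each chapter has passed. The following is a minimal sketch of such a tracker; the stage identifiers and the `Chapter` class are hypothetical names chosen for illustration, and the only rule it enforces is the step order listed above.

```python
from dataclasses import dataclass, field

# Stage names are illustrative labels for the six workflow steps.
STAGES = (
    "tts_base",
    "rvc_character",
    "rvc_reactions",
    "sfx_ambience",
    "mix_master",
    "delivered",
)


@dataclass
class Chapter:
    number: int
    completed: list = field(default_factory=list)

    def next_stage(self):
        """Return the next pending stage, or None when all are done."""
        if len(self.completed) < len(STAGES):
            return STAGES[len(self.completed)]
        return None

    def complete(self, stage):
        """Mark a stage done, refusing to skip ahead in the workflow."""
        expected = self.next_stage()
        if stage != expected:
            raise ValueError(f"chapter {self.number}: expected {expected}, got {stage}")
        self.completed.append(stage)


if __name__ == "__main__":
    chapter = Chapter(number=1)
    chapter.complete("tts_base")
    chapter.complete("rvc_character")
    print(chapter.next_stage())
```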




---

UPDATED TIMELINE / DELIVERY OPTIONS

We now accept:

Option A — Solo Specialist

1 audiobook every 14 days
(≈ 6–7 hours, 40–90 chapters)
= 2 audiobooks per month maximum


OR

1 audiobook per month
(if you prefer a lower workload)


Option B — Teams / Specialists With Colleagues

Faster delivery is welcome

Can deliver up to 4 audiobooks per month


Both options are acceptable as long as the quality is consistent.
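
To judge whether the 14-day pace is realistic, the Option A figures can be turned into daily targets. The sketch below assumes that "6–7 hours" refers to the finished running time of one audiobook and that work is spread evenly over all 14 calendar days; both are interpretations, not statements from this posting.

```python
def pace(hours_range=(6, 7), chapters_range=(40, 90), days=14):
    """Derive per-chapter and per-day targets from the Option A figures."""
    low_hours, high_hours = hours_range
    low_chapters, high_chapters = chapters_range
    return {
        # Shortest chapters: least audio split over the most chapters.
        "minutes_per_chapter": (
            low_hours * 60 / high_chapters,
            high_hours * 60 / low_chapters,
        ),
        "finished_minutes_per_day": (low_hours * 60 / days, high_hours * 60 / days),
        "chapters_per_day": (low_chapters / days, high_chapters / days),
    }


if __name__ == "__main__":
    for name, (low, high) in pace().items():
        print(f"{name}: {low:.1f} to {high:.1f}")
    # minutes_per_chapter: 4.0 to 10.5
    # finished_minutes_per_day: 25.7 to 30.0
    # chapters_per_day: 2.9 to 6.4
```

Under these assumptions a solo specialist would need to finish roughly 26 to 30 minutes of mixed audio per day.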


---

BUDGET

$400 per completed audiobook

Paid per approved:

full audiobook, or

chapter batch milestone


Guaranteed Minimum:

Solo worker (1 per month): $400/month

Solo worker (2 per month): $800/month

Teams (4 per month): $1600/month guaranteed
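
The guaranteed minimums are simply the per-book rate multiplied by the monthly volume. The sketch below reproduces them and also derives an effective rate per hour of audio, assuming the 6–7 hour book length from the timeline section refers to finished running time; that per-hour figure is an inference, not a number quoted in this posting.

```python
RATE_PER_BOOK = 400  # USD, from the posting


def monthly_total(books_per_month):
    """Guaranteed monthly amount for a given number of approved audiobooks."""
    return RATE_PER_BOOK * books_per_month


def rate_per_finished_hour(book_hours):
    """Effective USD per hour of finished audio (assumed interpretation)."""
    return RATE_PER_BOOK / book_hours


if __name__ == "__main__":
    for books in (1, 2, 4):
        print(f"{books} per month: ${monthly_total(books)}")
    # 1 per month: $400
    # 2 per month: $800
    # 4 per month: $1600
    print(f"${rate_per_finished_hour(7):.2f} to ${rate_per_finished_hour(6):.2f} per finished hour")
    # $57.14 to $66.67 per finished hour
```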


Budget Increase After 3 Months

After three months of consistent, high-quality delivery,
the rate per audiobook will increase significantly.


---

NDA REQUIREMENT

Before receiving:

scripts

character references

voice models

production assets


ALL team members must sign an NDA.


---

TO APPLY, PLEASE SEND:

1. Audio samples showing AI TTS → RVC conversion


2. Proof of your cinematic-style mixing


3. Description of your workflow


4. Confirmation you use ALL mandatory tools


5. Hardware specs (GPU required)


6. Your monthly availability


7. Whether you are solo or team


8. Confirmation you accept $400/book + guaranteed monthly work

  • Fixed-price: $1,600.00
  • Experience level: Expert
  • Remote job
  • Project type: Ongoing project
Skills and Expertise
Mandatory skills
AI TTS generation
Real-Time Voice Conversion
Activity on this job
  • Proposals: 5 to 10
  • Last viewed by client: 1 hour ago
  • Interviewing: 2
  • Invites sent: 19
  • Unanswered invites: 16
About the client
  • Member since Apr 9, 2025
  • Location: NLD, Bergen Op Zoom (8:44 PM local time)
  • $3.8K total spent (21 hires, 2 active)
  • Media & Entertainment, small company (2-9 people)
