Cinematic Audio Producer (AI Voices + RVC + Emotional FX + Sound Design)

Posted 1 hour ago

Worldwide

Summary

Long-Term — Premium Dark-Romance Audiobooks

We are producing cinematic, multi-character dark-romance audiobooks for a premium entertainment platform.
We are hiring one specialist OR a small team who can create high-quality, emotionally immersive audio using a combination of:

AI TTS

RVC (Retrieval-based Voice Conversion)

Emotional reaction FX

Cinematic ambience & SFX

Professional mixing

This role requires no human recording.
All voices must be created using AI + RVC.

This job is long-term, with guaranteed monthly work.

---

MANDATORY TOOLSET

1. Clean TTS Tool (for generating base dialogue)

You must use ONE of the following:

Fish Audio

Revoicer

ElevenLabs (allowed, but be aware of line restrictions)

Or any similar clean TTS tool with non-distorted speech

2. RVC (Retrieval-based Voice Conversion, v2)

Used to convert all base TTS lines into:

the correct character voice

the correct tone

the correct emotional delivery

consistent identity across all chapters

3. Audio Editing Software

Use one of:

Adobe Audition

Audacity

4. Mixing Software

Use one of:

DaVinci Resolve

Adobe Audition

Adobe Premiere Pro

5. SFX Library

Use Soundly or your own high-quality SFX library for:

ambience (subtle)

environment

footsteps

fabric/bed movement

keys, doors

atmosphere layers

---

VOICE MODEL REQUIREMENTS

Character Voice Models (RVC)

We provide:

character descriptions

tone

emotion style

You must create or select RVC models that match each character.

Once created, these voices are reused across many books. We have 48+ books in the production pipeline, and after the first few audiobooks we will no longer need to create new voices.

Reaction Voice Models (RVC)

For emotional realism, you must convert safe reaction FX such as:

breaths

gasps

sighs

soft vocal reactions

strained/tension sounds

emotional vocal textures

These must be layered under the dialogue.

---

MANDATORY WORKFLOW FOR REACTIONS

1. Take emotional FX (breaths, gasps, tension sounds, etc.)

2. Run them through the character’s RVC model

3. Layer them under the dialogue

This ensures:

character-style breaths

emotional realism

matching vocal identity
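As a rough illustration of step 3, the sketch below mixes a converted reaction clip under a dialogue clip at reduced gain. It assumes both clips are already mono float samples at the same sample rate. The names `db_to_gain` and `layer_under`, the -12 dB default, and the `offset` parameter are illustrative choices for this sketch, not values specified in this brief; in practice this step would normally be done in the editing or mixing software listed above.

```python
def db_to_gain(db: float) -> float:
    """Convert a decibel change to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def layer_under(dialogue, reaction, offset=0, reaction_db=-12.0):
    """Mix `reaction` under `dialogue`, starting at sample index `offset`.

    Both inputs are sequences of float samples in [-1.0, 1.0] at the same
    sample rate (an assumption of this sketch). The reaction is attenuated
    by `reaction_db` so it sits beneath the dialogue, and the summed signal
    is hard-clipped to [-1.0, 1.0].
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    gain = db_to_gain(reaction_db)
    # Extend the output if the reaction runs past the end of the dialogue.
    length = max(len(dialogue), offset + len(reaction))
    mixed = list(dialogue) + [0.0] * (length - len(dialogue))
    for i, sample in enumerate(reaction):
        mixed[offset + i] += sample * gain
    return [max(-1.0, min(1.0, s)) for s in mixed]
```

For example, `layer_under(dialogue, breath, offset=24000)` would place a breath half a second into a 48 kHz dialogue clip, 12 dB below its original level.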

---

MANDATORY HARDWARE

Windows PC

NVIDIA GPU (RTX 2060 minimum)

Good monitoring headphones

Stable internet
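One possible way to collect the GPU details requested in the application is to query `nvidia-smi`, the command-line tool installed with the NVIDIA driver. The sketch below is only a convenience under that assumption; the helper names `parse_gpu_line` and `list_nvidia_gpus` are hypothetical, and the function returns an empty list when the tool is not found.

```python
import shutil
import subprocess


def parse_gpu_line(line: str) -> dict:
    """Split one CSV line such as 'NVIDIA GeForce RTX 2060, 6144 MiB'."""
    name, memory = (part.strip() for part in line.rsplit(",", 1))
    return {"name": name, "memory": memory}


def list_nvidia_gpus() -> list:
    """Return one dict per detected NVIDIA GPU, or [] if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools not installed or not on PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [parse_gpu_line(l) for l in result.stdout.splitlines() if l.strip()]


if __name__ == "__main__":
    for gpu in list_nvidia_gpus():
        print(f"{gpu['name']} ({gpu['memory']})")
```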

---

WORKFLOW (Step-by-Step)

1. Base Line Generation (TTS)

2. Convert to Character Voice (RVC)

3. Add Emotional FX (RVC Reaction Layer)

4. Cinematic SFX + Ambience

5. Mix & Master

6. Deliver Chapter-by-Chapter (for quality control)
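Because chapters are delivered one at a time, it can help to track which of the six steps each chapter has reached. The following is a minimal sketch of such a tracker; the `Stage` enum and `next_stage` function are hypothetical names introduced here, and the strict ordering check is an assumption about how the steps depend on each other.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The six workflow steps, numbered as in the list above."""
    TTS_BASE = 1
    RVC_CHARACTER = 2
    RVC_REACTIONS = 3
    SFX_AMBIENCE = 4
    MIX_MASTER = 5
    DELIVERED = 6


def next_stage(completed):
    """Return the next stage for a chapter, or None once all six are done.

    `completed` is the set of stages already finished. Stages must be done
    in order, so a gap (e.g. mixing before RVC conversion) raises an error.
    """
    done = sorted(completed)
    expected = list(Stage)[: len(done)]
    if done != expected:
        raise ValueError("stages completed out of order")
    return Stage(len(done) + 1) if len(done) < len(Stage) else None
```

A chapter with only the base TTS lines generated would report `Stage.RVC_CHARACTER` as its next step.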

---

UPDATED TIMELINE / DELIVERY OPTIONS

We now accept:

Option A — Solo Specialist

1 audiobook every 14 days
(≈ 6–7 hours, 40–90 chapters)
= 2 audiobooks per month maximum

OR

1 audiobook per month
(if you prefer a lower workload)

Option B — Teams / Specialists With Colleagues

Faster delivery is welcome

Can deliver up to 4 audiobooks per month

Both options are acceptable as long as the quality is consistent.
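To make the Option A workload concrete, the sketch below turns the stated figures (14 days, roughly 6 to 7 hours, 40 to 90 chapters) into daily averages. The function names are illustrative, and the results are simple averages over calendar days, not a schedule set by this brief.

```python
DAYS_PER_BOOK = 14  # Option A: one audiobook every 14 days


def chapters_per_day(chapters: int, days: int = DAYS_PER_BOOK) -> float:
    """Average number of chapters to finish per calendar day."""
    return chapters / days


def avg_chapter_minutes(book_hours: float, chapters: int) -> float:
    """Average finished length of one chapter, in minutes."""
    return book_hours * 60 / chapters


def finished_minutes_per_day(book_hours: float, days: int = DAYS_PER_BOOK) -> float:
    """Average minutes of finished audio to produce per calendar day."""
    return book_hours * 60 / days


if __name__ == "__main__":
    for chapters in (40, 90):
        print(chapters, "chapters:", round(chapters_per_day(chapters), 2), "per day")
    for hours in (6, 7):
        print(hours, "hours:", round(finished_minutes_per_day(hours), 1), "min per day")
```

Under these figures, a solo specialist on the 14-day schedule averages roughly 3 to 6.5 chapters, or about 26 to 30 minutes of finished audio, per calendar day.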

---

BUDGET

$400 per completed audiobook

Paid per approved:

full audiobook

OR

chapter batch milestone

Guaranteed Minimum:

Solo worker (1 per month): $400/month

Solo worker (2 per month): $800/month

Teams (4 per month): $1600/month guaranteed
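The guaranteed minimums follow directly from the per-book rate. As a quick check, the sketch below reproduces them and also derives the implied rate per finished hour of audio, using the approximate 6 to 7 hour book length given in the timeline section. The function names are illustrative, and the per-hour figure is a derived estimate, not a rate quoted in this posting.

```python
RATE_PER_BOOK = 400  # USD per completed audiobook


def monthly_total(books_per_month: int, rate: int = RATE_PER_BOOK) -> int:
    """Guaranteed monthly amount for a given number of approved books."""
    return books_per_month * rate


def rate_per_finished_hour(book_hours: float, rate: int = RATE_PER_BOOK) -> float:
    """Implied payment per hour of finished audio for one book."""
    return rate / book_hours


if __name__ == "__main__":
    for books in (1, 2, 4):
        print(f"{books} book(s) per month: ${monthly_total(books)}")
    for hours in (6, 7):
        print(f"{hours}-hour book: ${rate_per_finished_hour(hours):.2f} per finished hour")
```

At $400 per book, a 6 to 7 hour audiobook works out to roughly $57 to $67 per finished hour.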

Budget Increase After 3 Months

After three months of consistent, high-quality delivery,
the rate per audiobook will increase significantly.

---

NDA REQUIREMENT

Before receiving:

scripts

character references

voice models

production assets

ALL team members must sign an NDA.

---

TO APPLY, PLEASE SEND:

1. Audio samples showing AI TTS → RVC conversion

2. Proof of your cinematic-style mixing

3. Description of your workflow

4. Confirmation you use ALL mandatory tools

5. Hardware specs (GPU required)

6. Your monthly availability

7. Whether you are solo or team

8. Confirmation you accept $400/book + guaranteed monthly work

$1,600.00 (Fixed-price)
Experience Level: Expert
Remote Job
Project Type: Ongoing project

Skills and Expertise

Mandatory skills

AI TTS generation

Real-Time Voice Conversion

Activity on this job

Proposals: 5 to 10
Last viewed by client: 1 hour ago
Interviewing: 2
Invites sent: 19
Unanswered invites: 16

About the client

Member since Apr 9, 2025

NLD
Bergen Op Zoom 8:44 PM
$3.8K total spent
21 hires, 2 active
Media & Entertainment
Small company (2-9 people)
