Audio Quality Review
Worldwide
We're building AI voice agents that handle real phone conversations: think customer support, scheduling, logistics. To power these agents, we first collect voice samples from human speakers, then use those samples to create synthetic voices. Your job is to review those human recordings and decide which ones are good enough to move forward.

What you'll be doing
You'll listen to short voice recordings and score them across a set of criteria. For each sample, you'll also write practical coaching feedback: not just "good" or "too monotone," but specific notes that tell the speaker exactly what to change in their next recording.
This is selective work. For each language and gender, we're typically looking to advance only one or two speakers to full recording. Most samples will not make the cut, and that's expected. Your job is to uphold a high standard, not to pass people through.

What we're looking for in voice samples
- Conversational delivery: sounds like a real exchange, not a script being read aloud
- Authenticity: human and unforced, not theatrical, robotic, or overly polished
- Tone & emotional range: natural variation in pitch and energy; flat delivery is a disqualifier
- Pacing: clear and brisk with natural pauses, not rushed or unnaturally slow
- Audio quality: strict yes/no; any noise, distortion, or echo is an automatic fail
Small hesitations, brief restarts, and natural breaths are fine; these make a voice sound real. What we penalize is performed casualness: imperfections that feel intentional, theatrical, or distracting.

What good feedback looks like
Brief comments like "good delivery" or "too robotic" are not useful. We need actionable coaching: what specifically should the speaker change, and how would that change make them sound more like a real person on a real call?
The bar for "proceed to full recording" is high. It is reserved for samples that are genuinely ready, and is usually safest after a second round where the speaker has already applied your first-round feedback.

Who we're looking for
- Native or near-native speaker of the target language(s)
- Strong ear for natural speech: you can tell the difference between someone reading and someone talking
- Ability to write clear, specific, actionable feedback in English
- Detail-oriented and consistent across a batch of samples
- Experience in voice acting, linguistics, speech coaching, or AI data work is a plus, not required

To apply
Tell us which language(s) you're evaluating for, and briefly describe what you listen for when deciding if a voice sounds natural versus performed. Applications without this will not be reviewed.
Compensation is per batch. Details shared with shortlisted candidates.

WE ARE LOOKING FOR THESE LANGUAGES: Arabic, Brazilian Portuguese, Croatian, Dutch, English, Finnish, Hungarian, Indonesian, Korean, Polish, Romanian, Thai, Turkish.
- $300.00, Fixed-price
- Experience Level: Intermediate
- Remote Job
- Project Type: One-time project
Activity on this job
- Proposals: 50+
- Last viewed by client: 5 days ago
- Hires: 3
- Interviewing: 1
- Invites sent: 0
- Unanswered invites: 0
About the client
- USA, San Francisco, 12:00 PM
- $1.8K total spent
- 25 hires, 25 active