AI Developer — Build a Whisper-Based Subtitle Extraction Tool from Scratch (Python/Flask)

Posted yesterday

Worldwide

Summary

We need an AI-powered subtitle extraction tool built from the ground up: a tool that takes any video file and automatically generates accurate subtitles in the video's original spoken language, using local AI speech recognition (no cloud APIs, no translation).

What it should do:
  • Accept video/audio file uploads through a simple web UI
  • Extract audio and clean it up (remove silence/noise) before transcription
  • Use an AI speech-to-text model (Whisper) to transcribe speech, automatically detecting the spoken language (English, Korean, Russian, Turkish, or any other language)
  • Merge raw transcription chunks into natural, well-timed subtitle sentences
  • Output a clean, properly formatted .srt file, downloadable from the browser
  • Support queued processing: multiple videos can be uploaded and processed one after another in the background
  • Allow canceling a queued job before it starts

Tech we expect you to use:
  • Whisper (AI speech recognition model) for transcription, ideally via mlx-whisper for Apple Silicon performance, or openai-whisper / faster-whisper as alternatives
  • Voice Activity Detection (AI model), e.g. Silero VAD, to strip silence before transcription for better accuracy
  • Python backend (Flask or similar) with a background job queue
  • ffmpeg for audio extraction
  • Simple, clean web frontend (HTML/CSS/JS), no framework required

Why this is an AI project: This isn't just file handling. The core value is two AI models working together (speech detection + speech-to-text transcription) to turn raw audio into readable, well-timed subtitles automatically, without any manual transcription or paid translation API.

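To make the "merge raw transcription chunks" and ".srt output" items above concrete, here is a minimal sketch of one way they could work. Everything in it is an assumption for illustration, not part of the brief: the `Segment` type, the function names, and the thresholds `max_gap` and `max_chars` are hypothetical, and joining chunks with a space suits space-delimited languages only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcription chunk (hypothetical shape: times in seconds)."""
    start: float
    end: float
    text: str


# Characters treated as sentence endings (an assumed, non-exhaustive set).
SENTENCE_END = (".", "!", "?", "…", "。", "？", "！")


def format_timestamp(seconds: float) -> str:
    """Render seconds in the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = max(0, round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def merge_segments(segments, max_gap=0.6, max_chars=84):
    """Join consecutive chunks until a sentence ends, a pause exceeds
    max_gap seconds, or the cue would grow beyond max_chars characters."""
    merged = []
    for seg in segments:
        text = seg.text.strip()
        if not text:
            continue  # skip empty chunks
        if merged:
            prev = merged[-1]
            can_join = (
                not prev.text.endswith(SENTENCE_END)
                and seg.start - prev.end <= max_gap
                and len(prev.text) + 1 + len(text) <= max_chars
            )
            if can_join:
                prev.end = seg.end
                prev.text = f"{prev.text} {text}"
                continue
        merged.append(Segment(seg.start, seg.end, text))
    return merged


def to_srt(segments) -> str:
    """Serialize cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}"
        blocks.append(f"{index}\n{timing}\n{seg.text}\n")
    return "\n".join(blocks)
```

A real implementation would likely use word-level timestamps, when the chosen Whisper backend provides them, to split overly long cues instead of only merging short ones.
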
Requirements:
  • Proven experience with Whisper or similar ASR (Automatic Speech Recognition) models
  • Comfortable with audio preprocessing (ffmpeg, sample rates, normalization)
  • Python backend experience (Flask/FastAPI)
  • Bonus: experience with MLX (Apple Silicon ML framework) or CUDA-accelerated inference

Deliverable: Fully working app, source code, and brief documentation on setting it up and running it locally.
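As an illustration of the audio preprocessing named in the requirements, the sketch below builds an ffmpeg command that drops the video stream, downmixes to mono, resamples to 16 kHz (the rate Whisper models work with), and applies loudness normalization. The function names and the choice of the `loudnorm` filter are assumptions made for this example, not requirements from the client.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(source: Path, target: Path, sample_rate: int = 16000) -> list[str]:
    """Return an ffmpeg argument list that writes normalized mono WAV audio."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the target if it already exists
        "-i", str(source),         # input video or audio file
        "-vn",                     # ignore the video stream
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample to the requested rate
        "-af", "loudnorm",         # EBU R128 loudness normalization
        "-c:a", "pcm_s16le",       # 16-bit PCM, suitable for a .wav target
        str(target),
    ]


def extract_audio(source: Path, target: Path) -> Path:
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_ffmpeg_command(source, target), check=True, capture_output=True)
    return target
```

Calling `extract_audio(Path("talk.mp4"), Path("talk.wav"))` requires an `ffmpeg` binary on the PATH; the file names here are placeholders.
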

  • $30.00
    Fixed-price
  • Expert
    Experience Level
  • Remote Job
  • One-time project
    Project Type
Skills and Expertise
Mandatory skills
Python
API
Machine Learning
Activity on this job
  • Proposals: 10 to 15
  • Last viewed by client: 4 hours ago
  • Interviewing: 1
  • Invites sent: 1
  • Unanswered invites: 0
About the client
Member since Jun 26, 2026
  • KOR
    Dangjin 9:47 PM
  • Tech & IT
    Small company (2-9 people)
