Write an interface for the Google speech recognition API

Web, Mobile & Software Dev Scripts & Utilities Posted 2 years ago

Hourly Job

Hours to be determined
Less than 1 week
$$$

Expert Level

I am willing to pay higher rates for the most experienced freelancers

Details

We need someone to write an interface to the Google Speech-To-Text API (a.k.a. the Chrome Web Speech API) that works for long audio files.

We have a Linux script that sends audio files to the Google speech recognition URL:

http://www.google.com/speech-api/v1/recognize

Our script receives the transcribed text back from the Google API. It is a Linux shell script that uses a curl command to send the audio. It works reasonably well for short files (around four seconds), but fails most of the time on longer files such as voicemail messages (1-2 minutes).
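For illustration only, the kind of single-shot upload our curl command performs could be sketched in Python roughly as follows. The endpoint is the URL quoted above; the query parameters (`client`, `lang`), the `audio/x-flac; rate=...` header format, and the function names `build_request` and `recognize` are assumptions made for this sketch, not details of our actual script.

```python
import urllib.parse
import urllib.request

# Endpoint quoted in the job description.
ENDPOINT = "http://www.google.com/speech-api/v1/recognize"


def build_request(flac_bytes, sample_rate=16000, lang="en-US"):
    """Build (but do not send) a POST request carrying FLAC audio."""
    # Assumed query parameters; the posting does not specify them.
    query = urllib.parse.urlencode({"client": "chromium", "lang": lang})
    # Assumed header format declaring the audio type and its sample rate.
    headers = {"Content-Type": "audio/x-flac; rate=%d" % sample_rate}
    return urllib.request.Request(
        ENDPOINT + "?" + query, data=flac_bytes, headers=headers
    )


def recognize(flac_path, sample_rate=16000, timeout=60):
    """Upload one FLAC file in a single request and return the raw response text."""
    with open(flac_path, "rb") as audio_file:
        request = build_request(audio_file.read(), sample_rate)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    import sys

    # Usage: python recognize.py message.flac
    print(recognize(sys.argv[1]))
```

Keeping the request construction separate from the network call makes it possible to inspect exactly what would be sent for a failing long file before submitting it.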

We suspect that our simple interface is doing something critically different from what Chrome does, since the Google interface presumably works reliably within Chrome. It may be that the Google recognizer chokes on our file because it is sent all at once instead of being streamed in real time, as would happen with Chrome, but this is just a guess.
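If the guess about file length turns out to be right, one possible workaround (not necessarily the right one) would be to cut a long recording into short pieces before submitting each piece separately. A minimal sketch using only Python's standard `wave` module is shown here; the function name `split_wav`, the output file naming, and the four-second default are assumptions chosen to match the short files that currently work.

```python
import os
import wave


def split_wav(path, out_dir, chunk_seconds=4):
    """Split a WAV file into consecutive chunks of at most chunk_seconds each."""
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of the source file
            chunk_path = os.path.join(out_dir, "chunk_%03d.wav" % index)
            with wave.open(chunk_path, "wb") as dst:
                # Copy channel count, sample width and frame rate from the source;
                # the frame count in the header is corrected when the file closes.
                dst.setparams(params)
                dst.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```

Cutting at fixed intervals can split a word across two chunks, so a real solution would probably need to cut at pauses or overlap the chunks slightly.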

Your job is to write a script that reliably uses the Google API to transcribe both short and long audio files. We will share a copy of our script and the long files that fail. The files are in WAV format, but you may need to convert them to FLAC, so audio conversion experience is necessary.
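As an example of the conversion step, the WAV-to-FLAC conversion could be driven from Python along these lines. This sketch assumes the `ffmpeg` command-line tool is installed, and the 16 kHz mono target is an assumption rather than a known requirement of the Google service; `build_flac_command` and `wav_to_flac` are hypothetical helper names.

```python
import subprocess


def build_flac_command(wav_path, flac_path, sample_rate=16000):
    """Return the ffmpeg argument list that converts a WAV file to mono FLAC."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", wav_path,           # input WAV file
        "-ar", str(sample_rate),  # resample to the target rate (assumed value)
        "-ac", "1",               # downmix to a single channel
        flac_path,                # output format is inferred from the .flac extension
    ]


def wav_to_flac(wav_path, flac_path, sample_rate=16000):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_flac_command(wav_path, flac_path, sample_rate), check=True)
    return flac_path


if __name__ == "__main__":
    import sys

    # Usage: python convert.py message.wav message.flac
    wav_to_flac(sys.argv[1], sys.argv[2])
```

Whatever sample rate is chosen here would have to match the rate declared when the audio is submitted.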

Experience with the Google speech recognition service would be a plus, but it is not required. We would prefer a solution written as a Linux shell script and/or in Python, since it would then integrate easily with our other scripts, but we understand that may not be possible. The Chrome API uses AJAX to display partial results and recognition status updates, so knowledge of AJAX and audio streaming would be useful.

One measure of success will be that your script gets a transcription back from Google for each of the files we provide without having to resubmit the audio repeatedly. The developer who needs the code is on vacation until August 20th, so we would like to have your code ready for him by then.

Here are a few sites that explain the API:

https://gist.github.com/alotaiba/1730160
http://updates.html5rocks.com/2013/01/Voice-Driven-Web-Apps-Introduction-to-the-Web-Speech-API
http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi-errata.html
