We need someone to write an interface to the Google Speech-To-Text API (a.k.a. the Chrome web speech kit) that works for long audio files.
We have a Linux script that sends audio files to the Google speech recognition URL:
Our script receives the transcribed text back from the Google API. It is written in Linux shell and uses a curl command to send the audio. It works reasonably well for short files (around four seconds), but fails most of the time on longer files such as voicemail messages (1-2 minutes).
We suspect that our simple interface is doing something critically different from what Chrome does, since the Google interface presumably works reliably within Chrome. It may be that the Google recognizer chokes on our file because it is sent all at once instead of being streamed in real time, as would happen with Chrome, but this is just a guess.
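Since short files work and long ones do not, one possible workaround (not something we have tried, and not necessarily what Chrome does) is to cut a long recording into short pieces and submit each piece separately. A minimal Python sketch of that idea, using only the standard library, might look like the following. The function name split_wav, the four-second default and the output file naming are our own illustrative assumptions, and the sketch cuts at fixed intervals, so it may split a word in half.

```python
import wave


def split_wav(path, chunk_seconds=4.0, prefix="chunk"):
    """Split a PCM WAV file into consecutive pieces of at most chunk_seconds.

    Returns the list of chunk file names that were written.
    The names (prefix_000.wav, prefix_001.wav, ...) are illustrative only.
    """
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of file (or a zero-length chunk size)
            name = f"{prefix}_{index:03d}.wav"
            with wave.open(name, "wb") as dst:
                # Copy channels, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(name)
            index += 1
    return written
```

Each resulting file could then be sent through the existing curl step and the returned text joined in order. Whether this actually avoids the failures is something the developer would need to confirm against our sample files.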
Your job is to write a script that reliably uses the Google API to transcribe short and long audio files. We will share a copy of our script and the long files that fail. The files are in .WAV but you may need to convert them to the FLAC format, so audio conversion experience is necessary.
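To give an idea of the conversion step, here is a rough Python sketch that calls the ffmpeg command-line tool to turn a WAV file into FLAC. It assumes ffmpeg is installed on the machine; the helper names flac_command and wav_to_flac are hypothetical, and the mono, 16 kHz settings are guesses on our part rather than known requirements of the Google service.

```python
import subprocess


def flac_command(wav_path, flac_path, sample_rate=16000):
    """Build an ffmpeg command line that converts a WAV file to mono FLAC."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", wav_path,           # input file
        "-ac", "1",               # downmix to a single channel (assumed)
        "-ar", str(sample_rate),  # resample; 16 kHz is an assumed value
        flac_path,                # output format is inferred from .flac
    ]


def wav_to_flac(wav_path, flac_path, sample_rate=16000):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(flac_command(wav_path, flac_path, sample_rate), check=True)
    return flac_path
```

For example, wav_to_flac("voicemail.wav", "voicemail.flac") would write the FLAC file next to the original. Any other converter that produces valid FLAC would do just as well.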
It would be great if you have experience using the Google speech recognition service, but that is optional. It would be convenient if you wrote your solution in Linux shell and/or Python, since it would then integrate easily with our other scripts, but that may not be possible. The Chrome API uses AJAX to display partial results and recognition status updates, so knowledge of AJAX and audio streaming would be useful.
One measure of success will be that your script gets a transcription back from Google for the files we provide without repeatedly resubmitting the audio. The developer who needs the code is on vacation until August 20th, so we would like to have your code ready for him by then.
Here are a few sites that explain the API: