Are you an academic researcher or podcaster looking for a free-to-use transcription tool that does not use US-based cloud processing to do its work? If that sounds like you, then I’d like to introduce you to Whisper.ai!
Note: the Whisper.ai link above points to the project's open-source code, which is quite difficult for non-programmers to use. Don't worry, I'll provide links to more user-friendly versions of the software below.
For example, I installed the Whisper Transcription software (a graphical wrapper for Macs around the open-source Whisper.ai command-line tools) on my 14-inch M1 MacBook Pro, and it transcribed a 30-minute podcast interview in 1 minute and 15 seconds! Not only did it transcribe the interview, but it also gave me the option of grouping the transcribed text into paragraphs for each speaker, making it easier to see who was talking at different points in the recording.
For Windows users, a good option is GoWhisper, which also uses a freemium model to fund the development of its Whisper.ai graphical interface.
Whisper.ai is an openly licensed project (MIT license) from the folks at OpenAI. However, unlike ChatGPT, which runs on cloud-based servers and may use your prompts as training data, Whisper.ai can be installed locally on your laptop and does all the processing on your computer. This eliminates the privacy concerns inherent in cloud-based tools such as Otter.ai. For Canadian-based researchers, it means they do not have to write cloud-based computing into their research ethics proposals, or cloud-based storage consent into the consent forms that their research participants need to read and sign.
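If you are comfortable with a little Python, here is a minimal sketch of what "processing on your own computer" can look like. It assumes the open-source `openai-whisper` package and `ffmpeg` are already installed. The function name `transcribe_locally` and the default `"base"` model are my own choices for illustration, not something the graphical apps expose.

```python
from pathlib import Path


def transcribe_locally(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file on this machine and return the plain text."""
    path = Path(audio_path)
    if not path.is_file():
        raise FileNotFoundError(f"No audio file at {path}")

    # Imported here so the file check above runs even if the package is missing.
    # Assumes: pip install openai-whisper, plus ffmpeg available on the PATH.
    import whisper

    # The model weights are downloaded once and cached; the audio is not uploaded.
    model = whisper.load_model(model_name)
    result = model.transcribe(str(path))
    return result["text"]
```

Calling `transcribe_locally("interview.mp3")` (a hypothetical file name) would return the transcript as a single string. Swapping `"base"` for a larger model name should trade speed for accuracy.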
While there are a lot of paid tools available for transcribing and translating speech to text, like Otter.ai, and some free tools like YouTube that work fairly well, almost all of them use cloud-based services to process the transcription. Until relatively recently, I had not encountered any free tools that would transcribe audio locally on my laptop without doing any processing in the cloud. Enter Whisper.ai from OpenAI, the makers of ChatGPT!
Whisper.ai is quite reliable in my experience, but its reliability does vary by language. Below are charts provided by OpenAI outlining its reliability for various languages and the available models. Note: shorter bars on the charts indicate more reliable transcription.
Whisper.ai Pros and Cons
Why should you consider using a Whisper.ai-based tool to transcribe your audio files?
Whisper.ai is free to use
Whisper.ai uses your computer’s processing power for transcription, so there are no worries about personally identifiable or confidential information being stored in the cloud or potentially being used as training data for other AI tools
The graphical wrappers on Whisper.ai – Whisper Transcription & GoWhisper – are very easy to use
Whisper.ai is relatively fast
The Mac-based Whisper Transcription software creates separate paragraphs for different speakers (which can make analyzing interviews easier for researchers)
Whisper.ai creates files for closed-captioning video
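To give a sense of what a closed-caption file actually contains, here is a rough sketch that turns timed segments into the common SubRip (.srt) layout. It assumes each segment is a dictionary with `start` and `end` times in seconds plus a `text` field, which is how the open-source Whisper code reports its segments. The helper names and the sample segments are made up for illustration.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text: a numbered cue, a time range, then the caption line."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates one cue from the next.
    return "\n".join(cues)


# Illustrative segments only, not real transcription output.
sample = [
    {"start": 0.0, "end": 2.5, "text": " Welcome to the show."},
    {"start": 2.5, "end": 4.0, "text": " Thanks for having me."},
]
print(segments_to_srt(sample))
```

The graphical apps handle this export for you. The sketch is only meant to show that a caption file is plain text with timestamps, which is why video editors and players can read it so easily.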
What are some potential drawbacks to Whisper.ai?
Whisper.ai is slower than Otter.ai (instead of processing a 30-minute interview in 20 seconds, Whisper.ai took 1 minute and 15 seconds on my M1 MacBook Pro).
The free versions of both Whisper Transcription & GoWhisper only allow users to use the smaller models for analyzing the audio; users must pay a licensing fee to unlock the larger, more accurate (but also slower) models
Older computers will take longer to transcribe audio than newer, faster computers
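To put the speed comparison above into perspective, here is a quick back-of-the-envelope calculation using only the timings I reported (a 30-minute interview, 1 minute 15 seconds for Whisper.ai on my M1 MacBook Pro, 20 seconds for Otter.ai). The `speed_factor` helper is just a name I picked for illustration, and your numbers will differ depending on your computer and the model you choose.

```python
def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many times faster than real time the transcription ran."""
    return audio_seconds / processing_seconds


audio = 30 * 60            # 30-minute interview, in seconds
whisper_time = 1 * 60 + 15  # 1 minute 15 seconds
otter_time = 20             # 20 seconds

print(speed_factor(audio, whisper_time))  # 24.0 (24x faster than real time)
print(speed_factor(audio, otter_time))    # 90.0 (90x faster than real time)
print(whisper_time / otter_time)          # 3.75 (Whisper.ai took 3.75x as long)
```

So even though Whisper.ai is the slower of the two, it still worked through the recording at roughly 24 times playback speed on my laptop.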