Monday, March 20, 2023

Speech-to-text with Whisper

Whisper is another neat productivity tool that has been ported to a high-performance implementation that can be run without specialized hardware, even on your phone!

The speed and accuracy are remarkable - it takes only a few minutes to create a transcript of an hour-long seminar. While these capabilities have been around for some time (e.g. subtitle options in YouTube and video conferencing programs), it is great that there are now fast, open source tools that can be run locally, without an internet connection or the privacy risks of sending your data to some untrusted server in the cloud.

Some potential applications in research:

  • Brainstorming - discussions can be transcribed to a text format that can be more easily reviewed later (e.g. searching for keywords).
  • Paper drafting - when writing or typing, we often fall into the habit of producing long, convoluted sentences that need heavy editing to make them readable and digestible. Dictating parts of a paper might be a better way to come up with clear and concise text.
  • Conference proceedings - recordings of conference / workshop talks can be converted into full-length paper drafts or proceedings. I am trying this one out on one of the online talks I gave during COVID.
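
As a rough illustration of the keyword-search idea in the first bullet, here is a minimal Python sketch. It assumes the transcript is available as a list of segments, each a dict with "start", "end" and "text" keys, which is the shape returned under "segments" by the transcribe call in the open source whisper Python package. The helper names (format_timestamp, find_keyword) and the sample segments are made up for this example and are not part of Whisper.

```python
import re


def format_timestamp(seconds):
    """Render a time offset in seconds as H:MM:SS."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def find_keyword(segments, keyword):
    """Return (timestamp, text) pairs for segments that mention the keyword.

    Matching is case-insensitive and on whole words only.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [
        (format_timestamp(seg["start"]), seg["text"].strip())
        for seg in segments
        if pattern.search(seg["text"])
    ]


# Hypothetical sample data standing in for a real transcript.
segments = [
    {"start": 0.0, "end": 6.5, "text": " Welcome to the group meeting."},
    {"start": 754.2, "end": 761.0, "text": " We could try a Bayesian prior here."},
    {"start": 3725.0, "end": 3731.4, "text": " The prior needs a sensitivity check."},
]

for stamp, text in find_keyword(segments, "prior"):
    print(stamp, text)
```

The timestamps make it easy to jump back to the right point in the original recording when a transcribed sentence needs checking.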

The ability to quickly and accurately convert research between different formats (text, audio, visual, different languages, etc.) will ultimately improve the accessibility of research, ensuring that it is open for all to use and build on. Further reading on this important initiative can be found on the arXiv blog.

