JavaScript Speech To Text (Transcribe Audio In Browser)

Once upon a time, an angry student came to Master Coffee. She complained about “stupid rules disallowing taking videos as notes during lectures in the digital age”, and Master Coffee offered her a “smarter than the rules” solution – Transcribe audio to text in class.

Yes, it’s the digital age. There are many technologies to solve stupid problems and restrictions. The rules didn’t say it’s illegal to transcribe audio in class, so let Master Coffee walk you through a simple example – Let’s go.

CODE DOWNLOAD

I have released this under the MIT license, feel free to use it in your own project – Personal or commercial. Some form of credits will be nice though. 🙂

TRANSCRIBE AUDIO DEMO

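Click the above button to start, and speak something into the mic. Take note that it requires the Speech Recognition API (part of the Web Speech API), which is only supported in some browsers at the time of writing. Chrome, Edge, and Safari expose it under the prefixed name webkitSpeechRecognition, while Firefox does not support it by default.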
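If you are not sure whether a browser supports it, look for the constructor under both names before doing anything else. This is only a minimal sketch; getSpeechRecognition is a hypothetical helper name and not part of the tutorial's code.

```javascript
// Hypothetical helper: returns the speech recognition constructor, or null
// if the browser does not expose one. `scope` defaults to the global object.
function getSpeechRecognition (scope = globalThis) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// Usage: check once on page load, before enabling any controls.
const SR = getSpeechRecognition();
console.log(SR ? "Speech recognition is available" : "Speech recognition is not supported");
```

Passing the scope as a parameter is just a convenience so that the check can be tried outside of a browser as well.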

1) THE HTML

transcribe.html

<textarea id="result"></textarea>
<input type="button" id="toggle" value="Loading" onclick="transcribe.toggle()" disabled>

<textarea> To output the speech-to-text result.

<input type="button"> A button to start and stop the transcription.

2) THE JAVASCRIPT

transcribe.js

var transcribe = {
  // (PART A) PROPERTIES & FLAGS
  hres : null, // html textarea
  htog : null, // html toggle button
  sr : null, // speech recognition object
  listening : false, // speech recognition in progress

  // (PART B) INIT
  init : () => {
    // (B1) GET HTML ELEMENTS
    transcribe.hres = document.getElementById("result");
    transcribe.htog = document.getElementById("toggle");
    transcribe.htog.value = "Click to start";
    transcribe.htog.disabled = false;

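    // (B2) INIT SPEECH RECOGNITION
    const SR = window.SpeechRecognition || window.webkitSpeechRecognition;
    if (!SR) {
      // speech recognition api is not available in this browser
      transcribe.htog.value = "Not supported";
      transcribe.htog.disabled = true;
      return;
    }
    transcribe.sr = new SR();
    transcribe.sr.lang = "en-US";
    transcribe.sr.continuous = true;
    transcribe.sr.interimResults = false;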

    // (B3) OUTPUT RESULT
    transcribe.sr.onresult = e => {
      // latest result, top alternative
      let said = e.results[e.results.length-1][0].transcript.trim();
      if (said == "") { return; }
      said = said.charAt(0).toUpperCase() + said.slice(1) + ".";
      transcribe.hres.value += said + "\n";
    };

    // (B4) ON ERROR
    transcribe.sr.onerror = e => {
      console.error(e.error, e); // e.error holds the error code
      transcribe.listening = false;
      transcribe.htog.value = "ERROR";
      transcribe.htog.disabled = true;
      alert("Make sure a mic is attached and permission is granted.");
    };
  },

  // (PART C) TOGGLE START/STOP RECOGNITION
  toggle : () => {
    if (transcribe.listening) {
      transcribe.sr.stop();
      transcribe.htog.value = "Click to start";
    } else {
      transcribe.sr.start();
      transcribe.htog.value = "Click to stop";
    }
    transcribe.listening = !transcribe.listening;
  }
};

// (PART D) START
window.addEventListener("load", transcribe.init);

Keep calm, drink some coffee, and it is easier to trace in this order:

(B & D) On window load, we initialize the “transcriber app” with transcribe.init().
(B) Long-winded but straightforward initialization process.
- (B1) Get the HTML text area and toggle button.
- (B2) Create a new speech recognition object.
- (B3) On successful transcribing, output the text into the HTML text area.
- (B4) On errors, show the error.
(C) Self-explanatory. Toggle start and stop speech recognition.
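To see what (B3) does without a microphone, the same logic can be pulled into a small function and fed a stand-in for e.results. This is only a sketch; latestSentence and fakeResults are hypothetical names made up for the illustration.

```javascript
// Hypothetical helper mirroring (B3): take the most recent entry of a
// results list, use its top alternative, and tidy it into a sentence.
function latestSentence (results) {
  if (!results || results.length === 0) { return ""; }
  const said = results[results.length - 1][0].transcript.trim();
  if (said === "") { return ""; }
  return said.charAt(0).toUpperCase() + said.slice(1) + ".";
}

// Stand-in for e.results: a list of results, each a list of alternatives.
const fakeResults = [
  [{ transcript: " hello there", confidence: 0.9 }],
  [{ transcript: " this is a test ", confidence: 0.8 }]
];
console.log(latestSentence(fakeResults)); // This is a test.
```

Only the last entry is read, which is why each call to onresult appends exactly one new line to the text area.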

MICROPHONE PERMISSION

Take note, the user needs to grant microphone permission for this to work. If the user denies permission, the only way to fix it is to enable it manually – In most browsers, click on the icon beside the URL and allow “microphone”.
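A denied permission is reported to the error handler with a code in e.error, so the message can be made more specific than a single catch-all alert. A possible sketch follows; describeSpeechError is a hypothetical helper and the wording of the messages is made up, only the error codes come from the API.

```javascript
// Hypothetical helper: turn the `error` code of a speech recognition error
// event into a readable message. The message texts are made up for this sketch.
function describeSpeechError (code) {
  switch (code) {
    case "not-allowed":
    case "service-not-allowed":
      return "Microphone permission was denied. Allow it in the browser's site settings.";
    case "audio-capture":
      return "No microphone was found. Check that one is attached.";
    case "no-speech":
      return "No speech was detected. Try speaking again.";
    case "network":
      return "A network error stopped the recognition.";
    default:
      return "Speech recognition failed (" + code + ").";
  }
}

console.log(describeSpeechError("not-allowed"));
```

In (B4), this could be used as alert(describeSpeechError(e.error)) in place of the fixed message.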

THE END – TRANSCRIBE RESTRICTION

That’s all for this short tutorial and sharing. Just a few “small notes” to end this one:

The accuracy of the transcription depends on the browser and platform. That said, you can change the language in (B2) and even tweak the output in (B3).
A small worry I have is with e.results in (B3) let said = e.results[e.results.length-1][0].transcript. With continuous set to true, this list seems to keep growing for as long as the recognition runs, and can be a potential problem if you transcribe for hours.
You may want to stop at a certain limit and save the results to persistent storage – if (e.results.length==100) { STOP TRANSCRIBE & SAVE }.
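One way the "stop at a limit and save" idea could look is sketched below. attachLimit, the save callback, and the stand-in recognition object are all hypothetical; they are not part of the tutorial's code, and where the text ends up is left to whatever save does.

```javascript
// Hypothetical helper: sets an onresult handler that collects each new line,
// then stops recognition and hands the text to `save` once the results list
// reaches `limit` entries.
function attachLimit (sr, limit, save) {
  const lines = [];
  sr.onresult = e => {
    lines.push(e.results[e.results.length - 1][0].transcript.trim());
    if (e.results.length >= limit) {
      sr.stop();                         // end this recognition session
      save(lines.splice(0).join("\n"));  // hand over the text, empty the buffer
    }
  };
}

// Stand-in recognition object so the sketch can run outside of a browser.
const fakeSr = { stopped: false, stop () { this.stopped = true; } };
const saved = [];
attachLimit(fakeSr, 2, text => saved.push(text));
fakeSr.onresult({ results: [[{ transcript: "first line" }]] });
fakeSr.onresult({ results: [[{ transcript: "first line" }], [{ transcript: "second line" }]] });
console.log(saved);
```

In a real page, save could be something like text => localStorage.setItem("transcript", text), and recognition could be started again afterward, which should begin with a fresh results list.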

If you want to save the transcribed text somewhere, check out my tutorial on storing data in JavaScript.
Lastly, there is seemingly no way to feed an audio file directly into the Speech Recognition API. As a workaround, play the audio file and put your microphone beside the speaker.