Extract Text From Image In HTML Javascript (With OCR)

Can Javascript extract text from an image, directly in the browser, without crazy AI stuff? Yes, it can be done with Optical Character Recognition (OCR), a technology that existed long before the ChatGPT that we know today. In this tutorial, let us walk through 2 simple examples of using OCR with Javascript. Let’s go.

CODE DOWNLOAD

On GitHub
As a zip
Or just git clone https://github.com/dev-n-coffee/hml-js-image-to-text.git

I have released this under the MIT license, feel free to use it in your own project – Personal or commercial. Some form of credits will be nice though. 🙂

EXAMPLE 1) IMAGE TO TEXT

1A) THE HTML

1-img-txt.html

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Image To Text Converter</title>
  <link rel="stylesheet" href="0-dummy.css">
  <script defer src="https://cdnjs.cloudflare.com/ajax/libs/tesseract.js/6.0.0/tesseract.min.js"></script>
  <script defer src="1-img-txt.js"></script>
</head>
<body>
  <input type="file" accept="image/*" id="fileIn">
  <textarea id="textOut"></textarea>
</body>
</html>

<input type="file" accept="image/*" id="fileIn"> File picker to select an image.
<textarea id="textOut"> To output the text.

The OCR library that we will be using is Tesseract JS.
- You can download and host it on your own server.
- Or load Tesseract JS from a CDN.

P.S. The CSS is not important, just some cosmetics.

1B) THE JAVASCRIPT

1-img-txt.js

window.addEventListener("load", async () => {
  // (A) INIT - GET FILE PICKER, TEXT AREA, CREATE TESSERACT WORKER
  var fileIn = document.getElementById("fileIn"),
      textOut = document.getElementById("textOut"),
      worker = await Tesseract.createWorker("eng");

  // (B) EXTRACT TEXT FROM SELECTED IMAGE
  fileIn.onchange = async () => {
    // (B1) TESSERACT IMAGE TO TEXT
    let img = fileIn.files[0], // selected image file
        res = await worker.recognize(img); // use tesseract to extract text

    // (B2) OUTPUT TEXT
    textOut.value = res.data.text; // put text into text area
    navigator.clipboard.writeText(res.data.text); // optional - put into clipboard
  };
});

First, wait for the page and library to load.
(A) Get the file picker, text box, and create an English Tesseract worker. Yes, Tesseract works with other languages too. Just check out their documentation for the entire list.
(B) When the user selects an image file…
- (B1) Get the selected image file, pass it to Tesseract to convert the image to text.
- (B2) Put the extracted text into the text box. Optionally, we can also copy it to the clipboard.
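
One small gap in the steps above: in some browsers, cancelling the file picker can still fire the change event with an empty selection, and the accept attribute is only a hint, so the user may still choose a non-image file. Here is a minimal sketch of a guard. Take note that pickImage() is a made-up helper name for this example, it is not part of Tesseract JS.

```javascript
// Hypothetical helper (not part of Tesseract JS).
// Returns the first selected file if it looks like an image, otherwise null.
function pickImage (files) {
  // nothing selected - the picker was cancelled or emptied
  const file = files && files.length > 0 ? files[0] : null;
  if (file === null) { return null; }

  // not an image - check the MIME type reported by the browser
  if (typeof file.type !== "string" || !file.type.startsWith("image/")) {
    return null;
  }
  return file;
}

// Possible usage inside section (B) of 1-img-txt.js:
// fileIn.onchange = async () => {
//   const img = pickImage(fileIn.files);
//   if (img === null) { return; }
//   const res = await worker.recognize(img);
//   textOut.value = res.data.text;
// };
```

The helper only looks at the selected files, so it does not touch the page or the Tesseract worker. It is safe to drop in front of worker.recognize() without changing the rest of the script.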

EXAMPLE 2) WEBCAM TO TEXT

2A) THE HTML

2-cam-txt.html

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Webcam To Text Converter</title>
  <link rel="stylesheet" href="0-dummy.css">
  <script defer src="https://cdnjs.cloudflare.com/ajax/libs/tesseract.js/6.0.0/tesseract.min.js"></script>
  <script defer src="2-cam-txt.js"></script>
</head>
<body>
  <video id="camFeed" autoplay></video>
  <button id="camGo">Read Text</button>
  <textarea id="textOut"></textarea>
</body>
</html>

Pretty much the same as above, except:

<video id="camFeed" autoplay> To display the webcam live feed.
<button id="camGo"> Click to extract text from the current webcam frame.
<textarea id="textOut"> Same as before, output the text here.

2B) THE JAVASCRIPT

2-cam-txt.js

window.addEventListener("load", async () => {
  // (A) INIT - GET VIDEO, BUTTON, TEXT AREA, CREATE TESSERACT WORKER
  var camFeed = document.getElementById("camFeed"),
      camGo = document.getElementById("camGo"),
      textOut = document.getElementById("textOut"),
      worker = await Tesseract.createWorker("eng");

  // (B) GET WEBCAM ACCESS
  navigator.mediaDevices.getUserMedia({ video: true })
  .then(async (stream) => {
    // (B1) PUT WEBCAM LIVE STREAM INTO <VIDEO>
    camFeed.srcObject = stream;

    // (B2) "READ TEXT" BUTTON ACTION
    camGo.onclick = async () => {
      // (B2-1) CREATE AN EMPTY <CANVAS>
      let canvas = document.createElement("canvas");
      canvas.width = camFeed.videoWidth;
      canvas.height = camFeed.videoHeight;

      // (B2-2) CAPTURE CURRENT WEBCAM FRAME ONTO <CANVAS>
      canvas.getContext("2d").drawImage(
        camFeed, 0, 0, camFeed.videoWidth, camFeed.videoHeight
      );

      // (B2-3) PASS THE CURRENT WEBCAM FRAME TO TESSERACT
      let img = canvas.toDataURL("image/png"), // current webcam frame to png image
          res = await worker.recognize(img); // use tesseract to extract text

      // (B2-4) OUTPUT TEXT
      textOut.value = res.data.text; // put text into text area
      navigator.clipboard.writeText(res.data.text); // optional - put into clipboard
    };
  })
  .catch(err => console.error(err));
});

Keep calm and let’s walk through this part-by-part:

(A) The usual – Get HTML video, button, text box. Create an English Tesseract worker.

(B) Get access permission to the webcam.
(B1) On obtaining permission, we patch the webcam live feed into the <video> tag.
(B2) When the user clicks on the “Read Text” button, read the text from the current frame. We cannot send the live webcam feed directly into Tesseract, so it takes a few steps to grab a still image first.
- (B2-1) Create an empty <canvas> first.
- (B2-2) Capture the current webcam frame onto the <canvas>.
- (B2-3) Convert the <canvas> into a PNG image, send it to Tesseract.
- (B2-4) Output the extracted text.
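
A possible refinement to (B2-1) and (B2-2): until the webcam delivers its first frame, videoWidth and videoHeight are both 0, so clicking “Read Text” too early will capture an empty canvas. The sketch below wraps the capture into a function that refuses to run in that case. grabFrame() is a made-up name for this example, not something from the original script or from Tesseract JS.

```javascript
// Hypothetical helper: copy the current frame of a <video> onto a <canvas>.
// Returns false (and draws nothing) when the video has no frame yet.
function grabFrame (video, canvas) {
  const w = video.videoWidth, h = video.videoHeight;
  if (!w || !h) { return false; } // webcam not ready, nothing to capture

  // resize the canvas to match the video, then draw the current frame
  canvas.width = w;
  canvas.height = h;
  canvas.getContext("2d").drawImage(video, 0, 0, w, h);
  return true;
}

// Possible usage inside (B2) of 2-cam-txt.js:
// camGo.onclick = async () => {
//   const canvas = document.createElement("canvas");
//   if (!grabFrame(camFeed, canvas)) { return; }
//   const res = await worker.recognize(canvas.toDataURL("image/png"));
//   textOut.value = res.data.text;
// };
```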

THE END

That’s all for this tutorial, but before we end, take note of the limitations. OCR is not a “hardcore AI”. It cannot read complicated images and automatically filter out the “irrelevant bits”. You will have to feed a “clean text image” to Tesseract if you want good results.
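
As a rough idea of what “cleaning” an image can look like, here is a minimal sketch that turns the pixels of a <canvas> into pure black and white before handing it to Tesseract. This is only one possible approach and it is not part of the examples above. binarize() and the default threshold of 128 are assumptions for illustration, and whether it actually improves the results depends on your images.

```javascript
// Hypothetical helper: turn RGBA pixels into pure black or white, in place.
// pixels    - the .data of an ImageData (4 bytes per pixel: R, G, B, A)
// threshold - assumed cut-off from 0 to 255, tune it for your own images
function binarize (pixels, threshold = 128) {
  for (let i = 0; i < pixels.length; i += 4) {
    // approximate brightness using the common Rec. 601 luma weights
    const luma = 0.299 * pixels[i] + 0.587 * pixels[i + 1] + 0.114 * pixels[i + 2];
    const value = luma >= threshold ? 255 : 0;
    pixels[i] = value;     // red
    pixels[i + 1] = value; // green
    pixels[i + 2] = value; // blue, alpha at i + 3 is left untouched
  }
  return pixels;
}

// Possible usage with a <canvas> that already holds the image:
// const ctx = canvas.getContext("2d");
// const frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
// binarize(frame.data);
// ctx.putImageData(frame, 0, 0);
```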