Can JavaScript extract text from an image, directly in the browser and without any crazy AI stuff? Yes, it can be done with Optical Character Recognition (OCR), a technology that existed long before the ChatGPT that we know today. In this tutorial, let us walk through 2 simple examples of using OCR with JavaScript. Let’s go.
CODE DOWNLOAD
I have released this under the MIT license, feel free to use it in your own project – Personal or commercial. Some form of credits will be nice though. 🙂
EXAMPLE 1) IMAGE TO TEXT
1A) THE HTML
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Image To Text Converter</title>
    <link rel="stylesheet" href="0-dummy.css">
    <script defer src="https://cdnjs.cloudflare.com/ajax/libs/tesseract.js/6.0.0/tesseract.min.js"></script>
    <script defer src="1-img-txt.js"></script>
  </head>
  <body>
    <input type="file" accept="image/*" id="fileIn">
    <textarea id="textOut"></textarea>
  </body>
</html>
- <input type="file" accept="image/*" id="fileIn"> File picker to select an image.
- <textarea id="textOut"> To output the text.
- The OCR library that we will be using is Tesseract JS.
  - You can download and host it on your own server.
  - Or load Tesseract JS from a CDN.
P.S. The CSS is not important, just some cosmetics.
1B) THE JAVASCRIPT
window.addEventListener("load", async () => {
  // (A) INIT - GET FILE PICKER, TEXT AREA, CREATE TESSERACT WORKER
  var fileIn = document.getElementById("fileIn"),
      textOut = document.getElementById("textOut"),
      worker = await Tesseract.createWorker("eng");

  // (B) EXTRACT TEXT FROM SELECTED IMAGE
  fileIn.onchange = async () => {
    // (B1) TESSERACT IMAGE TO TEXT
    let img = fileIn.files[0]; // selected image file
    if (!img) { return; } // nothing selected, nothing to do
    let res = await worker.recognize(img); // use tesseract to extract text

    // (B2) OUTPUT TEXT
    textOut.value = res.data.text; // put text into text area
    navigator.clipboard.writeText(res.data.text) // optional - put into clipboard
      .catch(err => console.error(err)); // clipboard access may be denied
  };
});
- First, wait for the page and library to load.
- (A) Get the file picker, text box, and create an English Tesseract worker. Yes, Tesseract works with other languages too. Just check out their documentation for the entire list.
- (B) When the user selects an image file…
- (B1) Get the selected image file and pass it to Tesseract to convert the image to text.
- (B2) Put the extracted text into the text box. Optionally, copy it to the clipboard as well.
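The accept="image/*" attribute on the file picker is only a hint to the browser, so it may be worth checking the selected file before handing it to Tesseract. Below is a minimal sketch of such a check. Take note that isImageFile() is a made-up helper name for this example, it is not part of the original code nor Tesseract JS.

```javascript
// Hypothetical helper (not part of Tesseract JS) - returns true only
// when the given File object reports an image MIME type.
function isImageFile (file) {
  return Boolean(file)
    && typeof file.type === "string"
    && file.type.startsWith("image/");
}
```

In section (B1), this could be used as if (!isImageFile(img)) { return; } right before calling worker.recognize(img).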
EXAMPLE 2) WEBCAM TO TEXT
2A) THE HTML
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Webcam To Text Converter</title>
    <link rel="stylesheet" href="0-dummy.css">
    <script defer src="https://cdnjs.cloudflare.com/ajax/libs/tesseract.js/6.0.0/tesseract.min.js"></script>
    <script defer src="2-cam-txt.js"></script>
  </head>
  <body>
    <video id="camFeed" autoplay></video>
    <button id="camGo">Read Text</button>
    <textarea id="textOut"></textarea>
  </body>
</html>
Pretty much the same as above, except:
- <video id="camFeed" autoplay> To display the webcam live feed.
- <button id="camGo"> Click to extract text from the current webcam frame.
- <textarea id="textOut"> Same as before, the extracted text goes here.
2B) THE JAVASCRIPT
window.addEventListener("load", async () => {
  // (A) INIT - GET VIDEO, BUTTON, TEXT AREA, CREATE TESSERACT WORKER
  var camFeed = document.getElementById("camFeed"),
      camGo = document.getElementById("camGo"),
      textOut = document.getElementById("textOut"),
      worker = await Tesseract.createWorker("eng");

  // (B) GET WEBCAM ACCESS
  navigator.mediaDevices.getUserMedia({ video: true })
  .then(stream => {
    // (B1) PUT WEBCAM LIVE STREAM INTO <VIDEO>
    camFeed.srcObject = stream;

    // (B2) "READ TEXT" BUTTON ACTION
    camGo.onclick = async () => {
      // (B2-1) CREATE AN EMPTY <CANVAS>
      if (camFeed.videoWidth == 0) { return; } // webcam feed not ready yet
      let canvas = document.createElement("canvas");
      canvas.width = camFeed.videoWidth;
      canvas.height = camFeed.videoHeight;

      // (B2-2) CAPTURE CURRENT WEBCAM FRAME ONTO <CANVAS>
      canvas.getContext("2d").drawImage(
        camFeed, 0, 0, camFeed.videoWidth, camFeed.videoHeight
      );

      // (B2-3) PASS THE CURRENT WEBCAM FRAME TO TESSERACT
      let img = canvas.toDataURL("image/png"), // current webcam frame to png image
          res = await worker.recognize(img); // use tesseract to extract text

      // (B2-4) OUTPUT TEXT
      textOut.value = res.data.text; // put text into text area
      navigator.clipboard.writeText(res.data.text) // optional - put into clipboard
        .catch(err => console.error(err)); // clipboard access may be denied
    };
  })
  .catch(err => console.error(err));
});
Keep calm and let’s walk through this part-by-part:
- (A) The usual – Get HTML video, button, text box. Create an English Tesseract worker.
- (B) Get access permission to the webcam.
- (B1) On obtaining permission, we feed the webcam live stream into the <video> tag.
- (B2) When the user clicks on the “Read Text” button, we cannot directly send the webcam feed into Tesseract, so a few extra steps are needed:
  - (B2-1) Create an empty <canvas> first.
  - (B2-2) Capture the current webcam frame onto the <canvas>.
  - (B2-3) Convert the <canvas> into a PNG image, send it to Tesseract.
  - (B2-4) Output the extracted text.
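A webcam frame usually contains a lot of background around the text. As an optional variation on steps (B2-1) and (B2-2), we can copy only the middle part of the frame onto the canvas. Here is a rough sketch of the math. Take note that centerCrop() and its fraction parameter are made up for this example, they are not part of the original code nor Tesseract JS.

```javascript
// Hypothetical helper - work out a centered crop rectangle that keeps
// the given fraction (0 to 1) of the frame's width and height.
function centerCrop (width, height, fraction) {
  const w = Math.round(width * fraction),
        h = Math.round(height * fraction);
  return {
    x: Math.round((width - w) / 2), // left edge of the crop
    y: Math.round((height - h) / 2), // top edge of the crop
    w: w, h: h
  };
}

// Example: keep the middle 50% of a 640 x 480 frame
const crop = centerCrop(640, 480, 0.5);
// crop is { x: 160, y: 120, w: 320, h: 240 }
```

To use it, set canvas.width to crop.w and canvas.height to crop.h, then call drawImage(camFeed, crop.x, crop.y, crop.w, crop.h, 0, 0, crop.w, crop.h) instead of the shorter form in (B2-2). Whether this actually helps will depend on where the text sits in your webcam frame.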
THE END
That’s all for this tutorial, but before we end, take note of the limitations. OCR is not “hardcore AI”. It cannot read complicated images and automatically filter out the “irrelevant bits”. You will have to feed a “clean text image” to Tesseract if you want good results.
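One possible way to get a “cleaner” image is to turn it into pure black and white before the OCR step. Below is a minimal sketch that works on raw canvas pixel data. Take note that toBlackAndWhite() and the default threshold of 128 are assumptions for this example, not something from Tesseract JS. Tesseract also does some of its own image processing, so treat this as a starting point for experiments, not a guaranteed improvement.

```javascript
// Hypothetical helper - turn RGBA pixel data into pure black and white.
// "pixels" is a flat array of R, G, B, A values, as found in ImageData.data.
// "threshold" (0 to 255) is an assumed cut-off, tune it for your own images.
function toBlackAndWhite (pixels, threshold = 128) {
  for (let i = 0; i < pixels.length; i += 4) {
    // weighted sum of red, green, blue as a rough brightness value
    const gray = 0.299 * pixels[i] + 0.587 * pixels[i + 1] + 0.114 * pixels[i + 2];
    const value = gray >= threshold ? 255 : 0; // bright becomes white, dark becomes black
    pixels[i] = pixels[i + 1] = pixels[i + 2] = value; // alpha is left untouched
  }
  return pixels;
}
```

In the webcam example, this could go between (B2-2) and (B2-3): read the pixels with getImageData(0, 0, canvas.width, canvas.height) on the canvas 2D context, pass the data property of the result into toBlackAndWhite(), then write the result back with putImageData() before converting the canvas to PNG.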