Your First OCR Project with Tesseract and Python

The first time I ever used the Tesseract optical character recognition (OCR) engine was in my college undergraduate years.

I was taking my first course on computer vision. Our professor wanted us to research a challenging computer vision topic for our final project, extend existing research, and then write a formal paper on our work. I had trouble deciding on a project, so I went to see the professor, a Navy researcher who often worked on medical applications of computer vision and machine learning. He advised me to work on automatic prescription pill identification, the process of automatically recognizing prescription pills in an image. I considered the problem for a few moments and then replied:

Couldn’t you just OCR the imprints on the pill to recognize it?

To learn how to conduct OCR on your first project, just keep reading.

I still remember the look on my professor’s face.

He smiled, a small smirk appearing on the left corner of his mouth. Knowing the problems I was going to encounter, he replied with “If only it were that simple. But you’ll find out soon enough.”

I then went home and immediately started playing with the Tesseract library, reading the manual/documentation, and attempting to OCR some example images via the command line. But I found myself struggling. Some images were being OCR’d correctly, while others were returning complete nonsense.

Why was OCR so hard? And why was I struggling so much?

I stayed up late into the night, testing Tesseract on various images. For the life of me, I couldn’t discern the pattern between the images that Tesseract could correctly OCR and the ones it would fail on. What black magic was going on here?!

Unfortunately, this is the same feeling I see many computer vision practitioners having when first starting to learn OCR — perhaps you have even felt it yourself:

You install Tesseract on your machine
You follow a few basic examples on a tutorial you found via a Google search
The examples return the correct results
… but when you apply the same OCR technique to your images, you get incorrect results back

Sound familiar?

The problem is that these tutorials don’t teach OCR systematically. They’ll show you the how, but they won’t show you the why: the critical piece of information that lets you discern patterns in OCR problems and solve them correctly.

In this tutorial, you’ll be building your very first OCR project. It will serve as the “bare bones” Python script you need to perform OCR. In future posts, we’ll build on what you learn here.

By the end of this tutorial, you’ll be confident in your ability to apply OCR to your projects.

Let’s get started.

Learning Objectives

In this tutorial, you will:

Gain hands-on experience using Tesseract to OCR an image
Learn how to import the pytesseract package into your Python scripts
Use OpenCV to load an input image from disk
Pass the image into the Tesseract OCR engine via the pytesseract library
Display the OCR’d text results on our terminal

Configuring your development environment

To follow this guide, you need to have the OpenCV library installed on your system, along with the Tesseract OCR engine and the pytesseract package that our script imports.

Luckily, OpenCV is pip-installable:

$ pip install opencv-contrib-python

If you need help configuring your development environment for OpenCV, I highly recommend that you read my pip install OpenCV guide — it will have you up and running in a matter of minutes.
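Before moving on, it can help to confirm that all three pieces are visible to Python. The following is a minimal sketch that uses only the standard library; the check_ocr_environment helper is a hypothetical name of my own, not part of OpenCV or pytesseract, and it assumes the Tesseract executable is named tesseract on your PATH:

```python
import importlib.util
import shutil


def check_ocr_environment():
    """Report which pieces of the OCR toolchain can be found."""
    return {
        # opencv-contrib-python is imported under the name "cv2"
        "cv2": importlib.util.find_spec("cv2") is not None,
        # the Python package that talks to Tesseract
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # the Tesseract executable itself (assumed to be on the PATH)
        "tesseract": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, found in check_ocr_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If any entry is reported as missing, fix that installation before running the script from this tutorial.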

Getting Started with Tesseract

In the first part of this tutorial, we’ll review our directory structure for this project. From there, we’ll implement a simple Python script that will:

Load an input image from disk via OpenCV
OCR the image via Tesseract and pytesseract
Display the OCR’d text on our screen

We’ll wrap up the tutorial with a discussion of the OCR’d text results.

Project Structure

|-- pyimagesearch_address.png
|-- steve_jobs.png
|-- whole_foods.png
|-- first_ocr.py

Our first project is very straightforward in the way it is organized. Inside the tutorial’s code directory, you’ll find three example PNG images for OCR testing and a single Python script named first_ocr.py.

Let’s dive right into our Python script in the next section.

Basic OCR with Tesseract

Let’s get started with your very first Tesseract OCR project! Open a new file, name it first_ocr.py, and insert the following code:

# import the necessary packages
import pytesseract
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to input image to be OCR'd")
args = vars(ap.parse_args())

The first Python import you’ll notice in this script is pytesseract (Python Tesseract), a Python wrapper that ties in directly with the Tesseract OCR application installed on your system. The power of pytesseract is that it lets us interface with Tesseract from Python rather than relying on ugly os.system calls, as we needed to do before pytesseract existed. Thanks to its power and ease of use, we’ll use pytesseract in this and future tutorials!
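To see what pytesseract saves us from, here is a rough sketch of calling the tesseract command line tool by hand. It assumes the tesseract executable is on your PATH. The two helper names are hypothetical and exist only for illustration. Passing stdout as the output base tells the tesseract command to print the recognized text rather than write it to a file.

```python
import subprocess


def build_tesseract_command(image_path):
    # "stdout" as the output base makes the tesseract CLI print the
    # recognized text instead of writing it to a .txt file
    return ["tesseract", image_path, "stdout"]


def ocr_via_cli(image_path):
    # run the external program and capture whatever it prints;
    # check=True raises an error if tesseract exits with a failure
    result = subprocess.run(build_tesseract_command(image_path),
        capture_output=True, text=True, check=True)
    return result.stdout
```

Every call site would have to manage the command, its output, and its errors in this way, which is the bookkeeping a single pytesseract function call hides.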

Our script requires a single command line argument, which we parse with Python’s argparse module. When you provide the --image argument and an image file path in your terminal, the script loads the image of your choosing. I’ve provided three example images in the project directory for this tutorial that you can use. I also highly encourage you to use this example script to OCR your own images with Tesseract!
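If argparse is new to you, the short sketch below shows what the parsed result looks like. It reuses the parser from our script but passes the arguments as a list, which simulates typing them in the terminal:

```python
import argparse

# the same parser definition used in first_ocr.py
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to input image to be OCR'd")

# the list stands in for: python first_ocr.py --image steve_jobs.png
args = vars(ap.parse_args(["--image", "steve_jobs.png"]))
print(args["image"])  # steve_jobs.png
```

Because vars turns the parsed namespace into a plain dictionary, the rest of the script can look up the path with args["image"].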

Now that we’ve handled our imports and lone command line argument, let’s get to the fun part — OCR with Python:

# load the input image and convert it from BGR to RGB channel
# ordering
image = cv2.imread(args["image"])
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# use Tesseract to OCR the image
text = pytesseract.image_to_string(image)
print(text)

Here, Lines 14 and 15 (counting from the top of first_ocr.py) load our input --image from disk and swap the color channel ordering. Tesseract, as called through pytesseract, expects images in RGB order; however, OpenCV loads images in BGR order. This isn’t a problem because we can fix it with OpenCV’s cv2.cvtColor call. Just be especially careful to know when to use RGB (Red Green Blue) vs. BGR (Blue Green Red).
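For a three-channel image, the BGR to RGB conversion amounts to reversing the order of the channels. The sketch below illustrates this with NumPy alone (NumPy is installed as a dependency of OpenCV), using a single hand-made pixel instead of a real image:

```python
import numpy as np

# a single pure-blue pixel as OpenCV would store it: (B, G, R)
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

# reversing the last axis reorders the channels to (R, G, B)
rgb = bgr[:, :, ::-1]

print(bgr[0, 0].tolist())  # [255, 0, 0]
print(rgb[0, 0].tolist())  # [0, 0, 255]
```

The pixel values are untouched; only the position of each channel changes, which is why forgetting the conversion leaves red and blue swapped.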

Remark 1. I’d also like to point out that many of the Tesseract examples you see online use PIL or Pillow to load an image. Those packages load images in RGB order, so a conversion step is not required.
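As a minimal sketch of that alternative, assuming the Pillow and pytesseract packages are installed, loading the image with Pillow might look like the following. The ocr_with_pillow name is hypothetical:

```python
def ocr_with_pillow(image_path):
    # imported inside the function so it can be defined even on a
    # machine where the OCR packages are not installed yet
    from PIL import Image
    import pytesseract

    # convert("RGB") normalizes palette, grayscale, or RGBA files
    # to three RGB channels before handing the image to Tesseract
    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image.convert("RGB"))
```

Note that there is no cv2.cvtColor step here, since the image never passes through OpenCV.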

Finally, Line 18 performs OCR on our input RGB image and returns the results as a string stored in the text variable.

Given that text is now a string, we can pass it onto Python’s built-in print function and see the result in our terminal (Line 19). Future examples will explain how to annotate an input image with the text itself (i.e., overlay the text result on a copy of the input --image using OpenCV, and display it on your screen).

We’re done!

Wait, for real?

Oh yeah, if you didn’t notice, OCR with pytesseract is as easy as a single function call, provided you’ve loaded the image in proper RGB order. So now, let’s check the results and see if they meet our expectations.

Tesseract OCR Results

Let’s put our newly implemented Tesseract OCR script to the test. Open your terminal, and execute the following command:

$ python first_ocr.py --image pyimagesearch_address.png
PyImageSearch
PO Box 17598 #17900
Baltimore, MD 21297

In Figure 2, you can see our input image, which contains the address for PyImageSearch on a gray, slightly textured background. As the command and terminal output indicate, Tesseract and pytesseract correctly OCR’d the text.

**Figure 2.** A slightly noisy image of PyImageSearch’s business address.

Let’s try another image, this one of Steve Jobs’ old business card:

$ python first_ocr.py --image steve_jobs.png
Steven P. Jobs
Chairman of the Board

Apple Computer, Inc.

20525 Mariani Avenue, MS: 3K
Cupertino, California 95014
408 973-2121 or 996-1010.

Steve Jobs’ business card in Figure 3 is correctly OCR’d even though the input image poses several difficulties common to OCR’ing scanned documents, including:

Yellowing of the paper due to age
Noise on the image, including speckling
Text that is starting to fade

**Figure 3.** Steve Jobs’ Apple Computer Inc. business card containing the company address. Image source: http://pyimg.co/sjbiz.

Despite all these challenges, Tesseract was able to correctly OCR the business card. But that raises the question: is OCR this simple? Do we just open a Python shell, import the pytesseract package, and then call image_to_string on an input image? Unfortunately, OCR isn’t that simple (if it were, this tutorial would be unnecessary). As an example, let’s apply our same first_ocr.py script to a more challenging photo of a Whole Foods receipt:

$ python first_ocr.py --image whole_foods.png
aie WESTPORT CT 06880

yHOLE FOODS MARKE
399 post RD WEST ~ ;

903) 227-6858

BACON LS NP

365
pacon LS N

The Whole Foods grocery store receipt in Figure 4 was not OCR’d correctly using Tesseract. You can see that Tesseract has spit out a bunch of garbled nonsense. OCR isn’t always perfect.

**Figure 4.** A Whole Foods grocery store receipt. This image presents an interesting challenge for Tesseract.

Summary

In this tutorial, you created your very first OCR project using the Tesseract OCR engine, the pytesseract package (used to interact with the Tesseract OCR engine), and the OpenCV library (used to load an input image from disk).

We then applied our basic OCR script to three example images. Our basic OCR script worked for the first two but struggled tremendously for the final one. So what gives? Why was Tesseract able to OCR the first two examples perfectly but then utterly fail on the third image? The secret lies in the image pre-processing steps, along with the underlying Tesseract modes and options.
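To give a feel for what those two ingredients look like in code, here is a hedged sketch that combines a common pre-processing step (grayscale conversion followed by Otsu thresholding) with Tesseract’s --psm and --oem options. Both helper names are hypothetical, the chosen values are illustrative assumptions, and nothing here is claimed to fix the Whole Foods receipt:

```python
def build_tesseract_config(psm=6, oem=3):
    # --psm selects the page segmentation mode (6 assumes a single
    # uniform block of text); --oem selects the OCR engine mode
    # (3 is Tesseract's default)
    return f"--oem {oem} --psm {psm}"


def ocr_with_preprocessing(image_path, psm=6):
    # imported inside the function so the config helper above can be
    # used even where OpenCV and pytesseract are not installed
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)

    # grayscale, then let Otsu's method choose the threshold
    # automatically (the 0 passed as the threshold is ignored)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 0, 255,
        cv2.THRESH_BINARY | cv2.THRESH_OTSU)

    # a single-channel image needs no BGR to RGB conversion
    return pytesseract.image_to_string(thresh,
        config=build_tesseract_config(psm=psm))
```

Which pre-processing steps and which modes suit a given image depends heavily on the image itself, which is exactly why a one-size-fits-all script struggles.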

Congrats on completing today’s tutorial, well done!
