Installing Tesseract, PyTesseract, and Python OCR packages on your system

In this tutorial, we will configure our development environment for OCR. Once your machine is configured, we’ll start writing Python code to perform OCR, paving the way for you to develop your own OCR applications.

To learn how to configure your development environment, just keep reading.

Learning Objectives

In this tutorial, you will:

Learn how to install the Tesseract OCR engine on your machine
Learn how to create a Python virtual environment (a best practice in Python development)
Install the Python packages you need to run the examples in this tutorial (and to develop OCR projects of your own)

OCR Development Environment Configuration

In the first part of this tutorial, you will learn how to install the Tesseract OCR engine on your system. From there, you’ll learn how to create a Python virtual environment and then install OpenCV, PyTesseract, and all the other necessary Python libraries you’ll need for OCR, computer vision, and deep learning.

A Note on Install Instructions

The Tesseract OCR engine has existed for over 30 years, and its install instructions are fairly stable. I have therefore included the steps in this tutorial.

With that said, let’s install the Tesseract OCR engine on your system!

Installing Tesseract

In this section, you will learn how to install Tesseract on macOS, Ubuntu, and Windows.

Installing Tesseract on macOS

Installing the Tesseract OCR engine on macOS is quite simple if you use the Homebrew package manager.

If Homebrew is not already installed on your system, install it first by following the instructions on the Homebrew website.

From there, all you need to do is use the brew command to install Tesseract:

 $ brew install tesseract

Provided that the above command does not exit with an error, you should now have Tesseract installed on your macOS machine.

Installing Tesseract on Ubuntu

Installing Tesseract on Ubuntu 18.04 is easy. All we need to do is use apt:

 $ sudo apt install tesseract-ocr

The apt package manager will automatically install any prerequisite libraries or packages required for Tesseract.

Installing Tesseract on Windows

Please note that the PyImageSearch team and I do not officially support Windows, except for customers who use our pre-configured Jupyter/Colab Notebooks, which you can find at PyImageSearch University. These notebooks run on all environments, including macOS, Linux, and Windows.

We instead recommend using a Unix-based machine such as Linux/Ubuntu or macOS, both of which are better suited for developing computer vision, deep learning, and OCR projects.

That said, if you wish to install Tesseract on Windows, we recommend that you follow the official Windows install instructions put together by the Tesseract team.

Verifying Your Tesseract Install

Provided that you were able to install Tesseract on your operating system, you can verify that Tesseract is installed by using the tesseract command:

 $ tesseract -v
 tesseract 4.1.1
  leptonica-1.79.0
   libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.1.0 : libopenjp2 2.3.1
  Found AVX2
  Found AVX
  Found FMA
  Found SSE

Your output should look similar to mine.
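
If you would rather run this check from Python, the sketch below shows one way to do it using only the standard library. The function names are hypothetical and not part of Tesseract or PyTesseract. The sketch assumes the first line of the output looks like the "tesseract 4.1.1" line above, optionally with a leading "v" before the version number.

```python
import re
import shutil
import subprocess


def parse_tesseract_version(output):
    """Return the version string from `tesseract -v` output, or None."""
    # Assumption: the version line looks like "tesseract 4.1.1",
    # possibly with a leading "v" before the number.
    match = re.search(r"^tesseract\s+v?(\S+)", output, flags=re.MULTILINE)
    return match.group(1) if match else None


def installed_tesseract_version(command="tesseract"):
    """Return the installed Tesseract version, or None if it is not on PATH."""
    executable = shutil.which(command)
    if executable is None:
        return None
    # Capture both streams in case a build prints its version to stderr.
    result = subprocess.run(
        [executable, "-v"], capture_output=True, text=True, check=False
    )
    return parse_tesseract_version(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    version = installed_tesseract_version()
    print(version or "Tesseract was not found on your PATH")
```

A result of None most likely means the tesseract executable is not on your PATH, in which case PyTesseract will probably not be able to find it either.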

Creating a Python Virtual Environment for OCR

Python virtual environments are a best practice for Python development, and we recommend using them to have more reliable development environments.

Our pip Install OpenCV tutorial explains how to install the packages needed for Python virtual environments and how to create your first one. We recommend following that tutorial before continuing.
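
For readers who want to see what creating an environment involves, here is a minimal sketch that uses Python's built-in venv module. This is an assumption on my part: it is an alternative to the tooling covered in the linked tutorial, not a replacement for it. The helper names are hypothetical.

```python
import sys
import venv
from pathlib import Path


def create_ocr_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its path."""
    env_path = Path(env_dir).expanduser()
    # venv.create builds the directory layout and, when with_pip is True,
    # bootstraps pip into the new environment.
    venv.create(env_path, with_pip=with_pip)
    return env_path


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside an environment created by venv, sys.prefix points at the
    # environment while sys.base_prefix points at the base installation.
    return sys.prefix != sys.base_prefix
```

For example, create_ocr_env("~/.virtualenvs/ocr") would create an environment in a hypothetical location under your home directory. Once that environment is activated, in_virtual_environment() should return True.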

Installing OpenCV and PyTesseract

Now that you have your Python virtual environment created and ready, we can install both OpenCV and PyTesseract, the Python package that interfaces with the Tesseract OCR engine.

Both of these can be installed using the following commands:

 $ workon <name_of_your_env> # required if using virtual envs
 $ pip install numpy opencv-contrib-python
 $ pip install pytesseract
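
Once the commands above finish, a quick check like the following sketch can confirm that Python can find the new packages. Note that the import name can differ from the pip name: opencv-contrib-python is imported as cv2. The helper name and the REQUIRED_MODULES list are hypothetical.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names in module_names that cannot be found for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Import names, not pip names: opencv-contrib-python is imported as cv2.
REQUIRED_MODULES = ["numpy", "cv2", "pytesseract"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, the most likely cause is that pip was run outside of the virtual environment you intended to use.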

Next, we’ll install other Python packages we’ll need for OCR, computer vision, deep learning, and machine learning.

Installing Other Computer Vision, Deep Learning, and Machine Learning Libraries

Let’s now install some other supporting computer vision and machine learning/deep learning packages that we’ll need throughout the rest of this tutorial:

 $ pip install pillow scipy
 $ pip install scikit-learn scikit-image
 $ pip install imutils matplotlib
 $ pip install requests beautifulsoup4
 $ pip install h5py tensorflow textblob
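
To record which versions pip actually installed, the sketch below queries the standard library's importlib.metadata (available in Python 3.8 and later) using the same distribution names passed to pip in this tutorial. The function name and the OCR_PACKAGES list are hypothetical, and the sketch only reports what is installed. It does not check that the versions are compatible with each other.

```python
from importlib import metadata

# Distribution names exactly as passed to pip in this tutorial.
OCR_PACKAGES = [
    "numpy", "opencv-contrib-python", "pytesseract",
    "pillow", "scipy", "scikit-learn", "scikit-image",
    "imutils", "matplotlib", "requests", "beautifulsoup4",
    "h5py", "tensorflow", "textblob",
]


def installed_versions(packages):
    """Map each pip distribution name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(OCR_PACKAGES).items():
        print(f"{name:24s} {version or 'NOT INSTALLED'}")
```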

Summary

In this tutorial, you learned how to install the Tesseract OCR engine on your machine. You also learned how to install the required Python packages you will need to perform OCR, computer vision, and image processing.

Now that your development environment is configured, we will write OCR code in our next tutorial!
