Tesseract Page Segmentation Modes (PSMs) Explained: How to Improve Your OCR Accuracy
Most introductory Tesseract tutorials will provide you with instructions to install and configure Tesseract on your machine, give one or two examples of how to use the tesseract binary, and then perhaps show how to integrate Tesseract with Python using a library such as pytesseract. The problem with these intro tutorials is that they fail to capture the importance of page segmentation modes (PSMs). Let’s get a bullseye with our OCR.
After going through these guides, a computer vision/deep learning practitioner is given the impression that OCR’ing an image, regardless of how simple or complex it may be, is as simple as opening up a shell, executing the tesseract command, and providing the path to the input image (i.e., no additional options or configurations).
More often than not (and nearly always the case for complex images), Tesseract either:
- Cannot OCR any of the text in the image, returning an empty result
- Attempts to OCR the text, but is wildly incorrect, returning nonsensical results
In fact, that was the case for me when I started playing around with OCR tools back in college. I read one or two tutorials online, skimmed through the documentation, and quickly became frustrated when I couldn’t obtain the correct OCR result. I had absolutely no idea how and when to use different options. I didn’t even know what half the options controlled as the documentation was so sparse and did not provide concrete examples!
The mistake I made, and perhaps one of the biggest mistakes I see budding OCR practitioners make now, is not fully understanding how Tesseract’s page segmentation modes can dramatically influence the accuracy of your OCR output.
When working with the Tesseract OCR engine, you absolutely have to become comfortable with Tesseract’s PSMs — without them, you’re quickly going to become frustrated and will not be able to obtain high OCR accuracy.
Inside this tutorial, you’ll learn all about Tesseract’s 14 page segmentation modes, including:
- What they do
- How to set them
- When to use each of them (thereby ensuring you’re able to correctly OCR your input images)
Let’s dive in!
Learning Objectives
In this tutorial, you will:
- Learn what page segmentation modes (PSMs) are
- Discover how choosing a PSM can be the difference between a correct and incorrect OCR result
- Review the 14 PSMs built into the Tesseract OCR engine
- See examples of each of the 14 PSMs in action
- Discover my tips, suggestions, and best practices when using these PSMs
To learn how to improve your OCR results with PSMs, just keep reading.
Tesseract Page Segmentation Modes
In the first part of this tutorial, we’ll discuss what page segmentation modes (PSMs) are, why they are important, and how they can dramatically impact our OCR accuracy.
From there, we’ll review our project directory structure for this tutorial, followed by exploring each of the 14 PSMs built into the Tesseract OCR engine.
The tutorial will conclude with a discussion of my tips, suggestions, and best practices when applying various PSMs with Tesseract.
What Are Page Segmentation Modes?
The number one reason I see budding OCR practitioners fail to obtain the correct OCR result is that they are using the incorrect page segmentation mode. To quote the Tesseract documentation, by default, Tesseract expects a page of text when it segments an input image (Improving the quality of the output).
That “page of text” assumption is so incredibly important. If you’re OCR’ing a scanned chapter from a book, the default Tesseract PSM may work well for you. But if you’re trying to OCR only a single line, a single word, or maybe even a single character, then this default mode will result in either an empty string or nonsensical results.
Think of Tesseract as your big brother growing up as a child. He genuinely cares for you and wants to see you happy — but at the same time, he has no problem pushing you down in the sandbox, leaving you there with a mouthful of grit, and not offering a helping hand to get back up.
Part of me thinks that this is a user experience (UX) problem that could potentially be improved by the Tesseract development team, simply by including a short message saying:
Not getting the correct OCR result? Try using different page segmentation modes. You can see all PSM modes by running tesseract --help-extra.
Perhaps they could even link to a tutorial that explains each of the PSMs in easy-to-understand language. From there, the end user would be more successful in applying the Tesseract OCR engine to their own projects.
But until that time comes, Tesseract’s page segmentation modes, despite being a critical aspect of obtaining high OCR accuracy, are somewhat of a mystery to many new OCR practitioners. They don’t know what they are, how to use them, why they are important — many don’t even know where to find the various page segmentation modes!
To list out the 14 PSMs in Tesseract, just supply the --help-psm argument to the tesseract binary:
$ tesseract --help-psm
Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
You can then apply a given PSM by supplying the corresponding integer value for the --psm argument.
For example, suppose we have an input image named input.png and we want to use PSM 7, which is used to OCR a single line of text. Our call to tesseract would thus look like this:
$ tesseract input.png stdout --psm 7
In the rest of this tutorial, we’ll review each of the 14 Tesseract PSMs. You’ll gain hands-on experience using each of them and will come out of this tutorial feeling much more confident in your ability to correctly OCR an image using the Tesseract OCR engine.
Project Structure
Unlike most tutorials, which include one or more Python scripts to review, this tutorial is one of the very few that does not utilize Python. Instead, we’ll be using the tesseract binary to explore each of the page segmentation modes.
Keep in mind that the aim of this tutorial is for you to understand PSMs and gain first-hand experience working with them. Once you have a strong understanding of them, that knowledge transfers directly to Python. To set a PSM in Python, it’s as easy as setting an options variable — it couldn’t be easier, quite literally taking only a couple of keystrokes!
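For instance, here is a minimal sketch of what that looks like with pytesseract (this assumes you have pytesseract and Pillow installed and the Tesseract binary on your PATH; input.png is a hypothetical file name):

```python
# Minimal sketch: setting a PSM from Python via pytesseract.
# Assumes pytesseract and Pillow are installed and Tesseract itself
# is on your PATH; input.png is a hypothetical file name.
import pytesseract
from PIL import Image

image = Image.open("input.png")

# The --psm flag is passed through the config string, exactly as it
# would appear on the command line.
text = pytesseract.image_to_string(image, config="--psm 7")
print(text)
```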
Therefore, we’re going to start with the tesseract binary.
With that said, let’s take a look at our project directory structure:
|-- psm-0
|   |-- han_script.jpg
|   |-- normal.png
|   |-- rotated_90.png
|-- psm-1
|   |-- example.png
|-- psm-3
|   |-- example.png
|-- psm-4
|   |-- receipt.png
|-- psm-5
|   |-- receipt_rotated.png
|-- psm-6
|   |-- sherlock_holmes.png
|-- psm-7
|   |-- license_plate.png
|-- psm-8
|   |-- designer.png
|-- psm-9
|   |-- circle.png
|   |-- circular.png
|-- psm-10
|   |-- number.png
|-- psm-11
|   |-- website_menu.png
|-- psm-13
|   |-- the_old_engine.png
As you can see, we have 13 directories, each with an example image inside that will highlight when to use that particular PSM.
But wait … didn’t I say earlier in the tutorial that Tesseract has 14, not 13, page segmentation modes? If so, why are there not 14 directories?
The answer is simple — one of the PSMs is not implemented in Tesseract. It’s essentially just a placeholder for potential future implementation.
Let’s get started exploring page segmentation modes with Tesseract!
PSM 0. Orientation and Script Detection Only
The --psm 0 mode does not perform OCR, at least in terms of how we think of it in the context of this book. When we think of OCR, we think of a piece of software that is able to localize the characters in an input image, recognize them, and then convert them to a machine-encoded string.
Orientation and script detection (OSD) examines the input image, but instead of returning the actual OCR’d text, OSD returns two values:
- How the page is oriented, in degrees, where angle = {0, 90, 180, 270}
- The confidence of the script (i.e., graphics signs/writing system), such as Latin, Han, Cyrillic, etc.
OSD is best seen with an example. Take a look at Figure 1, where we have three example images. The first one is a paragraph of text from my first book, Practical Python and OpenCV. The second is the same paragraph of text, this time rotated 90° clockwise, and the final image contains Han script.
Let’s start by applying tesseract to the normal.png image, which is shown top-left in Figure 1:
$ tesseract normal.png stdout --psm 0
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 11.34
Script: Latin
Script confidence: 8.10
Here, we can see that Tesseract has determined that this input image is unrotated (i.e., 0°) and that the script is correctly detected as Latin.
Let’s now take that same image and rotate it 90°, which is shown in Figure 1 (top-right):
$ tesseract rotated_90.png stdout --psm 0
Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 5.49
Script: Latin
Script confidence: 4.76
Tesseract has determined that the input image has been rotated 90°, and in order to correct the image, we need to rotate it 270°. Again, the script is correctly detected as Latin.
For a final example, we’ll now apply Tesseract OSD to the Han script image (Figure 1, bottom):
$ tesseract han_script.jpg stdout --psm 0
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 2.94
Script: Han
Script confidence: 1.43
Notice how the script has been correctly labeled as Han.
You can think of the --psm 0 mode as a “meta information” mode where Tesseract provides you with just the script and rotation of the input image — when applying this mode, Tesseract does not OCR the actual text and return it for you.
If you need just the meta information on the text, using --psm 0 is the right mode for you; however, many times we need the OCR’d text itself, in which case you should use the other PSMs covered in this tutorial.
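If you prefer to work from Python, pytesseract exposes the same OSD information through its image_to_osd function. Here is a minimal sketch using the normal.png image from the project structure above:

```python
# Sketch: run orientation and script detection (PSM 0) from Python.
# pytesseract's image_to_osd returns the same text block that
# `tesseract normal.png stdout --psm 0` prints.
import pytesseract
from PIL import Image

osd = pytesseract.image_to_osd(Image.open("normal.png"))
print(osd)  # includes "Orientation in degrees", "Rotate", "Script", etc.
```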
PSM 1. Automatic Page Segmentation with OSD
Tesseract’s documentation and examples on --psm 1 are incomplete, which makes it hard to provide detailed research and examples on this mode. My understanding of --psm 1 is that:
- Automatic page segmentation for OCR should be performed
- And that OSD information should be inferred and utilized in the OCR process
However, if we take the images in Figure 1 and pass them through tesseract using this mode, you can see that there is no OSD information in the output:
$ tesseract example.png stdout --psm 1
Our last argument is how we want to approximate the contour. We use cv2.CHAIN_APPROX_SIMPLE to compress horizontal, vertical, and diagonal segments into their end- points only. This saves both computation and memory. If we wanted all the points along the contour, without com- pression, we can pass in cv2. CHAIN_APPROX_NONE; however, be very sparing when using this function. Retrieving all points along a contour is often unnecessary and is wasteful of resources.
This result makes me think that Tesseract must be performing OSD internally but not returning it to the user. Based on my experimentation and experiences with --psm 1, I think it may be that this mode is not fully working/implemented.
Simply put: in all my experiments, I could not find a situation where --psm 1 obtained a result that the other PSMs could not. If I find such a situation in the future, I will update this section and provide a concrete example. But until then, I don’t think it’s worth applying --psm 1 in your projects.
PSM 2. Automatic Page Segmentation, But No OSD, or OCR
The --psm 2 mode is not implemented in Tesseract. You can verify this by running the tesseract --help-psm command and looking at the output for mode two:
$ tesseract --help-psm
Page segmentation modes:
...
  2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
...
It is unclear if or when Tesseract will implement this mode, but for the time being, you can safely ignore it.
PSM 3. Fully Automatic Page Segmentation, But No OSD
PSM 3 is the default behavior of Tesseract. If you run the tesseract binary without explicitly supplying a --psm, then --psm 3 will be used.
Inside this mode, Tesseract will:
- Automatically attempt to segment the text, treating it as a proper “page” of text with multiple words, multiple lines, multiple paragraphs, etc.
- After segmentation, Tesseract will OCR the text and return it to you
However, it’s important to note that Tesseract will not perform any orientation/script detection in this mode. To gather that information, you will need to run tesseract twice:
- Once with the --psm 0 mode to gather OSD information
- And then again with --psm 3 to OCR the actual text
The following example shows how to take a paragraph of text and apply both OSD and OCR in two separate commands:
$ tesseract example.png stdout --psm 0
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 11.34
Script: Latin
Script confidence: 8.10
$ tesseract example.png stdout --psm 3
Our last argument is how we want to approximate the contour. We use cv2.CHAIN_APPROX_SIMPLE to compress horizontal, vertical, and diagonal segments into their end- points only. This saves both computation and memory. If we wanted all the points along the contour, without com- pression, we can pass in cv2. CHAIN_APPROX_NONE; however, be very sparing when using this function. Retrieving all points along a contour is often unnecessary and is wasteful of resources.
Again, you can skip the first command if you only want the OCR’d text.
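If you’re scripting this two-step process from Python, the following sketch shows one way to combine the calls. The OSD parsing and the rotation sign convention are my own assumptions, so verify them on your images:

```python
# Sketch: gather OSD first, correct the rotation, then OCR with the
# default PSM 3. The rotation handling is an assumption to verify.
import re

import pytesseract
from PIL import Image

image = Image.open("example.png")

# Step 1: OSD (equivalent to --psm 0).
osd = pytesseract.image_to_osd(image)
rotate = int(re.search(r"Rotate: (\d+)", osd).group(1))

# Tesseract reports how many degrees to rotate to correct the page;
# PIL's rotate() is counter-clockwise, hence the negation.
if rotate:
    image = image.rotate(-rotate, expand=True)

# Step 2: OCR (PSM 3 is the default, so no config is needed).
print(pytesseract.image_to_string(image))
```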
PSM 4. Assume a Single Column of Text of Variable Sizes
A good example of using --psm 4 is when you need to OCR column data and require text to be concatenated row-wise (e.g., the data you would find in a spreadsheet, table, or receipt).
For example, consider Figure 2, which is a receipt from the grocery store. Let’s try to OCR this image using the default (--psm 3) mode:
$ tesseract receipt.png stdout
ee
OLE YOSDS:
cea eam
WHOLE FOODS MARKET - WESTPORT,CT 06880
399 POST RD WEST - (203) 227-6858
365
365
365
365
BACON LS
BACON LS
BACON LS
BACON LS
BROTH CHIC
FLOUR ALMOND
CHKN BRST BNLSS SK
HEAVY CREAM
BALSMC REDUCT
BEEF GRND 85/15
JUICE COF CASHEW C
DOCS PINT ORGANIC
HNY ALMOND BUTTER
wee TAX .00 BAL
NP 4.99
NP 4.99
NP 4.99
NP 4.99
NP 2.19
NP 91.99
NP 18.80
NP 3.39
NP. 6.49
NP 5.04
ne £8.99
np £14.49
NP 9.99
101.33
aaa AAAATAT ie
That didn’t work out so well. Using the default --psm 3 mode, Tesseract cannot infer that we are looking at column data and that text along the same row should be associated together.
To remedy that problem, we can use the --psm 4 mode:
$ tesseract receipt.png stdout --psm 4
WHOLE FOODS.
cea eam
WHOLE FOODS MARKET - WESTPORT,CT 06880
399 POST RD WEST - (203) 227-6858
365 BACONLS NP 4.99
365 BACON LS NP 4.99
365 BACONLS NP 4.99
365 BACONLS NP 4,99
BROTH CHIC NP 2.19
FLOUR ALMOND NP 91.99
CHKN BRST BNLSS SK NP 18.80
HEAVY CREAM NP 3.39
BALSMC REDUCT NP 6.49
BEEF GRND 85/15 NP 6.04
JUICE COF CASHEW C NP £8.99
DOCS PINT ORGANIC NP 14,49
HNY ALMOND BUTTER NP 9,99
wee TAX = 00 BAL 101.33
As you can see, the results here are far better. Tesseract is able to understand that text should be grouped row-wise, thereby allowing us to OCR the items in the receipt.
PSM 5. Assume a Single Uniform Block of Vertically Aligned Text
The documentation surrounding --psm 5 is a bit confusing, as it states that we wish to OCR a single block of vertically aligned text. The problem is there is a bit of ambiguity as to what “vertically aligned text” actually means (there is no Tesseract example showing vertically aligned text).
To me, vertically aligned text is text placed at the top of the page, the center of the page, or the bottom of the page. Figure 3 shows an example of text that is top-aligned (left), middle-aligned (center), and bottom-aligned (right).
However, in my own experimentation, I found that --psm 5 works similarly to --psm 4, only for rotated images. To see such an example in action, consider Figure 4, where we have a receipt rotated 90° clockwise.
Let’s first apply the default --psm 3:
$ tesseract receipt_rotated.png stdout
WHOLE FOODS.
(mM AR K E T)
WHOLE FOODS MARKET - WESTPORT,CT 06880
399 POST RD WEST - (203) 227-6858
365 BACON LS
365 BACON LS
365 BACON LS
365 BACON LS
BROTH CHIC
FLOUR ALMOND
CHKN BRST BNLSS SK
HEAVY CREAM
BALSMC REDUCT
BEEF GRND 85/15
JUICE COF CASHEW C
DOCS PINT ORGANIC
HNY ALMOND BUTTER
eee TAX =.00 BAL ee
NP 4.99
NP 4.99
NP 4,99
NP 4.99
NP 2.19
NP 1.99
NP 18.80
NP 3.39
NP 6.49
NP 8.04
NP £8.99
np "14.49
NP 9.99
101.33
aAnMAIATAAT AAA ATAT ie
Again, our results are not good here. While Tesseract can correct for rotation, we don’t have our row-wise elements of the receipt.
To resolve the problem, we can use --psm 5:
$ tesseract receipt_rotated.png stdout --psm 5
Cea a amD
WHOLE FOODS MARKET - WESTPORT, CT 06880
399 POST RD WEST - (203) 227-6858
* 365 BACONLS NP 4.99 F
* 365 BACON LS NP 4.99 F
* 365 BACONLS NP 4,99 F*
* 365 BACONLS NP 4.99 F
* BROTH CHIC NP 2.19 F
* FLOUR ALMOND NP 1.99 F
* CHKN BRST BNLSS SK NP 18.80 F
* HEAVY CREAM NP 3.39 F
* BALSMC REDUCT NP 6.49 F
* BEEF GRND 85/1§ NP {6.04 F
* JUICE COF CASHEW C NP [2.99 F
*, DOCS PINT ORGANIC NP "14.49 F
* HNY ALMOND BUTTER NP 9,99
wee TAX = 00 BAL 101.33
Our OCR results are now far better on the rotated receipt image.
PSM 6. Assume a Single Uniform Block of Text
I like to use --psm 6 for OCR’ing pages of simple books (e.g., a paperback novel). Pages in books tend to use a single, consistent font throughout the entirety of the book. Similarly, these books follow a simplistic page structure, which is easy for Tesseract to parse and understand.
The keyword here is uniform text, meaning that the text is a single font face without any variation.
The following shows the results of applying Tesseract to a single uniform block of text from a Sherlock Holmes novel (Figure 5) with the default --psm 3 mode:
$ tesseract sherlock_holmes.png stdout
CHAPTER ONE we Mr. Sherlock Holmes M: Sherlock Holmes, who was usually very late in the morn- ings, save upon those not infrequent occasions when he was up all night, was seated at the breakfast table. I stood upon the hearth-rug and picked up the stick which our visitor had left behind him the night before. It was a fine, thick piece of wood, bulbous-headed, of the sort which is known as a “Penang lawyer.” Just under the head was a broad silver band nearly an inch across. “To James Mortimer, M.R.C.S., from his friends of the C.C.H.,” was engraved upon it, with the date “1884.” It was just such a stick as the old-fashioned family practitioner used to carry--dig- nified, solid, and reassuring. “Well, Watson, what do you make of it2” Holmes w: sitting with his back to me, and I had given him no sign of my occupation. “How did you know what I was doing? I believe you have eyes in the back of your head.” “L have, at least, a well-polished, silver-plated coffee-pot in front of me,” said he. “But, tell me, Watson, what do you make of our visitor's stick? Since we have been so unfortunate as to miss him and have no notion of his errand, this accidental souvenir be- comes of importance, Let me hear you reconstruct the man by an examination of it.”
To save space, I removed many of the newlines from the above output. If you run the above command on your own system, you will see that the output is far messier than it appears here.
By using the --psm 6 mode, we are better able to OCR this big block of text:
$ tesseract sherlock_holmes.png stdout --psm 6
CHAPTER ONE SS Mr. Sherlock Holmes M Sherlock Holmes, who was usually very late in the morn ings, save upon those not infrequent occasions when he was up all night, was seated at the breakfast table. I stood upon the hearth-rug and picked up the stick which our visitor had left behind him the night before. It was a fine, thick piece of wood, bulbous-headed, of the sort which is known as a “Penang lawyer.” Just under the head was a broad silver band nearly an inch across. “To James Mortimer, M.R.C.S., from his friends of the C.C.H.,” was engraved upon it, with the date “1884.” It was just such a stick as the old-fashioned family practitioner used to carry--dig- nified, solid, and reassuring. “Well, Watson, what do you make of it2” Holmes was sitting with his back to me, and I had given him no sign of my occupation. “How did you know what I was doing? I believe you have eyes in the back of your head.” “T have, at least, a well-polished, silver-plated coflee-pot in front of me,” said he. “But, tell me, Watson, what do you make of our visitor’s stick? Since we have been so unfortunate as to miss him and have no notion of his errand, this accidental souvenir be- comes of importance. Let me hear you reconstruct the man by an examination of it.” 6
There are far fewer mistakes in this output, thus demonstrating how --psm 6 can be used for OCR’ing uniform blocks of text.
PSM 7. Treat the Image as a Single Text Line
The --psm 7 mode should be utilized when you are working with a single line of uniform text. For example, let’s suppose we are building an automatic license/number plate recognition (ANPR) system and need to OCR the license plate in Figure 6.
Let’s start by using the default --psm 3 mode:
$ tesseract license_plate.png stdout
Estimating resolution as 288
Empty page!!
Estimating resolution as 288
Empty page!!
The default Tesseract mode balks, totally unable to OCR the license plate.
However, if we use --psm 7 and tell Tesseract to treat the input as a single line of uniform text, we are able to obtain the correct result:
$ tesseract license_plate.png stdout --psm 7
MHOZDW8351
PSM 8. Treat the Image as a Single Word
If you have a single word of uniform text, you should consider using --psm 8. A typical use case (sketched in code after this list) would be:
- Applying text detection to an image
- Looping over all text ROIs
- Extracting them
- Passing each individual text ROI through Tesseract for OCR
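Here is a rough sketch of that pipeline. The detect_text_regions helper and the file name are hypothetical, standing in for whatever text detector you use (e.g., EAST, a deep learning model, or classic image processing):

```python
# Sketch of the per-ROI OCR loop for PSM 8. detect_text_regions is a
# hypothetical stand-in for your text detector of choice.
import cv2
import pytesseract

image = cv2.imread("storefront.png")  # hypothetical file name
boxes = detect_text_regions(image)    # hypothetical: [(x, y, w, h), ...]

for (x, y, w, h) in boxes:
    roi = image[y:y + h, x:x + w]
    # Each ROI is assumed to contain exactly one word, hence --psm 8.
    word = pytesseract.image_to_string(roi, config="--psm 8")
    print(word.strip())
```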
For example, let’s consider Figure 7, which is a photo of a storefront. We can try to OCR this image using the default --psm 3 mode:
$ tesseract designer.png stdout
MS atts
Unfortunately, all we get out is gibberish.
To resolve the issue, we can use --psm 8, telling Tesseract to bypass its page segmentation methods and instead treat this image as a single word:
$ tesseract designer.png stdout --psm 8
Designer
Sure enough, --psm 8 is able to resolve the issue!
Furthermore, you may find situations where --psm 7 and --psm 8 can be used interchangeably — both will function similarly since we are looking at either a single line or a single word, respectively.
PSM 9. Treat the Image as a Single Word in a Circle
I’ve played around with the --psm 9 mode for hours, and truly, I cannot figure out what it does. I’ve searched Google and read the Tesseract documentation, but have come up empty-handed — I cannot find a single concrete example of what the circular PSM is intended to do.
To me, there are two ways to interpret this parameter (Figure 8):
- The text is actually inside the circle (left)
- The text is wrapped around an invisible circular/arc region (right)
The second option seems much more likely to me, but I could not make this parameter work no matter how much I tried. I think it’s safe to assume that this parameter is rarely, if ever, used — and furthermore, the implementation may be a bit buggy. I suggest avoiding this PSM if you can.
PSM 10. Treat the Image as a Single Character
Treating an image as a single character should be done when you have already extracted each individual character from the image.
Going back to our ANPR example, let’s say you’ve located the license plate in an input image and then extracted each individual character on the license plate — you can then pass each of these characters through Tesseract with --psm 10 to OCR them.
Figure 9 shows an example of the digit 2. Let’s try to OCR it with the default --psm 3:
$ tesseract number.png stdout
Estimating resolution as 1388
Empty page!!
Estimating resolution as 1388
Empty page!!
Tesseract attempts to apply automatic page segmentation methods, but because there is no actual “page” of text, the default --psm 3 fails and returns an empty string.
We can resolve the matter by treating the input image as a single character via --psm 10:
$ tesseract number.png stdout --psm 10
2
Sure enough, --psm 10 resolves the matter!
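From Python, the equivalent call is a one-liner. In the sketch below I also restrict the alphabet to digits with tessedit_char_whitelist; whitelist support has varied across Tesseract versions, so treat that part as an assumption to verify on your install:

```python
# Sketch: OCR a single pre-segmented character with --psm 10.
# number.png is the digit from Figure 9; the whitelist is optional,
# and its support varies across Tesseract versions.
import pytesseract
from PIL import Image

char_image = Image.open("number.png")
text = pytesseract.image_to_string(
    char_image,
    config="--psm 10 -c tessedit_char_whitelist=0123456789",
)
print(text.strip())  # expected output: 2
```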
PSM 11. Sparse Text: Find as Much Text as Possible in No Particular Order
Detecting sparse text can be useful when an image contains lots of scattered text that you need to extract. When using this mode, you typically don’t care about the order/grouping of the text, but rather the text itself.
This information is useful if you’re performing Information Retrieval (i.e., building a text search engine): OCR all the text you can find in a dataset of images, and then build a text-based search engine via Information Retrieval algorithms (tf-idf, inverted indexes, etc.).
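As a rough sketch of that idea (the images/ folder and the use of scikit-learn for the tf-idf step are my own assumptions, not part of this tutorial's code):

```python
# Sketch: OCR a folder of images with --psm 11 and index the results
# with tf-idf. The images/ folder and scikit-learn are assumptions
# for illustration.
from pathlib import Path

import pytesseract
from PIL import Image
from sklearn.feature_extraction.text import TfidfVectorizer

docs, names = [], []
for path in sorted(Path("images").glob("*.png")):
    text = pytesseract.image_to_string(Image.open(path), config="--psm 11")
    docs.append(text)
    names.append(path.name)

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)  # one row per image, one column per term
```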
Figure 10 shows an example of sparse text. Here, we have a screenshot from my “Get Started” page on PyImageSearch. This page provides tutorials grouped by popular computer vision, deep learning, and OpenCV topics.
Let’s try to OCR this list of topics using the default --psm 3:
$ tesseract website_menu.png stdout
How Do | Get Started?

Deep Learning

Face Applications

Optical Character Recognition (OCR)

Object Detection

Object Tracking

Instance Segmentation and Semantic Segmentation

Embedded and lol Computer Vision

Computer Vision on the Raspberry Pi

Medical Computer Vision

Working with Video

Image Search Engines

Interviews, Case Studies, and Success Stories

My Books and Courses
While Tesseract can OCR the text, there are several incorrect line groupings and additional whitespace. The additional whitespace and newlines are a result of how Tesseract’s automatic page segmentation algorithm works — here it’s trying to infer document structure when in fact there is no document structure.
To get around this issue, we can treat the input image as sparse text with --psm 11:
$ tesseract website_menu.png stdout --psm 11
How Do | Get Started?
Deep Learning
Face Applications
Optical Character Recognition (OCR)
Object Detection
Object Tracking
Instance Segmentation and Semantic Segmentation
Embedded and lol Computer Vision
Computer Vision on the Raspberry Pi
Medical Computer Vision
Working with Video
Image Search Engines
Interviews, Case Studies, and Success Stories
My Books and Courses
This time the results from Tesseract are far better.
PSM 12. Sparse Text with OSD
The --psm 12 mode is essentially identical to --psm 11, but now adds in OSD (similar to --psm 0).
That said, I had a lot of problems getting this mode to work properly and could not find a practical example where the results meaningfully differed from --psm 11.
I feel it’s necessary to say that --psm 12 exists; however, in practice, you should use a combination of --psm 0 (for OSD) followed by --psm 11 (for OCR’ing sparse text) if you want to replicate the intended behavior of --psm 12.
PSM 13. Raw Line: Treat the Image as a Single Text Line, Bypassing Hacks That Are Tesseract-Specific
There are times that OSD, segmentation, and other internal Tesseract-specific preprocessing techniques will hurt OCR performance, either by:
- Reducing accuracy
- Causing no text to be detected at all
Typically, this will happen if a piece of text is closely cropped, the text is computer generated/stylized in some manner, or it uses a font face Tesseract may not automatically recognize. When this happens, consider applying --psm 13 as a “last resort.”
To see this method in action, consider Figure 11, which has the text “The Old Engine” typed in a stylized font face, similar to that of an old-time newspaper.
Let’s try to OCR this image using the default --psm 3:
$ tesseract the_old_engine.png stdout
Warning. Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 491
Estimating resolution as 491
Tesseract fails to OCR the image, returning an empty string.
Let’s now use --psm 13, bypassing all page segmentation algorithms and Tesseract preprocessing functions, thereby treating the image as a single raw line of text:
$ tesseract the_old_engine.png stdout --psm 13
Warning. Invalid resolution 0 dpi. Using 70 instead.
THE OLD ENGINE.
This time we are able to correctly OCR the text with --psm 13!
Using --psm 13 can be a bit of a hack at times, so try exhausting the other page segmentation modes first.
Tips, Suggestions, and Best Practices for PSMs
Getting used to page segmentation modes in Tesseract takes practice — there is no other way around that. I strongly suggest that you:
- Read this tutorial multiple times
- Run the examples included in the text for this tutorial
- And then start practicing with your own images
Tesseract, unfortunately, doesn’t include much documentation on its PSMs, nor are there concrete examples that are easy to refer to. This tutorial serves as my best attempt to provide you with as much information on PSMs as I can, including practical, real-world examples of when you would want to use each PSM.
That said, here are some tips and recommendations to help you get up and running with PSMs quickly:
- Always start with the default --psm 3 to see what Tesseract spits out. In the best-case scenario, the OCR results are accurate, and you’re done. In the worst case, you now have a baseline to beat.
- While I’ve mentioned that --psm 13 is a “last resort” type of mode, I would recommend applying it second as well. This mode works surprisingly well, especially if you’ve already preprocessed your image and binarized your text. If --psm 13 works, you can either stop or instead focus your efforts on modes 4-8, as it’s likely one of them will work in place of 13.
- Next, apply --psm 0 to verify that the rotation and script are being properly detected. If they aren’t, it’s unreasonable to expect Tesseract to perform well on an image where it cannot properly detect the rotation angle and script/writing system.
- Provided the script and angle are being detected properly, follow the guidelines in this tutorial. Specifically, focus on PSMs 4-8, 10, and 11. Avoid PSMs 1, 2, 9, and 12 unless you think there is a specific use case for them.
- Finally, you may want to consider brute-forcing it: try each of the modes sequentially, 1-13 (see the sketch after this list). This is a “throw spaghetti at the wall and see what sticks” type of hack, but you get lucky every now and then.
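If you do go the brute-force route, a small loop saves a lot of typing. Here is a sketch, assuming a hypothetical input.png; the try/except guards against modes that error out (such as the unimplemented PSM 2):

```python
# Sketch: brute-force every PSM on one image and eyeball the results.
# input.png is a hypothetical file name.
import pytesseract
from PIL import Image

image = Image.open("input.png")

for psm in range(1, 14):
    try:
        text = pytesseract.image_to_string(image, config=f"--psm {psm}")
    except Exception as e:  # some modes (e.g., PSM 2) may fail outright
        text = f"(error: {e})"
    print(f"----- PSM {psm} -----")
    print(text.strip())
```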
If you find that Tesseract isn’t giving you the accuracy you want regardless of what page segmentation mode you’re using, don’t panic or get frustrated — it’s all part of the process. OCR is part art, part science. Leonardo da Vinci wasn’t painting the Mona Lisa right out of the gate. It was an acquired skill that took practice.
We’re just scratching the surface of what’s possible with OCR. Future tutorials will take a deeper dive and help you better hone this art. With practice, as Figure 12 shows, you too will be hitting bullseyes with your OCR projects.
Summary
In this tutorial, you learned about Tesseract’s 14 page segmentation modes (PSMs). Applying the correct PSM is absolutely critical for correctly OCR’ing an input image.
Simply put, your choice in PSM can mean the difference between an image accurately OCR’d versus getting either no result or a nonsensical result back from Tesseract.
Each of the 14 PSMs inside of Tesseract makes an assumption on your input image, such as a block of text (e.g., a scanned chapter), a single line of text (perhaps a single sentence from a chapter), or even a single word (e.g., a license/number plate).
The key to obtaining accurate OCR results is to:
- Use OpenCV (or the image processing library of your choice) to clean up your input image, remove noise, and potentially segment text from the background (see the sketch after this list)
- Apply Tesseract, taking care to use the correct PSM that corresponds to your output from any preprocessing
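To make that two-step recipe concrete, here is a minimal sketch; the file name, the choice of Otsu thresholding, and the PSM are illustrative assumptions rather than a prescribed pipeline:

```python
# Sketch: minimal OpenCV cleanup before OCR -- grayscale, Otsu
# threshold, then Tesseract with an explicit PSM matched to the
# preprocessed image. File name and threshold choice are assumptions.
import cv2
import pytesseract

image = cv2.imread("input.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Binarize so the text is segmented from the background.
_, thresh = cv2.threshold(gray, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Match the PSM to what the preprocessed image now contains
# (here, assumed to be a single line of text).
print(pytesseract.image_to_string(thresh, config="--psm 7"))
```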
For example, if you are building an automatic license plate recognizer (which we’ll do in a future tutorial), then we would utilize OpenCV to first detect the license plate in the image. This can be accomplished using either image processing techniques or a dedicated object detector such as HOG + Linear SVM, Faster R-CNN, SSD, YOLO, etc.
Once we have the license plate detected, we would segment the characters from the plate, such that the characters appear as white (foreground) against a black background.
The final step would be to take the binarized license plate characters and pass them through Tesseract for OCR. Our choice in PSM will be the difference between correct and incorrect results.
Since a license plate can be seen as either a “single line of text” or a “single word,” we would want to try --psm 7 or --psm 8. A --psm 13 may also work, but using the default (--psm 3) is unlikely to work here since we have already heavily processed our image.
I would highly suggest that you spend a fair amount of time exploring all the examples in this tutorial, and even going back and reading it again — there’s so much knowledge to be gained from Tesseract’s page segmentation modes.
From there, start applying the various PSMs to your own images. Note your results along the way:
- Are they what you expected?
- Did you obtain the correct result?
- Did Tesseract fail to OCR the image, returning an empty string?
- Did Tesseract return completely nonsensical results?
The more practice you have with PSMs, the more experience you gain, which will make it that much easier for you to correctly apply OCR to your own projects.