Fast, optimized 'for' pixel loops with OpenCV and Python

Have you ever had to loop over an image pixel-by-pixel using Python and OpenCV?

If so, you know that it’s a painfully slow operation even though images are internally represented by NumPy arrays.

So why is this? Why are individual pixel accesses in NumPy so slow?

You see, NumPy operations are implemented in C. This allows us to avoid the expensive overhead of Python loops. When using NumPy, it’s not uncommon to see performance gains by multiple orders of magnitude (as compared to standard Python lists). In general, if you can frame your problem as a vector operation using NumPy arrays, you’ll be able to benefit from the speed boosts.

The problem here is that accessing individual pixels is not a vector operation. Therefore, even though NumPy is arguably the best numerical processing library available for nearly any programming language, when combined with Python’s for loops + individual element accesses, we lose much of the performance gains.
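To make the cost of per-element access concrete, here is a minimal sketch that inverts a small image twice: once with a nested Python loop and once with a single NumPy expression. The 64×64 random image and the function names `invert_loop` and `invert_vectorized` are assumptions made up for this illustration. Absolute timings will vary from machine to machine, so the code prints them rather than promising a particular number.

```python
import timeit
import numpy as np

# hypothetical 64x64 grayscale image filled with random 8-bit values
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

def invert_loop(img):
    # visit every pixel from Python: each read and write is a separate
    # call into NumPy, so the interpreter overhead dominates
    out = img.copy()
    h, w = out.shape[:2]
    for y in range(h):
        for x in range(w):
            out[y, x] = 255 - out[y, x]
    return out

def invert_vectorized(img):
    # one NumPy expression: the loop over pixels runs inside compiled C code
    return 255 - img

loop_time = timeit.timeit(lambda: invert_loop(image), number=5)
vec_time = timeit.timeit(lambda: invert_vectorized(image), number=5)
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.6f}s (5 runs each)")
```

Both functions produce the same array. Only the place where the loop executes differs.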

Along your computer vision journey, there will be algorithms you may need to implement that will require you to perform these manual for loops. Whether you need to implement Local Binary Patterns from scratch, create a custom convolution algorithm, or simply cannot rely on vectorized operations, you’ll need to understand how to optimize for loops using OpenCV and Python.

In the remainder of this blog post I’ll discuss how we can create super fast `for` pixel loops using Python and OpenCV — to learn more, just keep reading.


Super fast ‘for’ pixel loops with OpenCV and Python

A few weeks ago I was reading Satya Mallick’s excellent LearnOpenCV blog. His latest article discussed a special function named forEach. The forEach function allows you to utilize all cores on your machine when applying a function to every pixel in an image.

Distributing the computation across multiple cores resulted in a ~5x speedup.

But what about Python?

Is there a forEach OpenCV function exposed to the Python bindings?

Unfortunately, no, there isn’t — instead, we need to create our own forEach-like method. Luckily this isn’t as hard as it sounds.

I’ve been using this exact method to speed up for pixel loops using OpenCV and Python for years — and today I’m happy to share the implementation with you.

In the first part of this blog post, we’ll discuss Cython and how it can be used to speed up operations inside Python.

From there, I’ll provide a Jupyter Notebook detailing how to implement our faster pixel loops with OpenCV and Python.

What is Cython? And how will it speed up our pixel loops?

We all know that Python, being a high-level language, provides a lot of abstraction and convenience, which is the main reason it is so great for image processing. The price is typically slower execution than a language that sits closer to the hardware, such as C.

You can think of Cython as a combination of Python with traces of C which provides C-like performance.

Cython differs from Python in that the code is translated to C by the Cython compiler and then built into an extension module that the CPython interpreter imports like any other module. This allows the script to be written mostly in Python along with some decorators and type declarations.

So when should you take advantage of Cython in image processing?

Probably the best time to use Cython would be when you find yourself looping pixel-by-pixel in an image. You see, OpenCV and scikit-image are already optimized — a call to a function such as template-matching, like we did when we OCR’d bank checks and credit cards, has been optimized in underlying C. There is a tiny amount of overhead in the function call, but that’s it. You would never write your own template-matching algorithm in Python — it just wouldn’t be fast enough.

If you find yourself writing any custom image processing functions in Python which analyze or modify images pixel-by-pixel (perhaps with a kernel) it is extremely likely that your function won’t run as fast as possible.

In fact, it will run very slowly.

However, if you take advantage of Cython, which compiles with major C/C++ compilers, you can achieve significant performance gains as we will demonstrate today.

Implementing faster pixel loops with OpenCV and Python

A few years ago I was struggling to come across a method to help improve the speed of accessing individual pixels in a NumPy array using Python and OpenCV.

Everything I tried didn’t work, so I resorted to framing my problem as complicated, hard-to-follow vector operations on NumPy arrays to achieve my desired speed increase. But there were still times when looping over each individual pixel in an image was simply unavoidable.

It wasn’t until I found Matthew Perry’s excellent blog post on parallelizing NumPy array loops with Cython that I was able to find a solution and adapt it to working with images.

In this section we’ll review a Jupyter Notebook I put together to help you learn how to implement faster pixel-by-pixel loops with OpenCV and Python.

But before we get started, ensure you install NumPy, Cython, matplotlib, and Jupyter:

$ workon cv
$ pip install numpy
$ pip install cython
$ pip install matplotlib
$ pip install jupyter

Note: I recommend that you install these into your virtual environment for computer vision development with Python. If you have followed an install tutorial on this site, you may have a virtual environment called cv. Before issuing the above commands (Lines 2-5), simply enter workon cv in your shell (PyImageSearch Gurus members may install into their gurus environment if they choose to do so). If you don’t already have a virtual environment, create one and then symbolic-link your cv2.so bindings following instructions available here.

From there you can launch a Jupyter Notebook in your environment and begin entering the code from this post:

$ jupyter notebook

Alternatively, use the “Downloads” section of this blog post to follow along with the Jupyter Notebook I have created for you (highly recommended). If you’re using the notebook from the Downloads section, ensure to change your working directory to where the notebook lives on your disk.

Regardless of whether you have chosen to use the pre-baked notebook or follow along from scratch, the remainder of this section will discuss how to boost pixel-by-pixel loops with OpenCV and Python by over two orders of magnitude.

In this example, we’ll be implementing a simple threshold function. For each pixel in the image, we’ll check to see if the input pixel is greater than or equal to some threshold value T.

If the pixel passes the threshold test, we’ll set the output value to 255. Otherwise, the output pixel will be set to 0.

Using this function we’ll be able to binarize our input image, very similar to how OpenCV and scikit-image’s built-in thresholding methods work.

We’ll be using a simple threshold function as an example as it will enable us to (1) not focus on the actual image processing code but rather (2) learn how to obtain speed boosts when manually looping over every pixel in an image.

To compare “naïve” pixel loops with our faster Cython loops, take a look at the notebook below:

# import the necessary packages
import matplotlib.pyplot as plt
import cv2

%matplotlib inline

Note: When your notebook is launched, I suggest you click “View” > “Toggle Line Numbers” from the menubar — in Jupyter, each In [ ] and Out [ ] block restarts numbering from 1, so you’ll see those same numbers reflected in the code blocks here. If you are using the notebook from the Downloads section of this post, feel free to execute all blocks by clicking “Cell” > “Run All”.

Inside In [1] above, on Lines 2-3 we import our necessary packages. Line 5 simply specifies that we want our matplotlib plots to show up in-line within the notebook.

Next, we’ll load and preprocess an example image:

# load the original image, convert it to grayscale, and display
# it inline
image = cv2.imread("example.png")
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
plt.imshow(image, cmap="gray")

On Line 3 of In [2], we load example.png and then convert it to grayscale on Line 4.

Then we show the graphic using matplotlib (Line 5).

In-line output of the command is shown below:

**Figure 1:** Our input image (400×400 pixels) that we will be thresholding.

Next, we will load Cython:

%load_ext cython

Within In [3] above, we load Cython.

Now that we have Cython in memory, we will instruct Cython to show which lines can be optimized in our custom thresholding function:

%%cython -a

def threshold_slow(T, image):
    # grab the image dimensions
    h = image.shape[0]
    w = image.shape[1]
    
    # loop over the image, pixel by pixel
    for y in range(0, h):
        for x in range(0, w):
            # threshold the pixel
            image[y, x] = 255 if image[y, x] >= T else 0
            
    # return the thresholded image
    return image

Line 1 in In [4] above (the -a flag) asks Cython for an annotated listing that shows which lines can be optimized.

Then, we define our function, threshold_slow. Our function requires two arguments:

T : the threshold
image : the input image

On Lines 5 and 6 we extract the height and width from the image’s .shape object. We will need w and h such that we can loop over the image pixel-by-pixel.

Lines 9 and 10 begin a nested for loop where we’re looping top-to-bottom and left-to-right up until our height and width. Later, we will see that there is room for optimization in this loop.

On Line 12, we perform our in-place binary threshold of each pixel using the ternary operator — if the pixel is >= T we set the pixel to white (255) and otherwise, we set the pixel to black (0).

Finally, we return our resulting image.
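As a quick sanity check on the rule itself, the sketch below applies the same "greater than or equal to T" test to a tiny array, written as a single NumPy expression. The name `threshold_reference` and the 2×3 values are assumptions chosen for illustration so that both branches are exercised. Unlike threshold_slow, this version returns a new array rather than modifying its input.

```python
import numpy as np

def threshold_reference(T, image):
    # same rule as threshold_slow: 255 where pixel >= T, otherwise 0,
    # but computed in one vectorized step and returned as a new array
    return np.where(image >= T, 255, 0).astype(np.uint8)

# hypothetical 2x3 "image"; with T = 5 the values 0 and 4 fall below the
# threshold, while 5 (the boundary), 6, 200 and 255 pass it
tiny = np.array([[0, 4, 5],
                 [6, 200, 255]], dtype=np.uint8)

result = threshold_reference(5, tiny)
print(result.tolist())  # [[0, 0, 255], [255, 255, 255]]
```

A reference like this is handy for validating a hand-written loop: run both on the same input and compare the outputs with `np.array_equal`.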

In Jupyter (assuming you execute the above In [ ] blocks), you’ll see the following output:

 01: 
+02: def threshold_slow(T, image):
 03:     # grab the image dimensions
+04:     h = image.shape[0]
+05:     w = image.shape[1]
 06: 
 07:     # loop over the image, pixel by pixel
+08:     for y in range(0, h):
+09:         for x in range(0, w):
 10:             # threshold the pixel
+11:             image[y, x] = 255 if image[y, x] >= T else 0
 12: 
 13:     # return the thresholded image
+14:     return image

The yellow-highlighted lines in Out [4] mark code that still interacts with the Python interpreter, and these are the areas where Cython can be used for optimization. We’ll see later how to perform that optimization. Notice how the pixel-by-pixel looping is highlighted.

Tip: You may click the ‘+’ at the beginning of a line to see the underlying C code — something that I find very interesting.

Next, let’s time the operation of the function:

%timeit threshold_slow(5, image)

Using the %timeit syntax we can execute and time the function — we specify a threshold value of 5 and our image which we’ve already loaded. The resulting output is shown below:

1 loop, best of 3: 244 ms per loop

The output shows that 244 ms was the fastest that the function ran on my system. This serves as our baseline time — we will reduce this number drastically later in this post.
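The %timeit magic only exists inside IPython and Jupyter. If you want a comparable measurement from a regular script, the standard library's `timeit` module does the same job. The sketch below is an assumption-laden stand-in: `threshold_loop` is a plain-Python copy of the loop above, and the 100×100 random array replaces the example image. Each run receives a fresh copy because the function overwrites its argument.

```python
import timeit
import numpy as np

def threshold_loop(T, image):
    # plain-Python stand-in for the notebook's threshold_slow
    h, w = image.shape[:2]
    for y in range(h):
        for x in range(w):
            image[y, x] = 255 if image[y, x] >= T else 0
    return image

# hypothetical stand-in for the grayscale example image
rng = np.random.default_rng(42)
gray = rng.integers(0, 256, size=(100, 100), dtype=np.uint8)

# three runs of one call each; gray.copy() ensures every run starts from
# unthresholded data (the copy itself adds a small cost to each timing)
timings = timeit.repeat(lambda: threshold_loop(5, gray.copy()),
                        repeat=3, number=1)
print(f"best of 3: {min(timings) * 1e3:.1f} ms per call")
```

Reporting the minimum mirrors the "best of 3" figure that %timeit prints in the notebook output above.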

Let’s see the result of the thresholding operation to visually validate that our function is working properly:

# threshold our image to validate that it's working
image = threshold_slow(5, image)
plt.imshow(image, cmap="gray")

The two lines shown in In [6] run the function and show the output in-line on the notebook. The resulting thresholded image is shown:

**Figure 2:** Thresholding our input image using the threshold_slow method.

Now we get to the fun part. Let’s leverage Cython to create a highly-optimized pixel-by-pixel loop:

%%cython -a
import cython

@cython.boundscheck(False)
cpdef unsigned char[:, :] threshold_fast(int T, unsigned char [:, :] image):
    # set the variable extension types
    cdef int x, y, w, h
    
    # grab the image dimensions
    h = image.shape[0]
    w = image.shape[1]
    
    # loop over the image
    for y in range(0, h):
        for x in range(0, w):
            # threshold the pixel
            image[y, x] = 255 if image[y, x] >= T else 0
    
    # return the thresholded image
    return image

Line 1 of In [7] again specifies that we want Cython to highlight lines that can be optimized.

Then, we import Cython on Line 2.

The beauty of Cython is that very few changes are necessary for our Python code. You will, however, see some traces of C syntax. Line 4 is a Cython decorator stating that we won’t check array index bounds, offering a slight speedup.

The following paragraphs highlight some Cython syntax, so pay particular attention.

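We then define the function (Line 5) using the cpdef keyword rather than Python’s def. This generates both a fast C-level entry point (as cdef would) and a Python-callable wrapper (as def would), so the function can be called from Cython code and from regular Python code alike.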

The threshold_fast function will return an unsigned char [:, :], which will be our output NumPy array. We use unsigned char since OpenCV represents images as unsigned 8-bit integers and an unsigned char (effectively) gives us the same data type. The [:, :] implies that we are working with a 2D array.

From there, we provide the actual data types to our function, including int T (the threshold value), and another unsigned char array, our input image.
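Because the signature is typed, the array you pass in has to match it: 8-bit unsigned elements and exactly two dimensions. A color image (three dimensions) or a float image would not fit, and Cython is expected to reject such input with an error when the function is called. The small check below is a sketch in plain NumPy. The helper name `fits_unsigned_char_2d` is hypothetical and is not part of Cython or OpenCV.

```python
import numpy as np

def fits_unsigned_char_2d(image):
    # hypothetical helper: True when the array matches `unsigned char[:, :]`,
    # i.e. uint8 elements and exactly two dimensions
    return image.dtype == np.uint8 and image.ndim == 2

gray = np.zeros((4, 4), dtype=np.uint8)       # single-channel, 8-bit
color = np.zeros((4, 4, 3), dtype=np.uint8)   # 3 channels -> 3 dimensions
as_float = gray.astype(np.float32)            # wrong element type

print(fits_unsigned_char_2d(gray))      # True
print(fits_unsigned_char_2d(color))     # False
print(fits_unsigned_char_2d(as_float))  # False

# the buffer-protocol format code for uint8 is "B" (C unsigned char),
# and each element occupies one byte
print(memoryview(gray).format, gray.itemsize)  # B 1
```

This is why the notebook converts the image to grayscale before calling the function: the conversion yields a 2D uint8 array.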

On Line 7, using cdef we can declare our Python variables as C variables instead — this allows Cython to understand our data types.

Everything else in In [7] is identical to that of threshold_slow which demonstrates the convenience of Cython.

Our output is shown below:

+01: import cython
 02: 
 03: @cython.boundscheck(False)
+04: cpdef unsigned char[:, :] threshold_fast(int T, unsigned char [:, :] image):
 05:     # set the variable extension types
 06:     cdef int x, y, w, h
 07: 
 08:     # grab the image dimensions
+09:     h = image.shape[0]
+10:     w = image.shape[1]
 11: 
 12:     # loop over the image
+13:     for y in range(0, h):
+14:         for x in range(0, w):
 15:             # threshold the pixel
+16:             image[y, x] = 255 if image[y, x] >= T else 0
 17: 
 18:     # return the thresholded image
+19:     return image

This time notice in Out [7] that fewer lines are highlighted by Cython. In fact, only the Cython import and the function declaration are highlighted — this is typical.

Next, we will reload and re-pre-process our original image (effectively resetting it):

# reload the original image and convert it to grayscale
image = cv2.imread("example.png")
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

We reload the image because our first threshold_slow operation modified it in place. We need to re-initialize it to a known state.
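The in-place behavior is easy to see on a tiny array. In the sketch below, `threshold_inplace` is a hypothetical vectorized stand-in that, like the notebook functions, writes into the array it receives. Passing a copy is an alternative to reloading from disk when you want to keep the original pixels around.

```python
import numpy as np

def threshold_inplace(T, image):
    # stand-in that overwrites its argument, as the notebook functions do
    image[...] = np.where(image >= T, 255, 0)
    return image

original = np.array([[1, 9], [200, 3]], dtype=np.uint8)

# passing a copy leaves the source array untouched
safe = threshold_inplace(5, original.copy())
print(original.tolist())   # [[1, 9], [200, 3]]

# passing the array itself overwrites it; the return value is the same object
result = threshold_inplace(5, original)
print(original.tolist())   # [[0, 255], [255, 0]]
print(result is original)  # True
```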

Let’s go ahead and benchmark our threshold_fast function against the original threshold_slow function in Python:

%timeit threshold_fast(5, image)

The result:

10000 loops, best of 3: 41.2 µs per loop

This time we are achieving 41.2 microseconds per call, a massive improvement over the 244 milliseconds using strict Python. That is a speedup of roughly 5,900x, which means that by using Cython we can increase the speed of our pixel-by-pixel loop by more than three orders of magnitude!
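Working from the two timings reported above, the arithmetic behind the speedup can be checked in a few lines. The variable names are just labels for this illustration.

```python
import math

slow_seconds = 244e-3    # 244 ms, the threshold_slow timing reported above
fast_seconds = 41.2e-6   # 41.2 µs, the threshold_fast timing reported above

speedup = slow_seconds / fast_seconds
print(f"speedup: {speedup:,.0f}x")                         # speedup: 5,922x
print(f"orders of magnitude: {math.log10(speedup):.2f}")   # orders of magnitude: 3.77
```

Keep in mind these are single measurements from one machine, so the exact ratio on your system will differ.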

What about OpenMP?

After reading through this tutorial you might be wondering if there are more performance gains we can achieve. While we have achieved massive performance gains by using Cython over Python, we’re actually still only using one core of our CPU.

But what if we wanted to distribute computation across multiple CPUs/cores? Is that possible?

It absolutely is — we just need to use OpenMP (Open Multi-processing).

In a follow-up blog post, I’ll demonstrate how to use OpenMP to further boost `for` pixel loops using OpenCV and Python.
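Until then, the general idea of splitting the work can be sketched in plain Python. To be clear, this is not OpenMP and not the method from the follow-up post: it swaps in Python threads and a vectorized NumPy threshold purely to show how an image can be partitioned into independent row bands. The function names and the `workers` parameter are assumptions, and no speedup is claimed, since the benefit depends on image size and on how much of the work runs outside the interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def threshold_band(T, band):
    # threshold one horizontal band in place; `band` is a view into the image
    band[...] = np.where(band >= T, 255, 0)

def threshold_in_bands(T, image, workers=4):
    # compute row boundaries so the bands cover the image without overlap
    h = image.shape[0]
    edges = [h * i // workers for i in range(workers + 1)]
    # basic slicing returns views, so no pixel data is copied here
    bands = [image[edges[i]:edges[i + 1]] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all bands and re-raises any worker exception
        list(pool.map(lambda band: threshold_band(T, band), bands))
    return image

# hypothetical test image whose height is not a multiple of the worker count
rng = np.random.default_rng(7)
image = rng.integers(0, 256, size=(37, 23), dtype=np.uint8)
expected = np.where(image >= 128, 255, 0).astype(np.uint8)
output = threshold_in_bands(128, image.copy(), workers=4)
print(np.array_equal(output, expected))  # True
```

Each band touches a disjoint set of rows, so the workers never write to the same pixel. That independence is the property that makes a per-pixel loop a good candidate for parallel execution.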

Summary

Inspired by Satya Mallick’s original blog post to speed up for pixel loops using C++, I decided to write a tutorial that attempts to accomplish the same thing — only in Python.

Unfortunately, Python has only a fraction of the function calls available as bindings (as compared to C++). Because of this, we need to “roll our own” faster ‘for’ loop method using Cython.

The results were quite dramatic: by using Cython we were able to boost our thresholding function from 244 ms per function call (pure Python) to 41.2 μs (Cython).

What’s interesting is that there are still optimizations to be made.

Our simple method thus far is only using one core of our CPU. By enabling OpenMP support, we can actually distribute the for loop computation across multiple CPUs/cores — doing this will only further increase the speed of our function.

I will be covering how to use OpenMP to boost our for pixel loops with OpenCV and Python in a future blog post.
