OpenCV Selective Search for Object Detection

Today, you will learn how to use OpenCV Selective Search for object detection.

Today’s tutorial is Part 2 in our 4-part series on deep learning and object detection:

Part 1: Turning any deep learning image classifier into an object detector with Keras and TensorFlow
Part 2: OpenCV Selective Search for Object Detection (today’s tutorial)
Part 3: Region proposal for object detection with OpenCV, Keras, and TensorFlow (next week’s tutorial)
Part 4: R-CNN object detection with Keras and TensorFlow (publishing in two weeks)

Selective Search, first introduced by Uijlings et al. in their 2012 paper, Selective Search for Object Recognition, is a critical piece of computer vision, deep learning, and object detection research.

In their work, Uijlings et al. demonstrated:

How images can be over-segmented to automatically identify locations in an image that could contain an object
That Selective Search is far more computationally efficient than exhaustively computing image pyramids and sliding windows (and without loss of accuracy)
And that Selective Search can be swapped into any object detection framework that utilizes image pyramids and sliding windows

Automatic region proposal algorithms such as Selective Search paved the way for Girshick et al.’s seminal R-CNN paper, which gave rise to highly accurate deep learning-based object detectors.

Furthermore, research with Selective Search and object detection has allowed researchers to create state-of-the-art Region Proposal Network (RPN) components that are even more accurate and more efficient than Selective Search (see Ren et al.’s 2015 paper on Faster R-CNN, co-authored by Girshick).

But before we can get into RPNs, we first need to understand how Selective Search works, including how we can leverage Selective Search for object detection with OpenCV.

To learn how to use OpenCV’s Selective Search for object detection, just keep reading.

OpenCV Selective Search for Object Detection

In the first part of this tutorial, we’ll discuss the concept of region proposals via Selective Search and how they can efficiently replace the traditional method of using image pyramids and sliding windows to detect objects in an image.

From there, we’ll review the Selective Search algorithm in detail, including how it over-segments an image via:

Color similarity
Texture similarity
Size similarity
Shape similarity
A final meta-similarity, which is a linear combination of the above similarity measures

I’ll then show you how to implement Selective Search using OpenCV.

Region proposals versus sliding windows and image pyramids

In last week’s tutorial, you learned how to turn any image classifier into an object detector by applying image pyramids and sliding windows.

As a refresher, image pyramids create a multi-scale representation of an input image, allowing us to detect objects at multiple scales/sizes:

**Figure 1:** Selective Search is a more advanced form of object detection compared to sliding windows and image pyramids, which search *every* ROI of an image by means of an image pyramid and sliding window.

Sliding windows operate on each layer of the image pyramid, sliding from left-to-right and top-to-bottom, thereby allowing us to localize where in an image a given object is:

There are a number of problems with the image pyramid and sliding window approach, but the two primary ones are:

It’s painfully slow. Even with optimized for loops and multiprocessing, looping over each image pyramid layer and inspecting every location in the image via sliding windows is computationally expensive.
It’s sensitive to parameter choices. Different values of your image pyramid scale and sliding window size can lead to dramatically different results in terms of positive detection rate, false-positive detections, and missing detections altogether.
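To get a feel for the cost, here is a minimal sketch that simply counts how many windows an exhaustive search would visit. The image size, pyramid scale, window size, and step are all illustrative assumptions (they are not values from this tutorial), and the helper names are hypothetical:

```python
# Count the windows visited by an image pyramid + sliding window search.
# Every number used here is an assumption chosen for illustration.

def pyramid_sizes(width, height, scale=1.5, min_size=(64, 64)):
    """Yield the (width, height) of each pyramid layer, largest first."""
    while width >= min_size[0] and height >= min_size[1]:
        yield (width, height)
        width = int(width / scale)
        height = int(height / scale)


def count_windows(width, height, win=(64, 64), step=16):
    """Number of window positions on a single layer."""
    if width < win[0] or height < win[1]:
        return 0
    cols = (width - win[0]) // step + 1
    rows = (height - win[1]) // step + 1
    return cols * rows


def total_windows(width, height, scale=1.5, win=(64, 64), step=16):
    """Total number of windows across every layer of the pyramid."""
    return sum(count_windows(w, h, win, step)
               for (w, h) in pyramid_sizes(width, height, scale, win))


# report the per-layer and total counts for a 640x480 image
for (w, h) in pyramid_sizes(640, 480):
    print("layer {}x{}: {} windows".format(w, h, count_windows(w, h)))
print("total windows:", total_windows(640, 480))
```

Each of those windows would need its own pass through the classifier, and changing the scale, window size, or step changes both the count and which objects get covered, which is the parameter sensitivity described above.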

Given these reasons, computer vision researchers have looked into creating automatic region proposal generators that replace sliding windows and image pyramids.

The general idea is that a region proposal algorithm should inspect the image and attempt to find regions of an image that likely contain an object (think of region proposal as a cousin to saliency detection).

The region proposal algorithm should:

Be faster and more efficient than sliding windows and image pyramids
Accurately detect the regions of an image that could contain an object
Pass these “candidate proposals” to a downstream classifier to actually label the regions, thus completing the object detection framework

The question is, what types of region proposal algorithms can we use for object detection?

What is Selective Search and how can Selective Search be used for object detection?

The Selective Search algorithm implemented in OpenCV was first introduced by Uijlings et al. in their 2012 paper, Selective Search for Object Recognition.

Selective Search works by over-segmenting an image using a superpixel algorithm (instead of SLIC, Uijlings et al. use the Felzenszwalb method from Felzenszwalb and Huttenlocher’s 2004 paper, Efficient graph-based image segmentation).

An example of running the Felzenszwalb superpixel algorithm can be seen below:

**Figure 2:** OpenCV’s Selective Search uses the Felzenszwalb superpixel method to find regions of an image that could contain an object. Selective Search is not end-to-end object detection. (*image source*)

From there, Selective Search seeks to merge together the superpixels to find regions of an image that could contain an object.

Selective Search merges superpixels in a hierarchical fashion based on five key similarity measures:

Color similarity: Computing a 25-bin histogram for each channel of an image, concatenating them together, and obtaining a final descriptor that is 25×3=75-d. Color similarity of any two regions is measured by the histogram intersection distance.
Texture similarity: For texture, Selective Search extracts Gaussian derivatives at 8 orientations per channel (assuming a 3-channel image). For each orientation and channel, a 10-bin histogram is computed, generating a final texture descriptor that is 8×10×3=240-d. To compute texture similarity between any two regions, histogram intersection is once again used.
Size similarity: The size similarity metric that Selective Search uses prefers that smaller regions be grouped earlier rather than later. Anyone who has used Hierarchical Agglomerative Clustering (HAC) algorithms before knows that HACs are prone to clusters reaching a critical mass and then combining everything that they touch. By forcing smaller regions to merge earlier, we can help prevent a few large clusters from swallowing up all smaller regions.
Shape similarity/compatibility: The idea behind shape similarity in Selective Search is that two regions should be compatible with each other. Two regions are considered “compatible” if they “fit” into each other (thereby filling gaps in our region proposal generation). Furthermore, shapes that do not touch should not be merged.
A final meta-similarity measure: A final meta-similarity acts as a linear combination of the color similarity, texture similarity, size similarity, and shape similarity/compatibility.
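The measures above can be sketched in a few lines of plain Python. This follows the formulas as given in the Uijlings et al. paper; the dictionary layout for a region, the function names, and the toy numbers are assumptions made for illustration, and this is not how OpenCV organizes its implementation internally:

```python
# Sketch of the Selective Search similarity measures. A region is a dict with
# "size" (pixel count), "box" as (x1, y1, x2, y2), and L1-normalized "color"
# and "texture" histograms. This layout is an assumption for illustration.

def hist_intersection(h1, h2):
    """Histogram intersection of two L1-normalized histograms (1.0 = same)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def size_similarity(r1, r2, image_size):
    """Close to 1 when both regions are small, so small regions merge first."""
    return 1.0 - (r1["size"] + r2["size"]) / image_size


def fill_similarity(r1, r2, image_size):
    """Close to 1 when the regions fill their joint bounding box (few gaps)."""
    b1, b2 = r1["box"], r2["box"]
    box_w = max(b1[2], b2[2]) - min(b1[0], b2[0])
    box_h = max(b1[3], b2[3]) - min(b1[1], b2[1])
    return 1.0 - (box_w * box_h - r1["size"] - r2["size"]) / image_size


def meta_similarity(r1, r2, image_size, weights=(1, 1, 1, 1)):
    """Weighted sum of the four measures; the paper uses weights of 0 or 1."""
    a_color, a_texture, a_size, a_fill = weights
    return (a_color * hist_intersection(r1["color"], r2["color"])
            + a_texture * hist_intersection(r1["texture"], r2["texture"])
            + a_size * size_similarity(r1, r2, image_size)
            + a_fill * fill_similarity(r1, r2, image_size))


# two side-by-side 10x10 regions inside a 100x100 image (toy values)
left = {"size": 100, "box": (0, 0, 10, 10),
        "color": [0.5, 0.5], "texture": [1.0, 0.0]}
right = {"size": 100, "box": (10, 0, 20, 10),
         "color": [0.5, 0.5], "texture": [0.0, 1.0]}
print(meta_similarity(left, right, image_size=100 * 100))
```

In this toy case the two regions share the same color histogram, have completely different texture histograms, are both small, and tile their joint bounding box with no gap, so three of the four terms are at or near their maximum.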

The results of Selective Search applying these hierarchical similarity measures can be seen in the following figure:

**Figure 3:** OpenCV’s Selective Search applies hierarchical similarity measures to join regions and eventually form the final set of proposals for where objects could be present. (*image source*)

On the bottom layer of the pyramid, we can see the original over-segmentation/superpixel generation from the Felzenszwalb method.

In the middle layer, we can see regions being joined together, eventually forming the final set of proposals (top).
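The bottom-up merging can be sketched as a greedy loop: find the most similar pair of neighboring regions, merge them, record the merged bounding box as a proposal, and repeat until a single region is left. The toy version below is a simplification under stated assumptions: it uses only the size and fill terms, regions are hand-made dictionaries rather than real superpixels, and all names are hypothetical:

```python
# Toy sketch of greedy hierarchical grouping. Boxes are (x1, y1, x2, y2) here,
# unlike the (x, y, w, h) rectangles used elsewhere in this tutorial. Only
# size and fill similarity are used to keep the example short (an assumption).

def merge(r1, r2):
    """Combine two regions into one with a joint bounding box."""
    b1, b2 = r1["box"], r2["box"]
    box = (min(b1[0], b2[0]), min(b1[1], b2[1]),
           max(b1[2], b2[2]), max(b1[3], b2[3]))
    return {"box": box, "size": r1["size"] + r2["size"]}


def similarity(r1, r2, image_size):
    """Size similarity plus fill similarity (higher means merge sooner)."""
    box = merge(r1, r2)["box"]
    box_size = (box[2] - box[0]) * (box[3] - box[1])
    s_size = 1.0 - (r1["size"] + r2["size"]) / image_size
    s_fill = 1.0 - (box_size - r1["size"] - r2["size"]) / image_size
    return s_size + s_fill


def hierarchical_grouping(regions, neighbors, image_size):
    """Merge the most similar neighboring pair until one region remains.

    regions: dict mapping an integer id to {"box": ..., "size": ...}
    neighbors: iterable of (id, id) pairs for regions that touch
    Returns the boxes of the initial regions plus every merged region.
    """
    regions = dict(regions)
    sims = {(i, j): similarity(regions[i], regions[j], image_size)
            for (i, j) in neighbors}
    proposals = [r["box"] for r in regions.values()]
    next_id = max(regions) + 1

    while sims:
        # grab the most similar pair of touching regions and merge them
        i, j = max(sims, key=sims.get)
        merged = merge(regions[i], regions[j])

        # drop every similarity involving i or j, remembering which other
        # regions were touching them
        touching = set()
        for pair in list(sims):
            if i in pair or j in pair:
                touching.update(pair)
                del sims[pair]
        touching -= {i, j}

        # the merged region inherits those neighbors
        regions[next_id] = merged
        for k in touching:
            sims[(k, next_id)] = similarity(regions[k], merged, image_size)

        proposals.append(merged["box"])
        next_id += 1

    return proposals


# four regions laid out left-to-right in a 40x10 strip (toy values)
regions = {
    0: {"box": (0, 0, 5, 10), "size": 50},
    1: {"box": (5, 0, 10, 10), "size": 50},
    2: {"box": (10, 0, 25, 10), "size": 150},
    3: {"box": (25, 0, 40, 10), "size": 150},
}
proposals = hierarchical_grouping(regions, [(0, 1), (1, 2), (2, 3)],
                                  image_size=40 * 10)
print(len(proposals), "proposals:", proposals)
```

Notice that the two smallest regions merge first, and that every intermediate region is kept as a proposal, which is why the output covers objects at multiple scales.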

If you’re interested in learning more about the underlying theory of Selective Search, I would suggest referring to the following resources:

Efficient Graph-Based Image Segmentation (Felzenszwalb and Huttenlocher, 2004)
Selective Search for Object Recognition (Uijlings et al., 2012)
Selective Search for Object Detection (C++/Python) (Chandel/Mallick, 2017)

**Selective Search generates regions, not class labels**

A common misconception I see is that Selective Search replaces entire object detection frameworks such as HOG + Linear SVM, R-CNN, etc.

In fact, a couple of weeks ago, PyImageSearch reader Hayden emailed in with that exact same question:

Hi Adrian, I am using Selective Search to detect objects with OpenCV.
However, Selective Search is just returning bounding boxes — I can’t seem to figure out how to get labels associated with these bounding boxes.

So, here’s the deal:

Selective Search does generate regions of an image that could contain an object.
However, Selective Search does not have any knowledge of what is in that region (think of it as a cousin to saliency detection).
Selective Search is meant to replace the computationally expensive, highly inefficient method of exhaustively using image pyramids and sliding windows to examine locations of an image for a potential object.
By using Selective Search, we can more efficiently examine regions of an image that likely contain an object and then pass those regions on to an SVM, CNN, etc. for final classification.

If you are using Selective Search, just keep in mind that the Selective Search algorithm will not give you class label predictions — it is assumed that your downstream classifier will do that for you (the topic of next week’s blog post).
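To make the division of labor concrete, here is a rough sketch of how proposals and a classifier fit together. The classifier is a toy stand-in (a real pipeline would use an SVM or CNN), the image is a nested list rather than a NumPy array so the example runs without OpenCV, and every name here is hypothetical:

```python
# Sketch: region proposals supply boxes, a separate classifier supplies labels.
# The classifier below is a toy stand-in, not a real model.

def crop(image, box):
    """Extract the ROI for an (x, y, w, h) box from a nested-list image.
    With an OpenCV/NumPy image this would be image[y:y + h, x:x + w]."""
    (x, y, w, h) = box
    return [row[x:x + w] for row in image[y:y + h]]


def label_proposals(image, rects, classify, min_conf=0.9):
    """Classify each proposal and keep confident, non-background results."""
    detections = []
    for box in rects:
        (label, conf) = classify(crop(image, box))
        if label != "background" and conf >= min_conf:
            detections.append((tuple(box), label, conf))
    return detections


def toy_classifier(roi):
    """Stand-in classifier: calls a region an object if it is mostly bright."""
    pixels = [p for row in roi for p in row]
    mean = sum(pixels) / len(pixels)
    if mean > 127:
        return ("bright_object", mean / 255.0)
    return ("background", 1.0 - mean / 255.0)


# a 4x4 grayscale "image": dark left half, bright right half
image = [[0, 0, 255, 255] for _ in range(4)]

# pretend these (x, y, w, h) boxes came from Selective Search
rects = [(0, 0, 2, 4), (2, 0, 2, 4), (1, 0, 2, 4)]
print(label_proposals(image, rects, toy_classifier))
```

The proposal step never sees the labels, and the classifier never has to search the image; each piece does one job.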

But in the meantime, let’s learn how we can use OpenCV Selective Search in our own projects.

Project structure

Be sure to grab the .zip for this tutorial from the “Downloads” section. Once you’ve extracted the files, you may use the tree command to see what’s inside:

$ tree
.
├── dog.jpg
└── selective_search.py

0 directories, 2 files

Our project is quite simple, consisting of a Python script (selective_search.py) and a testing image (dog.jpg).

In the next section, we’ll learn how to implement our Selective Search script with Python and OpenCV.

Implementing Selective Search with OpenCV and Python

We are now ready to implement Selective Search with OpenCV!

Open up a new file, name it selective_search.py, and insert the following code:

# import the necessary packages
import argparse
import random
import time
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to the input image")
ap.add_argument("-m", "--method", type=str, default="fast",
	choices=["fast", "quality"],
	help="selective search method")
args = vars(ap.parse_args())

We begin our dive into Selective Search with a few imports, the main one being OpenCV (cv2). The other imports are built into Python.

Our script handles two command line arguments:

--image: The path to your input image (we’ll be testing with dog.jpg today).
--method: The Selective Search algorithm to use. You have two choices — either "fast" or "quality". In most cases, the fast method will be sufficient, so it is set as the default method.
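If you want to see what the parsed arguments look like without running the full script, you can hand the parser an explicit list of arguments. The build_parser wrapper below is a hypothetical helper added for this sketch; the argument definitions themselves are the same as in the script:

```python
import argparse


def build_parser():
    """Rebuild the tutorial's argument parser (wrapper name is hypothetical)."""
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--image", required=True,
                    help="path to the input image")
    ap.add_argument("-m", "--method", type=str, default="fast",
                    choices=["fast", "quality"],
                    help="selective search method")
    return ap


# omitting --method falls back to the "fast" default
args = vars(build_parser().parse_args(["--image", "dog.jpg"]))
print(args)

# the short flags work as well
args_quality = vars(build_parser().parse_args(["-i", "dog.jpg", "-m", "quality"]))
print(args_quality)
```

Because choices is set, passing any value other than "fast" or "quality" makes argparse print a usage message and exit before any image processing starts.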

We’re now ready to load our input image and initialize our Selective Search algorithm:

# load the input image
image = cv2.imread(args["image"])

# initialize OpenCV's selective search implementation and set the
# input image
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)

# check to see if we are using the *fast* but *less accurate* version
# of selective search
if args["method"] == "fast":
	print("[INFO] using *fast* selective search")
	ss.switchToSelectiveSearchFast()

# otherwise we are using the *slower* but *more accurate* version
else:
	print("[INFO] using *quality* selective search")
	ss.switchToSelectiveSearchQuality()

Line 17 loads our --image from disk.

From there, we initialize Selective Search and set our input image (Lines 21 and 22).

Initialization of Selective Search requires another step: choosing and setting the internal mode of operation. Lines 26-33 use the command line argument --method value to determine whether we should use either:

The "fast" method: switchToSelectiveSearchFast
The "quality" method: switchToSelectiveSearchQuality

Generally, the faster method will be suitable; however, depending on your application, you might want to sacrifice speed to achieve better quality results (more on that later).
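Two things commonly trip people up at this stage. First, cv2.ximgproc is part of OpenCV's contrib modules, so it is only present if you installed a contrib build (for example, the opencv-contrib-python package). Second, cv2.imread returns None rather than raising an error when the path is wrong. The two guard helpers below are hypothetical additions, not part of the original script:

```python
import importlib


def selective_search_available():
    """Return True if cv2 imports and exposes the contrib ximgproc module."""
    try:
        cv2 = importlib.import_module("cv2")
    except ImportError:
        return False
    return hasattr(cv2, "ximgproc")


def require_image(image, path):
    """Raise a clear error if cv2.imread gave back None for this path."""
    if image is None:
        raise FileNotFoundError("could not read image: {}".format(path))
    return image


print("[INFO] selective search available:", selective_search_available())
```

In the script, you could wrap the load as require_image(cv2.imread(args["image"]), args["image"]) so that a bad path fails right away instead of deeper inside Selective Search.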

Let’s go ahead and perform Selective Search with our image:

# run selective search on the input image
start = time.time()
rects = ss.process()
end = time.time()

# show how long selective search took to run along with the total
# number of returned region proposals
print("[INFO] selective search took {:.4f} seconds".format(end - start))
print("[INFO] {} total region proposals".format(len(rects)))

To run Selective Search, we simply call the process method on our ss object (Line 37). We’ve set timestamps around this call, so we can get a feel for how fast the algorithm is; Line 42 reports the Selective Search benchmark to our terminal.

Subsequently, Line 43 tells us the number of region proposals the Selective Search operation found.
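Each proposal comes back as an (x, y, w, h) tuple, and many of them are tiny. A common follow-up step is to discard boxes that are very small relative to the image before handing anything to a classifier. The sketch below does that with a 10% threshold; the threshold and the helper names are assumptions, not something the script above does:

```python
def filter_small_proposals(rects, image_w, image_h, min_frac=0.1):
    """Drop (x, y, w, h) boxes narrower or shorter than min_frac of the image."""
    kept = []
    for (x, y, w, h) in rects:
        if w / float(image_w) < min_frac or h / float(image_h) < min_frac:
            continue
        kept.append((int(x), int(y), int(w), int(h)))
    return kept


def to_corners(rect):
    """Convert (x, y, w, h) to (x1, y1, x2, y2), as cv2.rectangle expects."""
    (x, y, w, h) = rect
    return (x, y, x + w, y + h)


# made-up proposals for a 640x480 image
sample = [(0, 0, 5, 5), (10, 10, 100, 80), (0, 0, 640, 20)]
kept = filter_small_proposals(sample, image_w=640, image_h=480)
print(kept, [to_corners(r) for r in kept])
```

With a real image, image_w and image_h would come from image.shape[1] and image.shape[0].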

Now, what fun would finding our region proposals be if we weren’t going to visualize the result? Zero fun. To wrap up, let’s draw the output on our image:

# loop over the region proposals in chunks (so we can better
# visualize them)
for i in range(0, len(rects), 100):
	# clone the original image so we can draw on it
	output = image.copy()

	# loop over the current subset of region proposals
	for (x, y, w, h) in rects[i:i + 100]:
		# draw the region proposal bounding box on the image
		color = [random.randint(0, 255) for j in range(0, 3)]
		cv2.rectangle(output, (x, y), (x + w, y + h), color, 2)

	# show the output image
	cv2.imshow("Output", output)
	key = cv2.waitKey(0) & 0xFF

	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break

To annotate our output, we simply:

Loop over region proposals in chunks of 100 (Selective Search will generate a few hundred to a few thousand proposals; we “chunk” them so we can better visualize them) via the nested for loops established on Line 47 and Line 52
Extract the bounding box coordinates surrounding each of our region proposals generated by Selective Search, and draw a colored rectangle for each (Lines 52-55)
Show the result on our screen (Line 59)
Allow the user to cycle through results (by pressing any key) until either all results are exhausted or the q (quit) key is pressed
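The chunking logic is easy to reason about in isolation. This small sketch (the helper names are hypothetical) shows which slices the outer loop walks through and how each random color is drawn:

```python
import random


def chunk_ranges(total, size=100):
    """Return the (start, end) index pairs visited when stepping by size."""
    return [(i, min(i + size, total)) for i in range(0, total, size)]


def random_color(rng=random):
    """Pick a random color as three integers in the range [0, 255]."""
    return [rng.randint(0, 255) for _ in range(3)]


# 1,219 proposals in chunks of 100 means 13 screens, the last holding 19 boxes
ranges = chunk_ranges(1219)
print(len(ranges), "chunks, last one covers", ranges[-1])
print("example color:", random_color())
```

So with the fast-mode output shown later in this post, you would press a key up to 13 times to see every proposal.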

In the next section, we’ll analyze results of both methods (fast and quality).

OpenCV Selective Search results

We are now ready to apply Selective Search with OpenCV to our own images.

Start by using the “Downloads” section of this blog post to download the source code and example images.

From there, open up a terminal, and execute the following command:

$ python selective_search.py --image dog.jpg 
[INFO] using *fast* selective search
[INFO] selective search took 1.0828 seconds
[INFO] 1219 total region proposals

**Figure 4:** The results of OpenCV’s “fast mode” of Selective Search, a component of object detection.

Here, you can see that OpenCV’s Selective Search “fast mode” took ~1 second to run and generated 1,219 bounding boxes — the visualization in Figure 4 shows us looping over each of the regions generated by Selective Search and visualizing them to our screen.

If you’re confused by this visualization, consider the end goal of Selective Search: to replace traditional computer vision object detection techniques such as sliding windows and image pyramids with a more efficient region proposal generation method.

Thus, Selective Search will not tell you what is in the ROI, but it tells you that the ROI is “interesting enough” to be passed on to a downstream classifier (e.g., SVM, CNN, etc.) for final classification.

Let’s apply Selective Search to the same image, but this time, use the --method quality mode:

$ python selective_search.py --image dog.jpg --method quality
[INFO] using *quality* selective search
[INFO] selective search took 3.7614 seconds
[INFO] 4712 total region proposals

**Figure 5:** OpenCV’s Selective Search “quality mode” sacrifices speed to produce more accurate region proposal results.

The “quality” Selective Search method generated 286% more region proposals but also took 247% longer to run.
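Those percentages follow directly from the two terminal outputs above. As a quick check of the arithmetic (the helper name is hypothetical):

```python
def percent_increase(new, old):
    """Relative increase of new over old, expressed as a percentage."""
    return (new - old) / old * 100.0


# numbers copied from the two runs above
fast = {"seconds": 1.0828, "proposals": 1219}
quality = {"seconds": 3.7614, "proposals": 4712}

more_proposals = percent_increase(quality["proposals"], fast["proposals"])
longer_runtime = percent_increase(quality["seconds"], fast["seconds"])

print("{:.1f}% more proposals".format(more_proposals))
print("{:.1f}% longer runtime".format(longer_runtime))
```

Keep in mind these timings come from a single run on one image, so treat the ratio as a rough guide rather than a benchmark.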

Whether you should use the “fast” or “quality” mode depends on your application.

In most cases, the “fast” Selective Search is sufficient, but you may choose to use the “quality” mode:

When performing inference and wanting to ensure you generate more quality regions for your downstream classifier (of course, this means that real-time detection is not a concern)
When using Selective Search to generate training data, thereby ensuring you generate more positive and negative regions for your classifier to learn from

Where can I learn more about OpenCV’s Selective Search for object detection?

In next week’s tutorial, you’ll learn how to:

Use Selective Search to generate object detection proposal regions
Take a pre-trained CNN and classify each of the regions (discarding any low confidence/background regions)
Apply non-maxima suppression to return our final object detections
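As a small preview of that last step, here is a generic, pure-Python sketch of greedy non-maxima suppression over (x, y, w, h) boxes. It is not necessarily the implementation next week's tutorial will use, and the 0.5 overlap threshold is an assumption:

```python
def iou(a, b):
    """Intersection over Union of two (x, y, w, h) boxes."""
    inter_w = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_thresh=0.5):
    """Return indexes of boxes to keep, highest score first.

    A box is kept only if it does not overlap an already-kept box by more
    than iou_thresh.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


# two heavily overlapping boxes plus one far away (toy values)
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 10, 10)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))
```

Here the second box overlaps the first by more than the threshold and has a lower score, so it is suppressed, while the distant third box survives.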

And in two weeks, we’ll use Selective Search to generate training data and then fine-tune a CNN to perform object detection via region proposal.

This has been a great series of tutorials so far, and you don’t want to miss the next two!

Summary

In this tutorial, you learned how to perform Selective Search to generate object detection proposal regions with OpenCV.

Selective Search works by over-segmenting an image and then combining regions based on five key components:

Color similarity
Texture similarity
Size similarity
Shape similarity
And a final similarity measure, which is a linear combination of the above four similarity measures

It’s important to note that Selective Search itself does not perform object detection.

Instead, Selective Search returns proposal regions that could contain an object.

The idea here is that we replace our computationally expensive, highly inefficient sliding windows and image pyramids with a less expensive, more efficient Selective Search.

Next week, I’ll show you how to take the proposal regions generated by Selective Search and then run an image classifier on top of them, allowing you to create an ad hoc deep learning-based object detector!

Stay tuned for next week’s tutorial.
