Still using the original, plain ole’ implementation of SIFT by David Lowe?
Well, according to Arandjelovic and Zisserman in their 2012 paper, Three things everyone should know to improve object retrieval, you’re selling yourself (and your accuracy) short by using the original implementation.
Instead, you should be utilizing a simple extension to SIFT, called RootSIFT, that can dramatically improve object recognition, quantization, and retrieval accuracy.
Whether you’re matching descriptors of regions surrounding keypoints, clustering SIFT descriptors using k-means, or building a bag of visual words model, the RootSIFT extension can be used to improve your results.
Best of all, the RootSIFT extension sits on top of the original SIFT implementation and does not require changes to the original SIFT source code.
You do not have to recompile or modify your favorite SIFT implementation to utilize the benefits of RootSIFT.
So if you’re using SIFT regularly in your computer vision applications, but have yet to level-up to RootSIFT, read on.
This blog post will show you how to implement RootSIFT in Python and OpenCV — without (1) having to change a single line of code in the original OpenCV SIFT implementation and (2) without having to compile the entire library.
Sound interesting? Check out the rest of this blog post to learn how to implement RootSIFT in Python and OpenCV.
OpenCV and Python versions:
In order to run this example, you’ll need Python 2.7 and OpenCV 2.4.X.
Why RootSIFT?
It is well known that when comparing histograms, the Euclidean distance often yields inferior performance compared to the chi-squared distance or the Hellinger kernel [Arandjelovic et al. 2012].
And if this is the case why do we often use the Euclidean distance to compare SIFT descriptors when matching keypoints? Or clustering SIFT descriptors to form a codebook? Or quantizing SIFT descriptors to form a bag of visual words?
Remember, while the original SIFT papers discuss comparing descriptors using the Euclidean distance, SIFT is still a histogram itself — and wouldn’t other distance metrics offer greater accuracy?
It turns out, the answer is yes. But instead of comparing SIFT descriptors using a different metric, we can modify the 128-dim descriptor returned from SIFT directly.
You see, Arandjelovic et al. suggest a simple algebraic extension to the SIFT descriptor itself, called RootSIFT, that allows SIFT descriptors to be “compared” using a Hellinger kernel — but still utilizing the Euclidean distance.
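To see why this works, here is a quick numerical check (an illustrative sketch, not code from the paper or the original post): for two L1-normalized histograms, the squared Euclidean distance between their element-wise square roots is a simple function of the Hellinger kernel, so ranking by one is equivalent to ranking by the other.

# illustrative check: for L1-normalized histograms x and y,
# ||sqrt(x) - sqrt(y)||^2 = 2 - 2 * H(x, y), where H(x, y) = sum(sqrt(x * y))
# is the Hellinger kernel
import numpy as np

rng = np.random.RandomState(42)
x = rng.rand(128)
y = rng.rand(128)
x /= x.sum()  # pretend these are L1-normalized SIFT histograms
y /= y.sum()

hellinger = np.sum(np.sqrt(x * y))
sq_euclidean = np.sum((np.sqrt(x) - np.sqrt(y)) ** 2)

print(sq_euclidean)           # these two values agree
print(2.0 - 2.0 * hellinger)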
Here is the simple algorithm to extend SIFT to RootSIFT:
- Step 1: Compute SIFT descriptors using your favorite SIFT library.
- Step 2: L1-normalize each SIFT vector.
- Step 3: Take the square root of each element in the SIFT vector. The vector is then L2-normalized (in practice this final step is implicit, since the vector already has unit L2 norm after the first two steps).
That’s it!
It’s a simple extension, but this little modification can dramatically improve results. Whether you’re matching keypoints, clustering SIFT descriptors, or quantizing to form a bag of visual words, Arandjelovic et al. have shown that RootSIFT can easily be used in any scenario that SIFT is, while improving results.
In the rest of this blog post, I’ll show you how to implement RootSIFT using Python and OpenCV. Using this implementation, you’ll be able to incorporate RootSIFT into your own applications — and improve your results!
Implementing RootSIFT in Python and OpenCV
Open up your favorite editor, create a new file, name it rootsift.py, and let’s get started:
# import the necessary packages
import numpy as np
import cv2

class RootSIFT:
    def __init__(self):
        # initialize the SIFT feature extractor
        self.extractor = cv2.DescriptorExtractor_create("SIFT")

    def compute(self, image, kps, eps=1e-7):
        # compute SIFT descriptors
        (kps, descs) = self.extractor.compute(image, kps)

        # if there are no keypoints or descriptors, return an empty tuple
        if len(kps) == 0:
            return ([], None)

        # apply the Hellinger kernel by first L1-normalizing and taking the
        # square-root
        descs /= (descs.sum(axis=1, keepdims=True) + eps)
        descs = np.sqrt(descs)
        #descs /= (np.linalg.norm(descs, axis=1, ord=2) + eps)

        # return a tuple of the keypoints and descriptors
        return (kps, descs)
The first thing we’ll do is import our necessary packages. We’ll use NumPy for numerical processing and cv2 for our OpenCV bindings.
We then define our RootSIFT class on Line 5 and the constructor on Lines 6-8. The constructor simply initializes the OpenCV SIFT descriptor extractor.
The compute function on Line 10 then handles the computation of the RootSIFT descriptor. This function requires two arguments and an optional third argument.
The first argument to the compute function is the image that we want to extract RootSIFT descriptors from. The second argument is the list of keypoints, or local regions, from which the RootSIFT descriptors will be extracted. And finally, an epsilon variable, eps, is supplied to prevent any divide-by-zero errors.
From there, we extract the original SIFT descriptors on Line 12.
We make a check on Lines 15 and 16 — if there are no keypoints or descriptors, we simply return an empty tuple.
Converting the original SIFT descriptors to RootSIFT descriptors takes place on Lines 20-22.
We first L1-normalize each vector in the descs array (Line 20).
From there, we take the square root of each element in the SIFT vector (Line 21).
Lastly, all we have to do is return the tuple of keypoints and RootSIFT descriptors to the calling function on Line 25.
Running RootSIFT
To actually see RootSIFT in action, open up a new file, name it driver.py, and we’ll explore how to extract SIFT and RootSIFT descriptors from images:
# import the necessary packages
from rootsift import RootSIFT
import cv2

# load the image we are going to extract descriptors from and convert
# it to grayscale
image = cv2.imread("example.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detect Difference of Gaussian keypoints in the image
detector = cv2.FeatureDetector_create("SIFT")
kps = detector.detect(gray)

# extract normal SIFT descriptors
extractor = cv2.DescriptorExtractor_create("SIFT")
(kps, descs) = extractor.compute(gray, kps)
print "SIFT: kps=%d, descriptors=%s " % (len(kps), descs.shape)

# extract RootSIFT descriptors
rs = RootSIFT()
(kps, descs) = rs.compute(gray, kps)
print "RootSIFT: kps=%d, descriptors=%s " % (len(kps), descs.shape)
On Lines 1 and 2 we import our RootSIFT descriptor along with our OpenCV bindings.
We then load our example image, convert it to grayscale, and detect Difference of Gaussian keypoints on Lines 7-12.
From there, we extract the original SIFT descriptors on Lines 15-17.
And we extract the RootSIFT descriptors on Lines 20-22.
To execute our script, simply issue the following command:
$ python driver.py
Your output should look like this:
SIFT: kps=1006, descriptors=(1006, 128)
RootSIFT: kps=1006, descriptors=(1006, 128)
As you can see, we have extracted 1,006 DoG keypoints. And for each keypoint we have extracted 128-dim SIFT and RootSIFT descriptors.
From here, you can take this RootSIFT implementation and apply it to your own applications, including keypoint and descriptor matching, clustering descriptors to form centroids, and quantizing to create a bag of visual words model — all of which we will cover in future posts.
What's next? I recommend PyImageSearch University.
30+ total classes • 39h 44m video • Last updated: 12/2021
★★★★★ 4.84 (128 Ratings) • 3,000+ Students Enrolled
I strongly believe that if you had the right teacher you could master computer vision and deep learning.
Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?
That’s not the case.
All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that’s exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.
If you're serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you’ll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.
Inside PyImageSearch University you'll find:
- ✓ 30+ courses on essential computer vision, deep learning, and OpenCV topics
- ✓ 30+ Certificates of Completion
- ✓ 39h 44m on-demand video
- ✓ Brand new courses released every month, ensuring you can keep up with state-of-the-art techniques
- ✓ Pre-configured Jupyter Notebooks in Google Colab
- ✓ Run all code examples in your web browser — works on Windows, macOS, and Linux (no dev environment configuration required!)
- ✓ Access to centralized code repos for all 500+ tutorials on PyImageSearch
- ✓ Easy one-click downloads for code, datasets, pre-trained models, etc.
- ✓ Access on mobile, laptop, desktop, etc.
Summary
In this blog post, I showed you how to extend the original OpenCV SIFT implementation by David Lowe to create the RootSIFT descriptor, a simple extension suggested by Arandjelovic and Zisserman in their 2012 paper, Three things everyone should know to improve object retrieval.
The RootSIFT extension does not require you to modify the source of your favorite SIFT implementation — it simply sits on top of the original implementation.
The simple 3-step process to compute RootSIFT is:
- Step 1: Compute SIFT descriptors using your favorite SIFT library.
- Step 2: L1-normalize each SIFT vector.
- Step 3: Take the square root of each element in the SIFT vector. The vector is then L2-normalized (implicitly satisfied by the first two steps).
No matter if you are using SIFT to match keypoints, form cluster centers using k-means, or quantize SIFT descriptors to form a bag of visual words, you should definitely consider utilizing RootSIFT rather than the original SIFT to improve your object retrieval accuracy.
Download the Source Code and FREE 17-page Resource Guide
Enter your email address below to get a .zip of the code and a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL!
Great one Adrian! A lot of new terms and functional phrases here too. Thanks again
Glad to hear you enjoyed the article! 🙂
I’m not able to implement this in OpenCV 3.0. Please help.
Hey Ham, I suggest reading my post on Where did SIFT and SURF go in OpenCV 3? to resolve the SIFT initialization issue.
Nice article! Thanks for sharing.
Thanks Ben, I’m glad you enjoyed it! 🙂
No luck for me yet on this one. Segmentation dump going on. Probably has something to do with the opencv version. Any advice on what to check?
If it’s segfaulting then it’s probably an issue with the way OpenCV was installed and compiled. Check your OpenCV version and ensure you are running something of the 2.4.X flavor. Also try to remove one line at a time and re-run until you can pinpoint the line of code that is causing the segfault.
Alright, from the python interpreter I managed to dump out the version of both python and opencv which are: Python 2.7.6 (default, Mar 22 2014, 22:59:56) and
>>> from cv2 import __version__; __version__
‘2.4.8’
I went ahead and requested your original scripts and it’s still giving me an error: segmentation fault (core dumped)
trying to execute line: 12 kps = detector.detect(image)
I am still digging just had some other priorities come up.
Thanks again
Thanks for the added info Johnny! I have tried the code with OpenCV 2.4.9, 2.4.10, and 2.4.11 and I’m not getting a segfault. That’s definitely quite strange. Keep me posted if you find anything!
Hi Adrian and Johnny,
The segmentation fault is because OpenCV was built without using the nonfree module. SIFT is in the nonfree module and hence the segmentation fault.
I faced this problem some weeks back when I had built OpenCV from source code downloaded from the OpenCV GitHub repository.
Hope it helps!!
Thank You
Nice, thanks for the tip Bikram! 🙂 But to my understanding, only OpenCV 3 (which is in beta) does not compile the nonfree module by default. The previous versions of OpenCV 2.4.X still compiled all nonfree modules during installation time.
Somehow I managed to get 2.4.11 build by following this: https://help.ubuntu.com/community/OpenCV
Now when I kick off python driver.py I get the output just as you demonstrated but no image pops up.
NOTE: back on the build steps when I try to execute line 8 it gave me this:
Package ‘ffmpeg’ has no installation candidate
So I removed it and everything else went somewhat smoothly.
Any ideas?
Wow, that’s really strange. I’ll admit that I’m stumped on that one.
What is the point of converting the image to grayscale after reading it in the driver file?
The color information (i.e. the Red, Green, and Blue channels individually) is not needed to detect keypoints. Furthermore, most keypoint detectors expect a grayscale image, so we convert from RGB to grayscale and discard the color information.
Hi, Adrian, in rootsift.py you apply the Hellinger kernel by first L1-normalizing, taking the square root, and then L2-normalizing. I have tested by first L2-normalizing, taking the square root, and then L1-normalizing, just as the paper says: 1) L1-normalize the SIFT vector (originally it has unit L2 norm); 2) square-root each element. I find the order of L2-normalizing and L1-normalizing doesn’t affect the values of RootSIFT. But for understanding, I thought it’s better to follow the order as the paper says.
Hey Yong, take a look at Slide 10 of the presentation done by Arandjelovic. This slide details their implementation of RootSIFT where the first step is L1 normalization, the second is the element-wise square root, and the final step is L2 normalization. It’s interesting that the values were not affected though.
Yeah, it’s true. Slide 38 shows that the mAP reaches 92.9% when they combine all the improvements. It’s really amazing. I only get 83.35% performance with 500k visual words.
It’s very interesting and useful, and I’m looking forward to reading your future posts. By the way, I find some of your posts have been translated to Chinese.
Very nice article, but I had a few questions about your python code:
1) When you do L1 normalization with axis=0, aren’t you normalizing all columns of the set of descriptors? I would think you would use ‘desc /= (desc.sum(axis=1, keepdims=True) + eps)’ if you wanted to normalize each SIFT descriptor…
2) Are you sure you are supposed to divide the normalized and square-rooted descriptor vectors by the L2 norm? If you read the paper by Arandjelovic and Zisserman, they do not do this. I feel like you did this because you saw that as a step in their presentation, but I think that in that slide they were saying that the descriptor is L2-normalized as a result of L1-normalizing and taking the square root.
Let me know if I’m off my rocker, and thanks for introducing me to this cool trick!
Hey Chris, to answer your questions:
1. Thanks for pointing this out! It looks like I have accidentally pushed a previous version of the RootSIFT code online from my git repo. This was certainly not my intention. Thanks a million for pointing this out. The code and blog post have been updated.
2. The original SIFT descriptor is L2-normalized, so while the paper does not explicitly state that the square-rooted descriptor should be L2-normalized, I think it’s applied. Perhaps I am wrong, but that’s how I interpreted it.
Thanks for clearing that up, Adrian!
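To make the axis distinction from the exchange above concrete, here is a small illustrative example (not from the original thread); the values are made up just to show the behavior of the two axis choices.

import numpy as np

# two fake 4-dim "descriptors" stacked as rows
descs = np.array([[1.0, 1.0, 2.0, 0.0],
                  [1.0, 3.0, 1.0, 1.0]])

# axis=1 with keepdims=True sums along each row, so every descriptor is
# L1-normalized independently (each row then sums to 1), which is what we want
print(descs / descs.sum(axis=1, keepdims=True))

# axis=0 sums down the columns, mixing values across different descriptors
print(descs / descs.sum(axis=0))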
Here’s my take on the L2 normalization. When I just do the math on a SIFT descriptor, this is what happens.
Step 1: L1 normalize SIFT vector
Step 2: Take square root of each element.
Step 3: Calculate L2 norm of transformed SIFT vector and divide each element by this value.
Now what’s happening is that the L2 norm is always 1.0 (or very nearly so, e.g. 0.999999). So it seems that this step is just unnecessary because the vector is already L2-normalized.
Good point. It does seem like this step is unnecessary. I am going to run some benchmarks related to image retrieval accuracy on my system and see if anything changes. Technically, it shouldn’t. But either way I’ll be posting an update on this article mentioning that the final L2 normalization is not necessary.
Hey Chris, I just wanted to let you know that I have updated the post to reflect your notes. Thanks again!
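For readers who want to reproduce Chris’s observation, here is a quick numerical check (a sketch added for illustration, not part of the original exchange):

import numpy as np

# a random non-negative vector standing in for a single SIFT descriptor
v = np.random.rand(128)

# Step 1: L1-normalize; Step 2: take the element-wise square root
v /= v.sum()
v = np.sqrt(v)

# Step 3 (explicit L2 normalization) is unnecessary: since
# sum(sqrt(v_i)^2) = sum(v_i) = 1, the L2 norm is already 1.0
print(np.linalg.norm(v))  # prints 1.0 up to floating-point error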
Hey Adrian,
What does the radius of each green circle tell us about that particular keypoint?
Would you mind posting the code that generates the “detect” image?
Thanks,
Sawyer
Hey Sawyer, sure thing. I have lots of plans to cover features and descriptors in future posts, so stay tuned!
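In the meantime, here is a minimal sketch (not Adrian’s original figure code) of how such a visualization can be produced with OpenCV’s built-in keypoint drawing; the circle radius comes from each keypoint’s size attribute, i.e. the scale at which it was detected. The file name example.png is assumed from the driver script above.

# minimal sketch: draw DoG keypoints where the circle radius reflects
# each keypoint's detected size/scale (OpenCV 2.4.X API)
import cv2

image = cv2.imread("example.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

detector = cv2.FeatureDetector_create("SIFT")
kps = detector.detect(gray)

# DRAW_RICH_KEYPOINTS draws each keypoint with its size and orientation
vis = cv2.drawKeypoints(image, kps, color=(0, 255, 0),
    flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite("keypoints.png", vis)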
Hey Adrian,
I was recently reading through a paper, “Food 101 – Mining Discriminative Components with Random Forests” (bit.ly/1RiH8tc).
In section 5.1, Implementation Details, the author mentions transforming SURFs using signed square rooting, and then references the RootSIFT paper:
“…two feature types are extracted: Dense SURFS, which are transformed using signed square-rooting.”
Is this essentially a “RootSURF”, or am I oversimplifying it? And can the Hellinger Kernel be used [effectively] with other feature extractors, like AKAZE?
Feature vectors generated from AKAZE and KAZE are binary feature vectors so they are compared using a Hamming distance. The chi-squared distance doesn’t make much sense here, unless you have constructed a bag-of-visual-words and are comparing the bag-of-visual-words histograms using the chi-squared distance.
As for SURF, yes, that is essentially RootSURF.
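To make the metric choice concrete, here is a brief sketch (an illustration, not from the original reply) of how the brute-force matcher norm is typically chosen per descriptor type; descsA and descsB are hypothetical descriptor matrices from two images.

import cv2

# binary descriptors (e.g. AKAZE, ORB, BRISK) are compared with the Hamming distance
bf_binary = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

# real-valued descriptors (SIFT, RootSIFT, KAZE) use the Euclidean (L2) distance
bf_float = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)

# hypothetical usage: descsA and descsB are descriptor matrices from two images
# matches = bf_float.match(descsA, descsB)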
Thanks for clarifying. pyimagesearch has the best customer service on the net
PyImageSearch customer service = Adrian on his laptop 😉
Only AKAZE uses a binary descriptor and therefore the Hamming distance. KAZE uses the Euclidean distance, like SIFT!
Thanks man! Every paper writer should have writing skills like you!
You just made it so simple.
Thanks for the kind words Rushi 😀
Adrian,
Two questions:
1) You convert the image to grayscale on line 8, but on line 12, 16, and 21 it appears you are using the original full color image. Or… am I missing something?
2) Do you have a version of this code that plays nice with OpenCV 3.0.0? I’m getting a “module object has no attribute” error for the DescriptorExtractor.
Thanks!
-Brian
Hey Brian — thanks for pointing that out. You can detect DoG keypoints in either color or grayscale images. I’ll update the code to make sure it’s using the grayscale image though.
As for working with OpenCV 3.0 and SIFT, you should give this post a read.
Hi Adrian,
I used your post on “Where did SIFT and SURF go” just a few days ago to get OpenCV 3.0.0 installed on my fresh install of Jessie (RPI 2). I admire your method as it parallels my own “How To” instructions; replete with expected times for each step! The install went great, but for some reason it’s not playing well with the code above.
For example, on line 11 you have:
detector = cv2.FeatureDetector_create(“SIFT”)
… which results in the following error:
‘module’ object has no attribute ‘FeatureDetector_create’
…so I changed it to:
detector = cv2.xfeatures2d.SIFT_create()
On line 15 you have:
extractor = cv2.DescriptorExtractor_create(“SIFT”)
… which results in the same type of error (module has no attribute)
I tried playing around with an “xfeatures2d” version of that line of code without any luck. Documentation from OpenCV is also not up-to-date regarding 3.0.0 as several sites have pointed out.
Any ideas for a fix are greatly appreciated!
-Brian
This post was written well before OpenCV 3 was released — it’s intended for OpenCV 2.4.X.
However, you can easily update the code to run with OpenCV 3 using something along these lines (a rough sketch, assuming OpenCV 3 was compiled with the opencv_contrib xfeatures2d module):
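# rough OpenCV 3 adaptation (requires the opencv_contrib xfeatures2d module)
import cv2

image = cv2.imread("example.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# the combined DoG keypoint detector + SIFT descriptor extractor
sift = cv2.xfeatures2d.SIFT_create()
(kps, descs) = sift.detectAndCompute(gray, None)

# inside the RootSIFT class, the constructor would become:
#     self.extractor = cv2.xfeatures2d.SIFT_create()
# and the rest of the compute() method stays the same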
Hey Adrian, could you please help me out by giving me any URLs from where I can get a SIFT executable binary for Mac??? Thanks in advance 🙂
Unfortunately it’s not that simple. You’ll need to compile OpenCV with the extra modules support enabled.
You mean I need to compile the code to create sift executable binary on Mac environment???
Yes, that is correct.
Thanks man. 🙂
Traceback (most recent call last):
File “gesture1.py”, line 97, in
print “RootSIFT: kps=%d, descriptors=%s ” % (len(kps), descs.shape)
AttributeError: ‘NoneType’ object has no attribute ‘shape’
Can you help me with this?
Try investigating the len(kps) — the only reason descs would be None is that no keypoints were initially detected in the input image.

Thanks for the nice tutorial. I just want to ask:
– What do you mean by “detect Difference of Gaussian keypoints in the image”? (line 10)
– Where can I find the documentation for the detect( ) method? (line 12)
Thanks.
The “Difference of Gaussian”, or more commonly DoG, is the default keypoint detector that SIFT utilizes. As for documentation for the detect method, see the OpenCV documentation.

Hey man! Just wanted to ask: how would I compute a matrix for all the descriptors if I am taking a large dataset of images? Here you have taken only one image into consideration.
I could run a loop to read all the images, but how should I store all the descriptors in a matrix?
There are multiple ways to do this. The easiest would be to create a list, loop over your images, extract features from each image, append them to the list, and then convert the list to a NumPy array.
Another option, if you know the number of images you are going to process ahead of time, is to allocate memory for the NumPy array before feature extraction. Either option will work.
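A minimal sketch of the first option (for illustration only; the “dataset” directory is hypothetical, and the RootSIFT class and OpenCV 2.4.X API from this post are assumed):

# option 1: accumulate per-image descriptor matrices in a list, then stack them
from glob import glob
import numpy as np
import cv2
from rootsift import RootSIFT

# "dataset" is a hypothetical directory containing the images to process
imagePaths = glob("dataset/*.png")

detector = cv2.FeatureDetector_create("SIFT")
rs = RootSIFT()
allDescs = []

for imagePath in imagePaths:
    gray = cv2.cvtColor(cv2.imread(imagePath), cv2.COLOR_BGR2GRAY)
    kps = detector.detect(gray)
    (kps, descs) = rs.compute(gray, kps)

    # skip images that yielded no keypoints
    if descs is not None:
        allDescs.append(descs)

# a single (total_keypoints x 128) matrix of RootSIFT descriptors
allDescs = np.vstack(allDescs)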
Hello Adrian
I am getting the same result as you have shown, but I am not getting any image.
Please help me out.
How are you accessing your system? In a headless manner via SSH or VNC? Or are you “wired in” with a keyboard, mouse, and monitor?
Hey Adrian,
Really useful article. Thank you! 🙂 But I have a conceptual doubt regarding SIFT. Since I am a beginner, this question might come out to be a bit absurd.
I understand that the cv2.KeyPoint class offers an attribute ‘pt’ for the x and y coordinates of each keypoint (DoG). But can I extract the pixel position after applying SIFT? I have searched long and hard but haven’t gotten an answer. I am really hoping I can find the answer here.
Thank you in advance!
The DoG keypoint detector (confusingly called “SIFT” in OpenCV, which is also the name of the local invariant descriptor) does indeed return a keypoint object with a .pt attribute. The SIFT descriptor takes this object and then describes the region surrounding it. I’m not sure what you mean by “extract the pixel position after applying SIFT” because the pixel position hasn’t changed at all. Applying the SIFT descriptor does not change the (x, y)-coordinates of the keypoint.

Thanks for the quick reply!
I admit I might be even more confused with the concept than I thought.
I am currently working on something where I am required to apply SLIC and then SIFT on an input image. I am then trying to calculate the number of keypoints for each of the superpixels I obtain after SLIC. Upon implementing SLIC, I get a 2D NumPy array, let’s say ‘segments.’ Segments has the same dimensions as the image. So I was hoping that upon extracting the coordinates of the keypoints after SIFT, I can apply a simple conditional statement while iterating over ‘segments’ and use a “count” variable to calculate the total number of keypoints in that superpixel?
What my question really means is: are the coordinates returned using ‘pt’ the pixel positions of the keypoint (so that I can use them as stated above)? But I have noticed that the ‘pt’ attribute returns a float-like (x, y) value. This is where my real confusion arises.
Thanks once again!
I would suggest detecting keypoints on the image first. Then, apply SLIC and obtain your “segments”. Loop over each of these segments and then check to see if the (x, y)-coordinates of the keypoint .pt object fall inside the segment. This will allow you to assign each of the keypoints to a specific superpixel. At the end of the day, the coordinates returned by .pt are the (x, y)-coordinate pixel positions of the keypoint in the original image.

Tried it and worked! Thanks a ton once again 🙂
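For readers following along, here is a rough sketch of the keypoint-to-superpixel assignment described above (hypothetical file name and parameters; assumes scikit-image for SLIC and the OpenCV 2.4.X SIFT keypoint detector used in this post):

import cv2
from skimage.segmentation import slic

# load the image, detect DoG keypoints, and compute the superpixel segmentation
image = cv2.imread("example.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
detector = cv2.FeatureDetector_create("SIFT")
kps = detector.detect(gray)

rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
segments = slic(rgb, n_segments=100)

# count how many keypoints fall inside each superpixel; kp.pt is a float
# (x, y) pair, indexed into the label array as segments[y, x]
counts = {}
for kp in kps:
    (x, y) = kp.pt
    label = segments[int(y), int(x)]
    counts[label] = counts.get(label, 0) + 1

print(counts)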
Please post a detailed implementation of SIFT in Python, not just how to use the library.
I have provided a detailed explanation of SIFT (along with many other keypoint detectors and local invariant descriptors) inside the PyImageSearch Gurus course.
Hi
Very nice example, I will try to implement the same in Java.
Complex topic and yet simple to understand as always.
My question is
What is the significance of color and size of Keypoints?
The “size” is the radius of the keypoint area. The “color” has no significance — it’s just used to display the actual keypoint on the screen.
While running this program I get an error: ImportError: No module named rootsift.
Please help. Mine is OpenCV 2.4.9.
Make sure you use the “Downloads” section of this blog post to download the source code + project structure for the tutorial. You likely do not have the project structure setup correctly, hence the import error.
Hi Adrian.. How do I compute a SIFT descriptor for a None keypoint?
You cannot. A KeyPoint object cannot be None.

Hi Adrian,
kps = detector.detect(gray)
error: ..\..\..\..\opencv\modules\core\src\alloc.cpp:52: error: (-4) Failed to allocate 127844356 bytes in function cv::OutOfMemoryError
I get the above error, what should I do?
Based on the error message, it looks like your system is running out of memory during the keypoint detection. This is likely because your input image is too large (in terms of width and height). Resize your image to have a maximum size of 600 to 1000px along its maximum dimension and everything should work fine.
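For reference, a minimal sketch of that resize (the 1000-pixel cap is just an example value; example.png is the hypothetical input from the driver script):

import cv2

image = cv2.imread("example.png")
(h, w) = image.shape[:2]

# cap the largest dimension at, for example, 1000 pixels while preserving
# the aspect ratio, then run keypoint detection on the smaller image
maxDim = 1000
if max(h, w) > maxDim:
    scale = maxDim / float(max(h, w))
    image = cv2.resize(image, (int(w * scale), int(h * scale)))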
Hi Adrian,
How do I do object recognition (identification) for different objects?
There are many different ways to do this, but it really depends on your project. What types of objects are you trying to identify and in under what context? Without knowing what you’re working on, the PyImageSearch Gurus course covers object detection as does Deep Learning for Computer Vision with Python.
Sir, I am using Python 3.7.1 and OpenCV 4.1.1 but I can’t use SIFT or SURF in it. How can I use SIFT and SURF? Do I have to use an older version, or is there another method?