Still using the original, plain ole’ implementation of SIFT by David Lowe?
Well, according to Arandjelovic and Zisserman in their 2012 paper, Three things everyone should know to improve object retrieval, you’re selling yourself (and your accuracy) short by using the original implementation.
Instead, you should be utilizing a simple extension to SIFT, called RootSIFT, that can dramatically improve object recognition, quantization, and retrieval accuracy.
Whether you’re matching descriptors of regions surrounding keypoints, clustering SIFT descriptors using k-means, or building a bag of visual words model, the RootSIFT extension can be used to improve your results.
Best of all, the RootSIFT extension sits on top of the original SIFT implementation and does not require changes to the original SIFT source code.
You do not have to recompile or modify your favorite SIFT implementation to utilize the benefits of RootSIFT.
So if you’re using SIFT regularly in your computer vision applications, but have yet to level-up to RootSIFT, read on.
This blog post will show you how to implement RootSIFT in Python and OpenCV — without (1) having to change a single line of code in the original OpenCV SIFT implementation and (2) without having to compile the entire library.
Sound interesting? Check out the rest of this blog post to learn how to implement RootSIFT in Python and OpenCV.
OpenCV and Python versions:
In order to run this example, you’ll need Python 2.7 and OpenCV 2.4.X.
Why RootSIFT?
It is well known that when comparing histograms, the Euclidean distance often yields inferior performance compared to the chi-squared distance or the Hellinger kernel [Arandjelovic et al. 2012].
And if this is the case why do we often use the Euclidean distance to compare SIFT descriptors when matching keypoints? Or clustering SIFT descriptors to form a codebook? Or quantizing SIFT descriptors to form a bag of visual words?
Remember, while the original SIFT papers discuss comparing descriptors using the Euclidean distance, SIFT is still a histogram itself — and wouldn’t other distance metrics offer greater accuracy?
It turns out, the answer is yes. But instead of comparing SIFT descriptors using a different metric, we can modify the 128-dim descriptor returned from SIFT directly.
You see, Arandjelovic et al. suggest a simple algebraic extension to the SIFT descriptor itself, called RootSIFT, that allows SIFT descriptors to be “compared” using a Hellinger kernel — but still utilizing the Euclidean distance.
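To see why this works, here is a quick numerical check (an illustrative sketch, not code from the paper or the original post): for two L1-normalized histograms, the squared Euclidean distance between their element-wise square roots is a simple function of the Hellinger kernel, so ranking by one is equivalent to ranking by the other.

# illustrative check: for L1-normalized histograms x and y,
# ||sqrt(x) - sqrt(y)||^2 = 2 - 2 * H(x, y), where H(x, y) = sum(sqrt(x * y))
# is the Hellinger kernel
import numpy as np

rng = np.random.RandomState(42)
x = rng.rand(128)
y = rng.rand(128)
x /= x.sum()  # pretend these are L1-normalized SIFT histograms
y /= y.sum()

hellinger = np.sum(np.sqrt(x * y))
sq_euclidean = np.sum((np.sqrt(x) - np.sqrt(y)) ** 2)

print(sq_euclidean)           # these two values agree
print(2.0 - 2.0 * hellinger)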
Here is the simple algorithm to extend SIFT to RootSIFT:
- Step 1: Compute SIFT descriptors using your favorite SIFT library.
- Step 2: L1-normalize each SIFT vector.
- Step 3: Take the square root of each element in the SIFT vector. The vector is then L2-normalized (in practice this final step is implicit, since the vector already has unit L2 norm after the first two steps).
That’s it!
It’s a simple extension, but this little modification can dramatically improve results. Whether you’re matching keypoints, clustering SIFT descriptors, or quantizing to form a bag of visual words, Arandjelovic et al. have shown that RootSIFT can easily be used in any scenario that SIFT is, while improving results.
In the rest of this blog post, I’ll show you how to implement RootSIFT using Python and OpenCV. Using this implementation, you’ll be able to incorporate RootSIFT into your own applications — and improve your results!
Implementing RootSIFT in Python and OpenCV
Open up your favorite editor, create a new file, name it rootsift.py, and let’s get started:
# import the necessary packages
import numpy as np
import cv2

class RootSIFT:
    def __init__(self):
        # initialize the SIFT feature extractor
        self.extractor = cv2.DescriptorExtractor_create("SIFT")

    def compute(self, image, kps, eps=1e-7):
        # compute SIFT descriptors
        (kps, descs) = self.extractor.compute(image, kps)

        # if there are no keypoints or descriptors, return an empty tuple
        if len(kps) == 0:
            return ([], None)

        # apply the Hellinger kernel by first L1-normalizing and taking the
        # square-root
        descs /= (descs.sum(axis=1, keepdims=True) + eps)
        descs = np.sqrt(descs)
        #descs /= (np.linalg.norm(descs, axis=1, ord=2) + eps)

        # return a tuple of the keypoints and descriptors
        return (kps, descs)
The first thing we’ll do is import our necessary packages. We’ll use NumPy for numerical processing and cv2 for our OpenCV bindings.
We then define our RootSIFT class on Line 5 and the constructor on Lines 6-8. The constructor simply initializes the OpenCV SIFT descriptor extractor.
The compute function on Line 10 then handles the computation of the RootSIFT descriptor. This function requires two arguments and an optional third argument.
The first argument to the compute function is the image that we want to extract RootSIFT descriptors from. The second argument is the list of keypoints, or local regions, from which the RootSIFT descriptors will be extracted. And finally, an epsilon variable, eps, is supplied to prevent any divide-by-zero errors.
From there, we extract the original SIFT descriptors on Line 12.
We make a check on Lines 15 and 16 — if there are no keypoints or descriptors, we simply return an empty tuple.
Converting the original SIFT descriptors to RootSIFT descriptors takes place on Lines 20-22.
We first L1-normalize each vector in the descs array (Line 20).
From there, we take the square root of each element in the SIFT vector (Line 21).
Lastly, all we have to do is return the tuple of keypoints and RootSIFT descriptors to the calling function on Line 25.
Running RootSIFT
To actually see RootSIFT in action, open up a new file, name it driver.py, and we’ll explore how to extract SIFT and RootSIFT descriptors from images:
# import the necessary packages
from rootsift import RootSIFT
import cv2

# load the image we are going to extract descriptors from and convert
# it to grayscale
image = cv2.imread("example.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# detect Difference of Gaussian keypoints in the image
detector = cv2.FeatureDetector_create("SIFT")
kps = detector.detect(gray)

# extract normal SIFT descriptors
extractor = cv2.DescriptorExtractor_create("SIFT")
(kps, descs) = extractor.compute(gray, kps)
print "SIFT: kps=%d, descriptors=%s " % (len(kps), descs.shape)

# extract RootSIFT descriptors
rs = RootSIFT()
(kps, descs) = rs.compute(gray, kps)
print "RootSIFT: kps=%d, descriptors=%s " % (len(kps), descs.shape)
On Lines 1 and 2 we import our RootSIFT descriptor along with our OpenCV bindings.
We then load our example image, convert it to grayscale, and detect Difference of Gaussian keypoints on Lines 7-12.
From there, we extract the original SIFT descriptors on Lines 15-17.
And we extract the RootSIFT descriptors on Lines 20-22.
To execute our script, simply issue the following command:
$ python driver.py
Your output should look like this:
SIFT: kps=1006, descriptors=(1006, 128)
RootSIFT: kps=1006, descriptors=(1006, 128)
As you can see, we have extracted 1,006 DoG keypoints. And for each keypoint we have extracted 128-dim SIFT and RootSIFT descriptors.
From here, you can take this RootSIFT implementation and apply it to your own applications, including keypoint and descriptor matching, clustering descriptors to form centroids, and quantizing to create a bag of visual words model — all of which we will cover in future posts.
What's next? I recommend PyImageSearch University.
30+ total classes • 39h 44m video • Last updated: 12/2021
★★★★★ 4.84 (128 Ratings) • 3,000+ Students Enrolled
I strongly believe that if you had the right teacher you could master computer vision and deep learning.
Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?
That’s not the case.
All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that’s exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.
If you're serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you’ll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.
Inside PyImageSearch University you'll find:
- ✓ 30+ courses on essential computer vision, deep learning, and OpenCV topics
- ✓ 30+ Certificates of Completion
- ✓ 39h 44m on-demand video
- ✓ Brand new courses released every month, ensuring you can keep up with state-of-the-art techniques
- ✓ Pre-configured Jupyter Notebooks in Google Colab
- ✓ Run all code examples in your web browser — works on Windows, macOS, and Linux (no dev environment configuration required!)
- ✓ Access to centralized code repos for all 500+ tutorials on PyImageSearch
- ✓ Easy one-click downloads for code, datasets, pre-trained models, etc.
- ✓ Access on mobile, laptop, desktop, etc.
Summary
In this blog post, I showed you how to extend the original OpenCV SIFT implementation by David Lowe to create the RootSIFT descriptor, a simple extension suggested by Arandjelovic and Zisserman in their 2012 paper, Three things everyone should know to improve object retrieval.
The RootSIFT extension does not require you to modify the source of your favorite SIFT implementation — it simply sits on top of the original implementation.
The simple 3-step process to compute RootSIFT is:
- Step 1: Compute SIFT descriptors using your favorite SIFT library.
- Step 2: L1-normalize each SIFT vector.
- Step 3: Take the square root of each element in the SIFT vector. The vector is then L2-normalized (implicitly satisfied by the first two steps).
No matter if you are using SIFT to match keypoints, form cluster centers using k-means, or quantize SIFT descriptors to form a bag of visual words, you should definitely consider utilizing RootSIFT rather than the original SIFT to improve your object retrieval accuracy.
Download the Source Code and FREE 17-page Resource Guide
Enter your email address below to get a .zip of the code and a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL!
Great one Adrian! A lot of new terms and functional phrases here too. Thanks again
Glad to hear you enjoyed the article! 🙂
I’m not able to implement this in OpenCV 3.0. Please help.
Hey Ham, I suggest reading my post on Where did SIFT and SURF go in OpenCV 3? to resolve the SIFT initialization issue.
Nice article! Thanks for sharing.
Thanks Ben, I’m glad you enjoyed it! 🙂
No luck for me yet on this one. Segmentation dump going on. Probably has something to do with the opencv version. Any advice on what to check?
If it’s segfaulting then it’s probably an issue with the way OpenCV was installed and compiled. Check your OpenCV version and ensure you are running something of the 2.4.X flavor. Also try to remove one line at a time and re-run until you can pinpoint the line of code that is causing the segfault.
Alright, from the python interpreter I managed to dump out the version of both python and opencv which are: Python 2.7.6 (default, Mar 22 2014, 22:59:56) and
>>> from cv2 import __version__; __version__
‘2.4.8’
I went ahead and requested your original scripts and it’s still giving me an error: segmentation fault (core dumped)
trying to execute line: 12 kps = detector.detect(image)
I am still digging just had some other priorities come up.
Thanks again
Thanks for the added info Johnny! I have tried the code with OpenCV 2.4.9, 2.4.10, and 2.4.11 and I’m not getting a segfault. That’s definitely quite strange. Keep me posted if you find anything!
Hi Adrian and Johnny,
The segmentation fault is because OpenCV was built without using the nonfree module. SIFT is in the nonfree module and hence the segmentation fault.
I faced this problem some weeks back when I had built OpenCV from source code downloaded from the OpenCV GitHub repository.
Hope it helps!!
Thank You
Nice, thanks for the tip Bikram! 🙂 But to my understanding, only OpenCV 3 (which is in beta) does not compile the nonfree module by default. The previous versions of OpenCV 2.4.X still compiled all nonfree modules during installation time.
Somehow I managed to get 2.4.11 build by following this: https://help.ubuntu.com/community/OpenCV
Now when I kick off python driver.py I get the output just as you demonstrated but no image pops up.
NOTE: back on the build steps when I try to execute line 8 it gave me this:
Package ‘ffmpeg’ has no installation candidate
So I removed it and everything else went somewhat smoothly.
Any ideas?
Wow, that’s really strange. I’ll admit that I’m stumped on that one.
What is the point of converting the image to grayscale after reading it in the driver file?
The color information (i.e. the Red, Green, and Blue channels individually) is not needed to detect keypoints. Furthermore, most keypoint detectors expect a grayscale image, so we convert from RGB to grayscale and discard the color information.
Hi, Adrian, in rootsift.py you apply the Hellinger kernel by first L1-normalizing, taking the square root, and then L2-normalizing. I have tested by first L2-normalizing, taking the square root, and then L1-normalizing, just as the paper says: 1) L1-normalize the SIFT vector (originally it has unit L2 norm); 2) square-root each element. I find the order of L2-normalizing and L1-normalizing doesn’t affect the values of RootSIFT. But for understanding, I thought it’s better to follow the order as the paper says.
Hey Yong, take a look at Slide 10 of the presentation done by Arandjelovic. This slide details their implementation of RootSIFT where the first step is L1 normalization, the second is the element-wise square root, and the final step is L2 normalization. It’s interesting that the values were not affected though.
Yeah, it’s true. Slide 38 shows that the mAP reaches 92.9% when they combine all the improvements. It’s really amazing. I only get 83.35% performance with 500k visual words.
It’s very interesting and useful, and I’m looking forward to reading your future posts. By the way, I find some of your posts have been translated to Chinese.
Very nice article, but I had a few questions about your python code:
1) When you do L1 normalization with axis=0, aren’t you normalizing all columns of the set of descriptors? I would think you would use ‘desc /= (desc.sum(axis=1, keepdims=True) + eps)’ if you wanted to normalize each SIFT descriptor…
2) Are you sure you are supposed to divide the normalized and square-rooted descriptor vectors by the L2 norm? If you read the paper by Arandjelovic and Zisserman, they do not do this. I feel like you did this because you saw that as a step in their presentation, but I think that in that slide they were saying that the descriptor is L2-normalized as a result of L1-normalizing and taking the square root.
Let me know if I’m off my rocker, and thanks for introducing me to this cool trick!
Hey Chris, to answer your questions:
1. Thanks for pointing this out! It looks like I have accidentally pushed a previous version of the RootSIFT code online from my git repo. This was certainly not my intention. Thanks a million for pointing this out. The code and blog post have been updated.
2. The original SIFT descriptor is L2-normalized, so while the paper does not explicitly state that the square-rooted descriptor should be L2-normalized, I think it’s applied. Perhaps I am wrong, but that’s how I interpreted it.
Thanks for clearing that up, Adrian!
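To make the axis distinction from the exchange above concrete, here is a small illustrative example (not from the original thread); the values are made up just to show the behavior of the two axis choices.

import numpy as np

# two fake 4-dim "descriptors" stacked as rows
descs = np.array([[1.0, 1.0, 2.0, 0.0],
                  [1.0, 3.0, 1.0, 1.0]])

# axis=1 with keepdims=True sums along each row, so every descriptor is
# L1-normalized independently (each row then sums to 1), which is what we want
print(descs / descs.sum(axis=1, keepdims=True))

# axis=0 sums down the columns, mixing values across different descriptors
print(descs / descs.sum(axis=0))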
Here’s my take on the L2 normalization. When I just do the math on a SIFT descriptor, this is what happens.
Step 1: L1 normalize SIFT vector
Step 2: Take square root of each element.
Step 3: Calculate L2 norm of transformed SIFT vector and divide each element by this value.
Now what’s happening is that the L2 norm is always 1.0 (or very nearly so, e.g. 0.999999). So it seems that this step is just unnecessary because the vector is already L2-normalized.
Good point. It does seem like this step is unnecessary. I am going to run some benchmarks related to image retrieval accuracy on my system and see if anything changes. Technically, it shouldn’t. But either way I’ll be posting an update on this article mentioning that the final L2 normalization is not necessary.
Hey Chris, I just wanted to let you know that I have updated the post to reflect your notes. Thanks again!
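For readers who want to reproduce Chris’s observation, here is a quick numerical check (a sketch added for illustration, not part of the original exchange):

import numpy as np

# a random non-negative vector standing in for a single SIFT descriptor
v = np.random.rand(128)

# Step 1: L1-normalize; Step 2: take the element-wise square root
v /= v.sum()
v = np.sqrt(v)

# Step 3 (explicit L2 normalization) is unnecessary: since
# sum(sqrt(v_i)^2) = sum(v_i) = 1, the L2 norm is already 1.0
print(np.linalg.norm(v))  # prints 1.0 up to floating-point error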
Hey Adrian,
What does the radius of each green circle tell us about that particular keypoint?
Would you mind posting the code that generates the “detect” image?
Thanks,
Sawyer
Hey Sawyer, sure thing. I have lots of plans to cover features and descriptors in future posts, so stay tuned!
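In the meantime, here is a minimal sketch (not Adrian’s original figure code) of how such a visualization can be produced with OpenCV’s built-in keypoint drawing; the circle radius comes from each keypoint’s size attribute, i.e. the scale at which it was detected. The file name example.png is assumed from the driver script above.

# minimal sketch: draw DoG keypoints where the circle radius reflects
# each keypoint's detected size/scale (OpenCV 2.4.X API)
import cv2

image = cv2.imread("example.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

detector = cv2.FeatureDetector_create("SIFT")
kps = detector.detect(gray)

# DRAW_RICH_KEYPOINTS draws each keypoint with its size and orientation
vis = cv2.drawKeypoints(image, kps, color=(0, 255, 0),
    flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite("keypoints.png", vis)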
Hey Adrian,
I was recently reading through a paper, “Food 101 – Mining Discriminative Components with Random Forests” (bit.ly/1RiH8tc).
In section 5.1, Implementation Details, the author mentions transforming SURFs using signed square rooting, and then references the RootSIFT paper:
“…two feature types are extracted: Dense SURFS, which are transformed using signed square-rooting.”
Is this essentially a “RootSURF”, or am I oversimplifying it? And can the Hellinger Kernel be used [effectively] with other feature extractors, like AKAZE?
Feature vectors generated from AKAZE and KAZE are binary feature vectors so they are compared using a Hamming distance. The chi-squared distance doesn’t make much sense here, unless you have constructed a bag-of-visual-words and are comparing the bag-of-visual-words histograms using the chi-squared distance.
As for SURF, yes, that is essentially RootSURF.
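To make the metric choice concrete, here is a brief sketch (an illustration, not from the original reply) of how the brute-force matcher norm is typically chosen per descriptor type; descsA and descsB are hypothetical descriptor matrices from two images.

import cv2

# binary descriptors (e.g. AKAZE, ORB, BRISK) are compared with the Hamming distance
bf_binary = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

# real-valued descriptors (SIFT, RootSIFT, KAZE) use the Euclidean (L2) distance
bf_float = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)

# hypothetical usage: descsA and descsB are descriptor matrices from two images
# matches = bf_float.match(descsA, descsB)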
Thanks for clarifying. pyimagesearch has the best customer service on the net
PyImageSearch customer service = Adrian on his laptop 😉
Only AKAZE uses a binary descriptor and therefore the Hamming distance. KAZE uses the Euclidean distance, like SIFT!
Thanks man! Every paper writer should have writing skills like you!
You just made it so simple.
Thanks for the kind words Rushi 😀
Adrian,
Two questions:
1) You convert the image to grayscale on line 8, but on line 12, 16, and 21 it appears you are using the original full color image. Or… am I missing something?
2) Do you have a version of this code that plays nice with OpenCV 3.0.0? I’m getting a “module object has no attribute” error for the DescriptorExtractor.
Thanks!
-Brian
Hey Brian — thanks for pointing that out. You can detect DoG keypoints in either color or grayscale images. I’ll update the code to make sure it’s using the grayscale image though.
As for working with OpenCV 3.0 and SIFT, you should give this post a read.
Hi Adrian,
I used your post on “Where did SIFT and SURF go” just a few days ago to get OpenCV 3.0.0 installed on my fresh install of Jessie (RPI 2). I admire your method as it parallels my own “How To” instructions; replete with expected times for each step! The install went great, but for some reason it’s not playing well with the code above.
For example, on line 11 you have:
detector = cv2.FeatureDetector_create(“SIFT”)
… which results in the following error:
‘module’ object has no attribute ‘FeatureDetector_create’
…so I changed it to:
detector = cv2.xfeatures2d.SIFT_create()
On line 15 you have:
extractor = cv2.DescriptorExtractor_create(“SIFT”)
… which results in the same type of error (module has no attribute)
I tried playing around with an “xfeatures2d” version of that line of code without any luck. Documentation from OpenCV is also not up-to-date regarding 3.0.0 as several sites have pointed out.
Any ideas for a fix are greatly appreciated!
-Brian
This post was written well before OpenCV 3 was released — it’s intended for OpenCV 2.4.X.
However, you can easily update the code to run with OpenCV 3 using something along these lines (a rough sketch, assuming OpenCV 3 was compiled with the opencv_contrib xfeatures2d module):
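# rough OpenCV 3 adaptation (requires the opencv_contrib xfeatures2d module)
import cv2

image = cv2.imread("example.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# the combined DoG keypoint detector + SIFT descriptor extractor
sift = cv2.xfeatures2d.SIFT_create()
(kps, descs) = sift.detectAndCompute(gray, None)

# inside the RootSIFT class, the constructor would become:
#     self.extractor = cv2.xfeatures2d.SIFT_create()
# and the rest of the compute() method stays the same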
Hey Adrian, could you please help me out by giving me any URLs from where I can get a SIFT executable binary for Mac??? Thanks in advance 🙂
Unfortunately it’s not that simple. You’ll need to compile OpenCV with the extra modules support enabled.
You mean I need to compile the code to create sift executable binary on Mac environment???
Yes, that is correct.
Thanks man. 🙂
Traceback (most recent call last):
File “gesture1.py”, line 97, in
print “RootSIFT: kps=%d, descriptors=%s ” % (len(kps), descs.shape)
AttributeError: ‘NoneType’ object has no attribute ‘shape’
Can you help me with this?
Try investigating the len(kps) — the only reason descs would be None is that no keypoints were initially detected in the input image.

Thanks for the nice tutorial. I just want to ask:
– What do you mean by “detect Difference of Gaussian keypoints in the image”? (line 10)
– Where can I find the documentation for the detect( ) method? (line 12)
Thanks.
The “Difference of Gaussian”, or more commonly DoG, is the default keypoint detector that SIFT utilizes. As for documentation for the detect method, see the OpenCV documentation.

Hey man! Just wanted to ask: how would I compute a matrix for all the descriptors if I am taking a large dataset of images? Here you have taken only one image into consideration.
I could run a loop to read all the images, but how should I store all the descriptors in a matrix?
There are multiple ways to do this. The easiest would be to create a list, loop over your images, extract features from each image, append them to the list, and then convert the list to a NumPy array.
Another option, if you know the number of images you are going to process ahead of time, is to allocate memory for the NumPy array before feature extraction. Either option will work.
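A minimal sketch of the first option (for illustration only; the “dataset” directory is hypothetical, and the RootSIFT class and OpenCV 2.4.X API from this post are assumed):

# option 1: accumulate per-image descriptor matrices in a list, then stack them
from glob import glob
import numpy as np
import cv2
from rootsift import RootSIFT

# "dataset" is a hypothetical directory containing the images to process
imagePaths = glob("dataset/*.png")

detector = cv2.FeatureDetector_create("SIFT")
rs = RootSIFT()
allDescs = []

for imagePath in imagePaths:
    gray = cv2.cvtColor(cv2.imread(imagePath), cv2.COLOR_BGR2GRAY)
    kps = detector.detect(gray)
    (kps, descs) = rs.compute(gray, kps)

    # skip images that yielded no keypoints
    if descs is not None:
        allDescs.append(descs)

# a single (total_keypoints x 128) matrix of RootSIFT descriptors
allDescs = np.vstack(allDescs)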
Hello Adrian
I am getting the same result as you have shown, but I am not getting any image.
Please help me out.
How are you accessing your system? In a headless manner via SSH or VNC? Or are you “wired in” with a keyboard, mouse, and monitor?
Hey Adrian,
Really useful article. Thank you! 🙂 But I have a conceptual doubt regarding SIFT. Since I am a beginner, this question might come out to be a bit absurd.
I understand that the cv2.KeyPoint class offers an attribute ‘pt’ for the x and y coordinates of each keypoint (DoG). But can I extract the pixel position after applying SIFT? I have searched long and hard but haven’t gotten an answer. I am really hoping I can find the answer here.
Thank you in advance!
The DoG keypoint detector (confusingly called “SIFT” in OpenCV, which is also the name of the local invariant descriptor) does indeed return a keypoint object with a .pt attribute. The SIFT descriptor takes this object and then describes the region surrounding it. I’m not sure what you mean by “extract the pixel position after applying SIFT” because the pixel position hasn’t changed at all. Applying the SIFT descriptor does not change the (x, y)-coordinates of the keypoint.

Thanks for the quick reply!
I admit I might be even more confused with the concept than I thought.
I am currently working on something where I am required to apply SLIC and then SIFT on an input image. I am then trying to calculate the number of keypoints for each of the superpixels I obtain after SLIC. Upon implementing SLIC, I get a 2D NumPy array, let’s say ‘segments.’ Segments has the same dimensions as the image. So I was hoping that upon extracting the coordinates of the keypoints after SIFT, I can apply a simple conditional statement while iterating over ‘segments’ and use a “count” variable to calculate the total number of keypoints in that superpixel?
What my question really means is: are the coordinates returned using ‘pt’ the pixel positions of the keypoint (so that I can use them as stated above)? But I have noticed that the ‘pt’ attribute returns a float-like (x, y) value. This is where my real confusion arises.
Thanks once again!
I would suggest detecting keypoints on the image first. Then, apply SLIC and obtain your “segments”. Loop over each of these segments and then check to see if the (x, y)-coordinates of the keypoint .pt object fall inside the segment. This will allow you to assign each of the keypoints to a specific superpixel. At the end of the day, the coordinates returned by .pt are the (x, y)-coordinate pixel positions of the keypoint in the original image.

Tried it and worked! Thanks a ton once again 🙂
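For readers following along, here is a rough sketch of the keypoint-to-superpixel assignment described above (hypothetical file name and parameters; assumes scikit-image for SLIC and the OpenCV 2.4.X SIFT keypoint detector used in this post):

import cv2
from skimage.segmentation import slic

# load the image, detect DoG keypoints, and compute the superpixel segmentation
image = cv2.imread("example.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
detector = cv2.FeatureDetector_create("SIFT")
kps = detector.detect(gray)

rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
segments = slic(rgb, n_segments=100)

# count how many keypoints fall inside each superpixel; kp.pt is a float
# (x, y) pair, indexed into the label array as segments[y, x]
counts = {}
for kp in kps:
    (x, y) = kp.pt
    label = segments[int(y), int(x)]
    counts[label] = counts.get(label, 0) + 1

print(counts)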
Please post a detailed implementation of SIFT in Python, not just how to use the library.
I have provided a detailed explanation of SIFT (along with many other keypoint detectors and local invariant descriptors) inside the PyImageSearch Gurus course.
Hi
Very nice example, I will try to implement the same in Java.
Complex topic and yet simple to understand as always.
My question is
What is the significance of color and size of Keypoints?
The “size” is the radius of the keypoint area. The “color” has no significance — it’s just used to display the actual keypoint on the screen.
While running this program I get an error: ImportError: No module named rootsift.
Please help. Mine is OpenCV 2.4.9.
Make sure you use the “Downloads” section of this blog post to download the source code + project structure for the tutorial. You likely do not have the project structure setup correctly, hence the import error.
Hi Adrian.. How do I compute a SIFT descriptor for a None keypoint?
You cannot. A KeyPoint object cannot be None.

Hi Adrian,
kps = detector.detect(gray)
error: ..\..\..\..\opencv\modules\core\src\alloc.cpp:52: error: (-4) Failed to allocate 127844356 bytes in function cv::OutOfMemoryError
I get the above error, what should I do?
Based on the error message, it looks like your system is running out of memory during the keypoint detection. This is likely because your input image is too large (in terms of width and height). Resize your image to have a maximum size of 600 to 1000px along its maximum dimension and everything should work fine.
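For reference, a minimal sketch of that resize (the 1000-pixel cap is just an example value; example.png is the hypothetical input from the driver script):

import cv2

image = cv2.imread("example.png")
(h, w) = image.shape[:2]

# cap the largest dimension at, for example, 1000 pixels while preserving
# the aspect ratio, then run keypoint detection on the smaller image
maxDim = 1000
if max(h, w) > maxDim:
    scale = maxDim / float(max(h, w))
    image = cv2.resize(image, (int(w * scale), int(h * scale)))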
Hi Adrian,
How do I do object recognition (identification) for different objects?
There are many different ways to do this, but it really depends on your project. What types of objects are you trying to identify and in under what context? Without knowing what you’re working on, the PyImageSearch Gurus course covers object detection as does Deep Learning for Computer Vision with Python.
Sir, I am using Python 3.7.1 and OpenCV 4.1.1 but I can’t use SIFT or SURF in it. How can I use SIFT and SURF? Do I have to use an older version, or is there another method?