If you’ve been paying attention to my Twitter account lately, you’ve probably noticed one or two teasers of what I’ve been working on — a Python framework/package to rapidly construct object detectors using Histogram of Oriented Gradients and Linear Support Vector Machines.
Honestly, I really can’t stand using the Haar cascade classifiers provided by OpenCV (i.e. the Viola-Jones detectors) — which is why I’m working on my own suite of classifiers. While cascade methods are extremely fast, they leave much to be desired. If you’ve ever used OpenCV to detect faces, you’ll know exactly what I’m talking about.
In order to detect faces/humans/objects/whatever in OpenCV (and remove the false positives), you’ll spend a lot of time tuning the cv2.detectMultiScale parameters. And again, there is no guarantee that the exact same parameters will work from image to image. This makes batch-processing large datasets for face detection a tedious task, since you’ll be very concerned with either (1) falsely detecting faces or (2) missing faces entirely, simply due to poor parameter choices on a per-image basis.
There is also the problem that the Viola-Jones detectors are nearing 15 years old. If this detector were a nice bottle of Cabernet Sauvignon, I might be pretty stoked right now. But the field has advanced substantially since then. Back in 2001, the Viola-Jones detectors were state-of-the-art, and they were certainly a huge motivating force behind the incredible new advances we have in object detection today.
Now, the Viola-Jones detector isn’t our only choice for object detection. We have object detection using keypoints, local invariant descriptors, and bag-of-visual-words models. We have Histogram of Oriented Gradients. We have deformable parts models. Exemplar models. And we are now utilizing Deep Learning with pyramids to recognize objects at different scales!
All that said, even though the Histogram of Oriented Gradients descriptor for object recognition is nearly a decade old, it is still heavily used today — and with fantastic results. In their seminal 2005 paper, Histograms of Oriented Gradients for Human Detection, Dalal and Triggs demonstrated that the Histogram of Oriented Gradients (HOG) image descriptor and a Linear Support Vector Machine (SVM) could be used to train highly accurate object classifiers — or in their particular study, human detectors.
Histogram of Oriented Gradients and Object Detection
I’m not going to review the entire detailed process of training an object detector using Histogram of Oriented Gradients (yet), simply because each step is fairly involved. But I wanted to take a minute and detail the general algorithm for training an object detector using Histogram of Oriented Gradients (with a short code sketch after the steps). It goes a little something like this:
Step 1:
Sample P positive samples from your training data of the object(s) you want to detect and extract HOG descriptors from these samples.
Step 2:
Sample N negative samples from a negative training set that does not contain any of the objects you want to detect and extract HOG descriptors from these samples as well. In practice N >> P.
Step 3:
Train a Linear Support Vector Machine on your positive and negative samples.
Step 4:
Apply hard-negative mining. For each image and each possible scale of each image in your negative training set, apply the sliding window technique and slide your window across the image. At each window compute your HOG descriptors and apply your classifier. If your classifier (incorrectly) classifies a given window as an object (and it will, there will absolutely be false-positives), record the feature vector associated with the false-positive patch along with the probability of the classification. This approach is called hard-negative mining.
Step 5:
Take the false-positive samples found during the hard-negative mining stage, sort them by their confidence (i.e. probability), and re-train your classifier using these hard-negative samples. (Note: You can iteratively apply Steps 4-5, but in practice one stage of hard-negative mining usually [though not always] tends to be enough. The gains in accuracy on subsequent runs of hard-negative mining tend to be minimal.)
Step 6:
Your classifier is now trained and can be applied to your test dataset. Again, just like in Step 4, for each image in your test set, and for each scale of the image, apply the sliding window technique. At each window extract HOG descriptors and apply your classifier. If your classifier detects an object with sufficiently large probability, record the bounding box of the window. After you have finished scanning the image, apply non-maximum suppression to remove redundant and overlapping bounding boxes.
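To make Steps 1-3 concrete, here’s a minimal sketch using scikit-image and scikit-learn. This is just an illustration, not the exact code from my framework — the window size, HOG parameters, and the load_patches() helper are placeholder assumptions:

```python
import numpy as np
from skimage import color, feature, transform
from sklearn.svm import LinearSVC

WINDOW_SIZE = (128, 64)  # (height, width) of each training patch

def extract_hog(patch):
    # resize to a fixed window so every feature vector has the same length
    patch = transform.resize(color.rgb2gray(patch), WINDOW_SIZE)
    return feature.hog(patch, orientations=9, pixels_per_cell=(8, 8),
                       cells_per_block=(2, 2))

# load_patches() is a hypothetical helper that yields cropped RGB patches
positives = [extract_hog(p) for p in load_patches("positives/")]
negatives = [extract_hog(p) for p in load_patches("negatives/")]

X = np.vstack(positives + negatives)
y = np.hstack([np.ones(len(positives)), np.zeros(len(negatives))])

model = LinearSVC(C=0.01)  # C is typically tuned via a grid search
model.fit(X, y)
```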
These are the bare minimum steps required, but by using this 6-step process you can train and build object detection classifiers of your own! Extensions to this approach include a deformable parts model and Exemplar SVMs, where you train a classifier for each positive instance rather than a collection of them.
However, if you’ve ever worked with object detection in images, you’ve likely run into the problem of detecting multiple bounding boxes around the object you want to detect in the image.
Here’s an example of this overlapping bounding box problem:
Notice on the left we have 6 overlapping bounding boxes that have correctly detected Audrey Hepburn’s face. However, these 6 bounding boxes all refer to the same face — we need a method to suppress the 5 smallest bounding boxes in the region, keeping only the largest one, as seen on the right.
This is a common problem, no matter if you are using the Viola-Jones based method or following the Dalal-Triggs paper.
There are multiple ways to remedy this problem. Dalal and Triggs suggest using the Mean-Shift algorithm to detect multiple modes in the bounding box space by utilizing the (x, y) coordinates of the bounding box as well as the logarithm of the current scale of the image.
I’ve personally tried this method and wasn’t satisfied with the results. Instead, you’re much better off relying on a strong classifier with higher accuracy (meaning there are very few false positives) and then applying non-maximum suppression to the bounding boxes.
I spent some time looking for a good non-maximum suppression (sometimes called non-maxima suppression) implementation in Python. When I couldn’t find one, I chatted with my friend Dr. Tomasz Malisiewicz, who has spent his entire career working with object detector algorithms and the HOG descriptor. There is literally no one that I know who has more experience in this area than Tomasz. And if you’ve ever read any of his papers, you’ll know why. His work is fantastic.
Anyway, after chatting with him, he pointed me to two MATLAB implementations. The first is based on the work by Felzenszwalb et al. and their deformable parts model.
The second method is implemented by Tomasz himself for his Exemplar SVM project which he used for his dissertation and his ICCV 2011 paper, Ensemble of Exemplar-SVMs for Object Detection and Beyond. It’s important to note that Tomasz’s method is over 100x faster than the Felzenszwalb et al. method. And when you’re executing your non-maximum suppression function millions of times, that 100x speedup really matters.
I’ve implemented both the Felzenszwalb et al. and Tomasz et al. methods, porting them from MATLAB to Python. Next week we’ll start with the Felzenszwalb method, then the following week I’ll cover Tomasz’s method. While Tomasz’s method is substantially faster, I think it’s important to see both implementations so we can understand exactly why his method obtains such drastic speedups.
Be sure to stick around and check out these posts! These are absolutely critical steps to building object detectors of your own!
Summary
In this blog post we had a little bit of a history lesson regarding object detectors. We also had a sneak peek into a Python framework that I am working on for object detection in images.
From there we had a quick review of how the Histogram of Oriented Gradients method is used in conjunction with a Linear SVM to train a robust object detector.
However, no matter what method of object detection you use, you will likely end up with multiple bounding boxes surrounding the object you want to detect. In order to remove these redundant boxes you’ll need to apply Non-Maximum Suppression.
Over the next two weeks I’ll show you two implementations of Non-Maximum Suppression that you can use in your own object detection projects.
Be sure to enter your email address in the form below to receive an announcement when these posts go live! Non-Maximum Suppression is absolutely critical to obtaining an accurate and robust object detection system using HOG, so you definitely don’t want to miss these posts!
Hi Adrian,
Thank you for your great post.
I am currently working on object detection too. My goal is to make the detection faster, because object detectors are usually slow.
In your post, the object detection framework you are using is “1 model, N image scales.” However, there are alternatives, such as “N models, 1 image scale”; “1 model, N/K (K > 1) image scales” (the FPDW approach); and “N/K models (K > 1), 1 image scale.”
I am thinking about creating a unified framework, which can include all these frameworks, but have no idea about the implementation now. Could you please make some comments:
— Is it worth trying?
— How would you do it if you were implementing it?
References:
1. Benenson, Rodrigo, et al. “Pedestrian detection at 100 frames per second.” Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
2. Dollár, Piotr, Serge Belongie, and Pietro Perona. “The Fastest Pedestrian Detector in the West.” BMVC. Vol. 2. No. 3. 2010.
Hi Shi! You are absolutely right, object detectors can be painfully slow, especially when you’re using the sliding window technique. And when expensive features such as HOG need to be computed, it can really kill performance.
And you are correct, I am utilizing the N image scales model for this framework. However, I have plans to create something similar to the FPDW approach by Dollar et al.
Right now performance using the N image scales model is actually pretty good for my framework, but not great. The reason is that I have distributed the image pyramid across all available cores of the system — an obvious solution, since making the HOG sliding window computation run in parallel can dramatically speed up the code.
However, doing something like FPDW will further increase the speed (though it will slightly lessen accuracy).
Send me an email and I would be happy to chat more about implementation details. Thanks again for the comment, there is a ton of great information in here.
Hey, can you please make the font a little bigger and also darker. It’s very hard for me to read.
Use ‘Ctrl’ + ‘+’ next time buddy.
Really good illustration. Thanks! Waiting for your follow-up posts.
Thanks Adrian for this and other wonderful posts. I have been following your posts for quite a long time now. I must say you are doing a wonderful job. Keep it up!
Thanks, I really appreciate it! 🙂
La Fin Du Monde… that’s quite a good beer! I’m impressed! 🙂 Kudos for the interesting article, BTW.
Fantastic Beer! Maudite is very good as well!
Good post, waiting for some nifty code to mess with 😉
Your blog is now my standard technology for teaching!
Thank you for such an awesome compliment! 🙂
Thank you, not me for such an awesome blog!!
Nice post, thanks.
In Step 5, the false positives are taken along with their probabilities and then sorted by their probabilities in order to further retrain the classifier. I have two questions I would appreciate clarification on:
1. Why do you need their probabilities in that retraining phase? A linear SVM doesn’t require their probabilities, merely their labels (negative in this case).
2. Why do you need them to be sorted? A linear SVM doesn’t require any sorting of the training samples.
You would want to keep track of the probabilities for two reasons. First, you may not have enough physical memory to store the positive samples, negative samples, and hard-negative samples and train your SVM. If you are limited by RAM, you would want to sort the samples by their probability and keep only the samples the classifier performed “worst” on, discarding the rest. Second, you may be worried about overfitting.
Will Steps 4 and 5 work for a multi-class cascade classifier as well, one that uses a boosting algorithm? I notice in your example you’re dealing with Linear SVMs.
Potentially, but there are other approaches that are more robust (and faster to train) than cascade classifiers.
According to the paper published by Dalal and Triggs, they suggest gamma correction is not necessary. I have doubts about whether correcting gamma is a good option or not. If gamma correction is necessary, what gamma value should I use for better performance? Awaiting your reply! Thanks in advance!
That is correct, gamma correction is not necessary. Normalization, however, is quite often helpful. You can normalize by taking either the log or the square-root of the image channel before applying the HOG descriptor (normally the square-root is used). Another method to make HOG more robust is to compute the gradient magnitude over all channels of the RGB or L*a*b* image and take the maximum gradient response across all channels before forming the histogram.
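For instance, the square-root normalization prior to HOG extraction might look like this sketch (the file name is just a placeholder):

```python
import numpy as np
from skimage import color, feature, io

image = io.imread("example.jpg")   # hypothetical input image
gray = color.rgb2gray(image)       # float image in [0, 1]
gray = np.sqrt(gray)               # square-root (gamma = 0.5) normalization
H = feature.hog(gray, orientations=9, pixels_per_cell=(8, 8),
                cells_per_block=(2, 2))
```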
Thank you for your reply!! But wouldn’t converting to grayscale and processing that be the better option?
You can certainly convert to grayscale and compute HOG as well. It’s just been shown that taking the maximum response over the individual RGB or L*a*b* channels can increase accuracy in some cases. Your mileage may vary depending on your dataset.
And we are actually stretching the contrast while we are gamma correcting, so which gamma value do you think will provide the best performance? A gamma of 2?
Please see my previous comment. I would not recommend using gamma correction. Just a simple log or square-root normalization should suffice.
In OpenCV the gamma correction can only be set or not set. I wonder what value is applied by default. For images taken at night, I believe gamma correction might help?
Hi Sarah, thanks for the comment. However, log-based gamma correction normally hurts HOG + Linear SVM performance. The square-root (in most cases) or simple variance normalization is a better option.
Ya, I got you. And I would like to thank you for your valuable blog!!! Really helpful!!
Hi Adrian, this is great stuff. Do you have any tutorials on implementing HOG descriptors? Having that would be really helpful for learning the essentials. Thanks.
Hi Morne, check out the scikit-image implementation of HOG. That is a great starting point!
Hi Adrian,
Great post! Are you still working on this or is it already completed?
Great question! The framework is 100% complete at this point. I’ll be covering it inside the computer vision course I am working on. Check out the Kickstarter for more information!
I have a suggestion for speeding up the search, and would like your opinion.
What if you applied KMP (Knuth-Morris-Pratt) or Boyer-Moore to the search? Meaning, convert the histogram ‘coefficients’ to a string representation, and do the same with the image being searched. Then, look at the ‘suffixes’ first.
I’ve used the Boyer-Moore, because it’s about 10x faster for string searching. In this application, it would be like looking at the right edge of a rect first, and then skipping over pixels when there isn’t a match.
Hi Adrian, awesome tutorial btw. I’m quite a beginner in OpenCV. I want to create a computer vision algorithm that is able to detect license plates and read them. However, I don’t really know where to begin. What tutorial do you suggest that I start with? So far, I was able to create the document scanner from your tutorial in C++.
Thanks in advance.
Hi Jeck, great question. I’ll be covering license plate detection and recognition inside the PyImageSearch Gurus course. License plate detection and recognition is not an easy task, but inside the course I’ll be breaking it down into simple and easy to digest components, similar to other tutorials on the PyImageSearch blog.
Hi, I’m working on face detection and I have a problem. The algorithm doesn’t detect faces in different positions. What’s the problem?
Thank you
Hi Miriam. Unfortunately, there could be many, many reasons why faces are not detected, and your question is a bit ambiguous. Here are just some example questions for you to consider: Are you using HOG + Linear SVM? Have you chosen the optimal HOG parameters for your descriptor? What about your Linear SVM? Are you using a soft classifier? Do your training examples sufficiently represent the faces that you want to detect in images? Are you performing hard-negative mining to increase accuracy? Are you using image pyramids so you can detect faces at multiple scales? Or a sliding window so you can detect faces in different positions of the image? Be sure to consider all avenues.
Ok
Thank you Adrian.
Hey Adrian, i really want to thank you for outstanding lessons you taught me about computer vision.
I was wondering, can we expect something about image stitching? Something like creating panorama images? 🙂
All best to you sir! 🙂
Hi Milos, thank you for such a kind comment! And we will absolutely be doing image stitching and panoramas within the next few months. I can’t set an actual date yet since I have a series of posts coming out soon, but I’ll be sure to let you know when the image stitching post goes live.
If my dataset consists of color images and color is a discriminating feature in my classification, should I use the grayscale image or should I compute the gradient over all channels of the RGB image? Will it help the result?
Dalal and Triggs actually recommend converting from the RGB color space to the L*a*b* color space and computing the gradient magnitude for each of the L*, a*, and b* channels, respectively, and then taking the maximum magnitude across all channels. In many cases this can improve performance.
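A rough sketch of that idea, with Sobel filters standing in for the gradient computation (this illustrates the max-response trick, not Dalal and Triggs’ exact implementation, and the file name is a placeholder):

```python
import numpy as np
from skimage import color, filters, io

image = io.imread("example.jpg")   # hypothetical input image
lab = color.rgb2lab(image)

# gradient magnitude of each channel, computed independently
magnitudes = np.dstack([filters.sobel(lab[:, :, i]) for i in range(3)])

# keep the strongest response at each pixel across the three channels
max_response = magnitudes.max(axis=2)
```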
Thanks for your great work. I have a question about training positive and negative samples. They can be of different sizes. Do you resize them to a predefined size? I’m concerned resizing may change the original gradient orientations depending on how it is resized. Looking forward to your answer.
Indeed, each object of each class should be resized to a pre-defined size to ensure the resulting HOG feature vectors are the same dimensionality.
Thanks. I am aware of that. Just want to make sure whether it introduces distortion or not. If it does distort the gradient orientations, maybe compensation is needed? Or maybe the distortion is too small to matter.
It will absolutely bring in some distortions since we are ignoring the aspect ratio of the image during the resizing, but that’s a standard side effect that we accept so we can get features of the same dimensionality.
Hey Adrian ,
Just wanted to say thank you. I am a student studying computer vision for the first time as part of a taught masters and this blog really helped me so much – you have a knack for explaining things intuitively 🙂
Thanks again,
Méabh
Thanks so much Méabh 😀 I’m happy to help. Best of luck with your masters program!
First I am just thank you for your wonderful and super easy to understand tutorials and perhaps the best available.
I have one question on object recognition using a sliding window and SVM. I am using C++ and the SVM in OpenCV. On detection I get multiple windows where I need to apply Non-Maximum Suppression (which I learnt well from your tutorial). However, the linear SVM output is a hard decision of +1 for objects and -1 for non-objects. In this case it’s not possible to do NMS as all weights (considering the prediction response) are the same.
I read a paper from Platt, 1999 on converting the prediction response to a probability. However, I am wondering if you know of any simpler or better way to achieve this.
Thank you for your time.
Hey Rish, thanks for the comment. Does your SVM library not support returning probabilities? Most SVM libraries do. I don’t do much work in C++ (literally none), but I know the scikit-learn SVMs support returning probabilities along with class labels.
Also, you can still do NMS without probabilities. Instead of using the probabilities, just use the bottom right-hand corner, like I do in this post on fast non-maximum suppression. It’s obviously better if you have the probabilities, but this approach will still work for you.
Hi Adrian, thank you for your prompt reply. I checked the LibSVM and it does return probabilities while SVM in OpenCV C++ does not. Hence I changed to using LibSVM.
Regarding the NMS, as your tutorial is implemented using the bottom right-hand corner, I am a little confused about how to use bounding box probabilities instead. I guess maybe like this, but if I am wrong could you please guide me onto the right path:
--> Instead of computing the overlap, should I directly use the probabilities of the classifier output?
You’ll need to modify my NMS code to accept a list of probabilities that corresponds to the bounding boxes. Then you’ll need to sort those probabilities (Line 27) rather than the bottom-right corner.
Hi Adrian. Thank you very much, I managed to get it working with LibSVM. However, I noticed that getting the probabilities from the SVM is computationally more expensive than the label prediction (hard decision). I wonder if this is expected or did I just get something wrong? Mathematically, I think estimating the probability needs more operations. However, my question here is: if I don’t use probabilities but just the label prediction, i.e. (+1/-1), and then use your NMS to get the final object region, do you think that is also good enough, or are probabilities important and do they improve results? (Apologies for my poor English.)
Secondly, about the HOG descriptor. Say for example I trained a classifier to detect a standing bottle (say a sliding window of 64 x 128 = a 3780-dimensional feature vector). During detection, I will use this window size to detect the bottle. However, my question is: when the bottle is resting on its side (now say the dimension is 128 x 64) but our detector uses the window of 64 x 128, how do I cater for this issue? I read a paper on part-based models, but I think PBM is computationally expensive. Perhaps I could divide the training image into 4 parts (say 16 x 32) and train on those. Do you have any suggestions?
Hey Rish, you’re absolutely correct — using probabilities does increase the computational cost. However, if you have the probabilities you’ll likely get better, more precise final bounding boxes. It really depends on your application and the level of tolerance you’re willing to accept.
As for your second question, HOG based methods, including the PBM model assumes there is a certain “structure” to the image. For example, if you trained your classifier to recognize a bike, it would look for two wheels, handlebars, a seat, etc. But if this bike were rotated 90 degrees, you would run into problems. HOG descriptors are structural descriptors and hence in your case you are going to run into a lot of problems if the object is rotated.
If you’re really concerned about finding all bottles in the image, including those resting on their sides, just take each input image and rotate it 0, 90, 180, and 270 degrees — you’ll be evaluating each image four times, which definitely comes at a considerable computational cost, but you’ll find the bottles even if they are lying on their side.
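As a quick illustrative sketch (detect() is a hypothetical wrapper around your trained HOG + Linear SVM detector):

```python
import numpy as np

# detect() is a hypothetical wrapper around the trained HOG + Linear SVM
all_boxes = []
for k in range(4):                # 0, 90, 180, and 270 degrees
    rotated = np.rot90(image, k)
    # boxes would still need mapping back to the original coordinate frame
    all_boxes.extend(detect(rotated))
```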
Hi Adrian. Sorry, I was busy with my finals so I was unable to complete my work. Actually, I have got most of the HOG detection implemented in C++. As I mentioned above, I want to change your NMS from bottom-right corner sorting to probability sorting, as you suggested.
I think I am a bit lost. I removed all the bounding box params from your NMS and used the probability instead, but I am confused, for example at line 51 where you do the overlap computation. How can I change this to use probability instead? Do I need to do something with gradient descent? Sorry, I am just confused, please help me out. Thanks
No need for gradient descent or anything fancy. When you make a call to your NMS function you’ll need to pass in three variables: the bounding boxes, the overlap threshold, and a probability of detection that corresponds to each bounding box. Then you need to sort the bounding boxes (from highest probability to lowest probability) based on their probability. From there, everything will be the same. Like I said, the only code that you’ll need to change is the code that sorts on probability rather than the bottom-right coordinate.
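For reference, here’s a minimal sketch of what probability-sorted NMS might look like — this is just the general idea, not the exact code from my NMS posts:

```python
import numpy as np

def nms(boxes, probs, overlap_thresh=0.3):
    # boxes: (N, 4) array of (x1, y1, x2, y2); probs: length-N confidences
    boxes = boxes.astype("float")
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = np.argsort(probs)  # ascending, so the most confident box is last
    pick = []

    while len(idxs) > 0:
        i = idxs[-1]          # grab the highest-probability box that remains
        pick.append(i)

        # overlap of every remaining box with the picked box
        xx1 = np.maximum(x1[i], x1[idxs[:-1]])
        yy1 = np.maximum(y1[i], y1[idxs[:-1]])
        xx2 = np.minimum(x2[i], x2[idxs[:-1]])
        yy2 = np.minimum(y2[i], y2[idxs[:-1]])
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)
        overlap = (w * h) / area[idxs[:-1]]

        # drop the picked box and anything overlapping it too heavily
        idxs = np.delete(idxs, np.concatenate(
            ([len(idxs) - 1], np.where(overlap > overlap_thresh)[0])))

    return boxes[pick].astype("int")
```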
Hi Adrian,
I have few doubts regarding NMS algorithm using probabilities
1. What does probability mean here? Is it the detection score from the SVM (i.e. the wx + b value)?
2. You said that we have to sort in decreasing order of probabilities. But I think we have to sort in increasing order, since we are picking from the last element.
Thank you for your wonderful posts.
The probabilities actually come from a modified logistic regression. As for the sorting of the probabilities, that depends entirely on the actual implementation of the NMS algorithm.
I am new to sklearn. I see many of you have advanced experience in this area. I am therefore asking for your assistance so that I can catch up 🙂
I have a number of images of sedans, SUVs, and trucks that pass outside my house. I want to train an SVM classifier to detect whether a vehicle is one of these three. I have cropped all of the images and successfully created Histogram of Oriented Gradients images of these vehicles. My question is: do I have to write code to reshape the image and format it to get the data into a format usable by scikit-learn, or is there code already written to do this? I am grateful for your help.
Visualizing a Histogram of Oriented Gradients image versus actually extracting a Histogram of Oriented Gradients feature vector are two completely different things. While you can visualize your HOG image, this is not appropriate for training a classifier — it simply allows you to visually inspect the gradient orientation/magnitude for each cell. Instead, you’ll need to extract the HOG descriptor using something like scikit-image and then pass the feature vector into scikit-learn. As I mentioned in an email to you, I’ll be covering all this inside the PyImageSearch Gurus course.
Hi Adrian,
Yes, I understand that the HOG image is not usable for integrating into scikit-learn. I understand that it needs to be reshaped into a matrix of row vectors so that we have number of samples vs. number of features. Are you telling me that this is done in scikit-image? Thanks for your help.
Yes, take a look at the documentation/example. If you take a look at the Handwriting Recognition chapter of Case Studies, you’ll learn how to extract the HOG feature vector. Secondly, the HOG image does not need to be reshaped — the HOG image is essentially useless for anything but visualization. You’ll need the HOG feature vector (which scikit-image provides you with) to create your classifier.
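In other words, something along these lines (the image path is a placeholder):

```python
from skimage import color, feature, io

image = color.rgb2gray(io.imread("vehicle.jpg"))  # hypothetical image

# H is the flat feature vector you feed to scikit-learn;
# hog_image exists only for visual inspection
H, hog_image = feature.hog(image, orientations=9, pixels_per_cell=(8, 8),
                           cells_per_block=(2, 2), visualize=True)
print(H.shape)  # ready to stack into a (num_samples, num_features) matrix
```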
Hi Adrian, it occurred to me that OpenCV’s HOG may be a better choice. I looked into scikit-image’s source code for HOG and found it does not implement weighted voting for histogram orientation binning. It only does simple binning based on which bin the magnitude value falls into, which is not quite an accurate way of computing HOG.
Actually, I found in my experiments that object detection based on OpenCV’s implementation is more accurate than with the one from scikit-image, and it runs faster.
Thanks for sharing your experience Jay! I’ve been using a personal modification of the scikit-image HOG that I’ve been meaning to create a pull request for. It handles various types of input transformations (square-root, log, variance) along with multi-channel images, allowing you to take the maximum of the gradient across the channels (this tends to work better in practice). I’ll have to look into weighted vote binning as well.
Dang, you’re awesome man… That one helps a lot for my final year project…
Cheers…
I’m happy it helped Timothy! And congrats on your (impending) graduation!
great post!
I just want to know that how many training images are required to make a good classifier?
The answer is: it depends! Some problems are quite simple and require very few training examples. Other problems are very, very complex and require many training examples. A good rule is to start with at least 100 positive training examples of what you want to detect, followed by 500-1000 negative training examples. Then perform hard-negative mining and select another 500-1000 false-positive examples. This approach will likely not give you the best performance, but it will give you a baseline from which to further tune your parameters. Training a classifier is an iterative process; don’t expect your first attempt to give you the best performance.
Thanks a lot! 🙂
Hi Adrian,
I am a BS computer science student and I have started working on human detection in images using the HOG descriptor. I have a problem in the testing phase, Steps 4 and 5. Can you tell me what I should do? I am using MATLAB as a tool.
Hi Abbas, I honestly have not used MATLAB in a very, very long time so I am probably not the right person for that question.
Thank you for your work. May I ask you to direct me to some implementation of hard-negative mining? Actually, I didn’t get the point of how to reuse the false-positive data! Should I use a multi-instance SVM; I mean, something like clustering the false positives?
In case I get false positives from my trained classifier on the negative training data, should I delete them from my training data, or keep them with the negative samples? I am a little bit confused about what procedure should be taken when retraining the classifier. Can you help with that?
Cheers ,
So basically, hard negative mining is all about finding the patches of images (and their associated HOG feature vectors) that the Linear SVM gets totally and completely wrong. For example, let’s pretend we are training an object detector to detect faces in images. The first thing we’ll do is extract HOG features from a positive dataset (that contains lots of examples of faces) and a negative dataset (which contains absolutely no faces at all, just a bunch of random images). We then train our Linear SVM.
However, this classifier may falsely detect faces in images where there are no faces. So, we run our classifier on the negative data (which contain no faces what-so-ever), and we collect any HOG feature vectors that the classifier incorrectly reports as a face. This process is called “hard-negative mining”. In your case, if you get a false-positive from your trained classifier you want to keep that data since you can use it to better improve your classifier.
Finally, we need to re-train our classifier, which is just a 1-vs-1 SVM: either “face” or “not a face” using the HOG feature vectors from the original face dataset, the HOG feature vectors from the non-face dataset, as well as the hard-negative HOG feature vectors.
We then let this classifier train, with the end goal (normally) being that our classifier better detects faces, while ignoring non-faces.
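Here’s a condensed sketch of that mining loop. Take it as an illustration only: pyramid() and sliding_window() are hypothetical helper generators, and negative_images, model, X, and y are assumed to be the negative image list, the first-pass Linear SVM, and the original training data:

```python
import numpy as np
from skimage import feature

# pyramid() and sliding_window() are hypothetical helper generators
hard_negatives = []
for image in negative_images:  # grayscale images containing no faces at all
    for scaled in pyramid(image, scale=1.5):
        for (x, y0, window) in sliding_window(scaled, step=8, size=(64, 128)):
            H = feature.hog(window, orientations=9, pixels_per_cell=(8, 8),
                            cells_per_block=(2, 2))
            # any positive response here is, by construction, a false positive
            if model.decision_function(H.reshape(1, -1))[0] > 0:
                hard_negatives.append(H)

# re-train on the original data plus the mined hard-negatives
if hard_negatives:
    X = np.vstack([X, np.array(hard_negatives)])
    y = np.hstack([y, np.zeros(len(hard_negatives))])
    model.fit(X, y)
```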
hi adrian!
in your view, do you think the success of convolutional NNs has made other image processing techniques (e.g. HOG + SVM) obsolete? thanks!
Definitely not. CNNs, and deep learning in general, are tools just like anything else. There is a time and a place for them to be used. In some situations they’re warranted. In some situations they’re overkill. And in others they’re just the flat-out wrong choice. Take a look at my post on the deep learning “bandwagon” mentality that rises up every 5-7 years.
Hi, I have two questions after reading this post
1: Would you show us Steps 1-6 on your blog just like the Pokedex example? I found Step 6 (https://www.pyimagesearch.com/2014/11/17/non-maximum-suppression-object-detection-python/) but not Steps 1-5. Or do I have to wait until the PyImageSearch Gurus course starts?
2: The SVM in OpenCV does not provide probability estimates. Could I use OpenCV’s random trees to replace the SVM? Or should I find another SVM library to develop the algorithms with?
Again, thank you for your brilliant posts
1. I plan to review Steps 1-6 inside the PyImageSearch Gurus course. I might cover these steps on the blog later on, but right now I am planning on detailing every step inside PyImageSearch Gurus.
2. I would recommend using the scikit-learn implementation of the Linear SVM.
Thanks for your suggestion. I have some questions to ask (I’m new to the field of machine learning):
1: What are the pros and cons of using the scikit-learn Linear SVM vs. the random forest of OpenCV 3.0? I think the random forest in OpenCV provides probability estimation too (http://stackoverflow.com/questions/28303334/multi-class-image-classification-with-probability-estimation)?
2: Since HOG is a feature descriptor (am I right?), is it possible to use another descriptor to describe the object and feed it to the SVM? Like AKAZE, KAZE, BRISK, FREAK (in truth, I don’t know their differences), and so on.
Simply put, a Linear SVM is very fast. A random forest classifier is capable of generalizing to non-linear spaces; however, it requires computing N trees. It also doesn’t work well for sparse feature spaces, at which Linear SVMs excel. SVMs can return probability estimates as well.
2. HOG is indeed an image descriptor. You could certainly use any other image descriptor and feed it into your classifier, such as Local Binary Patterns, Zernike moments, or Haralick texture. And yes, you can use AKAZE, BRISK, SURF, SIFT, etc.; however, this requires the added step of creating a bag-of-visual-words prior to passing the feature vector on to the classifier. I am covering all of these topics in detail inside the PyImageSearch Gurus course. And if you haven’t played around with machine learning, Practical Python and OpenCV is a great place to start.
According to the dlib documentation (http://dlib.net/fhog_object_detector_ex.cpp.html), HOG is good at detecting semi-rigid objects. I have two questions:
1: What is a semi-rigid object?
2: Is smoke a rigid object or not? Should I use HOG + SVM to classify smoke vs. non-smoke objects?
A semi-rigid object is something that has a clear form and structure, but can change slightly — consider how a human walks. We put one foot in front of the other, maybe move our arms a bit. The form is clearly the same, but it does change a little bit. HOG can be used to detect semi-rigid objects like humans provided that our poses are not too deviant from our training data. Finally, smoke is definitely not a semi-rigid object. Smoke diffuses into the air and has no real shape or form.
Thanks for your reply. What kind of descriptors and machine learning tools would you recommend for classifying smoke?
The thesis “Forest smoke detection using CCD camera and spatial-temporal variation of smoke visual patterns” uses a random forest.
Another one, “SMOKE DETECTION USING TEMPORAL HOGHOF DESCRIPTORS AND ENERGY COLOUR STATISTICS FROM VIDEO”, uses HOG + HOF (I don’t know what that is yet).
I haven’t gone through the details of these papers yet; the second paper claims that HOG can separate rigid and non-rigid objects very well. They use HOF to estimate the motion of the smoke, but it would be a problem if the video is not stable.
Some implementations (“Video-based Smoke Detection: Possibilities, Techniques, and Challenges”) do not even use machine learning at all.
I don’t have any experience with smoke detection, so take this with a grain of salt, but I would apply something like a spatial pyramid, some really basic color channel information in L*a*b*, and then I would create an experiment where I try a bunch of image descriptors such as Local Binary Patterns (I would be really curious about this one for this problem), Haralick texture, and I would even try HOG to see how it performs. My guess is not great, but that’s why we perform experiments.
Thanks for your suggestion, I will try to find out which solution fit the problems better.
After studying the cases of plant classification and handwriting recognition and some surveys on machine learning (bias/overfitting), I have a better sense of what this post is talking about.
Hi Adrian,
Awesome job, your posts are the best posts about OpenCV that I found!! Congratulations!!
I have one question: Have you ever worked with the ONE_CLASS SVM? I tried to find something about it but got nothing.
Hey Douglas, thanks for the comment. Personally, I prefer to use the scikit-learn machine learning implementations. I find the ones implemented in OpenCV to be a bit cumbersome and not as customizable.
Hi Adrian,
Thank you for this great tutorial !
I’m trying to train a svm for a face detection system. I’ve got positive and negative examples.
Do you know the size I should choose for the HOG extraction? I know it’s often 64x128 for human detection, with a block size of 8. But I don’t know for the face problem… I’m trying to calculate HOG features on 25x35 images with the hog.compute(…) function, but it’s not working…
Any help? 🙂
Plus, how do you find the optimal parameters for the SVM?
Thanks !
You normally would experiment with the optimal values for the HOG descriptor, the same goes for the parameters of the SVM. I actually prefer to use the scikit-image implementation of HOG and the scikit-learn implementation of the Linear SVM. You can then apply a grid search and cross-validation to find the optimal values.
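For example, a grid search over the SVM’s C parameter with scikit-learn might look like this sketch (assuming X and y are your already-extracted HOG feature matrix and labels):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# X and y are assumed to be your HOG feature matrix and labels
params = {"C": [0.0001, 0.001, 0.01, 0.1, 1.0]}
grid = GridSearchCV(LinearSVC(), params, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```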
Hi Adrian,
As a research problem, I want to apply HOG to content-based image retrieval, and it is definitely computationally expensive for large datasets. Please advise on what to implement and test for efficient image retrieval. I have tested the effects of changing the block, cell, bin, orientation angle, etc.; it gives a bit of a performance gain, but is this sufficient for a defense?
Thanks,
Ahmad
Hey Ahmad — can you let me know what the computational bottleneck is for your particular project? Is it the feature extraction itself? Or is it the actual search that is slow? If it’s the search, it’s probably because you are doing a linear scan of the dataset (i.e. comparing the query features to every other feature vector inside the dataset). If at all possible, I would suggest using an approximate nearest neighbor data structure (such as Annoy) to speed up the actual search.
Thanks Adrian for very quick reply.
Actually, I’m pursuing my PhD, and the above query was asked as part of my research contribution. In my last effort I used HOG with PCA and classified using an SVM on the Corel dataset. Then I tuned HOG parameters like bin, angle, block size, etc., but that is not a contribution. I’m stuck on how I can add my contribution.
Ideas:
> Is it worthwhile to use HOG+LBP to bring ‘rotation invariance’ to HOG?
> What should I apply to achieve better efficiency and recall rates with HOG?
Need your valuable input on it!
Thanks,
Ahmad
Nice, congrats on working on your PhD, that’s very exciting. As for rotation, HOG is definitely not rotation invariant. If you use only uniform LBPs you can achieve some level of rotation invariance. I don’t recall any papers off the top of my head that have combined both HOG and LBP, although it seems like there should be papers on the topic — it might be worth doing some more research in the literature and perhaps trying an experiment to see what your results look like.
hi adrian,
what do you think is the next evolutionary step for the deformable parts model? it seemed like it was the darling of the CV community 3-4 years ago…
The next evolution is already here and (no pun intended) it’s called Convolutional Neural Networks. CNNs are very accurate for image classification and object localization. However, their largest drawback is the incredible amount of data it takes to learn even a single class. All supervised machine learning algorithms require training data, but CNNs are particularly data hungry. In those cases where data is minimal and data augmentation is not a possibility, HOG-based methods still work very well.
Thank you for this great tutorial !
Thanks Chi! 🙂
Hi Adrian,
While I totally agree with you that the classifiers provided by OpenCV are pretty frustrating, I would not immediately jump to the conclusion that this has to do with the Viola-Jones detector as such. IMO it simply says that those cascades were (very) poorly trained. When you see these cascades come back with mere patches of a face, rather than a properly framed face, it’s clear that the training images had poor alignment to start with. They probably carved out some faces by hand and then told the training program that these were faces… Yeah, well… I think it’s fair to say that the cascades from OpenCV merely serve as demonstration material, and that they were never optimized to be production-proof. A properly trained Viola-Jones detector, however, can yield amazing results.
That said; I’m interested to test your Python framework one day. Sounds pretty promising!
Grtz,
Philippe
Hi Adrian,
Will it produce good (accurate) results if we use HOG features and an SVM for vehicle license plate detection?
I only need to detect the rectangular region where the license plate is present; recognition is not necessary.
It really depends on your license plate dataset. If the license plates are skewed or rotated, it will be harder for HOG + Linear SVM to return good results (since HOG is not rotation invariant).
Hi Adrian,
Thank you for all your great tutorials!
I’m trying to implement Exemplar SVMs with some friends for a school project but we got stuck. We followed your tutorials but our classifier doesn’t detect anything. It seems that each single classifier doesn’t train properly: as the number of negative samples grows, we get only 2 support vectors (could that be a problem?). We tried LinearSVC(loss='hinge'), SVC(kernel='linear'), and linear_model.SGDClassifier(), but nothing changes. Maybe we’re doing something wrong with feature extraction? Any hints?
For help with Exemplar SVMs, I suggest reaching out to Tomasz Malisiewicz, who authored the original paper on Exemplar SVMs. It’s hard to say what your exact problem is, but when you read through the original paper you’ll see that they use a “calibration” step where they tune the support vectors for each classifier. I have never implemented Exemplar SVMs before, but it seems like this could also cause a problem with your classification. It’s also hard to tell if feature extraction is your problem without knowing your actual feature extraction process.
sir,
I read all the above information regarding HOG. I am currently working on HOG for human detection. Do you have any MATLAB code for HOG with which I can go on with the project?
Thank you in advance
No, I do not have any MATLAB code for HOG detection, only Python. You can find my Python implementation of this HOG + Linear SVM framework inside the PyImageSearch Gurus course.
Hi Adrian ! Awesome tutorial !
I am using the HOG descriptor from skimage to create a traffic stop sign detector, but the process is too slow (approximately 0.5s for a (255, 496) frame).
Did you use the same descriptor as me or the one from OpenCV-Python?
Do you have some tips for increasing the speed?
I use the same scikit-image descriptor as well. As for increasing speed, try splitting the processing of the image pyramid across different cores of the processor; that way each core can process a separate layer of the pyramid independently.
Ok !
I have registered to the PyImageSearch Gurus recently.
Are you going to talk about the multiprocessing module in the course?
I don’t directly cover it in the course, although it’s an easy add-on once you’ve seen the source code for implementing an object detector from scratch. Of course, I also share more resources to make multi-processing easier inside the course 🙂
Hi Adrian,
Thank you so much for this perfect step-by-step instruction. I’m working on recognizing traffic signs using SVM + HOG. The algorithm is great and the accuracy is very reasonable; however, the speed is driving me crazy. For each image, of hundreds, it takes more than 5 minutes to yield the results. I really need to shorten this time to at most a minute or less. The sliding window is the main problem. How can I handle the slow sliding window?
Thank you in advance.
The sliding window itself isn’t the actual problem — it’s the time associated with extracting the HOG features and then passing them through the SVM. Your pyramid scale may also be hurting performance. I would suggest increasing the image pyramid scale along with the step size of the sliding window as much as possible without hurting accuracy. The larger the pyramid scale and the sliding window step size, the fewer windows need to be evaluated.
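For reference, here’s what minimal pyramid and sliding window generators might look like — a sketch, not my framework’s exact code — where increasing scale and step directly reduces the number of windows your classifier has to evaluate:

```python
import cv2

def pyramid(image, scale=1.5, min_size=(64, 128)):
    # yield the original image, then successively smaller layers
    yield image
    while True:
        w = int(image.shape[1] / scale)
        h = int(image.shape[0] / scale)
        image = cv2.resize(image, (w, h))
        if h < min_size[1] or w < min_size[0]:
            break
        yield image

def sliding_window(image, step, window_size):
    # window_size is (width, height); a larger `step` means fewer windows
    for y in range(0, image.shape[0] - window_size[1] + 1, step):
        for x in range(0, image.shape[1] - window_size[0] + 1, step):
            yield (x, y, image[y:y + window_size[1], x:x + window_size[0]])
```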
What about training a Haar (well, LBP) cascade first, lowering the false-negative rate and not bothering much about false positives? Then, instead of sliding the window, just go through every object detected by the cascade.
I didn’t try that myself, but maybe you did? Or is there something wrong with that?
There isn’t anything “wrong” with that approach, but it does imply that you need to train another classifier and tune any and all parameters associated with it. It would be better to invest your time in creating a more accurate HOG classifier or pushing the computation to the GPU. Another approach is to do more research on “region proposals”, the process of iteratively finding “interesting” regions of the image to classify rather than applying an exhaustive sliding window + image pyramid approach.
Hi Adrian great post!
I’m trying to build my own SVM and detect objects on a test image using hog.detectMultiScale.
I tried to build an SVM classifier with both sklearn and OpenCV using cross-validation. It gives good accuracy during cross-validation and also on other test images (however, the OpenCV one gives me just one support vector, which is kind of weird).
If I use the sklearn SVM, I load the coef_ attribute of my SVC object (which corresponds to the primal coefficients) via setSVMDetector, but the detection gives a lot of false positives and not even a true positive on the images that were well detected with the sklearn predict…
Have you trained your own SVM model and used it with detectMultiScale? Any ideas of what I could be doing wrong?
Thanks a lot!
I personally don’t like using OpenCV to train a custom HOG detector from scratch. Instead, I prefer using scikit-learn as it gives me more control. I detail the general process in the post we’re currently commenting on and then demonstrate how to implement HOG + Linear SVM inside PyImageSearch Gurus.
Hi Andrian,
I am working on an object detector for fallen people using HOG. As part of that I have collected some data manually and used some data available online, but I am not obtaining good accuracy. My dataset size is: train (fall: 400, negative: 1031), test (fall: 246, negative: 503). After the initial training I did hard mining as well. I used HOG parameters: cell size 8x8, block size 16x16, window size 128x64, 9 bins. How do I improve accuracy? Accuracy depends on the available dataset size, doesn’t it? Should I play with the cell size and other parameters? Thanks in advance.
In general, the more data you have the better. In terms of negative samples, I would try to increase this dramatically. Ideally you should have ~10x more negative samples than positive ones, especially for HOG. Other than that, you will need to play with the HOG parameters, but I think this is less of a concern until you can get more data.
Hi Adrian. Where is the actual implementation and code for HOG + SVM? I see in this post it was supposed to be done the next week.
Sorry for any confusion, but the actual implementation of HOG + Linear SVM is inside the PyImageSearch Gurus course.
Is it possible to define a distance function between the HOG (or other) descriptors for two images? My application is just to iteratively find interestingly similar images, without seeing duplicates.
Absolutely! Typically you would use either the Euclidean distance or the chi-squared distance. Both are implemented in NumPy/Scipy.
Given two sets of descriptors, one interesting metric might be the Hausdorff distance (https://en.wikipedia.org/wiki/Hausdorff_distance), which would be the worst closest-match distance between set features, or maybe the average of the closest matches.
For comparing histograms, I would recommend utilizing the chi-squared distance — this will give you reasonable performance.
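For example, a simple chi-squared distance between two HOG (or other histogram) feature vectors looks like this sketch, where eps guards against division by zero:

```python
import numpy as np

def chi2_distance(histA, histB, eps=1e-10):
    # small eps keeps the denominator from ever being zero
    return 0.5 * np.sum(((histA - histB) ** 2) / (histA + histB + eps))
```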
Thank you Adrian
hi ,
finding it somewhat hard to learn..
trying to implement the code..
hats off to you..
Thank you Adrian it’s very good
Good Luck
Hi Adrian,
In your post you said the number of negative samples >> the number of positive samples for training. I did a bit of research on the internet and usually people tend to use balanced data for binary classification. Could you explain why you are using unbalanced data in this case?
Indeed, that is a great question. Normally we would use a balanced dataset for classification. So why do we use so many more negative examples in the HOG + Linear SVM framework?
The reason is false-positive detections. If we don’t use a lot of negative examples we get many false-positive detections.
Thanks for the answer. I asked because I’m wondering if the unbalanced or balanced data could actually be the cause of some of the errors I’m encountering.
First, I’m doing cross-validation to confirm I have good training sets. Even with unbalanced data, I get a low FNR and FPR during training. But when using detectMultiScale on a video I get many more FPs.
Does that mean my model is overfitting during training or my datasets are not good enough?
Secondly, I’m trying to reduce this number of FPs by using hard data mining. The optimal number of hard negative samples N and the optimal C of the retrained model are found via random search. But usually the lowest FNR and FPR I get is with 0 hard negatives, which surprises me.
Does that mean my model was really overfitting and adding more examples makes it worse? Should I base my choice of optimal C and N on testing against another set rather than the FNR and FPR on my training set during cross-validation? Any other suggestions? Thanks for your time!
It’s hard to say without seeing your datasets. I would start by asking yourself if your training data is representative of your testing data. If your training data doesn’t look anything like your testing data then you can expect to get strange results.
As you suggested, it’s normal to set your number of samples and SVM value of C based on a random search or grid search — this is a pretty standard procedure. Given that you are getting the smallest FPR when zero hard-negatives are used, I’m led to believe that your training data might not be representative of your testing data.
For the positive training set, I’m using images from a scene viewed by a fixed camera. The test images are also taken from this camera. But for the negative training set, I’m using the one from INRIA. Should a negative training set be any scene without a positive instance, or should it reflect the scene (I’m working indoors, so should my negative set be only views from the camera without my object)?
It really depends on where you plan on deploying your classifier in a production environment. It’s very common to use negative instances that simply do not contain the objects you want to detect. That said, if your production environment will have a lot of varying background clutter then I would highly suggest using negative views from your camera.
hi
Which pyramid should be used when implementing HOG for person detection, and why? By pyramid I mean Gaussian, Laplacian, etc.
I review image pyramids for object detection in this blog post. You should actually stay away from blurring your images in the image pyramid when applying the HOG descriptor, as that will decrease your accuracy.
Bro! I don’t like this site. I come here to read about one topic and end up spending an hour or two reading different articles 😛
Hey man,
Great site. I have a question for you. If you train your SVM with a certain object size in mind (say 60x180), then how does the detectMultiScale method figure out what kind of crops it needs to take from the input image so that the descriptor from the compute method will be a vector that’s the same size as the ones used in the training process?
Thanks,
Gabriel
If you want to train your own custom object detector using HOG + Linear SVM then I wouldn’t recommend using OpenCV. I would instead use methods I detail inside the PyImageSearch Gurus course. The Python + OpenCV bindings for HOG are not very friendly. You also don’t have much control over the training process.
I just found out today the hard way that setSVMDetector actually requires a vector instead of an SVM object. At this point it seems easier to just build my own HOG detector with sliding windows and image pyramids and use that with an sklearn SVM. Thanks a lot for the reply, and I’ll say it again: I really appreciate the info on this site.
No problem Gabriel, I’m happy to help. I know I already mentioned this so I don’t want to beat it to death, but I do demonstrate how to train your own HOG detector using sliding windows, image pyramids, and scikit-learn’s SVM module inside the PyImageSearch Gurus course. If you get stuck implementing it on your own (or would like to learn from well-documented tutorials) then I would definitely suggest taking a look at the course.
Hi Adrian,
Great site for OpenCV and Image Processing.
I have some questions:
1. As you said in Step 2: sample N negative samples from a negative training set that “does not contain” any of the objects you want to detect, and extract HOG descriptors from these samples as well. In practice N >> P.
So, for example, if I want to train a smile detector, the positive images contain many smiling faces and the negatives are non-smiling faces. That means both positive and negative images include a face, so does that contradict “does not contain”?
2. In the case of the smile detector, I want to make a classifier with 3 classes: normal, opening mouth (not smiling), and smiling. I used the Viola-Jones algorithm and it’s not good. Now I’m using Logistic Regression (OpenCV) to make 2 binary classifiers: normal vs. opening mouth and normal vs. smiling. It’s better than Viola-Jones, but it still gets many false positives. So in your experience, is HOG + Linear SVM better? Or could you suggest another approach?
3. With the HOG + Linear SVM model, or any model you suggest, can I save the model to reuse it on mobile, for example saving the model as an XML file and loading it in Objective-C (iOS) or Java (Android)? So I just predict later, without training again. Finally, can I run your suggested model in real time (30fps) on a mobile device?
1. Semantically this is not a contradiction, but I think you’ll run into issues if you use your frowning lips as negative samples. HOG is good at discriminating between objects, but lips are still lips and look similar. I would experiment with both, but my guess is that you’ll find better performance having the detectors trained on datasets that do not contain examples from the other set. Overall, I would suggest you take a look at “facial landmarks” (keypoints detected on the face). The geometry of these points might help you build a more accurate classifier.
2. HOG + Linear SVM tends to majorly beat out the Viola-Jones detector in terms of both accuracy and false-positives. If you're getting many false-positives you should be applying more aggressive hard-negative mining (see the sketch after these answers).
3. I’m not sure about this since I don’t do much mobile development. You might have to train the model on a library specific to iOS/Android, but once it’s trained, you can deploy it into the production devices.
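As a rough sketch of that hard-negative mining step (simplified to a single scale; in practice you would repeat the scan at every layer of an image pyramid), assuming a scikit-learn LinearSVC and scikit-image's HOG implementation, with the model, image, and sizes as illustrative stand-ins:

```python
import numpy as np
from sklearn.svm import LinearSVC
from skimage.feature import hog

WIN_W, WIN_H, STEP = 64, 128, 32

def hog_vec(window):
    # 64x128 window, 8x8 cells, 2x2 blocks -> 3,780-dim descriptor
    return hog(window, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# stand-ins so the sketch runs: a "trained" model and one negative image
model = LinearSVC().fit(np.random.rand(20, 3780),
                        np.hstack([np.ones(10), np.zeros(10)]))
negative_images = [np.random.rand(256, 320)]

hard_negatives = []
for image in negative_images:
    for y in range(0, image.shape[0] - WIN_H + 1, STEP):
        for x in range(0, image.shape[1] - WIN_W + 1, STEP):
            fv = hog_vec(image[y:y + WIN_H, x:x + WIN_W])
            score = model.decision_function([fv])[0]
            if score > 0:
                # false positive: record the feature vector and its score
                hard_negatives.append((fv, score))
```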
Thanks a lot
Hi Adrian,
I am doing a project on traffic sign detection. Ideally, how many HOG features should be extracted per image for accurate results if the dataset contains about 2,000 positive images and 5,000 negatives? Also, which SVM kernel is preferable?
The number of extracted HOG features is entirely dependent on your dataset. A good rule of thumb is to have approximately 5-10x as many negatives as your positives and then perform hard-negative mining. As far as a kernel goes, a linear kernel tends to be standard due to how fast it is. Given your current dataset I would suggest performing an initial experiment to obtain a baseline. From there you can try adjusting the number of negatives and hard-negatives.
Hi Adrian,
Regarding the detection performance. How do you measure it as compared to measuring performance of classifiers using metrics like precision and recall?
It seems to me you need to first evaluate the classifier itself with a set of test (labeled) samples to measure the discriminative ability of the classifier and then evaluate detection performance in some other way.
My impression is that the detector feeds a very large number of "non-objects" to your SVM, hopefully giving you lots of true negatives and a few (or no) false positives, plus a few true positives related to the actual object (which can then be processed with non-maxima suppression). I think the typical ML metrics are difficult to apply here, even for a single frame, given the imbalance between positive and negative test vectors.
So, I guess my question is, how do you evaluate your models after training to decide which one is the best?
Thanks,
Gerardo
Hey Gerardo — I actually cover the answer to that question in my latest blog post. You use the Intersection over Union metric to measure the quality of a detection.
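For readers landing here, a quick sketch of computing Intersection over Union between a ground-truth box and a predicted box, both given as (x1, y1, x2, y2) corners; the example coordinates are illustrative:

```python
def iou(box_a, box_b):
    # compute the intersection rectangle
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    # union = sum of both areas minus the intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

print(iou((39, 63, 203, 112), (54, 66, 198, 114)))  # ~0.80
```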
Hi Adrian, thank you so much for your great article !
I’m learning about human detection using HOG descriptor + Linear SVM.
I am using Verilog HDL to build my system on FPGA board.
My huge issue is: how do I train on the input dataset using the HOG descriptor? What are the detailed steps? Verilog HDL does not have libraries for the mathematical functions, so I am hoping to receive your detailed steps.
Thanks so much!
Hi Cam — I can't speak for the FPGA side of things (since I don't have experience in that area), but I provide very detailed steps + source code to train your own custom object detector using the HOG + Linear SVM method inside the PyImageSearch Gurus course.
Hi Adrian,
I am a huge fan of your blog and really appreciate what you do. I am creating my own object classifier for sign language detection. I am facing some problems with localizing the hand of the person in an image. The hand can appear in a square box ranging in size from 70 to 220 px. I wanted to know how I can apply both the image pyramid technique and hard-negative mining. I mean, for which square size do I train my binary classifier, and how do I go about the process?
The actual size of the hand doesn’t matter as long as the aspect ratio (ratio of width to height) is the same. That is the entire point of applying a sliding window + image pyramid — to detect objects at various scales and locations of an image. I detail a thorough solution to training a custom HOG + Linear SVM object detector inside the PyImageSearch Gurus course. I would suggest starting there.
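To make the sliding window + image pyramid idea concrete, a minimal window generator might look like this; the step and window size are illustrative, since HOG only requires that the aspect ratio stays fixed:

```python
def sliding_window(image, step=8, window_size=(64, 128)):
    # slide a (w, h) window across the image, top-left to bottom-right
    win_w, win_h = window_size
    for y in range(0, image.shape[0] - win_h + 1, step):
        for x in range(0, image.shape[1] - win_w + 1, step):
            yield (x, y, image[y:y + win_h, x:x + win_w])
```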
Hi Adrian,
How can we retrain an existing SVM without starting the training process from scratch?
While performing hard-negative training I got a MemoryError at the end; apparently the number of final HOG features exceeded the NumPy array size! (My dataset contains 10K positives and 60K negatives, but I performed hard-negative mining on 16K negatives. The number of HOG features extracted from every image is 1,568.)
Thanks in advance! 🙂
There are machine learning algorithms that can be trained in batches and sequentially updated — SVMs are not one of them. In this case, you’ll need to reduce the size of your training dataset. I would suggest reducing the size of the true negatives and keeping only the most confident hard-negatives.
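A tiny sketch of that pruning step, assuming the hard negatives were recorded as (feature vector, decision score) pairs during mining; the sizes and cap are illustrative:

```python
import numpy as np

# stand-in (feature vector, decision score) pairs from the mining step
hard_negatives = [(np.random.rand(1568), s) for s in np.random.rand(1000)]

# sort by decision score, most confidently misclassified first, then
# keep only the top K hard negatives
hard_negatives.sort(key=lambda pair: pair[1], reverse=True)
hard_negatives = hard_negatives[:500]
```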
Hi,
Given that for training the classifier I’ve read you need anywhere from 1000 to 50K negatives, where can you get that many pictures from?
I’ve got 600 positives and want a couple of thousand negatives but downloading the same resolution images from Google would take months.
Is there a place you can download a picture set from?
Any help would be gratefully received.
In this case you have two choices:
1. Use your positive set and crop negatives from regions of the image that do not contain the object you're trying to detect (see the sketch below).
2. Use an existing dataset. There are quite literally hundreds of them and I would start there.
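A rough sketch of option 1, assuming you have a labeled object box per image and skip patches that overlap it; the scene, box, patch size, and count are all illustrative stand-ins:

```python
import random
import numpy as np

# stand-ins: a synthetic "scene" and a labeled object box (x1, y1, x2, y2)
image = np.zeros((480, 640, 3), dtype=np.uint8)
object_box = (120, 80, 260, 300)

def overlaps(a, b):
    # axis-aligned rectangle overlap test
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

# sample 64x128 patches that do not touch the labeled object
negatives = []
h, w = image.shape[:2]
while len(negatives) < 10:
    x, y = random.randint(0, w - 64), random.randint(0, h - 128)
    patch = (x, y, x + 64, y + 128)
    if not overlaps(patch, object_box):
        negatives.append(image[y:y + 128, x:x + 64])
```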
How do I detect traffic signs using HOG + SVM?
I discuss how to train your own custom object detectors inside the PyImageSearch Gurus course.
Hi Adrian! I really like your posts, which are easy to understand and very handy for practice. I am currently working on detecting fish in images. I have used the method mentioned in this post, but I do not get good performance. I want to know: does 'N >> P' mean the number of negative samples should be far greater than the number of positive samples? And could you tell me if there is any theory to help me adjust the sliding window size, step size, and scale? I just want to know what I am doing wrong. Sorry for my poor English. Looking forward to your reply! Thank you in advance!
Correct, your negative samples should be far greater than your positive samples. I go into more detail on how to properly set your sliding window size, step size, and scale inside the PyImageSearch Gurus course.
Take a look at the text describing figure 2. It misspells “shown” as “sown”.
Thank you for bringing this to my attention, Suhas. The description for Figure 2 has now been corrected.
Can I use HOG and an SVM to classify blurred vs. non-blurred images?
The HOG + Linear SVM framework detailed in this blog post is for object detection, but yes, theoretically if you have a dataset of blurred and non-blurred images you could extract HOG features and train an SVM to label regions as blurred or non-blurred. I also have a blog post on blur detection here.
Hi Adrian,
Thanks for your reply. I have tried the blur detection approach. As my dataset is smooth (face images), I am not getting good accuracy using the Laplacian variance approach. I am looking for a good approach in this case. Any insight would be a great help.
Thanks,
Veeru.
If the images have a smooth texture then the variance of the Laplacian will, by definition, be small and unhelpful. In that case you might want to consider looking at the distribution of values in the Fourier space.
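A quick sketch of both measures mentioned here: the variance of the Laplacian, and a crude Fourier-space score that zeroes out the low frequencies and averages what remains; the stand-in image and the cutoff are illustrative:

```python
import cv2
import numpy as np

# stand-in for a real grayscale face crop
gray = np.random.randint(0, 256, (200, 200), dtype=np.uint8)

# measure 1: variance of the Laplacian (small for smooth/blurry images)
lap_var = cv2.Laplacian(gray, cv2.CV_64F).var()

# measure 2: average high-frequency magnitude in Fourier space
f = np.fft.fftshift(np.fft.fft2(gray))
magnitude = np.abs(f)
cy, cx = gray.shape[0] // 2, gray.shape[1] // 2
magnitude[cy - 30:cy + 30, cx - 30:cx + 30] = 0  # suppress low frequencies
high_freq_score = magnitude.mean()

print(lap_var, high_freq_score)
```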
Hello Adrian,
Thank you very much for the nice tutorial about HOG. I am using HOG for grass weed detection. From the literature, I found that HOG is not rotationally invariant. I have 2 classes of weeds and each class has 18 images. In my training and testing database, I have rotated each image by [5 10 15 20 … 355] degrees.
The training and testing are done using the LibSVM package, and I am getting an accuracy of about 80%.
My question is: if HOG is not rotation invariant, then how can I get such high accuracy?
You are correct, HOG is not rotation invariant. As for your training and testing set, it sounds like you rotated each of the images in 5 degree increments to add more data to your training set? It might be the case that your grass weed objects are approximately symmetrical in which case the rotation wouldn’t matter.
hi Adrian
please help me out
I have performed all of the steps stated by you for face detection using the HOG method. I had 6,000 positive samples and 9,000 negative samples; then I performed hard-negative mining (with sliding windows and pyramids) and got around 70,000 false positives. Then I trained my linear SVM again with the 70,000 + 9,000 + 6,000 samples, but I am still getting many false positives and false negatives on my testing samples.
Where am I going wrong?
My positive samples are faces of people with a little bit of background, and their necks and chests as well. Also, I don't have any bounding box (ROI) on my training images. Is that wrong?
please answer me.
thanks
You absolutely need the bounding box ROI of the training and validation images. Otherwise, you are computing the HOG descriptor for the entire image. The purpose of the HOG + Linear SVM detector is to detect and localize objects in an image. If you don’t have the bounding box points for these regions, then the detector will not work.
Hello Adrian
thank you so much for your great article!
In step 6, about overlapping bounding boxes, you said: "Triggs et al. suggests to use the Mean-Shift algorithm". I want to know: what is the reference?
Take a look at this StackOverflow question for more information.
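As a related note, a simple non-maxima suppression pass (the more common alternative to mean-shift for collapsing overlapping detections, as mentioned in an earlier comment) might be sketched as follows; the overlap threshold is illustrative:

```python
import numpy as np

def non_max_suppression(boxes, scores, overlap_thresh=0.3):
    # boxes: (N, 4) array of (x1, y1, x2, y2); greedily keep the
    # highest-scoring box, then drop boxes that overlap it too much
    if len(boxes) == 0:
        return []
    boxes = boxes.astype(float)
    x1, y1, x2, y2 = boxes.T
    area = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        # discard boxes whose overlap with the kept box is too large
        order = order[1:][inter / area[order[1:]] <= overlap_thresh]
    return keep
```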
Hi Adrian,
I have used HOG for detecting damage, but the classification is coming out incorrect. How do we perform performance tuning? I attempted changing the OpenCV HOG parameters but saw no impact. What are the guidelines and best practices for training?
Thanks
Whether or not HOG + Linear SVM is a good choice for detection is highly dependent on your images. If the “damages” vary in rotation and viewing angle then HOG + Linear SVM will not work well. It would be better for me to understand what your dataset looks like to provide any type of suggestion here.
Hi Adrian,
The dataset is based on detecting car vehicle damage (front damage, back damage, side damage); the images are taken from Google. Is it possible to share your email ID? I will send the training dataset + a code snippet.
Thanks
You can send an email and I can try to point you in the right direction regarding your dataset and the best techniques to apply. However, please keep in mind that I receive hundreds of emails per day, many of them asking me to review code. Reviewing another programmer's code is a very time consuming/tedious task. It's not something I can do. I hope you understand.
Hello, I want to change the descriptors, but OpenCV only supports the function 'detectMultiScale()', so I needed to write my own function to detect object locations. Well, I accomplished it, but it is so slow that a 640×480 image takes one minute. What should I do? Thank you.
Can you elaborate on your question? Is your goal to speed-up the sliding window + image pyramid step? I’m not sure I fully understand.
Hello sir, I am new to image processing and I have 2 questions for you:
1. Please tell me the preferable algorithm and technique for multi-face recognition in terms of performance and accuracy.
2. I need a face recognition approach for a student attendance system. Do you have any suggestions?
Thanks in Advance.
Face recognition algorithms will work regardless of the number of faces in the image, provided it is a clear view of the face and sufficient resolution. First, you need to detect the faces. Then, you can take each detected face and recognize them. Popular algorithms include Eigenfaces and LBPs for face recognition, all of which I cover inside the PyImageSearch Gurus course. Otherwise, you might want to look into deep learning-based embeddings.
Thank you very much, sir, for your response.
Is the deep learning based approach more advanced in terms of flexibility and accuracy than other face recognition algorithms? In my system the person's face might be rotated, or there may be heavy occlusion. Which algorithm should I go with? If your suggestion is deep learning, which method: CNN, ANN, R-CNN, or something else? I am doing my M.Tech dissertation.
Yes, deep learning based face detection methods tend to be much more robust to changes in rotation, occlusion, etc.
Hey Adrian,
this is a very, very helpful blog; you are doing a great job. I am working on my research, I keep visiting your blog, and I subscribed to receive weekly updates. It is wonderful to see your work. Keep it up :=)
Hi Sadia — thank you for the kind words. It’s great to have readers like you — without readers, this site would not be possible.
Great post. Have you written code for the above-described method? If yes, kindly share the link here. Thanks!
Hi Abdul — my implementation (and faster variations) of the HOG + Linear SVM framework are covered inside the PyImageSearch Gurus course.
Hello Adrian,
Very good tutorials! I wanted to ask about the speed of HOG + SVM using image pyramids. What speeds have you seen on traditional PCs without any optimizations? Just a rough idea would be great. Could they perform in real time? I know there could be faster implementations, by distributing the pyramid layers across CPUs for example, but I have limited computational power. In this regard Haar cascades work pretty fast, although as you said, they are a pain to tune.
Thanks!
It really, really depends on the implementation. A well implemented HOG + Linear SVM detector can obtain real-time speeds without a problem — the key here is obviously one that is well implemented.
Hi
thanks for your great post
I need to detect some specific points on a face image, like the eye centers, the bottom of the nose, the center of the mouth, and some other points. Can you help me by introducing good algorithms for my work?
thank you.
I would suggest taking a look at shape predictors and in particular facial landmarks.
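A minimal sketch with dlib's 68-point facial landmark predictor, assuming the shape_predictor_68_face_landmarks.dat model file has been downloaded; the stand-in image would be replaced with a real grayscale face image:

```python
import dlib
import numpy as np

# the predictor path assumes you downloaded dlib's pre-trained model
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

# stand-in image; replace with a real grayscale face image
gray = np.zeros((300, 300), dtype=np.uint8)

for rect in detector(gray, 1):
    shape = predictor(gray, rect)
    # in the 68-point annotation, point 33 is the tip of the nose
    print(shape.part(33).x, shape.part(33).y)
```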
In step 6, probability is used to decide if a positive result will be considered a true positive. Although true positives have a higher confidence score on average than false positives, you are still going to lose many true positives by doing that.
Even after doing hard-negative mining I am still getting false positive results. How do I solve that?
Please help!
Hey Ritu — can you explain what type of images you are working with? There are a few parameters you can tune here, including the SVM itself and the HOG feature extractor. I cover how to tune these parameters inside the PyImageSearch Gurus course which may be a good starting point for you.
Thanks for this amazing blog! After working on pedestrian detection with HOG + Linear SVM, I decided to apply it to a new problem that I thought was going to be easier 🙁 I am working on buoy detection; these particular buoys are long and rectangular and contrast strongly with the water. For some reason my algorithm correctly detects some buoys far away during the first iteration, but then it stops detecting the closer buoys, even when they are more similar to the training samples. Also, it keeps producing a high number of false positives, especially in the last iterations of the pyramid, and even after retraining on these negative samples it still detects them! And the Linear SVM supposedly has 100% accuracy. Do you have any advice? How do you make all your samples the same size? I used PIL's resize. And do you do anything in particular in the retraining steps? How do you use the probabilities? I will greatly appreciate your input.
Hey Sandra — it can be a bit of a pain to tune the hyperparameters. It’s hard to say what the best approach would be without knowing what library you are using to train your detectors and the parameters used. I cover my own personal preferences and object detection practices inside the PyImageSearch Gurus course. I would suggest starting there.
hi Adrian,
is it possible to use object detection with HOG in unsupervised learning, and also with an object occurring multiple times?
No, HOG + Linear SVM requires training data and labels together. It cannot be used in an unsupervised fashion. You should consider looking into “region proposal” algorithms such as Selective Search. They will not be able to tell you the label of a particular ROI but they can at least say there is something “interesting” in this region.
Hello, Prof. Adrian Rosebrock,
The post is really useful and very easy to understand, but I have a question for you.
In the post, Step 1 is to prepare positive samples (the number is P) and Step 2 is to prepare negative samples (the number is N). You also said that N >> P. If N >> P, this leads to an unbalanced dataset, and you cannot get better accuracy unless you augment the P dataset or use a class-weighted SVM. Am I right? I think N approximately equal to P is best. Am I right? I hope you can explain this for me; I would really appreciate it.
PS: I am referring to "What should be the proportion of positive and negative examples to make a training set result in an unskewed classifier?" at https://www.researchgate.net/post/What_should_be_the_proportion_of_positive_and_negative_examples_to_make_a_training_set_result_in_an_unskewed_classifier
You are correct, this does lead to an imbalanced dataset. Object detection is a much more challenging problem than simple classification and we often need far more negatives than positives to reach a desirable accuracy. Therefore, we purposely unbalance the dataset.
Hey Adrian,
Amazing tutorial, man. Can you tell me how robust HOG features are to rotation? I tried to train my models by augmenting the examples with rotated images, but it does not seem to help. Do you have thoughts on making the model more robust to slight rotation?
Thanks
HOG is by no means invariant to rotation. If you wanted to use HOG + Linear SVM for rotated objects you would have to train a separate object detector for each rotation interval, probably in the range of 5-30 degrees depending on your objects.
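A small sketch of bucketing training crops by rotation, assuming imutils is available; the 30-degree interval and the stand-in crop are illustrative, and one detector would then be trained per bucket:

```python
import numpy as np
import imutils

# stand-in for a real 64x128 training crop
crop = np.random.randint(0, 256, (128, 64), dtype=np.uint8)

# one rotated copy per 30-degree bucket; rotate_bound expands the
# canvas so the rotated crop is not clipped
rotated = {angle: imutils.rotate_bound(crop, angle)
           for angle in range(0, 360, 30)}
```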
Hello Adrian!
Great tutorial and material!
I am currently working on a problem that involves counting crops in UAV images (orthophotos). From my limited experience, I have noticed that training data for many types of objects is easily available online. In my case I will have to create training data of individual plants. It would be great if you could point me in the right direction regarding the creation of such data. At the moment I am cropping individual plants with no specific considerations such as window size or aspect ratio.
Hey Karun — I discuss how to create your own training sets inside both the PyImageSearch Gurus course and Deep Learning for Computer Vision with Python. I would also suggest looking at tools such as imglab, LabelMe, and Sloth. I hope that helps point you in the right direction!
Hi Adrian, let me give you a big thanks for all the knowledge that you share. It helps me learn a lot. I have a question: if I want to detect an object like a leaf, for example, how can I do it? Should I train with different rotations of the object, or should I train one rotation at a time?
You might instead want to train a deep learning-based object detector as they’ll be more naturally invariant to rotation.
Thank you, Adrian. You are so kind.
Viola-Jones is sufficient to detect faces in selfies.
Hi Adrian,
My name is Ofir Cohen, I’m a software engineering student in Israel.
I am trying to implement an OCR algorithm for water meters, and I've seen you've implemented OCR for license plates, printed characters, etc.
I would be glad if you could help me or guide me toward an implementation of this algorithm like the ones you've done.
Make sure you connect with Jeff, who was a speaker at PyImageConf 2018; he built a system to OCR water meters.
Happy New Year! And thanks a lot for all your wonderful posts, including this one. Your posts are life-changing for folks like me. Long may you keep writing them…
Thanks so much Wessi, I really appreciate that 🙂 A very Happy New Year to you as well.
Thank you Adrian for your great post
You are welcome, Kamel!
For object detection, what is now the best method that is easy to implement but also gives good results? As this article was written in 2014, I would like to know if there are better solutions to this problem. Thanks!
Just like all computer vision tasks it really depends on:
1. What you are trying to detect
2. How much computation you have available
3. How much training data you have
HOG + Linear SVM is still used often but as far as raw accuracy on object detection datasets go deep learning-based object detectors often obtain the best results.
If you’re interested in learning more, I cover deep learning object detectors inside my book, Deep Learning for Computer Vision with Python.
Hi Adrian
Thanks for your great post.
I am trying HOG feature-based head detection with an SVM.
I made my own positive dataset, but I don't know what images are good as negatives.
I have tried various background images without people (cut into small pieces the same size as the positive images), but had too many false positive detections.
Umm... something circular that isn't a face, or something like that?
Can you recommend what kind of images would be good?
I would suggest taking a look at the PyImageSearch Gurus course where I cover HOG + Linear SVM object detection in detail (including how to prepare your dataset and train your own custom HOG models).
Can you provide me with a link to download a negative dataset?
I recommend trying to create a negative dataset of your own for optimal performance. Gather images similar to what your object detector will see but are negative examples. If you cannot do that, try datasets such as UKBench or sample images from ImageNet.
Hi Adrian,
Thanks for your great post.
I am using HOG feature-based face detection and a kNN classifier for prediction, but the process takes too long for real-time detection. How do I improve this?
Hey Tarun — I’m not sure I fully understand your question. Is the issue that your HOG + Linear SVM detector is too slow?
Hi Adrian, thank you for the nice tutorial. I am following your tutorials one by one as I am a newbie to computer vision.
I want to find a way to detect grass vs. non-grass. I think I can use an SVM for this purpose, but my question is: can I use HOG for feature extraction of grass? Or can you recommend better methods?
Thanks in advance!
I would instead suggest semantic segmentation or instance segmentation.
Hi Adrian,
Thanks for the nice tutorial.
My question could be a repeat. I wanted to know if we can use a customized HOGDescriptor with model files / SVM data generated by training on our own samples, instead of using HOGDescriptor_getDefaultPeopleDetector from cv2.
Many thanks for the help,
Suhas
Sorry, I’m not sure what you are asking. Are you asking how to use your own custom HOG + Linear SVM models with OpenCV? Or are you asking how to train them?
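If it's the former, a rough sketch of wiring a custom linear SVM into OpenCV's HOGDescriptor might look like the code below. The training data is a random stand-in, and the sign convention for the bias term passed to setSVMDetector is worth double-checking against your OpenCV version:

```python
import cv2
import numpy as np
from sklearn.svm import LinearSVC

# stand-in training data: the default 64x128 HOG window in OpenCV
# yields a 3,780-dim descriptor
X = np.random.rand(60, 3780).astype(np.float32)
y = np.hstack([np.ones(10), np.zeros(50)])
svm = LinearSVC(C=0.01).fit(X, y)

# setSVMDetector expects a flat vector: primal weights plus the bias
hog = cv2.HOGDescriptor()
detector = np.hstack([svm.coef_.ravel(),
                      svm.intercept_]).astype(np.float32)
hog.setSVMDetector(detector)

# the detector can now be used via hog.detectMultiScale(image)
```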
Hi sir,
I am working on my final year project: reading the number on which the ball lands on a roulette wheel, using OpenCV and Python. I have written code that tracks the ball and finds its position to detect the corresponding number, but there are accuracy issues.
I want your help: do you have any suggestions on how I can improve my accuracy, or how I can change my algorithm to detect the number properly? What steps should I follow?
Should I use a sliding window to detect the block in which the ball landed and then read the corresponding number?
Please help, as I am stuck on these accuracy issues.
Thanks in advance
It sounds like you are trying to perform OCR? If so, refer to this tutorial.
Hello, I am trying to use HOG + SVM for ship detection in remote sensing images. The idea is to use some method after training to extract regions of interest during the detection stage, and then run the detector. Do you have any good suggestions or resources? Thank you.
Hi Tonygu — have you taken a look at the PyImageSearch Gurus course? I cover HOG + SVM in detail inside the course. It will help you build your own object detectors.
Hi, I'm doing a project on face recognition. I got the basic idea of how to detect faces from the information above, but how do I code it? Can you help me get the code?
Make sure you refer to my tutorials on face recognition.
I'm working on object detection (any object): it should find the objects placed near the camera or in a given image, and it should tell me the number of objects and which one is nearer…?