Last updated on July 4, 2021.
4:18am. Alarm blaring. Still dark outside. The bed is warm. And the floor will feel so cold on my bare feet.
But I got out of bed. I braved the morning, and I took the ice cold floor on my feet like a champ.
Why?
Because I’m excited.
Excited to share something very special with you today…
You see, over the past few weeks I’ve gotten some really great emails from fellow PyImageSearch readers. These emails were short, sweet, and to the point. They were simple “thank you’s” for posting actual, honest-to-goodness Python and OpenCV code that you could take and use to solve your own computer vision and image processing problems.
And upon reflection last night, I realized that I’m not doing a good enough job sharing the libraries, packages, and code that I have developed for myself for everyday use — so that’s exactly what I’m going to do today.
In this blog post I’m going to show you the functions in my transform.py
module. I use these functions whenever I need to do a 4 point cv2.getPerspectiveTransform
using OpenCV.
And I think you’ll find the code in here quite interesting … and you’ll even be able to utilize it in your own projects.
So read on. And checkout my 4 point OpenCV cv2.getPerspectiveTransform
example.
- Update July 2021: Added two new sections. The first covers how to automatically find the top-left, top-right, bottom-right, and bottom-left coordinates for a perspective transform. The second section discusses how to improve perspective transform results by taking into account the aspect ratio of the input ROI.
OpenCV and Python versions:
This example will run on Python 2.7/Python 3.4+ and OpenCV 2.4.X/OpenCV 3.0+.
4 Point OpenCV getPerspectiveTransform Example
You may remember back to my posts on building a real-life Pokedex, specifically, my post on OpenCV and Perspective Warping.
In that post I mentioned how you could use a perspective transform to obtain a top-down, “birds eye view” of an image — provided that you could find reference points, of course.
This post will continue the discussion on the top-down, “birds eye view” of an image. But this time I’m going to share with you personal code that I use every single time I need to do a 4 point perspective transform.
So let’s not waste any more time. Open up a new file, name it transform.py
, and let’s get started.
# import the necessary packages
import numpy as np
import cv2

def order_points(pts):
	# initialize a list of coordinates that will be ordered
	# such that the first entry in the list is the top-left,
	# the second entry is the top-right, the third is the
	# bottom-right, and the fourth is the bottom-left
	rect = np.zeros((4, 2), dtype = "float32")

	# the top-left point will have the smallest sum, whereas
	# the bottom-right point will have the largest sum
	s = pts.sum(axis = 1)
	rect[0] = pts[np.argmin(s)]
	rect[2] = pts[np.argmax(s)]

	# now, compute the difference between the points, the
	# top-right point will have the smallest difference,
	# whereas the bottom-left will have the largest difference
	diff = np.diff(pts, axis = 1)
	rect[1] = pts[np.argmin(diff)]
	rect[3] = pts[np.argmax(diff)]

	# return the ordered coordinates
	return rect
We’ll start off by importing the packages we’ll need: NumPy for numerical processing and cv2
for our OpenCV bindings.
Next up, let’s define the order_points
function on Line 5. This function takes a single argument, pts
, which is a list of four points specifying the (x, y) coordinates of each point of the rectangle.
It is absolutely crucial that we have a consistent ordering of the points in the rectangle. The actual ordering itself can be arbitrary, as long as it is consistent throughout the implementation.
Personally, I like to specify my points in top-left, top-right, bottom-right, and bottom-left order.
We’ll start by allocating memory for the four ordered points on Line 10.
Then, we’ll find the top-left point, which will have the smallest x + y sum, and the bottom-right point, which will have the largest x + y sum. This is handled on Lines 14-16.
Of course, now we’ll have to find the top-right and bottom-left points. Here we’ll take the difference (i.e., y – x) of the coordinates of each point using the np.diff
function on Line 21.
The coordinates associated with the smallest difference will be the top-right point, whereas the coordinates with the largest difference will be the bottom-left point (Lines 22 and 23).
Finally, we return our ordered coordinates to the calling function on Line 26.
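As a quick sanity check, here is what order_points does to a deliberately shuffled set of corners (this snippet is just for illustration and assumes transform.py is importable from your current directory — in the downloadable code it lives inside the pyimagesearch package):

# pass the corners in a jumbled order and verify they come back as
# top-left, top-right, bottom-right, bottom-left
import numpy as np
from transform import order_points

pts = np.array([(10, 300), (320, 15), (25, 20), (310, 290)], dtype="float32")
print(order_points(pts))
# [[ 25.  20.]   <- top-left
#  [320.  15.]   <- top-right
#  [310. 290.]   <- bottom-right
#  [ 10. 300.]]  <- bottom-left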
Again, I can’t stress enough how important it is to maintain a consistent ordering of points.
And you’ll see exactly why in this next function:
def four_point_transform(image, pts):
	# obtain a consistent order of the points and unpack them
	# individually
	rect = order_points(pts)
	(tl, tr, br, bl) = rect

	# compute the width of the new image, which will be the
	# maximum distance between bottom-right and bottom-left
	# x-coordinates or the top-right and top-left x-coordinates
	widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
	widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
	maxWidth = max(int(widthA), int(widthB))

	# compute the height of the new image, which will be the
	# maximum distance between the top-right and bottom-right
	# y-coordinates or the top-left and bottom-left y-coordinates
	heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
	heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
	maxHeight = max(int(heightA), int(heightB))

	# now that we have the dimensions of the new image, construct
	# the set of destination points to obtain a "birds eye view",
	# (i.e. top-down view) of the image, again specifying points
	# in the top-left, top-right, bottom-right, and bottom-left
	# order
	dst = np.array([
		[0, 0],
		[maxWidth - 1, 0],
		[maxWidth - 1, maxHeight - 1],
		[0, maxHeight - 1]], dtype = "float32")

	# compute the perspective transform matrix and then apply it
	M = cv2.getPerspectiveTransform(rect, dst)
	warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))

	# return the warped image
	return warped
We start off by defining the four_point_transform
function on Line 28, which requires two arguments: image
and pts
.
The image
variable is the image we want to apply the perspective transform to. And the pts
list is the list of four points that contain the ROI of the image we want to transform.
We make a call to our order_points
function on Line 31, which places our pts
variable in a consistent order. We then unpack these coordinates on Line 32 for convenience.
Now we need to determine the dimensions of our new warped image.
We determine the width of the new image on Lines 37-39, where the width is the larger of two Euclidean distances: the distance between the bottom-right and bottom-left points, and the distance between the top-right and top-left points.
In a similar fashion, we determine the height of the new image on Lines 44-46, where the height is the larger of the distance between the top-right and bottom-right points and the distance between the top-left and bottom-left points.
Note: Big thanks to Tom Lowell who emailed in and made sure I fixed the width and height calculation!
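If it helps readability, the exact same calculation can be written with np.linalg.norm, which makes it explicit that we are measuring the Euclidean distance between pairs of corner points — a small, equivalent sketch using the tl, tr, br, and bl variables unpacked above:

# equivalent width/height computation using np.linalg.norm
widthA = np.linalg.norm(br - bl)   # length of the bottom edge
widthB = np.linalg.norm(tr - tl)   # length of the top edge
maxWidth = max(int(widthA), int(widthB))

heightA = np.linalg.norm(tr - br)  # length of the right edge
heightB = np.linalg.norm(tl - bl)  # length of the left edge
maxHeight = max(int(heightA), int(heightB))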
So here’s the part where you really need to pay attention.
Remember how I said that we are trying to obtain a top-down, “birds eye view” of the ROI in the original image? And remember how I said that a consistent ordering of the four points representing the ROI is crucial?
On Lines 53-57 you can see why. Here, we define 4 points representing our “top-down” view of the image. The first entry in the list is (0, 0)
indicating the top-left corner. The second entry is (maxWidth - 1, 0)
which corresponds to the top-right corner. Then we have (maxWidth - 1, maxHeight - 1)
which is the bottom-right corner. Finally, we have (0, maxHeight - 1)
which is the bottom-left corner.
The takeaway here is that these points are defined in a consistent ordering representation — and will allow us to obtain the top-down view of the image.
To actually obtain the top-down, “birds eye view” of the image we’ll utilize the cv2.getPerspectiveTransform
function on Line 60. This function requires two arguments, rect
, which is the list of 4 ROI points in the original image, and dst
, which is our list of transformed points. The cv2.getPerspectiveTransform
function returns M
, which is the actual transformation matrix.
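For reference, M is just a 3×3 matrix. If you ever want to sanity-check it, cv2.perspectiveTransform will map individual points through M for you — a quick sketch, reusing the rect and dst arrays built inside four_point_transform:

# sanity check: M is a 3x3 matrix, and mapping the source corners through it
# should land (approximately) on the destination corners in dst
M = cv2.getPerspectiveTransform(rect, dst)
print(M.shape)  # (3, 3)

# cv2.perspectiveTransform expects a float32 array of shape (N, 1, 2) and
# performs the homogeneous divide for us
src_corners = rect.reshape(-1, 1, 2)
print(cv2.perspectiveTransform(src_corners, M))  # ~[[0, 0], [maxWidth-1, 0], ...]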
We apply the transformation matrix on Line 61 using the cv2.warpPerspective
function. We pass in the image
, our transform matrix M
, along with the width and height of our output image.
The output of cv2.warpPerspective
is our warped
image, which is our top-down view.
We return this top-down view on Line 64 to the calling function.
Now that we have code to perform the transformation, we need some code to drive it and actually apply it to images.
Open up a new file, name it transform_example.py
, and let’s finish this up:
# import the necessary packages
from pyimagesearch.transform import four_point_transform
import numpy as np
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", help = "path to the image file")
ap.add_argument("-c", "--coords", help = "comma separated list of source points")
args = vars(ap.parse_args())

# load the image and grab the source coordinates (i.e. the list of
# (x, y) points)
# NOTE: using the 'eval' function is bad form, but for this example
# let's just roll with it -- in future posts I'll show you how to
# automatically determine the coordinates without pre-supplying them
image = cv2.imread(args["image"])
pts = np.array(eval(args["coords"]), dtype = "float32")

# apply the four point transform to obtain a "birds eye view" of
# the image
warped = four_point_transform(image, pts)

# show the original and warped images
cv2.imshow("Original", image)
cv2.imshow("Warped", warped)
cv2.waitKey(0)
The first thing we’ll do is import our four_point_transform
function on Line 2. I decided to put it in the pyimagesearch
sub-module for organizational purposes.
We’ll then use NumPy for the array functionality, argparse
for parsing command line arguments, and cv2
for OpenCV bindings.
We parse our command line arguments on Lines 8-12. We’ll use two switches, --image
, which is the image that we want to apply the transform to, and --coords
, which is the list of 4 points representing the region of the image we want to obtain a top-down, “birds eye view” of.
We then load the image on Line 19 and convert the points to a NumPy array on Line 20.
Now before you get all upset at me for using the eval
function, please remember, this is just an example. I don’t condone performing a perspective transform this way.
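If you’d like something a little safer than eval for this example, ast.literal_eval only evaluates Python literals (numbers, tuples, lists, and so on) and is a reasonable drop-in replacement here:

# safer alternative to eval for parsing the --coords string
import ast

pts = np.array(ast.literal_eval(args["coords"]), dtype = "float32")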
And, as you’ll see in next week’s post, I’ll show you how to automatically determine the four points needed for the perspective transform — no manual work on your part!
Next, we can apply our perspective transform on Line 24.
Finally, let’s display the original image and the warped, top-down view of the image on Lines 27-29.
Obtaining a Top-Down View of the Image
Alright, let’s see this code in action.
Open up a shell and execute the following command:
$ python transform_example.py --image images/example_01.png --coords "[(73, 239), (356, 117), (475, 265), (187, 443)]"
You should see a top-down view of the notecard, similar to below:
Let’s try another image:
$ python transform_example.py --image images/example_02.png --coords "[(101, 185), (393, 151), (479, 323), (187, 441)]"
And a third for good measure:
$ python transform_example.py --image images/example_03.png --coords "[(63, 242), (291, 110), (361, 252), (78, 386)]"
As you can see, we have successfully obtained a top-down, “birds eye view” of the notecard!
In some cases the notecard looks a little warped — this is because the angle the photo was taken at is quite severe. The closer we come to the 90-degree angle of “looking down” on the notecard, the better the results will be.
Automatically finding the corners for the transform
In order to obtain our top-down transform of our input image we had to manually supply/hardcode the input top-left, top-right, bottom-right, and bottom-left coordinates.
That raises the question:
Is there a way to automatically obtain these coordinates?
You bet there is. The following three tutorials show you how to do exactly that:
- Building a document scanner with OpenCV
- Bubble sheet multiple choice scanner and test grader using OMR, Python, and OpenCV
- OpenCV Sudoku Solver and OCR
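In broad strokes, all three of those tutorials follow the same recipe: edge detection, finding the largest contours, and approximating each contour to a four-vertex polygon that can be fed straight into four_point_transform. Here is a condensed, illustrative sketch of that idea — the threshold values and the helper name are my own, not taken from those tutorials:

# a condensed sketch of contour-based corner detection:
# edge detection -> largest contours -> 4-point polygon approximation
import cv2

def find_document_corners(image):
	gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
	gray = cv2.GaussianBlur(gray, (5, 5), 0)
	edged = cv2.Canny(gray, 75, 200)

	# grab the contours, handling the differing return signature of
	# cv2.findContours between OpenCV 3 and OpenCV 4
	cnts = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
	cnts = cnts[0] if len(cnts) == 2 else cnts[1]
	cnts = sorted(cnts, key=cv2.contourArea, reverse=True)[:5]

	# approximate each contour; the first one with four vertices is our ROI
	for c in cnts:
		peri = cv2.arcLength(c, True)
		approx = cv2.approxPolyDP(c, 0.02 * peri, True)
		if len(approx) == 4:
			return approx.reshape(4, 2).astype("float32")

	# no four-point contour found
	return None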
Improving your top-down transform results by computing the aspect ratio
The aspect ratio of an image is defined as the ratio of the width to the height. When resizing an image or performing a perspective transform, it’s important to consider the aspect ratio of the image.
For example, if you’ve ever seen an image that looks “squished” or “crunched,” it’s because the aspect ratio is off:
On the left, we have our original image. And on the right, we have two images that have been distorted by not preserving the aspect ratio. They have been resized by ignoring the ratio of the width to the height of the image.
To obtain better, more aesthetically pleasing perspective transforms, you should consider taking into account the aspect ratio of the input image/ROI. This thread on StackOverflow will show you how to do that.
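As a simpler stopgap, if you already know the true aspect ratio of the object (say, a standard sheet of paper), you can override the estimated output dimensions so the warp preserves that ratio — a small sketch, assuming a known width/height ratio, modifying the dst construction from four_point_transform:

# if the true aspect ratio of the object is known (e.g. a US Letter page is
# 8.5 x 11), override the estimated output height so the warp preserves it;
# KNOWN_ASPECT is an assumption you supply, not something computed from the image
KNOWN_ASPECT = 8.5 / 11.0  # width / height of the physical object

maxHeight = int(maxWidth / KNOWN_ASPECT)
dst = np.array([
	[0, 0],
	[maxWidth - 1, 0],
	[maxWidth - 1, maxHeight - 1],
	[0, maxHeight - 1]], dtype = "float32")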
What's next? I recommend PyImageSearch University.
30+ total classes • 39h 44m video • Last updated: 12/2021
★★★★★ 4.84 (128 Ratings) • 3,000+ Students Enrolled
I strongly believe that if you had the right teacher you could master computer vision and deep learning.
Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?
That’s not the case.
All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that’s exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.
If you're serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you’ll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.
Inside PyImageSearch University you'll find:
- ✓ 30+ courses on essential computer vision, deep learning, and OpenCV topics
- ✓ 30+ Certificates of Completion
- ✓ 39h 44m on-demand video
- ✓ Brand new courses released every month, ensuring you can keep up with state-of-the-art techniques
- ✓ Pre-configured Jupyter Notebooks in Google Colab
- ✓ Run all code examples in your web browser — works on Windows, macOS, and Linux (no dev environment configuration required!)
- ✓ Access to centralized code repos for all 500+ tutorials on PyImageSearch
- ✓ Easy one-click downloads for code, datasets, pre-trained models, etc.
- ✓ Access on mobile, laptop, desktop, etc.
Summary
In this blog post I provided an OpenCV cv2.getPerspectiveTransform
example using Python.
I even shared code from my personal library on how to do it!
But the fun doesn’t stop here.
You know those iPhone and Android “scanner” apps that let you snap a photo of a document and then have it “scanned” into your phone?
That’s right — I’ll show you how to use the 4 point OpenCV getPerspectiveTransform example code to build one of those document scanner apps!
I’m definitely excited about it, I hope you are too.
Anyway, be sure to sign up for the PyImageSearch Newsletter to hear when the post goes live!
Download the Source Code and FREE 17-page Resource Guide
Enter your email address below to get a .zip of the code and a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL!
Hello Adrian,
This was really a wonderful post it gave me a very insightful knowledge of how to apply the perspective transform. I just have a very small question about the part where you were finding the maxHeight and maxWidth. For maxHeight (just considering heightA) you wrote
np.sqrt(((tr[1] – br[1]) ** 2) + ((tr[1] – br[1]) ** 2))
but i think that the height should be
np.absolute(tr[1] – br[1])
because you know this gives us the difference in the Y coordinate
but the equation that you wrote gives us
1.4142 * difference of the y coordinates. Why is so?
Hi Vivek. The equation is utilizing the sum of squared differences (Euclidean distance), whereas the equation you proposed is just the absolute value of the differences (Manhattan distance). Try converting the code to use the
np.absolute
function and let me know how the results look.

I know this is old but still, I actually had the same question. Then I tried to replace it with
widthA = abs(br[0] – bl[0])
widthB = abs(tr[0] – tl[0])
heightA = abs(tr[1] – br[1])
heightB = abs(tl[1] – bl[1])
And I’m getting pretty similar results.
Hi Adrian,
Really helpful post.
Hi Ashish,
Here, he’s actually trying to find the Euclidian distance of the line. The line is hypotenuse of the triangle if you see the letter image. If you do abs(x2-x1), you will not find the hypotenuse but the adjacent side (base of the triangle). In this case, the difference between hypotenuse and adjacent side is pretty small and hence you might be getting similar results. Hopefully, this helps.
Hi Adrian,
is it possible, that you mixed up top and bottom in the comments of the function order_points() ? When I did an example rect[0] was BL, rect[1] was BR, rect[2] was TR and rect[3] was TL.
Hi Vertex. Hm, I don’t think so. The
dst
array assumes the ordering that I mentioned above and it’s important to maintain that order. If the order was not maintained then the results from applying the perspective transform would not be correct.

Hi Adrian, thanks for your answer. I have to say I am a newbie and I tried the following to get a better understanding:
import numpy as np
rect = np.zeros((4, 2), dtype = “float32”)
# TL,BR,TR,BR
a = [[3,6],[3,3],[6,6],[6,3]]
rect[0] = np.argmin(np.sum(a,axis=1))
rect[2] = np.argmax(np.sum(a,axis=1))
rect[1] = np.argmin(np.diff(a,axis=1))
rect[3] = np.argmax(np.diff(a,axis=1))
print(rect)
[[ 1. 1.]
[ 3. 3.]
[ 2. 2.]
[ 0. 0.]]
I guess I got a faulty reasoning.
Ah, I see the problem. You are taking the argmin/argmax, but not grabbing the point associated with it. Try this, for example:
rect[0] = a[np.argmin(np.sum(a,axis=1))]
The argmin/argmax functions give you the index, which you can then apply to the original array.
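Putting that correction together, the full corrected version of the little experiment would look something like this (same toy points as above):

# the corrected experiment: use argmin/argmax to index back into the points
import numpy as np

a = np.array([[3, 6], [3, 3], [6, 6], [6, 3]], dtype="float32")
rect = np.zeros((4, 2), dtype="float32")

s = np.sum(a, axis=1)
d = np.diff(a, axis=1)
rect[0] = a[np.argmin(s)]  # top-left     -> [3. 3.]
rect[2] = a[np.argmax(s)]  # bottom-right -> [6. 6.]
rect[1] = a[np.argmin(d)]  # top-right    -> [6. 3.]
rect[3] = a[np.argmax(d)]  # bottom-left  -> [3. 6.]
print(rect)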
Hi Adrian! I’m a newbie.
Spent lots of time mulling over this.
Lines 53-57
dst = np.array([
[0, 0],
[maxWidth – 1, 0],
[maxWidth – 1, maxHeight – 1],
[0, maxHeight – 1]], dtype = “float32”)
Isn’t
[0,0] – Bottom Left
[maxWidth – 1, 0] – Bottom Right
[maxWidth – 1, maxHeight – 1] – Top Right
[0, maxHeight – 1]] – Top Left
So it’s bl,br,tr,tl? I’m a bit confused. Could you please explain?
Hey Nithin, it’s actually top-left, top-right, bottom-right, and bottom-left: [0, 0] is the top-left corner, [maxWidth - 1, 0] the top-right, [maxWidth - 1, maxHeight - 1] the bottom-right, and [0, maxHeight - 1] the bottom-left.
Python arrays are zero-indexed, so we start counting from zero. Furthermore, the top-left corner of the image is located at point (0,0). For more information on the basics of image processing, including the coordinate system, I would definitely look at Practical Python and OpenCV. I have an entire chapter dedicated to learning the coordinate system and image basics.
might be super simple, but I still don’t get it why do you extract 1 from maxWidth and maxHeight for the tr, br and bl ?
In order to apply a perspective transform, the coordinates must be a consistent order. In this case, we supply them in top-left, top-right, bottom-right, and bottom-left order.
Great sample. My question is regarding the transformation matrix. Could it be used to tranform only a small region from the original image to a new image instead of warping the entire image? Say you used the Hello! example above but you wanted to only relocate the exclamation mark from the original image to a new image to end up with exactly the same output you have except without the “Hello” part, just the exclamation mark. I guess the question is whether you can use the TM directly without using the warping function.
Thanks!
Hi Ken, the transformation matrix
M
is simply a matrix. On its own, it cannot do anything. It’s not until you plug it into the transformation function that the image gets warped.

As for only warping part of an image, you could certainly only transform the exclamation point. However, this would require you to find the four point bounding box around the exclamation point and then apply the transform — which is exactly what we do in the blog post. At that point you’re better off transforming the entire index card and cropping out the exclamation point from there.
Hey Adrian,
I agree completely with what you say…I apologise, it was a poor example…what I was wondering about was how the mapping worked.
Fwiw, given the transformation matrix M you can calculate the mapped location of any source image pixel (x,y) (or a range of pixels) using:
dest(x) = [M11x + M12y + M13]/[M31x + M32y + M33]
dest(y) = [M21x + M22y + M23]/[M31x + M32y + M33]
Why bother?
I used this method to map a laser pointer from a keystoned camera image of a large screen image back onto the original image…allowing me to “draw” on the large screen.
Thanks!
Wow, that’s really awesome Ken! Do you have an example video of your application in action? I would love to see it.
Hi Adrian,
I am newbie in opencv.
is it possible to measuring angles in getPerspective Transform
can u give the function?
Thanks in advance
Hi Wiem, I’m not sure I understand what you’re asking? If you want to get the angle of rotation for a bounding box, you might want to look into the
cv2.minAreaRect
function. I cover it a handful of times on the PyImageSearch blog, but you’ll want to look up the actual documentation on how to get the angle.

Hi Adrian, thanks for your answer.
I looked at “hello image” (original vs wraped) there is an angle of rotation.
i want to know how to get the angle.
sorry for my english
Hope to hear from you
Regards.
Hey, Wiem — please see my previous comment. To get the angle of rotation of the bounding box just use the
cv2.minAreaRect
function, which you can read more about here. Notice how the angle of rotation is returned for the bounding box.

Ups yeah.
thanks for fast response.
Thank you very much
regards
Is there equivalent function for order_points(Rect ) in opencv for C++?
P.S. Thanks for your tutorials.
Thanks.
Hey Aamir, if there is a C++ version, I do not know of one. The
order_points
function I created was entirely specific to Python and is not part of the core of OpenCV.

hiii Aamir,
Do you have the c++ version of this above code.
The code shows this error:
“TypeError: eval() arg 1 must be a string or code object”
thanks
Hi Palom, make sure you wrap the coordinates in quotes:
python transform_example.py --image images/example_01.png --coords "[(73, 239), (356, 117), (475, 265), (187, 443)]"
That(wrapping in quotes) is already done in the code and the problem persists!
Try removing the argument parsing code and then hardcoding the points, like this:
pts = np.array([(73, 239), (356, 117), (475, 265), (187, 443)], dtype = "float32")
error: (-215) src.cols > 0 && src.rows > 0 in function warpPerspective
I have a error after hardcoding
It sounds like the path you supplied to
cv2.imread
does not exist. I would suggest reading up on NoneType errors.

It works in my Spyder on Debian:
coords_list = eval(eval(args[“coords”]))
pts = np.array(coords_list, dtype =”float32″)
Hey, i am trying to implement the same thing in java using openCV but I cant seem to find the workaround of the numpy function can you help me out please…..My aim is to implement document scanner in java(ref :https://www.pyimagesearch.com/2014/09/01/build-kick-ass-mobile-document-scanner-just-5-minutes/)…..
Thanking you in anticipation
Hey Singh, I honestly don’t do much work in Java, so I’m probably not the best person to answer that question. But you could probably use something like jblas or colt.
Hey Adrian,
I believe your order_points algorithm is not ideal, there are certain perspective in which it will fail, giving non contiguous points, even for a rectangular subject. A better approach is to find the top 2 points and 2 bottom points on the y axis, then sort these pairs on the x axis.
Actually my solution can also fail.
If the points in the input are contiguous, the best would be to choose a starting point meeting an arbitrary ordering constraint, whilst conserving their original order.
Otherwise, a correct solution involves tracing a polygon without intersection, e.g. using the gift wrapping algorithm — simplified for a quadrilateral.
Thanks for the tip Tarik!
i am receiving an error message no module named pyimageseach.transform any idea what i have missed?
Please download the source code using the form at the bottom of this post that includes the
pyimagesearch
module.

Thanks for the quick reply. Okay, I downloaded the file on my old laptop running Windows 7 — what do I do with it? Sorry for the stupid question, I am a newbie and I am also old, so I have two strikes against me, but if you learn from mistakes I should be a genius in no time!
Please see the “Obtaining a Top-Down View of the Image” section of this post. In that section I provide the example commands you need to run the Python script.
Hi Adrian,
Thanks so much for the code. I have tried your code and running well on my MacBook OSX Yosemite, Python 2.7.6 and openCV 3.0.
I am just wondering that if we can improve it by automatically detect the four points. So, the variable input will be only the image. 🙂 Should it be possible? What will be the algorithm to auto-detect the points?
🙂
Thanks!
Ari
Yep! We can absolutely automatically detect the points! Please see my post on building a document scanner.
Thanks for your kind sharing information about python & openCV.
Wonderful!
hi adrian,
thanks for sharing..
what is mean index 0 and 1 in equation widthA = np.sqrt(((br[0] – bl[0]) ** 2) + ((br[1] – bl[1]) ** 2)) ?
The
0
index is the x-coordinate and the 1
index is the y-coordinate.

Hi!
Thank you for the tutorial. Have you got any tutorial how to transform perspective for the whole image? So user chooses 4 reference points and then the whole image is transformed (in result image there are some black fragments added). I know that I should calculate appropriate points for input image, but I have no idea how to do it. Can you help?
Regards
Given your set of four input points, all you need to do is build your transformation matrix, and then apply a perspective warp, like this tutorial.
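In case it helps, a minimal sketch of that idea — src_pts and dst_pts here are placeholder names for your own four source and destination points:

# warp the entire frame instead of cropping to the ROI: keep the output canvas
# the same size as the input so unmapped regions simply show up as black
(h, w) = image.shape[:2]
M = cv2.getPerspectiveTransform(src_pts, dst_pts)  # both float32, shape (4, 2)
warped_full = cv2.warpPerspective(image, M, (w, h))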
Excellent tutoroal! Would it be possible to do the opposite? I mean, given a top-view of an image produce a distorsioned one?
Sure, the same principles apply — just modify the transformation matrix prior to applying it.
Hi, Adrian
I’m passing by just to say this post is really helpful and thank you very much for it.
I’m starting studying Computer Vision and your blog it is really helping my development.
Well done and keep going. =)
Cheers,
Matheus Torquato
Thanks Matheus! 🙂
You have a slight typo in your description of line 21. It should say (i.e. y – x).
Hi Adrian,
Can the getPerspectiveTransform method return the angle that the image makes with the Y-axis?
Can you be more specific what you mean by “the angle the image makes with the y-axis”? I’m not sure I understand your question.
Do you happen to have the c++ version of this?
Sorry, I only have Python versions, I don’t do much C++ coding anymore.
I’m a complete newbie to OpenCV but shouldn’t warping perspective off a `Mat` using the output of `minAreaRect` be a one-line command? I mean, you clearly have extracted some of these things out as ‘utils’ and a nice importable github repo for that, too, for which, we all thank you but don’t you think that if it were so “regularly used by devs for their image processing tasks”, they better lie in vanilla OpenCV? To be *really really* honest, my “duckduckgo-ing” about warping off a rect perspective led me to this post of yours among the very first results and I *knew* the code presented obviously works but I didn’t immediately start using it *ONLY AND ONLY* because I *believed* there would be something to the effect of
warpedMat = cv2.warpPerspective(imageToWarp, areaToWarp)
Ultimately, on asking my colleague on how to do it, she suggested goin’ ahead with your written utils only! 🙂
Adrian,
Thank you so much for your awesome tutorials!! I’ve been learning how to use the raspberry pi the past few weeks in order to make an automated testing system, and your tutorials have been so thorough and helpful. Thank you so so much for making these!!
Thanks Christine, I’m happy i could help 🙂
I tried this code and it’s pretty cool but it can’t handle images like this:
http://vari-print.co.uk/wp-content/uploads/2013/05/business-cards-1.jpg
I tried to cut out the business card but I couldn’t. I got some strange results. Why and how can I fix it?
In order to apply a perspective transform (and get reasonable results), you need to be able to detect all four corners of the object. If you cannot, your results will look very odd indeed. Please see this post for more details on applying perspective transforms to rectangular objects.
Hi Adrian. sorry for my English. 🙂
I’m newbie in opencv. thank you so much for awesome tut 😀
i want crop this image https://goo.gl/photos/kAmDRokUeLcqpycX7 with original on left and i want crop image on right. may u help me?
thanks in advance
If you are trying to manually crop the region, I would use something like the GUI code I detail in this blog post.
thanks your tut. it very excited!
in this image, i want to get the road, and outside will be black, white or other color, because i’m researching about raspberry pi, and i want it process
least as possible. do you have any idea?
Thank you sir. I accept you as my Guru.
Hi Adrian,
I dont understand how (x-y) will be minimum for the top right corner…. Consider this square:
tl= 0,0
tr= 1,0
br= 1,1
bl =0,1
(x-y) is minimum for bl, ie. 0-1 = -1, nd not tr… Am i going wrong somewhere??
The origin (x, y)-coordinates start at the top-left corner and increase going down and to the right. For what it’s worth, I recommend using this updated version of the coordinate ordering function.
I agree with Karthikey,
you made a mistake.
it should be y-x, not x-y
diff = np.diff(pts, axis=1)
rect[1] = pts[np.argmax(diff)]
rect[3] = pts[np.argmin(diff)]
Yes, there is actually an updated, better tutorial on ordering coordinates here.
I gave the input image of a irregular polygon formed after applying convex hull function to a set of points, supplying the four end pints of the polygon in the order you mentioned. However the output I get is a blank screen. No polygon in it.
Can you please tell how to give irregular polygons as input to the above code.
It’s hard to say without seeing an example, but I imagine your issue is that you did not supply the coordinates in top-left, top-right, bottom-right, and bottom-left order prior to applying the transformation.
Adrian, great post. I was trying to build this for a visiting card. My problem is that the card could be aligned in any arbitrary angle with respect to the desk as well as with respect to the camera.
When I use this logic, there are alignments at which the resultant image is like a rotated and shrieked one. In your case, the image is rotated towards the right to make it look correct. However, if the original card was rotated clockwise 90 degrees, then the logic of top right, top left does not work correctly.
I tried using the “width will always be greater than height” approach but that too fails at times.
Any suggestions?
This sounds like it may be a problem with the coordinate ordering prior to applying the perspective transform. I would instead suggest using the updated implementation.
Hello Adrain,
I am quite new to OpenCV. I am working on an image processing program in Python 2.7. Actually, I am facing a problem during cropping of the image. I am working with different images having some black side background. The picture is taken by camera manually so the image is also different in size. I want to crop the image and separate it from the black background. Could you suggest how I can make a program that detects the image first and crops automatically?
Thanks
Hey Sandesh — do you have any examples of images that you’re working with?
Hi Adrian,
I have a doubt when thinking about the generalization of this example regarding the destination points. This example is specifically aimed at quadrangular objects, right? I mean, you get the destination image points because you simply say “hey, sure it will be a box, let’s get the max height, max width and that’s it”.
But it wouldn’t be so easy if the original object had a different shape, right?
Thanks.
Yes, this example presumes that the object you are trying to transform is rectangular. I’m not sure I understand what you mean by the “generalization” of this technique?
I want to know if there is a way that the program can automatically detect the corners??
Absolutely. Please see the followup blog post to this one.
When the rectangular dimensions of the source target are known, the result is much better if you input x,y values for the destination image that conform to the x/y ratio of the source target. The estimation of the output size described here will only be perfect if the target is perpendicular to the viewpoint. This is why the distortion increases in proportion to the angle(s) away from perpendicular. Otherwise, you have to know the focal length of the camera, focus distance, etc. (much more complicated calculations…) to estimate the “real” proportions of the target’s dimensions (or x/y ratio).
As an example, the page size in your samples looks like “legal” 8.5 x 14 paper. Once this is established, if you replace the maxHeight calculation with “maxHeight = (maxWidth * 8) / 14”,
the output image(s) are much better looking as far as the x/y ratio is concerned (no apparent distortion on the last sample). Of course, one must know the target’s x/y ratio…
Good point, thanks for sharing Rene. If the aspect ratio is known then the output image can be much better. There are methods that can attempt to (automatically) determine the aspect ratio but they are outside the scope of this post. I’ll try to do a tutorial on them in the future.
Hi, this is really nice ! What bugs me is the way to find those four corners since the picture does not have the right perspective
If you want to automatically find the four corners, take a look at this blog post.
Hi Andrian, thanks for the tutorial. I cannot help noticing that you mentioned the difference of the coordinates is (x-y), but np.diff([x, y]) actually returns (y-x).
Traceback (most recent call last):
File “transformexple.py”, line 2, in
from pyimagesearch.transform import four_point_transform
ImportError: No module named pyimagesearch.transform
how can install it ????
Make sure you use the “Downloads” section to download the source code associated with this blog post. It includes the “pyimagesearch” directory structure for the project.
It is really nice way to get the bird’s eye view, but when I tried to use it in my algorithm i failed to get the bird’s eye
I want to get a top view of a lane and a car ?
This method is an example of applying a top-down transform using pre-defined coordinates. To automatically determine those coordinates you’ll have to write a script that detects the top of a car.
Hi Adrian,
Thanks a lot for the post, this is great for the app I’m trying to build. The thing is that I’m translating all your code to Java and I don’t know if everything is correctly translated because the image I get after the code is rotated 90º and flipped… I’m investigating what could be happening but maybe you think of something that could be happening. Thanks again for the post.
Hi Christian — thanks for the comment. However, I haven’t used the OpenCV + Java bindings in a long time, so I’m not sure what the exact issue is.
Well, thanks anyway. I’ll giving it a second shot today 🙂
Hi Adrian,
Would affine transform make any sense in this context?
This page provides a nice discussion on affine vs. perspective transforms. An affine transform is a special case of a perspective transform. In this case perspective transforms are more appropriate — we can make them even better if we can estimate the aspect ratio of the object we are trying to transform. See the link for more details.
There is another idea to order four points by
comparing centroid with the four points
there is one limitation when oriantation object =45 degree
Hi Adrian,
Thanks for sharing the example, very inspiring!
Perspective transform works great here as the object being warped is 2D. What if the object is 3D, like a cylinder, http://cdn.free-power-point-templates.com/articles/wp-content/uploads/2012/07/3d-cilinder-wrap-text-ppt.jpg?
Hi Adrian,
I followed your tutorial in one of my projects but the object height decreases.
i used your tuturial ‘Building a Pokedex in Python’ part4 and 5 in order to have the corners points.
http://imgur.com/a/vt4Er
Can you tell me what im doing wrong.
Thanks in advance.
Hello Adrian,
Could you help with the following problem?
$ pip install pyimagesearch
Collecting pyimagesearch
Could not find a version that satisfies the requirement pyimagesearch (from versions: )
No matching distribution found for pyimagesearch
Also I’ve checked comments with similar problems, but all solutions still don’t work…
– downloaded examples on the bottom of the page
– replace them to the folder with python.
Thank you!
There is no “pyimagesearch” module available on PyPI, hence your error when installing via “pip”. You need to use the “Downloads” section at the bottom of this page, download the code, and put the “pyimagesearch” directory in the same directory as the code you are executing. Alternatively, you could place it in the
site-packages
directory of your Python install, but this is not recommended.

Hi, thanks for the cool tutorial, it helped me a lot, but there were projection errors. Do you know how to dewarp an image, for example a page from a book?
Thanks in advance.
Hey Adrian,
Thanks for the tutorial.
I am porting the order_points func to C++, but I am being confused about:
diff = np.diff(pts, axis = 1)
The tutorial states that in order to find top right and bottom left points, find the diff of each point and the min will be the top right and max will be bottom left. The confusion is: is the function doing x – y or y – x or |x -y|?
I have a hunch that it’s |x – y|, I’ll try it out soon and post an update.
Hi Ahmad — please see my reply to Eugene. I would port the function I linked to over to C++ instead.
Hi, Adrian
There is a bug in “order_points” function.
For input [(0,0),(20,0),(5,0),(5,5)] it classifies (20,0) as bottom-right, because 20+0 is largest sum. But it is top-right. Real bottom-right is (5,5)
It causes incorrect image processing in some cases
I fixed it https://pastebin.com/WXkhw6tU . Code is less pretty, but works in all cases. Maybe you can rewrite it to make pretty 🙂
Hi Eugene — thanks for the comment. I actually provide an updated method to order the points in this post.
Why do we hard code this. Can we automate this like when we pass the input image it has to automatically detect coordiantes and warp it according to the reference image?How do we do that .Is it possible?
Yes, please see the follow up blog post where we build a document scanner that automatically determines the coordinates.
Hi Adrian!
Your solutions are helping me a lot building a Document Classifier with sklearn and other machine learning related libraries! I already managed to succesfully build the classifier model and now I’m trying to get documents from photos to predict their classes.
In order to make this work I get features from the documents I try to classify so I can build an array with them and pass it to the classifier to predict its class.
It’s very important for the classifier to get information about the colors of the document. I already applied your scanner but I’m struggling a lot to get the colors.
I understand it’s crucial to have do the COLOR_BGR2GRAY transformation for the code to work, either for getting the contour points and to do the warping. Is there any way I could achieve a colored pespective transformation?
Thank you for sharing your ideas! Sorry for my bad english! I hope my explanation makes sense. I’m a spanish speaker!
You can certainly obtain a color version of the warped image. Just resize the image as we do in the blog post, but keep a copy of the color image via the
.copy()
method. From there you can apply the perspective transform to the resized (but still color) image. All other processing can be done on the grayscale image.

Hello Adrian,
thanks for the post! I had fun playing around with the code and managed to get a nice bird’s eye view for an image of my own.
I would have one interesting question: Let’s say that we have an image with a circular pattern (imagine identical objects regularly inter-spaced along the circumference of a circle, kind of like a clock). This pattern is viewed under perspective projection so it appears like an ellipse. My question is: what is the best way to get the bird’s eye view for this pattern? (The approach presented above doesn’t seem directly applicable as there are no straight lines in the image of the circular pattern).
Thanks a lot!
This is certainly a much more challenging problem. To start I would be concerned with how you would attempt to detect the objects along the circumference of the circle under perspective transforms. Can each object be localized? If you can localize them then you could actually treat it like a 4 point perspective transform. You would basically use each of the detected objects and compute the bounding box for the entire circle. Since the bounding box would be a rectangle you could then apply the perspective transform.
What software license are you using for the code in this post/repo?
The license is MIT. I would appreciate attribution and a link back to the PyImageSearch blog if at all possible.
Is there any way we can automatically find out those four coordinates ?
Yes. Please see this tutorial.
when i run this code it just prints none, do not print any picture, why is that?
Hey Erdem — can you elaborate a bit more? Is the script automatically exiting? Are you receiving an error message of some kind?
Hi Adrian,
Thanks for the excellent post. I think the width & height calculation may have some problem. check the results at https://photos.app.goo.gl/2p93mTghSw0AQezD3
For this case the width needs to be scaled, we can’t work with
“width is the largest distance between the bottom-right and bottom-left x-coordinates or the top-right and top-left x-coordinates” logic
This code does not handle the scaling of the width and height to maintain aspect ratio. You can update the code to do so if you wish.
Thanks , I was thinking about that but not sure how we can find aspect ratio from the image
Hello, Adrian. After reading your article, I learned a lot. But I have two confusions and I would like to ask you. The first problem, when acquiring contours, is because the image is noisy, and the points obtained are not four, so do not know what to do when building the dst matrix. If the contour has multiple points, can you construct the dst matrix? The dst I mean is the second parameter in cv2.getPerspectiveTransform(rect, dst). The second question, when I got the matrix M through cv2.getPerspectiveTransform(rect, dst), I want to change the perspective of the original image through M, not just changing the contents of the four points in the image. For example, you In the example, the blank part of the picture is OK? My English is not very good, is a tool to help me translate, I hope you can get your reply, I will be extremely grateful
The answer here is “contour approximation”. Take a look at this blog post to see how you can use contours, contour approximation, and a perspective transformation to obtain the desired result.
ap.add_argument(“-i”, “–image”, help = “path to the image file”)
The above line always gives error in WINDOWS OS in Python 3.6.4 , can you please help me out?
Your error can be resolved by reading up on command line arguments.
Hi Adrian,
Thanks for this great post. I was trying to follow along with this post with slight modifications. First I obtained four_point_transform of the given image. After getting the warped image, I plotted bounding box around text “Hello” using cv2.rectangle.
cv2.rectangle(warped, (37, 20), (222, 97), (255, 0, 0), 2)
Now, I want to plot the bounding box of text “Hello” in the warped image on the original image. I referred to a similar question on stackoverflow: https://stackoverflow.com/questions/14343420/retrieving-the-original-coordinates-of-a-pixel-taken-from-a-warped-image/14346147.
But I do not have the coordinates on the source image so could not follow along. Any help would be very much appreciated.
I’m not sure what you mean by “warped image on the original image”. Could you clarify?
I have these coordinates (37, 20), (222, 97) on the warped image. Now I’ve to find out what would have been their coordinates on the original image? Hope it’s clear now.
Thank you for the clarification, I understand. You linked to the StackOverflow thread you read which contains the correct answer — you use an inverse perspective transform.
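For anyone following along, a minimal sketch of that inverse-transform idea — it assumes you modify four_point_transform to also return M, and the two coordinates are the ones from the question above:

# map points from the warped image back onto the original image by
# inverting the perspective transform matrix M
import numpy as np
import cv2

warped_pts = np.array([[[37, 20]], [[222, 97]]], dtype="float32")
M_inv = np.linalg.inv(M)  # M must be returned/exposed by four_point_transform
original_pts = cv2.perspectiveTransform(warped_pts, M_inv)
print(original_pts)  # the corresponding (x, y) locations in the original image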
Many Thanks for sharing your expertise.
I worked out the code to show original image with matplotlib so I can select the coordinates of the points more easily.
I works quite fine.
Thanks for the sharing.
I found a typo in the blog post where you wrote, “Here we’ll take the difference (i.e. x – y) between the points using the np.diff function on Line 21.”
I think you mean y-x.
Hi Adrian,
How can we automatically know the coordinates of the input image , and don’t write it in the command ,So as to make this project more practical to use . Some reference links will do great help .
Thank you !!
This guide will show you how 🙂
Hi Adrian,
I’m trying to find all the rectangular contours and lines(strictly horizontal or vertical) in a screenshot image. For example I took screenshot of this page and would like to identify all the code blocks, image blocks, comment separation lines etc. Now since it is a screenshot image, I know that all my required rectangle contours edges are strictly horizontal and strictly vertical without any slant. In my scenario, the edges would be lines with some background or segment border
I’ve tried canny edge(https://www.pyimagesearch.com/2015/04/06/zero-parameter-automatic-canny-edge-detection-with-python-and-opencv/) followed by hough transform. I think I’m not able to decide the parameters. Need your help in understanding the parameters for hough transform and few suggestions of parameters or approaches would be great help.
Thank you
The Hough lines function can be a bit of a pain to tune the parameters to. You can actually use simple heuristics to detect text and code blocks in a screenshot. All you would need is the image gradients and some morphological operations. I would start by following this example and working from there.
I do not know what to say
Everything here is useful
I remember the day you published this post, I did not care very much, maybe I did not understand it.
Today, after a lot of research, I really benefited from it.
Thanks Adrian
Thanks Mohamed 🙂
Thank for the tutorial, it works wonderfully. I have a question regards to the transformation, say I have a point on the source image, and after wraptransform, how do I find out where this point in new image ?
Hi Adrian
I tested and saw that it may not transform perspective back to original dimension on top down view. If we already known the original dimensions of object, and scale the result to fit these dimensions(width x height), may it be returned to correct size ? How to do that?
If you know the original aspect ratio of the object you scale the output image width and height (the values you pass into the perspective transform) by that aspect ratio, making the output image match the original aspect ratio.
It works, and i use cv2.resize ,i wonder if interpolation = cv2.INTER_CUBIC is correct
In this scanning lesson, you referenced and challenged us to write an iPhone App that does this, I wasn’t aware that Apps can be written in Python. Can you please give me references on how to do this? I know this is not part of your course (at least I didn’t come across it yet), but reference for this would be great.
Does the new XCode for Apple support python driven iPhone apps? Or is there some other 3rd party outfit that offers this capability.
John
Sorry for any confusion there — I was providing you with an example implementation in Python. You could then translate the code into whatever language you wished and use it as a base for a mobile application.
Hi Adrian , that is a great post and good code…. can you pls let me know how to download the pyimagesearch?
You can use the “Downloads” section of the tutorial to download the source code and “pyimagesearch” module for the post.
Hello Adrian. Thanks for the great post. Really helps me as a new learner.
I would like to know if there is anyway to automatically detect the coordinates that has to be warpped? Because for my project I would require it to automatically detect the coordinates? Could you kindly help me with this?.
Thanks
Yes, absolutely. See this post.
The point sorting helper uses a member function for calculating the sum, but a free standing function for the diff. The latter is actually more useful since it can deal with an “array-like” and not just a numpy matrix.
In other words, changing “pts.sum(axis = 1)” into “np.sum(pts, axis=1)” makes it more generic so you can pass a plain python list into four_point_transform().
Adrian, I want to thank you for the opportunity to work with your source code. I wonder if you have come across my situation as I can’t quite get it to process. Once I run the module, the python 2.7.14 shell comes with red message as such:
usage: scan.py [-h] -i IMAGE
scan.py: error: argument -i/–image is required
I went through all of the files needed for it to work and I can’t even find where IMAGE in all caps shows up? Maybe you would have some idea. Thanks!!
Your error is happening due to not properly understanding how command line arguments work. It’s okay if you are new to command line arguments but make sure you read this guide first to help you learn the fundamentals. From there you’ll be ready to go!
Hi, do you have an idea how can I detect the text area without the four points? (e.g the paper area’s four points is not visible in the image because it is skewed/rotated) Thank you so much!
Unfortunately you will need to be able to detect all four coordinates for a reliable perspective transform. If you could detect 3 points you could estimate the 4th point but I would encourage you to try to detect all four points.
Hello, I am extremely new to OpenCV. While running the code, I am getting an invalid syntax at transform_example.py. Can you please help me with it.
What is the exact error? If you’re new to OpenCV you should work through Practical Python and OpenCV. 1000’s of PyImageSearch readers have used the book to learn the fundamentals of computer vision and image processing — I have no doubt it will help you as well.
Hi Adrian,
Thanks for this amazing tutorial. I’d like to ask if there is a way to do the reverse of this project, so put this particular text in perspective on the paper? Is that possible and is there a tutorial for it?
Thanks,
Igor
Hey Adrian,
I hope you doing amazing well. your tutorials are amazing.
I just want to know how you define the coordinates? did you randomly define them or you find them first and then pass those coordinates in a function
Thanks
AN
These were defined manually. To automatically find the coordinates see this tutorial.
Amazing Tutorial you helped me a lot , i have been following your tutorials for almost 2 years now . keep up the good work .
Thanks Ajay, I’m glad you’re enjoying the blog 🙂
Thanks for this tutorial. My 2 cents about quadrilateral points ordering:
Below is my code:
def order_points(pts):
	# Step 1: Find center of object:
	center = pts.sum(axis=0) / 4
	# Step 2: Move coordinate system to center of object
	shifted = pts - center
	# find angular component of the polar coordinate of each vertex
	theta = np.arctan2(shifted[:, 0], shifted[:, 1])
	# return vertices ordered by theta
	return pts[theta.argsort()]
Hello,
Great post, I have been looking for something like this all over the place!
I have a very similar problem. Imagine I have a camera mounted on dashboard of my car and it takes images of the road ahead. That “view” will be perspective view, right?
Now I want to transform that to something which should look like the plan or top view of the road (orthographic top view of the road)
Do you think I can use your algorithm?
Also, does it automatically correct for non linearity of the lens? For example:
at the bottom of the image….almost the entire width of the image will be equal to width of the road…but looking further away along the road into the horizon (top of the image) …the width of the road would correspond to only a small number of pixels. And this effect is non linear as the distance from camera increases. What do you think about that?
You calculate the width and the height of the output image, but when you are listing destination points of the output image you are subtracting 1 from width and height. wouldn’t that make the destination image smaller than the source and different from the output image shape? I don’t understand that step.
Thanks a lot!
The details allowed me to reproduce it in VB, and I am very thankful for your help.
With respect, from France.
You are welcome, I’m glad it helped you!
Hi,
First of all, a great tutorial. Really helped me a lot.
I am having an issue with ordering the coordinates. I noticed that all of the pictures in your tutorial are shaped more like a parallelogram. I have an image and the source coordinates form more of a trapezoidal shape. I am getting a wrong order if I follow the algorithm you have described.
for reference, my source coordinates are
src_points = [ [118,96], [200,96], [302,140], [13.3,140] ]
The points representing top left, top right, bottom right and bottom left respectively.
Any suggestions or tips about how to go around this? Is there something I am doing wrong?
Thank you for the great tutorial
how i can detect the four points of the image i want to transform?
Refer to this tutorial.
Very clear and helpful tutorial! Thank you for that! Could you please provide some references for how warpPerspective works? I work on a project with encrypted images and I need to implement this function from the beginning. Thx!