Normally, I only publish blog posts on Monday, but I’m so excited about this one that it couldn’t wait and I decided to hit the publish button early.
You see, just a few days ago, François Chollet pushed three Keras models (VGG16, VGG19, and ResNet50) online — these networks are pre-trained on the ImageNet dataset, meaning that they can recognize 1,000 common object classes out-of-the-box.
To utilize these models in your own applications, all you need to do is:
- Install Keras.
- Download the weights files for the pre-trained network(s) (which will be done automatically for you when you import and instantiate the respective network architecture).
- Apply the pre-trained ImageNet networks to your own images.
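As a quick note on the first step: if you already have a working Python environment and do not need GPU support, installing Keras itself can be as simple as a single pip command (the tutorials linked below cover the full setup, including backends):

$ pip install keras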
It’s really that simple.
So, why is this so exciting? I mean, we’ve had the weights to popular pre-trained ImageNet classification networks for awhile, right?
The problem is that these weight files are in Caffe format. While the Caffe library may be the current standard that many researchers use to construct, train, and evaluate new network architectures, Caffe isn't the most Python-friendly library in the world, at least in terms of constructing the network architecture itself.
Note: You can do some pretty cool stuff with the Caffe-Python bindings, but I'm mainly focusing on how Caffe architectures and the training process itself are defined via .prototxt configuration files rather than code into which logic can be inserted.
There is also the fact that there isn’t an easy or streamlined method to convert Caffe weights to a Keras-compatible model.
That’s all starting to change now — we can now easily apply VGG16, VGG19, and ResNet50 using Keras and Python to our own applications without having to worry about the Caffe => Keras weight conversion process.
In fact, it’s now as simple as these three lines of code to classify an image using a Convolutional Neural Network pre-trained on the ImageNet dataset with Python and Keras:
model = VGG16(weights="imagenet")
preds = model.predict(preprocess_input(image))
print(decode_predictions(preds))
Of course, there are a few other imports and helper functions that need to be utilized — but I think you get the point:
It’s now dead simple to apply ImageNet-level pre-trained networks using Python and Keras.
To find out how, keep reading.
ImageNet classification with Python and Keras
In the remainder of this tutorial, I’ll explain what the ImageNet dataset is, and then provide Python and Keras code to classify images into 1,000 different categories using state-of-the-art network architectures.
What is ImageNet?
Within computer vision and deep learning communities, you might run into a bit of contextual confusion surrounding what ImageNet is and what it isn’t.
You see, ImageNet is actually a project aimed at labeling and categorizing images into almost 22,000 categories based on a defined set of words and phrases. At the time of this writing, there are over 14 million images in the ImageNet project.
So, how is ImageNet organized?
To order such a massive amount of data, ImageNet actually follows the WordNet hierarchy. Each meaningful word/phrase inside WordNet is called a “synonym set” or “synset” for short. Within the ImageNet project, images are organized according to these synsets, with the goal being to have 1,000+ images per synset.
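To make this concrete, here is a minimal sketch of how those synsets surface in Keras itself. Keras downloads a small JSON index the first time you decode ImageNet predictions; the cache path and filename below are assumptions that may vary by version:

# inspect the WordNet synset IDs behind Keras' ImageNet labels
# NOTE: the cache path/filename is an assumption and may vary by version
import json
import os

path = os.path.expanduser("~/.keras/models/imagenet_class_index.json")
with open(path) as f:
    class_index = json.load(f)

# each entry maps a class number to a [WordNet synset ID, label] pair,
# e.g. class "0" maps to ["n01440764", "tench"]
print(class_index["0"])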
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
In the context of computer vision and deep learning, whenever you hear people talking about ImageNet, they are very likely referring to the ImageNet Large Scale Visual Recognition Challenge, or simply ILSVRC for short.
The goal of the image classification track in this challenge is to train a model that can classify an image into 1,000 separate categories using over 100,000 test images — the training dataset itself consists of approximately 1.2 million images.
Be sure to keep the context of ImageNet in mind when you're reading the remainder of this blog post or other tutorials and papers related to ImageNet. While in the context of image classification, object detection, and scene understanding we often refer to "ImageNet" as both the classification challenge and the dataset associated with it, remember that there is also a broader project called ImageNet where these images are collected, annotated, and organized.
Configuring your system for Keras and ImageNet
To configure your system to use the state-of-the-art VGG16, VGG19, and ResNet50 networks, make sure you follow my latest tutorial on installing Keras on Ubuntu or on macOS. Ubuntu users with a GPU should see this tutorial.
The Keras library will use PIL/Pillow for some helper functions (such as loading an image from disk). You can install Pillow, the more Python-friendly fork of PIL, by using this command:
$ pip install pillow
To run the networks pre-trained on the ImageNet dataset with Python, you'll need to make sure you have the latest version of Keras installed. At the time of this writing, the latest version of Keras is 2.2.0; you'll need at least Keras 2.0.0 to utilize the pre-trained models.
You can check your version of Keras by executing the following commands:
$ python
Python 3.6.3 (default, Oct 4 2017, 06:09:15)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import keras
Using TensorFlow backend.
>>> keras.__version__
'2.2.0'
>>>
Alternatively, you can use pip freeze to list the packages installed in your environment:
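For example, you might filter the output for Keras like so (a sketch assuming a Unix-like shell; the version string shown matches the Python session above, and yours may differ):

$ pip freeze | grep -i keras
Keras==2.2.0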
If you are using a version of Keras earlier than 2.0.0, uninstall it, and then use my previous tutorial to install the latest version.
Keras and Python code for ImageNet CNNs
We are now ready to write some Python code to classify image contents utilizing Convolutional Neural Networks (CNNs) pre-trained on the ImageNet dataset.
To start, open up a new file, name it test_imagenet.py, and insert the following code:
# import the necessary packages
from keras.preprocessing import image as image_utils
from keras.applications.imagenet_utils import decode_predictions
from keras.applications.imagenet_utils import preprocess_input
from keras.applications import VGG16
import numpy as np
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to the input image")
args = vars(ap.parse_args())

# load the original image via OpenCV so we can draw on it and display
# it to our screen later
orig = cv2.imread(args["image"])
We start on Lines 2-8 by importing our required Python packages. Line 2 imports the image pre-processing module directly from the Keras library.
Lines 11-14 parse our command line arguments. We only need a single switch here, --image, which is the path to our input image.
We then load our image in OpenCV format on Line 18. This step isn't strictly required since Keras provides helper functions to load images (which I'll demonstrate in the next code block), but there are differences in how these two functions work, so if you intend on applying any type of OpenCV functions to your images, I suggest loading your image via cv2.imread and then again via the Keras helpers. Once you get a bit more experience manipulating NumPy arrays and swapping channels, you can avoid the extra I/O overhead, but for the time being, let's keep things simple.
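One concrete difference worth knowing about: OpenCV represents images in BGR channel order, while the PIL/Pillow helpers that Keras uses yield RGB. Here is a minimal sketch of that difference (the path example.jpg is hypothetical, and I'm assuming no resizing):

# OpenCV loads images in BGR order; the Keras/PIL helper yields RGB
# ("example.jpg" is a hypothetical path used purely for illustration)
import cv2
import numpy as np
from keras.preprocessing import image as image_utils

bgr = cv2.imread("example.jpg")  # NumPy array, BGR channel order
rgb = image_utils.img_to_array(
    image_utils.load_img("example.jpg"))  # NumPy array, RGB order

# flipping the channel axis converts between the two orderings
rgb_from_bgr = bgr[..., ::-1].astype("float32")
print(bgr.shape, rgb.shape)  # same spatial dims, different channel order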
# load the input image using the Keras helper utility while ensuring
# that the image is resized to 224x224 pixels, the required input
# dimensions for the network -- then convert the PIL image to a
# NumPy array
print("[INFO] loading and preprocessing image...")
image = image_utils.load_img(args["image"], target_size=(224, 224))
image = image_utils.img_to_array(image)
Line 25 applies the .load_img Keras helper function to load our image from disk. We supply a target_size of 224 x 224 pixels, the required spatial input image dimensions for the VGG16, VGG19, and ResNet50 network architectures.
After calling .load_img, our image is actually in PIL/Pillow format, so we need to apply the .img_to_array function to convert it to a NumPy array.
Next, let’s preprocess our image:
# our image is now represented by a NumPy array of shape (224, 224, 3),
# assuming TensorFlow "channels last" ordering of course, but we need
# to expand the dimensions to be (1, 224, 224, 3) so we can pass it
# through the network -- we'll also preprocess the image by subtracting
# the mean RGB pixel intensity from the ImageNet dataset
image = np.expand_dims(image, axis=0)
image = preprocess_input(image)
If we inspect the .shape of our image at this stage, you'll notice the shape of the NumPy array is (224, 224, 3): each image is 224 pixels wide, 224 pixels tall, and has 3 channels (one for each of the Red, Green, and Blue channels, respectively).
However, before we can pass our image through our CNN for classification, we need to expand the dimensions to be (1, 224, 224, 3).
Why do we do this?
When classifying images using Deep Learning and Convolutional Neural Networks, we often send images through the network in “batches” for efficiency. Thus, it’s actually quite rare to pass only one image at a time through the network — unless of course, you only have one image to classify (like we do).
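If you did have multiple images on hand, a batch is simply an extra leading dimension on the array. Here is a minimal sketch of batching (the image paths are hypothetical, and I'm assuming the model from the next code block has already been loaded):

# a minimal batching sketch: stack N preprocessed (224, 224, 3) images
# into a single (N, 224, 224, 3) array before calling .predict
# (the paths below are hypothetical)
paths = ["image_01.jpg", "image_02.jpg"]
batch = np.stack([image_utils.img_to_array(
    image_utils.load_img(p, target_size=(224, 224))) for p in paths])
batch = preprocess_input(batch)
preds = model.predict(batch)  # preds will have shape (2, 1000)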
We then preprocess the image on Line 34 by subtracting the mean RGB pixel intensity computed from the ImageNet dataset.
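For the curious, here is roughly what that preprocessing amounts to for these models. This is a sketch based on the "caffe"-style convention these weights were trained with, not the literal Keras source: the channels are reordered from RGB to BGR and the per-channel ImageNet means are subtracted:

# an approximation of what preprocess_input does for these models:
# reorder RGB -> BGR, then subtract the per-channel ImageNet means
def manual_preprocess(x):
    x = x[..., ::-1].astype("float32")  # RGB -> BGR
    x[..., 0] -= 103.939  # mean blue
    x[..., 1] -= 116.779  # mean green
    x[..., 2] -= 123.68   # mean red
    return x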
Finally, we can load our Keras network and classify the image:
# load the VGG16 network pre-trained on the ImageNet dataset
print("[INFO] loading network...")
model = VGG16(weights="imagenet")

# classify the image
print("[INFO] classifying image...")
preds = model.predict(image)
P = decode_predictions(preds)

# loop over the predictions and display the rank-5 predictions +
# probabilities to our terminal
for (i, (imagenetID, label, prob)) in enumerate(P[0]):
    print("{}. {}: {:.2f}%".format(i + 1, label, prob * 100))

# load the image via OpenCV, draw the top prediction on the image,
# and display the image to our screen
orig = cv2.imread(args["image"])
(imagenetID, label, prob) = P[0][0]
cv2.putText(orig, "Label: {}, {:.2f}%".format(label, prob * 100),
    (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
cv2.imshow("Classification", orig)
cv2.waitKey(0)
On Line 38 we initialize our VGG16 class. We could also substitute in VGG19 or ResNet50 here, but for the sake of this tutorial, we'll use VGG16.
Supplying weights="imagenet" indicates that we want to use the pre-trained ImageNet weights for the respective model.
Once the network has been loaded and initialized, we can predict class labels by making a call to the .predict method of the model. These predictions are actually a NumPy array with 1,000 entries: the predicted probabilities associated with each class in the ImageNet dataset.
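If you want to poke at that raw output yourself, a quick sanity check might look like this (preds is the array returned by model.predict above):

# preds is a (1, 1000) array of class probabilities -- the top-5
# classes can also be recovered manually with NumPy
print(preds.shape)                     # (1, 1000)
print(preds[0].sum())                  # ~1.0, since these are softmax outputs
print(np.argsort(preds[0])[::-1][:5])  # indices of the 5 likeliest classes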
Calling decode_predictions on these predictions gives us the ImageNet unique ID of each label, along with a human-readable text version of the label.
Finally, Lines 47-57 print the predicted labels to our terminal and display the output image to our screen.
ImageNet + Keras image classification results
To apply the Keras models pre-trained on the ImageNet dataset to your own images, make sure you use the “Downloads” form at the bottom of this blog post to download the source code and example images. This will ensure your code is properly formatted (without errors) and your directory structure is correct.
But before we can apply our pre-trained Keras models to our own images, let’s first discuss how the model weights are (automatically) downloaded.
Downloading the model weights
The first time you execute the test_imagenet.py script, Keras will automatically download and cache the architecture weights to your disk in the ~/.keras/models directory.
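If you're curious, you can peek inside that cache directory after the first run (the exact filenames will vary by model and Keras version):

$ ls -lh ~/.keras/models/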
Subsequent runs of test_imagenet.py will be substantially faster (since the network weights will already be downloaded), but that first run will be comparatively slow due to the download process.
That said, keep in mind that these weights are fairly large HDF5 files and might take a while to download if you do not have a fast internet connection. For convenience, I have listed the sizes of the weights files for each respective network architecture:
- ResNet50: 102MB
- VGG16: 553MB
- VGG19: 574MB
ImageNet and Keras results
We are now ready to classify images using the pre-trained Keras models! To test out the models, I downloaded a couple images from Wikipedia (“brown bear” and “space shuttle”) — the rest are from my personal library.
To start, execute the following command:
$ python test_imagenet.py --image images/dog_beagle.png
Notice that since this is my first run of test_imagenet.py, the weights associated with the VGG16 ImageNet model need to be downloaded:
Once our weights are downloaded, the VGG16 network is initialized, the ImageNet weights loaded, and the final classification is obtained:
Let’s give another image a try, this one of a beer glass:
$ python test_imagenet.py --image images/beer.png
The following image is of a brown bear:
$ python test_imagenet.py --image images/brown_bear.png
I took the following photo of my keyboard to test out the ImageNet network using Python and Keras:
$ python test_imagenet.py --image images/keyboard.png
I then took a photo of my monitor as I was writing the code for this blog post. Interestingly, the network classified this image as “desktop computer”, which makes sense given that the monitor is the primary subject of the image:
$ python test_imagenet.py --image images/monitor.png
This next image is of a space shuttle:
$ python test_imagenet.py --image images/space_shuttle.png
The final image is of a steamed crab, a blue crab, to be specific:
$ python test_imagenet.py --image images/steamed_crab.png
What I find interesting about this particular example is that VGG16 classified this image as "Menu" even though the "Dungeness crab" label is equally prominent in the image.
Furthermore, there is actually not a Dungeness crab in the image; it's a blue crab whose shell has turned red from being steamed. Whereas a Dungeness crab is naturally red, a blue crab only turns red after it has been steamed prior to eating.
A note on model timing
From start to finish (not including the downloading of the network weights files), classifying an image using VGG16 took approximately 11 seconds on my Titan X GPU. This includes the process of actually loading both the image and network from disk, performing any initializations, passing the image through the network, and obtaining the final predictions.
However, once the network is actually loaded into memory, classification takes only 1.8 seconds, which goes to show you how much overhead is involved in actually loading and initializing a large Convolutional Neural Network. Furthermore, since images can be presented to the network in batches, this same classification time will hold for multiple images.
If you’re classifying images on your CPU, then you should obtain a similar classification time. This is mainly because there is substantial overhead in copying the image from memory over to the GPU. When you pass multiple images via batches, it makes the I/O overhead for using the GPU more acceptable.
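If you'd like to measure the forward-pass time on your own machine, a minimal sketch using Python's built-in time module around the .predict call from earlier might look like:

# time just the forward pass, after the network is already in memory
import time

start = time.time()
preds = model.predict(image)
print("[INFO] classification took {:.4f} seconds".format(
    time.time() - start))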
Summary
In this blog post, I demonstrated how to use the newly released deep-learning-models repository to classify image contents using state-of-the-art Convolutional Neural Networks trained on the ImageNet dataset.
To accomplish this, we leveraged the Keras library, which is maintained by François Chollet — be sure to reach out to him and say thanks for maintaining such an incredible library. Without Keras, deep learning with Python wouldn’t be half as easy (or as fun).
Of course, you might be wondering how to train your own Convolutional Neural Network from scratch using ImageNet. Don’t worry, we’re getting there — we just need to understand the basics of neural networks, machine learning, and deep learning first. Walk before you run, so to speak.
I’ll be back next week with a tutorial on hyperparameter tuning, a key step to maximizing your model’s accuracy.
To be notified when future blog posts are published on the PyImageSearch blog, be sure to enter your email address in the form below — see you next week!
Download the Source Code and FREE 17-page Resource Guide
Enter your email address below to get a .zip of the code and a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL!