In this tutorial, you will learn how to visualize class activation maps for debugging deep neural networks using an algorithm called Grad-CAM. We’ll then implement Grad-CAM using Keras and TensorFlow.
While deep learning has facilitated unprecedented accuracy in image classification, object detection, and image segmentation, one of its biggest problems is model interpretability, a core component in model understanding and model debugging.
In practice, deep learning models are treated as “black box” methods, and many times we have no reasonable idea as to:
- Where the network is “looking” in the input image
- Which series of neurons activated in the forward-pass during inference/prediction
- How the network arrived at its final output
That raises an interesting question — how can you trust the decisions of a model if you cannot properly validate how it arrived there?
To help deep learning practitioners visually debug their models and properly understand where they are “looking” in an image, Selvaraju et al. created Gradient-weighted Class Activation Mapping, or more simply, Grad-CAM:
“Grad-CAM uses the gradients of any target concept (say logits for “dog” or even a caption), flowing into the final convolutional layer to produce a coarse localization map highlighting the important regions in the image for predicting the concept.”
Using Grad-CAM, we can visually validate where our network is looking, verifying that it is indeed looking at the correct patterns in the image and activating around those patterns.
If the network is not activating around the proper patterns/objects in the image, then we know:
- Our network hasn’t properly learned the underlying patterns in our dataset
- Our training procedure needs to be revisited
- We may need to collect additional data
- And most importantly, our model is not ready for deployment.
Grad-CAM is a tool that should be in any deep learning practitioner’s toolbox — take the time to learn how to apply it now.
To learn how to use Grad-CAM to debug your deep neural networks and visualize class activation maps with Keras and TensorFlow, just keep reading!
Grad-CAM: Visualize class activation maps with Keras, TensorFlow, and Deep Learning
In the first part of this article, I’ll share with you a cautionary tale on the importance of debugging and visually verifying that your convolutional neural network is “looking” at the right places in an image.
From there, we’ll dive into Grad-CAM, an algorithm that can be used to visualize the class activation maps of a Convolutional Neural Network (CNN), thereby allowing you to verify that your network is “looking” and “activating” at the correct locations.
We’ll then implement Grad-CAM using Keras and TensorFlow.
After our Grad-CAM implementation is complete, we’ll look at a few examples of visualizing class activation maps.
Why would we want to visualize class activation maps in Convolutional Neural Networks?
There’s an old urban legend in the computer vision community that researchers use to caution budding machine learning practitioners against the dangers of deploying a model without first verifying that it’s working properly.
In this tale, the United States Army wanted to use neural networks to automatically detect camouflaged tanks.
Researchers assigned to the project gathered a dataset of 200 images:
- 100 of which contained camouflaged tanks hiding in trees
- 100 of which did not contain tanks and were images solely of trees/forest
The researchers took this dataset and then split it into an even 50/50 training and testing split, ensuring the class labels were balanced.
A neural network was trained on the training set and obtained 100% accuracy. The researchers were incredibly pleased with this result and eagerly applied it to their testing data. Once again, they obtained 100% accuracy.
The researchers called the Pentagon, excited with the news that they had just “solved” camouflaged tank detection.
A few weeks later, the research team received a call from the Pentagon — they were extremely unhappy with the performance of the camouflaged tank detector. The neural network that performed so well in the lab was performing terribly in the field.
Flummoxed, the researchers returned to their experiments, training model after model using different training procedures, only to arrive at the same result — 100% accuracy on both their training and testing sets.
It wasn’t until one clever researcher visually inspected their dataset and finally realized the problem:
- Photos of camouflaged tanks were captured on sunny days
- Images of the forest (without tanks) were captured on cloudy days
Essentially, the U.S. Army had created a multimillion dollar cloud detector.
While not true, this old urban legend does a good job illustrating the importance of model interpretability.
Had the research team had an algorithm like Grad-CAM, they would have noticed that the model was activating around the presence/absence of clouds, and not the tanks themselves (hence their problem).
Grad-CAM would have saved taxpayers millions of dollars, and not to mention, allowed the researchers to save face with the Pentagon — after a catastrophe like that, it’s unlikely they would be getting any more work or research grants.
What is Gradient-weighted Class Activation Mapping (Grad-CAM) and why would we use it?
As a deep learning practitioner, it’s your responsibility to ensure your model is performing correctly. One way you can do that is to debug your model and visually validate that it is “looking” and “activating” at the correct locations in an image.
To help deep learning practitioners debug their networks, Selvaraju et al. published a novel paper entitled, Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization.
This method:
- Is easily implemented
- Works with nearly any Convolutional Neural Network architecture
- Can be used to visually debug where a network is looking in an image
Grad-CAM works by (1) finding the final convolutional layer in the network and then (2) examining the gradient information flowing into that layer.
The output of Grad-CAM is a heatmap visualization for a given class label (either the top, predicted label or an arbitrary label we select for debugging). We can use this heatmap to visually verify where in the image the CNN is looking.
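For reference, the two steps above correspond to the following pair of equations from Selvaraju et al.’s paper. Here $A^k$ is the $k$-th feature map of the chosen convolutional layer, $y^c$ is the score for class $c$, and $Z$ is the number of spatial locations in a feature map. The implementation later in this post follows the same outline but differs in some details, so treat this as the textbook form rather than a line-by-line description of our code:

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A_{ij}^k}
\qquad
L_{\text{Grad-CAM}}^c = \mathrm{ReLU}\left( \sum_{k} \alpha_k^c A^k \right)
```

In words: average the gradients over the spatial dimensions to obtain one weight per feature map, then take the weighted sum of the feature maps to obtain the coarse localization map.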
For more information on how Grad-CAM works, I would recommend you read Selvaraju et al.’s paper as well as this excellent article by Divyanshu Mishra (just note that their implementation will not work with TensorFlow 2.0 while ours does work with TF 2.0).
Configuring your development environment
In order to use our Grad-CAM implementation, we need to configure our system with a few software packages including:
- TensorFlow (2.0 recommended)
- OpenCV
- imutils
Luckily, each of these packages is pip-installable. My personal recommendation is for you to follow one of my TensorFlow 2.0 installation tutorials:
- How to install TensorFlow 2.0 on Ubuntu (Ubuntu 18.04 OS; CPU and optional NVIDIA GPU)
- How to install TensorFlow 2.0 on macOS (Catalina and Mojave OSes)
Please note: PyImageSearch does not support Windows — refer to our FAQ. While we do not support Windows, the code presented in this blog post will work on Windows with a properly configured system.
Either of those tutorials will teach you how to configure a Python virtual environment with all the necessary software for this tutorial. I highly encourage virtual environments for Python work — industry considers them a best practice as well. If you’ve never worked with a Python virtual environment, you can learn more about them in this RealPython article.
Once your system is configured, you are ready to follow the rest of this tutorial.
Project structure
Let’s inspect our tutorial’s project structure. But first, be sure to grab the code and example images from the “Downloads” section of this blog post. From there, extract the files, and use the tree command in your terminal:

    $ tree --dirsfirst
    .
    ├── images
    │   ├── beagle.jpg
    │   ├── soccer_ball.jpg
    │   └── space_shuttle.jpg
    ├── pyimagesearch
    │   ├── __init__.py
    │   └── gradcam.py
    └── apply_gradcam.py

    2 directories, 6 files

The pyimagesearch module today contains the Grad-CAM implementation inside the GradCAM class.
Our apply_gradcam.py driver script accepts any of our sample images/ and applies either a VGG16 or ResNet CNN trained on ImageNet to both (1) compute the Grad-CAM heatmap and (2) display the results in an OpenCV window.
Let’s dive into the implementation.
Implementing Grad-CAM using Keras and TensorFlow
Despite the fact that the Grad-CAM algorithm is relatively straightforward, I struggled to find a TensorFlow 2.0-compatible implementation.
The closest one I found was in tf-explain; however, that method could only be used when training — it could not be used after a model had been trained.
Therefore, I decided to create my own Grad-CAM implementation, basing my work on that of tf-explain, ensuring that my Grad-CAM implementation:
- Is compatible with Keras and TensorFlow 2.0
- Could be used after a model was already trained
- And could also be easily modified to work as a callback during training (not covered in this post)
Let’s dive into our Keras and TensorFlow Grad-CAM implementation.
Open up the gradcam.py file in your project directory structure, and let’s get started:

    # import the necessary packages
    from tensorflow.keras.models import Model
    import tensorflow as tf
    import numpy as np
    import cv2

    class GradCAM:
        def __init__(self, model, classIdx, layerName=None):
            # store the model, the class index used to measure the class
            # activation map, and the layer to be used when visualizing
            # the class activation map
            self.model = model
            self.classIdx = classIdx
            self.layerName = layerName

            # if the layer name is None, attempt to automatically find
            # the target output layer
            if self.layerName is None:
                self.layerName = self.find_target_layer()

Before we define the GradCAM class, we need to import several packages. These include the TensorFlow Model class, which we will use to construct our gradient model, NumPy for mathematical calculations, and OpenCV.
Our GradCAM class and constructor are then defined beginning on Lines 7 and 8. The constructor accepts and stores:
- A TensorFlow model, which we’ll use to compute a heatmap
- The classIdx: a specific class index that we’ll use to measure our class activation heatmap
- An optional CONV layerName of the model in case we want to visualize the heatmap of a specific layer of our CNN; otherwise, if a specific layer name is not provided, we will automatically infer the final CONV/POOL layer of the model architecture (Lines 18 and 19)
Now that our constructor is defined and our class attributes are set, let’s define a method to find our target layer:

        def find_target_layer(self):
            # attempt to find the final convolutional layer in the network
            # by looping over the layers of the network in reverse order
            for layer in reversed(self.model.layers):
                # check to see if the layer has a 4D output
                if len(layer.output_shape) == 4:
                    return layer.name

            # otherwise, we could not find a 4D layer so the GradCAM
            # algorithm cannot be applied
            raise ValueError("Could not find 4D layer. Cannot apply GradCAM.")

Our find_target_layer function loops over all layers in the network in reverse order, during which time it checks to see if the current layer has a 4D output (implying a CONV or POOL layer).
If we find such a 4D output, we return that layer name (Lines 24-27).
Otherwise, if the network does not have a 4D output, then we cannot apply Grad-CAM, at which point we raise a ValueError exception, causing our program to stop (Line 31).
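To see the reverse search in isolation, here is a minimal sketch that runs without TensorFlow. The FakeLayer class is a hypothetical stand-in for a Keras layer (it only carries a name and an output shape), and the layer list is an illustrative, abbreviated VGG-like stack rather than a real model:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FakeLayer:
    # hypothetical stand-in for a Keras layer: a name and an output shape
    name: str
    output_shape: Tuple


def find_target_layer(layers: List[FakeLayer]) -> str:
    # walk the layers from last to first and return the first one whose
    # output is 4D, i.e. (batch, height, width, channels)
    for layer in reversed(layers):
        if len(layer.output_shape) == 4:
            return layer.name
    raise ValueError("Could not find 4D layer. Cannot apply GradCAM.")


# an abbreviated, illustrative VGG-like stack
layers = [
    FakeLayer("input", (None, 224, 224, 3)),
    FakeLayer("block5_conv3", (None, 14, 14, 512)),
    FakeLayer("block5_pool", (None, 7, 7, 512)),
    FakeLayer("flatten", (None, 25088)),
    FakeLayer("predictions", (None, 1000)),
]
target = find_target_layer(layers)
print(target)  # block5_pool
```

Note that the search stops at the last 4D layer of any kind, so in a stack like this one it selects the final pooling layer rather than the final convolution. If you want a specific CONV layer instead, pass its name explicitly via layerName.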
In our next function, we’ll compute our visualization heatmap, given an input image:

        def compute_heatmap(self, image, eps=1e-8):
            # construct our gradient model by supplying (1) the inputs
            # to our pre-trained model, (2) the output of the (presumably)
            # final 4D layer in the network, and (3) the output of the
            # softmax activations from the model
            gradModel = Model(
                inputs=[self.model.inputs],
                outputs=[self.model.get_layer(self.layerName).output,
                    self.model.output])

Line 33 defines the compute_heatmap method, which is the heart of our Grad-CAM. Let’s take this implementation one step at a time to learn how it works.
First, our Grad-CAM requires that we pass in the image for which we want to visualize class activation mappings.
From there, we construct our gradModel (Lines 38-41), which consists of both an input and an output:
- inputs: The standard image input to the model
- outputs: The outputs of the layerName class attribute used to generate the class activation mappings, together with the softmax output of the model. Notice how we call get_layer on the model itself while also grabbing the output of that specific layer
Once our gradient model is constructed, we’ll proceed to compute gradients:

            # record operations for automatic differentiation
            with tf.GradientTape() as tape:
                # cast the image tensor to a float-32 data type, pass the
                # image through the gradient model, and grab the loss
                # associated with the specific class index
                inputs = tf.cast(image, tf.float32)
                (convOutputs, predictions) = gradModel(inputs)
                loss = predictions[:, self.classIdx]

            # use automatic differentiation to compute the gradients
            grads = tape.gradient(loss, convOutputs)

Going forward, we need to understand the definition of automatic differentiation and what TensorFlow calls a gradient tape.
First, automatic differentiation is the process of computing a value and computing derivatives of that value (CS321 Toronto, Wikipedia).
TensorFlow 2.0 provides an implementation of automatic differentiation through what they call gradient tape:
“TensorFlow provides the tf.GradientTape API for automatic differentiation - computing the gradient of a computation with respect to its input variables. TensorFlow “records” all operations executed inside the context of a tf.GradientTape onto a “tape”. TensorFlow then uses that tape and the gradients associated with each recorded operation to compute the gradients of a “recorded” computation using reverse mode differentiation” (TensorFlow’s Automatic differentiation and gradient tape Tutorial).
I suggest you spend some time on TensorFlow’s GradientTape documentation, specifically the gradient method, which we will now use.
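To make the idea of a “tape” more concrete, here is a toy sketch in plain Python. It is not how TensorFlow is implemented: the Tape and Var classes and the gradient function are hypothetical names invented for this illustration, and only addition and multiplication of scalars are supported. Each operation records its inputs and local derivatives, and the gradient is obtained by replaying the records in reverse order:

```python
class Tape:
    """Toy tape: a list of (output, [(input, local_derivative), ...])."""

    def __init__(self):
        self.ops = []


class Var:
    """A scalar value whose operations are recorded onto a tape."""

    def __init__(self, value, tape):
        self.value = value
        self.tape = tape

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)
        # d(out)/d(self) = other.value and d(out)/d(other) = self.value
        self.tape.ops.append((out, [(self, other.value), (other, self.value)]))
        return out

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)
        # addition passes the gradient through unchanged to both inputs
        self.tape.ops.append((out, [(self, 1.0), (other, 1.0)]))
        return out


def gradient(tape, target, sources):
    # reverse mode: start with d(target)/d(target) = 1 and push gradients
    # backwards through the recorded operations using the chain rule
    grads = {id(target): 1.0}
    for out, inputs in reversed(tape.ops):
        g_out = grads.get(id(out), 0.0)
        for inp, local in inputs:
            grads[id(inp)] = grads.get(id(inp), 0.0) + g_out * local
    return [grads.get(id(s), 0.0) for s in sources]


tape = Tape()
x = Var(3.0, tape)
w = Var(2.0, tape)
b = Var(1.0, tape)
y = w * x + b  # y = 2 * 3 + 1

dy_dx, dy_dw = gradient(tape, y, [x, w])
print(y.value, dy_dx, dy_dw)  # 7.0 2.0 3.0
```

The call gradient(tape, y, [x, w]) plays the same role as tape.gradient(loss, convOutputs) in our implementation: it asks how the recorded output changes with respect to earlier values.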
We start recording operations for automatic differentiation using GradientTape (Line 44).
Line 48 accepts the input image and casts it to a 32-bit floating point type. A forward pass through the gradient model (Line 49) produces the convOutputs of the layerName layer along with the predictions of the model.
We then extract the loss associated with our predictions and the specific classIdx we are interested in (Line 50).
Notice that a single forward pass through gradModel gives us both the activations of the specific layer we are concerned about and the final predictions, so we do not need to run the network twice.
Line 53 uses automatic differentiation to compute the gradients of the loss with respect to convOutputs, which we will call grads.
Given our gradients, we’ll now compute guided gradients:

            # compute the guided gradients
            castConvOutputs = tf.cast(convOutputs > 0, "float32")
            castGrads = tf.cast(grads > 0, "float32")
            guidedGrads = castConvOutputs * castGrads * grads

            # the convolution and guided gradients have a batch dimension
            # (which we don't need) so let's grab the volume itself and
            # discard the batch
            convOutputs = convOutputs[0]
            guidedGrads = guidedGrads[0]

First, we find all outputs and gradients with a value > 0 and cast the resulting binary masks to a 32-bit floating point data type (Lines 56 and 57).
Then we compute the guided gradients by multiplication (Line 58).
Keep in mind that both castConvOutputs and castGrads contain only 1’s and 0’s; therefore, during this multiplication, if any of castConvOutputs, castGrads, or grads is zero, then the output value for that particular index in the volume will be zero.
Essentially, what we are doing here is finding the positions where both convOutputs and grads are positive and keeping the gradient only at those positions. This operation will allow us to visualize where in the volume the network is activating later in the compute_heatmap function.
The convolution and guided gradients have a batch dimension that we don’t need. Lines 63 and 64 grab the volume itself and discard the batch from convOutputs and guidedGrads.
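The masking step is easy to check by hand with NumPy. The sketch below uses four hand-picked values standing in for the channels at a single spatial location; the snake_case variable names are mine and simply mirror the TensorFlow variables above:

```python
import numpy as np

# toy activations and gradients for one spatial location with four
# channels (values chosen by hand for illustration)
conv_outputs = np.array([1.5, -0.5, 2.0, 0.0], dtype="float32")
grads = np.array([0.3, 0.8, -0.2, 0.4], dtype="float32")

# binary masks: 1.0 where the value is positive, 0.0 elsewhere
cast_conv_outputs = (conv_outputs > 0).astype("float32")
cast_grads = (grads > 0).astype("float32")

# a gradient survives only where both the activation and the gradient
# itself are positive
guided_grads = cast_conv_outputs * cast_grads * grads
print(guided_grads)
```

Only the first channel keeps its gradient (0.3). The second is dropped because its activation is negative, the third because its gradient is negative, and the fourth because its activation is zero.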
We’re closing in on our visualization heatmap; let’s continue:

            # compute the average of the gradient values, and using them
            # as weights, compute the ponderation of the filters with
            # respect to the weights
            weights = tf.reduce_mean(guidedGrads, axis=(0, 1))
            cam = tf.reduce_sum(tf.multiply(weights, convOutputs), axis=-1)

Line 69 computes the weights of the gradient values by computing the mean of the guidedGrads, which is essentially a 1 x 1 x N average across the volume.
We then take those weights and sum the ponderated (i.e., mathematically weighted) maps into the Grad-CAM visualization (cam) on Line 70.
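The shapes involved are worth seeing once. The following NumPy sketch uses random arrays as stand-ins for convOutputs and guidedGrads; the 7 x 7 x 512 size is an assumption chosen to resemble a late VGG16 layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# random stand-ins: a 7 x 7 spatial grid with 512 filters
conv_outputs = rng.random((7, 7, 512)).astype("float32")
guided_grads = rng.random((7, 7, 512)).astype("float32")

# one weight per filter: average each filter's gradients over the
# height and width axes
weights = guided_grads.mean(axis=(0, 1))

# scale every filter's activation map by its weight (broadcast over the
# last axis), then sum over the filter axis
cam = (weights * conv_outputs).sum(axis=-1)

print(weights.shape, cam.shape)  # (512,) (7, 7)
```

The result is a single 7 x 7 map, which is why the heatmap is coarse and has to be resized back up to the input image dimensions in the next step.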
Our next step is to generate the output heatmap associated with our image:

            # grab the spatial dimensions of the input image and resize
            # the output class activation map to match the input image
            # dimensions
            (w, h) = (image.shape[2], image.shape[1])
            heatmap = cv2.resize(cam.numpy(), (w, h))

            # normalize the heatmap such that all values lie in the range
            # [0, 1], scale the resulting values to the range [0, 255],
            # and then convert to an unsigned 8-bit integer
            numer = heatmap - np.min(heatmap)
            denom = (heatmap.max() - heatmap.min()) + eps
            heatmap = numer / denom
            heatmap = (heatmap * 255).astype("uint8")

            # return the resulting heatmap to the calling function
            return heatmap

We grab the original dimensions of the input image and scale our cam mapping to the original image dimensions (Lines 75 and 76).
From there, we perform min-max rescaling to the range [0, 1] and then convert the pixel values back to the range [0, 255] (Lines 81-84).
Finally, the last step of our compute_heatmap method returns the heatmap to the caller.
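Here is the min-max rescaling on its own, as a small NumPy sketch with a hand-picked 2 x 2 input. The normalize_heatmap helper is a name I made up for this example; it repeats the same arithmetic as Lines 81-84 without the OpenCV resize:

```python
import numpy as np


def normalize_heatmap(cam, eps=1e-8):
    # shift so the minimum is 0, then divide by the range; eps guards
    # against division by zero when the map is constant
    numer = cam - cam.min()
    denom = (cam.max() - cam.min()) + eps
    heatmap = numer / denom
    # scale to [0, 255] and truncate to unsigned 8-bit integers
    return (heatmap * 255).astype("uint8")


cam = np.array([[-2.0, 0.0], [2.0, 6.0]])
heatmap = normalize_heatmap(cam)
print(heatmap.tolist())  # [[0, 63], [127, 254]]
```

For this toy input, the largest value lands on 254 rather than 255: eps makes the ratio fall just short of 1, and astype("uint8") truncates rather than rounds. For visualization purposes the difference is not noticeable.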
Given that we have computed our heatmap, now we’d like a method to transparently overlay the Grad-CAM heatmap on our input image.
Let’s go ahead and define such a utility:

        def overlay_heatmap(self, heatmap, image, alpha=0.5,
            colormap=cv2.COLORMAP_VIRIDIS):
            # apply the supplied color map to the heatmap and then
            # overlay the heatmap on the input image
            heatmap = cv2.applyColorMap(heatmap, colormap)
            output = cv2.addWeighted(image, alpha, heatmap, 1 - alpha, 0)

            # return a 2-tuple of the color mapped heatmap and the output,
            # overlaid image
            return (heatmap, output)

Our heatmap produced by the previous compute_heatmap function is a single-channel, grayscale representation of where the network activated in the image: larger values correspond to a higher activation, smaller values to a lower activation.
In order to overlay the heatmap, we first need to apply a pseudo/false-color to the heatmap. To do so, we will use OpenCV’s built-in VIRIDIS colormap (i.e., cv2.COLORMAP_VIRIDIS).
The color scale of VIRIDIS is shown below:
Notice how darker input grayscale values will result in a dark purple color, while lighter input grayscale values will map to a light green or yellow.
Line 93 applies the color map to the input heatmap using VIRIDIS.
From there, we transparently overlay the heatmap on the input image to produce our output visualization (Line 94). The alpha value is directly weighted into the BGR image (i.e., we are not adding an alpha channel to the image). To learn more about transparent overlays, I suggest you read my Transparent overlays with OpenCV tutorial.
Finally, Line 98 returns a 2-tuple of the heatmap (with the VIRIDIS colormap applied) along with the output visualization image.
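The blend itself is a per-pixel weighted sum. The NumPy sketch below approximates what cv2.addWeighted computes with a gamma of 0; the blend helper is a hypothetical name, and OpenCV’s exact rounding behavior may differ slightly from np.rint:

```python
import numpy as np


def blend(image, heatmap, alpha=0.5):
    # weighted sum of the two images, computed in floating point
    mixed = (image.astype("float32") * alpha
             + heatmap.astype("float32") * (1 - alpha))
    # round and clip back into the valid 8-bit range
    return np.clip(np.rint(mixed), 0, 255).astype("uint8")


# two tiny 1 x 2 BGR "images" with hand-picked pixel values
image = np.array([[[200, 100, 0], [50, 50, 50]]], dtype="uint8")
heatmap = np.array([[[0, 100, 200], [250, 150, 50]]], dtype="uint8")

output = blend(image, heatmap, alpha=0.5)
print(output.tolist())  # [[[100, 100, 100], [150, 100, 50]]]
```

In this formulation alpha weights the input image and 1 - alpha weights the heatmap, so raising alpha makes the original image more prominent and the heatmap fainter.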
Creating the Grad-CAM visualization script
With our Grad-CAM implementation complete, we can now move on to the driver script used to apply it for class activation mapping.
As stated previously, our apply_gradcam.py driver script accepts an image and performs inference using either a VGG16 or ResNet CNN trained on ImageNet to both (1) compute the Grad-CAM heatmap and (2) display the results in an OpenCV window.
You will be able to use this visualization script to actually “see” what is going on under the hood of your deep learning model, which many critics say is too much of a “black box” especially when it comes to public safety concerns such as self-driving cars.
Let’s dive in by opening up apply_gradcam.py in your project structure and inserting the following code:

    # import the necessary packages
    from pyimagesearch.gradcam import GradCAM
    from tensorflow.keras.applications import ResNet50
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.preprocessing.image import img_to_array
    from tensorflow.keras.preprocessing.image import load_img
    from tensorflow.keras.applications import imagenet_utils
    import numpy as np
    import argparse
    import imutils
    import cv2

    # construct the argument parser and parse the arguments
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--image", required=True,
        help="path to the input image")
    ap.add_argument("-m", "--model", type=str, default="vgg",
        choices=("vgg", "resnet"),
        help="model to be used")
    args = vars(ap.parse_args())

This script’s most notable imports are our GradCAM implementation, ResNet/VGG architectures, and OpenCV.
Our script accepts two command line arguments:
- --image: The path to our input image which we seek to both classify and apply Grad-CAM to.
- --model: The deep learning model we would like to apply. By default, we will use VGG16 with our Grad-CAM. Alternatively, you can specify ResNet50. Your choices in this example are limited to vgg or resnet, entered directly in your terminal when you type the command, but you can modify this script to work with your own architectures as well.
Given the --model argument, let’s load our model:

    # initialize the model to be VGG16
    Model = VGG16

    # check to see if we are using ResNet
    if args["model"] == "resnet":
        Model = ResNet50

    # load the pre-trained CNN from disk
    print("[INFO] loading model...")
    model = Model(weights="imagenet")

Lines 23-31 load either VGG16 or ResNet50 with pre-trained ImageNet weights.
Alternatively, you could load your own model; we’re using VGG16 and ResNet50 in our example and for the sake of simplicity.
Next, we’ll load and preprocess our --image:

    # load the original image from disk (in OpenCV format) and then
    # resize the image to its target dimensions
    orig = cv2.imread(args["image"])
    resized = cv2.resize(orig, (224, 224))

    # load the input image from disk (in Keras/TensorFlow format) and
    # preprocess it
    image = load_img(args["image"], target_size=(224, 224))
    image = img_to_array(image)
    image = np.expand_dims(image, axis=0)
    image = imagenet_utils.preprocess_input(image)

Given our input image (provided via command line argument), Line 35 loads it from disk in OpenCV BGR format while Line 40 loads the same image in TensorFlow/Keras RGB format.
Our first pre-processing step resizes the image to 224×224 pixels (Line 36 and Line 40).
If at this stage we inspect the .shape of our image, we’ll notice the shape of the NumPy array is (224, 224, 3): each image is 224 pixels wide and 224 pixels tall, and has 3 channels (one for each of the Red, Green, and Blue channels, respectively).
However, before we can pass our image through our CNN for classification, we need to expand the dimensions to be (1, 224, 224, 3).
Why do we do this?
When classifying images using Deep Learning and Convolutional Neural Networks, we often send images through the network in “batches” for efficiency. Thus, it’s actually quite rare to pass only one image at a time through the network, unless, of course, you only have one image to classify and apply Grad-CAM to (like we do).
Thus, we convert the image to an array and add a batch dimension (Lines 41 and 42).
We then preprocess the image on Line 43 by subtracting the mean RGB pixel intensity computed from the ImageNet dataset (i.e., mean subtraction).
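Both steps can be reproduced with plain NumPy on a dummy image. The sketch below assumes the default “caffe” mode of imagenet_utils.preprocess_input, which flips the channels from RGB to BGR and subtracts per-channel ImageNet means without any further scaling; the mean values are listed in BGR order:

```python
import numpy as np

# a dummy 224 x 224 RGB image filled with a constant mid-gray value
image = np.full((224, 224, 3), 128.0, dtype="float32")

# add a leading batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
batch = np.expand_dims(image, axis=0)

# "caffe"-style preprocessing: flip RGB to BGR, then subtract the
# per-channel ImageNet means (given here in BGR order)
bgr_means = np.array([103.939, 116.779, 123.68], dtype="float32")
preprocessed = batch[..., ::-1] - bgr_means

print(image.shape, batch.shape)  # (224, 224, 3) (1, 224, 224, 3)
```

If you swap in your own model, make sure the preprocessing here matches whatever was used when that model was trained, since a mismatch changes both the predictions and the resulting heatmap.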
For the purposes of classification (i.e., not Grad-CAM yet), next we’ll make predictions on the image with our model:

    # use the network to make predictions on the input image and find
    # the class label index with the largest corresponding probability
    preds = model.predict(image)
    i = np.argmax(preds[0])

    # decode the ImageNet predictions to obtain the human-readable label
    decoded = imagenet_utils.decode_predictions(preds)
    (imagenetID, label, prob) = decoded[0][0]
    label = "{}: {:.2f}%".format(label, prob * 100)
    print("[INFO] {}".format(label))

Line 47 performs inference, passing our image through our CNN.
We then find the class label index with the largest corresponding probability and decode the predictions into a human-readable label (Lines 48-53).
Alternatively, you could hardcode the class label index you want to visualize for if you believe your model is struggling with a particular class label and you want to visualize the class activation mappings for it.
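As a small illustration of both options, the sketch below uses a made-up probability vector and a made-up four-class label list (neither comes from ImageNet or from our model) to show how the top class index is selected and how you might pick a specific class instead:

```python
import numpy as np

# hypothetical softmax output for a batch of one image and four classes
preds = np.array([[0.05, 0.10, 0.80, 0.05]])
labels = ["tabby", "beagle", "space_shuttle", "soccer_ball"]  # illustrative

# option 1: index of the most probable class for the first (only) image
i = int(np.argmax(preds[0]))
label = "{}: {:.2f}%".format(labels[i], preds[0][i] * 100)
print(label)  # space_shuttle: 80.00%

# option 2: hardcode the class we want to inspect, regardless of
# whether it is the top prediction
debug_idx = labels.index("beagle")
print(debug_idx)  # 1
```

Passing an index like debug_idx to GradCAM in place of i would produce the heatmap for that class, which is handy when the model confuses two classes and you want to compare what it attends to for each.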
At this point, we’re ready to compute our Grad-CAM heatmap visualization:

    # initialize our gradient class activation map and build the heatmap
    cam = GradCAM(model, i)
    heatmap = cam.compute_heatmap(image)

    # resize the resulting heatmap to the original input image dimensions
    # and then overlay heatmap on top of the image
    heatmap = cv2.resize(heatmap, (orig.shape[1], orig.shape[0]))
    (heatmap, output) = cam.overlay_heatmap(heatmap, orig, alpha=0.5)

To apply Grad-CAM, we instantiate a GradCAM object with our model and highest probability class index, i (Line 57).
Then we compute the heatmap; the heart of Grad-CAM lies in the compute_heatmap method (Line 58).
We then scale/resize the heatmap to our original input dimensions and overlay the heatmap on our output image with 50% alpha transparency (Lines 62 and 63).
Finally, we produce a stacked visualization consisting of (1) the original image, (2) the heatmap, and (3) the heatmap transparently overlaid on the original image with the predicted class label:

    # draw the predicted label on the output image
    cv2.rectangle(output, (0, 0), (340, 40), (0, 0, 0), -1)
    cv2.putText(output, label, (10, 25),
        cv2.FONT_HERSHEY_SIMPLEX, 0.8, (255, 255, 255), 2)

    # display the original image and resulting heatmap and output image
    # to our screen
    output = np.vstack([orig, heatmap, output])
    output = imutils.resize(output, height=700)
    cv2.imshow("Output", output)
    cv2.waitKey(0)

Lines 66-68 draw the predicted class label on the top of the output Grad-CAM image.
We then stack our three images for visualization, resize to a known height that will fit on our screen, and display the result in an OpenCV window (Lines 72-75).
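The stacking step only works because all three images share the same width and channel count, which is why the heatmap was resized to the dimensions of orig earlier. A quick NumPy sketch with dummy 480 x 640 images (the size is an arbitrary assumption) shows the resulting shape, along with the width you get when resizing to a fixed height while keeping the aspect ratio:

```python
import numpy as np

# three dummy BGR images sharing the same height and width
orig = np.zeros((480, 640, 3), dtype="uint8")
heatmap = np.zeros((480, 640, 3), dtype="uint8")
overlay = np.zeros((480, 640, 3), dtype="uint8")

# vertical stacking concatenates along the height axis
stacked = np.vstack([orig, heatmap, overlay])
print(stacked.shape)  # (1440, 640, 3)

# resizing to a fixed height while preserving the aspect ratio means
# scaling the width by the same factor
target_height = 700
scale = target_height / stacked.shape[0]
target_width = int(stacked.shape[1] * scale)
print(target_width)  # 311
```

If the three images had different widths, np.vstack would raise an error, so keep the resize of the heatmap in place when adapting the script to your own images.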
In the next section, we’ll apply Grad-CAM to three sample images and see if the results meet our expectations.
Visualizing class activation maps with Grad-CAM, Keras, and TensorFlow
To use Grad-CAM to visualize class activation maps, make sure you use the “Downloads” section of this tutorial to download our Keras and TensorFlow Grad-CAM implementation.
From there, open up a terminal, and execute the following command:

    $ python apply_gradcam.py --image images/space_shuttle.jpg
    [INFO] loading model...
    [INFO] space_shuttle: 100.00%

Here you can see that VGG16 has correctly classified our input image as space shuttle with 100% confidence. By looking at our Grad-CAM output in Figure 4, we can see that VGG16 is correctly activating around patterns on the space shuttle, verifying that the network is behaving as expected.
Let’s try another image:

    $ python apply_gradcam.py --image images/beagle.jpg
    [INFO] loading model...
    [INFO] beagle: 73.94%

This time, we are passing in an image of my dog, Janie. VGG16 correctly labels the image as beagle.
Examining the Grad-CAM output in Figure 5, we can see that VGG16 is activating around the face of Janie, indicating that my dog’s face is an important characteristic used by the network to classify her as a beagle.
Let’s examine one final image, this time using the ResNet architecture:

    $ python apply_gradcam.py --image images/soccer_ball.jpg --model resnet
    [INFO] loading model...
    [INFO] soccer_ball: 99.97%

Our soccer ball is correctly classified with 99.97% confidence, but what is more interesting is the class activation visualization in Figure 6: notice how our network is effectively ignoring the soccer field, activating only around the soccer ball.
This activation behavior verifies that our model has correctly learned the soccer ball class during training.
After training your own CNNs, I would strongly encourage you to apply Grad-CAM and visually verify that your model is learning the patterns that you think it is learning (and not some other pattern that occurs by happenstance in your dataset).
Summary
In this tutorial, you learned about Grad-CAM, an algorithm that can be used to visualize class activation maps and debug your Convolutional Neural Networks, ensuring that your network is “looking” at the correct locations in an image.
Keep in mind that even if your network is performing well on your training and testing sets, there is still a chance that its accuracy is the result of accident or happenstance!
Your “high accuracy” model may be activating under patterns you did not notice or perceive in the image dataset.
I would suggest you make a conscious effort to incorporate Grad-CAM into your own deep learning pipelines and visually verify that your model is performing correctly.
The last thing you want to do is deploy a model that you think is performing well but in reality is activating under patterns irrelevant to the objects in images you want to recognize.
To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), just enter your email address in the form below!
Mid
Hi Adrian, thanks again for this great tutorial.
Adrian, would it be possible to know how to apply Grad-CAM using our own model, in this case the small VGGNet that you implemented in the “Keras and Convolutional Neural Networks” tutorial for Pokemon classification:
https://www.pyimagesearch.com/2018/04/16/keras-and-convolutional-neural-networks-cnns/
I would really appreciate any help or suggestion on how to achieve that.
Thank you.
Adrian Rosebrock
Swap out Line 31 (where we load our pre-trained VGG16/ResNet model) and replace it with a call to “load_model” to load your own custom model from disk.
Mid
Hi Adrian.
Thank you for the reply.
In reality, I’ve more or less figured out what changes to make in the apply_gradcam script. The only issue is that in that tutorial, for the Pokemon classification, we used a custom architecture, the SmallVggNet, and since the model function of Keras doesn’t support it, I was wondering what changes to make in the gradcam script, since the get_layer method will not work on our SmallVggNet.
Thanks again, Adrian.
Adrian Rosebrock
Unfortunately, it’s hard to say without first seeing your implementation.
anas bin iftikhar
Hi Adrian,
Finally, what I have been waiting for. I am trying to use Grad-CAM in one of your tutorials (video classification) and want to see activations on sports video. I ported your apply_gradcam.py to the main prediction script without any errors.
However, the heatmap is overlaid on the entire frame and I see horizontal lines as well.
Is there a way I can post the screenshot of the predicted video frames for you to visualize the error?
Adrian Rosebrock
Perhaps send me an email of it instead? Instead of trying to jump immediately to video classification you should verify the code is working on a single frame first. From there you can extend it to other methods.
Chakib BELAFDIL
Hello Adrian,
First I would like to tell you that I read all your blogs since I discovered your website. Thank you for feeding it regularly.
So I have several questions about your article:
1) In Lines 21 to 31, you look for a layer with a 4D shape in the CNN, but I thought that only the input layer is 4D (batch_size, height, width, channels) and the other layers (which are a stack of feature maps) are 5D (number_filters, batch_size, height, width, channels). Can you be more specific about this 4D layer?
2) Can Grad-CAM be used with an object detection or an instance segmentation model such as Mask R-CNN?
3) Can Grad-CAM be used to turn the CNN from a classification model into a classification and localization model? That is to say, use the heatmap and draw a bounding box around the heat.
Adrian Rosebrock
Thanks for being a regular reader, Chakib. Have you considered grabbing a copy of Deep Learning for Computer Vision with Python? That book discusses input shapes to standard layers and would better help you understand the fundamentals.
Additionally, yes, Grad-CAM can be used for other architectures outside of classification models.
You could also use Grad-CAM as a “weak” form of detection, but it’s not a robust solution and won’t work nearly as well as training a dedicated object detector such as Faster R-CNNs, SSDs, etc.
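The “weak” detection idea can be sketched as thresholding the heatmap and taking the bounding box of whatever survives. This is only a minimal illustration, assuming a 2D heatmap that has already been computed. The function name `heatmap_to_box` and the 0.5 relative threshold are hypothetical choices, not part of the tutorial’s code.

```python
import numpy as np

def heatmap_to_box(heatmap, rel_thresh=0.5):
    """Return (x_min, y_min, x_max, y_max) around the hottest region, or None.

    rel_thresh is a hypothetical tuning knob: pixels at or above
    rel_thresh * peak are treated as part of the object.
    """
    heatmap = np.asarray(heatmap, dtype="float32")
    peak = heatmap.max()
    if peak <= 0:
        # a blank heatmap gives us nothing to localize
        return None
    mask = heatmap >= rel_thresh * peak
    ys, xs = np.nonzero(mask)
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

# toy 8x8 heatmap with a single hot rectangle
heatmap = np.zeros((8, 8), dtype="uint8")
heatmap[2:5, 3:7] = 200
box = heatmap_to_box(heatmap)
print(box)
```

A single box around all above-threshold pixels breaks down as soon as there are two objects or a noisy background, which is one reason a dedicated detector is the more robust choice.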
Luke
How would you implement a custom trained model (.h5 & .json) rather than using ResNet or VGG? I imagine you change this line, model = Model(weights="imagenet"), though I don’t know how. Any advice would be appreciated.
Adrian Rosebrock
Hey Luke — see my reply to Mid as I have answered that question there.
Abu-Abdurrahman
Hi Adrian,
Thank you for the massive contribution you’re making. Seriously, I am not aware of another blog that is as detailed and dedicated as yours. Thank you.
Coming to my question, I have applied transfer learning to my dataset. I only trained the additional fully connected layers that I added to the baseline model (VGG16), i.e., I kept the ImageNet weights unchanged and trained only the additional FC layers. How can I visualize the feature map to get an intuition of where the model is looking? Or do I have to adjust all model weights so as to allow the filters to learn new patterns?
Adrian Rosebrock
Thanks so much, I appreciate the kind words.
As for your question, no further modification of the code is required. Just load your custom, fine-tuned model from disk (after training, of course) and apply Grad-CAM to it.
Gary Cao
Hi Adrian,
Thanks again for a very nice tutorial.
I noticed your implementation is slightly different from the paper. Specifically, the code on Lines 55-70 differs from Eq. (1) and (2) in the paper. Is this intentional?
Thanks.
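For readers comparing against the paper: Eq. (1) averages the gradients of the class score over the spatial dimensions of each feature map to get one weight per channel, and Eq. (2) applies a ReLU to the weighted sum of the feature maps. The sketch below is a direct NumPy transcription of those two equations, not the tutorial’s code. It assumes the activations and gradients have already been extracted as (H, W, K) arrays, and the function name `gradcam_paper` is hypothetical.

```python
import numpy as np

def gradcam_paper(conv_outputs, grads):
    """Grad-CAM as written in the paper, for one image.

    conv_outputs: (H, W, K) activations of the target conv layer
    grads:        (H, W, K) gradients of the class score w.r.t. them
    """
    # Eq. (1): one weight per channel = spatial mean of the gradients
    weights = grads.mean(axis=(0, 1))                      # shape (K,)
    # Eq. (2): ReLU of the channel-weighted sum of feature maps
    cam = np.tensordot(conv_outputs, weights, axes=([2], [0]))  # shape (H, W)
    return np.maximum(cam, 0)

# random stand-ins for real activations and gradients
rng = np.random.default_rng(0)
A = rng.random((7, 7, 4)).astype("float32")
dA = rng.standard_normal((7, 7, 4)).astype("float32")
cam = gradcam_paper(A, dA)
print(cam.shape)
```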
Sri
Hello Mr. Rosebrock,
Thank you so much for your tutorials. I have been a regular reader and follow you on Twitter as well. Thank you for helping students like me learn about deep learning and computer vision.
I wanted to know if we can obtain a gradient image from the heatmap (just like in Fig. 2 of the paper, which shows the Guided Backprop part of the image)?
Steve Frank
Dear Dr. Rosebrock,
Really enjoyed this tutorial — I finally understand GradCAM thanks to your explanation and code. I’m trying to apply it to a Keras CNN I built with 5 convolutional layers to analyze art images, but in analyzing a single image I run into an error when compute_heatmap runs — specifically, I get a TypeError: unhashable type: ‘list’, which seems to be thrown by a statement in the Keras engine: if len(set(self.inputs)) != len(self.inputs).
If you’ve seen this and have any suggestions, I’d be grateful.
Thanks.
Luke
Hi Adrian,
These tutorials have been super helpful and your engagement is a blessing. Thank you for that.
It appears that GradCam requires a 4D model. The initial model I made was done with the simple_nn from this tutorial (https://www.pyimagesearch.com/2018/09/10/keras-tutorial-how-to-get-started-with-keras-deep-learning-and-python/) in the flattened version. To use GradCam, do I need to retrain my model with a 4d version or is there something I can do instead?
Thanks,
Luke
Adrian Rosebrock
Do you mean you trained a simple fully-connected model? If so, yes, you would need to train a proper CNN instead.
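The reason is that Grad-CAM needs a layer whose output is a 4D tensor (batch, height, width, channels), and a Dense layer’s output is only 2D (batch, units). The sketch below mimics that search on plain (name, output_shape) pairs. The layer names and shapes are made up for illustration; with a real Keras model you would inspect each layer’s output shape instead.

```python
def find_target_layer(layers):
    """Return the name of the last layer whose output shape is 4D.

    `layers` is a list of (name, output_shape) pairs, a stand-in for
    walking a real model's layers in reverse order.
    """
    for name, shape in reversed(layers):
        if len(shape) == 4:
            return name
    raise ValueError("Could not find a 4D layer; Grad-CAM cannot be applied.")

# hypothetical CNN: the last 4D output belongs to "conv2"
cnn = [
    ("conv1", (None, 32, 32, 16)),
    ("pool1", (None, 16, 16, 16)),
    ("conv2", (None, 16, 16, 32)),
    ("flatten", (None, 8192)),
    ("dense", (None, 10)),
]

# hypothetical fully-connected network: every output is 2D
mlp = [
    ("dense1", (None, 1024)),
    ("dense2", (None, 512)),
    ("out", (None, 3)),
]

print(find_target_layer(cnn))
try:
    find_target_layer(mlp)
except ValueError as err:
    print(err)
```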
El Jefe
Hi Adrian,
Can this easily be extended to a regression task? The applications of Grad-CAM I’ve seen only address classification.
Thanks for a nice tutorial!
Christoph Ostertag
Hi!
If the gradients are too small, the heatmap will be zero everywhere. Therefore, we need to normalize the gradients.
(I had the problem applying grad-cam to the covid19 x-ray detection.)
One easy way to do this would be the following:
grads = grads / (grads.numpy().max()-grads.numpy().min())
Chris
Adrian Rosebrock
Thanks for sharing, Chris!
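Chris’s one-liner can be sketched on a plain NumPy array as follows. The guard against a zero range is an addition for illustration (the original line would divide by zero if every gradient were identical), and the function name `normalize_grads` is hypothetical.

```python
import numpy as np

def normalize_grads(grads):
    """Scale gradients by their value range, as in the suggestion above."""
    grads = np.asarray(grads, dtype="float64")
    value_range = grads.max() - grads.min()
    if value_range == 0:
        # all gradients identical: nothing to rescale, avoid dividing by zero
        return grads
    return grads / value_range

# gradients on the order of 1e-9, too small to survive later scaling
tiny = np.array([[1e-9, -2e-9], [3e-9, 0.0]])
scaled = normalize_grads(tiny)
print(scaled.max() - scaled.min())
```

Dividing by the range keeps the sign and relative size of each gradient, so the later weighting is unchanged apart from a global scale factor.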
Christian
Hi Adrian,
Thanks for this great resource!
I’m trying to use this grad-cam-implementation with my own model based on InceptionResNetV2 but all I get is blank heatmaps.
I also tried it with the default InceptionResNetV2 but still get blank heatmaps (except for space_shuttle.jpg for some reason?)
I have changed the following in the code when using the default InceptionResNetV2 model:
from tensorflow.keras.applications.inception_resnet_v2 import InceptionResNetV2
Model = InceptionResNetV2
Also changed the input sizes to (299,299)
Any idea what can be wrong?
Christian
Never mind,
I think I solved it by lowering the eps value like this:
cam.compute_heatmap(image, eps=1e-16)
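A plausible explanation for why a smaller eps helps: if the heatmap is min/max normalized with eps added to the denominator, an eps much larger than the heatmap’s own range pushes every value toward zero. The sketch below assumes that normalization scheme and a default eps of 1e-8; both are assumptions for illustration, and `to_uint8_heatmap` is a hypothetical helper rather than the tutorial’s function.

```python
import numpy as np

def to_uint8_heatmap(cam, eps=1e-8):
    """Min/max normalize to [0, 1], then scale to 0..255 (assumed scheme)."""
    numer = cam - cam.min()
    denom = (cam.max() - cam.min()) + eps
    return (numer / denom * 255).astype("uint8")

# raw activation map whose values are far smaller than the assumed default eps
cam = np.array([[0.0, 2e-12], [5e-12, 1e-11]], dtype="float64")

blank = to_uint8_heatmap(cam, eps=1e-8)     # eps swamps the range
visible = to_uint8_heatmap(cam, eps=1e-16)  # eps is negligible here

print(blank.max(), visible.max())
```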
Avigyan Sinha
Dear Dr. Rosebrock,
Thanks for this wonderful post. I have a very small dataset for binary classification. I am using fine-tuning and transfer learning to train a VGG16 model pre-trained on ImageNet. I have removed the dense fully connected layers at the end and added a global average pooling (GAP) layer and then train an SVM on the GAP output. How/on which layer should I apply GradCAM to visualize the discriminative parts of the test image? Looking forward to your advice.
-With sincere regards,
Avigyan Sinha.
(Johns Hopkins University, USA)
Adrian Rosebrock
I would be looking at the final CONV layer prior to the output of the global average pooling.
Avigyan Sinha
Thanks. After transfer learning and finetuning with VGG16(pre-trained on ImageNet), for the small dataset in case of my binary classification problem I am getting both training and validation accuracy of the order of 99%.
In your gradcam.py, while passing the model, I specify “block5_conv3” as layerName. But when I compute “grads = tape.gradient(loss, convOutputs)”, I get all zero terms in grads. So my final Grad-CAM map is also blank/flat. Could you please advise what the reason behind this could be and how to solve it?
Also, before I finetuned my pretrained VGG16, I removed the FC layers as:
baseModel = VGG16(weights="imagenet", include_top=False, ..).
This, however, does not remove the final max pooling layer. Should I remove the final max pooling layer before adding the GAP layer and fine-tuning? If not, can using two pooling layers in succession (max pooling followed by GAP) give poor results?
Looking forward to your advice.
-Avigyan
Adrian Rosebrock
The pooling layer shouldn’t be the issue. Are you applying min/max normalization to your gradients in Grad-CAM? That may be the issue.
Mubashir
Hi Adrian! I am getting an error: “Could not import PIL.Image. The use of `load_img` requires PIL.” I am unable to install PIL because some weird error is generated. When I am using tf.keras.preprocessing.image.load_img(), why am I being asked to import PIL?
Adrian Rosebrock
Keras/TensorFlow uses the PIL library for very basic image processing. You need to have PIL installed so I would suggest resolving whatever error you are encountering when trying to install PIL.
Rubiks R.
Hi Adrian, I have a few questions:
1. How do you determine the optimal number of neurons in the fully connected layer of a CNN?
2. Does using a global average pooling layer after the last conv layer produce a better heatmap for localization than using a flatten layer when using Grad-CAM?
Thanks in advance.