In this tutorial, we will build a complete end-to-end application that can detect smiles in a video stream in real time using deep learning along with traditional computer vision techniques.
To accomplish this task, we'll be training the LeNet architecture on a dataset of images that contain faces of people who are smiling and not smiling. Once our network is trained, we'll create a separate Python script that detects faces in images via OpenCV's built-in Haar cascade face detector, extracts the face region of interest (ROI) from the image, and then passes the ROI through LeNet for smile detection.
To learn how to detect a smile with OpenCV, Keras, and TensorFlow, just keep reading.
Smile detection with OpenCV, Keras, and TensorFlow
When developing real-world applications for image classification, you'll often have to mix traditional computer vision and image processing techniques with deep learning. I've done my best to ensure this tutorial stands on its own in terms of algorithms, techniques, and libraries you need to understand in order to be successful when studying and applying deep learning.
Configuring your development environment
To follow this guide, you need to have the OpenCV library installed on your system.
Luckily, OpenCV is pip-installable:
$ pip install opencv-contrib-python
If you need help configuring your development environment for OpenCV, I highly recommend that you read my pip install OpenCV guide; it will have you up and running in a matter of minutes.
Having problems configuring your development environment?
All that said, are you:
- Short on time?
- Learning on your employer's administratively locked system?
- Wanting to skip the hassle of fighting with the command line, package managers, and virtual environments?
- Ready to run the code right now on your Windows, macOS, or Linux system?
Then join PyImageSearch University today!
Gain access to Jupyter Notebooks for this tutorial and other PyImageSearch guides that are pre-configured to run on Google Colab's ecosystem right in your web browser! No installation required.
And best of all, these Jupyter Notebooks will run on Windows, macOS, and Linux!
The SMILES Dataset
The SMILES dataset consists of images of faces that are either smiling or not smiling (Hromada, 2010). In total, there are 13,165 grayscale images in the dataset, with each image having a size of 64×64 pixels.
As Figure 2 demonstrates, images in this dataset are tightly cropped around the face, which will make the training process easier as we'll be able to learn the "smiling" or "not smiling" patterns directly from the input images.
However, the close cropping poses a problem during testing: since our input images will contain not only a face but also the background of the image, we first need to localize the face in the image and extract the face ROI before we can pass it through our network for detection. Luckily, using traditional computer vision methods such as Haar cascades, this is a much easier task than it sounds.
A second issue we need to handle in the SMILES dataset is class imbalance. Of the 13,165 images in the dataset, 9,475 examples are not smiling, while only 3,690 belong to the smiling class. Given that there are over 2.5x more "not smiling" images than "smiling" examples, we need to be careful when devising our training procedure.
Our network may naturally pick the "not smiling" label since (1) the distributions are uneven and (2) it has more examples of what a "not smiling" face looks like. Later, you will see how we can combat class imbalance by computing a "weight" for each class during training time.
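If you would like to verify these class counts on your own copy of the dataset, a quick sketch along the following lines will tally them; it assumes the dataset lives in a local SMILEs directory with the SMILEs/positives/... and SMILEs/negatives/... layout used later in this tutorial:

# tally the number of images per class in the SMILES dataset
# (assumes a local "SMILEs" directory with the positives/negatives
# layout described in the training section below)
from imutils import paths
import os

counts = {}
for imagePath in paths.list_images("SMILEs"):
	# the class name ("positives" or "negatives") is the third-to-last
	# path component, e.g. SMILEs/positives/positives7/10007.jpg
	label = imagePath.split(os.path.sep)[-3]
	counts[label] = counts.get(label, 0) + 1

print(counts)
# expected: {'negatives': 9475, 'positives': 3690}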
Training the Smile CNN
The first step in building our smile detector is to train a CNN on the SMILES dataset to distinguish between a face that is smiling versus not smiling. To accomplish this task, let's create a new file named train_model.py and insert the following code:
# import the necessary packages
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.utils import to_categorical
from pyimagesearch.nn.conv import LeNet
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import imutils
import cv2
import os
Lines 2-14 import our required Python packages. We've used all of these packages before, but I want to call your attention to Line 7, where we import the LeNet class (LeNet Tutorial); this is the architecture we'll be using when creating our smile detector.
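If you don't have the pyimagesearch module handy, a minimal LeNet-style build method along the following lines makes a reasonable stand-in; this is a sketch of the general architecture, not necessarily the exact implementation shipped with the downloads:

# a minimal LeNet-style architecture sketch (not necessarily identical
# to the pyimagesearch.nn.conv.LeNet class used in this post)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Activation, Flatten, Dense

class LeNet:
	@staticmethod
	def build(width, height, depth, classes):
		# first CONV => RELU => POOL block
		model = Sequential()
		model.add(Conv2D(20, (5, 5), padding="same",
			input_shape=(height, width, depth)))
		model.add(Activation("relu"))
		model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

		# second CONV => RELU => POOL block
		model.add(Conv2D(50, (5, 5), padding="same"))
		model.add(Activation("relu"))
		model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

		# fully connected layer followed by the softmax classifier
		model.add(Flatten())
		model.add(Dense(500))
		model.add(Activation("relu"))
		model.add(Dense(classes))
		model.add(Activation("softmax"))
		return model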
Next, let's parse our command line arguments:
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
	help="path to input dataset of faces")
ap.add_argument("-m", "--model", required=True,
	help="path to output model")
args = vars(ap.parse_args())

# initialize the list of data and labels
data = []
labels = []
Our script will require two command line arguments, each of which I've detailed below:
- --dataset: The path to the SMILES directory residing on disk.
- --model: The path to where the serialized LeNet weights will be saved after training.
We are now ready to load the SMILES dataset from disk and store it in memory:
# loop over the input images
for imagePath in sorted(list(paths.list_images(args["dataset"]))):
	# load the image, pre-process it, and store it in the data list
	image = cv2.imread(imagePath)
	image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
	image = imutils.resize(image, width=28)
	image = img_to_array(image)
	data.append(image)

	# extract the class label from the image path and update the
	# labels list
	label = imagePath.split(os.path.sep)[-3]
	label = "smiling" if label == "positives" else "not_smiling"
	labels.append(label)
On Line 29, we loop over all images in the --dataset input directory. For each of these images, we:
- Load it from disk (Line 31).
- Convert it to grayscale (Line 32).
- Resize it to have a fixed input size of 28×28 pixels (Line 33).
- Convert the image to an array compatible with Keras and its channel ordering (Line 34).
- Add the image to the data list that LeNet will be trained on.
Lines 39-41 handle extracting the class label from the imagePath and updating the labels list. The SMILES dataset stores smiling faces in the SMILES/positives/positives7 subdirectory, while not smiling faces live in the SMILES/negatives/negatives7 subdirectory.
Therefore, given the path to an image:
SMILEs/positives/positives7/10007.jpg
We can extract the class label by splitting on the image path separator and grabbing the third-to-last subdirectory: positives. In fact, this is exactly what Line 39 accomplishes.
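As a quick sanity check, here is what that split looks like on the example path above (on a Unix-like system, where the path separator is "/"):

# sketch: extracting the class label from an example image path
import os

imagePath = "SMILEs/positives/positives7/10007.jpg"

# on Unix-like systems this yields
# ['SMILEs', 'positives', 'positives7', '10007.jpg']
components = imagePath.split(os.path.sep)

print(components[-3])
# => 'positives'
print("smiling" if components[-3] == "positives" else "not_smiling")
# => 'smiling'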
Now that our data and labels are constructed, we can scale the raw pixel intensities to the range [0, 1] and then apply one-hot encoding to the labels:
# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)

# convert the labels from integers to vectors
le = LabelEncoder().fit(labels)
labels = to_categorical(le.transform(labels), 2)
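To make the encoding concrete, here is a small sketch (with a handful of made-up labels) of what LabelEncoder and to_categorical produce for our two classes:

# sketch: what the two-class labels look like after encoding
# (the labels here are made-up examples)
import numpy as np
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical

labels = np.array(["not_smiling", "smiling", "smiling", "not_smiling"])
le = LabelEncoder().fit(labels)

print(le.transform(labels))
# [0 1 1 0]

print(to_categorical(le.transform(labels), 2))
# [[1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [1. 0.]]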
Our next code block handles our data imbalance issue by computing the class weights:
# calculate the total number of training images in each class and
# initialize a dictionary to store the class weights
classTotals = labels.sum(axis=0)
classWeight = dict()

# loop over all classes and calculate the class weight
for i in range(0, len(classTotals)):
	classWeight[i] = classTotals.max() / classTotals[i]
Line 53 computes the total number of examples per class. In this case, classTotals will be the array [9475, 3690] for "not smiling" and "smiling," respectively.

We then scale these totals on Lines 57 and 58 to obtain the classWeight used to handle the class imbalance, yielding the array [1, 2.56]. This weighting implies that our network will treat every instance of "smiling" as 2.56 instances of "not smiling" and helps combat the class imbalance issue by amplifying the per-instance loss by a larger weight when seeing "smiling" examples.
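If you prefer, scikit-learn can compute comparable weights for you. The absolute values produced by its "balanced" heuristic differ from [1, 2.56], but the ratio between the two classes is the same; a quick sketch:

# an alternative sketch using scikit-learn's "balanced" heuristic;
# the absolute values differ from [1, 2.56], but the ratio between
# the two classes matches
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# integer labels: 0 = "not smiling" (9,475 images), 1 = "smiling" (3,690)
y = np.array([0] * 9475 + [1] * 3690)
weights = compute_class_weight(class_weight="balanced",
	classes=np.unique(y), y=y)
classWeight = dict(enumerate(weights))

print(classWeight)
# roughly {0: 0.69, 1: 1.78} -- the same "not smiling" : "smiling" ratio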
Now that we've computed our class weights, we can move on to partitioning our data into training and testing splits, using 80% of the data for training and 20% for testing:
# partition the data into training and testing splits using 80% of
# the data for training and the remaining 20% for testing
(trainX, testX, trainY, testY) = train_test_split(data,
	labels, test_size=0.20, stratify=labels, random_state=42)
Finally, we are ready to train LeNet:
# initialize the model
print("[INFO] compiling model...")
model = LeNet.build(width=28, height=28, depth=1, classes=2)
model.compile(loss="binary_crossentropy", optimizer="adam",
	metrics=["accuracy"])

# train the network
print("[INFO] training network...")
H = model.fit(trainX, trainY, validation_data=(testX, testY),
	class_weight=classWeight, batch_size=64, epochs=15, verbose=1)
Line 67 initializes the LeNet architecture that will accept 28×28 single channel images. Given that there are only two classes (smiling versus not smiling), we set classes=2.
We'll also be using binary_crossentropy rather than categorical_crossentropy as our loss function. Again, categorical cross-entropy is typically reserved for problems with more than two classes.
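For reference, the other common way to frame a two-class problem in Keras is a single sigmoid output trained on the raw 0/1 integer labels. This is not the setup used in train_model.py, and the layers below are placeholders rather than the LeNet architecture, but the sketch shows how the pieces fit together:

# sketch: an equally valid two-class alternative (not used in this
# tutorial): a single sigmoid output trained on 0/1 integer labels
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

alt_model = Sequential([
	Flatten(input_shape=(28, 28, 1)),
	Dense(500, activation="relu"),
	Dense(1, activation="sigmoid"),
])
alt_model.compile(loss="binary_crossentropy", optimizer="adam",
	metrics=["accuracy"])

# with this head, you would skip to_categorical and train directly on
# le.transform(labels), i.e. a 1D array of 0s and 1s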
Up until this point, we've been using the SGD optimizer to train our network. Here, we'll be using Adam (Kingma and Ba, 2014) (Line 68).
Again, the optimizer and associated parameters are often considered hyperparameters that you need to tune when training your network. When I put this example together, I found that Adam performed substantially better than SGD.
Lines 73 and 74 train LeNet for a total of 15 epochs using our supplied classWeight to combat class imbalance.
Once our network is trained, we can evaluate it and serialize the weights to disk:
# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=64)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=le.classes_))

# save the model to disk
print("[INFO] serializing network...")
model.save(args["model"])
We'll also construct a learning curve for our network so we can visualize performance:
# plot the training + testing loss and accuracy
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, 15), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, 15), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, 15), H.history["accuracy"], label="acc")
plt.plot(np.arange(0, 15), H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.show()
To train our smile detector, execute the following command:
$ python train_model.py --dataset ../datasets/SMILEsmileD \
	--model output/lenet.hdf5
[INFO] compiling model...
[INFO] training network...
Train on 10532 samples, validate on 2633 samples
Epoch 1/15
8s - loss: 0.3970 - acc: 0.8161 - val_loss: 0.2771 - val_acc: 0.8872
Epoch 2/15
8s - loss: 0.2572 - acc: 0.8919 - val_loss: 0.2620 - val_acc: 0.8899
Epoch 3/15
7s - loss: 0.2322 - acc: 0.9079 - val_loss: 0.2433 - val_acc: 0.9062
...
Epoch 15/15
8s - loss: 0.0791 - acc: 0.9716 - val_loss: 0.2148 - val_acc: 0.9351
[INFO] evaluating network...
             precision    recall  f1-score   support

not_smiling       0.95      0.97      0.96      1890
    smiling       0.91      0.86      0.88       743

avg / total       0.93      0.94      0.93      2633

[INFO] serializing network...
After 15 epochs, we can see that our network is obtaining 93% classification accuracy. Figure 3 plots our learning curve:
Past epoch six, our validation loss starts to stagnate; further training past epoch 15 would result in overfitting. If desired, we could improve the accuracy of our smile detector by either:
- Gathering additional training data.
- Applying data augmentation to randomly translate, rotate, and shift our existing training set (see the sketch after this list).
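If you want to experiment with the second option, Keras ships with an ImageDataGenerator that plugs straight into model.fit. A minimal sketch is below; the augmentation ranges are illustrative rather than tuned values, and the model, trainX/trainY, testX/testY, and classWeight variables come from train_model.py above:

# a minimal data augmentation sketch using Keras' ImageDataGenerator
# (illustrative ranges; variables come from train_model.py above)
from tensorflow.keras.preprocessing.image import ImageDataGenerator

aug = ImageDataGenerator(rotation_range=10, width_shift_range=0.1,
	height_shift_range=0.1, fill_mode="nearest")

# stream augmented batches to model.fit instead of the raw arrays
H = model.fit(aug.flow(trainX, trainY, batch_size=64),
	validation_data=(testX, testY), class_weight=classWeight,
	steps_per_epoch=len(trainX) // 64, epochs=15, verbose=1)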
Running the Smile CNN in Real-time
Now that we've trained our model, the next step is to build the Python script to access our webcam/video file and apply smile detection to each frame. To accomplish this step, open a new file, name it detect_smile.py, and we'll get to work.
# import the necessary packages
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
import numpy as np
import argparse
import imutils
import cv2
Lines 2-7 import our required Python packages. The img_to_array function will be used to convert each individual frame from our video stream to a properly channel-ordered array. The load_model function will be used to load the weights of our trained LeNet model from disk.
The detect_smile.py script requires two command line arguments followed by a third optional one:
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-c", "--cascade", required=True,
	help="path to where the face cascade resides")
ap.add_argument("-m", "--model", required=True,
	help="path to pre-trained smile detector CNN")
ap.add_argument("-v", "--video",
	help="path to the (optional) video file")
args = vars(ap.parse_args())
The first argument, --cascade, is the path to a Haar cascade used to detect faces in images. Paul Viola and Michael Jones detail the Haar cascade in their 2001 work, Rapid Object Detection using a Boosted Cascade of Simple Features, which has become one of the most cited papers in the computer vision literature.
The Haar cascade algorithm is capable of detecting objects in images, regardless of their location and scale. Perhaps most intriguing (and relevant to our application), the detector can run in real-time on modern hardware. In fact, the motivation behind Viola and Jones' work was to create a face detector.
The second command line argument, --model, specifies the path to our serialized LeNet weights on disk. Our script will default to reading frames from a built-in/USB webcam; however, if we instead want to read frames from a file, we can specify the file via the optional --video switch.
Before we can detect smiles, we first need to perform some initializations:
# load the face detector cascade and smile detector CNN
detector = cv2.CascadeClassifier(args["cascade"])
model = load_model(args["model"])

# if a video path was not supplied, grab the reference to the webcam
if not args.get("video", False):
	camera = cv2.VideoCapture(0)

# otherwise, load the video
else:
	camera = cv2.VideoCapture(args["video"])
Lines 20 and 21 load the Haar cascade face detector and the pre-trained LeNet model, respectively. If a video path was not supplied, we grab a pointer to our webcam (Lines 24 and 25). Otherwise, we open a pointer to the video file on disk (Lines 28 and 29).
We have now reached the main processing pipeline of our application:
# keep looping
while True:
	# grab the current frame
	(grabbed, frame) = camera.read()

	# if we are viewing a video and we did not grab a frame, then we
	# have reached the end of the video
	if args.get("video") and not grabbed:
		break

	# resize the frame, convert it to grayscale, and then clone the
	# original frame so we can draw on it later in the program
	frame = imutils.resize(frame, width=300)
	gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
	frameClone = frame.copy()
Line 32 starts a loop that will continue until (1) we stop the script or (2) we reach the end of the video file (provided a --video path was supplied).
Line 34 grabs the next frame from the video stream. If the frame could not be grabbed, then we have reached the end of the video file. Otherwise, we pre-process the frame for face detection by resizing it to have a width of 300 pixels (Line 43) and converting it to grayscale (Line 44).
The .detectMultiScale method handles detecting the bounding box (x, y)-coordinates of faces in the frame:
	# detect faces in the input frame, then clone the frame so that
	# we can draw on it
	rects = detector.detectMultiScale(gray, scaleFactor=1.1,
		minNeighbors=5, minSize=(30, 30),
		flags=cv2.CASCADE_SCALE_IMAGE)
Here, we pass in our grayscale image and indicate that, for a given region to be considered a face, it must have a minimum size of 30×30 pixels. The minNeighbors attribute helps prune false positives, while the scaleFactor controls the number of image pyramid (http://pyimg.co/rtped) levels generated.
Again, a detailed review of Haar cascades for object detection is outside the scope of this tutorial.
The .detectMultiScale method returns a list of 4-tuples, each describing the rectangle that bounds a face in the frame. The first two values in each tuple are the starting (x, y)-coordinates; the second two values are the width and height of the bounding box, respectively.
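To make that format concrete, here is a tiny sketch of unpacking one of those 4-tuples into corner coordinates (the numbers are made-up example values):

# sketch: unpacking a single detection returned by detectMultiScale
# (the numbers below are made-up example values)
(fX, fY, fW, fH) = (84, 57, 120, 120)

# top-left corner of the bounding box
(startX, startY) = (fX, fY)

# bottom-right corner is the top-left corner plus the width and height
(endX, endY) = (fX + fW, fY + fH)

print((startX, startY), (endX, endY))
# (84, 57) (204, 177)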
We loop over each set of bounding boxes below:
	# loop over the face bounding boxes
	for (fX, fY, fW, fH) in rects:
		# extract the ROI of the face from the grayscale image,
		# resize it to a fixed 28x28 pixels, and then prepare the
		# ROI for classification via the CNN
		roi = gray[fY:fY + fH, fX:fX + fW]
		roi = cv2.resize(roi, (28, 28))
		roi = roi.astype("float") / 255.0
		roi = img_to_array(roi)
		roi = np.expand_dims(roi, axis=0)
For each of the bounding boxes, we use NumPy array slicing to extract the face ROI (Line 58). Once we have the ROI, we preprocess it and prepare it for classification via LeNet by resizing it, scaling it, converting it to a Keras-compatible array, and padding the image with an extra dimension (Lines 59-62).
Once the roi is preprocessed, it can be passed through LeNet for classification:
		# determine the probabilities of both "smiling" and "not
		# smiling", then set the label accordingly
		(notSmiling, smiling) = model.predict(roi)[0]
		label = "Smiling" if smiling > notSmiling else "Not Smiling"
A call to .predict on Line 66 returns the probabilities of "not smiling" and "smiling," respectively. Line 67 sets the label depending on which probability is larger.
Once we have the label, we can draw it, along with the corresponding bounding box, on the frame:
		# display the label and bounding box rectangle on the output
		# frame
		cv2.putText(frameClone, label, (fX, fY - 10),
			cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)
		cv2.rectangle(frameClone, (fX, fY), (fX + fW, fY + fH),
			(0, 0, 255), 2)
Our final code block handles displaying the output frame on our screen:
	# show our detected faces along with smiling/not smiling labels
	cv2.imshow("Face", frameClone)

	# if the 'q' key is pressed, stop the loop
	if cv2.waitKey(1) & 0xFF == ord("q"):
		break

# cleanup the camera and close any open windows
camera.release()
cv2.destroyAllWindows()
If the q key is pressed, we exit the script.
To run detect_smile.py using your webcam, execute the following command:
$ python detect_smile.py --cascade haarcascade_frontalface_default.xml \
	--model output/lenet.hdf5
If you instead want to use a video file, you would update your command to use the --video switch:
$ python detect_smile.py --cascade haarcascade_frontalface_default.xml \
	--model output/lenet.hdf5 --video path/to/your/video.mov
I have included the results of the smile detection script in Figure 4:
Notice how LeNet is correctly predicting "smiling" or "not smiling" based on my facial expression.
What's next? I recommend PyImageSearch University.
30+ total classes • 39h 44m video • Last updated: 12/2021
★★★★★ 4.84 (128 Ratings) • 3,000+ Students Enrolled
I strongly believe that if you had the right teacher you could master computer vision and deep learning.
Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?
That's not the case.
All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that's exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.
If you're serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you'll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.
Inside PyImageSearch University you'll find:
- ✓ 30+ courses on essential computer vision, deep learning, and OpenCV topics
- ✓ 30+ Certificates of Completion
- ✓ 39h 44m on-demand video
- ✓ Brand new courses released every month, ensuring you can keep up with state-of-the-art techniques
- ✓ Pre-configured Jupyter Notebooks in Google Colab
- ✓ Run all code examples in your web browser; works on Windows, macOS, and Linux (no dev environment configuration required!)
- ✓ Access to centralized code repos for all 500+ tutorials on PyImageSearch
- ✓ Easy one-click downloads for code, datasets, pre-trained models, etc.
- ✓ Access on mobile, laptop, desktop, etc.
Summary
In this tutorial, we learned how to build an end-to-end computer vision and deep learning application to perform smile detection. To do so, we first trained the LeNet architecture on the SMILES dataset. Due to class imbalances in the SMILES dataset, we discovered how to compute class weights used to help mitigate the problem.
Once trained, we evaluated LeNet on our testing set and found the network obtained a respectable 93% classification accuracy. Higher classification accuracy can be obtained by gathering more training data or applying data augmentation to existing training data.
We then created a Python script to read frames from a webcam/video file, detect faces, and then apply our pre-trained network. To detect faces, we used OpenCV's Haar cascades. Once a face was detected, it was extracted from the frame and then passed through LeNet to determine if the person was smiling or not smiling. As a whole, our smile detection system can easily run in real-time on the CPU using modern hardware.
To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), simply enter your email address in the form below!
Download the Source Code and FREE 17-page Resource Guide
Enter your email address below to get a .zip of the code and a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL!
Comment section
Hey, Adrian Rosebrock here, author and creator of PyImageSearch. While I love hearing from readers, a couple years ago I made the tough decision to no longer offer 1:1 help over blog post comments.
At the time I was receiving 200+ emails per day and another 100+ blog post comments. I simply did not have the time to moderate and respond to them all, and the sheer volume of requests was taking a toll on me.
Instead, my goal is to do the most good for the computer vision, deep learning, and OpenCV community at large by focusing my time on authoring high-quality blog posts, tutorials, and books/courses.
If you need help learning computer vision and deep learning, I suggest you refer to my full catalog of books and courses; they have helped tens of thousands of developers, students, and researchers just like yourself learn Computer Vision, Deep Learning, and OpenCV.
Click here to browse my full catalog.