In this tutorial, you will learn how to implement and train autoencoders using Keras, TensorFlow, and Deep Learning.
Today’s tutorial kicks off a three-part series on the applications of autoencoders:
- Autoencoders with Keras, TensorFlow, and Deep Learning (today’s tutorial)
- Denoising autoencoders with Keras and TensorFlow (next week’s tutorial)
- Anomaly detection with Keras, TensorFlow, and Deep Learning (tutorial two weeks from now)
A few weeks ago, I published an introductory guide to anomaly/outlier detection using standard machine learning algorithms.
My intention was to immediately follow up that post with a guide on deep learning-based anomaly detection; however, as I started writing the code for the tutorial, I realized I had never covered autoencoders on the PyImageSearch blog!
Trying to discuss deep learning-based anomaly detection without prior context on what autoencoders are and how they work would be challenging to follow, comprehend, and digest.
Therefore, we’re going to spend the next couple of weeks looking at autoencoder algorithms, including their practical, real-world applications.
To learn about the fundamentals of autoencoders using Keras and TensorFlow, just keep reading!
Autoencoders with Keras, TensorFlow, and Deep Learning
In the first part of this tutorial, we’ll discuss what autoencoders are, including how convolutional autoencoders can be applied to image data. We’ll also discuss the difference between autoencoders and other generative models, such as Generative Adversarial Networks (GANs).
From there, I’ll show you how to implement and train a convolutional autoencoder using Keras and TensorFlow.
We’ll then review the results of the training script, including visualizing how the autoencoder did at reconstructing the input data.
Finally, I’ll recommend next steps to you if you are interested in learning more about deep learning applied to image datasets.
What are autoencoders?
Autoencoders are a type of unsupervised neural network (i.e., no class labels or labeled data) that seek to:
- Accept an input set of data (i.e., the input).
- Internally compress the input data into a latent-space representation (i.e., a single vector that compresses and quantifies the input).
- Reconstruct the input data from this latent representation (i.e., the output).
Typically, we think of an autoencoder having two components/subnetworks:
- Encoder: Accepts the input data and compresses it into the latent-space representation. If we denote our input data as $x$ and the encoder as $E$, then the output latent-space representation, $s$, would be $s = E(x)$.
- Decoder: The decoder is responsible for accepting the latent-space representation and then reconstructing the original input. If we denote the decoder function as $D$ and the output of the decoder as $o$, then we can represent the decoder as $o = D(s)$.
Using our mathematical notation, the entire training process of the autoencoder can be written as:

$$o = D(E(x))$$
Figure 1 below demonstrates the basic architecture of an autoencoder:
Here you can see that:
- We input a digit to the autoencoder.
- The encoder subnetwork creates a latent representation of the digit. This latent representation is substantially smaller (in terms of dimensionality) than the input.
- The decoder subnetwork then reconstructs the original digit from the latent representation.
You can thus think of an autoencoder as a network that reconstructs its input!
To train an autoencoder, we input our data, attempt to reconstruct it, and then minimize the mean squared error (or similar loss function).
Ideally, the output of the autoencoder will be near identical to the input.
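Concretely, for an input $x$ with $n$ values, the mean squared error objective (written here in our earlier notation) is:

$$\mathcal{L}(x) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - D(E(x))_i\right)^2$$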
An autoencoder reconstructs its input — so what’s the big deal?
At this point, some of you might be thinking:
Adrian, what’s the big deal here?
If the goal of an autoencoder is just to reconstruct the input, why even use the network in the first place?
If I wanted a copy of my input data, I could literally just copy it with a single function call.
Why on earth would I apply deep learning and go through the trouble of training a network?
This question, although a legitimate one, rests on a common misconception regarding autoencoders.
Yes, during the training process, our goal is to train a network that can learn how to reconstruct our input data — but the true value of the autoencoder lives inside that latent-space representation.
Keep in mind that autoencoders compress our input data and, more to the point, when we train autoencoders, what we really care about is the encoder, $E$, and the latent-space representation, $s = E(x)$.
The decoder, $D$, is used to train the autoencoder end-to-end, but in practical applications, we often (but not always) care more about the encoder and the latent-space.
Later in this tutorial, we’ll be training an autoencoder on the MNIST dataset. The MNIST dataset consists of digits that are 28×28 pixels with a single channel, implying that each digit is represented by 28 x 28 = 784 values. The autoencoder we’ll be training here will be able to compress those digits into a vector of only 16 values — that’s a reduction of nearly 98%!
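To make the arithmetic explicit:

$$1 - \frac{16}{784} \approx 0.98$$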
So what can we do if an input data point is compressed into such a small vector?
That’s where things get really interesting.
What are applications of autoencoders?
Autoencoders are typically used for:
- Dimensionality reduction (i.e., think PCA but more powerful/intelligent).
- Denoising (e.g., removing noise and preprocessing images to improve OCR accuracy).
- Anomaly/outlier detection (e.g., detecting mislabeled data points in a dataset, or detecting when an input data point falls well outside our typical data distribution — see the sketch below).
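To make the anomaly detection idea concrete, here is a minimal sketch of reconstruction-error scoring (the autoencoder and images names are hypothetical placeholders; the full treatment arrives in the tutorial two weeks from now):

import numpy as np

def reconstruction_errors(autoencoder, images):
    # per-image mean squared error between each input and its
    # reconstruction -- larger errors suggest unusual inputs
    recon = autoencoder.predict(images)
    return np.mean((images - recon) ** 2, axis=(1, 2, 3))

# images whose error falls well above the typical range are anomaly
# candidates, e.g., anything above the 99th percentile:
# errors = reconstruction_errors(autoencoder, images)
# threshold = np.quantile(errors, 0.99)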
Outside of the computer vision field, you’ll see autoencoders applied to Natural Language Processing (NLP) and text comprehension problems, including understanding the semantic meaning of words, constructing word embeddings, and even text summarization.
How are autoencoders different from GANs?
If you’ve done any prior work with Generative Adversarial Networks (GANs), you might be wondering how autoencoders are different from GANs.
Both GANs and autoencoders are generative models; however, an autoencoder is essentially learning an identity function via compression.
The autoencoder will accept our input data, compress it down to the latent-space representation, and then attempt to reconstruct the input using just the latent-space vector.
Typically, the latent-space representation will have far fewer dimensions than the original input data.
GANs on the other hand:
- Accept a low dimensional input.
- Build a high dimensional space from it.
- Generate the final output, which is not part of the original training data but ideally passes as such.
Furthermore, GANs have an evolving loss landscape, which autoencoders do not.
As a GAN is trained, the generative model generates “fake” images that are then mixed with actual “real” images — the discriminator model must then determine which images are “real” vs. “fake/generated”.
As the generative model becomes better and better at generating fake images that can fool the discriminator, the loss landscape evolves and changes (this is one of the reasons why training GANs is so damn hard).
While both GANs and autoencoders are generative models, most of their similarities end there.
Autoencoders cannot generate new, realistic data points that could be considered “passable” by humans. Instead, autoencoders are primarily used as a method to compress input data points into a latent-space representation. That latent-space representation can then be used for compression, denoising, anomaly detection, etc.
For more details on the differences between GANs and autoencoders, I suggest giving this thread on Quora a read.
Configuring your development environment
To follow along with today’s tutorial on autoencoders, you should use TensorFlow 2.0. I have two installation tutorials for TF 2.0 and associated packages to bring your development system up to speed:
- How to install TensorFlow 2.0 on Ubuntu (Ubuntu 18.04 OS; CPU and optional NVIDIA GPU)
- How to install TensorFlow 2.0 on macOS (Catalina and Mojave OSes)
Please note: PyImageSearch does not support Windows — refer to our FAQ.
Project structure
Be sure to grab the “Downloads” associated with the blog post. From there, extract the .zip and inspect the file/folder layout:
$ tree --dirsfirst
.
├── pyimagesearch
│   ├── __init__.py
│   └── convautoencoder.py
├── output.png
├── plot.png
└── train_conv_autoencoder.py

1 directory, 5 files
We will review two Python scripts today:
- convautoencoder.py: Contains the ConvAutoencoder class and build method required to assemble our neural network with tf.keras.
- train_conv_autoencoder.py: Trains a digits autoencoder on the MNIST dataset. Once the autoencoder is trained, we’ll loop over a number of output examples and write them to disk for later inspection.
Our training script results in both a plot.png figure and an output.png image. The output image contains side-by-side samples of the original versus reconstructed image.
In the next section, we will implement our autoencoder with the high-level Keras API built into TensorFlow.
Implementing a convolutional autoencoder with Keras and TensorFlow
Before we can train an autoencoder, we first need to implement the autoencoder architecture itself.
To do so, we’ll be using Keras and TensorFlow.
My implementation loosely follows Francois Chollet’s own implementation of autoencoders on the official Keras blog. My primary contribution here is to go into a bit more detail regarding the implementation itself.
Open up the convautoencoder.py file in your project structure, and insert the following code:
# import the necessary packages
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import Conv2DTranspose
from tensorflow.keras.layers import LeakyReLU
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Reshape
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
import numpy as np

class ConvAutoencoder:
    @staticmethod
    def build(width, height, depth, filters=(32, 64), latentDim=16):
        # initialize the input shape to be "channels last" along with
        # the channels dimension itself
        inputShape = (height, width, depth)
        chanDim = -1
We begin with a selection of imports from tf.keras and one from NumPy. If you don’t have TensorFlow 2.0 installed on your system, refer to the “Configuring your development environment” section above.
Our ConvAutoencoder class contains one static method, build, which accepts five parameters:
- width: Width of the input image in pixels.
- height: Height of the input image in pixels.
- depth: Number of channels (i.e., depth) of the input volume.
- filters: A tuple that contains the set of filters for convolution operations. By default, this parameter includes both 32 and 64 filters.
- latentDim: The number of neurons in our fully-connected (Dense) latent vector. By default, if this parameter is not passed, the value is set to 16.
From there, we initialize the inputShape and channel dimension (we assume “channels last” ordering).
We’re now ready to initialize our input and begin adding layers to our network:
        # define the input to the encoder
        inputs = Input(shape=inputShape)
        x = inputs

        # loop over the number of filters
        for f in filters:
            # apply a CONV => RELU => BN operation
            x = Conv2D(f, (3, 3), strides=2, padding="same")(x)
            x = LeakyReLU(alpha=0.2)(x)
            x = BatchNormalization(axis=chanDim)(x)

        # flatten the network and then construct our latent vector
        volumeSize = K.int_shape(x)
        x = Flatten()(x)
        latent = Dense(latentDim)(x)

        # build the encoder model
        encoder = Model(inputs, latent, name="encoder")
Lines 25 and 26 define the input to the encoder.
With our inputs ready, we loop over the number of filters and add our sets of CONV=>LeakyReLU=>BN layers (Lines 29-33).
Next, we flatten the network and construct our latent vector (Lines 36-38) — this is our actual latent-space representation (i.e., the “compressed” data representation).
We then build our encoder model (Line 41).
If we were to do a print(encoder.summary()) of the encoder, assuming 28×28 single channel images (depth=1), filters=(32, 64), and latentDim=16, we would have the following:
Model: "encoder" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_1 (InputLayer) [(None, 28, 28, 1)] 0 _________________________________________________________________ conv2d (Conv2D) (None, 14, 14, 32) 320 _________________________________________________________________ leaky_re_lu (LeakyReLU) (None, 14, 14, 32) 0 _________________________________________________________________ batch_normalization (BatchNo (None, 14, 14, 32) 128 _________________________________________________________________ conv2d_1 (Conv2D) (None, 7, 7, 64) 18496 _________________________________________________________________ leaky_re_lu_1 (LeakyReLU) (None, 7, 7, 64) 0 _________________________________________________________________ batch_normalization_1 (Batch (None, 7, 7, 64) 256 _________________________________________________________________ flatten (Flatten) (None, 3136) 0 _________________________________________________________________ dense (Dense) (None, 16) 50192 ================================================================= Total params: 69,392 Trainable params: 69,200 Non-trainable params: 192 _________________________________________________________________
Here we can observe that:
- Our encoder begins by accepting a 28x28x1 input volume.
- We then apply two rounds of CONV=>RELU=>BN, each with a 3×3 strided convolution. The strided convolution allows us to reduce the spatial dimensions of our volumes.
- After applying our final batch normalization, we end up with a 7x7x64 volume, which is flattened into a 3136-dim vector.
- Our fully-connected layer (i.e., the Dense layer) serves as our latent-space representation.
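As a quick sanity check on those parameter counts (plain arithmetic, not part of this post’s downloads): each Conv2D layer learns filters × (kernel height × kernel width × input channels) weights plus one bias per filter:

# 32 filters, 3x3 kernel, 1 input channel, plus 32 biases
print(32 * (3 * 3 * 1) + 32)    # 320 (conv2d)
# 64 filters, 3x3 kernel, 32 input channels, plus 64 biases
print(64 * (3 * 3 * 32) + 64)   # 18496 (conv2d_1)
# fully-connected: 3136 inputs (7 x 7 x 64) to 16 outputs, plus 16 biases
print(3136 * 16 + 16)           # 50192 (dense)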
Next, let’s learn how the decoder model can take this latent-space representation and reconstruct the original input image:
        # start building the decoder model which will accept the
        # output of the encoder as its inputs
        latentInputs = Input(shape=(latentDim,))
        x = Dense(np.prod(volumeSize[1:]))(latentInputs)
        x = Reshape((volumeSize[1], volumeSize[2], volumeSize[3]))(x)

        # loop over our number of filters again, but this time in
        # reverse order
        for f in filters[::-1]:
            # apply a CONV_TRANSPOSE => RELU => BN operation
            x = Conv2DTranspose(f, (3, 3), strides=2,
                padding="same")(x)
            x = LeakyReLU(alpha=0.2)(x)
            x = BatchNormalization(axis=chanDim)(x)
To start building the decoder model, we:
- Construct the input to the decoder model based on the latentDim (Lines 45 and 46).
- Accept the 1D latentDim vector and turn it into a 3D volume so that we can start applying convolution (Line 47).
- Loop over the number of filters, this time in reverse order, while applying a CONV_TRANSPOSE => RELU => BN operation (Lines 51-56).
Transposed convolution is used to increase the spatial dimensions (i.e., width and height) of the volume.
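If you want to verify that behavior yourself, here is a standalone shape check (my own snippet, assuming TensorFlow 2.x is installed):

import tensorflow as tf
from tensorflow.keras.layers import Conv2DTranspose

# a dummy batch containing a single 7x7x64 volume
x = tf.zeros((1, 7, 7, 64))
y = Conv2DTranspose(64, (3, 3), strides=2, padding="same")(x)
print(y.shape)  # (1, 14, 14, 64) -- spatial dimensions doubled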
Let’s finish creating our autoencoder:
        # apply a single CONV_TRANSPOSE layer used to recover the
        # original depth of the image
        x = Conv2DTranspose(depth, (3, 3), padding="same")(x)
        outputs = Activation("sigmoid")(x)

        # build the decoder model
        decoder = Model(latentInputs, outputs, name="decoder")

        # our autoencoder is the encoder + decoder
        autoencoder = Model(inputs, decoder(encoder(inputs)),
            name="autoencoder")

        # return a 3-tuple of the encoder, decoder, and autoencoder
        return (encoder, decoder, autoencoder)
Wrapping up, we:
- Apply a final CONV_TRANSPOSE layer used to recover the original channel depth of the image (1 channel for single channel/grayscale images or 3 channels for RGB images) on Line 60.
- Apply a sigmoid activation function (Line 61).
- Build the decoder model, and compose it with the encoder to form the autoencoder (Lines 64-68). The autoencoder becomes the encoder + decoder.
- Return a 3-tuple of the encoder, decoder, and autoencoder.
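If you would like to reproduce the model summaries shown below, a short driver script such as this works (assuming the project structure from earlier so the import resolves):

from pyimagesearch.convautoencoder import ConvAutoencoder

# build the three models for 28x28 single-channel (grayscale) inputs
(encoder, decoder, autoencoder) = ConvAutoencoder.build(28, 28, 1)
encoder.summary()
decoder.summary()
autoencoder.summary()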
If we were to complete a print(decoder.summary()) operation here, we would have the following:
Model: "decoder" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_2 (InputLayer) [(None, 16)] 0 _________________________________________________________________ dense_1 (Dense) (None, 3136) 53312 _________________________________________________________________ reshape (Reshape) (None, 7, 7, 64) 0 _________________________________________________________________ conv2d_transpose (Conv2DTran (None, 14, 14, 64) 36928 _________________________________________________________________ leaky_re_lu_2 (LeakyReLU) (None, 14, 14, 64) 0 _________________________________________________________________ batch_normalization_2 (Batch (None, 14, 14, 64) 256 _________________________________________________________________ conv2d_transpose_1 (Conv2DTr (None, 28, 28, 32) 18464 _________________________________________________________________ leaky_re_lu_3 (LeakyReLU) (None, 28, 28, 32) 0 _________________________________________________________________ batch_normalization_3 (Batch (None, 28, 28, 32) 128 _________________________________________________________________ conv2d_transpose_2 (Conv2DTr (None, 28, 28, 1) 289 _________________________________________________________________ activation (Activation) (None, 28, 28, 1) 0 ================================================================= Total params: 109,377 Trainable params: 109,185 Non-trainable params: 192 _________________________________________________________________
The decoder accepts our 16-dim latent representation from the encoder and then builds a new fully-connected layer of 3136-dim, which is the product of 7 x 7 x 64 = 3136.
Using our new 3136-dim FC layer, we reshape it into a 3D volume of 7 x 7 x 64. From there we can start applying our CONV_TRANSPOSE=>RELU=>BN operation. Unlike standard strided convolution, which is used to decrease volume size, our transposed convolution is used to increase volume size.
Finally, a transposed convolution layer is applied to recover the original channel depth of the image. Since our images are grayscale, we learn a single filter, the output of which is a 28 x 28 x 1 volume (i.e., the dimensions of the original MNIST digit images).
A print(autoencoder.summary()) operation shows the composed nature of the encoder and decoder:
Model: "autoencoder" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_1 (InputLayer) [(None, 28, 28, 1)] 0 _________________________________________________________________ encoder (Model) (None, 16) 69392 _________________________________________________________________ decoder (Model) (None, 28, 28, 1) 109377 ================================================================= Total params: 178,769 Trainable params: 178,385 Non-trainable params: 384 _________________________________________________________________
The input to our encoder is the original 28 x 28 x 1 images from the MNIST dataset. Our encoder then learns a 16-dim latent-space representation of the data, after which the decoder reconstructs the original 28 x 28 x 1 images.
In the next section, we will develop our script to train our autoencoder.
Creating the convolutional autoencoder training script
With our autoencoder architecture implemented, let’s move on to the training script.
Open up the train_conv_autoencoder.py script in your project directory structure, and insert the following code:
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from pyimagesearch.convautoencoder import ConvAutoencoder
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt
import numpy as np
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-s", "--samples", type=int, default=8,
    help="# of samples to visualize when decoding")
ap.add_argument("-o", "--output", type=str, default="output.png",
    help="path to output visualization file")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
    help="path to output plot file")
args = vars(ap.parse_args())
On Lines 2-12, we handle our imports. We’ll use the "Agg" backend of matplotlib so that we can export our training plot to disk.
We need our custom ConvAutoencoder architecture class, which we implemented in the previous section.
We will use the Adam optimizer as we train on the MNIST benchmarking dataset. For visualization, we’ll employ OpenCV.
Next, we’ll parse three command line arguments, all of which are optional:
- --samples: The number of output samples for visualization. By default this value is set to 8.
- --output: The path to the output visualization image. We’ll name our visualization output.png by default.
- --plot: The path to our matplotlib output plot. A default of plot.png is assigned if this argument is not provided in the terminal.
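For example, to visualize ten sample pairs and write the results to custom paths, you could run something like the following (the file names here are purely illustrative):

$ python train_conv_autoencoder.py --samples 10 --output recon_vis.png --plot training_plot.png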
Now we’ll set a couple of hyperparameters and preprocess our MNIST dataset:
# initialize the number of epochs to train for and batch size
EPOCHS = 25
BS = 32

# load the MNIST dataset
print("[INFO] loading MNIST dataset...")
((trainX, _), (testX, _)) = mnist.load_data()

# add a channel dimension to every image in the dataset, then scale
# the pixel intensities to the range [0, 1]
trainX = np.expand_dims(trainX, axis=-1)
testX = np.expand_dims(testX, axis=-1)
trainX = trainX.astype("float32") / 255.0
testX = testX.astype("float32") / 255.0
Lines 25 and 26 initialize the number of training epochs and the batch size.
From there, we’ll work with our MNIST dataset. TensorFlow/Keras has a handy load_data method that we can call on mnist to grab the data (Line 30). Then, Lines 34-37 (1) add a channel dimension to every image in the dataset and (2) scale the pixel intensities to the range [0, 1].
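If the channel-dimension step is new to you, here is a standalone illustration of what np.expand_dims does to the array shape (a toy batch of ten blank images stands in for MNIST):

import numpy as np

# MNIST images load as (num_images, 28, 28); Keras' Conv2D layers
# expect an explicit channel axis, i.e., (num_images, 28, 28, 1)
images = np.zeros((10, 28, 28))
images = np.expand_dims(images, axis=-1)
print(images.shape)  # (10, 28, 28, 1)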
We’re now ready to build and train our autoencoder:
# construct our convolutional autoencoder
print("[INFO] building autoencoder...")
(encoder, decoder, autoencoder) = ConvAutoencoder.build(28, 28, 1)
opt = Adam(lr=1e-3)
autoencoder.compile(loss="mse", optimizer=opt)

# train the convolutional autoencoder
H = autoencoder.fit(
    trainX, trainX,
    validation_data=(testX, testX),
    epochs=EPOCHS,
    batch_size=BS)
To build the convolutional autoencoder, we call the build method on our ConvAutoencoder class and pass the necessary arguments (Line 41). Recall that this results in the (encoder, decoder, autoencoder) tuple — going forward in this script, we only need the autoencoder for training and predictions.
We initialize our Adam optimizer with an initial learning rate of 1e-3 and compile the autoencoder with mean-squared error loss (Lines 42 and 43).
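As a side note, because the decoder ends in a sigmoid activation (outputs in [0, 1]), binary cross-entropy is another reconstruction loss commonly used for MNIST autoencoders. We stick with mean squared error here, but swapping the loss would be a one-line change:

# an alternative compile call (not used in this post)
autoencoder.compile(loss="binary_crossentropy", optimizer=opt)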
From there, we fit (train) our autoencoder on the MNIST data (Lines 46-50). Note that we pass trainX as both the inputs and the targets — the network is learning to reconstruct its own input.
Let’s go ahead and plot our training history:
# construct a plot that plots and saves the training history
N = np.arange(0, EPOCHS)
plt.style.use("ggplot")
plt.figure()
plt.plot(N, H.history["loss"], label="train_loss")
plt.plot(N, H.history["val_loss"], label="val_loss")
plt.title("Training Loss")
plt.xlabel("Epoch #")
plt.ylabel("Loss")
plt.legend(loc="lower left")
plt.savefig(args["plot"])
And from there, we’ll make predictions on our testing set:
# use the convolutional autoencoder to make predictions on the
# testing images, then initialize our list of output images
print("[INFO] making predictions...")
decoded = autoencoder.predict(testX)
outputs = None

# loop over our number of output samples
for i in range(0, args["samples"]):
    # grab the original image and reconstructed image
    original = (testX[i] * 255).astype("uint8")
    recon = (decoded[i] * 255).astype("uint8")

    # stack the original and reconstructed image side-by-side
    output = np.hstack([original, recon])

    # if the outputs array is empty, initialize it as the current
    # side-by-side image display
    if outputs is None:
        outputs = output

    # otherwise, vertically stack the outputs
    else:
        outputs = np.vstack([outputs, output])

# save the outputs image to disk
cv2.imwrite(args["output"], outputs)
Line 67 makes predictions on the test set. We then loop over the number of --samples passed as a command line argument (Line 71) so that we can build our visualization. Inside the loop, we:
- Grab both the original and reconstructed images (Lines 73 and 74).
- Stack the pair of images side-by-side (Line 77).
- Stack the pairs vertically (Lines 81-86).
- Finally, we output the visualization image to disk (Line 89).
In the next section, we’ll see the results of our hard work.
Training the convolutional autoencoder with Keras and TensorFlow
We are now ready to see our autoencoder in action!
Make sure you use the “Downloads” section of this post to download the source code — from there you can execute the following command:
$ python train_conv_autoencoder.py
[INFO] loading MNIST dataset...
[INFO] building autoencoder...
Train on 60000 samples, validate on 10000 samples
Epoch 1/25
60000/60000 [==============================] - 68s 1ms/sample - loss: 0.0188 - val_loss: 0.0108
Epoch 2/25
60000/60000 [==============================] - 68s 1ms/sample - loss: 0.0104 - val_loss: 0.0096
Epoch 3/25
60000/60000 [==============================] - 68s 1ms/sample - loss: 0.0094 - val_loss: 0.0086
Epoch 4/25
60000/60000 [==============================] - 68s 1ms/sample - loss: 0.0088 - val_loss: 0.0086
Epoch 5/25
60000/60000 [==============================] - 68s 1ms/sample - loss: 0.0084 - val_loss: 0.0080
...
Epoch 20/25
60000/60000 [==============================] - 83s 1ms/sample - loss: 0.0067 - val_loss: 0.0069
Epoch 21/25
60000/60000 [==============================] - 83s 1ms/sample - loss: 0.0066 - val_loss: 0.0069
Epoch 22/25
60000/60000 [==============================] - 83s 1ms/sample - loss: 0.0066 - val_loss: 0.0068
Epoch 23/25
60000/60000 [==============================] - 83s 1ms/sample - loss: 0.0066 - val_loss: 0.0068
Epoch 24/25
60000/60000 [==============================] - 83s 1ms/sample - loss: 0.0065 - val_loss: 0.0067
Epoch 25/25
60000/60000 [==============================] - 83s 1ms/sample - loss: 0.0065 - val_loss: 0.0068
[INFO] making predictions...
As Figure 4 and the terminal output demonstrate, our training process was able to minimize the reconstruction loss of the autoencoder.
But how well did the autoencoder do at reconstructing the testing data?
The answer is very good:
In Figure 5, on the left is our original image while the right is the reconstructed digit predicted by the autoencoder. As you can see, the digits are nearly indistinguishable from each other!
At this point, you may be thinking:
Great … so I can train a network to reconstruct my original image.
But you said that what really matters is the internal latent-space representation.
How can I access that representation, and how can I use it for denoising and anomaly/outlier detection?
Those are great questions — I’ll be addressing both in my next two tutorials here on PyImageSearch, so stay tuned!
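That said, one piece of the answer is already within reach. Because build returns the encoder as a standalone model, you can map images to their latent vectors right now (a minimal sketch reusing the encoder and testX variables from our training script):

# map each test image to its 16-dim latent-space vector
latentVecs = encoder.predict(testX)
print(latentVecs.shape)  # (10000, 16)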
Summary
In this tutorial, you learned the fundamentals of autoencoders.
Autoencoders are generative models that consist of an encoder and a decoder model. When trained, the encoder takes an input data point and learns a latent-space representation of the data. This latent-space representation is a compressed representation of the data, allowing the model to represent it in far fewer values than the original data.
The decoder model then takes the latent-space representation and attempts to reconstruct the original data point from it. When trained end-to-end, the encoder and decoder function in a composed manner.
In practice, we use autoencoders for dimensionality reduction, compression, denoising, and anomaly detection.
After we understood the fundamentals, we implemented a convolutional autoencoder using Keras and TensorFlow.
In next week’s tutorial, we’ll learn how to use a convolutional autoencoder for denoising.
To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), just enter your email address in the form below!