Convolutional Autoencoders

Autoencoders are a type of neural network in deep learning that comes under the category of unsupervised learning: they are trained to reconstruct their own input, so no labels are required. In the process, they learn a compressed representation of the raw data.

Autoencoders consist of two blocks: an encoder and a decoder. The encoder converts the raw image into a compact encoded representation, and the decoder reconstructs an output image from that representation. The encoding stage forces the model to learn the hidden features of the image, and the weights are adjusted so that the decoded image is almost identical to the input. Because only the dominant features survive the compression, the same mechanism can also be used to remove noise from images.
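
To make the two blocks concrete, here is a minimal dense (non-convolutional) sketch; the layer sizes are illustrative and are not the architecture built later in this post:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

# A toy autoencoder for flattened 28x28 images (784 pixels).
sketch = Sequential([
    Input(shape=(784,)),
    Dense(64, activation='relu'),     # encoder: 784 -> 64 (compressed code)
    Dense(784, activation='sigmoid')  # decoder: 64 -> 784 (reconstruction)
])
sketch.compile(optimizer='adam', loss='binary_crossentropy')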

Applications of Autoencoders

  • Noise Cancellation:- Performing object detection or image classification on noisy images can lead to false predictions and poor accuracy. An autoencoder can be trained to remove the noise and recover clean images (a denoising sketch follows this list).
  • Image Compression:- Handling high-resolution images takes more memory and increases processing time. Autoencoders can reduce the image size while still extracting its essential features.
  • Fraud Detection:- A model trained only on legitimate credit card transactions learns to reconstruct them well, so transactions it reconstructs poorly can be flagged as potentially fraudulent.
  • Dimensionality reduction:- Reduce image dimensions, similar to Principal Component Analysis (PCA).
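
As an illustration of the noise-cancellation use case, a denoising autoencoder is trained on pairs of noisy inputs and clean targets. A minimal sketch, assuming images already normalized to the 0–1 range in X_train and an already-built model named autoencoder (both of which are created later in this post):

import numpy as np

# Corrupt the clean images with Gaussian noise, then clip back to the valid 0-1 range.
noise_factor = 0.3
X_train_noisy = X_train + noise_factor * np.random.normal(size=X_train.shape)
X_train_noisy = np.clip(X_train_noisy, 0.0, 1.0)

# Train the autoencoder to map noisy inputs back to their clean originals.
autoencoder.fit(X_train_noisy, X_train, epochs=10)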

Import Libraries

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Input, Conv2D, MaxPooling2D, UpSampling2D, Reshape, Flatten

Loading the Fashion MNIST Dataset

We are using the Fashion MNIST dataset, which has 10 different fashion categories with 60,000 training images and 10,000 test images, each of dimension 28×28. To load the dataset directly into the network we can use the following commands:

from tensorflow.keras.datasets import fashion_mnist
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

X_train stores the training images and y_train stores their class labels. Similarly, X_test stores the test images and y_test stores their class labels.
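
Each entry of y_train is an integer class index between 0 and 9; the class-name list defined later in this post maps each index to its name. A quick check (added here for illustration):

# Labels are integer class indices in the range 0-9, stored as uint8.
print(y_train[:5], y_train.dtype)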

Train Dataset:-

X_train.shape, y_train.shape
((60000, 28, 28), (60000,))

Test Dataset:-

X_test.shape, y_test.shape
((10000, 28, 28), (10000,))

Visualizing the Images

Since the dataset consists of 60,000 training and 10,000 test images, it is impractical to go through and visualize every single image. Instead, we can randomly pick 100 images from the dataset and visualize them using matplotlib.

To display 100 random images from the dataset we use plt.subplots(), passing the grid dimensions as the number of rows and columns. Since the plot is made up of 100 axes, we assign one image to each axis.

plt.subplots() returns the axes as a 2-D array, so we flatten it into a 1-D array with ravel(), which lets us index the axes with a single loop counter.

classes = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
width = 10
height = 10

fig, axes = plt.subplots(height, width, figsize=(15, 15))
axes = axes.ravel()  # flatten the 10x10 grid of axes into a 1-D array
for i in np.arange(0, width * height):
    index = np.random.randint(0, len(X_train))  # pick a random training image
    axes[i].imshow(X_train[index], cmap='gray')
    axes[i].set_title(classes[y_train[index]], fontsize=8)
    axes[i].axis('off')

plt.subplots_adjust(hspace=0.4)

[Figure: a 10×10 grid of random training images with their class names]

Pre-processing the Images

Every image is made up of pixels whose values range between 0 and 255. Working directly with values on this scale makes training harder, so instead we normalize all pixel values to the range 0–1 by dividing each pixel's value by 255.

X_train = X_train / 255
X_test = X_test / 255

In linear (fully connected) autoencoders, each image is flattened into a 1-D vector of pixels before being fed to the network. Convolutional autoencoders, in contrast, operate directly on the 2-D matrix of pixels, so each image must keep its spatial shape and carry an explicit channel dimension. Using grayscale images keeps that channel dimension at 1, which reduces the computational load compared to three RGB channels.

Since the dataset is already grayscale, the only preparation needed is to reshape the arrays to add the single channel dimension.

X_train = X_train.reshape((len(X_train), 28, 28, 1))
X_test = X_test.reshape((len(X_test), 28, 28, 1))
X_train.shape, X_test.shape
((60000, 28, 28, 1), (10000, 28, 28, 1))

Building and Training the Convolutional Autoencoder

To understand how a convolutional layer is represented you can go through our previous tutorial on Convolutional Neural Networks for Image Classification.

TensorFlow supports two types of padding:

  • VALID applies no padding: the filter is only placed where it fits entirely inside the input, so border elements are discarded and the output is smaller than the input.
  • SAME pads the input with zeros so that, for stride 1, the output has the same spatial size as the input; border elements are not discarded (a quick shape check follows this list).
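
To see the difference concretely, here is a small shape check (an illustration added here, not part of the original tutorial code):

# A dummy batch containing one 28x28 grayscale image.
x = tf.random.normal((1, 28, 28, 1))
valid_out = Conv2D(filters=8, kernel_size=(3, 3), padding='valid')(x)
same_out = Conv2D(filters=8, kernel_size=(3, 3), padding='same')(x)
print(valid_out.shape)  # (1, 26, 26, 8) -- a 3x3 filter loses one pixel on each border
print(same_out.shape)   # (1, 28, 28, 8) -- zero-padding preserves the 28x28 size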

Building an autoencoder involves converting the image into an encoded format and then reconstructing the image in the decode phase. In the convolutional version, the encoder uses Conv2D() and MaxPooling2D() layers to shrink the image down to a compact code, while the decoder uses Reshape(), Conv2D(), and UpSampling2D() layers to grow that code back to the original image size.

autoencoder = Sequential()

# Encoder: 28x28x1 image -> 128-value code
autoencoder.add(Conv2D(filters=16, kernel_size=(3,3), activation='relu', padding='valid', input_shape=(28,28,1)))  # -> 26x26x16
autoencoder.add(MaxPooling2D(pool_size=(2,2)))  # -> 13x13x16

autoencoder.add(Conv2D(filters=8, kernel_size=(3,3), activation='relu', padding='same'))  # -> 13x13x8
autoencoder.add(MaxPooling2D(pool_size=(2,2), padding='same'))  # -> 7x7x8

autoencoder.add(Conv2D(filters=8, kernel_size=(3,3), activation='relu', padding='same', strides=(2,2)))  # -> 4x4x8
autoencoder.add(Flatten())  # -> 128 (the encoded representation)

# Decoder: 128-value code -> 28x28x1 image

autoencoder.add(Reshape((4,4,8)))  # -> 4x4x8

autoencoder.add(Conv2D(filters=8, kernel_size=(3,3), activation='relu', padding='same'))  # -> 4x4x8
autoencoder.add(UpSampling2D(size=(2,2)))  # -> 8x8x8

autoencoder.add(Conv2D(filters=8, kernel_size=(3,3), activation='relu', padding='same'))  # -> 8x8x8
autoencoder.add(UpSampling2D(size=(2,2)))  # -> 16x16x8

autoencoder.add(Conv2D(filters=16, kernel_size=(3,3), activation='relu'))  # default 'valid' padding -> 14x14x16
autoencoder.add(UpSampling2D(size=(2,2)))  # -> 28x28x16

autoencoder.add(Conv2D(filters=1, kernel_size=(3,3), activation='sigmoid', padding='same'))  # -> 28x28x1
autoencoder.summary()
Model: "sequential_18"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_55 (Conv2D)           (None, 26, 26, 16)        160       
_________________________________________________________________
max_pooling2d_32 (MaxPooling (None, 13, 13, 16)        0         
_________________________________________________________________
conv2d_56 (Conv2D)           (None, 13, 13, 8)         1160      
_________________________________________________________________
max_pooling2d_33 (MaxPooling (None, 7, 7, 8)           0         
_________________________________________________________________
conv2d_57 (Conv2D)           (None, 4, 4, 8)           584       
_________________________________________________________________
flatten_10 (Flatten)         (None, 128)               0         
_________________________________________________________________
reshape_7 (Reshape)          (None, 4, 4, 8)           0         
_________________________________________________________________
conv2d_58 (Conv2D)           (None, 4, 4, 8)           584       
_________________________________________________________________
up_sampling2d_11 (UpSampling (None, 8, 8, 8)           0         
_________________________________________________________________
conv2d_59 (Conv2D)           (None, 8, 8, 8)           584       
_________________________________________________________________
up_sampling2d_12 (UpSampling (None, 16, 16, 8)         0         
_________________________________________________________________
conv2d_60 (Conv2D)           (None, 14, 14, 16)        1168      
_________________________________________________________________
up_sampling2d_13 (UpSampling (None, 28, 28, 16)        0         
_________________________________________________________________
conv2d_61 (Conv2D)           (None, 28, 28, 1)         145       
=================================================================
Total params: 4,385
Trainable params: 4,385
Non-trainable params: 0
_________________________________________________________________

Once the network is built, the next step is compiling and training the model. We use the Adam optimizer instead of plain stochastic gradient descent because Adam adapts the learning rate of each weight during training, which usually speeds up convergence. Binary cross-entropy is a suitable loss here because the pixel values are normalized to the 0–1 range and the output layer uses a sigmoid activation. Note that X_train is passed as both the input and the target, since the model learns to reconstruct its own input.

autoencoder.compile(optimizer='Adam', loss='binary_crossentropy', metrics = ['accuracy'])
autoencoder.fit(X_train, X_train, epochs = 50)

The training accuracy will not reach 99% or 100%, and that is expected: accuracy is a classification metric and is not very meaningful for a reconstruction task. The loss, which measures how closely the reconstructed images match the inputs, is the value to watch.

Epoch 48/50
1875/1875 [==============================] - 6s 3ms/step - loss: 0.2684 - accuracy: 0.5097
Epoch 49/50
1875/1875 [==============================] - 6s 3ms/step - loss: 0.2686 - accuracy: 0.5087
Epoch 50/50
1875/1875 [==============================] - 6s 3ms/step - loss: 0.2680 - accuracy: 0.5096
<tensorflow.python.keras.callbacks.History at 0x7f78ca5dacc0>
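
To check that the reconstruction quality holds on images the model has never seen, we can evaluate on the test set (a quick check added here; X_test serves as both input and target):

# Reconstruction loss on held-out images; lower is better.
test_loss, test_acc = autoencoder.evaluate(X_test, X_test, verbose=0)
print('test loss:', test_loss)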

Get Encoded images

Since the middle layer stores the encoded images, we are going to extract the output of that layer. Keras makes it possible to get the output of each and every layer of a model using the .get_layer() function, which takes the layer name. Here the name is the auto-generated 'flatten_10' shown in the summary above; in your own run the Flatten layer may carry a different suffix, so check autoencoder.summary() for the exact name.

encoder = Model(inputs = autoencoder.input, outputs = autoencoder.get_layer('flatten_10').output)
encoder.summary()
Model: "model_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_55_input (InputLayer) [(None, 28, 28, 1)]       0         
_________________________________________________________________
conv2d_55 (Conv2D)           (None, 26, 26, 16)        160       
_________________________________________________________________
max_pooling2d_32 (MaxPooling (None, 13, 13, 16)        0         
_________________________________________________________________
conv2d_56 (Conv2D)           (None, 13, 13, 8)         1160      
_________________________________________________________________
max_pooling2d_33 (MaxPooling (None, 7, 7, 8)           0         
_________________________________________________________________
conv2d_57 (Conv2D)           (None, 4, 4, 8)           584       
_________________________________________________________________
flatten_10 (Flatten)         (None, 128)               0         
=================================================================
Total params: 1,904
Trainable params: 1,904
Non-trainable params: 0
_________________________________________________________________
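
Auto-generated names like 'flatten_10' change from run to run, so a more robust variation (our own sketch, not part of the original code) is to locate the bottleneck by its layer type rather than by its name:

# Alternative lookup: find the Flatten layer by type instead of by name.
flatten_layer = next(layer for layer in autoencoder.layers if isinstance(layer, Flatten))
encoder_by_type = Model(inputs=autoencoder.input, outputs=flatten_layer.output)
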
#storing encoded images from middle layer
coded_test_images = encoder.predict(X_test)

#storing decoded images from output layer
decoded_test_images = autoencoder.predict(X_test)

To better understand how the autoencoder works, we visualize the original test images, their encoded representations, and the reconstructed outputs side by side.

n_images = 10
test_images = np.random.randint(0, X_test.shape[0], size = n_images)
plt.figure(figsize=(18,18))
for i, image_index in enumerate(test_images):
  # Original images
  ax = plt.subplot(10,10, i + 1)
  plt.imshow(X_test[image_index].reshape(28,28), cmap='gray')
  plt.xticks(())
  plt.yticks(())

  # Coded images (the 128-value code displayed as a 16x8 grid)
  ax = plt.subplot(10,10, i + 1 + n_images)
  plt.imshow(coded_test_images[image_index].reshape(16,8), cmap='gray')
  plt.xticks(())
  plt.yticks(())

  # Decoded images
  ax = plt.subplot(10,10, i + 1 + n_images * 2)
  plt.imshow(decoded_test_images[image_index].reshape(28,28), cmap='gray')
  plt.xticks(())
  plt.yticks(())

The first row shows the original test images, the second row shows their encoded representations (each 128-value code displayed as a 16×8 grid), and the third row shows the images reconstructed by the decoder from those codes.

[Figure: original test images (top row), their 128-value codes shown as 16×8 grids (middle row), and the reconstructed images (bottom row)]