Neural Networks for Human Expression Classification in OpenCV

Facial expression recognition is a software technology that enables a computer to read biometric data from a face and detect the emotions it expresses. The process has two steps: detecting the exact location of each face in the image, and then classifying the emotion of each detected face.

This matters because a system that can detect human emotions can adapt its responses and behavior accordingly, making interactions more natural. Emotion detection has many applications in computer vision:

  • Automobile industry: analyzing the driver's face to determine whether they are tired or sleepy, and notifying them to take a break.
  • Hospitals: helping doctors assess how much pain a patient is feeling.
  • Online interviews: employee morale can be gauged by recording interactions on the job. As an HR tool, the technology can help not only in devising recruiting strategies but also in designing HR policies.
  • Testing video games: video games are designed to target specific audiences. During testing, the players' facial expressions are recorded and analyzed while they play, which lets designers see at which points different emotions are experienced.

The network in this tutorial classifies faces into seven categories: the six basic (universal) emotions widely recognized in the scientific literature, plus a neutral expression:

  • Angry
  • Disgust
  • Fear
  • Happy
  • Neutral
  • Sad
  • Surprise

Importing the Libraries

import cv2
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab.patches import cv2_imshow
import zipfile
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Flatten, BatchNormalization

Loading the Dataset

You can download the dataset fer_images.zip from the link provided. After downloading it, upload it to your Google Drive so that you can access it from your Colab notebook.

Next, mount your Google Drive in the Colab notebook you are working on. This takes a single command, drive.mount('/content/drive'), once drive has been imported with from google.colab import drive.

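Once the drive is connected, copy the path of the zip file stored in your drive, open it with zipfile.ZipFile and extract it with zip_object.extractall('./'). In Colab the working directory is /content, so the images end up under /content/fer2013, the folder the generators below read from.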
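The extraction step can be wrapped in a small helper. The sketch below uses only Python's zipfile module; the function name extract_dataset is hypothetical, and the Drive mount it depends on in Colab appears only in the docstring because it cannot run outside Colab.

```python
import zipfile


def extract_dataset(zip_path, destination='./'):
    """Extract a zip archive into `destination` and return its top-level folder names.

    In Colab, Google Drive has to be mounted first (not executed here):
        from google.colab import drive
        drive.mount('/content/drive')
    """
    with zipfile.ZipFile(zip_path, mode='r') as zip_object:
        zip_object.extractall(destination)
        # first path component of every entry, e.g. 'fer2013' for 'fer2013/train/Happy/1.png'
        top_level = {name.split('/')[0] for name in zip_object.namelist()}
    return sorted(top_level)


# hypothetical usage in Colab (use the path you copied from your drive):
# extract_dataset('/content/drive/MyDrive/<your folder>/fer_images.zip', './')
```

Called from the /content working directory, it would place the fer2013 folder where the generators expect it, assuming the archive's top-level folder is fer2013.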

Generate Train and Test Dataset

We create the train and test sets by loading the images from their respective directories, using a dedicated generator for each. The first step is to create an image generator; rescale=1./255 scales the pixel values from the range 0 to 255 down to 0 to 1.

# ImageDataGenerator was imported above, so there is no need to import it again
#from tensorflow.keras.preprocessing.image import ImageDataGenerator
training_generator = ImageDataGenerator(rescale=1./255)  # scale pixel values from [0, 255] to [0, 1]
train_dataset = training_generator.flow_from_directory('/content/fer2013/train',
                                                        target_size = (48, 48),
                                                        batch_size = 16,
                                                        class_mode = 'categorical',
                                                        shuffle = True)

We can get the classes in the training dataset, and the index assigned to each, using train_dataset.class_indices:

{'Angry': 0,
 'Disgust': 1,
 'Fear': 2,
 'Happy': 3,
 'Neutral': 4,
 'Sad': 5,
 'Surprise': 6}
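Behind the scenes, each batch the generator yields is a pair of pixel arrays and labels. As a rough NumPy illustration (not Keras's actual implementation), rescale=1./255 multiplies every pixel by 1/255, and class_mode = 'categorical' encodes each label as a one-hot vector with a 1 at the index given by class_indices. The two-image batch here is random, made-up data, and the helper one_hot is hypothetical.

```python
import numpy as np

# mapping printed by train_dataset.class_indices
class_indices = {'Angry': 0, 'Disgust': 1, 'Fear': 2, 'Happy': 3,
                 'Neutral': 4, 'Sad': 5, 'Surprise': 6}


def one_hot(label_names, class_indices):
    """One row per label, with a 1 at the label's class index (class_mode='categorical')."""
    labels = np.zeros((len(label_names), len(class_indices)), dtype=np.float32)
    for row, name in enumerate(label_names):
        labels[row, class_indices[name]] = 1.0
    return labels


rng = np.random.default_rng(0)
# made-up batch of two 48x48 RGB images with raw pixel values from 0 to 255
raw_batch = rng.integers(0, 256, size=(2, 48, 48, 3)).astype(np.uint8)
scaled_batch = raw_batch * (1. / 255)   # the effect of rescale=1./255
labels = one_hot(['Happy', 'Surprise'], class_indices)

print(scaled_batch.shape, scaled_batch.dtype)
print(labels)
```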

To visualize the number of images in each class of train_dataset, we can use sns.countplot():

sns.countplot(x = train_dataset.classes)
[Figure: bar chart of the number of training images per class]
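sns.countplot draws one bar per class index. To get the exact numbers behind the bars, np.bincount counts how often each index occurs; in this sketch the classes array is a hypothetical stand-in for train_dataset.classes. The counts are worth checking because the classes are far from balanced (the test set evaluated later has 879 Happy images but only 55 Disgust images).

```python
import numpy as np

class_names = ['Angry', 'Disgust', 'Fear', 'Happy', 'Neutral', 'Sad', 'Surprise']
# hypothetical stand-in for train_dataset.classes: one integer label per image
classes = np.array([3, 3, 0, 6, 3, 1, 4, 4, 5, 2, 3])

# minlength guarantees one count per class, even for a class with no images
counts = np.bincount(classes, minlength=len(class_names))
for name, count in zip(class_names, counts):
    print(f'{name:<9} {count}')
```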
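The test set is loaded the same way, but with a batch size of 1 and shuffle = False, so that the order of the predictions matches the order of the labels in test_dataset.classes when we evaluate the network.

test_generator = ImageDataGenerator(rescale=1./255)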
test_dataset = test_generator.flow_from_directory('/content/fer2013/validation',
                                                  target_size = (48, 48),
                                                  batch_size = 1,
                                                  class_mode = 'categorical',
                                                  shuffle = False)

Now that the train and test sets are loaded, we can proceed to build the neural network.

Building and Training the Convolutional Neural Network

Before building the network, you should understand how a neural network works, which layers it involves, and a few other terms such as feature detector, feature map and max pooling. For a quick overview of these terms, see our previous tutorials on Basics of Neural Networks and CNN for Image Classification.

As a quick recap, a convolutional network processes an image as follows:

  • image + multiple feature detectors (with 'valid' or 'same' padding) → feature maps
  • feature maps → max pooling → pooled feature maps
  • pooled feature maps → flattening
  • flattened vector → input layer of the dense network
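To make the recap concrete, here is a minimal NumPy sketch of a single feature detector applied to a made-up 6x6 image: a 'valid' convolution (the 6x6 input shrinks to 4x4), ReLU, 2x2 max pooling and flattening. It illustrates the idea only; Keras's Conv2D applies many detectors across all input channels at once and is implemented far more efficiently. The helper names conv2d_valid and max_pool are hypothetical.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide one feature detector over a 2D image ('valid' padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # like deep-learning libraries, this is cross-correlation (the kernel is not flipped)
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map


def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    windows = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return windows.max(axis=(1, 3))


image = np.zeros((6, 6))
image[:, 3:] = 1.0                          # toy image: dark left half, bright right half
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)   # responds to dark-to-bright vertical edges

feature_map = np.maximum(conv2d_valid(image, kernel), 0)  # convolution + ReLU -> 4x4
pooled = max_pool(feature_map)                            # 2x2 max pooling -> 2x2
flattened = pooled.flatten()                              # vector for the input layer -> 4 values
print(feature_map)
print(pooled)
print(flattened)
```

With padding = 'same', as in the network below, the input would be padded so that the feature map keeps the 6x6 size of the image.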

For expression recognition, we add several more convolutional layers. Repeated convolution and pooling let the network keep only the most important features of the image, which reduces the number of neurons in the input layer of the dense network and, in turn, keeps the hidden layers added afterwards from becoming overly complex.

To achieve this, we stack convolutional layers in pairs, doubling the number of feature maps at each stage (32, 64, 128 and 256). Every convolutional layer uses a (3, 3) kernel and the ReLU activation function.

After every convolutional and hidden layer, we apply batch normalization. BatchNormalization() applies a transformation that keeps the mean of its output close to 0 and the output standard deviation close to 1.
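For intuition, the sketch below normalizes a made-up batch of activations the way a batch-normalization layer does during training, assuming the standard formulation with a small epsilon (1e-3, the Keras default) added to the variance. The learnable gamma and beta let the network undo the normalization if that helps. In Keras these two are trainable, while the moving mean and variance that the layer uses at inference time are not, which is where the non-trainable parameters in the model summary come from. The function name batch_norm is hypothetical.

```python
import numpy as np


def batch_norm(x, gamma=1.0, beta=0.0, epsilon=1e-3):
    """Training-mode batch normalization of a (batch, features) array.

    Each feature is normalized with the mean and variance of the current batch,
    then scaled by gamma and shifted by beta (the layer's trainable parameters).
    """
    mean = x.mean(axis=0)
    variance = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(variance + epsilon)
    return gamma * x_hat + beta


rng = np.random.default_rng(0)
activations = rng.normal(loc=5.0, scale=3.0, size=(256, 4))  # batch of 256 samples, 4 features
normalized = batch_norm(activations)
print(normalized.mean(axis=0).round(3))  # per-feature means, all close to 0
print(normalized.std(axis=0).round(3))   # per-feature standard deviations, close to 1
```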

network = Sequential()
network.add(Conv2D(32, (3,3), activation='relu', padding = 'same', input_shape = (48, 48, 3)))
network.add(BatchNormalization())  # normalize the outputs of the convolutional layer

As we add more convolutional layers, we can apply Dropout() to reduce overfitting. In our example, a dropout rate of 0.2 randomly sets 20% of the layer's outputs to zero at each training step, which reduces the network's effective complexity.

network.add(Dropout(0.2))  # dropout layer: randomly drops 20% of the activations during training
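A minimal NumPy sketch of what a dropout layer does, assuming the inverted-dropout formulation that Keras documents: during training, a fraction rate of the values is set to zero and the survivors are scaled by 1 / (1 - rate); at inference time the input passes through unchanged. The function name dropout and the all-ones input are illustrative only.

```python
import numpy as np


def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of the values during training and
    scale the survivors by 1 / (1 - rate) so the expected sum is unchanged."""
    if not training or rate == 0.0:
        return x  # at inference time the layer passes its input through unchanged
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(42)
x = np.ones((1000, 64))
y = dropout(x, rate=0.2, rng=rng)
print((y == 0).mean())  # fraction of dropped values, close to 0.2
print(y.mean())         # close to 1.0, thanks to the 1 / (1 - rate) scaling
```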

We can keep adding convolutional layers to improve accuracy. After the last max-pooling layer, we add a Flatten() layer. Hidden layers are then added with network.add(Dense(...)), each followed by the same batch normalization and dropout as before.

Finally, the output layer uses the softmax activation function. Softmax converts the layer's outputs into one score per class; the scores are non-negative and sum to 1, so each can be read as how closely the image matches that class. The class with the highest score is the predicted class. Since there are seven classes, the output layer has 7 units.
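The softmax step and the choice of the predicted class can be sketched in a few lines of NumPy. The raw output values (logits) below are made up for illustration; subtracting the maximum before exponentiating is a common trick to avoid overflow and does not change the result.

```python
import numpy as np

emotions = ['Angry', 'Disgust', 'Fear', 'Happy', 'Neutral', 'Sad', 'Surprise']


def softmax(logits):
    """Turn raw output-layer values into non-negative scores that sum to 1."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()


logits = np.array([1.2, -0.5, 0.3, 3.1, 0.8, 0.1, 1.9])  # hypothetical raw outputs for one image
scores = softmax(logits)
print(scores.round(3), scores.sum())
print(emotions[int(np.argmax(scores))])  # Happy
```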

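network = Sequential()

#values of parameters
num_detectors = 32
num_classes = 7
width, height = 48, 48

# four blocks of two Conv2D + BatchNormalization layers; 2x2 pooling halves the size: 48 -> 24 -> 12 -> 6 -> 3
network.add(Conv2D(num_detectors, (3,3), activation='relu', padding = 'same', input_shape = (width, height, 3)))
network.add(BatchNormalization())
network.add(Conv2D(num_detectors, (3,3), activation='relu', padding = 'same'))
network.add(BatchNormalization())
network.add(MaxPooling2D(pool_size = (2, 2)))
network.add(Dropout(0.2))

network.add(Conv2D(2*num_detectors, (3,3), activation='relu', padding = 'same'))
network.add(BatchNormalization())
network.add(Conv2D(2*num_detectors, (3,3), activation='relu', padding = 'same'))
network.add(BatchNormalization())
network.add(MaxPooling2D(pool_size = (2, 2)))
network.add(Dropout(0.2))

network.add(Conv2D(2*2*num_detectors, (3,3), activation='relu', padding = 'same'))
network.add(BatchNormalization())
network.add(Conv2D(2*2*num_detectors, (3,3), activation='relu', padding = 'same'))
network.add(BatchNormalization())
network.add(MaxPooling2D(pool_size = (2, 2)))
network.add(Dropout(0.2))

network.add(Conv2D(2*2*2*num_detectors, (3,3), activation='relu', padding = 'same'))
network.add(BatchNormalization())
network.add(Conv2D(2*2*2*num_detectors, (3,3), activation='relu', padding = 'same'))
network.add(BatchNormalization())
network.add(MaxPooling2D(pool_size = (2, 2)))
network.add(Dropout(0.2))

# flatten the 3x3x256 feature maps into a vector of 2304 values
network.add(Flatten())

# hidden layers, each followed by batch normalization and dropout
network.add(Dense(2 * num_detectors, activation='relu'))
network.add(BatchNormalization())
network.add(Dropout(0.2))
network.add(Dense(2 * num_detectors, activation='relu'))
network.add(BatchNormalization())
network.add(Dropout(0.2))

# output layer: one softmax unit per expression class
network.add(Dense(num_classes, activation='softmax'))
network.summary()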
Model: "sequential_1"
Layer (type)                 Output Shape              Param #   
conv2d_8 (Conv2D)            (None, 48, 48, 32)        896       
batch_normalization_10 (Batc (None, 48, 48, 32)        128       
conv2d_9 (Conv2D)            (None, 48, 48, 32)        9248      
batch_normalization_11 (Batc (None, 48, 48, 32)        128       
max_pooling2d_4 (MaxPooling2 (None, 24, 24, 32)        0         
...
dropout_10 (Dropout)         (None, 64)                0
dense_4 (Dense)              (None, 64)                4160      
batch_normalization_19 (Batc (None, 64)                256       
dropout_11 (Dropout)         (None, 64)                0         
dense_5 (Dense)              (None, 7)                 455       
Total params: 1,328,743
Trainable params: 1,326,567
Non-trainable params: 2,176
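The totals in the summary can be checked by hand. The sketch below assumes four blocks of two Conv2D (3x3, 'same' padding) + BatchNormalization layers, each block followed by 2x2 max pooling and dropout, then Flatten, two Dense + BatchNormalization + Dropout hidden layers of 64 units and a 7-unit softmax output; pooling, dropout and flatten layers have no parameters. Under that assumption the arithmetic reproduces the reported figures exactly. The helper names are hypothetical.

```python
def conv_params(kernel_size, in_channels, filters):
    # one kernel_size x kernel_size x in_channels weight tensor per filter, plus one bias each
    return kernel_size * kernel_size * in_channels * filters + filters


def dense_params(inputs, units):
    # a weight matrix plus one bias per unit
    return inputs * units + units


def batch_norm_params(features):
    # gamma and beta (trainable) plus moving mean and moving variance (non-trainable)
    return 4 * features


num_detectors, num_classes = 32, 7
channels, size = 3, 48            # 48x48 RGB input
total = non_trainable = 0

for block in range(4):            # 32, 64, 128, 256 feature maps
    filters = 2 ** block * num_detectors
    for _ in range(2):            # two Conv2D + BatchNormalization pairs per block
        total += conv_params(3, channels, filters) + batch_norm_params(filters)
        non_trainable += 2 * filters
        channels = filters
    size //= 2                    # 2x2 max pooling halves width and height; dropout adds nothing

inputs = size * size * channels   # Flatten: 3 * 3 * 256 = 2304 values
for _ in range(2):                # two hidden Dense + BatchNormalization layers
    total += dense_params(inputs, 2 * num_detectors) + batch_norm_params(2 * num_detectors)
    non_trainable += 2 * 2 * num_detectors
    inputs = 2 * num_detectors
total += dense_params(inputs, num_classes)   # softmax output layer

print(f'{total:,} {total - non_trainable:,} {non_trainable:,}')  # 1,328,743 1,326,567 2,176
```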

Once the network is built, we compile it and train it. One epoch is one complete pass over the entire training set, so passing over it five times means training for five epochs. Here, we train for 70 epochs to reach the desired accuracy.

network.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['accuracy'])
network.fit(train_dataset, epochs=70)

Depending on the training results, you can adjust the number of epochs. Even with this relatively simple architecture, 70 epochs take quite a while: about 37 to 38 seconds per epoch in the log below.
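As a rough sanity check on the training time: one epoch processes every training image once in batches of 16. The sketch below assumes the training folder contains the 28,709 images of the standard FER2013 training split and takes the step time of about 21 ms from the log; both numbers are inputs you would adjust for your own setup.

```python
import math

num_train_images = 28709   # assumption: size of the standard FER2013 training split
batch_size = 16            # from flow_from_directory(..., batch_size = 16)
epochs = 70
seconds_per_step = 0.021   # about 21ms/step, as reported in the training log

steps_per_epoch = math.ceil(num_train_images / batch_size)  # the last batch is smaller
seconds_per_epoch = steps_per_epoch * seconds_per_step
total_minutes = epochs * seconds_per_epoch / 60

print(steps_per_epoch)           # 1795, matching "1795/1795" in the log
print(round(seconds_per_epoch))  # 38
print(round(total_minutes))      # 44
```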

Epoch 1/70
1795/1795 [==============================] - 38s 20ms/step - loss: 2.0256 - accuracy: 0.2246
Epoch 2/70
1795/1795 [==============================] - 37s 21ms/step - loss: 1.5614 - accuracy: 0.3895
Epoch 3/70
1795/1795 [==============================] - 37s 20ms/step - loss: 1.4009 - accuracy: 0.4619
...
Epoch 34/70
1795/1795 [==============================] - 37s 21ms/step - loss: 0.8696 - accuracy: 0.6851
Epoch 35/70
1795/1795 [==============================] - 37s 21ms/step - loss: 0.8498 - accuracy: 0.6913
Epoch 36/70
1795/1795 [==============================] - 37s 21ms/step - loss: 0.8514 - accuracy: 0.6937
...
Epoch 69/70
1795/1795 [==============================] - 38s 21ms/step - loss: 0.6768 - accuracy: 0.7601
Epoch 70/70
1795/1795 [==============================] - 38s 21ms/step - loss: 0.6884 - accuracy: 0.7520
<tensorflow.python.keras.callbacks.History at 0x7f0db020b550>

Evaluating the Network

Once trained, the network can be saved and later loaded to be used on other data. The save and load operations are covered at the end of our previous tutorial on saving and loading neural networks; in the code below, network_loaded is the network after being loaded back in that way.

We then predict test_dataset, which returns a list of class scores for each image, and compare the predictions with the true labels using an accuracy score, a confusion matrix and a classification report.

predictions = network_loaded.predict(test_dataset)
array([[9.1399026e-01, 5.2431659e-03, 2.0065000e-02, ..., 7.5534433e-03,
        3.9156139e-02, 1.2308680e-02],
       [8.3489549e-01, 1.3114771e-04, 1.6049346e-01, ..., 8.8334584e-04,
        3.4225811e-03, 1.4467876e-07],
       [9.6874458e-01, 7.4295443e-04, 1.7095286e-02, ..., 1.2566445e-03,
        7.5712660e-03, 4.9268949e-04],
       [2.6819399e-02, 2.2662610e-03, 1.1300583e-01, ..., 4.7751674e-03,
        7.1213390e-03, 8.4558320e-01]], dtype=float32)
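from sklearn.metrics import accuracy_score
# reduce each row of class scores to the index of the highest-scoring class
predictions = np.argmax(predictions, axis = 1)
accuracy_score(test_dataset.classes, predictions)
# about 0.578, i.e. the model classifies roughly 58% of the test images correctly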
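accuracy_score expects one class index per image, which is why the rows of class scores are first reduced with np.argmax. The NumPy sketch below uses made-up scores for four images to show both steps; for single-label problems like this one, the mean of the element-wise comparison is what accuracy_score computes. It also shows why the test generator uses shuffle = False: the comparison is only meaningful if the predictions and test_dataset.classes are in the same order.

```python
import numpy as np

# hypothetical softmax outputs for four test images (rows) over the 7 classes
scores = np.array([
    [0.91, 0.01, 0.02, 0.01, 0.01, 0.03, 0.01],
    [0.10, 0.05, 0.60, 0.05, 0.10, 0.05, 0.05],
    [0.03, 0.01, 0.11, 0.05, 0.10, 0.05, 0.65],
    [0.05, 0.05, 0.05, 0.20, 0.50, 0.10, 0.05],
])
true_labels = np.array([0, 2, 6, 3])  # plays the role of test_dataset.classes

predicted = np.argmax(scores, axis=1)         # highest-scoring class per row
accuracy = np.mean(predicted == true_labels)  # fraction of correct predictions
print(predicted, accuracy)  # [0 2 6 4] 0.75
```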

To visualize the classification, we can draw a heatmap of the confusion matrix. A confusion matrix shows how the model performs on the test data: each row corresponds to a true class, each column to a predicted class, and each cell counts the images with that combination. The heatmap colors these counts, so the correctly classified images appear along the diagonal and the most common confusions stand out off it.

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(test_dataset.classes, predictions)
sns.heatmap(cm, annot = True, fmt = 'd')  # fmt = 'd' shows the counts as integers
cm
array([[276,   8,  53,  14,  64,  71,   5],
       [ 13,  36,   2,   1,   2,   1,   0],
       [ 51,   3, 244,  19,  78,  91,  42],
       [ 10,   2,  16, 775,  46,  18,  12],
       [ 87,  11,  84, 160, 115, 111,  58],
       [ 44,   1,  53,  25, 146, 319,   6],
       [  4,   3,  56,  25,  14,   5, 309]])
[Figure: heatmap of the confusion matrix]
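A confusion matrix is simply a table of counts indexed by (true class, predicted class). The sketch below builds one with NumPy for a small, made-up set of labels, using the same convention as sklearn (rows are true classes, columns are predicted classes); the helper name confusion is hypothetical. Applied to the matrix above, the diagonal sums to 2,074 out of 3,589 test images, about 0.578, which matches the accuracy computed earlier.

```python
import numpy as np


def confusion(true_labels, predicted, num_classes):
    """Rows are true classes, columns are predicted classes (sklearn's convention)."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (true_labels, predicted), 1)  # count each (true, predicted) pair
    return cm


true_labels = np.array([0, 0, 1, 2, 2, 2])
predicted = np.array([0, 2, 1, 2, 2, 0])
cm = confusion(true_labels, predicted, 3)
print(cm)
print(np.trace(cm) / cm.sum())  # accuracy: correct predictions lie on the diagonal
```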

To see in detail how the network classified the images of each class, we use a classification report. Its columns are worth going through:

  • Support: the number of test images that actually belong to the class
  • Recall: the fraction of the images of a class that the model correctly assigned to that class
  • Precision: the fraction of the images the model assigned to a class that actually belong to it
  • F1-score: the harmonic mean of precision and recall
from sklearn.metrics import classification_report
print(classification_report(test_dataset.classes, predictions))
              precision    recall  f1-score   support

           0       0.57      0.56      0.57       491
           1       0.56      0.65      0.61        55
           2       0.48      0.46      0.47       528
           3       0.76      0.88      0.82       879
           4       0.25      0.18      0.21       626
           5       0.52      0.54      0.53       594
           6       0.72      0.74      0.73       416

    accuracy                           0.58      3589
   macro avg       0.55      0.57      0.56      3589
weighted avg       0.56      0.58      0.56      3589
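Every column of the report can be derived from the confusion matrix: support is a row total, recall divides the diagonal by the row total, precision divides it by the column total, and the F1-score is their harmonic mean. The sketch below recomputes them from the matrix printed above using only NumPy.

```python
import numpy as np

# confusion matrix printed above (rows: true class, columns: predicted class)
cm = np.array([[276,   8,  53,  14,  64,  71,   5],
               [ 13,  36,   2,   1,   2,   1,   0],
               [ 51,   3, 244,  19,  78,  91,  42],
               [ 10,   2,  16, 775,  46,  18,  12],
               [ 87,  11,  84, 160, 115, 111,  58],
               [ 44,   1,  53,  25, 146, 319,   6],
               [  4,   3,  56,  25,  14,   5, 309]])

correct = np.diag(cm)
support = cm.sum(axis=1)               # true images per class (row totals)
recall = correct / support             # share of each class that was found
precision = correct / cm.sum(axis=0)   # share of each predicted class that is right
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / cm.sum()

# the printed values match the precision, recall, f1-score and support columns of the report
for i in range(len(cm)):
    print(f'{i}  {precision[i]:.2f}  {recall[i]:.2f}  {f1[i]:.2f}  {support[i]}')
print(f'accuracy {accuracy:.2f}')
```

Class 4 (Neutral) stands out: only 115 of its 626 test images are recognized (recall 0.18), and 160 of them are predicted as Happy.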

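Detecting the Expressions of Multiple Faces in an Image

Finally, we apply the trained network to an image that contains several faces. Each face is located with a face detector, cropped, resized to 48 x 48 pixels, scaled to [0, 1] like the training images and passed to the network; the predicted emotion is then written above the face.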

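image = cv2.imread('/content/drive/MyDrive/Cursos - recursos/Computer Vision Masterclass/Images/faces_emotions.png')
cv2_imshow(image)
[Figure: the input image faces_emotions.png]
# face detector: a Haar cascade, here the frontal-face model bundled with the opencv-python package
face_detector = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
faces = face_detector.detectMultiScale(image)
# class names in the order given by train_dataset.class_indices
emotions = ['Angry', 'Disgust', 'Fear', 'Happy', 'Neutral', 'Sad', 'Surprise']
for (x, y, w, h) in faces:
  # crop the face before drawing on the image, so the rectangle is not part of the network input
  roi = image[y:y + h, x:x + w]
  roi = cv2.resize(roi, (48, 48))      # same size as the training images
  roi = roi / 255                      # same scaling as rescale=1./255
  roi = np.expand_dims(roi, axis = 0)  # add a batch dimension: (1, 48, 48, 3)
  prediction = network_loaded.predict(roi)
  cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 1)
  cv2.putText(image, emotions[np.argmax(prediction)], (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0,255,0), 2, cv2.LINE_AA)
cv2_imshow(image)
[Figure: the detected faces annotated with their predicted expressions]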