Recognizing gestures and actions using OpenCV

Gesture recognition refers to processing digital images collected by a camera or other external device with gesture recognition algorithms. Users can use simple gestures to control or interact with devices without physically touching them. Gestures can originate from any bodily motion but most commonly originate from the face or hands.

These gesture recognition algorithms analyze the movements and actions performed by humans and predict whether a given gesture has been performed.

Steps involved in building a gesture recognition algorithm:-

  • Loading the pre-trained models and their weights
  • Loading the dataset
  • Detecting body points and estimating the pose
  • Detecting the gestures

In this tutorial, we use a pre-trained neural network; you can use the links provided below to download the model and its weights.

After downloading these files, upload them to your Google Drive so that you can load them into your Colab notebook.

MPII Human Pose Model

MPII is a state-of-the-art benchmark for evaluating human pose estimation. MPII proposes an approach that jointly addresses the tasks of detection and pose estimation.

Unlike previous strategies that first detect people and then estimate their pose, MPII performs non-maximum suppression on the set of part candidates and groups them to form configurations of body parts that respect geometric and appearance constraints.

MPII formulates a partitioning and labeling problem over a set of body-part hypotheses generated with CNN-based part detectors. The image is passed through a set of convolution and max-pooling layers that reduce the dimensions of the image while extracting more features from it.

As we proceed layer by layer, the number of feature detectors keeps increasing while the spatial dimensions of the image are reduced.
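
To make the shrinking concrete, the sketch below tracks only the spatial size of a feature map through repeated pooling. It assumes three 2 x 2, stride-2 max-pooling stages with Caffe-style ceiling rounding; that layer count is an assumption about the network rather than something stated in this tutorial, and pooled_size and downsample are hypothetical helper names.

```python
import math

def pooled_size(size, kernel=2, stride=2, padding=0):
    """Output length of one pooling stage using Caffe-style ceiling rounding."""
    return math.ceil((size + 2 * padding - kernel) / stride) + 1

def downsample(height, width, stages=3):
    """Apply the same pooling stage repeatedly to both spatial dimensions."""
    for _ in range(stages):
        height, width = pooled_size(height), pooled_size(width)
    return height, width

# 337 x 600 is the size of the example image used in this tutorial
print(downsample(337, 600))  # (43, 75)
```

Under these assumptions a 337 x 600 input shrinks to 43 x 75, which is the size of the confidence maps the network returns later in this tutorial.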

The MPII network outputs one confidence map for each of the 15 body points. Each map gives, for every location in the image, the probability that the corresponding body point is there.

Importing Libraries and Loading pre-trained network

Since the model comes with a pre-trained network definition and weights, we can load them into the Colab notebook from Google Drive. Make sure your Google Drive contains the files you downloaded from the links provided.

To connect Google Drive to the Colab notebook, we mount the drive and enter the authorization key that Colab requests before it can access the files.

from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
import cv2
import matplotlib.pyplot as plt
from google.colab.patches import cv2_imshow
import numpy as np
image = cv2.imread('/content/drive/MyDrive/Images/megan.jpg')

After loading the image, we can check its dimensions using image.shape, which returns ( 337, 600, 3 ): the height, the width, and the number of color channels ( BGR ).

Before loading the model, we need to perform a few operations on the image that we feed to the neural network. When the image is sent through a series of convolutional layers, there is a chance of false predictions due to changes in illumination across different objects in the image.

To reduce the effect of such illumination changes and pass a normalized image, we use blobFromImage(), a function provided by OpenCV's dnn ( deep neural network ) module.

blobFromImage() can perform mean subtraction and image scaling.

  • Mean subtraction helps reduce the effect of illumination changes in the image ( the call below leaves the mean at its default of zero, so only scaling is applied here )
  • Image scaling normalizes the pixel values to the range 0 – 1 ( by dividing the pixel values by 255 )
image_blob = cv2.dnn.blobFromImage(image = image, scalefactor = 1.0 / 255,
                                   size = (image.shape[1], image.shape[0]))

Note:- blobFromImage() still returns a numpy.ndarray, but it changes the layout of the image. We can check this with image_blob.shape. The layout of the blob is batch size, color channels, height, width.

image_blob.shape
>>> (1, 3, 337, 600)

type(image_blob)
>>> numpy.ndarray
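
For intuition, the scaling and layout change can be reproduced with plain NumPy. This is a rough sketch of the effect only, not OpenCV's implementation: it assumes no resizing, no mean subtraction, and no channel swapping, and it uses a synthetic array in place of the real photo. to_blob is a hypothetical helper name.

```python
import numpy as np

def to_blob(image, scalefactor=1.0 / 255):
    """Scale pixel values and reorder (H, W, C) to (1, C, H, W)."""
    scaled = image.astype(np.float32) * scalefactor
    chw = np.transpose(scaled, (2, 0, 1))  # move channels to the front
    return chw[np.newaxis, ...]            # add the batch dimension

# Synthetic stand-in for a 337 x 600 BGR image (assumption: random pixels)
fake_image = np.random.randint(0, 256, size=(337, 600, 3), dtype=np.uint8)
blob = to_blob(fake_image)
print(blob.shape)  # (1, 3, 337, 600)
print(blob.dtype)  # float32
```

The array type stays numpy.ndarray, while the element type becomes 32-bit float and the axes are reordered to batch, channels, height, width.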

Caffe Deep Learning Framework

Caffe is a deep learning framework built for speed and modularity, developed at the University of California, Berkeley. Its successor, Caffe2, was designed with cross-platform deployment in mind and has since been merged into PyTorch.

To load the model and its weights, we can use cv2.dnn.readNetFromCaffe(), which takes the network definition ( the .prototxt file ) and the weights as input parameters.

network = cv2.dnn.readNetFromCaffe('/content/drive/MyDrive/pose_deploy_linevec_faster_4_stages.prototxt', 

Predicting Body Points

Once the network is loaded, we can pass the prepared blob as input using network.setInput(). The input then has to pass through every layer of the network. To do this we use network.forward() and store the returned value in a variable.

network.setInput(image_blob)
output = network.forward()
output.shape
>>> (1, 44, 43, 75)

The shape of the returned output is the batch size, the number of output maps, and the height and width of each map. The first 15 maps are the confidence maps of the body points.

Let’s see the data inside the output variable. The total number of body points is 15. The output stores a confidence map for each of the 15 body points, and we can loop over it to extract each confidence_map. The axis with index 1 of the output variable selects the body point whose confidence map is returned.

num_points = 15
for i in range(num_points):
  # confidence map of body point i
  confidence_map = output[0, i, :, :]
  print(confidence_map)
[[0.00055962 0.00058117 0.00053579 ... 0.00046494 0.00052144 0.00070126]
 [0.00056341 0.00061412 0.00052243 ... 0.00043261 0.00049986 0.00065831]
 [0.00053554 0.00056641 0.00048143 ... 0.00043169 0.00049813 0.00059231]
[[0.00112741 0.00127489 0.00119981 ... 0.00097799 0.00112679 0.0014676 ]
 [0.00114545 0.00124767 0.00108922 ... 0.0009252  0.00102618 0.00139268]
 [0.00108267 0.00113741 0.00100161 ... 0.00092294 0.00102551 0.00127125]
 [0.00111238 0.00109071 0.00112681 ... 0.00102942 0.0010519  0.00102278]
 [0.00113586 0.00110732 0.00117968 ... 0.001041   0.00109921 0.0010737 ]
 [0.00114552 0.00113262 0.00120307 ... 0.00106178 0.00113123 0.00113649]]

Once the values are retrieved, we need the maximum of each confidence map. For each of the body points ( 0 – 14 ), the location with the highest confidence score is taken as the actual body point. For this, OpenCV provides cv2.minMaxLoc().

cv2.minMaxLoc() finds the minimum and maximum values of an array together with their locations. The confidence_map we built in the previous section stores the confidence values of a body point at every location.

min_confidence, max_confidence, min_point, max_point = cv2.minMaxLoc(confidence_map)
print('confidence: ', max_confidence)
print('point: ', max_point)
confidence:  0.7554607391357422
point:  (28, 3)
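
For readers who want to see what the maximum lookup amounts to, here is a minimal NumPy sketch. It is not how OpenCV implements cv2.minMaxLoc(); it only mirrors the part used in this tutorial ( the maximum and its location ) on a small made-up confidence map, and max_location is a hypothetical helper name. Note that the location is reported as (x, y), that is (column, row), which is the opposite of NumPy's (row, column) indexing.

```python
import numpy as np

def max_location(confidence_map):
    """Return (max_value, (x, y)) for a 2-D array, like the max half of minMaxLoc."""
    row, col = np.unravel_index(np.argmax(confidence_map), confidence_map.shape)
    return float(confidence_map[row, col]), (int(col), int(row))

# Made-up 4 x 6 map with a single peak at row 3, column 5
toy_map = np.zeros((4, 6), dtype=np.float32)
toy_map[3, 5] = 0.75
value, point = max_location(toy_map)
print(value, point)  # 0.75 (5, 3)
```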

Before drawing the body points at their respective locations, it’s recommended to set a threshold so that only predictions with sufficiently high confidence are kept. This removes false predictions from the image.

# total number of body points
num_points = 15

# size of each confidence map returned by the network
position_width = output.shape[3]
position_height = output.shape[2]

# storing all the body points
points = []

threshold = 0.1
for i in range(num_points):
  # confidence map of body point i
  confidence_map = output[0, i, :, :]
  # highest confidence and its (x, y) location on the map
  _, confidence, _, point = cv2.minMaxLoc(confidence_map)
  # normalizing the point to match the dimensions of the image
  x = int((image.shape[1] * point[0]) / position_width)
  y = int((image.shape[0] * point[1]) / position_height)

  # eliminating body points that are less than threshold value
  if confidence > threshold:
    cv2.circle(image, (x, y), 2, (0, 0, 0), thickness = -1)
    cv2.putText(image, '{}'.format(i), (x, y), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255))
    points.append((x, y))
  else:
    # keep the list index aligned with the body point number
    points.append(None)

print(points)
[(224, 23), (248, 62), (232, 94), (208, 125), (176, 148), (280, 62), (320, 39), (360, 15), (288, 180), (320, 250),(320, 321),(312, 164), (336, 250), (296, 305), (280, 125)]
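
The rescaling step is plain proportional arithmetic. As a worked check, the sketch below maps the peak reported earlier, (28, 3) on the 75 x 43 confidence map, back onto the 600 x 337 image. to_image_coords is a hypothetical helper, not an OpenCV function.

```python
def to_image_coords(point, map_size, image_size):
    """Map an (x, y) location on the confidence map to image pixels.

    map_size and image_size are (width, height) tuples.
    """
    x = int(image_size[0] * point[0] / map_size[0])
    y = int(image_size[1] * point[1] / map_size[1])
    return x, y

print(to_image_coords((28, 3), map_size=(75, 43), image_size=(600, 337)))
# (224, 23)
```

The result matches the first entry ( the head ) in the printed list of points.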

# by default matplotlib displays images in RGB mode
# so we need to convert the color from BGR to RGB
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB));

To connect all the body points in the image, we need a list of all the connections between the 15 points. The points to be connected are:-

  • Head and Neck ( 0–>1 )
  • Neck and right shoulder ( 1–>2 )
  • Right shoulder and Right elbow ( 2–>3 )
  • Right elbow and Right wrist ( 3–>4 )
  • Neck to Left shoulder ( 1–>5 )
  • Left shoulder to Left elbow ( 5–>6 )
  • Left elbow to Left wrist ( 6–>7 )
  • Neck to Chest ( 1–>14 )
  • Chest to Right hip ( 14–>8 )
  • Right hip to Right knee ( 8–>9 )
  • Right knee to Right ankle ( 9–>10 )
  • Chest to Left hip ( 14–>11 )
  • Left hip to Left knee ( 11–>12 )
  • Left knee to Left ankle ( 12–>13 )
point_connections = [[0,1], [1,2], [2,3], [3,4], [1,5], [5,6], [6,7],[1,14],
                     [14,8], [8,9], [9,10], [14,11], [11,12], [12,13]]

Joining all the connections and drawing a line between them

for connection in point_connections:
  partA = connection[0]
  partB = connection[1]
  #checking if the points do exist and drawing the line
  if points[partA] and points[partB]:
    cv2.line(image, points[partA], points[partB], (255,0,0))
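
The existence check above only makes sense if missing body points are stored as a falsy placeholder such as None, which is the assumption in this sketch. Using made-up points and a hypothetical helper drawable_connections, it shows which segments would actually be drawn:

```python
def drawable_connections(points, connections):
    """Return the point pairs for which both endpoints were detected."""
    segments = []
    for part_a, part_b in connections:
        if points[part_a] is not None and points[part_b] is not None:
            segments.append((points[part_a], points[part_b]))
    return segments

# Made-up example: head, neck and right shoulder, with the shoulder missing
toy_points = [(224, 23), (248, 62), None]
toy_connections = [[0, 1], [1, 2]]
print(drawable_connections(toy_points, toy_connections))
# [((224, 23), (248, 62))]
```

Only the head-to-neck segment survives, because the neck-to-shoulder connection has a missing endpoint.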

Detecting the Movements from the Image

To analyze the movements and gestures in the image, we need the body points of the hands, legs, elbows, etc., depending on which gestures the program should detect. Gestures can be analyzed with basic mathematical calculations and arithmetic comparisons between the body points.

For example, to detect whether a person in the image has raised their arms, we need to compare the positions of the wrists and elbows with the head level. In this scenario, we compare the values along the y-axis, since the wrists and elbows move along the y-axis.

Put simply, if the y-values of the wrists are less than the y-value of the head, the arms are up, because values along the y-axis increase as we go down the image.
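
Because image coordinates start at the top-left corner, "higher in the picture" means "smaller y". A minimal sketch of that comparison follows, with a hypothetical helper is_above and the head and wrist coordinates printed for the first image earlier ( points 0, 4 and 7 ):

```python
def is_above(point_a, point_b):
    """True if point_a is higher in the image than point_b (smaller y)."""
    return point_a[1] < point_b[1]

head = (224, 23)
right_wrist = (176, 148)
left_wrist = (360, 15)

print(is_above(right_wrist, head))  # False: y = 148 is below y = 23
print(is_above(left_wrist, head))   # True: y = 15 is above y = 23
```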

# detecting the body points in a second image

image2 = cv2.imread('/content/drive/MyDrive/Colab Notebooks/images/football.png')
image_blob2 = cv2.dnn.blobFromImage(image = image2, scalefactor = 1.0 / 255, size = (image2.shape[1], image2.shape[0]))
# the new blob must be set as the network input before the forward pass
network.setInput(image_blob2)
output2 = network.forward()
position_width = output2.shape[3]
position_height = output2.shape[2]
num_points = 15
points = []
threshold = 0.1
for i in range(num_points):
  confidence_map = output2[0, i, :, :]
  _, confidence, _, point = cv2.minMaxLoc(confidence_map)
  x = int((image2.shape[1] * point[0]) / position_width)
  y = int((image2.shape[0] * point[1]) / position_height)
  if confidence > threshold:
    # draw on image2, the image these coordinates were computed for
    cv2.circle(image2, (x, y), 2, (0, 0, 0), thickness = -1)
    cv2.putText(image2, '{}'.format(i), (x, y), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255))
    points.append((x, y))
  else:
    # keep the list index aligned with the body point number
    points.append(None)

plt.figure(figsize = [14,10])
plt.imshow(cv2.cvtColor(image2, cv2.COLOR_BGR2RGB));
# redrawing the body points, this time with their confidence-map coordinates
# resetting the list so the points are not stored twice
points = []
for i in range(num_points):
  confidence_map = output2[0, i, :, :]
  _, confidence, _, point = cv2.minMaxLoc(confidence_map)
  x = int((image2.shape[1] * point[0]) / position_width)
  y = int((image2.shape[0] * point[1]) / position_height)
  if confidence > threshold:
    cv2.circle(image2, (x, y), 3, (0, 255, 0), thickness = -1)
    cv2.putText(image2, "{}".format(i), (x, y), cv2.FONT_HERSHEY_SIMPLEX, .3, (0, 0, 255))
    cv2.putText(image2, '{}-{}'.format(point[0], point[1]), (x, y + 10), cv2.FONT_HERSHEY_SIMPLEX, .5, (0, 0, 255))
    points.append((x, y))
  else:
    # keep the list index aligned with the body point number
    points.append(None)

plt.figure(figsize = [14,10])
plt.imshow(cv2.cvtColor(image2, cv2.COLOR_BGR2RGB));

Next, we perform an arithmetic comparison between the body points of the right wrist, the left wrist, and the head. If the y-values of both wrists are less than that of the head, the arms are considered to be up.

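def verify_arms_up(points):
  # body point numbering: 0 = head, 4 = right wrist, 7 = left wrist
  head, right_wrist, left_wrist = None, None, None
  for i, point in enumerate(points):
    # skip body points that were not detected
    if point is None:
      continue
    if i == 0:
      head = point[1]
    elif i == 4:
      right_wrist = point[1]
    elif i == 7:
      left_wrist = point[1]
  #print(head, right_wrist, left_wrist)
  # without all three points the gesture cannot be confirmed
  if head is None or right_wrist is None or left_wrist is None:
    return False
  # smaller y means higher in the image
  if right_wrist < head and left_wrist < head:
    return True
  return False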
a = verify_arms_up(points)

# loading a cleaner image to remove all the previous drawing made
image2 = cv2.imread('/content/drive/MyDrive/Colab Notebooks/images/football.png')
if a:
  cv2.putText(image2, "arms up", (10, 15), cv2.FONT_HERSHEY_SIMPLEX, .9, (0, 255, 255))
else:
  cv2.putText(image2, "arms down", (10, 15), cv2.FONT_HERSHEY_SIMPLEX, .9, (0, 255, 255))

plt.figure(figsize = [14,10])

plt.imshow(cv2.cvtColor(image2, cv2.COLOR_BGR2RGB));
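
The same comparison generalizes to other simple gestures. As an illustration only, the sketch below reports each arm separately and treats missing points as "unknown". The function name describe_arms, the labels, and the index constants are assumptions based on the body point numbering used above, and the coordinates are the ones printed for the first image.

```python
HEAD, RIGHT_WRIST, LEFT_WRIST = 0, 4, 7

def describe_arms(points):
    """Label each arm 'up', 'down' or 'unknown' by comparing wrist and head height."""
    head = points[HEAD]
    result = {}
    for name, index in (("right", RIGHT_WRIST), ("left", LEFT_WRIST)):
        wrist = points[index]
        if head is None or wrist is None:
            result[name] = "unknown"
        elif wrist[1] < head[1]:  # smaller y means higher in the image
            result[name] = "up"
        else:
            result[name] = "down"
    return result

# 15 slots, one per body point; undetected points stay None
toy_points = [None] * 15
toy_points[HEAD] = (224, 23)
toy_points[RIGHT_WRIST] = (176, 148)
toy_points[LEFT_WRIST] = (360, 15)
print(describe_arms(toy_points))
# {'right': 'down', 'left': 'up'}
```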