Humans can readily recognize images and the objects in them. To a computer, however, an image is just a sequence of 0s and 1s.
Face detection is an artificial-intelligence-based computer technology used to locate faces in digital images. When we feed it a digital image, a face detector uses machine learning algorithms to find the faces in it. Key applications include face tracking, face analysis, and security access.
Face detection is the first step in many face-related applications. In this section, we go through different face detection methods. The techniques covered are:
- Face detection using Haar Cascades
- Face detection using HOG and Dlib
- Face detection with CNN and Dlib
We detect faces using each of these pre-built models and compare them on factors such as the number of positive and negative detections, performance, and ease of implementation.
What is an Image?
An image is a grid of pixels with different intensity values, stored as an array. Each pixel's intensity values depend on its color. Image classification models detect objects in an image by analyzing how pixel values change across it.
We can get the dimensions of an image (height, width, and number of color channels) using image.shape. Let's compare the shape of the BGR image with that of the grayscale image.
import cv2
# raw string so the backslashes in the Windows path are not treated as escapes
img = cv2.imread(r'C:\images\publicplace.jpg')
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print(img.shape, gray_img.shape)
Output:- (427, 640, 3) (427, 640)
Any operation on this image handles 273,280 pixels (427 × 640), each with 3 color values, for 819,840 values in total. The BGR image therefore carries three times as much data as the grayscale image.
When detecting faces with pre-trained models, it is recommended to pass only the data the model needs rather than every color value of every pixel, which reduces the computational load. Thus we use grayscale images for detecting the faces.
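To make the difference in data volume concrete, here is a minimal sketch in plain Python (no OpenCV required) that converts a tiny made-up BGR image to grayscale and counts the stored values. The 2×2 sample image and the helper names are illustrative assumptions; the weights are the standard luma coefficients that OpenCV documents for its BGR-to-gray conversion.

```python
# Hypothetical 2x2 BGR image: each pixel is [blue, green, red]
bgr_image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

def to_gray(image):
    """Convert a nested-list BGR image to grayscale with standard luma weights."""
    return [
        [round(0.114 * b + 0.587 * g + 0.299 * r) for (b, g, r) in row]
        for row in image
    ]

def count_values(image):
    """Count how many numbers are stored in a nested-list image."""
    total = 0
    for row in image:
        for pixel in row:
            # a BGR pixel is a list of 3 values, a gray pixel is a single number
            total += len(pixel) if isinstance(pixel, list) else 1
    return total

gray_image = to_gray(bgr_image)
print(count_values(bgr_image), count_values(gray_image))
```

The grayscale version stores one value per pixel instead of three, which is the reduction the text describes.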
Face Detection using Haar Cascades
Haar Cascade is an object detection algorithm used for face detection in digital images or live video. In 2001, Viola and Jones published the paper “Rapid Object Detection using a Boosted Cascade of Simple Features”, which describes a machine learning approach to visual object detection capable of processing images extremely rapidly while achieving high detection rates.
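Much of the speed of the Viola-Jones approach comes from the integral image, which lets the sum of any rectangle of pixels be read with four lookups. The following is a minimal pure-Python sketch of that idea and of one two-rectangle Haar-like feature. The function names and the tiny sample image are illustrative assumptions, not OpenCV's implementation.

```python
def integral_image(img):
    """Return a summed-area table with one extra leading row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y) and size w x h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def two_rect_feature(ii, x, y, w, h):
    """Left half minus right half: responds to a vertical edge in the window."""
    half = w // 2
    left = rect_sum(ii, x, y, half, h)
    right = rect_sum(ii, x + half, y, half, h)
    return left - right

# Hypothetical 2x4 grayscale patch: dark on the left, bright on the right
patch = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
ii = integral_image(patch)
print(rect_sum(ii, 0, 0, 4, 2), two_rect_feature(ii, 0, 0, 4, 2))
```

A large absolute feature value indicates a strong intensity contrast between the two halves, which is the kind of simple feature the boosted cascade combines.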
To detect faces using Haar Cascades we need Python (or Anaconda), OpenCV, and Matplotlib installed. Once they are installed, you can download the Haar Cascade XML file from the link below.
https://github.com/opencv/opencv/tree/master/data/haarcascades
Once the XML file is downloaded or copied to a local disk, you can load it into your program using cv2.CascadeClassifier().
import cv2
# raw string: in a normal string '\U' starts a Unicode escape and raises a SyntaxError
face_detect = cv2.CascadeClassifier(r'C:\Users\face_detect.xml')
Once the cascade classifier is loaded, we can detect faces using its detectMultiScale() method, which returns an array with one row of coordinates per detected face. Using these values we can draw a rectangle (bounding box) around each face:
- x (x-coordinate of the top-left corner of the box)
- y (y-coordinate of the top-left corner of the box)
- w (width of the face)
- h (height of the face)
import cv2
face_detect = cv2.CascadeClassifier(r'C:\Users\face_detect.xml')
detections = face_detect.detectMultiScale(gray_img, scaleFactor=1.2)
Output:-
[[ 76 96 78 78]
[268 81 89 89]]
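Each row of the output above is an (x, y, w, h) box. As a small sketch of how such a row maps to the two corner points that a rectangle-drawing call needs, the helper below (a hypothetical name, plain Python) adds the width and height to the top-left corner.

```python
# Sample rows copied from the detectMultiScale output shown above
detections = [(76, 96, 78, 78), (268, 81, 89, 89)]

def to_corners(x, y, w, h):
    """Return the top-left and bottom-right corners of an (x, y, w, h) box."""
    return (x, y), (x + w, y + h)

for box in detections:
    top_left, bottom_right = to_corners(*box)
    print(top_left, bottom_right)
```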
import cv2
image = cv2.imread(r'C:\images\meet.jpg')
gray_img = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# load the XML cascade classifier
face_detect = cv2.CascadeClassifier(r'C:\Users\face_detect.xml')
# detect and store the coordinates of the faces in the image
detections = face_detect.detectMultiScale(gray_img)
# draw bounding boxes on the faces using x, y, w, h
# x+w gives the end coordinate along x
# y+h gives the end coordinate along y
for (x, y, w, h) in detections:
    cv2.rectangle(image, (x, y), (x+w, y+h), (0, 255, 255), 2)
cv2.imshow("image", image)
# keep the window open until a key is pressed
cv2.waitKey(0)
cv2.destroyAllWindows()
Parameters in Haar Cascades
The parameters of detectMultiScale let us suppress false detections and tune the detection rate. The ones most commonly adjusted are:
detectMultiScale(gray_image, scaleFactor=..., minNeighbors=..., minSize=..., maxSize=...)
- scaleFactor :- The default is 1.1. The cascade is trained on a fixed window size, so to find faces of different sizes the detector scans the image repeatedly at progressively reduced scales. scaleFactor sets how much the image is shrunk at each step. Values close to 1 (for example 1.05) check more scales, so they find more faces of varying sizes but run slower; larger values (for example 1.3) run faster but may miss faces whose size falls between two scales.
- minSize :- The smallest object size, given as a (width, height) tuple. Sometimes the model detects small spots as faces; candidates smaller than minSize are ignored, which removes these false detections.
- maxSize :- The largest object size, given as a (width, height) tuple. Candidates larger than maxSize are not considered and are filtered out.
- minNeighbors :- Specifies how many neighbouring candidate rectangles a detection must have in order to be retained. The default is 3; higher values give fewer but more reliable detections.
- maxNeighbours :- OpenCV's detectMultiScale has no such parameter; only the lower bound, minNeighbors, can be set.
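To see why scaleFactor trades speed for coverage, the sketch below counts how many scales a multi-scale search would visit for different factors. It is a simplified model rather than OpenCV's internal code: the function name, the 24-pixel base window, and the 427-pixel image height (taken from the earlier shape example) are assumptions for illustration.

```python
def pyramid_scales(image_size, window_size, scale_factor):
    """List the scales at which a window of window_size still fits in the image."""
    scales = []
    scale = 1.0
    while window_size * scale <= image_size:
        scales.append(round(scale, 3))
        scale *= scale_factor
    return scales

# Assumed values: 427 px image height and a 24 px base detection window
for factor in (1.05, 1.1, 1.3):
    print(factor, len(pyramid_scales(427, 24, factor)))
```

A factor closer to 1 produces many more scales to scan, so faces of in-between sizes are less likely to be skipped, at the cost of more computation.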
Face detection using HOG and dlib
Dlib is an open-source C++ library containing different machine learning algorithms and tools for creating complex software and solving real-world problems. It is used in both industry and academia in a wide range of domains including detection, classification, clustering, regression, and many others.
Using dlib we can detect faces with higher accuracy than with Haar Cascades, and without converting the image to grayscale. dlib's default detector combines Histogram of Oriented Gradients (HOG) features with a linear classifier. It computes the magnitude and direction of the change in pixel intensity across the image and uses these gradients to locate human faces.
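The gradient step can be illustrated with a short pure-Python sketch. It computes the magnitude and direction of the intensity change at each interior pixel and accumulates them into an orientation histogram for one cell. This is a simplified illustration, not dlib's implementation: the function name, the 9-bin layout, and the sample image are assumptions, and real HOG pipelines also interpolate between bins and normalize over blocks.

```python
import math

def cell_histogram(img, bins=9):
    """Unsigned-orientation gradient histogram over the interior pixels of img."""
    hist = [0.0] * bins
    bin_width = 180 / bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = img[y][x + 1] - img[y][x - 1]   # horizontal change
            gy = img[y + 1][x] - img[y - 1][x]   # vertical change
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180
            # each pixel votes for its orientation bin, weighted by magnitude
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# Hypothetical 4x4 patch whose brightness increases from left to right
ramp = [[0, 50, 100, 150] for _ in range(4)]
print(cell_histogram(ramp))
```

For this patch all of the gradient energy lands in the first bin, because the intensity only changes along the horizontal direction.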
We can either install dlib with pip or import it directly in Google Colab, a cloud-hosted Python notebook environment that comes with many libraries pre-installed.
Installing dlib using pip command
To install dlib with pip, we first need to install CMake, which can be done with the following command:
pip install cmake
If CMake is not on your PATH after installation, add the CMake directory to the Path system variable under Environment Variables.
Once the path is set, CMake can be run from any directory in the command prompt. This is required because the dlib installation needs to invoke CMake from the command line.
To install dlib, run the following command. Once installed, it can be imported directly into your program.
pip install dlib
Importing dlib in Google Colab
You can import dlib directly in Google Colab because it offers pre-installed libraries in the cloud that are ready to be imported into your programs. It also provides a Jupyter notebook that you can connect to your Drive to access your datasets.
import dlib
After importing the dlib library, you can create the face detector with dlib.get_frontal_face_detector() and store it in a variable.
import cv2
import dlib
face_detect = dlib.get_frontal_face_detector()
Detecting Faces
In this example, we use Google Colab for loading the libraries and detecting faces. After storing the dlib detector in a variable, we call it with an image and a second argument that sets how many times the image is upsampled before detection. Higher values help find smaller faces at the cost of speed.
While using Colab, we need to connect the notebook to the dataset already stored in the same Google account's Drive.
from google.colab import drive
drive.mount('/content/drive')
detections is a list of rectangles for the faces detected using dlib's HOG method. For each rectangle d in detections, the coordinates are d.left(), d.top(), d.right(), and d.bottom().
import cv2
import dlib
# cv2.imshow does not work in Colab, so use the cv2_imshow patch instead
from google.colab.patches import cv2_imshow
image = cv2.imread('/content/drive/MyDrive/Colab Notebooks/images/people-faces.jpg')
face_detect = dlib.get_frontal_face_detector()
# the second argument is the number of times to upsample the image; raise it for small faces
detections = face_detect(image, 4)
# each detection is a rectangle with left, top, right, bottom
for i in detections:
    cv2.rectangle(image, (i.left(), i.top()), (i.right(), i.bottom()), (0, 255, 255), 2)
cv2_imshow(image)
Output:- The number of faces detected varies with the upsampling value we pass into the detector. Unlike detectMultiScale in Haar Cascades, the dlib detector call takes only the image and this upsampling count.
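Because dlib reports each face as left, top, right, and bottom while Haar Cascades report x, y, w, and h, comparing the two is easier after converting to a common format. The sketch below shows one way to do that. FakeRect is a hypothetical stand-in so the example runs without dlib installed; any object exposing the same four methods, such as a dlib rectangle, should work with rect_to_xywh.

```python
class FakeRect:
    """Stand-in for a dlib rectangle (assumption: same four accessor methods)."""
    def __init__(self, left, top, right, bottom):
        self._box = (left, top, right, bottom)
    def left(self):
        return self._box[0]
    def top(self):
        return self._box[1]
    def right(self):
        return self._box[2]
    def bottom(self):
        return self._box[3]

def rect_to_xywh(rect):
    """Convert a left/top/right/bottom rectangle to an (x, y, w, h) tuple."""
    x, y = rect.left(), rect.top()
    return x, y, rect.right() - x, rect.bottom() - y

print(rect_to_xywh(FakeRect(76, 96, 154, 174)))
```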
Face detection with CNN and Dlib
Face detection using the CNN detector in the dlib library is typically the most accurate of the three methods covered here, although it is also the most computationally demanding. To use it, you need to download and extract the pre-trained model file mmod_human_face_detector.dat and store it in your Drive.
Because this is a CNN model, it runs much faster on a CUDA-capable GPU. To change the hardware accelerator in Colab, go to View resources > Change runtime type, set the accelerator to GPU, and click Save.
After applying these settings and mounting the drive, load the model into your notebook using dlib.cnn_face_detection_model_v1().
import cv2
import dlib
cnn_detect = dlib.cnn_face_detection_model_v1('/content/drive/MyDrive/weights/mmod_human_face_detector (1).dat')
detections = cnn_detect(image,3)
With the CNN detector, each detection provides the rectangle coordinates face.rect.left(), face.rect.top(), face.rect.right(), and face.rect.bottom(), along with a confidence score.
The confidence score indicates how strongly the detector believes the rectangle contains a face: the higher the score, the more reliable the detection. Detections with low scores can be discarded by comparing them against a threshold.
We can loop over the list of detections, print the confidence score, and draw the rectangle using the following code snippet.
for i in detections:
    print(i.confidence)
    cv2.rectangle(image, (i.rect.left(), i.rect.top()), (i.rect.right(), i.rect.bottom()), (0, 255, 255), 2)
Output:-
1.0813664197921753
1.0796440839767456
1.0767256021499634
1.062295913696289
1.0565769672393799
1.0129503011703491
0.999666690826416
0.754454493522644
0.7508873343467712
0.7199753522872925
0.13065484166145325
0.09079638123512268
0.016606777906417847
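The scores printed above can be used to discard weak detections. The following sketch filters them with a threshold. The cutoff of 0.5 and the helper name are assumptions chosen for illustration, and a suitable value depends on the images being processed.

```python
# Confidence scores copied from the output listed above
scores = [
    1.0813664197921753, 1.0796440839767456, 1.0767256021499634,
    1.062295913696289, 1.0565769672393799, 1.0129503011703491,
    0.999666690826416, 0.754454493522644, 0.7508873343467712,
    0.7199753522872925, 0.13065484166145325, 0.09079638123512268,
    0.016606777906417847,
]

def filter_by_confidence(values, threshold):
    """Keep only the scores at or above the threshold."""
    return [v for v in values if v >= threshold]

kept = filter_by_confidence(scores, threshold=0.5)  # 0.5 is a hypothetical cutoff
print(len(kept), "of", len(scores), "detections kept")
```

In a real loop the same comparison would be applied to each detection's confidence before drawing its rectangle.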
import cv2
import dlib
from google.colab.patches import cv2_imshow
image = cv2.imread('/content/drive/MyDrive/Colab Notebooks/images/people-faces.jpg')
cnn_detect = dlib.cnn_face_detection_model_v1('/content/drive/MyDrive/weights/mmod_human_face_detector.dat')
# upsample the image 2 times so that smaller faces are detected as well
detections = cnn_detect(image, 2)
for face in detections:
    print(face.confidence)
    cv2.rectangle(image, (face.rect.left(), face.rect.top()), (face.rect.right(), face.rect.bottom()), (0, 255, 255), 2)
cv2_imshow(image)
Output:- The CNN detector found more faces than HOG and Haar Cascades at a moderate upsampling value, so the CNN detector in dlib gives better detection results with fewer parameters to tune.
Haar Cascade vs HOG vs CNN
- Haar Cascade is easy to implement, but for face detection it produces many false detections, needs several parameters tuned to improve its performance, and has to be trained on many face and non-face images.
- Histogram of Oriented Gradients (HOG) is the feature method behind dlib's default face detector. Its predictions are considerably more accurate than those of Haar Cascades. It extracts features into a vector and feeds it into a classification algorithm, so we can detect more faces with fewer parameters.
- Convolutional Neural Networks (CNN) are among the most popular neural networks for image tasks in machine learning. The main advantage of a CNN is its ability to learn the features of an image accurately without much human supervision.
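One way to put numbers on the positive and negative detections when comparing the three detectors is to match each detected box against hand-labelled boxes using intersection over union (IoU). The sketch below is a minimal illustration of that idea. The function names, the 0.5 matching threshold, and the sample boxes are all assumptions, and the greedy matching used here is deliberately simple.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def count_detections(detected, truth, min_iou=0.5):
    """Return (true_positives, false_positives, missed) using greedy matching."""
    unmatched = list(truth)
    tp = 0
    for box in detected:
        match = next((t for t in unmatched if iou(box, t) >= min_iou), None)
        if match is not None:
            unmatched.remove(match)
            tp += 1
    return tp, len(detected) - tp, len(unmatched)

# Hypothetical labelled faces and detector output, both as (x, y, w, h)
truth = [(76, 96, 78, 78), (268, 81, 89, 89)]
detected = [(80, 100, 78, 78), (400, 300, 40, 40)]
print(count_detections(detected, truth))
```

Running the same count on the output of each detector for the same labelled images gives comparable figures for correct, false, and missed detections.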
Face detection from live feed
Detecting faces from a live feed combines extracting frames from the video and feeding them to the model one by one. The program checks whether each frame was read successfully; if reading a frame fails at any point, it immediately terminates the process.
import cv2
# read live video from the webcam
cap = cv2.VideoCapture(0)
faces = cv2.CascadeClassifier(r'C:\Users\face_detect.xml')
while cap.isOpened():
    ret, frame = cap.read()
    # if no frame is returned, the else block runs and the loop ends
    if ret:
        gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        detections = faces.detectMultiScale(gray_frame, scaleFactor=1.1)
        for (x, y, w, h) in detections:
            cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 255), 2)
        cv2.imshow("face_detect", frame)
        # press 'q' to quit
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    else:
        break
cap.release()
cv2.destroyAllWindows()