Welcome! This post is a quick explanation of how I built a face mask detector using ResNet50 as a feature extractor and a stacking ensemble of a Support Vector Machine (SVM) and a decision tree as the classifier.
As a tribute to fellow researchers, this app is based on the research paper "A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic" by Mohamed Loey et al.
Table of contents:
- Dataset Retrieval
- Preprocessing
- Feature Extraction
- Split Dataset
- Define Model Classifier
- Tuning Model
- Create Final Model
- Deploy Real App
Dataset Retrieval
This application uses a dataset from Kaggle. The dataset contains 853 images belonging to 3 classes, along with their bounding boxes in the PASCAL VOC format. The classes are with_mask, without_mask, and mask_weared_incorrect. For this project, I only use the with_mask and without_mask labels. Check out the image sample below.
You can access the dataset via the URL below.
https://www.kaggle.com/datasets/andrewmvd/face-mask-detection
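If you use the Kaggle CLI, kaggle datasets download -d andrewmvd/face-mask-detection fetches it as a zip archive. Here is a minimal sketch to unpack it into the folder layout used below (the zip file name is the CLI default and an assumption on my part):

import zipfile

# Unpack the downloaded archive into ./face-mask-detection
# (expects images/ and annotations/ subfolders inside)
with zipfile.ZipFile("face-mask-detection.zip") as zf:
    zf.extractall("./face-mask-detection")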
Preprocessing
Preprocessing starts with cropping the face areas based on the bounding box information. First, read all the XML and image files from the dataset folder.
import os

img_names = []
xml_names = []

for dirname, _, filenames in os.walk('./face-mask-detection'):
    for filename in filenames:
        if os.path.join(dirname, filename)[-3:] != "xml":
            img_names.append(filename)
        else:
            xml_names.append(filename)

print(len(img_names), "images")
Then crop each image to its bounding boxes and read the labels.
import xmltodict
from matplotlib import pyplot as plt
from skimage.io import imread

path_annotations = "face-mask-detection/annotations/"
path_images = "face-mask-detection/images/"

class_names = ['with_mask', 'without_mask']
images = []
target = []

# Crop the face region given a PASCAL VOC bounding box (xmin, ymin, xmax, ymax)
def crop_bounding_box(img, bnd):
    x1, y1, x2, y2 = list(map(int, bnd.values()))
    _img = img.copy()
    _img = _img[y1:y2, x1:x2]
    _img = _img[:, :, :3]  # keep RGB only, dropping any alpha channel
    return _img

for img_name in img_names:
    with open(path_annotations + img_name[:-4] + ".xml") as fd:
        doc = xmltodict.parse(fd.read())

    img = imread(path_images + img_name)
    temp = doc["annotation"]["object"]
    # xmltodict returns a list when an image has several annotated objects
    if type(temp) == list:
        for i in range(len(temp)):
            if temp[i]["name"] not in class_names:
                continue
            images.append(crop_bounding_box(img, temp[i]["bndbox"]))
            target.append(temp[i]["name"])
    else:
        if temp["name"] not in class_names:
            continue
        images.append(crop_bounding_box(img, temp["bndbox"]))
        target.append(temp["name"])
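To sanity-check the crops before going further, it helps to look at a few of them with their labels. Here is a minimal sketch reusing the matplotlib import above:

# Show the first few cropped faces with their labels
fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax, image, label in zip(axes, images, target):
    ax.imshow(image)
    ax.set_title(label)
    ax.axis('off')
plt.show()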
Based on the labels, this dataset consists of 3,232 faces with a mask and 717 faces without a mask.
Preprocessing also includes resizing and normalization using the ImageNet statistics, since ResNet50 was pretrained on ImageNet.
import torch
from torchvision import transforms

# Define preprocessing
preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

# Apply preprocess
image_tensor = torch.stack([preprocess(image) for image in images])
image_tensor.shape
Feature Extraction
Feature extraction is needed to gather information from the images, using spatial operations to produce a representation of each label. In this application, I use ResNet50 as the feature extractor. The last layer of ResNet50, a fully connected layer with 1,000 neurons (one per ImageNet class), needs to be removed.
from torchvision import models
# Download model
resnet = models.resnet50(pretrained=True)
resnet = torch.nn.Sequential(*(list(resnet.children())[:-1]))
To freeze the convolutional part of ResNet50 and keep it fixed, I need to set requires_grad to False.
for param in resnet.parameters():
    param.requires_grad = False
I also need to call eval() to put ResNet50's batch normalization layers into inference mode. Otherwise, they would keep updating their running statistics, which would interfere with model accuracy; eval() makes sure ResNet50 acts purely as a fixed feature extractor.
resnet.eval()
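With the head removed and eval() set, a quick dummy forward pass confirms that the network now outputs a 2048-dimensional feature map instead of 1,000 class logits. This sanity check is my own addition:

# Sanity check: one fake RGB image at the 128x128 input size
with torch.no_grad():
    dummy = torch.randn(1, 3, 128, 128)
    print(resnet(dummy).shape)  # expected: torch.Size([1, 2048, 1, 1])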
The last step is to apply ResNet50 to extract the features. ResNet then returns a vector of 2048 features for each image.
import numpy as np

result = np.empty((len(image_tensor), 2048))
for i, data in enumerate(image_tensor):
    output = resnet(data.unsqueeze(0))
    output = torch.flatten(output, 1)
    result[i] = output[0].numpy()
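The loop above extracts features one image at a time. A batched variant under torch.no_grad() is usually faster and avoids building any autograd state; here is a sketch, assuming a batch size of 32:

# Optional: batched feature extraction
batch_size = 32
features = []
with torch.no_grad():
    for i in range(0, len(image_tensor), batch_size):
        batch = image_tensor[i:i + batch_size]
        feats = torch.flatten(resnet(batch), 1)  # shape: (batch, 2048)
        features.append(feats)
result = torch.cat(features).numpy()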
Split Dataset
To be able to detect overfitting, I split the data into 70% training data and 30% test data. The training data is used to fit the model, and the test data is used to validate it on images the model has never seen.
from sklearn.model_selection import train_test_split
X, y = result, np.array(target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print("Training data\n", np.asarray(np.unique(y_train, return_counts=True)).T)
print("Test data\n", np.asarray(np.unique(y_test, return_counts=True)).T)
Define Model Classifier
As I teased before, the proposed model is a stacking classifier (an ensemble method) that uses an SVM and a decision tree as weak learners, with logistic regression as the final estimator. In short, ensemble methods are techniques that create multiple models and then combine them to produce improved results; an ensemble usually produces more accurate predictions than a single model would.
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression

clf = StackingClassifier(
    estimators=[('svm', SVC(random_state=42)),
                ('tree', DecisionTreeClassifier(random_state=42))],
    final_estimator=LogisticRegression(random_state=42),
    n_jobs=-1)
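Under the hood, StackingClassifier trains the SVM and the tree on cross-validated folds (5-fold by default) and feeds their out-of-fold predictions to the logistic regression. Before tuning, a quick baseline fit shows where the untuned stack stands; this extra check is my own addition:

# Baseline: fit the untuned stack and score it on the held-out test set
clf.fit(X_train, y_train)
print('Baseline accuracy: %.3f' % clf.score(X_test, y_test))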
Tuning Model
Tuning is the process of maximizing a model's performance without overfitting or creating too high a variance. In machine learning, this is accomplished by selecting appropriate hyperparameters. You can define whatever tuning method you want; here is mine, a grid search over a small range of candidate values.
from sklearn.model_selection import GridSearchCV

param_grid = {
    'svm__C': [1.6, 1.7, 1.8],
    'svm__kernel': ['rbf'],
    'tree__criterion': ['entropy'],
    'tree__max_depth': [9, 10, 11],
    'final_estimator__C': [1.3, 1.4, 1.5]
}

grid = GridSearchCV(
    estimator=clf,
    param_grid=param_grid,
    scoring='accuracy',
    n_jobs=-1)
grid.fit(X_train, y_train)

print('Best parameters: %s' % grid.best_params_)
print('Accuracy: %.2f' % grid.best_score_)
Based on the tuning process, the best hyperparameters are:
Best parameters: {'final_estimator__C': 1.3, 'svm__C': 1.6, 'svm__kernel': 'rbf', 'tree__criterion': 'entropy', 'tree__max_depth': 11}
Accuracy: 0.98
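A side note: since GridSearchCV refits the best configuration on the whole training set by default (refit=True), the tuned model is also available directly, without rebuilding it by hand as I do below:

# Equivalent shortcut to the manual rebuild in the next section
best_clf = grid.best_estimator_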
Create Final Model
Now I can create the final model with the best hyperparameters. I hope this model will not overfit.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

final_clf = StackingClassifier(
    estimators=[('svm', SVC(C=1.6, kernel='rbf', random_state=42)),
                ('tree', DecisionTreeClassifier(criterion='entropy', max_depth=11, random_state=42))],
    final_estimator=LogisticRegression(C=1.3, random_state=42),
    n_jobs=-1)
final_clf.fit(X_train, y_train)

y_pred = final_clf.predict(X_test)
print('Accuracy score : ', accuracy_score(y_test, y_pred))
print('Precision score : ', precision_score(y_test, y_pred, average='weighted'))
print('Recall score : ', recall_score(y_test, y_pred, average='weighted'))
print('F1 score : ', f1_score(y_test, y_pred, average='weighted'))
Then I test the model on the test data using accuracy, precision, recall, and F1 score. The results are:
Accuracy score : 0.9721518987341772
Precision score : 0.9719379890530496
Recall score : 0.9721518987341772
F1 score : 0.9717932606523529
Looks pretty good! Check out the confusion matrix below. If it looks biased, please comment 😁.
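If you want to reproduce the matrix yourself, here is a minimal sketch (assuming scikit-learn 1.0+ for ConfusionMatrixDisplay.from_predictions):

from sklearn.metrics import ConfusionMatrixDisplay

# Confusion matrix for the test predictions
ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
plt.show()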
Deploy Real App
This step is not required, but if you are interested in deploying, you must export the model first. Only the stacking classifier, which was trained above, needs to be saved, so that it can be loaded again in another program.
import pickle

pkl_filename = 'face_mask_detection.pkl'
with open(pkl_filename, 'wb') as file:
    pickle.dump(final_clf, file)
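Loading it back in another program is symmetric. Note that the serving side must also reproduce the preprocessing and ResNet50 feature extraction, since only the classifier is pickled:

# Load the trained classifier elsewhere
with open('face_mask_detection.pkl', 'rb') as file:
    loaded_clf = pickle.load(file)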
The deployment process itself is fairly simple, but first check out the diagram below.
The important thing to remember is that you need to implement your own face detection model, crop each detected face, and run it through the same preprocessing and feature extraction before the classifier. For my example program, check out my GitHub repository. A sketch of that flow follows.
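As an illustration only (not the exact code from my repository), here is a sketch of the inference flow using OpenCV's bundled Haar cascade for the face detection step; the cascade choice and the predict_mask helper are my own assumptions:

import cv2

# Hypothetical end-to-end inference: detect faces, crop, extract features, classify
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def predict_mask(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    results = []
    for (x, y, w, h) in faces:
        crop = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2RGB)
        tensor = preprocess(crop).unsqueeze(0)  # same transforms as training
        with torch.no_grad():
            feature = torch.flatten(resnet(tensor), 1).numpy()
        results.append(((x, y, w, h), loaded_clf.predict(feature)[0]))
    return results

In a real app, predict_mask would run per video frame, with the returned boxes and labels drawn back onto the frame.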