Computer vision with deep convolutional neural networks has seen a boom during recent years, and it is now applied in everything from computer-aided medical diagnosis to Unmanned Aerial Vehicles, Augmented Reality and visual search engines. Despite the amazing capabilities of modern deep learning techniques, certain challenges remain, such as getting enough data. One of the remedies is data augmentation. In this blog post, I will briefly explain a handful of data augmentation techniques in image recognition and show some examples of how to easily implement these with the Albumentations library.
Data Augmentation - What is it good for?
One major issue is that the performance of deep learning models is highly dependent on the amount of data available for training. Lack of data can result in overfitting of our model, and since relevant, good-quality data can be difficult and/or expensive to obtain, we need to squeeze out every drop of nectar from the data we have on hand. Data augmentation has proven to be particularly useful in this regard.
When we perform data augmentation, we create manipulated versions of the images in our training dataset so that we end up with many altered versions of the same images. By doing this, we achieve two things:
- We increase the number of samples in our training data.
- By exposing the model to images with a greater variety of features, we force it to ignore irrelevant features, which helps it generalize better.
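To make the first point concrete, here is a minimal sketch using plain NumPy rather than any augmentation library. The function name simple_variants and the tiny synthetic image are my own placeholders, not part of Albumentations. It only illustrates how one sample can become several.

```python
import numpy as np

def simple_variants(image):
    """Return the original image plus a few deterministic altered copies."""
    return [
        image,                  # original
        np.fliplr(image),       # mirrored left-right
        np.flipud(image),       # mirrored top-bottom
        np.rot90(image, k=1),   # rotated by 90 degrees (height and width swap)
    ]

# Hypothetical 4x6 RGB image standing in for a real training sample
image = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)
variants = simple_variants(image)
print(len(variants))  # 4
```

Whether each of these variants is valid depends on the task. An upside-down image, for example, may never occur in the data the model will see later.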
Disclaimer: Choose and use image augmentation techniques with care. Some methods may not be applicable to your use case. Wrong use and overuse can harm the performance of the model.
Albumentations
There are many good options when it comes to tools and libraries for implementing data augmentation into our deep learning pipeline. You could for instance do your own augmentations using NumPy or Pillow. Some of the most popular dedicated libraries for image augmentation include Albumentations, imgaug, and Augmentor. Both TensorFlow and PyTorch even come with their own packages dedicated to image augmentation.
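As a rough idea of what rolling your own augmentation with NumPy could look like, here is a small sketch. The function random_flip_and_brightness and its parameters flip_p and max_shift are hypothetical names chosen for this example, and the image is synthetic.

```python
import numpy as np

def random_flip_and_brightness(image, rng, flip_p=0.5, max_shift=30):
    """Randomly mirror a uint8 image and shift its brightness."""
    out = image.astype(np.int16)           # widen so the shift cannot wrap around
    if rng.random() < flip_p:
        out = out[:, ::-1]                 # horizontal flip
    shift = rng.integers(-max_shift, max_shift + 1)
    out = np.clip(out + shift, 0, 255)     # keep values in the valid uint8 range
    return out.astype(np.uint8)

rng = np.random.default_rng(seed=0)
image = np.full((8, 8, 3), 128, dtype=np.uint8)   # hypothetical gray image
augmented = random_flip_and_brightness(image, rng)
```

This works, but every new transformation means more code to write and test by hand, which is where the dedicated libraries come in.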
Dedicated libraries for image augmentation provide many advantages that make our lives easier. For one, they allow us to declare our data augmentation pipeline in a single place and use it through a unified interface. In Albumentations, this interface is available as A.Compose(), which lets us define the augmentation pipeline with the list of augmentations we want to use:
import albumentations as A
import cv2

# Load image (OpenCV reads images as BGR, so convert to RGB)
im = cv2.imread("your_image_path.png")
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)

# Define augmentation pipeline
transform = A.Compose([
    # List of augmentation methods goes here.
])
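To get an image with the augmentations applied to it, we can simply run transformed = transform(image=im)["image"]. The pipeline takes the image as the keyword argument image and returns a dictionary, with the augmented image stored under the "image" key.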
Image augmentation libraries also make it easy to apply the same augmentations to an input image and its corresponding mask when working with image segmentation tasks. In addition, libraries allow for simple control of the probabilities and magnitudes for each of our transformations. For instance, if we want rotation to be applied 50% of the time, with a maximum of 45 degrees, we can write it as A.Rotate(limit=45, p=0.5).
Albumentations, which I will use for the examples in this post, supports 60 different image augmentations while also allowing us to add other augmentations to the pipeline. It also boasts the highest speed among the most popular augmentation libraries. A simple pip install albumentations and we are ready to go!
Defining an augmentation pipeline
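The Compose class lets us define the augmentation pipeline with the list of augmentations we want to use. Calling Compose returns a callable object that applies the image augmentation. It takes the image as the keyword argument image and returns a dictionary that holds the result under the "image" key.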
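import albumentations as A
import cv2

# Load image (OpenCV reads images as BGR, so convert to RGB)
im = cv2.imread("your_image_path.png")
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)

# Define augmentation pipeline
transform = A.Compose([
    # List of augmentation methods goes here.
])

# Apply augmentations (the result is a dict; the image is under "image")
transformed = transform(image=im)["image"]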
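To build some intuition for what a pipeline object does, here is a toy sketch in plain Python. TinyCompose, mirror and invert are names I invented for this example. This is not how Albumentations is implemented, only the general idea of running a list of steps, each with its own probability.

```python
import random

class TinyCompose:
    """Toy stand-in for a pipeline: applies each step with its own probability."""

    def __init__(self, steps, seed=None):
        self.steps = steps                  # list of (function, probability) pairs
        self.rng = random.Random(seed)

    def __call__(self, image):
        for func, p in self.steps:
            if self.rng.random() < p:       # apply this step only some of the time
                image = func(image)
        return {"image": image}             # dict result, mirroring the real interface

def mirror(rows):
    return [row[::-1] for row in rows]

def invert(rows):
    return [[255 - v for v in row] for row in rows]

# mirror always runs (p=1.0), invert never runs (p=0.0)
pipeline = TinyCompose([(mirror, 1.0), (invert, 0.0)], seed=0)
result = pipeline([[0, 10, 20], [30, 40, 50]])["image"]
print(result)  # [[20, 10, 0], [50, 40, 30]]
```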
Affine transformations
Affine transformations involve geometric transformations to images such as translation, rotation, scaling (zoom) and shearing. ShiftScaleRotate lets us apply several affine transformations and adjust their respective magnitudes:
# Load image (OpenCV reads images as BGR, so convert to RGB)
im = cv2.imread("robin.png")
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)

# Define augmentation pipeline
transform = A.Compose([
    A.ShiftScaleRotate(shift_limit=0.0625,
                       scale_limit=0.1,
                       rotate_limit=45,
                       p=0.5),
])

# Apply augmentations
transformed = transform(image=im)["image"]
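Under the hood, an affine transformation can be written as a single matrix. The sketch below uses plain NumPy to combine rotation, scaling and translation into one 3x3 matrix and apply it to a couple of points. The helper names affine_matrix and apply_affine are my own, and this is a simplified illustration rather than what ShiftScaleRotate does internally.

```python
import numpy as np

def affine_matrix(angle_deg=0.0, scale=1.0, shift=(0.0, 0.0)):
    """Build a 3x3 matrix that rotates, scales, then translates 2D points."""
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    rotate_scale = np.array([[scale * cos_t, -scale * sin_t, 0.0],
                             [scale * sin_t,  scale * cos_t, 0.0],
                             [0.0, 0.0, 1.0]])
    translate = np.array([[1.0, 0.0, shift[0]],
                          [0.0, 1.0, shift[1]],
                          [0.0, 0.0, 1.0]])
    return translate @ rotate_scale         # rightmost matrix is applied first

def apply_affine(matrix, points):
    """Apply the matrix to an (N, 2) array of (x, y) points."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ matrix.T)[:, :2]

points = np.array([[1.0, 0.0], [0.0, 1.0]])
matrix = affine_matrix(angle_deg=90, scale=2.0, shift=(5.0, 0.0))
moved = apply_affine(matrix, points)        # (1, 0) -> (5, 2) and (0, 1) -> (3, 0)
```

Warping an image applies the same kind of matrix to pixel coordinates and then interpolates the pixel values.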
Noise
There are a lot of ways we can inject different types of noise into an image, including blur, Gaussian noise, shuffling of channels in a color image, changes in brightness, colors, contrast, and the list goes on. Below are just a few examples, but Albumentations allows many more.
# Load image (OpenCV reads images as BGR, so convert to RGB)
im = cv2.imread("robin.png")
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)

# Define augmentation pipeline
transform = A.Compose([
    A.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.2, hue=0.2, p=0.5),
    A.MotionBlur(blur_limit=33, p=0.1),
    A.GaussNoise(var_limit=(0, 255), p=0.1),
])

# Apply augmentations
transformed = transform(image=im)["image"]
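To show what Gaussian noise means at the pixel level, here is a hand-written NumPy sketch. The function add_gaussian_noise and its std parameter are hypothetical names for this example and are not the Albumentations implementation.

```python
import numpy as np

def add_gaussian_noise(image, std, rng):
    """Add zero-mean Gaussian noise to a uint8 image."""
    noise = rng.normal(loc=0.0, scale=std, size=image.shape)
    noisy = image.astype(np.float32) + noise
    # Round and clip so the result is a valid uint8 image again
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)

rng = np.random.default_rng(seed=0)
image = np.full((16, 16, 3), 128, dtype=np.uint8)   # hypothetical gray image
noisy_image = add_gaussian_noise(image, std=25.0, rng=rng)
```

Larger values of std give a grainier image. With std=0 the image comes back unchanged.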
Dropout transformations
We can remove part of the information in an image using dropout transformations. This includes, but is not limited to, replacing regions of the image with zero or random values and removing a channel if it is a color image.
# Load image (OpenCV reads images as BGR, so convert to RGB)
im = cv2.imread("robin.png")
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)

# Define augmentation pipeline
transform = A.Compose([
    A.CoarseDropout(max_holes=6, max_height=32, max_width=32, p=0.1),
    A.ChannelDropout(p=0.05),
])

# Apply augmentations
transformed = transform(image=im)["image"]
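The idea behind both dropout variants fits in a few lines of NumPy. The functions coarse_dropout and channel_dropout below are simplified sketches with parameter names of my own choosing. They use fixed-size square holes and do not reproduce the Albumentations transforms exactly.

```python
import numpy as np

def coarse_dropout(image, rng, holes=6, hole_size=32, fill=0):
    """Overwrite a few randomly placed square regions with a fill value."""
    out = image.copy()
    height, width = image.shape[:2]
    for _ in range(holes):
        top = rng.integers(0, max(1, height - hole_size + 1))
        left = rng.integers(0, max(1, width - hole_size + 1))
        out[top:top + hole_size, left:left + hole_size] = fill
    return out

def channel_dropout(image, rng, fill=0):
    """Overwrite one randomly chosen channel with a fill value."""
    out = image.copy()
    out[..., rng.integers(0, image.shape[-1])] = fill
    return out

rng = np.random.default_rng(seed=0)
image = np.full((128, 128, 3), 255, dtype=np.uint8)   # hypothetical white image
with_holes = coarse_dropout(image, rng)
one_channel_gone = channel_dropout(image, rng)
```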
Other Spatial Distortions
Another cool way to augment an image is to use different distortion techniques for altering the shapes portrayed in the image. I added a grid to the illustrations to better visualize the augmentations.
# Load image (OpenCV reads images as BGR, so convert to RGB)
im = cv2.imread("robin.png")
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)

# Define augmentation pipeline
transform = A.Compose([
    A.GridDistortion(num_steps=5, distort_limit=0.1, p=0.1),
    A.OpticalDistortion(distort_limit=0.2, shift_limit=0.05, p=0.1),
])

# Apply augmentations
transformed = transform(image=im)["image"]
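To give a feel for how such distortions work, here is a very simplified radial distortion in NumPy. The function radial_distort and its strength parameter k are my own illustration with nearest-neighbour sampling. It is not the algorithm behind GridDistortion or OpticalDistortion.

```python
import numpy as np

def radial_distort(image, k):
    """Warp an image radially around its centre; k=0 leaves it unchanged."""
    height, width = image.shape[:2]
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    half_w, half_h = max(cx, 1.0), max(cy, 1.0)
    # Normalise coordinates to roughly [-1, 1] around the image centre
    nx, ny = (xs - cx) / half_w, (ys - cy) / half_h
    # Pixels further from the centre are displaced more strongly
    factor = 1.0 + k * (nx ** 2 + ny ** 2)
    # For each output pixel, find the nearest source pixel to copy from
    src_x = np.clip(np.rint(nx * factor * half_w + cx), 0, width - 1).astype(int)
    src_y = np.clip(np.rint(ny * factor * half_h + cy), 0, height - 1).astype(int)
    return image[src_y, src_x]

rng = np.random.default_rng(seed=0)
image = rng.integers(0, 256, size=(32, 48, 3), dtype=np.uint8)  # hypothetical image
distorted = radial_distort(image, k=0.2)
```

The sign and size of k control the direction and strength of the warp.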
Finally, gathering all the augmentations from this blog post into an augmentation pipeline would look something like this:
transform = A.Compose([
    A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.1, rotate_limit=45, p=0.5),
    A.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.2, hue=0.2, p=0.5),
    A.MotionBlur(blur_limit=33, p=0.1),
    A.GaussNoise(var_limit=(0, 255), p=0.1),
    A.CoarseDropout(max_holes=6, max_height=32, max_width=32, p=0.1),
    A.ChannelDropout(p=0.05),
    A.GridDistortion(num_steps=5, distort_limit=0.1, p=0.1),
    A.OpticalDistortion(distort_limit=0.2, shift_limit=0.05, p=0.1),
])
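One common way to use such a pipeline during training is to apply it each time a sample is requested, so the model sees a slightly different version every epoch. The sketch below assumes a hypothetical AugmentedDataset wrapper and a stand_in_transform placeholder that I wrote only to keep the example self-contained. In practice you would pass in the A.Compose pipeline instead.

```python
import numpy as np

class AugmentedDataset:
    """Hypothetical wrapper that augments each sample when it is requested."""

    def __init__(self, images, labels, transform=None):
        self.images = images
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.images[index]
        if self.transform is not None:
            # Same calling convention as an Albumentations pipeline
            image = self.transform(image=image)["image"]
        return image, self.labels[index]

def stand_in_transform(image):
    """Placeholder for a real pipeline: always mirrors the image."""
    return {"image": image[:, ::-1]}

images = [np.arange(12, dtype=np.uint8).reshape(2, 2, 3)]
dataset = AugmentedDataset(images, labels=[1], transform=stand_in_transform)
augmented_image, label = dataset[0]
```

Because the augmentation happens on access, the original images on disk stay untouched.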
Time to try it yourself!
Hopefully, you now have a basic understanding of what data augmentation is and how we can create augmentation pipelines for use in training deep learning models. There are of course many more augmentations that could be useful beyond the examples in this blog post, many of which are available in Albumentations. This means it is time for you to try it out yourself. Experiment with different methods and see how they impact the performance of your models!
Further reading:
If you want to dig deeper into image augmentation techniques or look at more examples with Albumentations, I recommend starting here: