If you ever meet a developer who says size doesn't matter, then you'd expect them to have one sizable cloud budget to work with! For everyone else though, size absolutely matters, especially when dealing with image storage on the cloud.
Almost every web application I have worked on over the last few years has had some form of requirement for image hosting, be it a simple image gallery or a user profile picture. With the high availability of cloud storage options, and the low cost of stashing away gigabytes of data, it's very easy for most of us to dismiss any concerns about hosting data on the cloud. But when estimating our cloud storage budget, we can all too easily forget that we're not just paying to store the total volume of our data in the cloud; we also have to pay each and every time our data needs to leave the cloud.
Let's imagine we have an application that allows users to upload photos to use as their profile avatar. The user jumps onto their phone, grabs their latest insta/tinder-worthy pic, and uploads it to our server. Let's assume the image they upload is of decent quality and about 4MB in size. Now because our app is super awesome, we start going viral and land ourselves about 10,000 daily active users. Nice!
Now let's also imagine that each one of our 10,000 users uploads an equivalent 4MB profile picture. That means we would be storing 40GB worth of profile pictures in our cloud storage. This isn't too bad when vendors like AWS are charging about $0.025 AUD per GB of storage. We can handle that pretty well. But remember, we have 10,000 daily active users, and each time they access our app they will be loading one or more other users' profile pictures into their feed. This means our app will be dishing out a minimum of 40GB of data per day -> 1,200GB per month!
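To make the gap obvious, here's a rough back-of-the-envelope version of that maths. The user count, image size, and per-GB price are the same assumptions as above, and real data transfer pricing tiers will vary by region:

// back-of-the-envelope.js - a rough sketch of the numbers above (all figures are assumptions)
const users = 10000;               // daily active users
const imageSizeMB = 4;             // average uploaded profile picture
const storagePricePerGB = 0.025;   // approx. AUD per GB-month of S3 storage (varies by region)

const storedGB = (users * imageSizeMB) / 1000;         // 40 GB stored in total
const storageCostAUD = storedGB * storagePricePerGB;   // ~1 AUD per month to store

// If each active user pulls down at least one full-size image per day:
const dataOutGBPerDay = (users * imageSizeMB) / 1000;  // 40 GB out per day
const dataOutGBPerMonth = dataOutGBPerDay * 30;        // 1,200 GB out per month

console.log({ storedGB, storageCostAUD, dataOutGBPerDay, dataOutGBPerMonth });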
This is going to get expensive real fast!
Image Compression to the rescue!
Luckily for us, we live in a day and age where image compression and optimization is a walk in the park, and we can easily whittle our users' bloated 4MB profile pics down to a nice couple of kilobytes, making a much more web friendly image. So over the next few steps I'll show you how you can quickly achieve a nice little image compression pipeline for your application, built using a couple of S3 buckets and a single Lambda function on AWS.
Our general processing pipeline will look something like this. At one end we have an application which allows users to upload profile images to an S3 bucket. This bucket serves only as a landing zone for the full resolution images provided by our users. We then set up our S3 bucket with a trigger to notify our Lambda function that a new image has arrived and is ready to be compressed. Our Lambda function can then download the file from the source bucket and, using the Node.js Sharp package, shrink the image down to a more appropriate 200x200 avatar size. The Lambda function will then save the transformed image into our second S3 bucket, which in turn allows our app users to read in our compressed images, saving us a stack of data transfer fees.
Why two buckets?
You could absolutely get away with using just one bucket. But my personal preference is to use two buckets as a risk mitigation strategy against some dangerous and extremely expensive recursive event loops. As you can see from the image below, with one S3 bucket our user would upload an image to our bucket. That bucket generates a notification to our Lambda function to compress the image. When the Lambda function is finished, the image gets saved back into the same bucket, which in turn fires off another notification that a new image has been uploaded, which fires off our Lambda... and so on and so on.
You get it. We could end up in a cycle where we are recursively compressing an image and that (speaking from experience) is one costly mistake (about $700 AUD per day for those interested!).
Now if you really want to use a single bucket architecture, you could mitigate this risk by doing some smart things with the object prefixes used for the S3 event trigger, or by using metadata descriptors to help identify which objects should be processed. But by far the safest approach I know is to use two completely independent buckets, whereby one emits an event to compress an image, and the other simply receives compressed files. So this is the approach I will be demonstrating.
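For what it's worth, if you do go down the single bucket path, one way to break the loop is to write compressed outputs under a dedicated key prefix and bail out early whenever the triggering object already carries that prefix. The snippet below is only a sketch of that idea (the compressed/ prefix is made up for illustration), not part of the two bucket pipeline we're about to build:

// Hypothetical guard for a single-bucket setup: skip anything we've already compressed.
// Assumes compressed outputs are written under a "compressed/" key prefix (illustrative only).
const COMPRESSED_PREFIX = 'compressed/';

exports.handler = async (event) => {
    const { key } = event.Records[0].s3.object;

    if (key.startsWith(COMPRESSED_PREFIX)) {
        console.log(`Skipping already-compressed object: ${key}`);
        return; // break the recursion before doing any work
    }

    // ... compress the image and save it back under `${COMPRESSED_PREFIX}${key}` ...
};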
Building the Image Compression Pipeline
To make the setup and tear down of this application nice and quick, I have put everything together using AWS SAM. Using SAM we can define and deploy our AWS resources using a nice yaml template and the SAM CLI tools. If you're new to AWS SAM, I'd suggest taking some time to read up on its functionality before pushing too much further ahead.
1. Create a new SAM project
First off we will create a new SAM project. Assuming you have the SAM CLI tools installed, from the command line we can run
sam init
Stepping through the init options I've used the following for my project configuration.
Which template source would you like to use?
1 - AWS Quick Start Template
What package type would you like to use?
1 - Zip (artifact is a zip uploaded to S3)
Which runtime would you like to use?
1 - nodejs14.x
Project name [sam-app]: sizematters
2. Define the SAM template.yaml
Once SAM has initialized our project, we can step into our project directory and customize our template.yaml. This template holds all of the logic we will pass to AWS CloudFormation to set up and provision our S3 buckets and Lambda function, and to configure the event notifications from S3.
Our finished template will look something like this
# <rootDir>/template.yaml

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Size Matters image compression pipeline

Parameters:
  UncompressedBucketName:
    Type: String
    Description: "Bucket for storing full resolution images"

  CompressedBucketName:
    Type: String
    Description: "Bucket for storing compressed images"

Resources:
  UncompressedBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref UncompressedBucketName

  CompressedBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref CompressedBucketName

  ImageCompressorLambda:
    Type: AWS::Serverless::Function
    Properties:
      Handler: src/index.handler
      Runtime: nodejs14.x
      MemorySize: 1536
      Timeout: 60
      Environment:
        Variables:
          UNCOMPRESSED_BUCKET: !Ref UncompressedBucketName
          COMPRESSED_BUCKET: !Ref CompressedBucketName
      Policies:
        - S3ReadPolicy:
            BucketName: !Ref UncompressedBucketName
        - S3WritePolicy:
            BucketName: !Ref CompressedBucketName
      Events:
        CompressImageEvent:
          Type: S3
          Properties:
            Bucket: !Ref UncompressedBucket
            Events: s3:ObjectCreated:*
Walking through our template.yaml from the top, we have our Parameters block. These parameters will allow us to pass in some names for our S3 buckets when deploying our SAM template.
Next we have our Resources block. The first two resources referenced are the S3 buckets we will be creating, named UncompressedBucket and CompressedBucket. One bucket will serve as the landing zone for our image uploads, and the other for the compressed image outputs. Both buckets then have their respective bucket names set from the parameters we previously defined.
Next within our Resources block we have our Lambda function, ImageCompressorLambda. Within our function we will be using a Node.js runtime, and I have pointed the Lambda handler towards the src/index.handler location. We are passing in a couple of environment variables in the Environment section referencing both of our S3 buckets previously defined, to make life easier when building out our Lambda function logic. I have also attached a couple of the SAM helper policies under the Policies block, giving the Lambda function the appropriate permissions to read data from the Uncompressed image bucket, and write data to the Compressed image bucket.
Lastly, we can configure our event trigger for our Lambda function. The event structure used in this template is set to fire any time an object is created within our Uncompressed S3 bucket. If you like, you can add additional rules and logic here to only fire events for certain file types, or object key prefixes/suffixes. But again, in the name of simplicity for a demo, I've left this to handle all files, at any path.
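If you did want to filter, a small guard at the top of the Lambda handler is one low-effort option. This is just a sketch (the extension list is an assumption), and you could equally push the filtering into the S3 event notification rules instead:

// Hypothetical guard: only process common image extensions (list is illustrative)
const IMAGE_EXTENSIONS = ['.jpg', '.jpeg', '.png', '.webp'];

const isImage = (key) =>
    IMAGE_EXTENSIONS.some((ext) => key.toLowerCase().endsWith(ext));

// Then, at the top of the handler, before downloading anything:
// if (!isImage(key)) {
//     console.log(`Skipping non-image object: ${key}`);
//     return;
// }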
3. Add Sharp as a dependency to Lambda
To do the heavy lifting of image compression and manipulation, we will be using the Node.js Sharp package. This is one mighty powerful library, and we will only be using a tiny part of it to shrink our image sizes. But I encourage you to explore its documentation and see all the possibilities on offer.
To set up our Lambda function, we first need to add sharp as a dependency. Looking at the documentation provided by the Sharp team, we can see that in order to run Sharp on AWS Lambda, we need to make sure the binaries present within our node_modules are targeted for a Linux x64 platform; depending on which OS we install the package from, incompatible binaries may be loaded. So to install sharp for our Lambda, we can run the following from our project directory.
# windows users
rmdir /s /q node_modules/sharp
npm install --arch=x64 --platform=linux sharp
# mac users
rm -rf node_modules/sharp
SHARP_IGNORE_GLOBAL_LIBVIPS=1 npm install --arch=x64 --platform=linux sharp
In short - this will hard remove Sharp from our node_modules if it exists, and provide an install dedicated to Linux x64 systems, best suited for AWS Lambda.
4. Setup the Lambda logic
With sharp now installed, we can configure our Lambda logic. Back in the template.yaml we defined earlier, we specified the Lambda handler to exist at src/index.handler. So within our project's src folder, let's create an index.js file. Then we can use the following code snippet to build out our function logic.
// src/index.js

const AWS = require('aws-sdk');
const S3 = new AWS.S3();
const sharp = require('sharp');

exports.handler = async (event) => {
    // Collect the object key from the S3 event record
    const { key } = event.Records[0].s3.object;

    console.log({ triggerObject: key });

    // Collect the full resolution image from S3 using the object key
    const uncompressedImage = await S3.getObject({
        Bucket: process.env.UNCOMPRESSED_BUCKET,
        Key: key,
    }).promise();

    // Compress the image to a 200x200 avatar square as a buffer, without stretching
    const compressedImageBuffer = await sharp(uncompressedImage.Body)
        .resize({
            width: 200,
            height: 200,
            fit: 'cover'
        })
        .toBuffer();

    // Upload the compressed image buffer to the Compressed Images bucket
    await S3.putObject({
        Bucket: process.env.COMPRESSED_BUCKET,
        Key: key,
        Body: compressedImageBuffer,
        ContentType: "image"
    }).promise();

    console.log(`Compressing ${key} complete!`)
}
Stepping through the pieces, we first require the aws-sdk package, create an S3 client, and require sharp. We also define our general Lambda handler function, passing in the event to operate with.
// <rootDir>/src/index.js

const AWS = require('aws-sdk');
const S3 = new AWS.S3();
const sharp = require('sharp');

exports.handler = async (event) => {
    ...
}
Next, we can extract the image object key from the event that triggered the Lambda's execution.
// <rootDir>/src/index.js
const { key } = event.Records[0].s3.object;
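One small gotcha worth knowing about: S3 URL-encodes object keys in event records, so a key containing spaces or special characters will arrive encoded (spaces show up as +). If your uploads can have those kinds of names, decoding the key first is a sensible safeguard. A minimal sketch of how you might collect the key instead:

// <rootDir>/src/index.js
// Sketch: decode the S3 event key before using it (spaces arrive as '+', other characters URL-encoded)
const rawKey = event.Records[0].s3.object.key;
const key = decodeURIComponent(rawKey.replace(/\+/g, ' '));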
Using the AWS S3 SDK, we can then download the image to our Lambda using the key previously collected. Note that because we defined our environment variables back in our template.yaml for our Lambda function, we can use process.env.UNCOMPRESSED_BUCKET to reference our Uncompressed bucket name.
// <rootDir>/src/index.js

const uncompressedImage = await S3.getObject({
    Bucket: process.env.UNCOMPRESSED_BUCKET,
    Key: key,
}).promise();
Now, with the result of our downloaded image, we can pass the buffer data into sharp. Again, we are only making a very simple change here. We are shrinking the source image down to a 200x200 square, without stretching the image, to make a nice web friendly avatar. You could do a lot more here, like changing the compression level or file type, but for this example we're keeping it nice and simple.
// <rootDir>/src/index.js

const compressedImageBuffer = await sharp(uncompressedImage.Body)
    .resize({
        width: 200,
        height: 200,
        fit: 'cover'
    })
    .toBuffer();
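For example, if you also wanted to force a consistent output format and compression level, Sharp lets you chain a format call onto the resize. The JPEG output at 80% quality below is just an arbitrary choice for illustration:

// Sketch: resize and also convert to JPEG at ~80% quality (values are illustrative)
const compressedImageBuffer = await sharp(uncompressedImage.Body)
    .resize({
        width: 200,
        height: 200,
        fit: 'cover'
    })
    .jpeg({ quality: 80 })
    .toBuffer();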
Then with the transformed image from sharp, we can take the response buffer and save it into our Compressed bucket. Because we are uploading this into our second bucket, I'm simply using the exact same key to save the file in the same relative location. So no need to worry about overwriting the original here.
// <rootDir>/src/index.js

await S3.putObject({
    Bucket: process.env.COMPRESSED_BUCKET,
    Key: key,
    Body: compressedImageBuffer,
    ContentType: "image"
}).promise();
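One optional refinement here: the getObject response includes the source object's ContentType, so you could pass that through instead of the generic "image" value, which helps browsers serve the compressed file with the correct MIME type:

// Sketch: reuse the source object's content type when saving the compressed copy
await S3.putObject({
    Bucket: process.env.COMPRESSED_BUCKET,
    Key: key,
    Body: compressedImageBuffer,
    ContentType: uncompressedImage.ContentType,
}).promise();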
With all the pieces put together, it's time to build and deploy our pipeline!
5. Build and Deploy
To build the project from the command line run
sam build --use-container
This will check your template.yaml is valid, and prepare the Lambda function assets ready for uploading.
Once that is complete we can then run the following to push our build up to AWS.
sam deploy --guided
Stepping through the guided deployment options, we are given some options to specify our application stack name, region, and the parameters we defined within our template.yaml.
Setting default arguments for 'sam deploy'
=========================================
Stack Name [<your-stack-name>]:
AWS Region [<your-aws-region>]:
Parameter UncompressedBucketName []:
Parameter CompressedBucketName []:
If all has gone to plan, you should be able to log into your console and see the two new buckets have been created, and your lambda function is ready to start crushing those image sizes!
6. Test it out
The easiest way to test out our new image compression pipeline is to simply log into the AWS Console and upload an image file into your Uncompressed bucket. This will fire off the notification event to our Lambda function to compress the image, and if all has gone to plan, you should be able to check your Compressed bucket and see that your compressed file has been created.
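If you'd rather test from a script than the console, a tiny helper using the same aws-sdk works just as well. The bucket name and file path below are placeholders you'd swap for your own:

// test-upload.js - hypothetical helper to drop a test image into the Uncompressed bucket
const fs = require('fs');
const AWS = require('aws-sdk');
const S3 = new AWS.S3();

S3.putObject({
    Bucket: 'your-uncompressed-bucket-name', // placeholder: the name you deployed with
    Key: 'test-avatar.jpg',
    Body: fs.readFileSync('./test-avatar.jpg'),
    ContentType: 'image/jpeg',
}).promise()
    .then(() => console.log('Test image uploaded - check the Compressed bucket shortly'))
    .catch(console.error);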
From a quick test I ran, we can see that after uploading a 3MB full size image, we were able to shrink this down to just under 10KB. Awesome!
Recap
So going back to our application example. If we were lucky enough to have 10,000 daily active users hitting our awesome application, which is now supported by a nice image compression and optimization pipeline, we would still have a solid 40GB of full resolution pictures uploaded by our user base. But by shrinking and compressing the images down to a more reasonable 10KB or smaller, we dramatically stem our data out charges, dropping our data out rate from a potential 40GB per day to around 100MB per day. That's roughly a 400x reduction in data out! So I think it's fair to say, of course size matters!
Cover Photo by Galen Crout on Unsplash
Top comments (14)
If you don't need to store full quality images in S3, another option is to create a custom Sharp script to run during the image upload phase using their SDK. This way you could avoid Lambda function costs as well 😉

Uploading images during the processing step might slow down your application, as compression is CPU intensive and would likely block your threads, leading to less throughput if you had a lot of requests coming in at the same time.
Do you have any clue how we can do that?
This is a very nice and very detailed guide. I had to create something similar for my company's website a while ago and I wish I had something clear like this as a starting point.
My company also wanted to be able to responsively display different image sizes depending on the device's resolution (via srcset). We also wanted the ability to crop and rotate images while still maintaining the original image.
Originally I was doing something similar to this, except I was creating 4 differently sized images all at once, but that didn't give us enough flexibility as well as made changing the rotation/cropping later on trickier as we didn't want to upload another image to get the lambda trigger to activate again.
So I was inspired by cloudinary and imgix in particular to create something that would apply transformations to images on-the-fly depending on the URL.
It basically boiled down to using Cloudfront CDN as an endpoint that would call a Lambda@Edge Origin Response trigger which would in turn grab the image from S3, modify it with Sharp, and then store the new cropped image back into S3 [optional].
For example, imgs.mysite.com/test.jpg is the original raw image, and imgs.mysite.com/c/0,0,100,100/q/80/r/90/test.jpg would crop it to 100x100, set the quality to 80%, and rotate it 90 degrees. All we do is store the original image name in our database, as well as the crop and rotation boundaries we get from an image uploading component, and then it's easy to reconstruct the appropriate URLs client-side.

Thanks Khauri!
That's a really neat process, and absolutely the next logical step for working with a much wider range of images than just simple avatar pics. I'm assuming this is something similar to what Next.js is currently doing in the background with their image optimization process, but it sounds like your setup may offer a lot more flexibility with cropping and rotation options as well. That's pretty awesome!
It's a really nice article. Got me up and running. But after I made changes to the script, how do I re-deploy? It shows an error that it already exists. So how do I deploy only the updated code?
Hi Sayantan,
Hard to say what the actual problem might be without seeing all the error logs, but typically when redeploying CloudFormation assets there are a couple of issues that can trip it up. The most common would be if one of the resources created by the CloudFormation stack has been manually modified in the AWS console. For example, if you create the S3 buckets using AWS SAM, then edit the bucket in S3, this can potentially break the logical IDs used between SAM/CloudFormation and the actual resource. So when you try to update, CloudFormation is unsure of how to resolve the changes. If this is the case, and you're not yet running a production workload, it's best to delete the modified resource and re-deploy again.
Hope this helps.
Thanks for your response. I was also doing the same. But even this way, every time I need to delete the CloudFormation stack, buckets, etc. All I need is to re-deploy the lambda; the buckets and other things are fine already. Could you please point me in the right direction? I'm actually new to Lambda. I created my first ever Lambda with the help of your article. Thanks for that.
Yeah it definitely shouldn’t require everything to be manually deleted each time. I’d take a look at the aws SAM docs and walk through some of the examples there to see what might be going wrong.
docs.aws.amazon.com/serverless-app...
docs.aws.amazon.com/serverless-app...
Thanks a lot. 👍👍
Thanks for the awesome article! Did your final images drop noticeably in quality using this method? I have a similar pipeline and my images look noticeably worse when the original image is from a high-end mobile device with a powerful camera. I've tried using max quality for sharp but no luck. Any ideas? Thanks!
Thanks Uche,
No significant quality loss on my end. The size of the image you are setting may come into play, as well as quality settings. E.g. compressing an image down to 200px width and height but displaying it at 800px will always look like poor quality.
Nice article! I'm assuming the output bucket stores the image under the same key as was provided to the input bucket, yeah? So for example if I uploaded a profile picture, I can save the output_bucket/key combo to my profile picture field in the database. Is that how you're approaching this here from the frontend's perspective? Obviously the frontend is clueless as to when the transformation is finished, which is why I'm asking.
Thanks! Yep, spot on. The image uses the same key in both buckets. As for the front end, there’s a couple of options you could do. Update your db record with the compressed bucket / key path immediately, but also keep a cached version of the uploaded file in the browser session after the user uploads. This way you can use the original image straight away, and wait for the compression to complete a second or two later. Alternatively, you could have a fallback strategy when loading the image - e.g try loading the compressed bucket key path and if that fails load in the source bucket key path.
Depending on the size of the images you are working with, the actual compression/conversion time is pretty damn quick, and you could bump up your lambda memory allocation to try and process things a little faster. So it’s possible that you could just use the compressed bucket key path directly without skipping a beat.