Renato Losio 💭💥 for AWS Heroes

Originally published at cloudiamo.com

S3 Lifecycle or Intelligent-Tiering? Object Size Always Matters

As a cloud architect and storage expert, I often hear the question: Which storage class should I use on Amazon S3? The answer is simple: as always, it depends. Today, we’ll explore some anti-patterns you might overlook when transitioning data between storage classes, whether using lifecycle rules or a managed class like S3 Intelligent-Tiering.

A Short Journey Back in Time

Amazon S3 was announced almost 19 years ago. It’s one of the oldest services on the platform, second only to Amazon SQS. What we see today as Amazon S3 vastly differs from what was available many years ago.

Even just a few years ago, there was no S3 Storage Lens, no conditional writes (introduced earlier this year), and no S3 Intelligent-Tiering storage class. There were no S3 triggers and no replication across buckets or regions. When I started working with S3 (around 2010), it was a fairly limited service but, in some ways, simpler than it is today.

At the same time, we tend to forget how expensive it was. As part of the announcement 19 years ago, Jeff Barr wrote:

“1 GB of data for 1 month costs just 15 cents.”

The same storage class today costs just 2.3 cents (or less) per GB in the US East region. With additional storage classes designed to optimize costs, it can go below 0.1 cents per GB.

| Storage Class | Storage Price per GB |
| --- | --- |
| Standard | 2.3 cents (or less) |
| Standard Infrequent Access | 1.25 cents |
| Glacier Instant Retrieval | 0.4 cents |
| Glacier Deep Archive | Less than 0.1 cents |

AWS keeps iterating, adding new features and options. As a cloud architect, your job is to stay up to date. If you deployed and optimized costs on S3 five years ago, it’s time to revisit your setup. There’s a good chance your deployment is no longer optimized, and you might be leaving money on the table. Keep iterating!

How to Choose a Storage Class

Choosing the right storage class depends on the scenario, use case, and associated costs. The price isn’t just about storage. Costs also depend on how you access your data, how you upload it, and how you move it around. So, deciding on the best storage class isn’t always straightforward.

Amazon S3 Storage Classes

Hot storage refers to data that is frequently accessed and requires fast, consistent response times. On the other hand, cold storage is designed for data accessed infrequently and without the urgency required for hot data. The challenge? The same data often transitions from hot to cold over time.

Different classes come with varying retrieval costs, latency, availability, and minimum time commitments. Some classes store every copy of the data in a single Availability Zone, while others replicate data across three different ones. Different requirements mean different use cases. So, the big question is: how do you choose your storage class?

There are some common patterns. Typically, new data is “warmer.” Over time, the likelihood of accessing that data decreases. This isn’t unique to AWS or object storage. For example, Dropbox has shown that even in backup solutions, most retrieved data is newly uploaded:

“As we have observed that 90% of retrievals are for data uploaded in the last year and 80% of retrievals happen within the first 100 days”

InfoQ - Dropbox retrievals

Ideally, you want recent data in hotter storage classes and older data in colder ones.

Lifecycle Rules Help

How can you move data across storage classes? On S3, the easiest way is with S3 Lifecycle. These rules are incredibly powerful—you can decide which storage class to use, the size of objects to move, and when to move them. For example, I can move my data to a cheaper storage class after 45 days because I know I will not need to access it anymore. I can also specify that only large files are moved to colder storage classes, while smaller files (like thumbnails) stay in the Standard class to avoid a higher chance of retrieval fees.
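
If you manage these rules in code rather than in the console, the size filter is a single field. Below is a minimal boto3 sketch (the bucket name, size threshold, and transition age are placeholders, not values from this article): it transitions only objects larger than 1 MB to Glacier Instant Retrieval after 45 days, leaving smaller objects in Standard.

```python
import boto3

s3 = boto3.client("s3")

# Move only objects larger than 1 MB to Glacier Instant Retrieval after 45 days;
# smaller objects (thumbnails, manifests, ...) stay in Standard.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "large-objects-to-glacier-ir",
                "Status": "Enabled",
                "Filter": {"ObjectSizeGreaterThan": 1_048_576},  # bytes
                "Transitions": [{"Days": 45, "StorageClass": "GLACIER_IR"}],
            }
        ]
    },
)
```

A companion rule with ObjectSizeLessThan can make the "small objects stay put" intent explicit, but simply not transitioning them has the same effect.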

Why do I focus on size? Object size matters for lifecycle rules and, as we will see shortly, for S3 Intelligent-Tiering as well. You pay for operations (transitions) by the number of objects, while storage is charged by size. What does that mean in practice? Let’s break it down with a simple example.

Imagine I have 1 PB of data to move from Standard Infrequent Access (Standard-IA) to Glacier Instant Retrieval (Glacier-IR). How much does the transition cost? How does it impact my overall S3 costs? And how long does it take to recoup my transition expenses?

| | 1 GB objects | 1 MB objects | 200 KB objects |
| --- | --- | --- | --- |
| Transitions to Glacier-IR | 1M | 1,000M | 5,000M |
| Cost of transitions (one-off) | 20 USD | 20,000 USD | 100,000 USD |
| Storage saving (monthly) | 17,000 USD | 17,000 USD | 17,000 USD |
| Recoup time | < 1 day | 1+ month | 6 months |

If my 1 PB consists of 1 GB objects, I’ll pay about $20 for the transition, while the monthly storage saving would be approximately $17,000. However, as object sizes decrease, transition costs become far less trivial. You might still choose to move the data, but it’s important to understand your upfront costs.
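
To make the arithmetic behind the table explicit, here is a back-of-the-envelope sketch. The roughly $0.02 per 1,000 lifecycle transitions and the $17,000 monthly saving are the figures implied by the table above; plug in your own region's prices and savings.

```python
# Back-of-the-envelope numbers for moving 1 PB from Standard-IA to Glacier-IR.
TOTAL_GB = 1_000_000            # ~1 PB
TRANSITION_PER_1000 = 0.02      # USD per 1,000 lifecycle transition requests
MONTHLY_SAVING = 17_000         # USD per month, the figure from the table above

for label, object_size_gb in [("1 GB", 1), ("1 MB", 0.001), ("200 KB", 0.0002)]:
    objects = TOTAL_GB / object_size_gb
    one_off = objects / 1_000 * TRANSITION_PER_1000
    months_to_recoup = one_off / MONTHLY_SAVING
    print(f"{label} objects: {objects:,.0f} transitions, "
          f"{one_off:,.0f} USD one-off, ~{months_to_recoup:.1f} months to recoup")
```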

The best way to estimate these costs is to use S3 Storage Lens. Ideally, you want most small objects in the Standard class and larger objects in colder classes. Below is an example of a distribution for an almost 20 PB bucket.

S3 Storage Lens
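
Storage Lens (or an S3 Inventory report) is the right tool at this scale. Purely as an illustration, for a small bucket you could approximate the size distribution with a quick listing like the sketch below; the bucket name and the 1 MB threshold are placeholders, and this is my own shortcut, not what the dashboard above relies on.

```python
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Count objects above and below a 1 MB threshold (placeholder bucket name).
small, large = 0, 0
for page in paginator.paginate(Bucket="my-bucket"):
    for obj in page.get("Contents", []):
        if obj["Size"] < 1_048_576:
            small += 1
        else:
            large += 1

print(f"objects < 1 MB: {small:,}  |  objects >= 1 MB: {large:,}")
```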

A more detailed example was shared by Canva last year on the AWS Storage Blog: How Canva saves over $3 million annually in Amazon S3 costs

What about S3 Intelligent-Tiering?

Why develop your own logic when Amazon provides a managed class that handles this for you, without charging transition or retrieval fees? S3 Intelligent-Tiering is indeed the ideal choice for most use cases and the default choice for many workloads.

Amazon S3 Intelligent-Tiering

However, it’s still important to understand how the S3 Intelligent-Tiering class works under the hood and why object size remains a key factor. Let’s start with the AWS documentation:

“For a **small monthly object monitoring and automation charge** S3 Intelligent-Tiering moves objects that have not been accessed for 30 consecutive days to the Infrequent Access tier for savings of 40%; and after 90 days of no access they’re moved to the Archive Instant Access tier with savings of 68%. If the objects are accessed later S3 Intelligent-Tiering **moves the objects back** to the Frequent Access tier.”

The bold emphasis is mine. Now, how much is the “small monthly object monitoring and automation charge”? What are the implications of moving objects back? Let’s revisit the example of 1 PB of data. How much would this small management fee cost?

| | 1 GB objects | 1 MB objects | 200 KB objects |
| --- | --- | --- | --- |
| Number of objects | 1M | 1,000M | 5,000M |
| Monitoring fee (monthly) | 2.5 USD | 2,500 USD | 12,500 USD |

Once more, the answers depend on the average size of your objects. The management fee might be negligible, or it might cost you more each month than the storage itself.
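
The figures above correspond to a monitoring and automation charge of roughly $0.0025 per 1,000 objects per month, which is easy to sanity-check:

```python
# Monthly Intelligent-Tiering monitoring fee for the same 1 PB of data.
MONITORING_PER_1000 = 0.0025    # USD per 1,000 monitored objects per month

for label, objects in [("1 GB", 1_000_000), ("1 MB", 1_000_000_000), ("200 KB", 5_000_000_000)]:
    fee = objects / 1_000 * MONITORING_PER_1000
    print(f"{label} objects: {fee:,.1f} USD/month just for monitoring")
```

Note that objects smaller than 128 KB are never monitored (and never auto-tiered), which is yet another way object size changes the picture.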

With S3 Intelligent-Tiering, there are no retrieval fees, but the object is moved back to the most expensive Frequent Access Tier (Standard class) for 30 days and then stays for another 60 days in the Infrequent Access tier. Retrieving one object means you’re not paying a retrieval fee, but you are paying more for the storage of the object for the next 90 days. How much is that extra overhead?

| | 1 GB objects | 1 MB objects | 200 KB objects |
| --- | --- | --- | --- |
| Intelligent-Tiering | 48.5 USD | 48.5 USD | 48.5 USD |
| Glacier-IR (once) | 30 USD | 30 USD | 30 USD |
| Glacier-IR (10x) | 300 USD | 300 USD | 300 USD |

Here, the key metric is the number of times you expect to retrieve that specific (colder) object after the first retrieval. Should you not use the managed storage class? Far from it. S3 Intelligent-Tiering is an amazing option, but—as with any managed service—it’s critical to understand the cost implications. Your retrieval patterns and the average storage size of your objects matter.
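
As a rough sketch of where numbers like those in the table can come from, assume about 1 TB of cold data is retrieved and take approximate us-east-1 list prices; the per-GB prices below are my assumptions, not figures from the table.

```python
# Retrieving ~1 TB of cold data: Intelligent-Tiering storage overhead vs
# Glacier Instant Retrieval fees. Prices are approximate us-east-1 list prices.
GB = 1_000                       # ~1 TB retrieved

# Intelligent-Tiering: no retrieval fee, but the data spends 30 days back in
# Frequent Access (~0.023 USD/GB-month) and 60 more days in Infrequent Access
# (~0.0125 USD/GB-month) before it can return to Archive Instant Access.
it_overhead = GB * 0.023 * 1 + GB * 0.0125 * 2

# Glacier Instant Retrieval: ~0.03 USD/GB retrieval fee, paid on every access.
gir_once = GB * 0.03
gir_ten = gir_once * 10

print(f"Intelligent-Tiering warm-tier overhead: ~{it_overhead:.0f} USD")
print(f"Glacier-IR retrieved once: ~{gir_once:.0f} USD, ten times: ~{gir_ten:.0f} USD")
```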

Whether you develop your own logic with Lifecycle Rules or delegate it to the S3 Intelligent-Tiering class, the average size of your objects on S3 is a significant factor in your storage costs. Move larger objects first; that’s where most of the storage saving lies.
