At this point most AWS professionals are familiar with the various options for storing data in S3, but there is one point of confusion that can be expensive. At a previous company, that confusion led to spending three million dollars a year that did not need to be spent.
Part of the confusion is understandable, as it seems that AWS tweaks the S3 storage options at every re:Invent. Another part, however, comes from the instinct that if an answer sounds too good to be true, it must be false.
The key distinction lies in a subtlety of Intelligent-Tiering. We tend to be familiar with the "normal" storage classes: Standard Access (SA), Infrequent Access (IA), and the various flavors of "Glacier". We know the mantra: if you know you will rarely access an object, and can tolerate some delay when you do, you can save money.
What is often missed, however, is that there is a per-GB charge to retrieve objects from Infrequent Access. There you are, happily saving roughly half your storage cost after moving your objects to IA, when someone accesses all that data, or worse yet, accesses it multiple times (think data science jobs). Suddenly retrieval charges hit you for 100% or 200% of what you were paying for storage. The giant surprise bill sours you (and your organization) on IA, and it becomes a forbidden topic.
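A quick back-of-the-envelope sketch makes the trap concrete. The prices below are assumptions (roughly us-east-1 list prices; check your region's current pricing), and the dataset size is made up for illustration:

```python
# Illustrative math only; all prices are assumptions, not quotes.
STANDARD_GB_MONTH = 0.023   # S3 Standard storage, $/GB-month
IA_GB_MONTH = 0.0125        # S3 Standard-IA storage, $/GB-month
IA_RETRIEVAL_GB = 0.01      # S3 Standard-IA retrieval fee, $/GB

data_gb = 500_000  # a hypothetical 500 TB sitting in the bucket

monthly_saving = data_gb * (STANDARD_GB_MONTH - IA_GB_MONTH)
print(f"Monthly storage saving in IA:        ${monthly_saving:,.0f}")

# One data-science job that reads the whole dataset twice:
retrieval_cost = 2 * data_gb * IA_RETRIEVAL_GB
print(f"Retrieval fees for reading it twice: ${retrieval_cost:,.0f}")
```

With these numbers the savings are about $5,250 a month, and a single job that scans the data twice generates a $10,000 retrieval bill, wiping out nearly two months of savings in one shot.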
This is where the subtlety of Intelligent-Tiering (IT) comes in. Objects in the IT storage class are moved between tiers that mirror the "normal" storage classes based on usage. When an object goes unused it moves 'down' to cheaper tiers, and when it is accessed it moves back up to the Standard Access tier. The key point is that there is no retrieval charge for these movements. So if an object in IT's Infrequent Access tier is accessed and moved back up to SA, you do not pay the fee you would have paid had the object been in "normal" IA.
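Getting objects into IT requires nothing exotic; you simply name the storage class at write time. A minimal sketch with boto3 (the bucket and key names are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Write a new object directly into Intelligent-Tiering.
# "my-bucket" and the key are placeholders for illustration.
s3.put_object(
    Bucket="my-bucket",
    Key="logs/2024/app.log",
    Body=b"example payload",
    StorageClass="INTELLIGENT_TIERING",
)
```

Reading the object back is an ordinary GET; no restore call is needed for the default tiers.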
So, what's the catch? Well, there is a monitoring charge per object in IT. It is not size-based but simply a charge per object. So if all of your objects were accessed all of the time, IT would save nothing (the objects would live in IT's Standard Access tier) and you would in fact pay the additional monitoring charge on top. If your objects are all under constant use, IT is not for you.
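You can estimate where the break-even lies. Again the prices are assumptions, but the shape of the math holds:

```python
# When does the per-object monitoring fee eat the tiering savings?
# Prices are assumptions (roughly us-east-1 list prices).
MONITORING_PER_OBJ = 0.0025 / 1000  # $/object-month
STANDARD_GB_MONTH = 0.023
IA_GB_MONTH = 0.0125

# Best case: the object spends the whole month parked in the IA tier.
best_saving_per_gb = STANDARD_GB_MONTH - IA_GB_MONTH

breakeven_gb = MONITORING_PER_OBJ / best_saving_per_gb
print(f"Break-even object size: ~{breakeven_gb * 1e6:.0f} KB")
# ~238 KB: below that, monitoring costs more than IA can ever save.
```

This is also why AWS excludes objects under 128 KB from auto-tiering altogether: at that size the monitoring math can never work out.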
One middle ground that some companies embrace is to place only their largest objects in IT. Large objects are by definition the most expensive ones, so they gain the most from IT's cost savings while minimizing the relative effect of the monitoring charge. These companies use S3 Lifecycle rules to transition objects over a certain size to IT.
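A lifecycle rule of that sort might look like the following sketch; the bucket name and the 1 MB threshold are placeholders, and size filters are expressed in bytes:

```python
import boto3

s3 = boto3.client("s3")

# Move objects larger than ~1 MB into Intelligent-Tiering; smaller
# objects stay in Standard and never incur the monitoring fee.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "large-objects-to-intelligent-tiering",
                "Status": "Enabled",
                "Filter": {"ObjectSizeGreaterThan": 1_048_576},  # bytes
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    },
)
```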
Another fear factor for companies is retrieval time. When Glacier was first introduced, it was understood that retrieval from Glacier would be not only costly but also slow. As more storage options were introduced, each came with its own retrieval characteristics. Glacier Deep Archive is the slowest and least expensive option (until you retrieve from it) because it is essentially a tape archive. On the other hand, Glacier Instant Retrieval sounds like a self-contradiction: it's Glacier, so it's "slow", but it's instant retrieval, so it's "fast"?
If you leave Intelligent-Tiering in its default configuration, objects move only among the tiers corresponding to Standard Access, Infrequent Access, and the strangely named Glacier Instant Retrieval, and you get the performance characteristics of normal Standard Access.
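The slower asynchronous archive tiers are strictly opt-in. You only get them if you explicitly attach a configuration like the sketch below; if you never make this call, objects stay in the millisecond-latency tiers (the bucket name, configuration id, and day counts are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Opting IN to the asynchronous archive tiers. Skipping this call
# entirely keeps every object in the millisecond-latency tiers.
s3.put_bucket_intelligent_tiering_configuration(
    Bucket="my-bucket",            # placeholder
    Id="opt-in-archive-tiers",
    IntelligentTieringConfiguration={
        "Id": "opt-in-archive-tiers",
        "Status": "Enabled",
        "Tierings": [
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)
```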
Some companies decide this is too good to be true and assume it's false. Other companies run performance tests of their own. Those tests confirm that the top three tiers of IT have the same time-to-first-byte as Standard Access, to within about 3%.
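Such a test is easy to run yourself. A minimal sketch that compares time-to-first-byte for an object in each class (bucket and keys are placeholders, and you would want far more iterations for a meaningful comparison):

```python
import time
import boto3

s3 = boto3.client("s3")

def time_to_first_byte(bucket: str, key: str) -> float:
    """Seconds from issuing the GET to reading the first byte."""
    start = time.perf_counter()
    resp = s3.get_object(Bucket=bucket, Key=key)
    resp["Body"].read(1)
    return time.perf_counter() - start

# One object stored in Standard, one in Intelligent-Tiering.
for key in ("standard/sample.bin", "intelligent/sample.bin"):
    samples = sorted(time_to_first_byte("my-bucket", key) for _ in range(20))
    print(f"{key}: median TTFB {samples[len(samples) // 2] * 1000:.1f} ms")
```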
So, the bottom line: you can save up to half of your S3 storage costs with no change to your overall system performance. Once you see it in that light, the choice seems clear.