Andrew Betts for Fastly

Is purging still the hardest problem in computer science?

One of the most common reasons customers send support tickets to Fastly is for help with purging content from cache - either it stays too long, or doesn't stay long enough. We have the best purging mechanism of any edge network, so what's going on?

It's said there are only two hard problems in computer science: cache invalidation, naming things, and off-by-one errors. Caching stuff close to where your users are is just one of many things you can do with Fastly these days, but it's still one of the easiest and most effective ways to make your website faster, more reliable and more engaging. Traditionally, you just set a TTL (time to live) and let the cache hold onto the content for that amount of time.

However, while serving outdated content might have been acceptable in the past, it just doesn't cut the mustard today. News editors need stories to be up to date. Prices in stores must be correct and consistent. Application versions must be compatible. Purging the right things quickly is essential.

The three types of purge

How do you purge the right set of things? We offer three flavors of purging:

  • Single item / URL purge
  • Purge all
  • Surrogate key

If you just want to purge a single item by its URL, that's a URL purge, and on Fastly you do it by sending an HTTP PURGE request to the very URL you want to purge:

curl -X PURGE "https://www.example.com/hot-deals"

URL purges are simple and easy, but affecting just a single URL doesn't scale well, and if that URL is ever requested with a query string like ?page=2, that's a different URL! Sorry, you will have to purge that and every other variant as well.
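
For example, cleaning up the first few paginated variants of that page would mean one PURGE request per exact URL (the query strings here are just illustrative):

curl -X PURGE "https://www.example.com/hot-deals?page=2"
curl -X PURGE "https://www.example.com/hot-deals?page=3"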

The natural complement to the single purge, then, is the purge all. Very much for the "nuke it from orbit" crowd, this blunt instrument has no equal. A quick call to a fixed API endpoint and your entire service's cache is wiped:

curl -X POST "https://api.fastly.com/service/{SERVICE_ID}/purge_all" -H "Fastly-Key: {YOUR_FASTLY_API_TOKEN}"

Most cases are more nuanced than either of these scenarios. Purge anything that mentions product 36253; purge all images; purge all URLs under the /products/shirts prefix; purge all the 720p variants from all season 2 episodes of show 414562; purge all the resources associated with the "about us" page… the list of possible purge criteria is endless. For all your precision purging needs, look no further than surrogate keys, which let you apply tags to content and then purge everything that has a particular tag.

For example, you could serve a response from your origin server with the following header listing out all the tags you want to put on that response:

Surrogate-Key: html product-123 product-456 region-eu /products /products/shirts /products/shirts/123

Then, later, send a purge for one of the tags you put on that response:

curl -X POST "https://api.fastly.com/service/{SERVICE_ID}/purge/product-123" -H "Fastly-Key: {YOUR_FASTLY_API_TOKEN}"

This will surgically remove all the cached content that has that tag, without affecting the rest of your cache.
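
If you need to invalidate several tags at once, you don't have to loop over the single-key endpoint: the purge API also has a batch form that accepts the keys in a Surrogate-Key header on the request. A sketch (check the API reference for the current limit on how many keys fit in one call):

curl -X POST "https://api.fastly.com/service/{SERVICE_ID}/purge" -H "Fastly-Key: {YOUR_FASTLY_API_TOKEN}" -H "Surrogate-Key: product-123 product-456"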

The fastest purge... in the world.

Fastly purges your stuff really fast. When you send us a URL or surrogate key purge, it is received by the Fastly POP (point-of-presence) closest to you, and from there it is replicated rapidly and efficiently to every other POP in the world. Copies of your objects will start to purge within 5ms and every Fastly POP in the world should be done in around 150ms. It's almost always complete within 250ms.

That's the fastest purge of any edge network or CDN (by quite a margin). The laws of physics give us a theoretical fastest purge time of about 65ms, the time it takes light to travel (one way) from one side of the planet to the other, but we also need to account for some processing time, buffering and retransmit delays, and the fact that cables do not go in completely straight lines, so 150ms is about the fastest you can ever expect a global purge to be.

(Note that "purge all" invokes a completely different mechanism and takes up to a minute to complete.)

Cache more stuff!

When you can purge things in 150ms, you can cache more, and not worry about it. Historically it was risky to cache content on a CDN if it might change, or if it contained anything that was specific to the user browsing the site.

Some things never change

Content that never changes is great for caching, obviously. And you can help create more of that by using immutable URLs for all your assets. These contain a hash of the content of the file, like /scripts/main.142c04bb.min.js, and should be served with a cache policy such as:

Cache-Control: public, max-age=31536000, immutable

Web frameworks like Next.js will usually include this feature, but do check that they set the caching headers correctly!
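
A quick way to verify this is to inspect the response headers of one of your hashed assets (the URL below is the illustrative one from above):

curl -sI "https://www.example.com/scripts/main.142c04bb.min.js" | grep -i "cache-control"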

For content that changes constantly or is otherwise genuinely uncacheable, use:

Cache-Control: private, no-store

But is it uncacheable, really?

Event-driven content

What about content that doesn't change often, but when it does, needs to update immediately? Things like news articles, which typically remain static after publishing but may need an urgent update to correct an error.

We call this event-driven content. For this use case, you'll want to configure caching like this:

Cache-Control: public, max-age=3600
Surrogate-Control: public, max-age=31536000
Surrogate-Key: content article-12345

This tells Fastly to cache the content "forever" (for a year), while the browser should only cache it for a short period (1 hour in this example). We also tag the content with any relevant surrogate key tags that will help you target it when you want to purge. When the article is updated, send a URL purge or a surrogate key purge. If you are using a Fastly plugin in your CMS, it might do this for you automatically!
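
So the update flow is: publish the change at your origin, then fire a purge for the article's tag. Using the article-12345 key from the example above, that would look like:

curl -X POST "https://api.fastly.com/service/{SERVICE_ID}/purge/article-12345" -H "Fastly-Key: {YOUR_FASTLY_API_TOKEN}"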

Private or authenticated content

What if the content changes when the user logs in, or is subtly different depending on the user's privileges, memberships or subscriptions? If there are a lot of possible variations, consider personalizing the content at the edge, but if there are only a few, you can use the Vary header! Start by normalizing the authentication state on the request before it is forwarded to the origin (e.g. by decoding a JSON web token session cookie). You can then add headers to the request such as:

User-role: admin
User-region: europe
User-is-subscriber: true

And then your response content can be marked up with headers such as:

Cache-Control: private, no-store
Surrogate-Control: public, max-age=31536000
Surrogate-Key: content article-12345
Vary: User-is-subscriber, User-region

This works well if the number of permutations of the request headers you are varying on is small. We allow up to 50 variations per object (in each POP) in our VCL services and have slightly different rules for Compute services.
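
To make that concrete, here's a hedged sketch of what Fastly might forward to your origin for two different users of the same page, once the edge has normalized the session cookie into those headers (the /account path is made up for the example). Because the response varies on User-is-subscriber and User-region, each combination gets cached as its own variant:

# subscriber in Europe: populates one cached variant
curl -sI "https://origin.example.com/account" -H "User-is-subscriber: true" -H "User-region: europe"

# non-subscriber in Europe: a separate cached variant of the same URL
curl -sI "https://origin.example.com/account" -H "User-is-subscriber: false" -H "User-region: europe"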

Purge power moves

So you can often cache more than you think you can, and using cache at the edge is one of the most effective ways to improve end-user experiences and reduce costs. Customers often come up with really interesting patterns for using purging - here are some to inspire you!

Soft purge and serving stale

Generally when you purge things you expect them to die immediately, the clue being very much in the word "purge". But there are a couple of scenarios where maybe we can do something better. What about things that are super popular, being requested thousands of times a minute, or even per second? When you purge things like that, it can create a poor experience for not just one user but everyone who wants that resource before we've managed to re-cache it.

Or what about when your origin servers go down? Some system could unhelpfully purge content at a bad moment when the origin is not available to serve a new version.

Both of these scenarios can be improved by supporting stale serving. Add the stale-while-revalidate and stale-if-error directives to your Cache-Control header to activate this behavior:

Cache-Control: public, max-age=31536000, stale-while-revalidate=60, stale-if-error=31536000

Now, when issuing your purge, set the soft purge flag, and instead of removing the object, we'll mark it as stale. This is a real superpower and, frankly, unless you have a good reason not to, you should consider doing this in pretty much every scenario where you purge.
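
Soft purge works with both URL purges and surrogate key purges: add the Fastly-Soft-Purge header to the purge request. For example, a soft purge of the product-123 key from earlier:

curl -X POST "https://api.fastly.com/service/{SERVICE_ID}/purge/product-123" -H "Fastly-Key: {YOUR_FASTLY_API_TOKEN}" -H "Fastly-Soft-Purge: 1"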

Path prefix purging

Want to purge everything under /products/? Use surrogate keys to tag your responses with every path segment prefix:

Surrogate-Key: /products/winter-season/shirts/378245 /products/winter-season/shirts /products/winter-season /products

Now issue a surrogate key purge for "/products" or "/products/winter-season" and we'll delete everything with those prefixes (in 150ms!).
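
Because these keys contain slashes, it can be simplest to send the purge via the Surrogate-Key header on the batch purge endpoint (as sketched earlier), rather than percent-encoding the key into the URL path of the single-key endpoint:

curl -X POST "https://api.fastly.com/service/{SERVICE_ID}/purge" -H "Fastly-Key: {YOUR_FASTLY_API_TOKEN}" -H "Surrogate-Key: /products/winter-season"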

Slow purges

Sometimes you want to purge a lot of content, but you don't want it to all disappear at once, because that will cause unreasonably high load on your origin servers. Yann Harmon from Contentful has a great solution to this, adding a random-number surrogate key tag (from a set of 100) to every response. For example:

Surrogate-Key: purge-group-23

You can now issue purges for all 100 purge group tags (purge-group-1, purge-group-2, purge-group-3, etc.), with whatever pause you want between each one, to draw out the purge over a longer period and give content time to re-cache. It's unusual for customers to want to purge more slowly, but sometimes it's useful!
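
As a sketch, a shell loop with a 30-second pause spreads the purge over roughly 50 minutes (the pause is arbitrary, and combining this with soft purge, described above, is even gentler on your origin):

for i in $(seq 1 100); do
  # purge one group at a time, then give the origin a chance to re-fill the cache
  curl -X POST "https://api.fastly.com/service/{SERVICE_ID}/purge/purge-group-${i}" -H "Fastly-Key: {YOUR_FASTLY_API_TOKEN}" -H "Fastly-Soft-Purge: 1"
  sleep 30
done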

Calculated TTLs at the edge

It's generally best to determine cache policy at the point that content is generated, typically on your origin servers, and write the instructions for Fastly into headers like Cache-Control and Surrogate-Control. But you can also set the TTL of content within the logic of your Fastly service if you like.

This allows you to do almost anything. For example, one customer wanted their cache to reset once a day at a predetermined time, but didn't want to rely on issuing a purge on a schedule, so they wrote logic to adjust the TTL of each response so that every object expires at the next reset point.
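
The underlying arithmetic is simple. For illustration only (that customer did this inside their Fastly service logic, and the 03:00 UTC reset time here is made up), this is what the calculation looks like done origin-side with GNU date, emitting an edge TTL that runs out at the next reset:

# seconds remaining until the next 03:00 UTC reset point
now=$(date -u +%s)
reset=$(date -u -d "03:00" +%s)              # today's reset point (GNU date)
if [ "$now" -ge "$reset" ]; then
  reset=$(date -u -d "03:00 tomorrow" +%s)   # already past it, so use tomorrow's
fi
echo "Surrogate-Control: max-age=$(( reset - now ))"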

Combating race conditions

Sometimes it seems like a purge didn't work, and that's typically because it happened too early. If you purge Fastly before the upstream server has updated, we might just pull the old version of the content and re-cache it. Take care to ensure that your origin is serving the new version before you purge Fastly.

This also often happens when you're using our shielding feature, which puts two layers of Fastly between the end user and your origin. If your purge is processed by other Fastly POPs before it reaches the shield POP, the purged POPs might re-cache from the shield cache. This is typically only a problem with purge all, because it takes significantly longer than a URL purge or a surrogate key purge.

The way a "purge all" works actually provides a solution to this problem: this kind of purge is done by incrementing a cache version number, and you can read the current version number from the configuration of your Fastly service. By comparing the version number of the upstream cache with that of the local cache, it's possible to identify when a purge is still in progress and responses should not yet be cached.

Or you could, you know, just purge twice.

Conclusion

Edge networks can now do incredibly smart stuff, and at Fastly we've spent a lot of time building more and more features you can use to improve the security, performance and scalability of your apps and websites. But caching is still one of the most powerful things you can do, and something you should spend time getting right. It can have dramatic effects on your costs and environmental impact too.

Taking advantage of the speed and precision of Fastly's purging machinery allows you to get the maximum value out of edge caching. Tell us how you're doing it at community.fastly.com.
