I first got into the topic of caching when I was building TV set-top box software with Vue.js. I suppose that when he created Vue.js, Evan You never expected a frontend JavaScript framework to end up in television software :D. OK, back to the topic. Back then, everyone was talking about cache: "Cache this, cache that, but make sure not to cache that one, as it contains sensitive data".
Since then, I have worked with several cache implementations, so I decided to dig into the topic a bit more and present it to you in a clean and interesting form.
What is a Web Cache?
As Wikipedia states:
A Web cache (or HTTP cache) is a system for optimizing the World Wide Web. It is implemented both client-side and server-side. The caching of images and other files can result in less overall delay when browsing the Web.
So in other words, a cache is a system used for decreasing the time needed to view a page (loading all necessary static assets, content requests, and so on).
We can define two main types of cache: forward and reverse.
- Forward - A forward cache is a cache outside the web server's network, e.g. in the client's web browser. A network-aware forward cache only caches heavily accessed items. A proxy server sitting between the client and web server can evaluate HTTP headers and choose whether to store web content.
- Reverse - Sits in front of one or more web servers, accelerating requests from the Internet and reducing peak server load. This is usually a content delivery network (CDN) that retains copies of web content at various points throughout a network.
A CDN allows for the quick transfer of assets needed for loading Internet content, including HTML pages, JavaScript files, stylesheets, images, and videos.
How does the cache work?
Imagine the following request being sent from the frontend to your backend in the form of a method call:
getDataFromDatabase()
-> takes X milliseconds to get the data and return it to the frontend.
If you have several users using your website (and believe me, you do), your server will most probably fail to deliver the data to them after a certain number of requests.
What if we could somehow store the response in another tool so that, instead of running the same method and requesting data from the database over and over again, we could just return the result of a previous request? Wouldn't that be awesome?
Let's take a look at the following pseudocode:
if request in cache {
  // Cache hit: return the stored response
  return cache[request]
} else {
  // Cache miss: fetch the data, store it, then return it
  response = getDataFromDatabase()
  cache[request] = response
  return response
}
This is how a cache actually works. If the response to a given request is already in the cache, it is returned from there instead of requesting the data from the server (and database) again.
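The pseudocode above can be sketched in Python. This is only a minimal illustration, not a production cache: the `cache` dictionary, the `handle_request` function, the `db_calls` counter, and the stubbed `get_data_from_database` are all names assumed for the example.

```python
import time

# Hypothetical in-memory cache: maps a request key to a stored response.
cache = {}

# Counts how often the slow path runs, so the effect of the cache is visible.
db_calls = 0


def get_data_from_database(request):
    """Stand-in for a slow database query (assumed for illustration)."""
    global db_calls
    db_calls += 1
    time.sleep(0.01)  # simulate the "X milliseconds" of work
    return f"data for {request}"


def handle_request(request):
    if request in cache:
        # Cache hit: reuse the stored response.
        return cache[request]
    # Cache miss: do the slow work, store the result, then return it.
    response = get_data_from_database(request)
    cache[request] = response
    return response


first = handle_request("/products")   # miss: queries the "database"
second = handle_request("/products")  # hit: served from the cache
```

After both calls the stub has run only once, which is exactly the saving the cache is meant to provide.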
When to use cache?
Cache is a powerful mechanism and can greatly improve the performance of our page, but it should be used with caution.
We should use cache if:
- The computation behind our requests is quite slow
- The computation will run several times in a row
- The output is the same for a particular input
- The hosting provider charges for database access
*Remember not to use cache for requests/routes/assets that are meant for one specific user. If you cache, e.g., the /get-user-data endpoint, you may end up serving one user's data to other users. Ouch!
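To make the /get-user-data warning concrete, here is a small sketch of how the leak happens. Everything in it is assumed for illustration: the `USERS` data, the function names, and the idea of keying the cache by user as one possible way out (simply not caching such routes is the other).

```python
# Hypothetical per-user payloads standing in for a real user database.
USERS = {
    "alice": {"email": "alice@example.com"},
    "bob": {"email": "bob@example.com"},
}

shared_cache = {}
per_user_cache = {}


def get_user_data_unsafe(path, user):
    # Bug: the cache key ignores the user, so whoever asks first
    # decides what everybody else receives.
    if path not in shared_cache:
        shared_cache[path] = USERS[user]
    return shared_cache[path]


def get_user_data_safe(path, user):
    # One possible fix: make the user part of the cache key.
    key = (path, user)
    if key not in per_user_cache:
        per_user_cache[key] = USERS[user]
    return per_user_cache[key]


get_user_data_unsafe("/get-user-data", "alice")
bob_unsafe = get_user_data_unsafe("/get-user-data", "bob")  # Alice's data!

get_user_data_safe("/get-user-data", "alice")
bob_safe = get_user_data_safe("/get-user-data", "bob")      # Bob's own data
```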
Types of cache
We can define three major types of cache: browser, server, and proxy.
Server
In this case, the caching mechanism is situated on the server in the form of an application or tool. The popular types of server caching software can be:
Browser
In this case, the caching mechanism is situated in the browser and is mainly used for caching resources like images, styles, and other assets.
Proxy
In this case, the caching mechanism is situated in a proxy server or reverse proxy server such as Nginx, Apache, or Varnish, and may be run by a third party such as an ISP (Internet Service Provider).
Benefits of using Cache
Cache is a powerful mechanism that, if used well, can greatly increase the performance of our website by:
- Reducing latency
- Cutting down server bandwidth usage
- Reducing load on the server
HTTP Headers
Each response from the server returns the data together with certain headers. These headers contain instructions telling the browser how to store the response in its cache. There are two main cache headers we should focus on to understand how this works: Expires and Cache-Control.
Expires
The Expires HTTP header contains the date/time after which the response is considered expired.
An invalid expiration date, such as the value 0, represents a date in the past and means that the resource is already expired.
Note: If there is a Cache-Control header with the max-age or s-maxage directive in the response, the Expires header is ignored.
Expires: Wed, 21 Oct 2015 07:28:00 GMT
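As a rough sketch of how a client could evaluate this header, the snippet below parses an Expires value with Python's standard `email.utils` helpers. The `is_expired` function and its "unparseable means expired" handling are my own assumptions for the example, not a full HTTP cache implementation.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def is_expired(expires_header, now):
    """Return True if the Expires date has passed or cannot be parsed."""
    try:
        expires_at = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        # Invalid values such as "0" are treated as already expired.
        return True
    if expires_at.tzinfo is None:
        # Dates without a usable zone are assumed to be UTC here.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now >= expires_at


# Building the header value from a timezone-aware UTC datetime.
expires_value = format_datetime(
    datetime(2015, 10, 21, 7, 28, 0, tzinfo=timezone.utc), usegmt=True
)

before = datetime(2015, 10, 21, 7, 27, 59, tzinfo=timezone.utc)
after = datetime(2015, 10, 21, 7, 28, 1, tzinfo=timezone.utc)
```

With these values, `is_expired(expires_value, before)` is false, while `is_expired(expires_value, after)` and `is_expired("0", before)` are both true.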
Cache-Control
The Cache-Control HTTP header holds directives (instructions) for caching in both requests and responses. If a given directive is in a request, it does not mean this directive is in the response.
- private - only cached in the client
- public - can also be cached by proxies
- no-store - content won't be cached
- no-cache - content can be cached but requires validation from the server before it is used
- max-age - tells the browser to keep the cached copy for a certain number of seconds
For more directives, see the Cache-Control page on MDN Web Docs.
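To get a feel for how these directives are read, here is a simplified parser sketch. `parse_cache_control` and `is_fresh` are hypothetical helpers written for this post; they ignore edge cases such as quoted values containing commas, and the freshness check only looks at no-store, no-cache, and max-age.

```python
def parse_cache_control(header):
    """Split a Cache-Control value into a {directive: value} dictionary."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        value = value.strip().strip('"')
        # Numeric values become ints; value-less directives become True.
        directives[name] = int(value) if value.isdigit() else (value or True)
    return directives


def is_fresh(directives, age_seconds):
    """Simplified check: may the cached copy be used without asking the server?"""
    if "no-store" in directives or "no-cache" in directives:
        return False
    return age_seconds < directives.get("max-age", 0)


parsed = parse_cache_control("public, max-age=600")
```

Here `parsed` holds the `public` flag and a `max-age` of 600, so a copy that is 599 seconds old still counts as fresh and one that is 600 seconds old does not.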
Cache Validation
To validate a cached response, the server can send one or more validation headers in the response, which the client then uses to make a conditional request to the server.
Two headers are used for that: ETag and Last-Modified.
ETag
ETag is short for entity tag: a unique identifier, sent by the server, associated with the resource. The client later makes a request to the server with that ETag to check whether the content has changed.
Cache-Control: public, max-age=600
ETag: "123dadwad3211wda"
The client will keep using the cached resource (say, an image) for 600 seconds. After this time, the client will call the server with the If-None-Match header, sending the previously received ETag as its value. The server will then compare that ETag with the ETag of the current content. If they do not match, the server responds with the new ETag and the new resource, which replaces the cached image.
If they do match, the server responds with the status code 304 Not Modified and the client renews the cache for another 600 seconds.
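The server side of this exchange could look roughly like the sketch below. It is deliberately simplified: `make_etag` (hashing the body) is just one possible way to produce an ETag, `respond` is a hypothetical handler, and real servers also have to handle lists of ETags and the `*` value in If-None-Match.

```python
import hashlib


def make_etag(body):
    """Derive an ETag from the content (one possible scheme, assumed here)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a possibly conditional request."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "public, max-age=600"}
    if if_none_match == etag:
        # The client's copy is still current: no body is sent.
        return 304, headers, b""
    return 200, headers, body


image_v1 = b"first version of the image"
image_v2 = b"second version of the image"

# First visit: full response, the client stores the body and the ETag.
status_first, headers_first, _ = respond(image_v1)
cached_etag = headers_first["ETag"]

# After 600 seconds, content unchanged: the server answers 304.
status_same, _, payload_same = respond(image_v1, if_none_match=cached_etag)

# After 600 seconds, content changed: new ETag and new body.
status_changed, headers_changed, payload_changed = respond(
    image_v2, if_none_match=cached_etag
)
```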
There are two types of ETags:
- Strong - ETag: "123dadwad3211wda" - Two resources are exactly the same.
- Weak - ETag: W/"123dadwad3211wda" - Two resources can be considered the same.
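The difference between the two shows up when ETags are compared. Below is a small sketch of the strong and weak comparison rules as I understand them from the HTTP specification; the helper names are my own.

```python
def is_weak(etag):
    """Weak ETags carry a W/ prefix in front of the quoted value."""
    return etag.startswith("W/")


def opaque_tag(etag):
    """Strip the W/ prefix, leaving only the quoted value."""
    return etag[2:] if is_weak(etag) else etag


def strong_match(a, b):
    # Strong comparison: neither tag may be weak and the values must be identical.
    return not is_weak(a) and not is_weak(b) and a == b


def weak_match(a, b):
    # Weak comparison: the quoted values must be identical, W/ is ignored.
    return opaque_tag(a) == opaque_tag(b)


strong = '"123dadwad3211wda"'
weak = 'W/"123dadwad3211wda"'
```

So `strong_match(strong, weak)` fails, while `weak_match(strong, weak)` succeeds.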
Last-Modified
Indicates the date and time when the content was last modified. When the cached content gets stale, the client makes a conditional request with the last modified date in the If-Modified-Since header, which the server then uses to either return 304 Not Modified or return a new response.
Server
Last-Modified: Wed, 24 Mar 2021 11:15:30 GMT
Client
If-Modified-Since: Wed, 24 Mar 2021 11:15:30 GMT
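A server-side check for this header could be sketched as follows. `check_if_modified_since` is a hypothetical helper, and the choice to ignore unparseable dates (answering 200) is an assumption of this example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def check_if_modified_since(last_modified, if_modified_since):
    """Return 304 if the resource is unchanged since the client's date, else 200."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: behave as if the header was absent
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


last_modified = datetime(2021, 3, 24, 11, 15, 30, tzinfo=timezone.utc)

# The value the server would send as Last-Modified and later get back
# from the client as If-Modified-Since.
header_value = format_datetime(last_modified, usegmt=True)

unchanged = check_if_modified_since(last_modified, header_value)
changed = check_if_modified_since(last_modified + timedelta(hours=1), header_value)
```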
Q&A
Q: What if both headers are present in the response?
A: The client sends both values back: the ETag in If-None-Match and the last modified date in If-Modified-Since. Per the HTTP specification, a server that receives If-None-Match evaluates it and ignores If-Modified-Since, then returns either 304 Not Modified or the new content.
Q: What if no validation headers are present in the response?
A: There will be no calls to validate and refresh the existing cache. Fresh content will be requested as soon as the cached content gets stale.
Caching Strategy
There is no universal answer here, as the right strategy depends on many factors, but normally we can define two major cache categories:
- Light Caching - e.g. HTML. The response is cached, but the client needs to validate it with the server before using it. This way the client always gets the latest HTML available on the server, but if the HTML files have not been updated, it can skip the download and serve the copy cached in the browser.
Cache-Control: private, no-cache
- Aggressive Caching - e.g. CSS, JavaScript, images. With the following example we cache these files in public caches for a long time.
Cache-Control: public, max-age=23412213
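One way to apply these two categories is to pick the header based on the file type. The sketch below is just an illustration: the `STATIC_EXTENSIONS` set and the `cache_control_for` function are assumptions, and a real project would tune both to its own assets.

```python
from pathlib import PurePosixPath

LIGHT = "private, no-cache"
AGGRESSIVE = "public, max-age=23412213"

# Hypothetical list of file types treated as long-lived static assets.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".svg"}


def cache_control_for(path):
    """Choose a Cache-Control value from the requested path's extension."""
    suffix = PurePosixPath(path).suffix.lower()
    return AGGRESSIVE if suffix in STATIC_EXTENSIONS else LIGHT


html_policy = cache_control_for("/index.html")
css_policy = cache_control_for("/assets/main.css")
```

Aggressive caching like this usually goes hand in hand with changing the file name whenever the content changes, since clients will not ask the server again until max-age runs out.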
Summary
Well done! You should now be more aware of the concept of caching and how to leverage its potential to the full extent.