Web Caching - ETag/If-None-Match

#webdev #cache #performance #tutorial

Support code

In the previous post, we explored the usefulness of the Last-Modified Response Header and If-Modified-Since Request Header. They work really well when dealing with an endpoint returning a file.

What about data retrieved from a database or assembled from different sources?

Response	Request	Value example
`Last-Modified`	`If-Modified-Since`	`Wed, 15 Nov 2023 19:18:46 GMT`
`ETag`	`If-None-Match`	`75e7b6f64078bb53b7aaab5c457de56f`

Here, too, we have a pair of headers. One is returned by the server (ETag), while the other is sent back by the client on later requests (If-None-Match). The value is a hash generated from the content of the response.
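To make the exchange concrete, here is a toy simulation of it in plain JavaScript. It is only a sketch: `respond` and `toyHash` are hypothetical names, and the FNV-1a-style function merely stands in for a real hash such as the MD5 digest used later in this post.

```javascript
// Hypothetical stand-in for a real hash: a 32-bit FNV-1a-style function
// applied to UTF-16 code units. Not suitable for production use.
function toyHash(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical server: always computes the ETag of the current body and
// answers 304 when the client already holds that exact version.
function respond(requestHeaders, body) {
  const etag = toyHash(body);
  if (requestHeaders["if-none-match"] === etag) {
    return { status: 304, headers: { etag }, body: null };
  }
  return { status: 200, headers: { etag }, body };
}

const first = respond({}, "<li>Caching</li>");
const second = respond(
  { "if-none-match": first.headers.etag },
  "<li>Caching</li>"
);
const third = respond(
  { "if-none-match": first.headers.etag },
  "<li>Amazing Caching</li>"
);

console.log(first.status, second.status, third.status);
```

The first call has nothing to compare against, so it should answer 200 with an ETag. The second repeats that ETag and should get 304 with no body. The third finds different content and should get 200 again.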

If you want to jump straight to the headers, skip to the Endpoint section. Otherwise, have a look at the implementation (but don't spend too much time on it).

Preparation

For simplicity, we use an in-memory DB. It is exposed via the endpoint /db. It contains a list of posts. Each post contains a title and a tag. Posts can be added via POST, and modified via PATCH.

Retrieval is via a GET function, which optionally filters by tag.

src/db.mjs



import { getJSONBody } from "./utils.mjs";

const POSTS = [
  { title: "Caching", tag: "code" },
  { title: "Headers", tag: "code" },
  { title: "Dogs", tag: "animals" },
];

export function GET(tag) {
  let posts = POSTS;
  if (tag) posts = posts.filter((post) => post.tag === tag);
  return posts;
}

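export default async function db(req, res) {
  switch (req.method) {
    case "POST": {
      const [body, err] = await getJSONBody(req);
      if (err) {
        res.writeHead(500).end("Something went wrong");
        return;
      }

      POSTS.push(body);
      res.writeHead(201).end();
      return;
    }
    case "PATCH": {
      const [body, err] = await getJSONBody(req);
      if (err) {
        res.writeHead(500).end("Something went wrong");
        return;
      }

      // Guard against an index that does not point to an existing post;
      // otherwise the assignment below would throw.
      const post = POSTS.at(body.index);
      if (!post) {
        res.writeHead(404).end("Post not found");
        return;
      }

      post.title = body.title;
      res.writeHead(200).end();
      return;
    }
    default:
      // Answer unsupported methods instead of leaving the request hanging.
      res.writeHead(405).end();
  }
}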

src/utils.mjs



export function getURL(req) {
  return new URL(req.url, `http://${req.headers.host}`);
}

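export async function getJSONBody(req) {
  return new Promise((resolve) => {
    let body = "";
    req.on("data", (chunk) => (body += chunk));
    req.on("error", (err) => resolve([null, err]));
    req.on("end", () => {
      // Malformed JSON must resolve to an error tuple: a throw inside this
      // event handler would be an uncaught exception.
      try {
        resolve([JSON.parse(body), null]);
      } catch (err) {
        resolve([null, err]);
      }
    });
  });
}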

Endpoint

By registering the db, we will be able to modify the content of the responses in real-time, appreciating the usefulness of ETag.
Also, let's register and create the /only-etag endpoint.



// index.mjs
import { createServer } from "http";
import db from "./src/db.mjs";
import onlyETag from "./src/only-etag.mjs";
import { getURL } from "./src/utils.mjs";

createServer(async (req, res) => {
  switch (getURL(req).pathname) {
    case "/only-etag":
      return await onlyETag(req, res);
    case "/db":
      return await db(req, res);
    default:
      // Unknown routes get an answer instead of hanging.
      return res.writeHead(404).end("Not Found");
  }
}).listen(8000, "127.0.0.1", () =>
  console.info("Exposed on http://127.0.0.1:8000")
);

The onlyETag endpoint accepts an optional query parameter, tag. If present, it is used to filter the retrieved posts.
Then, the template is loaded into memory.

src/views/posts.html



<html>
  <body>
    <h1>Tag: %TAG%</h1>
    <ul>%POSTS%</ul>
    <form method="GET">
      <input type="text" name="tag" id="tag" autofocus />
      <input type="submit" value="filter" />
    </form>
  </body>
</html>

When submitted, the form uses the current route (/only-etag) as its action and appends the input's name attribute, together with the typed value, as a query parameter. For example, typing code in the input and submitting the form results in GET /only-etag?tag=code. No JavaScript required!
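The resulting URL can be reproduced with the standard `URL` and `URLSearchParams` classes. This is just an illustration of the mapping from the input's name to the query string; the variable names are arbitrary.

```javascript
// What the browser builds when "code" is typed into <input name="tag">.
const submitted = new URL("http://127.0.0.1:8000/only-etag");
submitted.search = new URLSearchParams({ tag: "code" }).toString();

console.log(submitted.pathname + submitted.search); // /only-etag?tag=code
console.log(submitted.searchParams.get("tag")); // code

// Without the parameter, get() returns null.
const plain = new URL("http://127.0.0.1:8000/only-etag");
console.log(plain.searchParams.get("tag")); // null
```

The `null` returned for a missing parameter is what lets the endpoint fall back to "all" with the `??` operator.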

And the posts are injected into it.



import * as db from "./db.mjs";
import { getURL, getView, createETag } from "./utils.mjs";

export default async (req, res) => {
  res.setHeader("Content-Type", "text/html");

  const tag = getURL(req).searchParams.get("tag");
  const posts = await db.GET(tag);

  let [html, errView] = await getView("posts");
  if (errView) {
    res.writeHead(500).end("Internal Server Error");
    return;
  }

  html = html.replace("%TAG%", tag ?? "all");
  html = html.replace(
    "%POSTS%",
    posts.map((post) => `<li>${post.title}</li>`).join("\n")
  );

  res.setHeader("ETag", createETag(html));

  res.writeHead(200).end(html);
};

As you can see, the ETag is generated just before the response is dispatched and is sent in the ETag response header.



// src/utils.mjs
import { createHash } from "crypto";

export function createETag(resource) {
  return createHash("md5").update(resource).digest("hex");
}

Changing the content of the resource changes the Entity Tag.

If you perform the request from the browser, you can inspect the response headers in the Network tab of the Developer Tools.



HTTP/1.1 200 OK
Content-Type: text/html
ETag: 4775245bd90ebbda2a81ccdd84da72b3

If you refresh the page, you'll notice the browser adding the If-None-Match header to the request. The value corresponds of course to the one it received before.



GET /only-etag HTTP/1.1
If-None-Match: 4775245bd90ebbda2a81ccdd84da72b3

As we did in the previous post for Last-Modified and If-Modified-Since, let's instruct the endpoint to deal with If-None-Match.



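export default async (req, res) => {
  res.setHeader("Content-Type", "text/html");

  // Retrieve the (optionally filtered) posts, as seen before.
  const tag = getURL(req).searchParams.get("tag");
  const posts = await db.GET(tag);

  // Load the template, as seen before.
  let [html, errView] = await getView("posts");
  if (errView) {
    res.writeHead(500).end("Internal Server Error");
    return;
  }

  // Fill the template, as seen before.
  html = html.replace("%TAG%", tag ?? "all");
  html = html.replace(
    "%POSTS%",
    posts.map((post) => `<li>${post.title}</li>`).join("\n")
  );

  const etag = createETag(html);
  res.setHeader("ETag", etag);
  const ifNoneMatch = new Headers(req.headers).get("If-None-Match");
  if (ifNoneMatch === etag) {
    // The client already holds this version: send no body.
    res.writeHead(304).end();
    return;
  }

  res.writeHead(200).end(html);
};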
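The strict equality check is enough for this demo. In general, though, RFC 9110 defines an entity tag as a double-quoted string, optionally prefixed with `W/` (a weak validator), and lets If-None-Match carry either `*` or a comma-separated list of tags, compared without regard to the `W/` prefix. Below is a minimal sketch of such a comparison. It assumes the server emits quoted ETags (which the bare hash used in this post is not), and the names `parseEntityTags`, `stripWeak` and `matchesIfNoneMatch` are hypothetical.

```javascript
// Extract every entity tag (optionally weak) from a header value.
// Assumes tags are double-quoted, as the specification requires.
function parseEntityTags(headerValue) {
  return headerValue.match(/(?:W\/)?"[^"]*"/g) ?? [];
}

// Weak comparison ignores the W/ prefix.
function stripWeak(entityTag) {
  return entityTag.replace(/^W\//, "");
}

// True when the client already holds the current representation.
function matchesIfNoneMatch(headerValue, currentETag) {
  if (typeof headerValue !== "string") return false;
  // "*" matches any existing representation of the resource.
  if (headerValue.trim() === "*") return true;
  const current = stripWeak(currentETag);
  return parseEntityTags(headerValue).some(
    (candidate) => stripWeak(candidate) === current
  );
}

console.log(matchesIfNoneMatch('"abc"', '"abc"')); // true
console.log(matchesIfNoneMatch('W/"abc", "def"', '"def"')); // true
console.log(matchesIfNoneMatch('W/"abc"', '"abc"')); // true
console.log(matchesIfNoneMatch('"xyz"', '"abc"')); // false
console.log(matchesIfNoneMatch(undefined, '"abc"')); // false
```

With a helper like this, the handler would answer 304 whenever `matchesIfNoneMatch(ifNoneMatch, etag)` is true.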

Indeed, subsequent requests for the same resource return 304 Not Modified, instructing the browser to reuse the response it stored earlier. Let's request:

/only-etag three times in a row;
/only-etag?tag=code twice;
/only-etag?tag=animals twice;
/only-etag, without tag, once again;

The query parameter changes the response and, therefore, the ETag.

Notice the last one. It does not matter that there have been other requests in the meantime; the browser keeps a map of requests (including the query parameters) and ETags.
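That bookkeeping can be pictured as a map keyed by the full URL. The snippet below is a toy model for illustration only, not how any browser actually implements its cache; `conditionalHeaders` and `remember` are hypothetical names, and the ETag values are placeholders.

```javascript
// Toy model: one cache entry per full URL, query string included.
const cache = new Map();

// Headers the "browser" would add for a URL it has already seen.
function conditionalHeaders(url) {
  const entry = cache.get(url);
  return entry ? { "If-None-Match": entry.etag } : {};
}

// Remember the validator and body of a 200 response.
function remember(url, etag, body) {
  cache.set(url, { etag, body });
}

remember("/only-etag", "etag-all", "<ul>all posts</ul>");
remember("/only-etag?tag=code", "etag-code", "<ul>code posts</ul>");

console.log(conditionalHeaders("/only-etag")["If-None-Match"]); // etag-all
console.log(conditionalHeaders("/only-etag?tag=code")["If-None-Match"]); // etag-code
console.log(conditionalHeaders("/only-etag?tag=animals")["If-None-Match"]); // undefined
```

Requests made in between for other URLs only touch their own entries, which is why the last request without a tag still carries the ETag received at the beginning.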

Detect entity change

To further underscore the significance of this feature, let's add a new post to the DB from another process.



curl -X POST http://127.0.0.1:8000/db \
-d '{ "title": "ETag", "tag": "code" }'

Then request /only-etag?tag=code again.

After the db has been updated, the same request generates a different ETag. Thus, the server sends the client a new version of the resource, together with the newly generated ETag. Subsequent requests go back to returning 304 Not Modified.

The same happens if we modify an element of the response.



curl -X PATCH http://127.0.0.1:8000/db \
-d '{ "title": "Amazing Caching", "index": 0 }'

ETag is the more versatile solution: being content-based, it applies regardless of the data type. Keep in mind, however, that the server must still retrieve and assemble the response, hash it, and compare the hash with the received value.
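One way to avoid part of that work, not used in this post and shown here only as a sketch, is to derive the ETag from something cheaper than the rendered HTML, for example a revision counter incremented on every write. Everything below (`revision`, `bump`, `cheapETag`, `handle`) is hypothetical. The trade-off is coarser invalidation: any write changes every ETag, even for responses whose content did not change.

```javascript
// Hypothetical revision counter: every write to the DB would call bump().
let revision = 0;
function bump() {
  revision += 1;
}

// The validator depends only on the revision and the filter, so it can be
// computed before any post is read or any template is filled.
function cheapETag(tag) {
  return `"${revision}-${encodeURIComponent(tag ?? "all")}"`;
}

// `render` stands for the expensive part (query, template filling).
function handle(ifNoneMatch, tag, render) {
  const etag = cheapETag(tag);
  if (ifNoneMatch === etag) return { status: 304, etag, body: null };
  return { status: 200, etag, body: render() };
}

let renders = 0;
const render = () => {
  renders += 1;
  return "<ul>...</ul>";
};

const first = handle(undefined, "code", render);
const second = handle(first.etag, "code", render);
bump(); // e.g. after a POST or PATCH on /db
const third = handle(first.etag, "code", render);

console.log(first.status, second.status, third.status); // 200 304 200
console.log(renders); // 2
```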

Thanks to another header, Cache-Control, it is possible to reduce the number of requests the server has to process.