A humble beginning
I like Node servers. I like writing bare servers, and I also like Express.js. I consider myself a beginner in Node.js, but I wanted to write an Express middleware that generates ETag headers. ETags are a neat way to check whether a cached response is still valid.
At first, when I read about ETag headers, I thought that having 'strong' and 'weak' variants was useless. I was wrong. A 'strong' ETag means the header value is generated from the bytes of the response body, so the ETag for any given content needs to be unique to that content. In fancy words, a strong ETag is typically a hash of the content generated by a collision-resistant algorithm.
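To make the strong/weak idea concrete, here is a minimal sketch of building a tag from a body. The helper name makeETag is made up for illustration, and the length-plus-SHA-1 layout is just one possible format. The only fixed parts are that an entity tag is a quoted string and that a weak one carries a W/ prefix.

```javascript
const crypto = require('crypto');

// Hypothetical helper (not part of any library): builds an entity tag
// from a response body given as a string or a Buffer.
function makeETag(body, weak = false) {
  const buf = Buffer.isBuffer(body) ? body : Buffer.from(body, 'utf8');
  // Hash the raw bytes, so any change in the body changes the tag.
  const digest = crypto.createHash('sha1').update(buf).digest('hex');
  // Entity tags are quoted strings; the length prefix is an arbitrary choice.
  const tag = `"${buf.length.toString(16)}-${digest}"`;
  return weak ? `W/${tag}` : tag;
}
```

Calling makeETag(body) gives a strong tag and makeETag(body, true) gives the weak form of the same value.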
The flashy code
Capturing the response body is pretty easy from a middleware. It goes something like...
const crypto = require('crypto');

const taggart = (opts = {}) => {
  // standard express style
  return (req, res, next) => {
    // save the original methods
    const write = res.write;
    const end = res.end;
    // sha1 ain't that bad
    const hash = crypto.createHash('sha1');
    // keep track, for content-length
    let length = 0;

    const onData = (chunk, encoding) => {
      // chunk can be undefined, or a callback when res.end(cb) is used
      if (!chunk || typeof chunk === 'function') {
        return;
      }
      // convert chunk to buffer (the second argument may be a callback)
      chunk = Buffer.isBuffer(chunk)
        ? chunk
        : Buffer.from(chunk, typeof encoding === 'string' ? encoding : 'utf8');
      // update hash using chunk data
      hash.update(chunk);
      length += chunk.length;
    };

    const onEnd = (chunk, encoding) => {
      onData(chunk, encoding);
      // generate tag
      const l = length.toString(16);
      const h = hash.digest('hex');
      // weak or strong? use length and hash as ETag (entity tags must be quoted)
      const tag = opts.weak ? `W/"${l}-${h}"` : `"${l}-${h}"`;
      res.setHeader('ETag', tag);
    };

    // override the default methods, keeping their return values
    res.write = (...args) => {
      onData(...args);
      return write.apply(res, args);
    };
    res.end = (...args) => {
      onEnd(...args);
      return end.apply(res, args);
    };

    next();
  };
};

module.exports = taggart;
The vision
What we are doing is hijacking the res.write and res.end methods of the res object, which is an instance of http.ServerResponse. The write and end methods are used to write data to the response that is sent to the client, and they are part of the writable stream interface of the res object.
In the beginning we create a hash, and in the onData function we update the hash with each chunk and then let the chunk go (we do not store the chunks, they can get pretty huge). We also keep track of the size of the response.
A call to res.end indicates that the response has ended, so now we can finalize the hash in the onEnd function and set it as a header. But there is a catch. In HTTP, requests and responses are streamed. For every call to res.write, the partial response is sent to the client as soon as it can be, and the headers go out with the first chunk, which happens during the first call to res.write.
The dead end
If you try to run the above code, you will get a fatal error as soon as a response is written in more than one chunk, because res.setHeader throws once the headers have been sent. If your response is small enough (less than 65535 bytes or so ¯\_(ツ)_/¯), then it can fit in the first chunk. You get the whole data in a single chunk, you update the hash, and set the header in the call to res.end. Totally works, but only if your responses are less than 64 KB or so.
But I want to send ETags for images and videos, which I'm pretty sure are not less than 64 KB. It would really help to send back a 304 HTTP code instead of a 2 MB image, right? Due to the streaming nature of the responses we are unable to do this. The hash can be generated only when res.end is called, but by then the headers might have already been sent.
The only way is up
So, now for the compromises:
- We can settle for ETags only on small responses.
- We can generate hashes for static content beforehand, maybe in a build process or something, save them in a dictionary and retrieve them later. Hassle.
- We can generate the hash on every request: first prepare the response, generate the hash and then stream the response with the proper header. Clean and practical.
- Trailers can be used. I am working on it, but it doesn't look promising. I might be wrong.
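The "prepare the response first" compromise could look roughly like the sketch below. The name bufferedEtag is made up for illustration. It assumes the handler never calls res.writeHead explicitly, it ignores callbacks passed to res.write, and it assumes the whole body fits comfortably in memory, which is exactly the cost of this option.

```javascript
const crypto = require('crypto');

// Hypothetical middleware: holds the body in memory so the ETag header
// can be set before anything is sent to the client.
const bufferedEtag = () => (req, res, next) => {
  const end = res.end;
  const chunks = [];

  const collect = (chunk, encoding) => {
    // Skip missing chunks and callbacks such as res.end(cb).
    if (!chunk || typeof chunk === 'function') return;
    chunks.push(
      Buffer.isBuffer(chunk)
        ? chunk
        : Buffer.from(chunk, typeof encoding === 'string' ? encoding : 'utf8')
    );
  };

  // Nothing goes on the wire here; write callbacks are ignored in this sketch.
  res.write = (chunk, encoding) => {
    collect(chunk, encoding);
    return true;
  };

  res.end = (...args) => {
    collect(args[0], args[1]);
    const cb = args.find((a) => typeof a === 'function');
    const body = Buffer.concat(chunks);
    const digest = crypto.createHash('sha1').update(body).digest('hex');
    // Headers are still unsent at this point, so setting them is safe.
    res.setHeader('ETag', `"${body.length.toString(16)}-${digest}"`);
    res.setHeader('Content-Length', body.length);
    // Send the complete body in a single call to the original end.
    return end.call(res, body, cb);
  };

  next();
};
```

With Express this would be mounted as app.use(bufferedEtag()) ahead of the routes. The trade-off is memory: a 2 MB image sits in a buffer until the handler finishes.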
The conclusion
We can use modules like etag, which is kind of coupled with send and looks like it is intended to be used with serve-static. They are written by the same guy anyway. send generates 'weak' ETags, and it does so based on file stats.
I've come to realize that generating ETags through a module is hard. It has to be hooked into your server; it's a low-level component. Checking whether the content is stale or not is pretty easy anyway.
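As a rough illustration of that last point, the staleness check boils down to comparing the If-None-Match request header with the current tag. The helper name isFresh is made up for this sketch, and the comma split is naive: it assumes the tags themselves contain no commas.

```javascript
// Hypothetical helper: true when the client's cached copy still matches.
function isFresh(ifNoneMatch, etag) {
  if (!ifNoneMatch || !etag) return false;
  // "*" matches any current representation.
  if (ifNoneMatch.trim() === '*') return true;
  // If-None-Match uses weak comparison, so the W/ prefix is ignored.
  const strip = (t) => t.trim().replace(/^W\//, '');
  const current = strip(etag);
  // The header may list several tags separated by commas.
  return ifNoneMatch.split(',').some((t) => strip(t) === current);
}

// Usage inside an Express-style handler (sketch):
// if (isFresh(req.headers['if-none-match'], res.getHeader('ETag'))) {
//   res.statusCode = 304;
//   return res.end();
// }
```

The comparison is the easy part. The hard part stays the same: the tag has to be known before the headers are sent.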
References
- https://github.com/pillarjs/send/pull/105#issuecomment-198766116
- https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.3.3
Note: This is my first public blog post ever. Thanks for reading all the way through, hope you like it. I'm a non-native English speaker, and I'm working on my skills there.
My so-called failed middleware is on GitHub, and I am working on the fix-tagging branch. Do not check out the master branch, it's dirty.
Please leave suggestions, it would help me a lot. Thanks.