The Problem
The industry-standard approach for token authentication is to give service endpoints a JWT with a limited lifetime (the "id token"), which forces a periodic, stronger reauthentication with a central auth system.
These id tokens are verified "offline", meaning the central auth system is not contacted. This is a useful property because it significantly improves the endpoint's response time. If the central auth system is located on only one side of the Atlantic, for example, online verification would add around 200ms of latency to service endpoints on the other side.
If one of these tokens is leaked, however, there is no way to signal this: the id token remains irrevocably valid until it expires. Adding an online revocation check to every request would eliminate the very benefits that made the system worth using in the first place.
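To make the problem concrete, here is a minimal sketch of an offline verifier. It assumes HS256 (a shared HMAC secret) only so that it can stay within the Python standard library; offline verification is often done with an asymmetric algorithm such as RS256 or ES256 instead, so that services hold only a public key. `sign_hs256` and `verify_offline` are hypothetical helpers, not a library API.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url_encode(data: bytes) -> str:
    # JWT segments are base64url with the "=" padding stripped
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # Restore the stripped padding before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    # Hypothetical issuer helper, only here so the example can mint a token
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    sig = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(sig)}"


def verify_offline(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    # Offline check: signature and expiry only, with no call to the central auth system
    header_b64, payload_b64, sig_b64 = token.split(".")
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        return None  # pin the algorithm rather than trusting the header
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        return None
    claims = json.loads(b64url_decode(payload_b64))
    if claims["exp"] <= (time.time() if now is None else now):
        return None
    # Nothing here can learn that the token leaked: it stays valid until "exp"
    return claims
```

A leaked token passes `verify_offline` exactly like a legitimate one until its `exp`, which is the gap the rest of this post addresses.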
A Solution
First, every id token should have a unique identifier associated with it, which I call the revocation id. There is no harm in these ids eventually being reused, as long as reuse happens only well after the original token has expired.
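A minimal sketch of minting a revocation id, assuming it travels in the registered `jti` (JWT ID) claim from RFC 7519; the post doesn't say which claim to use, and `new_revocation_id` and `id_token_claims` are hypothetical names. Random 128-bit ids avoid any coordination between issuers; a wrapping counter would also satisfy the reuse rule above.

```python
import secrets
import time
from typing import Optional


def new_revocation_id() -> str:
    # 128 random bits: a collision between two live tokens is vanishingly unlikely
    return secrets.token_hex(16)


def id_token_claims(subject: str, lifetime_s: int, now: Optional[float] = None) -> dict:
    # Hypothetical claim set; "jti" carries the revocation id
    issued = int(time.time() if now is None else now)
    return {
        "sub": subject,
        "iat": issued,
        "exp": issued + lifetime_s,
        "jti": new_revocation_id(),
    }
```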
If a token is to be marked as revoked - either automatically via taint failures or explicitly - this revocation id is added to a list (perhaps a database table).
A new central auth endpoint can now check if a token has been revoked - an online revocation check that we want to avoid if at all possible.
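The revocation list and the online check it backs might look like this in-memory sketch; a real deployment would use the database table mentioned above, and `RevocationList` is a hypothetical name. Storing each token's expiry alongside its revocation id lets entries be pruned once the token could no longer pass the offline expiry check anyway, which keeps the list (and anything built from it) small.

```python
import time
from typing import Dict, Optional


class RevocationList:
    """In-memory stand-in for the revocation table kept by the central auth system."""

    def __init__(self) -> None:
        # revocation id -> expiry of the token it belongs to (Unix seconds)
        self._entries: Dict[str, float] = {}

    def revoke(self, revocation_id: str, token_exp: float) -> None:
        self._entries[revocation_id] = token_exp

    def prune(self, now: Optional[float] = None) -> None:
        # An expired token is rejected by the offline "exp" check regardless,
        # so its entry can be dropped (and the id safely reused later)
        now = time.time() if now is None else now
        self._entries = {rid: exp for rid, exp in self._entries.items() if exp > now}

    def is_revoked(self, revocation_id: str) -> bool:
        # The online check served by the new central auth endpoint
        return revocation_id in self._entries
```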
Next, I propose adding a shared Bloom filter to the system, maintained by the central auth system. It can be broadcast to service entities whenever it changes via publish-subscribe (e.g., Redis), or simply fetched on a short cadence (say, once a minute or more often).
A Bloom filter is a probabilistic data structure for testing whether a value is in a set: it can return false positives but never false negatives. By distributing just the Bloom filter, services can rule out most tokens locally, so only the rare filter hits need an online revocation check.
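A sketch of the polling option, assuming the filter is shipped as a single integer bit array and that a hypothetical `fetch` callable retrieves it from the central auth system; with publish-subscribe you would instead replace the cached value in the message handler. Either way, the worst-case revocation latency is roughly the refresh interval plus propagation time.

```python
import time
from typing import Callable, Optional


class BloomCache:
    """Service-side copy of the filter, refreshed on a fixed cadence (polling variant)."""

    def __init__(self, fetch: Callable[[], int], max_age_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical: retrieves the current filter
        self._max_age_s = max_age_s  # refresh cadence, e.g. once a minute
        self._clock = clock
        self._bloom: Optional[int] = None
        self._fetched_at = 0.0

    def get(self) -> int:
        # Serve the cached filter, refetching once it is older than max_age_s
        now = self._clock()
        if self._bloom is None or now - self._fetched_at >= self._max_age_s:
            self._bloom = self._fetch()
            self._fetched_at = now
        return self._bloom
```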
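The service-side decision this enables can be sketched as follows; `bloom_might_contain` and `online_is_revoked` are hypothetical callables standing in for the local filter lookup and the central endpoint. Note that "no false negatives" holds relative to the set the filter was built from: a token revoked after the last filter update passes until the next one arrives.

```python
from typing import Callable


def is_token_revoked(revocation_id: str,
                     bloom_might_contain: Callable[[str], bool],
                     online_is_revoked: Callable[[str], bool]) -> bool:
    # A miss means "definitely not revoked" (as of the last filter update),
    # so the common case is answered locally with no network round trip
    if not bloom_might_contain(revocation_id):
        return False
    # A hit may be a false positive, so only now pay for the online check
    return online_is_revoked(revocation_id)
```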
In the scheme I have in mind, each revoked id's hash is binary ORed into the filter, and a lookup checks whether every bit of a candidate's hash is already set; multiple hashes are used to reduce the probability of a false positive. For example, if we used SHA-256 and SHA-512, we could check like this:
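# revocation_id is bytes; bloom.sha256 / bloom.sha512 are ints holding the OR
# of that hash over every revoked id
import hashlib

def in_bloom(bloom, revocation_id):
    # Possibly revoked only if every bit of each hash is already set in the filter
    h256 = int.from_bytes(hashlib.sha256(revocation_id).digest(), "big")
    if (h256 & bloom.sha256) != h256:
        return False
    h512 = int.from_bytes(hashlib.sha512(revocation_id).digest(), "big")
    if (h512 & bloom.sha512) != h512:
        return False
    return True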
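The lookup above implies a matching build step on the central auth side. Here is a sketch of that step under the same scheme (whole digests ORed together), with a hypothetical `Bloom` dataclass holding the two bit masks and revocation ids passed as bytes; the lookup is repeated so the block runs on its own.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable


def as_int(digest: bytes) -> int:
    # Treat a digest as a big-endian integer so it can be used as a bit mask
    return int.from_bytes(digest, "big")


@dataclass
class Bloom:
    sha256: int = 0  # OR of sha256(id) over every revoked id
    sha512: int = 0  # OR of sha512(id) over every revoked id


def build_bloom(revocation_ids: Iterable[bytes]) -> Bloom:
    # Central auth side: fold each revoked id's hashes into the filter with OR
    bloom = Bloom()
    for rid in revocation_ids:
        bloom.sha256 |= as_int(hashlib.sha256(rid).digest())
        bloom.sha512 |= as_int(hashlib.sha512(rid).digest())
    return bloom


def in_bloom(bloom: Bloom, revocation_id: bytes) -> bool:
    # Service side: every bit of both hashes must already be set
    h256 = as_int(hashlib.sha256(revocation_id).digest())
    h512 = as_int(hashlib.sha512(revocation_id).digest())
    return (h256 & bloom.sha256) == h256 and (h512 & bloom.sha512) == h512
```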
I propose using a set of HMAC-SHA-256 functions, each with its own key, as the hash algorithms. Because there is only one algorithm, adding another hash is just a matter of adding another HMAC key, which makes the count easy to tune based on operational experience.
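One caveat about ORing whole digests: each 256-bit digest sets about half the filter's bits, so the filter saturates after only a few revoked ids. With n ids ORed in, a fresh id's SHA-256 digest is fully covered with probability roughly (1 - 2^-(n+1))^256, which is already about 0.6 at n = 8. The sketch below keeps the keyed HMAC-SHA-256 family proposed here but swaps in the textbook Bloom construction, where each keyed HMAC selects a single bit position in an m-bit array. `HmacBloomFilter` and its parameters are hypothetical.

```python
import hashlib
import hmac
from typing import Iterator, List


class HmacBloomFilter:
    """Textbook Bloom filter: each HMAC-SHA-256 key selects one bit of an m-bit array."""

    def __init__(self, keys: List[bytes], m_bits: int) -> None:
        self.keys = keys      # one key per hash function; add a key to add a function
        self.m_bits = m_bits
        self.bits = 0         # the filter itself, stored as one big integer

    def _positions(self, revocation_id: bytes) -> Iterator[int]:
        for key in self.keys:
            digest = hmac.new(key, revocation_id, hashlib.sha256).digest()
            # Reduce the 256-bit MAC to a bit index (modulo bias is negligible here)
            yield int.from_bytes(digest, "big") % self.m_bits

    def add(self, revocation_id: bytes) -> None:
        for pos in self._positions(revocation_id):
            self.bits |= 1 << pos

    def might_contain(self, revocation_id: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(revocation_id))
```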
Under this proposal, the Bloom filter is a short array of 256-bit binary values. One might be sufficient for smaller systems; I don't think it would ever need more than four, but measure, measure, measure.
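For the textbook Bloom filter (each hash function sets one bit), the standard sizing formulas turn "measure" into concrete numbers once you know how many revocations are live at once. A small calculator, under that assumption:

```python
import math
from typing import Tuple


def bloom_size(n_items: int, fp_rate: float) -> Tuple[int, int]:
    # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions
    m = math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)
    k = max(1, round(m / n_items * math.log(2)))
    return m, k


def capacity(m_bits: int, fp_rate: float) -> int:
    # Inverse question: how many live revoked ids fit in m bits at this rate
    return int(m_bits * math.log(2) ** 2 / -math.log(fp_rate))


def false_positive_rate(m_bits: int, k: int, n_items: int) -> float:
    # Usual approximation: p ≈ (1 - e^(-k n / m))^k
    return (1 - math.exp(-k * n_items / m_bits)) ** k
```

At a 1% false positive rate, for instance, treating four 256-bit values as one 1024-bit array holds roughly 100 live revocations, so the right size depends mostly on revocation volume and on token lifetime (which bounds how long entries stay live).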
Finally, to reduce computational load, I suggest calculating the HMAC-SHA-256 values of the revocation id once at issuance and placing them in the JWT; this essentially trades CPU time for JWT size.
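A sketch of that trade, assuming the values travel as base64url strings in a hypothetical custom claim and that the filter uses the one-bit-per-hash construction. The issuer computes the HMACs once; services reduce them to bit positions without ever holding the HMAC keys, and because the JWT is signed, a client cannot swap in values for a different revocation id. Each 256-bit value adds 43 base64url characters to the payload.

```python
import base64
import hashlib
import hmac
from typing import List


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def precompute_claim(revocation_id: bytes, keys: List[bytes]) -> List[str]:
    # Issuer side: one HMAC-SHA-256 value per key, base64url-encoded for the JWT
    return [b64url(hmac.new(k, revocation_id, hashlib.sha256).digest()) for k in keys]


def might_be_revoked(bloom_bits: int, m_bits: int, claim: List[str]) -> bool:
    # Service side: no HMAC keys needed, only the signed, precomputed values
    for value in claim:
        digest = base64.urlsafe_b64decode(value + "=" * (-len(value) % 4))
        pos = int.from_bytes(digest, "big") % m_bits
        if not (bloom_bits >> pos) & 1:
            return False
    return True
```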
Result
At this point, you have a revocable token with very low revocation latency, while revocation checks remain effectively offline (along with the latency advantages that brings).