Random number generators are everywhere in computing. From hardware, to operating systems, kernels, games, you name it. You often hear how pseudo-random number generators (PRNG) like those typically used in programming languages (like JavaScript's Math.random for example) are not cryptographically secure (CSPRNG). But what does that mean?
First you need to understand how PRNG's work. As Yang Guo elegantly puts it...
"As with all PRNGs, the random number is derived from an internal state, which is altered by a fixed algorithm for every new random number. So for a given initial state, the sequence of random numbers is deterministic. Since the bit size n of the internal state is limited, the numbers that a PRNG generates will eventually repeat themselves."
In his quote, deterministic means that given some specific input you can always expect the same output. You can almost think of it as a pure function.
In the V8 implementation of the xorshift128+
algorithm you can see the underlying logic of Math.random()
.
uint64_t state0 = 1;
uint64_t state1 = 2;
uint64_t xorshift128plus() {
uint64_t s1 = state0;
uint64_t s0 = state1;
state0 = s0;
s1 ^= s1 << 23;
s1 ^= s1 >> 17;
s1 ^= s0;
s1 ^= s0 >> 26;
state1 = s1;
return state0 + state1;
}
Even if you don't know C this example should be pretty clear about two things.
- The state is global.
-
n
can never be larger than the largest bit size the platform supports. For Cuint64_t
that would be9223372036854775807
.
Cryptographically Secure
Now it's very easy to see why a PRNG would be a non-starter for cryptographic applications. Crytographically secure means that a random number you receive has been generated using a non-deterministic algorithm/function. If the previous example won't work, then how do we create these CSPRNG's?
Enter entropy. Entropy has somewhat ambiguous meaning in our field; it can describe how software projects begin to rot as more complexity is introduced over time. In the context of this article however, it essentially is the environmental noise in device drivers.
In Linux for example, the noise is captured into two files /dev/random
and /dev/urandom
. These files can then be accessed and consumed by the CSPRNG. When this data is consumed it is removed from the pool, that way the next number generated isn't using the same seed. These files naturally replenish themselves but it is important to note that since these are pools after all, they can be depleted if there has been no activity to replenish them.
I don't want to divert into a whole deep dive into these pools but an interesting side note is that on Linux servers this can be a problem, since there are usually no mice/keyboards directly hooked up to the machine, thus reducing the amount of noise it can generate.
Now you may be wondering, if Math.random
is insecure why don't we just use the Cryptography API
for everything instead? The answer is, like everything, there are trade-offs and in most cases you don't need a CSPRNG.
The primary trade-off is that this method is slower than using a PRNG, and in most cases that's usually what we need anyway. Here is a benchmark showing the stark differences in the speed of these functions.
The MDN docs also make it perfectly clear that unless you know exactly what you are needing and how to do it, you probably shouldn't be using the Cryptography API as it is easy to get wrong. That shouldn't stop you from experimenting and learning about it though, just be careful!
Conclusion
Really there is no conclusion here, I had randomly (pun intended) stumbled across entropy and found this whole dive into it really interesting. I'm sure there is something I missed or maybe got some details wrong so your feedback is welcome.
Special shout-out to my co-worker David Pagan for the edits/feedback.
Top comments (0)