2008-05-23

How to Prevent the Random Number Bug in Debian OpenSSL in Other Implementations

As probably the entire hacker community has heard by now, there was a bug recently discovered in Debian's OpenSSL implementation that crippled the random number generator. For background, see Schneier's Coverage, Slashdot's Coverage, Debian's Announcement, Ubuntu's Announcement, and a Cartoon.

On the surface, this just looks like a stupid mistake by one Debian maintainer. But if you look at the details, it's not that obvious. Here is one of the two lines in question, from md_rand.c:
MD_Update(&m,buf,j); /* purify complains */
This function seeds the cryptographically secure pseudo-random number generator, which in turn generates important things like cryptographic keys. The security of a cryptographic key rests entirely on how difficult it is for an attacker to guess its value (like a house key's tumbler positions); if the value is predictable, there is no security. The maintainer removed this line because the Purify and Valgrind tools complained that it read uninitialized data.
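To make the effect concrete, here is a minimal sketch of a hash-based entropy pool in the spirit of md_rand.c. This is illustrative only, not the actual OpenSSL internals, but it shows why dropping the Update call on buf leaves the pool with nothing unpredictable to absorb:

    /* Illustrative sketch only -- not the real md_rand.c. A hash-based
     * pool absorbs caller-supplied bytes by feeding them through a
     * digest. Comment out the Update on 'buf' and the pool state
     * depends only on predictable inputs (like the process ID). */
    #include <openssl/md5.h>
    #include <stddef.h>

    static MD5_CTX pool;

    void pool_init(void)
    {
        MD5_Init(&pool);
    }

    void pool_add(const void *buf, size_t len)
    {
        MD5_Update(&pool, buf, len);    /* the analogue of the removed line */
    }

    void pool_extract(unsigned char out[MD5_DIGEST_LENGTH])
    {
        MD5_CTX tmp = pool;             /* finalize a copy, keep the pool */
        MD5_Final(out, &tmp);
        MD5_Update(&pool, out, MD5_DIGEST_LENGTH); /* vary future output */
    }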

Truthfully, if I had been in the same position as the maintainer, there's a good chance I would have commented out these lines too. Code analysis tools are very useful for maintaining high code quality, and letting spurious warnings cripple those tools has consequences of its own. The right action is not always obvious, and when you go through hundreds of lines of code, it's easy to miss the significance of a single line like this.

The fundamental issue with random number generators (RNGs) is that they are infamously difficult to test. A standard software regression test feeds a known input to a program and checks for a known output. RNGs aren't like that -- at least not when used with good seeds. A good seed never repeats and cannot be tested against known answers. Instead, you have to perform statistical tests on many samples from the seed source (examples: DIEHARD, NIST's RNG test suite).
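For a flavor of what such a test looks like, here is a minimal sketch of the simplest one, the monobit test from FIPS 140-2: in a 20,000-bit sample, the count of one-bits must fall strictly between 9,725 and 10,275. Note the limits of such tests: a simple counter passes them, so passing is evidence of sanity, not proof of randomness.

    /* Minimal sketch of the FIPS 140-2 monobit test. */
    #include <stddef.h>

    #define FIPS_SAMPLE_BITS  20000
    #define FIPS_SAMPLE_BYTES (FIPS_SAMPLE_BITS / 8)   /* 2,500 bytes */

    int monobit_ok(const unsigned char sample[FIPS_SAMPLE_BYTES])
    {
        int ones = 0;
        size_t i;
        for (i = 0; i < FIPS_SAMPLE_BYTES; i++) {
            unsigned char b = sample[i];
            while (b) {                 /* Kernighan's popcount */
                b &= (unsigned char)(b - 1);
                ones++;
            }
        }
        return ones > 9725 && ones < 10275;   /* FIPS 140-2 bounds */
    }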

This problem is much more widespread than people think, and it is especially common in embedded systems. Many deployments of SSL don't properly seed their RNG. This won't cause the system to fail, testers will never catch it, and customers won't complain. So from the vendor's perspective, there's really no incentive to make it work. Most of the engineers adding OpenSSL to a product don't know much about cryptography, and often don't know to hook up a seed -- or don't bother. Some systems don't even have a good way to generate one.
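Hooking up a seed is not much code, either. Here is a hedged sketch using OpenSSL's RAND_add() and RAND_status() calls (both real); read_hw_entropy() is a hypothetical driver routine standing in for whatever entropy source the hardware actually has:

    /* Sketch: seed OpenSSL's PRNG from a hardware entropy source.
     * read_hw_entropy() is hypothetical; RAND_add()/RAND_status()
     * are real OpenSSL calls. RAND_add's third argument is the
     * estimated entropy of the buffer, in bytes. */
    #include <openssl/rand.h>
    #include <stddef.h>

    extern size_t read_hw_entropy(unsigned char *buf, size_t len); /* hypothetical */

    int seed_openssl_rng(void)
    {
        unsigned char buf[32];
        while (!RAND_status()) {              /* returns 1 once seeded */
            size_t n = read_hw_entropy(buf, sizeof(buf));
            if (n == 0)
                return -1;                    /* no entropy: fail loudly */
            RAND_add(buf, (int)n, (double)n);
        }
        return 0;
    }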

Even NIST's Computer Security Division (the owner of FIPS 140-2, a major cryptographic standard for government agencies) has mostly washed its hands of this problem. FIPS 140-2 used to include statistical tests on the entropy (i.e., 'randomness') source used to create the seed, but now the only requirement is that the vendor justify a certain entropy level.

To solve this problem, I think OpenSSL (and other SSL implementations) needs to add some kind of sanity check on the seed to make sure this mistake doesn't happen again. Here's a rough outline of the test:
  1. During OpenSSL's initialization, immediately collect several samples from the seed source. The number depends on the constraints of the system, but NIST's old FIPS 140-2 statistical tests used 20,000 bits (2,500 bytes), which is a reasonable number.
  2. Run simple statistical tests on these samples (see FIPS 140-2 for an example) and make sure the entropy source is reasonable.
  3. Store the first 100 bytes or so of this sample set in non-volatile memory (e.g., hard disk, flash), and keep a history of several thousand of these initial samples. Discard the other samples (don't use them as part of the real seed).
  4. Run another statistical test on the series of first samples. If there is a correlation between these samples (i.e., the values tend to be the same after each initialization), then fail with a big obnoxious error that no distribution maintainer or embedded software engineer could miss (see the sketch after the next paragraph).
This test would have caught the bug in question, because the same seed would likely have occurred across power cycles. After the Debian patch, essentially the only input left to the seed was the process ID, which by default is at most 32,768. By the birthday paradox, you expect to see a duplicate among values drawn from 1 to N after roughly the square root of N samples; the square root of 32,768 is about 181. Someone would have seen the horrible error message well before shipping.
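Here is a rough sketch of steps 3 and 4 above, simplified to an exact-repeat check against the stored history (a real implementation should test for statistical correlation, not just identical values). The path and sizes are illustrative:

    /* Sketch of the cross-boot seed sanity check. Compares this boot's
     * first 100 sample bytes against a history of earlier boots and
     * fails loudly on an exact repeat. Path and sizes are illustrative. */
    #include <stdio.h>
    #include <string.h>

    #define FIRST_SAMPLE_LEN 100
    #define HISTORY_PATH "/var/lib/rng-history"    /* hypothetical */

    int check_seed_history(const unsigned char sample[FIRST_SAMPLE_LEN])
    {
        unsigned char old[FIRST_SAMPLE_LEN];
        FILE *f = fopen(HISTORY_PATH, "rb");
        if (f) {
            while (fread(old, 1, FIRST_SAMPLE_LEN, f) == FIRST_SAMPLE_LEN) {
                if (memcmp(old, sample, FIRST_SAMPLE_LEN) == 0) {
                    fclose(f);
                    fprintf(stderr, "FATAL: entropy source repeated its "
                            "output across boots; refusing to continue\n");
                    return -1;
                }
            }
            fclose(f);
        }
        f = fopen(HISTORY_PATH, "ab");          /* append this boot's sample */
        if (f == NULL)
            return -1;
        fwrite(sample, 1, FIRST_SAMPLE_LEN, f);
        fclose(f);
        return 0;
    }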

Caveat: make sure the files that store these seeds are properly protected against reading and modification. If the entropy source is poor, they could leak information about the rest of the system, or even give hints about the next seed the real random number generator will use.
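On a POSIX system, that means creating the history file with owner-only permissions, something like this (same hypothetical path as above):

    /* Sketch: open the seed-history file with mode 0600 so only the
     * owner can read or modify the stored samples. O_NOFOLLOW guards
     * against symlink tricks on the path. */
    #include <fcntl.h>

    int open_history_file(const char *path)
    {
        return open(path, O_WRONLY | O_CREAT | O_APPEND | O_NOFOLLOW, 0600);
    }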

When it comes to security, you really can't rely on people to catch these kinds of mistakes through code reviews. You need good tools that catch them automatically. Unfortunately, building those tools is difficult, which is why we still mostly rely on code reviews.

Code reviews are better than nothing, though. And in this case, it sounds like even a code review would have helped...
