Josh Duck

Archive for February, 2008

Rainbow Tables

Friday, February 8th, 2008

Rainbox over Vancouver - by Progie

To most of you the term ‘rainbow table’ is probably familiar. You are probably aware that they are used to aid the reversing of one-way hashes, usually when trying to crack a password. I personally think that they are a nifty little hack, and so I’d like to explain a little about how they are implemented.

Back to Square One

When storing a password, any developer worth their salt is going to hash it using a one-way hashing function. This lets them check that a given password is valid, without risking the possibility that someone could steal their user’s passwords plain text passwords *cough*.

Of course, if that was a foolproof method of protecting passwords then this would be a very short post. If you do happen to find yourself in possession of a hashed password and you want to recover the plain text password then you need to figure our how to crack it. You can’t reverse a hash but you can try and search for a text string that produces an identical hash. The obvious method of attacking a password hash is a brute force attack. You can generate a list of every possible password and hash each until you have output that matches you hashed password.

The simplicity of this solution is defeated by the fact that you are going to be trying a lot of combinations to find your target. Say you’re searching for a five character password containing lowercase letters only. A naïve search is going to produce 26^5 possibilities. That’s 11,881,376 tests to find a very trivial password. Even if you are testing thousands of hashes a second it could take hours to find a match. If you’re clever then you’ll prepare a collection of passwords that require cracking so you can process them all at once. One sweep through all possible passwords will be enough to complete your entire collection.

This reuse of generated hashes leads us to the next obvious improvement; what if we save all possible passwords and hashes in a table and just do a quick lookup whenever we have a hash to check? This seems like an ideal solution at first glance, however when you look more deeply you’ll begin to appreciate exactly how much memory this would require. Each password is 5 bytes and its associated MD5 hash is 16 bytes. The minimum storage required for a naïve lookup table is (16 + 5) bytes x 11,881,376 possibilities. That’s 200MB. Don’t forget that we’re talking about 5 letter passwords here. Any increase in complexity (either in length or possible characters) increases the size requirements exponentially. To crack a secure password would require petabytes of storage for our table. Storing and searching that data efficiently becomes a whole new problem (Google has managed to inadvertently solve this problem).

So there you have it, the two problems we face are time and memory. What we need is a compromise. Wouldn’t it be fantastic if we could only store every thousandth hash, and extrapolate from there? The most obvious problem with that is that hashes are evenly distributed. Two passwords that vary by a single letter produce hashes that are as different as passwords that are completely different. So as it becomes difficult to use the hash/plain text passwords pairs we know to crack hashes we aren’t storing a plain text value for. Rainbow tables get around this in an interesting way.

(more…)