preload
Oct 19

How long are your passwords?  Let’s say eight characters is the length.  How many possibilities are there?  Well, you can use any single-byte printable character (though I once used an escape key in an RS/6000 password; it worked, but isn’t a good idea everywhere), and any length from one to eight.

Printable ASCII is roughly codes 32 through 126, or 126-32+1 = 95 characters.  There are 95 passwords of length one, $95\times 95$ passwords of length two, $95^3$ passwords of length three, etc.  This gives a grand total of:

\[\sum_{i=1}^8 95^i = 6,704,780,954,517,120\]

This is a lot of passwords.  A lot.  That’s well over 6 quadrillion passwords.  It includes the passwords ~ and ~~~~~~~~.

Not all those are available for use in most cases.  First, you typically have to do the following.

  1. Choose a lower-case letter.  26 of those.
  2. Choose an upper-case letter.  26 of those.
  3. Choose a digit.  10 of those.
  4. Choose punctuation.  Let’s call that “everything else” and say that space is punctuation.  Thus there are $95-26-26-10 = 33$ punctuation characters.

So that limits our choice on four characters.  Let’s try again!  Now there is a grand total of:

\[26\times 26\times 10\times 33\times \sum_{i=0}^4 95^i = 223,080\times 82,317,121 = 18,363,303,352,680\]

That’s over 18 trillion passwords.  That’s actually way, way down from the full potential!

How much memory is required to store every single stinkin’ password’s crypt, MD5, etc.  Well, if we do it stupidly (i.e., ignore collisions, ignore smart data structures, etc.), then for MD5 we need 16 bytes for each.  That’s approximately $16\times 17~\mbox{Tib} \approx 272~\mbox{Tib}$.  That isn’t as much as you might think.  A single terabyte drive costs about \$100 (US) right now, and we’d need 272 of them.  Let’s say 300, just to be safe.  How much is that?  $300\times \$100 = \$30,000$.  Overhead, installation, maintenance, etc., all add to that, but not that much.  In short, it is very cost effective.

Computing them all?  We’d need to compute a lot of hashes.  Let’s pick on MD5 (even though it is almost cracked).  Suppose it takes us one millisecond to compute one hash.  We still need to compute 18.4 trillion hashes.  Serially, that’s $18.4\times 10^{12}~\mbox{passwords}\cdot 1\times 10^{-3}~\frac{\mbox{sec}}{\mbox{password}} = 1.84\times 10^{10}~\mbox{sec}$.  That’s just over 583 years.  Whew!  We’re all completely safe.

Not even close.  Suppose I have a 1024 node machine (small by today’s standards).  That cuts three orders of magnitude from the result, since every hash can be computed independently of the others (though smart storage would cause some serialization).  Now we’re down to under 6 years.  And how fast can we compute MD5 hashes???

Smart folks are making MD5 very fast.  See http://www.faqs.org/rfcs/rfc1810.html.  That’s from 1995.  Well, md5 -t, on my little 32-bit Intel MacBook, reports a speed of 313,406,912 bytes per second.  Eight bytes should take just about $2.55\times 10^{-8}~\mbox{seconds}$.  Yes, there’s overhead at the end of each hash, but I can counter with the following: I can hard code the eight byte limit to get a more efficient algorithm.  I can do a smart iteration with single-byte modifications and hash updates instead of a whole new hash computation.  Plus, there are much faster processors and, I feel confident, faster implementations.  I’ve got some confidence in my machine’s performance as an upper bound.  So, back to our 1024 node machine.

\[\frac{18.4\times 10^{12}~\mbox{passwords}}{1024~\mbox{node}} \cdot 2.55\times 10^{-8}~\frac{\mbox{seconds}}{\mbox{password}} = 458 \frac{\mbox{seconds}}{\mbox{node}}\]

That’s under eight minutes.  I can’t even manage to make coffee in that time.  Even assuming we add two orders of magnitude for storage and overhead, that’s just 800 minutes, or a few days.

I’d be surprised if there isn’t a database somewhere of all 8-byte (or shorter) passwords and their crypt, MD5, SHA1, and what-have-you.  That is, I expect 8-byte passwords have been solved.  Think about that next time you log in to a site on the web, secure in the knowledge that your password is being sent encrypted.

Who could do this?  If I were highly motivated, I could.  I’d do better data reduction (to save on data warehousing) and maybe even ship everything up to be computed on one of the “clouds” at Google, Dell, or Amazon.  If I can do this sitting in my house at my laptop…

  • Share/Bookmark

2 Responses to “Eight Character Passwords”

  1. Doug Says:

    I think it would be hard to parallelize MD5.
    http://en.wikipedia.org/wiki/Md5

    Hard, but not impossible. I think that forensics guys are looking at this, since they have to run MD5’s for hard drives, and that takes time. If you stripe them then you can at least parallelize the MD5’s of the stripes. I guess.

    But yes, I’m sure there are some agencies that have exactly such a database. I hadn’t really thought about how easy it was until I read your post. Of course they’ve done this. That’s just walking-around money.

  2. stacy Says:

    Doug: MD5 has been parallelized. Just google parallel MD5. Also, have a look here:
    http://www.isi.edu/touch/pubs/sigcomm95.html

Leave a Reply