[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: pwgen: non-uniform distribution of passwords
On Thu, Jan 19, 2012 at 11:34:12PM +0400, Solar Designer wrote:
> $ ./pwgen -1cn 8 1000000000 | dd obs=10M > 1g
> $ time ~/john/john-1.7.9-jumbo-5/run/unique -v -mem=25 1gu < 1g
> Total lines read 1000000000 Unique lines written 697066573
Here's some further analysis of the 1 billion sample used as a training
set along with a separate 1 million sample used as a test set:
Applying the 697 million unique passwords (from the 1 billion sample
above) as a wordlist (6 GB file size) to crack another 1 million of
pwgen'ed passwords cracks 418168 of them (41.8%). For a uniform
distribution (which is not the case), this would correspond to total
keyspace size of about 1.67 billion passwords (between 30 and 31 bits).
Focusing on more frequent pwgen'ed passwords only:
The most common passwords in my 1 billion sample happen to be, prefixed
by number of occurrences:
the next one is seen 120 times. Overall, there are 3452 unique
passwords with 100 occurrences or more (in 1 billion generated).
Taking these 3452 as a wordlist cracks 284 passwords in the separate
1 million sample. This is 0.0284%. However, 3452 is only %0.0002 of
the 1.67 billion estimate for the keyspace size that we arrived at
above. Hence, the distribution is non-uniform, and our speedup from
exploiting this property is at least 137x on this test. (284 / 2 is
obviously 142, but I used more precise numbers here.)
Checking this another way, the keyspace size estimate assuming uniform
distribution would be only 12 million based on the test above - a lot
lower than the previous estimate. This similarly confirms that the
distribution is non-uniform.
Top 1 million unique passwords from my 1 billion training set cracks
37149 in the test set (3.7%). The corresponding uniform keyspace size
estimate is 27 million.
Top 10 million unique passwords cracks 145179 (14.5%). The keyspace
size estimate is 69 million.
Top 100 million unique passwords cracks 262693 (26.3%). The keyspace
size estimate is 381 million.
Finally, only 115339574 unique passwords are seen in the 1 billion
sample more than once. (This is less than 1000-697 = 303 million
because many passwords are seen more than 2 times each.) Using them as
a wordlist cracks 276382 (27.6%). The keyspace size estimate is 417
Chances are that I won't spend further time on this, although a possible
project would be to create a program that would output all or top N of
pwgen'ed passwords using exact probabilities (based on analysis of
pwgen's source code or/and behavior of pwgen with non-random inputs
rather than based on normal pwgen invocations like I did so far, which
only provides estimates). This would result in more efficient attacks
(more passwords in the test set cracked per candidate passwords tested).