| • Science | • People | • Locations | • Timeline |
A technique called Pollard's rho algorithm (a cycle detection algorithm) is used to try and find a collision for MD5. The algorithm can be described by analogy with a random walk. Using the principle that any function with a finite number of possible outputs placed in a feedback loop will cycle, one can use a relatively small amount of memory to store outputs with particular structures and use them as "markers" to better detect when a marker has been "passed" before. These markers are called distinguished points, the point where two inputs produce the same output is called a collision point.
The expected time to find a collision is not equal to where is the number of bits in the digest output. It is in fact , where is the number of function outputs collected.
For this project, the probability of success after MD5 computations can be approximated by: .
The expected number of computations required to produce a collision in the 128-bit MD5 message digest function is thus:
To give some perspective to this, using Viginia Tech's System X with a max performance of 10 Teraflops, it would take approximately seconds or about 3 weeks. Or for commodity processors at 2 gigaflops it would take 5,000 machines approximately the same amount of time.