| • Science | • People | • Locations | • Timeline |
| Contents | ||
MD5 is one of a series of message digest algorithms designed by Professor Ronald Rivest of MIT (Rivest, 1994). When analytic work indicated that MD5's predecessor — MD4 — was likely to be insecure, MD5 was designed in 1991 to be a secure replacement; weaknesses were indeed subsequently found in MD4 by Hans Dobbertin .
In 1996, Dobbertin announced a collision of the compression function of MD5 (Dobbertin, 1996). This was not quite an attack on the full MD5 hash function, but it was close enough for cryptographers to recommend switching to a replacement, such as SHA-1 or RIPEMD-160. In August 2004, Chinese researchers found collisions for MD5. It is still unknown how this discovery will affect the widespread use of MD5.
The 128-bit MD5 hashes (also termed message digests) are typically represented as 32-digit hexadecimal numbers. The following demonstrates a 43-byte ASCII input and the corresponding MD5 hash:
Even a small change in the message will (with overwhelming probability) result in a completely different hash, e.g. changing d to c:
The hash of the zero-length string is:
MD5 processes a variable length message into a fixed-length output of 128 bits. The input message is broken up into chunks of 512-bit blocks; the message is padded so that its length is divisible by 512. The padding works as follows: first a single bit, 1, is appended to the end of the message. This is followed by as many zeros as are required to bring the length of the message up to 64 bits fewer than a multiple of 512. The remaining bits are filled up with a 64-bit integer representing the length of the original message.
The main MD5 algorithm operates on a 128-bit state, divided into four 32-bit words, denoted A, B, C and D. These are initialised to certain fixed constants. The main algorithm then operates on each 512-bit message block in turn, each block modifying the state. The processing of a message block consists of four similar stages, termed rounds; each round is composed of 16 similar operations based on a non-linear function F, modular addition, and left rotation. Figure 1 illustrates one operation within a round. There are four possible functions F, a different one is used in each round:
Note: Instead of the formulation from the original RFC 1321 shown, the following may be used for improved efficiency:
(0 ≤ i ≤ 15): f := d xor (b and (c xor d)) (16 ≤ i ≤ 31): f := c xor (d and (b xor c))