More precisely, Benford's Law states that the leading digit n (n = 1, ..., 9) occurs with probability log10(n + 1) − log10(n), or
| Leading digit | Probability |
|---|---|
| 1 | 30.1 % |
| 2 | 17.6 % |
| 3 | 12.5 % |
| 4 | 9.7 % |
| 5 | 7.9 % |
| 6 | 6.7 % |
| 7 | 5.8 % |
| 8 | 5.1 % |
| 9 | 4.6 % |
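The table can be reproduced directly from the formula. Below is a minimal Python sketch; the function name `benford_first_digit` is only an illustrative choice. Because the terms telescope, the nine probabilities sum to log10(10) − log10(1) = 1.

```python
import math

def benford_first_digit(d: int) -> float:
    """Probability that the leading digit equals d (1 <= d <= 9) under Benford's law."""
    return math.log10(d + 1) - math.log10(d)

probs = {d: benford_first_digit(d) for d in range(1, 10)}

for d, p in probs.items():
    print(f"{d} | {100 * p:.1f} %")

# The nine terms telescope to log10(10) - log10(1) = 1.
print(f"total: {sum(probs.values()):.6f}")
```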
One can also formulate a law for the first two digits: the probability that the first two-digit block equals n (n = 10, ..., 99) is log10(n + 1) − log10(n), and similarly for three-digit and longer blocks without leading zeros.
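The two-digit law is consistent with the one-digit law: summing the block probabilities for 10d, ..., 10d + 9 telescopes back to log10(d + 1) − log10(d). The Python sketch below checks this numerically and, by summing over the first digit instead, also derives the implied distribution of the second digit, which is much flatter (roughly 12 % for 0 falling to roughly 8.5 % for 9). The helper name `benford_block` is hypothetical.

```python
import math

def benford_block(n: int) -> float:
    """P(the leading digit block equals n), for any n >= 1 written without leading zeros."""
    return math.log10(n + 1) - math.log10(n)

# Two-digit blocks n = 10..99 form a complete probability distribution.
two_digit = {n: benford_block(n) for n in range(10, 100)}

# Summing over the second digit k recovers the first-digit law:
# sum_{k=0}^{9} P(10d + k) telescopes to log10(d + 1) - log10(d).
first = {d: sum(two_digit[10 * d + k] for k in range(10)) for d in range(1, 10)}

# Summing over the first digit d instead gives the distribution of the second digit.
second = {k: sum(two_digit[10 * d + k] for d in range(1, 10)) for k in range(10)}

for k, p in second.items():
    print(f"second digit {k}: {100 * p:.1f} %")
```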
That, in general, the leading digit 1 should be more common than the other digits can be understood as follows. Start counting from 1: 1, 2, 3, ... When you reach 9, every leading digit has occurred equally often. But from 10 to 19 the only leading digit is 1, so 1 gets a huge head start. Only when you reach 99 are all leading digits equally frequent again, and then 1 gets another huge head start from 100 to 199. And so it continues: 1 is always ahead, except at the rare points 9, 99, 999, 9999, ... This is not particularly satisfactory as an explanation unless one also includes some probability of stopping the count at a random point.
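The head-start argument can be checked by brute force. In the Python sketch below (`share_of_leading_one` is a hypothetical helper), the share of integers 1..N that start with 1 never settles: it drops to about 11 % at N = 9, 99, 999, ... and climbs to about 56 % at N = 19, 199, 1999, ... This oscillation is why the counting picture needs a random stopping point before it yields a definite probability.

```python
def share_of_leading_one(limit: int) -> float:
    """Fraction of the integers 1..limit whose leading decimal digit is 1."""
    count = sum(1 for x in range(1, limit + 1) if str(x)[0] == "1")
    return count / limit

for limit in (9, 19, 99, 199, 999, 1999, 9999, 19999):
    print(f"1..{limit}: {share_of_leading_one(limit):.3f}")
```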
Somewhat more precisely, suppose X is a random variable whose probability of being equal to any positive integer x is a constant times x^(−s), where s > 1. The constant must then be 1/ζ(s), where ζ is the Riemann zeta function (see zeta distribution). As s approaches 1 from above, the probability that the first digit of X is n approaches log10(n + 1) − log10(n).
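A heuristic sketch (not a rigorous proof) of why this limit holds: approximate the sum over each decade block by an integral. Every block contributes the same n-dependent factor, while the finitely many small blocks stay bounded as ζ(s) grows without bound, so only that factor survives in the limit. The denominator comes from the telescoping sum over m = 1, ..., 9.

```latex
% Sum over the block n \cdot 10^k \le x < (n+1) \cdot 10^k, approximated by an integral:
\sum_{x = n\cdot 10^k}^{(n+1)\cdot 10^k - 1} x^{-s}
  \approx \int_{n\cdot 10^k}^{(n+1)\cdot 10^k} x^{-s}\,dx
  = 10^{k(1-s)}\,\frac{n^{1-s} - (n+1)^{1-s}}{s-1}
% Normalising over n, using \sum_{m=1}^{9}\bigl(m^{1-s} - (m+1)^{1-s}\bigr) = 1 - 10^{1-s},
% and expanding a^{1-s} \approx 1 + (1-s)\ln a near s = 1:
\lim_{s\to 1^+} \Pr(\text{first digit of } X = n)
  = \lim_{s\to 1^+} \frac{n^{1-s} - (n+1)^{1-s}}{1 - 10^{1-s}}
  = \frac{\ln(n+1) - \ln n}{\ln 10}
  = \log_{10}(n+1) - \log_{10}(n)
```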
The precise form of Benford's law can be explained if one assumes that the logarithms of the numbers are uniformly distributed (strictly, it suffices that the fractional parts of their base-10 logarithms are uniform on [0, 1)). This means, for instance, that a number is just as likely to lie between 100 and 1000 (logarithm between 2 and 3) as between 10,000 and 100,000 (logarithm between 4 and 5). For many sets of numbers, especially ones that grow exponentially such as incomes and stock prices, this is a reasonable assumption.
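The log-uniform assumption can be simulated. The Python sketch below draws numbers whose base-10 logarithm is uniform over an integer number of decades (six here, an arbitrary choice) and compares their leading-digit frequencies with Benford's law. The helpers `leading_digit` and `log_uniform_sample` are hypothetical; the sample size and seed are illustrative assumptions.

```python
import math
import random
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant decimal digit of a positive float.

    Formatting with 17 significant digits reproduces a double exactly,
    so rounding never pushes a value such as 999.99... up to 1000.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    return int(f"{x:.16e}"[0])

def log_uniform_sample(size: int, decades: int, seed: int = 0) -> list[float]:
    """Numbers whose base-10 logarithm is uniform on [0, decades)."""
    rng = random.Random(seed)
    return [10 ** rng.uniform(0, decades) for _ in range(size)]

# An integer number of decades gives every leading digit its full share.
sample = log_uniform_sample(100_000, decades=6)
counts = Counter(leading_digit(x) for x in sample)

for d in range(1, 10):
    observed = counts[d] / len(sample)
    expected = math.log10(1 + 1 / d)
    print(f"{d}: observed {observed:.3f}, Benford {expected:.3f}")
```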
Another explanation is that if a distribution of first digits exists, it should be scale invariant. For example, the first (non-zero) digit of the lengths or distances of objects should have the same distribution whether the unit of measurement is Planck lengths, inches, feet, yards, metres, miles, light years, or anything else. But there are three feet in a yard, so the probability that the first digit of a length in yards is 1 must be the same as the probability that the first digit of the same length in feet is 3, 4, or 5 (a length between 1 and 2 yards is between 3 and 6 feet, and likewise at every power of ten). Applying this to all possible measurement scales gives a logarithmic distribution, and combining that with the facts that log10(1) = 0 and log10(10) = 1 gives Benford's law.
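The yards-to-feet argument can be checked on a concrete data set. The Python sketch below treats the exponentially growing sequence 2^k (k = 0, ..., 999) as hypothetical lengths in yards, a choice made only because its leading digits are known to follow Benford's law. Converting to feet keeps the frequencies close to Benford, and the count of leading 1s in yards equals the count of leading 3s, 4s and 5s in feet exactly, because multiplying by 3 maps [1, 2) × 10^k onto [3, 6) × 10^k.

```python
import math
from collections import Counter

def leading_digit(n: int) -> int:
    """First decimal digit of a positive integer."""
    return int(str(n)[0])

# Hypothetical "lengths in yards": the exponentially growing sequence 2**k.
yards = [2 ** k for k in range(1000)]
# The same lengths in feet (3 feet per yard); integer arithmetic keeps this exact.
feet = [3 * y for y in yards]

yard_freq = Counter(leading_digit(y) for y in yards)
feet_freq = Counter(leading_digit(f) for f in feet)

for d in range(1, 10):
    print(f"{d}: yards {yard_freq[d] / len(yards):.3f}  "
          f"feet {feet_freq[d] / len(feet):.3f}  "
          f"Benford {math.log10(1 + 1 / d):.3f}")

# Leading 1 in yards <=> leading 3, 4 or 5 in feet, so these counts must agree.
print(yard_freq[1], feet_freq[3] + feet_freq[4] + feet_freq[5])
```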
Note that for numbers drawn from many distributions, for example IQ scores, human heights or other variables following normal distributions, the law is not valid. However, if one "mixes" numbers from those distributions, for example by taking numbers from newspaper articles, Benford's law reappears. This can also be proven mathematically: if one repeatedly "randomly" chooses a probability distribution and then randomly chooses a number according to that distribution, the resulting list of numbers will, under suitable conditions on how the distributions are chosen, obey Benford's law.
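Both halves of this claim can be simulated; the Python sketch below is an illustration, not a proof. A single IQ-like normal distribution (mean 100, standard deviation 15) gives leading digits far from Benford: about half start with 1 (scores of 100 to 199) and very few start with 5. The mixture picks a fresh normal distribution for every draw, with its mean chosen log-uniformly over six decades and its standard deviation set to a fifth of the mean. That particular way of "randomly choosing a distribution" is an assumption made here because it is unbiased with respect to scale; with it, the pooled leading digits come out close to Benford's law. The seed, sample size and helper names are illustrative.

```python
import math
import random
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant digit of a positive float (17 significant digits round-trip exactly)."""
    return int(f"{x:.16e}"[0])

def digit_freqs(values: list[float]) -> dict[int, float]:
    """Relative frequency of each leading digit 1..9."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

rng = random.Random(42)
size = 100_000

# One normal distribution (IQ-like); abs() guards against a vanishingly rare negative draw.
single = [abs(rng.gauss(100, 15)) for _ in range(size)]

# A mixture: for each draw, first pick a distribution at random (a normal whose
# mean is log-uniform over six decades, an illustrative choice), then draw one number.
mixture = []
for _ in range(size):
    mean = 10 ** rng.uniform(0, 6)
    mixture.append(abs(rng.gauss(mean, mean / 5)))

single_freq = digit_freqs(single)
mixture_freq = digit_freqs(mixture)
for d in range(1, 10):
    print(f"{d}: normal {single_freq[d]:.3f}  mixture {mixture_freq[d]:.3f}  "
          f"Benford {math.log10(1 + 1 / d):.3f}")
```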