It was (and still is) daring to substitute statistical estimation for a system of competitive rewards. Rating systems for many sports award points in accordance with subjective evaluations of the greatness of certain achievements. For example, winning an important golf tournament might be worth five times as many rating points as winning a lesser tournament, and taking third place might be worth half the points of taking first place, etc.
A statistical endeavor, in contrast, postulates a model of some aspect of reality, and seeks to mathematically estimate, based on observation, the variables in that model. Competitors may still feel that they are being rewarded and punished for good and bad results, but the lofty claim of a statistical system is that it estimates real unknowns, and thus mirrors some hidden truth.
Élo's specific assumptions about the nature of reality are open to doubt, but chess fans praise the accuracy of ELO ratings with a fervor unheard of in other sports. For example, professional tennis ratings are purely rewards based on tournament results. (Statistically rating tennis players would be complicated by variables chess doesn't have, particularly the playing surface, but the rating organizations don't even try for predictive accuracy.) As a result, it is routine for tennis fans to consider the higher-rated player an underdog in a given match. In chess the higher-rated player is regarded as the favorite in almost every case. Thus as of 2004, Garry Kasparov, although he holds neither of the two World Championship titles, is universally regarded as the best chess player by virtue of having the highest rating.
Élo's central assumption was that the chess "performance" of each player in each game is a normally distributed random variable. Although a player might perform significantly better or worse from one game to the next, Élo assumed that the mean value of the performances of any given player changes only slowly over time. Élo thought of the mean of a player's performance random variable as that player's true skill.
A further assumption is necessary, because chess performance in the above sense is still not measurable. One cannot look at a sequence of moves and say, "That performance is 2039." Performance can only be inferred from wins, draws and losses. Therefore, if a player wins a game, he is assumed to have performed at a higher level than his opponent for that game. Conversely if he loses, he is assumed to have performed at a lower level. If the game is a draw, the two players are assumed to have performed at nearly the same level.
Élo waved his hands at several details of his model. For example, he did not specify exactly how close two performances ought to be to result in a draw rather than a decisive result. And while he thought it likely that each player might have a different standard deviation to his performance, he made a simplifying assumption to the contrary.
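Under the equal-variance simplification, the chance that one player outperforms another has a closed form. The following is only a sketch, and its notation is an assumption not found in the original: $X_A$ and $X_B$ are the two performances, taken to be independent, $\mu_A$ and $\mu_B$ are the players' true skills, $\sigma$ is the common standard deviation, and $\Phi$ is the standard normal cumulative distribution function. Draws are ignored, since the model does not say how close two performances must be to produce one.

```latex
X_A - X_B \sim \mathcal{N}\!\left(\mu_A - \mu_B,\; 2\sigma^2\right)
\quad\Longrightarrow\quad
P(X_A > X_B) = \Phi\!\left(\frac{\mu_A - \mu_B}{\sigma\sqrt{2}}\right)
```

When the two true skills are equal the probability is exactly one half, and it rises toward one as the skill difference grows relative to $\sigma$.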
To simplify computation even further, Élo proposed a straightforward method of estimating the variables in his model, i.e., the true skill of each player. One could calculate relatively easily, from tables, how many games a player is expected to win based on a comparison of his rating to the ratings of his opponents. If a player won more games than he was expected to win, his rating would be adjusted upward, while if he won fewer games than expected his rating would be adjusted downward. Moreover, that adjustment was to be in exact linear proportion to the number of wins by which the player had exceeded or fallen short of his expected number of wins.
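The adjustment rule can be illustrated with a minimal Python sketch. It is not an official implementation, and several values are assumptions not stated in the text: the per-player standard deviation `sigma=200`, the proportionality constant `k=32`, the function names, and the convention of counting a draw as half a win. The expected score is computed directly from the normal distribution rather than looked up in a table.

```python
import math


def expected_score(rating, opponent_rating, sigma=200.0):
    """Probability that a player outperforms one opponent.

    Assumes both performances are independent normal random variables
    with the same standard deviation `sigma` (a hypothetical value here).
    """
    diff = rating - opponent_rating
    # The difference of two independent normals has std dev sigma * sqrt(2).
    z = diff / (sigma * math.sqrt(2.0))
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def update_rating(rating, opponent_ratings, actual_score, k=32.0):
    """Adjust a rating in linear proportion to (actual - expected) wins.

    `actual_score` is the number of wins, with a draw counted as half a
    win (an assumption). `k` is a hypothetical proportionality constant.
    """
    expected = sum(expected_score(rating, r) for r in opponent_ratings)
    return rating + k * (actual_score - expected)


if __name__ == "__main__":
    # A 1500 player beats two other 1500 players: one win was expected.
    print(update_rating(1500.0, [1500.0, 1500.0], actual_score=2.0))
```

A player who scores exactly the expected number of wins keeps the same rating, and the size of any change scales directly with `k`.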
From a modern perspective, Élo's simplifying assumptions are not necessary, because computing power is inexpensive and widely available. Moreover, even within the simplified model, more efficient estimation techniques are well known. Several people, most notably Mark Glickman, have proposed using more sophisticated statistical machinery to estimate the same variables. On the other hand, the computational simplicity of the ELO system has proved to be one of its greatest assets. With the aid of a pocket calculator, an informed chess competitor can calculate to within one point what his next officially published rating will be, which helps promote a perception that the ratings are fair.