The Zipf–Mandelbrot law is a power-law distribution on ranked data. It is named after the Harvard linguist George Kingsley Zipf (1902–1950), who observed a statistical regularity in texts, and after the mathematician Benoit Mandelbrot (November 20, 1924 – October 14, 2010), who generalized it.
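For reference, the Zipf–Mandelbrot probability mass function is commonly written as f(k; N, q, s) = (k + q)^(-s) / H(N, q, s), where k is the rank, N the number of ranks, s > 0 the exponent, q >= 0 a shift, and H(N, q, s) = sum over i = 1..N of (i + q)^(-s) the normalizing constant. The symbol names and the function below are illustrative assumptions, a minimal sketch rather than any library's API; setting q = 0 reduces it to the plain (finite) Zipf distribution.

```python
def zipf_mandelbrot_pmf(k: int, n: int, q: float, s: float) -> float:
    """Probability of rank k (1-based) among n ranks, with shift q and exponent s.

    Hypothetical helper for illustration; parameter names are assumptions.
    """
    if not 1 <= k <= n:
        raise ValueError("rank k must satisfy 1 <= k <= n")
    if q < 0 or s <= 0:
        raise ValueError("expected q >= 0 and s > 0")
    # Normalizing constant: the generalized harmonic number H(n, q, s).
    norm = sum(1.0 / (i + q) ** s for i in range(1, n + 1))
    # Unnormalized weight (k + q)^(-s), divided by the constant so the pmf sums to 1.
    return 1.0 / (k + q) ** s / norm
```

For example, `[zipf_mandelbrot_pmf(k, 10, 2.0, 1.0) for k in range(1, 11)]` gives ten probabilities that decrease with rank and sum to 1. Recomputing the normalizing sum on every call keeps the sketch short; a real implementation would compute it once per (n, q, s).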
The distribution of words ranked by their frequency in a random corpus of text is generally a power-law distribution, known as Zipf's law: the frequency of the n-th most frequently used word in a natural language is approximately inversely proportional to n.
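The rank-frequency relationship can be computed directly from a text. The sketch below assumes a crude tokenizer (lower-cased runs of letters and apostrophes) and uses hypothetical helper names; it ranks words by count and compares each count with the exponent-one Zipf prediction, top_count / rank.

```python
from __future__ import annotations

import re
from collections import Counter


def rank_frequency(text: str) -> list[tuple[int, str, int]]:
    """Return (rank, word, count) triples, most frequent word first."""
    # Assumed tokenization: lower-cased runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # most_common() sorts by descending count; ties keep first-seen order.
    return [(rank, word, c)
            for rank, (word, c) in enumerate(counts.most_common(), start=1)]


def zipf_prediction(top_count: int, rank: int) -> float:
    """Zipf's law with exponent 1: the rank-r word occurs about top_count / r times."""
    return top_count / rank
```

On a toy sentence such as `"the cat and the dog and the bird"`, `rank_frequency` returns `(1, 'the', 3)`, `(2, 'and', 2)`, then the words seen once. Only large corpora show the law clearly; a short text is dominated by ties at low counts.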
If one plots the frequency rank of words in a large corpus of text against their number of occurrences (their actual frequencies), one obtains a power-law distribution with an exponent close to one (but see Gelbukh and Sidorov 2001).
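One simple way to estimate that exponent is a straight-line fit on log-log axes: if count ≈ C · rank^(-s), then log(count) = log(C) - s · log(rank), so s is minus the slope. The function below is a hypothetical sketch of ordinary least squares on the logged data. It is easy to follow but statistically crude; maximum-likelihood estimators are generally considered more reliable for power-law fitting.

```python
import math
from typing import Sequence


def fit_zipf_exponent(counts: Sequence[float]) -> float:
    """Estimate s in count ~ C * rank**(-s) by least squares on log-log axes.

    `counts` must be positive and sorted in descending order (index 0 = rank 1).
    """
    if len(counts) < 2 or any(c <= 0 for c in counts):
        raise ValueError("need at least two positive counts")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least-squares slope: covariance(x, y) / variance(x).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    # The fitted line falls with rank, so the exponent is minus the slope.
    return -cov / var
```

Fed an exact power law such as `[1000 / r for r in range(1, 101)]`, it recovers an exponent of 1 up to rounding error. On real word counts, the fitted value depends on the corpus and on which ranks are included.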