WordNet groups English words into sets of synonyms called synsets, provides short definitions, and records the various semantic relations between these synonym sets. The purpose is twofold: to produce a combination of dictionary and thesaurus that is more intuitively usable, and to support automatic text analysis and artificial intelligence applications. The database and software tools have been released under a BSD-style licence and can be downloaded and used freely. The database can also be browsed online.
WordNet was created and is being maintained at the Cognitive Science Laboratory of Princeton University under the direction of psychology professor George A. Miller. Development began in 1985. Over the years, the project received about $3 million of funding, mainly from government agencies interested in machine translation.
As of 2003, the database contains about 140,000 words organized in over 110,000 synsets, for a total of 195,000 word-sense pairs; in compressed form, it is about 12 megabytes in size.
WordNet distinguishes between nouns, verbs, adjectives and adverbs on the assumption that these are stored differently in the human brain. Every synset contains a group of synonymous words or collocations (a collocation is a sequence of words that go together to form a specific meaning, such as "car pool"); words typically participate in several synsets. The meaning of the synsets is further clarified with short defining glosses. A typical example synset with gloss is
Every synset is connected to other synsets via a number of relations. These relations vary based on the type of word:
WordNet also provides the polysemy count of a word: the number of synsets that contain the word. If a word participates in several synsets (i.e. has several senses), then typically some senses are much more common than others. WordNet quantifies this with a frequency score: all words in several sample texts were semantically tagged with the corresponding synset, and the number of times each word appeared in a specific sense was then counted.
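The two measures described above can be illustrated with a minimal Python sketch. It does not use the actual WordNet database or its tools: the synset identifiers, the toy `SYNSETS` table, the `TAGGED_CORPUS` sample and the function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy database: synset identifier -> words in that synset.
SYNSETS = {
    "bank.n.01": {"bank", "riverbank"},
    "bank.n.02": {"bank", "depository"},
    "bank.v.01": {"bank", "deposit"},
    "deposit.n.01": {"deposit"},
}

# Hypothetical sense-tagged sample text: (word, synset identifier) pairs.
TAGGED_CORPUS = [
    ("bank", "bank.n.02"),
    ("bank", "bank.n.02"),
    ("bank", "bank.n.01"),
    ("deposit", "bank.v.01"),
]


def polysemy_count(word):
    """Return the number of synsets that contain the word."""
    return sum(1 for members in SYNSETS.values() if word in members)


def frequency_scores(word):
    """Count how often the word was tagged with each of its senses."""
    return Counter(sense for w, sense in TAGGED_CORPUS if w == word)
```

With this toy data, `polysemy_count("bank")` counts three synsets, while `frequency_scores("bank")` shows that one of those senses was tagged twice, one once, and one never appeared in the sample.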
The database's interface is able to deduce the root form of a word from the user's input; only the root form is stored in the database.
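One common way to perform this kind of lookup is to combine a list of irregular forms with suffix-replacement rules and to accept a candidate only if it is actually stored. The Python sketch below illustrates that idea under stated assumptions; it is not the interface's real implementation, and the `ROOT_FORMS` lexicon, the `EXCEPTIONS` table, the `SUFFIX_RULES` list and the `root_form` function are hypothetical.

```python
# Hypothetical toy lexicon: only root forms are stored.
ROOT_FORMS = {"car", "box", "city", "walk", "child", "bus"}

# Hypothetical table of irregular forms that no suffix rule can handle.
EXCEPTIONS = {"children": "child"}

# Assumed (suffix, replacement) rules, tried in order.
SUFFIX_RULES = [("ies", "y"), ("es", ""), ("s", ""), ("ing", ""), ("ed", "")]


def root_form(word):
    """Return the stored root form for the user's input, or None."""
    word = word.lower()
    if word in ROOT_FORMS:
        return word
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            # Accept a candidate only if it is a stored root form.
            if candidate in ROOT_FORMS:
                return candidate
    return None
```

Checking each candidate against the stored root forms keeps the rules from producing non-words: "buses" is reduced to "bus" by the "es" rule rather than to the unstored "buse" by the "s" rule.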