Unicode
series
Unicode
UCS
UTF-7
UTF-8
UTF-16
UTF-32
SCSU
Punycode
Bi-directional text
BOM
Han unification
Unicode and HTML

In computing, Unicode is the international standard whose goal is to provide the means to encode the text of every document people want to store in computers. This includes all scripts still in active use today, many scripts known only to scholars, and symbols that do not strictly belong to any script, such as those used in mathematics, linguistics and APL.

The creation of Unicode is an ambitious project to replace existing character sets, many of which are limited in size and problematic in multilingual environments. Despite technical problems, limitations and criticism of its process, Unicode is today considered the most complete character set and one of the largest, and it has become the dominant encoding scheme in the internationalization of software and in multilingual environments. Many recent standards such as XML, and system software such as operating systems, have adopted Unicode as the underlying scheme for representing text. Still, Unicode is not used to write documents as widely as anticipated. Many documents stored on computers, for instance, are still represented in other character sets.

To address this shortcoming, Unicode is revised periodically, adding more characters and increasing the number of characters that can potentially be represented in Unicode.

1 Origin and development

It is the explicit aim of Unicode to transcend the limitations of traditional character encodings such as those defined by the ISO 8859 standard, which are used in the various countries of the world, but are largely incompatible with each other. One problem with traditional character encodings is that they allow for bilingual computer processing (usually Roman characters and the local language), but not for multilingual computer processing (computer processing of arbitrary languages mixed with each other).
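The incompatibility described above can be illustrated with a short Python sketch. It assumes Python's built-in codecs for three parts of ISO 8859; the helper name `fits_in` is hypothetical and introduced only for this illustration.

```python
# One byte, three meanings: the value 0xE4 under three parts of ISO 8859.
raw = bytes([0xE4])

as_latin1 = raw.decode("iso8859_1")    # ISO 8859-1, Western European
as_cyrillic = raw.decode("iso8859_5")  # ISO 8859-5, Cyrillic
as_greek = raw.decode("iso8859_7")     # ISO 8859-7, Greek

# In Unicode each of these characters has its own code point, so all three
# can sit side by side in a single string.
mixed = as_latin1 + as_cyrillic + as_greek


def fits_in(text: str, encoding: str) -> bool:
    """Return True if every character of text can be encoded in encoding.

    Hypothetical helper for this example, not part of any standard.
    """
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False
```

Here `fits_in(mixed, "iso8859_1")` is false, because a single ISO 8859 part cannot hold Latin, Cyrillic and Greek letters together, while `fits_in(mixed, "utf-8")` is true.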

Unicode is intended to encode the underlying characters, not the variant glyphs used to draw them. In the case of Chinese characters, this sometimes leads to controversies over what is the underlying character and what is a variant glyph (see Han unification).

Unicode aims to provide a code point for each character, but not for each glyph. To put this in more common (but less accurate) terms, Unicode aims to provide a unique number for each letter, without regard to typographic variations used by printers.
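As a minimal sketch of the "one number per character" idea, the following Python lists the code point and formal name of each character in a string, using the standard `unicodedata` module. The function name `code_points` is an assumption made for this illustration. Which glyph is drawn for each code point is left to the font and plays no part here.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each character in text (illustrative helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# A Latin letter, a Cyrillic letter and a Chinese character.
for line in code_points("A\u0436\u4e2d"):
    print(line)
```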

This simple aim is greatly complicated by another aim, which is to provide lossless conversion amongst different existing encodings in order to ease the transition.
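One way to picture lossless conversion is a round trip: bytes in a legacy encoding are decoded to Unicode and encoded back, and the result should be identical to the input. The sketch below checks this with Python's built-in codecs; the function name `round_trips` is hypothetical, and the two ISO 8859 parts are chosen only as examples.

```python
def round_trips(raw: bytes, legacy: str) -> bool:
    """Decode raw from a legacy encoding to Unicode and back again.

    Returns True if the re-encoded bytes equal the original bytes,
    i.e. nothing was lost by passing through Unicode.
    """
    text = raw.decode(legacy)
    return text.encode(legacy) == raw


# Every possible byte value, checked against two parts of ISO 8859.
all_bytes = bytes(range(256))
results = {name: round_trips(all_bytes, name)
           for name in ("iso8859_1", "iso8859_5")}
```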

The Unicode standard also includes a number of related items, such as character properties, text normalisation forms, and bidirectional display order (for the correct display of text containing both right-to-left scripts, such as Arabic or Hebrew, and left-to-right scripts).
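These related items can be inspected from Python's standard `unicodedata` module. The following is a small sketch, not a full treatment of normalisation or of the bidirectional algorithm: it shows two equivalent spellings of "é" being brought to the same form, and the bidirectional class of a Latin, a Hebrew and an Arabic letter.

```python
import unicodedata

# Two ways to write "é": one precomposed code point, or "e" followed by
# a combining acute accent. They differ as raw sequences of code points.
composed = "\u00e9"
decomposed = "e\u0301"

# Normalisation forms map equivalent sequences to one canonical spelling.
nfc = unicodedata.normalize("NFC", decomposed)  # composed form
nfd = unicodedata.normalize("NFD", composed)    # decomposed form

# Character properties: general category and bidirectional class.
category_of_a = unicodedata.category("A")           # "Lu": uppercase letter
bidi_latin = unicodedata.bidirectional("A")         # "L": left-to-right
bidi_hebrew = unicodedata.bidirectional("\u05d0")   # "R": right-to-left
bidi_arabic = unicodedata.bidirectional("\u0627")   # "AL": Arabic letter
```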

In 1997 a proposal was made by Michael Everson to encode the characters of the Klingon language in Plane 1 of ISO/IEC 10646-2. The proposal was rejected in 2001 as "inappropriate for encoding", not because the proposal was technically faulty, but because users of Klingon normally read, write and exchange data in Latin transliteration.

The Elvish scripts Tengwar and Cirth from J. R. R. Tolkien's Middle-earth setting were proposed for inclusion in Plane 1 in 1993. The draft was withdrawn to incorporate changes suggested by Tolkienists, and as of 2004 it is still under consideration.


