
Byte Order Mark



A Byte Order Mark (BOM) is the character at code point U+FEFF (ZERO WIDTH NO-BREAK SPACE), when that character is used to denote the endianness of an encoded string of UCS/Unicode characters.

A BOM can be used to indicate that unlabeled text is encoded as UTF-16 or UTF-8. It also indicates the byte order of UTF-16 text, whether that text is labeled or not.

In UTF-16, a BOM is expressed as the two-byte sequence FE FF at the beginning of the encoded string to indicate that the encoded characters that follow it use big-endian byte order, or as the byte sequence FF FE to indicate little-endian order.
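As a rough illustration of the two UTF-16 byte sequences, the following Python sketch encodes U+FEFF in each byte order and checks the start of a byte string for either mark. The function name utf16_byte_order is hypothetical and chosen only for this example; the constants come from Python's standard codecs module.

```python
import codecs

BOM = "\ufeff"  # the character used as the byte order mark

# Encoding the BOM character explicitly in each byte order
# yields the two sequences described above.
bom_be = BOM.encode("utf-16-be")  # FE FF
bom_le = BOM.encode("utf-16-le")  # FF FE


def utf16_byte_order(data: bytes):
    """Return 'big' or 'little' if data starts with a UTF-16 BOM, else None.

    Hypothetical helper for illustration. It only inspects the first
    two bytes and does not verify that the rest is valid UTF-16.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little"
    return None


sample = (BOM + "hi").encode("utf-16-le")
print(bom_be.hex(), bom_le.hex(), utf16_byte_order(sample))
```

Note that a result of None only means that no mark is present. The text may still be UTF-16, with its byte order declared by some other means.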

UTF-8 text can also begin with a BOM, although this is rare: UTF-8 prescribes a fixed byte order, so the mark carries no byte-order information, and UTF-8 is often assumed or implicit, so a signature is seldom needed. The UTF-8 representation of the BOM is the byte sequence EF BB BF.
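A minimal Python sketch of handling the UTF-8 signature follows. The helper name strip_utf8_bom is an assumption made for this example. codecs.BOM_UTF8 and the "utf-8-sig" codec are part of Python's standard library.

```python
import codecs

# U+FEFF encoded as UTF-8 gives the three-byte signature.
utf8_bom = "\ufeff".encode("utf-8")  # EF BB BF


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM if one is present (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + "text".encode("utf-8")
without_bom = strip_utf8_bom(with_bom)

# The "utf-8-sig" codec skips the signature while decoding,
# and accepts input that has no signature at all.
decoded = with_bom.decode("utf-8-sig")
print(utf8_bom.hex(), without_bom, decoded)
```

Decoding with plain "utf-8" would instead keep U+FEFF as the first character of the result, which is one reason a reader may want to strip the signature explicitly.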


See also

Unicode
