The Unicode Standard is a character coding system designed to support the worldwide interchange, processing, and display of the written texts of the diverse languages and technical disciplines of the modern world.

Unicode encoding form

A Unicode encoding form assigns each Unicode scalar value to a unique code unit sequence. The Unicode Standard defines three Unicode encoding forms: UTF-8, UTF-16, and UTF-32. * For historical ...
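As an illustration of the definition above, the following minimal Python sketch lists the code units that each of the three encoding forms assigns to a single scalar value. The helper name `code_units` is hypothetical, and the big-endian codecs are used only as a convenient way to read the code units back out as integers; this is not part of the Unicode Standard itself.

```python
def code_units(ch: str) -> dict[str, list[int]]:
    """Return the code units for one scalar value in each encoding form.

    Hypothetical helper for illustration only.
    """
    # UTF-8 code units are 8 bits wide, so each byte is one code unit.
    utf8 = list(ch.encode("utf-8"))

    # UTF-16 code units are 16 bits wide; read them back two bytes at a time.
    b16 = ch.encode("utf-16-be")
    utf16 = [int.from_bytes(b16[i:i + 2], "big") for i in range(0, len(b16), 2)]

    # UTF-32 code units are 32 bits wide; read them back four bytes at a time.
    b32 = ch.encode("utf-32-be")
    utf32 = [int.from_bytes(b32[i:i + 4], "big") for i in range(0, len(b32), 4)]

    return {"UTF-8": utf8, "UTF-16": utf16, "UTF-32": utf32}


units = code_units("\U0001F600")
# UTF-8:  [0xF0, 0x9F, 0x98, 0x80]
# UTF-16: [0xD83D, 0xDE00]
# UTF-32: [0x1F600]
```

The same scalar value maps to four code units in UTF-8, two in UTF-16, and one in UTF-32, and each of those sequences is unique to that scalar value.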

Unicode encoding scheme

A specified byte serialization for a Unicode encoding form, including the specification of the handling of a byte order mark (BOM), if allowed. * For historical reasons, the Unicode encoding schemes ...
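The difference between an encoding form and an encoding scheme can be sketched in Python, whose standard codecs `utf-16-be`, `utf-16-le`, and `utf-16` serialize the same 16-bit code units to bytes in different ways. The helper name `serialize_utf16` is hypothetical, and the mapping of Python codec names to scheme names is an assumption made for illustration.

```python
import codecs


def serialize_utf16(text: str) -> dict[str, bytes]:
    """Serialize text to bytes under three UTF-16 byte serializations.

    Hypothetical helper for illustration only.
    """
    return {
        # Big-endian byte order, no byte order mark written.
        "UTF-16BE": text.encode("utf-16-be"),
        # Little-endian byte order, no byte order mark written.
        "UTF-16LE": text.encode("utf-16-le"),
        # Python's plain "utf-16" codec writes a BOM first, then uses the
        # platform's native byte order.
        "UTF-16": text.encode("utf-16"),
    }


schemes = serialize_utf16("A")
# schemes["UTF-16BE"] is b"\x00A" and schemes["UTF-16LE"] is b"A\x00".
has_bom = schemes["UTF-16"][:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)
```

When decoding with the plain `utf-16` codec, Python reads an initial BOM to choose the byte order, which is one way of handling the BOM that a scheme may allow.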

Unicode Locale Data Markup Language (LDML)

The XML specification for the exchange of locale data, defined by Unicode Technical Standard #35, "Unicode Locale Data Markup Language (LDML)." (See also Unicode Common Locale Data Repository.)

Unicode scalar value

Any Unicode code point except high-surrogate and low-surrogate code points. In other words, the ranges of integers 0 to D7FF₁₆ and E000₁₆ to 10FFFF₁₆ inclusive. * As a result of this definition, the ...
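The two ranges in this definition translate directly into a range check. A minimal Python sketch follows; the function name `is_scalar_value` is hypothetical.

```python
def is_scalar_value(code_point: int) -> bool:
    """Return True if code_point is a Unicode scalar value.

    Scalar values are all code points except the surrogate range
    D800..DFFF (hexadecimal). Hypothetical helper for illustration only.
    """
    return 0x0 <= code_point <= 0xD7FF or 0xE000 <= code_point <= 0x10FFFF


# Boundary cases around the excluded surrogate range and the upper limit.
checks = {
    0xD7FF: is_scalar_value(0xD7FF),      # True: last value before surrogates
    0xD800: is_scalar_value(0xD800),      # False: first high surrogate
    0xDFFF: is_scalar_value(0xDFFF),      # False: last low surrogate
    0xE000: is_scalar_value(0xE000),      # True: first value after surrogates
    0x110000: is_scalar_value(0x110000),  # False: beyond the code space
}
```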

Unicode signature

An implicit marker to identify a file as containing Unicode text in a particular encoding form. An initial byte order mark (BOM) may be used as a Unicode signature.

Unicode Standard Annex (UAX)

An integral part of the Unicode Standard published as a separate document.

Unicode string

A code unit sequence containing code units of a particular Unicode encoding form (whether well-formed or not). * In the rawest form, Unicode strings may be implemented simply as arrays of the ...
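Since a Unicode string need not be well-formed, code that receives one as a raw sequence of code units may want to check it. The minimal Python sketch below models a 16-bit Unicode string as a list of integers and tests whether it is well-formed UTF-16, meaning every surrogate code unit is part of a high-surrogate, low-surrogate pair. The function name `is_well_formed_utf16` is hypothetical, and the sketch assumes every element is already in the range 0 to FFFF (hexadecimal).

```python
def is_well_formed_utf16(units: list[int]) -> bool:
    """Return True if a sequence of 16-bit code units is well-formed UTF-16.

    Hypothetical helper for illustration only; assumes 0 <= unit <= 0xFFFF.
    """
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed directly by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate without a preceding high surrogate is ill-formed.
            return False
        i += 1
    return True


paired = is_well_formed_utf16([0x0041, 0xD83D, 0xDE00])  # True
unpaired = is_well_formed_utf16([0x0041, 0xD83D])        # False
```

Both lists are Unicode strings in the sense of this entry; only the first is a well-formed UTF-16 code unit sequence.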

