RFC - 4042
UTF-9 and UTF-18 Efficient Transformation Formats of Unicode
| Original: | ftp://ftp.isi.edu/in-notes/rfc4042.txt |
|---|---|
| Authors: | M. Crispin [Panda Programming] |
| Date: | |
| Category: | Informational |
| Referred by: | 0 RFC |
| Refers to: | 6 RFC |
Status
This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2005).
Abstract
ISO-10646 defines a large character set called the Universal Character Set (UCS), which encompasses most of the world's writing systems. The same set of codepoints is defined by Unicode, which further defines additional character properties and other implementation details. By policy of the relevant standardization committees, changes to Unicode and amendments and additions to ISO/IEC 646 track each other, so that the character repertoires and code point assignments remain in synchronization.
The current representation formats for Unicode (UTF-7, UTF-8, UTF-16) are not storage and computation efficient on platforms that utilize the 9 bit nonet as a natural storage unit instead of the 8 bit octet.
This document describes a transformation format of Unicode that takes advantage of the nonet so that the format will be storage and computation efficient.
-
prepared by Miloslav Nic
- the founder of Zvon.org and Law-Ref.org
- the head of B.Sc. program Informatics and chemistry [in Czech]
- the founder of Lidem.org - Volby 2006 - parliamentary elections in the Czech Republic [in Czech]
- the chief consultant of the publishing house ICT Press
- and Pavel Srb, a student of B.Sc. program Informatics and chemistry
