1. Introduction
[IDNA] describes an architecture for supporting internationalized domain names. Labels containing non-ASCII characters can be represented by ACE labels, which begin with a special ACE prefix and contain only ASCII characters. The remainder of the label after the prefix is a Punycode encoding of a Unicode string satisfying certain constraints. For the details of the prefix and constraints, see [IDNA] and [NAMEPREP]. Punycode is an instance of a more general algorithm called Bootstring, which allows strings composed from a small set of "basic" code points to uniquely represent any string of code points drawn from a larger set. Punycode is Bootstring with particular parameter values appropriate for IDNA.
1.1. Features
Bootstring has been designed to have the following features: * Completeness: Every extended string (sequence of arbitrary code points) can be represented by a basic string (sequence of basic code points). Restrictions on what strings are allowed, and on length, can be imposed by higher layers. * Uniqueness: There is at most one basic string that represents a given extended string. * Reversibility: Any extended string mapped to a basic string can be recovered from that basic string. * Efficient encoding: The ratio of basic string length to extended string length is small. This is important in the context of domain names because RFC 1034std13 [RFC1034] restricts the length of a domain label to 63 characters. * Simplicity: The encoding and decoding algorithms are reasonably simple to implement. The goals of efficiency and simplicity are at odds; Bootstring aims at a good balance between them. * Readability: Basic code points appearing in the extended string are represented as themselves in the basic string (although the main purpose is to improve efficiency, not readability). Punycode can also support an additional feature that is not used by the ToASCII and ToUnicode operations of [IDNA]. When extended strings are case-folded prior to encoding, the basic string can use mixed case to tell how to convert the folded string into a mixed-case string. See appendix A "Mixed-case annotation".
1.2. Interaction of protocol parts
Punycode is used by the IDNA protocol [IDNA] for converting domain labels into ASCII; it is not designed for any other purpose. It is explicitly not designed for processing arbitrary free text.
