Bit
... A number of Internet sites utilize platforms that are not based upon
the traditional 8-bit byte or octet. One such platform is the PDP-10,
which is based upon a 36-bit word. On these platforms, it is
wasteful to represent data in octets, since 4 bits are left unused in
each word. The 9-bit nonet is a much more sensible representation.
Although these platforms support IETF standards ...
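The arithmetic behind the "4 bits are left unused" claim can be checked with a short sketch. The helper name `packing` is hypothetical; only the 36-bit word size and the 8-bit and 9-bit unit sizes come from the text.

```python
WORD_BITS = 36  # PDP-10 word size, as stated in the text


def packing(unit_bits, word_bits=WORD_BITS):
    """Return (units that fit in one word, bits left unused in that word)."""
    units = word_bits // unit_bits
    return units, word_bits - units * unit_bits


print(packing(8))  # octets in a 36-bit word
print(packing(9))  # nonets in a 36-bit word
```

Four octets fill 32 of the 36 bits, while four nonets fill the word exactly.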
... UNICODE] codepoints. When stored in nonets,
this results in as many as four wasted bits per [UNICODE]
character.
...
... codepoints outside the
BMP. When stored in nonet pairs, this results in as many as
four wasted bits per [UNICODE] character. This transformation
format requires complex surrogates to represent codepoints ...
... codepoints
outside the BMP. When stored in nonets, this results in as
many as sixteen wasted bits per character. This transformation
format requires very complex and computationally expensive
shifting and "modified BASE64 ...
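The wasted-bit figures quoted in these excerpts (four, four, and sixteen) can be reproduced with simple arithmetic. This is only a sketch: the worst-case unit counts used below (four 8-bit units, two 16-bit units, eight 7-bit units) are assumptions, since the excerpts elide them, and `wasted_bits` is a hypothetical helper name.

```python
NONET_BITS = 9


def wasted_bits(unit_bits, unit_count, nonets_per_unit=1):
    """Bits left unused when each unit is stored in nonets_per_unit nonets."""
    return unit_count * (nonets_per_unit * NONET_BITS - unit_bits)


# The unit counts are assumed worst cases, not values taken from the excerpts.
print(wasted_bits(8, 4))      # four 8-bit units, one nonet each
print(wasted_bits(16, 2, 2))  # two 16-bit units, one nonet pair each
print(wasted_bits(7, 8))      # eight 7-bit units, one nonet each
```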
... UNICODE]
codepoints. There are no wasted bits, and as the examples in this
document demonstrate, the computational processing is minimal.
...
... UTF-9 encodes [UNICODE] codepoints in the low order 8 bits of a
nonet, using the high order bit to indicate continuation. Surrogates
are not used.
...
... SIP, plane 2), and Supplementary
Special-purpose Plane (SSP, plane 14) in a single 18-bit value. It
does not encode planes 3 through 13, which are currently unused; nor
planes 15 or 16, which are private spaces.
...
... Normally, UTF-9 and UTF-18 should only be used in the context of
9-bit storage and transport. Although some protocols, e.g., [FTP],
...
... stream represents [ISO-10646] codepoints using 9-bit nonets.
The low order 8 bits of a nonet form an octet, and the high order bit
indicates continuation.
...
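A decoder for the scheme just described might look like the following sketch. It assumes nonets are held as Python integers in the range 0 to 0o777; the names `CONTINUATION` and `utf9_decode_one` are hypothetical and do not come from the source.

```python
CONTINUATION = 0o400  # high-order bit of a 9-bit nonet


def utf9_decode_one(nonets, start=0):
    """Decode one codepoint; return (codepoint, index of the next nonet).

    Raises IndexError if the sequence ends while the continuation bit
    is still set.
    """
    value = 0
    i = start
    while True:
        nonet = nonets[i]
        if not 0 <= nonet <= 0o777:
            raise ValueError("not a 9-bit value: %r" % (nonet,))
        value = (value << 8) | (nonet & 0o377)  # append the low-order octet
        i += 1
        if not nonet & CONTINUATION:  # clear high bit marks the last nonet
            return value, i


print(utf9_decode_one([0o403, 0o221]))
```

Each nonet contributes its low eight bits, and the loop stops at the first nonet whose high-order bit is clear.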
... starting with the most-significant non-zero
octet. All but the least significant octet have the continuation bit
set in the associated nonet.
...
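The octet ordering described above can be sketched as an encoder. This is an illustrative sketch rather than the document's own routine; `utf9_encode` is a hypothetical name, and nonets are represented as plain integers.

```python
def utf9_encode(codepoint):
    """Return the list of nonets (as integers) for one codepoint."""
    if codepoint < 0:
        raise ValueError("codepoint must be non-negative")
    octets = [codepoint & 0xFF]  # least significant octet is always emitted
    codepoint >>= 8
    while codepoint:             # stop at the most significant non-zero octet
        octets.append(codepoint & 0xFF)
        codepoint >>= 8
    octets.reverse()             # most significant octet first
    # Set the continuation bit on every nonet except the last.
    return [o | 0o400 for o in octets[:-1]] + [octets[-1]]


print([oct(n) for n in utf9_encode(0x10330)])
```

A codepoint below 0x100 therefore occupies a single nonet, and a zero octet in the middle of a larger value is kept with its continuation bit set.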
... codepoints using a pair of 9-bit
nonets to form an 18-bit value.
UTF-18 does not use surrogates; consequently a UTF-16 ...
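One way to fit planes 0, 1, 2, and 14 into 18 bits is sketched below. The excerpts do not spell out the mapping, so this sketch assumes planes 0 to 2 map to themselves and plane 14 is folded into the otherwise unused range 0x30000 to 0x3FFFF. The constant `SSP_OFFSET` and both function names are hypothetical.

```python
SSP_OFFSET = 0xB0000  # assumption: plane 14 (0xE0000) folds down to 0x30000


def utf18_encode(codepoint):
    """Return (high nonet, low nonet) for a codepoint in plane 0, 1, 2, or 14."""
    plane = codepoint >> 16
    if plane in (0, 1, 2):
        value = codepoint
    elif plane == 14:
        value = codepoint - SSP_OFFSET
    else:
        raise ValueError("plane %d has no 18-bit representation" % plane)
    return (value >> 9) & 0o777, value & 0o777


def utf18_decode(high, low):
    """Rebuild the codepoint from a pair of nonets."""
    value = (high << 9) | low
    return value + SSP_OFFSET if value > 0x2FFFF else value


print(utf18_encode(0xE0041))
print(hex(utf18_decode(*utf18_encode(0xE0041))))
```

Because every supported codepoint fits in one 18-bit value, no surrogate mechanism is needed in this sketch.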
... PDP-10 assembly version)
; Accepts: P1/ 9-bit byte pointer to UTF-9 string
; Returns +1: Always, T1/ UCS ...
...
U42UT9: SETO T2, ; we'll need some of these 1-bits later
ASHC T1,-^D8 ; low octet becomes nonet with high 0-bit
U42U91: JUMPE T1,U42U9X ; done if no more octets
...
... T2
ROT T2,-1 ; turn it into nonet with high 1 bit
PUSHJ P,U42U91 ; recurse for remainder
U42U9X: LSHC T1 ...
... American National Standards Institute, "Coded Character Set - 7-bit American Standard Code for Information Interchange", ANSI X3.4, 1986. ...
