UCS
... [UNICODE] codepoints and UTF-18 is
also quite simple. Although (like UCS-2) UTF-18 only represents a
subset of the available [UNICODE] codepoints ...
... UTF-9 does not use surrogates; consequently a UTF-16 value must be
transformed into the UCS-4 equivalent, and U+D800 - U+DBFF are never
transmitted in UTF-9.
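
The transformation from a UTF-16 surrogate pair to its UCS-4 equivalent mentioned above can be sketched as follows. This is a minimal illustration rather than text from the specification; the function name surrogates_to_ucs4 is hypothetical, and the arithmetic is the standard UTF-16 pairing rule.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combine a UTF-16 surrogate pair into one UCS-4 value.
 * high must be a leading surrogate (0xD800-0xDBFF) and low a trailing
 * surrogate (0xDC00-0xDFFF); the asserts only guard this sketch. */
static uint32_t surrogates_to_ucs4(uint16_t high, uint16_t low)
{
    assert(high >= 0xD800u && high <= 0xDBFFu);
    assert(low >= 0xDC00u && low <= 0xDFFFu);
    /* Each surrogate carries 10 payload bits; the pair addresses the
     * codepoints starting at 0x10000. */
    return 0x10000u
         + (((uint32_t)high - 0xD800u) << 10)
         + ((uint32_t)low - 0xDC00u);
}
```

For example, the pair 0xDBFF 0xDFFD combines to 0x10FFFD, which is then encoded in UTF-9 as a single codepoint rather than as two surrogate values.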
...
... U+10FFFD <Plane 16 Private Use, Last> 420 777 375
0x345ecf1b (UCS-4 value not in [UNICODE]) 464 536 717 033
...
... UTF-18 does not use surrogates; consequently a UTF-16 value must be
transformed into the UCS-4 equivalent, and U+D800 - U+DBFF are never
transmitted in UTF-18.
...
The following routines demonstrate conversion from UCS-4 to UTF-9.
For simplicity, these routines do not do any validity checking.
...
... bit byte pointer to UTF-9 string
; Returns +1: Always, T1/ UCS-4 value, P1/ updated byte pointer
; Clobbers T2
...
... POPJ P,
/* Return UCS-4 value from UTF-9 string (C version)
* Accepts: pointer to pointer to UTF-9 string
* Returns: UCS-4 character, nonet pointer updated
*/
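
A routine matching the comment above might look like the following sketch. It assumes the layout implied by the octal examples: the low-order 8 bits of each nonet hold one octet of the UCS-4 value, and the high-order bit (0x100) marks that another nonet follows. The names nonet_t and utf9_to_ucs4 are hypothetical, and a nonet is stored here in the low 9 bits of a 16-bit word. Like the original routines, it does no validity checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one nonet held in the low 9 bits of a 16-bit word. */
typedef uint16_t nonet_t;

/* Read one UCS-4 value from a UTF-9 string and advance the caller's pointer.
 * No validity checking is done on the sequence itself. */
static uint32_t utf9_to_ucs4(const nonet_t **pp)
{
    assert(pp != NULL && *pp != NULL);
    uint32_t ucs4 = 0;
    nonet_t n;
    do {
        n = *(*pp)++;                      /* fetch nonet, advance pointer  */
        ucs4 = (ucs4 << 8) | (n & 0xFFu);  /* append its octet to the value */
    } while (n & 0x100u);                  /* continuation bit: keep going  */
    return ucs4;
}
```

Applied to the nonets 420 777 375 (octal), this sketch yields 0x10FFFD and leaves the pointer just past the third nonet.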
...
... UTF-9 to UCS-4 Conversion ...
...
The following routines demonstrate conversion from UTF-9 to UCS-4.
For simplicity, these routines do not do any validity checking.
...
... For simplicity, these routines do not do any validity checking.
Routines used in applications SHOULD reject invalid UCS-4 codepoints;
that is, codepoints ...
... bit byte pointer to UTF-9 string
; T1/ UCS-4 character to write
; Returns +1: Always, P1/ updated byte pointer
; Clobbers T1 ...
... POPJ P,
/* Write UCS-4 character to UTF-9 string (C version)
* Accepts: pointer to nonet string
* UCS-4 character to write
* Returns: updated pointer
*/
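
A routine matching the comment above might look like the following sketch. It assumes the same layout as the octal examples: leading zero octets of the UCS-4 value are dropped, every nonet except the last has the continuation bit (0x100) set, and the last nonet carries the low-order octet alone. The names nonet_t and ucs4_to_utf9 are hypothetical, and a nonet is stored here in the low 9 bits of a 16-bit word. No validity checking is done on the codepoint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one nonet held in the low 9 bits of a 16-bit word. */
typedef uint16_t nonet_t;

/* Write one UCS-4 value as UTF-9 and return the updated output pointer.
 * The caller must provide room for up to four nonets. */
static nonet_t *ucs4_to_utf9(nonet_t *out, uint32_t ucs4)
{
    assert(out != NULL);
    int started = 0;
    /* Emit the three high-order octets, skipping leading zeros. */
    for (int shift = 24; shift > 0; shift -= 8) {
        uint32_t octet = (ucs4 >> shift) & 0xFFu;
        if (started || octet != 0) {
            *out++ = (nonet_t)(0x100u | octet);  /* continuation bit set */
            started = 1;
        }
    }
    *out++ = (nonet_t)(ucs4 & 0xFFu);            /* final nonet, no bit  */
    return out;
}
```

Writing 0x345ecf1b with this sketch produces the nonets 464 536 717 033 (octal), and writing U+10FFFD produces 420 777 375.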
...
... International Organization for Standardization, "Information Technology - Universal Multiple-octet coded Character Set (UCS)", ISO/IEC Standard 10646, comprised of ISO/IEC 10646-1:2000, "Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane", ISO/IEC 10646-2:2001, "Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 2: Supplementary Planes" and ISO/IEC 10646-1:2000/Amd 1:2002, "Mathematical symbols and other characters". ...
