RFC 4042: UTF-9 and UTF-18 Efficient Transformation...
RFC-Ref

UCS



... UNICODE] codepoints and UTF-18 is also quite simple. Although (like UCS-2) UTF-18 only represents a subset of the available [UNICODE] codepoints ...


... UTF-9 does not use surrogates; consequently a UTF-16 value must be transformed into the UCS-4 equivalent, and U+D800 - U+DBFF are never transmitted in UTF-9. ...
... U+10FFFD <Plane 16 Private Use, Last> 420 777 375
    0x345ecf1b (UCS-4 value not in [UNICODE]) 464 536 717 33 ...


... UTF-18 does not use surrogates; consequently a UTF-16 value must be transformed into the UCS-4 equivalent, and U+D800 - U+DBFF are never transmitted in UTF-18. ...


... The following routines demonstrate conversion from UCS-4 to UTF-9. For simplicity, these routines do not do any validity checking. ...
... UTF-16 surrogates. ; Return UCS-4 value from UTF-9 string (PDP-10 assembly version) ...
... bit byte pointer to UTF-9 string ; Returns +1: Always, T1/ UCS-4 value, P1/ updated byte pointer ; Clobbers T2 ...
... XOR T1,T2 ; insert octet into UCS-4 value LSH T1,^D8 ; shift UCS ...
... UCS-4 value LSH T1,^D8 ; shift UCS-4 value ILDB T2,P1 ; get next nonet ...
... POPJ P, /* Return UCS-4 value from UTF-9 string (C version) * Accepts: pointer to pointer to UTF-9 string ...
... version) * Accepts: pointer to pointer to UTF-9 string * Returns: UCS-4 character, nonet pointer updated */ ...
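The excerpts above elide the body of the C decoder, so the following is only a minimal sketch of the same idea, not the RFC's own routine. It assumes each nonet carries one octet in its low 8 bits and a continuation flag in bit 0400, which is consistent with the sample row for U+10FFFD (420 777 375 in octal). The name utf9_to_ucs4 and the use of uint16_t to hold a nonet on a machine without 9-bit bytes are assumptions made for illustration. Like the routines described here, it does no validity checking.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the RFC text.
 * Accepts: pointer to pointer to UTF-9 string, one nonet per uint16_t
 * Returns: UCS-4 value; the caller's pointer is advanced past the character
 * Assumption: bit 0400 (0x100) set means "another nonet follows". */
uint32_t utf9_to_ucs4(const uint16_t **utf9p)
{
    const uint16_t *p = *utf9p;
    uint32_t ucs4 = 0;
    uint16_t nonet;

    do {
        nonet = *p++;                          /* get next nonet */
        ucs4 = (ucs4 << 8) | (nonet & 0xFFu);  /* shift, then insert octet */
    } while (nonet & 0x100u);                  /* continue while flag is set */

    *utf9p = p;                                /* update caller's pointer */
    return ucs4;
}
```

Called on the nonets 0420, 0777, 0375, this sketch yields 0x10FFFD and leaves the pointer three nonets further on.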
... UTF-9 to UCS-4 Conversion ...
... The following routines demonstrate conversion from UTF-9 to UCS-4. For simplicity, these routines do not do any validity checking. ...
... For simplicity, these routines do not do any validity checking. Routines used in applications SHOULD reject invalid UCS-4 codepoints; that is, codepoints ...
... UNICODE]. ; Write UCS-4 character to UTF-9 string (PDP-10 assembly version) ...
... bit byte pointer to UTF-9 string ; T1/ UCS-4 character to write ; Returns +1: Always, P1/ updated byte pointer ; Clobbers T1 ...
... POPJ P, /* Write UCS-4 character to UTF-9 string (C version) * Accepts: pointer to nonet string ...
... version) * Accepts: pointer to nonet string * UCS-4 character to write * Returns: updated pointer */ ...
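The body of the C encoder is likewise elided in these excerpts. The sketch below is one possible way to write it under the same assumptions: one octet per nonet in the low 8 bits, bit 0400 set on every nonet except the last, and leading zero octets omitted. The name ucs4_to_utf9 and the uint16_t nonet storage are hypothetical, and no validity checking is done, so an application would still need to reject invalid UCS-4 codepoints before calling it.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the RFC text.
 * Accepts: pointer to nonet string (one nonet per uint16_t),
 *          UCS-4 character to write
 * Returns: updated pointer
 * Assumption: bit 0400 (0x100) marks every nonet except the last. */
uint16_t *ucs4_to_utf9(uint16_t *utf9, uint32_t ucs4)
{
    int shift = 24;

    /* Skip leading zero octets; the lowest octet is always written. */
    while (shift > 0 && (ucs4 >> shift) == 0)
        shift -= 8;

    /* Higher octets go out first, each with the continuation flag set. */
    for (; shift > 0; shift -= 8)
        *utf9++ = (uint16_t)(0x100u | ((ucs4 >> shift) & 0xFFu));

    /* Final octet, continuation flag clear. */
    *utf9++ = (uint16_t)(ucs4 & 0xFFu);
    return utf9;
}
```

Writing 0x10FFFD with this sketch produces the three nonets 0420, 0777, 0375, matching the sample row quoted earlier on this page.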


... International Organization for Standardization, "Information Technology - Universal Multiple-octet coded Character Set (UCS)", ISO/IEC Standard 10646, comprised of ISO/IEC 10646-1:2000, "Information technology - Universal Multiple-Octet Coded Character Set ...
... ISO/IEC Standard 10646, comprised of ISO/IEC 10646-1:2000, "Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane", ISO/IEC 10646-2:2001, "Information technology - Universal Multiple-Octet Coded Character Set ...
... Architecture and Basic Multilingual Plane", ISO/IEC 10646-2:2001, "Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 2: Supplementary Planes" and ISO/IEC 10646-1:2000/Amd 1:2002, "Mathematical symbols and other characters". ...


