RFC 2640: Internationalization of the File Transfer...
RFC-Ref

character set



... Internet grows throughout the world the requirement to support character sets outside of the ASCII [ASCII] / Latin-1 [ISO-8859 ...
... ASCII] / Latin-1 [ISO-8859] character set becomes ever more urgent. For FTP, because of the large installed base, it is paramount that this is done without ...
... client commands and server responses, RECOMMENDs the use of a Universal Character Set (UCS) ISO/IEC 10646 [ISO-10646 ...
... The recommendations made in this document are consistent with the recommendations expressed by the IETF policy related to character sets and languages as defined in RFC 2277 [RFC2277 ...


... The File Transfer Protocol was developed when the predominate character sets were 7 bit ASCII and 8 bit ...
... 8 bit EBCDIC. Today these character sets cannot support the wide range of characters needed by multinational systems. Given that there are a number of character sets ...
... character sets cannot support the wide range of characters needed by multinational systems. Given that there are a number of character sets in current use that provide more characters than 7-bit ASCII, it ...
... makes sense to decide on a convenient way to represent the union of those possibilities. To work globally either requires support of a number of character sets and to be able to convert between them, or the use of a single preferred character set. To assure global ...
... number of character sets and to be able to convert between them, or the use of a single preferred character set. To assure global interoperability this document RECOMMENDS the latter approach and ...
... interoperability this document RECOMMENDS the latter approach and defines a single character set, in addition to NVT ASCII and EBCDIC ...
... EBCDIC, which is understandable by all systems. For FTP this character set SHALL be ISO/IEC 10646:1993. For support of global compatibility ...
... The character set used to store files SHALL remain a local decision and MAY depend on the capability of local operating systems. Prior to ...
... Sections 2.1 and 2.2 give a brief description of the international character set and transfer encoding RECOMMENDED by this document. A more thorough description of UTF-8 ...
... International Character Set ...
... The character set defined for international support of FTP SHALL be the Universal Character Set ...
... character set defined for international support of FTP SHALL be the Universal Character Set as defined in ISO 10646:1993 as amended. This standard incorporates the character sets ...
... Character Set as defined in ISO 10646:1993 as amended. This standard incorporates the character sets of many existing international, national, and corporate standards. ISO/IEC 10646 ...
... UCS-2 is a 2 byte (16 bit) character set consisting of plane zero or the Basic Multilingual Plane (BMP). Currently, no codesets have been defined outside of the 2 byte ...
... or UTF-FSS, SHALL be used as a transfer encoding to transmit the international character set. UTF-8 is a file safe encoding which ...
... encoding rules allow for easy identification; and it has enough space to support a large number of character sets. ...
... encoding rules make it very unlikely that a character sequence from a different character set will be mistaken for a UTF-8 encoded character sequence ...
... character sequence. Clients and servers can use a simple routine to determine if the character set being exchanged is valid UTF-8 ...
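
The "simple routine" mentioned in the snippet above is not reproduced on this page. A minimal sketch of such a check, assuming Python and its built-in strict UTF-8 decoder (the function name `is_valid_utf8` is hypothetical), could look like this. Python's decoder also rejects overlong forms and surrogates, so it may be stricter than a routine written to the original text.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string is well-formed UTF-8."""
    try:
        # Strict decoding raises on any malformed or truncated sequence.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

Pure 7-bit ASCII names pass this check, since ASCII is a subset of UTF-8, while most 8-bit names from single-byte character sets fail it.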


... valid UTF-8 sequences is assumed to be UTF-8. The character set of other names is undefined. Clients and servers, unless otherwise ...
... other names is undefined. Clients and servers, unless otherwise configured to support a specific native character set, MUST check for a valid UTF-8 ...
... clients interpret 8-bit pathnames as being in the local character set. They MAY continue to do so for pathnames that are not valid UTF-8 ...
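
The interpretation rule in these snippets (treat a name as UTF-8 when it is valid UTF-8, otherwise fall back to the local character set) can be sketched as follows. This is an illustration only: the function name `decode_pathname` and the ISO-8859-1 default for the local character set are assumptions, not taken from the RFC.

```python
def decode_pathname(raw: bytes, local_charset: str = "iso-8859-1") -> str:
    """Decode a pathname received over the control connection.

    Valid UTF-8 is assumed to be UTF-8; anything else is interpreted
    in the (assumed) local character set.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to the locally configured charset.
        return raw.decode(local_charset, errors="replace")
```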


... The Character Set Workshop Report [RFC2130] suggests that clients and servers SHOULD negotiate a language ...


... This document addresses the support of character sets beyond 1 byte and a new language ...


... to-left and left-to-right text. Character Set - a collection of characters used to represent textual information in which each character has a numeric value ...
... information in which each character has a numeric value Code Set - (see character set). Glyph - a character image ...
... UCS-2 - the ISO/IEC 10646 two octet Universal Character Set form. UCS ...
... UCS-4 - the ISO/IEC 10646 four octet Universal Character Set form. UTF-8 ...


... ANSI X3.4:1986 Coded Character Sets - 7 Bit American National Standard Code for Information Interchange (7-bit ASCII ...
... ISO 8859. International standard -- Information processing -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1 (1987) -- Part 2: Latin alphabet No. 2 (1987) -- Part 3: Latin alphabet No. 3 (1988) -- Part 4: Latin alphabet No. 4 (1988) -- Part 5: Latin/Cyrillic alphabet (1988) -- Part 6: Latin/Arabic alphabet (1987) -- Part 7: Latin/Greek alphabet (1987) -- Part 8: Latin/Hebrew alphabet (1988) -- Part 9: Latin alphabet No. 5 (1989) -- Part 10: Latin alphabet No. 6 (1992) ...
... ISO/IEC 10646-1:1993. International standard -- Information technology -- Universal multiple-octet coded character set (UCS) -- Part 1: Architecture and basic multilingual plane. ...
... Weider, C., Preston, C., Simonsen, K., Alvestrand, H., Atkinson, R., Crispin, M. and P. Svanberg, "Character Set Workshop Report", RFC 2130, April 1997. ...
... Alvestrand, H., "IETF Policy on Character Sets and Languages", RFC 2277, January 1998. ...


... no longer being able to steal the high order bit for internal use, when supporting the extended character set. - Implementers ...
... valid UTF-8, such as the existence of multiple local character sets in short pathnames. Hopefully, as more implementations conform to UTF-8 transfer encoding ...
... //Latin1DirectoryName/HebrewFileName). They should be prepared to handle the Bi-directional (BIDI) display of these character sets (i.e. right to left display for the directory and left to right display for the filename). While bi-directional ...
... equivalent due to the insertion of BIDI control characters at different points during composition. Also note that mixed character sets may also present problems with font swapping. - A server that copies pathnames transparently from a local ...
... can't then it should leave that name in its raw form. - Some server's OS do not mandate character sets, but allow administrators to configure it in the FTP ...
... charsets for different directories. - If the server's OS does not mandate the character set and the FTP server cannot be configured, the server should simply use the raw ...


... Note that the conversion examples below assume that the local character set supported in the operating system is something other than UCS2/UTF-16 ...
... support UCS2/UTF-16 (notably Plan 9 and Windows NT). In this case no conversion will be necessary from the local character set to the UCS. ...
... B.2.1 Conversion from Local Character Set to UTF-8 ...
... Conversion from the local filesystem character set to UTF-8 will normally involve a two step process. First convert the local ...
... UTF-8 will normally involve a two step process. First convert the local character set to the UCS; then convert the UCS to UTF-8 ...
... The first step in the process can be performed by maintaining a mapping table that includes the local character set code and the corresponding UCS code. For instance the ISO/IEC ...
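
The two-step process described above (local character set to UCS through a mapping table, then UCS to UTF-8) can be sketched as below. The table is a small hypothetical excerpt covering only three ISO/IEC 8859-8 Hebrew letters, and all function names are illustrative; a real implementation would need the complete table for its local character set.

```python
# Hypothetical excerpt of an ISO/IEC 8859-8 -> UCS mapping table
# (ALEF, BET, GIMEL only).
ISO_8859_8_TO_UCS = {0xE0: 0x05D0, 0xE1: 0x05D1, 0xE2: 0x05D2}


def local_to_ucs(data: bytes) -> list:
    """Step 1: map each local byte to its UCS code point."""
    code_points = []
    for byte in data:
        if byte < 0x80:
            code_points.append(byte)  # the 7-bit range maps to itself
        else:
            # Raises KeyError for bytes missing from this excerpt.
            code_points.append(ISO_8859_8_TO_UCS[byte])
    return code_points


def ucs_to_utf8(code_points) -> bytes:
    """Step 2: encode UCS code points as UTF-8 octets."""
    out = bytearray()
    for cp in code_points:
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
    return bytes(out)


def local_to_utf8(data: bytes) -> bytes:
    """Both steps together: local bytes -> UCS -> UTF-8."""
    return ucs_to_utf8(local_to_ucs(data))
```

In practice most languages ship these tables already; in Python, `data.decode("iso8859_8").encode("utf-8")` performs the same two steps.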
... B.2.2 Conversion from UTF-8 to Local Character Set ...
... When moving from UTF-8 encoding to the local character set the reverse procedure is used. First the UTF-8 encoding is transformed ...
... UTF-8 encoding is transformed into the UCS-4 character set. The UCS-4 is then converted to the local character set ...
... character set. The UCS-4 is then converted to the local character set from a mapping table (i.e. the opposite of the table used to form the UCS-4 character code). ...
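
The reverse direction can be sketched with Python's built-in codecs, which stand in here for the mapping table. The function name `utf8_to_local` is hypothetical, and returning the raw bytes when a character has no local equivalent is an assumption about one reasonable fallback, not something these snippets prescribe.

```python
def utf8_to_local(data: bytes, local_charset: str = "iso8859_8") -> bytes:
    """Convert UTF-8 octets to the local character set.

    Step 1 decodes UTF-8 into UCS code points (a Python str);
    step 2 maps those code points into the local character set.
    """
    text = data.decode("utf-8")             # UTF-8 -> UCS
    try:
        return text.encode(local_charset)   # UCS -> local
    except UnicodeEncodeError:
        # Assumed fallback: some character has no local equivalent,
        # so the name is left untouched.
        return data
```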
... This example demonstrates mapping ISO/IEC 8859-8 character set to UTF-8 and back to ISO/IEC ...
... a very similar manner as described above. For instance both the PC and Mac codepages reflect the character set from the Thai standard TIS 620-2533. The character code on both platforms for the Thai letter "SO SO" is 0xAB. This character can then be mapped into the ...
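
The Thai example can be checked with a short sketch. It assumes Python's `tis_620` and `cp874` codecs as stand-ins for the TIS 620-2533 based codepages mentioned above (Python's standard library has no Mac Thai codec, so that platform is not covered); the function name is illustrative.

```python
import unicodedata


def thai_byte_to_utf8(local: bytes, codepage: str = "tis_620") -> bytes:
    """Map a Thai codepage byte string to UCS, then to UTF-8."""
    ucs = local.decode(codepage)   # local character code -> UCS
    return ucs.encode("utf-8")     # UCS -> UTF-8


so_so = b"\xab".decode("tis_620")
print(unicodedata.name(so_so))          # THAI CHARACTER SO SO
print(hex(ord(so_so)))                  # 0xe0b
print(thai_byte_to_utf8(b"\xab").hex()) # e0b88b
```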


