character set
... Internet grows throughout the world the requirement to support
character sets outside of the ASCII [ASCII] / Latin-1 [ISO-8859 ...
... ASCII] / Latin-1 [ISO-8859]
character set becomes ever more urgent. For FTP, because of the
large installed base, it is paramount that this is done without
...
... client
commands and server responses, RECOMMENDs the use of a Universal
Character Set (UCS) ISO/IEC 10646 [ISO-10646 ...
... The recommendations made in this document are consistent with the
recommendations expressed by the IETF policy related to character
sets and languages as defined in RFC 2277 [RFC2277 ...
... The File Transfer Protocol was developed when the predominate
character sets were 7 bit ASCII and 8 bit ...
... 8 bit EBCDIC. Today these
character sets cannot support the wide range of characters needed by
multinational systems. Given that there are a number of character
sets ...
... character sets cannot support the wide range of characters needed by
multinational systems. Given that there are a number of character
sets in current use that provide more characters than 7-bit ASCII, it
...
... makes sense to decide on a convenient way to represent the union of
those possibilities. To work globally either requires support of a
number of character sets and to be able to convert between them, or
the use of a single preferred character set. To assure global
...
... number of character sets and to be able to convert between them, or
the use of a single preferred character set. To assure global
interoperability this document RECOMMENDS the latter approach and
...
... interoperability this document RECOMMENDS the latter approach and
defines a single character set, in addition to NVT ASCII and EBCDIC ...
... EBCDIC,
which is understandable by all systems. For FTP this character set
SHALL be ISO/IEC 10646:1993. For support of global compatibility ...
...
The character set used to store files SHALL remain a local decision
and MAY depend on the capability of local operating systems. Prior to
...
...
Sections 2.1 and 2.2 give a brief description of the international
character set and transfer encoding RECOMMENDED by this document. A
more thorough description of UTF-8 ...
... International Character Set ...
...
The character set defined for international support of FTP SHALL be
the Universal Character Set ...
... character set defined for international support of FTP SHALL be
the Universal Character Set as defined in ISO 10646:1993 as amended.
This standard incorporates the character sets ...
... Character Set as defined in ISO 10646:1993 as amended.
This standard incorporates the character sets of many existing
international, national, and corporate standards. ISO/IEC 10646
...
... UCS-2 is a 2 byte (16 bit) character set consisting of plane
zero or the Basic Multilingual Plane (BMP). Currently, no codesets
have been defined outside of the 2 byte ...
... or UTF-FSS, SHALL be used as a transfer encoding to transmit the
international character set. UTF-8 is a file safe encoding which
...
... encoding rules allow for easy
identification; and it has enough space to support a large number of
character sets.
...
... encoding rules make it very unlikely that
a character sequence from a different character set will be mistaken
for a UTF-8 encoded character sequence ...
... character sequence. Clients and servers can use a
simple routine to determine if the character set being exchanged is
valid UTF-8 ...
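
The "simple routine" mentioned in this excerpt could look something like the sketch below. It is not taken from the source document. It assumes Python, whose strict UTF-8 decoder follows the modern definition of UTF-8 and so is somewhat stricter than the 1993-era encoding (it rejects overlong forms, surrogates, and values above U+10FFFF). The function names and the "latin-1" fallback are hypothetical choices made for illustration only.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes as strict UTF-8."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def interpret_pathname(raw: bytes, local_charset: str = "latin-1") -> str:
    """Treat valid UTF-8 as UTF-8; otherwise fall back to a local charset.

    The fallback charset is an assumption for this sketch; a real client
    or server would use whatever it is configured to use locally.
    """
    if is_valid_utf8(raw):
        return raw.decode("utf-8")
    return raw.decode(local_charset, errors="replace")
```

Note that pure 7-bit ASCII names are also valid UTF-8, so they pass the check unchanged.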
... valid UTF-8 sequences is assumed to be UTF-8. The character set of
other names is undefined. Clients and servers, unless otherwise
...
... other names is undefined. Clients and servers, unless otherwise
configured to support a specific native character set, MUST check
for a valid UTF-8 ...
... clients interpret 8-bit pathnames as being in the
local character set. They MAY continue to do so for pathnames that
are not valid UTF-8 ...
...
The Character Set Workshop Report [RFC2130] suggests that clients and
servers SHOULD negotiate a language ...
... to-left and left-to-right text.
Character Set - a collection of characters used to represent textual
information in which each character has a numeric value
...
... information in which each character has a numeric value
Code Set - (see character set).
Glyph - a character image ...
... ANSI X3.4:1986 Coded Character Sets - 7 Bit American National Standard Code for Information Interchange (7-bit ASCII ...
... ISO 8859. International standard -- Information processing -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1 (1987) -- Part 2: Latin alphabet No. 2 (1987) -- Part 3: Latin alphabet No. 3 (1988) -- Part
4: Latin alphabet No. 4 (1988) -- Part 5: Latin/Cyrillic alphabet (1988) -- Part 6: Latin/Arabic alphabet (1987) -- Part 7:
Latin/Greek alphabet (1987) -- Part 8: Latin/Hebrew alphabet (1988) -- Part 9: Latin alphabet No. 5 (1989) -- Part 10: Latin
alphabet No. 6 (1992) ...
... ISO/IEC 10646-1:1993. International standard -- Information technology -- Universal multiple-octet coded character set (UCS) -- Part 1: Architecture and basic multilingual plane. ...
... Weider, C., Preston, C., Simonsen, K., Alvestrand, H., Atkinson, R., Crispin, M. and P. Svanberg, "Character Set Workshop Report", RFC 2130, April 1997. ...
... no longer being able to steal the high order bit for internal use,
when supporting the extended character set.
- Implementers ...
... valid UTF-8, such as
the existence of multiple local character sets in short pathnames.
Hopefully, as more implementations conform to UTF-8 transfer
encoding ...
... //Latin1DirectoryName/HebrewFileName). They should be prepared to
handle the Bi-directional (BIDI) display of these character sets
(i.e. right to left display for the directory and left to right
display for the filename). While bi-directional ...
... equivalent due to the insertion of BIDI control characters at
different points during composition. Also note that mixed character
sets may also present problems with font swapping.
- A server that copies pathnames transparently from a local
...
... can't then it should leave that name in its raw form.
- Some servers' operating systems do not mandate character sets, but allow
administrators to configure them in the FTP ...
... charsets for different directories.
- If the server's OS does not mandate the character set and the FTP
server cannot be configured, the server should simply use the raw
...
...
Note that the conversion examples below assume that the local
character set supported in the operating system is something other
than UCS2/UTF-16 ...
... support UCS2/UTF-16 (notably Plan 9 and Windows NT). In this case no
conversion will be necessary from the local character set to the UCS.
...
... B.2.1 Conversion from Local Character Set to UTF-8 ...
...
Conversion from the local filesystem character set to UTF-8 will
normally involve a two step process. First convert the local
...
... UTF-8 will
normally involve a two step process. First convert the local
character set to the UCS; then convert the UCS to UTF-8 ...
...
The first step in the process can be performed by maintaining a
mapping table that includes the local character set code and the
corresponding UCS code. For instance the ISO/IEC ...
... B.2.2 Conversion from UTF-8 to Local Character Set ...
...
When moving from UTF-8 encoding to the local character set the
reverse procedure is used. First the UTF-8 encoding is transformed
...
... UTF-8 encoding is transformed
into the UCS-4 character set. The UCS-4 is then converted to the
local character set ...
... character set. The UCS-4 is then converted to the
local character set from a mapping table (i.e. the opposite of the
table used to form the UCS-4 character code).
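
The reverse procedure described above can be sketched as follows. This is an illustration only, not code from the source document. It assumes Python, lets Python's own decoder perform the UTF-8 to UCS-4 step, and uses a deliberately tiny, hypothetical mapping table with two entries from ISO 8859-7 (Greek); a real table would cover the whole local character set. All names are made up for this sketch.

```python
# Hypothetical two-entry mapping table: local (ISO 8859-7) code -> UCS code.
LOCAL_TO_UCS = {
    0xE1: 0x03B1,  # GREEK SMALL LETTER ALPHA
    0xE2: 0x03B2,  # GREEK SMALL LETTER BETA
}

# The "opposite" table used when moving from the UCS back to the local set.
UCS_TO_LOCAL = {ucs: local for local, ucs in LOCAL_TO_UCS.items()}


def utf8_to_ucs4(raw: bytes) -> list:
    """Step 1: transform UTF-8 into a list of UCS-4 code values."""
    return [ord(ch) for ch in raw.decode("utf-8")]


def ucs4_to_local(code_points: list) -> bytes:
    """Step 2: map each UCS-4 value to its local code via the table."""
    out = bytearray()
    for cp in code_points:
        if cp < 0x80:
            out.append(cp)  # ASCII range is shared, no lookup needed
        else:
            out.append(UCS_TO_LOCAL[cp])  # KeyError if the table lacks it
    return bytes(out)


def utf8_to_local(raw: bytes) -> bytes:
    """Both steps together: UTF-8 -> UCS-4 -> local character set."""
    return ucs4_to_local(utf8_to_ucs4(raw))
```

A character missing from the table raises KeyError here; how an implementation should treat unmappable characters is a separate design decision that this sketch does not address.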
...
... a very similar manner as described above. For instance both the PC
and Mac codepages reflect the character set from the Thai standard
TIS 620-2533. The character code on both platforms for the Thai
letter "SO SO" is 0xAB. This character can then be mapped into the
...
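
As a worked illustration of the two-step conversion applied to the Thai example, the sketch below converts the local code 0xAB to the UCS and then to UTF-8. It is not part of the source document. It assumes Python and uses Python's built-in "tis_620" codec as a stand-in for the mapping table; the function names are hypothetical. The hand-written encoder covers one to four byte sequences and does not reject surrogate values.

```python
def local_to_ucs(raw: bytes, codec: str) -> list:
    """Step 1: local character codes -> UCS code values.

    Python's codec plays the role of the mapping table here.
    """
    return [ord(ch) for ch in raw.decode(codec)]


def ucs_to_utf8(cp: int) -> bytes:
    """Step 2: encode one UCS code value as UTF-8 (1 to 4 bytes)."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def local_to_utf8(raw: bytes, codec: str) -> bytes:
    """Both steps together: local character set -> UCS -> UTF-8."""
    return b"".join(ucs_to_utf8(cp) for cp in local_to_ucs(raw, codec))


# Thai letter SO SO, local code 0xAB in TIS-620.
so_so = local_to_utf8(b"\xab", "tis_620")
print(so_so.hex())  # e0b88b
```

With this codec, 0xAB decodes to the code value 0x0E0B, which the three-byte branch of the encoder turns into the bytes E0 B8 8B.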
