character encoding
... languages,
using other coded character sets or character encodings, through
various ad hoc extensions to the language [TAKADA ...
... interoperability and proper support for at least ISO-
8859-1 in an environment where character encoding schemes other
than ISO-8859-1 are present, user agents ...
... length than 8) in a concrete realization of the document such as a
computer file. This encoding is called the external character
encoding of the concrete SGML document, and it should be carefully
distinguished from the document character set ...
... SGML, does not directly address the
question of the external character encoding. This is deferred to
mechanisms external to HTML, such as MIME ...
... For the HTTP protocol [RFC2068], the external character encoding is
indicated by the "charset" parameter of the "Content-Type ...
... The term "charset" in MIME is used to designate a character encoding,
rather than merely a coded character set as the term may suggest. A
character encoding is a mapping (possibly many-to-one) of sequences
of octets to sequences of characters taken from one or more character
repertoires.
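
As an illustration of the two points above, the following minimal Python sketch reads the "charset" parameter from a Content-Type value and then uses it to map octets to characters. The helper name charset_of and the sample header values are hypothetical, and falling back to ISO-8859-1 when no parameter is present is an assumption of this sketch, not something the excerpts above prescribe.

```python
from email.message import Message


def charset_of(content_type, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type value (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the parameter lowercased, or None if absent.
    return msg.get_content_charset() or default


# The same four octets yield different characters under different encodings.
octets = bytes([0x63, 0x61, 0x66, 0xE9])

latin1_text = octets.decode(charset_of("text/html; charset=ISO-8859-1"))
greek_text = octets.decode(charset_of("text/html; charset=ISO-8859-7"))
fallback = charset_of("text/html")
```

The octet 0xE9 decodes to U+00E9 under ISO-8859-1 and to U+03B9 under ISO-8859-7, which is why the label must travel with the document.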
...
... HTTP protocol also defines a mechanism for the client to specify
the character encodings it can accept. Clients and servers are
...
... HTML documents are transferred by electronic mail, the
external character encoding is defined by the "charset" parameter of
the "Content-Type ...
...
No mechanisms are currently standardized for indicating the external
character encoding of HTML documents transferred by FTP or accessed
...
... HTML documents
are defined or become popular, it is advised that similar provisions
be made to clearly identify the character encoding used and/or to use
a single/default encoding capable of representing the widest range ...
... context.
Whatever the external character encoding may be, the reference
processing model translates it to the document character set ...
... Language tags can be used to control rendering of a marked up
document in various ways: glyph disambiguation, in cases where the
character encoding is not sufficient to resolve to a specific glyph;
quotation marks; hyphenation; ligatures; spacing; voice synthesis;
...
... These entities can be used in place of the corresponding formatting
characters whenever convenient, for example to ease keyboard entry or
when a formatting character is not available in the character
encoding of the document.
Next, an attribute called DIR is introduced, restricted to the values
...
... interoperability, it is necessary for the user agent
(and the user) to have an indication of the character encoding(s)
that the server providing a form will be able to handle upon
submission of the filled-in form. Such an indication is provided by
...
... EXCLUSIVE-OR list; the server announces that it is ready to accept
any ONE of these character encoding schemes for each part of a
multipart entity. The client may perform character encoding
translation to satisfy the server if necessary.
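
One way a client might act on such a list is sketched below in Python. The function name choose_charset and the policy of taking the first listed encoding that can represent the text are assumptions made for illustration; the excerpt only says that any one of the listed encodings is acceptable.

```python
def choose_charset(text, accepted):
    """Return the first accepted encoding able to represent text, else None.

    Hypothetical helper: 'accepted' stands for the list announced by the server.
    """
    for name in accepted:
        try:
            text.encode(name)
        except (UnicodeEncodeError, LookupError):
            # Either the text does not fit, or this codec name is unknown here.
            continue
        return name
    return None


western = choose_charset("caf\u00e9", ["us-ascii", "iso-8859-1", "utf-8"])
japanese = choose_charset("\u65e5\u672c", ["iso-8859-1", "iso-2022-jp"])
none_fit = choose_charset("\u65e5\u672c", ["us-ascii", "iso-8859-1"])
```

Once an encoding is chosen, translating the form data is a matter of encoding the text with it before submission.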
...
... INPUT or TEXTAREA element is the reserved value "UNKNOWN". A user
agent may interpret that value as the character encoding scheme
that was used to transmit the document containing that element.
...
... RFC1738] specifies that octets may be encoded using
the "%HH" notation, but text submitted from a form is composed of
characters, not octets. Lacking a specification of a character
encoding scheme, the "%HH" notation has no well-defined meaning.
...
... including if necessary a charset parameter that specifies the
character encoding scheme. The changes to the DTD necessary to
support this method ...
... URL encoding of [RFC1738] is applied on top of the specified
character encoding, as a kind of implicit Content-Transfer-Encoding.
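
The layering can be sketched with Python's standard urllib.parse module: characters are first mapped to octets by a named character encoding, and the "%HH" notation is then applied to those octets. The sample string is illustrative only.

```python
from urllib.parse import quote, unquote

text = "caf\u00e9"

# The same characters give different "%HH" sequences under different encodings.
as_latin1 = quote(text, encoding="iso-8859-1")
as_utf8 = quote(text, encoding="utf-8")

# Decoding succeeds only when the receiver assumes the encoding the sender used.
round_trip = unquote(as_latin1, encoding="iso-8859-1")
# With the wrong encoding, the invalid octet becomes U+FFFD (default errors="replace").
misread = unquote(as_latin1, encoding="utf-8")
```

Without knowing which encoding was applied first, a receiver cannot tell whether "%E9" is a complete character or a damaged sequence.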
...
... External character encoding issues ...
...
Proper interpretation of a text document requires that the character
encoding scheme be known. Current HTTP servers, however, do not
generally include an appropriate charset parameter ...
... hint to the User Agent as to the
character encoding scheme used by the resource pointed to by the
hyperlink; it should be the appropriate value of the MIME charset
parameter ...
... META element is parsed. Note that there are better ways
for a server to obtain character encoding information, instead of the
unreliable META above; see [NICOL2 ...
... Murai, J., Crispin, M., and E. van der Poel, "Japanese Character Encoding for Internet Messages", RFC 1468, Keio University, Panda Programming, June 1993. ...
... The Unicode Consortium, "The Unicode Standard -- Worldwide Character Encoding -- Version 1.0", Addison-Wesley, Volume 1, 1991, Volume 2, 1992, and Technical Report #4, 1993. The BIDI algorithm is in appendix A of volume 1, with corrections in appendix D of volume 2. ...
