character set
... World Wide Web was seriously
restricted by its reliance on the ISO-8859-1 coded character set,
which is appropriate only for Western European languages. Despite
...
... HTML has been widely used with other languages,
using other coded character sets or character encodings, through
various ad hoc extensions to the language ...
...
The specific issues addressed are the SGML document character set to
be used for HTML, the proper treatment of the charset parameter ...
... The document character set ...
... HTML,
and in particular the SGML concept of a document character set. An
actual implementation may widely differ in its internal workings from
the model given below, but should behave as described to an outside
...
... character
encoding of the concrete SGML document, and it should be carefully
distinguished from the document character set of the abstract HTML
document. SGML views the characters as a single set (called a
...
... integer
number (known as "character number") to each character in the
repertoire. The document character set declaration defines what each
of the character numbers represents [GOLD90, p. 451]. In most cases,
...
... SGML DTD and all documents that refer to it have a single document
character set, and all markup and data characters are part of this
set.
...
... MIME is used to designate a character encoding,
rather than merely a coded character set as the term may suggest. A
character encoding is a mapping (possibly many-to-one) of sequences
...
... character encoding may be, the reference
processing model translates it to the document character set
specified in Section 2.2 before processing specific to SGML/HTML ...
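A minimal sketch of the decoding step this excerpt describes, assuming Python, whose str type holds Unicode code points (taken here to stand in for the character numbers of the document character set). The function name to_document_charset is hypothetical and only illustrates the idea: different external encodings of the same character decode to the same character number.

```python
def to_document_charset(octets: bytes, charset: str) -> list:
    """Decode an external representation into a list of character numbers."""
    # charset is the label carried by the MIME charset parameter (assumed known).
    text = octets.decode(charset)
    return [ord(ch) for ch in text]


# The same character, e-acute, under two external encodings.
latin1_octets = b"\xe9"       # ISO-8859-1: one octet
utf8_octets = b"\xc3\xa9"     # UTF-8: two octets

assert to_document_charset(latin1_octets, "iso-8859-1") == [0xE9]
assert to_document_charset(utf8_octets, "utf-8") == [0xE9]
```

After this step, later processing stages never need to know which external encoding was used.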
... The decoder is responsible for decoding the external representation
of the resource to the document character set. The entity manager,
the parser, and the application deal only with characters of the
document character set. A display-oriented part of the application
or the display machinery itself may again convert characters
represented in the document character set to some other
representation more suitable for their purpose. In any case, the
entity ...
... An actual implementation may choose, or not, to translate the
document into some encoding of the document character set as
described above; the behaviour described by this reference processing
model can be achieved otherwise. This subject ...
... processing model is
that numeric character references are always resolved with respect to
the fixed document character set, and thus to the same characters,
whatever the external encoding actually used. For an example, see
...
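A small sketch of that property, assuming Python's standard html.unescape as a stand-in resolver (it follows later HTML rules, so it is an illustration rather than the processing model of this specification). The helper name resolve is hypothetical. The reference &#1071; yields the same character whichever external encoding carried the document.

```python
import html


def resolve(octets: bytes, charset: str) -> str:
    """Decode the external encoding first, then resolve character references."""
    # References are resolved against Unicode code points, not against
    # positions in the external encoding.
    return html.unescape(octets.decode(charset))


source = "&#1071;"  # decimal 1071 is hexadecimal 42F
for charset in ("iso-8859-1", "koi8-r", "utf-8"):
    # The result is CYRILLIC CAPITAL LETTER YA in every case, even though
    # ISO-8859-1 cannot encode that letter directly.
    assert resolve(source.encode(charset), charset) == "\u042f"
```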
... The document character set ...
... character set, in the SGML sense, is the Universal
Character Set (UCS) of ISO 10646:1993 [ISO-10646 ...
... can be used.
The adoption of this document character set implies a change in the
SGML declaration specified in the HTML ...
...
Making the UCS the document character set does not create non-
conformance of any expression, construct or document that is
...
... SGML declaration, in the
belief that the latter did not express its authors' true intent. The
syntax character set declaration was changed from ISO 646.IRV:1983 to
the newer ISO ...
... interoperability by i) having the SGML declaration say what everyone
thinks and ii) making the syntax character set a proper subset of the
document character set. The characters that differ between the two
versions of ISO ...
...
ISO 10646-1:1993 is the most encompassing character set currently
existing, and there is no other character set that could take its
place as the document character set for HTML. If nevertheless for a
specific application there is a need to use characters ...
...
With the document character set being the full ISO 10646, the
possibility that a character cannot be displayed due to lack of
...
... - In case a numeric representation of the missing character is
given, its hexadecimal (not decimal) form is to be preferred,
because this form is used in character set standards [ERCS].
...
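As a sketch of the hexadecimal preference stated in this excerpt, assuming Python: the bracketed "U+" layout below is an assumption made for illustration, not a notation prescribed by the source, and the function name missing_char_placeholder is hypothetical.

```python
def missing_char_placeholder(ch: str) -> str:
    """Return a hexadecimal placeholder for a character that cannot be displayed."""
    # Upper-case hexadecimal, zero-padded to at least four digits (assumed layout).
    return "[U+{:04X}]".format(ord(ch))


assert missing_char_placeholder("\u042f") == "[U+042F]"
assert missing_char_placeholder("A") == "[U+0041]"
```

The hexadecimal value 042F can be looked up directly in a code chart, whereas the decimal value 1071 would have to be converted first.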
... language and platform
capability. As the following examples show (rather poorly, because of
the character set restriction of Internet specifications), the
quotation marks surrounding the quotation are particularly affected:
...
... from user-agent implementers. It is present in many character
sets (including the whole ISO 8859 series and, of course, ISO
10646), and can always be included by means of the reference
...
... header (see [HTTP-1.1]), which
contains a space and/or comma delimited list of character sets
acceptable to the server. A user agent may want to somehow advise
...
... the user of the contents of this attribute, or to restrict his
possibility to enter characters outside the repertoires of the listed
character sets.
NOTE -- The list of character sets is to be interpreted as an
EXCLUSIVE-OR list; the server announces that it is ready to accept
...
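A minimal sketch of reading such a space and/or comma delimited list, assuming Python. The function name parse_charset_list is hypothetical, and lower-casing the names is an assumption based on charset labels being compared case-insensitively.

```python
import re


def parse_charset_list(value: str) -> list:
    """Split a space and/or comma delimited list of charset names."""
    # Any run of whitespace and commas counts as a single separator.
    names = re.split(r"[\s,]+", value.strip())
    # Drop empty strings left by leading or trailing separators.
    return [name.lower() for name in names if name]


assert parse_charset_list("ISO-8859-1, UTF-8  koi8-r") == [
    "iso-8859-1",
    "utf-8",
    "koi8-r",
]
```

Under the exclusive-or reading in the NOTE, a user agent would then pick exactly one entry from the returned list for a given submission.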
... ISO 8859. International standard -- Information processing -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1 (1987) -- Part 2: Latin alphabet No. 2 (1987) -- Part 3: Latin alphabet No. 3 (1988) -- Part
4: Latin alphabet No. 4 (1988) -- Part 5: Latin/Cyrillic alphabet (1988) -- Part 6: Latin/Arabic alphabet (1987) -- Part 7:
Latin/Greek alphabet (1987) -- Part 8: Latin/Hebrew alphabet (1988) -- Part 9: Latin alphabet No. 5 (1989) -- Part 10: Latin
alphabet No. 6 (1992) ...
... Simonsen, K., "Character Mnemonics & Character Sets", RFC 1345, Rationel Almen Planlaegning, June 1992. ...
