RFC-Ref is no longer maintained; use the RFC browser at: http://zvon.org/comp/r/ref-RFC.html
RFC 2070: Internationalization of the Hypertext Mar...
RFC-Ref

character set



... World Wide Web was seriously restricted by its reliance on the ISO-8859-1 coded character set, which is appropriate only for Western European languages. Despite ...
... HTML has been widely used with other languages, using other coded character sets or character encodings, through various ad hoc extensions to the language ...
... The specific issues addressed are the SGML document character set to be used for HTML, the proper treatment of the charset parameter ...
... removing the restriction to the ISO-8859-1 coded character set [ISO-8859]. ...


... The document character set ...
... HTML, and in particular the SGML concept of a document character set. An actual implementation may widely differ in its internal workings from the model given below, but should behave as described to an outside ...
... character encoding of the concrete SGML document, and it should be carefully distinguished from the document character set of the abstract HTML document. SGML views the characters as a single set (called a ...
... integer number (known as "character number") to each character in the repertoire. The document character set declaration defines what each of the character numbers represents [GOLD90, p. 451]. In most cases, ...
... SGML DTD and all documents that refer to it have a single document character set, and all markup and data characters are part of this set. ...
... MIME is used to designate a character encoding, rather than merely a coded character set as the term may suggest. A character encoding is a mapping (possibly many-to-one) of sequences ...
... character encoding may be, the reference processing model translates it to the document character set specified in Section 2.2 before processing specific to SGML/HTML ...
... The decoder is responsible for decoding the external representation of the resource to the document character set. The entity manager, the parser, and the application deal only with characters of the ...
... entity manager, the parser, and the application deal only with characters of the document character set. A display-oriented part of the application or the display machinery itself may again convert characters represented in the document character set ...
... character set. A display-oriented part of the application or the display machinery itself may again convert characters represented in the document character set to some other representation more suitable for their purpose. In any case, the entity ...
... semantics are concerned, are using the HTML document character set only. ...
... An actual implementation may choose, or not, to translate the document into some encoding of the document character set as described above; the behaviour described by this reference processing model can be achieved otherwise. This subject ...
... processing model is that numeric character references are always resolved with respect to the fixed document character set, and thus to the same characters, whatever the external encoding actually used. For an example, see ...
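A minimal sketch of the point about numeric character references, assuming Python and its standard html.unescape function. The helper name decode_and_resolve and the sample markup are illustrative assumptions, not taken from RFC 2070. The same markup is transmitted in two external encodings; the reference &#233; resolves to character number 233 of the document character set (e with acute accent) in both cases, even though byte 233 means something else in ISO-8859-7.

```python
import html


def decode_and_resolve(raw: bytes, charset: str) -> str:
    """Decode the external representation, then resolve character references."""
    # Decoder step: external encoding -> characters of the document character set.
    text = raw.decode(charset)
    # References are resolved by character number, independent of `charset`.
    return html.unescape(text)


# Hypothetical sample: a numeric reference followed by a Greek alpha.
markup = "&#233; \u03b1"

as_greek = markup.encode("iso-8859-7")   # external encoding 1
as_utf8 = markup.encode("utf-8")         # external encoding 2

resolved_greek = decode_and_resolve(as_greek, "iso-8859-7")
resolved_utf8 = decode_and_resolve(as_utf8, "utf-8")

# Byte value 233 read directly in ISO-8859-7 is a different character (iota).
byte_233_in_greek = bytes([233]).decode("iso-8859-7")
```

Both resolved strings contain U+00E9, while byte_233_in_greek is U+03B9, which is the distinction between a character number and a byte in some external encoding.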
... The document character set ...
... The document character set, in the SGML sense, is the Universal Character Set ...
... character set, in the SGML sense, is the Universal Character Set (UCS) of ISO 10646:1993 [ISO-10646 ...
... can be used. The adoption of this document character set implies a change in the SGML declaration specified in the HTML ...
... Making the UCS the document character set does not create non-conformance of any expression, construct or document that is ...
... SGML declaration, in the belief that the latter did not express its authors' true intent. The syntax character set declaration was changed from ISO 646.IRV:1983 to the newer ISO ...
... interoperability by i) having the SGML declaration say what everyone thinks and ii) making the syntax character set a proper subset of the document character set. The characters that differ between the two ...
... thinks and ii) making the syntax character set a proper subset of the document character set. The characters that differ between the two versions of ISO ...
... ISO 10646-1:1993 is the most encompassing character set currently existing, and there is no other character set that could take its ...
... ISO 10646-1:1993 is the most encompassing character set currently existing, and there is no other character set that could take its place as the document character set for HTML ...
... existing, and there is no other character set that could take its place as the document character set for HTML. If nevertheless for a specific application there is a need to use characters ...
... With the document character set being the full ISO 10646, the possibility that a character cannot be displayed due to lack of ...
... - In case a numeric representation of the missing character is given, its hexadecimal (not decimal) form is to be preferred, because this form is used in character set standards [ERCS]. ...
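The preference for hexadecimal can be sketched as follows. This is only an illustration under stated assumptions: the function name render_with_fallback, the notion of a "displayable" set, and the bracketed output format are all hypothetical, since the excerpt above says only that a numeric representation, if given, should preferably be hexadecimal.

```python
def render_with_fallback(text: str, displayable: set) -> str:
    """Replace characters that cannot be displayed by their hexadecimal number."""
    out = []
    for ch in text:
        if ch in displayable:
            out.append(ch)
        else:
            # Hexadecimal (not decimal) character number; the [XXXX] format
            # is an assumption made for this sketch.
            out.append(f"[{ord(ch):04X}]")
    return "".join(out)


# Hypothetical display that can only show the letter "a".
rendered = render_with_fallback("a\u03b1", {"a"})
```

Here the Greek alpha, character number 945 in decimal, is shown as 03B1, the form a reader can look up directly in a character set standard.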


... language and platform capability. As the following examples show (rather poorly, because of the character set restriction of Internet specifications), the quotation marks surrounding the quotation are particularly affected: ...
... from user-agent implementers. It is present in many character sets (including the whole ISO 8859 series and, of course, ISO 10646), and can always be included by means of the reference ...


... header (see [HTTP-1.1]), which contains a space and/or comma delimited list of character sets acceptable to the server. A user agent may want to somehow advise ...
... the user of the contents of this attribute, or to restrict his possibility to enter characters outside the repertoires of the listed character sets. NOTE -- The list of character sets ...
... character sets. NOTE -- The list of character sets is to be interpreted as an EXCLUSIVE-OR list; the server announces that it is ready to accept ...
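A rough sketch of how a user agent might handle the list described above, assuming Python. The function names parse_charset_list and acceptable_charsets and the sample value are hypothetical; the code splits a space and/or comma delimited list and then checks which of the listed character sets can represent a given piece of user input.

```python
import re


def parse_charset_list(value: str) -> list:
    """Split a space and/or comma delimited list of charset names."""
    # Charset names are compared case-insensitively, so normalize to lowercase.
    return [name.lower() for name in re.split(r"[,\s]+", value.strip()) if name]


def acceptable_charsets(text: str, charsets: list) -> list:
    """Return the listed charsets whose repertoire covers every character of text."""
    usable = []
    for name in charsets:
        try:
            text.encode(name)
            usable.append(name)
        except (UnicodeEncodeError, LookupError):
            # Character outside the repertoire, or charset unknown to Python.
            pass
    return usable


# Hypothetical attribute value mixing both delimiters.
charsets = parse_charset_list("ISO-8859-1, utf-8  ISO-8859-7")
usable_for_alpha = acceptable_charsets("\u03b1", charsets)
```

A user agent could use an empty result from acceptable_charsets as the signal to warn the user that the entered text falls outside every listed repertoire.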


... ISO 8859. International standard -- Information processing -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1 (1987) -- Part 2: Latin alphabet No. 2 (1987) -- Part 3: Latin alphabet No. 3 (1988) -- Part 4: Latin alphabet No. 4 (1988) -- Part 5: Latin/Cyrillic alphabet (1988) -- Part 6: Latin/Arabic alphabet (1987) -- Part 7: Latin/Greek alphabet (1987) -- Part 8: Latin/Hebrew alphabet (1988) -- Part 9: Latin alphabet No. 5 (1989) -- Part 10: Latin alphabet No. 6 (1992) ...
... Simonsen, K., "Character Mnemonics & Character Sets", RFC 1345, Rationel Almen Planlaegning, June 1992. ...


