encoding
... User
Agent, a program with which human users send and receive mail).
Examples of such encodings currently used in the Internet include
pure hexadecimal, uuencode, the 3-in-4 base 64 scheme specified in
...
... Content-Transfer-Encoding header field, which can be
used to specify both the encoding transformation that
was applied to the body and the domain of the result.
...
... was applied to the body and the domain of the result.
Encoding transformations other than the identity
transformation are usually applied to data in order to
...
...
This definition is intended to allow various kinds of character
encodings, from simple single-table mappings such as US-ASCII to
complex table switching methods ...
... character sets and switching techniques make the
situation more complex. For example, some communities use the term
"character encoding" for what MIME calls a "character set", while
...
... encoding ...
...
It is necessary, therefore, to define a standard mechanism for
encoding such data into a 7bit short line format. Proper labelling
of unencoded material in less restrictive formats for direct use over
less restrictive transports ...
... less restrictive transports is also desirable. This document
specifies that such encodings will be indicated by a new "Content-
Transfer-Encoding" header field ...
... specifies that such encodings will be indicated by a new "Content-
Transfer-Encoding" header field. This field has not been defined by
any previous standard.
...
... Content-Transfer-Encoding Syntax ...
...
The Content-Transfer-Encoding field's value is a single token
specifying the type of encoding ...
... Content-Transfer-Encoding field's value is a single token
specifying the type of encoding, as enumerated below. Formally:
...
... "Content-Transfer-Encoding" ":" mechanism ...
... BASE64 and bAsE64
are all equivalent. An encoding type of 7BIT requires that the body
is already in a 7bit mail-ready representation. This is the default
value -- that is, "Content-Transfer-Encoding ...
... encoding type of 7BIT requires that the body
is already in a 7bit mail-ready representation. This is the default
value -- that is, "Content-Transfer-Encoding: 7BIT" is assumed if the
Content-Transfer-Encoding header field ...
... default
value -- that is, "Content-Transfer-Encoding: 7BIT" is assumed if the
Content-Transfer-Encoding header field is not present.
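
The two rules quoted above (values compare case-insensitively, and a missing field means 7BIT) can be sketched with Python's standard `email` package. The function name `transfer_encoding` and the sample messages are illustrative assumptions, not part of the source text.

```python
from email import message_from_string

def transfer_encoding(raw_message: str) -> str:
    """Return the lower-cased Content-Transfer-Encoding, defaulting to 7bit."""
    msg = message_from_string(raw_message)
    # Header absent: "Content-Transfer-Encoding: 7BIT" is assumed.
    value = msg.get("Content-Transfer-Encoding", "7bit")
    # Values are case-insensitive, so normalize before comparing.
    return value.strip().lower()

print(transfer_encoding("Subject: hi\n\nplain body\n"))                        # 7bit
print(transfer_encoding("Content-Transfer-Encoding: bAsE64\n\naGk=\n"))       # base64
```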
...
... Content-Transfer-Encodings Semantics ...
...
This single Content-Transfer-Encoding token actually provides two
pieces of information. It specifies what sort of encoding ...
... Content-Transfer-Encoding token actually provides two
pieces of information. It specifies what sort of encoding
transformation the body was subjected to and hence what decoding
operation must be used to restore it to its original form, and it
...
...
The transformation part of any Content-Transfer-Encodings specifies,
either explicitly or implicitly, a single, well-defined decoding
...
... it to the original sequence of octets which was encoded, or shows
that it is illegal as an encoded sequence. Content-Transfer-
Encodings transformations never depend on any additional external
profile information for proper operation. Note that while decoders ...
... must produce a single, well-defined output for a valid encoding no
such restrictions exist for encoders: Encoding ...
... encoding no
such restrictions exist for encoders: Encoding a given sequence of
octets to different, equivalent encoded sequences is perfectly legal.
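
As a small illustration of that asymmetry, the sketch below decodes two different quoted-printable encodings of the same three octets with the standard `quopri` module. The sample strings are made up for the example.

```python
import quopri

minimal = b"a=3Db"        # only the "=" octet is escaped
verbose = b"=61=3D=62"    # every octet is escaped; still a valid encoding

decoded_minimal = quopri.decodestring(minimal)
decoded_verbose = quopri.decodestring(verbose)

# Two different encoded sequences, one well-defined decoded result.
print(decoded_minimal, decoded_verbose)
```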
...
... Three transformations are currently defined: identity, the "quoted-
printable" encoding, and the "base64" encoding. The domains ...
...
The Content-Transfer-Encoding values "7bit", "8bit", and "binary" all
mean that the identity ...
... mean that the identity (i.e. NO) encoding transformation has been
performed. As such, they serve simply as indicators of the domain of
...
... domain of
the body data, and provide useful information about the sort of
encoding that might be needed for transmission in a given transport
system. The terms "7bit data", "8bit ...
... The quoted-printable and base64 encodings transform their input from
an arbitrary domain into material in the "7bit" range ...
...
The proper Content-Transfer-Encoding label must always be used.
Labelling unencoded data containing 8bit characters as "7bit" is not
...
...
Unlike media subtypes, a proliferation of Content-Transfer-Encoding
values is both undesirable and unnecessary. However, establishing
only a single transformation into the "7bit" domain ...
...
possible. There is a tradeoff between the desire for a compact and
efficient encoding of largely-binary data and the desire for a
somewhat readable encoding of data ...
... efficient encoding of largely-binary data and the desire for a
somewhat readable encoding of data that is mostly, but not entirely,
7bit. For this reason, at least two encoding mechanisms are
...
... somewhat readable encoding of data that is mostly, but not entirely,
7bit. For this reason, at least two encoding mechanisms are
necessary: a more or less readable encoding (quoted-printable ...
... 7bit. For this reason, at least two encoding mechanisms are
necessary: a more or less readable encoding (quoted-printable) and a
"dense" or "uniform" encoding ...
... unencoded binary data in mail bodies. Thus there are no
circumstances in which the "binary" Content-Transfer-Encoding is
actually valid in Internet mail ...
...
NOTE: The five values defined for the Content-Transfer-Encoding field
imply nothing about the media type other than the algorithm ...
... New Content-Transfer-Encodings ...
... Implementors may, if necessary, define private Content-Transfer-
Encoding values, but must use an x-token, which is a name prefixed by
"X-", to indicate its non-standard status, e.g., "Content-Transfer-
...
... token, which is a name prefixed by
"X-", to indicate its non-standard status, e.g., "Content-Transfer-
Encoding: x-my-new-encoding". Additional standardized Content-
Transfer-Encoding ...
... "X-", to indicate its non-standard status, e.g., "Content-Transfer-
Encoding: x-my-new-encoding". Additional standardized Content-
Transfer-Encoding values must be specified by a standards-track RFC ...
... Encoding: x-my-new-encoding". Additional standardized Content-
Transfer-Encoding values must be specified by a standards-track RFC.
The requirements ...
... requirements such specifications must meet are given in RFC 2048 (obsoleted by RFC 4289 and RFC 4288).
As such, all content-transfer-encoding namespace except that
beginning with "X-" is explicitly reserved to the IETF ...
... Unlike media types and subtypes, the creation of new Content-
Transfer-Encoding values is STRONGLY discouraged, as it seems likely
to hinder interoperability with little potential benefit
...
... message header, it applies to the entire body of that message. If a
Content-Transfer-Encoding header field appears as part of an entity's
...
... entity. If an entity is
of type "multipart" the Content-Transfer-Encoding is not permitted to
have any value other than "7bit", "8bit" or "binary". Even more
...
... octets rather than bits, so that the mechanisms described here are
mechanisms for encoding arbitrary octet streams, not bit streams. If
a bit stream ...
...
The encoding mechanisms defined here explicitly encode all data in
US-ASCII. Thus, for example, suppose an entity ...
... base64 US-ASCII
encoding of data that was originally in ISO-8859-1, and will be in
that character set ...
...
Certain Content-Transfer-Encoding values may only be used on certain
media types. In particular, it is EXPRESSLY FORBIDDEN to use any
...
... media types. In particular, it is EXPRESSLY FORBIDDEN to use any
encodings other than "7bit", "8bit", or "binary" with any composite
...
... composite media types are "multipart" and
"message". All encodings that are desired for bodies of type
multipart or message must be done at the innermost level, by encoding
...
... "message". All encodings that are desired for bodies of type
multipart or message must be done at the innermost level, by encoding
the actual body that needs to be encoded.
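
A minimal sketch of that restriction as a validity check follows. The names `encoding_allowed`, `IDENTITY_ENCODINGS` and `COMPOSITE_TYPES` are hypothetical, and the check covers only the rule quoted here: it does not validate the encoding name for non-composite types.

```python
IDENTITY_ENCODINGS = {"7bit", "8bit", "binary"}
COMPOSITE_TYPES = {"multipart", "message"}

def encoding_allowed(content_type: str, transfer_encoding: str) -> bool:
    """Reject non-identity encodings on composite (multipart/message) entities."""
    top_level = content_type.split("/", 1)[0].strip().lower()
    encoding = transfer_encoding.strip().lower()
    if top_level in COMPOSITE_TYPES:
        # Composite bodies may only carry a domain label, never a transformation.
        return encoding in IDENTITY_ENCODINGS
    return True

print(encoding_allowed("multipart/mixed", "base64"))  # False
print(encoding_allowed("image/png", "base64"))        # True
```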
...
... composite entity
has a transfer-encoding value such as "7bit", but one of the enclosed
entities has a less restrictive value such as "8bit", then either the
...
...
NOTE ON ENCODING RESTRICTIONS: Though the prohibition against using
content-transfer-encodings on composite ...
... ON ENCODING RESTRICTIONS: Though the prohibition against using
content-transfer-encodings on composite body data may seem overly
restrictive, it is necessary to prevent nested encodings ...
... content-transfer-encodings on composite body data may seem overly
restrictive, it is necessary to prevent nested encodings, in which
data are passed through an encoding algorithm ...
... restrictive, it is necessary to prevent nested encodings, in which
data are passed through an encoding algorithm multiple times, and
must be decoded multiple times in order to be properly viewed.
...
... algorithm multiple times, and
must be decoded multiple times in order to be properly viewed.
Nested encodings add considerable complexity to user agents: Aside
from the obvious efficiency problems with such multiple encodings ...
... encodings add considerable complexity to user agents: Aside
from the obvious efficiency problems with such multiple encodings,
they can obscure the basic structure of a message. In particular,
they can imply that several decoding operations are necessary simply
...
...
to find out what types of bodies a message contains. Banning nested
encodings may complicate the job of certain mail gateways, but this
seems less of a problem than the effect of nested encodings ...
... encodings may complicate the job of certain mail gateways, but this
seems less of a problem than the effect of nested encodings on user
agents.
...
...
Any entity with an unrecognized Content-Transfer-Encoding must be
treated as if it has a Content-Type of "application/octet-stream ...
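
One way a receiving agent might apply this fallback is sketched below. The function name is hypothetical, and the set of recognized values is assumed to be just the five standard ones; a real agent might also recognize private x-tokens.

```python
KNOWN_ENCODINGS = {"7bit", "8bit", "binary", "quoted-printable", "base64"}

def effective_content_type(content_type: str, transfer_encoding: str) -> str:
    """Fall back to application/octet-stream when the encoding is unrecognized."""
    if transfer_encoding.strip().lower() in KNOWN_ENCODINGS:
        return content_type
    # The body cannot be decoded, so its declared type cannot be trusted.
    return "application/octet-stream"

print(effective_content_type("text/plain", "Base64"))
print(effective_content_type("text/plain", "x-my-new-encoding"))
```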
... ON THE RELATIONSHIP BETWEEN CONTENT-TYPE AND CONTENT-TRANSFER-
ENCODING: It may seem that the Content-Transfer-Encoding could be
inferred from the characteristics of the media that is to be encoded,
...
... CONTENT-TYPE AND CONTENT-TRANSFER-
ENCODING: It may seem that the Content-Transfer-Encoding could be
inferred from the characteristics of the media that is to be encoded,
or, at the very least, that certain Content-Transfer-Encodings ...
... Content-Transfer-Encoding could be
inferred from the characteristics of the media that is to be encoded,
or, at the very least, that certain Content-Transfer-Encodings could
be mandated for use with specific media types. There are several
...
... reasons why this is not the case. First, given the varying types of
transports used for mail, some encodings may be appropriate for some
combinations of media types and transports ...
... example, in an 8bit transport, no encoding would be required for text
in certain character sets, while such encodings ...
... encoding would be required for text
in certain character sets, while such encodings are clearly required
for 7bit SMTP.)
...
...
Second, certain media types may require different types of transfer
encoding under different circumstances. For example, many PostScript
bodies might consist entirely of short lines of 7bit data and hence
...
... PostScript
bodies might consist entirely of short lines of 7bit data and hence
require no encoding at all. Other PostScript bodies (especially
those using Level 2 PostScript ...
... PostScript bodies (especially
those using Level 2 PostScript's binary encoding mechanism) may only
be reasonably represented using a binary transport encoding.
...
... PostScript's binary encoding mechanism) may only
be reasonably represented using a binary transport encoding.
Finally, since the Content-Type field is intended to be an open-ended ...
... association
between media types and encodings effectively couples the
specification of an application protocol with a specific lower-level
...
... Translating Encodings ...
... The quoted-printable and base64 encodings are designed so that
conversion between them is possible. The only issue that arises in
such a conversion is the handling of hard line breaks ...
... such a conversion is the handling of hard line breaks in quoted-
printable encoding output. When converting from quoted-printable to
base64 ...
... affect the treatment of CRLFs, given that the representation of
newlines varies greatly from system to system, and the relationship
between content-transfer-encodings and character sets. A canonical
...
... character sets. A canonical
model for encoding is presented in RFC 2049 for this reason.
...
...
The Quoted-Printable encoding is intended to represent data that
largely consists of octets that correspond to printable characters in
the US-ASCII ...
...
In this encoding, octets are to be represented as determined by the
following rules:
...
... ASCII EQUAL SIGN) can be represented by "=3D". This
rule must be followed except when the following rules
allow an alternative encoding. ...
... CRLF sequence, in the Quoted-Printable encoding. Since
the canonical representation of media types ...
... and to be displayed to the user) can occur in the
quoted-printable encoding of such types. Sequences
like "=0D", "=0A", "=0A=0D" and "=0D=0A" will routinely
appear in non-text data represented in quoted-
...
... rather than converting to canonical form first,
encoding, and then converting back to local
representation. In particular, this may apply to plain
text material on systems that use newline conventions
...
... implementation optimization is permissible, but only
when the combined canonicalization-encoding step is
equivalent to performing the three steps separately.
...
... (Soft Line Breaks) The Quoted-Printable encoding
REQUIRES that encoded lines be no more than 76
characters long. If longer lines are to be encoded
...
... characters long. If longer lines are to be encoded
with the Quoted-Printable encoding, "soft" line breaks
must be used. An equal sign as the last character on a
...
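
The 76-character limit and the "=3D" rule can be observed with Python's standard `quopri` module. This is a sketch with a made-up input line, not a reference encoder.

```python
import quopri

long_line = b"price=" + b"x" * 200   # one logical line, far over 76 characters
encoded = quopri.encodestring(long_line)

for line in encoded.splitlines():
    # Print each encoded line's length and whether it ends in a soft break.
    print(len(line), line.endswith(b"="))

# Soft line breaks vanish on decoding, restoring the original single line.
restored = quopri.decodestring(encoded)
print(restored == long_line)
```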
...
This can be represented, in the Quoted-Printable encoding, as:
...
... Since the hyphen character ("-") may be represented as itself in the
Quoted-Printable encoding, care must be taken, when encapsulating a
quoted-printable encoded body inside one or more multipart entities,
...
...
NOTE: The quoted-printable encoding represents something of a
compromise between readability and reliability in transport ...
... transport. Bodies
encoded with the quoted-printable encoding will work reliably over
most mail gateways, but may not work perfectly over a few gateways ...
... EBCDIC. A higher level of
confidence is offered by the base64 Content-Transfer-Encoding. A way
to get reasonably reliable transport through EBCDIC ...
... newline conventions. If such alterations are likely to constitute a
corruption of the data, it is probably more sensible to use the
base64 encoding rather than the quoted-printable encoding.
...
... NOTE: Several kinds of substrings cannot be generated according to
the encoding rules for the quoted-printable content-transfer-
encoding ...
... encoding rules for the quoted-printable content-transfer-
encoding, and hence are formally illegal if they appear in the output
of a quoted-printable encoder ...
... quoted-printable part of a message without itself
having been subjected to quoted-printable encoding. A
reasonable approach by a robust implementation might be
to include the "=" character and the following
...
... found in incoming, encoded data, a robust
implementation might nevertheless decode the lines, and
might report the erroneous encoding to the user. ...
... Base64 Content-Transfer-Encoding ...
...
The Base64 Content-Transfer-Encoding is designed to represent
arbitrary sequences of octets in a form that need not be humanly
readable. The encoding ...
... Content-Transfer-Encoding is designed to represent
arbitrary sequences of octets in a form that need not be humanly
readable. The encoding and decoding algorithms are simple, but the
encoded data are consistently only about 33 percent larger than the
...
... algorithms are simple, but the
encoded data are consistently only about 33 percent larger than the
unencoded data. This encoding is virtually identical to the one used
in Privacy Enhanced Mail (PEM ...
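
The "about 33 percent" figure follows from emitting four characters for every three input octets. A quick check, using a hypothetical helper `base64_length` that ignores the line breaks a mail encoder would add:

```python
import base64
import math

def base64_length(n_octets: int) -> int:
    """Characters of base64 output for n_octets of input, line breaks excluded."""
    return 4 * math.ceil(n_octets / 3)

payload = bytes(300)                 # 300 zero octets as sample data
encoded = base64.b64encode(payload)

print(len(encoded), base64_length(300))   # 400 400
print(len(encoded) / len(payload))        # about 1.33
```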
... versions of EBCDIC. Other popular encodings, such as the encoding
used by the uuencode utility, Macintosh ...
... versions of EBCDIC. Other popular encodings, such as the encoding
used by the uuencode utility, Macintosh binhex 4.0 [RFC-1741 ...
... Macintosh binhex 4.0 [RFC-1741], and
the base85 encoding specified as part of Level 2 PostScript, do not
share these properties, and thus do not fulfill the portability
...
... share these properties, and thus do not fulfill the portability
requirements a binary transport encoding for mail must meet.
...
... of which is translated into a single digit in the base64 alphabet.
When encoding a bit stream via the base64 encoding, the bit stream ...
... When encoding a bit stream via the base64 encoding, the bit stream
must be presumed to be ordered with the most-significant-bit ...
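
To make the bit ordering concrete, here is a sketch that encodes one 24-bit group by hand and compares it with the standard library. `encode_group` is a hypothetical helper; the alphabet is the usual A-Z, a-z, 0-9, "+", "/" table.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_group(three_octets: bytes) -> str:
    """Encode exactly one 24-bit input group as four base64 characters."""
    if len(three_octets) != 3:
        raise ValueError("expects exactly one 24-bit input group")
    # Concatenate the octets most-significant-bit first into one 24-bit integer.
    group = int.from_bytes(three_octets, "big")
    # Split into four 6-bit indexes, highest bits first.
    indexes = [(group >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indexes)

print(encode_group(b"Man"))                  # TWFu
print(base64.b64encode(b"Man").decode())     # TWFu
```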
... Special processing is performed if fewer than 24 bits are available
at the end of the data being encoded. A full encoding quantum is
always completed at the end of a body. When fewer than 24 input bits
...
... base64
input is an integral number of octets, only the following cases can
arise: (1) the final quantum of encoding input is an integral
multiple of 24 bits; here, the final unit of encoded output will be
...
... 24 bits; here, the final unit of encoded output will be
an integral multiple of 4 characters with no "=" padding, (2) the
final quantum of encoding input is exactly 8 bits; here, the final
unit of encoded output will be two characters followed by two "="
...
... 8 bits; here, the final
unit of encoded output will be two characters followed by two "="
padding characters, or (3) the final quantum of encoding input is
exactly 16 bits; here, the final unit of encoded output will be three
...
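
The three end-of-data cases can be reproduced with Python's standard `base64` module. The helper name `padding_count` and the sample inputs are illustrative only.

```python
import base64

def padding_count(data: bytes) -> int:
    """Number of "=" padding characters in the base64 encoding of data."""
    return base64.b64encode(data).count(b"=")

for data in (b"abc", b"a", b"ab"):
    final_bits = (len(data) * 8) % 24   # 0 means the last quantum was a full 24 bits
    print(final_bits, base64.b64encode(data), padding_count(data))
# 0 b'YWJj' 0
# 8 b'YQ==' 2
# 16 b'YWI=' 1
```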
...
Care must be taken to use the proper octets for line breaks if base64
encoding is applied directly to text material that has not been
converted to canonical form. In particular, text line breaks ...
... line breaks must be
converted into CRLF sequences prior to base64 encoding. The
important thing to note is that this may be done directly by the
encoder ...
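
A minimal sketch of an encoder that performs the line-break conversion itself is shown below. It assumes the local text uses LF, CR, or CRLF as its line break; the function name `encode_text` and the default charset are assumptions made for the example.

```python
import base64

def encode_text(text: str, charset: str = "us-ascii") -> bytes:
    """Convert line breaks to CRLF, then base64-encode the resulting octets."""
    # Normalize CRLF and bare CR to LF first, so no break is converted twice.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    canonical = unified.replace("\n", "\r\n")
    # encodebytes wraps its output into lines of at most 76 characters.
    return base64.encodebytes(canonical.encode(charset))

encoded = encode_text("line one\nline two\n")
print(base64.b64decode(encoded))   # b'line one\r\nline two\r\n'
```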
... delimiters within base64-encoded bodies within multipart entities
because no hyphen characters are used in the base64 encoding.
...
... Using the MIME-Version, Content-Type, and Content-Transfer-Encoding
header fields, it is possible to include, in a standardized way,
...
... "Content-Transfer-Encoding" ":" mechanism ...
... encoding ...
