2. Document Conventions
This document makes use of the document conventions defined in BCP
14, RFC 2119 [4]. That provides the interpretation of capitalized
imperative words like MUST, SHOULD, etc.
This document also uses notation defined in STD 9, RFC 959 [3]. In
particular, the terms "reply", "user", "NVFS" (Network Virtual File
System), "file", "pathname", "FTP commands", "DTP" (data transfer
process), "user-FTP process", "user-PI" (user protocol interpreter),
"user-DTP", "server-FTP process", "server-PI", "server-DTP", "mode",
"type", "NVT" (Network Virtual Terminal), "control connection", "data
connection", and "ASCII", are all used here as defined there.
Required syntax is defined using the Augmented BNF (ABNF) defined in [5].
Some general ABNF definitions that are required throughout the
document will be defined later in this section. At first reading, it
may be wise to simply recall that these definitions exist here, and
skip to the next section.
2.1. Basic Tokens
This document imports the core ABNF definitions given in Appendix A
of [5]. There, definitions will be found for basic ABNF elements like
ALPHA, DIGIT, SP, etc. The following terms are added for use in this
document.
TCHAR = VCHAR / SP / HTAB ; visible plus white space
RCHAR = ALPHA / DIGIT / "," / "." / ":" / "!" /
"@" / "#" / "$" / "%" / "^" /
"&" / "(" / ")" / "-" / "_" /
"+" / "?" / "/" / "\" / "'" /
DQUOTE ; <"> -- double quote character (%x22)
SCHAR = RCHAR / "=" ;
The VCHAR (from [5]), RCHAR, SCHAR, and TCHAR types give basic
character types from varying sub-sets of the ASCII character set for
use in various commands and responses.
token = 1*RCHAR
A "token" is a string whose precise meaning depends upon the context
in which it is used. In some cases it will be a value from a set of
possible values maintained elsewhere. In others it might be a string
invented by one party to an FTP conversation from whatever sources it
finds relevant.
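As an illustration only, the character classes and the token rule above might be checked in Python as follows. The names RCHAR, SCHAR and is_token are hypothetical. ABNF ALPHA and DIGIT cover ASCII letters and digits only, so the sketch deliberately avoids str.isalpha(), which also accepts non-ASCII letters.

```python
import string

# RCHAR: ALPHA / DIGIT / the punctuation listed in the ABNF above.
# ABNF ALPHA and DIGIT are ASCII-only, hence string.ascii_letters/digits.
RCHAR = frozenset(string.ascii_letters + string.digits + ",.:!@#$%^&()-_+?/\\'\"")

# SCHAR = RCHAR / "="
SCHAR = RCHAR | {"="}


def is_token(s: str) -> bool:
    """True if s matches token = 1*RCHAR (at least one RCHAR, nothing else)."""
    return len(s) > 0 and all(ch in RCHAR for ch in s)
```

For instance, "UTF-8" is a token, while "a=b" is not (it is, however, a string of SCHAR).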
Note that in ABNF, string literals are case insensitive. That
convention is preserved in this document, and implies that FTP
commands added by this specification have names that can be
represented in any case. That is, "MDTM" is the same as "mdtm",
"Mdtm" and "MdTm" etc. However, note that ALPHA, in particular, is
case sensitive. That implies that a "token" is a case sensitive
value. That implication is correct, except where explicitly stated
to the contrary in this document, or in some other specification that
defines the values this document specifies be used in a particular
context.
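A minimal sketch of the two comparison rules follows. It assumes a server-PI that compares received command names against the literals defined by this specification. The function names are hypothetical. Only the letters A to Z are folded, because ABNF's case insensitivity applies to ASCII letters.

```python
import string

# Fold only A-Z, matching ABNF's case-insensitive string literals (ASCII only).
_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def command_matches(received: str, literal: str) -> bool:
    """Compare a received command name with an ABNF literal such as "MDTM"."""
    return received.translate(_FOLD) == literal.translate(_FOLD)


def token_matches(received: str, expected: str) -> bool:
    """Compare tokens exactly: ALPHA is case sensitive, so a token is too,
    unless a specification says otherwise for a particular context."""
    return received == expected
```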
2.2. Pathnames
Various FTP commands take pathnames as arguments, or return pathnames
in responses. When the MLST command is supported, as indicated in
the response to the FEAT command [6], pathnames are to be transferred
in one of the following two formats.
pathname = utf-8-name / raw
utf-8-name = <a UTF-8 encoded Unicode string>
raw = <any string that is not a valid UTF-8 encoding>
Which format is used is at the option of the user-PI or server-PI
sending the pathname. UTF-8 encodings [2] contain enough internal
structure that it is always, in practice, possible to determine
whether a UTF-8 or raw encoding has been used, in those cases where
it matters. While it is useful for the user-PI to be able to
correctly display a pathname received from the server-PI to the user,
it is far more important for the user-PI to be able to retain and
retransmit the identical pathname when required. Implementations are
advised against converting a UTF-8 pathname to a local charset that
isn't capable of representing the full Unicode character repertoire,
and then attempting to invert the charset translation later. Note
that ASCII is a subset of UTF-8. See also [1].
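One way a user-PI might follow this advice is to keep the received octets as the authoritative pathname, and to derive a display string from them only when one is needed. The sketch below works under that assumption. The Pathname class is hypothetical, and Python's strict "utf-8" codec decides between the utf-8-name and raw forms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pathname:
    wire: bytes  # the exact octets received; this is what gets retransmitted

    @property
    def is_utf8(self) -> bool:
        """True for a utf-8-name, False for a raw pathname."""
        try:
            self.wire.decode("utf-8")  # strict: invalid sequences raise
            return True
        except UnicodeDecodeError:
            return False

    def display(self) -> str:
        # For showing to a user only: lossy for raw names, never sent back.
        return self.wire.decode("utf-8", errors="replace")
```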
Unless otherwise specified, the pathname is terminated by the CRLF
that terminates the FTP command, or by the CRLF that ends a reply.
Any trailing spaces preceding that CRLF form part of the name.
Exactly one space will precede the pathname and serve as a separator
from the preceding syntax element. Any additional spaces form part
of the pathname. See [7] for a fuller explanation of the character
encoding issues. All implementations supporting MLST MUST support
[7].
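The following sketch shows how a server-PI might extract the pathname from a command line under these rules. It assumes that the line has already had any Telnet escaping removed, and that the command takes a single pathname argument. split_command is a hypothetical name.

```python
from __future__ import annotations


def split_command(line: bytes) -> tuple[bytes, bytes | None]:
    """Split one control-connection line into (command, pathname)."""
    if not line.endswith(b"\r\n"):
        raise ValueError("command line must end with CRLF")
    body = line[:-2]                      # the CRLF terminates the pathname
    command, sep, pathname = body.partition(b" ")
    if not sep:
        return command, None              # no argument supplied
    # Only the first SP is a separator; any further leading or trailing
    # spaces are part of the pathname and are kept.
    return command, pathname
```

For example, "MLST  my file " followed by CRLF yields the pathname " my file ", keeping both the extra leading space and the trailing space.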
Note: for pathnames transferred over a data connection, there is no
way to represent a pathname containing the characters CR and LF in
sequence, and distinguish that from the end of line indication.
Hence, pathnames containing the CRLF pair of characters cannot be
transmitted over a data connection. Data connections only contain
file names transmitted from server-FTP to user-FTP as the result of
one of the directory listing commands. Files with names containing
the CRLF sequence must either have that sequence converted to some
other form, such that the other form can be recognised and be
correctly converted back to CRLF, or be omitted from the listing.
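The sketch below implements the simpler of the two options, omission, for a server building a listing with one name per line. The function name is hypothetical. No conversion scheme is shown because none is defined here.

```python
from __future__ import annotations


def listing_lines(names: list[bytes]) -> list[bytes]:
    """Build CRLF-terminated listing lines, one per name.

    Names containing the CR LF pair are omitted, because a receiver could not
    tell that pair from the end-of-line marker.
    """
    lines = []
    for name in names:
        if b"\r\n" in name:
            continue  # omitted; a reversible conversion would be the alternative
        lines.append(name + b"\r\n")
    return lines
```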
Implementations should also beware that the FTP control connection
uses Telnet NVT conventions [8], and that the Telnet IAC character,
if part of a pathname sent over the control connection, MUST be
correctly escaped as defined by the Telnet protocol.
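On the control connection, Telnet represents a data octet with the value 255 (IAC) by sending that octet twice. A minimal sketch of this escaping follows, with hypothetical function names. The unescaping side handles only the doubled-IAC case and rejects any other Telnet command sequence. Because the octet 0xFF never occurs in valid UTF-8, in practice this concerns raw pathnames.

```python
IAC = 0xFF  # Telnet "Interpret As Command"


def telnet_escape(data: bytes) -> bytes:
    # A data octet equal to IAC is sent doubled: IAC IAC.
    return data.replace(b"\xff", b"\xff\xff")


def telnet_unescape(data: bytes) -> bytes:
    # Collapse each IAC IAC pair back to one data octet.
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == IAC:
            if i + 1 < len(data) and data[i + 1] == IAC:
                out.append(IAC)
                i += 2
                continue
            raise ValueError("unexpected Telnet command sequence")
        out.append(data[i])
        i += 1
    return bytes(out)
```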
NVT also distinguishes between CR, LF, and the end of line CRLF, and
so would permit pathnames containing the pair of characters CR and LF
to be correctly transmitted. However, because such a sequence cannot
be transmitted over a data connection (as part of the result of a
LIST, NLST, or MLSD command), such pathnames are best avoided.
Implementors should also be aware that, although Telnet NVT
conventions are used over the control connections, Telnet option
negotiation MUST NOT be attempted. See section 4.1.2.12 of [9].
2.2.1. Pathname Syntax
Except where TVFS is supported (see section 6), this specification
imposes no syntax upon pathnames. Nor does it restrict the character
set from which pathnames are created. This does not imply that the
NVFS is required to make sense of all possible pathnames. Server-PIs
may restrict the syntax of valid pathnames in their NVFS in any
manner appropriate to their implementation or underlying file system.
Similarly, a server-PI may parse the pathname and assign meaning to
the components detected.
2.2.2. Wildcarding
For the commands defined in this specification, all pathnames are to
be treated literally. That is, for a pathname given as a parameter
to a command, the file whose name is identical to the pathname given
is implied. No characters from the pathname may be treated as
special or "magic", thus no pattern matching (other than for exact
equality) between the pathname given and the files present in the
NVFS of the server-FTP is permitted.
Clients that desire some form of pattern matching functionality must
obtain a listing of the relevant directory, or directories, and
implement their own file name selection procedures.
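For example, a client offering shell-style patterns might filter the names from a listing locally, and then send each selected name literally in later commands. The sketch below assumes Python's fnmatch module for this, and select_names is a hypothetical name. The pattern syntax is entirely a client choice.

```python
import fnmatch


def select_names(listed_names: list, pattern: str) -> list:
    # Match locally against names taken from a directory listing.
    # fnmatchcase never folds case, whatever the local platform does.
    return [n for n in listed_names if fnmatch.fnmatchcase(n, pattern)]
```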
2.3. Times
The syntax of a time value is:
time-val = 14DIGIT ["." 1*DIGIT]
The leading, mandatory, fourteen digits are to be interpreted as, in
order from the leftmost, four digits giving the year, with a range of
1000--9999, two digits giving the month of the year, with a range of
01--12, two digits giving the day of the month, with a range of
01--31, two digits giving the hour of the day, with a range of
00--23, two digits giving minutes past the hour, with a range of
00--59, and finally, two digits giving seconds past the minute, with
a range of 00--60 (with 60 being used only at a leap second). Years
in the tenth century, and earlier, cannot be expressed. This is not
considered a serious defect of the protocol.
The optional digits, which are preceded by a period, give decimal
fractions of a second. These may be given to whatever precision is
appropriate to the circumstance; however, implementations MUST NOT add
precision to time-vals where that precision does not exist in the
underlying value being transmitted.
Symbolically, a time-val may be viewed as
YYYYMMDDHHMMSS.sss
The "." and subsequent digits ("sss") are optional. However, the "."
MUST NOT appear unless at least one following digit also appears.
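The following is a sketch of a strict time-val parser that checks the ranges stated above. The TimeVal type and parse_time_val function are hypothetical. The fractional digits are kept as text, so a value can be passed on without gaining or losing precision. The day of the month is checked only against 01--31, not against the length of the particular month.

```python
import re
from typing import NamedTuple

# time-val = 14DIGIT ["." 1*DIGIT]; [0-9] rather than \d, which matches non-ASCII digits.
_TIME_VAL = re.compile(
    r"([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})(?:\.([0-9]+))?"
)


class TimeVal(NamedTuple):
    year: int
    month: int
    day: int
    hour: int
    minute: int
    second: int    # 00..60; 60 appears only at a leap second
    fraction: str  # digits after ".", or "" when absent


def parse_time_val(s: str) -> TimeVal:
    m = _TIME_VAL.fullmatch(s)  # a trailing "." with no digits will not match
    if m is None:
        raise ValueError(f"not a time-val: {s!r}")
    fields = [int(g) for g in m.groups()[:6]]
    limits = [(1000, 9999), (1, 12), (1, 31), (0, 23), (0, 59), (0, 60)]
    for value, (low, high) in zip(fields, limits):
        if not low <= value <= high:
            raise ValueError(f"field out of range in {s!r}")
    return TimeVal(*fields, m.group(7) or "")
```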
Time values are always represented in UTC (GMT), and in the Gregorian
calendar regardless of what calendar may have been in use at the date
and time indicated at the location of the server-PI.
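Conversely, a server-PI might produce a time-val from a timezone-aware Python datetime, as sketched below. format_time_val is a hypothetical name. Python's datetime uses the proleptic Gregorian calendar, which matches the rule above, but it cannot represent a leap second (second 60). The caller states how many fractional digits the underlying value really has, and any further digits are truncated rather than invented.

```python
from datetime import datetime, timezone


def format_time_val(dt: datetime, frac_digits: int = 0) -> str:
    """Format a timezone-aware datetime as a time-val in UTC.

    frac_digits says how many fractional digits the source value genuinely
    carries (0 to 6, datetime's microsecond resolution).
    """
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("naive datetime: cannot convert to UTC")
    if not 0 <= frac_digits <= 6:
        raise ValueError("frac_digits must be between 0 and 6")
    utc = dt.astimezone(timezone.utc)
    if utc.year < 1000:
        raise ValueError("years before 1000 cannot be expressed")
    text = utc.strftime("%Y%m%d%H%M%S")
    if frac_digits:
        # Keep only the digits the caller vouches for (truncating, not rounding).
        text += "." + f"{utc.microsecond:06d}"[:frac_digits]
    return text
```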
The technical differences among GMT, TAI, UTC, UT1, UT2, etc., are
not considered here. A server-FTP process should always use the same
time reference, so the times it returns will be consistent. Clients
are not expected to be time synchronized with the server, so the
possible difference in times that might be reported by the different
time standards is not considered important.
2.4. Server Replies
Section 4.2 of [3] defines the format and meaning of replies by the
server-PI to FTP commands from the user-PI. Those reply conventions
are used here without change.
error-response = error-code SP *TCHAR CRLF
error-code = ("4" / "5") 2DIGIT
Implementors should note that the ABNF syntax used in this document
and in other FTP related documents (but not used in [3]), sometimes
shows replies using the one-line format. Unless otherwise explicitly
stated, that is not intended to imply that multi-line responses are
not permitted. Implementors should assume that, unless stated to the
contrary, any reply to any FTP command (including QUIT) may use the
multi-line format described in [3].
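The sketch below reads one reply from a sequence of lines that have already had their CRLF stripped. It uses hypothetical names and accepts either format: a single line "xyz text", or a first line "xyz-text" continued until a line that begins with the same code followed by SP. It also tests a code against the error-code rule given above.

```python
from __future__ import annotations

import re
from typing import Iterator

# A reply line: three digits, then SP (last or only line) or "-" (first of several).
_REPLY_LINE = re.compile(r"([0-9]{3})([ -])(.*)", re.DOTALL)


def read_reply(lines: Iterator[str]) -> tuple[str, list[str]]:
    """Read one reply; return its code and the text of each of its lines."""
    first = next(lines)
    m = _REPLY_LINE.fullmatch(first)
    if m is None:
        raise ValueError(f"malformed reply line: {first!r}")
    code, sep, text = m.groups()
    texts = [text]
    if sep == "-":
        end = code + " "  # the final line repeats the code, then SP
        for line in lines:
            if line.startswith(end):
                texts.append(line[len(end):])
                break
            texts.append(line)
        else:
            raise ValueError("input ended inside a multi-line reply")
    return code, texts


def is_error(code: str) -> bool:
    """error-code = ("4" / "5") 2DIGIT"""
    return code[0] in "45"
```

Because any reply may be multi-line, even a reply to QUIT such as "221-Goodbye." followed by "221 Bye" is read as a single reply.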
Throughout this document, replies will be identified by the three
digit code that is their first element. Thus, the term "500 reply"
means a reply from the server-PI using the three-digit code "500".
2.5. Interpreting Examples
In the examples of FTP dialogs presented in this document, lines that
begin "C> " were sent over the control connection from the user-PI to
the server-PI, lines that begin "S> " were sent over the control
connection from the server-PI to the user-PI, and each sequence of
lines that begin "D> " was sent from the server-PI to the user-PI
over a data connection created just to send those lines and closed
immediately after. No examples here show data transferred over a
data connection from the client to the server. In all cases, the
prefixes shown above, including the one space, have been added for
the purposes of this document, and are not a part of the data
exchanged between client and server.
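The following sketch illustrates these conventions. parse_example is a hypothetical name. It strips the three-character prefixes from an example dialog and groups each run of consecutive "D> " lines into one item, since each such run stands for one data connection.

```python
from __future__ import annotations


def parse_example(transcript: str) -> list[tuple[str, list[str]]]:
    """Turn an example dialog into (channel, lines) items, channel being
    "C", "S" or "D"; the "C> ", "S> " and "D> " prefixes are removed."""
    items: list[tuple[str, list[str]]] = []
    for raw in transcript.splitlines():
        if not raw:
            continue
        tag, text = raw[:3], raw[3:]
        if tag not in ("C> ", "S> ", "D> "):
            raise ValueError(f"unrecognised example line: {raw!r}")
        channel = tag[0]
        if channel == "D" and items and items[-1][0] == "D":
            items[-1][1].append(text)  # same data connection as the line before
        else:
            items.append((channel, [text]))
    return items
```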