1. Overview
In SIP-based media networks (RFC 3261prop [10]), there is a need to provide basic network media services. Such services include playing announcements, initiating a media mixing session (conference), and prompting and collecting information with a user. These services are basic in nature, are few in number, and fundamentally have not changed in 25 years of enhanced telephony services. Moreover, given their elemental nature, one would not expect them to change in the future. Multifunction media servers provide network media services to clients using server protocols such as SIP, often in conjunction with markup languages such as VoiceXML [20] and MSCML [21]. This document describes how to identify to a multifunction media server what sort of session the client is requesting, without modifying the SIP protocol. It is critically important to note that the mechanism described here in no way modifies the SIP protocol, the meaning, or definition of a SIP Request URI, or does it put any restrictions, in any way, on devices that do not implement this convention. Announcements are media played to the user. Announcements can be static media files, media files generated in real-time, media streams generated in real-time, multimedia objects, or combinations of the above. Media mixing is the act of mixing different RTP streams, as described in RFC 3550std64 [13]. Note that the service described here suffices for simple mixing of media for a basic conferencing service. This service does not address enhanced conferencing services, such as floor control, gain control, muting, subconferences, etc. MSCML [21] addresses enhanced conferencing. However, that is beyond the scope of this document. Interested readers should read conferencing- framework [22] for details on the IETF SIP conferencing framework. Prompt and collect is where the server prompts the user for some information, as in an announcement, and then collects the user's response. This can be a one-step interaction, for example by playing an announcement, "Please enter your pass code", followed by collecting a string of digits. It can also be a more complex interaction, specified, for example, by VoiceXML [20] or MSCML [21].
1.1. Conventions Used in This Document
RFC 2119 [6] the interpretations for the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" found in this document.
