RFC 4313: Requirements for Distributed Control of A...
RFC-Ref

TTS



... text into audio, also known as Text-to-Speech (TTS). Many multimedia applications can benefit from having automatic speech recognition ...
... (ASR) and text-to-speech (TTS) processing available as a distributed, network resource. This requirements ...
... distributed control of ASR, SI/SV, and TTS servers. There is a broad range ...
... There is a broad range of systems that can benefit from a unified approach to control of TTS, ASR, and SI/SV. These include ...
... To date, there are a number of proprietary ASR and TTS APIs, as well as two IETF documents ...


... [Figure fragment: the "ASR, SI/SV, and/or TTS" server box linked by RTP to the Media Processing Entity] ...
... The "ASR, SI/SV, and/or TTS Server" is a network element that performs the back-end ...
... performs the back-end speech processing. It may generate an RTP stream as output based on text input (TTS) or return recognition results in response to an RTP stream as input (ASR ...
... the Application Server to control the ASR or TTS Server using SPEECHSC as a control protocol ...
... 11] gateway may combine the ASR and TTS functions on the same platform as the Media Processing Entity. Note that VoiceXML ...
... SPEECHSC. The following subsections provide a number of example use cases of the SPEECHSC, one each for TTS, ASR, and SI/SV. ...
... TTS Example ...
... [Figure fragment: SPEECHSC TTS Server box] ...
... would most likely just output a busy signal to the POTS phone. However, with SPEECHSC access to a TTS server, it can provide a spoken error message. The VoIP ...
... SPEECHSC server, open an RTP stream between itself and the server, and issue a TTS request for the error message, which will be played to the user on the POTS phone. ...
... [Figure fragment: IP Phone linked to the SPEECHSC TTS Server] ...


... RTP channel for TTS, an inbound for ASR, and a different inbound for SI/SV ...
... modalities of SPEECHSC may be inconvenient or simply inappropriate for disabled users. Hearing-impaired individuals may find TTS of limited utility. Speech-impaired users may be unable to make use of ASR ...
... SPEECHSC framework what speech process produced the output. For example, an RTP stream containing the spoken output of TTS should be identifiable as TTS output, and the recognized utterance of ASR ...
... RTP stream containing the spoken output of TTS should be identifiable as TTS output, and the recognized utterance of ASR should be identifiable as having been produced by ASR ...


... TTS Requirements ...
... Application Server, using a control protocol, to request the TTS Server to play back text as voice in an RTP stream. ...
... The SPEECHSC framework MAY assume that all TTS servers are capable of reading plain text. For reading plain text, framework MUST allow the ...
... tags. The framework assumes all TTS servers are capable of reading SSML ...
... formatted text. Internationalization of TTS in the SPEECHSC framework, including multi-lingual output within a single utterance, is accomplished via SSML ...
... The SPEECHSC framework assumes all TTS servers accept text over the SPEECHSC connection ...
... element (ASR, TTS, etc.). ...


... system is that user perception of the quality of the interaction depends strongly on the ability of the user to interrupt a prompt or rendered TTS with speech. Interrupting, or barging, the speech output requires more than energy detection from the user's direction. Many advanced systems halt the media towards the user by employing ...
... o Combination in series of engines that may then act on the input or output of ASR, TTS, or Speaker recognition engines. The control MAY then extend beyond such engines to include other audio input ...


... SDP. TTS looks very much like playing back a file. Extending RTSP looks promising for when one requires VCR controls ...


