RFC 4313: Requirements for Distributed Control of A...
RFC-Ref

ASR



... control of network elements that perform Automated Speech Recognition (ASR), speaker identification or verification (SI/SV), and rendering ...
... applications can benefit from having automatic speech recognition (ASR) and text-to-speech (TTS) processing available as a distributed, ...
... network resource. This requirements document limits its focus to the distributed control of ASR, SI/SV, and TTS servers. ...
... range of systems that can benefit from a unified approach to control of TTS, ASR, and SI/SV. These include environments such as Voice over IP ...
... network. To date, there are a number of proprietary ASR and TTS APIs, as well ...


... [Diagram fragment: the Media Processing Entity linked by SPEECHSC to the ASR, SI/SV, and/or TTS Server] ...
... VoiceXML browser, or other control entity. The "ASR, SI/SV, and/or TTS Server" is a network element ...
... TTS) or return recognition results in response to an RTP stream as input (ASR, SI/SV). The "Application Server ...
... Media Processing Entity or the Application Server to control the ASR or TTS Server using SPEECHSC ...
... VoiceXML [11] gateway may combine the ASR and TTS functions on the same platform as the Media Processing Entity ...
... example use cases of the SPEECHSC, one each for TTS, ASR, and SI/SV. They are intended to be illustrative only, and not to imply any ...
... application server using the SPEECHSC framework to supply an ASR-based user interface through an Interactive Voice Response ...
... [Diagram fragment: a VXML voice browser linked to an ASR Server; link labels SPEECHSC and RTP] ...
... media stream to the SPEECHSC ASR server and uses SPEECHSC to control the ASR server. When, for example, the user speaks the name of a stock in response to an IVR prompt, the SPEECHSC ASR server attempts recognition of the name, and returns the results to the VXML gateway. The VXML gateway ...


... RTP channel for TTS, an inbound for ASR, and a different inbound for SI/SV (e.g., if processed by different elements ...
... TTS of limited utility. Speech-impaired users may be unable to make use of ASR or SI/SV capabilities. Therefore, systems employing SPEECHSC ...
... TTS should be identifiable as TTS output, and the recognized utterance of ASR should be identifiable as having been produced by ASR processing. ...


... Media streams MAY source & sink from the controlled element (ASR, TTS, etc.). ...


... ASR Requirements ...
... Media Processing Entity or Application Server to request the ASR Server to perform automatic speech recognition on an RTP stream, returning the results over ...
... The SPEECHSC framework assumes that all ASR servers support the VoiceXML speech recognition grammar specification ...
... The SPEECHSC framework assumes all ASR servers are capable of accepting grammar specifications either "by value" (embedded in the protocol) or "by reference" (e.g., by de-referencing a URI ...
... The SPEECHSC framework MUST support a method directing the ASR Server to capture the input media stream for later analysis and tuning of the ASR engine. ...
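
The "by value" versus "by reference" grammar distinction in the excerpts above can be illustrated with a small sketch. This is not part of RFC 4313 or of any SPEECHSC protocol; the names GrammarSpec, resolve, and fetch are hypothetical and exist only for illustration. It assumes a grammar is either carried inline or identified by a URI that the server de-references.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class GrammarSpec:
    """Hypothetical container: exactly one of inline_text or uri is set."""
    inline_text: Optional[str] = None  # grammar passed "by value"
    uri: Optional[str] = None          # grammar passed "by reference"

    def __post_init__(self) -> None:
        # Reject specs that set both fields or neither field.
        if (self.inline_text is None) == (self.uri is None):
            raise ValueError("set exactly one of inline_text or uri")

    @property
    def mode(self) -> str:
        return "by-value" if self.inline_text is not None else "by-reference"


def resolve(spec: GrammarSpec, fetch: Callable[[str], str]) -> str:
    """Return the grammar text, de-referencing the URI only when needed.

    fetch is a caller-supplied function (an assumption of this sketch)
    that retrieves the document behind a URI.
    """
    if spec.inline_text is not None:
        return spec.inline_text
    return fetch(spec.uri)
```

A caller would build GrammarSpec(inline_text=...) for a grammar embedded in the request, or GrammarSpec(uri=...) for one the server should fetch itself.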


... output requires more than energy detection from the user's direction. Many advanced systems halt the media towards the user by employing the ASR engine to decide if an utterance is likely to be real speech, as opposed to a cough, for example. ...
... To achieve low latency between utterance detection and halting of playback, many implementations combine the speaking and ASR functions. The SPEECHSC framework MUST support such full-duplex ...
... voice verification (using ASR and verification simultaneously). Using ASR and verification on the same utterance is in fact the only way to support rolling or ...
... o Combination in series of engines that may then act on the input or output of ASR, TTS, or Speaker recognition engines. The control MAY then extend beyond such engines to include other audio ...
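
The barge-in behaviour described in the excerpts above (halting playback only when the ASR engine judges an utterance to be real speech, rather than on energy detection alone) can be sketched as follows. This is a minimal illustration, not a SPEECHSC mechanism; Playback, handle_detection, the speech_likelihood score, and the 0.5 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Playback:
    """Hypothetical stand-in for the prompt being played to the user."""
    playing: bool = True

    def halt(self) -> None:
        self.playing = False


def handle_detection(playback: Playback,
                     energy_detected: bool,
                     speech_likelihood: float,
                     threshold: float = 0.5) -> bool:
    """Halt playback only if energy is present AND the recognizer's
    (assumed) likelihood score says the sound is real speech.

    Returns True when playback was halted by this call.
    """
    if playback.playing and energy_detected and speech_likelihood >= threshold:
        playback.halt()
        return True
    return False
```

With these assumed values, a cough scored at 0.1 leaves the prompt playing, while an utterance scored at 0.9 halts it.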


... modification. ASR has an entirely different set of characteristics. For barge-in support, ASR requires real-time return of intermediate results. Barring the discovery of a good reuse model for an existing protocol, ...


