ASR
... control of network elements that perform Automated Speech Recognition
(ASR), speaker identification or verification (SI/SV), and rendering
...
... applications can benefit from having automatic speech recognition
(ASR) and text-to-speech (TTS) processing available as a distributed,
...
... network resource. This requirements document limits its focus to the
distributed control of ASR, SI/SV, and TTS servers.
...
... range of systems that can benefit from a unified
approach to control of TTS, ASR, and SI/SV. These include
environments such as Voice over IP ...
... +------------+ / \ +-------------+
| Media |/ SPEECHSC \---| ASR, SI/SV, |
| Processing |-------------------------| and/or TTS ...
... VoiceXML browser, or other control entity.
The "ASR, SI/SV, and/or TTS Server" is a network element ...
... TTS) or return recognition
results in response to an RTP stream as input (ASR, SI/SV). The
"Application Server ...
... Media Processing Entity or
the Application Server to control the ASR or TTS Server using
SPEECHSC ...
... VoiceXML [11] gateway may combine the ASR and TTS functions on the
same platform as the Media Processing Entity ...
... example use cases of the SPEECHSC, one each for TTS, ASR, and SI/SV.
They are intended to be illustrative only, and not to imply any
...
... application server using the SPEECHSC framework to supply
an ASR-based user interface through an Interactive Voice Response
...
... ASR server and uses SPEECHSC to control
the ASR server.
When, for example, the user speaks the name of a stock in response to
...
... an IVR prompt, the SPEECHSC ASR server attempts recognition of the
name, and returns the results to the VXML gateway. The VXML gateway ...
... RTP channel for TTS, an inbound for ASR, and a different inbound for
SI/SV (e.g., if processed by different elements ...
... TTS of
limited utility. Speech-impaired users may be unable to make use of
ASR or SI/SV capabilities. Therefore, systems employing SPEECHSC
...
... TTS should be
identifiable as TTS output, and the recognized utterance of ASR
should be identifiable as having been produced by ASR processing.
...
... ASR Requirements ...
... Media Processing Entity or
Application Server to request the ASR Server to perform automatic
speech recognition on an RTP stream, returning the results over
...
...
The SPEECHSC framework assumes that all ASR servers support the
VoiceXML speech recognition grammar specification ...
...
The SPEECHSC framework assumes all ASR servers are capable of
accepting grammar specifications either "by value" (embedded in the
protocol) or "by reference" (e.g., by de-referencing a URI ...
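As an illustration only, the "by value" versus "by reference" distinction above can be sketched as a small data type. The class name GrammarSpec and its fields are hypothetical and are not defined by SPEECHSC; the sketch assumes exactly one of the two forms is supplied per grammar.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GrammarSpec:
    """Hypothetical container for a grammar handed to an ASR server."""

    inline_text: Optional[str] = None  # "by value": grammar embedded in the request
    uri: Optional[str] = None          # "by reference": server de-references this URI

    def __post_init__(self) -> None:
        # Assumption of this sketch: exactly one form must be present.
        if (self.inline_text is None) == (self.uri is None):
            raise ValueError("set exactly one of inline_text or uri")

    @property
    def mode(self) -> str:
        return "by-value" if self.inline_text is not None else "by-reference"
```

A server-side handler could branch on mode to decide whether it must fetch the grammar before starting recognition.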
... The SPEECHSC framework MUST support a method directing the ASR Server
to capture the input media stream ...
... output requires more than energy detection from the user's direction.
Many advanced systems halt the media towards the user by employing
the ASR engine to decide if an utterance is likely to be real speech,
as opposed to a cough, for example.
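The barge-in decision described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a mechanism defined by SPEECHSC: the FrameEvidence type, the speech_likelihood score (standing in for whatever the ASR engine reports), and all thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FrameEvidence:
    """Hypothetical per-frame evidence from the user's direction."""

    energy_db: float          # signal energy of the frame
    speech_likelihood: float  # assumed ASR-engine score in [0, 1]


def should_halt_playback(
    frames: Iterable[FrameEvidence],
    energy_threshold_db: float = -40.0,  # assumed value
    speech_threshold: float = 0.7,       # assumed value
    min_frames: int = 3,                 # assumed value
) -> bool:
    """Return True once enough consecutive frames look like real speech.

    Energy alone is not sufficient: a loud frame that the ASR engine
    scores as unlikely speech (e.g. a cough) resets the counter.
    """
    consecutive = 0
    for frame in frames:
        loud = frame.energy_db >= energy_threshold_db
        speech_like = frame.speech_likelihood >= speech_threshold
        if loud and speech_like:
            consecutive += 1
            if consecutive >= min_frames:
                return True
        else:
            consecutive = 0
    return False
```

With these assumed thresholds, a short loud burst with a low speech score leaves playback running, while three consecutive loud, speech-like frames halt it.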
...
... To achieve low latency between utterance detection and halting of
playback, many implementations combine the speaking and ASR
functions. The SPEECHSC framework MUST support such full-duplex ...
... ASR and verification simultaneously). Using ASR and verification on
the same utterance is in fact the only way to support rolling or
...
...
o Combination in series of engines that may then act on the input or
output of ASR, TTS, or Speaker recognition engines. The control
MAY then extend beyond such engines to include other audio ...
... modification.
... ASR has an entirely different set of characteristics. For barge-in
support, ASR requires real-time return of intermediate results.
Barring the discovery of a good reuse model for an existing protocol,
...
