1
Integrating Nuance and Trindikit
David Hjelm
2003-03-20
2
Nuance
– Speech recognition, voice authentication and text-to-speech engines
– APIs for creating speech-recognition and text-to-speech clients in Java, C++ and C
3
Trindikit
– Framework for building dialogue systems
– Written in SICStus Prolog
– Contains predefined modules for input, output, interpretation, etc.
4
Trindikit text input/output modules
– input_simpletext reads input from the screen and stores it in the input variable.
– output_simpletext reads output from the output variable and prints it on the screen.
To use Nuance speech recognition and speech synthesis instead, the input and output modules must communicate with a Nuance process, since no Nuance APIs for SICStus exist.
5
Solution: OAA
– OAA enables communication between Java and SICStus.
– SICStus and Java processes register as agents with the same OAA facilitator.
– Each agent declares a set of solvables to the facilitator. Solvables are declared using Prolog-like syntax.
– Agents can pose queries to the OAA community by calling solve(Query).
– The facilitator tries to find an agent that has declared a solvable matching Query. If it finds one, the Query is delegated to that agent, which then tries to solve it.
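The delegation idea above can be illustrated with a small, self-contained Java sketch. It is a toy model only and does not use the OAA library: the class ToyFacilitator, its methods declare and solve, and the matching by functor name and arity (instead of real Prolog-style unification) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical toy model of facilitator-style delegation; NOT the OAA library API.
final class ToyFacilitator {
    /** One declared solvable: who declared it, its functor/arity, and the code that answers it. */
    private static final class Solvable {
        final String agent;
        final String functor;
        final int arity;
        final Function<List<String>, String> handler;

        Solvable(String agent, String functor, int arity, Function<List<String>, String> handler) {
            this.agent = agent;
            this.functor = functor;
            this.arity = arity;
            this.handler = handler;
        }
    }

    private final List<Solvable> solvables = new ArrayList<>();

    /** An agent declares a solvable to the facilitator. */
    void declare(String agent, String functor, int arity, Function<List<String>, String> handler) {
        solvables.add(new Solvable(agent, functor, arity, handler));
    }

    /** Delegates the query to the first agent whose solvable matches by functor and arity. */
    Optional<String> solve(String functor, List<String> arguments) {
        for (Solvable s : solvables) {
            if (s.functor.equals(functor) && s.arity == arguments.size()) {
                return Optional.of(s.handler.apply(arguments));
            }
        }
        return Optional.empty(); // no agent has declared a matching solvable
    }
}
```

In this sketch, an agent would call declare once per solvable at start-up, and any other agent could then call solve without knowing which agent answers.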
6
OAA Nuance Agents
These OAA agents are provided in the latest distribution of Trindikit:
– OAANuanceSpeechChannel: OAA Java agent which provides NuanceSpeechChannel (Nuance Java API) functionality to the OAA community
– oaa_recserver: OAA Prolog agent which can control a Nuance recognition server
– oaa_vocalizer: OAA Prolog agent which can control a Nuance TTS server
7
Trindikit Java OAA agents
– To simplify the writing of new OAA agents, a base class for OAA agents, OAAAgent, is used. It is extended by the classes that implement specific agents.
– An OAAAgent has a number of states which it can be in. For each state a set of solvables is defined.
– If the facilitator delegates a solve(Query) request to the agent, the agent iterates through the solvables defined for its current state to find one that unifies with Query.
– The code that solves a solve(Query) request is implemented in a wrapper class, OAASolver, which defines the method solve. Each OAASolver defines a specific solvable.
– OAASolvers are added to the agent via the addSolver method, which defines the pre-state(s) and post-state(s) of the OAASolver.
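The state-dependent dispatch described above might look roughly like the following sketch. It is not the Trindikit code: the names StateAgentSketch, Solver and SAME_STATE are hypothetical, the signatures of the real OAAAgent, OAASolver and addSolver are not shown in these slides, and the sketch simplifies in two ways (it matches solvables by name instead of unifying with Query, and it allows only one post-state per solver).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of state-dependent solver dispatch; not the Trindikit OAAAgent/OAASolver code.
final class StateAgentSketch {
    /** Code that answers one solvable; returns true if the query was solved. */
    interface Solver {
        boolean solve(List<String> arguments);
    }

    /** A solver together with the states it is valid in and the state it leads to. */
    private static final class Entry {
        final String name;
        final Set<Integer> preStates;
        final int postState;
        final Solver solver;

        Entry(String name, Set<Integer> preStates, int postState, Solver solver) {
            this.name = name;
            this.preStates = preStates;
            this.postState = postState;
            this.solver = solver;
        }
    }

    /** Marker meaning "stay in the current state" (an assumption of this sketch). */
    static final int SAME_STATE = -1;

    private final List<Entry> entries = new ArrayList<>();
    private int state;

    StateAgentSketch(int initialState) {
        this.state = initialState;
    }

    /** Registers a solver with its pre-states and its post-state. */
    void addSolver(String name, Set<Integer> preStates, int postState, Solver solver) {
        entries.add(new Entry(name, preStates, postState, solver));
    }

    /** Tries only the solvers that are valid in the current state; moves to the post-state on success. */
    boolean solve(String name, List<String> arguments) {
        for (Entry e : entries) {
            if (e.preStates.contains(state) && e.name.equals(name)) {
                boolean solved = e.solver.solve(arguments);
                if (solved && e.postState != SAME_STATE) {
                    state = e.postState;
                }
                return solved;
            }
        }
        return false; // nothing defined for this state matches the query
    }

    int state() {
        return state;
    }
}
```

Keeping the pre- and post-states next to each solver means the agent itself needs no per-solvable logic: a request that is not valid in the current state simply finds no matching solver.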
8
OAANuanceSpeechChannel
– OAANuanceSpeechChannel is a Java OAA agent which extends OAAAgent.
– Another implemented agent is OAAVcr (used in the ILT project), a software VCR agent which can record TV programs (captured using a TV card).
9
OAANuanceSpeechChannel states
NuanceSpeechChannel offers different functionality depending on its configuration. For example, if it uses a telephony-based audio provider, a call has to be answered before recognition can take place. This is mirrored by the four states (represented as int constants) of OAANuanceSpeechChannel:
0 - STOPPED: There is no speech channel yet.
1 - TEL_IDLE: A speech channel using a telephony audio provider has been created. Currently not in a call.
2 - TEL_RUNNING: A speech channel using a telephony audio provider has been created. Currently in a call.
3 - NATIVE_RUNNING: A speech channel using the native audio provider has been created.
10
OAANuanceSpeechChannel solvables
The solvables of OAANuanceSpeechChannel are:
nscCreate(+Package,+Parameters) (creates a new SpeechChannel)
  pre-state: STOPPED
  post-state: TEL_IDLE or NATIVE_RUNNING (depending on Parameters)
nscClose (closes the SpeechChannel)
  pre-state: TEL_IDLE, TEL_RUNNING or NATIVE_RUNNING
  post-state: STOPPED
nscPlayAndRecognize(+Grammar,?RecResult)
  pre-state: TEL_RUNNING or NATIVE_RUNNING
  post-state: same as pre-state
nscRecognizeFile(+Filename,+Grammar,?RecResult)
  pre-state: TEL_RUNNING or NATIVE_RUNNING
  post-state: same as pre-state
nscAppendTTS(+Text)
  pre-state: TEL_RUNNING or NATIVE_RUNNING
  post-state: same as pre-state
11
OAANuanceSpeechChannel solvables
nscPlay(+Bool)
  pre-state: TEL_RUNNING or NATIVE_RUNNING
  post-state: same as pre-state
nscStartPlay
  pre-state: TEL_RUNNING or NATIVE_RUNNING
  post-state: same as pre-state
nscSetParameter(+Name,+Value)
  pre-state: TEL_IDLE, TEL_RUNNING or NATIVE_RUNNING
  post-state: same as pre-state
nscGetParameter(+Name,?Value)
  pre-state: TEL_IDLE, TEL_RUNNING or NATIVE_RUNNING
  post-state: same as pre-state
nscGetAllGrammars(?Grammars)
  pre-state: TEL_IDLE, TEL_RUNNING or NATIVE_RUNNING
  post-state: same as pre-state
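As a compact summary, the pre-states listed for the OAANuanceSpeechChannel solvables can be written down as a lookup table. The sketch below is illustrative only: the class NscStateTable and its method allowed are hypothetical names, while the solvable names, the state names and the int values 0 to 3 are taken from the slides.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical lookup table of the pre-states listed on the slides; not part of Trindikit.
final class NscStateTable {
    static final int STOPPED = 0;
    static final int TEL_IDLE = 1;
    static final int TEL_RUNNING = 2;
    static final int NATIVE_RUNNING = 3;

    // States in which audio can be played or recognized.
    private static final Set<Integer> RUNNING = Set.of(TEL_RUNNING, NATIVE_RUNNING);
    // States in which a speech channel exists at all.
    private static final Set<Integer> CREATED = Set.of(TEL_IDLE, TEL_RUNNING, NATIVE_RUNNING);

    static final Map<String, Set<Integer>> PRE_STATES = new LinkedHashMap<>();
    static {
        PRE_STATES.put("nscCreate", Set.of(STOPPED));
        PRE_STATES.put("nscClose", CREATED);
        PRE_STATES.put("nscPlayAndRecognize", RUNNING);
        PRE_STATES.put("nscRecognizeFile", RUNNING);
        PRE_STATES.put("nscAppendTTS", RUNNING);
        PRE_STATES.put("nscPlay", RUNNING);
        PRE_STATES.put("nscStartPlay", RUNNING);
        PRE_STATES.put("nscSetParameter", CREATED);
        PRE_STATES.put("nscGetParameter", CREATED);
        PRE_STATES.put("nscGetAllGrammars", CREATED);
    }

    /** Returns true if the named solvable is listed as available in the given state. */
    static boolean allowed(String solvable, int state) {
        Set<Integer> pre = PRE_STATES.get(solvable);
        return pre != null && pre.contains(state);
    }
}
```

Only nscCreate and nscClose change the state; every other solvable leaves the agent in the state it was called in.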
12
SpeechChannel events
Some NuanceSpeechChannel methods fire events, e.g. when the user starts speaking. When such an event occurs, OAANuanceSpeechChannel posts a query to the OAA community whose name is the name of the Java event, transcribed as closely as possible, with an 'nsc' prefix. Other agents can declare these queries as solvables and implement code that handles the events.
nscStartOfSpeechEvent(SafeOffsetSecs,ActualOffsetSecs)
nscEndOfSpeechEvent(SafeOffsetSecs,ActualOffsetSecs)
nscPartialResultEvent(RecResult)
nscPlaybackStartedEvent
nscPlaybackStoppedEvent(Reason,Tones)
nscTerminationEvent(Reason)
nscCallConnectedEvent (todo)
nscDTMFEvent(Tones) (todo)
nscHungupEvent(Side,Reason) (todo)
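The naming convention for event queries can be sketched as a small helper. This is an assumption-laden illustration, not the agent's code: the class NscEventQuery and the method toQuery are hypothetical, and the sketch assumes the Java event name is already available as a plain string such as "StartOfSpeechEvent".

```java
import java.util.List;

// Hypothetical helper illustrating the 'nsc' + event-name convention; not Trindikit code.
final class NscEventQuery {
    /** Builds the query text for an event name and its (already stringified) arguments. */
    static String toQuery(String eventName, List<String> arguments) {
        String functor = "nsc" + eventName;
        if (arguments.isEmpty()) {
            return functor; // events without arguments become plain atoms
        }
        return functor + "(" + String.join(",", arguments) + ")";
    }
}
```

For example, toQuery("PlaybackStoppedEvent", List.of("Reason", "Tones")) yields the text nscPlaybackStoppedEvent(Reason,Tones).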
13
oaa_recserver
oaa_recserver is a Prolog OAA agent which controls a Nuance recognition server process. Solvables are:
nrsStart(+Packages,+Params): Starts a recserver process using packages Packages and parameters Params. Format of Packages and Params is described below.
nrsStop: Stops the recserver process.
nrsGetPackages(?Packages): Returns the currently loaded recognition packages.
nrsGetState(?State): Returns the current state (stopped or running).
14
oaa_vocalizer
oaa_vocalizer is a Prolog OAA agent which controls a Nuance vocalizer process. Solvables are:
nvocStart(+Params): Starts a vocalizer process. Params can be any command-line arguments.
nvocStop: Stops the vocalizer process.
nvocGetState(?State): Returns the current state (stopped or running).
15
Integrating it into Trindikit
– Trindikit provides a specific OAA resource, oaag, which can be used to make queries to the OAA community.
– Input and output modules specific to OAA+Nuance have been written which make use of oaag.
– A speech recognition grammar resource type, asr_grammar, keeps track of which speech recognition grammar Nuance should try to load.
16
input_nuance_basic_oaa
– Calls an OAA agent which performs speech recognition. Also communicates with a Nuance recserver OAA agent.
– Assumes that if a Nuance grammar contains the top-level symbol '.Top', it has been compiled into a recognition package named 'top'.
– To perform recognition using package 'top', a Trindikit resource of type asr_grammar should be selected in the configuration file.
– For all selected resources of type asr_grammar, the corresponding packages will be loaded onto a recserver. The recclients are created at runtime.
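The naming assumption between grammars and packages can be made concrete with a one-line mapping. The slides give only the single example '.Top' to 'top'; the general rule used below (drop the leading dot, lowercase the rest) is a guess that is consistent with that example, and the names PackageNames and packageFor are hypothetical.

```java
import java.util.Locale;

// Hypothetical mapping from a top-level grammar symbol to a package name.
// Only '.Top' -> 'top' is confirmed by the slides; the general rule is an assumption.
final class PackageNames {
    static String packageFor(String topLevelSymbol) {
        // Drop the leading '.' that marks a top-level grammar symbol, if present.
        String name = topLevelSymbol.startsWith(".") ? topLevelSymbol.substring(1) : topLevelSymbol;
        // Lowercase with a fixed locale so the result does not depend on system settings.
        return name.toLowerCase(Locale.ROOT);
    }
}
```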
17
output_nuance_basic_oaa
– Calls an OAA agent which performs TTS synthesis. Also communicates with a vocalizer OAA agent.
18
Future work
– Real ASR grammars in asr_grammar resources
– Trindikit integration with Regulus for converting feature-structure grammars to Nuance grammars
– Use of dynamic grammar compilation, so that no Nuance grammars have to be written and compiled in advance
– Integration with asynchronous Trindikit
– Intelligent barge-in, etc.