Integrating Nuance and Trindikit
David Hjelm, 2003-03-20

Nuance
- Speech recognition, voice authentication and text-to-speech engines
- APIs for creating speech-recognition and text-to-speech clients in Java, C++ and C

Trindikit
- Framework for building dialogue systems
- Written in SICStus Prolog
- Contains predefined modules for input, output, interpretation, etc.

Trindikit text input/output modules
- input_simpletext reads input from the screen and stores it in the input variable.
- output_simpletext reads output from the output variable and prints it on the screen.
- To use Nuance speech recognition and speech synthesis instead, the input and output modules must communicate with a Nuance process, since no Nuance APIs for SICStus exist.

Solution: OAA
- OAA (Open Agent Architecture) enables communication between Java and SICStus.
- SICStus and Java processes register as agents with the same OAA facilitator.
- Each agent declares a set of solvables to the facilitator, using Prolog-like syntax.
- Agents pose queries to the OAA community by calling solve(Query).
- The facilitator tries to find an agent that has declared a solvable matching Query. If it finds one, it delegates Query to that agent, which then tries to solve it.
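The delegation step described on this slide can be sketched as a toy model. This is a minimal illustration, not the OAA library: ToyFacilitator, declare and the handler type are hypothetical names, and matching is reduced to comparing functor name and arity, whereas the real facilitator matches Prolog-like terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Toy stand-in for an OAA facilitator. This is NOT the OAA library API. */
final class ToyFacilitator {

    /** One declared solvable: functor name, arity, and the agent code behind it. */
    private static final class Solvable {
        final String functor;
        final int arity;
        final Function<List<String>, String> handler;

        Solvable(String functor, int arity, Function<List<String>, String> handler) {
            this.functor = functor;
            this.arity = arity;
            this.handler = handler;
        }
    }

    private final List<Solvable> solvables = new ArrayList<>();

    /** An agent declares a solvable; the handler stands in for the agent's code. */
    void declare(String functor, int arity, Function<List<String>, String> handler) {
        solvables.add(new Solvable(functor, arity, handler));
    }

    /** Delegates the query to the first solvable with the same functor and arity. */
    Optional<String> solve(String functor, List<String> args) {
        for (Solvable s : solvables) {
            if (s.functor.equals(functor) && s.arity == args.size()) {
                return Optional.of(s.handler.apply(args));
            }
        }
        return Optional.empty(); // no agent declared a matching solvable
    }

    public static void main(String[] argv) {
        ToyFacilitator facilitator = new ToyFacilitator();
        // Hypothetical agent that always reports "stopped".
        facilitator.declare("nrsGetState", 1, a -> "stopped");
        System.out.println(facilitator.solve("nrsGetState", Arrays.asList("State")));
        System.out.println(facilitator.solve("nvocStop", Collections.<String>emptyList()));
    }
}
```

The second query in main finds no declared solvable, so the toy facilitator returns an empty result instead of delegating.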

OAA Nuance Agents
These OAA agents are provided in the latest distribution of Trindikit:
- OAANuanceSpeechChannel: OAA Java agent which provides NuanceSpeechChannel (Nuance Java API) functionality to the OAA community
- oaa_recserver: OAA Prolog agent which can control a Nuance recognition server
- oaa_vocalizer: OAA Prolog agent which can control a Nuance TTS server

Trindikit Java OAA agents
- To simplify the writing of new OAA agents, a base class for OAA agents, OAAAgent, is used. Classes that implement an agent extend it.
- An OAAAgent has a number of states it can be in. For each state, a set of solvables is defined.
- If the facilitator delegates a solve(Query) request to the agent, the agent iterates through the solvables defined for its current state to find one that unifies with Query.
- The code that handles a solve(Query) request is implemented in a wrapper class, OAASolver, which defines the method solve. Each OAASolver defines a specific solvable.
- OAASolvers are added to the agent via the addSolver method, which defines the pre-state(s) and post-state(s) of the OAASolver.
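The state-dependent dispatch described above might look roughly like the following sketch. SketchAgent and SketchSolver are hypothetical stand-ins for OAAAgent and OAASolver; the real signatures are not given in the slides, so every signature here is an assumption. Two simplifications: solvables are matched by name only rather than by unification, and each solver has a single post-state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical counterpart of OAAAgent: dispatches on the current state. */
class SketchAgent {
    /** Post-state marker meaning "stay in the current state". */
    static final int SAME_STATE = -1;

    private static final class Entry {
        final String solvable;
        final Set<Integer> preStates;
        final int postState;
        final SketchSolver solver;

        Entry(String solvable, Set<Integer> preStates, int postState, SketchSolver solver) {
            this.solvable = solvable;
            this.preStates = preStates;
            this.postState = postState;
            this.solver = solver;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private int state;

    SketchAgent(int initialState) {
        this.state = initialState;
    }

    int state() {
        return state;
    }

    /** Registers a solver together with its allowed pre-states and its post-state. */
    void addSolver(String solvable, Set<Integer> preStates, int postState, SketchSolver solver) {
        entries.add(new Entry(solvable, new HashSet<>(preStates), postState, solver));
    }

    /** Runs the first solver that is valid in the current state and matches the name. */
    boolean handle(String solvable, List<String> args) {
        for (Entry e : entries) {
            if (e.preStates.contains(state) && e.solvable.equals(solvable)) {
                boolean solved = e.solver.solve(args);
                if (solved && e.postState != SAME_STATE) {
                    state = e.postState; // move to the post-state only on success
                }
                return solved;
            }
        }
        return false; // nothing is solvable under that name in this state
    }

    public static void main(String[] argv) {
        final int CLOSED = 0;
        final int OPEN = 1;
        SketchAgent agent = new SketchAgent(CLOSED);
        agent.addSolver("open", Collections.singleton(CLOSED), OPEN, a -> true);
        agent.addSolver("close", Collections.singleton(OPEN), CLOSED, a -> true);
        System.out.println(agent.handle("close", Collections.<String>emptyList()));
        System.out.println(agent.handle("open", Collections.<String>emptyList()));
        System.out.println(agent.state());
    }
}

/** Hypothetical counterpart of OAASolver: the code behind one solvable. */
interface SketchSolver {
    boolean solve(List<String> args);
}
```

Keeping the pre-states with each registered solver means a request that arrives in the wrong state is rejected before any solver code runs.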

OAANuanceSpeechChannel
- OAANuanceSpeechChannel is a Java OAA agent which extends OAAAgent.
- Another implemented agent is OAAVcr (used in the ILT project), a software VCR agent which can record TV programs captured using a TV card.

OAANuanceSpeechChannel states
NuanceSpeechChannel offers different functionality depending on its configuration. For example, if it uses a telephony-based audio provider, a call has to be answered before recognition can take place. This is mirrored by the four states of OAANuanceSpeechChannel, which are represented as int constants:
  0 - STOPPED: There is no speech channel yet.
  1 - TEL_IDLE: A speech channel using a telephony audio provider has been created. Currently not in a call.
  2 - TEL_RUNNING: A speech channel using a telephony audio provider has been created. Currently in a call.
  3 - NATIVE_RUNNING: A speech channel using the native audio provider has been created.

OAANuanceSpeechChannel solvables
The solvables of OAANuanceSpeechChannel are:
- nscCreate(+Package,+Parameters): creates a new SpeechChannel
  pre-state: STOPPED
  post-state: TEL_IDLE or NATIVE_RUNNING (depending on Parameters)
- nscClose: closes the SpeechChannel
  pre-state: TEL_IDLE, TEL_RUNNING or NATIVE_RUNNING
  post-state: STOPPED
- nscPlayAndRecognize(+Grammar,?RecResult)
  pre-state: TEL_RUNNING or NATIVE_RUNNING
  post-state: same as pre-state
- nscRecognizeFile(+Filename,+Grammar,?RecResult)
  pre-state: TEL_RUNNING or NATIVE_RUNNING
  post-state: same as pre-state
- nscAppendTTS(+Text)
  pre-state: TEL_RUNNING or NATIVE_RUNNING
  post-state: same as pre-state

OAANuanceSpeechChannel solvables
- nscPlay(+Bool)
  pre-state: TEL_RUNNING or NATIVE_RUNNING
  post-state: same as pre-state
- nscStartPlay
  pre-state: TEL_RUNNING or NATIVE_RUNNING
  post-state: same as pre-state
- nscSetParameter(+Name,+Value)
  pre-state: TEL_IDLE, TEL_RUNNING or NATIVE_RUNNING
  post-state: same as pre-state
- nscGetParameter(+Name,?Value)
  pre-state: TEL_IDLE, TEL_RUNNING or NATIVE_RUNNING
  post-state: same as pre-state
- nscGetAllGrammars(?Grammars)
  pre-state: TEL_IDLE, TEL_RUNNING or NATIVE_RUNNING
  post-state: same as pre-state
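The pre-states listed on the two solvables slides can be collected into a lookup table, which makes it easy to check whether a request is valid in a given state. The sketch below is illustrative only: NscPreStates and allowed are hypothetical names, and an enum is used in place of the int constants (the enum order follows the constants 0 to 3).

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Lookup table of the pre-states listed on the slides (hypothetical helper class). */
final class NscPreStates {
    /** Declared in the same order as the int constants 0..3. */
    enum State { STOPPED, TEL_IDLE, TEL_RUNNING, NATIVE_RUNNING }

    private static final Map<String, Set<State>> PRE_STATES = new HashMap<>();

    static {
        Set<State> running = EnumSet.of(State.TEL_RUNNING, State.NATIVE_RUNNING);
        Set<State> created =
                EnumSet.of(State.TEL_IDLE, State.TEL_RUNNING, State.NATIVE_RUNNING);
        PRE_STATES.put("nscCreate", EnumSet.of(State.STOPPED));
        PRE_STATES.put("nscClose", created);
        PRE_STATES.put("nscPlayAndRecognize", running);
        PRE_STATES.put("nscRecognizeFile", running);
        PRE_STATES.put("nscAppendTTS", running);
        PRE_STATES.put("nscPlay", running);
        PRE_STATES.put("nscStartPlay", running);
        PRE_STATES.put("nscSetParameter", created);
        PRE_STATES.put("nscGetParameter", created);
        PRE_STATES.put("nscGetAllGrammars", created);
    }

    /** True if the named solvable may be requested while in the given state. */
    static boolean allowed(String solvable, State current) {
        Set<State> pre = PRE_STATES.get(solvable);
        return pre != null && pre.contains(current);
    }

    public static void main(String[] argv) {
        // Recognition needs a running channel, so TEL_IDLE is rejected.
        System.out.println(allowed("nscPlayAndRecognize", State.TEL_IDLE));
        System.out.println(allowed("nscPlayAndRecognize", State.TEL_RUNNING));
    }
}
```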

SpeechChannel events
Some NuanceSpeechChannel methods fire events, e.g. when the user starts speaking. When such an event occurs, OAANuanceSpeechChannel posts a query to the OAA community. The query is a transcription of the Java event, as close to the original as possible, with an 'nsc' prefix. Other agents can declare these queries as solvables and implement code that handles the events.
- nscStartOfSpeechEvent(SafeOffsetSecs,ActualOffsetSecs)
- nscEndOfSpeechEvent(SafeOffsetSecs,ActualOffsetSecs)
- nscPartialResultEvent(RecResult)
- nscPlaybackStartedEvent
- nscPlaybackStoppedEvent(Reason,Tones)
- nscTerminationEvent(Reason)
- nscCallConnectedEvent (todo)
- nscDTMFEvent(Tones) (todo)
- nscHungupEvent(Side,Reason) (todo)
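As a rough illustration of the naming scheme, the helper below builds a query string from an event name and its arguments. NscEventQuery and toQuery are hypothetical, and the unprefixed event names passed in are assumed from the query names on the slide; how the agent actually serializes arguments is not stated in the source.

```java
/** Builds 'nsc'-prefixed query strings from event names (hypothetical helper). */
final class NscEventQuery {

    /** Prefixes the event name with "nsc" and appends the arguments, if any. */
    static String toQuery(String eventName, Object... args) {
        StringBuilder sb = new StringBuilder("nsc").append(eventName);
        if (args.length > 0) {
            sb.append('(');
            for (int i = 0; i < args.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(args[i]);
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] argv) {
        // Assumed event names; the offsets are made-up sample values.
        System.out.println(toQuery("StartOfSpeechEvent", 0.5, 0.7));
        System.out.println(toQuery("PlaybackStartedEvent"));
    }
}
```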

oaa_recserver
oaa_recserver is a Prolog OAA agent which controls a Nuance recognition server process. Its solvables are:
- nrsStart(+Packages,+Params): Starts a recserver process using packages Packages and parameters Params. Format of Packages and Params is described below.
- nrsStop: Stops the recserver process.
- nrsGetPackages(?Packages): Returns the currently loaded recognition packages.
- nrsGetState(?State): Returns the current state (stopped or running).

oaa_vocalizer
oaa_vocalizer is a Prolog OAA agent which controls a Nuance vocalizer process. Its solvables are:
- nvocStart(+Params): Starts a vocalizer process. Params holds any command-line arguments.
- nvocStop: Stops the vocalizer process.
- nvocGetState(?State): Returns the current state (stopped or running).

Integrating it into Trindikit
- Trindikit provides a specific OAA resource, oaag, which can be used to make queries to the OAA community.
- Input and output modules specific to OAA+Nuance have been written which make use of oaag.
- A speech recognition grammar resource type, asr_grammar, keeps track of which speech recognition grammar Nuance should try to load.

input_nuance_basic_oaa
- Calls an OAA agent which performs speech recognition. Also communicates with a Nuance recserver OAA agent.
- Assumes that if a Nuance grammar contains the top-level symbol '.Top', it has been compiled into a recognition package named 'top'.
- To perform recognition using package 'top', a Trindikit resource of type asr_grammar should be selected in the configuration file.
- For all selected resources of type asr_grammar, the corresponding packages are loaded onto a recserver.
- The recclients are created at runtime.
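The naming assumption for grammars and packages could be expressed as a small helper. The slide gives only the single example '.Top' to 'top', so the general rule used here (drop the leading dot, lowercase the rest) is an assumption, and GrammarPackages and packageNameFor are hypothetical names.

```java
import java.util.Locale;

/** Maps a top-level grammar symbol to a package name (assumed rule, see lead-in). */
final class GrammarPackages {

    /** Example from the slide: ".Top" gives "top". */
    static String packageNameFor(String topLevelSymbol) {
        if (topLevelSymbol == null
                || topLevelSymbol.length() < 2
                || !topLevelSymbol.startsWith(".")) {
            throw new IllegalArgumentException(
                    "expected a top-level symbol such as .Top, got: " + topLevelSymbol);
        }
        // Assumed rule: strip the leading dot and lowercase the remainder.
        return topLevelSymbol.substring(1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] argv) {
        System.out.println(packageNameFor(".Top"));
    }
}
```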

output_nuance_basic_oaa
- Calls an OAA agent which performs TTS synthesis.
- Also communicates with a vocalizer OAA agent.

Future work
- Real ASR grammars in asr_grammar resources
- Trindikit integration with Regulus for converting feature structure grammars to Nuance grammars
- Use of dynamic grammar compilation, so that no Nuance grammars have to be written and compiled in advance
- Integration with asynchronous Trindikit
- Intelligent barge-in, etc.