Where do we go from here? Research and Commercial Spoken Dialog Systems. Roberto Pieraccini CTO, Tell-Eureka Corporation, New York, NY Juan Huerta IBM.

Where do we go from here? Research and Commercial Spoken Dialog Systems
Roberto Pieraccini, CTO, Tell-Eureka Corporation, New York, NY
Juan Huerta, IBM T. J. Watson Research Center, Yorktown Heights, NY

The Spoken Dialog Landscape
Diagram: dialog linguistics, voice user interface, and dialog engineering, spanning academic research and industrial R&D.

Two Different Goals
- Academic research: natural interaction, freedom of expression, the user is in control; unconstrained natural language understanding and mixed-initiative dialog.
- Industrial R&D: task completion, usability, the system is in control; cost, time to market, business model, interoperability, and standards; hand-crafted grammars and directed dialog.

FAQs
Aren't human-like, free-form conversational systems more usable?
It depends.
- Speech recognition and understanding technology is still very limited.
- Even if we had perfect technology, that wouldn't guarantee usability.
- Two extremes: human agents and DTMF.
- Human agents are trained to follow well-defined scripts.

FAQs
Isn't natural language always the best design choice? Don't users always want freedom of expression?
It depends.
- More freedom of expression means more speech recognition errors, and users hate errors.
- More freedom of expression makes it more difficult to correct errors and set the dialog back on track.
- Some applications do very well without freedom of expression; others don't.

FAQs
Directed dialog or mixed initiative? Shouldn't users always be able to control the course of the dialog?
It depends.
- Without guidance, most users would be lost and would not know what to say or what the system's capabilities are.
- Structured interactions (as opposed to free-form ones) show a reduced rate of speech disfluencies (Oviatt, 1995). Fewer disfluencies mean better ASR accuracy.
- Directed prompts make it possible to predict responses and tune grammars, which leads to less error-prone interactions.

A little bit of history
1995: The Birth of the Dialog Industry
- At that time the research world was searching for the holy grail of free-form, natural language spoken interaction (DARPA ATIS, Communicator).
- A couple of startup companies took a step back and realized that well-structured directed dialog can outperform free-form interaction for certain types of applications.
- They realized the value of Voice User Interface (VUI) design.
- A market for telephony-based speech applications started to appear and soon became a structured, mature industry.
- Industrial standards started to catch up with the convergence of speech and Web technology.

2005: The Commercial Spoken Dialog Landscape
- Technology vendors: speech recognition, TTS
- Platform integrators: IVR, VoiceXML, CTI,…
- Tools: authoring, tuning, prepackaged applications
- Application developers, professional services, hosting
- In 2004: $600M to $1,000M in revenue; more than 200 deployed applications in North America
- New, evolving standards guarantee interoperability of engines and platforms.

Two Different Architectures
- Academic research: general natural language understanding
- Industrial R&D: prompt-specific grammars

Architecture of a dialog system (the research view)
Diagram: speech recognizer (language models) → natural language understanding (semantic models) → dialog manager → text-to-speech synthesizer.
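
The research pipeline on this slide can be sketched as four chained functions. This is a minimal sketch under loud assumptions: RecognizeSpeech, UnderstandText, ManageDialog, SynthesizeSpeech, and RunTurn are hypothetical names, the "audio" is already a transcript, and keyword spotting stands in for real semantic models.

```cpp
#include <map>
#include <string>

using Meaning = std::map<std::string, std::string>;  // slot -> value

// Speech recognizer stand-in (hypothetical): input is already a transcript.
std::string RecognizeSpeech(const std::string& audio) { return audio; }

// NLU stand-in (hypothetical): keyword spotting instead of semantic models.
Meaning UnderstandText(const std::string& text) {
    Meaning m;
    auto has = [&text](const char* word) { return text.find(word) != std::string::npos; };
    if (has("balance")) m["intent"] = "balance";
    if (has("transfer")) m["intent"] = "transfer";
    if (has("checking")) m["account"] = "checking";
    if (has("savings")) m["account"] = "savings";
    return m;
}

// Dialog manager: merge the new meaning into the state, choose the next message.
std::string ManageDialog(Meaning& state, const Meaning& meaning) {
    for (const auto& [slot, value] : meaning) state[slot] = value;
    if (state.count("intent") == 0) return "Would you like your balance or a transfer?";
    if (state.count("account") == 0) return "Which account?";
    return "Handling " + state["intent"] + " for " + state["account"] + ".";
}

// TTS stand-in (hypothetical): returns the text that would be synthesized.
std::string SynthesizeSpeech(const std::string& text) { return text; }

// One caller turn through the pipeline: ASR -> NLU -> DM -> TTS.
std::string RunTurn(Meaning& state, const std::string& audio) {
    return SynthesizeSpeech(ManageDialog(state, UnderstandText(RecognizeSpeech(audio))));
}
```

Because the state persists across calls, "what is my balance" followed by "checking please" yields "Which account?" and then "Handling balance for checking."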

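Commercial Conversational Architecture
Diagram: the application server (holding prompts and grammars) sends each prompt and grammar to the voice browser platform (ASR, TTS, audio playback), which returns the recognition result.
Standards involved: VoiceXML, SRGS, SSML, MRCP, CCXML, EMMA?, SCXML?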

Speech and the Web: VoiceXML applications
Diagram: the VoiceXML browser sends an HTTP request over the Internet to a web server, which returns static VoiceXML pages; the web server connects to the backend.

Speech and the Web: VoiceXML applications
Diagram: the VoiceXML browser sends an HTTP request over the Internet to the web server; a backend application, with a dialog manager and the application state, generates each VoiceXML document dynamically.
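
A minimal sketch of dynamic VoiceXML generation, assuming a hypothetical TransferState struct on the server, a placeholder grammar file (CheckingOrSavings.grxml), and a placeholder submit URL (/dialog). The elements used (vxml, form, field, prompt, grammar, filled, submit, block) are standard VoiceXML 2.x; the rest is illustrative.

```cpp
#include <string>

// Hypothetical application state kept by the backend between HTTP requests.
struct TransferState {
    std::string origin;       // empty until collected
    std::string destination;  // empty until collected
};

const std::string kVxmlOpen =
    "<vxml version=\"2.1\" xmlns=\"http://www.w3.org/2001/vxml\">";

// Build the next VoiceXML document from the current state: ask for the first
// missing item and submit the answer back to the server (placeholder URL).
std::string NextVoiceXmlPage(const TransferState& st) {
    if (!st.origin.empty() && !st.destination.empty())
        return kVxmlOpen + "<form><block><prompt>Thank you.</prompt></block></form></vxml>";
    const std::string field = st.origin.empty() ? "origin" : "destination";
    const std::string prompt = st.origin.empty()
        ? "Which account do you want to transfer from?"
        : "Which account do you want to transfer to?";
    return kVxmlOpen + "<form><field name=\"" + field + "\">" +
           "<prompt>" + prompt + "</prompt>" +
           "<grammar src=\"CheckingOrSavings.grxml\" type=\"application/srgs+xml\"/>" +
           "<filled><submit next=\"/dialog\" namelist=\"" + field + "\"/></filled>" +
           "</field></form></vxml>";
}
```

Because the state lives on the server, each request can yield a different page; that is what separates this setup from serving static pages.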

Dialog Engineering: What is a Dialog Manager?
A loop: decide which function to call → get results → update state.

Dialog Engineering: What is a Dialog Manager?
A loop: decide which VoiceXML page to serve → get results → update state.
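
The decide / get results / update state cycle can be sketched as follows. DecidePage, UpdateState, RunDialog, and the page names are hypothetical; recognition results come from a scripted list rather than from HTTP requests submitted by a real voice browser.

```cpp
#include <string>
#include <vector>

// Hypothetical dialog state kept by the dialog manager.
struct DialogState {
    std::string intent;   // "balance" or "transfer"; empty until known
    std::string account;  // "checking" or "savings"; empty until known
};

// DECIDE which VoiceXML page to serve next, based only on the state.
std::string DecidePage(const DialogState& s) {
    if (s.intent.empty()) return "main_menu.vxml";
    if (s.account.empty()) return "which_account.vxml";
    return "report_" + s.intent + ".vxml";
}

// UPDATE STATE with the result returned by the page that was just served.
void UpdateState(DialogState& s, const std::string& page, const std::string& result) {
    if (page == "main_menu.vxml") s.intent = result;
    else if (page == "which_account.vxml") s.account = result;
}

// Run the cycle over scripted recognition results (GET RESULTS is simulated).
// Returns every page served, including the one served after the last answer.
std::vector<std::string> RunDialog(const std::vector<std::string>& results) {
    DialogState s;
    std::vector<std::string> served;
    for (const std::string& result : results) {
        const std::string page = DecidePage(s);
        served.push_back(page);
        UpdateState(s, page, result);
    }
    served.push_back(DecidePage(s));
    return served;
}
```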

Two Different Approaches to Dialog Management
- Academic research, dialog engine: an engine based on general dialog principles, configured for different applications.
- Industrial R&D, user experience: the user experience is completely specified by the designer and coded into the application platform.

VUI Completeness
- Successful commercial applications require detailed control of the user experience.
- No unpredictable behavior: every possible situation needs to be anticipated and specified.
- It is common practice in the industry to fully describe the Voice User Interface (VUI) in a specification document:
  - a graph with nodes and conditional transitions
  - system prompts and grammars specified in detail
- Design-develop-test cycles take place prior to full deployment.
- The programming paradigm should match the specification.

User Experience/VUI design
Call flow diagram: Welcome → Main Menu (Account Balance, Transfer, Bill Payments, Exit). The Transfer branch runs Get Origin Account → Get Destination Account → Get Amount, then checks "Transfer amount > origin account?": YES plays the Wrong Amount Message; NO plays the Confirmation. At "Confirmed?", YES goes to the Main Menu and NO asks "What is wrong?", leading back to the amount, the destination account, or the origin account.

User Experience/VUI design: Get Amount Interaction Module
Diagram: the transfer call flow from the previous slide, with the Get Amount node expanded into the specification below.
PROMPTS (type: wording [source])
- Initial: "Please say the amount you would like to transfer from your" [get_amount_I_1.wav] <origin-account> [TTS] "to your" [get_amount_I_2.wav] <destination-account> [TTS] "in dollars and cents." [get_amount_I_3.wav]
- Retry 1: same wording and sources as Initial
- Retry 2: "Please say the amount you would like to have transferred, like one hundred dollars and fifty cents." [get_amount_R_2_1.wav]
- Timeout 1: "I'm sorry, I didn't hear you." [get_amount_T_1_1.wav], followed by "Please say the amount you would like to transfer from your" [get_amount_I_1.wav] <origin-account> [TTS] "to your" [get_amount_I_2.wav] <destination-account> [TTS]
- Timeout 2: "I didn't hear you this time either. Please say the amount you would like to have transferred, like one hundred dollars and fifty cents." [get_amount_T_2_1.wav]
- Help: "Please say how much you wish to transfer. You can say the amount in dollars and cents, like, for instance, one hundred dollars and fifty cents." [get_amount_H.wav]
ACTIONS (condition: action)
- If the amount is greater than the amount in the origin account: go to "Play Wrong Amount Message"
- Otherwise: go to "Play Confirmation"
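
The Get Amount specification maps almost directly onto code. A sketch, assuming hypothetical names GetAmountPrompt and GetAmountNextNode; account names are passed as plain strings where the specification uses TTS segments, and the prerecorded .wav files are ignored.

```cpp
#include <string>

enum class PromptType { Initial, Retry1, Retry2, Timeout1, Timeout2, Help };

// Prompt wording from the specification table; origin/destination stand in
// for the <origin-account> and <destination-account> TTS segments.
std::string GetAmountPrompt(PromptType type, const std::string& origin,
                            const std::string& destination) {
    const std::string ask = "Please say the amount you would like to transfer from your " +
                            origin + " to your " + destination;
    const std::string example =
        "Please say the amount you would like to have transferred, "
        "like one hundred dollars and fifty cents.";
    switch (type) {
        case PromptType::Initial:
        case PromptType::Retry1:   return ask + " in dollars and cents.";
        case PromptType::Retry2:   return example;
        case PromptType::Timeout1: return "I'm sorry, I didn't hear you. " + ask + ".";
        case PromptType::Timeout2: return "I didn't hear you this time either. " + example;
        case PromptType::Help:
            return "Please say how much you wish to transfer. You can say the amount in "
                   "dollars and cents, like, for instance, one hundred dollars and fifty cents.";
    }
    return ask + " in dollars and cents.";
}

// ACTIONS: compare the requested amount with the origin account balance.
// Amounts are integer cents to avoid floating-point rounding.
std::string GetAmountNextNode(long amount_cents, long origin_balance_cents) {
    if (amount_cents > origin_balance_cents) return "Play Wrong Amount Message";
    return "Play Confirmation";
}
```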

The Speech Application Lifecycle
Phases: requirements, VUI design, usability, VUI development, speech science, high-level system design, system engineering, integration, partial deployment, full deployment.
Roles: analyst, VUI designer, speech scientist, architect, application developer, engineer, project manager.

Simple Things Should Be Easy, Difficult Things Should Be Possible
- Control: the dialog programming paradigm should allow detailed control of the VUI.
  - But a paradigm that is too low-level makes complex behavior hard to program.
- Expressiveness: complex behavior needs to be expressible in a simple way.
  - But built-in behavior may be hard to bypass.

When Things Go Wrong: Speech Error Control
- Speech recognition is not perfect and probably will not be in the foreseeable future.
- Speech recognition errors are extremely disruptive to the course of the dialog.
- Commercial dialog applications have developed robust strategies for error control:
  - directed dialog with matching grammars and prompts
  - two-step and one-step correction strategies
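
One common reading of the two correction strategies, sketched under assumptions: the item being confirmed is an account type, the caller's reply is already recognized and lowercased, and InterpretReply and NextMove are hypothetical names. In one-step correction the caller rejects and supplies the new value in the same turn ("no, savings"); in two-step correction a bare "no" makes the system ask for the value again.

```cpp
#include <optional>
#include <string>

// Interpretation of the caller's reply to "Did you say <value>?".
struct ConfirmReply {
    bool accepted = false;
    std::optional<std::string> correction;  // value given in the same turn, if any
};

// Assumes a recognized, lowercased reply about an account type.
ConfirmReply InterpretReply(const std::string& reply) {
    ConfirmReply r;
    if (reply == "yes") {
        r.accepted = true;
        return r;
    }
    const char* kValues[] = {"checking", "savings"};
    for (const char* v : kValues)
        if (reply.find(v) != std::string::npos) r.correction = std::string(v);
    return r;
}

// One-step: the correction is already in hand, so confirm the new value.
// Two-step: a bare rejection forces a separate turn to ask again.
std::string NextMove(const ConfirmReply& r) {
    if (r.accepted) return "proceed";
    if (r.correction) return "confirm " + *r.correction;
    return "ask again";
}
```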

Programmatic Dialog Management
A generic program implementing a VUI specification:
    if (answer == "balance") {
        PlayPrompt("CheckingOrSavings.wav");
        Recognize(&answer, "CheckingOrSavings.grm");
        if (answer == "checking") {
            GetCheckingAccountBalance(&balance);
            PlayComplexPrompt("BalanceOfCheckingIs.wav", balance);
        } else if (answer == "savings") {
            GetSavingsAccountBalance(&balance);
            PlayComplexPrompt("BalanceOfSavingsIs.wav", balance);
        }
    } else if (answer == "transfer") {
        PlayPrompt("SayFromAccount.wav");
        Recognize(&answer, "CheckingOrSavings.grm");
        if (answer == "checking") {...

Smart developers hate to do the same things over and over
- Build libraries of reusable functions
  - Dialog modules
    - Handle the full collection of one or more pieces of information (e.g., credit card, SSN, date,...).
    - Manage re-prompts, timeouts, disambiguation, data normalization, etc.
- Develop design patterns and styles
- Build sample code frameworks
  - State machine frameworks
  - Code examples, templates,...
- State machine engines
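
A reusable dialog module of the kind listed above might look like this. It is a hypothetical sketch: DialogModule, Escalate, and Collect are invented names, caller input is scripted ("" stands for a timeout, "help" for a help request), and the module gives up after a configurable number of combined errors.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical reusable dialog module that collects a single item.
struct DialogModule {
    std::string initial_prompt;
    std::string help_prompt;
    std::vector<std::string> retry_prompts;    // escalating, after a no-match (non-empty)
    std::vector<std::string> timeout_prompts;  // escalating, after silence (non-empty)
    // Data normalization: canonical value, or nullopt on a no-match.
    std::function<std::optional<std::string>(const std::string&)> normalize;
    int max_errors = 3;  // combined retries + timeouts before giving up
};

// n-th escalation level (1-based); the last prompt is reused once exhausted.
const std::string& Escalate(const std::vector<std::string>& prompts, int n) {
    return prompts[std::min<std::size_t>(static_cast<std::size_t>(n), prompts.size()) - 1];
}

// Runs the module over scripted inputs, appending every prompt played to
// `played`. Returns the normalized value, or nullopt if the module gives up.
std::optional<std::string> Collect(const DialogModule& m,
                                   const std::vector<std::string>& inputs,
                                   std::vector<std::string>& played) {
    played.push_back(m.initial_prompt);
    int retries = 0, timeouts = 0;
    for (const std::string& input : inputs) {
        if (input == "help") {
            played.push_back(m.help_prompt);
            continue;
        }
        if (input.empty()) {
            ++timeouts;
            if (retries + timeouts >= m.max_errors) return std::nullopt;
            played.push_back(Escalate(m.timeout_prompts, timeouts));
            continue;
        }
        if (auto value = m.normalize(input)) return value;
        ++retries;
        if (retries + timeouts >= m.max_errors) return std::nullopt;
        played.push_back(Escalate(m.retry_prompts, retries));
    }
    return std::nullopt;  // ran out of input (e.g., the caller hung up)
}
```

With a normalize function that accepts only "checking" or "savings", the inputs "", "blah", "savings" play the initial prompt, the first timeout prompt, and the first retry prompt before returning "savings".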

State Machine Call Flow
- A call flow is the simplest state machine model.
  - Nodes correspond to prompts.
  - Arcs correspond to user choices.
  - Nodes roughly correspond to the application state.

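Call flow diagram: "Balance or transfer?" leads to "Which account?" for a balance or "From account?" for a transfer. "Which account?" branches on checking/savings to "Give checking balance" or "Give savings balance". "From account?" branches on checking/savings to "Amount?", which ends in "Make Checking to savings transfer" or "Make Savings to checking transfer".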
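
The call flow above can also be encoded as data instead of nested if statements. A sketch with one assumption beyond the diagram: the Amount? node is split in two so that the node itself remembers which account the money comes from, in line with nodes roughly corresponding to application state. CallFlow, BankingCallFlow, and Walk are hypothetical names.

```cpp
#include <map>
#include <string>
#include <vector>

// node -> (caller choice -> next node); node names follow the diagram.
using CallFlow = std::map<std::string, std::map<std::string, std::string>>;

CallFlow BankingCallFlow() {
    return {
        {"Balance or transfer?", {{"balance", "Which account?"},
                                  {"transfer", "From account?"}}},
        {"Which account?", {{"checking", "Give checking balance"},
                            {"savings", "Give savings balance"}}},
        {"From account?", {{"checking", "Amount? (checking to savings)"},
                           {"savings", "Amount? (savings to checking)"}}},
        {"Amount? (checking to savings)", {{"amount", "Make Checking to savings transfer"}}},
        {"Amount? (savings to checking)", {{"amount", "Make Savings to checking transfer"}}},
    };
}

// Follow the arcs for a sequence of caller choices and return the visited
// nodes. An unrecognized choice leaves the caller at the same node (a
// re-prompt); a node with no outgoing arcs ends the walk.
std::vector<std::string> Walk(const CallFlow& flow, std::string node,
                              const std::vector<std::string>& choices) {
    std::vector<std::string> path{node};
    for (const std::string& choice : choices) {
        auto it = flow.find(node);
        if (it == flow.end()) break;  // terminal node
        auto arc = it->second.find(choice);
        if (arc != it->second.end()) node = arc->second;
        path.push_back(node);
    }
    return path;
}
```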

Call-flow authoring in commercial IVR platforms

Early call flow tools had several limitations
- Topology often restricted to trees
- Limited functionality of nodes
- Limited conditional language
- No recursion, encapsulation, or scoping
- No inheritance of node properties
- Limited mechanisms for handling external variables
- GUI drag-and-drop development environment
- Difficult to handle mixed initiative

A common misconception
Finite-state dialog managers need a branch for each possible situation, so they cannot handle mixed initiative because of the combinatorial explosion.

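Diagram: a hand-built state machine for a FLIGHT query with three items (origin, dest, time). From the opening prompt "What?", the caller may supply any single item or any pair (origin & dest, origin & time, dest & time), so the graph needs a separate branch for each combination, each leading to its own follow-up prompts (Origin?, Destination?, Time?).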

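FIA: Form Interpretation Algorithm
Diagram: the same FLIGHT form (origin, dest, time) handled by the FIA. After "What?", the system asks "Origin?" if !origin, "Destination?" if !dest, and "Time?" if !time, instead of enumerating every combination of items the caller might supply.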
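
A much-simplified sketch of the FIA idea, not the full algorithm from the VoiceXML specification (which also handles guard conditions, prompt counters, and events): select the first unfilled field, prompt for it, and fill every slot the answer contains. Answers arrive already parsed into slot/value pairs; RunForm and Slots are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Slots = std::map<std::string, std::string>;

// `fields` lists the form items in document order; `answers[i]` is the parsed
// content of the caller's i-th reply, which may fill several slots at once
// (mixed initiative). Returns the fields whose prompts were played, in order.
std::vector<std::string> RunForm(const std::vector<std::string>& fields,
                                 const std::vector<Slots>& answers, Slots& filled) {
    std::vector<std::string> asked;
    std::size_t turn = 0;
    while (turn < answers.size()) {
        // SELECT: the first field that is still unfilled.
        const std::string* next = nullptr;
        for (const std::string& f : fields) {
            if (filled.count(f) == 0) { next = &f; break; }
        }
        if (next == nullptr) break;  // every field filled: the form is complete
        asked.push_back(*next);      // COLLECT: play the "<field>?" prompt
        for (const auto& [slot, value] : answers[turn++]) {
            filled[slot] = value;    // PROCESS: fill whatever the caller said
        }
    }
    return asked;
}
```

If the caller answers "Origin?" with New York and then answers "Destination?" with both Boston and a time, the form completes after two prompts; "Time?" is never asked.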

Rule Based Authoring
Rules for the FLIGHT form (origin, dest, time):
- !origin → ask_origin
- !dest → ask_dest
- !time → ask_time
- nprompts = 0 → retrieve_flights
Diagram: the ask_origin, ask_dest, and ask_time actions, each guarded by its !origin, !dest, or !time condition, with control outcomes RETURN, CONTINUE, END, and STOP.
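
The rule table can be sketched as ordered condition/action pairs. Assumptions: a condition such as !origin means "origin is not yet filled", the first applicable rule fires on each turn, and nprompts counts the prompts issued in that turn; Rule, FlightRules, and NextAction are hypothetical names. The RETURN/CONTINUE/END/STOP outcomes are not modeled.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

using Slots = std::map<std::string, std::string>;

// A rule pairs a condition on the filled slots with the action it triggers.
struct Rule {
    std::function<bool(const Slots&)> condition;
    std::string action;
};

// The FLIGHT rules from the slide; "!origin" is read as "origin not filled".
std::vector<Rule> FlightRules() {
    auto missing = [](const std::string& slot) {
        return [slot](const Slots& s) { return s.count(slot) == 0; };
    };
    return {
        {missing("origin"), "ask_origin"},
        {missing("dest"), "ask_dest"},
        {missing("time"), "ask_time"},
    };
}

// One turn: fire the first rule whose condition holds. If no prompt was
// issued (nprompts == 0), every slot is filled and flights are retrieved.
std::string NextAction(const std::vector<Rule>& rules, const Slots& s) {
    int nprompts = 0;
    std::string action;
    for (const Rule& r : rules) {
        if (r.condition(s)) {
            action = r.action;
            ++nprompts;
            break;
        }
    }
    return nprompts == 0 ? std::string("retrieve_flights") : action;
}
```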

Bottom line
- A finite-state dialog controller is more powerful than we thought.
  - It can handle mixed-initiative dialogs.
- Applications can be authored in different forms:
  - some put no constraints on the topology of the state machine (e.g., "call flow");
  - others use specific topologies (e.g., rules, FIA).
- VUI completeness can be managed.
- Simple things are easy. Are difficult things possible?

The complexity of dialog systems
Applications ordered from low to high complexity: flight status, stock trading, package tracking, flight reservation, banking, customer care, technical support.
Categories, in increasing complexity: informational, transactional, problem solving.

Building more complex applications: Inference Based Dialog Managers
- Use an inference engine with a defined behavior.
- Describe the application model rather than the user experience.
- Infer the user goal and create plans for system actions.

Example of Inference Based Dialog Management
application → book_flight OR book_hotel OR book_train
book_flight → get_dep_date AND ((get_itinerary AND get_time) OR get_flight_ID) AND get_airline
get_itinerary → get_origin AND get_destination
book_hotel → get_date_in AND get_date_out AND get_room_type AND get_city...

Example of Inference Based Dialog Management (continued)
User: "I want to leave on May 3rd from New York" (fills ORIGIN and DEP_DATE)
Diagram: a STACK holding the system question "Where do you want to go?" and an AGENDA of pending goals (get_dest, get_time, get_airline); the diagram also shows the later prompt "Which airline?".
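
The goal tree and agenda can be sketched with AND/OR nodes. Under the simplifying assumption that an unsatisfied OR pursues its first alternative, planning from the facts ORIGIN and DEP_DATE yields the agenda dest, time, airline, matching get_dest, get_time, get_airline on the slide. Goal, Need, All, Any, Satisfied, Plan, and BookFlight are hypothetical names, and slot names drop the get_ prefix.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Goal tree node: a Leaf needs one slot, And needs every child, Or needs one.
struct Goal {
    enum Kind { Leaf, And, Or } kind;
    std::string slot;            // used by Leaf goals only
    std::vector<Goal> children;  // used by And / Or goals only
};

Goal Need(std::string slot) { return {Goal::Leaf, std::move(slot), {}}; }
Goal All(std::vector<Goal> c) { return {Goal::And, "", std::move(c)}; }
Goal Any(std::vector<Goal> c) { return {Goal::Or, "", std::move(c)}; }

// book_flight = dep_date AND ((itinerary AND time) OR flight_ID) AND airline,
// with itinerary = origin AND dest.
Goal BookFlight() {
    Goal itinerary = All({Need("origin"), Need("dest")});
    return All({Need("dep_date"),
                Any({All({itinerary, Need("time")}), Need("flight_ID")}),
                Need("airline")});
}

bool Satisfied(const Goal& g, const std::set<std::string>& known) {
    switch (g.kind) {
        case Goal::Leaf: return known.count(g.slot) > 0;
        case Goal::And:
            for (const Goal& c : g.children) if (!Satisfied(c, known)) return false;
            return true;
        case Goal::Or:
            for (const Goal& c : g.children) if (Satisfied(c, known)) return true;
            return false;
    }
    return false;
}

// Append the leaf goals still to pursue, in order: this is the agenda.
void Plan(const Goal& g, const std::set<std::string>& known, std::vector<std::string>& agenda) {
    if (Satisfied(g, known)) return;
    if (g.kind == Goal::Leaf) { agenda.push_back(g.slot); return; }
    if (g.kind == Goal::Or) { Plan(g.children.front(), known, agenda); return; }  // simplification
    for (const Goal& c : g.children) Plan(c, known, agenda);
}
```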

Roadblocks to Deploying Engine Based Models in Commercial Applications
- VUI completeness and predictability of engine behavior
- Difficult things are possible, but are simple things easy?
- Application independence
- Developer training
- Fine tuning of the VUI
- Mapping to and from VUI specs, and round-tripping

The Innovator's Dilemma
Chart: performance vs. time, showing the market-needs trajectory, a sustaining technology, and a disruptive technology (example: optical photography vs. digital photography).
Source: Clayton M. Christensen, The Innovator's Dilemma, 1997.

Speech Technology and the Innovator's Dilemma
Chart: performance vs. time, showing market needs, research systems, and commercial systems, with "today" marked.