Error Awareness and Recovery in Task-Oriented Spoken Dialogue Systems
Thesis Proposal
Dan Bohus, Carnegie Mellon University, January 2004
Thesis Committee: Alex Rudnicky (Chair), Roni Rosenfeld, Jeff Schneider, Eric Horvitz (Microsoft Research)

2 Problem
Lack of robustness when faced with understanding errors:
- Spans most domains and interaction types
- Has a significant impact on performance

3 An example
S: Are you a registered user?
U: No I'm not. No [NO I'M NOT NO]
S: What is your full name?
U: Adam Schumacher [ATHENS IN AKRON]
S: There is an Athens in Georgia and in Greece. Which destination did you want?
U: Neither [NEITHER]
S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want?
U: Georgia [GEORGIA]
S: A flight from Athens... Where do you want to go?
U: Start over [START OVER]
S: Do you really want to start over?
U: Yes [YES]
S: What city are you leaving from?
U: Hamilton, Ontario [HILTON ONTARIO]
S: Sorry, I'm not sure I understood what you said. Where are you leaving from?
U: Hamilton [HILTON]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Toronto [TORONTO]

4 Some Statistics …
Corrections [Krahmer, Swerts, Litman, Levow]:
- 30% of utterances correct system mistakes
- Corrections are 2-3 times more likely to be misrecognized
Semantic error rates:
- CMU Communicator [CMU]: 32%
- CU Communicator [CU]: 27%
- How May I Help You? [AT&T]: 36%
- Jupiter [MIT]: 28%
- SpeechActs [SRI]: 25%

5 Significant Impact on Interaction
[Charts: CMU Communicator, 40% / 26%, sessions that contain understanding errors / failed; Multi-site Communicator Corpus [Shin et al], 37% failed sessions, 33% / 63% breakdown.]

6 Outline
- Problem
- Approach
- Infrastructure
- Research Program
- Timeline & Summary

7 Increasing Robustness …
Two ways to increase robustness:
- Increase the accuracy of speech recognition
- Assume recognition is unreliable, and create the mechanisms for acting robustly at the dialogue management level
As ASR performance increases, the demands placed on it increase as well; the dialogue-level approach addresses the more general problem.

8 Snapshot of Existing Work (1 of 2)
- Theoretical models of grounding: Contribution Model [Clark], Grounding Acts [Traum]
  (analytical/descriptive, not decision oriented)
- Practice: heuristic rules
  - Misunderstandings: threshold(s) on confidence scores
  - Non-understandings
  (ad-hoc, lack generality, not easy to extend)

9 Snapshot of Existing Work (2 of 2)
- Conversation as Action under Uncertainty [Paek and Horvitz]
  - Belief networks to model uncertainties
  - Decisions based on expected utility, VOI-analysis
- Reinforcement learning for dialogue control policies [Singh, Kearns, Litman, Walker, Levin, Pieraccini, Young, Scheffler, etc.]
  - Formulate dialogue control as an MDP
  - Learn the optimal control policy from data
Limitation: these approaches do not scale up to complex, real-world domains.

10 Research Program: Goals & Approach
Goal: a task-independent, adaptive and scalable framework for error recovery in task-oriented spoken dialogue systems
Approach: decision making under uncertainty

11 Three Components
0. Infrastructure
1. Error awareness: develop indicators that assess the reliability of information and how well the dialogue is advancing
2. Error recovery strategies: develop and investigate an extended set of conversational error handling strategies
3. Error handling decision process: develop a scalable reinforcement-learning based approach for error recovery in spoken dialogue systems

12 Infrastructure (completed)
- RavenClaw: a modern dialog management framework for complex, task-oriented domains
- RavenClaw spoken dialogue systems: a test-bed for evaluation

13 RavenClaw
[Architecture diagram: a Dialogue Task Specification sits on top of a Domain-Independent Dialogue Engine. The example task is RoomLine: Login (Welcome, AskRegistered, AskName, GreetUser), GetQuery (DateTime, Location, Properties: Network, Projector, Whiteboard), GetResults, DiscussResults, with concepts user_name, registered, query, results. The engine maintains a Dialogue Stack (RoomLine, Login, AskRegistered), an Expectation Agenda (e.g. registered: [No] -> false, [Yes] -> true), and an Error Handling Decision Process with Indicators and Strategies (e.g. ExplicitConfirm).]
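As a purely illustrative aid (the DialogAgent class, execute() loop, and tree layout below are hypothetical, not the actual RavenClaw API), a hierarchical task specification driven by a dialogue stack might be sketched like this:

```python
# Hypothetical sketch of a RavenClaw-style task tree and dialogue stack.
# Class and method names are illustrative, not the real RavenClaw API.

class DialogAgent:
    """A node in the hierarchical dialogue task specification."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def is_leaf(self):
        return not self.children

# RoomLine-like task tree, mirroring the diagram above
task = DialogAgent("RoomLine", [
    DialogAgent("Login", [
        DialogAgent("Welcome"), DialogAgent("AskRegistered"),
        DialogAgent("AskName"), DialogAgent("GreetUser"),
    ]),
    DialogAgent("GetQuery", [
        DialogAgent("DateTime"), DialogAgent("Location"),
        DialogAgent("Properties", [
            DialogAgent("Network"), DialogAgent("Projector"),
            DialogAgent("Whiteboard"),
        ]),
    ]),
    DialogAgent("GetResults"),
    DialogAgent("DiscussResults"),
])

def execute(root):
    """Depth-first execution driven by a dialogue stack (expectation
    agenda, error handling, and user input are omitted in this sketch)."""
    stack = [root]
    while stack:
        agent = stack.pop()
        if agent.is_leaf():
            print("executing", agent.name)   # e.g. ask a question, inform
        else:
            stack.extend(reversed(agent.children))

execute(task)
```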

14 RavenClaw-based Systems
- RoomLine
- CMU Let’s Go!! Bus Information System
- LARRI [Symphony]
- TeamTalk [11-741]
- Eureka [11-743]

15 Three Components (roadmap slide repeated; see slide 11)

16 Existing Work
Confidence annotation:
- Traditionally focused on the speech recognizer [Bansal, Chase, Cox, and others]
- Recently, multiple sources of knowledge [San-Segundo, Walker, Bosch, Bohus, and others]: recognition, parsing, dialogue management
- Detects misunderstandings with ~80-90% accuracy
Correction and aware-site detection [Swerts, Litman, Levow and others]:
- Multiple sources of knowledge
- Detects corrections with ~80-90% accuracy

17 Proposed: Belief Updating
Continuously assess beliefs in light of initial confidence and subsequent events:
  initial belief + system action + user response -> updated belief
An example:
S: Where are you flying from?
U: [CityName={Aspen/0.6; Austin/0.2}]
S: Did you say you wanted to fly out of Aspen?
U: [No] [CityName={Boston/0.8}]
Updated belief: [CityName={Aspen/?; Austin/?; Boston/?}]
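As a gloss on the example, a naive update for the three CityName hypotheses might be computed as below. This is an illustrative sketch only: the 0.9/0.1 answer-reliability parameters and the handling of the new Boston hypothesis are invented for the example, not the belief updating model proposed in the thesis.

```python
# Illustrative only: naive update of the CityName belief after the system
# explicitly confirms "Aspen" and the user answers "No" while a new
# hypothesis "Boston" is recognized with confidence 0.8.

initial = {"Aspen": 0.6, "Austin": 0.2}   # initial belief
p_no_if_not_confirmed = 0.9               # assumed answer reliability
p_no_if_confirmed = 0.1

# Bayes-style step for the confirmed hypothesis given the "No" answer
posterior = {}
for city, prior in initial.items():
    likelihood = p_no_if_confirmed if city == "Aspen" else p_no_if_not_confirmed
    posterior[city] = likelihood * prior

# fold in the newly recognized hypothesis (crude: give it the leftover mass
# weighted by its recognition confidence)
posterior["Boston"] = 0.8 * (1.0 - sum(initial.values()))

# renormalize
z = sum(posterior.values())
belief = {city: round(p / z, 2) for city, p in posterior.items()}
print(belief)   # {'Aspen': 0.15, 'Austin': 0.45, 'Boston': 0.4}
```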

18 Belief Updating: Approach
- Model the update in a dynamic belief network
[Network diagram: the belief over a concept C at time t (contents, confidence, correction), together with the system action and user-response features (Current Top / Current 2nd / Current 3rd hypothesis, Confidence, Yes / No markers, Positive / Negative markers, Utterance Length), determines the updated belief over C at time t+1.]
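For concreteness, the evidence the network conditions on each turn could be packaged roughly as follows; the BeliefUpdateEvidence container and its field names are assumptions that simply mirror the labels in the diagram, not an actual data structure from the thesis.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical container for the evidence variables in the network above;
# field names mirror the slide labels.
@dataclass
class BeliefUpdateEvidence:
    system_action: str            # e.g. "ExplicitConfirm(CityName=Aspen)"
    confidence: float             # recognition confidence of the user response
    said_yes: bool                # yes / no markers
    said_no: bool
    positive_markers: int         # counts of (dis)confirmation cue words
    negative_markers: int
    utterance_length: int         # number of words in the response
    current_top: Optional[str]    # top / 2nd / 3rd hypotheses for the concept
    current_2nd: Optional[str] = None
    current_3rd: Optional[str] = None

# Evidence for the Aspen/Austin/Boston example on the previous slide
evidence = BeliefUpdateEvidence(
    system_action="ExplicitConfirm(CityName=Aspen)",
    confidence=0.8, said_yes=False, said_no=True,
    positive_markers=0, negative_markers=1,
    utterance_length=2, current_top="Boston",
)
```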

19 Three Components (roadmap slide repeated)

20 Is the Dialogue Advancing Normally?
Locally, at the turn level: non-understanding indicators
- Non-understanding flag directly available
- Develop additional indicators at the recognition, understanding, and interpretation levels
Globally, at the discourse level: dialogue-on-track indicators
- Summary statistics of the non-understanding indicators
- Rate of dialogue advance
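A minimal sketch of how such indicators could be computed from per-turn logs; the window size and the definition of "rate of dialogue advance" as concepts acquired per turn are assumptions made for this example, not definitions from the thesis.

```python
# Illustrative discourse-level indicators computed from per-turn logs.

def non_understanding_rate(turns, window=5):
    """Fraction of non-understandings over the last `window` turns."""
    recent = turns[-window:]
    return sum(t["non_understanding"] for t in recent) / max(len(recent), 1)

def rate_of_advance(turns, window=5):
    """Average number of concepts newly acquired per recent turn."""
    recent = turns[-window:]
    return sum(t["new_concepts"] for t in recent) / max(len(recent), 1)

turns = [
    {"non_understanding": False, "new_concepts": 2},
    {"non_understanding": True,  "new_concepts": 0},
    {"non_understanding": True,  "new_concepts": 0},
    {"non_understanding": False, "new_concepts": 1},
]
print(non_understanding_rate(turns), rate_of_advance(turns))  # 0.5 0.75
```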

21 Three Components (roadmap slide repeated)

22 Error Recovery Strategies
- Identify: identify and define an extended set of error handling strategies
- Implement: construct task-decoupled implementations of a large number of strategies
- Evaluate: evaluate performance and bring further refinements

23 List of Error Recovery Strategies
System-initiated strategies:
- Ensure that the system has reliable information (misunderstandings): Explicit confirmation, Implicit confirmation, Disambiguation, Ask repeat concept, Reject concept
- Ensure that the dialogue is on track, local problems (non-understandings): Ask repeat turn, Ask rephrase turn, Notify non-understanding, Explicit confirm turn, Targeted help, Generic help, You can say, WH-reformulation, Keep-a-word reformulation, Switch input modality, SNR repair
- Ensure that the dialogue is on track, global problems (compounded, discourse-level problems): Restart subtask plan, Select alternative plan, Start over, Terminate session / Direct to operator
User-initiated strategies (spanning the categories above): Help, Where are we?, Start over, Scratch concept value, Go back, Channel establishment, Suspend/Resume, Repeat, Summarize, Quit


25 Error Recovery Strategies: Evaluation
- Reusability: deploy in different spoken dialogue systems
- Efficiency of non-understanding strategies
  - Simple metric: is the next utterance understood?
  - Efficiency depends on the decision process, so construct upper and lower bounds:
    lower bound: a decision process which chooses uniformly; upper bound: a human performs the decision process (WOZ)
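The turn-level efficiency metric and the uniform lower bound could be estimated from session logs along these lines; the log fields ("strategy", "next_understood") are assumed for the sketch.

```python
from collections import defaultdict

# Sketch: per-strategy recovery efficiency = fraction of non-understandings
# after which the *next* user utterance is understood.

def strategy_efficiency(log):
    counts, successes = defaultdict(int), defaultdict(int)
    for event in log:
        counts[event["strategy"]] += 1
        successes[event["strategy"]] += event["next_understood"]
    return {s: successes[s] / counts[s] for s in counts}

def uniform_baseline(per_strategy):
    """Lower bound: expected efficiency when strategies are chosen uniformly."""
    return sum(per_strategy.values()) / len(per_strategy)

log = [
    {"strategy": "AskRephrase",  "next_understood": True},
    {"strategy": "AskRephrase",  "next_understood": False},
    {"strategy": "TargetedHelp", "next_understood": True},
]
eff = strategy_efficiency(log)
print(eff, uniform_baseline(eff))  # {'AskRephrase': 0.5, 'TargetedHelp': 1.0} 0.75
```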

26 Three Components (roadmap slide repeated)

27 Previous Reinforcement Learning Work
- Dialogue control ~ Markov Decision Process: states, actions, rewards
  [Diagram: an MDP with states S1, S2, S3 and an action A]
- Previous work: successes in small domains, e.g. NJFun [Singh, Kearns, Litman, Walker et al]
- Problems: lack of scalability; once learned, policies are not reusable

28 Proposed Approach
Overcome previous shortcomings:
- Focus learning only on error handling
  - Reduces the size of the learning problem
  - Favors reusability of learned policies
  - Lessens the system development effort
- Use a "divide-and-conquer" approach: leverage independences in dialogue

29 Gated Markov Decision Processes
[Diagram: the task tree (RoomLine, Login, Welcome, AskRegistered, AskName, GreetUser; concepts user_name, registered) with a Topic-MDP attached to each topic and a Concept-MDP attached to each concept (actions such as No Action, Explicit Confirmation); a Gating Mechanism arbitrates among the MDPs' proposed actions, under an independence assumption across MDPs.]
- Small-size models
- Parameters can be tied across models
- Easy to design initial policies
- Decoupling favors reusability of policies
- Accommodates dynamic task generation
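A structural sketch of this arrangement, with hypothetical SmallMDP and gate() names, and a hand-designed confidence-threshold policy standing in for the policies that would eventually be learned:

```python
# Hypothetical sketch of the gated-MDP arrangement: one small MDP per
# concept (and, analogously, per topic), each proposing an error-handling
# action, with a gating mechanism choosing a single action per turn.

class SmallMDP:
    def __init__(self, owner, actions, policy):
        self.owner = owner      # concept or topic the MDP is attached to
        self.actions = actions
        self.policy = policy    # state -> action (initially hand-designed)

    def propose(self, state):
        return self.owner, self.policy(state)

# parameters/policies can be tied: every concept-MDP shares this one
def confidence_policy(state):
    conf = state["confidence"]
    if conf is None:
        return "NoAction"       # concept not yet acquired
    if conf < 0.3:
        return "Reject"
    if conf < 0.7:
        return "ExplicitConfirm"
    return "NoAction"

concept_mdps = [SmallMDP(c, ["NoAction", "ExplicitConfirm", "Reject"],
                         confidence_policy)
                for c in ("registered", "user_name")]

def gate(proposals):
    """Pick one proposed action to execute (the heuristic is on slide 43)."""
    actionable = [p for p in proposals if p[1] != "NoAction"]
    return actionable[0] if actionable else (None, "NoAction")

states = {"registered": {"confidence": 0.5}, "user_name": {"confidence": None}}
proposals = [m.propose(states[m.owner]) for m in concept_mdps]
print(gate(proposals))   # ('registered', 'ExplicitConfirm')
```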

30 Reward structure & learning
- Rewards can be based on any dialogue performance metric
- Global, post-gate rewards: an atypical, multi-agent reinforcement learning setting
- Local rewards (one per MDP): multiple, standard RL problems
- Model-based approaches
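One way to read the two reward schemes, under assumed episode-log structures: local rewards give each MDP its own training signal (standard RL), while a single global, post-gate reward has to be shared across the MDPs whose actions were executed, which is what makes the setting multi-agent. The equal split below is just an illustrative choice, not the credit-assignment scheme of the thesis.

```python
# Sketch of reward attribution for the two schemes on this slide.

def local_updates(episode):
    """Local rewards: each MDP gets its own reward -> standard RL problems."""
    return [(s["mdp"], s["state"], s["action"], s["local_reward"])
            for s in episode]

def global_updates(episode, global_reward):
    """Global, post-gate reward: split a dialogue-level reward (e.g. task
    success, efficiency) across the MDPs whose actions passed the gate."""
    executed = [s for s in episode if s["gated_through"]]
    share = global_reward / max(len(executed), 1)
    return [(s["mdp"], s["state"], s["action"], share) for s in executed]

episode = [
    {"mdp": "registered", "state": "MC", "action": "ExplicitConfirm",
     "local_reward": -1, "gated_through": True},
    {"mdp": "user_name", "state": "HC", "action": "NoAction",
     "local_reward": 0, "gated_through": False},
]
print(global_updates(episode, global_reward=10))
```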

31 Evaluation
- Performance: compare learned policies with the initial heuristic policies
  - Metrics: task completion, efficiency, number and lengths of error segments, user satisfaction
- Scalability
  - Deploy in a system operating with a sizable task
  - Theoretical analysis

32 Outline
- Problem
- Approach
- Infrastructure
- Research Program
- Summary & Timeline

33 Summary of Contributions
Overall goal: develop a task-independent, adaptive and scalable framework for error recovery in task-oriented spoken dialogue systems.
- A modern dialogue management framework
- A belief updating framework
- An investigation of an extended set of error handling strategies
- A scalable, data-driven approach for learning error handling policies

34 Timeline
[Gantt chart spanning from the proposal (now) through milestones 1-3 to the defense, with markers at the end of year 4, the end of year 5, and 5.5 years; tracks: data, indicators, strategies, decisions.]
Work items:
- Data collection for belief updating and WOZ study
- Develop and evaluate the belief updating models
- Implement dialogue-on-track indicators
- Misunderstanding and non-understanding strategies
- Evaluate non-understanding strategies; develop remaining strategies
- Investigate theoretical aspects of the proposed reinforcement learning model
- Data collection for RL training; data collection for RL evaluation
- Error handling decision process: reinforcement learning experiments
- Contingency data collection efforts
- Additional experiments: extensions or contingency work

35 Thank You! Questions & Comments (committee members, then floor)

36 Indicators: Goals
- Goal: increase awareness and the capacity to detect problems
- Develop indicators which can inform the error handling process about potential problems
[Diagram of the understanding process: the system either does not acquire information (non-understanding) or acquires information, which is either correct (OK) or incorrect (misunderstanding).]


38 Three Desired Properties
- Task-independence: reuse the proposed architecture across different spoken dialogue systems with a minimal amount of authoring effort
- Adaptability: learn from experience how to adapt to the characteristics of various domains
- Scalability: applicable in spoken dialogue systems operating with large, practical tasks

39 [Diagram: a concept-MDP with states HC, MC, LC (presumably high / medium / low confidence) and the actions ExplConf, ImplConf, NoAct available in each state.]

40 Belief Updating: Approach
- Model the update in a dynamic belief network (as on slide 18: the concept belief at time t, the system action, and user-response features such as Current Top / 2nd / 3rd hypothesis, Confidence, Yes / No markers, Positive / Negative markers, and Utterance Length determine the belief at time t+1)
- Top-N values, fixed structure, learn the parameters
- Data collection
- Evaluation: accuracy, soft-error

41 Gated Markov Decision Processes
Issues:
- Structure of individual MDPs
- Gating mechanism
- Reward structure and learning
[Diagram repeated from slide 29: concept- and topic-MDPs over the task tree, arbitrated by a gating mechanism.]

42 Structure for individual MDPs
- State space: an informative subset of the corresponding indicators
  - Concept-MDPs: confidence / beliefs
  - Topic-MDPs: non-understanding and dialogue-on-track indicators
- Action space: the corresponding system-initiated error handling strategies
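A possible, purely illustrative encoding of these two state/action spaces; the specific state labels below are assumptions, not the discretization used in the thesis.

```python
# Assumed, illustrative state/action spaces for the two MDP types.
CONCEPT_MDP = {
    "states": ["empty", "low-confidence", "medium-confidence", "high-confidence"],
    "actions": ["NoAction", "ExplicitConfirm", "ImplicitConfirm",
                "AskRepeatConcept", "RejectConcept"],
}
TOPIC_MDP = {
    # an informative subset of non-understanding / dialogue-on-track indicators
    "states": ["on-track", "non-understanding", "repeated-non-understanding"],
    "actions": ["NoAction", "AskRepeatTurn", "AskRephraseTurn",
                "TargetedHelp", "RestartSubtaskPlan"],
}
```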

43 Gating Mechanism
- Heuristic derived from domain-independent dialogue principles
  - Give priority to topics over concepts
  - Give priority to entities closer to the conversational focus
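A toy version of the two priority rules; the proposal format (kind, focus_distance, action) is an assumption made for this sketch.

```python
# Toy gating heuristic: prefer topic-level proposals over concept-level ones,
# and among those prefer entities closer to the conversational focus.

def gate(proposals):
    """proposals: list of dicts with 'kind' ('topic' or 'concept'),
    'focus_distance' (0 = current focus), and 'action'."""
    actionable = [p for p in proposals if p["action"] != "NoAction"]
    if not actionable:
        return None
    return min(actionable,
               key=lambda p: (0 if p["kind"] == "topic" else 1,
                              p["focus_distance"]))

proposals = [
    {"kind": "concept", "focus_distance": 0, "action": "ExplicitConfirm"},
    {"kind": "topic",   "focus_distance": 1, "action": "TargetedHelp"},
]
print(gate(proposals))  # the topic-level proposal wins
```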