Error Awareness and Recovery in Task-Oriented Spoken Dialogue Systems Thesis Proposal Dan Bohus Carnegie Mellon University, January 2004 Thesis Committee.

Presentation transcript:

Error Awareness and Recovery in Task-Oriented Spoken Dialogue Systems
Thesis Proposal
Dan Bohus, Carnegie Mellon University, January 2004
Thesis Committee: Alex Rudnicky (Chair), Roni Rosenfeld, Jeff Schneider, Eric Horvitz (Microsoft Research)

2 Problem: Lack of robustness when faced with understanding errors
  Spans most domains and interaction types
  Has a significant impact on performance

3 An example (speech recognition output shown in brackets)
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
S: Sorry I'm not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [FOR MINUTE SINCE HEY]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
S: traveling in the afternoon. Okay what day would you be departing chicago
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
S: traveling on Saturday, August 12th… I have a flight departing Chicago at 1:40pm arrives Seoul at ………

4 Some Statistics …
Corrections [Krahmer, Swerts, Litman, Levow]
  30% of utterances correct system mistakes
  Corrections are 2-3 times more likely to be misrecognized
Semantic error rates: ~25-35%
  SpeechActs [Sun]: 25%
  CU Communicator [CU]: 27%
  Jupiter [MIT]: 28%
  CMU Communicator [CMU]: 32%
  How May I Help You? [AT&T]: 36%

5 Significant Impact on Interaction
CMU Communicator: 40% contain understanding errors, 26% failed
Multi-site Communicator Corpus [Shin et al]: 37% failed sessions; 33%, 63%

6 Outline: Problem, Approach, Infrastructure, Research Program, Summary & Timeline

7 Increasing Robustness …
  Increase the accuracy of speech recognition, or
  Assume recognition is unreliable, and create the mechanisms for acting robustly at the dialogue management level

8 Snapshot of Existing Work: Slide 1
Theoretical models of grounding: Contribution Model [Clark], Grounding Acts [Traum]
  Analytical/descriptive, not decision oriented
Practice: heuristic rules for misunderstandings (threshold(s) on confidence scores) and for non-understandings
  Ad hoc, lack generality, not easy to extend
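As a rough illustration of the threshold heuristics used for misunderstandings, the sketch below maps a single confidence score to a grounding action. The threshold values and action names are hypothetical and are not taken from any of the cited systems.

```python
# Hypothetical confidence thresholds; deployed systems tune these per concept.
REJECT_BELOW = 0.3
EXPLICIT_BELOW = 0.6
IMPLICIT_BELOW = 0.8


def heuristic_action(confidence: float) -> str:
    """Map a confidence score in [0, 1] to a grounding action using fixed thresholds."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence < REJECT_BELOW:
        return "reject"             # discard the hypothesis and re-ask the question
    if confidence < EXPLICIT_BELOW:
        return "explicit_confirm"   # "Did you say you wanted to fly out of Aspen?"
    if confidence < IMPLICIT_BELOW:
        return "implicit_confirm"   # "traveling from Aspen... where to?"
    return "accept"                 # use the value without confirmation


if __name__ == "__main__":
    for c in (0.1, 0.45, 0.7, 0.95):
        print(f"{c:.2f} -> {heuristic_action(c)}")
```

Each decision looks only at the current score and uses fixed cut-offs. Nothing observed in later turns feeds back into the decision, which illustrates the lack of generality noted above.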

9 Snapshot of Existing Work: Slide 2
Conversation as Action under Uncertainty [Paek and Horvitz]
  Belief networks to model uncertainties
  Decisions based on expected utility, VOI (value of information) analysis
Reinforcement learning for dialogue control policies [Singh, Kearns, Litman, Walker, Levin, Pieraccini, Young, Scheffler, etc.]
  Formulate dialogue control as an MDP
  Learn optimal control policy from data
  Do not scale up to complex, real-world tasks
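The expected-utility step of the decision-theoretic approach can be sketched in a few lines. This is a simplified illustration, not Paek and Horvitz's model. The belief network is collapsed into a single probability that the top hypothesis is correct, VOI analysis is omitted, and the utility table is made up.

```python
from typing import Dict

# Made-up utilities U[action][is the top hypothesis correct?].
UTILITY: Dict[str, Dict[bool, float]] = {
    "accept":           {True: 10.0, False: -20.0},  # acting on a wrong value is expensive
    "explicit_confirm": {True: 6.0,  False: -4.0},   # costs a turn either way
    "reject":           {True: -5.0, False: 0.0},    # wastes a correct value, harmless if wrong
}


def expected_utility(action: str, p_correct: float) -> float:
    """Expected utility of an action given P(top hypothesis is correct)."""
    u = UTILITY[action]
    return p_correct * u[True] + (1.0 - p_correct) * u[False]


def best_action(p_correct: float) -> str:
    """Choose the action that maximizes expected utility."""
    return max(UTILITY, key=lambda a: expected_utility(a, p_correct))


for p in (0.05, 0.5, 0.95):
    print(p, best_action(p))
```

With these made-up utilities, explicit confirmation overtakes acceptance below p = 0.8, and rejection overtakes explicit confirmation below p ≈ 0.27. The thresholds therefore follow from the utilities rather than being set by hand.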

10 Thesis Statement
Develop a task-independent, adaptive and scalable framework for error recovery in task-oriented spoken dialogue systems
Approach: decision making under uncertainty

11 Three Components
0. Infrastructure
1. Error awareness: develop indicators that assess the reliability of information and assess how well the dialogue is advancing
2. Error recovery strategies: develop and investigate an extended set of conversational error handling strategies
3. Error handling decision process: develop a scalable reinforcement-learning based architecture for making error handling decisions

12 Infrastructure (completed)
RavenClaw: modern dialog management framework for complex, task-oriented domains
RavenClaw spoken dialogue systems: test-bed for evaluation

13 RavenClaw
Dialogue Task (Specification): a task tree, e.g. RoomLine with Login (Welcome, AskRegistered, AskName, GreetUser), GetQuery (DateTime, Location, Properties: Network, Projector, Whiteboard), GetResults and DiscussResults; concepts include user_name, registered, query and results
Domain-Independent Dialogue Engine: Dialogue Stack (RoomLine, Login, AskRegistered), Expectation Agenda (e.g. registered: [No] -> false, [Yes] -> true), Error Indicators, Strategies (e.g. ExplicitConfirm) and the Error Handling Decision Process
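The following toy sketch makes the expectation agenda concrete. It is loosely modeled on the slide's labels and does not reflect RavenClaw's actual implementation. It assumes three things: each agent on the dialogue stack contributes one agenda level, the focused agent's level is consulted first, and a concept takes the first matching value. The `quit_requested` concept and `[Quit]` slot are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass(frozen=True)
class Expectation:
    concept: str   # concept to fill, e.g. "registered"
    slot: str      # grammar slot that triggers it, e.g. "[Yes]"
    value: Any     # value bound when the slot appears in the parse


def build_agenda(stack: List[str], expectations: Dict[str, List[Expectation]]) -> List[List[Expectation]]:
    """One agenda level per agent on the dialogue stack, focused (top) agent first."""
    return [expectations.get(agent, []) for agent in reversed(stack)]


def bind(agenda: List[List[Expectation]], parsed_slots: Iterable[str]) -> Dict[str, Any]:
    """Walk the agenda top-down; the first matching expectation for a concept wins."""
    slots = set(parsed_slots)
    bindings: Dict[str, Any] = {}
    for level in agenda:
        for exp in level:
            if exp.slot in slots and exp.concept not in bindings:
                bindings[exp.concept] = exp.value
    return bindings


# Hypothetical expectations; only "registered" comes from the slide.
EXPECTATIONS = {
    "AskRegistered": [Expectation("registered", "[Yes]", True),
                      Expectation("registered", "[No]", False)],
    "RoomLine": [Expectation("quit_requested", "[Quit]", True)],
}
stack = ["RoomLine", "Login", "AskRegistered"]   # bottom ... top
agenda = build_agenda(stack, EXPECTATIONS)
print(bind(agenda, ["[No]"]))
```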

14 RavenClaw-based Systems
System: Domain
RoomLine: Information access
CMU Let's Go! Bus Information System: Information access
LARRI [Symphony]: Guidance through procedures
Intelligent Procedure Assistant [NASA Ames]: Guidance through procedures
TeamTalk [11-754]: Command-and-control
Eureka [11-743]: Web access

15 Research Plan
0. Infrastructure
1. Error awareness: develop indicators that assess the reliability of information and assess how well the dialogue is advancing
2. Error recovery strategies: develop and investigate an extended set of conversational error handling strategies
3. Error handling decision process: develop a scalable reinforcement-learning based architecture for making error handling decisions

16 Existing Work
Confidence annotation
  Traditionally focused on the speech recognizer [Bansal, Chase, Cox, and others]
  Recently, multiple sources of knowledge [San-Segundo, Walker, Bosch, Bohus, and others]: recognition, parsing, dialogue management
  Detect misunderstandings: ~80-90% accuracy
Correction and aware site detection [Swerts, Litman, Levow, and others]
  Multiple sources of knowledge
  Detect corrections: ~80-90% accuracy

17 Proposed: Belief Updating
Continuously assess beliefs in light of initial confidence and subsequent events.
An example:
S: Where are you flying from?
U: [CityName={Aspen/0.6; Austin/0.2}]
S: Did you say you wanted to fly out of Aspen?
U: [No/0.6] [CityName={Boston/0.8}]
Initial belief + system action + user response -> updated belief: [CityName={Aspen/?; Austin/?; Boston/?}]
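The belief update in this example can be approximated with a plain Bayesian update. The sketch is not the proposed dynamic belief network. It treats each recognition confidence as a likelihood and assumes that the two pieces of evidence in the user's response are independent. It also reserves an `<other>` bucket for values not yet heard and hands 75% of that bucket to a newly heard value. All of these choices are assumptions made for illustration.

```python
from typing import Callable, Dict

OTHER = "<other>"  # probability mass for city values not heard yet


def add_hypothesis(belief: Dict[str, float], value: str, share: float = 0.75) -> Dict[str, float]:
    """Give a newly heard value a fraction `share` of the <other> mass (assumed prior)."""
    updated = dict(belief)
    if value not in updated:
        moved = share * updated[OTHER]
        updated[value] = moved
        updated[OTHER] -= moved
    return updated


def bayes_update(belief: Dict[str, float], likelihood: Callable[[str], float]) -> Dict[str, float]:
    """Posterior(h) is proportional to prior(h) * P(evidence | h), then renormalized."""
    unnormalized = {h: p * likelihood(h) for h, p in belief.items()}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Turn 1: recognizer returned CityName={Aspen/0.6; Austin/0.2}; the rest goes to <other>.
belief = {"Aspen": 0.6, "Austin": 0.2, OTHER: 0.2}

# Turn 2: system explicitly confirmed Aspen; user response [No/0.6] [CityName={Boston/0.8}].
no_conf, boston_conf = 0.6, 0.8
belief = add_hypothesis(belief, "Boston")
# A detected "No" counts against the confirmed value (likelihood 1 - 0.6)
# and for every other value (likelihood 0.6).
belief = bayes_update(belief, lambda h: (1 - no_conf) if h == "Aspen" else no_conf)
# The recognized "Boston" supports Boston (0.8) over everything else (0.2).
belief = bayes_update(belief, lambda h: boston_conf if h == "Boston" else 1 - boston_conf)

for city, p in sorted(belief.items(), key=lambda kv: -kv[1]):
    print(f"{city:8s} {p:.2f}")
```

Under these assumptions, Boston ends up with about 0.48 of the mass and Aspen drops to about 0.32. Different likelihood choices give different numbers, which is one motivation for building the update model from collected data.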

18 Belief Updating: Approach
Model the update in a dynamic belief network spanning turns t and t + 1
  User concept at t and t + 1: 1st, 2nd and 3rd hypotheses with confidences
  System action at t
  User response: confidence, Yes/No, correction, positive markers, negative markers, utterance length

19 Research Plan

20 Is the Dialogue Advancing Normally?
Locally, turn-level: non-understanding indicators
  Non-understanding flag directly available
  Develop additional indicators from recognition, understanding, interpretation
Globally, discourse-level: dialogue-on-track indicators
  Counts, averages of non-understanding indicators
  Rate of dialogue advance
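Below is a minimal sketch of discourse-level dialogue-on-track indicators, computed as counts and averages over a sliding window of turns. The window size is an assumption. So is the `concepts_updated` field, which stands in here for the rate of dialogue advance.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass
class Turn:
    non_understanding: bool   # turn-level flag, directly available to the system
    concepts_updated: int     # hypothetical proxy for how far this turn advanced the task


class OnTrackMonitor:
    """Discourse-level indicators aggregated over a sliding window of recent turns."""

    def __init__(self, window: int = 5) -> None:
        self.turns: Deque[Turn] = deque(maxlen=window)

    def observe(self, turn: Turn) -> None:
        self.turns.append(turn)

    def non_understanding_count(self) -> int:
        return sum(1 for t in self.turns if t.non_understanding)

    def non_understanding_rate(self) -> float:
        return self.non_understanding_count() / len(self.turns) if self.turns else 0.0

    def consecutive_non_understandings(self) -> int:
        run = 0
        for t in reversed(self.turns):
            if not t.non_understanding:
                break
            run += 1
        return run

    def advance_rate(self) -> float:
        """Average number of concepts updated per turn in the window."""
        if not self.turns:
            return 0.0
        return sum(t.concepts_updated for t in self.turns) / len(self.turns)


monitor = OnTrackMonitor(window=5)
# Two non-understandings followed by two productive turns.
for turn in [Turn(True, 0), Turn(True, 0), Turn(False, 1), Turn(False, 1)]:
    monitor.observe(turn)
print(monitor.non_understanding_count(), monitor.non_understanding_rate(), monitor.advance_rate())
```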

21 Research Plan

22 Error Recovery Strategies
Identify: identify and define an extended set of error handling strategies
Implement: construct task-decoupled implementations of a large number of strategies
Evaluate: evaluate performance and bring further refinements

23 List of Error Recovery Strategies
User initiated: Help, Where are we?, Start over, Scratch concept value, Go back, Channel establishment, Suspend/Resume, Repeat, Summarize, Quit
System initiated:
  Ensure that the system has reliable information (misunderstandings): Explicit confirmation, Implicit confirmation, Disambiguation, Ask repeat concept, Reject concept
  Ensure that the dialogue is on track:
    Local problems (non-understandings): Switch input modality, SNR repair, Ask repeat turn, Ask rephrase turn, Notify non-understanding, Explicit confirm turn, Targeted help, WH-reformulation, Keep-a-word reformulation, Generic help, You can say
    Global problems (compounded, discourse-level problems): Restart subtask plan, Select alternative plan, Start over, Terminate session / Direct to operator

24 List of Error Recovery Strategies

25 Error Recovery Strategies: Evaluation
Reusability: deploy in different spoken dialogue systems
Efficiency of non-understanding strategies
  Simple metric: is the next utterance understood?
  Efficiency depends on the decision process, so construct upper and lower bounds for efficiency:
    Lower bound: a decision process which chooses uniformly
    Upper bound: a human performs the decision process (Wizard-of-Oz, WOZ)
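The simple efficiency metric can be computed directly from logged recovery attempts. In this sketch, the log entries and strategy names are a toy example. The uniform-policy estimate assumes that each strategy's success rate does not depend on the context in which it was used.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def strategy_efficiency(log: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of uses of each strategy after which the next utterance was understood."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # strategy -> [successes, uses]
    for strategy, next_understood in log:
        counts[strategy][1] += 1
        if next_understood:
            counts[strategy][0] += 1
    return {s: ok / n for s, (ok, n) in counts.items()}


def uniform_policy_efficiency(efficiency: Dict[str, float]) -> float:
    """Expected efficiency of picking a strategy uniformly at random (context-free assumption)."""
    return sum(efficiency.values()) / len(efficiency)


# Toy log of (strategy used after a non-understanding, was the next utterance understood?).
toy_log = [
    ("AskRepeatTurn", True), ("AskRepeatTurn", False),
    ("AskRephraseTurn", True), ("AskRephraseTurn", True),
    ("TargetedHelp", False), ("TargetedHelp", True),
]
eff = strategy_efficiency(toy_log)
print(eff)
print(round(uniform_policy_efficiency(eff), 3))
```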

26 Research Plan

27 Previous Reinforcement Learning Work
Dialogue control ~ Markov Decision Process: states (S1, S2, S3), actions (A), rewards (R)
Previous work: successes in small domains, e.g. NJFun [Singh, Kearns, Litman, Walker et al]
Problems
  Approach does not scale
  Once learned, policies are not reusable

28 Proposed Approach
Overcome previous shortcomings:
  Focus learning only on error handling
    Reduces the size of the learning problem
    Favors reusability of learned policies
    Lessens the system development effort
  Use a divide-and-conquer approach
    Leverage independences in dialogue

29 Decision Process Architecture
Concept-MDPs and Topic-MDPs attached to the task tree (RoomLine -> Login: Welcome, AskRegistered, AskName, GreetUser; concepts user_name, registered) each propose an action (e.g. No Action, Explicit Confirm); a gating mechanism chooses among them (independence assumption)
  Small-size models
  Parameters can be tied across models
  Accommodate dynamic task generation
  Favors reusability of policies
  Initial policies can be easily handcrafted

30 Reward Structure & Learning
Global, post-gate rewards: the reward follows the action selected by the gating mechanism
  Atypical, multi-agent reinforcement learning setting
Local rewards: each MDP action receives its own reward
  Multiple, standard RL problems
Rewards based on any dialogue performance metric
Model-based approaches
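One common form of model-based learning first estimates the MDP from logged experience and then plans against that estimate. The sketch below covers only the estimation half, using maximum-likelihood counts. The state and action labels echo the concept MDPs described in the additional slides, while the logged transitions and rewards are invented.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# One logged step: (state, action, reward, next_state).
Step = Tuple[str, str, float, str]


def estimate_model(steps: Iterable[Step]):
    """Maximum-likelihood estimates of P(s' | s, a) and mean reward R(s, a) from counts."""
    next_counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    reward_sums: Dict[Tuple[str, str], float] = defaultdict(float)
    for s, a, r, s_next in steps:
        next_counts[(s, a)][s_next] += 1
        reward_sums[(s, a)] += r
    P, R = {}, {}
    for sa, counter in next_counts.items():
        n = sum(counter.values())
        P[sa] = {s_next: k / n for s_next, k in counter.items()}
        R[sa] = reward_sums[sa] / n
    return P, R


# Invented log: explicit confirmation from low confidence usually reaches high confidence.
log = [
    ("LC", "ExplConf", -1.0, "HC"),
    ("LC", "ExplConf", -1.0, "HC"),
    ("LC", "ExplConf", -1.0, "LC"),
    ("HC", "NoAct", 10.0, "DONE"),
]
P, R = estimate_model(log)
print(P[("LC", "ExplConf")], R[("LC", "ExplConf")])
```

Under the local-reward option, each concept or topic MDP could be estimated this way from its own steps. With tied parameters, steps from several MDPs of the same type could be pooled into one estimate.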

31 Evaluation
Performance: compare learned policies with initial heuristic policies
  Metrics: task completion, efficiency, number and lengths of error segments, user satisfaction
Scalability
  Deploy in a system operating with a sizable task
  Theoretical analysis
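Error segments can be counted from per-turn error labels by collecting maximal runs of consecutive error turns. This sketch assumes the labels are already available, for example from manual annotation. The label sequence is a toy example.

```python
from typing import List


def error_segments(turn_has_error: List[bool]) -> List[int]:
    """Lengths of maximal runs of consecutive error turns, in dialogue order."""
    lengths: List[int] = []
    run = 0
    for is_error in turn_has_error:
        if is_error:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:  # dialogue ended inside an error segment
        lengths.append(run)
    return lengths


labels = [False, True, True, False, False, True, True, True]
segments = error_segments(labels)
print(len(segments), segments)  # number of error segments and their lengths
```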

32 Outline

33 Summary of Anticipated Contributions
Goal: develop a task-independent, adaptive and scalable framework for error recovery in task-oriented spoken dialogue systems
  Modern dialogue management framework
  Belief updating framework
  Investigation of an extended set of error handling strategies
  Scalable data-driven approach for learning error handling policies

34 Timeline
Milestones: proposal (now), milestone 1, milestone 2, milestone 3, defense; markers: end of year 4, end of year 5, 5.5 years; dates: February 2004, September 2004, January 2005, September 2005, December 2005
Data: data collection for belief updating and WOZ study; data collection for RL training; data collection for RL evaluation; contingency data collection efforts
Indicators: develop and evaluate the belief updating models; implement dialogue-on-track indicators
Strategies: misunderstanding and non-understanding strategies; evaluate non-understanding strategies; develop remaining strategies
Decisions: investigate theoretical aspects of proposed reinforcement learning model; error handling decision process: reinforcement learning experiments
Additional experiments: extensions or contingency work

35 Thank You! Questions & Comments

36 Additional Slides

37

38 Understanding Process
Recognition -> Parsing -> Contextual Interpretation
Errors in spoken dialogue systems:
  System does not acquire information: non-understanding (non-understanding indicators / turn-level strategies)
  System acquires correct information: OK
  System acquires incorrect information: misunderstanding (belief updating / concept-level strategies)
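The three outcomes of the understanding process can be written as a small classification function. Because it needs the user's intended value, it only applies to offline labeling. The pairing of each outcome with an error handling layer follows the slide, while the function and label names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    OK = "ok"
    MISUNDERSTANDING = "misunderstanding"
    NON_UNDERSTANDING = "non-understanding"


# Which error handling layer deals with each outcome, following the slide.
HANDLER = {
    Outcome.NON_UNDERSTANDING: "non-understanding indicators / turn-level strategies",
    Outcome.MISUNDERSTANDING: "belief updating / concept-level strategies",
    Outcome.OK: "no recovery needed",
}


def classify_turn(acquired: Optional[str], intended: str) -> Outcome:
    """Label one turn offline, given what the system acquired and what the user meant."""
    if acquired is None:  # nothing usable came out of recognition, parsing, interpretation
        return Outcome.NON_UNDERSTANDING
    if acquired == intended:
        return Outcome.OK
    return Outcome.MISUNDERSTANDING


# Turns from the Chicago example: "Urbana Champaign" was not understood,
# "Chicago" was understood, "Huntsville" was recognized as Seoul.
for acquired, intended in [(None, "Urbana Champaign"), ("Chicago", "Chicago"), ("Seoul", "Huntsville")]:
    outcome = classify_turn(acquired, intended)
    print(intended, "->", outcome.value, "|", HANDLER[outcome])
```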

39 Structure of Individual MDPs
Concept MDPs
  State-space: belief indicators (states HC, MC, LC, plus a state labeled 0)
  Action-space: concept-scoped system actions (ExplConf, ImplConf, NoAct in each state)
Topic MDPs
  State-space: non-understanding, dialogue-on-track indicators
  Action-space: non-understanding actions, topic-level actions
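The sketch below shows how a single concept MDP could be solved once its model is known, by running value iteration over LC/MC/HC states with ExplConf, ImplConf and NoAct actions. Several parts are assumptions: reading LC/MC/HC as low/medium/high confidence, treating the final state as absorbing, and every reward, probability and the discount factor.

```python
from typing import Dict, Tuple

STATES = ["LC", "MC", "HC"]            # low / medium / high confidence (assumed reading)
ACTIONS = ["ExplConf", "ImplConf", "NoAct"]
DONE = "DONE"                          # absorbing final state with value 0

# Hypothetical model: (state, action) -> (reward, {next_state: probability}).
MODEL: Dict[Tuple[str, str], Tuple[float, Dict[str, float]]] = {}
for state, accept_reward in (("LC", -8.0), ("MC", 2.0), ("HC", 9.0)):
    MODEL[(state, "NoAct")] = (accept_reward, {DONE: 1.0})        # accept the value as-is
    MODEL[(state, "ExplConf")] = (-2.0, {"HC": 0.9, "LC": 0.1})   # costly but reliable
MODEL[("LC", "ImplConf")] = (-1.0, {"HC": 0.2, "MC": 0.2, "LC": 0.6})
MODEL[("MC", "ImplConf")] = (-1.0, {"HC": 0.6, "MC": 0.3, "LC": 0.1})
MODEL[("HC", "ImplConf")] = (-1.0, {"HC": 0.9, "MC": 0.1})


def q_value(state: str, action: str, V: Dict[str, float], gamma: float) -> float:
    reward, next_states = MODEL[(state, action)]
    return reward + gamma * sum(p * V[s2] for s2, p in next_states.items())


def value_iteration(gamma: float = 0.95, tol: float = 1e-9):
    """In-place (Gauss-Seidel) Bellman optimality backups until values stop changing."""
    V = {s: 0.0 for s in STATES + [DONE]}
    while True:
        delta = 0.0
        for s in STATES:
            best = max(q_value(s, a, V, gamma) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(ACTIONS, key=lambda a: q_value(s, a, V, gamma)) for s in STATES}
    return V, policy


V, policy = value_iteration()
print(policy)
```

With these invented numbers, the resulting policy confirms explicitly at low confidence, confirms implicitly at medium confidence, and takes no action at high confidence. This mirrors the usual handcrafted heuristic; different rewards would move these boundaries.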

40 Gating Mechanism
Heuristic derived from domain-independent dialogue principles:
  Give priority to entities closer to the conversational focus
  Give priority to topics over concepts
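The gating heuristic can be sketched as a sort over the actions proposed by the individual MDPs. Two choices here are assumptions: focus distance is applied first, with the topic-over-concept preference used only as a tie-breaker, and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Proposal:
    entity: str           # topic or concept the MDP is attached to
    kind: str             # "topic" or "concept"
    focus_distance: int   # 0 = at the conversational focus, larger = further away
    action: str           # action proposed by that entity's MDP


def gate(proposals: List[Proposal]) -> Optional[Proposal]:
    """Pick one proposal: closest to the focus first, then topics before concepts."""
    active = [p for p in proposals if p.action != "NoAction"]
    if not active:
        return None
    return min(active, key=lambda p: (p.focus_distance, 0 if p.kind == "topic" else 1))


# Hypothetical snapshot while the system is in the AskRegistered topic of RoomLine's Login.
proposals = [
    Proposal("AskRegistered", "topic", 0, "NoAction"),
    Proposal("registered", "concept", 0, "ExplicitConfirm"),
    Proposal("user_name", "concept", 1, "ExplicitConfirm"),
]
chosen = gate(proposals)
print(chosen.entity, chosen.action)
```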

41 Task-independence / Reusability
Argument: architecture (error indicators, strategies and the error handling decision process are part of the domain-independent dialogue engine, separate from the dialogue task specification)
Proof: deployment across multiple RavenClaw systems

42 Adaptable
Argument: reinforcement learning approach
Proof: longer-term evaluation of adaptability (extension work item)

43 Scalable
Argument: architecture
Proof: deployment and experiments with systems with large tasks

44 Scalability of Reinforcement Learning
NJFun: 3 concepts, 7 state variables, 62 states; learned a policy from 311 dialogues
Consider 12 concepts (RoomLine: 20, CMU Let's Go!: 27): 242 states
  State-space: grows 4 times
  Number of parameters: grows 16 times
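The growth factors above are consistent with a simple calculation. Here S_3 and S_12 are the state spaces for 3 and 12 concepts, and the calculation assumes that the model's parameters are dominated by transition probabilities, whose count grows as |S|^2|A| for a fixed action set A:

```latex
\frac{|S_{12}|}{|S_{3}|} = \frac{242}{62} \approx 3.9 \approx 4,
\qquad
\frac{|S_{12}|^{2}\,|A|}{|S_{3}|^{2}\,|A|} = \left(\frac{242}{62}\right)^{2} \approx 15.2 \approx 16
```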

45 Extension Work Items
Portability of confidence annotation and belief updating schemes
  Use domain-independent features
  Train in one domain, test in another
Use of unlabeled data
  Self-training and co-training to improve performance in a new domain
Evaluation of adaptability/reusability
  Reusability: migrate policies from one domain to another
  Adaptability: monitor model changes and system behavior throughout a period of extended use

46 Study by [Shin et al]
Labeled error segments in 141 dialogs from multiple Communicator systems
  1.66 error segments per session
  22% of the error segments never get back on track
  37% of sessions have breakdowns in interaction

47 Error Indicators, Strategies & Decision Process
On the RoomLine task tree (Login: Welcome, AskRegistered, AskName, GreetUser; concepts user_name, registered): error indicators, strategies (e.g. No Action, Explicit Confirm), and the decision process (Concept-MDPs and Topic-MDPs combined by a gating mechanism)

48 An example
S: Where do you want to go to?
U: I wanna go to Grand Rapids Michigan on December first [I WANNA GO TO GRAND RAPIDS MICHIGAN FOLLOWING DECEMBER BOSTON]
S: traveling to Boston… departing Pittsburgh on what date?
U: traveling to Grand Rapids Michigan [TRAVELING TO GRAND RAPIDS MICHIGAN]
S: traveling to Grand Rapids… on what date would you like to travel?
U: December first [DECEMBER FOR SHIT]
S: sorry, I'm not sure I understood what you said. What day did you need to travel?
U: Friday December first [FRIDAY DECEMBER FOURTH]
S: a flight on Monday, December fourth… and what time would you like to leave Pittsburgh Monday, December fourth
U: traveling on Friday December first [COMING FRIDAY DECEMBER]
S: the date you gave was ambiguous. Did you mean November twenty-third or November twenty-fourth?
U: traveling on Friday December first [TRAVELING ON FRIDAY DECEMBER]
S: a flight on Friday, November twenty-fourth… and departing Pittsburgh at what time?
U: traveling the following week on Friday [IM TRAVELING FOLLOWING WEEK ON FRIDAY]