Error Handling in the RavenClaw Dialog Management Framework
Dan Bohus, Alexander I. Rudnicky
Computer Science Department, Carnegie Mellon University

(infrastructure & architecture)

1. RavenClaw dialog management

Systems built:
- information access: Let's Go! Bus Information, RoomLine
- guidance through procedures: LARRI, IPA
- taskable agent: Vera
- command-and-control: TeamTalk

RoomLine:
- conference room reservations
- live schedules for 13 rooms in 2 buildings on campus
- room attributes: size, location, A/V equipment
- recognition: Sphinx-2 [3-gram]
- parsing: Phoenix
- synthesis: Cepstral Theta
- demo: RoomLine

(current research)

3. Belief updating

- problem: confidence scores provide an initial assessment of the reliability of the information obtained from the user. However, a system should leverage the information available in subsequent user responses to update and improve the accuracy of its beliefs.
- goal: bridge confidence annotation and correction detection in a unified framework for belief updating in task-oriented spoken dialog systems
- approach:
  - machine learning (generalized linear models)
  - integrate features from multiple knowledge sources in the system
  - work with compressed beliefs:
    - top hypothesis + other [ASRU-2005 paper]
    - k hypotheses + other [work in progress]
- results: the data-driven approach significantly outperforms common heuristics

Sample problem:
S: Where would you like to fly from?
U: [Boston/0.45]; [Austin/0.30]
S: Sorry, did you say you wanted to fly from Boston?
U: [No/0.37] + [Aspen/0.7]
Updated belief = ? [Boston/?; Austin/?; Aspen/?]
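Below is a minimal sketch of the "top hypothesis + other" idea, applied to the sample problem above. It assumes a logistic-regression model (one kind of generalized linear model) over a few features of the user's reply to the explicit confirmation. The feature set and the weights are hypothetical and were invented for illustration. In the approach described here, the model would instead be trained on labeled dialog data, using features drawn from multiple knowledge sources in the system.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConfirmationResponse:
    """HYPOTHETICAL features of the user's reply to an explicit confirmation."""
    initial_confidence: float  # confidence of the top hypothesis before confirming
    said_no: bool              # the reply parsed as a "no"
    no_confidence: float       # confidence score on that "no" (0.0 if absent)
    new_value: bool            # the reply also contained a new value for the concept


# HYPOTHETICAL weights, invented for illustration; a real model would be learned from data.
WEIGHTS: Dict[str, float] = {
    "bias": -1.0,
    "initial_confidence": 3.0,
    "said_no": -2.0,
    "no_confidence": -2.0,
    "new_value": -1.0,
}


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def update_top_hypothesis(r: ConfirmationResponse) -> float:
    """Logistic regression: P(top hypothesis is correct | features of the reply)."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["initial_confidence"] * r.initial_confidence
         + WEIGHTS["said_no"] * float(r.said_no)
         + WEIGHTS["no_confidence"] * r.no_confidence
         + WEIGHTS["new_value"] * float(r.new_value))
    return sigmoid(z)


def compressed_belief(top_value: str, r: ConfirmationResponse) -> Dict[str, float]:
    """'Top hypothesis + other': all remaining probability mass goes to <other>."""
    p = update_top_hypothesis(r)
    return {top_value: p, "<other>": 1.0 - p}


# Sample problem: the system confirms Boston; the user answers [No/0.37] + [Aspen/0.7].
reply = ConfirmationResponse(initial_confidence=0.45, said_no=True,
                             no_confidence=0.37, new_value=True)
belief = compressed_belief("Boston", reply)
print(belief)
```

With these made-up weights, the "No" and the new value together push the belief in Boston well below its initial 0.45. All remaining mass is lumped into a single <other> entry, which is what keeps the compressed representation cheap to update.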
Bohus and Rudnicky, "Constructing Accurate Beliefs in Spoken Dialog Systems", in ASRU-2005.

[Figure: belief updating error rates, for updates following explicit confirmation and updates following implicit confirmation; methods compared: initial, heuristic, proposed, oracle]

4. Rejection threshold optimization

- a data-driven approach for tuning state-specific rejection thresholds in a spoken dialog system

Bohus and Rudnicky, "A Principled Approach for Rejection Threshold Optimization in Spoken Dialog Systems", to be presented at Interspeech.

Transfer of confidence annotators across domains (work in progress, in collaboration with Antoine Raux)

- migrate (adapt) a confidence annotator trained with data from domain A to domain B, without any labeled data in the new domain (B)

Misunderstanding recovery strategies

- Explicit Confirmation: "Did you say you wanted a room on Friday?"
- Implicit Confirmation: "a room on Friday … for what time?"

Non-understanding recovery strategies

- AskRepeat: "Can you please repeat that?"
- AskRephrase: "Could you please try to rephrase that?"
- Reprompt: "Would you like a small or a large room?"
- DetailedReprompt: "Sorry, I'm not sure I understood you correctly. Right now I need to know if you would prefer a small or a large room."
- Notify: "Sorry, I didn't catch that …"
- Yield: Ø (no prompt)
- MoveOn: "Sorry, I didn't catch that. One choice would be Wean Hall. Would you like a reservation for this room?"
- YouCanSay: "Sorry, I didn't catch that. Right now I'm trying to find out if you would prefer a small room or a large one. You can say 'I want a small room' or 'I want a large room'. If the size of the room doesn't matter to you, just say 'I don't care'."
- TerseYouCanSay
- Full-Help: "Sorry, I didn't catch that. So far I found 5 rooms matching your constraints. Right now I'm trying to find out if you would prefer a small room or a large one. You can say 'I want a small room' or 'I want a large room'. If the size of the room doesn't matter to you, just say 'I don't care'."

Informed vs. uninformed non-understanding recovery

- question: can dialog performance be improved by using a better, more informed policy for engaging non-understanding recovery strategies?
- approach: a between-groups experiment
  - control group: the system chooses a non-understanding strategy randomly (i.e., in an uninformed fashion)
  - wizard group: a human wizard chooses which strategy should be used whenever a non-understanding happens
  - 23 participants in each condition
  - first-time users, balanced by gender × native language
  - each participant attempted a maximum of 10 scenario-based interactions
  - evaluated global dialog performance (task success) and various local non-understanding recovery performance metrics (see side panel)
- results: the wizard policy outperforms the uninformed recovery policy on a number of global and local metrics

[Figure: wizard policy vs. uninformed policy, shown separately for non-natives and natives: avg. task success rate, avg. recovery WER, avg. recovery concept utility, avg. recovery efficiency]

Bohus and Rudnicky, "Sorry, I Didn't Catch That! – an Investigation of Non-understanding Errors and Recovery Strategies", in SIGdial.

2. RavenClaw error handling architecture

RavenClaw is a dialog management framework for complex, task-oriented domains. A Dialog Task Specification (DTS) is a hierarchical plan that captures the domain-specific dialog control logic. The Dialog Engine "executes" a given DTS.
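The separation between a task-specific DTS and a task-independent Dialog Engine can be sketched as follows. This is a heavily simplified illustration and not RavenClaw's actual API. The Agent class, its kind labels, and all prompts are hypothetical. The tree mirrors the RoomLine task shown in the poster, where the i:, r: and x: prefixes appear to mark inform, request and execute agents. The engine here simply walks the tree depth-first with an explicit stack and pops an agency as soon as its children are pushed. It ignores the expectation agenda and error handling, whereas a real engine would keep an agency on the stack until its subtask completes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass
class Agent:
    """A node in a HYPOTHETICAL, simplified dialog task specification."""
    name: str
    kind: str                      # "agency", "inform", "request" or "execute"
    prompt: str = ""
    concept: Optional[str] = None  # concept bound by a request agent
    children: List["Agent"] = field(default_factory=list)


def run_dialog(root: Agent, user_turns: Iterable[str],
               backend: Callable[[Dict[str, str]], None]) -> Tuple[Dict[str, str], List[str]]:
    """Task-independent engine: depth-first execution of any task tree using a stack."""
    concepts: Dict[str, str] = {}
    transcript: List[str] = []
    turns = iter(user_turns)
    stack: List[Agent] = [root]
    while stack:
        agent = stack.pop()
        if agent.kind == "agency":
            # Push children in reverse so that the first child runs first.
            stack.extend(reversed(agent.children))
        elif agent.kind == "inform":
            transcript.append("S: " + agent.prompt)
        elif agent.kind == "request":
            transcript.append("S: " + agent.prompt)
            answer = next(turns)  # scripted input stands in for recognition + parsing
            transcript.append("U: " + answer)
            concepts[agent.concept] = answer
        elif agent.kind == "execute":
            backend(concepts)
    return concepts, transcript


# The RoomLine task as a tree (agent names from the poster; prompts are hypothetical).
roomline = Agent("RoomLine", "agency", children=[
    Agent("Welcome", "inform", prompt="Welcome to RoomLine."),
    Agent("GetQuery", "agency", children=[
        Agent("GetDate", "request", prompt="For which day do you need the room?", concept="date"),
        Agent("GetStartTime", "request", prompt="Starting at what time?", concept="start_time"),
        Agent("GetEndTime", "request", prompt="Until what time?", concept="end_time"),
    ]),
    Agent("DoQuery", "execute"),
    Agent("DiscussResults", "inform", prompt="Here are the rooms I found."),
])


def fake_backend(concepts: Dict[str, str]) -> None:
    concepts["query_done"] = "yes"  # stand-in for the room-schedule query


concepts, transcript = run_dialog(roomline, ["tomorrow", "two p.m.", "four p.m."], fake_backend)
print("\n".join(transcript))
```

Because run_dialog knows nothing about rooms, the same engine could run any other task tree. Only the tree (the DTS) would change.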
RavenClaw is also a platform for research:
- error handling [this poster]
- multi-participant dialog (Thomas Harris)
- turn-taking (Antoine Raux)

[Figure: error handling decision process in RoomLine. Dialog task specification: RoomLine → i:Welcome, GetQuery (r:GetDate, r:GetStartTime, r:GetEndTime), x:DoQuery, DiscussResults. The dialog engine maintains a dialog stack (RoomLine, GetQuery, GetStartTime, ExplConf(start_time)) and an expectation agenda (date: [date]; start_time: [start_time] [time]; end_time: [end_time] [time]; location: [location]; network: [with_network] → true, [without_network] → false). Sample exchange:
System: For when do you need the room?
User: Let's try two to four p.m.
Parse: [time](two) [end_time](four)
System: Did you say you wanted the room starting at two p.m.?]

- error handling strategies are implemented as library dialog agents
- new strategies can be plugged in as they are developed
- goal: a task-independent, adaptive and scalable error handling architecture
- approach:
  - error handling strategies and the error handling decision process are decoupled from the dialog task → reusability, uniformity, plug-and-play strategies, less development effort
  - the error handling decision process is implemented in a distributed fashion:
    - local concept error handling decision processes (handle potential misunderstandings)
    - local request error handling decision processes (handle non-understandings)
  - currently implemented as POMDPs (for concepts) and MDPs (for request agents)

[Figure: concept error handling MDPs and request error handling MDPs distributed across the dialog task (concepts: date, start_time, end_time)]

Learning policies for recovering from non-understandings

- question: can we learn a better policy from data?
- decision-theoretic approach:
  - learn to predict the likelihood of success for each strategy
  - use features available at runtime
  - stepwise logistic regression (good class posterior probabilities)
  - compute the expected utility of each strategy
  - choose the strategy with maximum expected utility
- preliminary results are promising; a new experiment is needed for validation

Predicting likelihood of success (majority baseline error → cross-validation error):

  Strategy           Majority baseline error   Cross-validation error
  Reprompt           49.2%                     32.8%
  YouCanSay          48.6%                     34.3%
  TerseYouCanSay     43.5%                     32.6%
  MoveOn             35.6%                     30.0%
  DetailedReprompt   37.7%                     34.4%

For 5 of the 10 strategies, the models perform better than a majority baseline, on both soft and hard error.
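The decision-theoretic selection step can be sketched as follows. Everything numeric in it is hypothetical: the two runtime features, the per-strategy logistic-regression weights, and the utilities of success and failure. In the approach described above, the weights would come from stepwise logistic regression on collected data. The sketch only aims to show that the strategy with the highest predicted success probability need not be the one with the highest expected utility.

```python
import math
from typing import Dict, List, Tuple

# HYPOTHETICAL runtime features: [consecutive non-understandings, confidence of last turn].
# HYPOTHETICAL per-strategy logistic models: (bias, feature weights).
MODELS: Dict[str, Tuple[float, List[float]]] = {
    "AskRepeat": (0.2, [-0.8, 0.5]),
    "Reprompt":  (0.0, [-0.3, 1.0]),
    "YouCanSay": (-0.4, [0.4, 0.2]),
    "MoveOn":    (-0.6, [0.6, -0.1]),
}

# HYPOTHETICAL utilities: MoveOn abandons the current question, so its success is worth less.
SUCCESS_UTILITY: Dict[str, float] = {"AskRepeat": 1.0, "Reprompt": 1.0, "YouCanSay": 0.9, "MoveOn": 0.6}
FAILURE_UTILITY = -0.5


def p_success(strategy: str, features: List[float]) -> float:
    """Predicted probability that the strategy recovers from the non-understanding."""
    bias, weights = MODELS[strategy]
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def expected_utility(strategy: str, features: List[float]) -> float:
    p = p_success(strategy, features)
    return p * SUCCESS_UTILITY[strategy] + (1.0 - p) * FAILURE_UTILITY


def choose_strategy(features: List[float]) -> str:
    """Pick the strategy with maximum expected utility."""
    return max(MODELS, key=lambda s: expected_utility(s, features))


features = [2, 0.3]  # two non-understandings in a row, low-confidence last turn
for s in MODELS:
    print(f"{s:10s} P(success)={p_success(s, features):.3f} EU={expected_utility(s, features):+.3f}")
print("chosen:", choose_strategy(features))
```

With these invented numbers, MoveOn has the highest predicted success probability for the features above. YouCanSay still wins on expected utility, because a success reached via MoveOn is (hypothetically) worth less.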