Error Handling in the RavenClaw Dialog Management Framework
Dan Bohus, Alexander I. Rudnicky
Computer Science Department, Carnegie Mellon University

(infrastructure & architecture)

1 RavenClaw dialog management

systems built:
- information access: Let's Go! Bus Information, RoomLine
- guidance through procedures: LARRI, IPA
- taskable agent: Vera
- command-and-control: TeamTalk

demo: RoomLine
- conference room reservations
- live schedules for 13 rooms in 2 buildings on campus
- room attributes: size, location, a/v equipment
- recognition: Sphinx-2 [3-gram]
- parsing: Phoenix
- synthesis: Cepstral Theta

(current research)

3 belief updating
- problem: confidence scores provide an initial assessment of the reliability of the information obtained from the user. However, a system should leverage the information available in subsequent user responses to update its beliefs and improve their accuracy.
- goal: bridge confidence annotation and correction detection in a unified framework for belief updating in task-oriented spoken dialog systems
- approach:
  - machine learning (generalized linear models)
  - integrate features from multiple knowledge sources in the system
  - work with compressed beliefs:
    - top hypothesis + other [ASRU-2005 paper]
    - k hypotheses + other [work in progress]
- results: the data-driven approach significantly outperforms common heuristics

sample problem:
S: Where would you like to fly from?
U: [Boston/0.45]; [Austin/0.30]
S: Sorry, did you say you wanted to fly from Boston?
U: [No/0.37] + [Aspen/0.7]
Updated belief = ? [Boston/?; Austin/?; Aspen/?]
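To make the sample problem concrete, here is a minimal Python sketch of a single belief update under the "top hypothesis + other" compression, using a logistic-regression GLM. The feature set (prior log-odds, confidence of a yes or no answer) and all weights are hypothetical and chosen only for illustration; the approach described above learns its model from data and draws on features from multiple knowledge sources. Because only the top hypothesis is tracked, Austin and Aspen both fall into "other"; splitting that mass is what the k hypotheses + other variant targets.

```python
import math
from typing import Dict


def sigmoid(z: float) -> float:
    """Logistic function, the inverse link of a binomial GLM."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients for illustration only; a real model would be
# trained on labeled dialog data.
WEIGHTS: Dict[str, float] = {
    "bias": 0.0,
    "initial_logit": 1.0,  # prior confidence in the top hypothesis
    "yes_conf": 2.0,       # confidence of a "yes" answer to the confirmation
    "no_conf": -2.5,       # confidence of a "no" answer to the confirmation
}


def update_after_explicit_confirmation(initial_conf: float,
                                       response: str,
                                       response_conf: float,
                                       weights: Dict[str, float] = WEIGHTS) -> Dict[str, float]:
    """Return a compressed belief {top, other} after an explicit confirmation turn."""
    features = {
        "initial_logit": logit(initial_conf),
        "yes_conf": response_conf if response == "yes" else 0.0,
        "no_conf": response_conf if response == "no" else 0.0,
    }
    z = weights["bias"] + sum(weights[name] * value for name, value in features.items())
    p_top = sigmoid(z)
    return {"top": p_top, "other": 1.0 - p_top}


# Sample problem: Boston/0.45 is confirmed and the user answers [No/0.37].
print(update_after_explicit_confirmation(0.45, "no", 0.37))
```

With these made-up weights, the [No/0.37] answer lowers the belief in Boston from 0.45 to roughly a quarter, while a confident "yes" would raise it above 0.45.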
[figure: belief error rates (axis 10% to 30%) for updates following explicit confirmation and updates following implicit confirmation; conditions: initial, heuristic, proposed, oracle]
Bohus and Rudnicky - “Constructing Accurate Beliefs in Spoken Dialog Systems”, in ASRU-2005

4 rejection threshold optimization
- data-driven approach for tuning state-specific rejection thresholds in a spoken dialog system
Bohus and Rudnicky - “A Principled Approach for Rejection Threshold Optimization in Spoken Dialog Systems”, to be presented at Interspeech

transfer of confidence annotators across domains
- migrate (adapt) a confidence annotator trained with data from domain A to domain B, without any labeled data in the new domain (B)
work in progress, in collaboration with Antoine Raux

misunderstanding recovery strategies
- Explicit Confirmation: Did you say you wanted a room on Friday?
- Implicit Confirmation: a room on Friday … for what time?

non-understanding recovery strategies
- AskRepeat: Can you please repeat that?
- AskRephrase: Could you please try to rephrase that?
- Reprompt: Would you like a small or a large room?
- DetailedReprompt: Sorry, I’m not sure I understood you correctly. Right now I need to know if you would prefer a small or a large room.
- Notify: Sorry, I didn’t catch that …
- Yield: Ø
- MoveOn: Sorry, I didn’t catch that. One choice would be Wean Hall 7220. Would you like a reservation for this room?
- YouCanSay: Sorry, I didn’t catch that. Right now I’m trying to find out if you would prefer a small room or a large one. You can say ‘I want a small room’ or ‘I want a large room’. If the size of the room doesn’t matter to you, just say ‘I don’t care’.
- TerseYouCanSay
- Full-Help: Sorry, I didn’t catch that. So far I found 5 rooms matching your constraints. Right now I’m trying to find out if you would prefer a small room or a large one. You can say ‘I want a small room’ or ‘I want a large room’. If the size of the room doesn’t matter to you, just say ‘I don’t care’.

- question: can dialog performance be improved by using a better, more informed policy for engaging non-understanding recovery strategies?
- approach: a between-groups experiment
  - control group: the system chooses a non-understanding strategy randomly (i.e., in an uninformed fashion)
  - wizard group: a human wizard chooses which strategy to use whenever a non-understanding happens
  - 23 participants in each condition
  - first-time users, balanced by gender x native language
  - each attempted a maximum of 10 scenario-based interactions
  - evaluated global dialog performance (task success) and various local non-understanding recovery performance metrics (see side panel)
- results: the wizard policy outperforms the uninformed recovery policy on a number of global and local metrics
[figure: wizard policy vs. uninformed policy for non-natives and natives; panels: avg. task success rate, avg. recovery WER, avg. recovery concept utility, avg. recovery efficiency]
Bohus and Rudnicky - “Sorry, I didn’t Catch That! – an Investigation of Non-understanding Errors and Recovery Strategies”, in SIGdial-2005

2 RavenClaw error handling architecture
- RavenClaw: a dialog management framework for complex, task-oriented domains
  - Dialog Task Specification (DTS): a hierarchical plan that captures the domain-specific dialog control logic
  - Dialog Engine: “executes” a given DTS
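A DTS executed by a dialog engine can be pictured as a tree of agents walked with an explicit stack. The sketch below is a toy Python illustration, not RavenClaw's actual interface: the Agent class, the run function and the rule that an agency simply runs its children left to right are assumptions; only the agent names come from the RoomLine diagram on this poster. Leaf agents just record their names here, where a real engine would prompt the user, wait for input or query a backend.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    """A node in a hypothetical task tree: an agency if it has children, else a leaf."""
    name: str
    children: List["Agent"] = field(default_factory=list)
    completed: bool = False

    def is_agency(self) -> bool:
        return bool(self.children)

    def next_child(self) -> Optional["Agent"]:
        # Assumption: an agency runs its children strictly left to right.
        for child in self.children:
            if not child.completed:
                return child
        return None


def run(root: Agent) -> List[str]:
    """Execute a task tree with an explicit dialog stack; return the leaf execution order."""
    trace: List[str] = []
    stack: List[Agent] = [root]
    while stack:
        top = stack[-1]
        if top.is_agency():
            child = top.next_child()
            if child is None:      # all children done: the agency completes
                top.completed = True
                stack.pop()
            else:                  # descend into the next unfinished child
                stack.append(child)
        else:
            trace.append(top.name)  # stand-in for prompting, listening or querying
            top.completed = True
            stack.pop()
    return trace


roomline = Agent("RoomLine", [
    Agent("Welcome"),
    Agent("GetQuery", [Agent("GetDate"), Agent("GetStartTime"), Agent("GetEndTime")]),
    Agent("DoQuery"),
    Agent("DiscussResults"),
])
print(run(roomline))
# → ['Welcome', 'GetDate', 'GetStartTime', 'GetEndTime', 'DoQuery', 'DiscussResults']
```

Keeping execution on an explicit stack also makes it easy to push an extra agent (for instance ExplConf(start_time), which appears on the dialog stack in the RoomLine diagram) on top of the current one and return to the task once it completes.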
RavenClaw also serves as a platform for research:
- error handling [this poster]
- multi-participant dialog (Thomas Harris)
- turn-taking (Antoine Raux)

- error handling strategies are implemented as library dialog agents
- new strategies can be plugged in as they are developed
- goal: a task-independent, adaptive and scalable error handling architecture
- approach:
  - the error handling strategies and the error handling decision process are decoupled from the dialog task → reusability, uniformity, plug-and-play strategies, less development effort
  - the error handling decision process is implemented in a distributed fashion:
    - a local concept error handling decision process handles potential misunderstandings
    - a local request error handling decision process handles non-understandings
  - currently implemented as POMDPs (for concepts) and MDPs (for request agents)

[figure: error handling decision process in RoomLine. Dialog task specification: RoomLine → i:Welcome, GetQuery (r:GetDate, r:GetStartTime, r:GetEndTime), x:DoQuery, DiscussResults. Dialog stack: GetQuery, GetStartTime, ExplConf(start_time). Expectation agenda: date: [date]; start_time: [start_time] [time]; end_time: [end_time] [time]; location: [location]; network: [with_network] → true, [without_network] → false. Each concept (date, start_time, end_time) has a concept error handling MDP; each request agent has a request error handling MDP.]
System: For when do you need the room?
User: Let’s try two to four p.m.
Parse: [time](two) [end_time](four)
System: Did you say you wanted the room starting at two p.m.?

learning policies for recovering from non-understandings
- question: can we learn a better policy from data?
- decision-theoretic approach:
  - learn to predict the likelihood of success for each strategy
  - use features available at runtime
  - stepwise logistic regression (good class posterior probabilities)
  - compute the expected utility of each strategy
  - choose the strategy with the maximum expected utility
- preliminary results are promising; a new experiment is needed for validation

predicting likelihood of success: for 5 of 10 strategies, the models perform better than a majority baseline, on both soft and hard error (majority baseline error → cross-validation error):
  Reprompt           49.2% → 32.8%
  YouCanSay          48.6% → 34.3%
  TerseYouCanSay     43.5% → 32.6%
  MoveOn             35.6% → 30.0%
  DetailedReprompt   37.7% → 34.4%
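The decision-theoretic policy described above can be sketched in a few lines of Python: one logistic model per strategy estimates the probability of success from runtime features, each strategy's expected utility combines that probability with payoffs for success and failure, and the strategy with the highest expected utility is chosen. Only three strategies are included; the feature names (asr_confidence, consecutive_nonunderstandings), the weights and the utilities are all hypothetical, and the weights are fixed by hand here rather than fitted with stepwise logistic regression.

```python
import math
from typing import Dict, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical per-strategy logistic models: bias plus one weight per runtime feature.
MODELS: Dict[str, Dict[str, float]] = {
    "Reprompt":  {"bias": 0.2,  "asr_confidence": 1.0, "consecutive_nonunderstandings": -0.6},
    "YouCanSay": {"bias": -0.1, "asr_confidence": 0.4, "consecutive_nonunderstandings": 0.3},
    "MoveOn":    {"bias": -0.4, "asr_confidence": 0.1, "consecutive_nonunderstandings": 0.5},
}

# Hypothetical (utility if recovery succeeds, utility if it fails) per strategy.
UTILITIES: Dict[str, Tuple[float, float]] = {
    "Reprompt": (1.0, -0.2),
    "YouCanSay": (1.0, -0.3),
    "MoveOn": (0.8, -0.1),
}


def success_probability(strategy: str, features: Dict[str, float]) -> float:
    """P(recovery succeeds | features) under the strategy's logistic model."""
    weights = MODELS[strategy]
    z = weights["bias"] + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(z)


def expected_utility(strategy: str, features: Dict[str, float]) -> float:
    p = success_probability(strategy, features)
    u_success, u_failure = UTILITIES[strategy]
    return p * u_success + (1.0 - p) * u_failure


def choose_strategy(features: Dict[str, float]) -> str:
    """Pick the strategy with the maximum expected utility."""
    return max(MODELS, key=lambda s: expected_utility(s, features))


print(choose_strategy({"asr_confidence": 0.8, "consecutive_nonunderstandings": 0}))  # Reprompt
print(choose_strategy({"asr_confidence": 0.1, "consecutive_nonunderstandings": 3}))  # YouCanSay
```

With these invented numbers, a non-understanding with high recognition confidence favors Reprompt, while repeated non-understandings with low confidence shift the choice to YouCanSay, whose more helpful prompt is modeled as more likely to succeed despite its higher cost on failure.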

