Belief Updating in Spoken Dialog Systems
Dialogs on Dialogs Reading Group, June 2005
Dan Bohus, Carnegie Mellon University

2 Misunderstandings
- Misunderstandings are an important problem in spoken dialog systems
- The system obtains an incorrect semantic interpretation of the user's utterance
- They affect 15-40% of turns
- Significant negative impact on overall success rate

3 Confidence annotation
- Use confidence scores to guard against potential misunderstandings
- Traditionally: confidence scores from the speech recognition engine [Chase, Bansal, Cox, Kemp, etc.]
  - Focused on WER, not tuned to the task at hand
- More recently: system-specific semantic confidence scores [Carpenter, Walker, San-Segundo, etc.]
  - Integrate knowledge from different levels in the system: speech recognition, language understanding, dialog management

4 Correction Detection
- Detect whether or not the user is trying to correct the system
- Related: aware-site detection
- Similar ML approaches using multiple sources of knowledge [Litman, Swerts, Krahmer, etc.]

5 Proposed: Belief Updating
S: Where are you flying from?
U: [CityName={Aspen/0.6; Austin/0.2}]
S: Did you say you wanted to fly out of Aspen?
U: [No/0.6] [CityName={Boston/0.8}]
- Integrate confidence annotation and correction detection in a unified framework for continuously tracking beliefs
  [CityName={Aspen/?; Austin/?; Boston/?}]
- A "belief updating" problem: initial belief + system action + user response → updated belief

6 Formally…
- Given:
  - An initial belief P_initial(C) over concept C
  - A system action SA
  - A user response R
- Construct an updated belief P_updated(C)
  - As "accurate" as possible
- P_updated(C) ← f(P_initial(C), SA, R)
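To make the shape of f concrete, here is a minimal sketch with a toy hand-written heuristic standing in for the learned model; the function name, the confidence adjustments, and the renormalization step are all illustrative assumptions, not the approach developed in this talk.

```python
# Toy sketch of f(P_initial(C), SA, R): a hand-written heuristic update,
# NOT the learned belief-updating model discussed in these slides.

def update_belief(initial_belief, system_action, user_said_yes, user_said_no):
    """Return an updated distribution over hypotheses for one concept.

    initial_belief: dict hypothesis -> confidence, e.g. {"Aspen": 0.6, "Austin": 0.2}
    system_action:  "EC", "IC" or "ICT"; the confirmed hypothesis is assumed to be the top one
    """
    updated = dict(initial_belief)
    top = max(updated, key=updated.get)
    if system_action == "EC":
        if user_said_yes:
            updated[top] = min(1.0, updated[top] + 0.3)  # boost the confirmed hypothesis
        elif user_said_no:
            updated[top] = 0.05                          # strongly discount it
    total = sum(updated.values())
    if total > 1.0:                                      # keep scores summing to <= 1
        updated = {h: v / total for h, v in updated.items()}
    return updated

# e.g. update_belief({"Aspen": 0.6, "Austin": 0.2}, "EC",
#                    user_said_yes=False, user_said_no=True)
```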

7 Examples

8 Examples - continued

9 Outline
- Introduction
- Data
- A simplified version of the problem. Approach
- User behaviors
- Learning: preliminary results
- More on evaluation
- Where to from here?

10 Data
- Collected in an experiment with RoomLine
  - Phone-based, mixed-initiative system for making conference room reservations
  - Equipped with explicit and implicit confirmations
- Corpus statistics
  - 46 participants
  - 449 sessions, 8278 turns
  - 13.5% misunderstandings [9.8% / 22.5%]
  - 25.6% WER [19.6% / 39.5%]
  - concept updates

11 System actions and concept updates
- Explicit and implicit confirmations
  Start time: Explicit Confirmation/grounding [EC]
  Date: Implicit Confirmation/grounding [IC]

12 System actions and concept updates
- Implicit Confirmations / Task
  Date: Implicit Confirmation/grounding [IC]
  Start time: Implicit Confirmation/grounding [IC]
  End time: Implicit Confirmation/task [ICT]

13 # of Conflicting Hypotheses
- Below 3% involve more than 1 hypothesis
  - System not using multiple hypotheses
  - [Future work: regenerate multiple hypotheses in batch]

14 Outline
- Introduction
- Data
- A simplified version of the problem. Approach
- User behaviors
- Learning: preliminary results
- More on evaluation
- Where to from here?

15 A Simplified Version
- Given that fewer than 3% of updates involve more than one hypothesis, update the belief in the top hypothesis after implicit and explicit confirmations
- Instead of
  P_updated(C) ← f(P_initial(C), SA, R)
- Do
  ConfTop_updated(C) ← f(ConfTop_initial(C), SA, R)
- For SA = {EC, IC, ICT}

16 Approach
- Use machine learning
- Dataset
  - Concept updates for EC, IC, ICT
- Features
  - Initial confidence score ConfTop_initial(C)
  - System action (SA)
  - User response (R)
- Target
  - Updated confidence score ConfTop_updated(C)
  - Data is labeled, so we have a binary target
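A hypothetical example of what one row of such a dataset could look like; the feature names below are invented for illustration (the actual feature set appears on the "Data. Features" slide):

```python
# One hypothetical training example for the belief-updating learner: a concept
# update described by the initial score, the system action, and user-response
# features, plus a binary label. Feature names are illustrative only.
example = {
    "initial_confidence": 0.62,      # ConfTop_initial(C)
    "system_action": "IC",           # one of EC / IC / ICT
    "response_has_yes_marker": 0,
    "response_has_no_marker": 1,
    "response_num_words": 7,
}
label_top_hypothesis_correct = 0     # target: was the top hypothesis correct?
```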

17 Outline
- Introduction
- Data
- A simplified version of the problem. Approach
- User behaviors
- Learning: preliminary results
- More on evaluation
- Where to from here?

18 User behaviors
- Study of user behaviors in response to ICs and ECs
  - Can inform feature selection and feature development
  - Provide insights into where the difficulties are
  - Can inform potential strategy refinements

19 User responses to ECs
- Transcripts
               YES                    NO                    Other
  CORRECT      1097 [94.2% of cor]    8                     62
  INCORRECT    3                      202 [69.9% of inc]    84
- Decoded
               YES                    NO                    Other
  CORRECT      1016 [87.3% of cor]
  INCORRECT    2                      171 [69.9% of inc]    116
  ~10%

20 “Other” Responses to EC
- “Eyeball” estimates (out of 146 responses)
  - ~70% simply repeat the correct concept value
    (that should come in as a handy feature)
  - ~10% change conversation focus
  - ~10% turn overtaking issues
    (maybe inhibit barge-in until Antoine finishes his thesis)
  - ~10% other

21 User responses to ICs
- Transcripts
               YES                   NO                   Other
  CORRECT      166 [31.3% of cor]
  INCORRECT    15                    75 [31.5% of inc]    148
- Decoded
               YES                   NO                   Other
  CORRECT      151 [28.5% of cor]
  INCORRECT    16                    62 [26.1% of inc]    160

22 Users Don’t Always Correct ICs
- Actually, they corrected in 45% of the cases
                 User does not correct    User corrects
  CORRECT        557                       1
  INCORRECT      126 [55% of incor]        104 [45% of incor]
- That means if we knew exactly when they correct, we’d still have (126+1)/788 = 16% error
- So what do users do when they don’t correct?
  - They may actually correct partially
  - Completely ignore the error … (if non-essential)
  - Readjust to accommodate the task

23 More questions…
- Better understand this “ignore” phenomenon
  - Impact on task success?
    IC correction rate: 49% (successful tasks) vs 41% (unsuccessful)
    Fixed vs more “flexible” scenarios
  - Impact of prompt length on P(user will correct)?
  - “Essential” vs “non-essential” concepts?

24 Outline
- Introduction
- Data
- A simplified version of the problem. Approach
- User behaviors
- Learning: preliminary results
- More on evaluation
- Where to from here?

25 Which ML technique?
- Need good probability outputs
  - Margins produced by discriminant classifiers are inadequate
  - If you want probability scores, i.e. conf = 0.85 means that in 85% of cases with conf = 0.85 the concept is right, evaluate on a soft metric [I'll contradict myself later!]
- Step-wise logistic regression
  - Sample-efficient
  - Feature selection
  - Good soft-metric performance: optimizes the average log-likelihood of the data
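The "conf = 0.85 should be right 85% of the time" reading is a calibration statement; below is a minimal sketch of how one might check it on held-out predictions (the arrays are placeholders, not the RoomLine data):

```python
import numpy as np

# Minimal calibration check: bin predicted confidences and compare the mean
# predicted confidence in each bin with the empirical accuracy in that bin.
conf = np.array([0.92, 0.81, 0.85, 0.30, 0.22, 0.74, 0.95, 0.10])   # placeholder scores
correct = np.array([1, 1, 1, 0, 0, 1, 1, 0])                        # placeholder labels

edges = np.linspace(0.0, 1.0, 6)              # five equal-width bins
bin_ids = np.digitize(conf, edges[1:-1])
for b in range(5):
    mask = bin_ids == b
    if mask.any():
        print(f"bin {b}: mean conf {conf[mask].mean():.2f}, "
              f"accuracy {correct[mask].mean():.2f}")
```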

26 Data. Features
- For each system action {EC, IC, ICT}
- Initial confidence score
- Other indicators about current state:
  - How well has the dialog been going
  - Which concept are we talking about
  - How far back was this concept acquired
- Features on user response
  - Confirmation and disconfirmation markers
  - Acoustic / prosodic: f0 (min, max, range, maxslope, etc.) + normalized versions
  - Num words; turn length (secs)
  - Concept information: expected / repeated / new concepts and grammar slots…
  - Confidence
  - Barge-in & timeout info
  - Lexical features (preselected by MI with “target” or confirm/disconfirm markers)

27 Results
- Actually using a 1-level logistic model-tree
  - Split on answer_type = {yes, no, other, no_parse}
  - Perform step-wise logistic regression on the 4 leaves
    - P-entry = 0.05, P-reject = 0.30
    - BIC stopping criterion
- Also tried a full-blown model tree; results are similar, maybe marginally worse
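A simplified sketch of the one-level model-tree idea using scikit-learn: plain logistic regression on each answer_type leaf, standing in for the stepwise (P-entry/P-reject, BIC) procedure, which scikit-learn does not provide out of the box. Data shapes and the fallback value are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch: split the concept updates on answer_type, fit a separate logistic
# regression per leaf, and route each new update to its leaf at prediction time.
def fit_model_tree(X, y, answer_types):
    models = {}
    for leaf in ("yes", "no", "other", "no_parse"):
        mask = answer_types == leaf
        # only fit a leaf that has examples of both classes
        if mask.any() and len(np.unique(y[mask])) == 2:
            models[leaf] = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])
    return models

def predict_confidence(models, x, answer_type):
    model = models.get(answer_type)
    if model is None:
        return 0.5                       # arbitrary fallback for an unmodeled leaf
    return model.predict_proba(x.reshape(1, -1))[0, 1]
```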

28 Explicit Confirmation
                    HARD      SOFT
  Initial           31.1%
  Heuristic          8.6%
  LMT (CV)           3.7%
  LMT (training)     2.9%

29 Implicit Confirmation
                    HARD      SOFT
  Initial           31.4%
  Heuristic         24.0%
  LMT (CV)          19.6%
  LMT (training)    18.8%
  Oracle Baseline   16.1%     -

30 Outline
- Introduction
- Data
- A simplified version of the problem. Approach
- User behaviors
- Learning: preliminary results
- More on evaluation
- Where to from here?

31 What can Logistic Regression / AVG-LL do for you?
- D = {d_1, d_2, d_3, d_4, …}, d_i = 1/0
- P(D) = ∏ P(d_i | x_i)
- Express the density P(d_i | x_i) as:
  - P(d=1 | x) = 1 / (1 + exp(-wx))
    (you can actually derive this if you start with P(x | d) Gaussian)
- Find parameters w to max P(D)
  - argmax P(D) = argmax ∏ P(d_i | x_i)
  - argmax P(D) = argmin ∑ -log P(d_i | x_i)
- Hence we maximize the average log-likelihood
- But what does that mean?
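In code, the quantity being maximized is just the mean log-probability the model assigns to each observed label; a short sketch:

```python
import numpy as np

def avg_log_likelihood(w, X, d):
    """Average log-likelihood of binary labels d under P(d=1 | x) = 1/(1+exp(-w.x))."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return np.mean(d * np.log(p) + (1 - d) * np.log(1 - p))
```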

32 Loss function in Logistic Regression
- Log-likelihood loss function
  - If d=1, then P(d=1)=0.01 is ten times worse than P(d=1)=0.1, but P(d=1)=0.7 is about the same as P(d=1)=0.8
  - Things are mirrored for d=0
- This does not match the "threshold" model commonly used to engage actions
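The asymmetry is easy to see numerically; the snippet below just prints the standard per-example log-loss -log(p) at the probabilities mentioned above:

```python
import math

# Per-example log-loss for a positive example (d = 1):
for p in (0.01, 0.1, 0.7, 0.8):
    print(f"P(d=1|x) = {p:4.2f}  ->  -log(p) = {-math.log(p):.3f}")
# 0.01 assigns ten times less probability to the true label than 0.1
# (a log-loss gap of ~2.3), while 0.7 vs 0.8 differ by only ~0.13.
```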

33 A New Loss Function: T2
- A loss function that better matches our domain: T2 (or even T3)
- Optimize argmax ∑ T2(P(d_i = c | x_i))
- Not differentiable
- Not convex
[Figure: piecewise-constant T2 loss over the confidence axis (0, t1, t2, 1), with region costs C1, C2 for d=1 and C3, C4 for d=0]

34 Smoothed version
- A loss function that better matches our domain: T2 (or even T3)
- Optimize argmax ∑ SmoothT2(P(d_i = c | x_i))
- Differentiable!
- But still not convex … multiple local maxima
- SmoothT2(p) = σ_1(p) + σ_2(p), where σ_i(p) = 1 / (1 + exp(k_i (p - θ_i))), with the k's and θ's chosen accordingly
[Figure: smoothed T2 loss over the confidence axis (0, t1, t2, 1) for d=1, with region costs C1, C2]
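A direct reading of the SmoothT2 formula as code; the slide only says the k's and θ's are "chosen accordingly", so the slopes and centers below (tied to thresholds t1, t2 for a d=1 example) are illustrative assumptions.

```python
import numpy as np

def sigma(p, k, theta):
    # the sigmoid building block from the slide: 1 / (1 + exp(k (p - theta)))
    return 1.0 / (1.0 + np.exp(k * (p - theta)))

def smooth_t2(p, t1=0.35, t2=0.75, k=-40.0):
    # Two soft steps centered at the thresholds t1 and t2. With negative k each
    # sigma rises from ~0 to ~1 as p crosses its threshold, so the summed
    # objective rewards pushing P(d=1|x) past t1 and then past t2 for a d=1
    # example (mirrored for d=0). The t1, t2 and k values are placeholders.
    return sigma(p, k, t1) + sigma(p, k, t2)
```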

35 Costs & Thresholds  Costs: where from?  “Expert” knowledge  Derive from data (might be tricky)  Thresholds: where from?  Fixed  Actually optimize at the same time SmoothT2 = SmoothT2(w, th1, th2) Differentiable in th1 and th2, so we can do gradient search for it Calibrates in one step both the belief updating and the threshold to minimize loss data : problem/approach : user behaviors : preliminary results : more on evaluation : what next?

36 Questions: What Next?
- ICT: can we do anything there?
  - Looks really tough
- Push for better performance
  - … Add more features?
  - … Debug the models more, eliminate singularities
  - … Why doesn’t the model-tree do better?
- Push for better understanding
  - … What are the other interesting questions …
- Optimize for the new loss function
- More in the future: look at the full belief updating problem

37 Thank You!

38 Encoding System Actions
- For each concept update, define a system action signature:
  - IC: Implicit Confirm [grounding]
  - ICT: Implicit Confirm [task]
  - EC: Explicit Confirm
  - REQ: Request
- Each variable can have 1 of 4 values
  - 0
  - C (action happens on the concept of interest)
  - OC (action happens on some other concept)
  - C&OC (action happens both on the concept of interest and some other concept)
- Only certain combinations are valid and appear in the data
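For concreteness, the action signature of one concept update could be written as a small record like the hypothetical sketch below, with field values drawn from the four-way coding above:

```python
# Hypothetical encoding of one concept update's action signature. Each of the
# four action types takes one of the four values:
#   "0"     - the action did not occur in this turn
#   "C"     - it occurred on the concept of interest
#   "OC"    - it occurred on some other concept
#   "C&OC"  - it occurred on both
action_signature = {
    "IC":  "C",     # the concept of interest was implicitly confirmed (grounding)
    "ICT": "0",
    "EC":  "0",
    "REQ": "OC",    # some other concept was requested in the same turn
}
```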