Detecting Misunderstandings in the CMU Communicator Spoken Dialog System
Presented by: Dan Bohus
Joint work with: Paul Carpenter, Chun Jin, Daniel Wilson, Rong Zhang, Alex Rudnicky
Carnegie Mellon University – 2002
What's a Spoken Dialog System?
- A human talking to a computer
- Taking turns in a goal-oriented dialog
Why Spoken Language Interfaces?
Speech: advantages and problems
- Speech is the natural communication modality for humans
- Can easily express fairly complex structures
- Works well in hands- or eyes-busy situations
- Serial channel
- It is still an unreliable channel
Sample Spoken Dialog Systems
- Interactive Voice Response systems (IVR)
- Information Access Systems
  - Air-travel planning (Communicator)
  - Weather info over the phone (Jupiter)
  - E-mail access over the phone (ELVIS)
  - UA Baggage claims (Simon)
- Other Systems: guidance, personal assistants, taskable agents, etc.
A Look Under the Hood …
  S:  Where are you flying from?
  U:  from London to Paris and then on to Toronto
  D:  from London to Paris on then on to go on to
  SI: depart_location = London, arrive_location = Paris
  SO: query depart_time
  NL: And when do you want to leave?
  S:  And when do you want to leave?
[Diagram: U -> Speech Recognition -> D -> Semanticizer (Parsing) -> SI -> Dialog Management (with Backend) -> SO -> Language Generation -> NL -> Synthesis -> S]
Roadmap
- Intro to Spoken Dialog Systems
- The Problem: Misunderstandings
- A Learning Solution
- Experiments and Results
- Conclusion
Speech Recognition
- Speech Recognition is the main drive behind the development of SDS.
- But it is problematic:
  - Input signal quality
  - Accents, non-native speakers
  - Spoken language disfluencies: stutters, false starts, /mm/, /um/
- Typical Word Error Rates: 20-30%
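Word Error Rate is conventionally the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the decoded hypothesis, divided by the number of reference words. A minimal sketch of that computation follows; the function name is illustrative, and the example pair is the London/Paris/Toronto utterance used elsewhere in this deck.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


transcript = "from London to Paris and then on to Toronto"
decoded = "from London to Paris on then on to go on to"
wer = word_error_rate(transcript, decoded)
print(f"WER = {wer:.1%}")
```

Because insertions count as errors, a single badly decoded utterance can score well above the corpus-level average.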
Recognition Errors Impact
- Errors propagate to upper levels:
  - They compromise the parse, causing non-understandings
  - They propagate to the Dialog Control level, causing misunderstandings, which can have a higher cost if acted upon
[Diagram: Speech Recognition, Semanticizer (Parsing), Dialog Management, Backend, Language Generation, Synthesis pipeline]
Misunderstandings in Action!
  S: Are you a registered user?
  U: No I'm not. No [NO I'M NOT NO]
  S: What is your full name?
  U: Adam Schumacher [ATHENS IN AKRON]
  S: There is an Athens in Georgia and in Greece. Which destination did you want?
  U: Neither [NEITHER]
  S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want?
  U: Georgia [GEORGIA]
  S: A flight from Athens... Where do you want to go?
  U: Start over [START OVER]
  S: Do you really want to start over?
  U: Yes [YES]
  S: What city are you leaving from?
  U: Hamilton, Ontario [HILTON ONTARIO]
  S: Sorry, I'm not sure I understood what you said. Where are you leaving from?
  U: Hamilton [HILTON]
  S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
  U: Toronto [TORONTO]
Addressing the Problem
- Wait for SR technology to reach better performance.
- Increase the robustness of systems when faced with poor recognition:
  - Detect Misunderstandings
  - Use Recovery Techniques
Problem Formulation
- Given an input utterance and the current state of the system, detect whether the utterance was correctly perceived by the system or not (the confidence annotation problem).
Roadmap
- Intro to Spoken Dialog Systems
- The Problem: Detecting Misunderstandings
- A Learning Solution
- Experiments and Results
- Conclusion
A Classification Task
- Cast the problem as a classification task
- Heuristic approach
  - "Garble" rule previously used in Communicator
- Data-driven (learning) approach
[Diagram: Utterance (Features) -> Classifier -> GOOD / BAD]
A Data-Driven Approach
- Machine learning approach
  - Learn to classify from a labeled training corpus
  - Use it to classify new instances
[Diagram: training: Features + GOOD/BAD -> Classifier (Learn Mode); classification: Features -> Classifier -> GOOD/BAD]
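The two modes (learn from labeled data, then classify new instances) can be illustrated with the simplest possible learner: a threshold on one numeric feature. This is only a sketch under stated assumptions, not one of the classifiers evaluated in this work; the function name and the training values are hypothetical.

```python
from typing import Callable, List


def train_stump(values: List[float], labels: List[str]) -> Callable[[float], str]:
    """Learn mode: choose the threshold on a single numeric feature that
    minimizes training error, predicting BAD above the threshold."""
    best_threshold, best_errors = None, len(labels) + 1
    for threshold in sorted(set(values)):
        errors = sum(
            ("BAD" if v > threshold else "GOOD") != y
            for v, y in zip(values, labels)
        )
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    # Classification mode: the returned function labels new instances.
    return lambda v: "BAD" if v > best_threshold else "GOOD"


# Hypothetical training data: fraction of words not covered by the parse.
train_values = [0.0, 0.05, 0.10, 0.36, 0.50, 0.80]
train_labels = ["GOOD", "GOOD", "GOOD", "BAD", "BAD", "BAD"]

classify = train_stump(train_values, train_labels)
print(classify(0.02), classify(0.60))
```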
Ingredients
Three ingredients are needed for a machine learning approach:
- A corpus of labeled data to use for training
- A set of relevant features
- A classification technique
Roadmap
- Intro to Spoken Dialog Systems
- The Problem: Misunderstandings
- A Learning Solution
  - Training corpus
  - Features
  - Classification techniques
- Experiments and Results
- Conclusion
Corpus – Sources
- Collected 2 months of sessions
  - October and November 1999
  - About 300 sessions
  - Both developer and outsider calls
- Eliminated conversations with < 5 turns
  - Developers calling to check if the system is on-line
  - Wrong number calls
Corpus – Structure
- The Logs
  - Generated automatically by various system modules
  - Serve as a source of features for classification (also contain the decoded utterances)
- The Transcripts (the actual utterances)
  - Performed and double-checked by a human annotator
  - Provide a basis for labeling
Corpus – Labeling
- Labeling was done at the concept level.
- Four possible labels:
  - OK: The concept is okay
  - RBAD: Recognition is bad
  - PBAD: Parse is bad
  - OOD: Out of domain
- Aggregate utterance labels were generated automatically.
Corpus – Sample Labeling
  Transcript:        #noise# from London to Paris and then on to Toronto #noise#
  Decoded utterance: from London to Paris on then on to go on to
  Parse:             depart_loc:[from London] arrive_loc:[to Paris] interj:[then] resume:[go on]
  Labeling:          depart_loc:OK arrive_loc:OK interj:OK resume:RBAD
  Aggregate Label:   BAD
- Only 6% of the utterances actually contained mixed-type concept labels!
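The slides do not spell out the automatic aggregation rule. A minimal sketch, assuming an utterance is GOOD only when every concept is labeled OK and BAD otherwise, with a separate flag for utterances that mix OK and non-OK concepts (function and constant names are illustrative):

```python
from typing import Dict, Tuple

CONCEPT_LABELS = {"OK", "RBAD", "PBAD", "OOD"}


def aggregate_label(concept_labels: Dict[str, str]) -> Tuple[str, bool]:
    """Return (utterance label, is_mixed) from per-concept labels.

    Assumed rule: GOOD only if all concepts are OK; is_mixed marks
    utterances containing both OK and non-OK concepts.
    """
    if not concept_labels:
        raise ValueError("utterance has no labeled concepts")
    unknown = set(concept_labels.values()) - CONCEPT_LABELS
    if unknown:
        raise ValueError(f"unknown concept labels: {sorted(unknown)}")
    is_ok = [label == "OK" for label in concept_labels.values()]
    utterance_label = "GOOD" if all(is_ok) else "BAD"
    is_mixed = any(is_ok) and not all(is_ok)
    return utterance_label, is_mixed


sample = {"depart_loc": "OK", "arrive_loc": "OK", "interj": "OK", "resume": "RBAD"}
print(aggregate_label(sample))
```

Under this rule the sample utterance aggregates to BAD and is flagged as mixed.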
Corpus – Summary
- Started with 2 months of dialog sessions
- Eliminated short, ill-formed sessions
- Transcribed the corpus
- Labeled it at the concept level
- Discarded mixed-label utterances
- Result: 4550 binary labeled utterances, 311 dialogs
Features – Sources
- Traditionally, features are extracted from the Speech Recognition layer [Chase].
- In an SDS, there are at least 2 other orthogonal knowledge sources:
  - The Parser
  - The Dialog Manager
[Diagram: feature sources: Speech, Parsing, Dialog]
Features – Speech Recog.
- WordNumber (11)
- UnconfidentPerc = % of unconfident words (9%)
  - this feature already captures other decoder-level features
  Transcript: #noise# from London to Paris and then on to Toronto #noise#
  Decoded:    from London to Paris on then on to ?go? on to
  Parse:      depart_loc:[from London] arrive_loc:[to Paris] interj:[then] resume:[?go? on]
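As a rough illustration, both speech-level features can be read off the decoded string, assuming (as in the example on this slide) that unconfident words are wrapped in question marks, e.g. `?go?`. The function name is hypothetical.

```python
from typing import Dict


def speech_features(decoded: str) -> Dict[str, float]:
    """Compute WordNumber and UnconfidentPerc from a decoded utterance in
    which unconfident words are marked as ?word? (assumed notation)."""
    words = decoded.split()
    if not words:
        raise ValueError("decoded utterance is empty")
    unconfident = [
        w for w in words
        if len(w) > 2 and w.startswith("?") and w.endswith("?")
    ]
    return {
        "WordNumber": len(words),
        "UnconfidentPerc": 100.0 * len(unconfident) / len(words),
    }


features = speech_features("from London to Paris on then on to ?go? on to")
print(features["WordNumber"], round(features["UnconfidentPerc"]))
```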
Features – Parser Level
- UncoveredPerc = % of words uncovered by the parse (36%)
- GapNumber = # of unparsed fragments (3)
- FragmentationScore = # of transitions between parsed and unparsed fragments (5)
- Garble = flag computed by a heuristic rule based on parse coverage and fragmentation
  Transcript: #noise# from London to Paris and then on to Toronto #noise#
  Decoded:    from London to Paris on then on to ?go? on to
  Parse:      depart_loc:[from London] arrive_loc:[to Paris] interj:[then] resume:[?go? on]
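The three coverage features can be derived from a per-word mask saying whether each decoded word falls inside a parsed concept. The sketch below assumes such a mask is available (the function name and the mask representation are illustrative) and reproduces the values quoted for the example utterance.

```python
from typing import Dict, List


def parser_features(covered: List[bool]) -> Dict[str, float]:
    """covered[i] is True when decoded word i is inside a parsed concept."""
    if not covered:
        raise ValueError("utterance has no words")
    uncovered = covered.count(False)
    # A gap starts at every uncovered word not preceded by another uncovered word.
    gaps = sum(
        1 for i, c in enumerate(covered)
        if not c and (i == 0 or covered[i - 1])
    )
    # A transition is any adjacent pair of words with different coverage.
    transitions = sum(a != b for a, b in zip(covered, covered[1:]))
    return {
        "UncoveredPerc": 100.0 * uncovered / len(covered),
        "GapNumber": gaps,
        "FragmentationScore": transitions,
    }


# from London to Paris | on | then | on to | go on | to
mask = [True, True, True, True, False, True, False, False, True, True, False]
parse_feats = parser_features(mask)
print(round(parse_feats["UncoveredPerc"]), parse_feats["GapNumber"],
      parse_feats["FragmentationScore"])
```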
Features – Parser Level (2)
- ConceptBigram = bigram concept model score:
  $P(c_1 \ldots c_n) \approx P(c_n \mid c_{n-1}) \, P(c_{n-1} \mid c_{n-2}) \cdots P(c_2 \mid c_1) \, P(c_1)$
  - Probabilities trained from a corpus
- ConceptNumber (4)
  Transcript: #noise# from London to Paris and then on to Toronto #noise#
  Decoded:    from London to Paris on then on to ?go? on to
  Parse:      depart_loc:[from London] arrive_loc:[to Paris] interj:[then] resume:[?go? on]
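A minimal sketch of the bigram concept score, computed in log space to avoid underflow on long concept sequences. The probability tables below are hypothetical placeholders (the real ones are trained from a corpus), and the `floor` value for unseen events is an assumption, not something the slides specify.

```python
import math
from typing import Dict, List, Tuple


def concept_bigram_log_score(
    concepts: List[str],
    unigram: Dict[str, float],
    bigram: Dict[Tuple[str, str], float],
    floor: float = 1e-6,
) -> float:
    """log P(c_1) + sum over i of log P(c_i | c_{i-1}).

    bigram[(prev, curr)] holds P(curr | prev); unseen events get `floor`.
    """
    if not concepts:
        raise ValueError("no concepts in the parse")
    log_p = math.log(unigram.get(concepts[0], floor))
    for prev, curr in zip(concepts, concepts[1:]):
        log_p += math.log(bigram.get((prev, curr), floor))
    return log_p


# Hypothetical probabilities, for illustration only.
unigram = {"depart_loc": 0.4}
bigram = {
    ("depart_loc", "arrive_loc"): 0.5,
    ("arrive_loc", "interj"): 0.1,
    ("interj", "resume"): 0.05,
}
score = concept_bigram_log_score(
    ["depart_loc", "arrive_loc", "interj", "resume"], unigram, bigram
)
print(score)
```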
Features – Dlg Mng. Level
- DialogState = the current state of the DM
- StateDuration = for how many turns the DM has remained in the same state
- TurnNumber = how many turns since the beginning of the session
- ExpectedConcepts = indicates whether the concepts correspond to the expectation of the DM
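One way these four features might be derived from a per-turn log of dialog states is sketched below. The state names, the log format, and the exact definition of ExpectedConcepts (here: every parsed concept is among those the DM expects) are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Union


def dialog_features(
    state_history: List[str],
    parsed_concepts: List[str],
    expected_concepts: Set[str],
) -> Dict[str, Union[str, int, bool]]:
    """state_history lists the DM state at each turn, oldest first."""
    if not state_history:
        raise ValueError("no turns in the session")
    current = state_history[-1]
    # Count how many consecutive most-recent turns share the current state.
    duration = 0
    for state in reversed(state_history):
        if state != current:
            break
        duration += 1
    return {
        "DialogState": current,
        "StateDuration": duration,
        "TurnNumber": len(state_history),
        "ExpectedConcepts": bool(parsed_concepts)
        and set(parsed_concepts) <= expected_concepts,
    }


history = ["greeting", "query_depart_loc", "query_depart_loc", "query_depart_loc"]
dm_feats = dialog_features(history, ["depart_loc"], {"depart_loc", "arrive_loc"})
print(dm_feats)
```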
Features – Summary
12 features from 3 levels in the system:
- Speech Level Features:
  - WordNumber, UnconfidentPerc
- Parsing Level Features:
  - UncoveredPerc, FragmentationScore, GapNumber, Garble, ConceptBigram, ConceptNumber
- Dialog Management Level Features:
  - DialogState, StateDuration, TurnNumber, ExpectedConcepts
Classification Techniques
- Bayesian Networks
- Boosting
- Decision Tree
- Artificial Neural Networks
- Support Vector Machine
- Naïve Bayes
Roadmap
- Intro to Spoken Dialog Systems
- The Problem: Detecting Misunderstandings
- A Learning Approach
  - Training corpus
  - Features
  - Classification techniques
- Experiments and Results
- Conclusion
Experimental Setup
- Performance metric: classification error rate
- 2 performance baselines:
  - "Random" baseline = 32.84%
  - "Heuristic" baseline = 25.69%
- Used a 10-fold cross-validation process
  - Build confidence intervals for the error rates
  - Do statistical analysis of the differences in performance exhibited by the classifiers
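A sketch of how a 10-fold cross-validated error rate with a 95% confidence interval could be computed. The interval uses a Student t critical value for 9 degrees of freedom, which is one common choice and not necessarily the procedure used in these experiments. The majority-class trainer is a stand-in baseline; whether it matches the "Random" baseline above is an assumption.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple

K = 10               # number of folds
T_CRIT_95 = 2.262    # two-sided 95% t critical value for K - 1 = 9 d.o.f.


def cross_validate(
    features: Sequence,
    labels: Sequence[str],
    train: Callable[[List, List[str]], Callable],
    seed: int = 0,
) -> Tuple[float, Tuple[float, float]]:
    """Return (mean error, 95% confidence interval) over K folds."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::K] for i in range(K)]
    errors = []
    for held_out in folds:
        held = set(held_out)
        train_x = [features[i] for i in indices if i not in held]
        train_y = [labels[i] for i in indices if i not in held]
        predict = train(train_x, train_y)
        wrong = sum(predict(features[i]) != labels[i] for i in held_out)
        errors.append(wrong / len(held_out))
    mean = statistics.mean(errors)
    half_width = T_CRIT_95 * statistics.stdev(errors) / math.sqrt(K)
    return mean, (mean - half_width, mean + half_width)


def train_majority(train_x: List, train_y: List[str]) -> Callable:
    """Stand-in baseline: always predict the most frequent training label."""
    majority = max(set(train_y), key=train_y.count)
    return lambda x: majority


# Hypothetical toy corpus: 30% BAD utterances, features unused by the baseline.
toy_labels = ["BAD"] * 30 + ["GOOD"] * 70
toy_features = list(range(100))
mean_err, (ci_low, ci_high) = cross_validate(toy_features, toy_labels, train_majority)
print(f"{mean_err:.2%} [{ci_low:.2%}, {ci_high:.2%}]")
```

Any classifier can be plugged in as `train`, as long as it returns a function mapping one feature vector to a label.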
Results – Individual Features
  Rank  Feature             Level            Mean Err.
  1.    UncoveredPerc       Parsing          19.93%
  2.    ExpectedConcepts    Dialog Manag.    20.97%
  3.    GapNumber           Parsing          23.01%
  4.    ConceptBigram       Parsing          23.14%
  5.    Garble              Parsing/Recog.   25.32%
  6.    ConceptNumber       Parsing          25.69%
  7.    UnconfidentPerc     Recognition      27.34%
  8.    DialogState         Dialog Manag.    31.03%
  9.    WordNumber          Recognition      32.33%
  10.   FragmentationScore  Parsing          32.73%
  11.   StateDuration       Dialog Manag.    32.84%
  12.   TurnNumber          Dialog Manag.    33.14%
Results – Classifiers
  Classifier             Mean Error
  Random Baseline        32.84%
  "Heuristic" Baseline   25.69%
  AdaBoost               16.59%
  Decision Tree          17.32%
  Bayesian Network       17.82%
  SVM                    18.40%
  Neural Network         18.90%
  Naïve Bayes            21.65%
An In-Depth Look at Error Rates
                        OK    BAD
  Classifier says OK    TP    FP
  Classifier says BAD   FN    TN
- FP = False acceptance
- FN = False rejection
- Error Rate = FP + FN
- CDR = TN/(TN+FP) = 1-(FP/NBAD)
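The quantities above follow directly from the four confusion-matrix counts. A small sketch, with rates expressed as fractions of all utterances; the counts in the example are hypothetical, and the function name is illustrative.

```python
from typing import Dict


def error_breakdown(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Rates from confusion counts, where 'positive' means classifier says OK.

    FP = BAD utterance accepted (false acceptance)
    FN = OK utterance rejected (false rejection)
    """
    total = tp + fp + fn + tn
    n_bad = tn + fp  # number of truly BAD utterances (NBAD)
    if total == 0 or n_bad == 0:
        raise ValueError("need at least one utterance and one BAD utterance")
    return {
        "FP rate": fp / total,
        "FN rate": fn / total,
        "Error rate": (fp + fn) / total,
        "CDR": tn / n_bad,  # equivalently 1 - fp / n_bad
    }


# Hypothetical counts for 100 utterances, 35 of them truly BAD.
rates = error_breakdown(tp=60, fp=10, fn=5, tn=25)
print(rates)
```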
Results – Classifiers (cont'd)
  Classifier             Mean Error   FP Rate   FN Rate
  Random Baseline        32.84%                 0.00%
  "Heuristic" Baseline   25.32%       25.30%    0.02%
  AdaBoost               16.59%       11.43%    5.16%
  Decision Tree          17.32%       11.82%    5.49%
  Bayesian Network       17.82%       9.41%     8.42%
  SVM                    18.40%       15.01%    3.39%
  Neural Network         18.90%       15.08%    3.82%
  Naïve Bayes            21.65%       14.24%    7.41%
- 77.4% correct detection rate
Conclusion
- Spoken Dialog System performance is strongly impaired by misunderstandings
- Increase the robustness of systems when faced with poor recognition:
  - Detect Misunderstandings
  - Use Recovery Techniques
Conclusion (cont'd)
- Data-driven classification task
  - Corpus
  - 12 features from 3 levels in the system
  - Empirically compared 6 classification techniques
- Data-Driven Misunderstanding Detector
  - Significant improvement over the previous heuristic classifier
  - Correctly detects 74% of the misunderstandings
Future Work
- Detect Misunderstandings
  - Improve performance by adding new features
  - Identify the source of the error
- Use Recovery Techniques
  - Incorporate the confidence score into the Dialog Management process
Pointers
- "Is This Conversation On Track?", P. Carpenter, C. Jin, D. Wilson, R. Zhang, D. Bohus, A. Rudnicky, Eurospeech 2001, Aalborg, Denmark
- CMU Communicator