Detecting Misunderstandings in the CMU Communicator Spoken Dialog System

Presentation transcript:


2. Detecting Misunderstandings in the CMU Communicator Spoken Dialog System
Presented by: Dan Bohus
Joint work with: Paul Carpenter, Chun Jin, Daniel Wilson, Rong Zhang, Alex Rudnicky
Carnegie Mellon University, 2002

3. What's a Spoken Dialog System?
- A human talking to a computer
- Taking turns in a goal-oriented dialog

4. Why Spoken Language Interfaces?
- Speech: advantages and problems
  - Speech is the natural communication modality for humans
  - Can easily express fairly complex structures
  - Works well in hands- or eyes-busy situations
- Serial channel
- It is still an unreliable channel

5. Sample Spoken Dialog Systems
- Interactive Voice Response (IVR) systems
- Information access systems:
  - Air-travel planning (Communicator)
  - Weather info over the phone (Jupiter)
  - E-mail access over the phone (ELVIS)
  - UA baggage claims (Simon)
- Other systems: guidance, personal assistants, taskable agents, etc.

6. A Look Under the Hood …
S:  Where are you flying from?
U:  from London to Paris and then on to Toronto
D:  from London to Paris on then on to go on to
SI: depart_location = London, arrive_location = Paris
SO: query depart_time
NL: And when do you want to leave?
S:  And when do you want to leave?
[Pipeline diagram: U -> Speech Recognition -> D -> Semanticizer (Parsing) -> SI -> Dialog Management (+ Backend) -> SO -> Language Generation -> NL -> Synthesis -> S]
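
To make the turn flow concrete, here is a toy, self-contained Python sketch of the stage sequence above (recognition, semantic parsing, dialog management, language generation). Every function is a simplified stand-in with hypothetical names, not the actual Communicator implementation, and synthesis is omitted.

```python
# Toy sketch of one turn through an SDS pipeline; all functions are stand-ins.

def speech_recognizer(audio: str) -> str:
    # Stand-in: pretend the audio has already been decoded to (possibly errorful) text.
    return audio

def semantic_parser(decoded: str) -> dict:
    # Stand-in: naive keyword spotting for two travel concepts.
    words = decoded.lower().split()
    concepts = {}
    if "from" in words:
        concepts["depart_location"] = words[words.index("from") + 1].capitalize()
    if "to" in words:
        concepts["arrive_location"] = words[words.index("to") + 1].capitalize()
    return concepts

def dialog_manager(concepts: dict) -> str:
    # Stand-in policy: once both locations are known, ask for the departure time.
    if "depart_location" in concepts and "arrive_location" in concepts:
        return "query depart_time"
    return "query depart_location"

def language_generator(action: str) -> str:
    prompts = {"query depart_time": "And when do you want to leave?",
               "query depart_location": "Where are you flying from?"}
    return prompts.get(action, "Could you repeat that?")

decoded = speech_recognizer("from London to Paris and then on to Toronto")
print(language_generator(dialog_manager(semantic_parser(decoded))))
# -> "And when do you want to leave?"
```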

7. Roadmap
- Intro to Spoken Dialog Systems
- The Problem: Misunderstandings
- A Learning Solution
- Experiments and Results
- Conclusion

8. Speech Recognition
- Speech recognition is the main driving force behind the development of SDS.
- But it is problematic:
  - Input signal quality
  - Accents, non-native speakers
  - Spoken language disfluencies: stutters, false-starts, /mm/, /um/
- Typical word error rates: 20-30%

9. Recognition Errors Impact
- Errors propagate to upper levels:
  - They compromise the parse -> non-understandings
  - They propagate to the dialog control level -> misunderstandings, which can have a higher cost if acted upon.
[Pipeline diagram as on slide 6]

10. Misunderstandings in Action!
S: Are you a registered user?
U: No I'm not. No [NO I'M NOT NO]
S: What is your full name?
U: Adam Schumacher [ATHENS IN AKRON]
S: There is an Athens in Georgia and in Greece. Which destination did you want?
U: Neither [NEITHER]
S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want?
U: Georgia [GEORGIA]
S: A flight from Athens... Where do you want to go?
U: Start over [START OVER]
S: Do you really want to start over?
U: Yes [YES]
S: What city are you leaving from?
U: Hamilton, Ontario [HILTON ONTARIO]
S: Sorry, I'm not sure I understood what you said. Where are you leaving from?
U: Hamilton [HILTON]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Toronto [TORONTO]

11. Addressing the Problem
- Wait for SR technology to reach better performance.
- Increase the robustness of systems when faced with poor recognition:
  - Detect misunderstandings
  - Use recovery techniques

12. Problem Formulation
- Given an input utterance and the current state of the system, detect whether it was correctly perceived by the system or not (the confidence annotation problem).

13. Roadmap
- Intro to Spoken Dialog Systems
- The Problem: Detecting Misunderstandings
- A Learning Solution
- Experiments and Results
- Conclusion

14. A Classification Task
- Cast the problem as a classification task
- Heuristic approach: the "Garble" rule previously used in Communicator
- Data-driven (learning) approach
[Diagram: Utterance -> Classifier (Features) -> GOOD / BAD]

15. A Data-Driven Approach
- Machine learning approach:
  - Learn to classify from a labeled training corpus
  - Use it to classify new instances
[Diagram: Features -> Classifier (learn mode) -> GOOD/BAD; then Features -> Classifier -> GOOD/BAD]
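
As an illustration of the learn-then-classify loop, the sketch below uses scikit-learn with a randomly generated placeholder feature matrix (12 features per utterance) and placeholder GOOD/BAD labels; it is not the original experimental code.

```python
# Learn from a labeled corpus, then classify new instances (placeholder data).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_train = rng.random((500, 12))       # 12 features per utterance (placeholder)
y_train = rng.integers(0, 2, 500)     # 1 = GOOD, 0 = BAD (placeholder labels)

clf = DecisionTreeClassifier(max_depth=5)
clf.fit(X_train, y_train)             # learn mode: fit on the labeled corpus

X_new = rng.random((3, 12))           # features of new, unlabeled utterances
print(clf.predict(X_new))             # classify new instances as GOOD (1) / BAD (0)
```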

16. Ingredients
Three ingredients are needed for a machine learning approach:
- A corpus of labeled data for training
- A set of relevant features
- A classification technique

17. Roadmap
- Intro to Spoken Dialog Systems
- The Problem: Misunderstandings
- A Learning Solution
  - Training corpus
  - Features
  - Classification techniques
- Experiments and Results
- Conclusion

18. Corpus – Sources
- Collected 2 months of sessions:
  - October and November 1999
  - About 300 sessions
  - Both developer and outsider calls
- Eliminated conversations with < 5 turns:
  - Developers calling to check if the system is on-line
  - Wrong-number calls

19. Corpus – Structure
- The logs:
  - Generated automatically by various system modules
  - Serve as a source of features for classification (also contain the decoded utterances)
- The transcripts (the actual utterances):
  - Transcription performed and double-checked by a human annotator
  - Provide a basis for labeling

20. Corpus – Labeling
- Labeling was done at the concept level.
- Four possible labels:
  - OK: the concept is okay
  - RBAD: recognition is bad
  - PBAD: parse is bad
  - OOD: out of domain
- Aggregate utterance labels were generated automatically.

21. Corpus – Sample Labeling
- Only 6% of the utterances actually contained mixed-type concept labels!
Transcript: #noise# from London to Paris and then on to Toronto #noise#
Decoded utterance: from London to Paris on then on to go on to
Parse: depart_loc: [from London]  arrive_loc: [to Paris]  interj: [then]  resume: [go on]
Labeling: depart_loc: OK  arrive_loc: OK  interj: OK  resume: RBAD
Aggregate label: BAD
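
The exact rule for deriving the aggregate utterance label is not spelled out on the slides; the sketch below assumes the simplest rule consistent with this example, namely that an utterance is GOOD only when every one of its concepts is labeled OK.

```python
# Hypothetical aggregation rule: GOOD only if every concept label is OK.
def aggregate_label(concept_labels: dict) -> str:
    return "GOOD" if all(lab == "OK" for lab in concept_labels.values()) else "BAD"

# The example above: three OK concepts and one RBAD concept -> BAD.
print(aggregate_label({"depart_loc": "OK", "arrive_loc": "OK",
                       "interj": "OK", "resume": "RBAD"}))
```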

22. Corpus – Summary
- Started with 2 months of dialog sessions
- Eliminated short, ill-formed sessions
- Transcribed the corpus
- Labeled it at the concept level
- Discarded mixed-label utterances
- 4550 binary-labeled utterances
- 311 dialogs

23. Features – Sources
- Traditionally, features are extracted from the speech recognition layer [Chase].
- In an SDS, there are at least 2 other orthogonal knowledge sources:
  - The parser
  - The dialog manager
[Diagram: Speech / Parsing / Dialog levels -> Features]

24. Features – Speech Recognition Level
- WordNumber (11)
- UnconfidentPerc = % of unconfident words (9%)
  - This feature already captures other decoder-level features
Example:
Transcript: #noise# from London to Paris and then on to Toronto #noise#
Decoded: from London to Paris on then on to ?go? on to
Parse: depart_loc: [from London]  arrive_loc: [to Paris]  interj: [then]  resume: [?go? on]
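
A small illustration of how these two recognition-level features could be computed from the decoded string, assuming (as in the example) that unconfident words are the ones the decoder flags with question marks; this is a reconstruction from the feature definitions, not the original extractor.

```python
# Illustrative recognition-level features; ?word? marks an unconfident word.
def recognition_features(decoded: str) -> dict:
    words = decoded.split()
    unconfident = [w for w in words if w.startswith("?") and w.endswith("?")]
    return {"WordNumber": len(words),
            "UnconfidentPerc": 100.0 * len(unconfident) / len(words) if words else 0.0}

print(recognition_features("from London to Paris on then on to ?go? on to"))
# {'WordNumber': 11, 'UnconfidentPerc': 9.09...}  -- matches the (11) and (9%) above
```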

25. Features – Parser Level
- UncoveredPerc = % of words uncovered by the parse (36%)
- GapNumber = number of unparsed fragments (3)
- FragmentationScore = number of transitions between parsed and unparsed fragments (5)
- Garble = flag computed by a heuristic rule based on parse coverage and fragmentation
Example:
Transcript: #noise# from London to Paris and then on to Toronto #noise#
Decoded: from London to Paris on then on to ?go? on to
Parse: depart_loc: [from London]  arrive_loc: [to Paris]  interj: [then]  resume: [?go? on]
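
The three coverage features can be derived from a per-word boolean coverage vector (True where the word falls inside a parsed concept). The sketch below is a reconstruction from the definitions above; on this slide's example it reproduces the quoted values (36% uncovered, 3 gaps, fragmentation score 5).

```python
# Illustrative parser-level coverage features from a per-word coverage vector.
def parser_features(covered: list) -> dict:
    n = len(covered)
    uncovered_perc = 100.0 * covered.count(False) / n
    # number of maximal runs of uncovered words (unparsed fragments)
    gaps = sum(1 for i, c in enumerate(covered) if not c and (i == 0 or covered[i - 1]))
    # number of transitions between parsed and unparsed fragments
    fragmentation = sum(1 for i in range(1, n) if covered[i] != covered[i - 1])
    return {"UncoveredPerc": uncovered_perc,
            "GapNumber": gaps,
            "FragmentationScore": fragmentation}

# "from London to Paris | on | then | on to | ?go? on | to"
covered = [True, True, True, True, False, True, False, False, True, True, False]
print(parser_features(covered))
# {'UncoveredPerc': 36.36..., 'GapNumber': 3, 'FragmentationScore': 5}
```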

26. Features – Parser Level (2)
- ConceptBigram = bigram concept model score:
  P(c_1 ... c_n) ≈ P(c_n | c_n-1) · P(c_n-1 | c_n-2) · ... · P(c_2 | c_1) · P(c_1)
  - Probabilities trained from a corpus
- ConceptNumber (4)
Example:
Transcript: #noise# from London to Paris and then on to Toronto #noise#
Decoded: from London to Paris on then on to ?go? on to
Parse: depart_loc: [from London]  arrive_loc: [to Paris]  interj: [then]  resume: [?go? on]
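
The score chains the probability of the first concept with the conditional probability of each following concept given its predecessor. The sketch below works in log space to avoid underflow; the probability tables are made-up placeholders, since the real ones were estimated from the training corpus.

```python
import math

# Placeholder probabilities (the real tables were trained from the corpus).
P_FIRST = {"depart_loc": 0.4, "arrive_loc": 0.2}
P_BIGRAM = {("depart_loc", "arrive_loc"): 0.5,
            ("arrive_loc", "interj"): 0.1,
            ("interj", "resume"): 0.05}

def concept_bigram_score(concepts, floor=1e-4):
    # log P(c1 ... cn) ~= log P(c1) + sum_i log P(c_i | c_{i-1})
    score = math.log(P_FIRST.get(concepts[0], floor))
    for prev, cur in zip(concepts, concepts[1:]):
        score += math.log(P_BIGRAM.get((prev, cur), floor))
    return score

print(concept_bigram_score(["depart_loc", "arrive_loc", "interj", "resume"]))
```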

27. Features – Dialog Manager Level
- DialogState = the current state of the DM
- StateDuration = for how many turns the DM has remained in the same state
- TurnNumber = how many turns since the beginning of the session
- ExpectedConcepts = indicates whether the concepts correspond to the expectations of the DM
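
A sketch of how these dialog-manager features could be read off the DM's bookkeeping. The state structure used here is hypothetical, and the interpretation of ExpectedConcepts (1 when every parsed concept is among the concepts the current state expects) is an assumption based on the description above.

```python
# Illustrative DM-level features; the dm_state layout and ExpectedConcepts rule
# are assumptions, not the Communicator's actual internals.
def dm_features(dm_state: dict, parsed_concepts: list) -> dict:
    expected = set(dm_state["expected_concepts"])
    return {"DialogState": dm_state["state"],
            "StateDuration": dm_state["turns_in_state"],
            "TurnNumber": dm_state["turn_number"],
            "ExpectedConcepts": int(all(c in expected for c in parsed_concepts))}

dm_state = {"state": "query_depart_location", "turns_in_state": 2, "turn_number": 7,
            "expected_concepts": ["depart_loc", "arrive_loc"]}
print(dm_features(dm_state, ["depart_loc", "resume"]))   # ExpectedConcepts -> 0
```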

28. Features – Summary
12 features from 3 levels in the system:
- Speech level features:
  - WordNumber, UnconfidentPerc
- Parsing level features:
  - UncoveredPerc, FragmentationScore, GapNumber, Garble, ConceptBigram, ConceptNumber
- Dialog management level features:
  - DialogState, StateDuration, TurnNumber, ExpectedConcepts

29. Classification Techniques
- Bayesian Networks
- Boosting
- Decision Tree
- Artificial Neural Networks
- Support Vector Machine
- Naïve Bayes

30. Roadmap
- Intro to Spoken Dialog Systems
- The Problem: Detecting Misunderstandings
- A Learning Approach
  - Training corpus
  - Features
  - Classification techniques
- Experiments and Results
- Conclusion

31. Experimental Setup
- Performance metric: classification error rate
- 2 performance baselines:
  - "Random" baseline = 32.84%
  - "Heuristic" baseline = 25.69%
- Used a 10-fold cross-validation process to:
  - Build confidence intervals for the error rates
  - Do statistical analysis of the differences in performance exhibited by the classifiers
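
The evaluation procedure can be sketched as follows with scikit-learn: estimate the error rate with 10-fold cross-validation and attach a rough normal-approximation 95% confidence interval to the per-fold errors. The data here is a random placeholder, so only the procedure (not the numbers) matches the experiments.

```python
# 10-fold cross-validation error estimate with a rough 95% confidence interval.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((4550, 12))            # placeholder for the 4550 labeled utterances
y = rng.integers(0, 2, 4550)

acc = cross_val_score(DecisionTreeClassifier(max_depth=5), X, y, cv=10)
err = 1.0 - acc                       # per-fold error rates
mean, sem = err.mean(), err.std(ddof=1) / np.sqrt(len(err))
print(f"mean error {mean:.2%} +/- {1.96 * sem:.2%} (95% CI)")
```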

32. Results – Individual Features

Rank | Feature            | Level          | Mean Err.
1    | UncoveredPerc      | Parsing        | 19.93%
2    | ExpectedConcepts   | Dialog Manag.  | 20.97%
3    | GapNumber          | Parsing        | 23.01%
4    | ConceptBigram      | Parsing        | 23.14%
5    | Garble             | Parsing/Recog. | 25.32%
6    | ConceptNumber      | Parsing        | 25.69%
7    | UnconfidentPerc    | Recognition    | 27.34%
8    | DialogState        | Dialog Manag.  | 31.03%
9    | WordNumber         | Recognition    | 32.33%
10   | FragmentationScore | Parsing        | 32.73%
11   | StateDuration      | Dialog Manag.  | 32.84%
12   | TurnNumber         | Dialog Manag.  | 33.14%

33. Results – Classifiers

Classifier           | Mean Error
Random Baseline      | 32.84%
"Heuristic" Baseline | 25.69%
AdaBoost             | 16.59%
Decision Tree        | 17.32%
Bayesian Network     | 17.82%
SVM                  | 18.40%
Neural Network       | 18.90%
Naïve Bayes          | 21.65%

34. An In-Depth Look at Error Rates

                    | truly OK | truly BAD
Classifier says OK  | TP       | FP
Classifier says BAD | FN       | TN

FP = false acceptance
FN = false rejection
Error Rate = FP + FN (as fractions of all utterances)
CDR (correct detection rate) = TN / (TN + FP) = 1 - (FP / N_BAD)
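
All of the reported quantities follow directly from the four cells of this matrix. The sketch below illustrates the arithmetic with made-up counts (not the paper's numbers): error and false acceptance/rejection rates as fractions of all utterances, and CDR as the fraction of truly BAD utterances that the classifier catches.

```python
# Metrics from the confusion-matrix cells (illustrative counts only).
def error_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    total = tp + fp + fn + tn
    n_bad = fp + tn                              # utterances that are truly BAD
    return {"ErrorRate": (fp + fn) / total,      # FP + FN as fractions of the total
            "FalseAcceptance": fp / total,
            "FalseRejection": fn / total,
            "CDR": tn / n_bad}                   # = 1 - FP / N_BAD

print(error_metrics(tp=2800, fp=350, fn=250, tn=1150))
```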

35. Results – Classifiers (cont'd)

Classifier           | Mean Error | F/P Rate | F/N Rate
Random Baseline      | 32.84%     |          | 0.00%
"Heuristic" Baseline | 25.32%     | 25.30%   | 0.02%
AdaBoost             | 16.59%     | 11.43%   | 5.16%
Decision Tree        | 17.32%     | 11.82%   | 5.49%
Bayesian Network     | 17.82%     | 9.41%    | 8.42%
SVM                  | 18.40%     | 15.01%   | 3.39%
Neural Network       | 18.90%     | 15.08%   | 3.82%
Naïve Bayes          | 21.65%     | 14.24%   | 7.41%

(Slide annotation: 77.4% correct detection rate)

36. Conclusion
- Spoken dialog system performance is strongly impaired by misunderstandings
- Increase the robustness of systems when faced with poor recognition:
  - Detect misunderstandings
  - Use recovery techniques

37. Conclusion (cont'd)
- Data-driven classification task:
  - Corpus
  - 12 features from 3 levels in the system
  - Empirically compared 6 classification techniques
- Data-driven misunderstanding detector:
  - Significant improvement over the previous heuristic classifier
  - Correctly detects 74% of the misunderstandings

38. Future Work
- Detect misunderstandings:
  - Improve performance by adding new features
  - Identify the source of the error
- Use recovery techniques:
  - Incorporate the confidence score into the dialog management process

39. Pointers
- "Is This Conversation On Track?", P. Carpenter, C. Jin, D. Wilson, R. Zhang, D. Bohus, A. Rudnicky, Eurospeech 2001, Aalborg, Denmark
- CMU Communicator: 1-412-268-1084
- www.cs.cmu.edu/~dbohus/SDS

