Detecting Misunderstandings in the CMU Communicator Spoken Dialog System
Presented by: Dan Bohus
Joint work with: Paul Carpenter, Chun Jin, Daniel Wilson, Rong Zhang, Alex Rudnicky
Carnegie Mellon University – 2002

Detecting Misunderstandings in the CMU Communicator…Page 2 What’s a Spoken Dialog System ?  Human talking to a computer  Taking turns in a goal-oriented dialog

Detecting Misunderstandings in the CMU Communicator…Page 3 Why Spoken Language Interfaces ?
Speech: advantages and problems
- Speech is the natural communication modality for humans
- Can easily express fairly complex structures
- Works well in hands- or eyes-busy situations
- Serial channel
- It is still an unreliable channel

Detecting Misunderstandings in the CMU Communicator…Page 4 Sample Spoken Dialog Systems
- Interactive Voice Response systems (IVR)
- Information Access Systems
  - Air-travel planning (Communicator)
  - Weather info over the phone (Jupiter)
  - Email access over the phone (ELVIS)
  - UA Baggage claims (Simon)
- Other Systems: guidance, personal assistants, taskable agents, etc.

Detecting Misunderstandings in the CMU Communicator…Page 5 A Look Under the Hood …
S: Where are you flying from ?
U: from London to Paris and then on to Toronto
D: from London to Paris on then on to go on to
SI: depart_location = London, arrive_location = Paris
SO: query depart_time
NL: And when do you want to leave ?
S: And when do you want to leave ?
[Architecture diagram: U → Speech Recognition → D → Semanticizer (Parsing) → SI → Dialog Management (+ Backend) → SO → Language Generation → NL → Synthesis → S]

Detecting Misunderstandings in the CMU Communicator…Page 6 Roadmap  Intro to Spoken Dialog Systems  The Problem: Misunderstandings  A Learning Solution  Experiments and Results  Conclusion

Detecting Misunderstandings in the CMU Communicator…Page 7 Speech Recognition
- Speech recognition is the main driver behind the development of SDS.
- But it is problematic:
  - Input signal quality
  - Accents, non-native speakers
  - Spoken language disfluencies: stutters, false-starts, /mm/, /um/
- Typical Word Error Rates: 20-30%

Detecting Misunderstandings in the CMU Communicator…Page 8 Recognition Errors Impact
- Errors propagate to upper levels:
  - compromise the parse
    - non-understandings
  - propagate to the Dialog Control level
    - misunderstandings, which can have a higher cost, if acted upon.
[Architecture diagram repeated from page 5]

Detecting Misunderstandings in the CMU Communicator…Page 9 Misunderstandings in Action !
S: Are you a registered user ?
U: No I'm not. No [NO I'M NOT NO]
S: What is your full name ?
U: Adam Schumacher [ATHENS IN AKRON]
S: There is an Athens in Georgia and in Greece. Which destination did you want ?
U: Neither [NEITHER]
S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want ?
U: Georgia [GEORGIA]
S: A flight from Athens... Where do you want to go ?
U: Start over [START OVER]
S: Do you really want to start over ?
U: Yes [YES]
S: What city are you leaving from ?
U: Hamilton, Ontario [HILTON ONTARIO]
S: Sorry, I'm not sure I understood what you said. Where are you leaving from ?
U: Hamilton [HILTON]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from ?
U: Toronto [TORONTO]

Detecting Misunderstandings in the CMU Communicator…Page 10 Addressing the Problem
- Wait for SR technology to reach better performance.
- Increase the robustness of systems when faced with poor recognition:
  - Detect Misunderstandings
  - Use Recovery Techniques

Detecting Misunderstandings in the CMU Communicator…Page 11 Problem Formulation  Given an input utterance, and the current state of the system, detect whether it was correctly perceived by the system or not. (confidence annotation problem)

Detecting Misunderstandings in the CMU Communicator…Page 12 Roadmap  Intro to Spoken Dialog Systems  The Problem: Detecting Misunderstandings  A Learning Solution  Experiments and Results  Conclusion

Detecting Misunderstandings in the CMU Communicator…Page 13 A Classification Task
Cast the problem as a classification task:
- Heuristic approach: "Garble" rule previously used in Communicator
- Data-driven (learning) approach
[Diagram: Utterance → Classifier (Features) → GOOD / BAD]

Detecting Misunderstandings in the CMU Communicator…Page 14 A Data-Driven Approach
Machine learning approach:
- Learn to classify from a labeled training corpus
- Use it to classify new instances
[Diagram: Features → Classifier (Learn Mode) ← GOOD/BAD labels at training time; Features → Classifier → GOOD/BAD at run time]
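
To make the learn/classify loop above concrete, here is a minimal Python sketch using scikit-learn; the tiny hand-made table and its feature values stand in for the real labeled corpus and are invented purely for illustration, not taken from the Communicator data.

```python
# Minimal sketch of the learn/classify loop (not the original Communicator code).
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Invented toy corpus: one row per utterance, a few features plus a GOOD/BAD label.
corpus = pd.DataFrame({
    "UncoveredPerc":    [0.00, 0.36, 0.10, 0.55, 0.05, 0.60],
    "UnconfidentPerc":  [0.00, 0.09, 0.00, 0.30, 0.05, 0.40],
    "ExpectedConcepts": [1,    1,    1,    0,    1,    0],
    "label":            ["GOOD", "BAD", "GOOD", "BAD", "GOOD", "BAD"],
})

X = corpus.drop(columns=["label"])
y = corpus["label"]

clf = DecisionTreeClassifier().fit(X, y)          # "learn mode" on labeled data

new_utterance = pd.DataFrame([{"UncoveredPerc": 0.45,
                               "UnconfidentPerc": 0.20,
                               "ExpectedConcepts": 0}])
print(clf.predict(new_utterance))                 # classify a new instance
```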

Detecting Misunderstandings in the CMU Communicator…Page 15 Ingredients  Three ingredients needed for a machine learning approach:  Corpus of labeled data to use for training  Identify a set of relevant features  Choose a classification technique

Detecting Misunderstandings in the CMU Communicator…Page 16 Roadmap  Intro to Spoken Dialog Systems  The Problem: Misunderstandings  A Learning Solution  Training corpus  Features  Classification techniques  Experiments and Results  Conclusion

Detecting Misunderstandings in the CMU Communicator…Page 17 Corpus – Sources  Collected 2 months of sessions  October and November 1999  About 300 sessions  Both developer and outsider calls  Eliminated conversations with < 5 turns  Developers calling to check if system is on-line  Wrong number calls

Detecting Misunderstandings in the CMU Communicator…Page 18 Corpus – Structure  The Logs  Generated automatically by various system modules  Serve as a source of features for classification (also contain the decoded utterances)  The Transcripts (the actual utterances)  Performed and double-checked by a human annotator  Provide a basis for labeling

Detecting Misunderstandings in the CMU Communicator…Page 19 Corpus – Labeling
- Labeling was done at the concept level.
- Four possible labels:
  - OK: The concept is okay
  - RBAD: Recognition is bad
  - PBAD: Parse is bad
  - OOD: Out of domain
- Aggregate utterance labels generated automatically.

Detecting Misunderstandings in the CMU Communicator…Page 20 Corpus – Sample Labeling
Only 6% of the utterances actually contained mixed-type concept labels !
Transcript: #noise# from London to Paris and then on to Toronto #noise#
Decoded utterance: from London to Paris on then on to go on to
Parse: depart_loc: [from London]  arrive_loc: [to Paris]  interj: [then]  resume: [go on]
Labeling: depart_loc: OK  arrive_loc: OK  interj: OK  resume: RBAD
Aggregate Label: BAD
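
A sketch of the automatic aggregation step, under the assumption (consistent with the example above) that any non-OK concept makes the whole utterance BAD; the function name is illustrative, not from the original labeling tools.

```python
def aggregate_utterance_label(concept_labels):
    """Collapse per-concept labels (OK / RBAD / PBAD / OOD) into a binary
    utterance label. Assumption: any non-OK concept makes the utterance BAD."""
    return "GOOD" if all(lab == "OK" for lab in concept_labels) else "BAD"

# The running example: one RBAD concept is enough to mark the utterance BAD.
print(aggregate_utterance_label(["OK", "OK", "OK", "RBAD"]))   # -> BAD
```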

Detecting Misunderstandings in the CMU Communicator…Page 21 Corpus – Summary  Started with 2 months of dialog sessions  Eliminated short, ill-formed sessions  Transcribed the corpus  Labeled it at the concept level  Discarded mixed-label utterances  4550 binary labeled utterances  311 dialogs

Detecting Misunderstandings in the CMU Communicator…Page 22 Features – Sources
- Traditionally, features are extracted from the Speech Recognition layer [Chase].
- In an SDS, there are at least 2 other orthogonal knowledge sources:
  - The Parser
  - The Dialog Manager
[Diagram: features drawn from the Speech, Parsing, and Dialog levels]

Detecting Misunderstandings in the CMU Communicator…Page 23 Features – Speech Recog.
- WordNumber (11)
- UnconfidentPerc = % of unconfident words (9%)
  - this feature already captures other decoder-level features
Transcript: #noise# from London to Paris and then on to Toronto #noise#
Decoded: from London to Paris on then on to ?go? on to
Parse: depart_loc: [from London]  arrive_loc: [to Paris]  interj: [then]  resume: [?go? on]
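
A small sketch of how these two recognition-level features could be computed; the ?word? convention for low-confidence words is an assumption read off the example notation above.

```python
def speech_features(decoded_words):
    """Recognition-level features for one utterance.
    Assumes (based on the example notation) that low-confidence words are
    marked by the decoder as ?word?."""
    word_number = len(decoded_words)
    unconfident = sum(1 for w in decoded_words if w.startswith("?") and w.endswith("?"))
    unconfident_perc = unconfident / word_number if word_number else 0.0
    return {"WordNumber": word_number, "UnconfidentPerc": unconfident_perc}

decoded = "from London to Paris on then on to ?go? on to".split()
print(speech_features(decoded))   # WordNumber = 11, UnconfidentPerc ≈ 0.09
```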

Detecting Misunderstandings in the CMU Communicator…Page 24 Features – Parser Level
- UncoveredPerc = % of words uncovered by the parse (36%)
- GapNumber = # of unparsed fragments (3)
- FragmentationScore = # of transitions between parsed and unparsed fragments (5)
- Garble = flag computed by a heuristic rule based on parse coverage and fragmentation
Transcript: #noise# from London to Paris and then on to Toronto #noise#
Decoded: from London to Paris on then on to ?go? on to
Parse: depart_loc: [from London]  arrive_loc: [to Paris]  interj: [then]  resume: [?go? on]
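
A sketch of the three coverage-based features, assuming (as a simplification) that the parse yields a per-word covered/uncovered flag; on the running example it reproduces the 36% / 3 / 5 values quoted above. The Garble heuristic itself is not reconstructed here.

```python
def parser_features(coverage):
    """Parse-level features from a per-word coverage sequence.
    `coverage` is a list of booleans, True if the word falls inside some parsed
    concept (a simplifying assumption about how coverage is represented)."""
    n = len(coverage)
    uncovered_perc = sum(not c for c in coverage) / n if n else 0.0
    gap_number = sum(1 for i, c in enumerate(coverage)
                     if not c and (i == 0 or coverage[i - 1]))     # starts of unparsed runs
    fragmentation = sum(1 for i in range(1, n) if coverage[i] != coverage[i - 1])
    return {"UncoveredPerc": uncovered_perc,
            "GapNumber": gap_number,
            "FragmentationScore": fragmentation}

# Running example: "on", "on to", and the final "to" are left unparsed.
cov = [True, True, True, True, False, True, False, False, True, True, False]
print(parser_features(cov))   # ≈ 36% uncovered, 3 gaps, 5 transitions
```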

Detecting Misunderstandings in the CMU Communicator…Page 25 Features – Parser Level (2)
- ConceptBigram = bigram concept model score:
  P(c_1 … c_n) ≈ P(c_n | c_{n-1}) P(c_{n-1} | c_{n-2}) … P(c_2 | c_1) P(c_1)
  Probabilities trained from a corpus
- ConceptNumber (4)
Transcript: #noise# from London to Paris and then on to Toronto #noise#
Decoded: from London to Paris on then on to ?go? on to
Parse: depart_loc: [from London]  arrive_loc: [to Paris]  interj: [then]  resume: [?go? on]
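
A sketch of the bigram concept score under the chain-rule approximation above; the probability tables here are invented for illustration, and a real implementation would need smoothing for unseen concept pairs.

```python
import math

def concept_bigram_score(concepts, unigram_prob, bigram_prob):
    """Log of P(c_1 … c_n) ≈ P(c_1) Π P(c_i | c_{i-1}).
    unigram_prob / bigram_prob would be estimated from a labeled corpus;
    plain dicts are used here as a simplification (no smoothing)."""
    if not concepts:
        return 0.0
    score = math.log(unigram_prob[concepts[0]])
    for prev, cur in zip(concepts, concepts[1:]):
        score += math.log(bigram_prob[(prev, cur)])
    return score

# Invented probabilities, for illustration only.
uni = {"depart_loc": 0.3}
bi = {("depart_loc", "arrive_loc"): 0.5,
      ("arrive_loc", "interj"): 0.1,
      ("interj", "resume"): 0.05}
concepts = ["depart_loc", "arrive_loc", "interj", "resume"]
print(concept_bigram_score(concepts, uni, bi))   # ConceptBigram (log scale)
print(len(concepts))                             # ConceptNumber = 4
```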

Detecting Misunderstandings in the CMU Communicator…Page 26 Features – Dlg Mng. Level
- DialogState = the current state of the DM
- StateDuration = for how many turns the DM has remained in the same state
- TurnNumber = how many turns since the beginning of the session
- ExpectedConcepts = indicates if the concepts correspond to the expectation of the DM.
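
A sketch of the dialog-level features; the DMState snapshot below is a hypothetical stand-in for whatever the Communicator dialog manager actually exposes.

```python
from dataclasses import dataclass

@dataclass
class DMState:
    """Hypothetical snapshot of the dialog manager (not the Communicator API)."""
    name: str                 # DialogState
    expected_concepts: set    # concepts the DM expects in this state
    state_duration: int       # turns spent in the current state
    turn_number: int          # turns since the beginning of the session

def dialog_features(state, decoded_concepts):
    return {"DialogState": state.name,
            "StateDuration": state.state_duration,
            "TurnNumber": state.turn_number,
            # 1 only if everything decoded was expected in this state
            "ExpectedConcepts": int(set(decoded_concepts) <= state.expected_concepts)}

state = DMState("query_depart_location", {"depart_loc", "depart_date"}, 2, 7)
print(dialog_features(state, ["depart_loc", "arrive_loc"]))   # ExpectedConcepts = 0
```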

Detecting Misunderstandings in the CMU Communicator…Page 27 Features – Summary
12 features from 3 levels in the system:
- Speech Level Features: WordNumber, UnconfidentPerc
- Parsing Level Features: UncoveredPerc, FragmentationScore, GapNumber, Garble, ConceptBigram, ConceptNumber
- Dialog Management Level Features: DialogState, StateDuration, TurnNumber, ExpectedConcepts

Detecting Misunderstandings in the CMU Communicator…Page 28 Classification Techniques  Bayesian Networks  Boosting  Decision Tree  Artificial Neural Networks  Support Vector Machine  Naïve Bayes

Detecting Misunderstandings in the CMU Communicator…Page 29 Roadmap  Intro to Spoken Dialog Systems  The Problem: Detecting Misunderstandings  A Learning Approach  Training corpus  Features  Classification techniques  Experiments and Results  Conclusion

Detecting Misunderstandings in the CMU Communicator…Page 30 Experimental Setup  Performance metric: classification error rate  2 Performance baselines:  “Random” baseline = 32.84%  “Heuristic” baseline = 25.69%  Used a 10-fold cross-validation process  Build confidence intervals for the error rates  Do statistical analysis of the differences in performance exhibited by the classifiers
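
A sketch of this evaluation protocol with scikit-learn: 10-fold cross-validation, a mean error rate, and a rough confidence interval over the folds. A synthetic dataset stands in for the real 4550-utterance corpus, and the classifier list is only a subset of the six compared here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the 4550 labeled utterances with 12 features.
X, y = make_classification(n_samples=4550, n_features=12, random_state=0)

def evaluate(clf, X, y, folds=10):
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=0)
    err = 1.0 - cross_val_score(clf, X, y, cv=cv)          # per-fold error rates
    half_width = 1.96 * err.std(ddof=1) / np.sqrt(folds)   # rough 95% interval
    return err.mean(), half_width

for name, clf in {"AdaBoost": AdaBoostClassifier(),
                  "Decision Tree": DecisionTreeClassifier(),
                  "Naive Bayes": GaussianNB()}.items():
    mean, ci = evaluate(clf, X, y)
    print(f"{name}: {mean:.2%} ± {ci:.2%}")
```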

Detecting Misunderstandings in the CMU Communicator…Page 31 Results – Individual Features
Rank  Feature             Level           Mean Err.
 1.   UncoveredPerc       Parsing         19.93%
 2.   ExpectedConcepts    Dialog Manag.   20.97%
 3.   GapNumber           Parsing         23.01%
 4.   ConceptBigram       Parsing         23.14%
 5.   Garble              Parsing/Recog.  25.32%
 6.   ConceptNumber       Parsing         25.69%
 7.   UnconfidentPerc     Recognition     27.34%
 8.   DialogState         Dialog Manag.   31.03%
 9.   WordNumber          Recognition     32.33%
10.   FragmentationScore  Parsing         32.73%
11.   StateDuration       Dialog Manag.   32.84%
12.   TurnNumber          Dialog Manag.   33.14%

Detecting Misunderstandings in the CMU Communicator…Page 32 Results – Classifiers
Classifier            Mean Error
Random Baseline       32.84%
"Heuristic" Baseline  25.69%
AdaBoost              16.59%
Decision Tree         17.32%
Bayesian Network      17.82%
SVM                   18.40%
Neural Network        18.90%
Naïve Bayes           21.65%

Detecting Misunderstandings in the CMU Communicator…Page 33 An In-Depth Look at Error Rates
                      OK    BAD
Classifier says OK    TP    FP
Classifier says BAD   FN    TN

FP = False acceptance
FN = False rejection
Error Rate = FP + FN (with FP and FN expressed as fractions of all utterances)
CDR = TN/(TN+FP) = 1-(FP/NBAD)
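
The same bookkeeping in a few lines; the counts below are made up purely to illustrate the formulas, not taken from the actual results.

```python
def error_metrics(tp, fp, fn, tn):
    """Error rate, false acceptance/rejection rates, and correct detection
    rate (CDR) from raw confusion-matrix counts."""
    n = tp + fp + fn + tn
    n_bad = fp + tn                         # utterances that were actually BAD
    return {"ErrorRate": (fp + fn) / n,
            "FalseAcceptance": fp / n,
            "FalseRejection": fn / n,
            "CDR": tn / n_bad}              # = 1 - FP / NBAD

# Made-up counts, for illustration only.
print(error_metrics(tp=2800, fp=300, fn=250, tn=1200))
```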

Detecting Misunderstandings in the CMU Communicator…Page 34 Results – Classifiers (cont’d)
Classifier            Mean Error  F/P Rate  F/N Rate
Random Baseline       32.84%                0.00%
"Heuristic" Baseline  25.32%      25.30%    0.02%
AdaBoost              16.59%      11.43%    5.16%
Decision Tree         17.32%      11.82%    5.49%
Bayesian Network      17.82%       9.41%    8.42%
SVM                   18.40%      15.01%    3.39%
Neural Network        18.90%      15.08%    3.82%
Naïve Bayes           21.65%      14.24%    7.41%
(77.4% correct detection rate)

Detecting Misunderstandings in the CMU Communicator…Page 35 Conclusion  Spoken Dialog System performance is strongly impaired by misunderstandings  Increase the robustness of systems when faced with poor recognition:  Detect Misunderstandings  Use Recovery Techniques

Detecting Misunderstandings in the CMU Communicator…Page 36 Conclusion (cont’d)  Data-driven classification task  Corpus  12 Features from 3 levels in the system  Empirically compared 6 classification techniques  Data-Driven Misunderstanding Detector  Significant improvement over previous heuristic classifier  Correctly detect 74% of the misunderstandings

Detecting Misunderstandings in the CMU Communicator…Page 37 Future Work  Detect Misunderstandings  Improve performance by adding new features  Identify the source of the error  Use Recovery Techniques  Incorporate the confidence score into the Dialog Management process

Detecting Misunderstandings in the CMU Communicator…Page 38 Pointers
- "Is This Conversation On Track?", P. Carpenter, C. Jin, D. Wilson, R. Zhang, D. Bohus, A. Rudnicky, Eurospeech 2001, Aalborg, Denmark
- CMU Communicator