Error Detection and Correction in Spoken Dialogue Systems

Presentation transcript:

1 Spoken Dialogue Systems Error Detection and Correction in Spoken Dialogue Systems

2 Spoken Dialogue Systems Outline Avoiding errors Detecting errors From the user side: what cues does the user provide to indicate an error? From the system side: how likely is it the system made an error? Dealing with Errors: what can the system do when it thinks an error has occurred? Evaluating SDS: evaluating ‘problem’ dialogues

3 Spoken Dialogue Systems Avoiding misunderstandings The problem By imitating human performance: Timing and grounding (Clark ’03) Confirmation strategies Clarification and repair subdialogues

4 Spoken Dialogue Systems Outline Avoiding errors Detecting errors From the user side: what cues does the user provide to indicate an error? From the system side: how likely is it the system made an error? Dealing with Errors: what can the system do when it thinks an error has occurred? Evaluating SDS: evaluating ‘problem’ dialogues

5 Spoken Dialogue Systems Learning from Human Behavior: Features in repetition corrections (KTH) more clearly articulated increased loudness shifting of focus [Chart: percentage of all repetitions, adults vs. children]

6 Spoken Dialogue Systems Learning from Human Behavior (Krahmer et al ’01) Learning from human behavior ‘go on’ and ‘go back’ signals in grounding situations (implicit/explicit verification) Positive: short turns, unmarked word order, confirmation, answers, no corrections or repetitions, new info Negative: long turns, marked word order, disconfirmation, no answer, corrections, repetitions, no new info

7 Spoken Dialogue Systems Hypotheses supported but… –Can these cues be identified automatically? –How might they affect the design of SDS?

8 Spoken Dialogue Systems Outline Avoiding errors Detecting errors From the user side: what cues does the user provide to indicate an error? From the system side: how likely is it the system made an error? Dealing with Errors: what can the system do when it thinks an error has occurred? Evaluating SDS: evaluating ‘problem’ dialogues

9 Spoken Dialogue Systems Systems Have Trouble Knowing When They’ve Made a Mistake Hard for humans to correct system misconceptions (Krahmer et al ‘99) User: I want to go to Boston. System: What day do you want to go to Baltimore? Easier: answering explicit requests for confirmation or responding to ASR rejections System: Did you say you want to go to Baltimore? System: I'm sorry. I didn't understand you. Could you please repeat your utterance?

10 Spoken Dialogue Systems But constant confirmation or over-cautious rejection lengthens dialogue and decreases user satisfaction

11 Spoken Dialogue Systems …And Systems Have Trouble Recognizing User Corrections Probability of recognition failures increases after a misrecognition (Levow ‘98) Corrections of system errors often hyperarticulated (louder, slower, more internal pauses, exaggerated pronunciation)  more ASR error (Wade et al ‘92, Oviatt et al ‘96, Swerts & Ostendorf ‘97, Levow ‘98, Bell & Gustafson ‘99)

12 Spoken Dialogue Systems Can Prosodic Information Help Systems Perform Better? If errors occur where speaker turns are prosodically ‘marked’…. Can we recognize turns that will be misrecognized by examining their prosody? Can we modify our dialogue and recognition strategies to handle corrections more appropriately?

13 Spoken Dialogue Systems Approach Collect corpus from interactive voice response system Identify speaker ‘turns’ –incorrectly recognized –where speakers first aware of error –that correct misrecognitions Identify prosodic features of turns in each category and compare to other turns Use Machine Learning techniques to train a classifier to make these distinctions automatically [turn categories: misrecognition, correction, aware site]

14 Spoken Dialogue Systems Turn Types TOOT: Hi. This is AT&T Amtrak Schedule System. This is TOOT. How may I help you? User: Hello. I would like trains from Philadelphia to New York leaving on Sunday at ten thirty in the evening. TOOT: Which city do you want to go to? User: New York. [turn labels: misrecognition, correction, aware site]

15 Spoken Dialogue Systems TOOT Dialogues Collected to study effects of differences in dialogue strategy on user performance and satisfaction (Litman & Pan ‘99) type of initiative (system, user, mixed) type of confirmation (explicit, implicit, none) adaptability condition Subjects 39 summer students 16/23 (F/M) 20/19 (native speaker/non)

16 Spoken Dialogue Systems Platform: combined over-the-phone ASR and TTS (Kamm et al ‘97) with web access to train information Task: find train information for 4 scenarios Corpus for current study: 2328 speaker turns, 52 dialogues Misrecognitions: overall word accuracy 61%, overall concept accuracy (CA) 71% Example: “I want to go to Boston from Philadelphia” (2 domain concepts) recognized as “I want to go to Boston” (one concept) = CA 50%
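The slide's concept-accuracy example can be sketched as a small helper. `concept_accuracy` and the tuple-based concept representation are hypothetical here; a real system would extract concepts with its semantic parser rather than by set membership:

```python
def concept_accuracy(reference_concepts, recognized_concepts):
    """Fraction of reference domain concepts preserved after recognition.

    Hypothetical helper illustrating the slide's worked example only.
    """
    if not reference_concepts:
        return 1.0
    hits = sum(1 for c in reference_concepts if c in recognized_concepts)
    return hits / len(reference_concepts)

# "I want to go to Boston from Philadelphia" carries two domain
# concepts; the recognized string kept only the destination.
ref = {("destination", "Boston"), ("origin", "Philadelphia")}
hyp = {("destination", "Boston")}
print(concept_accuracy(ref, hyp))  # 0.5
```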

17 Spoken Dialogue Systems A Successful Dialogue S: Hi, this is AT&T Amtrak schedule system. This is TOOT. How may I help you? U: I would like a train to New York City from Philadelphia on Sunday at ten thirty P M S: I heard you say “go to New York City from Philadelphia on Sunday approximately at ten thirty p.m." Do you want me to find the trains from Philadelphia to New York City on Sunday approximately at ten thirty p. m. now? U: Yes. S: I am going to get the train schedule for you....

18 Spoken Dialogue Systems Do Misrecognitions, Awares, and Corrections Differ from Other Turns? For each type of turn: For each speaker, for each prosodic feature, calculate mean values for e.g. all correctly recognized speaker turns and for all incorrectly recognized turns Perform paired t-tests on these speaker pairs of means (e.g., for each speaker, pairing mean values for correctly and incorrectly recognized turns)
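The per-speaker paired comparison above can be sketched with a stdlib-only paired t statistic. `paired_t_statistic` is an illustrative helper; a real analysis would also report p-values (e.g. via scipy.stats.ttest_rel):

```python
import math
import statistics

def paired_t_statistic(xs, ys):
    """t statistic for a paired t-test.

    xs[i] and ys[i] are one speaker's mean feature values over, e.g.,
    correctly vs. incorrectly recognized turns. Sketch only; degrees of
    freedom are len(xs) - 1 and significance needs a t-table or scipy.
    """
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample std dev of the differences
    return mean_d / (sd_d / math.sqrt(n))

# Toy data: per-speaker mean F0 max (Hz), correct vs. misrecognized turns.
correct = [200, 210, 190]
misrec = [220, 225, 215]
print(paired_t_statistic(correct, misrec))
```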

19 Spoken Dialogue Systems How: Prosodic Features Examined per Turn Raw prosodic/acoustic features f0 maximum and mean (pitch excursion/range) rms maximum and mean (amplitude) total duration duration of preceding silence amount of silence within turn speaking rate (estimated from syllables of recognized string per second) Normalized versions of each feature (compared to first turn in task, to previous turn in task, Z scores)
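A minimal sketch of these normalizations, assuming one list of per-turn feature dicts per speaker and task; the function and field names are invented for illustration:

```python
import statistics

def normalize_turn_features(turns, feature):
    """Compute the normalized variants listed above for one raw feature:
    ratio to the first turn in the task, ratio to the previous turn,
    and a z-score over the speaker's turns. Illustrative sketch only.
    """
    values = [t[feature] for t in turns]
    mu = statistics.mean(values)
    sigma = statistics.stdev(values) if len(values) > 1 else 0.0
    out = []
    for i, v in enumerate(values):
        out.append({
            "raw": v,
            "vs_first": v / values[0],
            "vs_prev": v / values[i - 1] if i > 0 else 1.0,
            "zscore": (v - mu) / sigma if sigma else 0.0,
        })
    return out

turns = [{"f0max": 200}, {"f0max": 250}]
print(normalize_turn_features(turns, "f0max")[1]["vs_first"])  # 1.25
```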

20 Spoken Dialogue Systems Distinguishing Correct Recognitions from Misrecognitions (NAACL ‘00) Misrecognitions differ prosodically from correct recognitions in F0 maximum (higher) RMS maximum (louder) turn duration (longer) preceding pause (longer) speaking rate (slower) Effect holds up across speakers and even when hyperarticulated turns are excluded

21 Spoken Dialogue Systems WER-Based Results Misrecognitions are higher in pitch, louder, longer, more preceding pause and less internal silence

22 Spoken Dialogue Systems Predicting Turn Types Automatically Ripper (Cohen ‘96) automatically induces rule sets for predicting turn types greedy search guided by measure of information gain input: vectors of feature values output: ordered rules for predicting dependent variable and (X-validated) scores for each rule set Independent variables: all prosodic features, raw and normalized experimental conditions (adaptability of system, initiative type, confirmation style, subject, task) gender, native/non-native status ASR recognized string, grammar, and acoustic confidence score

23 Spoken Dialogue Systems ML Results: WER-defined Misrecognition

24 Spoken Dialogue Systems Best Rule-Set for Predicting WER
if (conf = 1.27) ^ then F
if (conf <= -4.34) then F
if (tempo <= .81) then F
if (conf <= then F
if (conf <= ^ str contains “help” then F
if conf = .77 ^ tempo <= .25 then F
if str contains “nope” then F
if dur >= 1.71 ^ tempo <= 1.76 then F
else T
Using prosody, ASR conf, ASR string, ASR grammar
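Ripper outputs an ordered rule list like the one above: rules are tried in order and the first whose conditions all hold fires, otherwise the default class applies. A minimal sketch of such a decision list; the thresholds here are invented (several in the slide were lost in transcription), so this shows the mechanism only:

```python
def classify(turn, rules, default=True):
    """Apply an ordered rule list (decision list) to one turn.

    Each rule is (list of predicates, label). First match wins.
    Feature names mirror the slide (conf, tempo, str); values are
    illustrative, not the learned thresholds.
    """
    for conditions, label in rules:
        if all(cond(turn) for cond in conditions):
            return label
    return default

# Toy rules in the spirit of slide 24: predict F (misrecognized)
# for low ASR confidence, very slow tempo, or "help" in the string.
rules = [
    ([lambda t: t["conf"] <= -4.34], False),
    ([lambda t: t["tempo"] <= 0.81], False),
    ([lambda t: "help" in t["str"]], False),
]
print(classify({"conf": -5.0, "tempo": 2.0, "str": ""}, rules))      # False
print(classify({"conf": 1.0, "tempo": 2.0, "str": "yes"}, rules))    # True
```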

25 Spoken Dialogue Systems Outline Avoiding errors Detecting errors From the user side: what cues does the user provide to indicate an error? From the system side: how likely is it the system made an error? Dealing with Errors: what can the system do when it thinks an error has occurred? Evaluating SDS: evaluating ‘problem’ dialogues

26 Spoken Dialogue Systems Error Handling Strategies If systems can recognize their lack of recognition, how should they inform the user that they don’t understand (Goldberg et al ’03)? System rephrasing vs. repetitions vs. statement of not understanding Apologies What behaviors might these produce? Hyperarticulation User frustration User repetition vs. rephrasing

27 Spoken Dialogue Systems What lessons do we learn? When users are frustrated they are generally harder to recognize accurately When users are misrecognized they tend to be misrecognized again and to become increasingly frustrated Apologies combined with rephrasing of system prompts tend to decrease frustration and improve WER: Don’t just repeat! Users are better recognized when they rephrase their input

28 Spoken Dialogue Systems How does an SDS Recognize a Correction? (ICSLP ’00) TOOT: Hi. This is AT&T Amtrak Schedule System. This is TOOT. How may I help you? User: Hello. I would like trains from Philadelphia to New York leaving on Sunday at ten thirty in the evening. TOOT: Which city do you want to go to? User: New York. [turn label: correction]

29 Spoken Dialogue Systems Serious Problem for Spoken Dialogue Systems 29% of turns in our corpus are corrections 52% of corrections are hyperarticulated but only 12% of other turns Corrections are misrecognized at least twice as often as non-corrections (60% vs. 31%) But corrections are no more likely to be rejected than non-corrections…. (9% vs. 8%) Are corrections also measurably distinct from non-corrections?

30 Spoken Dialogue Systems Prosodic Indicators of Corrections Corrections differ from other turns prosodically: longer, louder, higher in pitch excursion, longer preceding pause, less internal silence ML results: Baseline: 30% error norm’d prosody + non-prosody: 18.45% +/- 0.78% automatic: 21.48% +/- 0.68%

32 Spoken Dialogue Systems ML Rules for Correction Prediction
Baseline: 30% error (predict not correction)
norm’d prosody + non-prosody: 18.45% +/- 0.78%
automatic: 21.48% +/- 0.68%
TRUE :- gram=universal, f0max>=0.96, dur>=6.55
TRUE :- gram=universal, zeros>=0.57, asr<=-2.95
TRUE :- gram=universal, f0max =1.21, zeros>=0.71
TRUE :- dur>=0.76, asr<=-2.97, strat=UsrNoConf
TRUE :- dur>=2.28, ppau<=0.86
TRUE :- rmsav>=1.11, strat=MixedImplicit, gram=cityname, f0max>=0.70
default FALSE

33 Spoken Dialogue Systems Corrections in Context Similar in prosodic features but… What about their form and content? How do system behaviors affect the corrections users produce? What sort of corrections are most, least effective? When users correct the same mistake more than once, do they vary their strategy in productive ways?

34 Spoken Dialogue Systems User Correction Behavior Correction classes: ‘omits’ and ‘repetitions’ lead to fewer misrecognitions than ‘adds’ and ‘paraphrases’ Turns that correct rejections are more likely to be repetitions, while turns correcting misrecognitions are more likely to be omits
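The four correction classes can be approximated with a token-set heuristic. This comparison scheme is an assumption for illustration (the study used hand labels):

```python
def correction_class(original, correction):
    """Rough heuristic for the four correction classes:
    'repetition' (same words), 'omit' (strict subset),
    'add' (strict superset), 'paraphrase' (otherwise).
    Bag-of-words comparison is a simplification; word order
    and partial rewording are ignored.
    """
    o = set(original.lower().split())
    c = set(correction.lower().split())
    if c == o:
        return "repetition"
    if c < o:
        return "omit"
    if c > o:
        return "add"
    return "paraphrase"

print(correction_class("trains to new york from philadelphia", "new york"))  # omit
```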

35 Spoken Dialogue Systems Type of correction is sensitive to strategy: users are much more likely to exactly repeat their misrecognized utterance in a system-initiative environment, and much more likely to correct by omitting information with no system confirmation than with explicit confirmation; omits are used more in the MixedImplicit and UserNoConfirm conditions “Restarts” are unlikely to be recognized (77% misrecognized) and skewed in distribution: 31% of corrections are “restarts” in MI and UNC

36 Spoken Dialogue Systems No restarts for SE, where initial turns were well recognized It doesn’t pay to start over!

37 Spoken Dialogue Systems Outline Avoiding errors Detecting errors From the user side: what cues does the user provide to indicate an error? From the system side: how likely is it the system made an error? Dealing with Errors: what can the system do when it thinks an error has occurred? Evaluating SDS: evaluating ‘problem’ dialogues

38 Spoken Dialogue Systems Recognizing ‘Problematic’ Dialogues Hastie et al., “What’s the Trouble?” ACL 2002 How to define a dialogue as problematic? User satisfaction is low Task is not completed How to recognize? Train on a corpus of recorded dialogues (1242 DARPA Communicator dialogues) Predict –User Satisfaction –Task Completion (0,1,2)
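A minimal sketch of the 'problematic' label defined above; the satisfaction threshold is an assumed value, not from the paper, and 0 in the 0/1/2 completion coding is taken to mean "not completed":

```python
def problematic(user_satisfaction, task_completion, sat_threshold=3.0):
    """Label a dialogue problematic if user satisfaction falls below an
    assumed threshold or the task was not completed (completion == 0).
    Sketch of the labeling step only, not the learned predictor.
    """
    return user_satisfaction < sat_threshold or task_completion == 0

print(problematic(2.0, 2))  # True: low satisfaction
print(problematic(4.5, 2))  # False
```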

40 Spoken Dialogue Systems Results