Error detection in spoken dialogue systems
GSLT Dialogue Systems, 5p
Gabriel Skantze
Centrum för talteknologi

Grounding in conversation
Communication: "making something common"
Common ground: the mutual understanding of the participants in a joint action
Grounding: establishing something as part of common ground, well enough for current purposes
The grounding acts will depend on:
– Confidence of understanding / prior groundedness
– The grounding criterion (current purposes):
  – Cost of task failure
  – Cost of grounding
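
The cost trade-off behind the grounding criterion can be made concrete. Below is a minimal Python sketch, assuming invented function names, thresholds and cost values (none of them come from the slides), of how a system might derive a grounding criterion from the cost of task failure versus the cost of grounding and pick a grounding act accordingly.

    # Hypothetical sketch: weighing confidence of understanding against a
    # grounding criterion derived from task costs. Thresholds are illustrative.

    def grounding_criterion(cost_of_task_failure: float, cost_of_grounding: float) -> float:
        """Higher stakes (expensive failure, cheap grounding) demand more certainty."""
        return cost_of_task_failure / (cost_of_task_failure + cost_of_grounding)

    def choose_grounding_act(confidence: float, cost_of_task_failure: float,
                             cost_of_grounding: float) -> str:
        criterion = grounding_criterion(cost_of_task_failure, cost_of_grounding)
        if confidence >= criterion:
            return "accept"                    # treat the content as grounded
        elif confidence >= 0.5 * criterion:
            return "implicit_verification"     # display understanding in the next move
        else:
            return "explicit_verification"     # ask the user to confirm first

    # Example: a booking task where failure is costly but grounding is cheap
    print(choose_grounding_act(confidence=0.6, cost_of_task_failure=10.0, cost_of_grounding=1.0))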

Miscommunication
Principle of least effort:
– All things being equal, agents try to minimize their effort in doing what they intend to do.
All communication relies on a trade-off between efficiency and robustness:
– Producing a perfectly interpretable utterance may cost more than producing a flawed utterance that can easily be repaired.
– People normally rely on the error detection and recovery capabilities of the other speaker; it would not be efficient to never be misunderstood.

Miscommunication errors in SDS
Speech detection:
– Barge-in problems, truncated utterances, artifacts
ASR:
– Deletions, substitutions, insertions
– Out-of-vocabulary utterances
Parsing/NLU:
– Concept failure
Dialogue management:
– Reference resolution
– Plan recognition
Response generation:
– Ambiguous references
– Too much information at once
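
Purely as an illustration of how this taxonomy could be represented in a system, the sketch below (the component names and error labels follow the slide, but the code itself and its helper names are assumptions) tags detected problems with the pipeline component that produced them.

    from enum import Enum

    class Component(Enum):
        SPEECH_DETECTION = "speech detection"
        ASR = "asr"
        NLU = "parsing/nlu"
        DM = "dialogue management"
        GENERATION = "response generation"

    # Error types per component, taken from the taxonomy above
    ERROR_TYPES = {
        Component.SPEECH_DETECTION: ["barge-in problem", "truncated utterance", "artifact"],
        Component.ASR: ["deletion", "substitution", "insertion", "out-of-vocabulary"],
        Component.NLU: ["concept failure"],
        Component.DM: ["reference resolution", "plan recognition"],
        Component.GENERATION: ["ambiguous reference", "too much information at once"],
    }

    def tag_error(component: Component, description: str) -> dict:
        """Label an observed problem with its source component for later analysis."""
        return {"component": component.value,
                "description": description,
                "known_type": description in ERROR_TYPES[component]}

    print(tag_error(Component.ASR, "substitution"))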

Errors in human-computer dialogue
Derriks & Willems (1998) compare:
– Human-human dialogue: miscommunication occurs due to overlapping speech, missing elements (ellipsis), and perception of names and numbers.
– Human-computer dialogue (WOZ): less spontaneous; less overlapping speech and ellipsis; fewer problems overall, but still problems with the recognition of numbers.
– New problem sources: artificially imposed constraints; complete and standardized responses to particular and partial requests.

Types of miscommunication
Non-understanding:
– A participant fails to obtain any interpretation at all, or is unable to choose among several possible interpretations.
Misunderstanding:
– A participant obtains an interpretation that she believes is complete and correct, but that is not in line with the speaker's intentions.
Misinterpretation (misconception):
– A participant's interpretation of an utterance suggests that the speakers' beliefs about the world are out of alignment.

Error handling in spoken dialogue systems
[Diagram: Prediction → ERROR → Detection → Recovery, where prediction and recovery both serve error prevention]

Grounding in human-computer dialogue
The computer must display its understanding in order for errors to be detected.
Explicit verification:
U: I want to go to Milano
S: Do you want to go to Merano?
Implicit verification:
U: I want to travel from Milano
S: At what time do you want to leave from Merano?
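
A minimal sketch of how such grounding acts might be realized from an ASR hypothesis; the confidence thresholds and prompt templates are assumptions chosen for illustration, not values from the slides.

    def verification_prompt(slot: str, value: str, confidence: float) -> str:
        if confidence > 0.9:
            # High confidence: accept without an explicit display of understanding.
            return f"Okay. ({slot} = {value} accepted)"
        elif confidence > 0.6:
            # Implicit verification: embed the understanding in the next question.
            return f"At what time do you want to leave from {value}?"
        else:
            # Explicit verification: ask the user to confirm before proceeding.
            return f"Do you want to go to {value}?"

    print(verification_prompt("destination", "Merano", confidence=0.55))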

First-turn repair
U: I want to travel to Stockh.. [Detection: disfluency] Stocksund

Second-turn repair
U: I want to travel to Stocksund.
S: [Detection: non-understanding] Sorry, I didn't understand. Please repeat.
U: I want to travel to Stocksund.
S: [Detection: self-misunderstanding] Did you say Stockholm?

Third-turn repair
U: I want to travel from Stockholm
S: When do you want to leave from Stocksund?
U: [Detection: other-misunderstanding] I said I want to go from Stockholm!

Fourth-turn repair
U: I am on Kungstensgatan
S: Take to the left
U: Ok, now I am on Upplandsgatan
S: [Detection: self-misunderstanding] Weren't you on Kungsträdgårdgatan before you turned?

Error detection: approaches
Early detection (error awareness):
– Decide, on the basis of the current user utterance, whether it will be recognized and interpreted correctly or not.
Late detection:
– Decide, on the basis of the current user utterance, whether the processing of a previous user utterance gave rise to communication problems.
Error prediction:
– Decide, on the basis of the current user utterance, whether the dialogue will become problematic.

Using the approaches together
Error prediction:
– Choosing a dialogue strategy to prevent errors.
Early detection:
– Determining confidence of understanding and choosing an appropriate grounding act: how should the system display its understanding?
Late detection:
– Interpreting the user's response to the grounding act: was the previous understanding correct?
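
One way to picture the three mechanisms cooperating within a single turn is the schematic sketch below; the function, its argument names and all thresholds are assumptions made for illustration.

    def handle_turn(asr_hypothesis: str, asr_confidence: float,
                    predicted_problematic: bool, prev_grounding_act: str,
                    user_reacts_negatively: bool) -> str:
        # Late detection: interpret the user's reaction to the previous grounding act.
        if prev_grounding_act in {"implicit_verification", "explicit_verification"} \
                and user_reacts_negatively:
            return "repair_previous_understanding"

        # Error prediction: fall back to a more constrained strategy for risky dialogues.
        if predicted_problematic:
            return "switch_to_system_initiative"

        # Early detection: choose a grounding act for the current hypothesis.
        if asr_confidence < 0.4:
            return "signal_non_understanding"
        elif asr_confidence < 0.7:
            return "explicit_verification"
        else:
            return "implicit_verification"

    print(handle_turn("I want to go from Stockholm", 0.65,
                      predicted_problematic=False,
                      prev_grounding_act="implicit_verification",
                      user_reacts_negatively=True))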

Early and late detection in grounding
U: I want to travel from Stockholm
S: [Early detection] When do you want to leave from Stocksund?
U: I said I want to go from Stockholm!
S: [Late detection] Ok, when do you want to leave from Stockholm?

Error detection: methods
Early detection (error awareness):
– Feature-based detection: acoustic confidence scores, prosody, NLP, dialogue & discourse history
Late detection:
– Detection of negative and positive cues
– Dialogue expectations
– Plan-based models
Error prediction

ASR confidence and prosodic features
Train schedules (Litman et al. 2000), Ripper classification ("if-then-else" rules):

Features                                               WER > 0   CA < 1
ASR confidence                                         77.77%    86.48%
Prosody (F0, RMS, duration, prior pause, tempo,
  % silence)                                           87.24%    81.82%
ASR confidence + prosody                               89.01%    88.66%
ASR confidence + prosody + ASR string + ASR grammar    93.47%    89.57%
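
Ripper learns ordered "if-then-else" rules over such features. The hand-written rules below only convey the flavour of that kind of classifier; they are not the rules, feature scales or thresholds actually learned by Litman et al. (2000).

    def predict_misrecognition(features: dict) -> bool:
        """Return True if the current utterance is likely misrecognized (WER > 0)."""
        if features["asr_confidence"] < -2.0:          # assumed log-domain confidence scale
            return True
        if features["duration"] > 5.0 and features["tempo"] < 2.0:
            return True                                 # long, slow utterances are risky
        if features["percent_silence"] > 0.5:
            return True
        return False

    utt = {"asr_confidence": -2.6, "duration": 3.1, "tempo": 3.4,
           "f0_max": 210.0, "rms_max": 0.4, "prior_pause": 0.8, "percent_silence": 0.2}
    print(predict_misrecognition(utt))   # True: the low confidence rule fires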

Features from all dialogue components
Automated call center (Walker et al. 2000):
– ASR (num. words, asr-duration, tempo): 78.89%
– NLU (task, confidence, context-shift, salience): 84.80%
– Discourse (DM & history: prompt, reprompt, subdialogue, confirmation): 71.97%
– All components: 86.16%

Error detection: methods
Early detection (error awareness):
– Feature-based detection: acoustic confidence scores, prosody, NLP, dialogue & discourse history
Late detection:
– Detection of negative and positive cues
– Dialogue expectations
– Plan-based models
Error prediction

Verification: positive and negative cues

Positive cues ("Go on")    Negative cues ("Go back")
Short turns                Long turns
Unmarked word order        Marked word order
Confirm                    Disconfirm
Answer                     No answer
No corrections             Corrections
No repetitions             Repetitions
New info                   No new info
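
A toy detector in the spirit of this cue list might look as follows; the cue vocabularies, the turn-length threshold and the tokenization are all assumptions for illustration.

    import re

    NEGATIVE_CUES = {"no", "wrong", "not"}               # disconfirm / correction markers
    POSITIVE_CUES = {"yes", "yeah", "right", "correct"}  # confirm markers

    def classify_response(response: str, repeats_previous_slots: bool) -> str:
        tokens = set(re.findall(r"[a-zåäö']+", response.lower()))
        negative = bool(tokens & NEGATIVE_CUES)
        positive = bool(tokens & POSITIVE_CUES)
        long_turn = len(response.split()) > 6
        if negative or repeats_previous_slots or long_turn:
            return "go back"   # marked, long or corrective responses signal a problem
        if positive:
            return "go on"     # short confirmations signal acceptance
        return "go on"         # default: short, unmarked answers count as acceptance

    print(classify_response("No, I said I want to go from Stockholm",
                            repeats_previous_slots=True))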

Verification: cue detection
Detection of positive and negative cues (Krahmer et al., 2001):

              Explicit                        Implicit
Negative cue  No confirm:                     Corrected slots:
              88% precision, 94% recall       100% precision, 92% recall
Positive cue  Confirm:                        No corrected slots:
              97% precision, 93% recall       98% precision, 100% recall

Dialogue expectations
Error detection by expectations:
– Unexpected utterances can be signs of misunderstanding.
Plan-based models:
– Detection and repair of misunderstandings are embedded in the goal-directed behaviour of maintaining intersubjectivity.
– Model third- and fourth-turn repairs (McRoy & Hirst 1995).
But:
– Broken expectations are not always signs of misunderstanding; topic and focus shifts can also lead to unexpected utterances.
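
A sketch of expectation-based late detection, assuming a hypothetical table of system moves and the user moves they license (all move names are invented for illustration).

    EXPECTED_MOVES = {
        "ask_departure_time": {"give_time", "correct_slot", "ask_clarification"},
        "confirm_destination": {"confirm", "disconfirm", "correct_slot"},
    }

    def check_expectation(system_move: str, user_move: str) -> str:
        expected = EXPECTED_MOVES.get(system_move, set())
        if user_move in expected:
            return "as expected"
        # A broken expectation may mean misunderstanding, but may also be a topic shift.
        return "possible misunderstanding or topic shift"

    print(check_expectation("ask_departure_time", "correct_slot"))
    print(check_expectation("ask_departure_time", "give_destination"))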

Error detection: methods
Early detection (error awareness):
– Feature-based detection: acoustic confidence scores, prosody, NLP, dialogue & discourse history
Late detection:
– Detection of negative and positive cues
– Dialogue expectations
– Plan-based models
Error prediction

Error prediction
Approach:
– Decide, on the basis of the current user utterance(s), whether the dialogue will become problematic.
Walker et al. (2000):
– Dialogues were classified as "problematic" (36%) or "task success" (64%, the majority baseline)
– Trained on features from ASR, NLU and DM
– Accuracy after the first turn: 72%
– After the second turn: 80%
– Using the whole dialogue: 87%
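
As a rough illustration of turn-level problem prediction, the sketch below trains a classifier on invented per-turn features; neither the features, the data nor the classifier choice are those used by Walker et al. (2000).

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Invented first-turn features: [asr_confidence, num_words, reprompt_flag]
    X = np.array([[0.9, 4, 0], [0.3, 12, 1], [0.8, 5, 0],
                  [0.2, 15, 1], [0.7, 6, 0], [0.4, 10, 1]])
    y = np.array([0, 1, 0, 1, 0, 1])   # 1 = the dialogue later turned out problematic

    clf = LogisticRegression().fit(X, y)
    print(clf.predict([[0.35, 11, 1]]))   # likely problematic: consider a preventive strategy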

Important issues
Mobile environments:
– Laboratory assessments often overestimate recognition rates compared to natural field settings (a 20-50% drop in accuracy)
– Noise, social interchange, multi-tasking, stress
Multimodal error handling:
– Error prevention and error recovery
– Choice of a less error-prone modality, simpler utterances, alternation of modality, mutual disambiguation