Error Detection in Human-Machine Interaction
Dan Bohus, DoD Group, Oct 2002


Errors in Spoken-Language Interfaces
Speech recognition is problematic:
 Input signal quality
 Accents, non-native speakers
 Spoken-language disfluencies: stutters, false-starts, /mm/, /um/
Typical word error rates in SDS: 10-30%
Systems today lack the ability to gracefully recover from errors

An example
S: Are you a registered user?
U: No I'm not. No [NO I'M NOT NO]
S: What is your full name?
U: Adam Schumacher [ATHENS IN AKRON]
S: There is an Athens in Georgia and in Greece. Which destination did you want?
U: Neither [NEITHER]
S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want?
U: Georgia [GEORGIA]
S: A flight from Athens... Where do you want to go?
U: Start over [START OVER]
S: Do you really want to start over?
U: Yes [YES]
S: What city are you leaving from?
U: Hamilton, Ontario [HILTON ONTARIO]
S: Sorry, I'm not sure I understood what you said. Where are you leaving from?
U: Hamilton [HILTON]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Toronto [TORONTO]

Pathway to a solution
Make systems aware of the unreliability in their inputs
 Confidence scores
Develop a model which learns to optimally choose between several prevention/repair strategies
 Identify strategies
 Express them in a computable manner
 Develop the model
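The strategy-selection idea can be sketched as a trivial confidence-threshold policy. This is an illustrative sketch, not the model proposed in the talk; the thresholds and strategy names are invented.

```python
# Illustrative sketch (not the talk's model): pick an error-handling
# strategy from a [0,1] ASR confidence score. Thresholds and strategy
# names are hypothetical.

def choose_strategy(confidence):
    """Map a recognition confidence to a prevention/repair strategy."""
    if confidence >= 0.8:
        return "accept"            # trust the hypothesis outright
    if confidence >= 0.5:
        return "implicit_confirm"  # echo the value while moving on
    if confidence >= 0.3:
        return "explicit_confirm"  # ask a direct yes/no question
    return "reject"                # ask the user to repeat or rephrase
```

A learned model would replace these fixed cutoffs with a policy optimized from dialog data.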

Papers
 Error Detection in Spoken Human-Machine Interaction [E. Krahmer, M. Swerts, M. Theune, M. Weegels]
 Problem Spotting in Human-Machine Interaction [E. Krahmer, M. Swerts, M. Theune, M. Weegels]
 The Dual of Denial: Disconfirmations in Dialogue and Their Prosodic Correlates [E. Krahmer, M. Swerts, M. Theune, M. Weegels]

Goals [Let’s look at the dialog on page 2]
(1) Analyze the positive and negative cues users produce in response to implicit and explicit verification questions
(2) Explore the possibilities of spotting errors on-line

Explicit vs. Implicit
Explicit
 Presumably easier for the system to verify (but there’s evidence that it’s not as easy...)
 Leads to more turns, less efficiency, frustration
Implicit
 Efficiency
 But induces a higher cognitive burden, which can result in more confusion
 ~ Systems don’t deal very well with it...

Clark & Schaefer
Grounding model
 Presentation phase
 Acceptance phase
Various indicators
 GO ON / YES
 GO BACK / NO
Can we detect them reliably (when following implicit and explicit verification questions)?

Positive and Negative Cues
Positive               Negative
Short turns            Long turns
Unmarked word order    Marked word order
Confirm                Disconfirm
Answer                 No answer
No corrections         Corrections
No repetitions         Repetitions
New info               No new info
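As a sketch of how some of these surface cues might be computed from a recognized user turn, here is a minimal extractor. The keyword sets, the turn-length cutoff, and the function itself are invented for illustration; they are not from the papers.

```python
# Hypothetical cue extractor for one user turn following a verification
# question. Keyword sets and the length cutoff are invented.

DISCONFIRM = {"no", "nope", "wrong"}
CONFIRM = {"yes", "yeah", "right", "correct"}

def surface_cues(turn, prompt_slots):
    """Binary cues for a (transcribed) user turn.

    turn: the user's utterance as a string
    prompt_slots: slot values the system just presented for verification
    """
    words = turn.lower().split()
    return {
        "long_turn": len(words) > 3,                   # short vs. long turns
        "confirm": bool(CONFIRM & set(words)),         # confirm ...
        "disconfirm": bool(DISCONFIRM & set(words)),   # ... vs. disconfirm
        "empty": not words,                            # answer vs. no answer
        "repeats_slot": any(s.lower() in turn.lower()  # repetition of presented info
                            for s in prompt_slots),
    }
```

Marked word order and genuine corrections are harder: they need parsing or alignment against the system's hypothesis rather than keyword matching.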

Experimental Setup / Data
120 dialogs: Dutch SDS providing train timetable information
487 utterances
 44 (~10%) not used:
   Users accepting a wrong result
   Barge-in
   Users starting their own contribution
 Left 443 resulting adjacent S/U utterance pairs

Results – Number of words
            ~Problems    Problems
Explicit
Implicit

Results – Empty turns (%)
            ~Problems    Problems
Explicit    0%           2.6%
Implicit    3.4%         10.3%

Results – Marked word order (%)
            ~Problems    Problems
Explicit    3.3%         4.4%
Implicit    1.2%         26.9%

Results – Yes/No
                     ~Problems    Problems
Explicit    Yes      92.8%        6.1%
            No       0%           56.6%
            Other    7.1%         37.1%
Implicit    Yes      0%           0%
            No       0%           15.4%
            Other    100%         84.6%

Results – Repeated / Corrected / New info
                       ~Problems    Problems
Explicit    Repeated   8.5%         23.9%
            Corrected  0%           72.6%
            New        11.4%        12.4%
Implicit    Repeated   2.4%         61.0%
            Corrected  0%           92.3%
            New        53.6%        36.5%

First conclusion
People use more negative cues when there are problems
And even more so for implicit confirmations (vs. explicit ones)

How well can we classify?
Using individual features
 Look at precision/recall
 Explicit: absence of confirmation
 Implicit: non-zero number of corrections
Using multiple features
 Used memory-based learning
 97% accuracy (majority baseline: 68%)
 Confirm + Correct is the winning combination, although each is individually less good
This is overall, right? How about for explicit vs. implicit separately?
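Memory-based learning is essentially instance-based (nearest-neighbor) classification: store all training turns as feature vectors and label a new turn by its closest stored example. A toy sketch, with made-up binary cue vectors rather than the paper's actual feature encoding:

```python
# Toy memory-based (instance-based) classifier: keep every training
# example and label a new turn by its nearest neighbor under overlap
# distance. Feature vectors and labels below are invented.

def overlap_distance(a, b):
    """Number of positions where two binary feature tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def classify(train, query):
    """train: list of (feature_tuple, label) pairs; 1-NN label for query."""
    return min(train, key=lambda ex: overlap_distance(ex[0], query))[1]

# (disconfirm, correction, long_turn) -> problem / ok
memory = [
    ((1, 1, 1), "problem"),
    ((0, 0, 0), "ok"),
    ((0, 1, 1), "problem"),
    ((1, 0, 0), "problem"),
]
```

With all training instances kept verbatim, the classifier can exploit rare but reliable cue combinations, which fits the finding that Confirm + Correct together beat either feature alone.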

BUT!!! How many of these features are available on-line?
Positive               Negative
Short turns            Long turns
Unmarked word order    Marked word order
Confirm                Disconfirm
Answer                 No answer
No corrections?        Corrections?
No repetitions?        Repetitions?
New info?              No new info?

What else can we throw at it?
 Prosody (next paper)
 Lexical information
 Acoustic confidence scores
   Maybe also of previous utterances
 Repetitions/corrections/new info on the transcript?
 ...

Papers
 Error Detection in Spoken Human-Machine Interaction [E. Krahmer, M. Swerts, M. Theune, M. Weegels]
 Problem Spotting in Human-Machine Interaction [E. Krahmer, M. Swerts, M. Theune, M. Weegels]
 The Dual of Denial: Disconfirmations in Dialogue and Their Prosodic Correlates [E. Krahmer, M. Swerts, M. Theune, M. Weegels]

Goals
Investigate the prosodic correlates of disconfirmations
 Is this slightly different than before? (i.e., now looking at any corrections? Answer: no)
 Looked at prosody on "NO" as a go_on vs. a go_back:
   Do you want to fly from Pittsburgh?
   Shall I summarize your trip?

Human-human
 Higher pitch range, longer duration
 Preceded by a longer delay
 High H% boundary tone
Expected to see the same behavior for disconfirmations in human-machine dialog

Prosodic correlates
Features         POSITIVE ('go on')    NEGATIVE ('go back')
Boundary tone    Low                   High
Duration         Short                 Long
Delay            Short                 Long
Pause            Short                 Long
Pitch range      Low                   High
 Yes, the correlations are there as expected
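A minimal sketch of how these correlates could be turned into a go_on / go_back decision rule for a recognized "no". The thresholds and the majority-vote scheme are invented, not taken from the paper; a real detector would fit them on labeled data.

```python
# Invented decision rule over the prosodic features in the table above:
# classify a "no" as go_on vs. go_back by majority vote of the cues.
# All thresholds are hypothetical.

def classify_no(boundary_tone, duration_s, delay_s, pitch_range_hz):
    """boundary_tone is 'H%' or 'L%'; returns 'go_back' or 'go_on'."""
    votes = 0
    if boundary_tone == "H%":   # high boundary tone -> negative cue
        votes += 1
    if duration_s > 0.40:       # long token duration
        votes += 1
    if delay_s > 0.50:          # long preceding delay/pause
        votes += 1
    if pitch_range_hz > 80:     # wide pitch excursion
        votes += 1
    return "go_back" if votes >= 2 else "go_on"
```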

Perceptual analysis
Took 40 "No"s from No+stuff utterances: 20 go_on and 20 go_back (note that some features are lost this way...)
Forced-choice randomized task, with no feedback; 25 native speakers of Dutch
Results
 17 go_on correctly identified above chance
 15 go_back correctly identified above chance; but also 1 incorrectly identified above chance

Discussion
Q1: Blurred relationships...
 Confidence annotation
 Go_on / go_back signal
 Is that the same as corrections?
 Is that the most general case for responses to implicit/explicit verifications, or should we have a separate detector?
Q2: What other features could we throw at these problems? Which are the juiciest ones?

Discussion
Q3: For implicit confirms, are these different in terms of induced response behavior?
 When do you want to leave Pittsburgh?
 Travelling from Pittsburgh... when do you want to leave?
 When do you want to leave from Pittsburgh to Boston?