What can humans do when faced with ASR errors?
Dan Bohus, Dialogs on Dialogs Group, October 2003

Question

We're trying to build systems that can deal with a noisy recognition channel.

Q: How good are humans at that? More importantly, how do they do it?
- What strategies do they use?
- How do they decide which one to use, and when?
- What kind of knowledge is used in the process?

WOZ experiments

Modify the WOZ setting so that the wizard does not hear the user, but rather receives the recognition result (text in these cases):
- Exploring Human Error Handling Strategies [Gabriel Skantze]
- A Study of Human Dialogue Strategies in the Presence of Speech Recognition Errors [Teresa Zollo]

Domain/Task, Experiments

Problem-solving task:
- Wizard guides the user through a campus
- Wizard has a detailed map
- User has a small fraction of the map, showing their current surroundings

Experiments:
- 8 users, 8 operators, balanced male/female
- 5 scenarios per user → 40 dialogs

WOZ / Experimental Setting

- Wizard receives the recognition results on a GUI
  - Not parsed (the wizard plays the parser as well)
  - Confidence denoted by color intensity
- Users know they are talking to a human
  - A normal wizard setup is more costly
  - Hard to maintain subjects for longitudinal studies
  - Conflicting information on how linguistic patterns change when speaking to a machine vs. to a human
- Operators are naïve; they are also subjects of the study
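
As an aside, a minimal sketch of how such a confidence-shaded display could be rendered. The linear score-to-intensity mapping below is an assumption; the talk does not specify the actual scheme:

```python
# Hypothetical sketch: shade each recognized word by its ASR confidence,
# so the wizard can see at a glance which words are likely wrong.
def shade(words_with_conf):
    """Render words as HTML spans whose text intensity tracks confidence."""
    spans = []
    for word, conf in words_with_conf:
        # conf in [0, 1]; low confidence -> light gray, high confidence -> black
        level = int(220 * (1.0 - conf))  # 0 (black) .. 220 (light gray)
        spans.append(f'<span style="color: rgb({level},{level},{level})">{word}</span>')
    return " ".join(spans)

print(shade([("take", 0.92), ("the", 0.85), ("fifth", 0.31), ("left", 0.78)]))
```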

Results

43% WER, 7.3% OOV

Manual labeling of operator understanding:
- Full understanding
- Partial understanding
- Non-understanding
- Misunderstanding

Very few misunderstandings: operators are good at rejecting. Users thought they were almost always understood.

Results (continued)

3 main operator strategies (approximately equally distributed) for dealing with non- and partial understandings:
1. Continuation of the route description
2. Signal of non-understanding
3. Task-related question

A PARADISE-like regression indicates strategy 2 is inversely correlated with "how well do you think you did?"
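
A rough sketch of what such a PARADISE-style regression could look like; the per-dialog strategy counts and satisfaction scores below are invented for illustration:

```python
import numpy as np

# Regress the user's self-assessed success ("how well do you think you did?")
# on per-dialog counts of the three operator strategies. Data is invented.
# columns: [continue_route, signal_non_understanding, task_question]
X = np.array([[5, 1, 3],
              [2, 6, 2],
              [4, 2, 4],
              [1, 7, 1],
              [3, 3, 3]], dtype=float)
y = np.array([6.5, 3.0, 6.0, 2.5, 5.0])  # perceived success, e.g. on a 1-7 scale

X1 = np.column_stack([np.ones(len(X)), X])     # add an intercept column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)  # ordinary least squares
print(dict(zip(["intercept", "continue", "signal", "question"], coef.round(2))))
# A negative coefficient on "signal" would mirror the reported inverse correlation.
```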

WOZ experiments

Modify the WOZ setting so that the wizard does not hear the user, but rather receives the recognition result:
- Exploring Human Error Handling Strategies [Gabriel Skantze]
- A Study of Human Dialogue Strategies in the Presence of Speech Recognition Errors [Teresa Zollo]

Domain / Experiments

TRIPS-Pacifica: planning the evacuation of the fictitious island Pacifica
- Construct a plan to transport all the civilians on Pacifica to Barnacle by 5 am, so that they can be evacuated from the island (the plan will be deployed at midnight)
- Added constraint: the road between Calypso and Ocean Beach is impassable

Only 7 dialogs (September '99)

WOZ / Experimental Setting

- Wizard assisted by a GUI for quick information access and for generating synthesized responses
- Sphinx-2 (CMU) for recognition, TrueTalk (Entropic) for synthesis
- Wizard receives a string of words (the paper does not mention confidence scores)
- User debriefing questionnaire
- Wizard annotates the interaction transcript with the knowledge sources used in decisions, etc.

Results

Small corpus:
- 7 dialogs, 348 utterances
- Manually labeled misunderstandings
- Overall WER: 30%

Looked at positive and negative feedback
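
For reference, word error rate is the word-level edit distance between the ASR hypothesis and the reference transcript, divided by the reference length. A minimal implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i ref words and first j hyp words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)  # sub / del / ins
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("go to ocean beach", "go to motion beach"))  # 1 substitution / 4 words = 0.25
```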

Negative feedback

- Request for full repetition: 33/80
  - In 24/33 cases users complied and repeated/rephrased
- WH-replacement of the missing or erroneous word: 12/80
  - In 8/12 cases users responded with the precise information
- Attempt to salvage the correct word: 20/80
  - Possibly increases user satisfaction?
  - Responses similar to those for repetition requests
- Request for verification: 15/80
  - In 10/15 cases users responded with explicit affirmations
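
Tallying these up, users produced the hoped-for response at roughly comparable rates across strategies:

```python
# Success rates of the wizards' negative-feedback strategies (counts from the slide).
counts = {"full repetition": (24, 33),
          "WH-replacement": (8, 12),
          "verification":   (10, 15)}
for strategy, (ok, total) in counts.items():
    print(f"{strategy}: {ok}/{total} = {ok / total:.0%}")  # 73%, 67%, 67%
```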

What if we wanted to do these?

(Same list of negative-feedback strategies as on the previous slide, revisited as a question for system builders.)

More negative feedback results

Wizards gave negative feedback in 80 cases (35%) of the 227 utterances recognized incorrectly.

Compensation for ASR errors:
- Ignoring words that are not salient in the TRIPS domain
- Hypothesizing correct words based on phonetic similarity

Q: So, what does that say? Better parsing?
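
A minimal sketch of the second compensation idea: matching a misrecognized word against a domain lexicon by phonetic similarity. The crude phonetic code and the lexicon below are illustrative assumptions, standing in for a real algorithm such as Soundex or Metaphone:

```python
import difflib

# Hypothetical domain lexicon for a Pacifica-style task.
LEXICON = ["pacifica", "barnacle", "calypso", "ocean", "beach",
           "truck", "helicopter", "civilians", "evacuate", "midnight"]

def crude_phonetic(word: str) -> str:
    """Very rough phonetic code: collapse some confusable consonants,
    then drop vowels after the first letter."""
    subs = str.maketrans({"k": "c", "z": "s", "f": "p", "v": "p"})
    w = word.lower().translate(subs)
    return w[0] + "".join(ch for ch in w[1:] if ch not in "aeiou")

def hypothesize(asr_word: str, cutoff: float = 0.6) -> list:
    """Return lexicon words whose phonetic code is close to the ASR word's."""
    codes = {w: crude_phonetic(w) for w in LEXICON}
    target = crude_phonetic(asr_word)
    close = difflib.get_close_matches(target, list(codes.values()), n=3, cutoff=cutoff)
    return [w for w, c in codes.items() if c in close]

print(hypothesize("barnacle"))  # exact match -> ['barnacle']
print(hypothesize("carnival"))  # phonetically close -> ['barnacle']
```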

Positive feedback

- Using an acknowledgement term (okay, right)
- Simple response to a question (the next relevant contribution)
- Conversational/social response, e.g. greetings/thanks
- Providing a next unsolicited relevant contribution
- Clarifying or correcting
- Paraphrasing

Conclusions

- Observations are consistent with theoretical grounding models (Clark et al.)
- Negative feedback is given only when really needed
- Unless ASR is perfect (and sometimes even then), wizards give explicit indications of their understanding

Discussion…

WOZ setting…
- Wizard = Parser + Dialog Manager
  - It seems that humans can extract more information from the text than current parsers → do we need better, more robust parsers?
  - How about Wizard = Dialog Manager?
- Domain choice
  - Skantze's results make sense in the chosen domain
  - Would such results hold across domains?
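
To make the "more robust parsers" point concrete, here is a toy contrast with an invented mini-grammar: a strict parser rejects a noisy utterance outright, while a robust concept-spotting parser still recovers the salient pieces, much as the wizards did:

```python
import re

# Toy illustration: a strict parser demands a full pattern,
# while a robust one just spots domain concepts. Grammar and lexicon are invented.
STRICT = re.compile(r"^go to (calypso|barnacle|ocean beach)$")
PLACES = ["calypso", "barnacle", "ocean beach"]

def strict_parse(utt: str):
    m = STRICT.match(utt)
    return {"goal": m.group(1)} if m else None

def robust_parse(utt: str):
    # Keep only salient concepts; ignore everything else, including ASR noise.
    found = [p for p in PLACES if p in utt]
    return {"goal": found[0]} if found else None

noisy = "uh go with two ocean beach please"  # ASR garbled "go to"
print(strict_parse(noisy))  # None -> non-understanding
print(robust_parse(noisy))  # {'goal': 'ocean beach'} -> partial understanding
```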