Identifying Local Corrections in Human-Computer Dialogue
Gina-Anne Levow, University of Chicago
October 5, 2004

Roadmap
The problem
Data collection & analysis
Identifying local corrections
Conclusions & future work

The Problem
U: October eleventh
S: Okay, leaving October fifth…
U: October eleventh
Goal: Pinpoint WHAT is being corrected
–Builds on prior recognition of corrections (Kirchhoff, 2001; SHL, 2000; Levow, 1998)

Why Identify Local Corrections?
Miscommunication is inevitable
–SER (sentence error rate) remains high for conversational speech
Error resolution is crucial
–Ease of recovery matters more than WER alone (Walker et al., 2001; Shriberg et al., 1992)
–Pinpointing the corrected material facilitates recovery
Enables an adaptive dialogue strategy

Challenge & Response
Few lexical/syntactic cues
–Cue phrases are rare, e.g. "No, I meant…"
–Corrections may be identical to legal original input
Near repetitions are common
–E.g. departure and return dates
Approach: Exploit prosodic cues
–A Wizard-of-Oz study found significant contrasts: increases in duration, pitch, and intensity (Oviatt et al., 1998)

Data Collection
Corpus: 2000 and 2001 Communicator Evaluations
–Telephone-only interface to travel information: air, hotel, car
–>160 hours of interactions, ~43K utterances
Local corrections
–Single focus of correction
–Error identifiable from the system response

Local Correction Set
Lexically matched
–U: October eleventh
–S: Okay, leaving October fifth…
–U: October eleventh
Lexically unmatched
–U: October eleventh
–S: Okay, leaving October fifth…
–U: The eleventh of October
57 utterances: 200 total words, 57 corrective
–Automatically identified from logs, manually checked
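The matched/unmatched distinction above can be sketched as a simple bag-of-words comparison between the user's original input and the correction. This is an illustrative sketch, not the slide author's log-based procedure; the function name and the exact-set criterion are my assumptions.

```python
def correction_type(original, correction):
    """Classify a correction as lexically matched (same word set as the
    original input) or unmatched (a paraphrase of the corrected content).
    Hypothetical helper; the corpus was labeled from logs and hand-checked."""
    orig_words = set(original.lower().split())
    corr_words = set(correction.lower().split())
    return "matched" if orig_words == corr_words else "unmatched"

print(correction_type("october eleventh", "October eleventh"))        # matched
print(correction_type("october eleventh", "the eleventh of october"))  # unmatched
```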

Prosodic Features & Analysis
Pitch, intensity
–Maximum, minimum, mean, range
–Extracted with Praat (Boersma, 2001), smoothed
–Utterance-normalized, computed per word
Duration
–Normalized against ATIS-based phoneme durations (Chung & Seneff, 1997)
Significant increases in duration
–For local correction words ONLY
No other measures reach significance (cf. Oviatt et al., 1998)
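The utterance-normalization step above can be sketched as a z-score over the per-word values of one feature. This is a minimal illustration of normalization only; the actual pitch and intensity values came from Praat, and the example pitch values are invented.

```python
from statistics import mean, pstdev

def normalize_per_word(values):
    """Utterance-level z-score normalization of one per-word prosodic
    feature (e.g. word mean pitch): subtract the utterance mean and
    divide by the utterance standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Invented word mean-pitch values (Hz) for "the eleventh of october":
pitches = [180.0, 240.0, 175.0, 205.0]
print([round(z, 2) for z in normalize_per_word(pitches)])
# [-0.78, 1.55, -0.97, 0.19]  -- "eleventh" stands out above the mean
```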

Local Correction (figure slide)

Local Correction II (figure slide)

Local Correction Classification
Classifier: BoosTexter (Schapire & Singer, 2000)
Feature selection to avoid overfitting
–5-way cross-validation; results averaged over runs
Features:
–Duration
–Pitch, intensity (maximum, minimum, mean, range)
–Encoded both as normalized values and as within-utterance ranks
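The within-utterance rank encoding above can be sketched as follows: each word's feature value is replaced by its rank among all words in the utterance (1 = highest). This is an illustrative sketch with invented values, not the original feature-extraction code.

```python
def within_utterance_ranks(values):
    """Convert per-word feature values into within-utterance ranks,
    1 = highest value. The slides report that these rank features
    outperform the raw normalized values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

# Invented max-pitch values for "the eleventh of october":
print(within_utterance_ranks([180.0, 240.0, 175.0, 205.0]))  # [3, 1, 4, 2]
```

Ranks abstract away speaker- and utterance-level scale differences, which is one plausible reason they transfer better than raw normalized values.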

Localizing Corrections
Baseline (most common class): 71.5%
Overall accuracy: 85.5%
–Lexically matched: 81.25% (baseline: 59%)
–Lexically unmatched: 87% (baseline: 80%)
Rank-based features crucial
–Using the normalized values instead degrades performance
Key features:
–Pitch range: alone approaches the best result
–Maximum pitch, maximum intensity
–Duration less useful
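The 71.5% most-common-class baseline is consistent with the corpus counts given earlier (57 corrective words out of 200: always predicting "non-corrective" is right 143/200 of the time). A quick check, with labels reconstructed from those counts:

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy of always predicting the most frequent class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# 200 total words, 57 of them corrective (from the data slide):
labels = ["corrective"] * 57 + ["other"] * 143
print(round(majority_baseline(labels), 3))  # 0.715
```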

Conclusion & Future Work
Prosodic cues identify the focus of correction
–Pitch range; pitch and intensity maxima
–Rank-based features are key
–These correspond to utterance-level prominence: increased pitch maximum, pitch range, intensity, and duration
Extend beyond a single correction point
–Phrasal units, multi-point corrections
Integrate correction recognition with dialogue management