Early error detection on word level
Gabriel Skantze and Jens Edlund
Centre for Speech Technology, Department of Speech, Music and Hearing, KTH, Sweden

Overview
– How do we handle errors in conversational human-computer dialogue?
– Which features are useful for error detection in ASR results?
– Two studies on selected features:
  – Machine learning
  – Human subjects' judgements

Error detection
– Early error detection
  – Detect if a given recognition result contains errors
  – e.g. Litman, D. J., Hirschberg, J., & Swerts, M. (2000)
– Late error detection
  – Feed back the interpretation of the utterance to the user (grounding)
  – Based on the user's reaction to that feedback, detect errors in the original utterance
  – e.g. Krahmer, E., Swerts, M., Theune, M., & Weegels, M. E. (2001)
– Error prediction
  – Detect that errors may occur later on in the dialogue
  – e.g. Walker, M. A., Langkilde-Geary, I., Wright Hastie, H., Wright, J., & Gorin, A. (2002)

Why early error detection? ASR errors reflect errors in the acoustic and language models. Why not fix them there?
– Post-processing may consider systematic errors in the models, due to mismatched training and usage conditions.
– Post-processing may help to pinpoint the actual problems in the models.
– Post-processing can include factors not considered by the ASR, such as:
  – Prosody
  – Semantics
  – Dialogue history

Corpus collection
[Diagram: the user speaks; the operator listens through a vocoder and reads the ASR result, then speaks back.]
User: "I have the lawn on my right and a house with number two on my left"
ASR: "i have the lawn on right is and a house with from two on left"

Study I: Machine learning
– 4470 words, 73.2% correct (baseline)
– 4/5 training data, 1/5 test data
– Two ML algorithms tested:
  – Transformation-based learning (µ-TBL): learn a cascade of rules that transforms the classification
  – Memory-based learning (TiMBL): simply store each training instance in memory; compare the test instance to the stored instances and find the closest match
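The memory-based idea can be sketched in a few lines: store labelled training instances and classify a new word by the label of the closest stored instance under a simple feature-overlap distance. This is an illustrative toy, not the actual TiMBL configuration; the feature names and instances are made up.

```python
# Toy sketch of memory-based learning: 1-nearest-neighbour with an
# overlap distance, in the spirit of TiMBL. Features are illustrative.

def overlap_distance(a, b):
    """Count the number of features on which two instances differ."""
    return sum(1 for f in a if a[f] != b[f])

def classify(memory, instance):
    """Return the label of the nearest stored (features, label) pair."""
    _, label = min(memory, key=lambda pair: overlap_distance(pair[0], instance))
    return label

# Two hypothetical training instances: a low-confidence misrecognised
# word (False) and a high-confidence correct word (True).
memory = [
    ({"confidence": 30, "pos": "Verb", "content": True}, False),
    ({"confidence": 80, "pos": "Noun", "content": True}, True),
]

print(classify(memory, {"confidence": 75, "pos": "Noun", "content": True}))  # True
```

In practice TiMBL uses weighted overlap metrics and k > 1 neighbours, but the store-and-compare principle is the same.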

Features

Group        Feature           Explanation
Confidence   Confidence        Speech recognition word confidence score
Lexical      Word              The word
             POS               The part-of-speech for the word
             Length            The number of syllables in the word
             Content           Is it a content word?
Contextual   PrevPOS           The part-of-speech for the previous word
             NextPOS           The part-of-speech for the next word
             PrevWord          The previous word
Discourse    PrevDialogueAct   The dialogue act of the previous operator utterance
             Mentioned         Is it a content word that has been mentioned previously by the operator in the discourse?
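For concreteness, the feature groups above could be extracted per word roughly as follows. The POS inventory, the content-word test, and the syllable counter are stand-ins, not the paper's actual preprocessing.

```python
import re

# Illustrative: treat open-class POS tags as content words.
CONTENT_POS = {"Noun", "Verb", "Adjective", "Adverb"}

def count_syllables(word):
    # Crude placeholder: count vowel clusters as syllables.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def extract_features(words, pos_tags, confidences, i, prev_act, mentioned):
    """Build the feature vector for the i-th word of an ASR result."""
    return {
        "Confidence": confidences[i],
        "Word": words[i],
        "POS": pos_tags[i],
        "Length": count_syllables(words[i]),
        "Content": pos_tags[i] in CONTENT_POS,
        "PrevPOS": pos_tags[i - 1] if i > 0 else None,
        "NextPOS": pos_tags[i + 1] if i + 1 < len(pos_tags) else None,
        "PrevWord": words[i - 1] if i > 0 else None,
        "PrevDialogueAct": prev_act,
        "Mentioned": words[i] in mentioned and pos_tags[i] in CONTENT_POS,
    }
```

Each recognised word then becomes one training or test instance for the classifiers.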

Results

Feature set                                     µ-TBL   TiMBL
Confidence                                      77.3%   76.0%
Lexical                                         77.5%   78.0%
Lexical + Contextual                            81.4%   82.8%
Lexical + Confidence                            81.3%   81.0%
Lexical + Confidence + Contextual               83.9%   83.2%
Lexical + Confidence + Contextual + Discourse   85.1%   84.1%

Content words:
– Baseline: 69.8%, µ-TBL: 87.7%, TiMBL: 87.0%

Rules learned by µ-TBL

Transformation   Rule
TRUE > FALSE     Confidence < 50 & Content = TRUE
TRUE > FALSE     Confidence < 60 & POS = Verb & Length = 2
TRUE > FALSE     Confidence < 40 & POS = Adverb & Length = 1
TRUE > FALSE     Confidence < 50 & POS = Adverb & Length = 2
TRUE > FALSE     Confidence < 40 & POS = Verb & Length = 1
FALSE > TRUE     Confidence > 40 & Mentioned = TRUE & POS = Noun & Length = 2
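Applying such a learned cascade is straightforward: start from a baseline label and let each rule in turn flip the label when its condition holds. The sketch below encodes a few of the rules from the table; the representation (tuples of labels plus a predicate) is an illustrative choice, not µ-TBL's internal format.

```python
# Sketch of applying a µ-TBL-style rule cascade to one word's features.
# Each rule flips the classification from_label -> to_label when its
# condition is met. Rules mirror part of the learned-rules table.

RULES = [
    # (from_label, to_label, condition)
    (True, False, lambda f: f["Confidence"] < 50 and f["Content"]),
    (True, False, lambda f: f["Confidence"] < 60 and f["POS"] == "Verb"
                            and f["Length"] == 2),
    (False, True, lambda f: f["Confidence"] > 40 and f["Mentioned"]
                            and f["POS"] == "Noun" and f["Length"] == 2),
]

def apply_cascade(features, label=True):
    """Start from the baseline label ("word is correct") and apply
    each rule in learned order."""
    for from_label, to_label, cond in RULES:
        if label == from_label and cond(features):
            label = to_label
    return label

# A mid-confidence noun that was already mentioned by the operator:
# rule 1 flips it to False, rule 3 flips it back to True.
word = {"Confidence": 45, "Content": True, "POS": "Noun",
        "Length": 2, "Mentioned": True}
print(apply_cascade(word))  # True
```

Because the rules are applied in a fixed order, each one corrects residual errors left by the previous ones, which is what makes the cascade readable as a diagnosis of where the models go wrong.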

Study II: Human error detection
– First 15 user utterances from 4 dialogues with high WER
– 50% of the words correct (baseline)
– 8 judges
– Features were varied for each utterance:
  – ASR information
  – Context information

Features NoContextNo context. ASR output only. PreviousContextPrevious utterance visible. FullContextThe dialogue history is given incrementally. MapContextAs FullContext, with the addition of the map. NoConfidenceRecognised string only. ConfidenceRecognised string, colour coded for word confidence. NBestListAs Confidence, but the 5-best ASR result was given.

The judges' interface
[Screenshot: utterance confidence; grey scale reflecting word confidence; 5-best list; the dialogue so far; a correction field.]

Results

Conclusions & Discussion
– ML can be used for early error detection on the word level, especially for content words.
– Word confidence scores have some use.
– Utterance context and lexical information improve the ML performance.
– A rule-learning algorithm such as transformation-based learning can be used to pinpoint the specific problems.
– N-best lists are useful for human subjects. How do we operationalise them for ML?

Conclusions & Discussion
– The ML improved only slightly from the discourse context.
  – Further work on operationalising context for ML should focus on the previous utterance.
– The classifier should be tested together with a parser or keyword spotter to see if it can improve performance.
– Other features should be investigated, such as prosody. These may improve performance further.

The End
Thank you for your attention! Questions?