Incorporating In-domain Confidence and Discourse Coherence Measures in Utterance Verification (ドメイン内の信頼度と談話の整合性を用いた音声認識誤りの検出: Detection of speech recognition errors using in-domain confidence and discourse coherence). Ian R. Lane, Tatsuya Kawahara.

Presentation transcript:

1 Incorporating In-domain Confidence and Discourse Coherence Measures in Utterance Verification
ドメイン内の信頼度と談話の整合性を用いた音声認識誤りの検出 (Detection of speech recognition errors using in-domain confidence and discourse coherence)
Ian R. Lane, Tatsuya Kawahara
Spoken Language Communications Research Laboratories, ATR
School of Informatics, Kyoto University

2 Introduction
Current ASR technologies are not robust against:
– Acoustic mismatch: noise, channel, speaker variance
– Linguistic mismatch: disfluencies, OOV, OOD
Goals: assess the confidence of the recognition hypothesis and detect recognition errors; provide effective user feedback; select a recovery strategy based on the type of error and the specific application.

3 Previous Work on Confidence Measures
– Feature-based: [Kemp] word duration, AM/LM back-off
– Explicit model-based: [Rahim] likelihood ratio test against a cohort model
– Posterior probability: [Komatani, Soong, Wessel] estimate the posterior probability given all competing hypotheses in a word graph
These approaches are limited to "low-level" information available during ASR decoding.

4 Proposed Approach
Exploit knowledge sources outside the ASR framework for estimating recognition confidence, e.g. knowledge about the application domain and the discourse flow. Incorporate confidence measures (CMs) based on "high-level" knowledge sources:
– In-domain confidence: degree of match between the utterance and the application domain
– Discourse coherence: consistency between consecutive utterances in a dialogue

5 Utterance Verification Framework
CM_in-domain(X_i): in-domain confidence
CM_discourse(X_i|X_{i-1}): discourse coherence
CM(X_i): joint confidence score, combining the above with the generalized posterior probability CM_gpp(X_i)
[Diagram: for each input utterance X_i, the ASR front-end produces CM_gpp(X_i); topic classification, in-domain verification, and out-of-domain detection produce CM_in-domain(X_i); the distance dist(X_i, X_{i-1}) to the previous utterance X_{i-1} gives CM_discourse(X_i|X_{i-1}); the three scores are combined into CM(X_i).]

6 In-domain Confidence
Measure of topic consistency with the application domain; previously applied to out-of-domain utterance detection.
Examples of errors detected via in-domain confidence (REF: correct transcription; ASR: speech recognition hypothesis):
Mismatch of domain
REF: How can I print this WORD file double-sided
ASR: How can I open this word on the pool-side
→ hypothesis not topically consistent, so in-domain confidence is low
Erroneous recognition hypothesis
REF: I want to go to Kyoto, can I go by bus
ASR: I want to go to Kyoto, can I take a bath
→ hypothesis not topically consistent, so in-domain confidence is low

7 In-domain Confidence
[Diagram: the input utterance X_i (recognition hypothesis) is transformed into a vector-space feature vector; classification of multiple topics (SVMs 1..m) yields topic confidence scores (C(t_1|X_i), ..., C(t_m|X_i)); in-domain verification then computes V_in-domain(X_i), giving the in-domain confidence CM_in-domain(X_i).]

8 In-domain Confidence (example)
e.g. "could I have a non-smoking seat" is transformed into a binary vector over the feature space (a, an, ..., room, ..., seat, ..., I+have, ...) → (1, 0, ..., 0, ..., 1, ..., 1, ...); classification of multiple topics (SVMs 1..m) then assigns confidence scores to topics such as accommodation, airplane, airport, ...
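To make the vector-space transformation concrete, here is a minimal Python sketch of a binary bag-of-words featurizer with adjacent word-pair features (as in the slide's "I+have" example). The vocabulary and feature choices are illustrative assumptions, not the paper's actual configuration; the real system feeds such vectors into per-topic SVM classifiers.

```python
# Sketch of the vector-space transformation on slide 8: the recognition
# hypothesis is mapped to a binary bag-of-words vector, including
# adjacent word-pair features such as "I+have". The vocabulary below
# is illustrative only.
from typing import List

def to_feature_vector(hypothesis: str, vocabulary: List[str]) -> List[int]:
    """Binary feature vector over a fixed vocabulary of words and word pairs."""
    tokens = hypothesis.lower().split()
    features = set(tokens)
    features |= {f"{a}+{b}" for a, b in zip(tokens, tokens[1:])}
    return [1 if term in features else 0 for term in vocabulary]

vocab = ["a", "an", "room", "seat", "i+have"]
print(to_feature_vector("could I have a non-smoking seat", vocab))
# -> [1, 0, 0, 1, 1]
```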

9 In-domain Verification Model
A linear discriminant verification model is applied:
V_in-domain(X_i) = Σ_{j=1..m} λ_j · C(t_j | X_i)
C(t_j|X_i): topic classification confidence score of topic t_j for input utterance X_i
λ_j: discriminant weight for topic t_j
The weights λ_1, ..., λ_m are trained on in-domain data using "deleted interpolation of topics" and GPD [Lane '04].
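A minimal sketch of this weighted sum; in practice the weights λ_j are trained as the slide describes, but here they are hard-coded illustrative values.

```python
# Sketch of the linear verification model on slide 9:
# V_in-domain(X_i) = sum_j lambda_j * C(t_j | X_i).
# The weights would be trained with deleted interpolation of topics
# and GPD; the numbers below are illustrative only.
def v_in_domain(topic_scores, weights):
    """Weighted sum of the m topic classification confidence scores."""
    assert len(topic_scores) == len(weights)
    return sum(l * c for l, c in zip(weights, topic_scores))

scores = [0.72, 0.10, 0.05]   # C(t_j | X_i) for m = 3 topics (illustrative)
lambdas = [0.5, 0.3, 0.2]     # trained discriminant weights (illustrative)
print(v_in_domain(scores, lambdas))  # 0.40
```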

10 Discourse Coherence
Topic consistency with the preceding utterance.
Example of an error detected via discourse coherence (REF: correct transcription; ASR: speech recognition hypothesis):
Erroneous recognition hypothesis
Speaker A, previous utterance [X_{i-1}]:
REF: What type of shirt are you looking for?
ASR: What type of shirt are you looking for?
Speaker B, current utterance [X_i]:
REF: I'm looking for a white T-shirt.
ASR: I'm looking for a white teacher.
→ topic not consistent across utterances, so discourse coherence is low

11 Discourse Coherence
Euclidean distance between the current (X_i) and previous (X_{i-1}) utterances in topic confidence space:
dist(X_i, X_{i-1}) = sqrt( Σ_{j=1..m} ( C(t_j|X_i) - C(t_j|X_{i-1}) )² )
CM_discourse(X_i|X_{i-1}) is large when X_i and X_{i-1} are topically related, and low when they differ.
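The distance itself follows directly from the definition above; how it is mapped to a confidence score is an assumption in this sketch (simple negation), since the paper applies its own transform (sigmoid transforms are mentioned on slide 14).

```python
# Sketch of the discourse coherence measure on slide 11: Euclidean
# distance between consecutive utterances in topic confidence space.
# Negating the distance (larger = more coherent) is an illustrative
# choice; the paper's exact mapping may differ.
import math

def topic_distance(scores_cur, scores_prev):
    """dist(X_i, X_{i-1}) in topic confidence space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(scores_cur, scores_prev)))

def cm_discourse(scores_cur, scores_prev):
    return -topic_distance(scores_cur, scores_prev)

print(cm_discourse([0.8, 0.1, 0.1], [0.7, 0.2, 0.1]))  # ~ -0.141
```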

12 Joint Confidence Score
Generalized Posterior Probability
Confusability of the recognition hypothesis against competing hypotheses [Lo & Soong], computed at the utterance level from word-level scores.
GWPP(x_j): generalized word posterior probability of x_j
x_j: j-th word in the recognition hypothesis of X
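The slide leaves the utterance-level combination of the word scores implicit; the sketch below assumes a mean of log word posteriors, which is one common choice but not necessarily the paper's exact definition.

```python
# Sketch of an utterance-level score built from generalized word
# posterior probabilities (slide 12). The aggregation here (mean of
# log GWPPs over the words of the hypothesis) is an assumption for
# illustration, not the paper's stated formula.
import math

def cm_gpp(word_gwpps):
    """Average log word posterior over the hypothesis."""
    return sum(math.log(p) for p in word_gwpps) / len(word_gwpps)

print(cm_gpp([0.9, 0.8, 0.95]))  # ~ -0.126
```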

13 Joint Confidence Score
CM(X_i) = λ_gpp · CM_gpp(X_i) + λ_in-domain · CM_in-domain(X_i) + λ_discourse · CM_discourse(X_i|X_{i-1})
For utterance verification, compare CM(X_i) to a threshold (θ).
The model weights (λ_gpp, λ_in-domain, λ_discourse) and the threshold (θ) are trained on a development set.
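A sketch of the joint score and the threshold decision; the weight and threshold values below are illustrative stand-ins for the ones trained on the development set.

```python
# Sketch of the joint confidence score on slide 13: a weighted
# combination of the three measures compared to a threshold theta.
# All numeric values are illustrative only.
def cm_joint(cm_gpp, cm_indomain, cm_discourse, w):
    return (w["gpp"] * cm_gpp
            + w["in_domain"] * cm_indomain
            + w["discourse"] * cm_discourse)

def accept(cm, theta):
    """Accept the hypothesis if its joint confidence clears the threshold."""
    return cm >= theta

weights = {"gpp": 0.6, "in_domain": 0.25, "discourse": 0.15}
cm = cm_joint(-0.13, 0.40, -0.14, weights)
print(cm, accept(cm, theta=-0.05))
```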

14 Experimental Setup
Training set: ATR BTEC (Basic Travel Expressions Corpus)
– ~400k sentences (Japanese/English pairs)
– 14 topic classes (accommodation, shopping, transit, ...)
– Used to train the topic classification and in-domain verification models
Evaluation data: ATR MAD (Machine Aided Dialogue)
– Natural dialogue between English and Japanese speakers via the ATR speech-to-speech translation system
– Dialogue data collected based on a set of pre-defined scenarios
– Development set: 270 dialogues; Test set: 90 dialogues
Trained on the development set: CM sigmoid transforms, CM weights (λ_gpp, λ_in-domain, λ_discourse), and the verification threshold (θ).

15 Speech Recognition Performance

                   Development    Test
  # dialogues          270          90
  Japanese side
    WER               10.5%       10.7%
    SER               41.9%       42.3%
  English side
    WER               17.0%       16.2%
    SER               63.5%       55.2%

ASR performed with ATRASR; a 2-gram LM is applied during decoding, and the lattice is rescored with a 3-gram LM.

16 Evaluation Measure
Utterance-based verification:
– No definite "keyword" set exists in speech-to-speech translation
– If a recognition error occurs (one or more word errors), prompt the user to rephrase the entire utterance
CER (confidence error rate): CER = (#FA + #FR) / #utterances
– FA: false acceptance of an incorrectly recognized utterance
– FR: false rejection of a correctly recognized utterance
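Assuming the standard definition of CER as the fraction of utterances on which the verifier errs, a minimal sketch follows; the counts are invented for illustration.

```python
# Sketch of the CER metric on slide 16: verification errors (false
# acceptances of misrecognized utterances plus false rejections of
# correct ones) divided by the total number of utterances.
def cer(num_fa, num_fr, num_utterances):
    return (num_fa + num_fr) / num_utterances

print(f"{cer(60, 40, 578):.1%}")  # 17.3% with these illustrative counts
```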

17 GPP-based Verification Performance
Accept All: assume all utterances are correctly recognized
GPP: generalized posterior probability
Large reduction in verification errors compared with the "Accept All" case: CER of 17.3% (Japanese) and 15.3% (English).
[Chart: CER for "Accept All" vs. "GPP" on both language sides.]

18 Incorporation of IC and DC Measures (Japanese)
GPP: generalized posterior probability; IC: in-domain confidence; DC: discourse coherence
CER reduced by 5.7% and 4.6% for the "GPP+IC" and "GPP+DC" cases.
CER 17.3% → 15.9% (8.0% relative) for the "GPP+IC+DC" case.
[Chart: CER for GPP, GPP+IC, GPP+DC, GPP+IC+DC.]

19 Incorporation of IC and DC Measures (English)
GPP: generalized posterior probability; IC: in-domain confidence; DC: discourse coherence
Similar performance on the English side: CER 15.3% → 14.4% for the "GPP+IC+DC" case.
[Chart: CER for GPP, GPP+IC, GPP+DC, GPP+IC+DC.]

20 Conclusions
Proposed a novel utterance verification scheme incorporating "high-level" knowledge:
– In-domain confidence: degree of match between the utterance and the application domain
– Discourse coherence: consistency between consecutive utterances
Both proposed measures are effective: relative reductions in CER of 8.0% (Japanese) and 6.1% (English).

21 Future Work
"High-level" content-based verification:
– Ignore ASR errors that do not affect translation quality, for further improvement in performance
Topic switching:
– Determine when users switch task (currently a single task per dialogue session is assumed)