1 Incorporating In-domain Confidence and Discourse Coherence Measures in Utterance Verification
(Detecting speech recognition errors using in-domain confidence and discourse coherence)
Ian R. Lane, Tatsuya Kawahara
Spoken Language Communications Research Laboratories, ATR
School of Informatics, Kyoto University
2 Introduction
Current ASR technologies are not robust against:
–Acoustic mismatch: noise, channel, speaker variance
–Linguistic mismatch: disfluencies, OOV, OOD
Assess the confidence of the recognition hypothesis and detect recognition errors
Give effective user feedback
Select a recovery strategy based on the type of error and the specific application
3 Previous Work on Confidence Measures
Feature-based
–[Kemp] word duration, AM/LM back-off
Explicit model-based
–[Rahim] likelihood ratio test against a cohort model
Posterior probability
–[Komatani, Soong, Wessel] estimate the posterior probability given all competing hypotheses in a word graph
These approaches are limited to "low-level" information available during ASR decoding
4 Proposed Approach
Exploit knowledge sources outside the ASR framework for estimating recognition confidence, e.g. knowledge about the application domain and discourse flow
Incorporate CMs based on "high-level" knowledge sources:
–In-domain confidence: degree of match between the utterance and the application domain
–Discourse coherence: consistency between consecutive utterances in the dialogue
5 Utterance Verification Framework
CM_in-domain(X_i): in-domain confidence
CM_discourse(X_i|X_i-1): discourse coherence
CM(X_i): joint confidence score, combining the above with the generalized posterior probability CM_gpp(X_i)
[Diagram: each input utterance X_i passes through the ASR front-end; the hypothesis is fed to topic classification and in-domain verification (out-of-domain detection) to obtain CM_in-domain(X_i); the distance dist(X_i, X_i-1) to the preceding utterance X_i-1 gives CM_discourse(X_i|X_i-1); these are combined with CM_gpp(X_i) to produce CM(X_i)]
6 In-domain Confidence
Measure of topic consistency with the application domain
–Previously applied in out-of-domain utterance detection
Examples of errors detected via in-domain confidence (REF: correct transcription, ASR: speech recognition hypothesis)
Mismatch of domain
REF: How can I print this WORD file double-sided
ASR: How can I open this word on the pool-side
→ hypothesis not consistent by topic → in-domain confidence low
Erroneous recognition hypothesis
REF: I want to go to Kyoto, can I go by bus
ASR: I want to go to Kyoto, can I take a bath
→ hypothesis not consistent by topic → in-domain confidence low
7 In-domain Confidence
[Flow diagram] Input utterance X_i (recognition hypothesis) → transformation to vector space (feature vector) → classification of multiple topics (SVM 1~m) → topic confidence scores (C(t_1|X_i), ..., C(t_m|X_i)) → in-domain verification V_in-domain(X_i) → in-domain confidence CM_in-domain(X_i)
8 In-domain Confidence (example)
e.g. 'could I have a non-smoking seat'
Transformation to vector space: word features (a, an, …, room, …, seat, …, I+have, …) → feature vector (1, 0, …, 0, …, 1, …, 1, …)
Classification of multiple topics (SVM 1~m): topic confidence scores (accom. 0.05, airplane 0.36, airport 0.94, …)
In-domain verification V_in-domain(X_i) → in-domain confidence CM_in-domain(X_i) = 90%
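The topic-classification stage above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one binary SVM per topic class (as in the "SVM (1~m)" box), a binary word/word-pair feature vector as in the example, and a sigmoid calibration of the SVM margins into [0, 1] confidence scores; all function and variable names are hypothetical.

```python
# Sketch: topic confidence scores C(t_1|X_i), ..., C(t_m|X_i) for an ASR hypothesis,
# using one binary SVM per topic over a bag-of-words feature vector.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

TOPICS = ["accommodation", "airplane", "airport"]  # the paper uses 14 BTEC topic classes

# Binary unigram/bigram features approximate the (a, an, ..., I+have, ...) word-pair features.
vectorizer = CountVectorizer(binary=True, ngram_range=(1, 2))

def train_topic_classifiers(sentences, topic_labels):
    """Fit the vectorizer and one one-vs-rest SVM per topic on in-domain sentences."""
    X = vectorizer.fit_transform(sentences)
    classifiers = {}
    for topic in TOPICS:
        y = np.array([topic in labels for labels in topic_labels])
        classifiers[topic] = LinearSVC().fit(X, y)
    return classifiers

def topic_confidence_scores(hypothesis, classifiers):
    """Map an ASR hypothesis to a vector of per-topic confidence scores in (0, 1)."""
    x = vectorizer.transform([hypothesis])
    margins = np.array([clf.decision_function(x)[0] for clf in classifiers.values()])
    return 1.0 / (1.0 + np.exp(-margins))  # sigmoid calibration (an assumption)
```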
9 In-domain Verification Model
Linear discriminant verification model applied:
V_in-domain(X_i) = Σ_j λ_j · C(t_j|X_i)
λ_1, …, λ_m trained on in-domain data using "deleted interpolation of topics" and GPD [Lane '04]
C(t_j|X_i): topic classification confidence score of topic t_j for input utterance X_i
λ_j: discriminant weight for topic t_j
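A minimal sketch of this verification step, assuming the weighted-sum form above and a sigmoid mapping from V_in-domain to a [0, 1] confidence (the sigmoid parameters are an assumption, motivated by the "CM sigmoid transforms" trained on the development set in the experimental setup); all names are hypothetical.

```python
import numpy as np

def in_domain_confidence(topic_scores, weights, alpha=1.0, beta=0.0):
    """CM_in-domain(X_i) from topic confidence scores C(t_j|X_i).

    topic_scores: vector of C(t_j|X_i), j = 1..m
    weights:      discriminant weights lambda_j (trained with GPD)
    alpha, beta:  parameters of an assumed sigmoid calibration
    """
    v = float(np.dot(weights, topic_scores))           # V_in-domain(X_i) = sum_j lambda_j * C(t_j|X_i)
    return 1.0 / (1.0 + np.exp(-(alpha * v + beta)))   # map the score into (0, 1)
```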
10 Discourse Coherence
Topic consistency with the preceding utterance
Example of errors detected via discourse coherence (REF: correct transcription, ASR: speech recognition hypothesis)
Erroneous recognition hypothesis
Speaker A: previous utterance [X_i-1]
REF: What type of shirt are you looking for?
ASR: What type of shirt are you looking for?
Speaker B: current utterance [X_i]
REF: I'm looking for a white T-shirt.
ASR: I'm looking for a white teacher.
→ topic not consistent across utterances → discourse coherence low
11 Discourse Coherence
Euclidean distance between the current (X_i) and previous (X_i-1) utterances in the topic confidence space: dist(X_i, X_i-1)
CM_discourse(X_i|X_i-1) is large when X_i and X_i-1 are topically related, and low when they differ
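A sketch of the discourse-coherence score under these definitions. The Euclidean distance in topic-confidence space comes straight from the slide; the mapping from distance to a [0, 1] coherence score is an assumption (the slide's exact transform is not recoverable from the text), so the normalization below is a placeholder.

```python
import numpy as np

def discourse_coherence(topic_scores_cur, topic_scores_prev):
    """CM_discourse(X_i | X_i-1): topic consistency between consecutive utterances.

    Both arguments are vectors of topic confidence scores C(t_1|X), ..., C(t_m|X).
    """
    cur = np.asarray(topic_scores_cur, dtype=float)
    prev = np.asarray(topic_scores_prev, dtype=float)
    d = float(np.linalg.norm(cur - prev))          # dist(X_i, X_i-1) in topic confidence space
    m = len(cur)
    # Placeholder mapping: distance 0 -> coherence 1, maximum distance sqrt(m) -> coherence 0.
    return 1.0 - d / np.sqrt(m)
```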
12 Joint Confidence Score: Generalized Posterior Probability
Measures the confusability of the recognition hypothesis against competing hypotheses [Lo & Soong]
At the utterance level, the word-level scores are combined over the whole hypothesis
GWPP(x_j): generalized word posterior probability of x_j
x_j: j-th word in the recognition hypothesis X
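The utterance-level formula appeared only as an image on the slide and is not recoverable from the text. A common choice for aggregating word posteriors in this line of work is the average log word posterior, shown below purely as an assumption rather than the authors' exact definition.

```latex
% Assumed aggregation of word-level GWPP into an utterance-level score
% (the slide's exact formula is not recoverable from the text).
\[
  CM_{gpp}(X) \;=\; \frac{1}{N}\sum_{j=1}^{N}\log \mathrm{GWPP}(x_j),
  \qquad N = \text{number of words in } X
\]
```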
13 Joint Confidence Score
CM(X_i): weighted combination of CM_gpp(X_i), CM_in-domain(X_i), and CM_discourse(X_i|X_i-1)
For utterance verification, compare CM(X_i) to a threshold θ
Model weights (λ_gpp, λ_in-domain, λ_discourse) and the threshold θ are trained on the development set
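A minimal end-to-end sketch of the verification decision. The joint score is written here as a weighted linear combination of the three (sigmoid-calibrated) measures; the exact combination form is not recoverable from the slide text, so this form and all names are assumptions.

```python
def joint_confidence(cm_gpp, cm_in_domain, cm_discourse, weights):
    """CM(X_i): assumed weighted combination of the three confidence measures."""
    w_gpp, w_id, w_dc = weights  # lambda_gpp, lambda_in-domain, lambda_discourse
    return w_gpp * cm_gpp + w_id * cm_in_domain + w_dc * cm_discourse

def verify_utterance(cm_joint, threshold):
    """Accept the hypothesis if CM(X_i) meets the threshold; otherwise reject it
    and prompt the user to rephrase. Weights and threshold are tuned on the
    development set."""
    return cm_joint >= threshold
```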
14 Experimental Setup
Training set: ATR BTEC (Basic Travel Expressions Corpus)
–~400k sentences (Japanese/English pairs)
–14 topic classes (accommodation, shopping, transit, …)
–Used to train the topic-classification and in-domain verification models
Evaluation data: ATR MAD (Machine-Aided Dialogue)
–Natural dialogues between English and Japanese speakers via the ATR speech-to-speech translation system
–Dialogue data collected based on a set of pre-defined scenarios
–Development set: 270 dialogues; Test set: 90 dialogues
On the development set, train: CM sigmoid transforms, CM weights (λ_gpp, λ_in-domain, λ_discourse), and the verification threshold (θ)
15 Speech Recognition Performance
                        Development    Test
# dialogues             270            90
Japanese side
  # utterances          2674           1011
  WER                   10.5%          10.7%
  SER                   41.9%          42.3%
English side
  # utterances          3091           1006
  WER                   17.0%          16.2%
  SER                   63.5%          55.2%
ASR performed with ATRASR; 2-gram LM applied during decoding, lattice rescored with 3-gram LM
16 Evaluation Measure
Utterance-based verification
–No definite "keyword" set in speech-to-speech translation
–If a recognition error occurs (one or more word errors), prompt the user to rephrase the entire utterance
CER (confidence error rate), computed from two error types (see the definition below):
–FA: false acceptance of an incorrectly recognized utterance
–FR: false rejection of a correctly recognized utterance
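The CER formula itself appeared as an image and is not recoverable from the text. Given the two error types above, the standard definition is the fraction of utterances whose verification decision is wrong, stated here as an assumption:

```latex
% Assumed definition of the confidence error rate (CER)
\[
  \mathrm{CER} \;=\; \frac{N_{FA} + N_{FR}}{N_{\text{utterances}}}
\]
```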
17 GPP-based Verification Performance
Accept All: assume all utterances are correctly recognized
GPP: generalized posterior probability
Large reduction in verification errors compared with the "Accept All" case
CER: 17.3% (Japanese) and 15.3% (English)
[Chart: CER for "Accept All" vs. "GPP" on the Japanese and English sides]
18 Incorporation of IC and DC Measures (Japanese)
GPP: generalized posterior probability; IC: in-domain confidence; DC: discourse coherence
CER reduced by 5.7% and 4.6% for the "GPP+IC" and "GPP+DC" cases
CER 17.3% → 15.9% (8.0% relative reduction) for the "GPP+IC+DC" case
[Chart: CER for GPP, GPP+IC, GPP+DC, GPP+IC+DC]
19 Incorporation of IC and DC Measures (English)
GPP: generalized posterior probability; IC: in-domain confidence; DC: discourse coherence
Similar performance on the English side
CER 15.3% → 14.4% for the "GPP+IC+DC" case
[Chart: CER for GPP, GPP+IC, GPP+DC, GPP+IC+DC]
20 Conclusions
Proposed a novel utterance verification scheme incorporating "high-level" knowledge
–In-domain confidence: degree of match between the utterance and the application domain
–Discourse coherence: consistency between consecutive utterances
The two proposed measures are effective
–Relative reduction in CER of 8.0% and 6.1% (Japanese/English)
21 Future Work
"High-level" content-based verification
–Ignore ASR errors that do not affect translation quality
–Further improvement in performance
Topic switching
–Determine when users switch task (currently a single task per dialogue session is assumed)