Confidence Measures for Automatic Speech Recognition
Presented by Tzan-Hwei Chen, National Taiwan Normal University, Spoken Language Processing Lab
Advisors: Hsin-Min Wang, Berlin Chen
Outline
Introduction
Categories of confidence measure (CM) estimation methods
– Feature based
– Posterior probability based
– Explicit model based
– Incorporation of high-level information for CM*
The application of CM to improve speech recognition
Summary
Introduction (1/9)
It is extremely important to be able to make appropriate and reliable judgments based on the error-prone ASR result. Researchers have proposed computing a score (preferably between 0 and 1), called a confidence measure (CM), to indicate the reliability of any recognition decision made by an ASR system.
Introduction (2/9)
Some applications of CM.
[Figure: an ASR pipeline in which feature extraction turns the speech signal into feature vectors, and decoding with the acoustic model, language model, and lexicon produces a recognized word sequence that is then verified by a confidence measure. Example hypotheses: 1. 臺北到魚籃 ("Taipei to fish basket", a near-homophone misrecognition) 2. 臺北到宜蘭 ("Taipei to Yilan").]
Introduction (3/9)
Early research on CM can be traced back to rejection in word-spotting systems. Other early CM-related work lies in the automatic detection of new words in LVCSR. Over the past few years, CM has been applied to more and more research areas, e.g.:
– to improve speech recognition
– look-ahead algorithms in LVCSR
– to guide the system in performing unsupervised learning
– …
Introduction (4/9)
The general procedure of CM for verification.
[Figure: recognized units go through confidence estimation; the confidence of each unit is judged against a predefined threshold, leading to acceptance (above the threshold) or rejection (below it).]
Introduction (5/9)
Four situations when judging a hypothesis:
– ref 宜蘭, hyp 宜蘭, accept → correct acceptance
– ref 宜蘭, hyp 宜蘭, reject → false rejection
– ref 宜蘭, hyp 魚籃, reject → correct rejection
– ref 宜蘭, hyp 魚籃, accept → false acceptance
Introduction (6/9)
The evaluation metric: confidence error rate (CER), the proportion of hypothesized words whose accept/reject judgment is wrong.
Example: hyp 三民候選人通過 against ref 有三名候選人通過審查了 ("three candidates passed the review"); each hypothesized word is tagged as a correct acceptance (CA), false acceptance (FA), or false rejection (FR), here FA, CA, FR, CA, FA.

Introduction (7/9)
The evaluation metric: confidence error rate (cont.). The same example, with the hypothesized words now tagged FA, CA, FA.
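The CER definition itself was an equation image on these slides; following the survey [1], it counts the incorrectly judged words among all hypothesized words:

$$\mathrm{CER} = \frac{N_{\mathrm{FA}} + N_{\mathrm{FR}}}{N_{\mathrm{hyp}}}$$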
10
10 Introduction (8/9) The evaluation metric (cont): –Receiver operator characteristics (ROC) curve :simply contains a plot of the false acceptance rate over the detection rate.
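As a concrete illustration, here is a minimal Python sketch of how such a curve can be traced by sweeping the acceptance threshold; the confidence scores and correctness labels are made-up toy data, not taken from any of the cited experiments:

```python
import numpy as np

def roc_points(confidences, is_correct):
    """Sweep the acceptance threshold and collect
    (false acceptance rate, detection rate) pairs."""
    conf = np.asarray(confidences, dtype=float)
    ok = np.asarray(is_correct, dtype=bool)
    points = []
    for thr in np.unique(conf):
        accepted = conf >= thr
        # detection rate: fraction of correct words that are accepted
        det = (accepted & ok).sum() / max(ok.sum(), 1)
        # false acceptance rate: fraction of wrong words that are accepted
        fa = (accepted & ~ok).sum() / max((~ok).sum(), 1)
        points.append((float(fa), float(det)))
    return sorted(points)

# toy data: five hypothesized words with confidences and correctness labels
print(roc_points([0.9, 0.8, 0.4, 0.7, 0.2], [1, 1, 0, 1, 0]))
```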
11
11 Introduction (9/9) All methods proposed for computing CMs can be roughly classified into three major categories [7]: –Feature based –Posterior probability based –Explicit model based (utterance verification, UV) –Incorporation of high-level information for CM*
Feature-based confidence measure
Feature-based confidence measure (1/8)
The features can be collected during the decoding procedure and may include acoustic, language, and syntactic information. Any feature can be called a predictor if its p.d.f. over correctly recognized words is clearly distinct from that over misrecognized words.
[Figure: two overlapping probability density curves, one for misrecognized words and one for correctly recognized words.]
Feature-based confidence measure (2/8)
Some common predictor features:
– Pure normalized likelihood score related: acoustic score per frame
– N-best related: count in the N-best list, N-best homogeneity score
– Duration related: word duration divided by its number of phones
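To make these predictors concrete, here is a toy Python sketch; the WordHyp structure and its field names are illustrative assumptions about what a decoder exposes, not an API from any cited system:

```python
from dataclasses import dataclass

@dataclass
class WordHyp:
    word: str
    acoustic_logscore: float  # total acoustic log-likelihood of the word
    start_frame: int
    end_frame: int
    n_phones: int

def predictor_features(hyp, nbest_lists):
    """Compute a few common predictor features for one word hypothesis."""
    n_frames = max(hyp.end_frame - hyp.start_frame, 1)
    return {
        # likelihood-related: acoustic score per frame
        "acoustic_per_frame": hyp.acoustic_logscore / n_frames,
        # N-best-related: in how many N-best hypotheses the word appears
        "nbest_count": sum(hyp.word in sent for sent in nbest_lists),
        # duration-related: frames per phone
        "frames_per_phone": n_frames / hyp.n_phones,
    }

hyp = WordHyp("宜蘭", acoustic_logscore=-420.0, start_frame=10, end_frame=52, n_phones=4)
nbest = [["臺北", "到", "宜蘭"], ["臺北", "到", "魚籃"]]
print(predictor_features(hyp, nbest))
```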
Feature-based confidence measure (3/8)
Some common predictor features (cont.):
– Hypothesis density: the number of competing word arcs in the word graph that are active at the same time as the hypothesized word; heavier competition suggests lower confidence.
[Figure: a word graph with competing arcs such as 靜音 (silence), 結果/建國, 有/由/又, 三名, 候選人, 沒有, 審查, 通過.]
Feature-based confidence measure (4/8)
Some common predictor features (cont.):
– Acoustic stability: decode the utterance several times under different acoustic/language-model weightings and measure how often each hypothesized word reappears across the resulting word sequences.
[Figure: hypothesized word sequences such as 今天 天氣很好, 今天 天氣 不佳, and 今天天氣 不佳; a word such as 今天 that persists across decodes is acoustically stable.]
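A minimal sketch of that idea, assuming the alternative decodes are already available as word lists; the function name and the toy data are illustrative:

```python
def acoustic_stability(word, decodes):
    """Fraction of alternative decodes (e.g. obtained with different
    acoustic/language-model weightings) in which the word appears."""
    return sum(word in sentence for sentence in decodes) / len(decodes)

# three decodes of the same utterance with different weightings (toy data)
decodes = [["今天", "天氣很好"], ["今天", "天氣", "不佳"], ["今天天氣", "不佳"]]
print(acoustic_stability("今天", decodes))  # 今天 appears in 2 of 3 decodes
```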
Feature-based confidence measure (6/8)
We can combine the above features with any one of the following classifiers:
– Linear discriminant function
– Generalized linear model
– Neural networks
– Decision tree
– Support vector machine
– Boosting
– Naïve Bayes classifier
Feature-based confidence measure (7/8)
Naïve Bayes classifier [3]. [Equation lost in extraction.]
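The combination rule the slide presumably showed is the standard naïve Bayes one: assuming the predictor features f_1, …, f_n are conditionally independent given correctness,

$$P(\text{correct} \mid f_1,\dots,f_n) \;\propto\; P(\text{correct}) \prod_{i=1}^{n} P(f_i \mid \text{correct})$$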
Feature-based confidence measure (8/8)
Experiments [3]. Corpus: an Italian speech corpus of phone calls to the front desk of a hotel.

feature                 CER (%)   relative reduction (%)
acoustic stability      16.3      22.4
language model score    18.8      10.5
hypothesis density      18.9      10
duration                19.3      8.1
acoustic score          19.6      6.7
baseline                21
Posterior probability based confidence measure
Posterior probability based confidence measure (1/11)
The posterior probability of a word sequence is impossible to estimate in a precise manner, so approximation methods are adopted.
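The equation on this slide was an image; the quantity in question is the standard sentence posterior, whose denominator, a sum over all possible word sequences, is what makes exact computation intractable:

$$P(W \mid X) \;=\; \frac{p(X \mid W)\,P(W)}{\sum_{W'} p(X \mid W')\,P(W')}$$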
Posterior probability based confidence measure (2/11)
Word graph based approximation: the sum over all word sequences is restricted to the paths contained in a word graph.
[Figure: a word graph whose arcs include 靜音 (silence), 結果, 建國, 有/由/又, 三名, 候選人, 沒有, 通過.]
Posterior probability based confidence measure (3/11)
Posterior probability of a word arc:
– Some issues must be addressed, and the word posterior probability is generalized accordingly:
– Reduced search space
– Relaxed time registration
– Optimal acoustic and language model weights
Posterior probability based confidence measure (4/11–7/11)
Posterior probability of a word arc [6].
[Figure, built up over four slides: the word graph from above, highlighting an arc such as 三名 together with the overlapping arcs carrying the same or competing words; the accompanying equations from [6] were images and are lost in extraction.]
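What those lost equations compute can be stated compactly: the posterior of a word arc [w; τ, t] (word w, start time τ, end time t) is the summed posterior of all word-graph paths passing through that arc, obtained efficiently with the forward-backward algorithm over the graph G:

$$p([w;\tau,t] \mid x_1^T) \;=\; \sum_{W \in \mathcal{G}:\, [w;\tau,t] \in W} P(W \mid x_1^T)$$

Roughly speaking, the generalized scores C_sec, C_med, and C_max appearing in the experiments below relax the exact time registration by also accumulating the posteriors of same-word arcs that overlap [w; τ, t], evaluated at different reference frames; see [6] for the precise definitions.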
Posterior probability based confidence measure (8/11)
The drawback of the above methods: they all need an additional pass. In [8], a "local word confidence measure" is proposed.
Posterior probability based confidence measure (9/11)
Local word confidence measure (cont.). [Equations lost in extraction; the slide contrasted a variant with a bigram applied and one with forward/backward bigrams applied.]
Posterior probability based confidence measure (10/11)
Impact of word graph density on the quality of posterior probabilities [9].
[Plot lost in extraction; only the baseline values 27.3 and 15.4 are recoverable.]
Posterior probability based confidence measure (11/11)
Experiments [6]:

corpus           baseline   C_normal   C_sec   C_med   C_max
ARISE            13.6       11.5       8.9     8.8     8.9
Verbmobil        27.3       23.3       19.0    20.0    18.9
NAB 20k          11.3       10.3       9.2
NAB 64k          9.2        8.4        7.2
Broadcast news   27.7       23.7       20.6    20.4    20.6
Explicit model based confidence measure (1/10)
The CM problem is formulated as a statistical hypothesis testing problem. Under the framework of binary hypothesis testing, there are two complementary hypotheses: the null hypothesis H0 (the recognized unit is correct) and the alternative hypothesis H1 (it is not). We test H0 against H1.
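The likelihood ratio test itself appeared as an image; its standard form compares the two hypothesis likelihoods against a decision threshold η:

$$\mathrm{LR}(X) \;=\; \frac{p(X \mid H_0)}{p(X \mid H_1)} \;\gtrless\; \eta \qquad (\text{accept if } \mathrm{LR}(X) \ge \eta,\ \text{reject otherwise})$$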
Explicit model based confidence measure (2/10)
The above LRT score can be transformed into a CM through a monotonic one-to-one mapping function. The major difficulty with LRT is how to model the alternative hypothesis. In practice, the same HMM structure is adopted to model the alternative hypothesis, and a discriminative training procedure plays a crucial role in improving modeling performance.
Explicit model based confidence measure (3/10)
Two-pass procedure: recognition first, verification as a separate second pass.
[Figure: hypothesis 今天 天氣很好 being rescored by the verifier.]
Explicit model based confidence measure (4/10)
One-pass procedure: verification is integrated into decoding itself.
[Figure: hypothesis 今天 天氣很好.]
Explicit model based confidence measure (5/10)
How do we calculate the confidence of a recognized word? [Equation lost in extraction.]
Explicit model based confidence measure (6/10)
How do we calculate the confidence of a recognized word (cont.)? [Equation lost in extraction.]
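The lost formulas cannot be recovered exactly, but in utterance verification systems of this kind (e.g. [10]) a word-level score is commonly built by averaging subword-level log-likelihood ratios between the target model λ_n and an alternative (anti-)model λ̄_n over the word's N subword segments X_n; treat this as a representative form, not necessarily the one on the slide:

$$\mathrm{LLR}(w) \;=\; \frac{1}{N}\sum_{n=1}^{N} \log \frac{p(X_n \mid \lambda_n)}{p(X_n \mid \bar{\lambda}_n)}$$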
Explicit model based confidence measure (7/10)
Discriminative training [10]:
– The goal of the training procedure is to increase the average value of the likelihood ratio for correct hypotheses and decrease its average value for false acceptances.
Explicit model based confidence measure (8/10)
Discriminative training (cont.). [Equations lost in extraction.]
Explicit model based confidence measure (9/10)
Why does discriminative training work?
Explicit model based confidence measure (10/10)
Experiments [10]: this task, referred to as the "movie locator", … [remainder of the slide lost in extraction]
Incorporation of high-level information for CM
Incorporation of high-level information for CM (1/4)
LSA (latent semantic analysis):
– The key property of LSA is that words whose vectors are close to each other are semantically similar.
– These similarities can be used to provide an estimate of the likelihood of the words co-occurring within the same utterance.
[Figure: singular value decomposition of the word-document matrix A into U and the remaining factors.]
Incorporation of high-level information for CM (2/4)
LSA (cont.):
– The entries of the matrix: [equation lost in extraction]
– The confidence of a recognized word: [equation lost in extraction]
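Since both formulas were images, here is a common Bellegarda-style choice that slides like these usually present; treat it as a plausible reconstruction rather than the exact original. The matrix entry weights the count c_ij of word i in document j by the document length n_j and the word's normalized entropy ε_i, and the confidence of a recognized word is its average LSA similarity to the other words in the utterance H:

$$a_{ij} = (1-\varepsilon_i)\,\frac{c_{ij}}{n_j}, \qquad \mathrm{CM}(w) = \frac{1}{|H|-1}\sum_{v \in H,\ v \neq w} \cos\!\big(u_w S,\; u_v S\big)$$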
Incorporation of high-level information for CM (3/4)
Inter-word mutual information: [equation lost in extraction]
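The standard (pointwise) mutual information between two words, which the slide presumably showed, is

$$\mathrm{MI}(w_i, w_j) = \log \frac{P(w_i, w_j)}{P(w_i)\,P(w_j)}$$

a word can then be scored by its average MI with the other recognized words in the utterance.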
Incorporation of high-level information for CM (4/4)
Experiments [14]:

CM             Switchboard   Mandarin dictation
LSA            44.7          38.5
MI             41.0          33.7
C_med          24.4          17.5
N-best count   28.3          21.1
MI + C_med     23.9          16.2
The application of CM to improve speech recognition
The application of CM to improve speech recognition (1/10)
Statistical decision theory aims at minimizing the expected cost of making an error.
[Figure: the word graph from earlier, with arcs 靜音, 結果, 建國, 有/由/又, 三名, 候選人, 沒有, 通過.]
The application of CM to improve speech recognition (2/10)
Method 1 [16]: use posterior word probabilities in place of likelihood-based scores when making the recognition decision. [Details lost in extraction.]
The application of CM to improve speech recognition (3/10)
Method 2 [18]: word error minimization using an integrated confidence measure. [Details lost in extraction.]
The application of CM to improve speech recognition (4/10)
Method 3 (time frame error decoding) [17]: based on minimum Bayes risk decoding. [Equations lost in extraction.]
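The minimum Bayes risk decision rule underlying the method is standard: choose the word sequence minimizing the expected loss under the posterior distribution,

$$\hat{W} \;=\; \arg\min_{W} \sum_{W'} P(W' \mid X)\, L(W, W')$$

and the slide's truncated "if" presumably noted the familiar special case that with a 0/1 sentence-level loss this reduces to ordinary maximum a posteriori decoding.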
The application of CM to improve speech recognition (5/10)
Method 3 (cont.)
The application of CM to improve speech recognition (6/10)
Method 3 (cont.):
– We are now faced with a conceptual mismatch between the decision rule and the evaluation criterion for speech recognizers, the word error rate.
– The easiest way to overcome this mismatch is to use the same cost function as the evaluation: the Levenshtein distance.
– In (Stolcke et al., 1997), the pairwise alignment is restricted to the N-best list.
– If substitutions were the only type of error, a dynamic programming alignment would not be necessary.
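For reference, a minimal Python implementation of the word-level Levenshtein (edit) distance computed by dynamic programming; the example word lists are toy data echoing the earlier slides:

```python
def levenshtein(ref, hyp):
    """Word-level edit distance via dynamic programming."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # i deletions
    for j in range(n + 1):
        d[0][j] = j          # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = d[i-1][j-1] + (ref[i-1] != hyp[j-1])   # match or substitution
            d[i][j] = min(sub, d[i-1][j] + 1, d[i][j-1] + 1)
    return d[m][n]

ref = "有 三名 候選人 通過 審查".split()
hyp = "三名 候選人 通過".split()
print(levenshtein(ref, hyp))  # 2 deletions -> distance 2
```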
The application of CM to improve speech recognition (7/10)
Method 3 (cont.)
[Figure: time-aligned competing hypotheses such as 今天 天氣很好 and 今天天氣 不佳, compared frame by frame.]
The application of CM to improve speech recognition (8/10)
Method 3 (cont.) [Equations lost in extraction.]
The application of CM to improve speech recognition (9/10)
Method 3 (cont.): the resulting quantity can be interpreted as the normalized probability of a word being incorrect.
The application of CM to improve speech recognition (10/10)
Experiments (Method 3):

corpus           baseline   time frame error
ARISE            15.8       15.0
Verbmobil        33.6       32.5
NAB 20k          13.2       12.9
NAB 64k          11.1       10.8
Broadcast news   33.3       32.3
Summary
Almost all CMs rely almost entirely on a single information source: how much the underlying decision can overtake its possible competitors. We believe it is critical to improve the performance of CMs by:
– taking the segmentation issue into account
– deciding a dynamic threshold for different words
Reference (1/4)
Main reference
– [1] H. Jiang, "Confidence Measures for Speech Recognition: A Survey", Speech Communication, 2005.
Feature based confidence measure
– [2] S. Cox and R. Rose, "Confidence Measures for the Switchboard Database", ICASSP 1996.
– [3] T. Schaaf and T. Kemp, "Confidence Measures for Spontaneous Speech Recognition", ICASSP 1997.
– [4] A. Sanchis, A. Juan and E. Vidal, "New Features Based on Multiple Word Graphs for Utterance Verification", ICSLP 2003.
– [5] R. Zhang and A. I. Rudnicky, "Word Level Confidence Annotation Using Combinations of Features", EuroSpeech 2001.
Posterior based confidence measure
– [6] F. Wessel, R. Schlüter, K. Macherey and H. Ney, "Confidence Measures for Large Vocabulary Continuous Speech Recognition", IEEE Trans. SAP, 2001.
– [7] F. K. Soong and W. K. Lo, "Generalized Posterior Probability for Minimum Error Verification of Recognized Sentences", ICASSP 2005.
Reference (2/4)
Posterior based confidence measure
– [8] J. Razik, O. Mella, D. Fohr and J.-P. Haton, "Local Word Confidence Measure Using Word Graph and N-Best List".
– [9] T. Fabian, R. Lieb, G. Ruske and M. Thomae, "Impact of Word Graph Density on the Quality of Posterior Probability Based Confidence Measures".
Explicit model based confidence measure
– [10] E. Lleida and R. C. Rose, "Utterance Verification in Continuous Speech Recognition: Decoding and Training Procedures", IEEE Trans. SAP, 2000.
– [11] M. G. Rahim and C.-H. Lee, "String-Based Minimum Verification Error (SB-MVE) Training for Speech Recognition", Computer Speech and Language, 1997.
– [12] H. Jiang, F. K. Soong and C.-H. Lee, "A Dynamic In-Search Data Selection Method with Its Applications to Acoustic Modeling and Utterance Verification", IEEE Trans. SAP, 2005.
Reference (3/4)
Incorporation of high-level information for CM
– [13] R. Sarikaya, Y. Gao, M. Picheny and H. Erdogan, "Semantic Confidence Measurement for Spoken Dialog Systems", IEEE Trans. SAP, 2005.
– [14] G. Guo, C. Huang, H. Jiang and R.-H. Wang, "A Comparative Study on Various Confidence Measures in Large Vocabulary Speech Recognition", ISCSLP 2004.
Some applications of CM
– [15] M. Afify, F. Liu, H. Jiang and O. Siohan, "A New Verification-Based Fast-Match for Large Vocabulary Continuous Speech Recognition", IEEE Trans. SAP, 2005.
– [16] F. Wessel, R. Schlüter and H. Ney, "Using Posterior Word Probabilities for Improved Speech Recognition", ICASSP 2000.
– [17] F. Wessel, R. Schlüter and H. Ney, "Explicit Word Error Minimization Using Word Hypothesis Posterior Probabilities", ICASSP 2001.
Reference (4/4)
Some applications of CM
– [18] A. Kobayashi, K. Onoe, S. Sato and T. Imai, "Word Error Minimization Using an Integrated Confidence Measure", INTERSPEECH 2005.
– [19] Y. Qian, T. Lee and F. K. Soong, "Tone Information as a Confidence Measure for Improving Cantonese LVCSR", ICSLP 2004.