1 Improving Utterance Verification Using a Smoothed Naïve Bayes Model Reporter: CHEN, TZAN HWEI Authors: Alberto Sanchis, Alfons Juan and Enrique Vidal
2 Reference A. Sanchis, A. Juan and E. Vidal, “Improving Utterance Verification Using a Smoothed Naïve Bayes Model”, ICASSP 2003. A. Sanchis, A. Juan and E. Vidal, “Estimating Confidence Measures for Speech Recognition Verification Using a Smoothed Naïve Bayes Model”, IbPRIA 2003.
3 Outline Introduction Smoothed Naïve Bayes Model Predictor Features Experiments Conclusion
4 Introduction Current speech recognition systems are not error-free, so it is useful to predict the reliability of each hypothesized word. This can be seen as a conventional pattern recognition problem in which each hypothesized word is transformed into a feature vector and then classified as either correct or incorrect.
5 Smoothed Naïve Bayes Model
6 Smoothed Naïve Bayes Model (cont)
7 Unfortunately, these frequencies often underestimate the true probabilities when the training data are sparse. To circumvent this problem, an absolute discounting smoothing model is considered: a small constant is discounted from every positive count, and the gained probability mass is then distributed among the null counts.
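As a rough illustration, a minimal Python sketch of absolute discounting over categorical counts (the discount constant b and the uniform redistribution over unseen values are illustrative assumptions, not the paper's exact formulation):

```python
from collections import Counter

def discounted_probs(counts, vocab, b=0.5):
    """Absolute discounting: subtract a small constant b from every
    positive count and distribute the gained probability mass
    uniformly among the values with null counts."""
    total = sum(counts.values())
    seen = [v for v in vocab if counts.get(v, 0) > 0]
    unseen = [v for v in vocab if counts.get(v, 0) == 0]
    probs = {v: (counts[v] - b) / total for v in seen}
    if unseen:
        gained = b * len(seen) / total  # mass freed by discounting
        for v in unseen:
            probs[v] = gained / len(unseen)
    return probs

p = discounted_probs(Counter({"yes": 3, "no": 1}), vocab=["yes", "no", "maybe"])
```

Note that the unseen value "maybe" now receives non-zero probability, while the distribution still sums to one.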
8 Smoothed Naïve Bayes Model (cont)
10 Predictor Features A set of well-known features has been selected. Acoustic stability (AS): the number of times that a hypothesized word appears at the same position (as computed by Levenshtein alignment) in K alternative outputs of the speech recognizer, obtained using different weights between the acoustic and language model scores. LMprob: the language model probability. Hypothesis density (HD): the average number of active hypotheses within the hypothesized word boundaries.
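The acoustic stability count can be sketched as follows (a hypothetical Python illustration: a plain Levenshtein alignment of word sequences, where the tie-breaking order among edit operations is an assumption of this sketch):

```python
def align(ref, hyp):
    """Levenshtein alignment of two word sequences; returns the hyp
    word aligned to each ref position (None if deleted)."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # match/substitution
    aligned = [None] * n
    i, j = n, m
    while i > 0 and j > 0:  # backtrace, preferring the diagonal
        cost = 0 if ref[i - 1] == hyp[j - 1] else 1
        if d[i][j] == d[i - 1][j - 1] + cost:
            aligned[i - 1] = hyp[j - 1]
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return aligned

def acoustic_stability(best, alternatives):
    """AS feature: for each word of the 1-best output, count in how many
    of the K alternative outputs the same word appears at the aligned
    position."""
    counts = [0] * len(best)
    for alt in alternatives:
        aligned = align(best, alt)
        for i, w in enumerate(best):
            if aligned[i] == w:
                counts[i] += 1
    return counts
```

Words whose count approaches K are stable across the rescored outputs and hence more likely to be correct.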
11 Predictor Features (cont) PrecPh: the percentage of hypothesized-word phones that match the phones obtained in a “phone-only” decoding. Duration: the word duration in frames divided by its number of phones. ACscore: the acoustic log-score of the word divided by its number of phones.
12 Predictor Features (cont) Word Trellis Stability (WTS): the motivation for WTS comes from the following observation: a word is most probably correct if it appears, within approximately the same time interval, in the majority of the most probable hypotheses.
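Based on that motivation, a hypothetical sketch of a WTS-like score (the frame tolerance tol and the exact matching rule are assumptions of this sketch; the paper computes WTS from the decoder's trellis):

```python
def wts_score(word, start, end, hypotheses, tol=5):
    """Fraction of the N most probable hypotheses in which `word`
    appears with boundaries within `tol` frames of the 1-best
    interval [start, end].  Each hypothesis is a list of
    (word, start_frame, end_frame) triples."""
    hits = sum(
        any(w == word and abs(s - start) <= tol and abs(e - end) <= tol
            for w, s, e in hyp)
        for hyp in hypotheses
    )
    return hits / len(hypotheses)
```

A score near 1 means the word survives, at roughly the same time span, in almost all competing hypotheses.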
13 Predictor Features (cont) WTS: Figure 1. Path scores of three competing partial hypotheses.
14 Predictor Features (cont) WTS (cont):
15 Experiments Two different corpora are used. Traveler task: a Spanish speech corpus of person-to-person communication utterances at the reception desk of a hotel. FUB task: an Italian speech corpus of phone calls to the front desk of a hotel. Table 1. Traveler and FUB speech corpora.
16 Experiments (cont) Traveler task: 24 context-independent Spanish phonemes were modeled by conventional left-to-right HMMs. A bigram language model was estimated using the whole training text corpus of the Traveler task. The test-set Word Error Rate was 5.5%. FUB task: the HMMs were trained using linear discriminant analysis. Decision-tree clustered generalized triphones were used as phone units. A smoothed trigram LM was estimated using the transcriptions of the training utterances. The test-set Word Error Rate was 27.5%.
17 Experiments (cont) In evaluating verification systems, two measures are of interest: the True Rejection Rate (TRR) and the False Rejection Rate (FRR). A Receiver Operating Characteristic (ROC) curve represents TRR against FRR for different values of the decision threshold.
18 Experiments (cont) The area under a ROC curve divided by the area of a worst-case diagonal ROC curve provides an adequate overall estimate of the classification accuracy. We denote this area ratio as AROC.
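This ratio can be sketched with trapezoidal integration (a Python illustration; it assumes the worst-case diagonal curve has area 0.5 of the unit square, so an AROC of 1 corresponds to chance-level separation):

```python
def aroc(roc_points):
    """Area under a ROC curve (TRR vs. FRR) divided by the 0.5 area
    of the worst-case diagonal.  roc_points: (FRR, TRR) pairs
    covering the full [0, 1] FRR range, integrated trapezoidally."""
    pts = sorted(roc_points)
    area = sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return area / 0.5
```

The diagonal curve scores exactly 1, while a perfect verifier (TRR = 1 at FRR = 0) scores 2.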
19 Experiments (cont) Table 2. AROC values for each individual feature Table 3. AROC values for the best feature combinations
20 Experiments (cont) Fig. 1. Comparative ROC curves for single features AS and HD versus the best feature combination (Traveler corpus)
21 Experiments (cont) Fig. 2. Comparative ROC curves for single features AS and HD versus the best feature combination (FUB corpus)
22 Conclusion The results show that combining different features performs significantly better than any single feature from a set of well-known features. The WTS feature has proved particularly effective in improving classification performance.
23 New Features based on Multiple Word Graphs for Utterance Verification Reporter: CHEN, TZAN HWEI Authors: Alberto Sanchis, Alfons Juan and Enrique Vidal
24 Reference A. Sanchis, A. Juan and E. Vidal, “New Features Based on Multiple Word Graphs for Utterance Verification”, ICSLP 2004. A. Sanchis, A. Juan and E. Vidal, “Improving Utterance Verification Using a Smoothed Naïve Bayes Model”, ICASSP 2003.
25 Outline Features based on multiple word graphs Experiments Conclusions
26 Features based on multiple word graphs A word graph G is a directed, acyclic, weighted graph. Figure: an example word graph whose edges are labeled with competing Mandarin word hypotheses.
27 Features based on multiple word graphs (cont)
28 Features based on multiple word graphs (cont) The posterior probabilities can be based on different kinds of knowledge, depending on the weights associated with the word graph edges. In this work, we study three features: WgAC: the weights are acoustic scores. WgLM: the weights are language model scores. WgTOT: the weights are the combination of acoustic and language model scores.
29 Features based on multiple word graphs (cont) The algorithm proposed to compute each of these features is described in the figure on the right.
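Since the figure itself is not reproduced here, a minimal sketch of a forward-backward word-posterior computation over a word graph (the edge representation, time-ordered node numbering, and log-space bookkeeping are assumptions of this sketch; supplying acoustic, LM, or combined log-weights yields WgAC, WgLM, or WgTOT respectively):

```python
import math
from collections import defaultdict

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def edge_log_posteriors(edges, start, end):
    """Forward-backward over a word graph.  `edges` is a list of
    (src, dst, word, log_weight) with nodes numbered in time order
    (src < dst).  Returns [(src, dst, word, log_posterior), ...]."""
    nodes = sorted({n for s, d, _, _ in edges for n in (s, d)})
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for e in edges:
        outgoing[e[0]].append(e)
        incoming[e[1]].append(e)
    fwd = {start: 0.0}  # log sum over partial paths reaching each node
    for n in nodes:
        terms = [fwd[s] + w for s, _, _, w in incoming[n] if s in fwd]
        if terms:
            fwd[n] = logsumexp(terms)
    bwd = {end: 0.0}    # log sum over partial paths leaving each node
    for n in reversed(nodes):
        terms = [bwd[d] + w for _, d, _, w in outgoing[n] if d in bwd]
        if terms:
            bwd[n] = logsumexp(terms)
    total = fwd[end]    # log sum over all complete paths
    return [(s, d, word, fwd[s] + w + bwd[d] - total)
            for s, d, word, w in edges]
```

Each edge's posterior is the total mass of complete paths passing through it, normalized by the mass of all complete paths.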
30 Experiments To compare the proposed features with alternative predictor features, a set of well-known features has been selected. Acoustic stability (AS): the number of times that a hypothesized word appears at the same position in K alternative outputs of the speech recognizer, obtained using different weights between the acoustic and language model scores. LMprob: the language model probability. Hypothesis density (HD): the average number of active hypotheses within the hypothesized word boundaries.
31 Experiments (cont) PrecPh: the percentage of hypothesized-word phones that match the phones obtained in a “phone-only” decoding. Duration: the word duration in frames divided by its number of phones. ACscore: the acoustic log-score of the word divided by its number of phones.
32 Experiments (cont) Word Trellis Stability (WTS): the fraction of the most probable hypotheses in which the word appears within approximately the same time interval.
33 Experiments (cont) Corpus: the test-set Word Error Rate was 27.5%. Table 1: FUB speech corpus.
34 Experiments (cont) Metrics for the evaluation of the classification accuracy: A ROC curve represents TRR against FRR for different values of the decision threshold. AROC. Confidence Error Rate (CER): defined as the number of classification errors divided by the total number of recognized words. A baseline CER is obtained by assuming that all recognized words are classified as correct.
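The CER and its baseline follow directly from the definition (a Python illustration; the boolean labels marking each recognized word as actually correct or not are an assumed input format):

```python
def cer(predicted_correct, actually_correct):
    """Confidence Error Rate: classification errors divided by the
    total number of recognized words."""
    errors = sum(p != a for p, a in zip(predicted_correct, actually_correct))
    return errors / len(actually_correct)

def baseline_cer(actually_correct):
    """Baseline: every recognized word is classified as correct, so
    the errors are exactly the incorrectly recognized words."""
    return cer([True] * len(actually_correct), actually_correct)
```

A useful verifier must push CER below this baseline, which equals the fraction of misrecognized words.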
35 Experiments (cont) Table 2: AROC, CER and relative reduction in baseline CER for each individual feature.
36 Experiments (cont) Table 3: AROC, CER and relative reduction in baseline CER for the best feature combinations.
37 Experiments (cont) Figure 2. Comparative ROC curves for the best single feature versus the best feature combination.
38 Experiments (cont) Table 4 : Comparative AROC and CER values for each feature computed using a single or multiple word graphs.
39 Conclusions The results show that the proposed features outperform both those computed on a single word graph and other well-known predictor features. Single-feature performance is further improved by combining the proposed features with other kinds of features.