Published by Beverley Nichols. Modified over 8 years ago.
1
Confidence Measures as a Search Guide in Speech Recognition
Sherif Abdou, Michael Scordilis
Department of Electrical and Computer Engineering, University of Miami, Coral Gables, Florida 33124, U.S.A.
2
Abstract
Error analysis on Switchboard data shows that:
87% of words preceded by a correct word were correctly decoded.
47% of words preceded by an incorrect word were correctly decoded.
Speech recognition errors therefore limit the ability of the language model to predict subsequent words correctly. An effective way to strengthen the language model's contribution is to use confidence measures. Most current work on confidence measures for speech recognition focuses on verifying the final result and makes no attempt to correct recognition errors. In this work, we use confidence measures early, during the search itself: a word-based acoustic confidence metric defines a dynamic language weight.
3
Using Confidence To Guide The Search
The search score is changed from the standard score
$\hat{W} = \arg\max_{W} P(A \mid W)\, P(W)^{LW}$
to the confidence-based score
$\hat{W} = \arg\max_{W} P(A \mid W)\, P(W)^{LW(C(W))}$
where:
A: the acoustic input
W: the hypothesized word sequence
P(A|W): the acoustic model score
P(W): the language model score
LW: the language weight
C(W): the confidence of the word sequence W
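A minimal Python sketch of the score in the log domain, where the product P(A|W) P(W)^LW becomes the weighted sum log P(A|W) + LW log P(W). The function name `search_score` and the probabilities of the two hypotheses are hypothetical; the example only shows that the language weight alone can change which hypothesis wins, which is why making LW depend on confidence can steer the search.

```python
import math

def search_score(log_p_acoustic: float, log_p_lm: float, language_weight: float) -> float:
    """Log-domain form of P(A|W) * P(W)**LW, i.e. log P(A|W) + LW * log P(W)."""
    return log_p_acoustic + language_weight * log_p_lm

# Two hypothetical hypotheses with made-up probabilities:
# h1 is favoured by the acoustic model, h2 by the language model.
h1 = (math.log(1e-8), math.log(1e-4))
h2 = (math.log(1e-10), math.log(1e-3))

for lw in (1.0, 6.5):
    # Pick the hypothesis with the highest combined score for this weight.
    best = max(("h1", h1), ("h2", h2), key=lambda item: search_score(*item[1], lw))
    print(lw, best[0])
```

With a small weight the acoustically stronger h1 wins; with a large weight the language model tips the decision to h2.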
4
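We used a functional form LW(C(W)) parameterized by the static language weight LW_0, the operating point threshold C_0, and a smoothing parameter r. The confidence of a word sequence is estimated as the average of its words' confidences:
$C(W) = \frac{1}{N} \sum_{j=1}^{N} C(w_j)$
where:
N: the number of words in the sequence W
C(w_j): the confidence of word w_j
C_0: the operating point threshold
LW_0: the static language weight
r: a smoothing parameter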
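The sketch below computes C(W) as the average word confidence and feeds it into a dynamic weight. The power-law form LW_0 (C(W)/C_0)^r is an assumption made for illustration, not the functional form used in this work; it was picked because r = 0 recovers the static weight, matching the baseline rows (r = 0) of the experimental results. Function names are hypothetical.

```python
from typing import Sequence

def sequence_confidence(word_confidences: Sequence[float]) -> float:
    """C(W) = (1/N) * sum of C(w_j): the mean confidence of the N words in W."""
    if not word_confidences:
        raise ValueError("W must contain at least one word")
    return sum(word_confidences) / len(word_confidences)

def dynamic_language_weight(c_w: float, lw0: float, c0: float, r: float) -> float:
    """HYPOTHETICAL form LW(C(W)) = LW0 * (C(W) / C0) ** r.

    Not the slides' equation. It returns the static weight LW0 when r = 0
    or when C(W) equals the threshold C0, and for r > 0 it grows with C(W),
    so a doubtful word history lowers the language model weight.
    """
    return lw0 * (c_w / c0) ** r

c = sequence_confidence([0.5, 0.7, 0.9])
print(round(c, 2), round(dynamic_language_weight(c, 6.5, 0.65, 2.0), 3))
```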
5
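For bigram models, we approximate C(W) using only the confidences of the current and previous words:
$C(W) \approx \tfrac{1}{2}\left(C(w_{j-1}) + C(w_j)\right)$
Figure: LW as a function of C(W), with LW_0 = 6.5 and C_0 = 0.65.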
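For the bigram case, a minimal sketch using the slide's values LW_0 = 6.5 and C_0 = 0.65. Averaging the two word confidences follows the bigram approximation described here; the power-law weight function is an illustrative assumption, and the function names are hypothetical.

```python
LW0 = 6.5   # static language weight (value from the slide)
C0 = 0.65   # operating point threshold (value from the slide)

def bigram_confidence(c_prev: float, c_curr: float) -> float:
    """Approximate C(W) by the mean confidence of the previous and current words."""
    return 0.5 * (c_prev + c_curr)

def bigram_language_weight(c_prev: float, c_curr: float, r: float) -> float:
    """Language weight applied to the bigram score P(w_j | w_{j-1}).

    The power form LW0 * (C / C0) ** r is a HYPOTHETICAL stand-in for the
    slides' functional form; only LW0 and C0 come from the slides.
    """
    return LW0 * (bigram_confidence(c_prev, c_curr) / C0) ** r

# Confident, borderline, and doubtful word pairs.
for c_prev, c_curr in [(0.9, 0.8), (0.6, 0.7), (0.3, 0.4)]:
    print(c_prev, c_curr, round(bigram_language_weight(c_prev, c_curr, r=2), 2))
```

Pairs whose average confidence sits at C_0 keep the static weight; more confident pairs get a larger weight and doubtful pairs a smaller one.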
6
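Rank-Based Confidence Measures
Phone-level score: the confidence C(a) of phone a is derived from the rank of its base phone bp among the N_base base phones over the frames n_s to n_e that the phone spans.
where:
C(a): the confidence score of phone a
n_s: the start frame of phone a
n_e: the end frame of phone a
bp: the base phone of phone a
N_base: the number of base phones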
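One possible reading of a rank-based phone score, sketched in Python under stated assumptions: the decoder is assumed to expose, at every frame, a score for each of the N_base base phones, and C(a) is taken as one minus the mean 0-based rank of bp over frames n_s..n_e, normalized by N_base - 1. This normalization and the helper names are hypothetical, not the slide's exact equation.

```python
from typing import Sequence

def rank_of(scores: Sequence[float], phone: int) -> int:
    """0-based position of `phone` when base phones are sorted by descending score."""
    target = scores[phone]
    # Count competitors that score strictly better; ties share the better rank.
    return sum(1 for s in scores if s > target)

def phone_confidence(frame_scores: Sequence[Sequence[float]], bp: int, ns: int, ne: int) -> float:
    """HYPOTHETICAL rank-based C(a) for a phone spanning frames ns..ne (inclusive).

    frame_scores[n][k] is the score of base phone k at frame n.
    Returns 1.0 when bp ranks first in every frame and 0.0 when it ranks last.
    """
    n_base = len(frame_scores[ns])
    ranks = [rank_of(frame_scores[n], bp) for n in range(ns, ne + 1)]
    mean_rank = sum(ranks) / len(ranks)
    return 1.0 - mean_rank / (n_base - 1)

# Two frames, three base phones (made-up scores).
frames = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]]
print(phone_confidence(frames, bp=0, ns=0, ne=1))
```

Because the ranks come from scores the search already computes, a measure of this kind needs no separately trained anti-model.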
7
Word-level score: the confidence C(w) of word w is computed from the scores C(a_i) of its N constituent phones.
where:
C(w): the confidence score of word w
a_i: the i-th phone of word w
N: the number of phones in word w
Figure: Histogram of the position of the correct phone in the ranking list.
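A sketch of the word-level step, assuming C(w) is the mean of its phones' scores (an assumption consistent with the average-rank idea, not confirmed by the slide), together with the rank-position histogram that the figure plots. Function names and data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

def word_confidence(phone_confidences: Sequence[float]) -> float:
    """ASSUMED word-level score: C(w) = (1/N) * sum of C(a_i) over the N phones of w."""
    if not phone_confidences:
        raise ValueError("a word needs at least one phone")
    return sum(phone_confidences) / len(phone_confidences)

def rank_histogram(correct_phone_ranks: Sequence[int]) -> List[int]:
    """Count how often the correct phone sits at each position of the ranking list."""
    counts = Counter(correct_phone_ranks)
    if not counts:
        return []
    # Counter returns 0 for positions that never occurred.
    return [counts[k] for k in range(max(counts) + 1)]

print(word_confidence([0.75, 0.5, 1.0]))
print(rank_histogram([0, 0, 1, 0, 3]))
```

A histogram concentrated at position 0 is what makes the average rank a useful correctness predictor.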
8
Operating Point Threshold Selection
Figure: Distributions of C(w) for the correct and incorrect classes of the ATIS training data.
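The slide does not state how the operating point is read off the two distributions. One common criterion, assumed here, is to pick the threshold that minimizes false rejects plus false accepts on labeled training words. The function name and the score lists are made up for illustration.

```python
from typing import Sequence

def select_threshold(correct: Sequence[float], incorrect: Sequence[float],
                     candidates: Sequence[float]) -> float:
    """Return the candidate C0 with the fewest classification errors.

    A word is accepted as correct when C(w) >= threshold. Ties go to the
    earliest candidate in the list.
    """
    def errors(t: float) -> int:
        false_rejects = sum(1 for c in correct if c < t)
        false_accepts = sum(1 for c in incorrect if c >= t)
        return false_rejects + false_accepts
    return min(candidates, key=errors)

correct_scores = [0.7, 0.8, 0.9, 0.66]     # made-up C(w) for correct words
incorrect_scores = [0.3, 0.5, 0.4, 0.55]   # made-up C(w) for incorrect words
print(select_threshold(correct_scores, incorrect_scores, [0.5, 0.6, 0.65, 0.7]))
```

Other criteria, such as the equal-error point of the two distributions, would slot into `errors` in the same way.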
9
Experimental Results

Error rate for different values of r: WSJ data
Decoder     r     Error rate
Baseline    0     15.30%
Modified    1     12.44%
Modified    2     12.42%
Modified    3     12.47%

Error rate for different values of r: ATIS data
Decoder     r     Error rate
Baseline    0.0   7.27%
Modified    1.1   5.8%
Modified    1.2   5.76%
Modified    1.3   5.9%
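The gains in these tables can be expressed as relative error-rate reductions at the best r for each corpus. This short computation uses only the reported numbers; the helper name is hypothetical.

```python
def relative_reduction(baseline: float, modified: float) -> float:
    """Relative reduction of the error rate, as a fraction of the baseline."""
    return (baseline - modified) / baseline

wsj = relative_reduction(15.30, 12.42)   # WSJ: baseline vs. best setting (r = 2)
atis = relative_reduction(7.27, 5.76)    # ATIS: baseline vs. best setting (r = 1.2)
print(f"WSJ: {wsj:.1%}, ATIS: {atis:.1%}")  # WSJ: 18.8%, ATIS: 20.8%
```

In absolute terms that is 2.88 points on WSJ and 1.51 points on ATIS.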
10
CONCLUSION AND FUTURE WORK
We used a confidence metric to improve the integration of the system's models and to guide the search toward the most promising paths.
Dynamically tuning the language model weight proved effective in improving performance.
Rank-based confidence measures are efficient and can be extracted from side information available during the online search; they do not require training anti-models.
The average base phone rank performed well as a predictor of the correctness of a word hypothesis.
Future work: we plan to extend this work to the case where only one of the two words has high confidence. In that case, the decoder should back off to the unigram language model score rather than reduce the language model score entirely.