Dynamic Tuning Of Language Model Score In Speech Recognition Using A Confidence Measure Sherif Abdou, Michael Scordilis Department of Electrical and Computer Engineering, University of Miami


Dynamic Tuning Of Language Model Score In Speech Recognition Using A Confidence Measure Sherif Abdou, Michael Scordilis Department of Electrical and Computer Engineering, University of Miami Coral Gables, Florida 33124, U.S.A. DSAP

Abstract Speech recognition errors limit the ability of language models to predict subsequent words correctly. Error analysis on Switchboard data shows that 87% of words preceded by a correctly decoded word were themselves correctly decoded, while only 47% of words preceded by an incorrectly decoded word were correctly decoded. An effective way to enhance the contribution of the language model is to use confidence measures. Most current efforts to develop confidence measures for speech recognition focus on verifying the final result and make no attempt to correct recognition errors. In this work, we use confidence measures early, during the search process: a word-based acoustic confidence metric is used to define a dynamic language weight.

Using Confidence To Guide The Search The search score is changed from log P(A|W) + LW · log P(W) to the confidence-based score log P(A|W) + LW(C(W)) · log P(W), where A : the acoustic input; W : the hypothesized word sequence; P(A|W) : the acoustic model score; P(W) : the language model score; LW : the language weight; C(W) : the confidence of word sequence W.

We used a functional form that makes the language weight a function of the sequence confidence, LW = LW(C(W)), parameterized by the static language weight LW_0, an operating-point threshold C_0 and a smoothing parameter r. The word sequence confidence is estimated as the average of its words' confidences: C(W) = (1/N) Σ_{j=1}^{N} C(w_j), where N : the number of words in sequence W; C(w_j) : the confidence of word w_j; C_0 : the operating-point threshold; LW_0 : the static language weight; r : a smoothing parameter.
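The slide's exact functional form for LW(C(W)) is not reproduced above, but the conclusions state that the language weight should stay high when acoustics are ambiguous (low confidence) and shrink when acoustics are well matched (high confidence). A minimal sketch of that behavior, assuming a logistic shape (the specific equation is an assumption, not the paper's):

```python
import math

def dynamic_language_weight(c_w, lw0=6.5, c0=0.65, r=2.0):
    """Map a sequence confidence C(W) in [0, 1] to a language weight.

    Assumed logistic form: the weight stays near the static value LW0
    for low-confidence (ambiguous) acoustics and shrinks as confidence
    rises past the operating-point threshold C0; r smooths the transition.
    """
    return lw0 / (1.0 + math.exp(r * (c_w - c0)))

def sequence_confidence(word_confidences):
    """C(W): average of the per-word confidences C(w_j)."""
    return sum(word_confidences) / len(word_confidences)

def search_score(log_p_a_given_w, log_p_w, word_confidences):
    """Confidence-based score: log P(A|W) + LW(C(W)) * log P(W)."""
    lw = dynamic_language_weight(sequence_confidence(word_confidences))
    return log_p_a_given_w + lw * log_p_w
```

The default values lw0 = 6.5 and c0 = 0.65 mirror the settings shown in the plot on the next slide; r is a placeholder.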

For bigram models we approximate C(W) by the confidence of the current and previous words. Figure: LW as a function of C(W), for LW_0 = 6.5, C_0 = 0.65.

Constraints On The Measures Used For Confidence-Based Language Model (CBLM) Efficiency: has to be computationally inexpensive. Synchronization: has to be extractable from on-line information during the search. Source of information: extracted only from acoustic data.

Word Posterior As a Confidence Measure The word posterior P(W|A) = P(A|W) P(W) / P(A) is a natural confidence measure, but its normalizing term P(A) is ignored in all ASR systems, since it is constant across competing hypotheses and does not affect the argmax; estimating it is the key difficulty.
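A toy illustration of the idea with hypothetical numbers: dividing each hypothesis's joint score P(A|W)P(W) by their sum approximates the P(A) term that decoders drop, turning raw scores into posteriors.

```python
def word_posterior(scores):
    """Normalize joint scores P(A|W)P(W) over competing hypotheses.

    The sum over hypotheses stands in for P(A), the term decoders
    ignore because it does not change which hypothesis wins.
    """
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}

# Hypothetical joint scores for three competing word hypotheses.
joint = {"red": 0.02, "read": 0.06, "rad": 0.012}
post = word_posterior(joint)
```

The best hypothesis is unchanged by the normalization, but only the normalized value is usable as a confidence score.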

Observation Probability Estimation Theoretically, the observation probability must be summed over the model states q. Discrete HMM: the probability of the codebook entry m(x) is used. Semi-Continuous HMM: b_q(x) = Σ_{i=1}^{C} w_iq g_i(x). Where q : model states; m(x) : vector quantization of x; C : number of mixtures; w_iq : mixture weights; g_i(x) : mixtures.
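The semi-continuous case can be sketched directly from the definitions above: a codebook of C Gaussians g_i shared by all states, with only the weights w_iq being state-specific. The codebook parameters below are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Shared codebook Gaussian g_i(x) (1-D for illustration)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

# Codebook of C = 2 Gaussians (mean, variance), shared by all states.
codebook = [(0.0, 1.0), (3.0, 0.5)]

def semi_continuous_obs_prob(x, state_weights):
    """b_q(x) = sum_{i=1}^{C} w_iq * g_i(x): only the weights w_iq
    differ per state; the Gaussians come from the shared codebook."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, (m, v) in zip(state_weights, codebook))
```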

Continuous HMM: the observation probability is estimated by building a catch-all model from the Gaussian mixtures of all the states.
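One common way to realize such a catch-all model (a sketch of the general technique, not necessarily the paper's exact construction) is to pool every state's Gaussian components into a single large mixture with renormalized weights:

```python
import math

def gaussian_pdf(x, mean, var):
    """1-D Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def catch_all_prob(x, state_gmms):
    """Pool every state's (weight, mean, var) components into one big
    mixture and renormalize, giving a state-independent estimate of p(x)."""
    pooled = [comp for gmm in state_gmms for comp in gmm]
    total_w = sum(w for w, _, _ in pooled)
    return sum((w / total_w) * gaussian_pdf(x, m, v) for w, m, v in pooled)
```

The pooled model is large, which is exactly why the next slide's mixture-clustering step is needed.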

Mixtures Clustering Technique Similar mixtures are merged by their Bhattacharyya distance: D_B = (1/8)(μ_1 − μ_2)^T [(Σ_1 + Σ_2)/2]^{-1} (μ_1 − μ_2) + (1/2) ln( |(Σ_1 + Σ_2)/2| / √(|Σ_1||Σ_2|) ). B distance : Bhattacharyya distance.
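In the scalar (1-D) case the formula above reduces to a short closed form; a minimal sketch of the merge criterion:

```python
import math

def bhattacharyya_distance(m1, v1, m2, v2):
    """Bhattacharyya distance between two 1-D Gaussians N(m1, v1), N(m2, v2):
    scalar case of the closed form used to decide which mixtures to merge."""
    v = 0.5 * (v1 + v2)
    return 0.125 * (m1 - m2) ** 2 / v + 0.5 * math.log(v / math.sqrt(v1 * v2))
```

Identical Gaussians have distance zero, and the distance grows with mean separation or variance mismatch, so thresholding it selects which pairs are safe to merge.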

Vector Quantization Computation reduction using VQ: the observation vector is quantized against the code vectors, and only the Gaussian mixtures in the selected cell are evaluated. OV : observation vector; CV_i : code vector; μ : Gaussian mixture mean.
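The quantization step itself is a nearest-neighbor search over the code vectors; a sketch with a hypothetical 2-D codebook (in the paper the cells would partition the acoustic feature space):

```python
def quantize(ov, code_vectors):
    """Return the index of the code vector CV_i closest to the
    observation vector OV under squared Euclidean distance."""
    dists = [sum((o - c) ** 2 for o, c in zip(ov, cv)) for cv in code_vectors]
    return dists.index(min(dists))

# Hypothetical 2-D codebook for illustration.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
```

Only the mixtures whose means fall in the winning cell need their densities evaluated, which is the source of the computation reduction.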

The Catch-all Model Performance Relative ROC performance of reduced catch-all models

Word Level Confidence Measures Arithmetic Mean: CM_am = (1/N) Σ_i CM(a_i). Geometric Mean: CM_gm = (Π_i CM(a_i))^{1/N}. Weighted Mean: CM_wm, with weights given by a linear model. CM(a) : confidence score of phoneme a; a, b : linear model parameters.
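The three combinations can be sketched as follows; the weighted variant here is a hypothetical linear weighting a·CM + b, standing in for the slide's linear-model parameters, not necessarily the paper's exact formula.

```python
import math

def cm_arithmetic(phone_scores):
    """CM_am: arithmetic mean of the phoneme confidence scores CM(a)."""
    return sum(phone_scores) / len(phone_scores)

def cm_geometric(phone_scores):
    """CM_gm: geometric mean; a single low phone score drags it down
    harder than the arithmetic mean."""
    return math.prod(phone_scores) ** (1.0 / len(phone_scores))

def cm_weighted(phone_scores, a=1.0, b=0.0):
    """CM_wm (assumed form): average of linearly rescaled phone scores,
    with a, b playing the role of the linear model parameters."""
    return sum(a * s + b for s in phone_scores) / len(phone_scores)
```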

Word Level Confidence Measures Performance ROC curves indicating the relative performance of CM_am, CM_gm and CM_wm

Performance Evaluation Compared With Other Approaches Comparison of the catch-all model measure, the likelihood ratio (LR) measure and the word-lattice-based measure

Experimental Results Table: WER for different threshold and r values. Smoothing parameter r = 1: 18.6%, 18.43%, 18.41%; r = 2: 18.9%, 18.42%; r = 3: 18.9%, 18.47%. Also shown: recognition accuracy for words following correctly decoded and incorrectly decoded words.

CONCLUSION AND FUTURE WORK We used a confidence metric to improve the integration of the system models and to guide the search towards the most promising paths. Dynamic tuning of the language model weight proved effective for improving performance. Word-posterior-based confidence measures are efficient and can be extracted from the search's on-line side information; they do not require the training of anti-models. With CBLM the language model score is favored in regions of ambiguous acoustics, but plays second fiddle when the acoustics are well matched. Future work: for the case where only one of the two words has high confidence, we plan to back off to the unigram language model score rather than reduce the language model score entirely.