Boosting HMM acoustic models in large vocabulary speech recognition
Carsten Meyer, Hauke Schramm
Philips Research Laboratories, Germany
SPEECH COMMUNICATION 2006
Presented by: Fang-Hui Chu

2 AdaBoost introduction
The AdaBoost algorithm was proposed for transforming a "weak" learning rule into a "strong" one
The basic idea is to train a series of classifiers, each based on the classification performance of the previous classifier on the training data
In multi-class classification, a popular variant is the AdaBoost.M2 algorithm
AdaBoost.M2 is applicable whenever a scoring function related to the classification criterion can be defined for the classifier

3 AdaBoost.M2 (Freund and Schapire, 1997)
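The algorithm on this slide appears only as an image in the original deck. For reference, below is a minimal sketch of AdaBoost.M2 as described by Freund and Schapire (1997); the weak-learner interface (train_weak_learner returning a scorer h(x, label) in [0, 1]) and all variable names are illustrative assumptions, not taken from the paper being presented.

```python
# Minimal AdaBoost.M2 sketch (Freund & Schapire, 1997).
# Assumes train_weak_learner(X, y, D) returns a scorer h(x, label) -> [0, 1];
# this interface and the names below are assumptions for illustration.
from math import log

def adaboost_m2(X, y, labels, train_weak_learner, num_rounds=10):
    n = len(X)
    # Weight distribution over (example index, incorrect label) pairs
    D = {(i, l): 1.0 / (n * (len(labels) - 1))
         for i in range(n) for l in labels if l != y[i]}
    hypotheses, betas = [], []
    for _ in range(num_rounds):
        h = train_weak_learner(X, y, D)
        # Pseudo-loss of h under the current pair distribution
        eps = 0.5 * sum(w * (1.0 - h(X[i], y[i]) + h(X[i], l))
                        for (i, l), w in D.items())
        beta = eps / (1.0 - eps)
        # Reweight: pairs that h handles well (high score for the true label,
        # low score for the wrong label) are down-weighted
        for (i, l) in D:
            D[(i, l)] *= beta ** (0.5 * (1.0 + h(X[i], y[i]) - h(X[i], l)))
        Z = sum(D.values())
        for key in D:
            D[key] /= Z
        hypotheses.append(h)
        betas.append(beta)

    def final_hypothesis(x):
        # Combined classifier: vote of all rounds, weighted by log(1 / beta_t)
        return max(labels,
                   key=lambda l: sum(log(1.0 / b) * ht(x, l)
                                     for ht, b in zip(hypotheses, betas)))
    return final_hypothesis
```

Note that the distribution is kept over (example, incorrect label) pairs; this is what lets confidence-rated scores h(x, y) plug directly into the N-best-list setting used later in the talk.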

4 AdaBoost introduction
The update rule is designed to guarantee an upper bound on the training error of the combined classifier which decreases exponentially with the number of individual classifiers
In multi-class problems, the weights are summed over the incorrect class labels to give a single weight for each training pattern:
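The summed weight itself did not survive the transcript; in the AdaBoost.M2 notation of the sketch above (notation assumed), it would presumably read

w_t(i) = \sum_{y \neq y_i} D_t(i, y)

so that each training pattern i ends up with a single weight that can be used in acoustic model training.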

5 Introduction
Why are there only a few studies so far applying boosting to acoustic model training?
–Speech recognition is an extremely complex, large-scale classification problem
The main motivation for applying AdaBoost to speech recognition is
–its theoretical foundation, providing explicit bounds on the training error and, in terms of margins, on the generalization error

6 Introduction
In most previous applications to speech recognition, boosting was applied to classifying each individual feature vector into a phoneme symbol [ICASSP04][Dimitrakakis]
–This requires phoneme posterior probabilities
But the problem is:
–Conventional HMM speech recognizers do not involve an intermediate phoneme classification step for individual feature vectors
–So the frame-level boosting approach cannot be applied straightforwardly

7 Utterance approach for boosting in ASR
An intuitive way of applying boosting to HMM speech recognition is at the utterance level
–Thus, boosting is used to improve upon an initial ranking of candidate word sequences
The utterance approach has two advantages:
–First, it is directly related to the sentence error rate
–Second, it is computationally much less expensive than boosting applied at the level of feature vectors

8 Utterance approach for boosting in ASR
In the utterance approach, we define the input pattern to be the sequence of feature vectors corresponding to the entire utterance
The class labels are the candidate word sequences of the speech recognizer, one of them being the correct word sequence for the utterance
The a posteriori confidence measure is calculated on the basis of the N-best list for the utterance
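The confidence formula shown on this slide is not in the transcript. A common form for such an N-best-based posterior, assuming an acoustic likelihood p(x|y), a language model prior p(y) and a likelihood scaling exponent \lambda (all notation assumed here), is

h(x, y) = \frac{p(y)\, p(x \mid y)^{\lambda}}{\sum_{y' \in \text{N-best}(x)} p(y')\, p(x \mid y')^{\lambda}}

i.e. the scaled joint scores are renormalized over the hypotheses in the N-best list.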

9 Utterance approach for boosting in ASR
Based on the confidence values and the AdaBoost.M2 algorithm, we calculate an utterance weight for each training utterance
Subsequently, the weights are used in maximum likelihood and discriminative training of the Gaussian mixture models
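To make the role of the utterance weights concrete, here is a minimal sketch of weighted ML accumulation for diagonal-covariance Gaussian mixtures; the per-frame posteriors returned by posterior_fn (e.g. from a forward-backward pass) and all names are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: utterance-weighted ML statistics for diagonal-covariance GMMs.
# Each utterance is a (num_frames, dim) numpy array; posterior_fn(feats) returns
# per-frame component posteriors of shape (num_frames, num_comp). Names assumed.
import numpy as np

def accumulate_weighted_stats(utterances, utterance_weights, posterior_fn,
                              num_comp, dim):
    occ = np.zeros(num_comp)             # weighted occupancy counts
    first = np.zeros((num_comp, dim))    # weighted first-order statistics
    second = np.zeros((num_comp, dim))   # weighted second-order (diagonal) stats
    for feats, w in zip(utterances, utterance_weights):
        gamma = posterior_fn(feats)      # (num_frames, num_comp)
        g = w * gamma                    # the utterance weight scales every frame
        occ += g.sum(axis=0)
        first += g.T @ feats
        second += g.T @ (feats ** 2)
    means = first / occ[:, None]
    variances = second / occ[:, None] - means ** 2
    weights = occ / occ.sum()
    return means, variances, weights
```

Discriminative (MMI) training can incorporate the utterance weights analogously, by scaling each utterance's contribution to its accumulators.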

10 Utterance approach for boosting in ASR
Some problems are encountered when applying it to large-scale continuous speech applications:
–N-best lists of reasonable length (e.g. N=100) generally contain only a tiny fraction of the possible classification results
This has two consequences:
–In training, it may lead to sub-optimal utterance weights
–In recognition, Eq. (1) cannot be applied appropriately

11 Utterance approach for CSR--Training
Training
–A convenient strategy to reduce the complexity of the classification task and to provide more meaningful N-best lists consists in "chopping" the training data (see the sketch below)
–For long sentences, this simply means inserting additional sentence break symbols at silence intervals with a given minimum length
–This reduces the number of possible classifications of each sentence "fragment", so that the resulting N-best lists should cover a sufficiently large fraction of the hypotheses
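A minimal sketch of what this chopping could look like, assuming a word-level alignment given as (token, start, end) tuples with "<sil>" marking silence; the data format and threshold are assumptions for illustration.

```python
# Hedged sketch of "chopping": split a long training utterance into fragments
# at silence segments of at least min_sil seconds. Alignment format assumed:
# a list of (token, start_time, end_time) tuples, with "<sil>" for silence.
def chop_utterance(alignment, min_sil=0.3):
    fragments, current = [], []
    for token, start, end in alignment:
        if token == "<sil>" and (end - start) >= min_sil:
            if current:
                fragments.append(current)   # close the fragment at a long pause
            current = []
        else:
            current.append((token, start, end))
    if current:
        fragments.append(current)
    return fragments
```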

12 Utterance approach for CSR--Decoding
Decoding: lexical approach for model combination
–A single-pass decoding setup, where the combination of the boosted acoustic models is realized at the lexical level
–The basic idea is to add a new pronunciation model by "replicating" the set of phoneme symbols in each boosting iteration, e.g. by appending the suffix "_t" to each phoneme symbol: "au", "au_1", "au_2", …
–The new phoneme symbols represent the underlying acoustic model of boosting iteration t

13 Utterance approach for CSR--Decoding
Decoding: lexical approach for model combination (cont.)
–Add to each phonetic transcription in the decoding lexicon a new transcription using the corresponding phoneme set (e.g. "sic_a", "sic_1 a_1", …), as sketched below
–Use the reweighted training data to train the boosted classifier
–Decoding is then performed using the extended lexicon and the set of acoustic models weighted by their unigram prior probabilities, which are estimated on the training data (a weighted summation of the model scores)
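A minimal sketch of the lexicon extension described above, assuming a lexicon stored as word -> list of pronunciations, each pronunciation a list of phoneme symbols; the data structure is an assumption for illustration.

```python
# Hedged sketch of the lexical model combination: for boosting iteration t,
# replicate the phoneme inventory with a "_t" suffix and add, for every word,
# a pronunciation variant written in the new phoneme set.
def extend_lexicon(lexicon, iteration):
    suffix = "_%d" % iteration
    extended = {}
    for word, prons in lexicon.items():
        replicas = [[ph + suffix for ph in pron] for pron in prons]
        extended[word] = prons + replicas   # keep old variants, add new ones
    return extended
```

Applying this once per boosting iteration yields, for each word, one pronunciation variant per acoustic model, which is what allows a standard single-pass decoder to combine the models.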

14 In more detail
[Flow diagram: in each boosting iteration t, the phonetically transcribed training corpus is reweighted and used for ML/MMI training of acoustic model M_t; the phoneme set is replicated with the "_t" suffix and the lexicon is extended with the corresponding pronunciation variants ("sic_a", "sic_1 a_1", …); decoding combines M_1, M_2, …, M_t via unweighted or weighted model combination.]

15 In more detail

16 Weighted model combination
Word-level model combination
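The combination formula on this slide is not in the transcript. A plausible form of word-level weighted model combination, treating the replicated pronunciations as variants v_t of a word w whose priors p(v_t \mid w) are estimated on the training data (notation assumed, not taken from the slides), is

p(x \mid w) \approx \sum_{t} p(v_t \mid w)\; p_t(x \mid v_t)

i.e. each boosted model contributes through its own pronunciation variant, weighted by the variant's unigram prior.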

17 Experiments
Isolated word recognition
–Telephone-bandwidth large vocabulary isolated word recognition
–SpeechDat(II) German material
Continuous speech recognition
–Professional dictation and Switchboard

18 Isolated word recognition
Database:
–Training corpus: 18k utterances (4.3 h) of city, company, first and family names
–Evaluations:
LILI test corpus: 10k single-word utterances (3.5 h); 10k-word lexicon (matched conditions)
Names corpus: an in-house collection of 676 utterances (0.5 h); two different decoding lexica: 10k lex, 190k lex (acoustic conditions are matched, whereas there is a lexical mismatch)
Office corpus: 3.2k utterances (1.5 h), recorded over a microphone in clean conditions; 20k lexicon (an acoustic mismatch to the training conditions)

19 Isolated word recognition
Boosting ML models

20 Isolated word recognition
Combining boosting and discriminative training
–The experiments in isolated word recognition showed that boosting may improve the best test error rates

21 Continuous speech recognition
Database
–Professional dictation
An in-house data collection of real-life recordings of medical reports
The acoustic training corpus consists of about 58 h of data
Evaluations were carried out on two test corpora: a development corpus of 5.0 h of speech and an evaluation corpus of 3.3 h of speech
–Switchboard
Spontaneous conversations recorded over telephone lines; 57 h (73 h) of male (female) training data
Evaluation corpus: about 1 h (0.5 h) of male (female) speech

22 Continuous speech recognition
Professional dictation:

23 Switchboard:

24 Conclusions
In this paper, a boosting approach which can be applied to any HMM-based speech recognizer was presented and evaluated
The increased recognizer complexity, and thus decoding effort, of the boosted systems is a major drawback compared to other training techniques such as discriminative training

25 References
[ICASSP02] C. Meyer, "Utterance-Level Boosting of HMM Speech Recognizers"
[ICML02] C. Meyer, "Towards Large Margin Speech Recognizers by Boosting and Discriminative Training"
[ICSLP00] C. Meyer, "Rival Training: Efficient Use of Data in Discriminative Training"
[ICASSP00] Schramm and Aubert, "Efficient Integration of Multiple Pronunciations in a Large Vocabulary Decoder"