Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis For Speech Recognition Bing Zhang and Spyros Matsoukas, BBN Technologies, 50 Moulton.

Slides:



Advertisements
Similar presentations
Why does it work? We have not addressed the question of why does this classifier performs well, given that the assumptions are unlikely to be satisfied.
Advertisements

Component Analysis (Review)
1 Patrol LID System for DARPA RATS P1 Evaluation Pavel Matejka Patrol Team Language Identification System for DARPA RATS P1 Evaluation Pavel Matejka 1,
1 Bayesian Adaptation in HMM Training and Decoding Using a Mixture of Feature Transforms Stavros Tsakalidis and Spyros Matsoukas.
Lattices Segmentation and Minimum Bayes Risk Discriminative Training for Large Vocabulary Continuous Speech Recognition Vlasios Doumpiotis, William Byrne.
ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: Adaptation Resources: RS: Unsupervised vs. Supervised RS: Unsupervised.
1 Fast Asymmetric Learning for Cascade Face Detection Jiaxin Wu, and Charles Brubaker IEEE PAMI, 2008 Chun-Hao Chang 張峻豪 2009/12/01.
Investigation on Mandarin Broadcast News Speech Recognition Mei-Yuh Hwang, Xin Lei, Wen Wang*, Takahiro Shinozaki University of Washington, *SRI 9/19/2006,
Optimal Adaptation for Statistical Classifiers Xiao Li.
Discriminative Feature Optimization for Speech Recognition
Probability of Error Feature vectors typically have dimensions greater than 50. Classification accuracy depends upon the dimensionality and the amount.
1 Template-Based Classification Method for Chinese Character Recognition Presenter: Tienwei Tsai Department of Informaiton Management, Chihlee Institute.
1 International Computer Science Institute Data Sampling for Acoustic Model Training Özgür Çetin International Computer Science Institute Andreas Stolcke.
Multimodal Interaction Dr. Mike Spann
Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis for Speech Recognition Bing Zhang and Spyros Matsoukas BBN Technologies Present.
Segmental Hidden Markov Models with Random Effects for Waveform Modeling Author: Seyoung Kim & Padhraic Smyth Presentor: Lu Ren.
REVISED CONTEXTUAL LRT FOR VOICE ACTIVITY DETECTION Javier Ram’ırez, Jos’e C. Segura and J.M. G’orriz Dept. of Signal Theory Networking and Communications.
Classification and Ranking Approaches to Discriminative Language Modeling for ASR Erinç Dikici, Murat Semerci, Murat Saraçlar, Ethem Alpaydın 報告者:郝柏翰 2013/01/28.
Discriminant Functions
Discriminative Models for Spoken Language Understanding Ye-Yi Wang, Alex Acero Microsoft Research, Redmond, Washington USA ICSLP 2006.
1 Improved Speaker Adaptation Using Speaker Dependent Feature Projections Spyros Matsoukas and Richard Schwartz Sep. 5, 2003 Martigny, Switzerland.
DISCRIMINATIVE TRAINING OF LANGUAGE MODELS FOR SPEECH RECOGNITION Hong-Kwang Jeff Kuo, Eric Fosler-Lussier, Hui Jiang, Chin-Hui Lee ICASSP 2002 Min-Hsuan.
Hyperparameter Estimation for Speech Recognition Based on Variational Bayesian Approach Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee and Keiichi.
CS 782 – Machine Learning Lecture 4 Linear Models for Classification  Probabilistic generative models  Probabilistic discriminative models.
Automatic Speech Recognition: Conditional Random Fields for ASR Jeremy Morris Eric Fosler-Lussier Ray Slyh 9/19/2008.
Power Linear Discriminant Analysis (PLDA) M. Sakai, N. Kitaoka and S. Nakagawa, “Generalization of Linear Discriminant Analysis Used in Segmental Unit.
Discriminative Training and Acoustic Modeling for Automatic Speech Recognition - Chap. 4 Discriminative Training Wolfgang Macherey Von der Fakult¨at f¨ur.
ECE 8443 – Pattern Recognition LECTURE 10: HETEROSCEDASTIC LINEAR DISCRIMINANT ANALYSIS AND INDEPENDENT COMPONENT ANALYSIS Objectives: Generalization of.
Round-Robin Discrimination Model for Reranking ASR Hypotheses Takanobu Oba, Takaaki Hori, Atsushi Nakamura INTERSPEECH 2010 Min-Hsuan Lai Department of.
Computational Intelligence: Methods and Applications Lecture 23 Logistic discrimination and support vectors Włodzisław Duch Dept. of Informatics, UMK Google:
Learning Long-Term Temporal Feature in LVCSR Using Neural Networks Barry Chen, Qifeng Zhu, Nelson Morgan International Computer Science Institute (ICSI),
I-SMOOTH FOR IMPROVED MINIMUM CLASSIFICATION ERROR TRAINING Haozheng Li, Cosmin Munteanu Pei-ning Chen Department of Computer Science & Information Engineering.
Multi-Speaker Modeling with Shared Prior Distributions and Model Structures for Bayesian Speech Synthesis Kei Hashimoto, Yoshihiko Nankaku, and Keiichi.
PhD Candidate: Tao Ma Advised by: Dr. Joseph Picone Institute for Signal and Information Processing (ISIP) Mississippi State University Linear Dynamic.
ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: Supervised Learning Resources: AG: Conditional Maximum Likelihood DP:
1 CRANDEM: Conditional Random Fields for ASR Jeremy Morris 11/21/2008.
Data Modeling Patrice Koehl Department of Biological Sciences National University of Singapore
ECE 8443 – Pattern Recognition Objectives: Bayes Rule Mutual Information Conditional Likelihood Mutual Information Estimation (CMLE) Maximum MI Estimation.
MINIMUM WORD CLASSIFICATION ERROR TRAINING OF HMMS FOR AUTOMATIC SPEECH RECOGNITION Yueng-Tien, Lo Speech Lab, CSIE National.
Presented by: Fang-Hui Chu Discriminative Models for Speech Recognition M.J.F. Gales Cambridge University Engineering Department 2007.
 Present by 陳群元.  Introduction  Previous work  Predicting motion patterns  Spatio-temporal transition distribution  Discerning pedestrians  Experimental.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition LECTURE 12: Advanced Discriminant Analysis Objectives:
HMM vs. Maximum Entropy for SU Detection Yang Liu 04/27/2004.
Discriminative Training and Machine Learning Approaches Machine Learning Lab, Dept. of CSIE, NCKU Chih-Pin Liao.
1 Voicing Features Horacio Franco, Martin Graciarena Andreas Stolcke, Dimitra Vergyri, Jing Zheng STAR Lab. SRI International.
1 Experiments with Detector- based Conditional Random Fields in Phonetic Recogntion Jeremy Morris 06/01/2007.
Discriminative n-gram language modeling Brian Roark, Murat Saraclar, Michael Collins Presented by Patty Liu.
Present by: Fang-Hui Chu Large Margin Gaussian Mixture Modeling for Phonetic Classification and Recognition Fei Sha*, Lawrence K. Saul University of Pennsylvania.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition Objectives: Bayes Rule Mutual Information Conditional.
Utterance verification in continuous speech recognition decoding and training Procedures Author :Eduardo Lleida, Richard C. Rose Reporter : 陳燦輝.
Part 3: Estimation of Parameters. Estimation of Parameters Most of the time, we have random samples but not the densities given. If the parametric form.
NTNU SPEECH AND MACHINE INTELEGENCE LABORATORY Discriminative pronunciation modeling using the MPE criterion Meixu SONG, Jielin PAN, Qingwei ZHAO, Yonghong.
Jeff Ma and Spyros Matsoukas EARS STT Meeting March , Philadelphia Post-RT04 work on Mandarin.
Bayes Risk Minimization using Metric Loss Functions R. Schlüter, T. Scharrenbach, V. Steinbiss, H. Ney Present by Fang-Hui, Chu.
D ISCRIMINATIVE T RAINING B ASED O N A N I NTEGRATED V IEW O F MPE A ND MMI I N M ARGIN A ND E RROR S PACE Erik McDermott, Shinji Watanabe and Atsushi.
Why does it work? We have not addressed the question of why does this classifier performs well, given that the assumptions are unlikely to be satisfied.
Qifeng Zhu, Barry Chen, Nelson Morgan, Andreas Stolcke ICSI & SRI
LECTURE 11: Advanced Discriminant Analysis
LECTURE 09: BAYESIAN ESTIMATION (Cont.)
LECTURE 10: DISCRIMINANT ANALYSIS
Classification Discriminant Analysis
Classification Discriminant Analysis
Mohamed Kamel Omar and Lidia Mangu ICASSP 2007
ECE539 final project Instructor: Yu Hen Hu Fall 2005
Jeremy Morris & Eric Fosler-Lussier 04/19/2007
Automatic Speech Recognition: Conditional Random Fields for ASR
LECTURE 09: DISCRIMINANT ANALYSIS
Sequence Student-Teacher Training of Deep Neural Networks
Learning Long-Term Temporal Features
Linear Discrimination
Presentation transcript:

Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis For Speech Recognition Bing Zhang and Spyros Matsoukas, BBN Technologies, 50 Moulton St. Cambridge Reporter : Chang Chih Hao

Introduction LDA and HLDA –Better classification accuracy –some common Limitations None of them assumes any prior knowledge of confusable hypotheses Their objective functions do not directly relate to the word error rate (WER) Minimum Phoneme Error –Minimize phoneme errors in lattice-based training frameworks –Since this criterion is closely related to WER, MPE_HLDA tends to be more robust than other projection methods, which makes it potentially better suited for a wider variety of features.

MPE Objection Function MPE-HLDA model MPE-HLDA aims at minimizing expected number of phoneme errors introduced by the MPE-HLDA model in a given hypothesis lattice, or equivalently maximizing the function

MPE Objection Function

MPE Objection Function It can be shown that the derivative of (4) with respect to A is

MPE Objection Function

MPE Objection Function Therefore, Eq.(6) can be rewritten as 39*39 39*162

MPE-HLDA Implementation In theory, the derivative of the MPE-HLDA objective function can be computed based on Eq.(12), via s single forward-backward pass over the training lattices. In practice, however, it is not possible to fit all the full covariance matrices in memory. Two steps –First, run a forward-backward pass over the training lattices to acumulate –Second, uses these statistics together with the full covariance matrices to synthesize the derivative. The Paper used gradient descent in updating the projection matrix.

MPE-HLDA Implementation

Experimental Framework A L p*n n*l l*1 p*1 Global feature projection ---there is more useful information in longer contexts ---Reduce the computational cost

Experimentation Conversational Telephone Speech (CTS) –2300 hours of training data 800 hours : training the initial ML model 1500 hours : held-out training data –Lattice generation –Discriminative training –MPE-HLDA : only 370 hours –Testing set Eval03 Dev04

Experimentation Conversational Telephone Speech (CTS) –Feature Frame concatenated PLP cepstra –15 frames, l = 225, n = 130, p = 60

Experimentation

Broadcast News (BN) –600 hours : training the initial model (Hub4 and TDT) –330 hours : held-out data

Thanks