I-SMOOTH FOR IMPROVED MINIMUM CLASSIFICATION ERROR TRAINING. Haozheng Li, Cosmin Munteanu. Presented by Pei-ning Chen, Department of Computer Science & Information Engineering, National Taiwan Normal University.

Outline Introduction MLE & MCE I-smooth Method on MCE Experimental results Conclusions

Introduction The most common DT methods are minimum classification error (MCE) and maximum mutual information (MMI) training (or its variant, minimum phone error (MPE) training). Both of these DT methods focus on reducing the empirical error on the training set. One of the more important drawbacks of DT, the overfitting problem, seems to be addressed only through solutions that are either a departure from the DT framework or of limited practical applicability.

Introduction (cont.) Interpolation between MLE and the discriminative objective function has been applied in MMIE and MPE, either in a manner that depends on the amount of data available for each Gaussian or directly as a hard interpolation. They instead use I-smoothing to interpolate between two MCE criteria with different steepness parameters, as opposed to interpolating between MLE and MCE.

MLE The Baum-Welch algorithm performs a soft alignment of each training utterance, assigning the aligned speech to the hidden states and adjusting the model's parameters in an iterative procedure.
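As a concrete illustration, here is a minimal sketch, not the authors' implementation, of how the M-step re-estimates one Gaussian state's parameters from the soft occupancies produced by the forward-backward alignment; the function and variable names are illustrative:

import numpy as np

def reestimate_gaussian(frames, gamma):
    """M-step update of one Gaussian state from soft occupancies.
    frames: (T, D) observation vectors of one utterance
    gamma:  (T,)  forward-backward posterior of this state per frame"""
    occ = gamma.sum()                                   # soft count of frames assigned to the state
    mean = (gamma[:, None] * frames).sum(axis=0) / occ  # occupancy-weighted mean
    sq = (gamma[:, None] * frames ** 2).sum(axis=0) / occ
    var = sq - mean ** 2                                # diagonal covariance
    return occ, mean, var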

MCE MCE aims to minimize a smoothed count of empirical errors. This discriminative method makes use of the additional information carried by competing models to train each model. Misclassification measure:
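The equation on this slide did not survive the transcript; the standard two-class form from the MCE literature, matching the cor/com notation used on the next slide, is (a reconstruction, not the slide's exact equation):

\[
d(Y) = -g(Y;\lambda_{\mathrm{cor}}) + g(Y;\lambda_{\mathrm{com}}),
\qquad
L\bigl(d(Y)\bigr) = \frac{1}{1 + e^{-r\, d(Y)}}
\]

where g(Y; λ) is the log-likelihood score of utterance Y under the correct (cor) or best competing (com) model, and r is the steepness parameter discussed on the following slides.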

MCE (cont.) The subscripts cor and com indicate whether the state a component belongs to lies in the correct or in the competing state sequence, respectively. θ(O) denotes the normalized speech O weighted by the function L.

MCE (cont.) If the steepness parameter r is very large, the curve of L(d(Y)) is sharp, and only utterances that lie near the score boundary are selected to tune the model parameters.
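This follows from the gradient of the sigmoid loss with respect to the measure (standard calculus, not shown on the slide):

\[
\frac{\partial L}{\partial d} = r\,L(d)\bigl(1 - L(d)\bigr)
\]

which is maximal at d(Y) = 0, exactly on the score boundary, and decays exponentially away from it; a larger r narrows this peak, so only near-boundary utterances receive a non-negligible parameter update.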

MCE (cont.) Implementation details for the baseline MCE: (1) the average frame misclassification measure d(Y)/T is used instead of d(Y) to stabilize the tuning process; (2) a chopped sigmoid is used; (3) variable learning rates are used. A sketch of these three details is given below.
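A minimal sketch of the three details, assuming that "chopped sigmoid" means the gradient is zeroed for utterances far from the boundary (the slide does not define it); the window width, schedule, and names below are illustrative assumptions, not the paper's values:

import numpy as np

def mce_loss_and_grad(d, T, r=1.0, chop=5.0):
    """Sigmoid MCE loss on the average frame measure d/T.
    'Chopped' sigmoid (assumption): the gradient is zeroed once
    |d/T| exceeds the chop window, so far-off utterances are ignored."""
    d_avg = d / T                           # (1) average frame misclassification measure
    L = 1.0 / (1.0 + np.exp(-r * d_avg))
    grad = r * L * (1.0 - L)                # dL/d(d_avg)
    if abs(d_avg) > chop:                   # (2) chop: skip utterances far from the boundary
        grad = 0.0
    return L, grad

def learning_rate(epoch, eta0=0.1, decay=0.5):
    # (3) variable learning rate: a simple decaying schedule (illustrative)
    return eta0 / (1.0 + decay * epoch)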

I-smooth Method on MCE I-smoothing means increasing a Gaussian's occupancy count while keeping the average data values and the average squared data values invariant. They propose a different strategy for dealing with over-training: the interpolation is applied between the occupancies obtained from MCE with different steepness parameters, rather than interpolating with MLE.
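As a reference point, here is a minimal sketch of classical I-smoothing on a Gaussian's sufficient statistics in the usual MPE-style formulation: tau virtual counts are added whose average value and average squared value match a prior (e.g. ML) model, so those averages stay invariant while the occupancy grows (names are illustrative):

def i_smooth(occ, x_sum, x2_sum, tau, prior_mean, prior_var):
    """Add tau virtual occupancies carrying the prior model's average
    data value and average squared data value."""
    occ_s = occ + tau
    x_s = x_sum + tau * prior_mean                       # preserves the average value
    x2_s = x2_sum + tau * (prior_var + prior_mean ** 2)  # preserves the average squared value
    return occ_s, x_s, x2_s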

I-smooth Method on MCE (cont.) They use the misclassification measure to define another data selection criterion. The selected data are close to the boundary area and may cause some confusion during training; as such, the selected speech data are referred to as confusion data.

I-smooth Method on MCE (cont.) (1) In theory, the boundary data define a more accurate score boundary, but because so few boundary data are available, they may not represent the true boundary data distribution. (2) On the other hand, Conf_OCC is much larger than Bound_OCC; however, the confusion data define a less accurate score boundary.

I-smooth Method on MCE (cont.) (1) If a component's Bound_OCC is larger than a certain threshold, we assume its assigned boundary data reflect the true boundary data distribution and need not be interpolated. (2) We assume Conf_OCC is less affected by random factors, so the ratio between Bound_OCC and Conf_OCC should fall within a certain range for each Gaussian component. A sketch of the resulting back-off rule is given below.
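The transcript omits the actual equations (next slide), but the two rules suggest a back-off scheme along these lines; this is a hedged sketch in which the threshold, tau, and the exact weighting are assumptions, not the paper's values:

def smooth_with_confusion(bound_occ, bound_x, conf_occ, conf_x,
                          occ_thresh=20.0, tau=10.0):
    """Back off boundary statistics toward confusion statistics when the
    boundary occupancy is too small to be trusted (rule (1) above); the
    confusion data's per-frame average serves as the prior, as in I-smoothing."""
    if bound_occ >= occ_thresh:
        return bound_occ, bound_x               # enough boundary data: use it unsmoothed
    occ = bound_occ + tau                       # add tau virtual counts...
    x = bound_x + tau * (conf_x / conf_occ)     # ...carrying the confusion-data average
    return occ, x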

I-smooth Method on MCE (cont.)

Experimental results

Conclusions In this paper they propose a method that interpolates between the boundary data and the confusion data, aiming to alleviate the overfitting problem in MCE. They interpolate the boundary data with the confusion data when the boundary occupancy count is small and when the ratio between the boundary and confusion occupancies is small.