Discriminative Training

Discriminative Training (ICSLP 2006)

Papers
Soft Margin Estimation of Hidden Markov Model Parameters
Jinyu Li, Ming Yuan, Chin-Hui Lee (Georgia Institute of Technology)

Outline
- Introduction
- Soft margin estimation
- Comparison with LME and MCE
- Experiments
- Conclusions

Introduction
If the training set matches the testing set well, discriminative training methods usually achieve very good performance at test time. However, such a good match cannot always be expected in practical pattern recognition problems. The power to deal with possible mismatches between training and testing conditions is often measured by the generalization ability of the machine learning algorithm.

Introduction
Large margin classification tools such as the support vector machine (SVM) have demonstrated superior generalization ability over conventional classifier learning algorithms. It is well known that misclassified samples are also critical for training a classifier. In SVM learning, for the real-world inseparable case, the misclassified samples define a penalty, and a soft margin is found by minimizing a penalized objective function.

Introduction
Large margin estimation (LME) ignores the misclassified samples, so the separation margin it achieves is hard to justify as a true margin for good generalization. The soft margin estimation (SME) method instead defines a unified objective function that integrates frame selection, sample selection, and discriminative separation in a flexible framework.

Soft margin estimation
SVM solves the following optimization problem:
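In standard soft-margin form, with slack variables \xi_i and trade-off constant C:

\[
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{N}\xi_{i}
\quad\text{s.t.}\quad y_{i}\bigl(\mathbf{w}^{\top}\mathbf{x}_{i}+b\bigr)\ \ge\ 1-\xi_{i},\qquad \xi_{i}\ \ge\ 0.
\]

Minimizing \lVert\mathbf{w}\rVert^{2} maximizes the margin, while the slack terms penalize samples that fall inside the margin or on the wrong side of it.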

Soft margin estimation
To obtain good generalization, we try to maximize the separation between different models. For every utterance we define a separation measure and impose a margin on the separation errors. A common choice is the log likelihood ratio (LLR):
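One frame-normalized form of this measure, consistent with the frame selection on the next slide (notation assumed here: O_i is utterance i, F_i its set of selected frames, \lambda_T and \lambda_C the target and competitor models):

\[
d(O_i)\ =\ \frac{1}{\lvert F_i\rvert}\sum_{t\in F_i}\log\frac{p(o_t\mid\lambda_T)}{p(o_t\mid\lambda_C)}.
\]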

Soft margin estimation: frame selection
For every utterance, we select the frames whose model labels differ between the target and competitor strings. Only those frames provide discriminative information for the models, as sketched below.
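A minimal Python sketch of this selection, assuming per-frame model labels are available from the forced alignments of the target and competitor strings (all names here are illustrative):

def select_discriminative_frames(target_labels, competitor_labels):
    """Return indices of frames whose model label differs between the
    target-string and competitor-string alignments."""
    return [t for t, (a, b) in enumerate(zip(target_labels, competitor_labels))
            if a != b]

# Frames 2 and 3 differ, so only they carry discriminative information.
print(select_discriminative_frames(["sil", "ah", "ah", "t", "sil"],
                                   ["sil", "ah", "eh", "d", "sil"]))  # [2, 3]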

Soft margin estimation: SME objective function and sample selection
The objective trades margin size against a penalty over U, the set of utterances whose separation measure is less than the soft margin:
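A hinge-style form matching this description (a sketch; \rho denotes the soft margin, \lambda balances margin maximization against empirical risk, N is the number of training utterances, and the notation is assumed):

\[
\min_{\Lambda,\,\rho}\ \frac{\lambda}{\rho}\ +\ \frac{1}{N}\sum_{i\in U}\bigl(\rho - d(O_i)\bigr),
\qquad U=\{\,i : d(O_i)<\rho\,\}.
\]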

Soft margin estimation
SME focuses on difficult samples, which tend to be misclassified. In this study we search for a sub-optimal solution. First, we choose a margin heuristically; because the margin is fixed, we only need to consider the samples whose separation is smaller than the margin. The resulting problem can be solved by generalized probabilistic descent (GPD):
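With the margin fixed, each GPD iteration takes the generic descent form \Lambda_{t+1} = \Lambda_t - \epsilon_t \nabla_{\Lambda} L(\Lambda_t). A toy Python sketch of sample selection plus one such pass, with a linear separation function standing in for the HMM-based LLR (rho, lr, theta, and all other names are illustrative assumptions, not the paper's notation):

import numpy as np

rho = 1.0   # soft margin, chosen heuristically
lr = 0.01   # GPD step size (a decreasing schedule in practice)

def separation(x, theta):
    return float(np.dot(theta, x))   # stand-in for d(O_i)

def gpd_pass(samples, theta):
    for x in samples:
        if separation(x, theta) < rho:   # sample selection: hard samples only
            # loss = rho - d(x), so the descent direction is +grad d(x) = +x
            theta = theta + lr * x
    return theta

theta = gpd_pass([np.array([1.0, 0.0, 2.0]),
                  np.array([0.0, 1.0, 1.0])], np.zeros(3))
print(theta)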

Comparison with LME and MCE
SME vs. LME: LME often needs a very good preliminary estimate from the training set to keep the influence of ignoring misclassified samples small. SME works on all training data, both correctly classified and misclassified samples.
SME vs. MCE: SME normalizes the LLR while MCE does not. MCE does not update parameters with misclassified samples that lie far from the decision boundary; ignoring these difficult samples may prevent the learning method from finding optimal model parameters.

Experiments

Conclusions
SME combines the advantages of SVM and MCE: it directly maximizes the separation between competing models, so that testing samples can still reach a correct decision as long as their deviation from the training conditions stays within a safe margin.