
1 Discriminative Training and Machine Learning Approaches
Machine Learning Lab, Dept. of CSIE, NCKU
Chih-Pin Liao

2 Discriminative Training

3 Our Concerns
- Feature extraction and HMM modeling should be jointly performed.
- A common objective function should be considered for both.
- To alleviate model confusion and improve recognition performance, we should estimate the HMM using a discriminative criterion built from statistical theory.
- Model parameters should be calculated rapidly, without applying a gradient-descent algorithm.

4 Minimum Classification Error (MCE)
- MCE is a popular discriminative training algorithm developed for speech recognition and extended to other pattern recognition applications.
- Rather than maximizing the likelihood of the observed data, MCE aims to directly minimize classification errors.
- A gradient-descent algorithm is used to estimate the HMM parameters.

5 MCE Training Procedure
Procedure for training discriminative models from observations X:
- Discriminant function
- Anti-discriminant function
- Misclassification measure
The standard definitions of these three quantities are sketched below.
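For reference, these quantities take the following standard forms in MCE training (Juang and Katagiri's formulation; the slide's exact notation may differ). For class j with parameter set \Lambda and M classes:

g_j(X; \Lambda) = \log p(X \mid \lambda_j)

G_j(X; \Lambda) = \log \Big[ \frac{1}{M-1} \sum_{k \ne j} \exp\big(\eta\, g_k(X; \Lambda)\big) \Big]^{1/\eta}

d_j(X; \Lambda) = -g_j(X; \Lambda) + G_j(X; \Lambda)

Here d_j > 0 indicates a misclassification of X from class j, and \eta controls how strongly the anti-discriminant G_j weights the most competitive rival classes.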

6 Expected Loss
- The loss function is obtained by mapping the misclassification measure into the range between zero and one through a sigmoid function.
- Minimizing the expected loss, i.e. the expected classification error, yields the discriminative model (standard forms below).
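In the standard MCE formulation, with \gamma and \theta the sigmoid slope and shift, the per-class loss and the expected loss are:

\ell_j(d_j) = \frac{1}{1 + \exp(-\gamma\, d_j + \theta)}

L(\Lambda) = E_X\Big[ \sum_j \ell_j\big(d_j(X; \Lambda)\big)\, \mathbb{1}(X \in C_j) \Big]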

7 Hypothesis Test

8 Likelihood Ratio Test
- A new training criterion is derived from hypothesis test theory.
- We test a null hypothesis against an alternative hypothesis.
- The optimal solution is obtained by a likelihood ratio test, according to the Neyman-Pearson lemma (see below).
- A higher likelihood ratio implies stronger confidence in accepting the null hypothesis.
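Concretely, the Neyman-Pearson lemma states that the most powerful test of H_0 against H_1 at a given significance level thresholds the likelihood ratio:

\mathrm{LR}(X) = \frac{p(X \mid H_0)}{p(X \mid H_1)}, \qquad \text{accept } H_0 \text{ if } \mathrm{LR}(X) \ge \tau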

9 Hypotheses in HMM Training
- Null and alternative hypotheses:
  H0: observations X are drawn from the target HMM state j
  H1: observations X are not drawn from the target HMM state j
- We develop discriminative HMM parameters for the target state against the non-target states.
- The problem reduces to verifying the goodness of the alignment of the data to the corresponding HMM states.

10 Maximum Confidence Hidden Markov Model

11 Maximum Confidence HMM
- The MCHMM is estimated by maximizing the log likelihood ratio, i.e. the confidence measure, where the parameter set consists of the HMM parameters and a transformation matrix for discriminative feature extraction.
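As a sketch of the criterion, following the cited TPAMI paper, with \Lambda = \{\lambda, W\} collecting the HMM parameters \lambda and the feature transformation matrix W (the paper's exact notation may differ):

\Lambda_{MC} = \arg\max_{\Lambda} \log \frac{p(WX \mid H_0; \lambda)}{p(WX \mid H_1; \lambda)}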

12 Hybrid Parameter Estimation
- The expectation-maximization (EM) algorithm is applied to tackle the missing-data problem in maximum confidence estimation.
- E-step: compute the expectation function over the hidden state sequences (next slide).
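A minimal sketch of the E-step, assuming the usual EM treatment with the hidden state sequence S as the missing data; for a log-likelihood-ratio objective the auxiliary function is a difference of two conditional expectations of the complete-data log likelihood (the exact form is given in the cited paper):

Q(\Lambda' \mid \Lambda) = E_{S \mid X, \Lambda}\big[\log p(X, S \mid H_0; \Lambda')\big] - E_{S \mid X, \Lambda}\big[\log p(X, S \mid H_1; \Lambda')\big]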

13 Expectation Function

14 MC Estimates of HMM Parameters

15 MC Estimates of HMM Parameters (cont.)

16 MC Estimate of Transformation Matrix


18 MC Classification Rule
- Let Y denote an input test image. We apply the same maximum confidence criterion to identify the most likely category for Y.
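As a sketch, assuming the confidence measure from slide 11 with class-wise parameter sets \Lambda_c, the decision rule picks the class with the highest log likelihood ratio:

\hat{c} = \arg\max_{c} \log \frac{p(Y \mid H_0; \Lambda_c)}{p(Y \mid H_1; \Lambda_c)}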

19 Summary
- A new maximum confidence HMM framework was proposed.
- The hypothesis test principle was used to build the training criterion.
- Discriminative feature extraction and HMM modeling were performed under the same criterion.
- Reference: Jen-Tzung Chien and Chih-Pin Liao, "Maximum Confidence Hidden Markov Modeling for Face Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 4, pp. 606-616, April 2008.

20 Machine Learning Approaches

21 Introduction
- Conditional random fields (CRFs):
  - relax the usual conditional independence assumption of the likelihood model
  - enforce the homogeneity of the labeling variables conditioned on the observation
- Due to the CRF model's weak assumptions and discriminative nature, it:
  - allows arbitrary relationships among the data
  - may require fewer resources to train its parameters

22 CRF models have shown better performance than hidden Markov models (HMMs) and maximum entropy Markov models (MEMMs) in:
- language and text processing
- object recognition
- image and video segmentation
- tracking in video sequences

23 Generative & Discriminative Models

24 Two Classes of Models
- Generative model (HMM): models the joint distribution of states and observations.
- Direct model (MEMM and CRF): models the posterior probability directly.
[Figure: graph structures of the MEMM and the CRF]

25 Comparison of the Two Kinds of Model
- Generative model (HMM):
  - uses a Bayes' rule approximation
  - assumes that observations are independent
  - multiple overlapping features are not modeled
  - the best state sequence is found with the recursive Viterbi algorithm

26 
- Direct model (MEMM and CRF):
  - models the posterior probability directly
  - dependencies among observations are flexibly modeled
  - the best state sequence is likewise found with the recursive Viterbi algorithm
The factorizations of the three models are contrasted below.
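For a state sequence S = (s_1, \dots, s_T) and observations X = (x_1, \dots, x_T), the three models factor as follows (standard forms; f_k are feature functions and \lambda_k their weights):

HMM (generative, joint): \quad p(X, S) = \prod_t p(s_t \mid s_{t-1})\, p(x_t \mid s_t)

MEMM (direct, locally normalized): \quad p(S \mid X) = \prod_t p(s_t \mid s_{t-1}, x_t)

CRF (direct, globally normalized): \quad p(S \mid X) = \frac{1}{Z(X)} \exp\Big( \sum_t \sum_k \lambda_k f_k(s_{t-1}, s_t, X, t) \Big)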

27 Hidden Markov Model & Maximum Entropy Markov Model

28 HMM for Human Motion Recognition
- The HMM is defined by:
  - the transition probability
  - the observation probability
(standard definitions below)
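In standard notation, for states i, j and observation x_t:

a_{ij} = P(s_t = j \mid s_{t-1} = i) \quad \text{(transition probability)}

b_j(x_t) = p(x_t \mid s_t = j) \quad \text{(observation probability)}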

29 Maximum Entropy Markov Model
- The MEMM is defined by a single conditional distribution p(s_t | s_{t-1}, x_t), which replaces the separate transition and observation probabilities of the HMM model.

30 Maximum Entropy Criterion
- Feature functions are defined over state pairs and observations.
- Training is a constrained optimization problem: maximize the entropy of the model subject to the constraint that, for every feature, the empirical expectation equals the model expectation (see below).
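In the standard maximum entropy setup, with binary feature functions f_k(s', s, x), the constrained problem reads:

\max_p H(p) \quad \text{s.t.} \quad E_{\tilde p}[f_k] = E_p[f_k] \;\; \forall k

where E_{\tilde p}[f_k] is the empirical expectation counted on the training data and E_p[f_k] is the expectation under the model.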

31 Solution of MEMM
- Lagrange multipliers are used for the constrained optimization, where the multipliers λ_k are the model parameters.
- The solution is the exponential (log-linear) form shown below.
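Solving the Lagrangian gives the standard log-linear solution, normalized locally per source state s' and observation x:

p(s \mid s', x) = \frac{1}{Z(x, s')} \exp\Big( \sum_k \lambda_k f_k(x, s) \Big), \qquad Z(x, s') = \sum_{s} \exp\Big( \sum_k \lambda_k f_k(x, s) \Big)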

32 GIS Algorithm
- Optimizes the maximum mutual information (MMI) criterion.
- Step 1: Calculate the empirical expectation.
- Step 2: Start from an initial parameter value.
- Step 3: Calculate the model expectation.
- Step 4: Update the model parameters.
- Repeat steps 3 and 4 until convergence.
A runnable sketch of these steps follows.
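A minimal Python sketch of the four GIS steps for a conditional maximum entropy classifier. It assumes non-negative features padded with a slack feature so that every (sample, class) row sums to the same constant C, which is the classical GIS requirement; the function and variable names here are illustrative, not from the original slides:

```python
import numpy as np

def gis_train(F, y_true, n_iters=200, tol=1e-8):
    """Generalized Iterative Scaling for a conditional maxent model.

    F      : (n_samples, n_classes, n_feats) array of feature values
             f_k(x_i, y) >= 0, padded with a slack feature so every
             (sample, class) row sums to the same constant C.
    y_true : (n_samples,) gold class indices.
    Returns the weight vector of length n_feats.
    """
    n = F.shape[0]
    C = F.sum(axis=2).max()                      # GIS constant
    # Step 1: empirical expectation of each feature over the gold labels
    emp = F[np.arange(n), y_true].sum(axis=0) / n
    lam = np.zeros(F.shape[2])                   # Step 2: initial parameters
    for _ in range(n_iters):
        # Step 3: model expectation E_p[f_k] under the current weights
        scores = F @ lam                         # (n_samples, n_classes)
        scores -= scores.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(scores)
        p /= p.sum(axis=1, keepdims=True)        # p(y | x_i)
        model = np.einsum('ny,nyk->k', p, F) / n
        # Step 4: GIS update (multiplicative in probability space,
        # additive in log space)
        delta = np.log(np.maximum(emp, 1e-12) / np.maximum(model, 1e-12)) / C
        lam += delta
        if np.abs(delta).max() < tol:            # steps 3-4 until convergence
            break
    return lam
```

The update raises the weight of any feature the model under-predicts and lowers it where the model over-predicts; the slack feature guarantees the constant-sum condition that makes this simple closed-form update converge.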

33 Conditional Random Field

34 Conditional Random Field
- Definition (Lafferty et al., 2001): Let G = (V, E) be a graph such that Y = (Y_v), v ∈ V. When conditioned on X, the variables Y_v obey the Markov property with respect to the graph: p(Y_v | X, Y_w, w ≠ v) = p(Y_v | X, Y_w, w ∼ v), where w ∼ v means w and v are neighbors in G. Then (X, Y) is a conditional random field.

35 CRF Model Parameters
- The undirected graphical structure can be used to factorize the posterior probability into a normalized product of potential functions.
- Consider the graph as a linear-chain structure (factorization below).
- Model parameter set: the feature weights λ_k.
- Feature function set: {f_k}.
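For the linear chain, writing S for the label sequence, the standard factorization is:

p(S \mid X) = \frac{1}{Z(X)} \exp\Big( \sum_t \sum_k \lambda_k f_k(s_{t-1}, s_t, X, t) \Big), \qquad Z(X) = \sum_{S} \exp\Big( \sum_t \sum_k \lambda_k f_k(s_{t-1}, s_t, X, t) \Big)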

36 CRF Parameter Estimation
- We can rewrite and maximize the posterior probability over the training set.
- The log posterior probability is given below.
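With training pairs (x^{(i)}, s^{(i)}), the conditional log posterior being maximized has the standard form:

L(\Lambda) = \sum_i \Big[ \sum_t \sum_k \lambda_k f_k\big(s^{(i)}_{t-1}, s^{(i)}_t, x^{(i)}, t\big) - \log Z\big(x^{(i)}\big) \Big]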

37 Parameter Updating by the GIS Algorithm
- Differentiate the log posterior probability with respect to each parameter λ_k.
- Setting this derivative to zero yields the same expectation-matching constraint as in the maximum entropy model.
- The estimation has no closed-form solution, so the GIS algorithm can be used.
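The derivative takes the standard expectation-difference form, so the stationarity condition matches empirical and model expectations:

\frac{\partial L}{\partial \lambda_k} = E_{\tilde p}[f_k] - E_p[f_k] = 0 \;\;\Rightarrow\;\; E_{\tilde p}[f_k] = E_p[f_k]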

38 CRF vs. MEMM

Differences:
- Objective function: CRF maximizes the posterior probability with a Gibbs distribution; MEMM maximizes entropy under constraints.
- Normalization term and inference: CRF requires full dynamic programming to compute the normalization term; MEMM can use N-best or top-one inference.

Similarities:
- Feature functions: defined on state & observation pairs and state & state pairs.
- Parameters: weights of the feature functions.
- Distribution: Gibbs distribution.

39 Summary and Future Work
- We construct a complex CRF with cycles for better modeling of contextual dependency; a graphical-model algorithm is applied.
- In the future, a variational inference algorithm will be developed to improve the calculation of the conditional probability.
- The posterior probability can then be calculated directly by an approximating approach.
- Reference: Chih-Pin Liao and Jen-Tzung Chien, "Graphical modeling of conditional random fields for human motion recognition," Proc. IEEE ICASSP 2008, pp. 1969-1972.

40 Thanks for your attention and discussion.

