
1 Discriminative Training and Machine Learning Approaches
Machine Learning Lab, Dept. of CSIE, NCKU
Chih-Pin Liao

2 Discriminative Training

3 Our Concerns
- Feature extraction and HMM modeling should be jointly performed.
- A common objective function should be considered for both.
- To alleviate model confusion and improve recognition performance, we should estimate the HMM using a discriminative criterion built from statistical theory.
- Model parameters should be calculated rapidly, without applying a gradient-descent algorithm.

4 Minimum Classification Error (MCE)
- MCE is a popular discriminative training algorithm developed for speech recognition and extended to other pattern recognition applications.
- Rather than maximizing the likelihood of the observed data, MCE aims to directly minimize classification errors.
- A gradient-descent algorithm is used to estimate the HMM parameters.

5 MCE Training Procedure
Procedure for training discriminative models from observations X:
- Discriminant function
- Anti-discriminant function
- Misclassification measure
The standard definitions of these three quantities are sketched below.
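For reference, these quantities take the following standard forms in MCE training (Juang and Katagiri's formulation; the slide's exact notation may differ). For class j with parameter set \Lambda and M classes:

g_j(X; \Lambda) = \log p(X \mid \lambda_j)

G_j(X; \Lambda) = \log \Big[ \frac{1}{M-1} \sum_{k \ne j} \exp\big(\eta\, g_k(X; \Lambda)\big) \Big]^{1/\eta}

d_j(X; \Lambda) = -g_j(X; \Lambda) + G_j(X; \Lambda)

Here d_j > 0 indicates a misclassification of X from class j, and \eta controls how strongly the anti-discriminant G_j weights the most competitive rival classes.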

6 Expected Loss
- The loss function is obtained by mapping the misclassification measure into the range between zero and one through a sigmoid function.
- Minimizing the expected loss, i.e. the expected classification error, yields the discriminative model (standard forms below).
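In the standard MCE formulation, with \gamma and \theta the sigmoid slope and shift, the per-class loss and the expected loss are:

\ell_j(d_j) = \frac{1}{1 + \exp(-\gamma\, d_j + \theta)}

L(\Lambda) = E_X\Big[ \sum_j \ell_j\big(d_j(X; \Lambda)\big)\, \mathbb{1}(X \in C_j) \Big]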

7 Hypothesis Test

8 Likelihood Ratio Test
- A new training criterion is derived from hypothesis test theory.
- We test a null hypothesis against an alternative hypothesis.
- The optimal solution is obtained by a likelihood ratio test, according to the Neyman-Pearson lemma (see below).
- A higher likelihood ratio implies stronger confidence in accepting the null hypothesis.
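Concretely, the Neyman-Pearson lemma states that the most powerful test of H_0 against H_1 at a given significance level thresholds the likelihood ratio:

\mathrm{LR}(X) = \frac{p(X \mid H_0)}{p(X \mid H_1)}, \qquad \text{accept } H_0 \text{ if } \mathrm{LR}(X) \ge \tau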

9 Hypotheses in HMM Training
- Null and alternative hypotheses:
  H0: observations X are drawn from the target HMM state j
  H1: observations X are not drawn from the target HMM state j
- We develop discriminative HMM parameters for the target state against the non-target states.
- The problem reduces to verifying the goodness of the alignment of the data to the corresponding HMM states.

10 Maximum Confidence Hidden Markov Model

11 Maximum Confidence HMM
- The MCHMM is estimated by maximizing the log likelihood ratio, i.e. the confidence measure, where the parameter set consists of the HMM parameters and a transformation matrix for discriminative feature extraction.
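As a sketch of the criterion, following the cited TPAMI paper, with \Lambda = \{\lambda, W\} collecting the HMM parameters \lambda and the feature transformation matrix W (the paper's exact notation may differ):

\Lambda_{MC} = \arg\max_{\Lambda} \log \frac{p(WX \mid H_0; \lambda)}{p(WX \mid H_1; \lambda)}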

12 Hybrid Parameter Estimation
- The expectation-maximization (EM) algorithm is applied to tackle the missing-data problem in maximum confidence estimation.
- E-step: compute the expectation function over the hidden state sequences (next slide).
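A minimal sketch of the E-step, assuming the usual EM treatment with the hidden state sequence S as the missing data; for a log-likelihood-ratio objective the auxiliary function is a difference of two conditional expectations of the complete-data log likelihood (the exact form is given in the cited paper):

Q(\Lambda' \mid \Lambda) = E_{S \mid X, \Lambda}\big[\log p(X, S \mid H_0; \Lambda')\big] - E_{S \mid X, \Lambda}\big[\log p(X, S \mid H_1; \Lambda')\big]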

13 Expectation Function

14 MC Estimates of HMM Parameters

15 MC Estimates of HMM Parameters (cont.)

16 MC Estimate of Transformation Matrix


18 MC Classification Rule
- Let Y denote an input test image. We apply the same maximum confidence criterion to identify the most likely category for Y.
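As a sketch, assuming the confidence measure from slide 11 with class-wise parameter sets \Lambda_c, the decision rule picks the class with the highest log likelihood ratio:

\hat{c} = \arg\max_{c} \log \frac{p(Y \mid H_0; \Lambda_c)}{p(Y \mid H_1; \Lambda_c)}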

19 Summary
- A new maximum confidence HMM framework was proposed.
- The hypothesis test principle was used to build the training criterion.
- Discriminative feature extraction and HMM modeling were performed under the same criterion.
- Reference: Jen-Tzung Chien and Chih-Pin Liao, "Maximum Confidence Hidden Markov Modeling for Face Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 4, pp. 606-616, April 2008.

20 Machine Learning Approaches

21 Introduction
- Conditional random fields (CRFs):
  - relax the usual conditional independence assumption of the likelihood model
  - enforce the homogeneity of the labeling variables conditioned on the observation
- Due to the CRF model's weak assumptions and discriminative nature, it:
  - allows arbitrary relationships among the data
  - may require fewer resources to train its parameters

22 CRF models have shown better performance than hidden Markov models (HMMs) and maximum entropy Markov models (MEMMs) in:
- language and text processing
- object recognition
- image and video segmentation
- tracking in video sequences

23 Generative & Discriminative Models

24 Two Classes of Models
- Generative model (HMM): models the joint distribution of states and observations.
- Direct model (MEMM and CRF): models the posterior probability directly.
[Figure: graph structures of the MEMM and the CRF]

25 Comparison of the Two Kinds of Model
- Generative model (HMM):
  - uses a Bayes' rule approximation
  - assumes that observations are independent
  - multiple overlapping features are not modeled
  - the best state sequence is found with the recursive Viterbi algorithm

26 
- Direct model (MEMM and CRF):
  - models the posterior probability directly
  - dependencies among observations are flexibly modeled
  - the best state sequence is likewise found with the recursive Viterbi algorithm
The factorizations of the three models are contrasted below.
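For a state sequence S = (s_1, \dots, s_T) and observations X = (x_1, \dots, x_T), the three models factor as follows (standard forms; f_k are feature functions and \lambda_k their weights):

HMM (generative, joint): \quad p(X, S) = \prod_t p(s_t \mid s_{t-1})\, p(x_t \mid s_t)

MEMM (direct, locally normalized): \quad p(S \mid X) = \prod_t p(s_t \mid s_{t-1}, x_t)

CRF (direct, globally normalized): \quad p(S \mid X) = \frac{1}{Z(X)} \exp\Big( \sum_t \sum_k \lambda_k f_k(s_{t-1}, s_t, X, t) \Big)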

27 Hidden Markov Model & Maximum Entropy Markov Model

28 HMM for Human Motion Recognition
- The HMM is defined by:
  - the transition probability
  - the observation probability
(standard definitions below)
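In standard notation, for states i, j and observation x_t:

a_{ij} = P(s_t = j \mid s_{t-1} = i) \quad \text{(transition probability)}

b_j(x_t) = p(x_t \mid s_t = j) \quad \text{(observation probability)}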

29 Maximum Entropy Markov Model
- The MEMM is defined by a single conditional distribution p(s_t | s_{t-1}, x_t), which replaces the separate transition and observation probabilities of the HMM model.

30 Maximum Entropy Criterion
- Feature functions are defined over state pairs and observations.
- Training is a constrained optimization problem: maximize the entropy of the model subject to the constraint that, for every feature, the empirical expectation equals the model expectation (see below).
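In the standard maximum entropy setup, with binary feature functions f_k(s', s, x), the constrained problem reads:

\max_p H(p) \quad \text{s.t.} \quad E_{\tilde p}[f_k] = E_p[f_k] \;\; \forall k

where E_{\tilde p}[f_k] is the empirical expectation counted on the training data and E_p[f_k] is the expectation under the model.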

31 Solution of MEMM
- Lagrange multipliers are used for the constrained optimization, where the multipliers λ_k are the model parameters.
- The solution is the exponential (log-linear) form shown below.
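Solving the Lagrangian gives the standard log-linear solution, normalized locally per source state s' and observation x:

p(s \mid s', x) = \frac{1}{Z(x, s')} \exp\Big( \sum_k \lambda_k f_k(x, s) \Big), \qquad Z(x, s') = \sum_{s} \exp\Big( \sum_k \lambda_k f_k(x, s) \Big)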

32 GIS Algorithm
- Optimizes the maximum mutual information (MMI) criterion.
- Step 1: Calculate the empirical expectation.
- Step 2: Start from an initial parameter value.
- Step 3: Calculate the model expectation.
- Step 4: Update the model parameters.
- Repeat steps 3 and 4 until convergence.
A runnable sketch of these steps follows.
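A minimal Python sketch of the four GIS steps for a conditional maximum entropy classifier. It assumes non-negative features padded with a slack feature so that every (sample, class) row sums to the same constant C, which is the classical GIS requirement; the function and variable names here are illustrative, not from the original slides:

```python
import numpy as np

def gis_train(F, y_true, n_iters=200, tol=1e-8):
    """Generalized Iterative Scaling for a conditional maxent model.

    F      : (n_samples, n_classes, n_feats) array of feature values
             f_k(x_i, y) >= 0, padded with a slack feature so every
             (sample, class) row sums to the same constant C.
    y_true : (n_samples,) gold class indices.
    Returns the weight vector of length n_feats.
    """
    n = F.shape[0]
    C = F.sum(axis=2).max()                      # GIS constant
    # Step 1: empirical expectation of each feature over the gold labels
    emp = F[np.arange(n), y_true].sum(axis=0) / n
    lam = np.zeros(F.shape[2])                   # Step 2: initial parameters
    for _ in range(n_iters):
        # Step 3: model expectation E_p[f_k] under the current weights
        scores = F @ lam                         # (n_samples, n_classes)
        scores -= scores.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(scores)
        p /= p.sum(axis=1, keepdims=True)        # p(y | x_i)
        model = np.einsum('ny,nyk->k', p, F) / n
        # Step 4: GIS update (multiplicative in probability space,
        # additive in log space)
        delta = np.log(np.maximum(emp, 1e-12) / np.maximum(model, 1e-12)) / C
        lam += delta
        if np.abs(delta).max() < tol:            # steps 3-4 until convergence
            break
    return lam
```

The update raises the weight of any feature the model under-predicts and lowers it where the model over-predicts; the slack feature guarantees the constant-sum condition that makes this simple closed-form update converge.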

33 Conditional Random Field

34 Conditional Random Field
- Definition (Lafferty et al., 2001): Let G = (V, E) be a graph such that Y = (Y_v), v ∈ V. When conditioned on X, the variables Y_v obey the Markov property with respect to the graph: p(Y_v | X, Y_w, w ≠ v) = p(Y_v | X, Y_w, w ∼ v), where w ∼ v means w and v are neighbors in G. Then (X, Y) is a conditional random field.

35 CRF Model Parameters
- The undirected graphical structure can be used to factorize the posterior probability into a normalized product of potential functions.
- Consider the graph as a linear-chain structure (factorization below).
- Model parameter set: the feature weights λ_k.
- Feature function set: {f_k}.
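For the linear chain, writing S for the label sequence, the standard factorization is:

p(S \mid X) = \frac{1}{Z(X)} \exp\Big( \sum_t \sum_k \lambda_k f_k(s_{t-1}, s_t, X, t) \Big), \qquad Z(X) = \sum_{S} \exp\Big( \sum_t \sum_k \lambda_k f_k(s_{t-1}, s_t, X, t) \Big)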

36 CRF Parameter Estimation
- We can rewrite and maximize the posterior probability over the training set.
- The log posterior probability is given below.
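With training pairs (x^{(i)}, s^{(i)}), the conditional log posterior being maximized has the standard form:

L(\Lambda) = \sum_i \Big[ \sum_t \sum_k \lambda_k f_k\big(s^{(i)}_{t-1}, s^{(i)}_t, x^{(i)}, t\big) - \log Z\big(x^{(i)}\big) \Big]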

37 Parameter Updating by the GIS Algorithm
- Differentiate the log posterior probability with respect to each parameter λ_k.
- Setting this derivative to zero yields the same expectation-matching constraint as in the maximum entropy model.
- The estimation has no closed-form solution, so the GIS algorithm can be used.
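The derivative takes the standard expectation-difference form, so the stationarity condition matches empirical and model expectations:

\frac{\partial L}{\partial \lambda_k} = E_{\tilde p}[f_k] - E_p[f_k] = 0 \;\;\Rightarrow\;\; E_{\tilde p}[f_k] = E_p[f_k]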

38 CRF vs. MEMM

Differences:
- Objective function: CRF maximizes the posterior probability with a Gibbs distribution; MEMM maximizes entropy under constraints.
- Normalization term and inference: CRF requires full dynamic programming to compute the normalization term; MEMM can use N-best or top-one inference.

Similarities:
- Feature functions: defined on state & observation pairs and state & state pairs.
- Parameters: weights of the feature functions.
- Distribution: Gibbs distribution.

39 Summary and Future Work
- We construct a complex CRF with cycles for better modeling of contextual dependency; a graphical-model algorithm is applied.
- In the future, a variational inference algorithm will be developed to improve the calculation of the conditional probability.
- The posterior probability can then be calculated directly by an approximating approach.
- Reference: Chih-Pin Liao and Jen-Tzung Chien, "Graphical modeling of conditional random fields for human motion recognition," Proc. IEEE ICASSP 2008, pp. 1969-1972.

40 Thanks for your attention and discussion.

