
Slide 1: 240-650 Principles of Pattern Recognition
Chapter 3: Maximum-Likelihood and Bayesian Parameter Estimation
Montri Karnjanadecha, montri@coe.psu.ac.th, http://fivedots.coe.psu.ac.th/~montri

Slide 2: Chapter 3: Maximum-Likelihood and Bayesian Parameter Estimation

Slide 3: Introduction
We could design an optimal classifier if we knew the priors $P(\omega_i)$ and the class-conditional densities $p(\mathbf{x}|\omega_i)$.
We rarely have such complete knowledge about the probabilistic structure of the problem.
Instead, we often estimate $P(\omega_i)$ and $p(\mathbf{x}|\omega_i)$ from training data (design samples).
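As an illustration of this plug-in approach (not from the slides), here is a minimal sketch that estimates $P(\omega_i)$ from class frequencies, fits a Gaussian to each class as its $p(\mathbf{x}|\omega_i)$, and classifies by the largest $p(\mathbf{x}|\omega_i)P(\omega_i)$; the data arrays and the Gaussian model choice are assumptions for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian_bayes(X, y):
    """Estimate P(w_i) from class frequencies and model p(x|w_i)
    as a Gaussian fitted to each class's samples (hypothetical X, y)."""
    classes = np.unique(y)
    priors, densities = {}, {}
    for c in classes:
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)          # P(w_i): relative frequency
        densities[c] = multivariate_normal(Xc.mean(axis=0), np.cov(Xc.T))
    return classes, priors, densities

def classify(x, classes, priors, densities):
    # Assign x to the class maximizing p(x|w_i) * P(w_i)
    scores = [densities[c].pdf(x) * priors[c] for c in classes]
    return classes[int(np.argmax(scores))]
```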

Slide 4: Maximum-Likelihood Estimation
ML estimates nearly always have good convergence properties as the number of training samples increases.
ML estimation is often simpler than alternative methods.

Slide 5: The General Principle
Suppose we separate a collection of samples according to class, so that we have c data sets $D_1, \dots, D_c$, with the samples in $D_j$ drawn independently according to the probability law $p(\mathbf{x}|\omega_j)$.
We say such samples are i.i.d.: independent and identically distributed random variables.

Slide 6: The General Principle
We assume that $p(\mathbf{x}|\omega_j)$ has a known parametric form and is determined uniquely by the value of a parameter vector $\boldsymbol{\theta}_j$.
For example, $p(\mathbf{x}|\omega_j) \sim N(\boldsymbol{\mu}_j, \Sigma_j)$, where $\boldsymbol{\theta}_j$ consists of the components of $\boldsymbol{\mu}_j$ and $\Sigma_j$.
To show this dependence explicitly, we write $p(\mathbf{x}|\omega_j)$ as $p(\mathbf{x}|\omega_j, \boldsymbol{\theta}_j)$.

Slide 7: Problem Statement
Use the information provided by the training samples to obtain good estimates for the unknown parameter vectors $\boldsymbol{\theta}_1, \dots, \boldsymbol{\theta}_c$ associated with each category.

Slide 8: Simplified Problem Statement
If samples in $D_i$ give no information about $\boldsymbol{\theta}_j$ for $i \neq j$, we have c separate problems of the following form:
Use a set D of training samples drawn independently from the probability density $p(\mathbf{x}|\boldsymbol{\theta})$ to estimate the unknown parameter vector $\boldsymbol{\theta}$.

Slide 9
Suppose that D contains n samples, $\mathbf{x}_1, \dots, \mathbf{x}_n$. Because the samples are drawn independently, we have
$$p(D|\boldsymbol{\theta}) = \prod_{k=1}^{n} p(\mathbf{x}_k|\boldsymbol{\theta})$$
Viewed as a function of $\boldsymbol{\theta}$, $p(D|\boldsymbol{\theta})$ is called the likelihood of $\boldsymbol{\theta}$ with respect to the set of samples.
The maximum-likelihood estimate of $\boldsymbol{\theta}$ is the value $\hat{\boldsymbol{\theta}}$ that maximizes $p(D|\boldsymbol{\theta})$.
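A direct, if numerically fragile, reading of this definition in code; the univariate Gaussian with known $\sigma = 1$ and the sample values are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

def likelihood(D, theta):
    # p(D|theta) as a product of per-sample densities; here p(x|theta)
    # is taken to be a univariate Gaussian with known sigma = 1
    return np.prod(norm.pdf(D, loc=theta, scale=1.0))

D = np.array([1.2, 0.8, 1.5, 0.9])   # hypothetical i.i.d. samples
print(likelihood(D, theta=1.0))
```

For large n this product underflows, which is one practical reason the following slides switch to the log-likelihood.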

Slide 10: [figure only; no text on this slide]

Slide 11
Let $\boldsymbol{\theta} = (\theta_1, \dots, \theta_p)^t$, and let $\nabla_{\boldsymbol{\theta}}$ be the gradient operator
$$\nabla_{\boldsymbol{\theta}} = \left[ \frac{\partial}{\partial\theta_1}, \dots, \frac{\partial}{\partial\theta_p} \right]^t$$

Slide 12: Log-Likelihood Function
We define $l(\boldsymbol{\theta})$ as the log-likelihood function:
$$l(\boldsymbol{\theta}) = \ln p(D|\boldsymbol{\theta})$$
We can write our solution as
$$\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} \, l(\boldsymbol{\theta})$$

Slide 13: MLE
From $p(D|\boldsymbol{\theta}) = \prod_{k=1}^{n} p(\mathbf{x}_k|\boldsymbol{\theta})$ we have
$$l(\boldsymbol{\theta}) = \sum_{k=1}^{n} \ln p(\mathbf{x}_k|\boldsymbol{\theta})$$
and
$$\nabla_{\boldsymbol{\theta}} l = \sum_{k=1}^{n} \nabla_{\boldsymbol{\theta}} \ln p(\mathbf{x}_k|\boldsymbol{\theta})$$
A necessary condition for the MLE is $\nabla_{\boldsymbol{\theta}} l = 0$.
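When the condition $\nabla_{\boldsymbol{\theta}} l = 0$ has no convenient closed form, the maximizer can also be located numerically. A minimal sketch, assuming a univariate Gaussian model and hypothetical data (not part of the slides):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

D = np.array([1.2, 0.8, 1.5, 0.9])         # hypothetical samples

def neg_log_likelihood(theta):
    mu, log_sigma = theta                  # optimize log(sigma) to keep sigma > 0
    return -np.sum(norm.logpdf(D, loc=mu, scale=np.exp(log_sigma)))

res = minimize(neg_log_likelihood, x0=[0.0, 0.0])
mu_hat, sigma2_hat = res.x[0], np.exp(res.x[1]) ** 2
# mu_hat matches D.mean(); sigma2_hat matches D.var() (the biased MLE)
```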

Slide 14: The Gaussian Case: Unknown μ
Suppose that the samples are drawn from a multivariate normal population with mean $\boldsymbol{\mu}$ and covariance matrix $\Sigma$, where $\boldsymbol{\mu}$ is the only unknown.
Consider a sample point $\mathbf{x}_k$ and find
$$\ln p(\mathbf{x}_k|\boldsymbol{\mu}) = -\frac{1}{2}\ln\left[(2\pi)^d |\Sigma|\right] - \frac{1}{2}(\mathbf{x}_k - \boldsymbol{\mu})^t \Sigma^{-1} (\mathbf{x}_k - \boldsymbol{\mu})$$
and
$$\nabla_{\boldsymbol{\mu}} \ln p(\mathbf{x}_k|\boldsymbol{\mu}) = \Sigma^{-1}(\mathbf{x}_k - \boldsymbol{\mu})$$

Slide 15
The MLE of $\boldsymbol{\mu}$ must satisfy
$$\sum_{k=1}^{n} \Sigma^{-1}(\mathbf{x}_k - \hat{\boldsymbol{\mu}}) = 0$$
After multiplying by $\Sigma$ and rearranging,
$$\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{k=1}^{n} \mathbf{x}_k$$

Slide 16: Sample Mean
The MLE for the unknown population mean is just the arithmetic average of the training samples, i.e., the sample mean.
If we think of the n samples as a cloud of points, the sample mean is the centroid of the cloud.
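In code the estimate is a one-liner (sketch with hypothetical data):

```python
import numpy as np

X = np.random.randn(100, 3) + np.array([1.0, 2.0, 3.0])  # hypothetical samples, d = 3
mu_hat = X.mean(axis=0)   # sample mean: centroid of the cloud of points
```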

Slide 17: The Gaussian Case: Unknown μ and Σ
In the more typical case, both the mean and the covariance matrix are unknown.
Consider first the univariate case with $\theta_1 = \mu$ and $\theta_2 = \sigma^2$, so that the log-likelihood of a single point is
$$\ln p(x_k|\boldsymbol{\theta}) = -\frac{1}{2}\ln 2\pi\theta_2 - \frac{(x_k - \theta_1)^2}{2\theta_2}$$

Slide 18
Its derivative is
$$\nabla_{\boldsymbol{\theta}} \ln p(x_k|\boldsymbol{\theta}) = \begin{bmatrix} \dfrac{1}{\theta_2}(x_k - \theta_1) \\[6pt] -\dfrac{1}{2\theta_2} + \dfrac{(x_k - \theta_1)^2}{2\theta_2^2} \end{bmatrix}$$
Setting the summed gradient to 0 gives the conditions
$$\sum_{k=1}^{n} \frac{1}{\hat{\theta}_2}(x_k - \hat{\theta}_1) = 0 \quad\text{and}\quad -\sum_{k=1}^{n} \frac{1}{\hat{\theta}_2} + \sum_{k=1}^{n} \frac{(x_k - \hat{\theta}_1)^2}{\hat{\theta}_2^2} = 0$$

Slide 19
With a little rearranging, we have
$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k \quad\text{and}\quad \hat{\sigma}^2 = \frac{1}{n}\sum_{k=1}^{n} (x_k - \hat{\mu})^2$$

Slide 20: MLE for the Multivariate Case
$$\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{k=1}^{n} \mathbf{x}_k \quad\text{and}\quad \hat{\Sigma} = \frac{1}{n}\sum_{k=1}^{n} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^t$$
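A minimal numpy sketch of these two estimates, with hypothetical data:

```python
import numpy as np

X = np.random.randn(500, 2)               # hypothetical samples, d = 2
mu_hat = X.mean(axis=0)                    # MLE of the mean
diff = X - mu_hat
Sigma_hat = diff.T @ diff / len(X)         # MLE of the covariance: divides by n
# np.cov(X.T, bias=True) computes the same n-denominator estimate
```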

Slide 21: Bias
The MLE for the variance $\sigma^2$ is biased: the expected value of the sample variance over all data sets of size n is not equal to the true variance,
$$E\left[\hat{\sigma}^2\right] = \frac{n-1}{n}\sigma^2 \neq \sigma^2$$
An unbiased estimator for $\Sigma$ is given by the sample covariance matrix
$$C = \frac{1}{n-1}\sum_{k=1}^{n} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^t$$
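A quick simulation makes the bias visible (sketch; the sample size and true variance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, true_var = 5, 100_000, 4.0
samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))
print(samples.var(axis=1, ddof=0).mean())  # biased MLE: about (n-1)/n * 4 = 3.2
print(samples.var(axis=1, ddof=1).mean())  # unbiased (n-1 denominator): about 4.0
```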

