
1 Pattern Classification
All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

2 Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)
• Introduction
• Maximum-Likelihood Estimation
• Example of a Specific Case
• The Gaussian Case: unknown μ and Σ
• Bias
• Appendix: ML Problem Statement

3 Introduction
• Data availability in a Bayesian framework
• We could design an optimal classifier if we knew:
  • P(ω_i) (priors)
  • P(x | ω_i) (class-conditional densities)
  Unfortunately, we rarely have this complete information!
• Design a classifier from a training sample
  • No problem with prior estimation
  • Samples are often too small for class-conditional estimation (large dimension of feature space!)
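As a reminder of why those two quantities suffice, here is a minimal sketch of the Bayes classification rule P(ω_i | x) ∝ P(x | ω_i) P(ω_i). The priors, means, and variances below are made up for illustration; in the rest of the chapter they must be estimated from data.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical two-class, 1-D problem: priors and class-conditional
# densities are assumed known here (the rest of the chapter is about
# what to do when they are not).
priors = np.array([0.6, 0.4])                  # P(w_1), P(w_2)
class_conditionals = [norm(0.0, 1.0),          # P(x | w_1)
                      norm(2.0, 1.0)]          # P(x | w_2)

def classify(x):
    # Bayes rule: pick the class maximizing P(x | w_i) * P(w_i);
    # the evidence P(x) is a common factor and can be ignored.
    posteriors = [p.pdf(x) * prior for p, prior in zip(class_conditionals, priors)]
    return int(np.argmax(posteriors)) + 1      # class label 1 or 2

print(classify(0.3))  # -> 1
print(classify(1.8))  # -> 2
```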

4 A priori information about the problem
• Normality of P(x | ω_i): P(x | ω_i) ~ N(μ_i, Σ_i)
  • Characterized by 2 parameters
• Estimation techniques
  • Maximum-Likelihood (ML) and Bayesian estimation
  • Results are nearly identical, but the approaches are different

5 • Parameters in ML estimation are fixed but unknown!
• The best parameters are obtained by maximizing the probability of obtaining the samples observed
• Bayesian methods view the parameters as random variables having some known distribution
• In either approach, we use P(ω_i | x) for our classification rule!

6 Maximum-Likelihood Estimation
• Has good convergence properties as the sample size increases
• Simpler than many alternative techniques
• General principle: assume we have c classes and P(x | ω_j) ~ N(μ_j, Σ_j). Write P(x | ω_j) ≡ P(x | ω_j, θ_j), where θ_j = (μ_j, Σ_j) collects the unknown parameters of class j.

7 • Use the information provided by the training samples to estimate θ = (θ_1, θ_2, …, θ_c), where each θ_i (i = 1, 2, …, c) is associated with one category
• Suppose that D contains n samples: x_1, x_2, …, x_n
• The ML estimate of θ is, by definition, the value θ̂ that maximizes P(D | θ):
  "It is the value of θ that best agrees with the actually observed training sample."
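A minimal sketch of what "maximize P(D | θ)" means, assuming 1-D Gaussian samples with known unit variance and a brute-force grid over candidate means (the synthetic data and grid are ours; the closed-form solution follows on the next slides):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
D = rng.normal(loc=1.5, scale=1.0, size=50)    # training samples, true mu = 1.5

# Evaluate the log of P(D | theta) on a grid of candidate means
# (variance assumed known and equal to 1 for this sketch).
thetas = np.linspace(-3, 5, 801)
log_lik = np.array([norm.logpdf(D, loc=t, scale=1.0).sum() for t in thetas])

theta_hat = thetas[np.argmax(log_lik)]
print(theta_hat, D.mean())  # the grid maximizer is close to the sample mean
```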

8 (figure-only slide: no transcript text)

9 Optimal estimation
• Let θ = (θ_1, θ_2, …, θ_p)^t and let ∇_θ = (∂/∂θ_1, …, ∂/∂θ_p)^t be the gradient operator
• We define l(θ) as the log-likelihood function: l(θ) = ln P(D | θ)
• New problem statement: determine the θ that maximizes the log-likelihood, θ̂ = arg max_θ l(θ)
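In practice this maximization can also be done numerically. A sketch, assuming synthetic 1-D Gaussian data; the function name `neg_log_likelihood` and the log-sigma parameterization (used to keep σ positive) are our choices, not the book's:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
D = rng.normal(loc=1.0, scale=2.0, size=200)

def neg_log_likelihood(theta):
    mu, log_sigma = theta                       # optimize log(sigma) so sigma > 0
    return -np.sum(norm.logpdf(D, loc=mu, scale=np.exp(log_sigma)))

# Minimizing -l(theta) is the same as maximizing l(theta)
res = minimize(neg_log_likelihood, x0=[0.0, 0.0])
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)   # close to the closed-form answers derived below
```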

10 The set of necessary conditions for an optimum is (using the independence of the samples, so that l(θ) = ∑_{k=1}^n ln P(x_k | θ)):
∇_θ l = ∑_{k=1}^n ∇_θ ln P(x_k | θ) = 0
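A quick numerical check of this condition, assuming the same 1-D Gaussian model with known unit variance (synthetic data; the central-difference check is ours): at the sample mean the derivative of l with respect to μ vanishes.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
D = rng.normal(loc=2.0, scale=1.0, size=100)

def l(mu):
    # log-likelihood of the sample as a function of the mean
    return np.sum(norm.logpdf(D, loc=mu, scale=1.0))

mu_hat = D.mean()
h = 1e-5
grad_at_mu_hat = (l(mu_hat + h) - l(mu_hat - h)) / (2 * h)   # central difference
print(grad_at_mu_hat)   # ~ 0: the necessary condition holds at the sample mean
```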

11 Example of a specific case: unknown μ
• P(x_k | μ) ~ N(μ, Σ) (samples are drawn from a multivariate normal population); θ = μ, therefore:
  ln P(x_k | μ) = -(1/2) ln[(2π)^d |Σ|] - (1/2)(x_k - μ)^t Σ^{-1} (x_k - μ)
  ∇_μ ln P(x_k | μ) = Σ^{-1} (x_k - μ)
• The ML estimate for μ must satisfy:
  ∑_{k=1}^n Σ^{-1} (x_k - μ̂) = 0

12 Multiplying by Σ and rearranging, we obtain:
μ̂ = (1/n) ∑_{k=1}^n x_k
Just the arithmetic average of the training samples!
Conclusion: if P(x_k | ω_j) (j = 1, 2, …, c) is assumed to be Gaussian in a d-dimensional feature space, then we can estimate the vector θ = (θ_1, θ_2, …, θ_c)^t and perform an optimal classification!
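A sketch verifying this numerically for a 2-D Gaussian sample (the covariance matrix, mean, and seed are made up): the arithmetic average satisfies the condition from the previous slide.

```python
import numpy as np

rng = np.random.default_rng(4)
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
X = rng.multivariate_normal(mean=[1.0, -1.0], cov=Sigma, size=500)  # n x d

mu_hat = X.mean(axis=0)                       # ML estimate: arithmetic average

# Check the first-order condition: sum_k Sigma^{-1} (x_k - mu_hat) = 0
residual = np.linalg.inv(Sigma) @ (X - mu_hat).sum(axis=0)
print(mu_hat, residual)                       # residual is ~ 0 up to rounding
```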

13 ML Estimation: Gaussian case, unknown μ and σ (univariate)
θ = (θ_1, θ_2) = (μ, σ²); for a single sample x_k:
l = ln P(x_k | θ) = -(1/2) ln(2π θ_2) - (x_k - θ_1)² / (2 θ_2)
∇_θ l = ( ∂l/∂θ_1 , ∂l/∂θ_2 )^t = ( (1/θ_2)(x_k - θ_1) , -1/(2 θ_2) + (x_k - θ_1)² / (2 θ_2²) )^t

14 Summation over all n samples, setting each component of the gradient to zero, gives:
(1) ∑_{k=1}^n (1/σ̂²)(x_k - μ̂) = 0
(2) -∑_{k=1}^n 1/σ̂² + ∑_{k=1}^n (x_k - μ̂)² / σ̂⁴ = 0
Combining (1) and (2), one obtains:
μ̂ = (1/n) ∑_{k=1}^n x_k ;  σ̂² = (1/n) ∑_{k=1}^n (x_k - μ̂)²
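The two closed-form estimates in code, a minimal sketch for the univariate case with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(loc=1.0, scale=2.0, size=200)

mu_hat = x.mean()                          # (1/n) sum_k x_k
sigma2_hat = ((x - mu_hat) ** 2).mean()    # (1/n) sum_k (x_k - mu_hat)^2

print(mu_hat, sigma2_hat)                  # matches np.var(x), which divides by n
```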

15 Bias
• The ML estimate for σ² is biased:
  E[ (1/n) ∑_{i=1}^n (x_i - x̄)² ] = ((n-1)/n) σ² ≠ σ²
• An elementary unbiased estimator for Σ is:
  C = (1/(n-1)) ∑_{k=1}^n (x_k - μ̂)(x_k - μ̂)^t   (the sample covariance matrix)
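The bias is easy to see by simulation. A sketch, assuming a made-up σ² = 4 and a deliberately small n = 5 so the (n-1)/n factor is visible:

```python
import numpy as np

rng = np.random.default_rng(6)
n, sigma2, trials = 5, 4.0, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
ml_var = samples.var(axis=1, ddof=0)        # ML estimate: divides by n
unbiased_var = samples.var(axis=1, ddof=1)  # divides by n - 1

print(ml_var.mean())        # ~ (n-1)/n * sigma2 = 3.2 -> biased low
print(unbiased_var.mean())  # ~ sigma2 = 4.0
```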

16 Appendix: ML Problem Statement
• Let D = {x_1, x_2, …, x_n} with the samples drawn independently, so that
  P(x_1, …, x_n | θ) = ∏_{k=1}^n P(x_k | θ);  |D| = n
• Our goal is to determine θ̂, the value of θ that makes this sample the most representative!
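This product form is also why slide 9 works with the log-likelihood in practice: the product of many densities underflows in floating point, while the sum of logs stays finite. A sketch with synthetic data:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
D = rng.normal(size=1000)

theta = 0.0
likelihood = np.prod(norm.pdf(D, loc=theta))      # product of 1000 densities
log_likelihood = np.sum(norm.logpdf(D, loc=theta))

print(likelihood)       # underflows to 0.0 in floating point
print(log_likelihood)   # finite; same maximizer, numerically stable
```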

17 (Figure: the n training samples x_1, …, x_n of D are partitioned into class subsets D_1, …, D_k, …, D_c; the samples in each subset D_j are drawn from the class-conditional density P(x | ω_j) ~ N(μ_j, Σ_j).)

18 θ = (θ_1, θ_2, …, θ_c)
Problem: find θ̂ such that:
θ̂ = arg max_θ P(D | θ) = arg max_θ ∏_{k=1}^n P(x_k | θ)
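Putting the appendix together in code, a sketch assuming labeled 1-D training data for c = 2 classes (the subsets, means, and variances below are hypothetical): because each x_k in D_j gives information only about θ_j, each θ_j is estimated independently from its own subset D_j.

```python
import numpy as np

rng = np.random.default_rng(8)
# Hypothetical labeled training data for c = 2 classes
D1 = rng.normal(loc=0.0, scale=1.0, size=100)   # samples from class w_1
D2 = rng.normal(loc=3.0, scale=0.5, size=100)   # samples from class w_2

# ML estimates, computed per class: theta_hat_j = (mu_hat_j, sigma2_hat_j)
theta_hat = [(Dj.mean(), Dj.var(ddof=0)) for Dj in (D1, D2)]
print(theta_hat)
```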

