1
Pattern Classification. All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart, and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.
2
Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)
- Introduction
- Maximum-Likelihood Estimation
- Example of a Specific Case
- The Gaussian Case: unknown μ and σ
- Bias
- Appendix: ML Problem Statement
3
Introduction
- Data availability in a Bayesian framework: we could design an optimal classifier if we knew
  - P(ω_i) (the priors)
  - p(x | ω_i) (the class-conditional densities)
  Unfortunately, we rarely have this complete information!
- We must therefore design a classifier from a set of training samples.
  - Estimating the priors presents no real problem.
  - Training sets are often too small to estimate the class-conditional densities directly (the feature space may be of large dimension!).
4
A priori information about the problem
- Do we know something about the form of the distribution? If so, the problem reduces to finding the parameters that characterize it.
- Example: normality of p(x | ω_i), i.e. p(x | ω_i) ~ N(μ_i, Σ_i), which is characterized by the two parameters μ_i and Σ_i.
- Estimation techniques: Maximum-Likelihood (ML) and Bayesian estimation. The results are nearly identical, but the approaches are conceptually different.
5
- In ML estimation the parameters are viewed as fixed but unknown! The best parameter values are those that maximize the probability of obtaining the samples actually observed.
- Bayesian methods instead view the parameters as random variables having some known prior distribution.
- In either approach, we use the posteriors P(ω_i | x) for our classification rule!
6
Maximum-Likelihood Estimation
- Has good convergence properties as the sample size increases.
- Often simpler than alternative techniques.
- General principle: assume we have c classes with p(x | ω_j) ~ N(μ_j, Σ_j). We make the parameter dependence explicit by writing p(x | ω_j) ≡ p(x | ω_j, θ_j), where θ_j = (μ_j, Σ_j) collects the unknown parameters of class j.
7
- Use the information provided by the training samples to estimate θ = (θ_1, θ_2, …, θ_c), where each θ_i (i = 1, 2, …, c) is associated with one category.
- Suppose that D contains n samples, x_1, x_2, …, x_n.
- The ML estimate of θ is, by definition, the value θ̂ that maximizes P(D | θ): "it is the value of θ that best agrees with the actually observed training sample."
9
Optimal estimation
- Let θ = (θ_1, θ_2, …, θ_p)^t and let ∇_θ be the gradient operator
$$\nabla_\theta = \left[ \frac{\partial}{\partial\theta_1}, \frac{\partial}{\partial\theta_2}, \ldots, \frac{\partial}{\partial\theta_p} \right]^t.$$
- We define l(θ) as the log-likelihood function: l(θ) = ln P(D | θ) (recall that D is the training data).
- New problem statement: determine the θ̂ that maximizes the log-likelihood,
$$\hat\theta = \arg\max_\theta\, l(\theta).$$
10
The definition of l(θ) is
$$l(\theta) = \ln P(D \mid \theta) = \sum_{k=1}^{n} \ln P(x_k \mid \theta)$$
and
$$\nabla_\theta l = \sum_{k=1}^{n} \nabla_\theta \ln P(x_k \mid \theta).$$
The set of necessary conditions for an optimum is (a numerical sketch follows):
$$\nabla_\theta l = 0. \quad \text{(eq. 7)}$$
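As an editorial illustration (not from the slides), the sketch below solves ∇_θ l = 0 numerically by minimizing the negative log-likelihood of a univariate Gaussian. The synthetic data, the variance fixed at 1, and the use of scipy.optimize are all assumptions made for the example:

```python
# Minimal sketch: maximize l(theta) numerically by minimizing the negative
# log-likelihood of a univariate Gaussian whose variance is fixed at 1.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
D = rng.normal(loc=2.0, scale=1.0, size=100)  # synthetic training samples

def neg_log_likelihood(theta):
    mu = theta[0]
    # l(theta) = sum_k ln p(x_k | theta), with sigma^2 = 1 assumed known
    return -np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (D - mu) ** 2)

res = minimize(neg_log_likelihood, x0=[0.0])
print(res.x[0], D.mean())  # the numerical optimum matches the sample mean
```

The optimizer recovers the sample mean, consistent with the closed form derived on the next slides.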
11
Example: the Gaussian case with unknown μ
- We assume we know the covariance Σ.
- p(x_k | μ) ~ N(μ, Σ): the samples are drawn from a multivariate normal population. Here θ = μ, and
$$\ln p(x_k \mid \mu) = -\frac{1}{2}\ln\left[(2\pi)^d |\Sigma|\right] - \frac{1}{2}(x_k - \mu)^t \Sigma^{-1} (x_k - \mu),$$
therefore
$$\nabla_\mu \ln p(x_k \mid \mu) = \Sigma^{-1}(x_k - \mu).$$
The ML estimate for μ must satisfy:
$$\sum_{k=1}^{n} \Sigma^{-1}(x_k - \hat\mu) = 0.$$
12
Multiplying by Σ and rearranging, we obtain:
$$\hat\mu = \frac{1}{n}\sum_{k=1}^{n} x_k$$
— just the arithmetic average of the training samples! (A small numerical check follows.)
Conclusion: if p(x_k | ω_j) (j = 1, 2, …, c) is assumed to be Gaussian in a d-dimensional feature space, then we can estimate the parameter vector θ = (θ_1, θ_2, …, θ_c)^t and perform optimal classification!
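As a quick check of this result (an editorial addition; the data is synthetic and the covariance is treated as known, per the slide's assumption), the sample mean of Gaussian draws recovers μ:

```python
# Sketch: the ML estimate of mu for Gaussian samples is the sample mean.
import numpy as np

rng = np.random.default_rng(1)
mu_true = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])  # covariance, assumed known
X = rng.multivariate_normal(mu_true, Sigma, size=500)  # rows are x_k

mu_hat = X.mean(axis=0)  # (1/n) * sum_k x_k
print(mu_hat)            # close to mu_true = [1.0, -2.0]
```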
13
Example: Gaussian case with unknown μ and σ
- First consider the univariate case with both μ and σ² unknown: θ = (θ_1, θ_2) = (μ, σ²). Then
$$\ln p(x_k \mid \theta) = -\frac{1}{2}\ln(2\pi\theta_2) - \frac{(x_k - \theta_1)^2}{2\theta_2}$$
and
$$\nabla_\theta \ln p(x_k \mid \theta) = \begin{pmatrix} \dfrac{1}{\theta_2}(x_k - \theta_1) \\[2mm] -\dfrac{1}{2\theta_2} + \dfrac{(x_k - \theta_1)^2}{2\theta_2^2} \end{pmatrix}.$$
14
Summation over the training set yields the two conditions
$$\sum_{k=1}^{n} \frac{1}{\hat\theta_2}(x_k - \hat\theta_1) = 0 \qquad (1)$$
$$-\sum_{k=1}^{n} \frac{1}{\hat\theta_2} + \sum_{k=1}^{n} \frac{(x_k - \hat\theta_1)^2}{\hat\theta_2^2} = 0. \qquad (2)$$
Combining (1) and (2), one obtains (see the sketch below):
$$\hat\mu = \frac{1}{n}\sum_{k=1}^{n} x_k, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{k=1}^{n}(x_k - \hat\mu)^2.$$
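These closed forms are one line each in code. The sketch below (an editorial addition with synthetic data) computes both; note that the ML variance divides by n, not n − 1:

```python
# Sketch: closed-form ML estimates for a univariate Gaussian.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=1000)  # synthetic samples

mu_hat = x.mean()                        # theta_1-hat
sigma2_hat = np.mean((x - mu_hat) ** 2)  # theta_2-hat, divides by n
print(mu_hat, sigma2_hat)  # np.var(x) (default ddof=0) gives the same value
```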
15
- The ML estimates for the multivariate case are similar: the scalar mean μ is replaced by the mean vector, and the variance σ² is replaced by the covariance matrix Σ:
$$\hat\mu = \frac{1}{n}\sum_{k=1}^{n} x_k, \qquad \hat\Sigma = \frac{1}{n}\sum_{k=1}^{n}(x_k - \hat\mu)(x_k - \hat\mu)^t.$$
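A multivariate version of the same check (editorial, synthetic data); NumPy's np.cov with bias=True reproduces the divide-by-n ML estimate:

```python
# Sketch: ML mean vector and covariance matrix for multivariate Gaussian data.
import numpy as np

rng = np.random.default_rng(3)
mu_true = np.array([0.0, 1.0, -1.0])
Sigma_true = np.diag([1.0, 2.0, 0.5])
X = rng.multivariate_normal(mu_true, Sigma_true, size=2000)

mu_hat = X.mean(axis=0)
diff = X - mu_hat
Sigma_hat = diff.T @ diff / len(X)  # (1/n) sum_k (x_k - mu)(x_k - mu)^t
# np.cov(X, rowvar=False, bias=True) computes the same biased ML estimate
print(Sigma_hat.round(2))
```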
16
Bias
- The ML estimate for σ² is biased:
$$E\left[\hat\sigma^2\right] = E\left[\frac{1}{n}\sum_{k=1}^{n}(x_k - \bar x)^2\right] = \frac{n-1}{n}\,\sigma^2 \neq \sigma^2.$$
- Extreme case: for n = 1 the estimate is identically zero, so E[σ̂²] = 0 ≠ σ².
- As n increases the bias is reduced; this type of estimator is called asymptotically unbiased.
17
- An elementary unbiased estimator for Σ is:
$$C = \frac{1}{n-1}\sum_{k=1}^{n}(x_k - \hat\mu)(x_k - \hat\mu)^t \quad \text{(the sample covariance matrix)}.$$
This estimator is unbiased for all distributions; such estimators are called absolutely unbiased.
18
- Our earlier ML estimator for Σ,
$$\hat\Sigma = \frac{1}{n}\sum_{k=1}^{n}(x_k - \hat\mu)(x_k - \hat\mu)^t,$$
is biased; in fact it is asymptotically unbiased. Observe that
$$\hat\Sigma = \frac{n-1}{n}\, C,$$
so the two estimators are essentially identical when n is large, as the simulation below illustrates.
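An empirical illustration of the bias (editorial; the choice of n = 5 and σ² = 4 is invented for the example): averaging the ML variance over many trials lands near (n − 1)/n · σ², while the n − 1 version is unbiased:

```python
# Sketch: Monte Carlo check that E[sigma2_hat] = ((n-1)/n) * sigma^2.
import numpy as np

rng = np.random.default_rng(4)
n, sigma2, trials = 5, 4.0, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
ml_var = np.var(samples, axis=1)                # ML estimate, divides by n
unbiased_var = np.var(samples, axis=1, ddof=1)  # divides by n - 1

print(ml_var.mean())        # ~ (n-1)/n * 4.0 = 3.2 (biased low)
print(unbiased_var.mean())  # ~ 4.0 (unbiased)
```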
19
Appendix: ML Problem Statement
- Let D = {x_1, x_2, …, x_n} with |D| = n. Assuming the samples are drawn independently,
$$P(x_1, \ldots, x_n \mid \theta) = \prod_{k=1}^{n} P(x_k \mid \theta).$$
- Our goal is to determine θ̂, the value of θ that maximizes the likelihood of this sample set!
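Because a product of many densities underflows in floating point, implementations maximize the logarithm of this product instead. The sketch below (editorial; the grid search, synthetic data, and fixed unit variance are assumptions for illustration) makes the argmax explicit:

```python
# Sketch: theta-hat = argmax_theta prod_k p(x_k | theta), computed in log space.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
D = rng.normal(loc=3.0, scale=1.0, size=50)  # synthetic sample set

thetas = np.linspace(0.0, 6.0, 601)  # candidate means; variance fixed at 1
log_lik = np.array([norm.logpdf(D, loc=t, scale=1.0).sum() for t in thetas])

theta_hat = thetas[np.argmax(log_lik)]
print(theta_hat, D.mean())  # grid argmax is close to the sample mean
```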
20
[Figure: the n samples x_1, x_2, …, x_n of D are partitioned into class training sets D_1, …, D_k, …, D_c, each governed by its own class-conditional density p(x_j | θ_k), with N(μ_j, Σ_j) as the parametric form.]
21
θ̂ = (θ̂_1, θ̂_2, …, θ̂_c)
Problem: find θ̂ such that
$$P(D \mid \hat\theta) = \max_\theta P(D \mid \theta).$$