Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000 with the permission of the authors and the publisher

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)
- Introduction
- Maximum-Likelihood Estimation
- Example of a Specific Case
- The Gaussian Case: unknown μ and σ
- Bias
- Appendix: ML Problem Statement

Introduction
- Data availability in a Bayesian framework
  - We could design an optimal classifier if we knew:
    - P(ωi) (the priors)
    - P(x | ωi) (the class-conditional densities)
  - Unfortunately, we rarely have this complete information!
- Designing a classifier from a training sample
  - Estimating the priors is no problem
  - Samples are often too small to estimate the class-conditional densities (large dimension of the feature space!)

A priori information about the problem
- Do we know something about the distribution?
  - If so, find parameters that characterize the distribution
  - Example: normality of P(x | ωi), i.e. P(x | ωi) ~ N(μi, Σi), characterized by the two parameters μi and Σi
- Estimation techniques
  - Maximum-Likelihood (ML) estimation and Bayesian estimation
  - The results are nearly identical, but the approaches are different

- In ML estimation, the parameters are fixed but unknown! The best parameters are obtained by maximizing the probability of obtaining the samples observed.
- Bayesian methods view the parameters as random variables having some known prior distribution.
- In either approach, we use P(ωi | x) for our classification rule!

Maximum-Likelihood Estimation
- Has good convergence properties as the sample size increases
- Is simpler than alternative techniques
- General principle: assume we have c classes and P(x | ωj) ~ N(μj, Σj), i.e. P(x | ωj) ≡ P(x | ωj, θj), where θj = (μj, Σj) collects the unknown mean and covariance parameters of class ωj.
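As a concrete sketch (illustrative code, not taken from the slides), the univariate normal density that plays the role of a class-conditional P(x | ωj, θj) can be evaluated directly from its two parameters:

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Density of a standard normal at its mean is 1/sqrt(2*pi) ≈ 0.3989
print(round(normal_pdf(0.0, 0.0, 1.0), 4))
```

In the multivariate case the same role is played by N(μj, Σj) with a mean vector and covariance matrix in place of the two scalars.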

- Use the information provided by the training samples to estimate θ = (θ1, θ2, …, θc), where each θi (i = 1, 2, …, c) is associated with one category.
- Suppose that D contains n samples, x1, x2, …, xn, drawn independently.
- The ML estimate of θ is, by definition, the value θ̂ that maximizes P(D | θ): "it is the value of θ that best agrees with the actually observed training sample."
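The phrase "the value of θ that best agrees with the observed sample" can be illustrated with a minimal sketch (hypothetical data, known σ = 1, unknown mean): the likelihood P(D | μ) is larger at the sample mean than at other candidate values.

```python
import math

def likelihood(data, mu, sigma=1.0):
    """P(D | mu): product of univariate normal densities over the sample set."""
    p = 1.0
    for x in data:
        p *= math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return p

D = [1.8, 2.1, 2.4, 1.9, 2.3]
mu_hat = sum(D) / len(D)  # sample mean = 2.1
# The likelihood peaks at the sample mean:
assert likelihood(D, mu_hat) > likelihood(D, 1.0)
assert likelihood(D, mu_hat) > likelihood(D, 3.0)
```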

Optimal estimation
- Let θ = (θ1, θ2, …, θp)^t and let ∇θ = (∂/∂θ1, …, ∂/∂θp)^t be the gradient operator.
- We define l(θ) as the log-likelihood function: l(θ) = ln P(D | θ) (recall that D is the training data).
- New problem statement: determine the θ that maximizes the log-likelihood, θ̂ = arg max_θ l(θ).

The definition of l(θ), together with the independence of the samples, gives:

  l(θ) = ln P(D | θ) = Σ_{k=1..n} ln P(xk | θ)

and

  ∇θ l = Σ_{k=1..n} ∇θ ln P(xk | θ)

The set of necessary conditions for an optimum is:

  ∇θ l = 0   (eq. 7)
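The necessary condition ∇θ l = 0 can be checked numerically for the Gaussian mean; in this illustrative sketch (hypothetical data, known σ² = 1) the gradient Σ_k (xk − μ)/σ² vanishes exactly at the sample mean:

```python
def dl_dmu(data, mu, sigma2=1.0):
    """Gradient of the Gaussian log-likelihood w.r.t. mu: sum_k (x_k - mu) / sigma^2."""
    return sum(x - mu for x in data) / sigma2

D = [1.8, 2.1, 2.4, 1.9, 2.3]
mu_hat = sum(D) / len(D)
# The necessary condition grad l = 0 holds at the sample mean:
assert abs(dl_dmu(D, mu_hat)) < 1e-9
# Below mu_hat the gradient is positive, pushing the estimate upward:
assert dl_dmu(D, 1.0) > 0
```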

Example, the Gaussian case: unknown μ
- We assume we know the covariance matrix Σ.
- p(xk | μ) ~ N(μ, Σ) (the samples are drawn from a multivariate normal population); θ = μ, therefore:

  ln p(xk | μ) = -½ ln[(2π)^d |Σ|] - ½ (xk - μ)^t Σ⁻¹ (xk - μ)
  ∇μ ln p(xk | μ) = Σ⁻¹ (xk - μ)

- The ML estimate for μ must satisfy:

  Σ_{k=1..n} Σ⁻¹ (xk - μ̂) = 0

Multiplying by Σ and rearranging, we obtain:

  μ̂ = (1/n) Σ_{k=1..n} xk

This is just the arithmetic average of the training samples!
Conclusion: if P(xk | ωj) (j = 1, 2, …, c) is assumed to be Gaussian in a d-dimensional feature space, then we can estimate the vector θ = (θ1, θ2, …, θc)^t and perform an optimal classification!
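A minimal sketch of this conclusion (hypothetical one-dimensional data): the ML mean of each category ωi is just the arithmetic average of that class's own training samples.

```python
def ml_means_per_class(data_by_class):
    """ML mean for each category omega_i, estimated from that class's samples only."""
    return {cls: sum(xs) / len(xs) for cls, xs in data_by_class.items()}

D = {"omega_1": [1.0, 2.0, 3.0], "omega_2": [8.0, 10.0, 12.0]}
means = ml_means_per_class(D)
assert means["omega_1"] == 2.0 and means["omega_2"] == 10.0
```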

Example, the Gaussian case: unknown μ and σ
- First consider the univariate case with both μ and σ² unknown: θ = (θ1, θ2) = (μ, σ²). For a single sample point:

  ln p(xk | θ) = -½ ln(2πθ2) - (xk - θ1)² / (2θ2)

  ∇θ ln p(xk | θ) = ( (xk - θ1)/θ2 ,  -1/(2θ2) + (xk - θ1)²/(2θ2²) )^t

Summing over the training set, the conditions ∇θ l = 0 become:

  Σ_{k=1..n} (xk - θ̂1)/θ̂2 = 0   (1)
  -Σ_{k=1..n} 1/(2θ̂2) + Σ_{k=1..n} (xk - θ̂1)²/(2θ̂2²) = 0   (2)

Combining (1) and (2), one obtains:

  μ̂ = (1/n) Σ_{k=1..n} xk ;   σ̂² = (1/n) Σ_{k=1..n} (xk - μ̂)²
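These two closed-form estimates are easy to compute directly; here is a minimal sketch on hypothetical data (note the 1/n in the variance, which is what makes it biased, as discussed below):

```python
def ml_gaussian_univariate(samples):
    """ML estimates: mu_hat = sample mean, sigma2_hat = mean squared deviation (1/n)."""
    n = len(samples)
    mu = sum(samples) / n
    sigma2 = sum((x - mu) ** 2 for x in samples) / n
    return mu, sigma2

mu, s2 = ml_gaussian_univariate([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
assert mu == 5.0 and s2 == 4.0
```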

The ML estimates for the multivariate case are similar: the scalars μ and σ² are replaced by the mean vector and the covariance matrix:

  μ̂ = (1/n) Σ_{k=1..n} xk ;   Σ̂ = (1/n) Σ_{k=1..n} (xk - μ̂)(xk - μ̂)^t
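A minimal sketch of the multivariate estimates (plain lists instead of a matrix library, hypothetical two-point data set), accumulating the outer products (xk − μ̂)(xk − μ̂)^t entry by entry:

```python
def ml_gaussian_multivariate(samples):
    """ML estimates for d-dim data: mean vector and (1/n) covariance matrix."""
    n, d = len(samples), len(samples[0])
    mu = [sum(x[j] for x in samples) / n for j in range(d)]
    cov = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in samples) / n
            for j in range(d)] for i in range(d)]
    return mu, cov

mu, cov = ml_gaussian_multivariate([[0.0, 0.0], [2.0, 2.0]])
assert mu == [1.0, 1.0]
assert cov == [[1.0, 1.0], [1.0, 1.0]]
```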

Bias
- The ML estimate for σ² is biased:

  E[σ̂²] = E[(1/n) Σ_{k=1..n} (xk - x̄)²] = ((n-1)/n) σ² ≠ σ²

- Extreme case: for n = 1, E[σ̂²] = 0 ≠ σ².
- As n increases the bias is reduced; this type of estimator is called asymptotically unbiased.

- An elementary unbiased estimator for Σ is the sample covariance matrix:

  C = (1/(n-1)) Σ_{k=1..n} (xk - μ̂)(xk - μ̂)^t

- This estimator is unbiased for all distributions; such estimators are called absolutely unbiased.

- Our earlier estimator for Σ is biased:

  E[Σ̂] = ((n-1)/n) Σ

- In fact it is asymptotically unbiased: E[Σ̂] → Σ as n → ∞.
- Observe that Σ̂ = ((n-1)/n) C, so the two estimators are essentially identical for large n.
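The factor (n−1)/n can be seen empirically. This sketch (hypothetical simulation, univariate, true σ² = 1, n = 4) averages the 1/n estimator over many trials and recovers approximately (n−1)/n = 0.75 rather than 1:

```python
import random

random.seed(0)
n, trials, true_var = 4, 20000, 1.0
biased_avg = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(xs) / n
    biased_avg += sum((x - m) ** 2 for x in xs) / n  # the 1/n ML estimate
biased_avg /= trials
# E[sigma2_hat] = (n-1)/n * sigma^2 = 0.75 here; dividing by n-1 instead removes the bias
assert abs(biased_avg - (n - 1) / n * true_var) < 0.03
```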

Appendix: ML Problem Statement
- Let D = {x1, x2, …, xn}, with the samples drawn independently, so that:

  P(x1, …, xn | θ) = Π_{k=1..n} P(xk | θ) ;   |D| = n

- Our goal is to determine θ̂, the value of θ that maximizes the likelihood of this sample set!

[Figure: the n training samples x1, …, xn of D are partitioned into class subsets D1, …, Dc; the samples in each Dk are drawn independently according to the class-conditional density P(x | θk) ~ N(μk, Σk).]

θ = (θ1, θ2, …, θc). Problem: find θ̂ such that:

  θ̂ = arg max_θ P(D | θ)