Published by Philomena Roberts. Modified over 9 years ago.
1
Parameter Estimation: Maximum Likelihood Estimation Chapter 3 (Duda et al.) – Sections 3.1-3.2 CS479/679 Pattern Recognition Dr. George Bebis
2
Parameter Estimation
Bayesian Decision Theory allows us to design an optimal classifier given that we know $P(\omega_i)$ and $p(x/\omega_i)$:
$$P(\omega_i/x) = \frac{p(x/\omega_i)\,P(\omega_i)}{p(x)}$$
Estimating $P(\omega_i)$ is usually not difficult.
Estimating $p(x/\omega_i)$ is more difficult:
– The number of samples is often too small.
– The dimensionality of the feature space is large.
3
Parameter Estimation (cont’d)
Assumptions
– A set of training samples $D = \{x_1, x_2, \dots, x_n\}$, where the samples were drawn according to $p(x/\omega_j)$.
– $p(x/\omega_j)$ has some known parametric form, e.g., $p(x/\omega_i) \sim N(\mu_i, \Sigma_i)$, also denoted as $p(x/\theta)$ where $\theta = (\mu_i, \Sigma_i)$.
Parameter estimation problem: given $D$, find the best possible $\theta$.
4
Main Methods in Parameter Estimation
– Maximum Likelihood (ML)
– Bayesian Estimation (BE)
5
Main Methods in Parameter Estimation
Maximum Likelihood (ML)
– Assumes that the values of the parameters are fixed but unknown.
– The best estimate is obtained by maximizing the probability of obtaining the samples $x_1, x_2, \dots, x_n$ actually observed (i.e., the training data):
$$p(x_1, x_2, \dots, x_n/\theta) = p(D/\theta)$$
6
Main Methods in Parameter Estimation (cont’d)
Bayesian Estimation (BE)
– Assumes that the parameters $\theta$ are random variables that have some known a-priori distribution $p(\theta)$.
– Estimates a distribution rather than making point estimates like ML:
$$p(x/D) = \int p(x/\theta)\,p(\theta/D)\,d\theta$$
Note: the BE solution might not be of the parametric form assumed!
7
ML Estimation - Assumptions
Let us assume $c$ classes and that the training data consists of $c$ sets (i.e., one for each class): $D_1, D_2, \dots, D_c$
– Samples in $D_j$ have been drawn independently according to $p(x/\omega_j)$.
– $p(x/\omega_j)$ has a known parametric form with parameters $\theta_j$, e.g., $\theta_j = (\mu_j, \Sigma_j)$ for a Gaussian distribution.
8
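ML Estimation - Problem Formulation and Solution
Problem: given $D_1, D_2, \dots, D_c$ and a model for each class, estimate $\theta_1, \theta_2, \dots, \theta_c$.
If the samples in $D_j$ give no information about $\theta_i$ ($i \neq j$), we need to solve $c$ independent problems (i.e., one for each class).
The ML estimate for $D = \{x_1, x_2, \dots, x_n\}$ is the value $\hat{\theta}$ that maximizes $p(D/\theta)$ (i.e., best supports the training data):
$$\hat{\theta} = \arg\max_{\theta}\, p(D/\theta), \qquad p(D/\theta) = \prod_{k=1}^{n} p(x_k/\theta) \quad \text{(using independence assumption)}$$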
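The product form of the likelihood can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a univariate Gaussian model with known standard deviation, a made-up training set `samples`, and a handful of hand-picked candidate means. The helper names `gaussian_pdf` and `likelihood` are hypothetical and do not come from the slides.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood(samples, mu, sigma):
    """p(D/theta) as a product of per-sample densities (independence assumption)."""
    result = 1.0
    for x in samples:
        result *= gaussian_pdf(x, mu, sigma)
    return result

# Hypothetical training set for a single class (illustrative values only).
samples = [1.8, 2.1, 2.4, 1.9, 2.3]

# Score a few candidate means with sigma fixed at 1 and keep the best one.
candidates = [0.0, 1.0, 2.0, 3.0]
scores = {mu: likelihood(samples, mu, 1.0) for mu in candidates}
best_mu = max(scores, key=scores.get)
print(best_mu)
```

Among these candidates, the one closest to the sample average (2.1) gives the largest likelihood, which is what "best supports the training data" means here.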
9
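ML Estimation - Problem Definition and Solution (cont’d)
How should we find the maximum of $p(D/\theta)$? Set its gradient with respect to $\theta = (\theta_1, \dots, \theta_p)^T$ to zero:
$$\nabla_{\theta}\, p(D/\theta) = 0, \quad \text{where } \nabla_{\theta} = \left[\frac{\partial}{\partial\theta_1}, \dots, \frac{\partial}{\partial\theta_p}\right]^T$$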
10
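ML Estimation Using Log-Likelihood
Consider the log-likelihood for simplicity:
$$\ln p(D/\theta) = \sum_{k=1}^{n} \ln p(x_k/\theta)$$
The solution $\hat{\theta}$ maximizes $\ln p(D/\theta)$:
$$\hat{\theta} = \arg\max_{\theta}\, \ln p(D/\theta), \qquad \nabla_{\theta} \ln p(D/\theta) = \sum_{k=1}^{n} \nabla_{\theta} \ln p(x_k/\theta) = 0$$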
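As a rough numerical illustration of maximizing the log-likelihood, the sketch below evaluates $\ln p(D/\theta)$ on a grid of candidate means for a univariate Gaussian with known variance. The data, the grid range, and the function name `log_likelihood` are assumptions made for the example. A grid search is used only to make the maximization visible; it is not the analytic method of the slides.

```python
import math

def log_likelihood(samples, mu, sigma):
    """ln p(D/theta) = sum_k ln p(x_k/theta) for a univariate Gaussian."""
    const = -math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(const - 0.5 * ((x - mu) / sigma) ** 2 for x in samples)

samples = [1.8, 2.1, 2.4, 1.9, 2.3]  # hypothetical training data
sigma = 1.0                           # standard deviation assumed known

# Coarse grid search over candidate means in [0, 4] with step 0.01.
grid = [i / 100.0 for i in range(401)]
mu_hat = max(grid, key=lambda mu: log_likelihood(samples, mu, sigma))

sample_mean = sum(samples) / len(samples)
print(mu_hat, sample_mean)
```

Working with the sum of logs also avoids the numerical underflow that a product of many small densities would cause, and because the logarithm is monotonic the maximizer is unchanged.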
11
ML Estimation Using Log-Likelihood (cont’d)
[Figure: training data with unknown mean and known variance; plots of $p(D/\theta)$ and $\ln p(D/\theta)$ as functions of $\theta = \mu$.]
12
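ML for Multivariate Gaussian Density: Case of Unknown θ=μ
Consider $p(x/\mu) \sim N(\mu, \Sigma)$ with $\Sigma$ known:
$$\ln p(x_k/\mu) = -\frac{1}{2}\ln\left[(2\pi)^d |\Sigma|\right] - \frac{1}{2}(x_k - \mu)^T \Sigma^{-1} (x_k - \mu)$$
Computing the gradient, we have
$$\nabla_{\mu} \ln p(x_k/\mu) = \Sigma^{-1}(x_k - \mu), \qquad \nabla_{\mu} \ln p(D/\mu) = \sum_{k=1}^{n} \Sigma^{-1}(x_k - \mu)$$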
13
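ML for Multivariate Gaussian Density: Case of Unknown θ=μ (cont’d)
Setting $\nabla_{\mu} \ln p(D/\mu) = 0$ we have:
$$\sum_{k=1}^{n} \Sigma^{-1}(x_k - \hat{\mu}) = 0$$
The solution is given by
$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k$$
The ML estimate is simply the “sample mean”.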
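A small check of this result: the sketch below computes the sample mean of a few 2-D vectors and verifies that the gradient $\sum_k \Sigma^{-1}(x_k - \mu)$ vanishes there. The data and the inverse covariance `sigma_inv` are arbitrary illustrative values, and the function names are hypothetical.

```python
def sample_mean(samples):
    """ML estimate of the mean: component-wise average of the sample vectors."""
    n = len(samples)
    d = len(samples[0])
    return [sum(x[j] for x in samples) / n for j in range(d)]

def grad_log_likelihood(samples, mu, sigma_inv):
    """Gradient of ln p(D/mu) for known covariance: sum_k Sigma^{-1} (x_k - mu)."""
    d = len(mu)
    # Sum the residuals first, then apply Sigma^{-1} once (the map is linear).
    resid = [sum(x[j] - mu[j] for x in samples) for j in range(d)]
    return [sum(sigma_inv[i][j] * resid[j] for j in range(d)) for i in range(d)]

# Hypothetical 2-D training samples and a known inverse covariance matrix.
samples = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [2.0, 5.0]]
sigma_inv = [[2.0, -0.5], [-0.5, 1.0]]

mu_hat = sample_mean(samples)
grad = grad_log_likelihood(samples, mu_hat, sigma_inv)
print(mu_hat, grad)
```

Note that the estimate does not depend on `sigma_inv`: any invertible covariance gives the same stationary point, because the summed residuals are already zero at the sample mean.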
14
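Special Case of ML: Maximum A-Posteriori Estimator (MAP)
Assume that $\theta$ is a random variable with known $p(\theta)$.
Maximize $p(\theta/D)$, or equivalently $p(D/\theta)p(\theta)$ or $\ln[p(D/\theta)p(\theta)]$, since $p(\theta/D) = p(D/\theta)p(\theta)/p(D)$ and $p(D)$ does not depend on $\theta$.
Consider:
$$\ln[p(D/\theta)p(\theta)] = \sum_{k=1}^{n} \ln p(x_k/\theta) + \ln p(\theta)$$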
15
Special Case of ML: Maximum A-Posteriori Estimator (MAP)
What happens when $p(\theta)$ is uniform? Then $\ln p(\theta)$ is constant in $\theta$ and does not affect the maximization, so MAP is equivalent to ML.
16
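MAP for Multivariate Gaussian Density: Case of Unknown θ=μ
Assume $p(x/\mu) \sim N(\mu, \sigma^2 I)$ and $p(\mu) \sim N(\mu_0, \sigma_0^2 I)$, where $\mu_0$, $\sigma$, and $\sigma_0$ are known.
MAP maximizes $\ln[p(D/\mu)p(\mu)]$, i.e., maximize
$$\sum_{k=1}^{n} \ln p(x_k/\mu) + \ln p(\mu)$$
17
MAP for Multivariate Gaussian Density: Case of Unknown θ=μ (cont’d)
Setting the gradient with respect to $\mu$ to zero:
$$\sum_{k=1}^{n} \frac{1}{\sigma^2}(x_k - \mu) - \frac{1}{\sigma_0^2}(\mu - \mu_0) = 0 \quad\Rightarrow\quad \hat{\mu} = \frac{\mu_0 + \frac{\sigma_0^2}{\sigma^2}\sum_{k=1}^{n} x_k}{1 + \frac{\sigma_0^2}{\sigma^2}\, n}$$
If $\sigma_0^2/\sigma^2 \gg 1$, then $\hat{\mu} \approx \frac{1}{n}\sum_{k=1}^{n} x_k$ (the ML estimate).
What happens when $\sigma_0 \to 0$? Then $\hat{\mu} \to \mu_0$.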
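The two limiting cases can be explored numerically. The sketch below is an assumption-laden illustration, not the slides' own code: it takes a spherical Gaussian likelihood $N(\mu, \sigma^2 I)$ with known $\sigma^2$ and a spherical Gaussian prior $N(\mu_0, \sigma_0^2 I)$ on the mean, for which the MAP estimate has the closed form $(\mu_0 + r\sum_k x_k)/(1 + r\,n)$ with $r = \sigma_0^2/\sigma^2$. The function name `map_mean` and all data values are hypothetical.

```python
def map_mean(samples, mu0, sigma2, sigma0_2):
    """MAP estimate of a Gaussian mean under a Gaussian prior.

    Assumes p(x/mu) ~ N(mu, sigma2 * I) and p(mu) ~ N(mu0, sigma0_2 * I),
    with sigma2, mu0 and sigma0_2 known (assumptions of this sketch).
    """
    n = len(samples)
    d = len(mu0)
    ratio = sigma0_2 / sigma2  # prior variance relative to data variance
    sums = [sum(x[j] for x in samples) for j in range(d)]
    return [(mu0[j] + ratio * sums[j]) / (1.0 + ratio * n) for j in range(d)]

samples = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [2.0, 5.0]]  # illustrative data
mu0 = [0.0, 0.0]                                             # prior mean

broad = map_mean(samples, mu0, sigma2=1.0, sigma0_2=1e6)    # very weak prior
narrow = map_mean(samples, mu0, sigma2=1.0, sigma0_2=1e-6)  # very strong prior
print(broad, narrow)
```

With the broad prior the estimate is essentially the sample mean (2, 3), and with the narrow prior it stays close to the prior mean, so the prior variance acts as a dial between trusting the data and trusting the prior.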
18
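ML for Univariate Gaussian Density: Case of Unknown θ=(μ,σ²)
Assume $p(x/\theta) \sim N(\mu, \sigma^2)$ with $\theta = (\theta_1, \theta_2) = (\mu, \sigma^2)$:
$$\ln p(x_k/\theta) = -\frac{1}{2}\ln(2\pi\theta_2) - \frac{1}{2\theta_2}(x_k - \theta_1)^2$$
$$\nabla_{\theta} \ln p(x_k/\theta) = \begin{bmatrix} \dfrac{1}{\theta_2}(x_k - \theta_1) \\[2ex] -\dfrac{1}{2\theta_2} + \dfrac{(x_k - \theta_1)^2}{2\theta_2^2} \end{bmatrix}$$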
19
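ML for Univariate Gaussian Density: Case of Unknown θ=(μ,σ²) (cont’d)
Setting $\sum_{k=1}^{n} \nabla_{\theta} \ln p(x_k/\theta) = 0$:
$$\sum_{k=1}^{n} \frac{1}{\hat{\theta}_2}(x_k - \hat{\theta}_1) = 0, \qquad -\sum_{k=1}^{n} \frac{1}{\hat{\theta}_2} + \sum_{k=1}^{n} \frac{(x_k - \hat{\theta}_1)^2}{\hat{\theta}_2^2} = 0$$
The solutions are given by:
$$\hat{\mu} = \hat{\theta}_1 = \frac{1}{n}\sum_{k=1}^{n} x_k \quad \text{(sample mean)}$$
$$\hat{\sigma}^2 = \hat{\theta}_2 = \frac{1}{n}\sum_{k=1}^{n} (x_k - \hat{\mu})^2 \quad \text{(sample variance)}$$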
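These closed-form solutions are easy to check on a toy data set. The following sketch computes the ML mean and variance and confirms that both components of the log-likelihood gradient, $\sum_k (x_k - \theta_1)/\theta_2$ and $\sum_k \left[-1/(2\theta_2) + (x_k - \theta_1)^2/(2\theta_2^2)\right]$, vanish at the estimates. The data values and function names are illustrative assumptions.

```python
def ml_gaussian_1d(samples):
    """ML estimates (mu_hat, var_hat) for a univariate Gaussian; var uses 1/n."""
    n = len(samples)
    mu_hat = sum(samples) / n
    var_hat = sum((x - mu_hat) ** 2 for x in samples) / n
    return mu_hat, var_hat

def grad_log_likelihood(samples, theta1, theta2):
    """Gradient of ln p(D/theta) with theta1 = mu and theta2 = sigma^2."""
    g1 = sum((x - theta1) / theta2 for x in samples)
    g2 = sum(-1.0 / (2.0 * theta2) + (x - theta1) ** 2 / (2.0 * theta2 ** 2)
             for x in samples)
    return g1, g2

samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical data
mu_hat, var_hat = ml_gaussian_1d(samples)
g1, g2 = grad_log_likelihood(samples, mu_hat, var_hat)
print(mu_hat, var_hat, g1, g2)
```

For this data the estimates are a mean of 5 and a variance of 4, and both gradient components evaluate to zero, confirming a stationary point of the log-likelihood.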
20
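ML for Multivariate Gaussian Density: Case of Unknown θ=(μ,Σ)
In the general case (i.e., multivariate Gaussian) the solutions are:
$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k \quad \text{(sample mean)}$$
$$\hat{\Sigma} = \frac{1}{n}\sum_{k=1}^{n} (x_k - \hat{\mu})(x_k - \hat{\mu})^T \quad \text{(sample covariance)}$$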
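A minimal sketch of these two estimators in plain Python follows, assuming the samples are given as equal-length lists of floats. The function name `ml_gaussian` and the data are made up for illustration; in practice a numerical library would normally be used instead of nested lists.

```python
def ml_gaussian(samples):
    """ML estimates of the mean vector and covariance matrix (1/n normalization)."""
    n = len(samples)
    d = len(samples[0])
    mu = [sum(x[j] for x in samples) / n for j in range(d)]
    # Entry (i, j) averages the products of the centered components i and j.
    cov = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in samples) / n
            for j in range(d)]
           for i in range(d)]
    return mu, cov

samples = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [2.0, 5.0]]  # hypothetical data
mu_hat, cov_hat = ml_gaussian(samples)
print(mu_hat)   # [2.0, 3.0]
print(cov_hat)  # [[0.5, 0.5], [0.5, 2.5]]
```

The resulting matrix is symmetric by construction, since entry (i, j) and entry (j, i) sum the same products.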
21
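Biased and Unbiased Estimates
An estimate $\hat{\theta}$ is unbiased when $E[\hat{\theta}] = \theta$, where $\theta$ is the true value.
The ML estimate $\hat{\mu}$ is unbiased, i.e., $E[\hat{\mu}] = \mu$.
The ML estimates $\hat{\sigma}^2$ and $\hat{\Sigma}$ are biased:
$$E[\hat{\sigma}^2] = \frac{n-1}{n}\,\sigma^2, \qquad E[\hat{\Sigma}] = \frac{n-1}{n}\,\Sigma$$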
22
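Biased and Unbiased Estimates (cont’d)
The following are unbiased estimates of $\sigma^2$ and $\Sigma$, where $\hat{\mu}$ is the sample mean:
$$\frac{1}{n-1}\sum_{k=1}^{n} (x_k - \hat{\mu})^2 \qquad \text{and} \qquad \frac{1}{n-1}\sum_{k=1}^{n} (x_k - \hat{\mu})(x_k - \hat{\mu})^T$$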
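The bias of the $1/n$ variance estimate shows up clearly in a small simulation. The sketch below is illustrative only: it draws many data sets of size $n = 5$ from a standard normal (true variance 1) and averages both estimators. The seed, trial count, and function names are arbitrary choices for the example, and the exact averages depend on the random stream.

```python
import random

def ml_variance(samples):
    """ML (biased) variance estimate: divides by n."""
    n = len(samples)
    m = sum(samples) / n
    return sum((x - m) ** 2 for x in samples) / n

def unbiased_variance(samples):
    """Unbiased variance estimate: divides by n - 1."""
    n = len(samples)
    return ml_variance(samples) * n / (n - 1)

rng = random.Random(0)   # fixed seed so the run is repeatable
n, trials = 5, 20000     # small n makes the bias easy to see
ml_total = 0.0
unbiased_total = 0.0
for _ in range(trials):
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ml_total += ml_variance(data)
    unbiased_total += unbiased_variance(data)

avg_ml = ml_total / trials              # expected to be near (n-1)/n = 0.8
avg_unbiased = unbiased_total / trials  # expected to be near 1.0
print(avg_ml, avg_unbiased)
```

The gap between the two averages shrinks as $n$ grows, since the factor $(n-1)/n$ approaches 1.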
23
Comments about ML
– ML estimation is usually simpler than alternative methods.
– It provides more accurate estimates as the number of training samples increases.
– If the model chosen for $p(x/\theta)$ is correct, and the independence assumptions among samples are true, ML will give very good results.
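The effect of the sample size can be illustrated with a quick experiment. The sketch below assumes data drawn from a univariate Gaussian with a true mean of 2 and unit variance, and reports the absolute error of the ML mean for a few sample sizes. The seed, sizes, and names are arbitrary, and a single run shows only a typical trend, not a guarantee for every draw.

```python
import random

def ml_mean(samples):
    """ML estimate of a Gaussian mean: the sample average."""
    return sum(samples) / len(samples)

rng = random.Random(42)      # fixed seed so the run is repeatable
true_mu, sigma = 2.0, 1.0    # parameters of the (assumed) true model

errors = {}
for n in (10, 1000, 50000):
    data = [rng.gauss(true_mu, sigma) for _ in range(n)]
    errors[n] = abs(ml_mean(data) - true_mu)

for n, err in errors.items():
    print(n, err)
```

The standard deviation of the sample mean is $\sigma/\sqrt{n}$, so the error for the largest sample size is expected to be on the order of a few thousandths here.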