PatReco: Estimation/Training
Alexandros Potamianos
Dept. of ECE, Tech. Univ. of Crete
Fall 2004-2005
Estimation/Training
Goal: Given observed data, (re-)estimate the parameters of the model; e.g., for a Gaussian model, estimate the mean and variance for each class.
Supervised vs. Unsupervised
- Supervised training: all data has been (manually) labeled, i.e., assigned to classes.
- Unsupervised training: data is not assigned a class label.
Observable Data
- Fully observed data: all information necessary for training is available (features, class labels, etc.).
- Partially observed data: some of the features or some of the class labels are missing.
Supervised Training (Fully Observable Data)
- Maximum likelihood estimation (ML)
- Maximum a posteriori estimation (MAP)
- Bayesian estimation (BE)
Training Process
The collected training data consists of the examples D = {x_1, x_2, …, x_N}.
Step 1: Label each example with the corresponding class label ω_1, ω_2, …, ω_K.
Step 2: For each class separately, estimate the model parameters using ML, MAP, or BE and the corresponding training examples D_1, D_2, …, D_K.
Training Process: Step 1
D = {x_1, x_2, x_3, x_4, x_5, …, x_N}
Label manually with ω_1, ω_2, …, ω_K:
D_1 = {x_{11}, x_{12}, x_{13}, …, x_{1N_1}}
D_2 = {x_{21}, x_{22}, x_{23}, …, x_{2N_2}}
…
D_K = {x_{K1}, x_{K2}, x_{K3}, …, x_{KN_K}}
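To make Step 1 concrete, here is a minimal Python sketch (my own illustration; the toy arrays and variable names are hypothetical) of partitioning a labeled dataset into per-class subsets:

```python
import numpy as np

# Toy labeled dataset: feature vectors x_j with integer class labels
X = np.array([[1.0], [1.2], [4.8], [5.1], [0.9]])  # D = {x_1, ..., x_N}
y = np.array([0, 0, 1, 1, 0])                      # manual labels (2 classes here)

K = y.max() + 1
# Step 1: split D into per-class subsets D_1, ..., D_K
D = [X[y == k] for k in range(K)]
for k, D_k in enumerate(D):
    print(f"class {k}: {len(D_k)} examples")
```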
Training Process: Step 2
- Maximum likelihood: θ_1 = argmax_{θ_1} P(D_1|θ_1)
- Maximum a posteriori: θ_1 = argmax_{θ_1} P(D_1|θ_1) P(θ_1)
- Bayesian estimation: P(x|ω_1) = ∫ P(x|θ_1) P(θ_1|D_1) dθ_1
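For concreteness, a minimal sketch (my own illustration, not from the slides) contrasting ML and MAP for the mean of a 1-D Gaussian with known variance σ² and a Gaussian prior N(μ0, σ0²) on the mean, for which MAP has a closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 1.0                                   # known variance σ² of P(x|θ)
x = rng.normal(loc=2.0, scale=np.sqrt(sigma2), size=10)
n = len(x)

# ML: θ = argmax P(D|θ)  ->  the sample mean
mu_ml = x.mean()

# MAP: θ = argmax P(D|θ)P(θ), with Gaussian prior N(μ0, σ0²) on the mean
mu0, sigma0_2 = 0.0, 0.5
mu_map = (sigma0_2 * x.sum() + sigma2 * mu0) / (n * sigma0_2 + sigma2)

print(mu_ml, mu_map)  # the MAP estimate is pulled toward the prior mean μ0
```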
ML Estimation Assumptions
1. P(x|ω_i) follows a parametric distribution with parameters θ.
2. D_j tells us nothing about P(x|ω_i) for j ≠ i (functional independence).
3. Observations x_1, x_2, x_3, …, x_N are iid (independent, identically distributed).
4a. (ML only!) θ is a quantity whose value is fixed but unknown.
ML Estimation
θ = argmax_θ P(θ|D) = argmax_θ P(D|θ) P(θ)
  = argmax_θ P(D|θ)                  (by assumption 4a, P(θ) is a constant)
  = argmax_θ P(x_1, x_2, …, x_N | θ)
  = argmax_θ Π_j P(x_j|θ)            (by assumption 3, iid)
⇒ ∂[Π_j P(x_j|θ)]/∂θ = 0 ⇒ θ = …
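A minimal numeric sketch (my own illustration) of this derivation: in practice one maximizes the log-likelihood Σ_j log P(x_j|θ), which has the same argmax as Π_j P(x_j|θ); here a generic optimizer recovers the Gaussian mean, with the variance fixed at 1 for simplicity:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=200)   # iid samples (assumption 3)

# Negative log-likelihood of the mean mu (variance fixed at 1)
def neg_log_lik(mu):
    return -np.sum(norm.logpdf(x, loc=mu, scale=1.0))

mu_ml = minimize_scalar(neg_log_lik).x
print(mu_ml, x.mean())  # the numerical optimum matches the closed-form sample mean
```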
ML Estimate for a Gaussian pdf
If P(x|ω) = N(μ, σ²) and θ = (μ, σ²), then in 1-D:
μ = (1/N) Σ_{j=1..N} x_j
σ² = (1/N) Σ_{j=1..N} (x_j − μ)²
Multi-D, θ = (μ, Σ):
μ = (1/N) Σ_{j=1..N} x_j
Σ = (1/N) Σ_{j=1..N} (x_j − μ)(x_j − μ)^T
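A minimal numpy sketch of these closed-form estimates (my own illustration, using the column-vector convention for the outer product):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))      # N samples of a 2-D feature vector

N = len(X)
mu = X.mean(axis=0)                # μ = (1/N) Σ_j x_j
diff = X - mu
Sigma = (diff.T @ diff) / N        # Σ = (1/N) Σ_j (x_j − μ)(x_j − μ)^T

# Cross-check against numpy's built-in (bias=True uses the 1/N normalizer)
assert np.allclose(Sigma, np.cov(X.T, bias=True))
```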
Bayesian Estimation Assumptions
1. P(x|ω_i) follows a parametric distribution with parameters θ.
2. D_j tells us nothing about P(x|ω_i) for j ≠ i (functional independence).
3. Observations x_1, x_2, x_3, …, x_N are iid (independent, identically distributed).
4b. (MAP, BE) θ is a random variable whose prior distribution p(θ) is known.
Bayesian Estimation
P(x|D) = ∫ P(x, θ|D) dθ = ∫ P(x|θ, D) P(θ|D) dθ = ∫ P(x|θ) P(θ|D) dθ
STEP 1: prior → posterior: P(θ|D) = P(D|θ) P(θ) / P(D)
STEP 2: likelihood → predictive: P(x|θ) → P(x|D)
Bayesian Estimate for a Gaussian pdf and Prior
If P(x|θ) = N(μ, σ²) and p(θ) = N(μ_0, σ_0²), then:
STEP 1: P(θ|D) = N(μ_n, σ_n²)
STEP 2: P(x|D) = N(μ_n, σ² + σ_n²)
where
μ_n = [σ_0²/(n σ_0² + σ²)] (Σ_j x_j) + [σ²/(n σ_0² + σ²)] μ_0
σ_n² = σ² σ_0² / (n σ_0² + σ²)
For large n (number of training samples), maximum likelihood and Bayesian estimation are equivalent!
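A minimal sketch (my own illustration) of the two steps for the 1-D Gaussian case, which also checks the large-n claim that the Bayesian estimate approaches the ML sample mean:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 1.0                    # known σ² of P(x|θ)
mu0, sigma0_2 = 0.0, 2.0        # prior p(θ) = N(μ0, σ0²)

for n in (5, 50, 5000):
    x = rng.normal(loc=2.0, scale=np.sqrt(sigma2), size=n)
    # STEP 1: posterior P(θ|D) = N(μ_n, σ_n²)
    mu_n = (sigma0_2 * x.sum() + sigma2 * mu0) / (n * sigma0_2 + sigma2)
    sigma_n2 = sigma2 * sigma0_2 / (n * sigma0_2 + sigma2)
    # STEP 2: predictive P(x|D) = N(μ_n, σ² + σ_n²)
    pred_var = sigma2 + sigma_n2
    print(n, round(mu_n, 3), round(x.mean(), 3), round(pred_var, 3))
# As n grows, μ_n approaches the ML sample mean and σ_n² shrinks toward 0
```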
Conclusions
- Maximum likelihood estimation is simple and gives good estimates when the number of training samples is large.
- Bayesian adaptation gives good estimates even for small amounts of training data, provided that a good prior is selected.
- Bayesian adaptation is hard and often does not have a closed-form solution (in which case, try iterative recursive Bayesian estimation).