1 PatReco: Estimation/Training
Alexandros Potamianos, Dept. of ECE, Tech. Univ. of Crete, Fall 2004-2005

2 Estimation/Training
- Goal: given observed data, (re-)estimate the parameters of the model; e.g., for a Gaussian model, estimate the mean and variance of each class.

3 Supervised vs. Unsupervised
- Supervised training: all data have been (manually) labeled, i.e., assigned to classes.
- Unsupervised training: data are not assigned class labels.

4 Observable data
- Fully observed data: all information necessary for training is available (features, class labels, etc.).
- Partially observed data: some of the features or some of the class labels are missing.

5 Supervised Training (fully observed data)
- Maximum likelihood estimation (ML)
- Maximum a posteriori estimation (MAP)
- Bayesian estimation (BE)

6 Training process
The collected training data consist of the examples $D = \{x_1, x_2, \dots, x_N\}$.
- Step 1: Label each example with the corresponding class label $\omega_1, \omega_2, \dots, \omega_K$.
- Step 2: For each class separately, estimate the model parameters using ML, MAP, or BE and the corresponding training examples $D_1, D_2, \dots, D_K$.

7 Training Process: Step 1
$D = \{x_1, x_2, x_3, x_4, x_5, \dots, x_N\}$ is labeled manually with $\omega_1, \omega_2, \dots, \omega_K$ and split into per-class sets:
$D_1 = \{x_{11}, x_{12}, x_{13}, \dots, x_{1N_1}\}$
$D_2 = \{x_{21}, x_{22}, x_{23}, \dots, x_{2N_2}\}$
...
$D_K = \{x_{K1}, x_{K2}, x_{K3}, \dots, x_{KN_K}\}$
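A minimal sketch of this split, assuming the labels are already available as a list parallel to the examples (the names `examples`, `labels`, and `split_by_class` are illustrative, not from the slides):

```python
from collections import defaultdict

def split_by_class(examples, labels):
    """Group training examples by their (manually assigned) class label."""
    per_class = defaultdict(list)
    for x, omega in zip(examples, labels):
        per_class[omega].append(x)
    return per_class  # per_class[omega_k] corresponds to D_k

# Example: D = {x_1, ..., x_5} labeled with two classes
D = [1.2, 0.7, 3.4, 2.9, 1.1]
labels = ["w1", "w1", "w2", "w2", "w1"]
D_k = split_by_class(D, labels)
# D_k["w1"] == [1.2, 0.7, 1.1], D_k["w2"] == [3.4, 2.9]
```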

8 Training Process: Step 2
- Maximum likelihood: $\hat{\theta}_1 = \arg\max_{\theta_1} P(D_1|\theta_1)$
- Maximum a posteriori: $\hat{\theta}_1 = \arg\max_{\theta_1} P(D_1|\theta_1)\,P(\theta_1)$
- Bayesian estimation: $P(x|\omega_1) = \int P(x|\theta_1)\,P(\theta_1|D_1)\,d\theta_1$
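To make the difference between the ML and MAP objectives concrete, here is a small sketch (my own illustration, not from the slides) that estimates a Gaussian mean by grid search, with and without a Gaussian prior on the mean:

```python
import numpy as np

rng = np.random.default_rng(0)
D1 = rng.normal(loc=2.0, scale=1.0, size=10)  # samples from class omega_1

thetas = np.linspace(-5, 5, 1001)  # candidate values of theta (the mean)
sigma = 1.0                        # assume the variance is known, for simplicity

# log P(D_1 | theta): sum of per-sample Gaussian log-likelihoods (constants dropped)
log_lik = np.array([-0.5 * np.sum((D1 - t) ** 2) / sigma**2 for t in thetas])

# log P(theta): Gaussian prior centered at mu0 (illustrative choice)
mu0, sigma0 = 0.0, 1.0
log_prior = -0.5 * (thetas - mu0) ** 2 / sigma0**2

theta_ml = thetas[np.argmax(log_lik)]                # argmax P(D_1|theta)
theta_map = thetas[np.argmax(log_lik + log_prior)]   # argmax P(D_1|theta) P(theta)
# theta_map is pulled from the sample mean toward mu0 by the prior
```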

9 ML Estimation Assumptions
1. $P(x|\omega_i)$ follows a parametric distribution with parameters $\theta$.
2. $D_j$ tells us nothing about $P(x|\omega_i)$ for $j \neq i$ (functional independence).
3. Observations $x_1, x_2, x_3, \dots, x_N$ are i.i.d. (independent and identically distributed).
4a. (ML only!) $\theta$ is a quantity whose value is fixed but unknown.

10 ML estimation
$\hat{\theta} = \arg\max_{\theta} P(\theta|D) = \arg\max_{\theta} P(D|\theta)\,P(\theta) \overset{(4a)}{=} \arg\max_{\theta} P(D|\theta) = \arg\max_{\theta} P(x_1, x_2, \dots, x_N|\theta) \overset{(3)}{=} \arg\max_{\theta} \prod_j P(x_j|\theta)$
Setting $\partial \prod_j P(x_j|\theta) / \partial\theta = 0$ and solving yields $\hat{\theta}$.
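In practice the maximization is carried out on the log-likelihood, which turns the product of assumption 3 into a sum; this standard step (not shown on the slide) leaves the maximizer unchanged because the logarithm is monotonic:

$$\hat{\theta} = \arg\max_{\theta} \sum_{j=1}^{N} \log P(x_j|\theta), \qquad \frac{\partial}{\partial\theta} \sum_{j=1}^{N} \log P(x_j|\theta) = 0$$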

11 ML estimate for Gaussian pdf
If $P(x|\omega) = N(\mu, \sigma^2)$ and $\theta = (\mu, \sigma^2)$, then in 1-D:
$\hat{\mu} = \frac{1}{N} \sum_{j=1}^{N} x_j$
$\hat{\sigma}^2 = \frac{1}{N} \sum_{j=1}^{N} (x_j - \hat{\mu})^2$
In multiple dimensions, with $\theta = (\mu, \Sigma)$:
$\hat{\mu} = \frac{1}{N} \sum_{j=1}^{N} x_j$
$\hat{\Sigma} = \frac{1}{N} \sum_{j=1}^{N} (x_j - \hat{\mu})(x_j - \hat{\mu})^T$
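A minimal numpy sketch of these estimators (the data below are synthetic; note that the slide's $1/N$ normalization is the biased ML estimate, so `np.cov` needs `bias=True` to match):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=[1.0, -2.0], scale=[1.0, 0.5], size=(500, 2))  # N samples, d dims

# ML estimates for a multivariate Gaussian
mu_hat = X.mean(axis=0)                              # (1/N) sum_j x_j
Sigma_hat = (X - mu_hat).T @ (X - mu_hat) / len(X)   # (1/N) sum_j (x_j - mu)(x_j - mu)^T

# Equivalent via numpy's covariance with the 1/N (ML) normalization
assert np.allclose(Sigma_hat, np.cov(X, rowvar=False, bias=True))
```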

12 Bayesian Estimation Assumptions
1. $P(x|\omega_i)$ follows a parametric distribution with parameters $\theta$.
2. $D_j$ tells us nothing about $P(x|\omega_i)$ for $j \neq i$ (functional independence).
3. Observations $x_1, x_2, x_3, \dots, x_N$ are i.i.d. (independent and identically distributed).
4b. (MAP, BE) $\theta$ is a random variable whose prior distribution $p(\theta)$ is known.

13 Bayesian Estimation
$P(x|D) = \int P(x,\theta|D)\,d\theta = \int P(x|\theta,D)\,P(\theta|D)\,d\theta = \int P(x|\theta)\,P(\theta|D)\,d\theta$
STEP 1: from prior to posterior, $P(\theta) \to P(\theta|D)$, via Bayes' rule: $P(\theta|D) = P(D|\theta)\,P(\theta)/P(D)$.
STEP 2: from the parametric model to the predictive density, $P(x|\theta) \to P(x|D)$, via the integral above.

14 Bayesian Estimate for Gaussian pdf and prior
If $P(x|\theta) = N(\mu, \sigma^2)$ with known variance $\sigma^2$, and the prior on the mean is $p(\mu) = N(\mu_0, \sigma_0^2)$, then:
STEP 1: $P(\theta|D) = N(\mu_n, \sigma_n^2)$
STEP 2: $P(x|D) = N(\mu_n, \sigma^2 + \sigma_n^2)$
where
$\mu_n = \frac{\sigma_0^2}{n\sigma_0^2 + \sigma^2} \Big(\sum_j x_j\Big) + \frac{\sigma^2}{n\sigma_0^2 + \sigma^2}\,\mu_0$
$\sigma_n^2 = \frac{\sigma^2 \sigma_0^2}{n\sigma_0^2 + \sigma^2}$
For a large number of training samples $n$, maximum likelihood and Bayesian estimation are equivalent: $\mu_n$ tends to the sample mean and $\sigma_n^2 \to 0$.
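A direct transcription of these update formulas into Python, as a sketch under the slide's assumptions (known likelihood variance, Gaussian prior on the mean; function and variable names are mine):

```python
import numpy as np

def bayes_gaussian_mean(x, sigma2, mu0, sigma0_2):
    """Posterior N(mu_n, sigma_n^2) over the mean, plus the predictive variance."""
    n = len(x)
    denom = n * sigma0_2 + sigma2
    mu_n = (sigma0_2 / denom) * np.sum(x) + (sigma2 / denom) * mu0
    sigma_n_2 = sigma2 * sigma0_2 / denom
    return mu_n, sigma_n_2, sigma2 + sigma_n_2  # P(x|D) = N(mu_n, sigma^2 + sigma_n^2)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=5)
mu_n, sigma_n_2, pred_var = bayes_gaussian_mean(x, sigma2=1.0, mu0=0.0, sigma0_2=1.0)
# With few samples, mu_n stays close to the prior mean mu0; as n grows,
# mu_n approaches the sample mean and sigma_n^2 -> 0, recovering the ML estimate.
```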

15 Conclusions
- Maximum likelihood estimation is simple and gives good estimates when the number of training samples is large.
- Bayesian estimation gives good estimates even for small amounts of training data, provided that a good prior is selected.
- Bayesian estimation is hard and often has no closed-form solution (in which case, try recursive Bayesian estimation).

