1
Comp. Genomics Recitation 6 14/11/06 ML and EM
2
Outline
Maximum likelihood estimation
HMM example
EM
Baum-Welch algorithm
3
Maximum likelihood
One of the methods for parameter estimation.
Likelihood: L = P(Data | Parameters)
Simple example: a coin with P(head) = p; 10 coin tosses yield 6 heads and 4 tails.
$L = P(\text{Data} \mid \text{Params}) = \binom{10}{6} p^6 (1-p)^4$
4
Maximum likelihood
We want to find the p that maximizes $L = \binom{10}{6} p^6 (1-p)^4$.
(Infi 1, remember?) Since log is a monotonically increasing function, we can instead maximize
$\log L = \log\!\big[\binom{10}{6} p^6 (1-p)^4\big] = \log\binom{10}{6} + 6\log p + 4\log(1-p)$
Differentiating with respect to p and setting the derivative to zero:
$6/p - 4/(1-p) = 0 \;\Rightarrow\; \hat{p} = 6/10 = 0.6$ (makes sense?)
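For a concrete check (my own addition, not on the slides), a few lines of Python that evaluate this log-likelihood over a grid of p values and confirm that the maximum sits at 6/10:

```python
import numpy as np
from math import comb

heads, tails = 6, 4
n = heads + tails

def log_likelihood(p):
    # log L(p) = log C(10,6) + 6*log(p) + 4*log(1-p)
    return np.log(comb(n, heads)) + heads * np.log(p) + tails * np.log(1 - p)

# Evaluate on a fine grid of p values (endpoints excluded, since log(0) = -inf)
grid = np.linspace(0.001, 0.999, 999)
print(grid[np.argmax(log_likelihood(grid))])   # ~0.6
print(heads / n)                               # 0.6, the closed-form ML estimate
```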
5
ML in Profile HMMs
Transition probabilities: $M_i \to M_{i+1}$, $M_i \to D_{i+1}$, $M_i \to I_i$, $I_i \to M_{i+1}$, $I_i \to I_i$, $D_i \to D_{i+1}$, $D_i \to M_{i+1}$, $D_i \to I_i$
Emission probabilities: $M_i$ emits a symbol a, $I_i$ emits a symbol a
6
Parameter Estimation for HMMs
Input: $X^1,\dots,X^n$ independent training sequences
Goal: estimation of $\Theta = (A, E)$ (the model parameters)
Note: $P(X^1,\dots,X^n \mid \Theta) = \prod_{i=1}^{n} P(X^i \mid \Theta)$ (independence)
$\ell(X^1,\dots,X^n \mid \Theta) = \log P(X^1,\dots,X^n \mid \Theta) = \sum_{i=1}^{n} \log P(X^i \mid \Theta)$
Case 1 - estimation when the state sequence is known:
$A_{kl}$ = #(occurred k→l transitions)
$E_k(b)$ = #(emissions of symbol b that occurred in state k)
Maximum likelihood estimators:
$a_{kl} = A_{kl} / \sum_{l'} A_{kl'}$,  $e_k(b) = E_k(b) / \sum_{b'} E_k(b')$
Small-sample or prior-knowledge correction (pseudocounts):
$A'_{kl} = A_{kl} + r_{kl}$,  $E'_k(b) = E_k(b) + r_k(b)$
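As a small aside (not from the slides), here is what these estimators look like in code: a hypothetical helper that turns a matrix of transition or emission counts into probabilities, optionally adding pseudocounts r first:

```python
import numpy as np

def normalize_counts(counts, pseudocounts=None):
    """Turn a matrix of counts into ML probability estimates.

    counts[k, l] = A_kl (observed k->l transitions), or
    counts[k, b] = E_k(b) (observed emissions of symbol b in state k).
    pseudocounts = optional matrix r of the same shape, added before
                   normalizing (small-sample / prior-knowledge correction).
    Each row is divided by its sum, so row k becomes a distribution.
    """
    counts = np.asarray(counts, dtype=float)
    if pseudocounts is not None:
        counts = counts + pseudocounts
    return counts / counts.sum(axis=1, keepdims=True)

# Illustrative numbers only: 2 states, transition counts A, emission counts E over {A,C,G,T}
A = np.array([[8., 2.],
              [3., 7.]])
E = np.array([[10., 0., 5., 5.],
              [ 1., 9., 9., 1.]])
a = normalize_counts(A, pseudocounts=np.ones_like(A))   # a_kl with +1 pseudocounts
e = normalize_counts(E)                                  # e_k(b), plain ML
```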
7
Example
Suppose we are given the aligned sequences:
**---*
AG---C
A-AT-C
AG-AA-
--AAAC
AG---C
Suppose also that the "match" positions are marked (the * row above the alignment).
8
Calculating A, E: count transitions and emissions
**---*
AG---C
A-AT-C
AG-AA-
--AAAC
AG---C
[tables of transition and emission counts tallied from the alignment]
9
Calculating A, E: count transitions and emissions (contd.)
[the same alignment, with the transition and emission count tables filled in]
10
Estimating Maximum Likelihood probabilities using fractions
[table of emission probability estimates]
11
Estimating ML probabilities (contd.)
[table of transition probability estimates]
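To make the counting concrete, here is a short Python sketch (my own illustration, not from the original slides). Residues in starred (match) columns are assigned to match states, gaps in starred columns to delete states, and residues in unstarred columns to insert states; transitions and emissions are then tallied along each sequence's state path:

```python
from collections import Counter

# The aligned training sequences and match-column annotation from the example
match_mask = "**---*"                 # '*' marks a match column
alignment = ["AG---C",
             "A-AT-C",
             "AG-AA-",
             "--AAAC",
             "AG---C"]

transitions = Counter()               # A_kl: number of observed k -> l transitions
emissions = Counter()                 # E_k(b): number of times state k emitted symbol b

for seq in alignment:
    prev = "B"                        # begin state
    n_match = 0                       # index of the most recent match column
    for mark, ch in zip(match_mask, seq):
        if mark == "*":               # match column: residue -> M state, gap -> D state
            n_match += 1
            state = f"M{n_match}" if ch != "-" else f"D{n_match}"
        elif ch == "-":               # a gap in an insert column contributes nothing
            continue
        else:                         # residue in an insert column -> I state
            state = f"I{n_match}"
        if ch != "-":
            emissions[(state, ch)] += 1
        transitions[(prev, state)] += 1
        prev = state
    transitions[(prev, "E")] += 1     # final transition into the end state

print(transitions)                    # e.g. ('B', 'M1'): 4, ('M1', 'M2'): 3, ...
print(emissions)                      # e.g. ('M1', 'A'): 4, ('I2', 'A'): 6, ...
```

Feeding these counters into row-wise normalization (as in the earlier sketch, with or without pseudocounts) gives the fraction-based probability estimates of the previous slides.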
12
EM - Mixture example
Assume we are given the heights of 100 individuals (men/women): y1,…,y100
We know that:
the men's heights are normally distributed with (μm,σm)
the women's heights are normally distributed with (μw,σw)
If we knew the genders, estimation would be "easy" (how?)
What we don't know: the genders in our data!
The gender labels X1,…,X100 are unknown
P(w), P(m) are unknown
13
Mixture example
Our goal: estimate the parameters (μm,σm), (μw,σw), and P(m).
A classic "estimation with missing data" problem. (In an HMM: we know the emissions, but not the states!)
Expectation-Maximization (EM):
1. Compute the "expected" gender for every sample height
2. Estimate the parameters using ML
3. Iterate
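Here is a minimal sketch of that loop in Python (my own illustration; the simulated data and names like `resp_m` are assumptions, not from the slides). The E-step computes each sample's expected probability of being a man under the current parameters; the M-step re-estimates the parameters by weighted maximum likelihood:

```python
import numpy as np
from scipy.stats import norm

def em_two_gaussians(y, n_iter=50):
    """EM for a mixture of two Gaussians ("men" / "women") fitted to heights y."""
    # Initial guess: means split around the overall mean, equal priors
    p_m = 0.5
    mu_m, mu_w = y.mean() + y.std(), y.mean() - y.std()
    sd_m = sd_w = y.std()

    for _ in range(n_iter):
        # E-step: expected "gender" of every sample (responsibility of the men component)
        like_m = p_m * norm.pdf(y, mu_m, sd_m)
        like_w = (1 - p_m) * norm.pdf(y, mu_w, sd_w)
        resp_m = like_m / (like_m + like_w)

        # M-step: weighted ML estimates of the parameters
        p_m = resp_m.mean()
        mu_m = np.average(y, weights=resp_m)
        mu_w = np.average(y, weights=1 - resp_m)
        sd_m = np.sqrt(np.average((y - mu_m) ** 2, weights=resp_m))
        sd_w = np.sqrt(np.average((y - mu_w) ** 2, weights=1 - resp_m))

    return p_m, (mu_m, sd_m), (mu_w, sd_w)

# Simulated data: 100 heights drawn from two Gaussians, labels discarded
rng = np.random.default_rng(0)
heights = np.concatenate([rng.normal(178, 7, 55), rng.normal(165, 6, 45)])
print(em_two_gaussians(heights))
```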
14
EM
Widely used in machine learning.
Using ML for the parameter estimates at every iteration guarantees that the likelihood never decreases.
Eventually we reach a local maximum of the likelihood.
A good starting point is therefore important.
15
Mixture example
Suppose we have a mixture of M Gaussians, each with mixing probability $\alpha_i$ and parameters $\theta_i = (\mu_i, \sigma_i)$.
Likelihood of an observation x: $p(x \mid \Theta) = \sum_{i=1}^{M} \alpha_i\, p_i(x \mid \theta_i)$
The "incomplete-data" log-likelihood of the sample $x_1,\dots,x_N$:
$\log L(\Theta \mid X) = \sum_{j=1}^{N} \log \sum_{i=1}^{M} \alpha_i\, p_i(x_j \mid \theta_i)$
Difficult to maximize directly (a log of a sum)…
16
Mixture example
Now we introduce $y_1,\dots,y_{100}$: hidden variables telling us which Gaussian every sample came from.
If we knew the ys, the likelihood would be:
$\log L(\Theta \mid X, Y) = \sum_{j=1}^{N} \log\!\big(\alpha_{y_j}\, p_{y_j}(x_j \mid \theta_{y_j})\big)$
Of course, we do not know the ys…
We'll do EM, starting from an initial guess $\Theta^g = (\alpha_1^g,\dots,\alpha_M^g,\ \mu_1^g,\dots,\mu_M^g,\ \sigma_1^g,\dots,\sigma_M^g)$.
17
Estimation
Given $\Theta^g$, we can estimate the ys! We want to find:
$Q(\Theta, \Theta^g) = E\big[\log L(\Theta \mid X, Y) \,\big|\, X, \Theta^g\big]$
The expectation is over the (hidden) states of y.
Bayes' rule, $P(X \mid Y) = P(Y \mid X)\,P(X)/P(Y)$, gives the posterior probability of each component:
$p(i \mid x_j, \Theta^g) = \dfrac{\alpha_i^g\, p_i(x_j \mid \theta_i^g)}{\sum_{k=1}^{M} \alpha_k^g\, p_k(x_j \mid \theta_k^g)}$
18
Estimation
We write down the Q explicitly, summing over every possible assignment of the hidden labels:
$Q(\Theta, \Theta^g) = \sum_{y} \log L(\Theta \mid X, y)\, P(y \mid X, \Theta^g) = \sum_{y_1=1}^{M} \cdots \sum_{y_N=1}^{M} \sum_{j=1}^{N} \log\!\big(\alpha_{y_j}\, p_{y_j}(x_j \mid \theta_{y_j})\big) \prod_{n=1}^{N} p(y_n \mid x_n, \Theta^g)$
Daunting?
19
Estimation
Simplifying (for each j, only the term with $y_j = i$ survives the marginalization), the Q becomes:
$Q(\Theta, \Theta^g) = \sum_{i=1}^{M} \sum_{j=1}^{N} \log\!\big(\alpha_i\, p_i(x_j \mid \theta_i)\big)\, p(i \mid x_j, \Theta^g)$
20
Maximization
Now we want to find parameter estimates such that:
$\Theta^{g+1} = \arg\max_{\Theta} Q(\Theta, \Theta^g)$
(Infi 2, remember?) To impose the constraint $\sum_i \alpha_i = 1$, we introduce a Lagrange multiplier $\lambda$:
$\frac{\partial}{\partial \alpha_i}\Big[\sum_{i=1}^{M}\sum_{j=1}^{N} \log(\alpha_i)\, p(i \mid x_j, \Theta^g) + \lambda\big(\textstyle\sum_i \alpha_i - 1\big)\Big] = 0 \;\Rightarrow\; \sum_{j=1}^{N} \frac{1}{\alpha_i}\, p(i \mid x_j, \Theta^g) + \lambda = 0$
Multiplying by $\alpha_i$ and summing both sides over i gives $\lambda = -N$, so
$\alpha_i^{g+1} = \frac{1}{N} \sum_{j=1}^{N} p(i \mid x_j, \Theta^g)$
21
Maximization
Estimating $\mu_i^{g+1}, \sigma_i^{g+1}$ is more difficult; the derivation is out of scope here.
What comes out is actually quite straightforward:
$\mu_i^{g+1} = \dfrac{\sum_{j=1}^{N} x_j\, p(i \mid x_j, \Theta^g)}{\sum_{j=1}^{N} p(i \mid x_j, \Theta^g)}$,  $(\sigma_i^{g+1})^2 = \dfrac{\sum_{j=1}^{N} p(i \mid x_j, \Theta^g)\,(x_j - \mu_i^{g+1})^2}{\sum_{j=1}^{N} p(i \mid x_j, \Theta^g)}$
22
What you need to know about EM:
When: we want to estimate model parameters, and some of the data is "missing".
Why: maximizing the likelihood directly is very difficult.
How:
make an initial guess of the parameters
find a proper expression for $Q(\Theta, \Theta^g)$
differentiate and find the ML estimators
23
EM estimation in HMMs
Input: $X^1,\dots,X^n$ independent training sequences
Baum-Welch algorithm (1972), where f and b are the forward and backward variables:
1. Expectation:
compute the expected number of k→l state transitions,
$P(\pi_i = k, \pi_{i+1} = l \mid X, \Theta) = \frac{1}{P(X)}\, f_k(i)\, a_{kl}\, e_l(x_{i+1})\, b_l(i+1)$
$A_{kl} = \sum_j \frac{1}{P(X^j)} \sum_i f_k^j(i)\, a_{kl}\, e_l(x^j_{i+1})\, b_l^j(i+1)$
and the expected number of appearances of symbol b in state k,
$E_k(b) = \sum_j \frac{1}{P(X^j)} \sum_{\{i \,\mid\, x^j_i = b\}} f_k^j(i)\, b_k^j(i)$  (exercise)
2. Maximization: re-compute the new parameters from A, E using maximum likelihood.
Repeat (1)+(2) until the improvement becomes negligible.
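A compact sketch of one Baum-Welch iteration in Python (my own illustration, not from the slides). It uses an explicit initial distribution instead of a silent begin state, and omits the scaling or log-space arithmetic that a real implementation needs for long sequences:

```python
import numpy as np

def forward(x, pi, a, e):
    """Forward probabilities f[i, k] = P(x_1..x_i, state_i = k)."""
    n, K = len(x), len(pi)
    f = np.zeros((n, K))
    f[0] = pi * e[:, x[0]]
    for i in range(1, n):
        f[i] = (f[i - 1] @ a) * e[:, x[i]]
    return f

def backward(x, a, e):
    """Backward probabilities b[i, k] = P(x_{i+1}..x_n | state_i = k)."""
    n, K = len(x), a.shape[0]
    b = np.ones((n, K))
    for i in range(n - 2, -1, -1):
        b[i] = a @ (e[:, x[i + 1]] * b[i + 1])
    return b

def baum_welch_step(seqs, pi, a, e):
    """One E+M iteration over a list of integer-coded training sequences."""
    A = np.zeros_like(a)          # expected transition counts A_kl
    E = np.zeros_like(e)          # expected emission counts E_k(b)
    P0 = np.zeros_like(pi)        # expected counts for the initial distribution
    log_lik = 0.0
    for x in seqs:
        f, b = forward(x, pi, a, e), backward(x, a, e)
        px = f[-1].sum()                          # P(x | current parameters)
        log_lik += np.log(px)
        # Expected k->l transitions: f_k(i) * a_kl * e_l(x_{i+1}) * b_l(i+1) / P(x)
        for i in range(len(x) - 1):
            A += np.outer(f[i], e[:, x[i + 1]] * b[i + 1]) * a / px
        # Expected emissions: posterior P(state_i = k | x) summed per observed symbol
        post = f * b / px
        for i, sym in enumerate(x):
            E[:, sym] += post[i]
        P0 += post[0]
    # M-step: re-estimate parameters from the expected counts by maximum likelihood
    new_pi = P0 / P0.sum()
    new_a = A / A.sum(axis=1, keepdims=True)
    new_e = E / E.sum(axis=1, keepdims=True)
    return new_pi, new_a, new_e, log_lik
```

Iterating `baum_welch_step` until `log_lik` stops improving gives the training loop described above.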