Hidden Markov Models
Two learning scenarios

1. Estimation when the "right answer" is known
Examples:
GIVEN: a genomic region x = x_1 … x_1,000,000 where we have good (experimental) annotations of the CpG islands
GIVEN: the casino player allows us to observe him one evening, as he changes dice and produces 10,000 rolls

2. Estimation when the "right answer" is unknown
Examples:
GIVEN: the porcupine genome; we don't know how frequent the CpG islands are there, nor do we know their composition
GIVEN: 10,000 rolls of the casino player, but we don't see when he changes dice

QUESTION: Update the parameters θ of the model to maximize P(x | θ)
1. When the right answer is known

Given x = x_1 … x_N for which the true path π = π_1 … π_N is known,

Define:
A_kl = # times the k→l transition occurs in π
E_k(b) = # times state k in π emits b in x

We can show that the maximum likelihood parameters are:

a_kl = A_kl / Σ_i A_ki        e_k(b) = E_k(b) / Σ_c E_k(c)
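These counting formulas can be sketched directly in code. This is a minimal illustration, not part of the slides: the dict-based model representation and the function name are illustrative, and the `max(1, …)` guard only avoids division by zero for states that never occur.

```python
def ml_estimate(x, pi, states, alphabet):
    """Maximum-likelihood HMM parameters from a labeled sequence.

    x  -- observed symbols x_1 .. x_N
    pi -- the known state path pi_1 .. pi_N (same length as x)
    """
    A = {k: {l: 0 for l in states} for k in states}    # transition counts A_kl
    E = {k: {b: 0 for b in alphabet} for k in states}  # emission counts E_k(b)
    for i in range(len(x)):
        E[pi[i]][x[i]] += 1
        if i + 1 < len(x):
            A[pi[i]][pi[i + 1]] += 1
    # a_kl = A_kl / sum_i A_ki ;  e_k(b) = E_k(b) / sum_c E_k(c)
    a = {k: {l: A[k][l] / max(1, sum(A[k].values())) for l in states}
         for k in states}
    e = {k: {b: E[k][b] / max(1, sum(E[k].values())) for b in alphabet}
         for k in states}
    return a, e
```

In practice one would add pseudocounts to these counts, so that transitions or emissions unseen in the training data do not get probability exactly zero.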
2. When the right answer is unknown

We don't know the true A_kl, E_k(b).

Idea:
- We estimate our "best guess" of what A_kl and E_k(b) are
- We update the parameters of the model based on our guess
- We repeat
2. When the right answer is unknown

Starting with our best guess of a model M with parameters θ:

Given x = x_1 … x_N for which the true path π = π_1 … π_N is unknown,
we can get to a provably more likely parameter set θ.

Principle: EXPECTATION MAXIMIZATION
1. Estimate A_kl, E_k(b) in the training data
2. Update θ according to A_kl, E_k(b)
3. Repeat 1 & 2, until convergence
Estimating new parameters

To estimate A_kl (all probabilities below are conditioned on θ):

At each position i of sequence x, find the probability that transition k→l is used:

P(π_i = k, π_{i+1} = l | x) = [1/P(x)] P(π_i = k, π_{i+1} = l, x_1 … x_N) = Q / P(x)

where

Q = P(x_1 … x_i, π_i = k, π_{i+1} = l, x_{i+1} … x_N)
  = P(π_{i+1} = l, x_{i+1} … x_N | π_i = k) P(x_1 … x_i, π_i = k)
  = P(π_{i+1} = l, x_{i+1} x_{i+2} … x_N | π_i = k) f_k(i)
  = P(x_{i+2} … x_N | π_{i+1} = l) P(x_{i+1} | π_{i+1} = l) P(π_{i+1} = l | π_i = k) f_k(i)
  = b_l(i+1) e_l(x_{i+1}) a_kl f_k(i)

So:

P(π_i = k, π_{i+1} = l | x, θ) = f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x | θ)
Estimating new parameters

So,

A_kl = Σ_i P(π_i = k, π_{i+1} = l | x, θ) = Σ_i f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x | θ)

Similarly,

E_k(b) = [1/P(x | θ)] Σ_{i : x_i = b} f_k(i) b_k(i)

(Figure: the transition k→l between positions i and i+1, with x_1 … x_i covered by the forward value f_k(i) and x_{i+2} … x_N by the backward value b_l(i+1).)
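The two sums translate directly into code. A minimal sketch, assuming the forward values f[i][k], backward values b[i][k], and likelihood px = P(x | θ) have already been computed; the dict-based representation and the function name are illustrative:

```python
def expected_counts(x, states, alphabet, a, e, f, b, px):
    """Expected transition counts A_kl and emission counts E_k(b),
    given forward values f[i][k], backward values b[i][k], and px = P(x)."""
    # A_kl = sum_i f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x)
    A = {k: {l: sum(f[i][k] * a[k][l] * e[l][x[i + 1]] * b[i + 1][l]
                    for i in range(len(x) - 1)) / px
             for l in states} for k in states}
    # E_k(b) = sum over positions i with x_i = b of f_k(i) b_k(i) / P(x)
    E = {k: {c: sum(f[i][k] * b[i][k] for i in range(len(x)) if x[i] == c) / px
             for c in alphabet} for k in states}
    return A, E
```

As a sanity check, for a one-state model every position is "in state k", so the expected counts reduce to plain counts: N−1 transitions and one emission per symbol.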
The Baum-Welch Algorithm

Initialization: Pick the best guess for the model parameters θ (or arbitrary values)

Iteration:
1. Forward
2. Backward
3. Calculate A_kl, E_k(b)
4. Calculate the new model parameters a_kl, e_k(b)
5. Calculate the new log-likelihood P(x | θ)
   (guaranteed to be higher, by Expectation-Maximization)

Until P(x | θ) does not change much
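The whole iteration can be sketched for a single training sequence. This is a minimal illustration under simplifying assumptions: the model lives in plain dicts, the initial distribution `init` is held fixed, and no rescaling is done, so it only works for short sequences (real implementations rescale f and b or work in log space to avoid underflow).

```python
def forward(x, states, init, a, e):
    """Forward values f_k(i) and the total likelihood P(x | theta)."""
    f = [{k: init[k] * e[k][x[0]] for k in states}]
    for i in range(1, len(x)):
        f.append({l: e[l][x[i]] * sum(f[-1][k] * a[k][l] for k in states)
                  for l in states})
    return f, sum(f[-1].values())

def backward(x, states, a, e):
    """Backward values b_k(i)."""
    b = [{k: 1.0 for k in states}]
    for i in range(len(x) - 2, -1, -1):
        b.insert(0, {k: sum(a[k][l] * e[l][x[i + 1]] * b[0][l] for l in states)
                     for k in states})
    return b

def baum_welch(x, states, alphabet, init, a, e, n_iter=10):
    """Re-estimate a and e; each pass cannot decrease P(x | theta)."""
    for _ in range(n_iter):
        f, px = forward(x, states, init, a, e)
        b = backward(x, states, a, e)
        # Expected transition counts A_kl and emission counts E_k(c)
        A = {k: {l: sum(f[i][k] * a[k][l] * e[l][x[i + 1]] * b[i + 1][l]
                        for i in range(len(x) - 1)) / px
                 for l in states} for k in states}
        E = {k: {c: sum(f[i][k] * b[i][k]
                        for i in range(len(x)) if x[i] == c) / px
                 for c in alphabet} for k in states}
        # Normalize the expected counts to get the new parameters
        a = {k: {l: A[k][l] / sum(A[k].values()) for l in states}
             for k in states}
        e = {k: {c: E[k][c] / sum(E[k].values()) for c in alphabet}
             for k in states}
    return a, e
```

Running a few iterations and recomputing P(x | θ) with `forward` is an easy way to observe the EM guarantee: the likelihood never goes down.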
The Baum-Welch Algorithm

Time complexity: (# iterations) × O(K²N)

Guaranteed to increase the log-likelihood P(x | θ)
Not guaranteed to find the globally best parameters: converges to a local optimum, depending on initial conditions
Too many parameters / too large a model: overtraining
Alternative: Viterbi Training

Initialization: Same

Iteration:
1. Perform Viterbi, to find π*
2. Calculate A_kl, E_k(b) according to π*, plus pseudocounts
3. Calculate the new parameters a_kl, e_k(b)

Until convergence

Notes:
- Convergence is guaranteed. Why? (Each iteration either improves the best path's score or leaves it unchanged, and there are finitely many paths, so the procedure cannot improve forever.)
- Does not maximize P(x | θ); instead maximizes P(x, π* | θ)
- In general, worse performance than Baum-Welch
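The loop above can be sketched as follows. This is an illustrative implementation, assuming a dict-based model: Viterbi runs in log space, counts are taken along the single best path π*, and pseudocounts keep every probability strictly positive.

```python
import math

def viterbi(x, states, init, a, e):
    """Most likely state path pi* (log-space Viterbi with backpointers)."""
    V = [{k: math.log(init[k]) + math.log(e[k][x[0]]) for k in states}]
    ptr = []
    for i in range(1, len(x)):
        row, back = {}, {}
        for l in states:
            best = max(states, key=lambda k: V[-1][k] + math.log(a[k][l]))
            row[l] = V[-1][best] + math.log(a[best][l]) + math.log(e[l][x[i]])
            back[l] = best
        V.append(row)
        ptr.append(back)
    path = [max(states, key=lambda k: V[-1][k])]
    for back in reversed(ptr):
        path.insert(0, back[path[0]])
    return path

def viterbi_training(x, states, alphabet, init, a, e, pseudo=1.0, max_iter=20):
    """Viterbi training: recount A_kl, E_k(b) along pi*, plus pseudocounts."""
    prev_path = None
    for _ in range(max_iter):
        path = viterbi(x, states, init, a, e)
        if path == prev_path:   # pi* repeated, so the counts cannot change
            break
        prev_path = path
        A = {k: {l: pseudo for l in states} for k in states}
        E = {k: {c: pseudo for c in alphabet} for k in states}
        for i, s in enumerate(path):
            E[s][x[i]] += 1
            if i + 1 < len(x):
                A[s][path[i + 1]] += 1
        a = {k: {l: A[k][l] / sum(A[k].values()) for l in states}
             for k in states}
        e = {k: {c: E[k][c] / sum(E[k].values()) for c in alphabet}
             for k in states}
    return a, e
```

Note the contrast with Baum-Welch: here each position contributes its count to exactly one state (the one on π*), rather than fractionally to all states in proportion to their posterior probability.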
Variants of HMMs
Higher-order HMMs

How do we model "memory" longer than one time point?

P(π_{i+1} = l | π_i = k)  →  a_kl  (first order)
P(π_{i+1} = l | π_i = k, π_{i-1} = j)  →  a_jkl  (second order)
…

A second-order HMM with K states is equivalent to a first-order HMM with K² states.

Example (states H, T): the second-order transitions a_HT(prev = H), a_HT(prev = T), a_TH(prev = H), a_TH(prev = T) between states H and T become the first-order transitions a_HHT, a_TTH, a_HTT, a_THH, a_THT, a_HTH between the pair-states HH, HT, TH, TT.
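The equivalence can be made concrete: given second-order transitions a2[j][k][l] = P(π_{i+1} = l | π_i = k, π_{i-1} = j), build first-order transitions over pair-states (j, k). A minimal sketch (the representation is illustrative):

```python
def second_to_first_order(states, a2):
    """Expand a second-order HMM over K states into a first-order HMM
    over the K^2 pair-states (previous, current)."""
    pairs = [(j, k) for j in states for k in states]
    a1 = {p: {q: 0.0 for q in pairs} for p in pairs}
    for (j, k) in pairs:
        for l in states:
            # Pair-state (j, k) can only move to (k, l): the "current"
            # state of the old pair becomes the "previous" of the new one.
            a1[(j, k)][(k, l)] = a2[j][k][l]
    return a1
```

Transitions that would be inconsistent (where the shared state does not match) get probability 0, so each pair-state's outgoing probabilities still sum to 1.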
Modeling the Duration of States

Consider two states X and Y, where X has self-loop probability p (leaving with probability 1−p) and Y has self-loop probability q (leaving with probability 1−q).

Length distribution of region X: geometric, with mean E[l_X] = 1/(1−p).

This is a significant disadvantage of HMMs. Several solutions exist for modeling different length distributions.
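The geometric claim is easy to check numerically: with self-loop probability p, the chain takes the self-loop m−1 times and then leaves, so P(l_X = m) = p^(m−1)(1−p). A small sketch (the truncation point 2000 is arbitrary; the tail beyond it is negligible for p = 0.9):

```python
def geometric_duration_pmf(p, m):
    """P(l_X = m): take the self-loop m-1 times (prob p each), then leave (prob 1-p)."""
    return p ** (m - 1) * (1 - p)

# Truncated numerical check of the mean for p = 0.9: E[l_X] = 1/(1-p) = 10
p = 0.9
mean = sum(m * geometric_duration_pmf(p, m) for m in range(1, 2000))
```

The shape is the real problem: a geometric distribution is strictly decreasing, so the single most likely duration is always 1, which is a poor fit for regions (such as genes or CpG islands) whose typical length is far from 1.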
Sol’n 1: Chain several states

Replace X with a chain of several copies of X (each with self-loop probability p) before the transition to Y.

Disadvantage: still very inflexible
l_X = C + geometric with mean 1/(1−p)
Sol’n 2: Negative binomial distribution

Chain n copies of state X, each with self-loop probability p and probability 1−p of following an arrow to the next state (the last copy exits to Y).

Duration in X: m turns, where
- during the first m−1 turns, exactly n−1 arrows to the next state are followed
- during the m-th turn, an arrow to the next state is followed

P(l_X = m) = (m−1 choose n−1) (1−p)^{n−1+1} p^{(m−1)−(n−1)} = (m−1 choose n−1) (1−p)^n p^{m−n}
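The formula can be checked numerically: the pmf should sum to 1, and the mean number of turns until the n-th "advance" (each with probability 1−p) is n/(1−p). A minimal sketch:

```python
from math import comb

def negbin_duration_pmf(m, n, p):
    """P(l_X = m) for n chained X states, each with self-loop probability p:
    choose which n-1 of the first m-1 turns advance, times (1-p)^n p^(m-n)."""
    if m < n:
        return 0.0
    return comb(m - 1, n - 1) * (1 - p) ** n * p ** (m - n)
```

Unlike the plain geometric case, for n > 1 this distribution has a mode above 1, so the chain of states buys a much more realistic duration shape at the cost of n times as many states.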
Example: genes in prokaryotes

EasyGene, the prokaryotic gene-finder of Larsen TS and Krogh A, uses a negative binomial duration distribution with n = 3.
Solution 3: Duration modeling

Upon entering a state:
1. Choose duration d, according to a probability distribution
2. Generate d letters according to the emission probabilities
3. Take a transition to the next state according to the transition probabilities

Disadvantage: increased complexity:
Time: O(D²)
Space: O(D)
where D = maximum duration of a state
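The three generative steps can be sketched for a single state visit. This is a hypothetical helper for illustration only (the function name and the dict-based distributions are assumptions, and real explicit-duration HMMs also need matching decoding algorithms, which is where the O(D²) cost arises):

```python
import random

def generate_state_visit(duration_pmf, emissions, transitions, rng):
    """One visit to a state in an explicit-duration HMM:
    1) draw a duration d, 2) emit d symbols, 3) pick the next state."""
    durations, weights = zip(*duration_pmf.items())
    d = rng.choices(durations, weights=weights)[0]       # step 1
    symbols = rng.choices(list(emissions),               # step 2
                          weights=list(emissions.values()), k=d)
    nxt = rng.choices(list(transitions),                 # step 3
                      weights=list(transitions.values()))[0]
    return symbols, nxt
```

Because d is drawn from an arbitrary pmf rather than implied by a self-loop, any duration distribution up to the maximum D can be represented exactly.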