Hidden Markov Models. Two learning scenarios 1.Estimation when the “right answer” is known Examples: GIVEN:a genomic region x = x 1 …x 1,000,000 where.

Hidden Markov Models

Two learning scenarios 1.Estimation when the “right answer” is known Examples: GIVEN:a genomic region x = x 1 …x 1,000,000 where we have good (experimental) annotations of the CpG islands GIVEN:the casino player allows us to observe him one evening, as he changes dice and produces 10,000 rolls 2.Estimation when the “right answer” is unknown Examples: GIVEN:the porcupine genome; we don’t know how frequent are the CpG islands there, neither do we know their composition GIVEN: 10,000 rolls of the casino player, but we don’t see when he changes dice QUESTION:Update the parameters  of the model to maximize P(x|  )

1.When the right answer is known Given x = x 1 …x N for which the true  =  1 …  N is known, Define: A kl = # times k  l transition occurs in  E k (b) = # times state k in  emits b in x We can show that the maximum likelihood parameters  are: A kl E k (b) a kl = ––––– e k (b) = –––––––  i A ki  c E k (c)

2.When the right answer is unknown We don’t know the true A kl, E k (b) Idea: We estimate our “best guess” on what A kl, E k (b) are We update the parameters of the model, based on our guess We repeat

2.When the right answer is unknown Starting with our best guess of a model M, parameters  : Given x = x 1 …x N for which the true  =  1 …  N is unknown, We can get to a provably more likely parameter set  Principle: EXPECTATION MAXIMIZATION 1.Estimate A kl, E k (b) in the training data 2.Update  according to A kl, E k (b) 3.Repeat 1 & 2, until convergence

Estimating new parameters To estimate A kl : (assume | , in all formulas below) At each position i of sequence x, find probability transition k  l is used: P(  i = k,  i+1 = l | x) = [1/P(x)]  P(  i = k,  i+1 = l, x 1 …x N ) = Q/P(x) where Q = P(x 1 …x i,  i = k,  i+1 = l, x i+1 …x N ) = = P(  i+1 = l, x i+1 …x N |  i = k) P(x 1 …x i,  i = k) = = P(  i+1 = l, x i+1 x i+2 …x N |  i = k) f k (i) = = P(x i+2 …x N |  i+1 = l) P(x i+1 |  i+1 = l) P(  i+1 = l |  i = k) f k (i) = = b l (i+1) e l (x i+1 ) a kl f k (i) f k (i) a kl e l (x i+1 ) b l (i+1) So: P(  i = k,  i+1 = l | x,  ) = –––––––––––––––––– P(x |  )

Estimating new parameters So, f k (i) a kl e l (x i+1 ) b l (i+1) A kl =  i P(  i = k,  i+1 = l | x,  ) =  i ––––––––––––––––– P(x |  ) Similarly, E k (b) = [1/P(x |  )]  {i | xi = b} f k (i) b k (i) kl x i+1 a kl e l (x i+1 ) b l (i+1)f k (i) x 1 ………x i-1 x i+2 ………x N xixi

The Baum-Welch Algorithm Initialization: Pick the best-guess for model parameters (or arbitrary) Iteration: 1.Forward 2.Backward 3.Calculate A kl, E k (b) 4.Calculate new model parameters a kl, e k (b) 5.Calculate new log-likelihood P(x |  ) GUARANTEED TO BE HIGHER BY EXPECTATION-MAXIMIZATION Until P(x |  ) does not change much

The Baum-Welch Algorithm Time Complexity: # iterations  O(K 2 N) Guaranteed to increase the log likelihood P(x |  ) Not guaranteed to find globally best parameters Converges to local optimum, depending on initial conditions Too many parameters / too large model:Overtraining

Alternative: Viterbi Training Initialization:Same Iteration: 1.Perform Viterbi, to find  * 2.Calculate A kl, E k (b) according to  * + pseudocounts 3.Calculate the new parameters a kl, e k (b) Until convergence Notes:  Convergence is guaranteed – Why?  Does not maximize P(x |  )  Maximizes P(x, | ,  * )  In general, worse performance than Baum-Welch

Variants of HMMs

Higher-order HMMs How do we model “memory” larger than one time point? P(  i+1 = l |  i = k)a kl P(  i+1 = l |  i = k,  i -1 = j)a jkl … A second order HMM with K states is equivalent to a first order HMM with K 2 states state Hstate T a HT (prev = H) a HT (prev = T) a TH (prev = H) a TH (prev = T) state HHstate HT state THstate TT a HHT a TTH a HTT a THH a THT a HTH

Modeling the Duration of States Length distribution of region X: E[l X ] = 1/(1-p) Geometric distribution, with mean 1/(1-p) This is a significant disadvantage of HMMs Several solutions exist for modeling different length distributions XY 1-p 1-q pq

Sol’n 1: Chain several states XY 1-p 1-q p q X X Disadvantage: Still very inflexible l X = C + geometric with mean 1/(1-p)

Sol’n 2: Negative binomial distribution Duration in X: m turns, where  During first m – 1 turns, exactly n – 1 arrows to next state are followed  During m th turn, an arrow to next state is followed m – 1 P(l X = m) = n – 1 (1 – p) n-1+1 p (m-1)-(n-1) = n – 1 (1 – p) n p m-n X p X X p 1 – p p …… Y 1 – p

Example: genes in prokaryotes EasyGene: Prokaryotic gene-finder Larsen TS, Krogh A Negative binomial with n = 3

Solution 3:Duration modeling Upon entering a state: 1.Choose duration d, according to probability distribution 2.Generate d letters according to emission probs 3.Take a transition to next state according to transition probs Disadvantage: Increase in complexity: Time: O(D 2 ) Space: O(D) where D = maximum duration of state X

Hidden Markov Models. Two learning scenarios 1.Estimation when the “right answer” is known Examples: GIVEN:a genomic region x = x 1 …x 1,000,000 where.

Similar presentations

Presentation on theme: "Hidden Markov Models. Two learning scenarios 1.Estimation when the “right answer” is known Examples: GIVEN:a genomic region x = x 1 …x 1,000,000 where."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Hidden Markov Models. Two learning scenarios 1.Estimation when the “right answer” is known Examples: GIVEN:a genomic region x = x 1 …x 1,000,000 where.

Similar presentations

Presentation on theme: "Hidden Markov Models. Two learning scenarios 1.Estimation when the “right answer” is known Examples: GIVEN:a genomic region x = x 1 …x 1,000,000 where."— Presentation transcript:

Similar presentations

About project

Feedback