1. Class 5: Hidden Markov Models
2. Sequence Models
- So far we examined several probabilistic sequence models
- These models, however, assumed that positions are independent
  - This means that the order of elements in the sequence did not play a role
- In this class we learn about probabilistic models of sequences
3. Probability of Sequences
- Fix an alphabet Σ
- Let X_1, …, X_n be a sequence of random variables over Σ
- We want to model P(X_1, …, X_n)
4. Markov Chains
- Assumption: X_{i+1} is independent of the past once we know X_i
- This allows us to write:
  P(X_1, …, X_n) = P(X_1) · ∏_{i=1}^{n-1} P(X_{i+1} | X_i)
5. Markov Chains (cont.)
- Assumption: P(X_{i+1} | X_i) is the same for all i
- Notation: P(X_{i+1} = b | X_i = a) = A_ab
- By specifying the matrix A and the initial probabilities, we define P(X_1, …, X_n)
- To avoid the special case of P(X_1), we can use a special start state s and denote P(X_1 = a) = A_sa (a small sketch follows)
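To make the notation concrete, here is a minimal sketch (not from the slides) of scoring a DNA sequence under a first-order Markov chain with a start state; the uniform transition values and the log_prob helper are illustrative assumptions only.

    import numpy as np

    alphabet = "ACGT"
    idx = {c: i for i, c in enumerate(alphabet)}

    # A[a, b] = P(X_{i+1} = b | X_i = a); start[a] = A_sa = P(X_1 = a).
    # Uniform placeholder values, for illustration only.
    A = np.full((4, 4), 0.25)
    start = np.full(4, 0.25)

    def log_prob(seq):
        """log P(x_1, ..., x_n) = log A_{s,x_1} + sum_i log A_{x_i, x_{i+1}}."""
        s = [idx[c] for c in seq]
        lp = np.log(start[s[0]])
        for a, b in zip(s, s[1:]):
            lp += np.log(A[a, b])
        return lp

    print(log_prob("ACGCGT"))   # about -8.32: six factors of 0.25 under the placeholder chain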
6. Example: CpG Islands
- In the human genome, CpG dinucleotides are relatively rare
- CpG pairs undergo a process called methylation that modifies the C nucleotide
- A methylated C can (with relatively high chance) mutate to a T
- Promoter regions are CpG-rich
  - These regions are not methylated, and thus mutate less often
- Such regions are called CpG islands
7. CpG Islands
- We construct one Markov chain for CpG-rich regions and one for CpG-poor regions
- Using maximum likelihood estimates from 60K nucleotides, we get two models
8. Ratio Test for CpG Islands
- Given a sequence X_1, …, X_n, we compute the log-likelihood ratio
  S(X_1, …, X_n) = log [ P(X_1, …, X_n | "+" model) / P(X_1, …, X_n | "-" model) ]
- If S > 0, the CpG-rich ("+") model explains the sequence better (see the sketch below)
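A minimal sketch of the ratio test. The two transition matrices below are placeholders (the actual maximum likelihood estimates from the 60K nucleotides are not reproduced here); the score is the sum of per-transition log odds.

    import numpy as np

    alphabet = "ACGT"
    idx = {c: i for i, c in enumerate(alphabet)}

    # Placeholder models: "+" (CpG-rich) boosts the C->G transition, "-" is uniform.
    A_minus = np.full((4, 4), 0.25)
    A_plus = A_minus.copy()
    A_plus[idx["C"]] = [0.15, 0.15, 0.55, 0.15]   # row still sums to 1

    def log_ratio(seq):
        """sum_i log( A+_{x_i,x_{i+1}} / A-_{x_i,x_{i+1}} ); positive favours the "+" model."""
        s = [idx[c] for c in seq]
        return sum(np.log(A_plus[a, b] / A_minus[a, b]) for a, b in zip(s, s[1:]))

    print(log_ratio("ACGCGCGA"))   # > 0: looks CpG-rich under the placeholder models
    print(log_ratio("ATTATTTA"))   # = 0 here, since the two models differ only on the C row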
9. Empirical Evaluation
10. Finding CpG Islands
- Simple-minded approach:
  - Pick a window of size N (N = 100, for example)
  - Compute the log-ratio for the sequence in the window, and classify based on that (see the sketch below)
- Problems:
  - How do we select N?
  - What do we do when the window intersects the boundary of a CpG island?
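A sketch of the simple-minded windowed scan just described. The log_odds table is a hypothetical stand-in for log(A+_{ab} / A-_{ab}) computed from the two estimated chains; windows with a positive score would be labelled island candidates.

    import numpy as np

    alphabet = "ACGT"
    idx = {c: i for i, c in enumerate(alphabet)}

    # Hypothetical per-transition log-odds table; only the C->G entry is boosted.
    log_odds = np.zeros((4, 4))
    log_odds[idx["C"], idx["G"]] = 0.8

    def window_scores(seq, N=100):
        """Log-ratio of every length-N window of seq (a window covers N-1 transitions)."""
        s = [idx[c] for c in seq]
        per_step = np.array([log_odds[a, b] for a, b in zip(s, s[1:])])
        c = np.concatenate(([0.0], np.cumsum(per_step)))
        # Window starting at position i covers transitions i .. i+N-2.
        return c[N - 1:] - c[:len(c) - N + 1]

    seq = "AT" * 100 + "CG" * 100 + "AT" * 100   # a toy island in the middle
    print(window_scores(seq).argmax())           # 200: the first window fully inside the CG block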
11. Alternative Approach
- Build a single model that includes "+" states and "-" states
- A state "remembers" the last nucleotide and the type of region
- A transition from a "-" state to a "+" state describes the start of a CpG island
12. Hidden Markov Models
- Two components:
- A Markov chain of hidden states H_1, …, H_n with L values
  - P(H_{i+1} = l | H_i = k) = A_kl
- Observations X_1, …, X_n
- Assumption: X_i depends only on the hidden state H_i
  - P(X_i = a | H_i = k) = B_ka
13. Semantics
- The joint probability of a hidden sequence and an observation sequence factors as
  P(x_1, …, x_n, h_1, …, h_n) = P(h_1) · ∏_{i=1}^{n-1} A_{h_i h_{i+1}} · ∏_{i=1}^{n} B_{h_i x_i}
  (with P(h_1) = A_{s h_1} when a start state is used)
14. Example: Dishonest Casino
- A casino occasionally switches between a fair die and a loaded die; we observe only the rolls, while the die in use stays hidden (a sketch of the model follows)
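The slides do not give the casino's numbers, so the sketch below assumes the usual textbook-style parameters (fair die uniform, loaded die favouring six, small switching probabilities) just to make the model concrete.

    import numpy as np

    # States: 0 = fair, 1 = loaded; observations: die faces stored as 0..5.
    # Assumed parameters for illustration; the slides do not specify them.
    A = np.array([[0.95, 0.05],          # A[k, l] = P(H_{i+1} = l | H_i = k)
                  [0.10, 0.90]])
    B = np.array([[1/6] * 6,             # fair die: uniform
                  [0.1] * 5 + [0.5]])    # loaded die: six comes up half the time
    start = np.array([0.5, 0.5])

    def sample(n, rng=np.random.default_rng(0)):
        """Generate (hidden states, observed rolls) of length n from the HMM."""
        h, x = [], []
        state = rng.choice(2, p=start)
        for _ in range(n):
            h.append(int(state))
            x.append(int(rng.choice(6, p=B[state])))
            state = rng.choice(2, p=A[state])
        return h, x

    hidden, rolls = sample(20)
    print(rolls)    # we only get to see the rolls; which die produced them is hidden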
15. Computing the Most Probable Sequence
- Given: x_1, …, x_n
- Output: h*_1, …, h*_n such that
  (h*_1, …, h*_n) = argmax_{h_1,…,h_n} P(h_1, …, h_n | x_1, …, x_n) = argmax_{h_1,…,h_n} P(x_1, …, x_n, h_1, …, h_n)
16. Idea: If we know the value of h_i, then the most probable sequence on positions i+1, …, n does not depend on the observations before time i
- Let V_i(l) be the probability of the best sequence h_1, …, h_i such that h_i = l:
  V_i(l) = max_{h_1,…,h_{i-1}} P(x_1, …, x_i, h_1, …, h_{i-1}, H_i = l)
17. Dynamic Programming Rule
- V_{i+1}(l) = B_{l x_{i+1}} · max_k [ V_i(k) · A_{kl} ]
- so the best predecessor of state l at time i+1 is
  P_{i+1}(l) = argmax_k [ V_i(k) · A_{kl} ]
18. Viterbi Algorithm
- Set V_0(0) = 1, V_0(l) = 0 for l > 0
- for i = 1, …, n
  - for l = 1, …, L
    - set V_i(l) = B_{l x_i} · max_k [ V_{i-1}(k) · A_{kl} ]
    - set P_i(l) = argmax_k [ V_{i-1}(k) · A_{kl} ]
- Let h*_n = argmax_l V_n(l)
- for i = n-1, …, 1, set h*_i = P_{i+1}(h*_{i+1}) (see the sketch below)
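A minimal sketch of the algorithm above, done in log space to avoid underflow, for an HMM given as a start vector, a transition matrix A[k, l] and an emission matrix B[k, a]; the tiny two-state example at the end is made up for illustration.

    import numpy as np

    def viterbi(x, start, A, B):
        """Most probable hidden sequence h*_1..h*_n for observations x (log-space Viterbi)."""
        n, L = len(x), len(start)
        logA, logB = np.log(A), np.log(B)
        V = np.full((n, L), -np.inf)          # V[i, l] = best log-probability of a path ending in l
        ptr = np.zeros((n, L), dtype=int)     # ptr[i, l] = best predecessor state of l at step i
        V[0] = np.log(start) + logB[:, x[0]]
        for i in range(1, n):
            scores = V[i - 1][:, None] + logA          # scores[k, l] = V[i-1, k] + log A_kl
            ptr[i] = scores.argmax(axis=0)
            V[i] = scores.max(axis=0) + logB[:, x[i]]
        # Traceback: h*_n = argmax_l V_n(l), then follow the pointers backwards.
        h = [int(V[-1].argmax())]
        for i in range(n - 1, 0, -1):
            h.append(int(ptr[i, h[-1]]))
        return h[::-1]

    # Toy two-state, two-symbol HMM (made-up numbers).
    A = np.array([[0.9, 0.1], [0.1, 0.9]])
    B = np.array([[0.8, 0.2], [0.2, 0.8]])
    start = np.array([0.6, 0.4])
    print(viterbi([0, 0, 1, 1, 1], start, A, B))   # -> [0, 0, 1, 1, 1]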
19. Computing Probabilities
- Given: x_1, …, x_n
- Output: P(x_1, …, x_n) = Σ_{h_1,…,h_n} P(x_1, …, x_n, h_1, …, h_n)
- How do we sum over an exponential number of hidden sequences?
20. Forward Algorithm
- Perform dynamic programming on prefixes of the sequence
- Let f_i(l) = P(x_1, …, x_i, H_i = l)
- Recursion rule:
  f_{i+1}(l) = B_{l x_{i+1}} · Σ_k f_i(k) · A_{kl}
- Conclusion:
  P(x_1, …, x_n) = Σ_l f_n(l)  (see the sketch below)
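A direct transcription of the recursion, kept in probability space for readability (a real implementation would rescale or work in log space); the toy parameters are made up.

    import numpy as np

    def forward(x, start, A, B):
        """f[i, l] = P(x_1..x_i, H_i = l); returns (f, P(x_1..x_n))."""
        n, L = len(x), len(start)
        f = np.zeros((n, L))
        f[0] = start * B[:, x[0]]
        for i in range(1, n):
            # f_i(l) = B_{l, x_i} * sum_k f_{i-1}(k) * A_{kl}
            f[i] = (f[i - 1] @ A) * B[:, x[i]]
        return f, f[-1].sum()

    A = np.array([[0.9, 0.1], [0.1, 0.9]])
    B = np.array([[0.8, 0.2], [0.2, 0.8]])
    start = np.array([0.6, 0.4])
    f, px = forward([0, 0, 1, 1, 1], start, A, B)
    print(px)    # P(x_1, ..., x_n), obtained by summing the last forward column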
21. Backward Algorithm
- Perform dynamic programming on suffixes of the sequence
- Let b_i(l) = P(x_{i+1}, …, x_n | H_i = l)
- Recursion rule:
  b_i(l) = Σ_k A_{lk} · B_{k x_{i+1}} · b_{i+1}(k)
- Conclusion:
  P(x_1, …, x_n) = Σ_l A_{sl} · B_{l x_1} · b_1(l)  (see the sketch below)
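The mirror-image sketch for the backward messages; combining them with the start probabilities recovers the same P(x_1, …, x_n) as the forward pass.

    import numpy as np

    def backward(x, start, A, B):
        """b[i, l] = P(x_{i+1}..x_n | H_i = l); returns (b, P(x_1..x_n))."""
        n, L = len(x), len(start)
        b = np.ones((n, L))                          # the empty suffix has probability 1
        for i in range(n - 2, -1, -1):
            # b_i(l) = sum_k A_{lk} * B_{k, x_{i+1}} * b_{i+1}(k)
            b[i] = A @ (B[:, x[i + 1]] * b[i + 1])
        return b, np.sum(start * B[:, x[0]] * b[0])

    A = np.array([[0.9, 0.1], [0.1, 0.9]])
    B = np.array([[0.8, 0.2], [0.2, 0.8]])
    start = np.array([0.6, 0.4])
    b, px = backward([0, 0, 1, 1, 1], start, A, B)
    print(px)    # matches the forward algorithm's P(x_1, ..., x_n)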
22. Computing Posteriors
- How do we compute P(H_i | x_1, …, x_n)?
- Combine the forward and backward messages:
  P(H_i = l | x_1, …, x_n) = f_i(l) · b_i(l) / P(x_1, …, x_n)
23. Dishonest Casino (again)
- Computing the posterior probability of the "fair" state at each point in a long sequence (a self-contained sketch follows)
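Putting the pieces together for the casino: a self-contained sketch that computes P(H_i = fair | rolls) at every position. The forward and backward passes mirror the sketches above, and the casino parameters are again assumed, not taken from the slides; for a truly long sequence the passes should be rescaled or done in log space to avoid underflow.

    import numpy as np

    # Assumed casino parameters (fair = 0, loaded = 1), as in the earlier sketch.
    A = np.array([[0.95, 0.05], [0.10, 0.90]])
    B = np.array([[1/6] * 6, [0.1] * 5 + [0.5]])
    start = np.array([0.5, 0.5])

    def posterior_fair(x):
        """P(H_i = fair | x_1..x_n) for each position i, via forward-backward."""
        n, L = len(x), 2
        f = np.zeros((n, L))
        b = np.ones((n, L))
        f[0] = start * B[:, x[0]]
        for i in range(1, n):
            f[i] = (f[i - 1] @ A) * B[:, x[i]]
        for i in range(n - 2, -1, -1):
            b[i] = A @ (B[:, x[i + 1]] * b[i + 1])
        px = f[-1].sum()
        return f[:, 0] * b[:, 0] / px          # P(H_i = 0 | x) = f_i(0) b_i(0) / P(x)

    rolls = [5, 5, 2, 5, 5, 5, 0, 1, 3, 2, 4, 1]   # faces stored as 0..5, so 5 means "six"
    print(np.round(posterior_fair(rolls), 2))      # lower where the sixes cluster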
24. Learning
- Given a sequence x_1, …, x_n together with the hidden states h_1, …, h_n
- How do we learn A_kl and B_ka?
- We want to find parameters that maximize the likelihood P(x_1, …, x_n, h_1, …, h_n)
- We simply count (see the sketch below):
  - N_kl - number of times h_i = k & h_{i+1} = l
  - N_ka - number of times h_i = k & x_i = a
- The maximum likelihood estimates are the normalized counts:
  A_kl = N_kl / Σ_{l'} N_{kl'},   B_ka = N_ka / Σ_{a'} N_{a'} (normalizing over a')
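A sketch of the counting estimator when the hidden sequence is observed; the toy data at the end is made up, and in practice pseudocounts would be added so that no row of counts is zero.

    import numpy as np

    def count_estimate(x, h, L, M):
        """ML estimates of A_kl and B_ka from a fully observed pair (x, h).
        L = number of hidden values, M = alphabet size."""
        N_trans = np.zeros((L, L))
        N_emit = np.zeros((L, M))
        for k, l in zip(h, h[1:]):
            N_trans[k, l] += 1            # N_kl: times h_i = k and h_{i+1} = l
        for k, a in zip(h, x):
            N_emit[k, a] += 1             # N_ka: times h_i = k and x_i = a
        A = N_trans / N_trans.sum(axis=1, keepdims=True)
        B = N_emit / N_emit.sum(axis=1, keepdims=True)
        return A, B

    h = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]
    x = [0, 0, 1, 1, 1, 0, 1, 1, 0, 1]
    A_hat, B_hat = count_estimate(x, h, L=2, M=2)
    print(A_hat)
    print(B_hat)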
25. Learning
- Given only the sequence x_1, …, x_n
- How do we learn A_kl and B_ka?
- We want to find parameters that maximize the likelihood P(x_1, …, x_n)
- Problem: the counts are inaccessible, since we do not observe h_i
26. If we have A_kl and B_ka, we can compute
  P(H_i = k, H_{i+1} = l | x_1, …, x_n) = f_i(k) · A_kl · B_{l x_{i+1}} · b_{i+1}(l) / P(x_1, …, x_n)
27. Expected Counts
- We can compute the expected number of times h_i = k & h_{i+1} = l:
  E[N_kl] = Σ_i P(H_i = k, H_{i+1} = l | x_1, …, x_n)
- Similarly,
  E[N_ka] = Σ_{i : x_i = a} P(H_i = k | x_1, …, x_n)
28. Expectation Maximization (EM)
- Choose initial A_kl and B_ka
- E-step: compute the expected counts E[N_kl], E[N_ka]
- M-step: re-estimate
  A_kl = E[N_kl] / Σ_{l'} E[N_{kl'}],   B_ka = E[N_ka] / Σ_{a'} E[N_{ka'}]
- Reiterate until convergence (a sketch of the full loop follows)
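A compact sketch of one possible implementation of the EM loop (Baum-Welch), reusing the forward and backward recursions from above. The start probabilities are kept fixed for simplicity, the toy data and initial parameters are made up, and the unscaled probabilities are fine for this short sequence but would underflow on long ones.

    import numpy as np

    def em_step(x, start, A, B):
        """One E-step + M-step; returns re-estimated (A, B) and the current P(x)."""
        n, L = len(x), len(start)
        f = np.zeros((n, L)); b = np.ones((n, L))
        f[0] = start * B[:, x[0]]
        for i in range(1, n):
            f[i] = (f[i - 1] @ A) * B[:, x[i]]
        for i in range(n - 2, -1, -1):
            b[i] = A @ (B[:, x[i + 1]] * b[i + 1])
        px = f[-1].sum()
        # E-step: expected transition and emission counts.
        EN_trans = np.zeros((L, L)); EN_emit = np.zeros_like(B)
        for i in range(n - 1):
            # P(H_i=k, H_{i+1}=l | x) = f_i(k) A_kl B_{l,x_{i+1}} b_{i+1}(l) / P(x)
            EN_trans += f[i][:, None] * A * (B[:, x[i + 1]] * b[i + 1])[None, :] / px
        post = f * b / px                    # post[i, k] = P(H_i = k | x)
        for i in range(n):
            EN_emit[:, x[i]] += post[i]
        # M-step: renormalize the expected counts.
        return (EN_trans / EN_trans.sum(1, keepdims=True),
                EN_emit / EN_emit.sum(1, keepdims=True), px)

    x = [0, 0, 1, 1, 1, 0, 0, 1]
    A = np.array([[0.6, 0.4], [0.4, 0.6]]); B = np.array([[0.7, 0.3], [0.4, 0.6]])
    start = np.array([0.5, 0.5])
    for _ in range(20):
        A, B, px = em_step(x, start, A, B)
    print(np.log(px))          # the log-likelihood increases from iteration to iteration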
29. EM - Basic Properties
- P(x_1, …, x_n : A'_kl, B'_ka) ≥ P(x_1, …, x_n : A_kl, B_ka), where A', B' are the re-estimated parameters
  - the likelihood grows in each iteration
- If P(x_1, …, x_n : A_kl, B_ka) = P(x_1, …, x_n : A'_kl, B'_ka), then A_kl, B_ka is a stationary point of the likelihood
  - either a local maximum, a minimum, or a saddle point
30. Complexity of E-step
- Compute forward and backward messages
  - time complexity O(nL^2), space complexity O(nL)
- Accumulate expected counts
  - time complexity O(nL^2), space complexity O(L^2)
31. EM - Problems
- Local maxima:
  - learning can get stuck in local maxima
  - sensitive to initialization
  - requires some method for escaping such maxima
- Choosing L:
  - we often do not know how many hidden values we should have or can learn