Hidden Markov Model Lecture #6
Background Readings: Chapters 3.1, 3.2 in the textbook Biological Sequence Analysis, Durbin et al., 2001.
Use of Markov Chains in Genome search: Modeling CpG Islands
In human genomes the pair CG often changes to (methyl-C)G, which in turn often mutates to TG. Hence the pair CG appears less frequently than expected from the independent frequencies of C and G alone. For biological reasons, this process is sometimes suppressed in short stretches of the genome, such as the start regions of many genes. These areas are called CpG islands (-C-phosphate-G-).
Example: CpG Island (Cont.)
We consider two questions (and some variants). Question 1: Given a short stretch of genomic data, does it come from a CpG island? We solved Question 1 by using two models for DNA strings: a “+” model and a “-” model, for strings with and without CpG islands. Each model was a Markov chain over the states {A,C,G,T} with appropriate transition probabilities.
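For concreteness, the usual way to apply the two models is a log-likelihood ratio score; the Python sketch below is only illustrative, and the transition values in PLUS and MINUS are arbitrary placeholders rather than empirically estimated CpG frequencies.

import math

# Placeholder transition matrices for the "+" (CpG island) and "-" (background) chains.
# Each row sums to 1; real values would be estimated from annotated genomic data.
PLUS = {
    'A': {'A': 0.20, 'C': 0.30, 'G': 0.30, 'T': 0.20},
    'C': {'A': 0.15, 'C': 0.35, 'G': 0.30, 'T': 0.20},
    'G': {'A': 0.15, 'C': 0.35, 'G': 0.35, 'T': 0.15},
    'T': {'A': 0.10, 'C': 0.35, 'G': 0.35, 'T': 0.20},
}
MINUS = {
    'A': {'A': 0.30, 'C': 0.20, 'G': 0.30, 'T': 0.20},
    'C': {'A': 0.30, 'C': 0.30, 'G': 0.10, 'T': 0.30},
    'G': {'A': 0.25, 'C': 0.25, 'G': 0.30, 'T': 0.20},
    'T': {'A': 0.20, 'C': 0.25, 'G': 0.30, 'T': 0.25},
}

def log_odds(seq):
    # Sum of log( m+_{x_{i-1} x_i} / m-_{x_{i-1} x_i} ); a positive score favours the "+" model.
    return sum(math.log(PLUS[a][b] / MINUS[a][b]) for a, b in zip(seq, seq[1:]))

print(round(log_odds("CGCGCGA"), 3))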
CpG Island: Question 2
Question 2: Given a long piece of genomic data, does it contain CpG islands, and where? To answer this question, we need to decide which parts of a given long sequence of letters are more likely to come from the “+” model, and which parts are more likely to come from the “-” model.
Model for question 2
Given a long genomic string with possible CpG islands, we define a Markov chain over 8 states {A+, C+, G+, T+, A-, C-, G-, T-}, all interconnected (hence the chain is ergodic). The problem is that we do not know the sequence of states that is traversed, only the sequence of letters. Therefore we use a Hidden Markov Model, which we define and study next.
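As a sketch of how the 8-state chain can be set up (state names follow the slide; the uniform transition values below are placeholders, since the real ones would again be estimated from data):

# The eight hidden states: each letter in a "+" (island) and a "-" (non-island) version.
STATES = ['A+', 'C+', 'G+', 'T+', 'A-', 'C-', 'G-', 'T-']

# Every state emits "its" letter with probability 1; only the +/- label is hidden.
def emission(state, letter):
    return 1.0 if state[0] == letter else 0.0

# All 8x8 transitions are allowed (the chain is ergodic); uniform placeholder values here.
TRANSITIONS = {s: {t: 1.0 / len(STATES) for t in STATES} for s in STATES}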
Hidden Markov Model
An HMM consists of: a Markov chain over a set of states, and, for each state s and symbol x, an emission probability p(X_i=x | S_i=s).
Notation:
Markov chain transition probabilities: p(S_{i+1}=t | S_i=s) = m_{st}
Emission probabilities: p(X_i=b | S_i=s) = e_s(b)
Hidden Markov Model
The probability of the chain S and the emitted letters X is:
p(s_1,…,s_L; x_1,…,x_L) = ∏_{i=1}^{L} m_{s_{i-1} s_i} e_{s_i}(x_i),
where m_{s_0 s_1} denotes the probability of the initial state s_1.
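A minimal numerical sketch of this product formula, on a toy two-state HMM (state names, symbols, and all probabilities below are invented placeholders):

# Toy HMM: states 's' and 't', symbols 'a' and 'b'.
START = {'s': 0.5, 't': 0.5}                 # m_{0,s_1}: probability of the initial state
M = {'s': {'s': 0.7, 't': 0.3},              # m_{st} = p(S_{i+1}=t | S_i=s)
     't': {'s': 0.4, 't': 0.6}}
E = {'s': {'a': 0.9, 'b': 0.1},              # e_s(b) = p(X_i=b | S_i=s)
     't': {'a': 0.2, 'b': 0.8}}

def joint(states, symbols):
    # p(s, x) = prod_i m_{s_{i-1} s_i} * e_{s_i}(x_i), with m_{s_0 s_1} taken from START.
    p, prev = 1.0, None
    for s, x in zip(states, symbols):
        p *= (START[s] if prev is None else M[prev][s]) * E[s][x]
        prev = s
    return p

print(joint(['s', 's', 't'], ['a', 'a', 'b']))   # 0.5*0.9 * 0.7*0.9 * 0.3*0.8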
Probability distribution defined by HMM
Claim: Let M=(m_{st}) and E_s=(e_s(b)) be given stochastic matrices. Then for each fixed L>0, the function p defined by p(s_1,…,s_L; x_1,…,x_L) = ∏_{i=1}^{L} m_{s_{i-1} s_i} e_{s_i}(x_i) is a probability distribution over all state/output sequence pairs of length L. That is:
∑_{s_1,…,s_L} ∑_{x_1,…,x_L} p(s_1,…,s_L; x_1,…,x_L) = 1
Probability distribution defined by HMM
Proof by induction on L. For L=1:
∑_{s_1} ∑_{x_1} p(s_1; x_1) = ∑_{s_1} m_{s_0 s_1} ∑_{x_1} e_{s_1}(x_1) = ∑_{s_1} m_{s_0 s_1} · 1 = 1,
since M and each E_s are stochastic.
Probability distribution defined by HMM
Induction step: Assume correctness for L, prove for L+1:
∑_{s_1,…,s_{L+1}} ∑_{x_1,…,x_{L+1}} p(s_1,…,s_{L+1}; x_1,…,x_{L+1})
= ∑_{s_1,…,s_L} ∑_{x_1,…,x_L} p(s_1,…,s_L; x_1,…,x_L) · ∑_{s_{L+1}} m_{s_L s_{L+1}} ∑_{x_{L+1}} e_{s_{L+1}}(x_{L+1})
The inner sum over x_{L+1} is 1 for every s_{L+1} (each E_s is stochastic), the sum over s_{L+1} is then 1 (M is stochastic), and the remaining sum is 1 by the induction hypothesis.
Independence properties in HMM
We would like the HMM to satisfy certain independence properties, e.g.:
1. The distribution of the state S_k is completely determined by the identity of the preceding state s_{k-1}.
2. The distribution of the emitted letter X_k is completely determined by the emitting state s_k.
In the next slides we formally prove that property 2 is implied by the probability distribution of the HMM which we just defined.
Independence of emission probabilities
Claim: The following equality holds:
p(X_k=x_k | x_1,…,x_{k-1}, x_{k+1},…,x_L, s_1,…,s_k,…,s_L) = e_{s_k}(x_k)
Independence of emission probabilities
Proof of claim: We use the definition of conditional probability, P(A|B) = P(A,B)/P(B). Note: p(A,B) denotes p(AB).
A is the event X_k=x_k (the k-th output is x_k).
B is the event which specifies the entire sequence except X_k: (X_1=x_1,…, X_{k-1}=x_{k-1}, X_{k+1}=x_{k+1},…, X_L=x_L, S_1=s_1,…, S_L=s_L).
(A,B) is the event (X_1=x_1,…, X_L=x_L, S_1=s_1,…, S_L=s_L).
Independence of emission probabilities
Proof (cont.): p(A,B) = p(x_1,…,x_L, s_1,…,s_L) = e_{s_k}(x_k) · [a product of factors that does not depend on x_k]. Summing over all possible values b of X_k and using ∑_b e_{s_k}(b) = 1 gives p(B) = p(A,B) / e_{s_k}(x_k).
Independence of emission probabilities
Proof (end): From the previous equalities we have p(B) = p(A,B)/e_{s_k}(x_k). Thus we conclude:
P(A|B) = P(A,B)/P(B) = e_{s_k}(x_k). QED
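The claim can also be checked numerically by brute-force enumeration on a toy HMM (an illustrative check only; the parameters are the same kind of invented placeholders as above):

from itertools import product

START = {'s': 0.5, 't': 0.5}
M = {'s': {'s': 0.7, 't': 0.3}, 't': {'s': 0.4, 't': 0.6}}
E = {'s': {'a': 0.9, 'b': 0.1}, 't': {'a': 0.2, 'b': 0.8}}

def joint(states, symbols):
    # p(s, x) = prod_i m_{s_{i-1} s_i} * e_{s_i}(x_i), with m_{s_0 s_1} taken from START.
    p, prev = 1.0, None
    for s, x in zip(states, symbols):
        p *= (START[s] if prev is None else M[prev][s]) * E[s][x]
        prev = s
    return p

L, k = 3, 1   # check position k (0-based) in all sequences of length 3
for states in product('st', repeat=L):
    for symbols in product('ab', repeat=L):
        p_ab = joint(states, symbols)                              # p(A,B)
        p_b = sum(joint(states, symbols[:k] + (b,) + symbols[k + 1:])
                  for b in 'ab')                                   # p(B): sum over X_k
        assert abs(p_ab / p_b - E[states[k]][symbols[k]]) < 1e-12  # equals e_{s_k}(x_k)

print("p(X_k = x_k | everything else) = e_{s_k}(x_k) in every case")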
Independence of emission probabilities
Exercise: Using the definition of conditional probability, P(A|B) = P(A,B)/P(B), prove formally that for any set of constraints B ⊆ {X_1=x_1,…, X_{i-1}=x_{i-1}, X_{i+1}=x_{i+1},…, X_L=x_L, S_1=s_1,…, S_i=s_i,…, S_L=s_L} such that “S_i=s_i” ∈ B, it holds that p(X_i=x_i | B) = e_{s_i}(x_i).
Hint: express the probabilities as sums of p(S,X) over all possible S and X.
Hidden Markov Model: three questions of interest
Given the “visible” sequence x = (x_1,…,x_L), find:
1. A most probable (hidden) path.
2. The probability of x.
3. For each i = 1,…,L and for each state k, the probability that s_i = k.
1. Most Probable state path
Given an output sequence x = (x_1,…,x_L), a most probable path s* = (s*_1,…,s*_L) is one which maximizes p(s|x).
Most Probable path (cont.)
Since p(x) does not depend on s, maximizing p(s|x) is equivalent to maximizing the joint probability p(s,x). To find an s which maximizes p(s,x) we use a dynamic-programming (DP) algorithm, called Viterbi’s algorithm, which we present next.
Viterbi’s algorithm for most probable path
The task: compute a most probable path, i.e. maximize p(s_1,…,s_L; x_1,…,x_L) over all state paths s. Let the states be {1,…,m}.
Idea: for i=1,…,L and for each state l, compute:
v_l(i) = the probability p(s_1,…,s_i; x_1,…,x_i | s_i=l) of a most probable path up to position i which ends in state l.
Viterbi’s algorithm for most probable path
v_l(i) = the probability p(s_1,…,s_i; x_1,…,x_i | s_i=l) of a most probable path up to position i which ends in state l. For i = 1,…,L and for each state l we have the recursion:
v_l(i) = e_l(x_i) · max_k { v_k(i-1) m_{kl} }
Viterbi’s algorithm
We add the special initial state 0.
Initialization: v_0(0) = 1, v_k(0) = 0 for k > 0.
For i=1 to L, for each state l:
v_l(i) = e_l(x_i) · max_k { v_k(i-1) m_{kl} }
ptr_i(l) = argmax_k { v_k(i-1) m_{kl} } [storing the previous state, for reconstructing the path]
Termination, result: p(s*_1,…,s*_L; x_1,…,x_L) = max_k { v_k(L) }
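A direct Python transcription of this pseudocode (a sketch; the toy START, M, E dictionaries are the same invented placeholders used earlier, with the special state 0 represented by the START distribution):

def viterbi(symbols, states, START, M, E):
    # Returns (a most probable path s*, its joint probability p(s*, x)).
    V = [{s: START[s] * E[s][symbols[0]] for s in states}]   # v_l(1)
    ptr = []                                                  # back-pointers ptr_i(l)
    for i in range(1, len(symbols)):
        row, back = {}, {}
        for l in states:
            # v_l(i) = e_l(x_i) * max_k v_k(i-1) * m_{kl}
            k_best = max(states, key=lambda k: V[-1][k] * M[k][l])
            row[l] = E[l][symbols[i]] * V[-1][k_best] * M[k_best][l]
            back[l] = k_best
        V.append(row)
        ptr.append(back)
    # Termination: p(s*, x) = max_k v_k(L); then trace the pointers backwards.
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for back in reversed(ptr):
        path.append(back[path[-1]])
    return list(reversed(path)), V[-1][last]

# Example (toy parameters as before):
# START = {'s': 0.5, 't': 0.5}
# M = {'s': {'s': 0.7, 't': 0.3}, 't': {'s': 0.4, 't': 0.6}}
# E = {'s': {'a': 0.9, 'b': 0.1}, 't': {'a': 0.2, 'b': 0.8}}
# print(viterbi("aab", "st", START, M, E))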
2. Computing p(x)
Given an output sequence x = (x_1,…,x_L), compute the probability that this sequence was generated by the given HMM:
p(x) = ∑_s p(s, x),
the summation taken over all state paths s generating x.
Forward algorithm for computing p(x)
The task: compute p(x) = ∑_s p(s, x).
Idea: for i=1,…,L and for each state l, compute F_l(i) = p(x_1,…,x_i; s_i=l), the probability of all the paths which emit (x_1,…,x_i) and end in state s_i=l. Use the recursive formula:
F_l(i) = e_l(x_i) · ∑_k F_k(i-1) m_{kl}
Forward algorithm for computing p(x)
Similar to Viterbi’s algorithm (use a sum instead of a maximum):
Initialization: F_0(0) := 1, F_k(0) := 0 for k > 0.
For i=1 to L, for each state l:
F_l(i) = e_l(x_i) · ∑_k F_k(i-1) m_{kl}
Result: p(x_1,…,x_L) = ∑_k F_k(L)
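The same skeleton in Python, with the maximum replaced by a sum (again a sketch using the toy parameters from the earlier examples):

def forward(symbols, states, START, M, E):
    # Returns p(x_1,...,x_L) = sum over all state paths s of p(s, x).
    F = {l: START[l] * E[l][symbols[0]] for l in states}      # F_l(1)
    for x in symbols[1:]:
        # F_l(i) = e_l(x_i) * sum_k F_k(i-1) * m_{kl}
        F = {l: E[l][x] * sum(F[k] * M[k][l] for k in states) for l in states}
    return sum(F.values())                                    # p(x) = sum_k F_k(L)

# print(forward("aab", "st", START, M, E))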
3. The distribution of S_i, given x
Given an output sequence x = (x_1,…,x_L), compute for each i=1,…,L and for each state k the probability that s_i = k. This helps answer queries such as: what is the probability that s_i is in a CpG island?
Solution in two stages:
1. For a fixed i and each state k, an algorithm to compute p(s_i=k | x_1,…,x_L).
2. An algorithm which performs this task for every i = 1,…,L, without repeating the first task L times.
Computing p(S_i=l | x_1,…,x_L) for a fixed i:
By the definition of conditional probability, p(S_i=l | x_1,…,x_L) = p(x_1,…,x_L, S_i=l) / p(x_1,…,x_L). The denominator p(x) is given by the forward algorithm, so it remains to compute the numerator p(x_1,…,x_L, S_i=l).
Computing p(S_i=l | x_1,…,x_L) for a fixed i (cont.)
p(x_1,…,x_L, S_i=l) = p(x_1,…,x_i, S_i=l) · p(x_{i+1},…,x_L | x_1,…,x_i, S_i=l)
(by the equality p(A,B) = p(A)·p(B|A)).
p(x_1,…,x_i, S_i=l) = F_l(i), which is computed by the forward algorithm.
B(s_i): The Backward algorithm
Recall: p(x_1,…,x_L, S_i=l) = p(x_1,…,x_i, S_i=l) · p(x_{i+1},…,x_L | x_1,…,x_i, S_i=l).
We are left with the task of computing B_l(i) ≡ p(x_{i+1},…,x_L | x_1,…,x_i, S_i=l) by the backward algorithm, and then we get the desired result:
p(x_1,…,x_L, S_i=l) = p(x_1,…,x_i, S_i=l) · p(x_{i+1},…,x_L | S_i=l) ≡ F_l(i)·B_l(i)
B(s_i): The Backward algorithm
From the probability distribution of the Hidden Markov Model and the definition of conditional probability:
B_l(i) = p(x_{i+1},…,x_L | x_1,…,x_i, S_i=l) = p(x_{i+1},…,x_L | S_i=l) = ∑_k m_{lk} e_k(x_{i+1}) · p(x_{i+2},…,x_L | S_{i+1}=k)
B(s_i): The backward algorithm (cont.)
Thus we compute B_l(i) from the values B_k(i+1) of all states k, using the backward recursion:
B_l(i) = ∑_k m_{lk} e_k(x_{i+1}) B_k(i+1)
B(s_i): The backward algorithm (end)
First step, step L-1: compute B_l(L-1) for each possible state l:
B_l(L-1) = ∑_k m_{lk} e_k(x_L)
Then, for i = L-2 down to 1 and for each possible state l, compute B_l(i) from the values B_k(i+1):
B_l(i) = ∑_k m_{lk} e_k(x_{i+1}) B_k(i+1)
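A Python sketch of the backward pass (same toy parameters as before; it starts from the equivalent convention B_l(L) = 1, whose first iteration reproduces exactly the step-(L-1) values above):

def backward(symbols, states, M, E):
    # Returns B with B[i][l] = p(x_{i+1},...,x_L | S_i = l), for i = 1,...,L (1-based).
    L = len(symbols)
    B = {L: {l: 1.0 for l in states}}                  # B_l(L) = 1 (empty suffix)
    for i in range(L - 1, 0, -1):
        # B_l(i) = sum_k m_{lk} * e_k(x_{i+1}) * B_k(i+1); note x_{i+1} is symbols[i] (0-based).
        B[i] = {l: sum(M[l][k] * E[k][symbols[i]] * B[i + 1][k] for k in states)
                for l in states}
    return B

# print(backward("aab", "st", M, E))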
The combined answer
1. To compute the probability that S_i=l, for all states l, given x=(x_1,…,x_L): run the forward algorithm to compute F_l(i) = p(x_1,…,x_i, S_i=l) and the backward algorithm to compute B_l(i) = p(x_{i+1},…,x_L | S_i=l). The product F_l(i)·B_l(i) = p(x_1,…,x_L, S_i=l); dividing it by p(x) = ∑_k F_k(L) gives p(S_i=l | x).
2. To compute these probabilities for every i, simply run the forward and backward algorithms once, storing F_l(i) and B_l(i) for all i and l, and then compute F_l(i)·B_l(i) for all i and l.
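Putting the two passes together (a sketch with the same toy parameters; the division by p(x) turns F_l(i)·B_l(i) into the conditional probability):

def posterior(symbols, states, START, M, E):
    # Returns P with P[i][l] = p(S_i = l | x_1,...,x_L), for i = 1,...,L.
    L = len(symbols)
    # Forward pass, storing the whole table F[i][l] = p(x_1,...,x_i, S_i = l).
    F = {1: {l: START[l] * E[l][symbols[0]] for l in states}}
    for i in range(2, L + 1):
        F[i] = {l: E[l][symbols[i - 1]] * sum(F[i - 1][k] * M[k][l] for k in states)
                for l in states}
    # Backward pass, storing B[i][l] = p(x_{i+1},...,x_L | S_i = l).
    B = {L: {l: 1.0 for l in states}}
    for i in range(L - 1, 0, -1):
        B[i] = {l: sum(M[l][k] * E[k][symbols[i]] * B[i + 1][k] for k in states)
                for l in states}
    px = sum(F[L].values())                              # p(x)
    return {i: {l: F[i][l] * B[i][l] / px for l in states} for i in range(1, L + 1)}

# print(posterior("aab", "st", START, M, E))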
Time and Space Complexity of the forward/backward algorithms
Time complexity is O(m²L), where m is the number of states: linear in the length of the chain, provided the number of states is constant. Space complexity is O(mL) for the dynamic-programming tables, plus O(m²) for the transition matrix.