
1. Hidden Markov Model, Lecture #6. Background readings: Chapters 3.1, 3.2 in the textbook, Biological Sequence Analysis, Durbin et al., 2001.

2. Reminder: Finite-State Markov Chain. An integer-time stochastic process, consisting of a domain D of m states {1,…,m} and:
1. An m-dimensional initial distribution vector (p(1),…,p(m)).
2. An m×m transition probabilities matrix M = (m_st).
For each integer L, a Markov chain assigns a probability to sequences (x_1,…,x_L) over D (i.e., x_i ∈ D) as follows:
P(x_1,…,x_L) = p(x_1) · ∏_{i=2..L} m_{x_{i-1} x_i}.
Similarly, (X_1,…,X_i,…) is a sequence of probability distributions over D.
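
To make the definition concrete, here is a minimal Python sketch that evaluates the probability a Markov chain assigns to a sequence; the state set and the uniform initial and transition values are illustrative placeholders, not parameters from the lecture.

```python
# Minimal sketch: the probability a finite-state Markov chain assigns to a sequence.
# The states and the uniform numbers below are illustrative placeholders.

states = ["A", "C", "G", "T"]
init_probs = {s: 0.25 for s in states}                       # initial distribution (p(1),...,p(m))
trans_probs = {s: {t: 0.25 for t in states} for s in states} # transition matrix M = (m_st)

def chain_probability(x, init, trans):
    """P(x_1,...,x_L) = p(x_1) * prod_{i=2..L} m_{x_{i-1} x_i}."""
    prob = init[x[0]]
    for prev, cur in zip(x, x[1:]):
        prob *= trans[prev][cur]
    return prob

print(chain_probability("ACGT", init_probs, trans_probs))    # 0.25 * 0.25**3
```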

3. Ergodic Markov Chains. A Markov chain is ergodic if:
1. the corresponding graph is strongly connected, and
2. it is not periodic.
The Fundamental Theorem of Finite-State Markov Chains: a Markov chain is ergodic iff
1. it has a unique stationary distribution vector V > 0 (V is an eigenvector of the transition matrix), and
2. for any initial distribution, the distributions X_i converge to V as i → ∞.
[Figure: a four-state transition graph over states A, B, C, D with transition probabilities on its edges.]
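
The convergence claim in the theorem can be checked numerically by applying the transition matrix to an initial distribution over and over; the two-state ergodic chain in this sketch is hypothetical.

```python
# Numerical check of the convergence claim: repeatedly applying the transition
# matrix to any initial distribution approaches the stationary vector V.
# The two-state ergodic chain below is hypothetical.

M = [[0.9, 0.1],
     [0.3, 0.7]]

def step(dist, M):
    """One update: (X_{i+1})_t = sum_s (X_i)_s * m_st."""
    return [sum(dist[s] * M[s][t] for s in range(len(M))) for t in range(len(M))]

dist = [1.0, 0.0]                    # any initial distribution works
for _ in range(100):
    dist = step(dist, M)
print(dist)                          # close to the stationary vector [0.75, 0.25]
```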

4. Use of Markov Chains in Genome Search: Modeling CpG Islands. In human genomes the pair CG often transforms to (methyl-C)G, which in turn often transforms to TG. Hence the pair CG appears less often than would be expected from the independent frequencies of C and G alone. For biological reasons, this process is sometimes suppressed in short stretches of the genome, such as in the start regions of many genes. These areas are called CpG islands (p denotes "pair").

5. Example: CpG Island (cont.). We consider two questions (and some variants). Question 1: given a short stretch of genomic data, does it come from a CpG island? We solved Question 1 by modeling strings with and without CpG islands as Markov chains over the same states {A,C,G,T} but with different transition probabilities:

6. Question 1: Using two Markov chains. The "+" model (for CpG islands): use the transition matrix M+ = (m+_st), where m+_st = the probability that t follows s inside a CpG island. Rows are X_{i-1}, columns are X_i:

X_{i-1} \ X_i    A       C          G          T
A                0.18    0.27       0.43       0.12
C                0.17    p+(C|C)    0.274      p+(T|C)
G                0.16    p+(C|G)    p+(G|G)    p+(T|G)
T                0.08    p+(C|T)    p+(G|T)    p+(T|T)

7. Question 1: Using two Markov chains. The "-" model (for non-CpG islands): use the transition matrix M- = (m-_st), where m-_st = the probability that t follows s outside a CpG island. Rows are X_{i-1}, columns are X_i:

X_{i-1} \ X_i    A       C          G          T
A                0.3     0.2        0.29       0.21
C                0.32    p-(C|C)    0.078      p-(T|C)
G                0.25    p-(C|G)    p-(G|G)    p-(T|G)
T                0.18    p-(C|T)    p-(G|T)    p-(T|T)
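
One common way to use the two chains for Question 1 is a log-odds score of the "+" model against the "-" model, in the spirit of Durbin et al. The sketch below assumes fully specified matrices; since several entries in the tables above are left symbolic, those entries are filled with made-up placeholder values (chosen so every row sums to 1), and only the numeric entries are taken from the tables.

```python
import math

# Hypothetical, fully specified "+" and "-" matrices. Numeric entries come from the
# tables above; the symbolic entries are replaced by made-up placeholders so that
# each row sums to 1.
plus = {
    "A": {"A": 0.18, "C": 0.27, "G": 0.43,  "T": 0.12},
    "C": {"A": 0.17, "C": 0.37, "G": 0.274, "T": 0.186},
    "G": {"A": 0.16, "C": 0.34, "G": 0.37,  "T": 0.13},
    "T": {"A": 0.08, "C": 0.36, "G": 0.38,  "T": 0.18},
}
minus = {
    "A": {"A": 0.30, "C": 0.20, "G": 0.29,  "T": 0.21},
    "C": {"A": 0.32, "C": 0.30, "G": 0.078, "T": 0.302},
    "G": {"A": 0.25, "C": 0.25, "G": 0.30,  "T": 0.20},
    "T": {"A": 0.18, "C": 0.24, "G": 0.29,  "T": 0.29},
}

def log_odds(x, plus, minus):
    """sum_i log(m+_{x_{i-1} x_i} / m-_{x_{i-1} x_i}); positive favours the CpG-island model."""
    return sum(math.log(plus[a][b] / minus[a][b]) for a, b in zip(x, x[1:]))

print(log_odds("CGCGCGCG", plus, minus) > 0)   # True: a CG-rich stretch scores as "+"
```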

8. CpG Island: Question 2. Question 2: given a long piece of genomic data, does it contain CpG islands, and where? To solve this question, we need to decide which parts of a given long sequence of letters are more likely to come from the "+" model, and which parts are more likely to come from the "-" model.

9. Model for Question 2. Given a long genomic string with possible CpG islands, we define a Markov chain over 8 states, all interconnected (hence it is ergodic): A+, C+, G+, T+ (inside an island) and A-, C-, G-, T- (outside an island). The problem is that we do not know the sequence of states which is traversed, only the sequence of letters. Therefore we use a Hidden Markov Model, which we define and study next.

10. Hidden Markov Model. An HMM consists of a Markov chain over a set of states together with, for each state s and symbol x, an emission probability p(X_i = x | S_i = s).
Notation:
Markov chain transition probabilities: p(S_{i+1} = t | S_i = s) = m_st.
Emission probabilities: p(X_i = b | S_i = s) = e_s(b).
[Figure: hidden states S_1, S_2, …, S_{L-1}, S_L linked by transition arrows M, each emitting an observed letter x_1, x_2, …, x_{L-1}, x_L.]

11. Hidden Markov Model. The probability of the chain S and the emitted letters X is:
p(s_1,…,s_L; x_1,…,x_L) = p(s_1) · e_{s_1}(x_1) · ∏_{i=2..L} m_{s_{i-1} s_i} · e_{s_i}(x_i).
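
A minimal sketch of this joint probability follows; the two-state HMM ("+"/"-") and its parameters are made up for illustration, and the dict names init, trans, and emit are my own (they reappear in the later sketches).

```python
# Minimal sketch: p(s, x) = p(s_1) e_{s_1}(x_1) * prod_{i=2..L} m_{s_{i-1} s_i} e_{s_i}(x_i).
# The two-state HMM below ("+" = CpG island, "-" = non-island) is illustrative only.

states = ["+", "-"]
init  = {"+": 0.5, "-": 0.5}
trans = {"+": {"+": 0.9, "-": 0.1}, "-": {"+": 0.1, "-": 0.9}}
emit  = {"+": {"A": 0.1,  "C": 0.4,  "G": 0.4,  "T": 0.1},
         "-": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}

def joint_probability(s, x, init, trans, emit):
    """Probability of the state path s together with the emitted letters x."""
    prob = init[s[0]] * emit[s[0]][x[0]]
    for i in range(1, len(x)):
        prob *= trans[s[i - 1]][s[i]] * emit[s[i]][x[i]]
    return prob

print(joint_probability("++--", "CGTA", init, trans, emit))
```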

12. Probability distribution defined by HMM. Claim: for each fixed L > 0, the function p defined above is a probability distribution over all state-and-letter sequences of length L. That is:
∑_{s_1,…,s_L} ∑_{x_1,…,x_L} p(s_1,…,s_L; x_1,…,x_L) = 1.

13. Probability distribution defined by HMM. Proof by induction on L. For L = 1:
∑_{s_1} ∑_{x_1} p(s_1) e_{s_1}(x_1) = ∑_{s_1} p(s_1) ∑_{x_1} e_{s_1}(x_1) = ∑_{s_1} p(s_1) · 1 = 1.

14. Probability distribution defined by HMM. Induction step: assume correctness for L, prove for L+1:
∑_{s_1,…,s_{L+1}} ∑_{x_1,…,x_{L+1}} p(s_1,…,s_{L+1}; x_1,…,x_{L+1})
= ∑_{s_1,…,s_L} ∑_{x_1,…,x_L} p(s_1,…,s_L; x_1,…,x_L) · ∑_{s_{L+1}} m_{s_L s_{L+1}} · ∑_{x_{L+1}} e_{s_{L+1}}(x_{L+1}).
The two inner sums each equal 1, and the remaining sum equals 1 by the induction hypothesis.

15. Independence properties in HMM. We would like the HMM to satisfy certain independence properties, e.g.:
1. The distribution of the state S_k is completely determined by the identity of the preceding state s_{k-1}.
2. The distribution of the emitted letter X_k is completely determined by the emitting state s_k.
In the next slides we formally prove that property 2 is implied by the HMM probability distribution we just defined.

16. Independence of emission probabilities. Claim: the following conditional independence holds:
p(X_k = x_k | x_1,…,x_{k-1}, x_{k+1},…,x_L, s_1,…,s_k,…,s_L) = p(X_k = x_k | S_k = s_k) = e_{s_k}(x_k).

17. Independence of emission probabilities. Proof of claim: we use the definition of conditional probability, p(A|B) = p(A,B)/p(B). Note: p(A,B) denotes p(A ∩ B).
A is the event X_k = x_k (the k-th output is x_k).
B is the event which specifies all of the sequence except X_k: (X_1 = x_1,…, X_{k-1} = x_{k-1}, X_{k+1} = x_{k+1},…, X_L = x_L, S_1 = s_1,…, S_L = s_L).
(A,B) is the event (X_1 = x_1,…, X_L = x_L, S_1 = s_1,…, S_L = s_L).

18. Independence of emission probabilities. Proof (cont.): by the HMM probability formula, p(A,B) = p(x_1,…,x_L, s_1,…,s_L) factors as e_{s_k}(x_k) times terms that do not involve x_k, and p(B) is obtained by summing p over all possible values of the k-th letter:
p(B) = ∑_{x'_k} p(x_1,…,x_{k-1}, x'_k, x_{k+1},…,x_L, s_1,…,s_L) = (p(A,B)/e_{s_k}(x_k)) · ∑_{x'_k} e_{s_k}(x'_k)

19. Independence of emission probabilities. Proof (end): from the previous equalities, and since ∑_{x'_k} e_{s_k}(x'_k) = 1, we have p(B) = p(A,B)/e_{s_k}(x_k). Thus we conclude: p(A|B) = p(A,B)/p(B) = e_{s_k}(x_k). QED

20. Independence of emission probabilities. Exercise: using the definition of conditional probability, p(A|B) = p(A,B)/p(B), prove formally that for any set of constraints
B ⊆ {X_1 = x_1,…, X_{i-1} = x_{i-1}, X_{i+1} = x_{i+1},…, X_L = x_L, S_1 = s_1,…, S_i = s_i,…, S_L = s_L}
such that "S_i = s_i" ∈ B, it holds that p(X_i = x_i | B) = e_{s_i}(x_i).
Hint: express the probabilities as sums of p(S,X) over all possible S and X.

21. Hidden Markov Model: three questions of interest. Given the "visible" sequence x = (x_1,…,x_L), find:
1. A most probable (hidden) path.
2. The probability of x.
3. For each i = 1,…,L and for each state k, the probability that s_i = k.

22. 1. Most probable state path. Given an output sequence x = (x_1,…,x_L), a most probable path s* = (s*_1,…,s*_L) is one which maximizes p(s|x).

23. Most probable path (cont.). Since p(s|x) = p(s,x)/p(x) and p(x) does not depend on s, maximizing p(s|x) is the same as maximizing p(s,x). So we need to find an s which maximizes p(s,x).

24. Viterbi's algorithm for most probable path. Let the states be {1,…,m}. The task: compute v_l(i), the probability p(s_1,…,s_i; x_1,…,x_i | s_i = l) of a most probable path up to position i which ends in state l. Idea: for i = 1,…,L and for each state l, compute v_l(i) from the values v_k(i-1), using the recursion on the next slide.

25. Viterbi's algorithm for most probable path. v_l(i) = the probability p(s_1,…,s_i; x_1,…,x_i | s_i = l) of a most probable path up to position i which ends in state l. For i = 1,…,L and for each state l we have:
v_l(i) = e_l(x_i) · max_k { v_k(i-1) · m_kl }.

26. Viterbi's algorithm. We add a special initial state 0.
Initialization: v_0(0) = 1, v_k(0) = 0 for k > 0.
For i = 1 to L do, for each state l:
v_l(i) = e_l(x_i) · max_k { v_k(i-1) · m_kl }
ptr_i(l) = argmax_k { v_k(i-1) · m_kl }   [storing the previous state for reconstructing the path]
Termination. Result: p(s*_1,…,s*_L; x_1,…,x_L) = max_k v_k(L).
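
A compact Python sketch of the algorithm above, written against the illustrative init/trans/emit dicts from the joint-probability sketch; 0-indexed tables take the place of the special initial state 0.

```python
# Sketch of Viterbi's algorithm. v[i][l] plays the role of v_l(i+1) in the slides;
# ptr[i-1][l] stores the best predecessor of state l at position i.

def viterbi(x, states, init, trans, emit):
    v = [{l: init[l] * emit[l][x[0]] for l in states}]   # paths of length 1
    ptr = []
    for i in range(1, len(x)):
        v.append({})
        ptr.append({})
        for l in states:
            best = max(states, key=lambda k: v[i - 1][k] * trans[k][l])
            ptr[i - 1][l] = best
            v[i][l] = emit[l][x[i]] * v[i - 1][best] * trans[best][l]
    # Termination: best final state, then follow the pointers backwards.
    last = max(states, key=lambda k: v[-1][k])
    path = [last]
    for back in reversed(ptr):
        path.append(back[path[-1]])
    return v[-1][last], list(reversed(path))

# Example, with the dicts from the joint-probability sketch:
# prob, path = viterbi("CGCGTATA", states, init, trans, emit)
```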

27. 2. Computing p(x). Given an output sequence x = (x_1,…,x_L), compute the probability that this sequence was generated by the given HMM:
p(x) = ∑_s p(s,x),
where the summation is taken over all state paths s generating x.

28. Forward algorithm for computing p(x). The task: compute p(x_1,…,x_L). Idea: for i = 1,…,L and for each state l, compute F_l(i) = p(x_1,…,x_i; s_i = l), the probability of all the paths which emit (x_1,…,x_i) and end in state s_i = l. Use the recursive formula:
F_l(i) = e_l(x_i) · ∑_k F_k(i-1) · m_kl.

29. Forward algorithm for computing p(x). Similar to Viterbi's algorithm (use a sum instead of a maximum):
Initialization: F_0(0) = 1, F_k(0) = 0 for k > 0.
For i = 1 to L do, for each state l:
F_l(i) = e_l(x_i) · ∑_k F_k(i-1) · m_kl
Result: p(x_1,…,x_L) = ∑_k F_k(L).
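
The corresponding forward sketch replaces the maximum in the Viterbi sketch with a sum, using the same illustrative parameter names.

```python
# Sketch of the forward algorithm; F[i][l] corresponds to F_l(i+1) in the slides.

def forward(x, states, init, trans, emit):
    F = [{l: init[l] * emit[l][x[0]] for l in states}]
    for i in range(1, len(x)):
        F.append({l: emit[l][x[i]] * sum(F[i - 1][k] * trans[k][l] for k in states)
                  for l in states})
    return F

# p(x) is the sum over the last column:
# p_x = sum(forward("CGCGTATA", states, init, trans, emit)[-1].values())
```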

30. 3. The distribution of S_i, given x. Given an output sequence x = (x_1,…,x_L), compute, for each i = 1,…,L and for each state k, the probability that s_i = k. This lets us answer queries such as: what is the probability that s_i is in a CpG island?

31. Solution in two stages. 1. For a fixed i and each state k, an algorithm to compute p(s_i = k | x_1,…,x_L). 2. An algorithm which performs this task for every i = 1,…,L without repeating the first task L times.

32. Computing for a single i. Since p(s_i | x_1,…,x_L) = p(x_1,…,x_L, s_i) / p(x_1,…,x_L), and the denominator does not depend on s_i, it suffices to compute p(x_1,…,x_L, s_i) for every state value s_i.

33. Computing for a single i. p(x_1,…,x_L, s_i) = p(x_1,…,x_i, s_i) · p(x_{i+1},…,x_L | x_1,…,x_i, s_i) (by the equality p(A,B) = p(A) p(B|A)). Here p(x_1,…,x_i, s_i) = f_{s_i}(i) ≡ F(s_i), which is computed by the forward algorithm we have already seen:

34. F(s_i): the forward algorithm. The algorithm computes F(s_i) = p(x_1,…,x_i, s_i) for i = 1,…,L.
Initialization: F(0) = 1.
For i = 1 to L do, for each state s_i:
F(s_i) = e_{s_i}(x_i) · ∑_{s_{i-1}} F(s_{i-1}) · m_{s_{i-1}, s_i}.

35. B(s_i): the backward algorithm. p(x_1,…,x_L, s_i) = p(x_1,…,x_i, s_i) · p(x_{i+1},…,x_L | x_1,…,x_i, s_i). We are left with the task of computing the backward quantity B(s_i) ≡ p(x_{i+1},…,x_L | x_1,…,x_i, s_i), which gives the desired result:
p(x_1,…,x_L, s_i) = p(x_1,…,x_i, s_i) · p(x_{i+1},…,x_L | s_i) ≡ F(s_i) · B(s_i).

36. B(s_i): the backward algorithm. From the probability distribution of the hidden Markov chain and the definition of conditional probability:
B(s_i) = p(x_{i+1},…,x_L | x_1,…,x_i, s_i) = p(x_{i+1},…,x_L | s_i)
= ∑_{s_{i+1}} m_{s_i, s_{i+1}} · e_{s_{i+1}}(x_{i+1}) · p(x_{i+2},…,x_L | s_{i+1}).

37. B(s_i): the backward algorithm (cont.). Since p(x_{i+2},…,x_L | s_{i+1}) = B(s_{i+1}), the backward algorithm computes B(s_i) from the values of B(s_{i+1}) over all states s_{i+1}:
B(s_i) = ∑_{s_{i+1}} m_{s_i, s_{i+1}} · e_{s_{i+1}}(x_{i+1}) · B(s_{i+1}).

38. B(s_i): the backward algorithm (end). First step, step L-1: compute B(s_{L-1}) for each possible state s_{L-1}:
B(s_{L-1}) = ∑_{s_L} m_{s_{L-1}, s_L} · e_{s_L}(x_L).
Then, for i = L-2 down to 1 and for each possible state s_i, compute B(s_i) from the values of B(s_{i+1}):
B(s_i) = ∑_{s_{i+1}} m_{s_i, s_{i+1}} · e_{s_{i+1}}(x_{i+1}) · B(s_{i+1}).
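
A matching sketch of the backward pass (0-indexed, with the last position initialized to 1 for every state), again using the illustrative parameter names from the earlier sketches.

```python
# Sketch of the backward algorithm; B[i][k] corresponds to B(s_i = k).

def backward(x, states, trans, emit):
    L = len(x)
    B = [dict() for _ in range(L)]
    B[L - 1] = {k: 1.0 for k in states}                  # base case at the last position
    for i in range(L - 2, -1, -1):                       # fill from right to left
        B[i] = {k: sum(trans[k][l] * emit[l][x[i + 1]] * B[i + 1][l] for l in states)
                for k in states}
    return B
```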

39. The combined answer.
1. To compute the probability that S_i = s_i given x = (x_1,…,x_L): run the forward algorithm to compute F(s_i) = p(x_1,…,x_i, s_i), run the backward algorithm to compute B(s_i) = p(x_{i+1},…,x_L | s_i); the product F(s_i)·B(s_i) = p(x_1,…,x_L, s_i) gives the answer (for every possible value s_i) after dividing by p(x_1,…,x_L).
2. To compute these probabilities for every i, simply run the forward and backward algorithms once, storing F(s_i) and B(s_i) for every i (and every value of s_i), and compute F(s_i)·B(s_i) for every i.
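
Combining the two passes gives the posterior state distribution at each position; this sketch reuses the forward and backward functions above and normalizes by p(x) = ∑_k F_k(L).

```python
# Sketch: posterior state probabilities p(S_i = k | x) = F_k(i) * B_k(i) / p(x).

def posterior(x, states, init, trans, emit):
    F = forward(x, states, init, trans, emit)
    B = backward(x, states, trans, emit)
    p_x = sum(F[-1].values())                            # probability of the whole sequence
    return [{k: F[i][k] * B[i][k] / p_x for k in states} for i in range(len(x))]

# Each entry is a distribution over states for one position, e.g. the probability
# that position i lies in a CpG island ("+" state) in the illustrative model.
```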

40. Time and space complexity of the forward/backward algorithms. Time complexity is O(m^2 L), where m is the number of states. It is linear in the length of the chain, provided the number of states is a constant. Space complexity is also O(m^2 L).

