
1 Viterbi, Forward, and Backward Algorithms for Hidden Markov Models Prof. Carolina Ruiz Computer Science Department Bioinformatics and Computational Biology Program WPI

2 Resources used for these slides

- Durbin, Eddy, Krogh, and Mitchison. "Biological Sequence Analysis". Cambridge University Press, 1998. Sections 3.1-3.3.
- Prof. Moran's Algorithms in Computational Biology course (Technion Univ.):
  - Ydo Wexler & Dan Geiger's Markov Chain Tutorial
  - Hidden Markov Models (HMMs) Tutorial

3 HMM: Coke/Pepsi Example

Hidden states:
- start: fake start state
- A: the prices of Coke and Pepsi are the same
- R: "Red sale": Coke is on sale (cheaper than Pepsi)
- B: "Blue sale": Pepsi is on sale (cheaper than Coke)

Emissions:
- C: Coke
- P: Pepsi

Transition probabilities p(next state | current state):

  from \ to    A     R     B
  start       0.6   0.1   0.3
  A           0.2   0.1   0.7
  R           0.1   0.1   0.8
  B           0.4   0.3   0.3

Emission probabilities p(drink | state):

  state    C     P
  A       0.6   0.4
  R       0.9   0.1
  B       0.5   0.5
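The following slides reuse these numbers repeatedly, so it helps to have them in executable form. Here is a minimal Python sketch encoding the model as dictionaries; this code is not part of the original slides, and the names `START`, `TRANS`, and `EMIT` are illustrative choices.

```python
# Coke/Pepsi HMM from the slide above, as plain Python dictionaries.
START = {"A": 0.6, "R": 0.1, "B": 0.3}           # p(first state | start)
TRANS = {"A": {"A": 0.2, "R": 0.1, "B": 0.7},    # p(next state | A)
         "R": {"A": 0.1, "R": 0.1, "B": 0.8},    # p(next state | R)
         "B": {"A": 0.4, "R": 0.3, "B": 0.3}}    # p(next state | B)
EMIT  = {"A": {"C": 0.6, "P": 0.4},              # p(emission | state)
         "R": {"C": 0.9, "P": 0.1},
         "B": {"C": 0.5, "P": 0.5}}

# Sanity check: each probability row sums to 1.
for row in (START, *TRANS.values(), *EMIT.values()):
    assert abs(sum(row.values()) - 1.0) < 1e-9
```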

4 1. Finding the most likely trajectory

Given an HMM and a sequence of observables x_1, x_2, …, x_L, determine the most likely sequence of states that generated x_1, x_2, …, x_L:

S* = (s*_1, s*_2, …, s*_L)
   = argmax_{s_1,…,s_L} p(s_1,…,s_L | x_1,…,x_L)
   = argmax_{s_1,…,s_L} p(s_1,…,s_L ; x_1,…,x_L) / p(x_1,…,x_L)
   = argmax_{s_1,…,s_L} p(s_1,…,s_L ; x_1,…,x_L)

The last step holds because p(x_1,…,x_L) is the same for every candidate state sequence, so dividing by it does not change the argmax.

5 = argmax_{s_1,…,s_L} p(s_1,…,s_L ; x_1,…,x_L)
  = argmax_{s_1,…,s_L} p(s_1,…,s_{L-1} ; x_1,…,x_{L-1}) p(s_L | s_{L-1}) p(x_L | s_L)

This inspires a recursive formulation of S*. Viterbi's idea: this can be calculated using dynamic programming. Let

v(k,t) = max p(s_1,…,s_{t-1}, s_t = k ; x_1,…,x_t)

with the max taken over s_1,…,s_{t-1}; that is, the probability of a most probable path up to time t that ends in state k. By the derivation above:

v(k,t) = max p(s_1,…,s_{t-1} ; x_1,…,x_{t-1}) p(s_t = k | s_{t-1}) p(x_t | s_t = k)
       = max_j v(j,t-1) p(s_t = k | s_{t-1} = j) p(x_t | s_t = k)
       = p(x_t | s_t = k) max_j v(j,t-1) p(s_t = k | s_{t-1} = j)
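As a concrete illustration of this recursion, here is a short Python sketch. It is not from the original slides; it assumes the `START`/`TRANS`/`EMIT` dictionaries shown after the model slide, and the function name `viterbi` is an illustrative choice.

```python
START = {"A": 0.6, "R": 0.1, "B": 0.3}
TRANS = {"A": {"A": 0.2, "R": 0.1, "B": 0.7},
         "R": {"A": 0.1, "R": 0.1, "B": 0.8},
         "B": {"A": 0.4, "R": 0.3, "B": 0.3}}
EMIT = {"A": {"C": 0.6, "P": 0.4},
        "R": {"C": 0.9, "P": 0.1},
        "B": {"C": 0.5, "P": 0.5}}

def viterbi(obs):
    """Fill v(k,t) column by column, remembering each cell's best parent."""
    states = list(START)
    v = [{k: START[k] * EMIT[k][obs[0]] for k in states}]   # t = 1 column
    parent = [{k: "start" for k in states}]
    for x in obs[1:]:
        prev, col, par = v[-1], {}, {}
        for k in states:
            # v(k,t) = p(x_t | k) * max_j v(j,t-1) p(k | j)
            j = max(states, key=lambda s: prev[s] * TRANS[s][k])
            col[k], par[k] = EMIT[k][x] * prev[j] * TRANS[j][k], j
        v.append(col)
        parent.append(par)
    return v, parent

v, parent = viterbi("CPC")
print(v[-1])   # {'A': 0.03024, 'R': 0.03402, 'B': 0.0189}, up to float rounding
```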

6 Viterbi's Algorithm - Example

Given: the Coke/Pepsi HMM, and the sequence of observations CPC.
Find the most likely path S* = (s*_1, s*_2, s*_3) that generated x_1, x_2, x_3 = CPC.

Initialization:

  v        t=0   x_1 = C   x_2 = P   x_3 = C
  start     1       0         0         0
  A         0
  R         0
  B         0

7 Viterbi's Algorithm - Example

Given: the Coke/Pepsi HMM, and the sequence of observations CPC.
Find the most likely path S* = (s*_1, s*_2, s*_3) that generated x_1, x_2, x_3 = CPC.

Column x_1 = C, using v(k,t) = p(x_t | s_t = k) max_j v(j,t-1) p(s_t = k | s_{t-1} = j):

  v(A,1) = p(C|A) max{ v(start,0) p(A|start), 0, 0, 0 } = 0.6 * 1 * 0.6 = 0.36   Parent: start
  v(R,1) = p(C|R) max{ v(start,0) p(R|start), 0, 0, 0 } = 0.9 * 1 * 0.1 = 0.09   Parent: start
  v(B,1) = p(C|B) max{ v(start,0) p(B|start), 0, 0, 0 } = 0.5 * 1 * 0.3 = 0.15   Parent: start

8 Viterbi's Algorithm - Example

Given: the Coke/Pepsi HMM, and the sequence of observations CPC.
Find the most likely path S* = (s*_1, s*_2, s*_3) that generated x_1, x_2, x_3 = CPC.

Column x_2 = P:

  v(A,2) = p(P|A) max{ v(start,1) p(A|start), v(A,1) p(A|A), v(R,1) p(A|R), v(B,1) p(A|B) }
         = 0.4 * max{ 0, 0.36*0.2, 0.09*0.1, 0.15*0.4 } = 0.4 * 0.072 = 0.0288   Parent: A
  v(R,2) = p(P|R) max{ v(start,1) p(R|start), v(A,1) p(R|A), v(R,1) p(R|R), v(B,1) p(R|B) }
         = 0.1 * max{ 0, 0.36*0.1, 0.09*0.1, 0.15*0.3 } = 0.1 * 0.045 = 0.0045   Parent: B
  v(B,2) = p(P|B) max{ v(start,1) p(B|start), v(A,1) p(B|A), v(R,1) p(B|R), v(B,1) p(B|B) }
         = 0.5 * max{ 0, 0.36*0.7, 0.09*0.8, 0.15*0.3 } = 0.5 * 0.252 = 0.126    Parent: A

9 Viterbi's Algorithm - Example

Given: the Coke/Pepsi HMM, and the sequence of observations CPC.
Find the most likely path S* = (s*_1, s*_2, s*_3) that generated x_1, x_2, x_3 = CPC.

Column x_3 = C:

  v(A,3) = p(C|A) max{ v(start,2) p(A|start), v(A,2) p(A|A), v(R,2) p(A|R), v(B,2) p(A|B) }
         = 0.6 * max{ 0, 0.0288*0.2, 0.0045*0.1, 0.126*0.4 } = 0.6 * 0.0504 = 0.03024   Parent: B
  v(R,3) = p(C|R) max{ v(start,2) p(R|start), v(A,2) p(R|A), v(R,2) p(R|R), v(B,2) p(R|B) }
         = 0.9 * max{ 0, 0.0288*0.1, 0.0045*0.1, 0.126*0.3 } = 0.9 * 0.0378 = 0.03402   Parent: B
  v(B,3) = p(C|B) max{ v(start,2) p(B|start), v(A,2) p(B|A), v(R,2) p(B|R), v(B,2) p(B|B) }
         = 0.5 * max{ 0, 0.0288*0.7, 0.0045*0.8, 0.126*0.3 } = 0.5 * 0.0378 = 0.0189    Parent: B

10 Viterbi's Algorithm - Example

Given: the Coke/Pepsi HMM, and the sequence of observations CPC.
Find the most likely path S* = (s*_1, s*_2, s*_3) that generated x_1, x_2, x_3 = CPC.

  v        t=0   x_1 = C               x_2 = P              x_3 = C
  start     1       0                     0                    0
  A         0    0.36 (Parent: start)  0.0288 (Parent: A)   0.03024 (Parent: B)
  R         0    0.09 (Parent: start)  0.0045 (Parent: B)   0.03402 (Parent: B)
  B         0    0.15 (Parent: start)  0.126  (Parent: A)   0.0189  (Parent: B)

Hence, the most likely path that generated CPC is: start A B R.
This maximum-likelihood path is extracted from the table as follows (see the traceback sketch below):
- The last state of the path is the one with the highest value in the right-most column.
- The previous state in the path is the one recorded as the Parent of the last.
- Keep following the Parent trail backwards until you arrive at start.
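The traceback itself takes only a few lines; a sketch using values hard-coded from the table above (the `parent` layout mirrors the table's Parent entries):

```python
# Right-most column and Parent pointers, copied from the table above.
v3 = {"A": 0.03024, "R": 0.03402, "B": 0.0189}
parent = [{"A": "start", "R": "start", "B": "start"},  # parents at t = 1
          {"A": "A", "R": "B", "B": "A"},              # parents at t = 2
          {"A": "B", "R": "B", "B": "B"}]              # parents at t = 3

path = [max(v3, key=v3.get)]            # highest value in the right-most column: R
for t in (2, 1, 0):                     # follow the Parent trail backwards
    path.insert(0, parent[t][path[0]])
print(path)                             # ['start', 'A', 'B', 'R']
```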

11 2. Calculating the probability of a sequence of observations

Given an HMM and a sequence of observations x_1, x_2, …, x_L, determine p(x_1, x_2, …, x_L):

p(x_1,…,x_L) = Σ_{s_1,…,s_L} p(s_1,…,s_L ; x_1,…,x_L)
             = Σ_{s_1,…,s_L} p(s_1,…,s_{L-1} ; x_1,…,x_{L-1}) p(s_L | s_{L-1}) p(x_L | s_L)

12 Let f(k,t) = p(s_t = k ; x_1,…,x_t), that is, the probability of emitting x_1,…,x_t and ending in state s_t = k. In other words, the sum of the probabilities of all the paths that emit x_1,…,x_t and end in state s_t = k.

f(k,t) = p(s_t = k ; x_1,…,x_t)
       = Σ_j p(s_{t-1} = j ; x_1,…,x_{t-1}) p(s_t = k | s_{t-1} = j) p(x_t | s_t = k)
       = p(x_t | s_t = k) Σ_j p(s_{t-1} = j ; x_1,…,x_{t-1}) p(s_t = k | s_{t-1} = j)
       = p(x_t | s_t = k) Σ_j f(j,t-1) p(s_t = k | s_{t-1} = j)
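The only change from the Viterbi sketch is that the max becomes a sum. A minimal version, again assuming the dictionaries from the model slide (the name `forward` is illustrative):

```python
START = {"A": 0.6, "R": 0.1, "B": 0.3}
TRANS = {"A": {"A": 0.2, "R": 0.1, "B": 0.7},
         "R": {"A": 0.1, "R": 0.1, "B": 0.8},
         "B": {"A": 0.4, "R": 0.3, "B": 0.3}}
EMIT = {"A": {"C": 0.6, "P": 0.4},
        "R": {"C": 0.9, "P": 0.1},
        "B": {"C": 0.5, "P": 0.5}}

def forward(obs):
    """f[t][k] = p(s_t = k ; x_1..x_t), built column by column."""
    f = [{k: START[k] * EMIT[k][obs[0]] for k in START}]    # t = 1 column
    for x in obs[1:]:
        prev = f[-1]
        # f(k,t) = p(x_t | k) * sum_j f(j,t-1) p(k | j)
        f.append({k: EMIT[k][x] * sum(prev[j] * TRANS[j][k] for j in prev)
                  for k in START})
    return f

f = forward("CPC")
print(sum(f[-1].values()))   # p(CPC) = 0.158304 (0.15831 on the slides)
```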

13 Forward Algorithm - Example

Given: the Coke/Pepsi HMM, and the sequence of observations CPC.
Find the probability that the HMM emits x_1, x_2, x_3 = CPC. That is, find p(CPC).

Initialization:

  f        t=0   x_1 = C   x_2 = P   x_3 = C
  start     1       0         0         0
  A         0
  R         0
  B         0

14 Forward Algorithm - Example

Given: the Coke/Pepsi HMM, and the sequence of observations CPC.
Find the probability that the HMM emits x_1, x_2, x_3 = CPC. That is, find p(CPC).

Column x_1 = C, using f(k,t) = p(x_t | s_t = k) Σ_j f(j,t-1) p(s_t = k | s_{t-1} = j):

  f(A,1) = p(C|A) [ f(start,0) p(A|start) + 0 + 0 + 0 ] = 0.6 * 1 * 0.6 = 0.36
  f(R,1) = p(C|R) [ f(start,0) p(R|start) + 0 + 0 + 0 ] = 0.9 * 1 * 0.1 = 0.09
  f(B,1) = p(C|B) [ f(start,0) p(B|start) + 0 + 0 + 0 ] = 0.5 * 1 * 0.3 = 0.15

15 Forward Algorithm - Example

Given: the Coke/Pepsi HMM, and the sequence of observations CPC.
Find the probability that the HMM emits x_1, x_2, x_3 = CPC. That is, find p(CPC).

Column x_2 = P:

  f(A,2) = p(P|A) [ f(start,1) p(A|start) + f(A,1) p(A|A) + f(R,1) p(A|R) + f(B,1) p(A|B) ]
         = 0.4 * (0 + 0.36*0.2 + 0.09*0.1 + 0.15*0.4) = 0.4 * 0.141 = 0.0564
  f(R,2) = p(P|R) [ f(start,1) p(R|start) + f(A,1) p(R|A) + f(R,1) p(R|R) + f(B,1) p(R|B) ]
         = 0.1 * (0 + 0.36*0.1 + 0.09*0.1 + 0.15*0.3) = 0.1 * 0.09 = 0.009
  f(B,2) = p(P|B) [ f(start,1) p(B|start) + f(A,1) p(B|A) + f(R,1) p(B|R) + f(B,1) p(B|B) ]
         = 0.5 * (0 + 0.36*0.7 + 0.09*0.8 + 0.15*0.3) = 0.5 * 0.369 = 0.1845

16 Forward Algorithm - Example

Given: the Coke/Pepsi HMM, and the sequence of observations CPC.
Find the probability that the HMM emits x_1, x_2, x_3 = CPC. That is, find p(CPC).

Column x_3 = C:

  f(A,3) = p(C|A) [ f(start,2) p(A|start) + f(A,2) p(A|A) + f(R,2) p(A|R) + f(B,2) p(A|B) ]
         = 0.6 * (0 + 0.0564*0.2 + 0.009*0.1 + 0.1845*0.4) = 0.6 * 0.08598 = 0.05159
  f(R,3) = p(C|R) [ f(start,2) p(R|start) + f(A,2) p(R|A) + f(R,2) p(R|R) + f(B,2) p(R|B) ]
         = 0.9 * (0 + 0.0564*0.1 + 0.009*0.1 + 0.1845*0.3) = 0.9 * 0.06189 = 0.05570
  f(B,3) = p(C|B) [ f(start,2) p(B|start) + f(A,2) p(B|A) + f(R,2) p(B|R) + f(B,2) p(B|B) ]
         = 0.5 * (0 + 0.0564*0.7 + 0.009*0.8 + 0.1845*0.3) = 0.5 * 0.10203 = 0.05102

17 Forward Algorithm - Example

Given: the Coke/Pepsi HMM, and the sequence of observations CPC.
Find the probability that the HMM emits x_1, x_2, x_3 = CPC. That is, find p(CPC).

  f        t=0   x_1 = C   x_2 = P   x_3 = C
  start     1       0         0         0
  A         0     0.36     0.0564    0.05159
  R         0     0.09     0.009     0.05570
  B         0     0.15     0.1845    0.05102

Hence, the probability of CPC being generated by this HMM is:
p(CPC) = Σ_j f(j,3) = 0.05159 + 0.05570 + 0.05102 = 0.15831
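The same total in code, using the unrounded right-most column (the slide's five-decimal cells sum to the same figure):

```python
# Unrounded right-most column of the Forward table.
f3 = {"A": 0.051588, "R": 0.055701, "B": 0.051015}
print(sum(f3.values()))   # 0.158304, which the slide rounds to 0.15831
```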

18 3. Calculating the probability of s_t = k given a sequence of observations

Given an HMM and a sequence of observations x_1, x_2, …, x_L, determine the probability that the state visited at time t was k: p(s_t = k | x_1,…,x_L), where 1 <= t <= L.

p(s_t = k | x_1,…,x_L) = p(x_1,…,x_L ; s_t = k) / p(x_1,…,x_L)

Note that p(x_1,…,x_L) can be found using the forward algorithm. We'll focus now on determining p(x_1,…,x_L ; s_t = k).

19 p(x_1,…,x_t,…,x_L ; s_t = k)
     = p(x_1,…,x_t ; s_t = k) p(x_{t+1},…,x_L | x_1,…,x_t ; s_t = k)
     = p(x_1,…,x_t ; s_t = k) p(x_{t+1},…,x_L | s_t = k)
     = f(k,t) b(k,t)

where f(k,t) is computed by the forward algorithm and b(k,t) by the backward algorithm:

b(k,t) = p(x_{t+1},…,x_L | s_t = k)
       = Σ_j p(s_{t+1} = j | s_t = k) p(x_{t+1} | s_{t+1} = j) p(x_{t+2},…,x_L | s_{t+1} = j)
       = Σ_j p(s_{t+1} = j | s_t = k) p(x_{t+1} | s_{t+1} = j) b(j,t+1)
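A sketch of this backward recursion in the same style, assuming the `TRANS`/`EMIT` dictionaries from the model slide; the table is filled right to left, starting from b(k,L) = 1:

```python
TRANS = {"A": {"A": 0.2, "R": 0.1, "B": 0.7},
         "R": {"A": 0.1, "R": 0.1, "B": 0.8},
         "B": {"A": 0.4, "R": 0.3, "B": 0.3}}
EMIT = {"A": {"C": 0.6, "P": 0.4},
        "R": {"C": 0.9, "P": 0.1},
        "B": {"C": 0.5, "P": 0.5}}

def backward(obs):
    """b[t-1][k] = b(k,t) = p(x_{t+1}..x_L | s_t = k), filled right to left."""
    b = [{k: 1.0 for k in TRANS}]              # initialization: b(k, L) = 1
    for x in reversed(obs[1:]):                # x_L, ..., x_2
        nxt = b[0]
        # b(k,t) = sum_j p(j | k) p(x_{t+1} | j) b(j, t+1)
        b.insert(0, {k: sum(TRANS[k][j] * EMIT[j][x] * nxt[j] for j in TRANS)
                     for k in TRANS})
    return b

b = backward("CPC")
print(b[0])   # b(k,1): {'A': 0.2813, 'R': 0.2919, 'B': 0.2051}
```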

20 Backward Algorithm - Example

Given: the Coke/Pepsi HMM, and the sequence of observations CPC.
Find the probability that the HMM emits x_{t+1},…,x_L given that s_t = k: p(x_{t+1},…,x_L | s_t = k).

Initialization: b(k,3) = 1 for each state k.

  b       x_1 = C   x_2 = P   x_3 = C
  A                               1
  R                               1
  B                               1

21 Backward Algorithm - Example

Given: the Coke/Pepsi HMM, and the sequence of observations CPC.
Find the probability that the HMM emits x_{t+1},…,x_L given that s_t = k: p(x_{t+1},…,x_L | s_t = k).

Column x_2 = P (i.e. b(k,2), which looks ahead to x_3 = C), using b(k,t) = Σ_j p(s_{t+1} = j | s_t = k) p(x_{t+1} | s_{t+1} = j) b(j,t+1):

  b(A,2) = Σ_j p(s_3 = j | A) p(C | s_3 = j) b(j,3)
         = p(A|A)p(C|A)b(A,3) + p(R|A)p(C|R)b(R,3) + p(B|A)p(C|B)b(B,3)
         = 0.2*0.6*1 + 0.1*0.9*1 + 0.7*0.5*1 = 0.56
  b(R,2) = Σ_j p(s_3 = j | R) p(C | s_3 = j) b(j,3)
         = p(A|R)p(C|A)b(A,3) + p(R|R)p(C|R)b(R,3) + p(B|R)p(C|B)b(B,3)
         = 0.1*0.6*1 + 0.1*0.9*1 + 0.8*0.5*1 = 0.55
  b(B,2) = Σ_j p(s_3 = j | B) p(C | s_3 = j) b(j,3)
         = p(A|B)p(C|A)b(A,3) + p(R|B)p(C|R)b(R,3) + p(B|B)p(C|B)b(B,3)
         = 0.4*0.6*1 + 0.3*0.9*1 + 0.3*0.5*1 = 0.66

22 Backward Algorithm - Example

Given: the Coke/Pepsi HMM, and the sequence of observations CPC.
Find the probability that the HMM emits x_{t+1},…,x_L given that s_t = k: p(x_{t+1},…,x_L | s_t = k).

Column x_1 = C (i.e. b(k,1), which looks ahead to x_2 = P):

  b(A,1) = Σ_j p(s_2 = j | A) p(P | s_2 = j) b(j,2)
         = p(A|A)p(P|A)b(A,2) + p(R|A)p(P|R)b(R,2) + p(B|A)p(P|B)b(B,2)
         = 0.2*0.4*0.56 + 0.1*0.1*0.55 + 0.7*0.5*0.66 = 0.2813
  b(R,1) = Σ_j p(s_2 = j | R) p(P | s_2 = j) b(j,2)
         = p(A|R)p(P|A)b(A,2) + p(R|R)p(P|R)b(R,2) + p(B|R)p(P|B)b(B,2)
         = 0.1*0.4*0.56 + 0.1*0.1*0.55 + 0.8*0.5*0.66 = 0.2919
  b(B,1) = Σ_j p(s_2 = j | B) p(P | s_2 = j) b(j,2)
         = p(A|B)p(P|A)b(A,2) + p(R|B)p(P|R)b(R,2) + p(B|B)p(P|B)b(B,2)
         = 0.4*0.4*0.56 + 0.3*0.1*0.55 + 0.3*0.5*0.66 = 0.2051

23 Backward Algorithm - Example

Given: the Coke/Pepsi HMM, and the sequence of observations CPC.
Find the probability that the HMM emits x_{t+1},…,x_L given that s_t = k: p(x_{t+1},…,x_L | s_t = k).

  b       x_1 = C   x_2 = P   x_3 = C
  A       0.2813     0.56        1
  R       0.2919     0.55        1
  B       0.2051     0.66        1

We can calculate the probability of CPC being generated by this HMM from the Backward table as follows:

p(CPC) = Σ_j b(j,1) p(j|start) p(C|j)
       = (0.2813*0.6*0.6) + (0.2919*0.1*0.9) + (0.2051*0.3*0.5) = 0.15831

though we can obtain the same probability from the Forward table (as we did on a previous slide).
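The same calculation in code, with the b(j,1) column and the start/emission probabilities copied in as literals:

```python
b1 = {"A": 0.2813, "R": 0.2919, "B": 0.2051}   # b(j,1) from the table above
start = {"A": 0.6, "R": 0.1, "B": 0.3}         # p(j | start)
emit_c = {"A": 0.6, "R": 0.9, "B": 0.5}        # p(C | j)
p_cpc = sum(b1[j] * start[j] * emit_c[j] for j in b1)
print(p_cpc)   # 0.158304, matching the Forward-table total
```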

24 3. (cont.) Using the Forward and Backward tables to calculate the probability of s_t = k given a sequence of observations

Example: Given the Coke/Pepsi HMM and the sequence of observations CPC, find the probability that the state visited at time 2 was B, that is, p(s_2 = B | CPC). In other words, given that the person drank CPC, what's the probability that Pepsi was on sale during the 2nd week?

Based on the calculations we did on the previous slides:

p(s_2 = B | CPC) = p(CPC ; s_2 = B) / p(CPC)
  = [ p(x_1=C, x_2=P ; s_2=B) p(x_3=C | x_1=C, x_2=P ; s_2=B) ] / p(x_1=C, x_2=P, x_3=C)
  = [ p(x_1=C, x_2=P ; s_2=B) p(x_3=C | s_2=B) ] / p(CPC)
  = [ f(B,2) b(B,2) ] / p(CPC)
  = [ 0.1845 * 0.66 ] / 0.15831 = 0.7692

Here, p(CPC) was calculated by summing up the last column of the Forward table. So there is a high probability that Pepsi was on sale during week 2, given that the person drank Pepsi that week!
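In code, the posterior is a single ratio of numbers already in the two tables (values copied from the slides; the variable names are illustrative):

```python
f_B2 = 0.1845       # f(B,2) from the Forward table
b_B2 = 0.66         # b(B,2) from the Backward table
p_cpc = 0.158304    # sum of the Forward table's last column
print(f_B2 * b_B2 / p_cpc)   # ~0.7692 = p(s_2 = B | CPC)
```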

