Inference in HMM Tutorial #6 © Ilan Gronau. Based on original slides of Ydo Wexler & Dan Geiger
2 Hidden Markov Models - HMM
[Diagram: hidden states H_1, H_2, …, H_i, …, H_{L-1}, H_L; observed data X_1, X_2, …, X_i, …, X_{L-1}, X_L]
3 Hidden Markov Models - HMM C-G Islands Example
[Diagram: hidden states H_1,…,H_L switch ("change") between a C-G island model and a Regular model, each defined over {A,C,G,T}; observed data X_1,…,X_L ∈ {A,C,G,T}]
4 Hidden Markov Models - HMM Coin-Tossing Example
[Diagram: hidden states H_i ∈ {Fair, Loaded}, observations X_i ∈ {Head, Tail}.
Transition probabilities: Start → Fair / Loaded with probability 1/2 each; Fair → Fair 0.9, Fair → Loaded 0.1; Loaded → Loaded 0.9, Loaded → Fair 0.1.
Emission probabilities: Fair emits H and T with probability 1/2 each; Loaded emits H with probability 3/4 and T with probability 1/4.]
5 Hidden Markov Models - HMM
Interesting inference queries:
Most probable state for a certain position: MAX_S { Pr[S_i = S | X_1,…,X_L] }
Most probable path through states: MAX_Š { Pr[S_1,…,S_L = Š | X_1,…,X_L] }
Likelihood of evidence: Pr[X_1,…,X_L | HMM]
[HMM diagram: hidden states S_1,…,S_L, observed data X_1,…,X_L]
6 Coin-Tossing Example Question
Query: what are the probabilities for Fair/Loaded coins given the set of tosses {X_1,…,X_L}?
1. Compute the posterior belief in S_i (for a specific i) given the evidence {X_1,…,X_L}, for each possible value of S_i:
   Pr[S_i = Loaded | X_1,…,X_L] and Pr[S_i = Fair | X_1,…,X_L]
2. Do this for every S_i without repeating the first task L times.
7 Decomposing the computation
Pr[X_1,…,X_L, S_i = S] = Pr[X_1,…,X_i, S_i = S] * Pr[X_{i+1},…,X_L | X_1,…,X_i, S_i = S]
  = Pr[X_1,…,X_i, S_i = S] * Pr[X_{i+1},…,X_L | S_i = S]   (Markov property)
  = f_i(S) * b_i(S)
Recall: Pr[S_i = S | X_1,…,X_L] = Pr[X_1,…,X_L, S_i = S] / Pr[X_1,…,X_L],
where Pr[X_1,…,X_L] = Σ_{S'} Pr[X_1,…,X_L, S_i = S']
8 The forward algorithm
The task: compute f_i(S) = Pr[X_1,…,X_i, S_i = S] for i = 1,…,L (consider evidence up to time slot i).
f_1(S) = Pr[X_1, S_1 = S] = Pr[S_1 = S] * Pr[X_1 | S_1 = S]   {Base step}
f_2(S) = Σ_{S'} Pr[X_1 X_2, S_1 = S', S_2 = S]   {2nd step}
  = Σ_{S'} Pr[X_1, S_1 = S'] * Pr[S_2 = S | X_1, S_1 = S'] * Pr[X_2 | X_1, S_1 = S', S_2 = S]
  = Σ_{S'} Pr[X_1, S_1 = S'] * Pr[S_2 = S | S_1 = S'] * Pr[X_2 | S_2 = S]   (only the direct transition and emission dependencies remain)
  = Σ_{S'} f_1(S') * P_trans[S' → S] * P_emit[S → X_2]
f_i(S) = Σ_{S'} f_{i-1}(S') * P_trans[S' → S] * P_emit[S → X_i]   {i-th step}
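A minimal Python sketch of this forward recursion (the slides contain no code; representing the model as start, trans, and emit dictionaries is an illustrative assumption):

def forward(obs, states, start, trans, emit):
    # f[i][S] = Pr[X_1..X_{i+1}, S_{i+1} = S]  (0-based position i)
    f = [{s: start[s] * emit[s][obs[0]] for s in states}]   # base step: f_1(S)
    for x in obs[1:]:
        prev = f[-1]
        # step: f_i(S) = sum_{S'} f_{i-1}(S') * P_trans[S'->S] * P_emit[S->X_i]
        f.append({s: sum(prev[sp] * trans[sp][s] for sp in states) * emit[s][x]
                  for s in states})
    return f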
9 The backward algorithm
The task: compute b_i(S) = Pr[X_{i+1},…,X_L | S_i = S] for i = 1,…,L (consider evidence after time slot i).
b_{L-1}(S) = Pr[X_L | S_{L-1} = S] = Σ_{S'} Pr[X_L, S_L = S' | S_{L-1} = S]   {Base step}
  = Σ_{S'} Pr[S_L = S' | S_{L-1} = S] * Pr[X_L | S_{L-1} = S, S_L = S']
  = Σ_{S'} P_trans[S → S'] * P_emit[S' → X_L]
b_i(S) = Σ_{S'} P_trans[S → S'] * P_emit[S' → X_{i+1}] * b_{i+1}(S')   {i-th step}
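The backward recursion in the same sketch style (again assuming the dictionary layout used for the forward sketch above):

def backward(obs, states, trans, emit):
    # b[i][S] = Pr[X_{i+2}..X_L | S_{i+1} = S]  (0-based position i)
    L = len(obs)
    b = [dict() for _ in range(L)]
    for s in states:
        b[L - 1][s] = 1.0                        # no evidence after the last slot
    for i in range(L - 2, -1, -1):
        # step: b_i(S) = sum_{S'} P_trans[S->S'] * P_emit[S'->X_{i+1}] * b_{i+1}(S')
        for s in states:
            b[i][s] = sum(trans[s][sp] * emit[sp][obs[i + 1]] * b[i + 1][sp]
                          for sp in states)
    return b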
10 The combined answer
1. Compute the posterior belief in S_i (for a specific i) given the evidence {X_1,…,X_L}, for each possible value of S_i.
   Answer: run the forward and backward algorithms to obtain f_i(S) and b_i(S).
2. Do this for every S_i without repeating the first task L times.
   Answer: run the forward algorithm once up to f_L(S) and the backward algorithm once down to b_1(S); all intermediate values f_i(S), b_i(S) are saved on the way.
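A sketch of the combined answer, built on the forward and backward functions assumed above: one pass in each direction gives the posterior for every position.

def posteriors(obs, states, start, trans, emit):
    # Pr[S_i = S | X_1..X_L] for every position i, from one forward and one backward pass
    f = forward(obs, states, start, trans, emit)
    b = backward(obs, states, trans, emit)
    likelihood = sum(f[-1][s] for s in states)   # since b_L(S) = 1
    return [{s: f[i][s] * b[i][s] / likelihood for s in states}
            for i in range(len(obs))]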
11 Likelihood of evidence
Likelihood of evidence: Pr[X_1,…,X_L | HMM] = Pr[X_1,…,X_L] = ?
Pr[X_1,…,X_L] = Σ_S Pr[X_1,…,X_L, S_i = S] = Σ_S f_i(S) * b_i(S)
You should get the same value no matter which i you choose.
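A small sketch that computes the likelihood and checks that every choice of i gives the same value (same assumed helpers as above):

def likelihood(obs, states, start, trans, emit):
    f = forward(obs, states, start, trans, emit)
    b = backward(obs, states, trans, emit)
    per_i = [sum(f[i][s] * b[i][s] for s in states) for i in range(len(obs))]
    assert all(abs(v - per_i[0]) < 1e-12 for v in per_i)   # identical for every i
    return per_i[0]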
12 Coin-Tossing HMM Numeric example
Outcome of 3 tosses: Head, Head, Tail
Recall: f_i(S) = Pr[X_1,…,X_i, S_i = S] = Σ_{S'} f_{i-1}(S') * P_trans[S' → S] * P_emit[S → X_i]
Forward – 1st step:
Pr[X_1 = H, S_1 = Loaded] = Pr[S_1 = Loaded] * Pr[Loaded → H] = 0.5 * 0.75 = 0.375
Pr[X_1 = H, S_1 = Fair] = Pr[S_1 = Fair] * Pr[Fair → H] = 0.5 * 0.5 = 0.25
13 Coin-Tossing HMM Forward algorithm
Outcome of 3 tosses: Head, Head, Tail
Recall: f_i(S) = Pr[X_1,…,X_i, S_i = S] = Σ_{S'} f_{i-1}(S') * P_trans[S' → S] * P_emit[S → X_i]
Forward – 1st step:
Pr[X_1 = H, S_1 = Loaded] = Pr[S_1 = Loaded] * Pr[Loaded → H] = 0.5 * 0.75 = 0.375
Pr[X_1 = H, S_1 = Fair] = Pr[S_1 = Fair] * Pr[Fair → H] = 0.5 * 0.5 = 0.25
Forward – 2nd step:
Pr[X_1 X_2 = HH, S_2 = Loaded] = Pr[X_1 = H, S_1 = Loaded] * Pr[Loaded → Loaded] * Pr[Loaded → H] + Pr[X_1 = H, S_1 = Fair] * Pr[Fair → Loaded] * Pr[Loaded → H]
  = 0.375 * 0.9 * 0.75 + 0.25 * 0.1 * 0.75 = 0.271875
Pr[X_1 X_2 = HH, S_2 = Fair] = Pr[X_1 = H, S_1 = Loaded] * Pr[Loaded → Fair] * Pr[Fair → H] + Pr[X_1 = H, S_1 = Fair] * Pr[Fair → Fair] * Pr[Fair → H]
  = 0.375 * 0.1 * 0.5 + 0.25 * 0.9 * 0.5 = 0.13125
14 Coin-Tossing HMM Forward algorithm
Outcome of 3 tosses: Head, Head, Tail
Recall: f_i(S) = Pr[X_1,…,X_i, S_i = S] = Σ_{S'} f_{i-1}(S') * P_trans[S' → S] * P_emit[S → X_i]
Forward – 1st step:
Pr[X_1 = H, S_1 = Loaded] = 0.375      Pr[X_1 = H, S_1 = Fair] = 0.25
Forward – 2nd step:
Pr[X_1 X_2 = HH, S_2 = Loaded] = 0.271875      Pr[X_1 X_2 = HH, S_2 = Fair] = 0.13125
Forward – 3rd step:
Pr[X_1 X_2 X_3 = HHT, S_3 = Loaded] = Pr[X_1 X_2 = HH, S_2 = Loaded] * Pr[Loaded → Loaded] * Pr[Loaded → T] + Pr[X_1 X_2 = HH, S_2 = Fair] * Pr[Fair → Loaded] * Pr[Loaded → T]
  = 0.271875 * 0.9 * 0.25 + 0.13125 * 0.1 * 0.25 = 0.064453125
Pr[X_1 X_2 X_3 = HHT, S_3 = Fair] = Pr[X_1 X_2 = HH, S_2 = Loaded] * Pr[Loaded → Fair] * Pr[Fair → T] + Pr[X_1 X_2 = HH, S_2 = Fair] * Pr[Fair → Fair] * Pr[Fair → T]
  = 0.271875 * 0.1 * 0.5 + 0.13125 * 0.9 * 0.5 = 0.07265625
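The coin-tossing HMM written in the hypothetical dictionary layout assumed earlier; running the forward sketch on H, H, T reproduces the three steps above (up to floating-point rounding):

states = ['F', 'L']                                   # Fair / Loaded
start  = {'F': 0.5,  'L': 0.5}
trans  = {'F': {'F': 0.9,  'L': 0.1},  'L': {'F': 0.1,  'L': 0.9}}
emit   = {'F': {'H': 0.5,  'T': 0.5},  'L': {'H': 0.75, 'T': 0.25}}

f = forward(['H', 'H', 'T'], states, start, trans, emit)
# f[0]: F ≈ 0.25,        L ≈ 0.375
# f[1]: F ≈ 0.13125,     L ≈ 0.271875
# f[2]: F ≈ 0.07265625,  L ≈ 0.064453125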
15 Coin-Tossing HMM Backward algorithm
Outcome of 3 tosses: Head, Head, Tail
Recall: b_i(S) = Pr[X_{i+1},…,X_L | S_i = S] = Σ_{S'} P_trans[S → S'] * P_emit[S' → X_{i+1}] * b_{i+1}(S')
Backward – 1st step:
Pr[X_3 = T | S_2 = Loaded] = Pr[Loaded → Loaded] * Pr[Loaded → T] + Pr[Loaded → Fair] * Pr[Fair → T] = 0.9 * 0.25 + 0.1 * 0.5 = 0.275
Pr[X_3 = T | S_2 = Fair] = Pr[Fair → Loaded] * Pr[Loaded → T] + Pr[Fair → Fair] * Pr[Fair → T] = 0.1 * 0.25 + 0.9 * 0.5 = 0.475
Backward – 2nd step:
Pr[X_2 X_3 = HT | S_1 = Loaded] = Pr[Loaded → Loaded] * Pr[Loaded → H] * Pr[X_3 = T | S_2 = Loaded] + Pr[Loaded → Fair] * Pr[Fair → H] * Pr[X_3 = T | S_2 = Fair]
  = 0.9 * 0.75 * 0.275 + 0.1 * 0.5 * 0.475 = 0.209375
Pr[X_2 X_3 = HT | S_1 = Fair] = Pr[Fair → Loaded] * Pr[Loaded → H] * Pr[X_3 = T | S_2 = Loaded] + Pr[Fair → Fair] * Pr[Fair → H] * Pr[X_3 = T | S_2 = Fair]
  = 0.1 * 0.75 * 0.275 + 0.9 * 0.5 * 0.475 = 0.234375
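Similarly, the backward sketch reproduces these values with the coin HMM dictionaries defined above:

b = backward(['H', 'H', 'T'], states, trans, emit)
# b[2]: F = 1.0,         L = 1.0
# b[1]: F ≈ 0.475,       L ≈ 0.275
# b[0]: F ≈ 0.234375,    L ≈ 0.209375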
16 Coin-Tossing HMM Likelihood of evidence
Outcome of 3 tosses: Head, Head, Tail
Recall: likelihood = Pr[X_1,…,X_L] = Σ_S f_i(S) * b_i(S), for any choice of i.
Forward:  f_1(L) = 0.375, f_1(F) = 0.25;  f_2(L) = 0.271875, f_2(F) = 0.13125;  f_3(L) = 0.064453125, f_3(F) = 0.07265625
Backward: b_1(L) = 0.209375, b_1(F) = 0.234375;  b_2(L) = 0.275, b_2(F) = 0.475;  b_3(L) = b_3(F) = 1
Likelihood:
i = 3: 0.064453125 * 1 + 0.07265625 * 1 = 0.137109375
i = 2: 0.271875 * 0.275 + 0.13125 * 0.475 = 0.137109375
i = 1: 0.375 * 0.209375 + 0.25 * 0.234375 = 0.137109375
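The likelihood sketch from before gives the same number on the coin HMM:

likelihood(['H', 'H', 'T'], states, start, trans, emit)   # ≈ 0.137109375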
17 Most Probable Path
Likelihood of evidence: Pr[X_1,…,X_L] = Σ_Š Pr[X_1,…,X_L, S_1,…,S_L = Š]
We wish to compute: Pr*[X_1,…,X_L] = MAX_Š { Pr[X_1,…,X_L, S_1,…,S_L = Š] }
and the most probable path leading to this value: S_1*,…,S_L* = ARGMAX_Š { Pr[X_1,…,X_L, S_1,…,S_L = Š] }
18 Most Probable Path – Revisiting likelihood calculation
Pr[X_1, X_2, X_3] = Σ_{S'} Pr[S_1 = S'] * Pr[S' → X_1] * b_1(S')
  = Σ_{S'} Pr[S_1 = S'] * Pr[S' → X_1] * Σ_{S''} Pr[S' → S''] * Pr[S'' → X_2] * b_2(S'')
  = Σ_{S'} Pr[S_1 = S'] * Pr[S' → X_1] * Σ_{S''} Pr[S' → S''] * Pr[S'' → X_2] * Σ_{S'''} Pr[S'' → S'''] * Pr[S''' → X_3]
  = Σ_{S' S'' S'''} Pr[X_1 X_2 X_3, S_1 S_2 S_3 = S' S'' S''']
19 Most Probable Path
Pr*[X_1, X_2, X_3] = MAX_{S'} { Pr[S_1 = S'] * Pr[S' → X_1] * v_1(S') }
  = MAX_{S'} { Pr[S_1 = S'] * Pr[S' → X_1] * MAX_{S''} { Pr[S' → S''] * Pr[S'' → X_2] * v_2(S'') } }
  = MAX_{S'} { Pr[S_1 = S'] * Pr[S' → X_1] * MAX_{S''} { Pr[S' → S''] * Pr[S'' → X_2] * MAX_{S'''} { Pr[S'' → S'''] * Pr[S''' → X_3] } } }
  = MAX_{S' S'' S'''} { Pr[X_1 X_2 X_3, S_1 S_2 S_3 = S' S'' S'''] }
Most probable path: S_1* S_2* S_3*
20 Most Probable Path – Viterbi's algorithm (classical dynamic programming)
Backward phase: calculate the values v_i(S) = Pr*[X_{i+1},…,X_L | S_i = S]
  Base: v_L(S) = 1
  Step: v_i(S) = MAX_{S'} { Pr[S → S'] * Pr[S' → X_{i+1}] * v_{i+1}(S') }
        π_{i+1}(S) = ARGMAX_{S'} { Pr[S → S'] * Pr[S' → X_{i+1}] * v_{i+1}(S') }   (the value of S_{i+1} which maximizes the probability)
Forward phase: trace the path of maximum probability
  Base: S_1* = ARGMAX_{S'} { Pr[S'] * Pr[S' → X_1] * v_1(S') }
  Step: S_{i+1}* = π_{i+1}(S_i*)
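A sketch of Viterbi's algorithm in the same backward-then-forward form as the slide, using the dictionary layout assumed earlier:

def viterbi(obs, states, start, trans, emit):
    L = len(obs)
    v  = [dict() for _ in range(L)]          # v[i][S] = Pr*[X_{i+2}..X_L | S_{i+1} = S]  (0-based)
    pi = [dict() for _ in range(L)]          # pi[i][S] = best state at position i, given S at position i-1
    for s in states:
        v[L - 1][s] = 1.0                    # base: v_L(S) = 1
    for i in range(L - 2, -1, -1):           # backward phase
        for s in states:
            scores = {sp: trans[s][sp] * emit[sp][obs[i + 1]] * v[i + 1][sp]
                      for sp in states}
            best = max(scores, key=scores.get)
            v[i][s], pi[i + 1][s] = scores[best], best
    # forward phase: choose S_1*, then follow the maximum pointers
    first = {s: start[s] * emit[s][obs[0]] * v[0][s] for s in states}
    path = [max(first, key=first.get)]
    for i in range(1, L):
        path.append(pi[i][path[-1]])
    return first[path[0]], path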
21 Most Probable Path – Coin-Tossing Example
Reminder: hidden states H_i ∈ {Fair, Loaded}, observations X_i ∈ {Head, Tail}; start probabilities 1/2 each, self-transitions 0.9, switches 0.1; Fair emits H/T with probability 1/2 each, Loaded emits H with probability 3/4 and T with probability 1/4.
Outcome of 3 tosses: Head, Head, Tail
What is the most probable series of coins?
22 Most Probable Path – Coin-Tossing Example
Observations: X_1 X_2 X_3 = H H T
Pr[X_1,X_2,X_3, S_1,S_2,S_3] = Pr[S_1,S_2,S_3 | X_1,X_2,X_3] * Pr[X_1,X_2,X_3]
S_1, S_2, S_3    Pr[X_1,X_2,X_3, S_1,S_2,S_3]
F, F, F    (0.5)^3 * 0.5 * (0.9)^2 = 0.050625
F, F, L    (0.5)^2 * 0.25 * 0.5 * 0.9 * 0.1 = 0.0028125
F, L, F    0.5 * 0.75 * 0.5 * 0.5 * 0.1 * 0.1 = 0.0009375
F, L, L    0.5 * 0.75 * 0.25 * 0.5 * 0.1 * 0.9 = 0.00421875
L, F, F    0.75 * 0.5 * 0.5 * 0.5 * 0.1 * 0.9 = 0.0084375
L, F, L    0.75 * 0.5 * 0.25 * 0.5 * 0.1 * 0.1 = 0.00046875
L, L, F    0.75 * 0.75 * 0.5 * 0.5 * 0.9 * 0.1 = 0.01265625
L, L, L    0.75 * 0.75 * 0.25 * 0.5 * 0.9 * 0.9 = 0.056953125   ← max
Enumerating all state paths is exponential in the length of the observation.
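A brute-force check of this table, enumerating every state path (exponential in L, which is exactly what the forward/backward and Viterbi recursions avoid); it uses the coin HMM dictionaries defined earlier:

from itertools import product

def enumerate_paths(obs, states, start, trans, emit):
    # Pr[X_1..X_L, S_1..S_L] for every possible state path (2^L paths here)
    probs = {}
    for path in product(states, repeat=len(obs)):
        p = start[path[0]] * emit[path[0]][obs[0]]
        for i in range(1, len(obs)):
            p *= trans[path[i - 1]][path[i]] * emit[path[i]][obs[i]]
        probs[path] = p
    return probs

table = enumerate_paths(['H', 'H', 'T'], states, start, trans, emit)
# max(table, key=table.get) == ('L', 'L', 'L'); table[('L', 'L', 'L')] ≈ 0.056953125
# sum(table.values()) ≈ 0.137109375, the likelihood computed earlier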
23 Most Probable Path – Coin-Tossing Example
Observations: X_1 X_2 X_3 = H H T
Backward phase: calculate the values v_i(S) = Pr*[X_{i+1},…,X_L | S_i = S]
Base: v_3(L) = v_3(F) = 1
Step: v_i(S) = MAX_{S'} { Pr[S → S'] * Pr[S' → X_{i+1}] * v_{i+1}(S') }
v_2(L) = MAX { Pr[L → L] * Pr[L → T] * v_3(L), Pr[L → F] * Pr[F → T] * v_3(F) }
  = MAX { 0.9 * 0.25, 0.1 * 0.5 } = 0.225      π_3(L) = L
v_2(F) = MAX { Pr[F → L] * Pr[L → T] * v_3(L), Pr[F → F] * Pr[F → T] * v_3(F) }
  = MAX { 0.1 * 0.25, 0.9 * 0.5 } = 0.45      π_3(F) = F
24 Most Probable Path – Coin-Tossing Example
Observations: X_1 X_2 X_3 = H H T
Backward phase: calculate the values v_i(S) = Pr*[X_{i+1},…,X_L | S_i = S]
Step: v_i(S) = MAX_{S'} { Pr[S → S'] * Pr[S' → X_{i+1}] * v_{i+1}(S') }
v_2(L) = 0.225      π_3(L) = L
v_2(F) = 0.45      π_3(F) = F
v_1(L) = MAX { Pr[L → L] * Pr[L → H] * v_2(L), Pr[L → F] * Pr[F → H] * v_2(F) }
  = MAX { 0.9 * 0.75 * 0.225, 0.1 * 0.5 * 0.45 } = 0.151875      π_2(L) = L
v_1(F) = MAX { Pr[F → L] * Pr[L → H] * v_2(L), Pr[F → F] * Pr[F → H] * v_2(F) }
  = MAX { 0.1 * 0.75 * 0.225, 0.9 * 0.5 * 0.45 } = 0.2025      π_2(F) = F
25 Most Probable Path – Coin-Tossing Example
Observations: X_1 X_2 X_3 = H H T
Backward phase results: v_i(S) = Pr*[X_{i+1},…,X_L | S_i = S]
v_2(L) = 0.225      π_3(L) = L
v_2(F) = 0.45      π_3(F) = F
v_1(L) = 0.151875      π_2(L) = L
v_1(F) = 0.2025      π_2(F) = F
Pr*[HHT] = MAX { Pr[L] * Pr[L → H] * v_1(L), Pr[F] * Pr[F → H] * v_1(F) }
  = MAX { 0.5 * 0.75 * 0.151875, 0.5 * 0.5 * 0.2025 } = MAX { 0.056953125, 0.050625 } = 0.056953125
Forward phase: trace maximum-pointers
S_1* = L      S_2* = π_2(S_1*) = L      S_3* = π_3(S_2*) = L
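Running the Viterbi sketch from before on the coin HMM confirms the traced path:

prob, path = viterbi(['H', 'H', 'T'], states, start, trans, emit)
# prob ≈ 0.056953125, path == ['L', 'L', 'L']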
26 Hidden Markov Models - HMM
Interesting inference queries:
✓ Most probable state for a certain position: MAX_S { Pr[S_i = S | X_1,…,X_L] }
✓ Most probable path through states: MAX_Š { Pr[S_1,…,S_L = Š | X_1,…,X_L] }
✓ Likelihood of evidence: Pr[X_1,…,X_L | HMM]