CS621: Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 35–HMM; Forward and Backward Probabilities 19 th Oct, 2010
HMM Definition Set of states: S where |S|=N Start state S 0 /*P(S 0 )=1*/ Output Alphabet: O where |O|=M Transition Probabilities: A= {a ij } /*state i to state j*/ Emission Probabilities : B= {b j (o k )} /*prob. of emitting or absorbing o k from state j*/ Initial State Probabilities: Π={p 1,p 2,p 3,…p N } Each p i =P(o 0 =ε,S i |S 0 )
Three basic problems (contd.) Problem 1: Likelihood of a sequence Forward Procedure Backward Procedure Problem 2: Best state sequence Viterbi Algorithm Problem 3: Re-estimation Baum-Welch ( Forward-Backward Algorithm )
Forward and Backward Probability Calculation
Forward probability F(k,i) Define F(k,i)= Probability of being in state S i having seen o 0 o 1 o 2 …o k F(k,i)=P(o 0 o 1 o 2 …o k, S i ) With m as the length of the observed sequence P(observed sequence)=P(o 0 o 1 o 2..o m ) =Σ p=0,N P(o 0 o 1 o 2..o m, S p ) =Σ p=0,N F(m, p)
Forward probability (contd.) F(k, q) = P(o 0 o 1 o 2..o k, S q ) = P(o 0 o 1 o 2..o k-1, o k,S q ) = Σ p=0,N P(o 0 o 1 o 2..o k-1, S p, o k,S q ) = Σ p=0,N P(o 0 o 1 o 2..o k-1, S p ). P(o m,S q |o 0 o 1 o 2..o k-1, S p ) = Σ p=0,N F(k-1,p). P(o k,S q |S p ) = Σ p=0,N F(k-1,p). P(S p S q ) okok O 0 O 1 O 2 O 3 … O k O k+1 … O m-1 O m S 0 S 1 S 2 S 3 … S p S q … S m S final
Backward probability B(k,i) Define B(k,i)= Probability of seeing o k o k+1 o k+2 …o m given that the state was S i B(k,i)=P(o k o k+1 o k+2 …o m \ S i ) With m as the length of the observed sequence P(observed sequence)=P(o 0 o 1 o 2..o m ) = P(o 0 o 1 o 2..o m | S 0 ) =B(0,0)
Backward probability (contd.) B(k, p) = P(o k o k+1 o k+2 …o m \ S p ) = P(o k+1 o k+2 …o m, o k |S p ) = Σ q=0,N P(o k+1 o k+2 …o m, o k, S q |S p ) = Σ q=0,N P(o k,S q |S p ) P(o k+1 o k+2 …o m |o k,S q,S p ) = Σ q=0,N P(o k+1 o k+2 …o m |S q ). P(o k, S q |S p ) = Σ q=0,N B(k+1,q). P(S p S q ) okok O 0 O 1 O 2 O 3 … O k O k+1 … O m-1 O m S 0 S 1 S 2 S 3 … S p S q … S m S final
Continuing with the Urn example Urn 1 # of Red = 30 # of Green = 50 # of Blue = 20 Urn 3 # of Red =60 # of Green =10 # of Blue = 30 Urn 2 # of Red = 10 # of Green = 40 # of Blue = 50 Colored Ball choosing
Example (contd.) U1U1 U2U2 U3U3 U1U U2U U3U Given : Observation : RRGGBRGR What is the corresponding state sequence ? and RGB U1U U2U U3U Transition Probability Observation/output Probability
Diagrammatic representation (1/2) U1U1 U2U2 U3U R, 0.6 G, 0.1 B, 0.3 R, 0.1 B, 0.5 G, 0.4 B, 0.2 R, 0.3 G, 0.5
Diagrammatic representation (2/2) U1U1 U2U2 U3U3 R,0.02 G,0.08 B,0.10 R,0.24 G,0.04 B,0.12 R,0.06 G,0.24 B,0.30 R, 0.08 G, 0.20 B, 0.12 R,0.15 G,0.25 B,0.10 R,0.18 G,0.03 B,0.09 R,0.18 G,0.03 B,0.09 R,0.02 G,0.08 B,0.10 R,0.03 G,0.05 B,0.02
Observations and states O 1 O 2 O 3 O 4 O 5 O 6 O 7 O 8 OBS: RRG G B R G R State: S 1 S 2 S 3 S 4 S 5 S 6 S 7 S 8 S i = U 1 /U 2 /U 3 ; A particular state S: State sequence O: Observation sequence S* = “best” possible state (urn) sequence Goal: Maximize P(S*|O) by choosing “best” S
Grouping terms P(S).P(O|S) =[P(O 0 |S 0 ).P(S 1 |S 0 )]. [P(O 1 |S 1 ).P(S 2 |S 1 )]. [P(O 2 |S 2 ).P(S 3 |S 2 )]. [P(O 3 |S 3 ).P(S 4 |S 3 )]. [P(O 4 |S 4 ).P(S 5 |S 4 )]. [P(O 5 |S 5 ).P(S 6 |S 5 )]. [P(O 6 |S 6 ).P(S 7 |S 6 )]. [P(O 7 |S 7 ).P(S 8 |S 7 )]. [P(O 8 |S 8 ).P(S 9 |S 8 )]. We introduce the states S 0 and S 9 as initial and final states respectively. After S 8 the next state is S 9 with probability 1, i.e., P(S 9 |S 8 )=1 O 0 is ε-transition O 0 O 1 O 2 O 3 O 4 O 5 O 6 O 7 O 8 Obs: ε RRG G B R G R State: S 0 S 1 S 2 S 3 S 4 S 5 S 6 S 7 S 8 S 9
Introducing useful notation S0S0 S1S1 S8S8 S7S7 S9S9 S2S2 S3S3 S4S4 S5S5 S6S6 O 0 O 1 O 2 O 3 O 4 O 5 O 6 O 7 O 8 Obs: ε RRG G B R G R State: S 0 S 1 S 2 S 3 S 4 S 5 S 6 S 7 S 8 S 9 ε R R GGBR G R P(O k |S k ).P(S k+1 |S k )=P(S k S k+1 ) OkOk
Viterbi Algorithm for the Urn problem (first two symbols) S0S0 U1U1 U2U2 U3U U1U1 U2U2 U3U U1U1 U2U2 U3U3 U1U1 U2U2 U3U * *0.036 *: winner sequences ε R
Probabilistic FSM (a 1 :0.3) (a 2 :0.4) (a 1 :0.2) (a 2 :0.3) (a 1 :0.1) (a 2 :0.2) (a 1 :0.3) (a 2 :0.2) The question here is: “what is the most likely state sequence given the output sequence seen” S1S1 S2S2
Developing the tree Start S1S2S1S2S1S2S1S2S1S *0.1= *0.2= *0.4= *0.3= *0.2= € a1a1 a2a2 Choose the winning sequence per state per iteration
Tree structure contd… S1S2S1S2S1S *0.1= S S S S a1a1 a2a2 The problem being addressed by this tree is a1-a2-a1-a2 is the output sequence and μ the model or the machine
Path found : (working backward) S1S1 S2S2 S1S1 S2S2 S1S1 a2a2 a1a1 a1a1 a2a2 Problem statement : Find the best possible sequence Start symbolState collectionAlphabet set Transitions T is defined as
Tabular representation of the tree €a1a1 a2a2 a1a1 a2a2 S1S1 1.0(1.0*0.1,0.0*0.2 )=(0.1,0.0) (0.02, 0.09) (0.009, 0.012)(0.0024, ) S2S2 0.0(1.0*0.3,0.0*0.3 )=(0.3,0.0) (0.04,0.0 6) (0.027,0.018)(0.0048, ) Ending state Latest symbol observed Note: Every cell records the winning probability ending in that state Final winner The bold faced values in each cell shows the sequence probability ending in that state. Going backward from final winner sequence which ends in state S2 (indicated By the 2 nd tuple), we recover the sequence.