Presentation on theme: "EM in Hidden Markov Models, Tutorial 7 © Ydo Wexler & Dan Geiger, revised by Sivan Yogev"— Presentation transcript:

1 EM in Hidden Markov Models, Tutorial 7 © Ydo Wexler & Dan Geiger, revised by Sivan Yogev

2 Learning the parameters (EM algorithm)
A common algorithm for learning the parameters from unlabeled sequences is Expectation-Maximization (EM). We will devote several classes to it. In the current context it reads as follows:
Start with some probability tables (many possible choices).
Iterate until convergence:
E-step: Compute $p(s_i, s_{i-1}, x_1,\dots,x_L)$ using the current probability tables ("current parameters"). Comment: if each $s_i$ has $k$ possible values, there are $k \cdot k$ such expressions.
M-step: Use the expected counts found to update the local probability tables.
We focus today on the E-step.
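As a rough illustration of the loop structure, here is a minimal Python sketch. The interface is an assumption, not something the slides fix: the callables e_step and m_step, the parameter tuple, and the convergence test are all hypothetical.

```python
def em(e_step, m_step, params, max_iters=100, tol=1e-9):
    # Generic EM loop: e_step(params) returns expected counts under the
    # current parameters; m_step(counts) maps them to new parameters.
    # Both callables are supplied by the caller.
    for _ in range(max_iters):
        counts = e_step(params)            # E-step
        new_params = m_step(counts)        # M-step
        if all(abs(a - b) < tol for a, b in zip(params, new_params)):
            return new_params              # parameters stopped moving
        params = new_params
    return params
```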

3 Example I: Homogeneous HMM, one sample
Start with some probability tables (say $\lambda = \mu = 1/2$).
Iterate until convergence:
E-step: Compute $p_{\lambda,\mu}(s_i, s_{i-1}, x_1,\dots,x_L)$ using the forward-backward algorithm, as will soon be explained.
M-step: Update the parameters simultaneously:
$\lambda \leftarrow \sum_i p_{\lambda,\mu}(s_i = 1, s_{i-1} = 0, x_1,\dots,x_L) \,/\, \sum_i p_{\lambda,\mu}(s_{i-1} = 0, x_1,\dots,x_L)$
$\mu \leftarrow \sum_i p_{\lambda,\mu}(s_i = 0, s_{i-1} = 1, x_1,\dots,x_L) \,/\, \sum_i p_{\lambda,\mu}(s_{i-1} = 1, x_1,\dots,x_L)$
[Figure: the HMM chain $S_1 \to S_2 \to \dots \to S_L$ with emissions $X_1, \dots, X_L$.]
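Assuming the pairwise quantities from the E-step have been collected into one array, the two updates are a few lines; a sketch, where the layout J[i, s, t] (rows index s_{i-1}, columns index s_i) is my own convention, not from the slides:

```python
import numpy as np

def update_lambda_mu(J):
    # One M-step for the two-parameter chain on this slide.
    # J has shape (L-1, 2, 2): J[i, s, t] = p(s_i = t, s_{i-1} = s, x_1..x_L).
    J = np.asarray(J)
    lam = J[:, 0, 1].sum() / J[:, 0, :].sum()  # 0 -> 1 events / all mass with s_{i-1} = 0
    mu = J[:, 1, 0].sum() / J[:, 1, :].sum()   # 1 -> 0 events / all mass with s_{i-1} = 1
    return lam, mu
```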

4 Decomposing the computation (from previous tutorial)
$P(x_1,\dots,x_L, s_i) = P(x_1,\dots,x_i, s_i)\, P(x_{i+1},\dots,x_L \mid x_1,\dots,x_i, s_i)$
$= P(x_1,\dots,x_i, s_i)\, P(x_{i+1},\dots,x_L \mid s_i) \triangleq f(s_i)\, b(s_i)$
Answer: $P(s_i \mid x_1,\dots,x_L) = (1/K)\, P(x_1,\dots,x_L, s_i)$, where $K = \sum_{s_i} P(x_1,\dots,x_L, s_i)$.
[Figure: the HMM chain $S_1 \to S_2 \to \dots \to S_L$ with emissions $X_1, \dots, X_L$.]
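In code, this normalization is elementwise; a minimal numpy sketch, using as concrete inputs the forward and backward values that the coin-tossing example below will produce (states ordered loaded, fair; the slides do not give this code):

```python
import numpy as np

# f[i, s] = P(x_1..x_i, s_i = s); b[i, s] = P(x_{i+1}..x_L | s_i = s).
f = np.array([[0.375, 0.25], [0.271875, 0.13125], [0.06445, 0.07265]])
b = np.array([[0.209, 0.234], [0.275, 0.475], [1.0, 1.0]])

joint = f * b                          # P(x_1..x_L, s_i) for every i and s_i
K = joint.sum(axis=1, keepdims=True)   # K = sum over s_i; equals P(x_1..x_L) at every i (up to rounding)
posterior = joint / K                  # P(s_i | x_1..x_L); each row sums to 1
print(posterior.round(3))
```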

5 The E-step
We already know how to do this computation:
$P(x_1,\dots,x_L, s_i) = P(x_1,\dots,x_i, s_i)\, P(x_{i+1},\dots,x_L \mid s_i) \triangleq f(s_i)\, b(s_i)$
Now we wish to compute (for the E-step):
$p(x_1,\dots,x_L, s_i, s_{i+1}) = p(x_1,\dots,x_i, s_i)\, p(s_{i+1} \mid s_i)\, p(x_{i+1} \mid s_{i+1})\, p(x_{i+2},\dots,x_L \mid s_{i+1})$
$= f(s_i)\, p(s_{i+1} \mid s_i)\, p(x_{i+1} \mid s_{i+1})\, b(s_{i+1})$
Special case:
$p(x_1,\dots,x_L, s_{L-1}, s_L) = p(x_1,\dots,x_{L-1}, s_{L-1})\, p(s_L \mid s_{L-1})\, p(x_L \mid s_L) = f(s_{L-1})\, p(s_L \mid s_{L-1})\, p(x_L \mid s_L)$   {define $b(s_L) \triangleq 1$}
[Figure: the HMM chain with the pair $S_i, S_{i+1}$ highlighted.]
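The pairwise term is a product of four known quantities, so for k states it vectorizes into a k-by-k table; a sketch, assuming a transition matrix A[s, t] = p(s_{i+1} = t | s_i = s) and precomputed emission and backward vectors:

```python
import numpy as np

def pairwise_joint(f_i, A, e_next, b_next):
    """p(x_1..x_L, s_i, s_{i+1}) = f(s_i) p(s_{i+1}|s_i) p(x_{i+1}|s_{i+1}) b(s_{i+1}).
    f_i: forward values at position i, shape (k,)
    A: transition matrix, A[s, t] = p(s_{i+1} = t | s_i = s), shape (k, k)
    e_next: emission probs p(x_{i+1} | s_{i+1} = t), shape (k,)
    b_next: backward values at position i+1, shape (k,); all ones when i+1 = L."""
    return f_i[:, None] * A * (e_next * b_next)[None, :]
```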

6 Coin-Tossing Example
[Figure: a two-state HMM over L tosses. Hidden states $S_i \in \{\text{Fair}, \text{Loaded}\}$ emit $X_i \in \{\text{Head}, \text{Tail}\}$. Start: 1/2 for each state. Transitions: stay in the same state with probability 0.9, switch with probability 0.1. Emissions: Fair gives head 1/2 and tail 1/2; Loaded gives head 3/4 and tail 1/4.]
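One way to transcribe the diagram into code (the array layout and names are my choice, not the tutorial's):

```python
import numpy as np

STATES = ("loaded", "fair")            # index 0 = loaded, 1 = fair
start = np.array([0.5, 0.5])           # Start: 1/2 for each state

# A[s, t] = p(s_{i+1} = t | s_i = s): stay with 0.9, switch with 0.1
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])

# p(x | s) for each outcome: the loaded coin favors heads (3/4 vs 1/4)
emit = {"head": np.array([0.75, 0.5]),
        "tail": np.array([0.25, 0.5])}
```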

7 Example II: Homogeneous HMM, one sample
Start with some probability tables.
Iterate until convergence:
E-step: Compute $p_\lambda(s_i, s_{i-1}, x_1,\dots,x_L)$ using the forward-backward algorithm, as explained earlier.
M-step: Update the parameter:
$\lambda \leftarrow \sum_i [\, p_\lambda(s_i = 1, s_{i-1} = 1, x_1,\dots,x_L) + p_\lambda(s_i = 0, s_{i-1} = 0, x_1,\dots,x_L) \,] \,/\, \sum_i [\, p_\lambda(s_{i-1} = 1, x_1,\dots,x_L) + p_\lambda(s_{i-1} = 0, x_1,\dots,x_L) \,]$
(will be simplified later)
[Figure: the HMM chain $S_1 \to S_2 \to \dots \to S_L$ with emissions $X_1, \dots, X_L$.]
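Assuming the same J[i, s, t] layout as in the sketch for Example I, the single-parameter update is one more small function; note the denominator is just the total mass, which is the simplification the slide promises:

```python
import numpy as np

def update_lambda(J):
    # One M-step for the single "stay" parameter lambda on this slide.
    # J has shape (L-1, 2, 2): J[i, s, t] = p(s_i = t, s_{i-1} = s, x_1..x_L).
    J = np.asarray(J)
    stay = J[:, 0, 0].sum() + J[:, 1, 1].sum()  # events with s_i = s_{i-1}
    total = J.sum()                             # = (L-1) * p(x_1..x_L)
    return stay / total
```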

8 Coin-Tossing Example
Numeric example: 3 tosses. Outcomes: head, head, tail.
[Figure: the HMM chain as before.]

9 Coin-Tossing Example
Numeric example: 3 tosses. Outcomes: head, head, tail.
Last time we calculated:

forward    s_1      s_2        s_3
loaded     0.375    0.271875   0.06445
fair       0.25     0.13125    0.07265

backward   s_1      s_2        s_3
loaded     0.209    0.275      1
fair       0.234    0.475      1

Recall:
$f(s_i) = P(x_1,\dots,x_i, s_i) = \sum_{s_{i-1}} P(x_1,\dots,x_{i-1}, s_{i-1})\, P(s_i \mid s_{i-1})\, P(x_i \mid s_i)$
$b(s_i) = P(x_{i+1},\dots,x_L \mid s_i) = \sum_{s_{i+1}} P(s_{i+1} \mid s_i)\, P(x_{i+1} \mid s_{i+1})\, b(s_{i+1})$
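Both recursions are short loops; a minimal sketch for the coin model that reproduces the two tables above (up to the rounding used in the slides; state order loaded, fair):

```python
import numpy as np

start = np.array([0.5, 0.5])                 # P(s_1)
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])                   # A[s, t] = p(s_{i+1} = t | s_i = s)
emit = {"head": np.array([0.75, 0.5]),       # p(x | s) for s = loaded, fair
        "tail": np.array([0.25, 0.5])}
xs = ["head", "head", "tail"]

# forward: f(s_i) = [sum over s_{i-1} of f(s_{i-1}) P(s_i|s_{i-1})] * P(x_i|s_i)
f = [start * emit[xs[0]]]
for x in xs[1:]:
    f.append((f[-1] @ A) * emit[x])

# backward: b(s_L) = 1; b(s_i) = sum over s_{i+1} of P(s_{i+1}|s_i) P(x_{i+1}|s_{i+1}) b(s_{i+1})
b = [np.ones(2)]
for x in reversed(xs[1:]):
    b.insert(0, A @ (emit[x] * b[0]))

for i, (fi, bi) in enumerate(zip(f, b), start=1):
    print(f"s_{i}: f = {fi}, b = {bi}")
# matches the tables: f = (0.375, 0.25), (0.271875, 0.13125), (0.0645, 0.0727)
#                     b = (0.2094, 0.2344), (0.275, 0.475), (1, 1)
```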

10 Coin-Tossing Example
Outcomes: head, head, tail.
$f(s_1 = \text{loaded}) = 0.375$, $f(s_1 = \text{fair}) = 0.25$
$b(s_2 = \text{loaded}) = 0.275$, $b(s_2 = \text{fair}) = 0.475$
$p(x_1,x_2,x_3, s_1, s_2) = f(s_1)\, p(s_2 \mid s_1)\, p(x_2 \mid s_2)\, b(s_2)$
$p(x_1,x_2,x_3, s_1 = \text{loaded}, s_2 = \text{loaded}) = 0.375 \times 0.9 \times 0.75 \times 0.275 = 0.0696$
$p(x_1,x_2,x_3, s_1 = \text{loaded}, s_2 = \text{fair}) = 0.375 \times 0.1 \times 0.5 \times 0.475 = 0.0089$
$p(x_1,x_2,x_3, s_1 = \text{fair}, s_2 = \text{loaded}) = 0.25 \times 0.1 \times 0.75 \times 0.275 = 0.0052$
$p(x_1,x_2,x_3, s_1 = \text{fair}, s_2 = \text{fair}) = 0.25 \times 0.9 \times 0.5 \times 0.475 = 0.0534$

11 Coin-Tossing Example
Outcomes: head, head, tail.
$f(s_2 = \text{loaded}) = 0.271875$, $f(s_2 = \text{fair}) = 0.13125$
$b(s_3 = \text{loaded}) = 1$, $b(s_3 = \text{fair}) = 1$
$p(x_1,x_2,x_3, s_2, s_3) = f(s_2)\, p(s_3 \mid s_2)\, p(x_3 \mid s_3)\, b(s_3)$
$p(x_1,x_2,x_3, s_2 = \text{loaded}, s_3 = \text{loaded}) = 0.271875 \times 0.9 \times 0.25 \times 1 = 0.0612$
$p(x_1,x_2,x_3, s_2 = \text{loaded}, s_3 = \text{fair}) = 0.271875 \times 0.1 \times 0.5 \times 1 = 0.0136$
$p(x_1,x_2,x_3, s_2 = \text{fair}, s_3 = \text{loaded}) = 0.13125 \times 0.1 \times 0.25 \times 1 = 0.0033$
$p(x_1,x_2,x_3, s_2 = \text{fair}, s_3 = \text{fair}) = 0.13125 \times 0.9 \times 0.5 \times 1 = 0.0591$
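The eight products on this slide and the previous one can be computed at once from the forward, backward, transition, and emission values; a self-contained sketch:

```python
import numpy as np

A = np.array([[0.9, 0.1],
              [0.1, 0.9]])               # states: 0 = loaded, 1 = fair
f1 = np.array([0.375, 0.25])             # f(s_1)
f2 = np.array([0.271875, 0.13125])       # f(s_2)
b2 = np.array([0.275, 0.475])            # b(s_2)
b3 = np.array([1.0, 1.0])                # b(s_3) = 1 by definition
e2 = np.array([0.75, 0.5])               # p(x_2 = head | s_2)
e3 = np.array([0.25, 0.5])               # p(x_3 = tail | s_3)

# p(x_1,x_2,x_3, s_i, s_{i+1}) = f(s_i) p(s_{i+1}|s_i) p(x_{i+1}|s_{i+1}) b(s_{i+1})
J12 = f1[:, None] * A * (e2 * b2)[None, :]   # rows: s_1, columns: s_2
J23 = f2[:, None] * A * (e3 * b3)[None, :]   # rows: s_2, columns: s_3
print(J12.round(4))   # [[0.0696 0.0089] [0.0052 0.0534]]
print(J23.round(4))   # [[0.0612 0.0136] [0.0033 0.0591]]
```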

12 M-step
M-step: Update the parameters simultaneously (in this case we only have one parameter, $\lambda$):
$\lambda \leftarrow \sum_i [\, p_\lambda(s_i = 1, s_{i-1} = 1, x_1,\dots,x_L) + p_\lambda(s_i = 0, s_{i-1} = 0, x_1,\dots,x_L) \,] \,/\, \sum_i [\, p_\lambda(s_{i-1} = 1, x_1,\dots,x_L) + p_\lambda(s_{i-1} = 0, x_1,\dots,x_L) \,]$
The denominator:
$\sum_i [\, p_\lambda(s_{i-1} = 1, x_1,\dots,x_L) + p_\lambda(s_{i-1} = 0, x_1,\dots,x_L) \,] = \sum_i p_\lambda(x_1,\dots,x_L) = (L-1)\, p_\lambda(x_1,\dots,x_L)$
In the previous tutorial we saw that $p_\lambda(x_1,\dots,x_L) = \sum_{s_L} f(s_L)$.
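For the three-toss example, both identities are easy to check numerically; a quick sketch using the forward values from slide 9:

```python
# p_lambda(x_1,x_2,x_3) = f(s_3 = loaded) + f(s_3 = fair)
p_x = 0.06445 + 0.07265          # = 0.1371
# the M-step denominator: (L - 1) * p(x_1..x_L), with L = 3
denominator = 2 * p_x            # = 0.2742
print(p_x, denominator)
```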

13 M-step (cont.)
M-step, in our example:
$\lambda \leftarrow [\, p(x_1,x_2,x_3, s_1 = l, s_2 = l) + p(x_1,x_2,x_3, s_1 = f, s_2 = f) + p(x_1,x_2,x_3, s_2 = l, s_3 = l) + p(x_1,x_2,x_3, s_2 = f, s_3 = f) \,] \,/\, [\, 2 \cdot (f(s_3 = l) + f(s_3 = f)) \,]$
$\lambda \leftarrow [\, 0.0696 + 0.0534 + 0.0612 + 0.0591 \,] \,/\, [\, 2 \cdot (0.06445 + 0.07265) \,] = 0.8873$
$\lambda$ converges to 0.4.
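The same update as a two-line computation, using the rounded values from the slides:

```python
# numerator: the four "stay" terms computed on slides 10 and 11
numerator = 0.0696 + 0.0534 + 0.0612 + 0.0591      # = 0.2433
# denominator: 2 * (f(s_3 = loaded) + f(s_3 = fair)) = 0.2742
denominator = 2 * (0.06445 + 0.07265)
print(round(numerator / denominator, 4))            # 0.8873
```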

