Slide 1: EM in Hidden Markov Models (Tutorial 7)
© Ydo Wexler & Dan Geiger, revised by Sivan Yogev
Slide 2: Learning the parameters (EM algorithm)
A common algorithm for learning the parameters from unlabeled sequences is Expectation-Maximization (EM). We will devote several classes to it. In the current context it reads as follows:
Start with some probability tables (many possible choices), then iterate until convergence:
- E-step: compute p(s_i, s_{i-1}, x_1,…,x_L) for every i, using the current probability tables ("current parameters"). Comment: if each s_i has k possible values, there are k·k such expressions per position. (A generic sketch of the loop appears below.)
- M-step: use the expected counts found in the E-step to update the local probability tables.
We focus today on the E-step.
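As a rough illustration of the loop's shape, here is a minimal generic sketch in Python. The names `em`, `e_step`, `m_step`, and `params` are placeholders of mine, not part of the tutorial; the concrete HMM versions appear in the sketches further below.

```python
def em(x, params, e_step, m_step, tol=1e-9, max_iters=1000):
    """Generic EM loop: alternate expected counts (E) and re-estimation (M).

    x: observed sequence; params: dict of current parameter values;
    e_step(x, params) -> expected counts; m_step(counts) -> new params dict.
    """
    for _ in range(max_iters):
        counts = e_step(x, params)   # E-step under the current parameters
        new_params = m_step(counts)  # M-step: re-estimate from expected counts
        if all(abs(new_params[k] - params[k]) < tol for k in params):
            return new_params        # parameters stopped changing
        params = new_params
    return params
```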
Slide 3: Example I: Homogeneous HMM, one sample
Start with some probability tables (say λ = μ = ½) and iterate until convergence:
- E-step: compute p_{λ,μ}(s_i, s_{i-1}, x_1,…,x_L) using the forward-backward algorithm, as will soon be explained.
- M-step: update the parameters simultaneously:
  λ ← Σ_i p_{λ,μ}(s_i=1, s_{i-1}=0, x_1,…,x_L) / Σ_i p_{λ,μ}(s_{i-1}=0, x_1,…,x_L)
  μ ← Σ_i p_{λ,μ}(s_i=0, s_{i-1}=1, x_1,…,x_L) / Σ_i p_{λ,μ}(s_{i-1}=1, x_1,…,x_L)
[Figure: HMM chain s_1 → s_2 → … → s_L; each hidden state s_i emits the observation x_i.]
Slide 4: Decomposing the computation (from the previous tutorial)
P(x_1,…,x_L, s_i) = P(x_1,…,x_i, s_i) · P(x_{i+1},…,x_L | x_1,…,x_i, s_i)
                  = P(x_1,…,x_i, s_i) · P(x_{i+1},…,x_L | s_i)
                  = f(s_i) · b(s_i)
(The second equality holds because, given s_i, the observations x_{i+1},…,x_L are independent of x_1,…,x_i.)
Answer: P(s_i | x_1,…,x_L) = (1/K) · P(x_1,…,x_L, s_i), where K = Σ_{s_i} P(x_1,…,x_L, s_i).
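The two factors f(s_i) and b(s_i) are exactly what the forward and backward recursions compute. A minimal dictionary-based sketch (the function and argument names are mine; lists are 0-based, so `f[i]` corresponds to the slides' f(s_{i+1})):

```python
def forward(x, start, trans, emit, states):
    # f[i][s] is the slides' f(s_{i+1}) = P(x_1..x_{i+1}, s_{i+1} = s)
    f = [{s: start[s] * emit[(s, x[0])] for s in states}]
    for i in range(1, len(x)):
        f.append({s: emit[(s, x[i])] *
                     sum(f[i - 1][t] * trans[(t, s)] for t in states)
                  for s in states})
    return f

def backward(x, trans, emit, states):
    # b[i][s] is the slides' b(s_{i+1}) = P(x_{i+2}..x_L | s_{i+1} = s),
    # with b = 1 at the last position
    L = len(x)
    b = [None] * L
    b[L - 1] = {s: 1.0 for s in states}
    for i in range(L - 2, -1, -1):
        b[i] = {s: sum(trans[(s, t)] * emit[(t, x[i + 1])] * b[i + 1][t]
                       for t in states)
                for s in states}
    return b
```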
Slide 5: The E-step
We already know how to do this computation:
P(x_1,…,x_L, s_i) = P(x_1,…,x_i, s_i) · P(x_{i+1},…,x_L | s_i) = f(s_i) · b(s_i)
Now we wish to compute (for the E-step):
p(x_1,…,x_L, s_i, s_{i+1}) = p(x_1,…,x_i, s_i) · p(s_{i+1}|s_i) · p(x_{i+1}|s_{i+1}) · p(x_{i+2},…,x_L | s_{i+1})
                           = f(s_i) · p(s_{i+1}|s_i) · p(x_{i+1}|s_{i+1}) · b(s_{i+1})
Special case (i = L-1):
p(x_1,…,x_L, s_{L-1}, s_L) = p(x_1,…,x_{L-1}, s_{L-1}) · p(s_L|s_{L-1}) · p(x_L|s_L)
                           = f(s_{L-1}) · p(s_L|s_{L-1}) · p(x_L|s_L)    {define b(s_L) ≡ 1}
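Continuing the sketch above, this pairwise joint is a single product of four already-computed factors; `pair_joint` is my name for it, not the tutorial's:

```python
def pair_joint(f, b, trans, emit, x, i, s, t):
    # P(x_1..x_L, s_{i+1} = s, s_{i+2} = t) at 0-based list index i:
    # the slides' f(s_i) * p(s_{i+1}|s_i) * p(x_{i+1}|s_{i+1}) * b(s_{i+1})
    return f[i][s] * trans[(s, t)] * emit[(t, x[i + 1])] * b[i + 1][t]
```

Because `backward` sets b = 1 at the last position, the special case i = L-1 falls out of the same formula with no extra code.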
Slide 6: Coin-Tossing Example
An HMM for L tosses: hidden states Fair/Loaded, observations Head/Tail.
Start: p(s_1 = Fair) = p(s_1 = Loaded) = 1/2.
Transitions: stay in the same state with probability 0.9, switch with probability 0.1.
Emissions: Fair coin: p(head) = p(tail) = 1/2; Loaded coin: p(head) = 3/4, p(tail) = 1/4.
[Figure: two-state diagram Fair ↔ Loaded with the transition and emission probabilities above.]
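A minimal encoding of this model, for use with the sketches above (the constant names are mine):

```python
STATES = ("fair", "loaded")
START = {"fair": 0.5, "loaded": 0.5}                      # p(s_1)
TRANS = {("fair", "fair"): 0.9, ("fair", "loaded"): 0.1,  # p(s_i | s_{i-1})
         ("loaded", "loaded"): 0.9, ("loaded", "fair"): 0.1}
EMIT = {("fair", "head"): 0.5, ("fair", "tail"): 0.5,     # p(x_i | s_i)
        ("loaded", "head"): 0.75, ("loaded", "tail"): 0.25}
```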
Slide 7: Example II: Homogeneous HMM, one sample
Start with some probability tables and iterate until convergence:
- E-step: compute p_θ(s_i, s_{i-1}, x_1,…,x_L) using the forward-backward algorithm, as explained earlier.
- M-step: update the parameter:
  θ ← Σ_i [p_θ(s_i=1, s_{i-1}=1, x_1,…,x_L) + p_θ(s_i=0, s_{i-1}=0, x_1,…,x_L)] / Σ_i [p_θ(s_{i-1}=1, x_1,…,x_L) + p_θ(s_{i-1}=0, x_1,…,x_L)]
  (this expression will be simplified later; a code sketch of the update follows below)
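Using `pair_joint` from the earlier sketch, the whole update is two sums over positions and state pairs; `update_theta` is a hypothetical helper name of mine:

```python
def update_theta(f, b, trans, emit, x, states):
    # Numerator: expected mass of "stay" transitions (s_i = s_{i-1});
    # denominator: the same sum over all state pairs, position by position.
    L = len(x)
    num = sum(pair_joint(f, b, trans, emit, x, i, s, s)
              for i in range(L - 1) for s in states)
    den = sum(pair_joint(f, b, trans, emit, x, i, s, t)
              for i in range(L - 1) for s in states for t in states)
    return num / den
```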
Slide 8: Coin-Tossing Example
Numeric example: 3 tosses. Outcomes: head, head, tail.
Slide 9: Coin-Tossing Example
Numeric example: 3 tosses, outcomes: head, head, tail.
Recall the recursions:
f(s_i) = P(x_1,…,x_i, s_i) = Σ_{s_{i-1}} P(x_1,…,x_{i-1}, s_{i-1}) · P(s_i | s_{i-1}) · P(x_i | s_i)
b(s_i) = P(x_{i+1},…,x_L | s_i) = Σ_{s_{i+1}} P(s_{i+1} | s_i) · P(x_{i+1} | s_{i+1}) · b(s_{i+1})
Last time we calculated:

forward      s_1      s_2      s_3
loaded       0.375    0.2719   0.0645
fair         0.25     0.1313   0.0727

backward     s_1      s_2      s_3
loaded       0.2094   0.275    1
fair         0.2344   0.475    1
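Running the sketches above on this example reproduces the tables (and the values used on the next slides):

```python
x = ("head", "head", "tail")
f = forward(x, START, TRANS, EMIT, STATES)
b = backward(x, TRANS, EMIT, STATES)
print(f[0])  # {'fair': 0.25, 'loaded': 0.375}
print(f[1])  # {'fair': 0.13125, 'loaded': 0.271875}
print(f[2])  # ≈ {'fair': 0.0727, 'loaded': 0.0645}
print(b[0])  # ≈ {'fair': 0.2344, 'loaded': 0.2094}
print(b[1])  # {'fair': 0.475, 'loaded': 0.275}
print(b[2])  # {'fair': 1.0, 'loaded': 1.0}
```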
Slide 10: Coin-Tossing Example
Outcomes: head, head, tail.
f(s_1=loaded) = 0.375, f(s_1=fair) = 0.25
b(s_2=loaded) = 0.275, b(s_2=fair) = 0.475
p(x_1,x_2,x_3, s_1, s_2) = f(s_1) · p(s_2|s_1) · p(x_2|s_2) · b(s_2)
p(x_1,x_2,x_3, s_1=loaded, s_2=loaded) = 0.375 · 0.9 · 0.75 · 0.275 = 0.0696
p(x_1,x_2,x_3, s_1=loaded, s_2=fair)   = 0.375 · 0.1 · 0.5 · 0.475  = 0.0089
p(x_1,x_2,x_3, s_1=fair, s_2=loaded)   = 0.25 · 0.1 · 0.75 · 0.275  = 0.0052
p(x_1,x_2,x_3, s_1=fair, s_2=fair)     = 0.25 · 0.9 · 0.5 · 0.475   = 0.0534
Slide 11: Coin-Tossing Example
Outcomes: head, head, tail.
f(s_2=loaded) = 0.271875, f(s_2=fair) = 0.13125
b(s_3=loaded) = 1, b(s_3=fair) = 1
p(x_1,x_2,x_3, s_2, s_3) = f(s_2) · p(s_3|s_2) · p(x_3|s_3) · b(s_3)
p(x_1,x_2,x_3, s_2=loaded, s_3=loaded) = 0.271875 · 0.9 · 0.25 · 1 = 0.0612
p(x_1,x_2,x_3, s_2=loaded, s_3=fair)   = 0.271875 · 0.1 · 0.5 · 1  = 0.0136
p(x_1,x_2,x_3, s_2=fair, s_3=loaded)   = 0.13125 · 0.1 · 0.25 · 1  = 0.0033
p(x_1,x_2,x_3, s_2=fair, s_3=fair)     = 0.13125 · 0.9 · 0.5 · 1   = 0.0591
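The same numbers, via `pair_joint` from the E-step sketch (list index 0 is the pair (s_1, s_2), index 1 is (s_2, s_3)):

```python
for i in (0, 1):
    for s in STATES:
        for t in STATES:
            p = pair_joint(f, b, TRANS, EMIT, x, i, s, t)
            print(f"i={i + 1}: {s}->{t}: {p:.4f}")
# i=1: fair->fair 0.0534, fair->loaded 0.0052, loaded->fair 0.0089, loaded->loaded 0.0696
# i=2: fair->fair 0.0591, fair->loaded 0.0033, loaded->fair 0.0136, loaded->loaded 0.0612
```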
Slide 12: M-step
M-step: update the parameters simultaneously (in this case we only have one parameter, θ):
θ ← Σ_i [p_θ(s_i=1, s_{i-1}=1, x_1,…,x_L) + p_θ(s_i=0, s_{i-1}=0, x_1,…,x_L)] / Σ_i [p_θ(s_{i-1}=1, x_1,…,x_L) + p_θ(s_{i-1}=0, x_1,…,x_L)]
The denominator:
Σ_i [p_θ(s_{i-1}=1, x_1,…,x_L) + p_θ(s_{i-1}=0, x_1,…,x_L)] = Σ_i p_θ(x_1,…,x_L) = (L-1) · p_θ(x_1,…,x_L)
In the previous tutorial we saw that p_θ(x_1,…,x_L) = Σ_{s_L} f(s_L).
Slide 13: M-step (cont.)
M-step, in our example:
θ ← [p(x_1,x_2,x_3, s_1=l, s_2=l) + p(x_1,x_2,x_3, s_1=f, s_2=f) + p(x_1,x_2,x_3, s_2=l, s_3=l) + p(x_1,x_2,x_3, s_2=f, s_3=f)] / [2 · (f(s_3=l) + f(s_3=f))]
  = [0.0696 + 0.0534 + 0.0612 + 0.0591] / [2 · (0.0645 + 0.0727)]
  = 0.2433 / 0.2743 ≈ 0.887
Iterating the E- and M-steps, θ converges to 0.4.
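Putting the sketches together, one can iterate the two steps and watch θ move from 0.9 toward 0.4. The function name `em_theta` and the fixed iteration cap are mine; a production version would test for convergence instead.

```python
def em_theta(x, theta, iters=200):
    # Re-estimate only the stay probability theta; the emissions and the
    # start distribution are held fixed, as in the slides.
    for _ in range(iters):
        trans = {(s, t): theta if s == t else 1.0 - theta
                 for s in STATES for t in STATES}
        f = forward(x, START, trans, EMIT, STATES)
        b = backward(x, trans, EMIT, STATES)
        theta = update_theta(f, b, trans, EMIT, x, STATES)
    return theta

print(em_theta(("head", "head", "tail"), 0.9))  # first step ≈ 0.887, then -> 0.4
```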