Learning Hidden Markov Models. Tutorial #7. © Ilan Gronau. Based on original slides of Ydo Wexler & Dan Geiger.

2  Estimating model parameters
Reminder: given a training data set and a model, MLE inference estimates the model parameters Θ = θ_1, θ_2, θ_3, …

3  Estimating model parameters: HMMs
Running example: a fair/loaded coin HMM. The current coin is kept with probability 0.9 and switched with probability 0.1 (transitions); the fair coin emits H and T with probability 1/2 each, the loaded coin emits H with probability 3/4 and T with probability 1/4 (emissions); the chain starts in each state with probability 1/2 (initial probabilities).
The task: estimate the parameters of the model given a training data set.
If the state-path is given along with the sequence, learning is supervised; if the state-path is unknown, learning is unsupervised.

4  Supervised Learning of HMMs
The state-path is given along with the sequence. The likelihood of a given set of parameters Θ is
Pr[X_1,…,X_L, S_1,…,S_L | Θ].

5  Supervised Learning of HMMs
The state-path is given along with the sequence. We wish to find Θ which maximizes
Pr[X_1,…,X_L, S_1,…,S_L | Θ] = P_init(S_1)·P_emit(S_1, X_1) · ∏_{i=2..L} P_trans(S_{i-1}, S_i)·P_emit(S_i, X_i).
Because the likelihood factors this way, we maximize independently for each state s: P_trans(s, ·) and P_emit(s, ·). Each is MLE for a multinomial distribution, plus pseudo-counts.
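Since the per-state multinomial MLE reduces to normalized counts, a short sketch suffices. This is a minimal illustration, not code from the tutorial; the function name supervised_mle, the add-`pseudo` smoothing, and the toy labeled sequence are assumptions made here.

```python
from collections import defaultdict

def supervised_mle(sequences, paths, states, symbols, pseudo=1.0):
    """MLE of per-state transition/emission multinomials from labeled (X, S) data,
    smoothed with pseudo-counts (hypothetical helper, not from the slides)."""
    trans = {s: defaultdict(float) for s in states}
    emit = {s: defaultdict(float) for s in states}
    for xs, ss in zip(sequences, paths):
        for i, (x, s) in enumerate(zip(xs, ss)):
            emit[s][x] += 1.0                      # count emission s -> x
            if i > 0:
                trans[ss[i - 1]][s] += 1.0         # count transition s_{i-1} -> s
    p_trans = {s: {t: (trans[s][t] + pseudo) / (sum(trans[s].values()) + pseudo * len(states))
                   for t in states} for s in states}
    p_emit = {s: {x: (emit[s][x] + pseudo) / (sum(emit[s].values()) + pseudo * len(symbols))
                  for x in symbols} for s in states}
    return p_trans, p_emit

# Toy usage with fair (F) / loaded (L) labels:
p_trans, p_emit = supervised_mle(["HHTHT"], ["FFLLF"], states="FL", symbols="HT")
print(p_trans["F"], p_emit["L"])
```

Each state's transition row and emission row is normalized separately, which is exactly the per-state multinomial MLE mentioned above.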

6  Unsupervised Learning of HMMs
The sequence is not labeled by states. We wish to find Θ which maximizes
Pr[X_1,…,X_L | Θ] = Σ_Š Pr[X_1,…,X_L, Š | Θ],
where the sum runs over all possible state-paths Š. There is no efficient general-purpose method to find this maximum. Heuristic solution (the EM algorithm):
1. Guess an initial set of parameters.
2. Iteratively improve your assessment.
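For intuition about the sum itself, it can be evaluated by brute force over all |S|^L state-paths when L is tiny. This is a sketch only, using the fair/loaded coin model drawn on slide 3; the function name and dictionary layout are illustrative assumptions, and the exponential cost is the reason the forward algorithm is used instead.

```python
from itertools import product

def brute_force_likelihood(x, states, p_init, p_trans, p_emit):
    """Pr[X | theta]: sum of Pr[X, path | theta] over every possible state-path."""
    total = 0.0
    for path in product(states, repeat=len(x)):            # |S|^L paths
        p = p_init[path[0]] * p_emit[path[0]][x[0]]
        for i in range(1, len(x)):
            p *= p_trans[path[i - 1]][path[i]] * p_emit[path[i]][x[i]]
        total += p
    return total

# Fair/loaded coin model as drawn on slide 3:
states = ["F", "L"]
p_init = {"F": 0.5, "L": 0.5}
p_trans = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.1, "L": 0.9}}
p_emit = {"F": {"H": 0.5, "T": 0.5}, "L": {"H": 0.75, "T": 0.25}}
print(brute_force_likelihood("HHT", states, p_init, p_trans, p_emit))
```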

7  Baum-Welch: EM for HMMs
EM (Expectation-Maximization) is an algorithm for learning the parameters from unlabeled sequences. Start with some set of parameters (many possible choices) and iterate until convergence:
E-step: compute Pr[S_i, X_1,…,X_L] and Pr[S_{i-1}, S_i, X_1,…,X_L] using the current set of parameters; there are L·|S| + (L-1)·|S|² such expressions to compute.
M-step: use the expected counts of transitions/emissions to update the parameter set.

8  Example: a 2-state / 2-signal HMM
Two states (0 and 1), two signals, and 2 parameters (plus pseudo-counts): φ is the 0→1 transition probability and λ is the 1→0 transition probability. Start with some set of parameters (λ = φ = ½) and iterate until convergence:
E-step: compute Pr[S_{i-1}=0/1, S_i=0/1, X_1,…,X_L | λ,φ] using the forward / backward algorithms (we will show how).
M-step: update λ and φ simultaneously:
φ ← Σ_i Pr[S_{i-1}=0, S_i=1, X_1,…,X_L | λ,φ] / Σ_i Pr[S_{i-1}=0, X_1,…,X_L | λ,φ]
λ ← Σ_i Pr[S_{i-1}=1, S_i=0, X_1,…,X_L | λ,φ] / Σ_i Pr[S_{i-1}=1, X_1,…,X_L | λ,φ]
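A short way to read these updates (a derivation added here, not on the slide): dividing numerator and denominator by Pr[X_1,…,X_L | λ,φ] turns the joint probabilities into posteriors, so each update is a ratio of posterior expected counts. For φ (the case of λ is symmetric):

\[
\varphi \;\leftarrow\;
\frac{\sum_{i=2}^{L}\Pr[S_{i-1}=0,\,S_i=1,\,X_1,\dots,X_L \mid \lambda,\varphi]}
     {\sum_{i=2}^{L}\Pr[S_{i-1}=0,\,X_1,\dots,X_L \mid \lambda,\varphi]}
=
\frac{\sum_{i=2}^{L}\Pr[S_{i-1}=0,\,S_i=1 \mid X_1,\dots,X_L,\lambda,\varphi]}
     {\sum_{i=2}^{L}\Pr[S_{i-1}=0 \mid X_1,\dots,X_L,\lambda,\varphi]}
=
\frac{\mathbb{E}\!\left[\#\{0\to 1\ \text{transitions}\}\mid X_1,\dots,X_L\right]}
     {\mathbb{E}\!\left[\#\{i<L:\ S_i=0\}\mid X_1,\dots,X_L\right]}.
\]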

9  Reminder from last week: decomposing the computation
Pr[X_1,…,X_L, S_i=S]
= Pr[X_1,…,X_i, S_i=S] · Pr[X_{i+1},…,X_L | X_1,…,X_i, S_i=S]
= Pr[X_1,…,X_i, S_i=S] · Pr[X_{i+1},…,X_L | S_i=S]   (by the Markov property)
= f_i(S) · b_i(S)
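The two factors are exactly the forward and backward values. A minimal dictionary-based sketch of the two recursions for a discrete HMM; the function names, the 1-based indexing, and the parameter layout are choices made here, not the tutorial's code.

```python
def forward(x, states, p_init, p_trans, p_emit):
    """f[i][s] = Pr[X_1..X_i, S_i = s]  (positions are 1-based; f[0] is unused)."""
    f = [None] + [dict() for _ in x]
    for s in states:
        f[1][s] = p_init[s] * p_emit[s][x[0]]
    for i in range(2, len(x) + 1):
        for s in states:
            f[i][s] = sum(f[i - 1][t] * p_trans[t][s] for t in states) * p_emit[s][x[i - 1]]
    return f

def backward(x, states, p_trans, p_emit):
    """b[i][s] = Pr[X_{i+1}..X_L | S_i = s], with the convention b[L][s] = 1."""
    L = len(x)
    b = [None] + [dict() for _ in x]
    for s in states:
        b[L][s] = 1.0
    for i in range(L - 1, 0, -1):
        for s in states:
            b[i][s] = sum(p_trans[s][t] * p_emit[t][x[i]] * b[i + 1][t] for t in states)
    return b
```

With these tables, Pr[X_1,…,X_L, S_i=S] is just f[i][S] * b[i][S], and the total likelihood is sum(f[L][s] for s in states).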

10  The E-step
Pr[S_i=S, X_1,…,X_L] = f_i(S) · b_i(S)   (from last week)
Pr[S_{i-1}=S, S_i=S', X_1,…,X_L | λ,φ] = f_{i-1}(S) · P_trans[S→S'] · P_emit[S'→X_i] · b_i(S')   (prove in HW #4)
Special case i=L: Pr[S_{L-1}=S, S_L=S', X_1,…,X_L | λ,φ] = f_{L-1}(S) · P_trans[S→S'] · P_emit[S'→X_L], so we define b_L(S') = 1 (for all S').
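Given the forward and backward tables, each pairwise term is a single product. A sketch assuming the indexing of the previous snippet (1-based positions, b[L][s] = 1), with an illustrative function name:

```python
def pairwise_joint(i, s, s_prime, x, f, b, p_trans, p_emit):
    """Pr[S_{i-1}=s, S_i=s', X_1..X_L] for 2 <= i <= L.
    The convention b[L][s'] = 1 makes the special case i = L fall out automatically."""
    return f[i - 1][s] * p_trans[s][s_prime] * p_emit[s_prime][x[i - 1]] * b[i][s_prime]
```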

11  Coin-Tossing Example (reminder)
Hidden states: fair / loaded coin; observed signals: Head / Tail. The current coin is kept with probability 0.9 and switched with probability 0.1; the fair coin emits H and T with probability 1/2 each, the loaded coin emits H with probability 3/4 and T with probability 1/4; the chain starts in each state with probability 1/2.

12  Example: a 2-state / 2-signal HMM with a single parameter
Here the single parameter θ is the probability of keeping the same coin. Start with some assignment (θ = 0.9) and iterate until convergence:
E-step: compute Pr[S_{i-1}=L/F, S_i=L/F, X_1,…,X_L | θ] using the forward / backward algorithms (as previously explained).
M-step: update θ:
θ ← Σ_i Pr[S_{i-1}=S_i (=L/F), X_1,…,X_L | θ] / Σ_i Pr[S_{i-1}=L/F, X_1,…,X_L | θ]
The denominator sums over both values of S_{i-1}, so it equals (L-1) · Pr[X_1,…,X_L | θ], the likelihood.
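The remark about the denominator holds because summing the joint over both possible values of S_{i-1} marginalizes the state out:

\[
\sum_{i=2}^{L}\sum_{s\in\{L,F\}}\Pr[S_{i-1}=s,\,X_1,\dots,X_L \mid \theta]
= \sum_{i=2}^{L}\Pr[X_1,\dots,X_L \mid \theta]
= (L-1)\cdot\Pr[X_1,\dots,X_L \mid \theta].
\]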

13  Coin-Tossing Example
Outcome of 3 tosses: Head, Head, Tail. Last time we calculated the forward values f_i(S) and backward values b_i(S) for i = 1, 2, 3 and S ∈ {Loaded, Fair}, with b_3(Loaded) = b_3(Fair) = 1. Recall:
f_i(S) = Pr[X_1,…,X_i, S_i=S] = Σ_{S'} ( f_{i-1}(S') · P_trans[S'→S] · P_emit[S→X_i] )
b_i(S) = Pr[X_{i+1},…,X_L | S_i=S] = Σ_{S'} ( P_trans[S→S'] · P_emit[S'→X_{i+1}] · b_{i+1}(S') )

14  Coin-Tossing Example: the E-step
Outcomes: Head, Head, Tail.
Pr[S_1=S, S_2=S', HHT | θ] = f_1(S) · P_trans[S→S'] · P_emit[S'→H] · b_2(S')
Pr[S_1=Loaded, S_2=Loaded, HHT | θ] = f_1(Loaded) · 0.9 · 0.75 · b_2(Loaded)
Pr[S_1=Loaded, S_2=Fair, HHT | θ]   = f_1(Loaded) · 0.1 · 0.5 · b_2(Fair)
Pr[S_1=Fair, S_2=Loaded, HHT | θ]   = 0.25 · 0.1 · 0.75 · b_2(Loaded)
Pr[S_1=Fair, S_2=Fair, HHT | θ]     = 0.25 · 0.9 · 0.5 · b_2(Fair)
(with f_1(Fair) = 0.25, and the remaining forward/backward values taken from the tables computed last time)

15  Coin-Tossing Example: the E-step (cont.)
Outcomes: Head, Head, Tail.
Pr[S_2=S, S_3=S', HHT | θ] = f_2(S) · P_trans[S→S'] · P_emit[S'→T] · b_3(S')
Pr[S_2=Loaded, S_3=Loaded, HHT | θ] = f_2(Loaded) · 0.9 · 0.25 · 1
Pr[S_2=Loaded, S_3=Fair, HHT | θ]   = f_2(Loaded) · 0.1 · 0.5 · 1
Pr[S_2=Fair, S_3=Loaded, HHT | θ]   = f_2(Fair) · 0.1 · 0.25 · 1
Pr[S_2=Fair, S_3=Fair, HHT | θ]     = f_2(Fair) · 0.9 · 0.5 · 1
(the third toss is Tail, so the loaded coin's emission probability is 1/4 = 0.25, and b_3(S') = 1 for both states)

16  Coin-Tossing Example: the M-step
M-step: update θ:
θ ← Σ_i Pr[S_{i-1}=S_i (=L/F), X_1,…,X_L | θ] / Σ_i Pr[S_{i-1}=L/F, X_1,…,X_L | θ]
The denominator is (L-1) · Pr[X_1,…,X_L | θ], and here L = 3, so
θ ← (Pr[S_1=S_2, HHT | θ] + Pr[S_2=S_3, HHT | θ]) / (2 · Pr[HHT | θ]),
where Pr[S_1=S_2, HHT | θ] and Pr[S_2=S_3, HHT | θ] are the sums of the L→L and F→F terms from the E-step. We saw last week that Pr[X_1,…,X_L | θ] = Σ_S f_L(S), so Pr[HHT | θ] = f_3(Loaded) + f_3(Fair). Plugging in the E-step values gives the updated θ. Continue iterating … what does θ converge to?
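Putting slides 13–16 together, here is a self-contained sketch that redoes the HHT computation numerically and iterates the update, assuming the coin model exactly as drawn on slide 3 (stay probability θ, fair coin emits H/T with probability 1/2 each, loaded coin emits H with probability 3/4). The function names and the fixed number of iterations are choices made for illustration, not part of the tutorial.

```python
def coin_model(theta):
    """Fair (F) / loaded (L) coin HMM with a single stay-probability parameter theta."""
    states = ["F", "L"]
    p_init = {"F": 0.5, "L": 0.5}
    p_trans = {"F": {"F": theta, "L": 1 - theta}, "L": {"F": 1 - theta, "L": theta}}
    p_emit = {"F": {"H": 0.5, "T": 0.5}, "L": {"H": 0.75, "T": 0.25}}
    return states, p_init, p_trans, p_emit

def forward_backward(x, states, p_init, p_trans, p_emit):
    """Forward values f[i][s] and backward values b[i][s], 1-based in i, with b[L][s] = 1."""
    L = len(x)
    f = [None] + [dict() for _ in x]
    b = [None] + [dict() for _ in x]
    for s in states:
        f[1][s] = p_init[s] * p_emit[s][x[0]]
        b[L][s] = 1.0
    for i in range(2, L + 1):
        for s in states:
            f[i][s] = sum(f[i - 1][t] * p_trans[t][s] for t in states) * p_emit[s][x[i - 1]]
    for i in range(L - 1, 0, -1):
        for s in states:
            b[i][s] = sum(p_trans[s][t] * p_emit[t][x[i]] * b[i + 1][t] for t in states)
    return f, b

def em_update(x, theta):
    """One Baum-Welch update of the single parameter theta (the M-step of slide 16)."""
    states, p_init, p_trans, p_emit = coin_model(theta)
    f, b = forward_backward(x, states, p_init, p_trans, p_emit)
    L = len(x)
    likelihood = sum(f[L][s] for s in states)      # Pr[X | theta]
    same = 0.0                                     # sum_i Pr[S_{i-1} = S_i, X | theta]
    for i in range(2, L + 1):
        for s in states:
            same += f[i - 1][s] * p_trans[s][s] * p_emit[s][x[i - 1]] * b[i][s]
    return same / ((L - 1) * likelihood)           # denominator = (L-1) * Pr[X | theta]

theta = 0.9
for it in range(20):                               # "iterate until convergence"
    theta = em_update("HHT", theta)
    print(it + 1, round(theta, 6))
```

Printing f and b inside forward_backward recovers the forward/backward tables referred to on slides 13–15, and the printed sequence of θ values shows what the iteration converges to for this 3-toss example.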

17  Coin-Tossing Example: learning simulation
(Figure: simulated EM runs of the parameter estimate, started from two different initial values, one of them 0.999.)