1 Learning Hidden Markov Models
Tutorial #7. © Ilan Gronau; based on original slides of Ydo Wexler & Dan Geiger.
2 Estimating model parameters
Reminder: given a training data set, use MLE inference to estimate the model parameters Θ = θ_1, θ_2, θ_3, …
3 Estimating model parameters
HMMs: estimate the parameters of the model (initial, transition, and emission probabilities) from a training data set.
[Diagram: the fair/loaded coin HMM. Start: 1/2 to each state; stay probability 0.9; Fair emits H and T with probability 1/2 each; Loaded emits H with 3/4 and T with 1/4.]
- Supervised: the state-path is given along with the sequence.
- Unsupervised: the state-path is unknown.
4 Supervised Learning of HMMs
The state-path is given along with the sequence.
The likelihood of a given set of parameters Θ: Pr[X_1 … X_L, S_1 … S_L | Θ]
[Diagram: the HMM as a chain of hidden states S_1, S_2, …, S_L, each S_i emitting X_i.]
5 Supervised Learning of HMMs
The state-path is given along with the sequence. We wish to find Θ that maximizes
Pr[X_1 … X_L, S_1 … S_L | Θ] = P_init(S_1) · P_emit(S_1, X_1) · ∏_{i=2..L} P_trans(S_{i-1}, S_i) · P_emit(S_i, X_i)
This maximization decouples: for each state s we maximize P_trans(s, ·) and P_emit(s, ·) independently, using the MLE for a multinomial distribution (+ pseudo counts).
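A minimal sketch of this supervised estimate in Python (not code from the slides; the dictionary representation and the default pseudo-count value are illustrative choices):

```python
from collections import Counter

def supervised_mle(sequences, paths, states, symbols, pseudo=1.0):
    """Estimate P_trans and P_emit from labeled (sequence, state-path) pairs
    by counting, with additive pseudo-counts to avoid zero probabilities."""
    trans, emit = Counter(), Counter()
    for X, S in zip(sequences, paths):
        for i, (s, x) in enumerate(zip(S, X)):
            emit[(s, x)] += 1
            if i > 0:
                trans[(S[i - 1], s)] += 1
    # The maximization decouples into one multinomial MLE per state:
    # normalize each state's counts independently.
    P_trans = {s: {t: (trans[(s, t)] + pseudo) /
                      sum(trans[(s, u)] + pseudo for u in states)
                   for t in states}
               for s in states}
    P_emit = {s: {x: (emit[(s, x)] + pseudo) /
                     sum(emit[(s, y)] + pseudo for y in symbols)
                  for x in symbols}
              for s in states}
    return P_trans, P_emit
```

For example, supervised_mle(["HHT"], ["LLF"], "LF", "HT") recovers transition and emission estimates from a single labeled toss sequence.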
6 Unsupervised Learning of HMMs
The sequence is not labeled by states. We wish to find Θ that maximizes
Pr[X_1 … X_L | Θ] = Σ_Š Pr[X_1 … X_L, Š | Θ]   (summing over all state-paths Š)
There is no efficient general-purpose method to find this maximum.
Heuristic solution (the EM algorithm):
1. Guess an initial set of parameters.
2. Iteratively improve your assessment.
7 Baum-Welch: EM for HMMs
EM (Expectation-Maximization): an algorithm for learning the parameters from unlabeled sequences.
Start with some set of parameters (many possible choices) and iterate until convergence:
- E-step: compute Pr[S_i, X_1,…,X_L] and Pr[S_{i-1}, S_i, X_1,…,X_L] using the current set of parameters; there are L·|S| + (L-1)·|S|² such expressions to compute.
- M-step: use the expected counts of transitions/emissions to update the parameter set.
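A compact sketch of this loop in Python/NumPy (an assumption of this write-up, not code from the slides; forward/backward follow last week's recursions, and xi holds the pairwise E-step quantities):

```python
import numpy as np

def forward(X, A, E, pi):
    # f[i, s] = Pr[X_1..X_{i+1}, S_{i+1} = s]   (arrays are 0-based)
    f = np.zeros((len(X), len(pi)))
    f[0] = pi * E[:, X[0]]
    for i in range(1, len(X)):
        f[i] = (f[i - 1] @ A) * E[:, X[i]]
    return f

def backward(X, A, E):
    # b[i, s] = Pr[X_{i+2}..X_L | S_{i+1} = s], with b[L-1, s] = 1
    b = np.ones((len(X), A.shape[0]))
    for i in range(len(X) - 2, -1, -1):
        b[i] = A @ (E[:, X[i + 1]] * b[i + 1])
    return b

def baum_welch(X, A, E, pi, n_iter=50):
    """A: |S|x|S| transitions, E: |S|x|Sigma| emissions, pi: initial."""
    X = np.asarray(X)
    A, E, pi = A.copy(), E.copy(), pi.copy()
    for _ in range(n_iter):
        f, b = forward(X, A, E, pi), backward(X, A, E)
        likelihood = f[-1].sum()
        # E-step: xi[i, s, t] = Pr[S_i = s, S_{i+1} = t | X]  (L-1 slices)
        xi = (f[:-1, :, None] * A[None] *
              (E[:, X[1:]].T * b[1:])[:, None, :]) / likelihood
        gamma = f * b / likelihood          # Pr[S_i = s | X]
        # M-step: re-normalize the expected counts
        A = xi.sum(0) / xi.sum(axis=(0, 2))[:, None]
        E = np.stack([gamma[X == x].sum(0) for x in range(E.shape[1])], axis=1)
        E /= E.sum(1, keepdims=True)
        pi = gamma[0]
    return A, E, pi
```

Pseudo-counts (as on the supervised slide) can be added to xi and gamma before normalizing; they keep rarely-visited states from collapsing to zero probabilities.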
8 Example: 2-state/2-signal HMM
2 states, 2 signals: 2 parameters λ, φ (+ pseudo counts).
Start with some set of parameters (e.g., λ = φ = ½) and iterate until convergence:
- E-step: compute Pr[S_{i-1}=0/1, S_i=0/1, X_1,…,X_L | λ,φ] using the forward/backward algorithms (we will show how).
- M-step: update λ, φ simultaneously:
  λ ← Σ_i Pr[S_{i-1}=0, S_i=1, X_1,…,X_L | λ,φ] / Σ_i Pr[S_{i-1}=0, X_1,…,X_L | λ,φ]
  φ ← Σ_i Pr[S_{i-1}=1, S_i=0, X_1,…,X_L | λ,φ] / Σ_i Pr[S_{i-1}=1, X_1,…,X_L | λ,φ]
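Given the xi array from an E-step (as in the sketch above), this M-step is two ratios of sums. A small sketch, assuming λ and φ denote the 0→1 and 1→0 switch probabilities (the slide does not spell out this pairing) and xi is a NumPy array:

```python
def update_lambda_phi(xi):
    """xi[i, s, t] = Pr[S_i = s, S_{i+1} = t, X_1..X_L | lambda, phi].
    Marginalizing t in the denominator gives Pr[S_i = s, X_1..X_L]."""
    lam = xi[:, 0, 1].sum() / xi[:, 0, :].sum()   # expected 0->1 / time in 0
    phi = xi[:, 1, 0].sum() / xi[:, 1, :].sum()   # expected 1->0 / time in 1
    return lam, phi
```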
9 Reminder from last week: decomposing the computation
Pr[X_1,…,X_L, S_i=S]
 = Pr[X_1,…,X_i, S_i=S] · Pr[X_{i+1},…,X_L | X_1,…,X_i, S_i=S]
 = Pr[X_1,…,X_i, S_i=S] · Pr[X_{i+1},…,X_L | S_i=S]   (by the Markov property)
 = f_i(S) · b_i(S)
10 The E-step
Pr[S_i=S, X_1,…,X_L] = f_i(S) · b_i(S)   (from last week)
Pr[S_{i-1}=S, S_i=S', X_1,…,X_L | λ,φ] = f_{i-1}(S) · P_trans[S→S'] · P_emit[S'→X_i] · b_i(S')   (proved in HW #4)
Special case i=L: Pr[S_{L-1}=S, S_L=S', X_1,…,X_L | λ,φ] = f_{L-1}(S) · P_trans[S→S'] · P_emit[S'→X_L]; defining b_L(S')=1 (for all S') makes the general formula cover this case.
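In code this is a single broadcasted product; a sketch assuming 0-based NumPy arrays f and b for the forward/backward tables and matrices A (transitions) and E (emissions):

```python
def pairwise_joint(f, b, A, E, X, i):
    """Pr[S_i = s, S_{i+1} = t, X_1..X_L] for all state pairs (s, t):
    f_i(s) * P_trans[s->t] * P_emit[t -> X_{i+1}] * b_{i+1}(t),
    written with 0-based positions; returns an |S| x |S| matrix.
    For the last position, b's final row is all ones (b_L = 1)."""
    return f[i][:, None] * A * (E[:, X[i + 1]] * b[i + 1])[None, :]
```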
11 Coin-Tossing Example
Reminder: the hidden states are Fair/Loaded and the signals are Head/Tail.
[Diagram: the fair/loaded coin HMM. Start: 1/2 to each state; stay probability 0.9; Fair emits H and T with probability 1/2 each; Loaded emits H with 3/4 and T with 1/4.]
12 Example: 2-state/2-signal HMM with a single parameter θ (the stay probability)
Start with some assignment (θ = 0.9) and iterate until convergence:
- E-step: compute Pr[S_{i-1}=L/F, S_i=L/F, X_1,…,X_L | θ] using the forward/backward algorithms (as previously explained).
- M-step: update θ:
  θ ← Σ_i Pr[S_{i-1}=S_i (=L/F), X_1,…,X_L | θ] / Σ_i Pr[S_{i-1}=L/F, X_1,…,X_L | θ]
  where the denominator equals (L-1)·Pr[X_1,…,X_L | θ] (the likelihood).
13 Coin-Tossing Example
Outcome of 3 tosses: Head, Head, Tail. Last time we calculated (values rounded):

forward f_i(S):      S_1      S_2      S_3
  Loaded             0.375    0.2719   0.0645
  Fair               0.25     0.1313   0.0727

backward b_i(S):     S_1      S_2      S_3
  Loaded             0.2094   0.275    (1)
  Fair               0.2344   0.475    (1)

Recall:
f_i(S) = Pr[X_1,…,X_i, S_i=S] = Σ_{S'} f_{i-1}(S') · P_trans[S'→S] · P_emit[S→X_i]
b_i(S) = Pr[X_{i+1},…,X_L | S_i=S] = Σ_{S'} P_trans[S→S'] · P_emit[S'→X_{i+1}] · b_{i+1}(S')
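These recursions and tables can be checked with a few lines of NumPy (a self-contained sketch; the state order Loaded/Fair and the encoding H=0, T=1 are choices of this write-up):

```python
import numpy as np

A = np.array([[0.9, 0.1],     # transitions: rows/cols = Loaded, Fair
              [0.1, 0.9]])
E = np.array([[0.75, 0.25],   # Loaded: Pr[H], Pr[T]
              [0.50, 0.50]])  # Fair:   Pr[H], Pr[T]
pi = np.array([0.5, 0.5])
X = [0, 0, 1]                 # Head, Head, Tail

L = len(X)
f, b = np.zeros((L, 2)), np.ones((L, 2))
f[0] = pi * E[:, X[0]]
for i in range(1, L):                      # forward recursion
    f[i] = (f[i - 1] @ A) * E[:, X[i]]
for i in range(L - 2, -1, -1):             # backward recursion
    b[i] = A @ (E[:, X[i + 1]] * b[i + 1])

print(f.T)  # [[0.375  0.2719 0.0645]  (Loaded)
            #  [0.25   0.1313 0.0727]] (Fair)   -- rounded
print(b.T)  # [[0.2094 0.275  1.    ]  (Loaded)
            #  [0.2344 0.475  1.    ]] (Fair)
```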
14 Coin-Tossing Example: the E-step
Outcomes: Head, Head, Tail.
Pr[S_1=S, S_2=S', HHT | θ] = f_1(S) · P_trans[S→S'] · P_emit[S'→H] · b_2(S')
Pr[S_1=Loaded, S_2=Loaded, HHT | θ] = 0.375 · 0.9 · 0.75 · 0.275 ≈ 0.0696
Pr[S_1=Loaded, S_2=Fair, HHT | θ] = 0.375 · 0.1 · 0.5 · 0.475 ≈ 0.0089
Pr[S_1=Fair, S_2=Loaded, HHT | θ] = 0.25 · 0.1 · 0.75 · 0.275 ≈ 0.0052
Pr[S_1=Fair, S_2=Fair, HHT | θ] = 0.25 · 0.9 · 0.5 · 0.475 ≈ 0.0534
15 Coin-Tossing Example: the E-step (cont.)
Outcomes: Head, Head, Tail.
Pr[S_2=S, S_3=S', HHT | θ] = f_2(S) · P_trans[S→S'] · P_emit[S'→T] · b_3(S')
(Since X_3 = Tail, the emission factor is P_emit[S'→T]: 0.25 for Loaded, 0.5 for Fair.)
Pr[S_2=Loaded, S_3=Loaded, HHT | θ] = 0.2719 · 0.9 · 0.25 · 1 ≈ 0.0612
Pr[S_2=Loaded, S_3=Fair, HHT | θ] = 0.2719 · 0.1 · 0.5 · 1 ≈ 0.0136
Pr[S_2=Fair, S_3=Loaded, HHT | θ] = 0.1313 · 0.1 · 0.25 · 1 ≈ 0.0033
Pr[S_2=Fair, S_3=Fair, HHT | θ] = 0.1313 · 0.9 · 0.5 · 1 ≈ 0.0591
16 Coin-Tossing Example: the M-step
M-step: update θ:
θ ← Σ_i Pr[S_{i-1}=S_i (=L/F), X_1,…,X_L | θ] / Σ_i Pr[S_{i-1}=L/F, X_1,…,X_L | θ]
where the denominator equals (L-1)·Pr[X_1,…,X_L | θ] (the likelihood). Here:
θ ← (Pr[S_1=S_2, HHT | θ] + Pr[S_2=S_3, HHT | θ]) / (2·Pr[HHT | θ])
We saw last week: Pr[X_1,…,X_L | θ] = Σ_S f_L(S), so Pr[HHT | θ] = 0.0645 + 0.0727 ≈ 0.1371
θ ← ((0.0696 + 0.0534) + (0.0612 + 0.0591)) / (2 · 0.1371) ≈ 0.887
Continue… converges to?
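Iterating the update numerically shows where θ goes; a self-contained sketch of this single-parameter EM on HHT (again assuming the coin model above; not code from the slides):

```python
import numpy as np

def em_step(theta, X=(0, 0, 1)):           # X = Head, Head, Tail
    """One Baum-Welch update of the stay-probability theta."""
    A = np.array([[theta, 1 - theta],
                  [1 - theta, theta]])      # states: Loaded, Fair
    E = np.array([[0.75, 0.25],
                  [0.50, 0.50]])
    pi = np.array([0.5, 0.5])
    L = len(X)
    f, b = np.zeros((L, 2)), np.ones((L, 2))
    f[0] = pi * E[:, X[0]]
    for i in range(1, L):
        f[i] = (f[i - 1] @ A) * E[:, X[i]]
    for i in range(L - 2, -1, -1):
        b[i] = A @ (E[:, X[i + 1]] * b[i + 1])
    likelihood = f[-1].sum()
    # expected "stay" transitions, summed over positions and states
    stay = sum(f[i, s] * A[s, s] * E[s, X[i + 1]] * b[i + 1, s]
               for i in range(L - 1) for s in range(2))
    return stay / ((L - 1) * likelihood)

theta = 0.9
for step in range(10):
    theta = em_step(theta)
    print(step, round(theta, 4))   # first update: 0.9 -> ~0.887, as above
```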
17 Coin-Tossing Example: learning simulation
[Plot: trajectory of θ across EM iterations, shown for two different starting points, one of them 0.999.]