1 Learning Parameters of Hidden Markov Models. Prepared by Dan Geiger.
2 Nothing is hidden
[Figure: fully observed Markov chain H_1 → H_2 → … → H_L]
Maximum likelihood: P(H_1 = t) = N_t / (N_t + N_f)
Maximum likelihood: P(H_2 = t | H_1 = t) = N_{t,t} / (N_{t,t} + N_{f,t}), and so on for every edge, independently.
Equal-prior MAP: P(H_1 = t) = (a + N_t) / ((a + N_t) + (a + N_f))
[Figure: HMM with hidden states H_1,…,H_L and emissions X_1,…,X_L]
How to extend to hidden variables?
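To make the counting concrete, here is a minimal sketch assuming binary states (1 = t, 0 = f) and a list of fully observed state sequences; the function name ml_estimates and the pseudo-count argument prior_a are purely illustrative.

from collections import Counter

def ml_estimates(sequences, prior_a=0.0):
    # Count first-state values and adjacent-state transitions across all sequences.
    first = Counter(seq[0] for seq in sequences)
    trans = Counter((prev, cur) for seq in sequences for prev, cur in zip(seq, seq[1:]))
    a = prior_a                          # a = 0: maximum likelihood; a > 0: equal-prior MAP
    p_h1_t = (a + first[1]) / (2 * a + first[0] + first[1])
    p_t_after_t = (a + trans[1, 1]) / (2 * a + trans[1, 1] + trans[1, 0])
    p_t_after_f = (a + trans[0, 1]) / (2 * a + trans[0, 1] + trans[0, 0])
    return p_h1_t, p_t_after_t, p_t_after_f

# Example: three fully observed chains of length 4 (1 = t, 0 = f).
print(ml_estimates([[1, 1, 0, 1], [0, 1, 1, 1], [1, 0, 0, 1]]))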
3 Learning the parameters (EM algorithm)
A common algorithm for learning the parameters from unlabeled sequences is Expectation-Maximization (EM). In the HMM context it reads as follows:
Start with some initial probability tables (many choices).
Iterate until convergence:
E-step: Compute p(h_i, h_{i-1} | x_1,…,x_L) using the current probability tables ("current parameters").
M-step: Use these expected counts to update the local probability tables via the maximum likelihood formula.
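A structural sketch of this loop; the helpers e_step and m_step are hypothetical placeholders for the model-specific computations spelled out on the following slides, the parameters are assumed to be a flat sequence of numbers, and the stopping rule (parameters change by less than a small tolerance) is one common choice among several.

def em(data, params, e_step, m_step, tol=1e-6, max_iter=1000):
    # Generic EM loop: alternate expected counts (E) and maximum likelihood (M).
    for _ in range(max_iter):
        counts = e_step(data, params)      # E-step with the current tables
        new_params = m_step(counts)        # M-step: re-estimate the tables
        if max(abs(a - b) for a, b in zip(new_params, params)) < tol:
            return new_params
        params = new_params
    return params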
4 Example I: Homogeneous HMM, one sample
[Figure: HMM with hidden states H_1,…,H_L and emissions X_1,…,X_L]
Start with some initial probability tables, say λ = μ = ½, where λ = p(h_i=1 | h_{i-1}=0) and μ = p(h_i=0 | h_{i-1}=1).
Iterate until convergence:
E-step: Compute p_{λ,μ}(h_i | h_{i-1}, x_1,…,x_L) from p_{λ,μ}(h_i, h_{i-1} | x_1,…,x_L), which is computed using the forward-backward algorithm as explained earlier.
M-step: Update the parameters simultaneously:
λ ← Σ_i p_{λ,μ}(h_i=1 | h_{i-1}=0, x_1,…,x_L) / (L-1)
μ ← Σ_i p_{λ,μ}(h_i=0 | h_{i-1}=1, x_1,…,x_L) / (L-1)
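As a concrete reference for this slide, here is a minimal sketch assuming binary hidden states, a uniform distribution over h_1, and a fixed (not learned) emission table e[h][x]; the helper names forward_backward and em_step are illustrative. The E-step builds the pairwise posteriors with forward-backward, and the M-step averages the conditionals over the L-1 transitions, exactly as in the update above.

def forward_backward(xs, lam, mu, e):
    # Forward-backward messages for a binary-state chain with transition
    # parameters lam = P(h_i=1 | h_{i-1}=0) and mu = P(h_i=0 | h_{i-1}=1).
    L = len(xs)
    T = [[1 - lam, lam], [mu, 1 - mu]]               # T[prev][cur]
    f = [[0.5 * e[h][xs[0]] for h in (0, 1)]]        # forward messages f[i][h]
    for i in range(1, L):
        f.append([e[h][xs[i]] * sum(f[i - 1][g] * T[g][h] for g in (0, 1))
                  for h in (0, 1)])
    b = [[1.0, 1.0] for _ in range(L)]               # backward messages b[i][h]
    for i in range(L - 2, -1, -1):
        b[i] = [sum(T[h][g] * e[g][xs[i + 1]] * b[i + 1][g] for g in (0, 1))
                for h in (0, 1)]
    return f, b

def em_step(xs, lam, mu, e):
    # One EM iteration on a single sequence; returns the updated (lam, mu).
    L = len(xs)
    f, b = forward_backward(xs, lam, mu, e)
    T = [[1 - lam, lam], [mu, 1 - mu]]
    new_lam = new_mu = 0.0
    for i in range(1, L):
        # E-step: p(h_{i-1}=g, h_i=h | x_1..x_L), up to one common constant
        joint = [[f[i - 1][g] * T[g][h] * e[h][xs[i]] * b[i][h]
                  for h in (0, 1)] for g in (0, 1)]
        # conditionals p(h_i | h_{i-1}, x_1..x_L), as on the slide
        new_lam += joint[0][1] / (joint[0][0] + joint[0][1])
        new_mu += joint[1][0] / (joint[1][0] + joint[1][1])
    # M-step: average over the L-1 transitions
    return new_lam / (L - 1), new_mu / (L - 1)

# Example: a fixed noisy-channel emission table and one short binary observation.
e = [[0.9, 0.1], [0.2, 0.8]]                         # e[h][x] = P(x_i=x | h_i=h)
lam, mu = 0.5, 0.5
for _ in range(20):                                  # iterate (until convergence)
    lam, mu = em_step([0, 0, 1, 1, 1, 0, 1, 1], lam, mu, e)
print(lam, mu)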
5 Example II: Homogeneous HMM, N samples
[Figure: HMM with hidden states H_1,…,H_L and emissions X_1,…,X_L]
Start with some initial probability tables (say λ = μ = ½).
Iterate until convergence:
E-step: Compute p_{λ,μ}(h_i | h_{i-1}, [x_1,…,x_L]^j) for j = 1,…,N from p_{λ,μ}(h_i, h_{i-1} | [x_1,…,x_L]^j), which is computed using the forward-backward algorithm as explained earlier.
M-step: Update the parameters simultaneously:
λ ← Σ_j Σ_i p_{λ,μ}(h_i=1 | h_{i-1}=0, [x_1,…,x_L]^j) / N(L-1)
μ ← Σ_j Σ_i p_{λ,μ}(h_i=0 | h_{i-1}=1, [x_1,…,x_L]^j) / N(L-1)
The changes due to N > 1 are the extra sum over the samples j and the normalization by N(L-1).
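A small sketch of the N-sample version, reusing the (illustrative) em_step helper from the previous slide and assuming all N sequences share the same length L; since em_step already divides by L-1, averaging its output over the sequences yields the sum over j and i divided by N(L-1).

def em_step_n(seqs, lam, mu, e):
    # Average the single-sequence updates over the N samples.
    updates = [em_step(xs, lam, mu, e) for xs in seqs]
    new_lam = sum(u[0] for u in updates) / len(seqs)
    new_mu = sum(u[1] for u in updates) / len(seqs)
    return new_lam, new_mu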
6 Example III: Non-homogeneous HMM, N samples
[Figure: HMM with hidden states H_1,…,H_L and emissions X_1,…,X_L]
Start with some initial probability tables (say λ_i = μ_i = ½ for every i).
Iterate until convergence:
E-step: Compute p_{λ_i,μ_i}(h_i | h_{i-1}, [x_1,…,x_L]^j) for j = 1,…,N from p_{λ_i,μ_i}(h_i, h_{i-1} | [x_1,…,x_L]^j), which is computed using the forward-backward algorithm as explained earlier.
M-step: Update the parameters simultaneously, one pair (λ_i, μ_i) per position:
λ_i ← Σ_j p_{λ_i,μ_i}(h_i=1 | h_{i-1}=0, [x_1,…,x_L]^j) / N
μ_i ← Σ_j p_{λ_i,μ_i}(h_i=0 | h_{i-1}=1, [x_1,…,x_L]^j) / N
The sum over positions i is now dropped, and so is the factor L-1, since each position has its own parameters.
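A sketch of the non-homogeneous case, again assuming binary states and the fixed emission table e, but with parameter lists lam and mu of length L whose entry i governs the transition into position i (entry 0 is unused); the helper names are illustrative.

def fb_nonhom(xs, lam, mu, e):
    # Forward-backward with a position-dependent transition table T[i].
    L = len(xs)
    T = [None] + [[[1 - lam[i], lam[i]], [mu[i], 1 - mu[i]]] for i in range(1, L)]
    f = [[0.5 * e[h][xs[0]] for h in (0, 1)]]
    for i in range(1, L):
        f.append([e[h][xs[i]] * sum(f[i - 1][g] * T[i][g][h] for g in (0, 1))
                  for h in (0, 1)])
    b = [[1.0, 1.0] for _ in range(L)]
    for i in range(L - 2, -1, -1):
        b[i] = [sum(T[i + 1][h][g] * e[g][xs[i + 1]] * b[i + 1][g] for g in (0, 1))
                for h in (0, 1)]
    return f, b, T

def em_step_nonhom(seqs, lam, mu, e):
    # Each position i gets its own update, averaged over the N sequences only.
    L, N = len(seqs[0]), len(seqs)
    sums = [[0.0, 0.0] for _ in range(L)]
    for xs in seqs:
        f, b, T = fb_nonhom(xs, lam, mu, e)
        for i in range(1, L):
            joint = [[f[i - 1][g] * T[i][g][h] * e[h][xs[i]] * b[i][h]
                      for h in (0, 1)] for g in (0, 1)]
            sums[i][0] += joint[0][1] / (joint[0][0] + joint[0][1])
            sums[i][1] += joint[1][0] / (joint[1][0] + joint[1][1])
    new_lam, new_mu = lam[:], mu[:]
    for i in range(1, L):
        new_lam[i], new_mu[i] = sums[i][0] / N, sums[i][1] / N
    return new_lam, new_mu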
7 Example IV: Missing emission probabilities
[Figure: HMM with hidden states H_1,…,H_L and emissions X_1,…,X_L]
Exercise: Write the analogous update equations for the emission parameters. Hint: compute P(x_i, h_i | Data).
Often the learned parameters are collectively denoted by θ. E.g., in the context of homogeneous HMMs, if all parameters are learned from data, then θ consists of the initial probability of h_1, the two transition parameters λ and μ, and the two emission probabilities.
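One way to approach the exercise (a standard re-estimation step stated here as a sketch, not necessarily the slide's intended notation), reusing the forward_backward helper from Example I and assuming binary emissions: the new P(x_i = x | h_i = h) is the expected number of positions with state h and symbol x divided by the expected number of positions with state h.

def update_emissions(seqs, lam, mu, e):
    num = [[0.0, 0.0], [0.0, 0.0]]       # expected counts of (h_i = h, x_i = x)
    den = [0.0, 0.0]                     # expected counts of h_i = h
    for xs in seqs:
        f, b = forward_backward(xs, lam, mu, e)
        for i, x in enumerate(xs):
            z = f[i][0] * b[i][0] + f[i][1] * b[i][1]   # = P(x_1..x_L)
            for h in (0, 1):
                gamma = f[i][h] * b[i][h] / z           # p(h_i = h | x_1..x_L)
                num[h][x] += gamma
                den[h] += gamma
    return [[num[h][x] / den[h] for x in (0, 1)] for h in (0, 1)]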
8 Viterbi Training
Start with some probability tables (many possible choices).
Iterate until convergence:
E-step (new): Compute the most probable assignment h*_1,…,h*_L = argmax_{h_1,…,h_L} P(h_1,…,h_L | x_1,…,x_L) using the current parameters (the Viterbi algorithm).
M-step: Use the resulting counts to update the local probability tables via maximum likelihood (= N_{s1,s2}/N).
Comments:
Useful when the posterior probability centers around the MAP value.
Avoids the inconsistency of adding up each link separately; e.g., one cannot have H_1=0, H_2=1 and H_2=0, H_3=1 simultaneously, as we did earlier.
Summing over all joint options is exponential.
A common variant of the EM algorithm for HMMs.
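A minimal sketch of one round of Viterbi training under the same assumptions as the earlier sketches (binary states, fixed emission table e, uniform initial distribution): decode each sequence with the Viterbi algorithm, then re-estimate the two transition parameters by plain counting along the decoded paths.

def viterbi(xs, lam, mu, e):
    # Most probable hidden assignment for one sequence.
    T = [[1 - lam, lam], [mu, 1 - mu]]
    v = [0.5 * e[h][xs[0]] for h in (0, 1)]
    back = []
    for x in xs[1:]:
        scores = [[v[g] * T[g][h] * e[h][x] for g in (0, 1)] for h in (0, 1)]
        back.append([0 if s[0] >= s[1] else 1 for s in scores])
        v = [max(s) for s in scores]
    h = 0 if v[0] >= v[1] else 1
    path = [h]
    for ptr in reversed(back):
        h = ptr[h]
        path.append(h)
    return path[::-1]

def viterbi_training_step(seqs, lam, mu, e):
    # Hard counts of transitions along the decoded paths.
    n0 = n01 = n1 = n10 = 0
    for xs in seqs:
        path = viterbi(xs, lam, mu, e)
        for g, h in zip(path, path[1:]):
            if g == 0:
                n0 += 1
                n01 += (h == 1)
            else:
                n1 += 1
                n10 += (h == 0)
    # Maximum likelihood from the counts (keep the old value if a state never occurs).
    return (n01 / n0 if n0 else lam), (n10 / n1 if n1 else mu)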
9 Summary of HMM
[Figure: HMM with hidden states H_1,…,H_L and emissions X_1,…,X_L]
1. Belief update = posterior decoding: the forward-backward algorithm
2. Maximum a posteriori assignment: the Viterbi algorithm
3. Learning parameters: the EM algorithm; Viterbi training
10 Some applications of HMMs
[Figure: HMM with hidden states H_1,…,H_L and emissions X_1,…,X_L]
1. Haplotyping
2. Gene mapping
3. Speech recognition, finance, …
4. … you name it… everywhere
11 Haplotyping
[Figure: two hidden chains H_1,…,H_L (one per haplotype) jointly emitting the observed genotypes G_1,…,G_L]
Every G_i is an unordered pair of letters {aa, ab, bb}. The source of one letter is the first chain and the source of the other letter is the second chain. Which letter comes from which chain? (Is it paternal or maternal DNA?)
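A tiny illustration of why the phase is hidden (the function name and the single-letter allele encoding are purely illustrative): the observed genotype is the unordered pair of the two chains' letters, so swapping the paternal and maternal sources produces the same observation.

def genotype(paternal, maternal):
    # Observed genotype: the unordered pair of the two hidden letters.
    return "".join(sorted(paternal + maternal))

print(genotype('a', 'b'), genotype('b', 'a'))   # both print 'ab' -- phase is lost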
12 Model of Inheritance
Example with two parents and one child.
[Figure: inheritance at one locus i, extended to locus i+1 (more loci) and to 3 children]