HMM (Hidden Markov Models) Xuhua Xia xxia@uottawa.ca http://dambe.bio.uottawa.ca
Infer the invisible

In science, one often needs to infer what is not visible based on what is visible. Examples in this lecture:
- Detecting missing alleles in a locus
- The hidden Markov model (the main topic)
Detecting missing alleles in a locus

Suppose we obtain 30 individuals with a single A band, 30 with a single B band, and 40 individuals with both A and B bands (NA? = 30, NB? = 30, NAB = 40).

With H2 (the hypothesis of two alleles), which assumes NA? = NAA and NB? = NBB, the proportions of the AA, AB and BB genotypes are pA^2, 2*pA*pB, and pB^2, respectively. So, assuming Hardy-Weinberg equilibrium, the log-likelihood function for estimating pA is

lnL_H2 = NAA*ln(pA^2) + NAB*ln(2*pA*pB) + NBB*ln(pB^2)

which leads to pA = 0.5 and lnL_H2 = -110.90355. We do not need to estimate pB because pB = 1 - pA.

With H3 (three alleles, with a third cryptic allele C) we need to estimate both pA and pB (with pC = 1 - pA - pB). A single A band can now come from either AA or AC, and a single B band from either BB or BC, so the log-likelihood function is

lnL_H3 = NA?*ln(pA^2 + 2*pA*pC) + NB?*ln(pB^2 + 2*pB*pC) + NAB*ln(2*pA*pB)

which leads to pA = pB = 0.4667 (and pC = 0.0667), and lnL_H3 = -109.62326. A likelihood ratio test (LRT, 2*(lnL_H3 - lnL_H2) = 2.56) or information-theoretic indices (e.g., AIC) can then be used to decide between the two hypotheses.
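As a quick numerical check, here is a minimal sketch of the two log-likelihoods. It uses a crude grid search instead of a proper optimizer, which is enough to reproduce the slide's estimates.

```python
import numpy as np

# Band counts from the slide: single A band, single B band, both bands
N_A, N_B, N_AB = 30, 30, 40

def lnL_H2(pA):
    """Two alleles: single-band individuals are homozygotes (H-W proportions)."""
    pB = 1.0 - pA
    return N_A*np.log(pA**2) + N_AB*np.log(2*pA*pB) + N_B*np.log(pB**2)

def lnL_H3(pA, pB):
    """Three alleles with a cryptic allele C: a single A band is AA or AC, etc."""
    pC = 1.0 - pA - pB
    if pC <= 0:
        return -np.inf
    return (N_A*np.log(pA**2 + 2*pA*pC) + N_B*np.log(pB**2 + 2*pB*pC)
            + N_AB*np.log(2*pA*pB))

# Crude grid search for the MLEs (fine enough to reproduce the slide's numbers)
grid = np.arange(0.001, 1.0, 0.001)
pA2 = grid[np.argmax([lnL_H2(p) for p in grid])]
lnL2 = lnL_H2(pA2)
lnL3, pA3, pB3 = max((lnL_H3(a, b), a, b) for a in grid for b in grid)
print(pA2, lnL2)            # ~0.5, ~-110.90
print(pA3, pB3, lnL3)       # ~0.467, ~0.467, ~-109.62
print(2 * (lnL3 - lnL2))    # LRT statistic, ~2.56 for one extra parameter
```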
Markov Models

A model of a stochastic process over time or space. Components of a Markov model:
- a set of states
- a dependence structure (which sites or time points depend on which)
- a set of transition probabilities

Examples: a Markov chain of nucleotide (or amino acid) substitutions over time; Markov models of site dependence along a sequence.
Markov models over one dimension

S = RRRYYYRRRYYYRRRYYYRRRYYYRRRYYYRRRYYY……

What is the nucleotide at site i?
- 0th-order Markov model (no site dependence): PR = 0.5, PY = 0.5.
- Site dependence, 1st-order Markov model: PRR = PYY = 2/3, PYR = PRY = 1/3; what is P(Si | Si-1 = R)?
- 2nd-order Markov model: what is P(Si | Si-2 = R, Si-1 = Y)?
- 3rd-order and higher-order Markov models: rarely used.

[Figure: two-state transition diagram between R and Y with the 1st-order transition probabilities.]
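A minimal sketch of how the 0th-order and 1st-order probabilities above could be estimated from the sequence by simple counting (the number of RRRYYY repeats below is arbitrary, so the 1st-order estimates only approximate 2/3 and 1/3 because of the boundary effect):

```python
from collections import Counter
from itertools import product

# The purine/pyrimidine toy sequence from the slide (a few repeats of RRRYYY)
S = "RRRYYY" * 8

# 0th-order model: marginal frequencies
n = len(S)
p0 = {b: S.count(b) / n for b in "RY"}
print(p0)  # {'R': 0.5, 'Y': 0.5}

# 1st-order model: transition probabilities P(S_i | S_{i-1})
pairs = Counter(zip(S, S[1:]))
p1 = {}
for a, b in product("RY", repeat=2):
    from_a = sum(c for (x, _), c in pairs.items() if x == a)
    p1[a + b] = pairs[(a, b)] / from_a
print(p1)  # P_RR ~ P_YY ~ 2/3, P_RY ~ P_YR ~ 1/3
```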
Long-range dependence
Elements of our Markov chain

State space: {Ssunny, Srainy, Ssnowy}

State transition probabilities (transition matrix A, rows = today's state, columns = tomorrow's state):

        sunny  rainy  snowy
sunny    0.80   0.15   0.05
rainy    0.38   0.60   0.02
snowy    0.75   0.05   0.20

Initial state distribution: π = (0.7, 0.25, 0.05)
The likelihood of a sequence of events

For the observed state sequence {sunny, rainy, rainy, rainy, snowy, snowy}:

P(Ssunny) × P(Srainy | Ssunny) × P(Srainy | Srainy) × P(Srainy | Srainy) × P(Ssnowy | Srainy) × P(Ssnowy | Ssnowy)
= 0.7 × 0.15 × 0.6 × 0.6 × 0.02 × 0.2 = 0.0001512
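A minimal sketch of this calculation, using the initial distribution and the transition matrix from the previous slide:

```python
states = ["sunny", "rainy", "snowy"]
pi = {"sunny": 0.70, "rainy": 0.25, "snowy": 0.05}      # initial distribution
A = {  # A[i][j] = P(state j tomorrow | state i today); rows sum to 1
    "sunny": {"sunny": 0.80, "rainy": 0.15, "snowy": 0.05},
    "rainy": {"sunny": 0.38, "rainy": 0.60, "snowy": 0.02},
    "snowy": {"sunny": 0.75, "rainy": 0.05, "snowy": 0.20},
}

def chain_likelihood(seq):
    """P(S1) * product of P(S_t | S_{t-1}) for a fully observed state sequence."""
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev][cur]
    return p

obs = ["sunny", "rainy", "rainy", "rainy", "snowy", "snowy"]
print(chain_likelihood(obs))   # 0.0001512
```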
An example of an HMM
HMM: A classic example

A dishonest casino dealer secretly switches between a fair die (F), with P(i) = 1/6 for i = 1 to 6, and a loaded die (L) with P(6) = 1.

Visible: 1354634255112466666122363
Hidden:  FFFFFFFFFFFFFFLLLLLFFFFFF

[Figure: two-state diagram with states F and L, transition probabilities PFF, PFL, PLF, PLL, and emission arrows to the die faces 1-6.]

Components of an HMM:
1. State space: {S1, S2, …, SN}
2. State transition probabilities (transition matrix): aij = P(Sj at t+1 | Si at t)
3. Observations: {O1, O2, …, OM}
4. Emission probabilities: bj(k) = P(Ok | Sj)

Is it possible to know how many dice the dishonest casino dealer uses? We need specific hypotheses, e.g., H1: he has two dice; H2: he has three dice.

Is it possible to reconstruct the hidden Markov model and infer all hidden states? Almost never. Success of inference depends mainly on (1) the differences in emission probabilities between states, and (2) how long the chain stays in each state.
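To make the casino example concrete, here is a minimal simulation sketch. The slide gives the emission probabilities but not the transition probabilities between F and L, so the values of PFF, PFL, PLF and PLL below are illustrative assumptions, not values from the lecture.

```python
import random

random.seed(1)

# Hidden states with ASSUMED transition probabilities (the slide gives no
# numeric values for PFF, PFL, PLF, PLL; these are illustrative only).
trans = {"F": {"F": 0.95, "L": 0.05},
         "L": {"F": 0.20, "L": 0.80}}

# Emission probabilities as described on the slide: the fair die is uniform,
# the loaded die always shows 6.
emit = {"F": {str(i): 1/6 for i in range(1, 7)},
        "L": {str(i): (1.0 if i == 6 else 0.0) for i in range(1, 7)}}

def simulate(n, start="F"):
    """Generate (hidden, visible) sequences of length n from the HMM."""
    hidden, visible, state = [], [], start
    for _ in range(n):
        hidden.append(state)
        faces, probs = zip(*emit[state].items())
        visible.append(random.choices(faces, weights=probs)[0])
        nxt, w = zip(*trans[state].items())
        state = random.choices(nxt, weights=w)[0]
    return "".join(hidden), "".join(visible)

h, v = simulate(25)
print("Visible:", v)
print("Hidden: ", h)
```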
Protein secondary structure

The hidden states are coil (C), strand (E), and helix (H). RT is a real HIV-1 reverse transcriptase sequence; ST indicates the states.
Transition probabilities over sites

        C        E        H
C   0.88210  0.06987  0.04803
E   0.26154  0.73846  0.00000
H   0.06897  0.00690  0.92414

We have previously seen P_ij(t), the probability of changing from state i to state j over time t. Here P_ij is the probability of changing from state i to state j over one site (from one amino acid position to the next). The relatively large values on the diagonal are typical of Markov chains in which a state tends to persist for several consecutive observations (amino acids in our case). The quality of inference increases with the diagonal values.
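One way to quantify "tends to persist": in a first-order Markov chain the run length in a state is geometric, so the expected number of consecutive sites spent in state i is 1/(1 - P_ii). This formula is a standard result rather than something from the slide; the sketch below simply applies it to the diagonal values above.

```python
# Self-transition (diagonal) probabilities from the matrix above
stay = {"C": 0.88210, "E": 0.73846, "H": 0.92414}

# Expected length of a geometric run in state i: 1 / (1 - P_ii)
for state, p in stay.items():
    print(state, round(1.0 / (1.0 - p), 1))
# C ~8.5, E ~3.8, H ~13.2 consecutive sites on average
```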
Emission probabilities

[Emission probability table bj(k) = P(amino acid k | state j), with one row per amino acid (20 rows) and one column per state (C, E, H); most entries are not legible in this copy. The values for Y, used in the Viterbi example below, are eC(Y) = 0.0262, eE(Y) = 0.1539, and eH(Y) = 0.0069.]

The quality of inferring the hidden states increases with the difference in emission probabilities between states.

Predicting secondary structure based only on emission probabilities (assigning each amino acid the state with the largest emission probability):

        123456789012345678901
T     = YVYVEEEEEEVEEEEEEPGPG
Naïve = EEEEHHHHHHEHHHHHHCCCC

PEH in the transition probability matrix is 0.00000, implying an essentially zero probability of E being followed by H. Our naïve prediction above, with an H at position 5 following an E at position 4, therefore represents an extremely unlikely event. Another example is at position 11, where T11 = V. Our prediction Naïve11 = E implies a transition of the secondary structure from H to E (positions 10 to 11) and then back from E to H (positions 11 to 12). The transition probability matrix shows that PHE and PEH are both very small, so S11 is very unlikely to be E.
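A small sketch of this naïve, emission-only rule. Only the emission values for Y are clearly recoverable from these slides, so the V row in the dictionary below is a placeholder chosen purely for illustration; the output is illustrative rather than a reproduction of the Naïve string above.

```python
# Emission probabilities P(amino acid | state); the Y row is from the slide,
# the V row is a PLACEHOLDER used only to make the example run.
emissions = {
    "Y": {"C": 0.0262, "E": 0.1539, "H": 0.0069},   # from the slide
    "V": {"C": 0.04,   "E": 0.18,   "H": 0.05},     # placeholder values
}

def naive_predict(seq, emissions):
    """Assign each residue the state with the largest emission probability."""
    return "".join(max(emissions[aa], key=emissions[aa].get) for aa in seq)

print(naive_predict("YVYV", emissions))   # EEEE, matching the first four Naïve states
```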
Infer the hidden states: the Viterbi algorithm

- A dynamic programming algorithm.
- Incorporates information from both the transition probability matrix and the emission probability matrix.
- Needs two matrices, both of dimension NumStates × SeqLen:
  - the Viterbi matrix V, where V_j(i) is the log-likelihood of the best path ending in state j at position i;
  - the backtrack matrix B.
- Output: a reconstructed hidden-state sequence.

Example: predicting the secondary structure of the sequence YVYVEEEEEEVEEEEEEPGPG (a sketch implementation follows below).
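Below is a minimal log-space Viterbi sketch (not the implementation used for the slides). The transition matrix is the one given above; the emission table is only partially legible in this copy, so the `e` dictionary contains only the Y column and must be filled with the full 20 × 3 emission table before the slide's V and B matrices (and the full prediction) can be reproduced.

```python
import math

STATES = ["C", "E", "H"]

# Transition probabilities P[i][j] = P(state j at the next site | state i)
P = {"C": {"C": 0.88210, "E": 0.06987, "H": 0.04803},
     "E": {"C": 0.26154, "E": 0.73846, "H": 0.00000},
     "H": {"C": 0.06897, "E": 0.00690, "H": 0.92414}}

# Emission probabilities e[state][amino acid]; only Y is legible on the slides,
# the remaining 19 amino acids must be supplied to run on the full sequence.
e = {"C": {"Y": 0.0262}, "E": {"Y": 0.1539}, "H": {"Y": 0.0069}}

def ln(x):
    return math.log(x) if x > 0 else float("-inf")

def viterbi(seq, states=STATES):
    """Fill the V (log-likelihood) and B (backtrack) matrices, then trace back."""
    n = len(states)
    V = [{k: ln(e[k][seq[0]] / n) for k in states}]            # Eq. 1: initialization
    B = [{k: None for k in states}]
    for aa in seq[1:]:                                         # Eq. 2: recursion
        prevV, row, back = V[-1], {}, {}
        for k in states:
            best = max(states, key=lambda j: prevV[j] + ln(P[j][k]))
            row[k] = ln(e[k][aa]) + prevV[best] + ln(P[best][k])
            back[k] = best                                     # backtrack pointer
        V.append(row)
        B.append(back)
    state = max(states, key=lambda k: V[-1][k])                # best final state
    path = [state]
    for back in reversed(B[1:]):                               # trace pointers backwards
        state = back[path[-1]]
        path.append(state)
    return V, B, "".join(reversed(path))

# With the full emission table, call viterbi("YVYVEEEEEEVEEEEEEPGPG")
V, B, path = viterbi("Y")
print({k: round(v, 3) for k, v in V[0].items()})  # {'C': -4.741, 'E': -2.97, 'H': -6.075}
```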
Viterbi algorithm: the V matrix (first row)

First row of the V matrix:

     C       E       H
Y  -4.741  -2.970  -6.075

The two equations of the algorithm:

Eq. 1 (initialization): V_k(1) = e_k(X_1)/n, where k = C, E, or H, n is the number of hidden states (3 in our example), X_1 is the first amino acid of the sequence, and e_k is the emission probability in state k. For example, V_C(1) = e_C(Y)/3 = 0.0262/3 = 0.008734; the V matrix actually stores the natural logarithm, ln(0.008734) = -4.741. Likewise, V_E(1) = ln(0.1539/3) = ln(0.051282) = -2.970 and V_H(1) = ln(0.0069/3) = ln(0.002299) = -6.075.

Eq. 2 (recursion): V_k(i) = e_k(X_i) · max_j[V_j(i-1) · P_jk], where the maximum is taken over the possible states j at the previous position. For example, V_C(2) = e_C(V) · max(V_C(1)P_CC, V_E(1)P_EC, V_H(1)P_HC).
Viterbi algorithm: the V matrix (remaining rows)

Rows 2 and onward of the V matrix are filled in with the recursion (Eq. 2). V matrix (lnL of state j at position i):

Pos  AA     C        E        H
  1  Y    -4.741   -2.970   -6.075
  2  V    -7.548   -4.963   -9.185
  3  Y    -9.946   -7.138  -14.24
  4  V   -11.72    -9.131  -16.01
  5  E   -13.07   -12.92   -16.73
  6  E   -15.80   -16.70   -18.09
  7  E   -18.52   -20.48   -20.15
  8  E   -21.25   -24.27   -22.21
  9  E   -23.98   -27.39   -24.27
 10  E   -26.70   -30.12   -26.33
 11  V   -30.06   -31.05   -29.44
 12  E   -32.79   -34.84   -31.50
 13  E   -35.52   -38.62   -33.56
 14  E   -38.24   -41.66   -35.62
 15  E   -40.89   -44.08   -37.68
 16  E   -42.95   -46.14   -39.74
 17  E   -45.01   -48.20   -41.80
 18  P   -46.44   -50.95   -46.16
 19  G   -48.91  -237.9    -50.52
 20  P   -51.00   -55.74   -54.89
 21  G   -53.47  -242.5    -58.32

The V matrix stores natural logarithms, e.g., V_C(2) = ln(0.000527104) = -7.548. Taking the logarithm is necessary because multiplying more and more probability values produces ever smaller numbers that soon cannot be represented (numerical underflow); see the small demonstration below.
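A tiny illustration (not from the slides) of why logs are needed: the raw product of many per-site probabilities underflows double precision long before it becomes mathematically zero, while the sum of logs remains perfectly representable.

```python
import math

p = 0.05                      # a typical per-site probability contribution
raw, log_sum = 1.0, 0.0
for _ in range(300):          # 300 sites is already enough to underflow
    raw *= p
    log_sum += math.log(p)
print(raw)                    # 0.0 in double precision (underflow)
print(log_sum)                # about -898.7, still perfectly usable
```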
Viterbi algorithm: the B (backtrack) matrix

The B matrix records, for each position and each state, which state at the previous position gave the maximum in the Viterbi recursion, with the states coded C = 0, E = 1, H = 2; the final column (HS) holds the hidden states recovered by tracing these pointers back from the best state at the last position. The slide shows B alongside the V matrix at full precision:

Pos  AA      C            E            H
  1  Y    -4.74057     -2.97041     -6.07535
  2  V    -7.54809     -4.96308     -9.18506
  3  Y    -9.94622     -7.13807    -14.24069
  4  V   -11.71574     -9.13074    -16.01287
  5  E   -13.07242    -12.91516    -16.73257
  6  E   -15.79838    -16.69959    -18.08925
  7  E   -18.52435    -20.48402    -20.14914
  8  E   -21.25031    -24.26844    -22.20904
  9  E   -23.97627    -27.39268    -24.26893
 10  E   -26.70223    -30.11864    -26.32883
 11  V   -30.06419    -31.05285    -29.43855
 12  E   -32.79015    -34.83727    -31.49844
 13  E   -35.51611    -38.62170    -33.55834
 14  E   -38.24207    -41.65849    -35.61823
 15  E   -40.89289    -44.07621    -37.67813
 16  E   -42.95279    -46.13610    -39.73802
 17  E   -45.01268    -48.19600    -41.79792
 18  P   -46.44005    -50.94904    -46.16040
 19  G   -48.90819   -237.91316    -50.52288
 20  P   -51.00163    -55.74371    -54.88536
 21  G   -53.46976   -242.47474    -58.32104

[The B-matrix columns C(0), E(1), H(2) and the HS column are mostly illegible in this copy; only fragments of the pointers for position 2 survive.]
The Viterbi algorithm: result

A dynamic programming algorithm using the score matrix V and the backtrack matrix B. Tracing back through B yields the Viterbi prediction, which can be compared with the naïve, emission-only prediction:

          123456789012345678901
T       = YVYVEEEEEEVEEEEEEPGPG
Viterbi = EEEECHHHHHHHHHHHHCCCC
Naïve   = EEEEHHHHHHEHHHHHHCCCC
Objectives and computation in HMM

- Define the model structure, e.g., the number of hidden states and the number of observable events.
- Obtain training data.
- Train the HMM to obtain the transition probability matrix and the emission probabilities.
- Use the Viterbi algorithm to reconstruct the hidden states.
- Use the forward algorithm to compute the probability of the observed sequence (a sketch follows below).

Utility, again: better understanding and better prediction.
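A minimal sketch of the forward algorithm, reusing the casino HMM from the earlier simulation sketch (so the transition probabilities and the uniform initial distribution are still assumptions, not values from the lecture). For brevity it works in probability space; a real implementation would use logs or scaling, for the same underflow reason discussed above.

```python
# Casino HMM (transition probabilities and initial distribution are assumed)
trans = {"F": {"F": 0.95, "L": 0.05},
         "L": {"F": 0.20, "L": 0.80}}
emit = {"F": {str(i): 1/6 for i in range(1, 7)},
        "L": {str(i): (1.0 if i == 6 else 0.0) for i in range(1, 7)}}
start = {"F": 0.5, "L": 0.5}

def forward(obs):
    """P(observed sequence) = sum over all hidden-state paths (forward algorithm)."""
    f = {s: start[s] * emit[s][obs[0]] for s in trans}                 # initialization
    for o in obs[1:]:                                                  # recursion
        f = {s: emit[s][o] * sum(f[r] * trans[r][s] for r in trans) for s in trans}
    return sum(f.values())                                             # termination

print(forward("1354634255112466666122363"))   # probability of the visible sequence
```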