HMM (Hidden Markov Models)
Xuhua Xia
Infer the invisible
In science, one often needs to infer what is not visible from what is visible. Two examples in this lecture: detecting a missing (cryptic) allele at a locus, and the hidden Markov model (the main topic).
Detecting missing alleles in a locus
Suppose we observe 30 individuals with a single A band, 30 with a single B band, and 40 with both A and B bands (NA? = 30, NB? = 30, NAB = 40). With H2 (the hypothesis of two alleles), which assumes NA? = NAA and NB? = NBB, the proportions of the AA, AB and BB genotypes are pA^2, 2 pA pB and pB^2, respectively. So, assuming Hardy-Weinberg equilibrium, the log-likelihood function for estimating pA is
lnLH2 = NAA ln(pA^2) + NAB ln(2 pA pB) + NBB ln(pB^2),
which leads to pA = 0.5 and lnLH2 ≈ -110.90. We do not need to estimate pB because pB = 1 - pA. With H3 (three alleles, with a third cryptic allele C that produces no band), single-A-band individuals may be AA or AC and single-B-band individuals may be BB or BC, so we need to estimate both pA and pB (with pC = 1 - pA - pB). The log-likelihood function is
lnLH3 = NA? ln(pA^2 + 2 pA pC) + NB? ln(pB^2 + 2 pB pC) + NAB ln(2 pA pB),
which leads to pA = pB ≈ 0.4667 (and pC ≈ 0.0667) and lnLH3 ≈ -109.62. The two hypotheses can then be compared with a likelihood ratio test (LRT) or with information-theoretic indices.
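To check the arithmetic, here is a minimal Python sketch (not part of the original slides) that evaluates the two log-likelihoods and locates the H3 maximum by a simple grid search; all quantities are defined above.

```python
import math

N_A, N_B, N_AB = 30, 30, 40   # single-A-band, single-B-band and double-band counts

# H2: two alleles; single-band individuals are homozygotes.
pA = (2 * N_A + N_AB) / (2 * (N_A + N_B + N_AB))   # allele counting gives the MLE: 0.5
pB = 1 - pA
lnL_H2 = N_A * math.log(pA ** 2) + N_AB * math.log(2 * pA * pB) + N_B * math.log(pB ** 2)

# H3: a third, cryptic (band-less) allele C; single-band individuals are AA/AC or BB/BC.
def lnL_H3(p):                      # by the symmetry of the data, pA = pB = p
    pC = 1 - 2 * p
    return (N_A * math.log(p ** 2 + 2 * p * pC)
            + N_B * math.log(p ** 2 + 2 * p * pC)
            + N_AB * math.log(2 * p * p))

best_p = max((i / 10000 for i in range(1, 5000)), key=lnL_H3)

print(f"H2: pA = {pA:.3f}, lnL = {lnL_H2:.2f}")                          # ~ -110.90
print(f"H3: pA = pB = {best_p:.4f}, pC = {1 - 2 * best_p:.4f}, "
      f"lnL = {lnL_H3(best_p):.2f}")                                     # ~ -109.62
# LRT statistic: 2 * (lnL_H3(best_p) - lnL_H2) ~ 2.56 with 1 degree of freedom.
```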
Markov Models
A Markov model is a model of a stochastic process over time or space.
Components of a Markov model: a set of states, a dependence structure, and a set of transition probabilities.
Examples: a Markov chain of nucleotide (or amino acid) substitutions over time; Markov models of dependence among sites.
Markov models over one dimension
S = RRRYYYRRRYYYRRRYYYRRRYYYRRRYYYRRRYYY...
Zero-order Markov model: PR = 0.5, PY = 0.5. What is the nucleotide at site i?
Site dependence, first-order Markov model: PRR = PYY = 2/3, PRY = PYR = 1/3. What is P(Si | Si-1 = R)?
Second-order Markov model: what is P(Si | Si-2 = R, Si-1 = Y)?
Third- and higher-order Markov models are rarely used.
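A minimal Python sketch (not from the slides) showing how the zero-order frequencies and the first-order transition probabilities would be estimated from such a sequence:

```python
from collections import Counter

# The repeating purine/pyrimidine pattern from the slide.
S = "RRRYYY" * 10

# Zero-order model: base frequencies, ignoring neighbouring sites.
freq = Counter(S)
p0 = {b: freq[b] / len(S) for b in "RY"}            # {'R': 0.5, 'Y': 0.5}

# First-order model: P(Si | Si-1) from the counts of neighbouring pairs.
pairs = Counter(zip(S, S[1:]))
p1 = {(a, b): pairs[(a, b)] / sum(pairs[(a, c)] for c in "RY")
      for a in "RY" for b in "RY"}

print(p0)    # R and Y are equally frequent
print(p1)    # P(R|R) = P(Y|Y) ~ 2/3 and P(Y|R) = P(R|Y) ~ 1/3
```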
Long-range dependence
Elements of our Markov chain
State space: {Ssunny, Srainy, Ssnowy}
State transition probabilities (transition matrix; rows are the current state, columns the next state, in the order sunny, rainy, snowy):
A =
  sunny: 0.80  0.15  0.05
  rainy: 0.38  0.60  0.02
  snowy: 0.75  0.05  0.20
Initial state distribution: pi = (P(Ssunny), P(Srainy), P(Ssnowy)), with P(Ssunny) = 0.7 (used on the next slide).
The likelihood of a sequence of events
P(Ssunny) x P(Srainy | Ssunny) x P(Srainy | Srainy) x P(Srainy | Srainy) x P(Ssnowy | Srainy) x P(Ssnowy | Ssnowy) = 0.7 x 0.15 x 0.6 x 0.6 x 0.02 x 0.2 = 0.0001512
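A small Python sketch of this calculation (not from the slides), using the transition matrix as reconstructed on the previous slide; the helper name chain_likelihood is mine:

```python
import numpy as np

states = ["sunny", "rainy", "snowy"]

# Transition matrix as reconstructed above (rows = current state, columns = next state).
A = np.array([[0.80, 0.15, 0.05],
              [0.38, 0.60, 0.02],
              [0.75, 0.05, 0.20]])

pi = {"sunny": 0.7}   # only P(Ssunny) is given on the slides

def chain_likelihood(seq):
    """P(seq) = pi(first state) times the product of the transition probabilities."""
    idx = [states.index(s) for s in seq]
    p = pi[seq[0]]
    for i, j in zip(idx, idx[1:]):
        p *= A[i, j]
    return p

# sunny -> rainy -> rainy -> rainy -> snowy -> snowy
print(chain_likelihood(["sunny", "rainy", "rainy", "rainy", "snowy", "snowy"]))   # 0.0001512
```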
An example of HMM
HMM: a classic example
A dishonest casino dealer secretly switches between a fair die (F), with P(i) = 1/6 for i = 1 to 6, and a loaded die (L) with P(6) = 1.
Visible: the sequence of numbers rolled. Hidden: which die was used at each roll, e.g. FFFFFFFFFFFFFFLLLLLFFFFFF.
An HMM has four elements:
1. State space: {S1, S2, ..., SN}
2. State transition probabilities (transition matrix): aij = P(Sj at t+1 | Si at t)
3. Observations: {O1, O2, ..., OM}
4. Emission probabilities: bj(k) = P(Ok | Sj)
Is it possible to know how many dice the dishonest casino dealer uses? Only under specific hypotheses, e.g. H1: he uses two dice; H2: he uses three dice.
Is it possible to reconstruct the hidden Markov model and infer all hidden states? Almost never. Success of the inference depends mainly on (1) how different the emission probabilities are between states, and (2) how long the chain stays in each state.
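To make the model concrete, here is a small Python sketch (not from the slides) that encodes the two-state casino HMM and simulates a visible roll sequence together with its hidden states. The transition probabilities are my own illustrative assumptions; the slide only specifies the emission probabilities (fair die uniform, loaded die always showing 6).

```python
import random

random.seed(1)

states = ["F", "L"]

# Transition probabilities: illustrative assumptions (the dealer tends to keep a die for a while).
trans = {"F": {"F": 0.95, "L": 0.05},
         "L": {"F": 0.10, "L": 0.90}}

# Emission probabilities: fair die uniform over 1-6; loaded die shows 6 with probability 1.
emit = {"F": {k: 1 / 6 for k in range(1, 7)},
        "L": {k: (1.0 if k == 6 else 0.0) for k in range(1, 7)}}

def simulate(n, start="F"):
    """Return (hidden state string, visible rolls) of length n."""
    s, hidden, rolls = start, [], []
    for _ in range(n):
        hidden.append(s)
        faces, probs = zip(*emit[s].items())
        rolls.append(random.choices(faces, weights=probs)[0])
        s = random.choices(states, weights=[trans[s]["F"], trans[s]["L"]])[0]
    return "".join(hidden), rolls

hidden, rolls = simulate(25)
print(hidden)   # hidden die sequence, e.g. FFFF...LLLL...FFF
print(rolls)    # the visible rolls: runs of 6 hint at the loaded die
```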
Protein secondary structure
The hidden states are coil (C), strand (E) and helix (H). The RT row shows a real HIV-1 reverse transcriptase sequence; the ST row indicates its secondary-structure states.
Transition probabilities over sites
[Transition probability matrix among the states C, E and H.]
We have previously studied Pij(t), the probability of a transition from state i to state j over time t. Here Pij is the probability of a transition from state i to state j from one site to the next. The relatively large values on the diagonal are typical of Markov chains in which a state tends to persist for several consecutive observations (amino acids in our case). The quality of the inference increases with the diagonal values.
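One way to quantify the persistence implied by large diagonal values (a small sketch, not from the slides): in a Markov chain the length of a run in state i is geometrically distributed, so the expected run length is 1/(1 - Pii). The PCC value below appears on the later Viterbi slides; the other two diagonal values are placeholders for illustration only.

```python
# Expected number of consecutive sites spent in a state = 1 / (1 - P_ii).
# P_CC = 0.88210 is taken from the Viterbi slides; P_EE and P_HH here are illustrative only.
for state, p_stay in [("C", 0.88210), ("E", 0.80), ("H", 0.85)]:
    print(f"{state}: expected run length = {1 / (1 - p_stay):.1f} sites")
```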
Emission probabilities
[Table of emission probabilities for the 20 amino acids in each of the three states C, E and H; for example, eE(Y) = 0.1539 and eH(Y) = 0.0069.]
The quality of inferring the hidden states increases with the difference in emission probabilities among the states. Predicting secondary structure from the emission probabilities alone (assigning each amino acid the state with the largest emission probability):
T = YVYVEEEEEEVEEEEEEPGPG
Naïve = EEEEHHHHHHEHHHHHHCCCC
PEH in the transition probability matrix is extremely small, implying that an E followed directly by an H is extremely unlikely. Our naïve prediction of an H at position 5 (following an E at position 4) therefore represents an extremely unlikely event. Another example is position 11, with T11 = V. Our prediction Naïve11 = E implies a transition of secondary structure from H to E (Naïve10 to Naïve11) and then back from E to H (Naïve11 to Naïve12). The transition probability matrix shows that PHE and PEH are both very small, so S11 is very unlikely to be E.
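The naïve prediction is just a per-residue arg-max over the emission probabilities. A minimal sketch (not from the slides): only the Y entries below are taken from the slide's table; the other emission values are placeholders chosen to have the same ranking as in the lecture.

```python
# Emission probabilities e_state(amino acid). Only the Y row is from the slide;
# the V, E, P and G rows are illustrative placeholders with the same ranking.
emission = {
    "Y": {"C": 0.026, "E": 0.1539, "H": 0.0069},
    "V": {"C": 0.03,  "E": 0.18,   "H": 0.05},
    "E": {"C": 0.04,  "E": 0.02,   "H": 0.12},   # glutamate: helix-favouring
    "P": {"C": 0.14,  "E": 0.02,   "H": 0.01},
    "G": {"C": 0.10,  "E": 0.01,   "H": 0.02},
}

T = "YVYVEEEEEEVEEEEEEPGPG"
naive = "".join(max(emission[aa], key=emission[aa].get) for aa in T)
print(naive)   # EEEEHHHHHHEHHHHHHCCCC, ignoring how likely each state change is
```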
Infer the hidden states: the Viterbi algorithm
A dynamic programming algorithm that incorporates information from both the transition probability matrix and the emission probability matrix. It needs two matrices: the Viterbi matrix V, where Vij is the log-likelihood of state j at position i, and a backtrack matrix B, each with one entry per (position, state) pair. The output is a reconstructed hidden-state sequence. Example: predicting the secondary structure of the sequence YVYVEEEEEEVEEEEEEPGPG.
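Below is a compact, general-purpose Python sketch of the Viterbi algorithm as described above (my own implementation, not code from the lecture). It works in log space and, following the slides, starts from a uniform initial distribution 1/n over the hidden states; trans and emit follow the same layout as in the casino sketch earlier.

```python
import math

def viterbi(obs, states, trans, emit, floor=1e-100):
    """Log-space Viterbi decoding.
    trans[i][j] = P(next state j | current state i); emit[j][x] = P(observation x | state j).
    floor avoids log(0) for zero probabilities."""
    n = len(states)
    V = [{}]    # V[i][j]: best log-likelihood of being in state j at position i
    B = [{}]    # B[i][j]: previous state on that best path
    for j in states:                       # initialization: V_j(1) = ln(e_j(x1) / n)
        V[0][j] = math.log(max(emit[j].get(obs[0], 0.0), floor) / n)
        B[0][j] = None
    for i in range(1, len(obs)):           # recursion over positions
        V.append({})
        B.append({})
        for j in states:
            prev, score = max(
                ((k, V[i - 1][k] + math.log(max(trans[k][j], floor))) for k in states),
                key=lambda t: t[1])
            V[i][j] = math.log(max(emit[j].get(obs[i], 0.0), floor)) + score
            B[i][j] = prev
    state = max(states, key=lambda j: V[-1][j])   # traceback from the best final state
    path = [state]
    for i in range(len(obs) - 1, 0, -1):
        state = B[i][state]
        path.append(state)
    return "".join(reversed(path)), V, B
```

With the lecture's full transition and emission matrices supplied as trans and emit, this sketch should reproduce the V matrix, B matrix and predicted state sequence shown on the following slides.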
Viterbi algorithm
[Transition probability matrix among C, E and H; e.g. PCC = 0.88210, PCE = 0.06987, PCH = 0.04803, PEC = 0.26154.]
First row of the V matrix (states C, E, H): Y: -4.741, -2.97, -6.075.
Explanation of the two equations:
Eq. 1 (initialization): Vk(1) = ln[ek(X1)/n], where k = C, E or H, n is the number of hidden states (3 in our example), and X1 is the first amino acid (Y). For example, VE(1) = ln(0.1539/3) = -2.97.
Eq. 2 (recursion): Vk(i) = ln ek(Xi) + maxj[Vj(i-1) + ln Pjk], e.g. VC(2) = ln eC(V) + max(VC(1) + ln PCC, VE(1) + ln PEC, VH(1) + ln PHC).
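A one-line numerical check of the initialization, using the emission values for Y given earlier (a quick sketch, not from the slides):

```python
import math

# V_k(1) = ln(e_k(Y) / 3); e_E(Y) = 0.1539 and e_H(Y) = 0.0069 from the emission table.
print(round(math.log(0.1539 / 3), 3), round(math.log(0.0069 / 3), 3))   # -2.97 -6.075
```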
Viterbi algorithm
The remaining rows of the V matrix (lnL of state j at position i) are filled in with the recursion. For the 21-residue sequence YVYVEEEEEEVEEEEEEPGPG the matrix begins
V matrix   C        E        H
Y         -4.741   -2.970   -6.075
V         -7.548   -4.963   -9.185
Y         -9.946   -7.138   -14.24
V         -11.72   -9.131   -16.01
...
and the entries grow steadily more negative down the matrix; at the final position (G) the largest entry is in the C column (-53.47 versus -58.32 for H), so the best path ends in state C.
The V matrix stores natural logarithms. Taking logarithms is necessary because the product of an ever-increasing number of probabilities becomes too small to represent (numerical underflow).
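A tiny illustration (not from the slides) of why the logarithms are needed:

```python
import math

# Multiplying many probabilities underflows to zero; summing their logarithms does not.
p = 0.1
product, log_sum = 1.0, 0.0
for _ in range(400):
    product *= p
    log_sum += math.log(p)

print(product)   # 0.0   (the true value, 1e-400, is below the double-precision range)
print(log_sum)   # -921.03...  (= 400 * ln 0.1, still perfectly representable)
```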
Viterbi algorithm: the B matrix
Alongside the V matrix, the backtrack matrix B records, for each position and each state (coded C = 0, E = 1, H = 2), which state at the previous position produced the maximum in the recursion. Starting from the best state at the last position and following B backwards gives the reconstructed hidden-state (HS) sequence.
The Viterbi algorithm: a dynamic programming algorithm
[Score matrix V and backtrack matrix B for the states C, E and H.]
T = YVYVEEEEEEVEEEEEEPGPG
Viterbi = EEEECHHHHHHHHHHHHCCCC
Naïve = EEEEHHHHHHEHHHHHHCCCC
Unlike the naïve emission-only prediction, the Viterbi path avoids the highly improbable E-to-H transition at position 5 and the H-to-E-to-H excursion at position 11.
Objectives and computation in HMM
Define the model structure, e.g., the number of hidden states and the number of observable events.
Obtain training data.
Train the HMM to obtain the transition probability matrix and the emission probabilities.
Use the Viterbi algorithm to reconstruct the hidden states.
Use the forward algorithm to compute the probability of the observed sequence.
Utility, again: better understanding and better prediction.
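For completeness, a small sketch of the forward algorithm mentioned above (my own code, not from the lecture), using the same trans/emit layout as the earlier viterbi() sketch. It sums over all hidden-state paths instead of maximizing over them; for long sequences one would rescale or work in log space, for the same underflow reason discussed for the V matrix.

```python
def forward(obs, states, trans, emit):
    """Forward algorithm: P(observed sequence) under the HMM, summing over all
    hidden-state paths. Assumes a uniform initial state distribution, as in the
    Viterbi slides."""
    init = 1.0 / len(states)
    f = {j: init * emit[j].get(obs[0], 0.0) for j in states}   # f[j] = P(obs[0], state j)
    for x in obs[1:]:
        f = {j: emit[j].get(x, 0.0) * sum(f[k] * trans[k][j] for k in states)
             for j in states}
    return sum(f.values())
```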