HMM (Hidden Markov Models)

HMM (Hidden Markov Models) Xuhua Xia xxia@uottawa.ca http://dambe.bio.uottawa.ca

Infer the invisible
In science, one often needs to infer what is not visible based on what is visible. Examples in this lecture:
- Detecting missing alleles in a locus
- Hidden Markov models (the main topic)

Detecting missing alleles in a locus
Suppose we obtain 30 individuals with a single A band, 30 with a single B band, and 40 individuals with both A and B bands (N_A? = 30, N_B? = 30, N_AB = 40). With H2 (hypothesis of two alleles), which assumes N_A? = N_AA and N_B? = N_BB, the proportions of the AA, AB and BB genotypes under Hardy-Weinberg equilibrium are pA^2, 2 pA pB and pB^2, respectively. So the log-likelihood function for estimating pA is

  lnL_H2 = 30 ln(pA^2) + 40 ln(2 pA pB) + 30 ln(pB^2),

which leads to pA = 0.5 and lnL_H2 = -110.90355. We do not need to estimate pB separately because pB = 1 - pA. With H3 (three alleles, with a third cryptic allele C whose band is not detected), a single-band individual may also be a heterozygote carrying C, and we need to estimate both pA and pB. The log-likelihood function is

  lnL_H3 = 30 ln(pA^2 + 2 pA pC) + 40 ln(2 pA pB) + 30 ln(pB^2 + 2 pB pC), with pC = 1 - pA - pB,

which leads to pA = pB = 0.4667 (and pC = 0.0666) and lnL_H3 = -109.62326. A likelihood ratio test (LRT) or information-theoretic indices (e.g., AIC) can then be used to choose between H2 and H3.
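A minimal numeric check of the two log-likelihood functions above, written as a Python sketch; it simply evaluates lnL_H2 and lnL_H3 at the maximum-likelihood estimates quoted on the slide rather than maximizing them:

```python
import math

# Counts from the slide: 30 single-A-band, 30 single-B-band, 40 double-band individuals
NA, NB, NAB = 30, 30, 40

def lnL_H2(pA):
    """Two alleles (A, B) under Hardy-Weinberg: single bands are true homozygotes."""
    pB = 1 - pA
    return NA * math.log(pA**2) + NAB * math.log(2 * pA * pB) + NB * math.log(pB**2)

def lnL_H3(pA, pB):
    """Three alleles (A, B, cryptic C): a single band may hide a C allele."""
    pC = 1 - pA - pB
    return (NA * math.log(pA**2 + 2 * pA * pC)
            + NAB * math.log(2 * pA * pB)
            + NB * math.log(pB**2 + 2 * pB * pC))

print(lnL_H2(0.5))             # -110.90355...
print(lnL_H3(0.4667, 0.4667))  # about -109.623
```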

Markov Models
A Markov model is a model of a stochastic process over time or space. Components of a Markov model:
- a set of states
- a dependence structure among the states
- a set of transition probabilities
Examples: a Markov chain of nucleotide (or amino acid) substitutions over time; Markov models of dependence among sites.

Markov models over one dimension
S = RRRYYYRRRYYYRRRYYYRRRYYYRRRYYYRRRYYY...
- 0-order Markov model: P_R = 0.5, P_Y = 0.5. What is the nucleotide at site i?
- Site dependence, 1st-order Markov model: P_RR = P_YY = 2/3, P_RY = P_YR = 1/3. What is P(S_i | S_i-1 = R)?
- 2nd-order Markov model: what is P(S_i | S_i-2 = R, S_i-1 = Y)?
- 3rd-order and higher-order Markov models: rarely used.
[State-transition diagram between R and Y with the 1st-order transition probabilities.]
These parameters can be estimated directly from the observed sequence, as in the sketch below.
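A short Python sketch that estimates the 0-order and 1st-order parameters from the R/Y sequence above; the sequence is generated here as repeated RRRYYY blocks, so the estimates match the probabilities on the slide up to end effects:

```python
from collections import Counter

# Purine/pyrimidine sequence from the slide: blocks of RRR and YYY repeated
S = "RRRYYY" * 8

# 0-order model: overall base frequencies
freq = Counter(S)
print({b: freq[b] / len(S) for b in "RY"})          # P(R) = P(Y) = 0.5

# 1st-order model: transition frequencies between adjacent sites
pairs = Counter(zip(S, S[1:]))
for x in "RY":
    total = sum(pairs[(x, y)] for y in "RY")
    print({f"P({y}|{x})": pairs[(x, y)] / total for y in "RY"})
# approximately P(R|R) = P(Y|Y) = 2/3 and P(Y|R) = P(R|Y) = 1/3
```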

Long-range dependence

Elements of our Markov chain
State space: {S_sunny, S_rainy, S_snowy}
State transition probabilities (transition matrix A; rows = current state, columns = next state):
         sunny  rainy  snowy
sunny     0.80   0.15   0.05
rainy     0.38   0.60   0.02
snowy     0.75   0.05   0.20
Initial state distribution: π = (0.7, 0.25, 0.05)

The likelihood of a sequence of events
P(S_sunny) × P(S_rainy | S_sunny) × P(S_rainy | S_rainy) × P(S_rainy | S_rainy) × P(S_snowy | S_rainy) × P(S_snowy | S_snowy)
= 0.7 × 0.15 × 0.6 × 0.6 × 0.02 × 0.2 = 0.0001512
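The same calculation in code, using the initial distribution and transition matrix from the previous slide; the matrix orientation (rows = current state, columns = next state) is the reconstruction consistent with the product above:

```python
# Likelihood of the weather path: sunny, rainy, rainy, rainy, snowy, snowy
pi = {"sunny": 0.70, "rainy": 0.25, "snowy": 0.05}          # initial distribution
A = {"sunny": {"sunny": 0.80, "rainy": 0.15, "snowy": 0.05},  # A[current][next]
     "rainy": {"sunny": 0.38, "rainy": 0.60, "snowy": 0.02},
     "snowy": {"sunny": 0.75, "rainy": 0.05, "snowy": 0.20}}

path = ["sunny", "rainy", "rainy", "rainy", "snowy", "snowy"]
p = pi[path[0]]
for prev, cur in zip(path, path[1:]):
    p *= A[prev][cur]
print(p)   # about 0.0001512
```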

An example of HMM

HMM: a classic example
A dishonest casino dealer secretly switches between a fair (F) die, with P(i) = 1/6 for i = 1 to 6, and a loaded (L) die, with P(6) = 1.
Visible: 1354634255112466666122363
Hidden:  FFFFFFFFFFFFFFLLLLLFFFFFF
An HMM consists of:
1. A state space: {S_1, S_2, ..., S_N}
2. State transition probabilities (transition matrix): a_ij = P(S_i at time t+1 | S_j at time t)
3. Observations: {O_1, O_2, ..., O_M}
4. Emission probabilities: b_j(k) = P(O_k | S_j)
[State diagram: states F and L with transition probabilities P_FF, P_FL, P_LF, P_LL and emission probabilities for faces 1-6.]
Is it possible to know how many dice the dishonest dealer uses? Only by comparing specific hypotheses, e.g., H1: he has two dice; H2: he has three.
Is it possible to reconstruct the hidden Markov model and infer all hidden states? Almost never. Success of the inference depends mainly on (1) the differences in emission probabilities between states and (2) how long the chain stays in each state.
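To make the setup concrete, a small simulator for this two-state HMM. The emission probabilities follow the slide (fair die: 1/6 for each face; loaded die: P(6) = 1); the switching probabilities are illustrative assumptions, since the slide does not give them:

```python
import random

# Illustrative dishonest-casino HMM; the transition probabilities are assumed values.
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
roll = {"F": lambda: random.randint(1, 6),   # fair die: each face 1/6
        "L": lambda: 6}                      # loaded die: always shows 6 (P(6) = 1 as on the slide)

def simulate(n, start="F"):
    state, visible, hidden = start, [], []
    for _ in range(n):
        hidden.append(state)
        visible.append(str(roll[state]()))
        # switch (or stay) according to the transition probabilities
        state = "L" if random.random() < trans[state]["L"] else "F"
    return "".join(visible), "".join(hidden)

visible, hidden = simulate(25)
print("Visible:", visible)   # a sequence of die faces, like the slide's visible line
print("Hidden: ", hidden)    # the corresponding F/L states
```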

Protein secondary structure
The hidden states are coil (C), strand (E) and helix (H). RT is a real HIV-1 reverse transcriptase sequence; ST indicates the (hidden) states.

Transition probabilities over sites
       C        E        H
C   0.88210  0.06987  0.04803
E   0.26154  0.73846  0.00000
H   0.06897  0.00690  0.92414
We have previously seen P_ij(t), the transition probability from state i to state j over time t. Here P_ij is the probability of a transition from state i at one site to state j at the next site. The relatively large diagonal values are typical of Markov chains in which the process tends to stay in the same state for several consecutive observations (amino acids in our case). The quality of the inference of hidden states increases with the diagonal values.

Emission probabilities
[Emission probability table: P(amino acid | state) for the 20 amino acids in each of the three states C, E and H.]
The quality of inferring the hidden states increases with the difference in emission probabilities between states. Predicting the secondary structure based on emission probabilities only (assigning each residue the state with the highest emission probability for that amino acid):
Position: 123456789012345678901
T       = YVYVEEEEEEVEEEEEEPGPG
Naïve   = EEEEHHHHHHEHHHHHHCCCC
P_EH in the transition probability matrix is 0.00000, implying an extremely small probability of an E followed by an H. Our naïve prediction, with an H at position 5 following an E at position 4, therefore represents an extremely unlikely event. Another example is position 11, where T_11 = V. Our prediction of Naïve_11 = E implies a transition of secondary structure from H to E (positions 10 to 11) and then back from E to H (positions 11 to 12). The transition probability matrix shows that P_HE and P_EH are both very small, so S_11 is very unlikely to be E.

Infer the hidden states: the Viterbi algorithm
- A dynamic programming algorithm
- Incorporates information from both the transition probability matrix and the emission probability matrix
- Needs two matrices: the Viterbi matrix V (V_ij: log-likelihood of state j at position i) and the backtrack matrix B, both of dimension NumState × SeqLen
- Output: a reconstructed hidden-state sequence
- Example: predicting the secondary structure of the sequence YVYVEEEEEEVEEEEEEPGPG

Viterbi algorithm: filling in the V matrix
(The transition and emission probability matrices are as on the two previous slides.)
First row of the V matrix (position 1, amino acid Y):
      C        E        H
Y  -4.741   -2.970   -6.075
Eq. 1 (initialization): V_k(1) = ln[e_k(x_1)/n], where k = C, E or H, n is the number of hidden states (3 in our example), x_1 is the first amino acid, and e_k(x) is the emission probability of amino acid x in state k. For example, V_C(1) = ln(e_C(Y)/3) = ln(0.0262/3) = ln(0.008734) = -4.741; likewise V_E(1) = ln(0.051282) = -2.970 and V_H(1) = ln(0.002299) = -6.075.
Eq. 2 (recursion): V_k(i) = ln e_k(x_i) + max_j [V_j(i-1) + ln P_jk], where P_jk is the transition probability from state j to state k. For example, V_C(2) = ln e_C(V) + max[V_C(1) + ln P_CC, V_E(1) + ln P_EC, V_H(1) + ln P_HC].
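A short numeric check of Eq. 1 and Eq. 2. The emission probabilities of Y (0.0262, 0.1539, 0.0069 for C, E, H) are taken from the worked example above; for V, the E and H values (0.1846, 0.0483) come from the emission table, while the C value (0.0393) is an inferred assumption chosen so that the slide's V(2) entries are reproduced:

```python
import math

states = ["C", "E", "H"]
P = {  # P[from][to]: transition probabilities between secondary-structure states
    "C": {"C": 0.88210, "E": 0.06987, "H": 0.04803},
    "E": {"C": 0.26154, "E": 0.73846, "H": 0.00000},
    "H": {"C": 0.06897, "E": 0.00690, "H": 0.92414},
}
eY = {"C": 0.0262, "E": 0.1539, "H": 0.0069}   # emission of Y, from the slide
eV = {"C": 0.0393, "E": 0.1846, "H": 0.0483}   # emission of V (the C value is inferred)

def ln(p):
    return math.log(p) if p > 0 else float("-inf")

# Eq. 1: V_k(1) = ln(e_k(Y) / 3)
V1 = {k: ln(eY[k] / 3) for k in states}
print(V1)   # about {'C': -4.741, 'E': -2.970, 'H': -6.075}

# Eq. 2: V_k(2) = ln e_k(V) + max_j [ V_j(1) + ln P_jk ]
V2 = {k: ln(eV[k]) + max(V1[j] + ln(P[j][k]) for j in states) for k in states}
print(V2)   # about {'C': -7.548, 'E': -4.963, 'H': -9.185}
```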

Viterbi algorithm: the complete V matrix
V_ij is the log-likelihood (ln) of state j at position i; the transition and emission probability matrices are as before.
Pos  AA       C         E         H
  1  Y     -4.741    -2.970    -6.075
  2  V     -7.548    -4.963    -9.185
  3  Y     -9.946    -7.138   -14.241
  4  V    -11.716    -9.131   -16.013
  5  E    -13.072   -12.915   -16.733
  6  E    -15.798   -16.700   -18.089
  7  E    -18.524   -20.484   -20.149
  8  E    -21.250   -24.268   -22.209
  9  E    -23.976   -27.393   -24.269
 10  E    -26.702   -30.119   -26.329
 11  V    -30.064   -31.053   -29.439
 12  E    -32.790   -34.837   -31.498
 13  E    -35.516   -38.622   -33.558
 14  E    -38.242   -41.658   -35.618
 15  E    -40.893   -44.076   -37.678
 16  E    -42.953   -46.136   -39.738
 17  E    -45.013   -48.196   -41.798
 18  P    -46.440   -50.949   -46.160
 19  G    -48.908  -237.913   -50.523
 20  P    -51.002   -55.744   -54.885
 21  G    -53.470  -242.475   -58.321
The V matrix shows natural logarithms, e.g., ln(0.000527104) = -7.548. Taking logarithms is necessary because multiplying an increasing number of probabilities produces increasingly small values (numerical underflow).

Viterbi algorithm: the B (backtrack) matrix
(The transition and emission probability matrices and the V matrix are as on the previous slides.)
Alongside the V matrix, the backtrack matrix B records, for each position and each state (coded C = 0, E = 1, H = 2), which state at the previous position gave the maximum in Eq. 2; tracing B back from the best final state yields the reconstructed hidden-state sequence (HS).
[Most entries of the B matrix on the original slide are not recoverable from the transcript.]

The Viterbi algorithm: result
- Dynamic programming algorithm
- Score matrix V and backtrack matrix B (transition and emission probability matrices as before)
Position: 123456789012345678901
T       = YVYVEEEEEEVEEEEEEPGPG
Viterbi = EEEECHHHHHHHHHHHHCCCC
Naïve   = EEEEHHHHHHEHHHHHHCCCC
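A self-contained Viterbi implementation, applied here to the dishonest-casino example because its emission probabilities are fully specified on the earlier slide; the initial and transition probabilities below are illustrative assumptions, not values given in the slides:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for obs (log-space dynamic programming)."""
    def ln(p):
        return math.log(p) if p > 0 else float("-inf")

    V = [{s: ln(start_p[s]) + ln(emit_p[s][obs[0]]) for s in states}]  # score matrix
    B = [{}]                                                           # backtrack matrix
    for i in range(1, len(obs)):
        V.append({})
        B.append({})
        for s in states:
            prev, score = max(((r, V[i - 1][r] + ln(trans_p[r][s])) for r in states),
                              key=lambda t: t[1])
            V[i][s] = score + ln(emit_p[s][obs[i]])
            B[i][s] = prev
    # Trace back from the best final state
    path = [max(states, key=lambda s: V[-1][s])]
    for i in range(len(obs) - 1, 0, -1):
        path.append(B[i][path[-1]])
    return "".join(reversed(path))

# Dishonest-casino demo; start/trans values are assumptions
rolls = "1354634255112466666122363"        # visible sequence from the slide
states = ["F", "L"]
start = {"F": 0.95, "L": 0.05}
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit = {"F": {str(k): 1 / 6 for k in range(1, 7)},                 # fair die
        "L": {str(k): 1.0 if k == 6 else 0.0 for k in range(1, 7)}}  # loaded die, P(6) = 1
print(viterbi(rolls, states, start, trans, emit))
# With these assumed parameters this prints FFFFFFFFFFFFFFLLLLLFFFFFF,
# matching the hidden states shown on the casino slide.
```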

Objectives and computation in HMM
- Define the model structure, e.g., the number of hidden states and the number of observed events
- Obtain training data
- Train the HMM to obtain the transition probability matrix and the emission probabilities
- Use the Viterbi algorithm to reconstruct the hidden states
- Use the forward algorithm to compute the probability of the observed sequence
- Utility, again: better understanding and better prediction
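For the last computational point, a minimal forward-algorithm sketch; it sums over all hidden-state paths instead of maximizing, and can be called with the same (assumed) casino parameters as the Viterbi sketch above. For long sequences one would work in log space or rescale to avoid underflow:

```python
def forward_prob(obs, states, start_p, trans_p, emit_p):
    """P(observed sequence | model): forward algorithm, summing over all hidden-state paths."""
    # f[s] = P(observations so far, current state = s)
    f = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for x in obs[1:]:
        f = {s: emit_p[s][x] * sum(f[r] * trans_p[r][s] for r in states) for s in states}
    return sum(f.values())

# Example, reusing the parameters defined in the Viterbi sketch above:
# print(forward_prob(rolls, states, start, trans, emit))
```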