Hidden Markov Models (HMMs)


Hidden Markov Models (HMMs) (Lecture for CS498-CXZ Algorithms in Bioinformatics) Oct. 27, 2005 ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign

Motivation: the CpG island problem
Methylation in the human genome: "CG" -> "TG" happens in most places, except in the "start regions" of genes.
CpG islands = 100-1,000 bases before a gene starts.
Questions:
Q1: Given a short stretch of genomic sequence, how would we decide if it comes from a CpG island or not?
Q2: Given a long sequence, how would we find the CpG islands in it?

Answer to Q1: Bayes Classifier
Hypothesis space: H = {H_CpG, H_Other}
Evidence: X = "ATCGTTC"
Pick the hypothesis with the larger posterior, i.e., compare prior probability times likelihood of the evidence: P(H_CpG) p(X|H_CpG) vs. P(H_Other) p(X|H_Other).
The likelihoods come from generative models, so we need two generative models for sequences: p(X|H_CpG) and p(X|H_Other).

A Simple Model for Sequences: p(X)
Probability rule (chain rule): p(X) = p(x1) p(x2|x1) ... p(xN|x1,...,xN-1)
Assume independence: p(X) = p(x1) p(x2) ... p(xN)
(Capturing some dependence, e.g., p(xi|xi-1), leads to a Markov chain model.)
p(x|H_CpG): P(A|H_CpG)=0.25, P(T|H_CpG)=0.25, P(C|H_CpG)=0.25, P(G|H_CpG)=0.25
p(x|H_Other): P(A|H_Other)=0.25, P(T|H_Other)=0.40, P(C|H_Other)=0.10, P(G|H_Other)=0.25
Example: X=ATTG vs. X=ATCG
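To make Q1 concrete, here is a small sketch (not from the slides) that scores a sequence under each independence model and picks the more likely hypothesis under equal priors; the helper name `seq_likelihood` is illustrative.

```python
# Per-nucleotide probabilities from the slide (independence assumption).
P_CPG   = {"A": 0.25, "T": 0.25, "C": 0.25, "G": 0.25}
P_OTHER = {"A": 0.25, "T": 0.40, "C": 0.10, "G": 0.25}

def seq_likelihood(x, table):
    """p(X|H) = product of per-symbol probabilities under independence."""
    p = 1.0
    for ch in x:
        p *= table[ch]
    return p

for x in ["ATTG", "ATCG"]:
    p_cpg, p_other = seq_likelihood(x, P_CPG), seq_likelihood(x, P_OTHER)
    # With equal priors, the Bayes classifier just compares the two likelihoods.
    label = "CpG" if p_cpg > p_other else "Other"
    print(x, p_cpg, p_other, label)
```

Running this reproduces the slide's point: ATTG is more likely under the Other model (0.01 vs. 0.0039), while ATCG is more likely under the CpG model (0.0039 vs. 0.0025).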

Answer to Q2: Hidden Markov Model
X = ATTGATGCAAAAGGGGGATCGGGCGATATAAAATTTG (Other | CpG island | Other)
How can we identify a CpG island in a long sequence?
Idea 1: Test each window of a fixed number of nucleotides.
Idea 2: Classify the whole sequence, treating every possible label sequence as a class label (O = Other, C = CpG island):
S1: OOOO.............O
S2: OOOO.........OCC...
Si: OOOO...OCC..CO...O
SN: CCCC..............CC
S* = argmax_S P(S|X) = argmax_S P(S,X)
e.g., S* = OOOO...OCC..CO...O

HMM is just one way of modeling p(X,S)…

A simple HMM with two states, B and I (B plays the role of H_Other, I the role of H_CpG).
Parameters:
Initial state prob: p(B) = 0.5; p(I) = 0.5
State transition prob: p(BB) = 0.8, p(BI) = 0.2, p(IB) = 0.5, p(II) = 0.5
Output prob:
P(x|B) = p(x|H_Other): P(a|B)=0.25, P(t|B)=0.40, P(c|B)=0.10, P(g|B)=0.25
P(x|I) = p(x|H_CpG): P(a|I)=0.25, P(t|I)=0.25, P(c|I)=0.25, P(g|I)=0.25
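The same parameters written as plain Python data, as a sketch (the dictionary layout and the name `hmm` are choices made here, not prescribed by the slides); the algorithm sketches further below assume this structure.

```python
# The two-state B/I HMM from the slide, as nested dictionaries.
hmm = {
    "states": ["B", "I"],
    "init":  {"B": 0.5, "I": 0.5},                      # initial state prob
    "trans": {"B": {"B": 0.8, "I": 0.2},                # state transition prob
              "I": {"B": 0.5, "I": 0.5}},
    "emit":  {"B": {"a": 0.25, "t": 0.40, "c": 0.10, "g": 0.25},   # output prob
              "I": {"a": 0.25, "t": 0.25, "c": 0.25, "g": 0.25}},
}
```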

A General Definition of HMM
An HMM has N states {s_1, ..., s_N} and M output symbols; its parameter set is often written λ = (A, B, π).
Initial state probability: π_i = P(q_1 = s_i), 1 ≤ i ≤ N
State transition probability: a_ij = P(q_t+1 = s_j | q_t = s_i)
Output probability: b_i(o) = P(O_t = o | q_t = s_i)

How to “Generate” a Sequence?
Given a model (e.g., the simple B/I HMM above), follow a state path to generate the observations.
Example observation sequence: a c g t t ...
Possible state paths: B I I I B ... or I I I B B ... (many different paths can generate the same observations).

How to “Generate” a Sequence? (cont.)
Following the path B I I I B for the observations a c g t t:
P(“BIIIB”, “acgtt”) = p(B) p(a|B) · p(I|B) p(c|I) · p(I|I) p(g|I) · p(I|I) p(t|I) · p(B|I) p(t|B)
= 0.5·0.25 · 0.2·0.25 · 0.5·0.25 · 0.5·0.25 · 0.5·0.4 ≈ 1.95×10^-5
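A sketch of both directions of “generating”: scoring a given (state path, observation) pair and sampling a new one, assuming the `hmm` dictionary from the earlier sketch (the function names are illustrative).

```python
import random

def joint_prob(hmm, states, obs):
    """P(states, obs) = p(s1) b_s1(o1) * prod over t of a_{s_{t-1} s_t} b_{s_t}(o_t)."""
    p = hmm["init"][states[0]] * hmm["emit"][states[0]][obs[0]]
    for prev, cur, o in zip(states, states[1:], obs[1:]):
        p *= hmm["trans"][prev][cur] * hmm["emit"][cur][o]
    return p

def generate(hmm, length):
    """Sample a (state path, observation sequence) pair by following the model."""
    states, obs = [], []
    s = random.choices(hmm["states"],
                       weights=[hmm["init"][x] for x in hmm["states"]])[0]
    for _ in range(length):
        states.append(s)
        syms = list(hmm["emit"][s])
        obs.append(random.choices(syms,
                                  weights=[hmm["emit"][s][o] for o in syms])[0])
        s = random.choices(hmm["states"],
                           weights=[hmm["trans"][s][x] for x in hmm["states"]])[0]
    return states, obs

print(joint_prob(hmm, "BIIIB", "acgtt"))   # ~1.95e-5, matching the slide
```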

HMM as a Probabilistic Model
Sequential data: time/index t1 t2 t3 t4 ..., data o1 o2 o3 o4 ...
Random variables/process: observation variables O1 O2 O3 O4 ..., hidden state variables S1 S2 S3 S4 ...
The parameters are the initial state distribution, the state transition probabilities P(S_t+1|S_t), and the output probabilities P(O_t|S_t).
Joint probability of observations and states (complete likelihood): P(O_1...O_T, S_1...S_T)
Probability of the observations alone (incomplete likelihood): P(O_1...O_T), obtained by summing the complete likelihood over all state sequences (written out below).
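In standard notation (the slide's own equations were embedded images and did not survive; this is the usual factorization, not a reproduction of them):

```latex
P(O_{1:T}, S_{1:T} \mid \lambda)
  = \pi_{S_1}\, b_{S_1}(O_1)\prod_{t=2}^{T} a_{S_{t-1}S_t}\, b_{S_t}(O_t),
\qquad
P(O_{1:T} \mid \lambda) = \sum_{S_{1:T}} P(O_{1:T}, S_{1:T} \mid \lambda).
```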

Three Problems
1. Decoding – finding the most likely path
Given: model, parameters, observations (data)
Compute: the most likely state sequence
2. Evaluation – computing the observation likelihood
Given: model, parameters, observations (data)
Compute: the likelihood of generating the observed data

Three Problems (cont.)
3. Training – estimating the parameters
Supervised: given the model architecture and labeled data (data + state sequence)
Unsupervised: given the model architecture and unlabeled data
In both cases the parameters are chosen by Maximum Likelihood.

Problem I: Decoding/Parsing Finding the most likely path You can think of this as classification with all the paths as class labels…

What’s the most likely path?
For the simple B/I HMM above, what is the most likely state path for the observation sequence “a c g t t a t g”?

Viterbi Algorithm: An Example
(Using P(a|B)=0.251, P(t|B)=0.40, P(c|B)=0.098, P(g|B)=0.251 and P(x|I)=0.25 for all x.)
t = 1 2 3 4 ..., observations a c g t ...
VP(B): 0.5·0.251 (path B), then 0.5·0.251·0.8·0.098 (path BB), ...
VP(I): 0.5·0.25 (path I), then 0.5·0.25·0.5·0.25 (path II), ...
At each step, remember the best path so far that ends in each state.

Viterbi Algorithm (Dynamic Programming)
Observation: the best path ending at state s_i at time t must extend one of the best paths ending at time t-1.
Algorithm (dynamic programming):
Initialization: VP_1(i) = π_i b_i(o_1)
Recursion: VP_t(j) = max_i [ VP_t-1(i) a_ij ] b_j(o_t), keeping a pointer to the maximizing i
Termination: the most likely path has probability max_i VP_T(i) and is recovered by following the pointers backwards.
Complexity: O(TN^2)
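A minimal Viterbi sketch, assuming the `hmm` dictionary from the earlier sketch; for brevity it stores whole paths instead of backpointers (an assumption of this sketch, not of the slides).

```python
def viterbi(hmm, obs):
    """Most likely state path for obs, by dynamic programming."""
    states = hmm["states"]
    # vp[s] = probability of the best path ending in state s; path[s] = that path.
    vp = {s: hmm["init"][s] * hmm["emit"][s][obs[0]] for s in states}
    path = {s: [s] for s in states}
    for o in obs[1:]:
        new_vp, new_path = {}, {}
        for j in states:
            # Best previous state to extend into j.
            i = max(states, key=lambda i: vp[i] * hmm["trans"][i][j])
            new_vp[j] = vp[i] * hmm["trans"][i][j] * hmm["emit"][j][o]
            new_path[j] = path[i] + [j]
        vp, path = new_vp, new_path
    best = max(states, key=lambda s: vp[s])
    return path[best], vp[best]

print(viterbi(hmm, "acgttatg"))   # the query sequence from the slide above
```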

Problem II: Evaluation Computing the data likelihood Another use of an HMM, e.g., as a generative model for discrimination Also related to Problem III – parameter estimation

Data Likelihood: p(O|) c g t … 0.5 0.8 B B 0.8 0.8 B B 0.2 0.2 0.2 0.5 0.5 0.5 0.5 0.5 I 0.5 I 0.5 I I All HMM parameters In general, Complexity of a naïve approach? 0.5

The Forward Algorithm
Define α_t(i) = P(o_1...o_t, q_t = s_i | λ): the probability of generating o_1...o_t and ending in state s_i.
Observation: α_t+1(j) can be computed from the values α_t(i).
Algorithm:
Initialization: α_1(i) = π_i b_i(o_1)
Induction: α_t+1(j) = [ Σ_i α_t(i) a_ij ] b_j(o_t+1)
The data likelihood is P(O|λ) = Σ_i α_T(i).
Complexity: O(TN^2)

Forward Algorithm: Example (observations a c g t)
α1(B) = 0.5·p(“a”|B); α1(I) = 0.5·p(“a”|I)
α2(B) = [α1(B)·0.8 + α1(I)·0.5]·p(“c”|B); α2(I) = [α1(B)·0.2 + α1(I)·0.5]·p(“c”|I)
...
P(“a c g t”) = α4(B) + α4(I)
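The same computation as a forward-algorithm sketch, again assuming the `hmm` dictionary from above (`forward` is an illustrative name).

```python
def forward(hmm, obs):
    """alphas[t][s] = P(o_1..o_{t+1}, q_{t+1} = s); returns the list of alpha rows."""
    states = hmm["states"]
    alphas = [{s: hmm["init"][s] * hmm["emit"][s][obs[0]] for s in states}]
    for o in obs[1:]:
        prev = alphas[-1]
        alphas.append({j: sum(prev[i] * hmm["trans"][i][j] for i in states)
                          * hmm["emit"][j][o]
                       for j in states})
    return alphas

alphas = forward(hmm, "acgt")
print(sum(alphas[-1].values()))   # P("acgt") = alpha_4(B) + alpha_4(I)
```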

The Backward Algorithm
Define β_t(i) = P(o_t+1...o_T | q_t = s_i, λ): starting from state s_i at time t (with o_1...o_t already generated), the probability of generating o_t+1...o_T.
Algorithm:
Initialization: β_T(i) = 1
Induction: β_t(i) = Σ_j a_ij b_j(o_t+1) β_t+1(j)
The data likelihood is P(O|λ) = Σ_i π_i b_i(o_1) β_1(i) = Σ_i α_t(i) β_t(i) for any t.
Complexity: O(TN^2)

Backward Algorithm: Example (observations a c g t)
β4(B) = 1; β4(I) = 1
β3(B) = 0.8·p(“t”|B)·β4(B) + 0.2·p(“t”|I)·β4(I)
β3(I) = 0.5·p(“t”|B)·β4(B) + 0.5·p(“t”|I)·β4(I)
...
P(“a c g t”) = α1(B)·β1(B) + α1(I)·β1(I) = α2(B)·β2(B) + α2(I)·β2(I)
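A matching backward sketch; together with the `forward` sketch above it reproduces P(“acgt”) at every time slice (same assumed `hmm` layout, illustrative names).

```python
def backward(hmm, obs):
    """betas[t][s] = P(o_{t+2}..o_T | q_{t+1} = s); returns the list of beta rows."""
    states = hmm["states"]
    betas = [{s: 1.0 for s in states}]           # beta_T(i) = 1
    for o in reversed(obs[1:]):                  # fill t = T-1, ..., 1
        nxt = betas[0]
        betas.insert(0, {i: sum(hmm["trans"][i][j] * hmm["emit"][j][o] * nxt[j]
                                for j in states)
                         for i in states})
    return betas

betas = backward(hmm, "acgt")
alphas = forward(hmm, "acgt")                    # from the forward sketch above
for t in range(4):                               # same likelihood at every t
    print(sum(alphas[t][s] * betas[t][s] for s in hmm["states"]))
```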

Problem III: Training Estimating Parameters Where do we get the probability values for all parameters? Supervised vs. Unsupervised

Supervised Training
Given:
1. N – the number of states, e.g., 2 (s1 and s2)
2. V – the vocabulary, e.g., V = {a, b}
3. O – observations, e.g., O = aaaaabbbbb
4. State sequence, e.g., S = 1121122222
Task: estimate the following parameters
1. π1, π2
2. a11, a12, a21, a22
3. b1(a), b1(b), b2(a), b2(b)
Counting gives:
π1 = 1/1 = 1; π2 = 0/1 = 0
a11 = 2/4 = 0.5; a12 = 2/4 = 0.5; a21 = 1/5 = 0.2; a22 = 4/5 = 0.8
b1(a) = 4/4 = 1.0; b1(b) = 0/4 = 0; b2(a) = 1/6 = 0.167; b2(b) = 5/6 = 0.833
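The counting behind these estimates, as a short sketch over the slide's example (the variable names are illustrative, and there is only a single training sequence).

```python
from collections import Counter

O = "aaaaabbbbb"
S = "1121122222"          # labeled state sequence from the slide

init   = Counter([S[0]])               # first-state counts
trans  = Counter(zip(S, S[1:]))        # (s_t, s_{t+1}) pair counts
emit   = Counter(zip(S, O))            # (state, symbol) pair counts
n_from = Counter(S[:-1])               # number of transitions leaving each state
n_in   = Counter(S)                    # number of emissions from each state

pi = {s: init[s] / 1 for s in "12"}    # only one training sequence here
a  = {(i, j): trans[(i, j)] / n_from[i] for i in "12" for j in "12"}
b  = {(s, o): emit[(s, o)] / n_in[s] for s in "12" for o in "ab"}
print(pi, a, b)   # pi1=1, a11=a12=0.5, a21=0.2, a22=0.8, b1(a)=1, b2(b)=5/6, ...
```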

Unsupervised Training
Given:
1. N – the number of states, e.g., 2 (s1 and s2)
2. V – the vocabulary, e.g., V = {a, b}
3. O – observations, e.g., O = aaaaabbbbb
4. The state sequence is NOT given (it is hidden)
Task: estimate the same parameters (π1, π2; a11, a12, a21, a22; b1(a), b1(b), b2(a), b2(b))
How could this be possible? Maximum Likelihood: choose λ* = argmax_λ P(O|λ).

Intuition
O = aaaaabbbbb, with current parameters λ.
Enumerate all possible state sequences q^1 = 1111111111, q^2, ..., q^K = 2222222222 (K = 2^10 here), compute P(O, q^k|λ) for each, and use these to pick a new λ'.
But computing P(O, q^k|λ) for every state sequence is expensive ...

Baum-Welch Algorithm
Basic “counters”:
γ_t(i) = P(q_t = s_i | O, λ): being at state s_i at time t
ξ_t(i,j) = P(q_t = s_i, q_t+1 = s_j | O, λ): being at state s_i at time t and at state s_j at time t+1
Computation of the counters from the forward and backward values:
γ_t(i) = α_t(i) β_t(i) / Σ_j α_t(j) β_t(j)
ξ_t(i,j) = α_t(i) a_ij b_j(o_t+1) β_t+1(j) / P(O|λ)
Complexity: O(N^2) per time step

Baum-Welch Algorithm (cont.)
Updating formulas:
π_i' = γ_1(i)
a_ij' = Σ_{t=1..T-1} ξ_t(i,j) / Σ_{t=1..T-1} γ_t(i)
b_i(v_k)' = Σ_{t: o_t = v_k} γ_t(i) / Σ_{t=1..T} γ_t(i)
Overall complexity for each iteration: O(TN^2)
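A single Baum-Welch iteration as a sketch, reusing the `forward` and `backward` functions from the earlier sketches (so it is not standalone); it assumes one observation sequence, no smoothing, and the same `hmm` dictionary layout.

```python
def baum_welch_step(hmm, obs):
    """One EM iteration: E-step computes gamma/xi, M-step re-estimates lambda."""
    states = hmm["states"]
    alphas, betas = forward(hmm, obs), backward(hmm, obs)
    likelihood = sum(alphas[-1].values())            # P(O | lambda)
    T = len(obs)

    # E-step: gamma_t(i) and xi_t(i,j) from the forward/backward values.
    gamma = [{i: alphas[t][i] * betas[t][i] / likelihood for i in states}
             for t in range(T)]
    xi = [{(i, j): alphas[t][i] * hmm["trans"][i][j]
                   * hmm["emit"][j][obs[t + 1]] * betas[t + 1][j] / likelihood
           for i in states for j in states}
          for t in range(T - 1)]

    # M-step: re-estimated parameters (the updating formulas above).
    new = {"states": states,
           "init": {i: gamma[0][i] for i in states},
           "trans": {i: {j: sum(xi[t][(i, j)] for t in range(T - 1)) /
                            sum(gamma[t][i] for t in range(T - 1))
                         for j in states} for i in states},
           "emit": {i: {o: sum(gamma[t][i] for t in range(T) if obs[t] == o) /
                           sum(gamma[t][i] for t in range(T))
                        for o in hmm["emit"][i]} for i in states}}
    return new, likelihood

# Iterating should never decrease the (incomplete) data likelihood.
model = hmm
for _ in range(5):
    model, ll = baum_welch_step(model, "acgttatg")
    print(ll)
```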

What You Should Know
Definition of an HMM and the parameters of an HMM
Viterbi algorithm
Forward/Backward algorithms
How to estimate the parameters of an HMM in a supervised way