Hidden Markov Model Lecture #6

2 Reminder: Finite State Markov Chain
An integer time stochastic process, consisting of a domain D of m states {1,…,m} and
1. An m-dimensional initial distribution vector (p(1),…,p(m)).
2. An m×m transition probabilities matrix M = (a_st).
For each integer L, a Markov Chain assigns probability to sequences (x_1…x_L) over D (i.e., x_i ∈ D) as follows:
P(x_1,…,x_L) = p(x_1) · ∏_{i=2..L} a_{x_{i-1} x_i}
Similarly, (X_1,…,X_i,…) is a sequence of probability distributions over D.

3 Ergodic Markov Chains
A Markov chain is ergodic if:
1. All states are recurrent (i.e., the graph is strongly connected).
2. It is not periodic.
The Fundamental Theorem of Finite-state Markov Chains: If a Markov Chain is ergodic, then
1. It has a unique stationary distribution vector V > 0, which is an eigenvector of the transition matrix.
2. The distributions X_i, as i → ∞, converge to V.
[Figure: an ergodic chain on four states A, B, C, D]
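A short numerical sketch of the Fundamental Theorem above (not part of the original slides): the stationary distribution V is recovered as the eigenvector of the transition matrix for eigenvalue 1, and repeated application of the chain converges to it. The 4×4 matrix below is an arbitrary, made-up example.

```python
import numpy as np

# An ergodic 4-state chain; rows sum to 1, M[s, t] = a_st (invented numbers).
M = np.array([
    [0.5, 0.2, 0.2, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.3, 0.2, 0.4, 0.1],
    [0.2, 0.2, 0.2, 0.4],
])

# V is a left eigenvector of M (V M = V), i.e. a right eigenvector of M^T
# for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(M.T)
idx = np.argmin(np.abs(eigvals - 1.0))
v = np.real(eigvecs[:, idx])
v = v / v.sum()                      # normalize to a probability vector

# The distributions X_i converge to V: iterate x_{i+1} = x_i M.
x = np.array([1.0, 0.0, 0.0, 0.0])   # start in state 1 with probability 1
for _ in range(200):
    x = x @ M
print(v, x)                          # numerically equal
```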

4 Use of Markov Chains: Sequences with CpG Islands
Recall from last class: in human genomes the pair CG often transforms to (methyl-C) G, which often transforms to TG. Hence the pair CG appears less often than expected from the independent frequencies of C and G alone.
Due to biological reasons, this process is sometimes suppressed in short stretches of genomes, such as in the start regions of many genes. These areas are called CpG islands (p denotes "pair").

5 Modeling sequences with CpG Islands
The "+" model: use transition matrix A+ = (a+_st), where a+_st = (the probability that t follows s in a CpG island).
The "-" model: use transition matrix A- = (a-_st), where a-_st = (the probability that t follows s in a non-CpG island).

6 CpG Island: Question 1
We solved the following question:
Question 1: Given a short stretch of genomic data, does it come from a CpG island?
By modeling strings with and without CpG islands as Markov Chains over the same states {A,C,G,T} but different transition probabilities, and scoring a string x by the log-odds ratio:
score(x) = log [ P(x | + model) / P(x | - model) ] = ∑_{i=2..L} log ( a+_{x_{i-1} x_i} / a-_{x_{i-1} x_i} )
A positive score indicates the string more likely comes from a CpG island; the scoring code sketch below makes this concrete.
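A hedged sketch of the Question 1 scoring (not code from the lecture): the transition tables below are made-up placeholders, not the empirical tables from Durbin et al.; the helper names (uniform_ish, log_odds) are invented for the example.

```python
import math

STATES = "ACGT"

def uniform_ish(bias_pair=None, bias=0.0):
    """Build a toy 4x4 transition table; optionally shift one pair, e.g. ('C','G')."""
    a = {s: {t: 0.25 for t in STATES} for s in STATES}
    if bias_pair:
        s, t = bias_pair
        a[s] = {u: (0.25 + bias if u == t else (0.75 - bias) / 3) for u in STATES}
    return a

a_plus = uniform_ish(bias_pair=("C", "G"), bias=0.15)    # CG enriched inside islands
a_minus = uniform_ish(bias_pair=("C", "G"), bias=-0.15)  # CG depleted outside

def log_odds(x, a_plus, a_minus):
    """score(x) = sum_i log( a+_{x_{i-1} x_i} / a-_{x_{i-1} x_i} )."""
    return sum(math.log(a_plus[s][t] / a_minus[s][t]) for s, t in zip(x, x[1:]))

print(log_odds("CGCGCGTACG", a_plus, a_minus))  # positive => looks like a CpG island
```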

7 CpG Island: Question 2
Now we solve the 2nd question:
Question 2: Given a long piece of genomic data, does it contain CpG islands, and where?
For this, we need to decide which parts of a given long sequence of letters are more likely to come from the "+" model, and which parts are more likely to come from the "-" model. This is done by using the Hidden Markov Model, to be defined.

8 Question 2: Finding CpG Islands
Given a long genomic string with possible CpG islands, we define a Markov Chain over 8 states, all interconnected (hence it is ergodic):
[Figure: the eight states A+, C+, G+, T+, A-, C-, G-, T-, fully interconnected]
The problem is that we don't know the sequence of states which is traversed, but just the sequence of letters. Therefore we use a Hidden Markov Model here.

9 Hidden Markov Model
A Markov chain (s_1,…,s_L), and for each state s and a symbol x we have p(X_i = x | S_i = s).
[Figure: the HMM chain S_1 → S_2 → … → S_L via transition matrix M, each S_i emitting x_i via emission matrix T]
Application in communication: the message sent is (s_1,…,s_m) but we receive (x_1,…,x_m). Compute the most likely message sent.
Application in speech recognition: the word said is (s_1,…,s_m) but we recorded (x_1,…,x_m). Compute the most likely word said.

10 Hidden Markov Model
Notations:
Markov Chain transition probabilities: p(S_{i+1} = t | S_i = s) = a_st
Emission probabilities: p(X_i = b | S_i = s) = e_s(b)
For Markov Chains we know: p(s) = p(s_1,…,s_L) = p(s_1) · ∏_{i=2..L} a_{s_{i-1} s_i}
What is p(s, x) = p(s_1,…,s_L; x_1,…,x_L)?

11 Independence assumptions
[Figure: the HMM chain S_1,…,S_{i+1} with emitted symbols X_1,…,X_{i+1}]
We assume the following joint distribution for the full chain:
p(s_1,…,s_L; x_1,…,x_L) = p(s_1) p(x_1 | s_1) · ∏_{i=2..L} p(s_i | s_{i-1}) p(x_i | s_i)
This factorization encodes the following conditional independence assumptions:
p(s_i | s_1,…,s_{i-1}, x_1,…,x_{i-1}) = p(s_i | s_{i-1}) and
p(x_i | s_1,…,s_i, x_1,…,x_{i-1}) = p(x_i | s_i)

12 Hidden Markov Model
p(X_i = b | S_i = s) = e_s(b) means that the probability of x_i depends only on the state s_i. Formally, this is equivalent to the conditional independence assumption:
p(X_i = x_i | x_1,…,x_{i-1}, x_{i+1},…,x_L, s_1,…,s_L) = e_{s_i}(x_i)
Thus
p(s, x) = p(s_1,…,s_L; x_1,…,x_L) = p(s_1) e_{s_1}(x_1) · ∏_{i=2..L} a_{s_{i-1} s_i} e_{s_i}(x_i)
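As a quick illustration (a sketch, not from the slides), the joint probability p(s, x) above can be computed by multiplying the initial, transition, and emission probabilities along the path. The two-state parameters and the function name joint_prob are invented for the example.

```python
# Toy HMM parameters (invented): states '+' and '-', emissions over {'A','C','G','T'}.
init = {"+": 0.5, "-": 0.5}                      # p(s_1)
trans = {"+": {"+": 0.8, "-": 0.2},              # a_st = p(S_{i+1}=t | S_i=s)
         "-": {"+": 0.1, "-": 0.9}}
emit = {"+": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},   # e_s(b)
        "-": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}}

def joint_prob(states_path, symbols):
    """p(s, x) = p(s_1) e_{s_1}(x_1) * prod_{i>=2} a_{s_{i-1} s_i} e_{s_i}(x_i)."""
    p = init[states_path[0]] * emit[states_path[0]][symbols[0]]
    for prev, cur, x in zip(states_path, states_path[1:], symbols[1:]):
        p *= trans[prev][cur] * emit[cur][x]
    return p

print(joint_prob("++--", "CGTA"))
```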

13 Hidden Markov Model
Exercise: Using the definition of conditional probability, P(X|Y) = P(X,Y)/P(Y), prove formally that the equality
p(X_i = x_i | x_1,…,x_{i-1}, x_{i+1},…,x_L, s_1,…,s_i,…,s_L) = e_{s_i}(x_i)
implies that for any Y ⊆ {x_1,…,x_{i-1}, x_{i+1},…,x_L, s_1,…,s_i,…,s_L} such that s_i is in Y, it holds that:
p(X_i = x_i | Y) = e_{s_i}(x_i)

14 Hidden Markov Model for CpG Islands
The states: Domain(S_i) = {+, -} × {A, C, T, G} (8 values)
[Figure: the HMM chain S_1,…,S_L with emissions X_1,…,X_L]
In this representation P(x_i | s_i) = 0 or 1, depending on whether x_i is consistent with s_i. E.g., x_i = G is consistent with s_i = (+, G) and with s_i = (-, G), but not with any other state of s_i.
The query of interest:
(s*_1,…,s*_L) = argmax_{(s_1,…,s_L)} p(s_1,…,s_L | x_1,…,x_L)
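The 8-state construction can be written out concretely. Below is a minimal sketch with invented switching and within-model probabilities (not real CpG statistics); it mainly shows that the emissions are deterministic (0/1) as described. The helper name within and all numbers are assumptions for illustration.

```python
LETTERS = "ACGT"
STATES = [(lab, ch) for lab in "+-" for ch in LETTERS]   # 8 hidden states

def within(cg):
    """Toy within-model transition matrix with a chosen C->G probability."""
    rest = (1.0 - cg) / 3
    a = {s: {t: 0.25 for t in LETTERS} for s in LETTERS}
    a["C"] = {t: (cg if t == "G" else rest) for t in LETTERS}
    return a

a_model = {"+": within(0.40), "-": within(0.10)}           # CG enriched vs. depleted
switch = {"+": {"+": 0.95, "-": 0.05}, "-": {"+": 0.05, "-": 0.95}}

# Transitions between the 8 states: choose the next label, then the next letter.
trans = {(l, s): {(l2, t): switch[l][l2] * a_model[l2][s][t] for (l2, t) in STATES}
         for (l, s) in STATES}

# Deterministic emissions: P(x_i | s_i) is 1 iff the letter matches the state.
emit = {(l, ch): {b: 1.0 if b == ch else 0.0 for b in LETTERS} for (l, ch) in STATES}

assert all(abs(sum(row.values()) - 1.0) < 1e-9 for row in trans.values())
```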

15 Use of HMM: A posteriori belief
The conditional probability of a variable X given the evidence e:
P(X | e) = P(X, e) / P(e)
This is the a posteriori belief in X, given evidence e. This query is also called belief update.
We use the HMM to compute our a posteriori belief on a sequence, given some information on it (usually (x_1,…,x_L)).

16 Hidden Markov Model
Questions: Given the "visible" sequence x = (x_1,…,x_L), find:
1. A most probable (hidden) path.
2. The probability of x.
3. For each i = 1,…,L and for each state k, p(s_i = k | x).

17 1. Most Probable state path
First Question: Given an output sequence x = (x_1,…,x_L), a most probable path s* = (s*_1,…,s*_L) is one which maximizes p(s | x).

18 Most Probable path (cont.)
Since p(s | x) = p(s, x) / p(x), and p(x) does not depend on s, it suffices to find the s which maximizes p(s, x).

19 Viterbi's algorithm for most probable path
Let the states be {1,…,m}.
The task: compute v_l(i) = the probability p(s_1,…,s_i; x_1,…,x_i | s_i = l) of a most probable path up to i which ends in state l.
Idea: for i = 1,…,L and for each state l, compute:
v_l(i) = e_l(x_i) · max_k { v_k(i-1) · a_kl }

20 Viterbi's algorithm for most probable path
v_l(i) = the probability p(s_1,…,s_i; x_1,…,x_i | s_i = l) of a most probable path up to i which ends in state l.
Exercise: Show that for i = 1,…,L and for each state l:
v_l(i) = e_l(x_i) · max_k { v_k(i-1) · a_kl }

21 Viterbi's algorithm
We add the special initial state 0.
Initialization: v_0(0) = 1, v_k(0) = 0 for k > 0.
For i = 1 to L do, for each state l:
  v_l(i) = e_l(x_i) · MAX_k { v_k(i-1) · a_kl }
  ptr_i(l) = argmax_k { v_k(i-1) · a_kl } [storing the previous state for reconstructing the path]
Termination / Result: p(s*_1,…,s*_L; x_1,…,x_L) = MAX_k { v_k(L) }
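To make the algorithm concrete, here is a hedged Python sketch of Viterbi's recursion and traceback (not code from the lecture). It works in log-space to avoid numerical underflow on long sequences; the two-state "+"/"-" parameters are invented toy values, and the special initial state 0 is folded into the initial distribution.

```python
import math

states = ["+", "-"]
init = {"+": 0.5, "-": 0.5}
trans = {"+": {"+": 0.8, "-": 0.2}, "-": {"+": 0.1, "-": 0.9}}
emit = {"+": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
        "-": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}}

def viterbi(x):
    """Return (log p(s*, x), s*) for a most probable hidden path s*."""
    # Initialization: v_l(1) = p(l) * e_l(x_1)
    v = [{l: math.log(init[l]) + math.log(emit[l][x[0]]) for l in states}]
    ptr = []
    # Recursion: v_l(i) = e_l(x_i) * max_k v_k(i-1) * a_kl
    for i in range(1, len(x)):
        vi, pi = {}, {}
        for l in states:
            best_k = max(states, key=lambda k: v[i - 1][k] + math.log(trans[k][l]))
            vi[l] = math.log(emit[l][x[i]]) + v[i - 1][best_k] + math.log(trans[best_k][l])
            pi[l] = best_k
        v.append(vi)
        ptr.append(pi)
    # Termination: max_k v_k(L), then follow the pointers backwards.
    last = max(states, key=lambda k: v[-1][k])
    path = [last]
    for pi in reversed(ptr):
        path.append(pi[path[-1]])
    return v[-1][last], "".join(reversed(path))

print(viterbi("CGCGTATATACGCG"))  # best log-probability and a most probable path
```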

22 2. Computing p(x)
Given an output sequence x = (x_1,…,x_L), compute the probability that this sequence was generated:
p(x) = ∑_s p(s, x)
The summation is taken over all state-paths s generating x.

23 Forward algorithm for computing p(x)
The task: compute p(x) = ∑_s p(s, x).
Idea: for i = 1,…,L and for each state l, compute f_l(i) = p(x_1,…,x_i; s_i = l), the probability of all the paths which emit (x_1,…,x_i) and end in state s_i = l.
Use the recursive formula:
f_l(i) = e_l(x_i) · ∑_k f_k(i-1) · a_kl

24 Forward algorithm for computing p(x)
Similar to Viterbi's algorithm, with summation replacing maximization:
Initialization: f_0(0) := 1, f_k(0) := 0 for k > 0.
For i = 1 to L do, for each state l:
  f_l(i) = e_l(x_i) · ∑_k f_k(i-1) · a_kl
Result: p(x_1,…,x_L) = ∑_k f_k(L)
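A hedged sketch of the forward recursion above (not code from the lecture); the two-state parameters are the same invented toy values used in the Viterbi sketch.

```python
states = ["+", "-"]
init = {"+": 0.5, "-": 0.5}
trans = {"+": {"+": 0.8, "-": 0.2}, "-": {"+": 0.1, "-": 0.9}}
emit = {"+": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
        "-": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}}

def forward(x):
    """Return the table f (list of dicts) and p(x) = sum_k f_k(L)."""
    # i = 1: the special state 0 contributes init[l] in place of a_0l.
    f = [{l: init[l] * emit[l][x[0]] for l in states}]
    # Recursion: f_l(i) = e_l(x_i) * sum_k f_k(i-1) * a_kl
    for i in range(1, len(x)):
        f.append({l: emit[l][x[i]] * sum(f[i - 1][k] * trans[k][l] for k in states)
                  for l in states})
    return f, sum(f[-1][k] for k in states)

f, px = forward("CGCGTATA")
print(px)  # probability of the observed sequence under this toy HMM
```

For long sequences this plain version underflows; in practice one rescales each column or works in log-space, but the code mirrors the slide's recursion directly.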

25 3. The distribution of S_i given x
Given an output sequence x = (x_1,…,x_L), compute, for each i = 1,…,L and for each state k, the probability that s_i = k.
This helps to answer queries like: what is the probability that s_i is in a CpG island, etc.

26 Solution in two stages
1. For each i and each state k, compute p(s_i = k | x_1,…,x_L).
2. Do the same computation for every i = 1,…,L, but without repeating the first task L times.

27 Computing for a single i
p(s_i | x_1,…,x_L) = p(x_1,…,x_L, s_i) / p(x_1,…,x_L), so it suffices to compute p(x_1,…,x_L, s_i).

28 Decomposing the computation
P(x_1,…,x_L, s_i) = P(x_1,…,x_i, s_i) · P(x_{i+1},…,x_L | x_1,…,x_i, s_i)
(by the equality p(A,B) = p(A) p(B|A)).
P(x_1,…,x_i, s_i) = f_{s_i}(i) ≡ F(s_i), so we are left with the task of computing P(x_{i+1},…,x_L | x_1,…,x_i, s_i) ≡ B(s_i).

29 Decomposing the computation
Exercise: Show from the definitions of Markov Chain and Hidden Markov Model that:
P(x_{i+1},…,x_L | x_1,…,x_i, s_i) = P(x_{i+1},…,x_L | s_i)
Denote P(x_{i+1},…,x_L | s_i) ≡ B(s_i).

30 Decomposing the computation
Summary:
P(x_1,…,x_L, s_i) = P(x_1,…,x_i, s_i) · P(x_{i+1},…,x_L | x_1,…,x_i, s_i)
= P(x_1,…,x_i, s_i) · P(x_{i+1},…,x_L | s_i) ≡ F(s_i)·B(s_i)
The second equality is due to the conditional independence of {x_{i+1},…,x_L} and {x_1,…,x_i} given s_i, by the Exercise.

31 F(s_i): The Forward algorithm
The algorithm computes F(s_i) = P(x_1,…,x_i, s_i) for i = 1,…,L (namely, considering evidence up to time slot i).
Initialization: F(0) = 1 (for the special initial state 0).
For i = 1 to L do, for each value of s_i:
  F(s_i) = e_{s_i}(x_i) · ∑_{s_{i-1}} F(s_{i-1}) · a_{s_{i-1} s_i}

32 B(s_i): The backward algorithm
The task: compute B(s_i) = P(x_{i+1},…,x_L | s_i) for i = L-1,…,1 (namely, considering evidence after time slot i).
First step (step L-1): compute B(s_{L-1}):
B(s_{L-1}) = P(x_L | s_{L-1}) = ∑_{s_L} P(x_L, s_L | s_{L-1}) = ∑_{s_L} P(s_L | s_{L-1}) P(x_L | s_L)
Step i: compute B(s_i) from B(s_{i+1}):
B(s_i) = P(x_{i+1},…,x_L | s_i) = ∑_{s_{i+1}} P(s_{i+1} | s_i) P(x_{i+1} | s_{i+1}) P(x_{i+2},…,x_L | s_{i+1})
where the last factor is B(s_{i+1}).
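A hedged sketch of the backward recursion above (not code from the lecture), using the same invented two-state toy parameters as the earlier sketches.

```python
states = ["+", "-"]
trans = {"+": {"+": 0.8, "-": 0.2}, "-": {"+": 0.1, "-": 0.9}}
emit = {"+": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
        "-": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30}}

def backward(x):
    """Return b, where b[i][l] = P(x_{i+2},...,x_L | s_{i+1} = l), with 0-based i."""
    L = len(x)
    b = [dict() for _ in range(L)]
    b[L - 1] = {l: 1.0 for l in states}          # base case: no evidence after slot L
    # Recursion: B(s_i) = sum_{s_{i+1}} a_{s_i s_{i+1}} e_{s_{i+1}}(x_{i+1}) B(s_{i+1})
    for i in range(L - 2, -1, -1):
        b[i] = {l: sum(trans[l][k] * emit[k][x[i + 1]] * b[i + 1][k] for k in states)
                for l in states}
    return b

b = backward("CGCGTATA")
print(b[0])   # B(s_1) for each state
```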

33 The combined answer
1. To compute the probability that S_i = s_i given x_1,…,x_L: run the forward algorithm to compute F(s_i) = P(x_1,…,x_i, s_i), run the backward algorithm to compute B(s_i) = P(x_{i+1},…,x_L | s_i); the product F(s_i)·B(s_i) = P(x_1,…,x_L, s_i), normalized by p(x), is the answer (for every possible value s_i).
2. To compute these probabilities for every i, simply run the forward and backward algorithms once, storing F(s_i) and B(s_i) for every i (and every value of s_i), and compute F(s_i)·B(s_i) for every i.
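A short sketch of combining the two passes into posterior state probabilities. It assumes the forward() and backward() sketches above have been run in the same session (so forward, backward, and states are defined); everything here is illustrative, not lecture code.

```python
# Posterior: p(s_i = k | x) = F_k(i) * B_k(i) / p(x).
x = "CGCGTATA"
f, px = forward(x)
b = backward(x)

posterior = [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(x))]
for i, dist in enumerate(posterior, start=1):
    print(i, {k: round(p, 3) for k, p in dist.items()})  # each row sums to 1
```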

34 Time and Space Complexity of the forward/backward algorithms
Time complexity is O(m²L), where m is the number of states. It is linear in the length of the chain, provided the number of states is a constant.
Space complexity is also O(m²L).