Slide 1: Hidden Markov Models, Lecture #5. Prepared by Dan Geiger. Background reading: Chapter 3 of the textbook (Durbin et al.).

Slide 2: Dependencies along the genome
In previous classes we assumed that every letter in a sequence is sampled independently from some distribution q(·) over the alphabet {A, C, T, G}. This is not the case in real genomes:
1. Genomic sequences come in triplets, the codons, which encode amino acids via the genetic code.
2. There are special subsequences in the genome, such as TATA within the regulatory region upstream of a gene.
3. The pair C followed by G is less common than expected under random sampling.
We will focus on analyzing the third example using a model called a Hidden Markov Model.

Slide 3: Example: CpG islands
In human genomes the pair CG is often modified to (methyl-C)G, which in turn often mutates to TG. Hence the pair CG appears less often than expected from the independent frequencies of C and G alone. For biological reasons, this process is sometimes suppressed in short stretches of the genome, such as the start regions of many genes. These stretches are called CpG islands (the "p" denotes the phosphodiester bond between the C and the G).

Slide 4: Example: CpG islands (cont.)
We consider two questions (and some variants):
Question 1: Given a short stretch of genomic data, does it come from a CpG island? We will use Markov chains.
Question 2: Given a long piece of genomic data, does it contain CpG islands, and if so, where and of what length? We will use Hidden Markov Models.

Slide 5: (Stationary) Markov chains

X_1 → X_2 → … → X_{L-1} → X_L

Every variable X_i has a domain; for example, suppose the domain is the set of letters {A, C, T, G}. Every variable is associated with a local (transition) probability table p(X_i = x_i | X_{i-1} = x_{i-1}), together with p(X_1 = x_1). The joint distribution is given by

p(x_1, …, x_L) = p(x_1) p(x_2 | x_1) ⋯ p(x_L | x_{L-1}); in short, p(x_1, …, x_L) = p(x_1) ∏_{i=2}^{L} p(x_i | x_{i-1}).

Stationary means that the transition probability tables do not depend on i.
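As a concrete illustration of this factorization, here is a minimal Python sketch that evaluates the log of the joint probability of a sequence under a stationary Markov chain. The numbers are illustrative placeholders (only row A reuses the values shown on the next slide); they are not the textbook's estimates.

```python
import math

# Illustrative transition values for a stationary Markov chain over {A, C, T, G}.
# p_start[a] = p(X_1 = a); p_trans[a][b] = p(X_i = b | X_{i-1} = a). Each row sums to 1.
p_start = {'A': 0.25, 'C': 0.25, 'T': 0.25, 'G': 0.25}
p_trans = {
    'A': {'A': 0.2, 'C': 0.3, 'T': 0.4, 'G': 0.1},
    'C': {'A': 0.4, 'C': 0.3, 'T': 0.2, 'G': 0.1},
    'T': {'A': 0.1, 'C': 0.3, 'T': 0.3, 'G': 0.3},
    'G': {'A': 0.3, 'C': 0.3, 'T': 0.2, 'G': 0.2},
}

def log_prob(seq):
    """log p(x_1,...,x_L) = log p(x_1) + sum_{i=2}^{L} log p(x_i | x_{i-1})."""
    logp = math.log(p_start[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        logp += math.log(p_trans[prev][cur])
    return logp

print(log_prob("ACGTG"))
```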

Slide 6: Question 1: Using two Markov chains
For CpG islands we need to specify p_I(x_i | x_{i-1}), where I stands for CpG island. For example (rows indexed by X_{i-1}, columns by X_i):

X_{i-1} \ X_i     A        C        T        G
A                0.2      0.3      0.4      0.1
C                0.4     p(C|C)   p(T|C)    high
T                0.1     p(C|T)   p(T|T)   p(G|T)
G                0.3     p(C|G)   p(T|G)   p(G|G)

Each row must add up to one; the columns need not.

Slide 7: Question 1: Using two Markov chains
For non-CpG islands we need to specify p_N(x_i | x_{i-1}), where N stands for non-CpG island. For example:

X_{i-1} \ X_i     A        C        T        G
A                0.2      0.3      0.25    p(G|A)
C                0.4     p(C|C)   p(T|C)    low
T                0.1     p(C|T)   p(T|T)    high
G                0.3     p(C|G)   p(T|G)   p(G|G)

Some entries may or may not change compared with p_I(x_i | x_{i-1}).

Slide 8: Question 1: Log odds-ratio test
Comparing the two options via the log odds-ratio test yields

log Q = log [ p_I(x_1, …, x_L) / p_N(x_1, …, x_L) ] = log [ p_I(x_1) / p_N(x_1) ] + Σ_{i=2}^{L} log [ p_I(x_i | x_{i-1}) / p_N(x_i | x_{i-1}) ]

If log Q > 0, a CpG island is more likely. If log Q < 0, a non-CpG island is more likely.
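A minimal sketch of this test, assuming two hypothetical transition tables trans_I and trans_N. The numbers below are illustrative placeholders chosen so that C→G is favoured in the island model and disfavoured otherwise; they are not the values estimated in the textbook.

```python
import math

# Placeholder transition tables (each row sums to 1): trans_I favours C->G, trans_N does not.
trans_I = {'A': {'A': .2, 'C': .3, 'T': .4, 'G': .1},
           'C': {'A': .4, 'C': .3, 'T': .1, 'G': .2},
           'T': {'A': .1, 'C': .4, 'T': .3, 'G': .2},
           'G': {'A': .3, 'C': .3, 'T': .2, 'G': .2}}
trans_N = {'A': {'A': .2, 'C': .3, 'T': .25, 'G': .25},
           'C': {'A': .4, 'C': .3, 'T': .25, 'G': .05},
           'T': {'A': .1, 'C': .2, 'T': .3, 'G': .4},
           'G': {'A': .3, 'C': .3, 'T': .2, 'G': .2}}
start_I = start_N = {'A': .25, 'C': .25, 'T': .25, 'G': .25}

def log_odds(seq):
    """log Q = log p_I(x_1..x_L) - log p_N(x_1..x_L)."""
    q = math.log(start_I[seq[0]]) - math.log(start_N[seq[0]])
    for a, b in zip(seq, seq[1:]):
        q += math.log(trans_I[a][b]) - math.log(trans_N[a][b])
    return q

print(log_odds("CGCGCG"))   # > 0 here, so the CpG-island model is more likely
```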

Slide 9: Maximum likelihood estimation (MLE) of the parameters (with a teacher, i.e. labeled data)
The needed parameters are p_I(x_1), p_I(x_i | x_{i-1}), p_N(x_1), p_N(x_i | x_{i-1}). The ML estimates are given by

p_I(X_1 = a) = N_{a,I} / Σ_{a'} N_{a',I}, where N_{a,I} is the number of times letter a appears in CpG islands in the dataset;

p_I(X_i = b | X_{i-1} = a) = N_{ba,I} / Σ_{b'} N_{b'a,I}, where N_{ba,I} is the number of times letter b appears after letter a in CpG islands in the dataset.

Using MLE is justified when we have a large sample; the numbers appearing in the textbook are based on 60,000 sequences. When only small samples are available, Bayesian learning is an attractive alternative, which we will cover soon.
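A sketch of this counting-based estimation on hypothetical toy data; the function name mle_transitions and the example stretches are not from the slides.

```python
from collections import defaultdict

def mle_transitions(sequences):
    """Counting-based ML estimate: p(b | a) = N_ba / sum_b' N_b'a,
    where N_ba counts how often letter b appears right after letter a
    in the given labeled training stretches."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    # Only observed pairs appear; with small samples one would add pseudocounts
    # (or move to Bayesian estimation, as mentioned above).
    return {a: {b: n / sum(row.values()) for b, n in row.items()}
            for a, row in counts.items()}

# Hypothetical toy data: stretches labeled as CpG islands vs. non-islands.
p_I = mle_transitions(["CGCGCGA", "GCGCGC"])
p_N = mle_transitions(["ATTTGCA", "TTTAAAG"])
print(p_I['C'], p_N['T'])
```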

Slide 10: Hidden Markov Models (HMMs)

(Diagram: a hidden chain S_1 → S_2 → … → S_i → …, governed by a k × k transition matrix, with an observation R_i attached to each S_i.)

This HMM depicts the factorization

p(s_1, …, s_L, r_1, …, r_L) = p(s_1) p(r_1 | s_1) ∏_{i=2}^{L} p(s_i | s_{i-1}) p(r_i | s_i)

Application in communication: the message sent is (s_1, …, s_m) but we receive (r_1, …, r_m); compute the most likely message sent. Applications in computational biology are discussed in this and the next few classes (CpG islands, gene finding, genetic linkage analysis).

Slide 11: HMM for finding CpG islands
Question 2: The input is a long sequence, parts of which come from CpG islands and parts of which do not. We wish to find the most likely assignment of the two labels {I, N} to each letter in the sequence.

We define a variable H_i that encodes both the letter at location i and the (hidden) label at that location; namely, Domain(H_i) = {I, N} × {A, C, T, G} (8 states/values).

H_1 → H_2 → … → H_{L-1} → H_L

These hidden variables H_i are assumed to form a Markov chain, so the transition matrix is of size 8 × 8.

Slide 12: HMM for finding CpG islands (cont.)
The HMM:

(Diagram: hidden chain H_1 → H_2 → … → H_{L-1} → H_L, with an observed letter X_i attached to each H_i.)

Domain(H_i) = {I, N} × {A, C, T, G} (8 values); Domain(X_i) = {A, C, T, G} (4 values).

In this representation p(x_i | h_i) = 0 or 1, depending on whether x_i is consistent with h_i. For example, x_i = G is consistent with h_i = (I, G) and with h_i = (N, G), but not with any other state of h_i. The size of the local probability table p(x_i | h_i) is 8 × 4.
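A small sketch of how the composite state space and the deterministic 0/1 emission table can be represented; the variable names are illustrative, not from the slides.

```python
from itertools import product

labels = ['I', 'N']              # hidden CpG label
letters = ['A', 'C', 'T', 'G']   # observed nucleotide

# The 8 hidden states h = (label, letter).
states = list(product(labels, letters))

# Deterministic emission table: p(x | h) = 1 iff x equals the letter inside h.
emit = {h: {x: (1.0 if x == h[1] else 0.0) for x in letters} for h in states}

print(emit[('I', 'G')]['G'])   # 1.0: G is consistent with (I, G)
print(emit[('N', 'G')]['C'])   # 0.0: C is not consistent with (N, G)
```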

Slide 13: Queries of interest (MAP)
The maximum a posteriori (MAP) query:

(h_1*, …, h_L*) = argmax_{h_1, …, h_L} p(h_1, …, h_L | x_1, …, x_L)

It is the same problem if we instead maximize the joint distribution p(h_1, …, h_L, x_1, …, x_L). An answer to this query gives the most probable N/I labeling for all locations. An efficient solution, assuming the local probability tables ("the parameters") are known, is the Viterbi algorithm.

Slide 14: Queries of interest (belief update / posterior decoding)
1. Compute the posterior belief in H_i (for a specific i) given the evidence {x_1, …, x_L}, for each of H_i's values h_i; namely, compute p(h_i | x_1, …, x_L).
2. Do the same computation for every H_i, but without repeating the first task L times.
The local probability tables are assumed to be known. An answer to this query gives the probability of having label I or N at an arbitrary location.

Slide 15: Learning the parameters (the EM algorithm)
A common algorithm for learning the parameters from unlabeled sequences is the Expectation-Maximization (EM) algorithm; we will devote several classes to it. In the current context we just note that it is an iterative algorithm that repeats an E-step and an M-step until convergence, and that the E-step uses the algorithms we develop in this class.

Slide 16: Decomposing the computation of belief update (posterior decoding)

P(x_1, …, x_L, h_i) = P(x_1, …, x_i, h_i) P(x_{i+1}, …, x_L | x_1, …, x_i, h_i)
                    = P(x_1, …, x_i, h_i) P(x_{i+1}, …, x_L | h_i)
                    ≡ f(h_i) b(h_i)

The second equality is due to the conditional independence Ind({x_{i+1}, …, x_L}; {x_1, …, x_i} | H_i).

Belief update: P(h_i | x_1, …, x_L) = (1/K) P(x_1, …, x_L, h_i), where K = Σ_{h_i} P(x_1, …, x_L, h_i).

Slide 17: The forward algorithm
The task: compute f(h_i) = P(x_1, …, x_i, h_i) for i = 1, …, L (namely, accounting for the evidence up to time slot i).

Basis step: P(x_1, h_1) = P(h_1) P(x_1 | h_1)

Second step:
P(x_1, x_2, h_2) = Σ_{h_1} P(x_1, h_1, h_2, x_2)
                 = Σ_{h_1} P(x_1, h_1) P(h_2 | x_1, h_1) P(x_2 | x_1, h_1, h_2)
                 = Σ_{h_1} P(x_1, h_1) P(h_2 | h_1) P(x_2 | h_2)
(the last equality is due to conditional independence).

Step i: P(x_1, …, x_i, h_i) = Σ_{h_{i-1}} P(x_1, …, x_{i-1}, h_{i-1}) P(h_i | h_{i-1}) P(x_i | h_i)
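A minimal sketch of this recursion with dictionary-based tables. The toy parameters at the bottom are hypothetical; no rescaling is done, so real implementations would work in log space or rescale each step.

```python
def forward(x, states, p_start, p_trans, p_emit):
    """f(h_i) = P(x_1,...,x_i, H_i = h_i), stored as f[i-1][h] (Python lists are 0-based).
    Basis: f(h_1) = p(h_1) p(x_1 | h_1).
    Step:  f(h_i) = (sum_{h'} f(h') p(h_i | h')) * p(x_i | h_i)."""
    f = [{h: p_start[h] * p_emit[h][x[0]] for h in states}]
    for xi in x[1:]:
        prev = f[-1]
        f.append({h: sum(prev[hp] * p_trans[hp][h] for hp in states) * p_emit[h][xi]
                  for h in states})
    return f

# Toy usage with a 2-state chain (hypothetical numbers):
states = ['I', 'N']
p_start = {'I': 0.5, 'N': 0.5}
p_trans = {'I': {'I': 0.9, 'N': 0.1}, 'N': {'I': 0.1, 'N': 0.9}}
p_emit = {'I': {'A': 0.1, 'C': 0.4, 'T': 0.1, 'G': 0.4},
          'N': {'A': 0.3, 'C': 0.2, 'T': 0.3, 'G': 0.2}}
f = forward("CGCG", states, p_start, p_trans, p_emit)
print(f[-1])   # f(h_L) = P(x_1,...,x_L, h_L) for each value h_L
```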

Slide 18: The backward algorithm
The task: compute b(h_i) = P(x_{i+1}, …, x_L | h_i) for i = L-1, …, 1 (namely, accounting for the evidence after time slot i).

First step:
b(h_{L-1}) = P(x_L | h_{L-1}) = Σ_{h_L} P(x_L, h_L | h_{L-1}) = Σ_{h_L} P(h_L | h_{L-1}) P(x_L | h_{L-1}, h_L) = Σ_{h_L} P(h_L | h_{L-1}) P(x_L | h_L)
(the last equality is due to conditional independence).

Step i:
b(h_i) = P(x_{i+1}, …, x_L | h_i) = Σ_{h_{i+1}} P(h_{i+1} | h_i) P(x_{i+1} | h_{i+1}) P(x_{i+2}, …, x_L | h_{i+1}) = Σ_{h_{i+1}} P(h_{i+1} | h_i) P(x_{i+1} | h_{i+1}) b(h_{i+1})
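A companion sketch of the backward recursion, using the convenient convention b(h_L) = 1 so that f(h_L) b(h_L) = P(x_1, …, x_L, h_L). It assumes the same kind of toy parameter dictionaries as the forward sketch above.

```python
def backward(x, states, p_trans, p_emit):
    """b(h_i) = P(x_{i+1},...,x_L | H_i = h_i), stored as b[i-1][h] (0-based lists).
    Basis: b(h_L) = 1.
    Step:  b(h_i) = sum_{h'} p(h' | h_i) p(x_{i+1} | h') b(h')."""
    L = len(x)
    b = [None] * L
    b[L - 1] = {h: 1.0 for h in states}
    for i in range(L - 2, -1, -1):
        b[i] = {h: sum(p_trans[h][hp] * p_emit[hp][x[i + 1]] * b[i + 1][hp]
                       for hp in states)
                for h in states}
    return b

# Usage with the same toy parameters as the forward sketch:
# b = backward("CGCG", states, p_trans, p_emit)
```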

Slide 19: The combined answer
1. To compute the posterior belief in H_i (for a specific i) given the evidence {x_1, …, x_L}: run the forward algorithm to compute f(h_i) = P(x_1, …, x_i, h_i), run the backward algorithm to compute b(h_i) = P(x_{i+1}, …, x_L | h_i); the product f(h_i) b(h_i) is the answer (up to normalization), for every possible value h_i.
2. To compute the posterior belief for every H_i, simply run the forward and backward algorithms once, storing f(h_i) and b(h_i) for every i (and every value h_i), and compute f(h_i) b(h_i) for every i.
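A short sketch of the combination step, assuming f and b are the tables returned by the forward and backward sketches above.

```python
def posterior(f, b):
    """p(H_i = h | x_1,...,x_L) = f_i(h) b_i(h) / sum_h' f_i(h') b_i(h')."""
    post = []
    for fi, bi in zip(f, b):
        joint = {h: fi[h] * bi[h] for h in fi}   # = P(x_1,...,x_L, H_i = h)
        K = sum(joint.values())                  # = P(x_1,...,x_L), the same for every i
        post.append({h: joint[h] / K for h in joint})
    return post

# Usage, with the forward/backward sketches above:
# post = posterior(forward(x, states, p_start, p_trans, p_emit),
#                  backward(x, states, p_trans, p_emit))
# labels = [max(p, key=p.get) for p in post]   # most probable label at each position
```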

Slide 20: Consequence I: the E-step
Recall that belief update has been computed via

P(x_1, …, x_L, h_i) = P(x_1, …, x_i, h_i) P(x_{i+1}, …, x_L | h_i) ≡ f(h_i) b(h_i)

Now we wish to compute (for the E-step)

p(x_1, …, x_L, h_i, h_{i+1}) = p(x_1, …, x_i, h_i) p(h_{i+1} | h_i) p(x_{i+1} | h_{i+1}) p(x_{i+2}, …, x_L | h_{i+1})
                             = f(h_i) p(h_{i+1} | h_i) p(x_{i+1} | h_{i+1}) b(h_{i+1})
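The same pairwise quantity in code, again assuming the f and b tables from the earlier sketches (0-based position index i).

```python
def pairwise_joint(x, i, f, b, p_trans, p_emit):
    """P(x_1,...,x_L, H_i = h, H_{i+1} = h')
       = f_i(h) * p(h' | h) * p(x_{i+1} | h') * b_{i+1}(h')."""
    return {(h, hp): f[i][h] * p_trans[h][hp] * p_emit[hp][x[i + 1]] * b[i + 1][hp]
            for h in f[i] for hp in b[i + 1]}

# Dividing each entry by P(x_1,...,x_L) gives the expected transition counts
# that the E-step accumulates.
```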

Slide 21: Consequence II: likelihood of the evidence
1. To compute the likelihood of the evidence P(x_1, …, x_L), do one more step in the forward algorithm, namely

P(x_1, …, x_L) = Σ_{h_L} f(h_L) = Σ_{h_L} P(x_1, …, x_L, h_L)

2. Alternatively, do one more step in the backward algorithm, namely

P(x_1, …, x_L) = Σ_{h_1} b(h_1) P(h_1) P(x_1 | h_1) = Σ_{h_1} P(x_2, …, x_L | h_1) P(h_1) P(x_1 | h_1)
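Both routes in code, reusing the f and b tables from the forward/backward sketches above (the function names are hypothetical).

```python
def likelihood_forward(f):
    """P(x_1,...,x_L) = sum_{h_L} f(h_L): one extra summation after the forward pass."""
    return sum(f[-1].values())

def likelihood_backward(x, b, p_start, p_emit):
    """P(x_1,...,x_L) = sum_{h_1} b(h_1) P(h_1) P(x_1 | h_1): the same value from the backward pass."""
    return sum(b[0][h] * p_start[h] * p_emit[h][x[0]] for h in b[0])
```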

Slide 22: Time and space complexity of the forward/backward algorithms
Time complexity is linear in the length of the chain, provided the number of states of each variable is a constant. More precisely, time complexity is O(k²L), where k is the maximum domain size of each variable. Space complexity is also O(k²L).

Slide 23: The MAP query in an HMM
1. Recall that the likelihood-of-evidence query is to compute

P(x_1, …, x_L) = Σ_{(h_1, …, h_L)} P(x_1, …, x_L, h_1, …, h_L)

2. Now we wish to compute a similar quantity:

P*(x_1, …, x_L) = max_{(h_1, …, h_L)} P(x_1, …, x_L, h_1, …, h_L)

And, of course, we wish to find a MAP assignment (h_1*, …, h_L*) that attains this maximum.

Slide 24: Example: Revisiting the likelihood of evidence

P(x_1, x_2, x_3) = Σ_{h_1} P(h_1) P(x_1 | h_1) Σ_{h_2} P(h_2 | h_1) P(x_2 | h_2) Σ_{h_3} P(h_3 | h_2) P(x_3 | h_3)
                 = Σ_{h_1} P(h_1) P(x_1 | h_1) Σ_{h_2} b(h_2) P(h_2 | h_1) P(x_2 | h_2)
                 = Σ_{h_1} b(h_1) P(h_1) P(x_1 | h_1)

Slide 25: Example: Computing the MAP assignment
Replace sums with maximization:

maximum = max_{h_1} P(h_1) P(x_1 | h_1) max_{h_2} P(h_2 | h_1) P(x_2 | h_2) max_{h_3} P(h_3 | h_2) P(x_3 | h_3)
        = max_{h_1} P(h_1) P(x_1 | h_1) max_{h_2} b_{h_3}(h_2) P(h_2 | h_1) P(x_2 | h_2)
        = max_{h_1} b_{h_2}(h_1) P(h_1) P(x_1 | h_1)    {finding the maximum}

Finding the MAP assignment:
h_1* = argmax_{h_1} b_{h_2}(h_1) P(h_1) P(x_1 | h_1)
h_2* = x*_{h_2}(h_1*)
h_3* = x*_{h_3}(h_2*)
where x*_{h_{i+1}}(h_i) records the maximizing value of h_{i+1} as a function of h_i.

Slide 26: Viterbi's algorithm
Backward phase (storing the best value as a function of the parent's value):
b_{h_{L+1}}(h_L) = 1
For i = L-1 downto 1:
  b_{h_{i+1}}(h_i) = max_{h_{i+1}} P(h_{i+1} | h_i) P(x_{i+1} | h_{i+1}) b_{h_{i+2}}(h_{i+1})
  x*_{h_{i+1}}(h_i) = argmax_{h_{i+1}} P(h_{i+1} | h_i) P(x_{i+1} | h_{i+1}) b_{h_{i+2}}(h_{i+1})

Forward phase (tracing the MAP assignment):
h_1* = argmax_{h_1} P(h_1) P(x_1 | h_1) b_{h_2}(h_1)
For i = 1 to L-1:
  h_{i+1}* = x*_{h_{i+1}}(h_i*)
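A sketch of this backward-max / forward-trace formulation, assuming parameter dictionaries shaped like the earlier toy sketches. Probabilities are multiplied directly, so for long sequences one would work with log probabilities instead.

```python
def viterbi(x, states, p_start, p_trans, p_emit):
    """Most probable hidden path, in the slide's backward-max / forward-trace form.
    b[i][h]    = max over h_{i+1},...,h_L of P(x_{i+1},...,x_L, h_{i+1},...,h_L | H_i = h)
    best[i][h] = the value of h_{i+1} attaining that maximum (0-based position index i)."""
    L = len(x)
    b = [None] * L
    best = [None] * L
    b[L - 1] = {h: 1.0 for h in states}
    # Backward phase: store the best value and the best child as functions of the parent.
    for i in range(L - 2, -1, -1):
        b[i], best[i] = {}, {}
        for h in states:
            scores = {hp: p_trans[h][hp] * p_emit[hp][x[i + 1]] * b[i + 1][hp]
                      for hp in states}
            best[i][h] = max(scores, key=scores.get)
            b[i][h] = scores[best[i][h]]
    # Forward phase: trace the MAP assignment.
    h_star = [max(states, key=lambda h: p_start[h] * p_emit[h][x[0]] * b[0][h])]
    for i in range(L - 1):
        h_star.append(best[i][h_star[-1]])
    return h_star

# Usage with the toy 2-state parameters from the forward sketch:
# path = viterbi("CGCGAATT", states, p_start, p_trans, p_emit)
```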

Slide 27: Summary of HMM algorithms
1. Belief update (posterior decoding): the forward-backward algorithm.
2. Maximum a posteriori (MAP) assignment: the Viterbi algorithm.
3. Learning the parameters: the EM algorithm, or Viterbi training.

