Lecture #10: Correctness Proof of EM, Variants of HMM, and Sequence Alignment via HMM. This class has been edited from Nir Friedman's lecture; changes made by Dan Geiger, then Shlomo Moran. Background readings: chapters 11.6, 3.4, 3.5, and 4 in Biological Sequence Analysis, Durbin et al., 2001; chapter 3.4 in Setubal et al., 1997.

2. The EM algorithm. In each iteration the EM algorithm does the following:
- (E step): Calculate Q_θ(λ) = ∑_y p(y|x,θ) log p(x,y|λ).
- (M step): Find the λ* which maximizes Q_θ(λ).
(The next iteration sets θ ← λ* and repeats.)
Comment: At the M-step we only need that Q_θ(λ*) > Q_θ(θ). This change yields the so-called Generalized EM algorithm. It is important when it is hard to find the optimal λ*.
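To make the two steps concrete, here is a minimal sketch of EM for a toy model, a mixture of two biased coins, where the hidden variable y is which coin produced each observed run of tosses. The data, variable names, and parameter values are illustrative assumptions, not part of the lecture.

```python
import numpy as np

# Toy data: each entry is the number of heads out of n_tosses for one observed run.
# Hidden variable y = which of the two coins generated the run (illustrative data).
heads = np.array([9, 8, 2, 1, 7])
n_tosses = 10

theta = np.array([0.6, 0.4])   # initial guess for the two coins' head probabilities

for _ in range(50):
    # E step: posterior p(y | x, theta) for each run and each coin
    # (binomial likelihoods up to a constant that cancels in the normalization).
    lik = np.array([t**heads * (1 - t)**(n_tosses - heads) for t in theta])  # shape (2, 5)
    post = lik / lik.sum(axis=0)                                             # p(y | x, theta)

    # M step: maximize Q_theta(lambda) = sum_y p(y|x,theta) log p(x,y|lambda);
    # for binomial emissions this is a posterior-weighted head-frequency estimate.
    theta = (post @ heads) / (post.sum(axis=1) * n_tosses)

print(theta)  # the two estimated head probabilities
```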

3. Correctness proof of EM. Theorem: If λ* maximizes Q_θ(λ) = ∑_y p(y|x,θ) log p(x,y|λ), then P(x|λ*) ≥ P(x|θ). Comment: The proof remains valid if we assume only that Q_θ(λ*) ≥ Q_θ(θ).

4. Proof. By the definition of conditional probability, for each y we have p(x|λ) p(y|x,λ) = p(y,x|λ), and hence log p(x|λ) = log p(y,x|λ) − log p(y|x,λ). Multiplying both sides by p(y|x,θ) and summing over y (the left-hand side is unchanged, since ∑_y p(y|x,θ) = 1), we get:
log p(x|λ) = ∑_y p(y|x,θ) [log p(y,x|λ) − log p(y|x,λ)].

5. Proof (end). Splitting the sum,
log p(x|λ) = ∑_y p(y|x,θ) log p(y,x|λ) − ∑_y p(y|x,θ) log p(y|x,λ),
and the first sum is exactly Q_θ(λ). Substituting λ = λ* and λ = θ and subtracting, we get
log p(x|λ*) − log p(x|θ) = [Q_θ(λ*) − Q_θ(θ)] + D( p(y|x,θ) || p(y|x,λ*) ) ≥ 0,
since Q_θ(λ*) ≥ Q_θ(θ) (λ* maximizes Q_θ) and the relative entropy D is always ≥ 0. QED
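The decomposition used above can be checked numerically on any small joint distribution. The following is a minimal sketch (the toy tables and variable names are assumptions for illustration): it builds an arbitrary joint p(x,y|λ), fixes an observed x, and confirms that log p(x|λ) = Q_θ(λ) − ∑_y p(y|x,θ) log p(y|x,λ).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_joint(shape=(3, 4)):
    """A toy joint distribution p(x, y | parameters); rows = x, columns = y."""
    p = rng.random(shape)
    return p / p.sum()

p_theta = random_joint()    # joint under the current parameters theta
p_lam = random_joint()      # joint under candidate parameters lambda
x = 1                       # the observed value of x

post_theta = p_theta[x] / p_theta[x].sum()   # p(y | x, theta)
post_lam = p_lam[x] / p_lam[x].sum()         # p(y | x, lambda)

Q = np.sum(post_theta * np.log(p_lam[x]))    # Q_theta(lambda), restricted to this x
H = np.sum(post_theta * np.log(post_lam))    # sum_y p(y|x,theta) log p(y|x,lambda)

print(np.log(p_lam[x].sum()), Q - H)         # both print log p(x | lambda)
```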

6. EM in practice.
Initial parameters:
- Random parameter setting.
- "Best" guess from another source.
Stopping criteria:
- Small change in the likelihood of the data.
- Small change in parameter values.
Avoiding bad local maxima:
- Multiple restarts.
- Early "pruning" of unpromising runs.
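A minimal sketch of these practices, assuming a generic `run_em(params)` routine that performs one EM iteration and returns the updated parameters together with the data log-likelihood (the function name and interface are assumptions for illustration):

```python
import numpy as np

def em_with_restarts(run_em, init_params, n_restarts=10, tol=1e-6, max_iter=500):
    """Run EM from several starting points and keep the best local maximum found."""
    best_params, best_ll = None, -np.inf
    for _ in range(n_restarts):
        params = init_params()                 # random (or informed) initial parameters
        prev_ll = -np.inf
        for _ in range(max_iter):
            params, ll = run_em(params)        # one E step + M step
            if ll - prev_ll < tol:             # stop on a small change in likelihood
                break
            prev_ll = ll
        if ll > best_ll:                       # keep the restart with the highest likelihood
            best_params, best_ll = params, ll
    return best_params, best_ll
```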

7. HMM model structure: 1. Duration modeling. Markov chains are rather limited in describing sequences of symbols with non-random structure. For instance, a Markov chain forces the distribution of segments in which some state is repeated for k additional times to be geometric, (1−p)p^k for some p. Several constructions allow modeling of other distributions. One is assigning k > 1 states to represent the same "real" state; this makes it possible to guarantee k repetitions, with any desired probability. (Figure: a chain of states A1 → A2 → A3 → A4 representing the same real state.)
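As an illustration of the difference, the sketch below compares the duration distribution of a single state with self-loop probability p (geometric) to that of a chain of k copies of the state, each with the same self-loop probability, which yields a negative-binomial-shaped distribution with a guaranteed minimum duration of k. The parameter values are assumptions for illustration.

```python
from math import comb

p = 0.8   # self-loop probability (illustrative)
k = 4     # number of copies of the "real" state in the chain

def geometric_duration(d, p):
    """P(the state is occupied for exactly d steps) for a single self-looping state."""
    return (1 - p) * p ** (d - 1)

def chain_duration(d, p, k):
    """P(total time spent in the k-state chain is exactly d steps), d >= k.
    Negative binomial: choose which d - k of the first d - 1 steps are self-loops."""
    if d < k:
        return 0.0
    return comb(d - 1, k - 1) * ((1 - p) ** k) * (p ** (d - k))

for d in range(1, 12):
    print(d, round(geometric_duration(d, p), 4), round(chain_duration(d, p, k), 4))
```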

8. HMM model structure: 2. Silent states.
- States which do not emit symbols (as we saw in the ABO locus example).
- Can be used to model duration distributions.
- Also used to allow arbitrary jumps (e.g., to model deletions).
- The Forward and Backward algorithms need to be generalized to arbitrary acyclic digraphs to account for the silent states (see next slide).
(Figure legend: silent states vs. regular states.)

9. E.g., the Forward algorithm is modified so that, after the emitting states at each position are processed, probability is propagated through the silent states in topological order (they consume no symbol). Directed cycles of silent (or other non-emitting) states complicate things and should be avoided. (Figure: a graph of silent and regular states over the symbols of the sequence.)
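A minimal sketch of this generalization, under assumed data structures (the state lists, transition dictionary, and emission table below are illustrative, not the lecture's notation): emitting states advance one symbol at a time, while silent states, processed in topological order, accumulate probability within the same position.

```python
# Illustrative model: states are strings; silent states emit nothing.
emitting = ["r1", "r2"]
silent = ["s1", "s2"]            # assumed to be listed in topological order
states = emitting + silent
trans = {                         # trans[u][v] = P(u -> v); missing entries are 0
    "r1": {"r1": 0.5, "s1": 0.3, "r2": 0.2},
    "r2": {"r2": 0.6, "s2": 0.4},
    "s1": {"r2": 0.7, "s2": 0.3},
    "s2": {"r1": 1.0},
}
emit = {"r1": {"A": 0.7, "B": 0.3}, "r2": {"A": 0.2, "B": 0.8}}
start = {"r1": 0.5, "r2": 0.5}   # start in an emitting state, for simplicity

def forward_with_silent(x):
    """Forward algorithm generalized to silent states (sums over all state paths)."""
    f = [{s: 0.0 for s in states} for _ in range(len(x))]
    for i, sym in enumerate(x):
        # 1. Emitting states: collect incoming probability from the previous column.
        for v in emitting:
            if i == 0:
                inc = start.get(v, 0.0)
            else:
                inc = sum(f[i - 1][u] * trans.get(u, {}).get(v, 0.0) for u in states)
            f[i][v] = inc * emit[v].get(sym, 0.0)
        # 2. Silent states: propagate within the same column, in topological order.
        for v in silent:
            f[i][v] = sum(f[i][u] * trans.get(u, {}).get(v, 0.0) for u in states)
    return sum(f[-1].values())    # total probability of x (no explicit end state here)

print(forward_with_silent("ABA"))
```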

10. HMM model structure: 3. Higher-order Markov chains. These are Markov chains in which the transition probabilities depend on the last k states: P(x_i | x_{i−1},...,x_1) = P(x_i | x_{i−1},...,x_{i−k}). Such a chain can be represented by a standard (first-order) Markov chain with more states, one for each k-tuple of original states. (Figure: for k = 2 over the alphabet {A, B}, the states are AA, AB, BA, BB.)
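A minimal sketch of this construction for k = 2 (the second-order table below is an illustrative assumption): each pair of consecutive symbols becomes one state of a first-order chain, and the pair state (a, b) can only move to a pair state of the form (b, c).

```python
from itertools import product

# Illustrative second-order probabilities: p2[(prev2, prev1)][next] = P(next | prev2, prev1)
p2 = {
    ("A", "A"): {"A": 0.7, "B": 0.3},
    ("A", "B"): {"A": 0.4, "B": 0.6},
    ("B", "A"): {"A": 0.5, "B": 0.5},
    ("B", "B"): {"A": 0.1, "B": 0.9},
}

# Equivalent first-order chain on pair states: (a, b) -> (b, c) with probability P(c | a, b).
first_order = {}
for (a, b), (b2, c) in product(p2.keys(), repeat=2):
    if b == b2:                              # only transitions that agree on the shared symbol
        first_order[((a, b), (b2, c))] = p2[(a, b)][c]

for (u, v), prob in sorted(first_order.items()):
    print(u, "->", v, prob)
```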

11. HMM model structure: 4. Inhomogeneous Markov chains.
- An important task in analyzing DNA sequences is recognizing the genes which code for proteins.
- A triplet of 3 nucleotides, called a codon, codes for an amino acid (see next slide).
- It is known that in the parts of DNA which code for genes, the three codon positions have different statistics.
- Thus a Markov chain model for coding DNA should represent not only the nucleotide (A, C, G, or T) but also its codon position; the same nucleotide in different positions will have different transition probabilities (see the sketch after this list).
This idea is used in the GENEMARK gene-finding program (1993).
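A minimal sketch of such an inhomogeneous (3-periodic) chain, assuming illustrative transition tables: one transition matrix per codon position, selected by the position of the current nucleotide modulo 3.

```python
import numpy as np

nucs = "ACGT"
idx = {n: i for i, n in enumerate(nucs)}
rng = np.random.default_rng(1)

def random_transition_matrix():
    """An illustrative 4x4 row-stochastic matrix, standing in for trained parameters."""
    m = rng.random((4, 4))
    return m / m.sum(axis=1, keepdims=True)

# One transition matrix per codon position: T[k][a, b] = P(next = b | current = a, position k).
T = [random_transition_matrix() for _ in range(3)]

def log_prob_coding(seq, start=np.full(4, 0.25)):
    """Log-probability of a sequence under the 3-periodic (inhomogeneous) chain."""
    lp = np.log(start[idx[seq[0]]])
    for i in range(len(seq) - 1):
        a, b = idx[seq[i]], idx[seq[i + 1]]
        lp += np.log(T[i % 3][a, b])      # the transition table depends on the codon position
    return lp

print(log_prob_coding("ATGGCCATTGTA"))
```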

12. The genetic code. There are 20 amino acids from which proteins are built.

13. Sequence comparison using HMM. An HMM for sequence alignment which incorporates affine gap scores.
"Hidden" states:
- Match (M).
- Insertion in x (X).
- Insertion in y (Y).
Symbols emitted:
- Match: {(a,b) | a,b in Σ}.
- Insertion in x: {(a,−) | a in Σ}.
- Insertion in y: {(−,a) | a in Σ}.

14. The transition probabilities (note the forbidden ones); rows are the current state, columns the next state:

        M      X      Y
  M   1−2δ     δ      δ
  X   1−ε      ε      0
  Y   1−ε      0      ε

- δ = probability of opening a gap (the first gap symbol).
- ε = probability of extending a gap (a trailing gap symbol).
Emission probabilities:
- Match: (a,b) with probability p_ab — only from the M state.
- Insertion in x: (a,−) with probability q_a — only from the X state.
- Insertion in y: (−,a) with probability q_a — only from the Y state.
(Note that the hidden states can be reconstructed from the alignment.)

15. Scoring alignments. Each aligned pair of sequences is generated by the above HMM with a certain probability. For each pair of sequences x (of length m) and y (of length n), there are many alignments of x and y, each corresponding to a different state path (the lengths of the paths are between max{m,n} and m+n). Our task is to score alignments using this model; the score should reflect the probability of the alignment.

16. Most probable alignment. Let v^M(i,j) be the probability of the most probable alignment of x(1..i) and y(1..j) which ends with a match. Similarly, let v^X(i,j) and v^Y(i,j) be the probabilities of the most probable alignments of x(1..i) and y(1..j) which end with an insertion in x or in y, respectively. Then, using a recursive argument, we get:
v^M(i,j) = p_{x_i y_j} · max{ (1−2δ) v^M(i−1,j−1), (1−ε) v^X(i−1,j−1), (1−ε) v^Y(i−1,j−1) }.

17. Most probable alignment (cont.). By a similar argument, v^X(i,j) and v^Y(i,j), the probabilities of the most probable alignments of x(1..i) and y(1..j) which end with an insertion in x or in y, are:
v^X(i,j) = q_{x_i} · max{ δ v^M(i−1,j), ε v^X(i−1,j) },
v^Y(i,j) = q_{y_j} · max{ δ v^M(i,j−1), ε v^Y(i,j−1) }.
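A minimal sketch of this Viterbi-style recursion in code, using illustrative parameter values and emission probabilities (p_ab, q_a, δ, ε below are placeholders, not trained values) and working in probability space rather than log space for clarity:

```python
import numpy as np

delta, eps = 0.2, 0.3          # gap-open and gap-extend probabilities (illustrative)

def p_match(a, b):
    """Illustrative match emission p_ab: favors identical letters."""
    return 0.2 if a == b else 0.05

def q(a):
    """Illustrative single-letter emission q_a (uniform over a 4-letter alphabet)."""
    return 0.25

def most_probable_alignment_prob(x, y):
    m, n = len(x), len(y)
    vM = np.zeros((m + 1, n + 1))
    vX = np.zeros((m + 1, n + 1))
    vY = np.zeros((m + 1, n + 1))
    vM[0, 0] = 1.0                               # start "as if" from the M state
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:                  # match x_i with y_j
                vM[i, j] = p_match(x[i - 1], y[j - 1]) * max(
                    (1 - 2 * delta) * vM[i - 1, j - 1],
                    (1 - eps) * vX[i - 1, j - 1],
                    (1 - eps) * vY[i - 1, j - 1])
            if i > 0:                            # emit x_i against a gap
                vX[i, j] = q(x[i - 1]) * max(delta * vM[i - 1, j], eps * vX[i - 1, j])
            if j > 0:                            # emit y_j against a gap
                vY[i, j] = q(y[j - 1]) * max(delta * vM[i, j - 1], eps * vY[i, j - 1])
    return max(vM[m, n], vX[m, n], vY[m, n])     # probability of the best state path

print(most_probable_alignment_prob("ACGT", "AGT"))
```

For real sequences one would work in log space to avoid underflow, and include the END-state transitions introduced on the next slide.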

18. Adding termination probabilities. We may want a model which defines a probability distribution over all possible sequences. For this, an END state is added, with transition probability τ from any other state to END; the last transition in each alignment is to the END state, with probability τ. This assumes an expected sequence length of 1/τ. The transition probabilities become:

          M        X        Y      END
  M    1−2δ−τ      δ        δ       τ
  X    1−ε−τ       ε        0       τ
  Y    1−ε−τ       0        ε       τ
  END     0        0        0       1

19. The log-odds scoring function.
- We wish to know whether the alignment score is above or below the score of a random alignment.
- For gapless alignments we used the log-odds ratio s(a,b) = log(p_ab / (q_a q_b)); s(a,b) > 0 iff the probability that a and b are related under our model is larger than the probability that they are picked at random.
- To adapt this to the HMM model, we need to model random sequences by an HMM with an end state. This model assigns a probability to each pair of sequences x and y of arbitrary lengths m and n.

20. Scoring function for the random model. The transition probabilities for the random model, with termination probability η (X is the start state):

          X        Y      END
  X     1−η       η        0
  Y      0       1−η       η
  END    0        0        1

The emission probability for a letter a is q_a. Thus the probability that x (of length n) and y (of length m) are random is
P(x,y|R) = η(1−η)^n ∏_i q_{x_i} · η(1−η)^m ∏_j q_{y_j},
and the corresponding log-odds score of an alignment is the logarithm of the ratio between its probability under the alignment model and this random-model probability.
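A minimal sketch computing this random-model probability (η and the uniform q_a values are illustrative assumptions):

```python
import numpy as np

eta = 0.1                      # termination probability of the random model (illustrative)
q = {c: 0.25 for c in "ACGT"}  # illustrative background letter frequencies

def log_prob_random(x, y):
    """log P(x, y | R) = log[ eta (1-eta)^n prod_i q_xi * eta (1-eta)^m prod_j q_yj ]."""
    n, m = len(x), len(y)
    lp = 2 * np.log(eta) + (n + m) * np.log(1 - eta)
    lp += sum(np.log(q[c]) for c in x) + sum(np.log(q[c]) for c in y)
    return lp

print(log_prob_random("ACGT", "AGT"))
```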

21. Markov chains for "Model" and "Random".

"Model":
          M        X        Y      END
  M    1−2δ−τ      δ        δ       τ
  X    1−ε−τ       ε        0       τ
  Y    1−ε−τ       0        ε       τ
  END     0        0        0       1

"Random":
          X        Y      END
  X     1−η       η        0
  Y      0       1−η       η
  END    0        0        1

22. Combining the models in the log-odds scoring function. To compare the Model score to the Random score of sequences x and y, we could find an optimal Model score and then subtract the Random score from it. This is insufficient when we look for local alignments, where the optimal substrings of the alignment are not known in advance. A better way:
1. Define a log-odds scoring function which keeps track of the difference between the Model and Random scores of the partial strings during the alignment.
2. At the end, add (log τ − 2 log η) to the score.
We get the following:

23. The log-odds scoring function. The recursion tracks, for each pair (i,j) and each ending state, the accumulated log-odds score of the partial alignment; at the end, (log τ − 2 log η) is added to the score (assuming that letters at insertions/deletions are selected by the random model). A code sketch of the full procedure is given after slide 25.

24. The log-odds scoring function. Another way (Durbin et al., Chapter 4.1): define a scoring function s together with penalties d and e for a first (gap-opening) gap symbol and a trailing (gap-extension) gap symbol, respectively, and then modify the algorithm to correct for the extra "prepayment", as follows:

25. Log-odds alignment algorithm.
Initialization: V^M(0,0) = log τ − 2 log η.
Termination: V = max{ V^M(m,n), V^X(m,n) + c, V^Y(m,n) + c }, where c = log(1−2δ−τ) − log(1−ε−τ).
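A minimal sketch of the log-odds alignment DP, with the initialization and termination given on this slide. The model parameters and emission probabilities are illustrative placeholders, and the definitions of s, d, and e in terms of the model parameters follow Durbin et al., chapter 4 (they did not appear in this transcript and are included here as an assumption); the recursion used is the standard affine-gap log-odds dynamic program.

```python
import numpy as np

# Illustrative model parameters (not trained values).
delta, eps, tau, eta = 0.2, 0.3, 0.1, 0.1
q = {c: 0.25 for c in "ACGT"}                      # background letter frequencies

def p_match(a, b):                                  # illustrative match emission p_ab
    return 0.15 if a == b else 0.0333

# Scores per Durbin et al., chapter 4 (assumed here, not shown on the slide):
def s(a, b):
    return np.log(p_match(a, b) / (q[a] * q[b])) + np.log((1 - 2*delta - tau) / (1 - eta)**2)

d = -np.log(delta * (1 - eps - tau) / ((1 - eta) * (1 - 2*delta - tau)))   # gap-open penalty
e = -np.log(eps / (1 - eta))                                               # gap-extend penalty
c = np.log(1 - 2*delta - tau) - np.log(1 - eps - tau)                      # termination correction

def log_odds_align(x, y):
    m, n = len(x), len(y)
    VM = np.full((m + 1, n + 1), -np.inf)
    VX = np.full((m + 1, n + 1), -np.inf)
    VY = np.full((m + 1, n + 1), -np.inf)
    VM[0, 0] = np.log(tau) - 2 * np.log(eta)       # initialization from slide 25
    for i in range(m + 1):
        for j in range(n + 1):
            if i > 0 and j > 0:
                VM[i, j] = s(x[i-1], y[j-1]) + max(VM[i-1, j-1], VX[i-1, j-1], VY[i-1, j-1])
            if i > 0:
                VX[i, j] = max(VM[i-1, j] - d, VX[i-1, j] - e)
            if j > 0:
                VY[i, j] = max(VM[i, j-1] - d, VY[i, j-1] - e)
    # Termination from slide 25: correct the "prepayment" for alignments ending in a gap state.
    return max(VM[m, n], VX[m, n] + c, VY[m, n] + c)

print(log_odds_align("ACGT", "AGT"))
```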

26. Total probability of x and y. Rather than computing the probability of the most probable alignment, we now look for the total probability that x and y are related by our model. Let f^M(i,j) be the sum of the probabilities of all alignments of x(1..i) and y(1..j) which end with a match. Similarly, f^X(i,j) and f^Y(i,j) are the sums over those alignments which end with an insertion in x (resp. y). A "forward"-type algorithm computes these functions; the recursions are obtained from the Viterbi recursions by replacing max with a sum. Initialization: f^M(0,0) = 1, f^X(0,0) = f^Y(0,0) = 0 (we start from M, but we could select another initial state).

27. Total probability of x and y (cont.). The total probability over all alignments is: P(x,y | model) = f^M(m,n) + f^X(m,n) + f^Y(m,n).
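A minimal sketch of this forward computation, again with illustrative δ, ε and emission probabilities and without the END state, for simplicity:

```python
import numpy as np

delta, eps = 0.2, 0.3                      # illustrative gap-open / gap-extend probabilities

def p_match(a, b):                          # illustrative match emission p_ab
    return 0.2 if a == b else 0.05

def q(a):                                   # illustrative single-letter emission q_a
    return 0.25

def total_alignment_prob(x, y):
    """Forward algorithm for the pair HMM: the Viterbi recursion with max replaced by sum."""
    m, n = len(x), len(y)
    fM = np.zeros((m + 1, n + 1))
    fX = np.zeros((m + 1, n + 1))
    fY = np.zeros((m + 1, n + 1))
    fM[0, 0] = 1.0                          # initialization: start from the M state
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                fM[i, j] = p_match(x[i-1], y[j-1]) * (
                    (1 - 2*delta) * fM[i-1, j-1] + (1 - eps) * (fX[i-1, j-1] + fY[i-1, j-1]))
            if i > 0:
                fX[i, j] = q(x[i-1]) * (delta * fM[i-1, j] + eps * fX[i-1, j])
            if j > 0:
                fY[i, j] = q(y[j-1]) * (delta * fM[i, j-1] + eps * fY[i, j-1])
    return fM[m, n] + fX[m, n] + fY[m, n]   # P(x, y | model), summed over all alignments

print(total_alignment_prob("ACGT", "AGT"))
```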