EM with Many Random Variables; Another Example of EM; Sequence Alignment via HMM. Lecture #10. This class has been edited from Nir Friedman's lecture; changes made by Dan Geiger, then Shlomo Moran. Background readings: Chapters 11.6, 3.4, 3.5, and 4 in Durbin et al., 2001; Chapter 3.4 in Setubal et al., 1997.

2 EM for processes with many dice. In the previous class we presented the EM algorithm for the case where the parameters are probabilities associated with a single "die" (i.e., one probability space/random variable). In practical applications the model may include many dice (as in the HMM model). The generalization of the EM algorithm to many dice is rather straightforward, and is given next.

3 EM for processes with many dice. The model is defined by the parameters (random variables, or dice) and the simple events. Let the random variables be Z_l (l = 1,...,r), where Z_l has m_l values z_{l,1},...,z_{l,m_l} with probabilities {q_{lk} | k = 1,...,m_l}. Each simple event y corresponds to a sequence of outcomes (z_{l_1,k_1},...,z_{l_n,k_n}) of the random variables used in y. Let N_{lk}(y) = #(times z_{lk} appears in y).

4 EM for processes with many dice. Similarly to the single-die case, the probability of a simple event factors over the dice outcomes, so that log p(x,y|λ) = ∑_{l,k} N_{lk}(y) log λ_{lk}. Define N_{lk} as the expected value of N_{lk}(y), given x and θ: N_{lk} = E(N_{lk}(y)|x,θ) = ∑_y p(y|x,θ) N_{lk}(y). Then we have:

5 L_θ(λ) for processes with many dice: L_θ(λ) = ∑_y p(y|x,θ) log p(x,y|λ) = ∑_{l,k} N_{lk} log λ_{lk}.

6 EM algorithm for processes with many dice. Similarly to the one-die case we get:
Expectation step: set N_{lk} to E(N_{lk}(y)|x,θ), i.e., N_{lk} = ∑_y p(y|x,θ) N_{lk}(y).
Maximization step: set λ_{lk} = N_{lk} / (∑_{k'} N_{lk'}).

7 EM algorithm for n independent observations x_1,…,x_n. Expectation step: it can be shown that, if the x_j are independent, then the expected counts simply add up over the observations: N_{lk} = ∑_{j=1}^n E(N_{lk}(y)|x_j,θ) = ∑_{j=1}^n ∑_y p(y|x_j,θ) N_{lk}(y).
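The two slides above can be summarized in a single E-step plus M-step sketch over several dice and several independent observations. This is only an illustrative Python sketch: the function name em_round, the complete_events callback, and the data layout are assumptions of this example, not part of the lecture.

from collections import defaultdict

def em_round(observations, complete_events, params):
    """One E-step + M-step for a model with several 'dice'.

    observations    : observed data points x_1, ..., x_n
    complete_events : complete_events(x) yields (counts, weight) pairs, one per
                      hidden completion y of x, where counts[(l, k)] = N_lk(y)
                      and weight = p(x, y | params)
    params          : dict mapping die index l -> dict of outcome k -> probability
    (Assumes every die has positive expected total count.)
    """
    expected = defaultdict(float)              # N_lk, summed over all observations
    for x in observations:
        events = list(complete_events(x))
        px = sum(w for _, w in events)         # p(x | params) = sum_y p(x, y | params)
        for counts, w in events:
            post = w / px                      # p(y | x, params)
            for (l, k), n in counts.items():
                expected[(l, k)] += post * n   # E-step: accumulate expected counts
    new_params = {}
    for l, outcomes in params.items():         # M-step: normalize counts per die
        total = sum(expected[(l, k)] for k in outcomes)
        new_params[l] = {k: expected[(l, k)] / total for k in outcomes}
    return new_params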

8 EM – One More Example. The DNA of species on planet Melmek contains two letters: A and B. In the evolutionary process on Melmek, a mutation of a letter occurs in two steps: first the letter may be deleted, and if it is not deleted it may be changed to the other letter. The probabilities of these events are the unknown parameters of the model. There are two species on Melmek, S (for Son) and its direct ancestor F (Father). Scientists were able to deduce that the DNA of F contained a sequence of two letters "AX", where "X" is equally likely to be A or B (Prob(X=A) = Prob(X=B) = 0.5).
1. Describe the probability space defined by the evolution of the two letters "AX" in F into a sequence (of up to two letters) in S:
a. Write down all the "simple events".
b. For each event, write its probability (as a function of the parameters),
c. and its contribution to the six statistics used by the EM algorithm.

9 Example (cont.) For writing the simple events, we assume the following order of "dice tossing":
(a) decide whether the initial sequence is AA or AB (with probability 0.5 each);
(b) delete/replace the right letter (one or two tosses);
(c) the same for the left letter.
For instance, one simple event is the deletion of both letters (shown on the slide).

10 Example (cont.) Two more simple events which start with AA are shown on the slide. There are altogether 18 simple events: 9 which start with AA and 9 which start with AB.

11 Example (cont.) 2. Later information showed that the sequence "AX" of F evolved into the sequence "A" in S. For given parameters, write down the probability of the above scenario of evolution (of "AX" in F evolving into "A" in S).

12 Example (cont.) 3. Write a single round of the EM algorithm to estimate the parameters which maximize the likelihood of the above scenario, starting with arbitrary initial parameters. Show that the outcome is independent of the initial parameters.
Solution: calculate the counts of each outcome in each simple event (table on the slide). Regardless of the initial parameters, delete and remain have exactly the same expected count, hence each receives probability 0.5; also, A→A and B→A both get probability 1.
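A runnable sketch of this single EM round is given below. The parameter names p_del (deletion probability), p_AB = P(A→B | not deleted) and p_BA = P(B→A | not deleted) are assumed names for this illustration; the slides leave the symbols unspecified. The sketch enumerates the 18 simple events, keeps those consistent with the observed "A", and performs one E-step and M-step; the printed result is the same for any non-degenerate starting point.

from itertools import product

def simple_events(p_del, p_AB, p_BA):
    # Enumerate the 18 simple events: the initial pair is 'AA' or 'AB' (prob 0.5
    # each), then each letter is either deleted or survives as A or B.  Yields
    # (observed_string, probability, counts), where counts holds N_lk(y) for the
    # six statistics {del, remain, A->A, A->B, B->A, B->B}.
    fate_opts = {
        'A': [('del', p_del, None),
              ('A->A', (1 - p_del) * (1 - p_AB), 'A'),
              ('A->B', (1 - p_del) * p_AB, 'B')],
        'B': [('del', p_del, None),
              ('B->A', (1 - p_del) * p_BA, 'A'),
              ('B->B', (1 - p_del) * (1 - p_BA), 'B')],
    }
    for second in 'AB':                       # "AX": X = A or B
        for fates in product(*(fate_opts[c] for c in 'A' + second)):
            prob, observed = 0.5, ''
            counts = dict.fromkeys(['del', 'remain', 'A->A', 'A->B', 'B->A', 'B->B'], 0)
            for name, p, out in fates:
                prob *= p
                if out is None:
                    counts['del'] += 1        # the letter was deleted
                else:
                    counts['remain'] += 1     # the letter survived ...
                    counts[name] += 1         # ... and was copied or substituted
                    observed += out
            yield observed, prob, counts

def em_round_melmek(p_del, p_AB, p_BA, observed='A'):
    # One E-step + M-step, given that "AX" in F evolved into `observed` in S.
    consistent = [(p, c) for o, p, c in simple_events(p_del, p_AB, p_BA) if o == observed]
    px = sum(p for p, _ in consistent)        # probability of the observed scenario
    keys = ['del', 'remain', 'A->A', 'A->B', 'B->A', 'B->B']
    N = {k: sum((p / px) * c[k] for p, c in consistent) for k in keys}   # expected counts
    return (N['del'] / (N['del'] + N['remain']),       # new p_del
            N['A->B'] / (N['A->A'] + N['A->B']),       # new p_AB
            N['B->A'] / (N['B->A'] + N['B->B']))       # new p_BA

print(em_round_melmek(0.3, 0.7, 0.2))   # -> (0.5, 0.0, 1.0) for any non-degenerate start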

13 Example (end) 4. What are the parameters which maximize the likelihood of the above scenario? Justify your answer.
Solution 2: the above parameters are obtained after each iteration of the EM algorithm. By the EM correctness theorem, this must be the (unique) maximum.

14 EM for other discrete stochastic processes. The EM algorithm is applicable to a general scenario in which we wish to maximize p(x|λ) = ∑_y p(x,y|λ), where the experiment (x,y) is generated by a general "stochastic process". The only assumption we make is that the outcome of each experiment consists of a (finite) sequence of samplings of r discrete random variables (dice) Z_1,...,Z_r, where each Z_i may be sampled several times. This can be realized by a probabilistic acyclic state machine, where at each state some Z_i is sampled and the next state is determined by the outcome, until a final state is reached.

15 EM in Practice
Initial parameters:
- Random parameter setting
- "Best" guess from another source
Stopping criteria:
- Small change in the likelihood of the data
- Small change in the parameter values
Avoiding bad local maxima:
- Multiple restarts
- Early "pruning" of unpromising runs

16 Sequence Comparison using HMM. Recall: we used log-odds scoring functions for gapless alignments, as follows: s(a,b) = log(p_ab / (q_a q_b)), where p_ab and q_a are the probabilities under the "Match" and "Random" models, respectively. We now use an HMM to extend such log-odds scoring functions to alignments which may contain gaps (indels).
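As a tiny numeric illustration of the gapless log-odds score (the probability values below are made up for this example, not taken from the lecture):

import math

p_ab = 0.04            # P(a, b) under the "Match" model (assumed value)
q_a, q_b = 0.25, 0.15  # background frequencies under the "Random" model (assumed)
s_ab = math.log(p_ab / (q_a * q_b))
print(round(s_ab, 3))  # prints 0.065: positive, so the pair is slightly more likely under "Match"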

17 Sequence alignment using HMM. Each "output symbol" of the HMM is an aligned pair of two letters, or of a letter and a gap. Example (state diagram on the slide): insertion of a first gap in this model, with a match state M emitting (G,T) followed by a state X emitting (C,-). We still need to assign transition and emission probabilities.

18 Example (cont.): insertion of a first gap in this model (state diagram on the slide). We still need to assign transition and emission probabilities.

19 Need to define the hidden states, and the transition and emission probabilities, which define the probability of each aligned pair of sequences. Given two input sequences, we look for an alignment of these sequences of maximum probability.

20 Hidden states and emitted symbols
"Hidden" states:
- Match (M)
- Insertion in x (X)
- Insertion in y (Y)
Symbols emitted:
- Match: {(a,b) | a,b ∈ Σ}
- Insertion in x: {(a,-) | a ∈ Σ}
- Insertion in y: {(-,a) | a ∈ Σ}

21 Transition and Emission Probabilities
Transition probabilities (note the forbidden transitions X→Y and Y→X):
        to M    to X   to Y
from M: 1-2δ    δ      δ
from X: 1-ε     ε      0
from Y: 1-ε     0      ε
δ = probability for a 1st gap; ε = probability for a tailing gap.
Emission probabilities:
- Match: (a,b) with probability p_ab – only from state M
- Insertion in x: (a,-) with probability q_a – only from state X
- Insertion in y: (-,a) with probability q_a – only from state Y
(Note that the hidden states can be reconstructed from the alignment.)

22 Scoring alignments. For each pair of sequences x (of length m) and y (of length n), there are many alignments of x and y, each corresponding to a different state path (the lengths of the paths are between max{m,n} and m+n). Given the transition and emission probabilities, each alignment has a defined score: the product of the corresponding probabilities. An alignment is "most probable" if it maximizes this score.

23 Finding the most probable alignment. Let v^M(i,j) be the probability of the most probable alignment of x(1..i) and y(1..j) which ends with a match. Similarly, let v^X(i,j) and v^Y(i,j) be the probabilities of the most probable alignment of x(1..i) and y(1..j) which ends with an insertion in x or in y. Then, using a recursive argument, we get: v^M(i,j) = p_{x_i y_j} · max{ (1-2δ) v^M(i-1,j-1), (1-ε) v^X(i-1,j-1), (1-ε) v^Y(i-1,j-1) }.

24 Most probable alignment. By a similar argument, v^X(i,j) and v^Y(i,j), the probabilities of the most probable alignment of x(1..i) and y(1..j) which ends with an insertion in x or in y, are: v^X(i,j) = q_{x_i} · max{ δ v^M(i-1,j), ε v^X(i-1,j) }, and v^Y(i,j) = q_{y_j} · max{ δ v^M(i,j-1), ε v^Y(i,j-1) }.
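A minimal Python sketch of these recursions follows (for the model without an END state, as on the preceding slides). The function name, the toy emission tables p and q, and the starting convention vM(0,0) = 1 are assumptions of this illustration, not part of the lecture.

def most_probable_alignment_prob(x, y, p, q, delta, eps):
    # v*(i, j) = probability of the most probable alignment of x[1..i], y[1..j]
    # ending in state M, X or Y; log space would be preferable for long sequences.
    n, m = len(x), len(y)
    vM = [[0.0] * (m + 1) for _ in range(n + 1)]
    vX = [[0.0] * (m + 1) for _ in range(n + 1)]
    vY = [[0.0] * (m + 1) for _ in range(n + 1)]
    vM[0][0] = 1.0                      # assumed starting convention: begin in M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:         # end with a match emitting (x_i, y_j)
                vM[i][j] = p[(x[i-1], y[j-1])] * max(
                    (1 - 2*delta) * vM[i-1][j-1],
                    (1 - eps) * vX[i-1][j-1],
                    (1 - eps) * vY[i-1][j-1])
            if i > 0:                   # end with an insertion in x, emitting (x_i, -)
                vX[i][j] = q[x[i-1]] * max(delta * vM[i-1][j], eps * vX[i-1][j])
            if j > 0:                   # end with an insertion in y, emitting (-, y_j)
                vY[i][j] = q[y[j-1]] * max(delta * vM[i][j-1], eps * vY[i][j-1])
    return max(vM[n][m], vX[n][m], vY[n][m])

# Toy example with a two-letter alphabet and made-up probabilities:
q = {'A': 0.5, 'B': 0.5}
p = {(a, b): (0.45 if a == b else 0.05) for a in 'AB' for b in 'AB'}
print(most_probable_alignment_prob('AAB', 'AB', p, q, delta=0.1, eps=0.3))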

25 The Probability Space. Different alignments of x and y may have different lengths, so the probability space which we used earlier, for HMMs of a fixed length L, is not applicable to this alignment HMM model. However, there is a probability space which contains all infinite sequence alignments (finite alignments are compound events in this model), and the algorithm of the previous slides computes the correct probability of each alignment in this probability space. Another approach is to define a probability space which contains all alignments of finite length. In the following we adapt our algorithm to this model.

26 Adding termination probabilities. A probability space for all finite alignments is obtained by adding an END state, which denotes the end of the alignment. Each other state has a transition probability τ to the END state, and the last transition in each alignment is to the END state, with probability τ. This results in an expected sequence length of 1/τ. The transition probabilities become:
          to M      to X   to Y   to END
from M:   1-2δ-τ    δ      δ      τ
from X:   1-ε-τ     ε      0      τ
from Y:   1-ε-τ     0      ε      τ
from END: 0         0      0      1

27 The log-odds scoring function
- We wish to compare the "model" alignment score to the "random" alignment score.
- For gapless alignments we used the log-odds ratio: s(a,b) = log(p_ab / (q_a q_b)).
- To adapt this to the HMM model, we need to model random sequences by an HMM with an END state.

28 Scoring function for the random model. The transition probabilities for the random model R, with termination probability η (X is the start state):
          to X    to Y    to END
from X:   1-η     η       0
from Y:   0       1-η     η
from END: 0       0       1
The emission probability for a letter a is q_a. Thus the probability of x (of length n) and y (of length m) being random is (as in Durbin et al., Chapter 4.1) P(x,y|R) = η² (1-η)^{n+m} ∏_i q_{x_i} ∏_j q_{y_j}, and the corresponding (log) score is 2 log η + (n+m) log(1-η) + ∑_i log q_{x_i} + ∑_j log q_{y_j}.

29 Markov matrices for "Random" and "Model".
"Model":
          to M      to X   to Y   to END
from M:   1-2δ-τ    δ      δ      τ
from X:   1-ε-τ     ε      0      τ
from Y:   1-ε-τ     0      ε      τ
from END: 0         0      0      1
"Random":
          to X    to Y    to END
from X:   1-η     η       0
from Y:   0       1-η     η
from END: 0       0       1

30 Combining models in the log-odds scoring function. In order to compare the M score to the R score of sequences x and y, we can find an optimal M score and then subtract the R score from it. This is insufficient when we look for local alignments, where the optimal substrings in the alignment are not known in advance. A better way:
1. Define a log-odds scoring function which keeps track of the difference between the Match and Random scores of the partial strings during the alignment.
2. At the end, add (log τ – 2 log η) to the score to compensate for the end transitions in both models.
We get the following:

31 The log-odds scoring function (recursion shown on the slide). At the end, add (log τ – 2 log η) to the score (assuming that letters at insertions/deletions are selected by the random model).

32 The log-odds scoring function. Another way, with uniform scoring for the M state (Durbin et al., Chapter 4.1): define a scoring function s with penalties d and e for a first gap and a tailing gap, respectively. Then modify the algorithm to correct for the extra "prepayment", as follows:

33 Log-odds alignment algorithm. Initialization: V^M(0,0) = log τ – 2 log η. Recursion: as on the slide (the sketch below shows the standard form). Termination: V = max{ V^M(m,n), V^X(m,n)+c, V^Y(m,n)+c }, where c = log(1-2δ-τ) – log(1-ε-τ).
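A sketch of the corresponding log-odds dynamic program, in the spirit of Durbin et al., Chapter 4.1, is given below. The substitution score s(a,b) and the gap penalties d (first gap) and e (tailing gap) are taken as inputs, derived from the HMM parameters as in the book; the boundary handling here is one reasonable choice and is not claimed to be identical to the slide.

import math

NEG_INF = float('-inf')

def log_odds_align(x, y, s, d, e, delta, eps, tau, eta):
    n, m = len(x), len(y)
    VM = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    VX = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    VY = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    VM[0][0] = math.log(tau) - 2 * math.log(eta)       # initialization, as on the slide
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:     # match column
                VM[i][j] = s(x[i-1], y[j-1]) + max(VM[i-1][j-1], VX[i-1][j-1], VY[i-1][j-1])
            if i > 0:               # gap in y: opening costs d, extending costs e
                VX[i][j] = max(VM[i-1][j] - d, VX[i-1][j] - e)
            if j > 0:               # gap in x
                VY[i][j] = max(VM[i][j-1] - d, VY[i][j-1] - e)
    c = math.log(1 - 2*delta - tau) - math.log(1 - eps - tau)   # end correction, as on the slide
    return max(VM[n][m], VX[n][m] + c, VY[n][m] + c)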

34 Total probability of x and y. Rather than computing the probability of the most probable alignment, we look for the total probability that x and y are related by our model. Let f^M(i,j) be the sum of the probabilities of all alignments of x(1..i) and y(1..j) which end with a match. Similarly, f^X(i,j) and f^Y(i,j) are the sums over those alignments which end with an insertion in x (resp. y). A "forward"-type algorithm computes these functions. Initialization: f^M(0,0)=1, f^X(0,0)=f^Y(0,0)=0 (we start from M, but we could select another initial state).

35 Total probability of x and y (cont.) The total probability of all alignments is: P(x,y|model) = f^M[m,n] + f^X[m,n] + f^Y[m,n].
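A minimal sketch of this forward-type computation is given below, reusing the toy emission tables from the Viterbi sketch above. With tau = 0 it matches the model without an END state; passing tau > 0 uses the 1-2δ-τ and 1-ε-τ transition entries, and for the finite-alignment model one would additionally multiply the final sum by τ (that extra factor is an assumption of this note, not stated on the slide).

def total_alignment_prob(x, y, p, q, delta, eps, tau=0.0):
    # f*(i, j) = sum of the probabilities of all alignments of x[1..i], y[1..j]
    # ending in state M, X or Y (same recursion as Viterbi, with max replaced by sum).
    n, m = len(x), len(y)
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]
    fM[0][0] = 1.0                                  # start from M, as on the slide
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fM[i][j] = p[(x[i-1], y[j-1])] * (
                    (1 - 2*delta - tau) * fM[i-1][j-1]
                    + (1 - eps - tau) * (fX[i-1][j-1] + fY[i-1][j-1]))
            if i > 0:
                fX[i][j] = q[x[i-1]] * (delta * fM[i-1][j] + eps * fX[i-1][j])
            if j > 0:
                fY[i][j] = q[y[j-1]] * (delta * fM[i][j-1] + eps * fY[i][j-1])
    return fM[n][m] + fX[n][m] + fY[n][m]           # P(x, y | model), as on the slide

q = {'A': 0.5, 'B': 0.5}
p = {(a, b): (0.45 if a == b else 0.05) for a in 'AB' for b in 'AB'}
print(total_alignment_prob('AAB', 'AB', p, q, delta=0.1, eps=0.3))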