Hidden Markov Models for Genetic Linkage Analysis. Lecture #4. Prepared by Dan Geiger.

2. Hidden Markov Models in General
[Figure: a chain of hidden states S_1, S_2, ..., S_i, ... with an observation X_i attached to each S_i.]
The graph depicts the factorization:
P(s_1,...,s_L, x_1,...,x_L) = P(s_1) P(x_1 | s_1) ∏_{i=2..L} P(s_i | s_{i-1}) P(x_i | s_i).

3. Hidden Markov Model in Our Case
[Figure: the same chain, with hidden inheritance vectors S_1,...,S_L, marker data X_1,...,X_L, and Y denoting the data at the disease locus.]
The compounded variable S_i = (S_{i,1},...,S_{i,2n}) is called the inheritance vector. It has 2^{2n} states, where n is the number of persons that have parents in the pedigree (non-founders). The compounded variable X_i = (X_{i,1},...,X_{i,2n}) is the data regarding locus i; similarly, for the disease locus we use Y_i. To specify the HMM we need to write down the transition matrices from S_{i-1} to S_i and the matrices P(x_i | S_i). Note that these quantities have already been implicitly defined.

4. Queries of Interest (MAP)
The Maximum A Posteriori query: find (h_1*,...,h_L*) = argmax_{h_1,...,h_L} P(h_1,...,h_L | x_1,...,x_L).
An efficient solution, assuming the local probability tables ("the parameters") are known, is called the Viterbi algorithm. The problem is the same if we instead maximize the joint distribution P(h_1,...,h_L, x_1,...,x_L). An answer to this query gives the most probable inheritance vectors for all locations.

5. Queries of Interest (Belief Update): Posterior Decoding
1. Compute the posterior belief in H_i (for a specific i) given the evidence {x_1,...,x_L}, for each of H_i's values h_i; namely, compute P(h_i | x_1,...,x_L).
2. Do the same computation for every H_i, but without repeating the first task L times.
Local probability tables are assumed to be known. An answer to this query gives the probability distribution over inheritance vectors at an arbitrary location.

6. Decomposing the Computation of Belief Update (Posterior Decoding)
P(x_1,...,x_L, h_i) = P(x_1,...,x_i, h_i) P(x_{i+1},...,x_L | x_1,...,x_i, h_i)
                    = P(x_1,...,x_i, h_i) P(x_{i+1},...,x_L | h_i)  ≡  f(h_i) b(h_i)
The second equality holds because Ind({x_{i+1},...,x_L}; {x_1,...,x_i} | H_i): the later evidence is independent of the earlier evidence given H_i.
Belief update: P(h_i | x_1,...,x_L) = (1/K) P(x_1,...,x_L, h_i), where K = Σ_{h_i} P(x_1,...,x_L, h_i).

7. The Forward Algorithm
The task: compute f(h_i) = P(x_1,...,x_i, h_i) for i = 1,...,L (namely, considering the evidence up to time slot i).
Basis step: P(x_1, h_1) = P(h_1) P(x_1 | h_1).
Second step:
P(x_1, x_2, h_2) = Σ_{h_1} P(x_1, h_1, h_2, x_2)
                 = Σ_{h_1} P(x_1, h_1) P(h_2 | x_1, h_1) P(x_2 | x_1, h_1, h_2)
                 = Σ_{h_1} P(x_1, h_1) P(h_2 | h_1) P(x_2 | h_2)
(the last equality is due to conditional independence).
Step i: P(x_1,...,x_i, h_i) = Σ_{h_{i-1}} P(x_1,...,x_{i-1}, h_{i-1}) P(h_i | h_{i-1}) P(x_i | h_i).
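As a concrete reference, here is a minimal NumPy sketch of this recursion. The conventions are assumptions for illustration only: init[h] = P(h_1 = h), trans[s, t] = P(h_i = t | h_{i-1} = s), emit[h, x] = P(x | h), and the observations xs are integer-coded.

```python
import numpy as np

def forward(init, trans, emit, xs):
    """f[i, h] = P(x_1, ..., x_{i+1}, H_{i+1} = h), with 0-based index i."""
    L, k = len(xs), len(init)
    f = np.zeros((L, k))
    f[0] = init * emit[:, xs[0]]                   # basis: P(h_1) P(x_1 | h_1)
    for i in range(1, L):
        # step i: sum_{h_{i-1}} f(h_{i-1}) P(h_i | h_{i-1}) P(x_i | h_i)
        f[i] = (f[i - 1] @ trans) * emit[:, xs[i]]
    return f
```

For long chains the entries underflow, so a practical version would rescale each row (or work in log space) and keep the scaling factors.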

8. The Backward Algorithm
The task: compute b(h_i) = P(x_{i+1},...,x_L | h_i) for i = L-1,...,1 (namely, considering the evidence after time slot i).
First step:
b(h_{L-1}) = P(x_L | h_{L-1}) = Σ_{h_L} P(x_L, h_L | h_{L-1}) = Σ_{h_L} P(h_L | h_{L-1}) P(x_L | h_{L-1}, h_L) = Σ_{h_L} P(h_L | h_{L-1}) P(x_L | h_L)
(the last equality is due to conditional independence).
Step i: b(h_i) = P(x_{i+1},...,x_L | h_i) = Σ_{h_{i+1}} P(h_{i+1} | h_i) P(x_{i+1} | h_{i+1}) P(x_{i+2},...,x_L | h_{i+1}) = Σ_{h_{i+1}} P(h_{i+1} | h_i) P(x_{i+1} | h_{i+1}) b(h_{i+1}).
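A matching sketch of the backward recursion, using the same assumed conventions as the forward sketch above.

```python
import numpy as np

def backward(trans, emit, xs):
    """b[i, h] = P(x_{i+2}, ..., x_L | H_{i+1} = h), with 0-based index i."""
    L, k = len(xs), trans.shape[0]
    b = np.zeros((L, k))
    b[-1] = 1.0                                    # by convention b(h_L) = 1
    for i in range(L - 2, -1, -1):
        # step i: sum_{h_{i+1}} P(h_{i+1} | h_i) P(x_{i+1} | h_{i+1}) b(h_{i+1})
        b[i] = trans @ (emit[:, xs[i + 1]] * b[i + 1])
    return b
```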

9. The Combined Answer
1. To compute the posterior belief in H_i (for a specific i) given the evidence {x_1,...,x_L}: run the forward algorithm to compute f(h_i) = P(x_1,...,x_i, h_i), run the backward algorithm to compute b(h_i) = P(x_{i+1},...,x_L | h_i); the product f(h_i) b(h_i) is the (unnormalized) answer, for every possible value h_i.
2. To compute the posterior belief for every H_i, simply run the forward and backward algorithms once, storing f(h_i) and b(h_i) for every i (and every value h_i), then compute f(h_i) b(h_i) for every i.
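Combining the two sketches above gives posterior decoding directly; this is illustrative only and reuses the forward and backward functions defined earlier.

```python
def posterior(init, trans, emit, xs):
    """gamma[i, h] = P(H_{i+1} = h | x_1, ..., x_L)."""
    f = forward(init, trans, emit, xs)
    b = backward(trans, emit, xs)
    joint = f * b                                    # f(h_i) b(h_i) = P(x_1..x_L, h_i)
    return joint / joint.sum(axis=1, keepdims=True)  # divide by K = P(x_1,...,x_L)
```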

10. Consequence I: Likelihood of Evidence
1. To compute the likelihood of evidence P(x_1,...,x_L), do one more step in the forward algorithm, namely Σ_{h_L} f(h_L) = Σ_{h_L} P(x_1,...,x_L, h_L).
2. Alternatively, do one more step in the backward algorithm, namely Σ_{h_1} b(h_1) P(h_1) P(x_1 | h_1) = Σ_{h_1} P(x_2,...,x_L | h_1) P(h_1) P(x_1 | h_1).
Evaluate the likelihood of evidence for all candidate locations of the disease locus and choose the best.
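Both routes take one line each in the sketch notation, again reusing the earlier forward and backward functions (names are illustrative).

```python
def likelihood(init, trans, emit, xs):
    """P(x_1, ..., x_L) via one extra summation over the last forward column."""
    f = forward(init, trans, emit, xs)
    return f[-1].sum()
    # Equivalently, from the backward table:
    #   b = backward(trans, emit, xs)
    #   return (b[0] * init * emit[:, xs[0]]).sum()
```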

11. Consequence II: The E-step
Recall that belief update was computed via P(x_1,...,x_L, h_i) = P(x_1,...,x_i, h_i) P(x_{i+1},...,x_L | h_i) ≡ f(h_i) b(h_i).
Now we wish to compute (for the E-step)
P(x_1,...,x_L, h_i, h_{i+1}) = P(x_1,...,x_i, h_i) P(h_{i+1} | h_i) P(x_{i+1} | h_{i+1}) P(x_{i+2},...,x_L | h_{i+1})
                             = f(h_i) P(h_{i+1} | h_i) P(x_{i+1} | h_{i+1}) b(h_{i+1}).
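A sketch of these pairwise quantities, normalized to posteriors as the E-step needs them; it builds on the earlier forward/backward sketches and the names are illustrative.

```python
import numpy as np

def pairwise_posteriors(init, trans, emit, xs):
    """xi[i, s, t] = P(H_{i+1} = s, H_{i+2} = t | x_1, ..., x_L)."""
    f = forward(init, trans, emit, xs)
    b = backward(trans, emit, xs)
    L, k = f.shape
    px = f[-1].sum()                               # P(x_1, ..., x_L)
    xi = np.zeros((L - 1, k, k))
    for i in range(L - 1):
        # f(h_i) P(h_{i+1} | h_i) P(x_{i+1} | h_{i+1}) b(h_{i+1})
        xi[i] = f[i][:, None] * trans * (emit[:, xs[i + 1]] * b[i + 1])[None, :]
    return xi / px
```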

12. The EM Algorithm for Finding a Local Maximum
[Figure: the chain as before, but the data at position i+1 is the disease-locus data Y_{i+1}.]
The Expectation Maximization algorithm iterates two steps:
E-step: compute p_θ(x_1,...,x_L, h_i, h_{i+1}), where i+1 is the disease locus.
M-step: update the parameters θ (here, the recombination fractions around the disease locus) so as to maximize the expected complete-data log-likelihood.
Iterate until θ converges.
Comment: use random restarts, other convergence criteria, other ending schemes.
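The slide leaves the M-step generic. The sketch below shows one full iteration of the standard Baum-Welch update for an unconstrained HMM; in the linkage setting the transition matrix is parameterized by recombination fractions, so the actual M-step would instead pool the expected recombination counts implied by the pairwise posteriors. Everything here builds on the earlier sketches and is illustrative only.

```python
import numpy as np

def em_step(init, trans, emit, xs):
    """One EM (Baum-Welch) iteration for a generic discrete HMM."""
    f = forward(init, trans, emit, xs)
    b = backward(trans, emit, xs)
    px = f[-1].sum()
    gamma = f * b / px                               # E-step: P(h_i | x) for all i
    xi = pairwise_posteriors(init, trans, emit, xs)  # E-step: P(h_i, h_{i+1} | x)

    # M-step: re-estimate parameters from the expected counts
    new_init = gamma[0]
    new_trans = xi.sum(axis=0)
    new_trans /= new_trans.sum(axis=1, keepdims=True)
    new_emit = np.zeros_like(emit)
    for i, x in enumerate(xs):
        new_emit[:, x] += gamma[i]
    new_emit /= new_emit.sum(axis=1, keepdims=True)
    return new_init, new_trans, new_emit, np.log(px)
```

Iterating until the log-likelihood stops improving gives a local maximum, exactly as the slide cautions; random restarts help escape poor ones.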

13. Time and Space Complexity of the Forward/Backward Algorithms
Time complexity is linear in the length of the chain, provided the number of states of each variable is a constant. More precisely, time complexity is O(k^2 L), where k is the maximum domain size of each variable.
Space complexity is O(kL) for storing the forward and backward tables (and O(k^2 L) if the pairwise quantities needed for the E-step are stored for every i).

14. The MAP Query in HMM
1. Recall that the likelihood-of-evidence query is to compute P(x_1,...,x_L) = Σ_{h_1,...,h_L} P(x_1,...,x_L, h_1,...,h_L).
2. Now we wish to compute a similar quantity: P*(x_1,...,x_L) = max_{h_1,...,h_L} P(x_1,...,x_L, h_1,...,h_L).
And, of course, we wish to find a MAP assignment (h_1*,...,h_L*) that attains this maximum.

15. Example: Revisiting Likelihood of Evidence
P(x_1, x_2, x_3) = Σ_{h_1} P(h_1) P(x_1|h_1) Σ_{h_2} P(h_2|h_1) P(x_2|h_2) Σ_{h_3} P(h_3|h_2) P(x_3|h_3)
                 = Σ_{h_1} P(h_1) P(x_1|h_1) Σ_{h_2} P(h_2|h_1) P(x_2|h_2) b(h_2)
                 = Σ_{h_1} P(h_1) P(x_1|h_1) b(h_1)

16. Example: Computing the MAP Assignment
Replace sums with taking maximum:
maximum = max_{h_1} P(h_1) P(x_1|h_1) max_{h_2} P(h_2|h_1) P(x_2|h_2) max_{h_3} P(h_3|h_2) P(x_3|h_3)
        = max_{h_1} P(h_1) P(x_1|h_1) max_{h_2} P(h_2|h_1) P(x_2|h_2) b_{h_3}(h_2)
        = max_{h_1} P(h_1) P(x_1|h_1) b_{h_2}(h_1)    {finding the maximum}
Finding the MAP assignment, where x*_{h_2}(h_1) and x*_{h_3}(h_2) record the maximizing values:
h_1* = argmax_{h_1} P(h_1) P(x_1|h_1) b_{h_2}(h_1);  h_2* = x*_{h_2}(h_1*);  h_3* = x*_{h_3}(h_2*).

17. Viterbi's Algorithm
Backward phase (storing the best value as a function of the parent's value):
  b_{h_{L+1}}(h_L) = 1
  For i = L-1 downto 1 do
    b_{h_{i+1}}(h_i) = max_{h_{i+1}} P(h_{i+1} | h_i) P(x_{i+1} | h_{i+1}) b_{h_{i+2}}(h_{i+1})
    x*_{h_{i+1}}(h_i) = argmax_{h_{i+1}} P(h_{i+1} | h_i) P(x_{i+1} | h_{i+1}) b_{h_{i+2}}(h_{i+1})
Forward phase (tracing the MAP assignment):
  h_1* = argmax_{h_1} P(h_1) P(x_1 | h_1) b_{h_2}(h_1)
  For i = 1 to L-1 do
    h_{i+1}* = x*_{h_{i+1}}(h_i*)
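The slide runs the max-product recursion backward and then traces the assignment forward. The sketch below is the equivalent standard formulation (forward max-product recursion with backtracking), in log space for numerical stability; the conventions are the same assumed ones as in the forward sketch.

```python
import numpy as np

def viterbi(init, trans, emit, xs):
    """MAP assignment argmax_{h_1..h_L} P(x_1..x_L, h_1..h_L)."""
    L, k = len(xs), len(init)
    logd = np.log(init) + np.log(emit[:, xs[0]])    # best log-prob of paths ending in h_1
    back = np.zeros((L, k), dtype=int)
    for i in range(1, L):
        scores = logd[:, None] + np.log(trans)      # indexed [h_{i-1}, h_i]
        back[i] = scores.argmax(axis=0)             # the x*(.) pointers of the slide
        logd = scores.max(axis=0) + np.log(emit[:, xs[i]])
    path = np.empty(L, dtype=int)                   # trace the MAP assignment
    path[-1] = logd.argmax()
    for i in range(L - 1, 0, -1):
        path[i - 1] = back[i, path[i]]
    return path, logd.max()
```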

18. Summary of HMM Queries
1. Belief update (posterior decoding): the forward-backward algorithm.
2. Maximum A Posteriori assignment: the Viterbi algorithm.

19. The Forward Algorithm for Genetic Linkage Analysis
Step i: P(x_1,...,x_i, h_i) = Σ_{h_{i-1}} P(x_1,...,x_{i-1}, h_{i-1}) P(h_i | h_{i-1}) P(x_i | h_i).
Note that in step i of the forward algorithm, we multiply a transition matrix of size 2^{2n} x 2^{2n} with vectors of size 2^{2n}. The transition matrix P(h_i | h_{i-1}) has a special form, so we can multiply it by a vector faster than for arbitrary matrices. The vector P(x_i | h_i) is not given explicitly, so we will see an efficient way to compute it.

20. The Transition Matrix
Recall that for a single meiosis indicator the transition probability between adjacent loci is θ if a recombination occurred and 1-θ otherwise, i.e. the 2 x 2 matrix
  A = [[1-θ, θ], [θ, 1-θ]].
(θ depends on the interval i, but this dependence is omitted from the notation.)
In our example, where we have one non-founder (n = 1), the transition probability table has size 4 x 4 = 2^{2n} x 2^{2n}, encoding the four options of recombination/non-recombination for the two parental meioses: it is the Kronecker product A ⊗ A. For n non-founders, the transition matrix is the n-fold Kronecker product of these 4 x 4 blocks (equivalently, the 2n-fold Kronecker product of A).
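This structure is easy to check numerically. The small sketch below assumes the per-meiosis matrix given above and uses NumPy's kron; the function names are illustrative.

```python
import numpy as np
from functools import reduce

def meiosis_matrix(theta):
    """2 x 2 transition for one meiosis indicator: recombination w.p. theta."""
    return np.array([[1 - theta, theta],
                     [theta, 1 - theta]])

def transition_matrix(theta, n):
    """Full 2^{2n} x 2^{2n} transition matrix for n non-founders:
    the Kronecker product over all 2n meioses."""
    A = meiosis_matrix(theta)
    return reduce(np.kron, [A] * (2 * n))

T = transition_matrix(0.1, 1)   # one non-founder: the 4 x 4 table of the example
```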

21. Efficient Product
The Kronecker structure lets us multiply the transition matrix by a vector without ever forming it: writing the matrix as A ⊗ B, the product (A ⊗ B)v reduces to applying the small factor A to sub-blocks of v and recursing on B. So, starting with a vector of size 2^{2n}, one level of this decomposition needs on the order of 2^{2n} multiplications once the small factor A is in hand. Continuing recursively, at most 2n times, yields a complexity of O(2n · 2^{2n}), far less than the O(2^{4n}) needed for regular matrix-vector multiplication. With n = 10 non-founders, this drops the computation from the infeasible region to the feasible one.
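A sketch of the recursive product, assuming the full matrix is the m-fold Kronecker power of a small factor M (here m = 2n and M is the 2 x 2 meiosis matrix). Rather than forming the 2^{2n} x 2^{2n} matrix, it applies M along each tensor axis of the reshaped vector, which is the O(2n · 2^{2n}) computation described above.

```python
import numpy as np

def kron_matvec(M, v, m):
    """Compute (M ⊗ M ⊗ ... ⊗ M) v  (m factors) without forming the big matrix."""
    d = M.shape[0]
    t = v.reshape((d,) * m)
    for axis in range(m):
        # contract M against one tensor axis at a time: cost d * d^m per level
        t = np.moveaxis(np.tensordot(M, t, axes=([1], [axis])), 0, axis)
    return t.reshape(-1)

# Sanity check against the explicit matrix for a tiny case (illustrative):
# A = np.array([[0.9, 0.1], [0.1, 0.9]]); v = np.random.rand(4)
# assert np.allclose(kron_matvec(A, v, 2), np.kron(A, A) @ v)
```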

22. Probability of the Data at One Marker Locus Given an Inheritance Vector
[Figure: model for locus 2. Founder alleles L_21m, L_21f (parent 1) and L_22m, L_22f (parent 2); child alleles L_23m, L_23f selected by the inheritance indicators S_23m, S_23f; observed genotypes X_21, X_22, X_23.]
P(x_21, x_22, x_23 | s_23m, s_23f) =
  Σ_{l_21m, l_21f, l_22m, l_22f, l_23m, l_23f} P(l_21m) P(l_21f) P(l_22m) P(l_22f) P(x_21 | l_21m, l_21f) P(x_22 | l_22m, l_22f) P(x_23 | l_23m, l_23f) P(l_23m | l_21m, l_21f, s_23m) P(l_23f | l_22m, l_22f, s_23f)
The five last terms are always zero or one, namely, indicator functions.

23. Efficient Computation
[Figure: the same model for locus 2, with X_23 = {A_1, A_2}, s_23m = 1, s_23f = 0 marked.]
Assume only individual 3 is genotyped, with genotype {A_1, A_2}. For the inheritance vector with s_23m = 1 and s_23f = 0, the founder alleles L_21m and L_22f are not restricted by the data, while (L_21f, L_22m) have only two possible joint assignments, (A_1, A_2) or (A_2, A_1):
P(x_21, x_22, x_23 | s_23m = 1, s_23f = 0) = p(A_1) p(A_2) + p(A_2) p(A_1).
In general, every inheritance vector defines a subgraph of the Bayesian network above, from which we build a founder graph (next slide).

24. Efficient Computation (continued)
In general, every inheritance vector defines a subgraph of the model (the transmission edges actually used). Construct a founder graph whose vertices are the founder allele variables and where there is an edge between two vertices if they have a common typed descendant; the label of an edge is the constraint dictated by the common typed descendant's genotype. Now find all consistent assignments for every connected component.
[Figure: founder graph for the example, with vertices L_21m, L_21f, L_22m, L_22f and an edge between L_21f and L_22m labeled {A_1, A_2}.]

25. A Larger Example
[Figure: a descent graph for a larger pedigree, with typed individuals having genotypes {a,b}, {a,c}, and {b,d}, and the corresponding founder graph, an example of a constraint satisfaction graph.]
Connect two nodes if they have a common typed descendant; label the edge with that descendant's genotype.

26. The Constraint Satisfaction Problem
[Figure: the founder graph from the previous slide, with edge labels {a,b}, {a,c}, {b,d}.]
The number of possible consistent alleles per non-isolated node, namely the intersection of its adjacent edge labels, is 0, 1 or 2. For example, in the figure node 2 (isolated) can take any allele, node 6 can only be b, and node 3 can be assigned either a or b.
For each non-singleton connected component: start with an arbitrary node and pick one of its possible values; this dictates all other values in the component. Repeat with the other value if it has one. So each non-singleton component yields at most two solutions.

27. Solution of the CSP
Since each non-singleton component yields at most two solutions, the likelihood is simply a product of sums, each sum having at most two terms. Each component contributes one factor; singleton components contribute the factor 1.
In our example: 1 * [p(a) p(b) p(a) + p(b) p(a) p(b)] * p(d) p(b) p(a) p(c).
Complexity: building the founder graph takes O(f^2 + n). Solving general CSPs is NP-hard, but this special structure keeps the computation easy.
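A sketch of this whole computation for a single marker, with illustrative data structures: each connected component of the founder graph is given as a set of founder-allele nodes plus a dict mapping edges (pairs of nodes) to the genotype of their common typed descendant, and freq holds population allele frequencies. Every name here is hypothetical; the point is only the at-most-two-seeds enumeration described above.

```python
def founder_graph_likelihood(components, freq):
    """Marker likelihood = product over founder-graph components of the sum,
    over the (at most two) consistent allele assignments, of the product of
    allele frequencies.  Isolated founder alleles contribute a factor of 1."""
    total = 1.0
    for nodes, edges in components:
        if not edges:                              # singleton component
            continue
        start = next(iter(sorted(nodes)))
        # candidate alleles for the start node: intersection of its edge labels
        labels = [set(lab) for (u, v), lab in edges.items() if start in (u, v)]
        candidates = set.intersection(*labels)
        comp_sum = 0.0
        for seed in candidates:                    # at most two seeds to try
            assignment, stack, consistent = {start: seed}, [start], True
            while stack and consistent:
                u = stack.pop()
                for (x, y), lab in edges.items():
                    if u not in (x, y):
                        continue
                    if assignment[u] not in lab:
                        consistent = False
                        break
                    v = y if x == u else x
                    # a heterozygous descendant forces v to the other allele,
                    # a homozygous one forces the same allele
                    forced = next(iter(set(lab) - {assignment[u]} or {assignment[u]}))
                    if v in assignment:
                        if assignment[v] != forced:
                            consistent = False
                            break
                    else:
                        assignment[v] = forced
                        stack.append(v)
            if consistent and len(assignment) == len(nodes):
                p = 1.0
                for node in nodes:
                    p *= freq[assignment[node]]
                comp_sum += p
        total *= comp_sum
    return total
```

Run once per marker locus and per inheritance vector, this yields the emission probabilities P(x_i | s_i) needed by the forward algorithm of slide 19.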