Download presentation
Presentation is loading. Please wait.
Published byPreston Wood Modified over 9 years ago
1
Probabilistic Approaches to Phylogenies BMI/CS 576 www.biostat.wisc.edu/bmi576.html Sushmita Roy sroy@biostat.wisc.edu Oct 2 nd, 2014
2
Readings Chapter 8 – Sections 8.1, 8.2, 8.3, 8.4
3
Key concepts Scoring a tree based on likelihood of observed data – How to compute the likelihood of sequences using a given tree and conditional probabilities Felsenstein’s algorithm General understanding of where the conditional probabilities are obtained from General understanding of search strategies – (similar to parsimony)
4
Probabilistic methods for phylogenies data: a set of n sequences tree: a phylogenetic tree for the n sequences Two approaches – Maximum likelihood framework P(data|tree) – Bayesian framework P(tree|data) Both need a probabilistic model to compute the probability of a set of sequences, given a tree and branch lengths
5
Maximum likelihood methods for phylogenetic trees “best” tree is the one that maximizes the likelihood of data (sequence) given model (Tree topology and branch length) Phylogenetic tree construction requires – Scoring a tree Requires computing the likelihood of sequence given the tree That is computing the probability of the sequences given a tree – Searching the space of possible trees Given a probabilistic model of sequence changes, we can compute the tree with the greatest likelihood (maximum likelihood)
6
Notation for computing the probability of sequences given a tree x j : sequence at node j x j i : character in the i th position for the j th sequence t j : length of the j th branch T : tree topology P(x|y,t): probability of switching from y to x from ancestor to child along a branch of length t – We will come back to defining such probabilistic models later
7
Computing the probability of sequences on a tree If we know P(x|y,t) we can compute the probability of the sequences This relies on a key assumption: – sequence at a child node i is independent of everything else given i ’s parent – E.g. for a node i whose parent is given by α(i) P(x i |x α(i),x j,x k..)=P(x i |x α(i) ) We will also make additional simplifying assumptions – We have an ungapped alignment – Characters at different sites evolve independently
8
Example of computing the probability of sequences given a tree x1x1 x2x2 x3x3 x4x4 x5x5 t1t1 t2t2 t3t3 t4t4 The probability of these sequences given this tree is Assume we are given the following tree for three sequences x 1, x 2 and x 3 at the leaf nodes First, assume that the sites evolve independently, and there are a total on N sites Hereafter, let’s just focus on one site, u Also for clarity, we will use t for all branch lengths
9
Example continued The expression Assume conditional independence, the above is Written more compactly as α(i) denotes the parent of i x1x1 x2x2 x3x3 x4x4 x5x5 t1t1 t2t2 t3t3 t4t4
10
Example continued Or more generally for n sequences at the leaves as Between internal nodes Between extant and internal nodes
11
But.. the ancestral sequences cannot be observed So our probability calculation needs to sum over all ancestral states Let us consider a simple example of two sequences xu1xu1 xu2xu2 t1t1 t2t2
12
Summing of ancestral state for a pair of sequences q a is the probability of observing character a at the root node 3 xu1xu1 xu2xu2 t1t1 t2t2 a
13
Generalizing this to n sequences Requires us to sum over all of the internal nodes α(i) gives the parent of I a i is a variable storing the character at the i th internal node Felsenstein’s algorithm gives an efficient way to compute this quantity Between internal nodes Between extant and internal nodes
14
Felsenstein’s algorithm Input: Given a set of n sequences at the leaf nodes, conditional probability distribution of character switch, and a tree topology Output: The likelihood of the sequences Also based on Dynamic programming – Relies on computations performed in subtrees for computations at the root of these subtrees Very similar ideas as in the Weighted Parsimony algorithm
15
Notation for Felsenstein’s algorithm P(L k |a): probability of the leaves below node k, given that the residue at k is a i and j will denote the children of k a, b, c characters at any node We’ll drop the subscript u and work with only one site
16
Felsentein’s algorithm Initialize: k=2n-1 Recursion: – If k is a leaf node, – Else, compute P(L i |a) and P(L j |a) for all a at daughters i and j Termination – Likelihood at a site
17
An observation and a simplication Note that Further more, we will assume that the conditional probabilities are independent of branch length Finally, assume q a =0.25 for all a in {A,T,G,C}
18
What is probability for the following set of residues ATG 1 23 4 5 ACGT A0.70.1 C 0.70.1 G 0.70.1 T 0.7 Assume the above conditional probability matrix P(b|a) for all branches a b
19
In class exercise
20
The probabilities computed for each node Probability of sequence given tree is 0.25(0.0058+0.0022+0.0154 + 0.0058)=0.0073
21
Felsentein’s algorithm comments Very similar to the weighted parsimony case – Main differences are at Leaf nodes Minimization versus summation for internal nodes Can it be used to infer ancestral states as well? – Instead of summing, we would maximize – As in the parsimony case, we would need to keep track of the maximizing assignment
22
Errata Slide 16 and slide 17 should have – Replace P(L i |a) by P(L i |b) Slide 20 had typos.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.