Probabilistic methods for phylogenetic trees (Part 2)


1 Probabilistic methods for phylogenetic trees (Part 2)
Sushmita Roy BMI/CS 576 Oct 7th, 2014

2 Probabilistic methods for phylogenetic tree construction
RECAP: P(data|tree); maximum likelihood; Felsenstein algorithm for computing the likelihood of a sequence given a tree.

3 Probabilistic models of evolution
The probability of a character switching from a to b along a branch of length t, P(b|a,t), is captured by the matrix S(t). For example, for DNA this is a 4-by-4 matrix with one entry P(b|a,t) for each pair of bases a and b.

4 Defining the conditional probability distributions
If we consider t to be evolutionary time, these conditional probabilities can be obtained from what is called a continuous-time Markov process. Such processes are defined by a K-by-K rate matrix R. Each entry of R, R(a,b), gives the rate of substitution from a to b. The time spent in any state (character) is exponentially distributed. Given R, S(t) can be obtained from it using the theory of continuous-time Markov processes.
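The standard result from continuous-time Markov process theory is that S(t) = exp(Rt), the matrix exponential of the rate matrix scaled by time. The sketch below approximates it in pure Python with a truncated Taylor series plus scaling and squaring. The helper names, the rate ALPHA, and the branch length are illustrative assumptions, and the rows-as-source convention (rows of R sum to zero) is assumed.

```python
import math

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(M, terms=30, squarings=10):
    """Approximate exp(M): Taylor series on M / 2**squarings, then square back up."""
    n = len(M)
    scale = 2 ** squarings
    A = [[x / scale for x in row] for row in M]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # current Taylor term, starts at identity
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def substitution_matrix(R, t):
    """S(t) = exp(R * t) for a rate matrix R whose rows sum to zero."""
    return mat_exp([[r * t for r in row] for row in R])

ALPHA = 0.1   # hypothetical substitution rate
# Equal-rate (Jukes-Cantor style) rate matrix used only as an example input.
R_EXAMPLE = [[-3 * ALPHA if i == j else ALPHA for j in range(4)] for i in range(4)]
S = substitution_matrix(R_EXAMPLE, 2.0)
```

Each row of S is a probability distribution over the end state, so the rows should sum to 1 up to rounding error.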

5 Rate matrices
A rate matrix R is a K-by-K matrix, where K is the size of our alphabet (e.g. for DNA, K=4). Different rate matrices make different assumptions about substitutions.
Jukes-Cantor: all substitutions have the same rate.
Kimura: transitions (A<->G, C<->T) and transversions (A<->C, A<->T, G<->C, G<->T) have different rates.
Hasegawa, Kishino, Yano (HKY): transitions and transversions have different rates and the base frequencies may be unequal, so substitution rates also depend on the target base.
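As a minimal sketch of how the first two assumptions translate into matrices, the code below builds a rate matrix from one transition rate and one transversion rate. Setting the two rates equal gives the Jukes-Cantor form, and setting them apart gives the Kimura form. The function name and the numeric rates are hypothetical, and each diagonal entry is set so that its row sums to zero.

```python
BASES = "AGCT"
# Transitions are purine<->purine (A, G) and pyrimidine<->pyrimidine (C, T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def rate_matrix(transition_rate, transversion_rate):
    """Return R as a dict keyed by (from_base, to_base)."""
    R = {}
    for a in BASES:
        for b in BASES:
            if a != b:
                is_transition = (a, b) in TRANSITIONS
                R[(a, b)] = transition_rate if is_transition else transversion_rate
        # Diagonal entry makes the row sum to zero.
        R[(a, a)] = -sum(R[(a, b)] for b in BASES if b != a)
    return R

jukes_cantor = rate_matrix(0.1, 0.1)    # all substitutions share one rate
kimura = rate_matrix(0.2, 0.05)         # transitions faster than transversions
```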

6 Jukes Cantor Rate matrix
Simplest possible rate matrix for DNA sequence evolution. Assumes all bases change at the same rate α. Mutations occur from rows to columns.
        A     T     G     C
  A   -3α     α     α     α
  T     α   -3α     α     α
  G     α     α   -3α     α
  C     α     α     α   -3α

7 Conditional probabilities from Jukes Cantor
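The conditional probability matrix, P(a|b,t), has a form similar to the rate matrix: one value on the diagonal and one value everywhere else.
        A     T     G     C
  A    r_t   s_t   s_t   s_t
  T    s_t   r_t   s_t   s_t
  G    s_t   s_t   r_t   s_t
  C    s_t   s_t   s_t   r_t
Here r_t = (1/4)(1 + 3 exp(-4αt)) and s_t = (1/4)(1 - exp(-4αt)), where α is the common substitution rate. For example, P(G|C,t) = s_t.
Equilibrium distribution: ¼ for all bases.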
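Under Jukes-Cantor the conditional probabilities have a closed form: (1/4)(1 + 3 exp(-4αt)) when the base is unchanged and (1/4)(1 - exp(-4αt)) otherwise. The sketch below evaluates it; the function name and the value of alpha are illustrative assumptions.

```python
import math

def jc_prob(a, b, t, alpha=0.1):
    """P(a|b,t) under Jukes-Cantor with a common rate alpha (hypothetical value)."""
    decay = math.exp(-4.0 * alpha * t)
    if a == b:
        return 0.25 * (1.0 + 3.0 * decay)
    return 0.25 * (1.0 - decay)

# For very long branches every entry approaches the equilibrium value of 1/4.
long_branch = [jc_prob(a, "C", 1000.0) for a in "ATGC"]
```

At t = 0 the matrix is the identity (no time, no change), and for any t the four probabilities conditioned on the same starting base sum to 1.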

8 Searching phylogenetic tree space with maximum likelihood
As in the maximum parsimony case, we need to score a tree and search over the space of possible trees.
Score a given tree:
  Branch lengths are parameters.
  Estimate the branch lengths to maximize the likelihood of the data given the tree.
Search over trees:
  Start with an initial tree, built either by a greedy approach of adding the branch that maximizes the likelihood or by Neighbor Joining.
  Revisit the tree using nearest neighbor interchange or subtree pruning and regrafting approaches until convergence.
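To illustrate the scoring step on the smallest possible case, the sketch below estimates the single branch length separating two aligned sequences by maximizing the Jukes-Cantor log-likelihood with a golden-section search. The sequences, the fixed rate alpha = 1, the search interval, and the function names are all assumptions made for the example; a real implementation would optimize every branch of a full tree.

```python
import math

def jc_log_likelihood(seq1, seq2, t, alpha=1.0):
    """Log-likelihood of two aligned sequences separated by total branch length t."""
    decay = math.exp(-4.0 * alpha * t)
    same = 0.25 * (1.0 + 3.0 * decay)
    diff = 0.25 * (1.0 - decay)
    # Each site contributes (1/4) * P(x|y,t): equilibrium prior times substitution.
    return sum(math.log(0.25) + math.log(same if x == y else diff)
               for x, y in zip(seq1, seq2))

def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if f(c) > f(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2.0

seq1 = "ACGTACGTAC"   # hypothetical aligned sequences
seq2 = "ACGTACGTTT"
t_hat = golden_section_max(lambda t: jc_log_likelihood(seq1, seq2, t), 1e-6, 5.0)

# Closed-form check: with p the fraction of differing sites,
# the maximizer satisfies alpha * t = -(1/4) * ln(1 - 4p/3).
p = sum(x != y for x, y in zip(seq1, seq2)) / len(seq1)
t_closed_form = -0.25 * math.log(1.0 - 4.0 * p / 3.0)
```

The lower bound of the search is kept slightly above zero because the likelihood of any mismatch is zero at t = 0.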

9 Some advantages of probabilistic approaches
Probabilistic models can be naturally extended to more realistic models:
  Model site-specific parameters.
  Model gaps.
A probabilistic framework can be used to evaluate different models of varying complexity (more parameters):
  Different evolutionary models.
Easily combined with other probabilistic models:
  Hidden Markov models.

10 Modeling site-specific parameters
Recall we had assumed that the substitution probabilities at each site are the same. This could be relaxed by introducing additional parameters per site, r_u for site u.
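One common way to use such per-site parameters, assumed here because the transcript does not preserve the slide's formula, is to treat r_u as a rate multiplier so that site u evolves along an effective branch length r_u * t. The sketch below scores a pair of sequences under Jukes-Cantor with hypothetical per-site rates; the function names, sequences, and rate values are illustrative only.

```python
import math

def jc_prob(a, b, t, alpha=1.0):
    """Jukes-Cantor P(a|b,t) with a common rate alpha."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * decay) if a == b else 0.25 * (1.0 - decay)

def pair_log_likelihood(seq1, seq2, t, site_rates):
    """Sum per-site log-likelihoods, scaling the branch length by each site's rate.

    Assumes every differing site has a rate greater than zero.
    """
    return sum(math.log(0.25 * jc_prob(x, y, r * t))
               for x, y, r in zip(seq1, seq2, site_rates))

seq1, seq2 = "ACGT", "ACGA"            # hypothetical; only the last site differs
uniform = pair_log_likelihood(seq1, seq2, 0.1, [1.0, 1.0, 1.0, 1.0])
# Slow rates at the conserved sites and a fast rate at the variable site.
site_specific = pair_log_likelihood(seq1, seq2, 0.1, [0.5, 0.5, 0.5, 4.0])
```

In this toy case the site-specific rates fit the data better than a single shared rate, at the cost of extra parameters.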

11 Probabilistic interpretation of Parsimony
Recall that P(a|b,t) is the key quantity of interest. Replace P(a|b,t) by P(a|b) and use -log P(a|b) as the score. Applying the weighted parsimony algorithm with this score to get the minimal cost tree gives an approximation to the likelihood: the term associated with the most likely assignment of the ancestral states.
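The relationship can be checked on a toy tree. The sketch below runs the weighted parsimony (Sankoff) recursion with costs -log P(a|b), and separately sums over all ancestral assignments to get the full likelihood for one site. The probabilities in P, the uniform root prior of 1/4, the tree, and the function names are assumptions chosen for illustration.

```python
import math

BASES = "ACGT"
# Hypothetical branch-length-free substitution probabilities P(child|parent).
P = {(a, b): (0.91 if a == b else 0.03) for a in BASES for b in BASES}

def min_cost(node):
    """Sankoff recursion: minimal cost of the subtree for each state at this node."""
    if isinstance(node, str):                       # leaf: observed base
        return {a: (0.0 if a == node else math.inf) for a in BASES}
    children = [min_cost(child) for child in node]
    return {a: sum(min(c[b] - math.log(P[(b, a)]) for b in BASES) for c in children)
            for a in BASES}

def likelihood(node):
    """Same recursion with sum/product in place of min/plus (full likelihood)."""
    if isinstance(node, str):
        return {a: (1.0 if a == node else 0.0) for a in BASES}
    children = [likelihood(child) for child in node]
    return {a: math.prod(sum(P[(b, a)] * c[b] for b in BASES) for c in children)
            for a in BASES}

tree = (("A", "A"), ("G", "A"))                     # one site, four leaves
root_cost = min_cost(tree)
best_cost = min(root_cost[a] - math.log(0.25) for a in BASES)
root_like = likelihood(tree)
total_likelihood = sum(0.25 * root_like[a] for a in BASES)
```

Here exp(-best_cost) is the probability of the single most likely ancestral assignment, so it is a lower bound on total_likelihood, which sums over all assignments.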

12 Bootstrap: Assessing reliability of phylogenetic trees
Bootstrap: a computational strategy used to assess confidence in an estimated quantity, e.g. a branch length or the tree branching topology.
Generate a set of trees, {T1,…,TN}, from N random samples of the data:
  Sample columns/sites with replacement.
  Reconstruct a tree from the sampled columns.
One can estimate the confidence of any tree feature based on the proportion of times the feature is seen in a tree in {T1,…,TN}.
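The resampling procedure can be sketched as follows. To keep the example short, the "tree feature" is a stand-in: the pair of sequences with the smallest Hamming distance, which is not a real tree-building method. The alignment, the function names, and the number of replicates are all hypothetical.

```python
import random
from itertools import combinations

def resample_columns(alignment, rng):
    """Draw alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}

def closest_pair(alignment):
    """Stand-in tree feature: the two sequences with the fewest differences."""
    def hamming(x, y):
        return sum(a != b for a, b in zip(x, y))
    return min(combinations(sorted(alignment), 2),
               key=lambda pair: hamming(alignment[pair[0]], alignment[pair[1]]))

def bootstrap_support(alignment, feature, n_replicates=200, seed=0):
    """Fraction of bootstrap replicates that reproduce the feature of the full data."""
    rng = random.Random(seed)
    reference = feature(alignment)
    hits = sum(feature(resample_columns(alignment, rng)) == reference
               for _ in range(n_replicates))
    return reference, hits / n_replicates

alignment = {                       # hypothetical toy alignment
    "human": "ACGTACGTAC",
    "chimp": "ACGTACGTAT",
    "mouse": "ACTTACCTGT",
    "fish":  "TCTTGCCTGA",
}
reference, support = bootstrap_support(alignment, closest_pair)
```

In practice `feature` would run a full tree reconstruction on each resampled alignment and report, for example, whether a particular clade appears.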

13 Example of bootstrap
Figure source: Ziheng Yang and Bruce Rannala, Nature Reviews Genetics 2012.

14 Some common phylogenetic tree construction algorithms
PhyML: maximum likelihood; nearest neighbor interchange, subtree pruning and regrafting.
RAxML (Randomized Axelerated Maximum Likelihood): exists in both sequential and parallel versions; also does subtree pruning and regrafting.
PHYLIP (from Felsenstein): package for distance-based, parsimony, and ML methods.
BEAST (Bayesian): MCMC-based sampling.
MrBayes (Bayesian).

15 Comments about phylogenetic tree construction
Which method to pick?
  Neighbor joining: fast; constructs the right tree if the distances are additive.
  Parsimony: does not make any assumptions about distances.
  Probabilistic: more principled; provides a systematic framework to estimate model parameters, and enables us to quantify uncertainty in the model and evaluate different models of evolution.
If ML distances are additive, NJ can construct the right tree.
If branch lengths are ignored, weighted parsimony approximates maximum likelihood (see slide 11).
The search space may be large, but the optimal tree can be found efficiently in some cases; otherwise heuristic methods can be applied.
Difficult to evaluate inferred phylogenies, since the ground truth is not usually known:
  Can look at agreement across different sources of evidence.
  Can look at repeatability across subsamples of the data (bootstrap).
  Can look at indirect predictions, e.g. conservation of sites in proteins.
Methods could be assessed with a simulation framework based on a probabilistic model of phylogenies.
Phylogenies for bacteria and viruses are not so straightforward because of lateral transfer of genetic material.

