Molecular Systematics

Presentation on theme: "Molecular Systematics"— Presentation transcript:

1 Molecular Systematics
Maximum likelihood (ML) approaches are time consuming. Bayesian approaches are conceptually similar but run faster. ML attempts to find the tree that maximizes the probability of the data given the tree and a model. Bayesian analysis attempts to find the tree that maximizes the probability of the tree given the data and the model. This was impossible until recently; advances in computational methods (MCMC) and computing speed made it feasible. The approach is based on Bayes’ Theorem, which tells how to update or revise beliefs a posteriori, in light of new evidence.

2 Molecular Systematics
A simple example: imagine a box of 100 dice, of which 90% are true and 10% are biased. You pick a die at random and are asked to determine whether it is true or biased. With no other information, you must conclude that the probability of it being biased is 0.1. What if you had additional information?

3 Molecular Systematics
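Roll the die twice and compare the probability of the result under each hypothesis:

$$P[\text{result} \mid \text{true die}] = \left(\tfrac{1}{6}\right)^{2} = \tfrac{1}{36} \approx 0.0278$$

$$P[\text{result} \mid \text{biased die}] = \tfrac{4}{21} \times \tfrac{6}{21} = \tfrac{24}{441} \approx 0.0544$$

The result is more probable under the biased die than under the true die. Bayes’ Theorem combines these likelihoods with the prior probabilities (0.1 and 0.9):

$$P[\text{biased die} \mid \text{results}] = \frac{P[\text{results} \mid \text{biased die}] \times P[\text{biased die}]}{P[\text{results} \mid \text{biased die}] \times P[\text{biased die}] + P[\text{results} \mid \text{true die}] \times P[\text{true die}]}$$

$$P[\text{biased die} \mid \text{results}] = \frac{0.0544 \times 0.1}{0.0544 \times 0.1 + 0.0278 \times 0.9} \approx 0.18$$

P[biased die | results] = 0.18, an increase from the original 0.1. P[true die | results] = 0.82, a decrease from the original 0.9. These are the posterior probabilities that the die you chose is biased or true. You have more information and are able to make a more informed decision.

In Bayesian phylogenetics we replace the dice with trees and attempt to maximize the posterior probability of our final tree through random perturbations of a starting tree:

$$P[\text{tree} \mid \text{data}] = \frac{P[\text{data} \mid \text{tree}] \times P[\text{tree}]}{P[\text{data}]}$$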
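The dice calculation can be checked numerically. The sketch below is only an illustration: it assumes the two rolls were a 4 and a 6 and that the biased die shows face k with probability k/21, which is what the factors 4/21 and 6/21 suggest but the slide does not state outright. The function names are hypothetical.

```python
from fractions import Fraction

# Priors from the box of dice: 10% biased, 90% true.
prior_biased = Fraction(1, 10)
prior_true = Fraction(9, 10)

def likelihood_true(rolls):
    """Probability of the rolls on a fair die: 1/6 per roll."""
    p = Fraction(1)
    for _ in rolls:
        p *= Fraction(1, 6)
    return p

def likelihood_biased(rolls):
    """Probability of the rolls on the biased die.
    Assumption (not stated on the slide): P(face k) = k/21."""
    p = Fraction(1)
    for face in rolls:
        p *= Fraction(face, 21)
    return p

def posterior_biased(rolls):
    """Bayes' Theorem: P[biased | rolls]."""
    numerator = likelihood_biased(rolls) * prior_biased
    denominator = numerator + likelihood_true(rolls) * prior_true
    return numerator / denominator

rolls = (4, 6)  # assumed result of the two rolls
post = posterior_biased(rolls)
print(float(likelihood_true(rolls)))    # about 0.0278
print(float(likelihood_biased(rolls)))  # about 0.0544
print(round(float(post), 2))            # 0.18
print(round(float(1 - post), 2))        # 0.82
```

Using exact fractions avoids rounding the likelihoods before they enter the ratio, so the posterior matches the 0.18 quoted on the slide.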

4 Molecular Systematics
The development that made Bayesian phylogenetic estimation possible is the Markov chain Monte Carlo (MCMC) method. MCMC works by taking a series of steps that form a conceptual chain. At each step, a new location in parameter space is proposed via random perturbation (usually a very small change). The relative posterior probability of the new location is calculated. If the new location has a higher posterior-probability density than the present location of the chain, the move is accepted: the proposed location becomes the next link in the chain and the cycle is repeated. If the proposed location has a lower posterior-probability density, the move is accepted only a proportion (p) of the time, where in the basic Metropolis scheme p is the ratio of the proposed density to the present density. Small steps downward are therefore accepted often, whereas big leaps down are discouraged.

5 Molecular Systematics
If the proposed location is rejected, the present location is added as the next link in the chain. By repeating this procedure millions of times, a long chain of locations in parameter space is created. The proportion of the time that any tree (location) is visited along the course of the chain is an approximation of its posterior probability.
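The accept/reject cycle described on the last two slides can be sketched on a toy problem. This is a minimal illustration, not how any phylogenetics package is implemented: the three "trees" and their unnormalized posterior densities are hypothetical, and the proposal simply picks one of the other two states at random (a symmetric proposal, so the plain Metropolis ratio applies).

```python
import random
from collections import Counter

# Hypothetical unnormalized posterior densities for three candidate "trees".
# Normalized, these correspond to posteriors of 0.6, 0.3 and 0.1.
unnormalized_posterior = {"tree_A": 6.0, "tree_B": 3.0, "tree_C": 1.0}

def metropolis_chain(density, n_steps, seed=0):
    """Run a Metropolis chain over the keys of `density`."""
    rng = random.Random(seed)
    states = list(density)
    current = rng.choice(states)
    chain = []
    for _ in range(n_steps):
        # Propose a new location: any other state, chosen uniformly.
        proposal = rng.choice([s for s in states if s != current])
        ratio = density[proposal] / density[current]
        # Uphill moves are always accepted; downhill moves are
        # accepted with probability equal to the density ratio.
        if ratio >= 1 or rng.random() < ratio:
            current = proposal
        # On rejection the present location is recorded again.
        chain.append(current)
    return chain

chain = metropolis_chain(unnormalized_posterior, n_steps=100_000)
visits = Counter(chain)
estimated = {s: visits[s] / len(chain) for s in unnormalized_posterior}
print(estimated)  # visit proportions, close to 0.6 / 0.3 / 0.1
```

Note that the sampler never needs the normalizing constant P[data]; only ratios of densities enter the acceptance step, which is what makes the method practical for trees.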

6 Molecular Systematics
This method suffers from the same local-optimum problem as most other hill-climbing methods. Bayesian analyses overcome this by running several chains simultaneously, usually four. These four chains occasionally exchange information in an effort to avoid getting trapped on less-than-optimal hills.

7 Molecular Systematics
A MrBayes analysis uses a Metropolis-coupled MCMC (MCMCMC). It begins by proposing eight random trees (two independent sets of four chains each). Within one of these sets, all four chains randomly perturb their trees and recalculate the posterior probabilities. One chain is considered ‘cold’. This is the chain whose posterior probability is actually measured.

8 Molecular Systematics
The other three chains are ‘hot’. They move through a different (but similar) version of the tree space: the peaks and valleys are in the same places, and the difference is in the magnitude of the peak heights. Because ‘drops’ are not as large on the heated chains, they are freer to explore the tree space and less likely to become trapped.
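One common way to flatten the landscape for a heated chain is to raise the posterior density to a power beta between 0 and 1; the slides do not give the formula, so treat that choice as an assumption here. Under it, the sketch below shows how the acceptance probability for the same downhill move grows as the chain gets hotter. The function name and the density values are hypothetical.

```python
def heated_acceptance(current_density, proposed_density, beta=1.0):
    """Metropolis acceptance probability on a chain that targets
    density**beta (assumed heating scheme; beta = 1 is the cold chain)."""
    ratio = (proposed_density / current_density) ** beta
    return min(1.0, ratio)

# Hypothetical move: stepping off a peak (density 1.0) into a valley (0.01).
peak, valley = 1.0, 0.01
for beta in (1.0, 0.5, 0.25):
    # Smaller beta means a hotter chain and a shallower 'drop'.
    print(beta, heated_acceptance(peak, valley, beta))
```

The cold chain accepts this move about 1% of the time, while the hotter chains accept it far more often, which is why they can cross valleys that would trap the cold chain.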

9 Molecular Systematics
All four chains continue, occasionally switching with one another to avoid getting caught on particular hills. Eventually each set of runs will begin to plateau and run out of changes (even chain switches) that can improve the tree (convergence). How do we know when this has been reached? That’s where the second set of chains comes in. Each set should converge on approximately the same tree. The average standard deviation of split frequencies measures how similar the trees sampled by the two sets of chains are; values near zero indicate that the sets agree.
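The convergence diagnostic can be sketched as follows. This is an illustration under stated assumptions, not MrBayes's exact procedure: splits are represented as hypothetical strings, the population standard deviation is used, and splits that are rare in every run are ignored below an assumed cutoff of 0.1.

```python
from statistics import pstdev

def average_sd_of_split_frequencies(run_frequencies, min_frequency=0.1):
    """run_frequencies: one dict per run, mapping split -> sampled frequency.
    Splits missing from a run count as frequency 0.0 in that run."""
    splits = set().union(*run_frequencies)
    deviations = []
    for split in splits:
        freqs = [run.get(split, 0.0) for run in run_frequencies]
        # Assumed cutoff: skip splits that are rare in every run.
        if max(freqs) < min_frequency:
            continue
        deviations.append(pstdev(freqs))
    return sum(deviations) / len(deviations) if deviations else 0.0

# Hypothetical split frequencies from two sets of chains.
run_1 = {"AB|CD": 0.95, "AC|BD": 0.03}
run_2 = {"AB|CD": 0.93, "AC|BD": 0.05}
asdsf = average_sd_of_split_frequencies([run_1, run_2])
print(round(asdsf, 4))  # 0.01
```

Here only the split AB|CD passes the cutoff, and the two runs differ by 0.02 on it, giving a value of 0.01.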

10 Molecular Systematics
Once convergence has occurred, we need to generate a consensus tree. It is important that we don’t include any of the initial (essentially random) trees, only the ones that were obtained after the analysis reached a plateau. The burn-in is the set of early sampled trees that we discard in favor of the (likely) more accurate later trees. The consensus tree is generated from the samples collected after the burn-in.
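Discarding the burn-in and summarizing the remaining samples might look like the sketch below. The assumptions are loud ones: each sampled tree is represented as a set of hypothetical split labels, the burn-in is taken as the first 25% of samples, and the summary is a majority-rule style list of splits rather than a full consensus tree.

```python
from collections import Counter

def discard_burn_in(samples, burn_in_fraction=0.25):
    """Drop the first part of the chain (assumed fraction: 25%)."""
    n_discard = int(len(samples) * burn_in_fraction)
    return samples[n_discard:]

def majority_splits(sampled_trees, threshold=0.5):
    """Return splits found in more than `threshold` of the sampled trees,
    with their frequencies. Each tree is a frozenset of split labels."""
    counts = Counter(split for tree in sampled_trees for split in tree)
    n = len(sampled_trees)
    return {s: c / n for s, c in counts.items() if c / n > threshold}

# Hypothetical chain of 8 sampled trees; the first two are early, random ones.
samples = [
    frozenset({"AD|BC"}), frozenset({"AC|BD"}),
    frozenset({"AB|CD"}), frozenset({"AB|CD"}), frozenset({"AC|BD"}),
    frozenset({"AB|CD"}), frozenset({"AB|CD"}), frozenset({"AB|CD"}),
]
kept = discard_burn_in(samples)
consensus = majority_splits(kept)
print(len(kept))   # 6
print(consensus)   # only AB|CD remains, at frequency 5/6
```

The frequency attached to each retained split is the quantity usually reported as its posterior probability on the consensus tree.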

