Incorporating uncertainty in distance-matrix phylogenetics


1 Incorporating uncertainty in distance-matrix phylogenetics
Wally Gilks, Leeds University
Tom Nye, Newcastle University
Pietro Liò, Cambridge University
Isaac Newton Institute, December 17, 2007

2 Distance-based methods
Larger trees
Faster algorithms
Less model-dependent
Genome-scale evolutionary rearrangements

3 Agglomerative distance methods
NJ (Saitou and Nei, 1987)
BioNJ (Gascuel, 1997)
Weighbor (Bruno et al, 2000)
MVR (Gascuel, 2000)
FastME (Desper and Gascuel, 2004)
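
As an illustration of what one agglomerative step looks like, here is a minimal sketch of a single neighbor-joining (NJ) step using the standard Q-criterion. It is not the method presented in these slides. The function names `nj_pick_pair` and `nj_reduce` and the 5-taxon example matrix are assumptions made for illustration only.

```python
import numpy as np

def nj_pick_pair(d):
    """Return the pair (i, j), i < j, that minimises the NJ Q-criterion."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    row_sums = d.sum(axis=1)
    # Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    q = (n - 2) * d - row_sums[:, None] - row_sums[None, :]
    np.fill_diagonal(q, np.inf)  # a taxon cannot be joined with itself
    i, j = np.unravel_index(np.argmin(q), q.shape)
    return (int(min(i, j)), int(max(i, j)))

def nj_reduce(d, i, j):
    """Join i and j into a new node u.

    Returns (reduced matrix with u as row/column 0, branch i-u, branch j-u).
    Requires at least 3 taxa.
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    row_sums = d.sum(axis=1)
    # Standard NJ branch-length estimates for the two joined taxa
    li = 0.5 * d[i, j] + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    lj = d[i, j] - li
    keep = [k for k in range(n) if k not in (i, j)]
    # Distance from the new node u to every remaining taxon
    du = 0.5 * (d[i, keep] + d[j, keep] - d[i, j])
    new = np.zeros((n - 1, n - 1))
    new[0, 1:] = du
    new[1:, 0] = du
    new[1:, 1:] = d[np.ix_(keep, keep)]
    return new, li, lj

# Hypothetical example: 5 taxa with additive distances
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
i, j = nj_pick_pair(D)
D_next, branch_i, branch_j = nj_reduce(D, i, j)
```

The reduced matrix `D_next` plays the role of the input to the next stage; repeating the two calls until three nodes remain gives the usual NJ loop.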

4 Variance models
Independent distances
Ordinary Least Squares (OLS)
Weighted Least Squares (WLS)
NJ, Weighbor, FastME
Correlated distances
shared evolutionary paths (Chakraborty, 1977)
computed from shared sequences: BioNJ
induced by estimation process (we show)
Generalised Least Squares (GLS)
Hasegawa (1985), Bulmer (1991), MVR
[Figure: tree diagrams with leaves labelled A, B, C]
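
The relationship between the three least-squares variants can be sketched in a few lines: GLS weights residuals by the inverse of the full covariance matrix V of the distances, WLS is the special case where V is diagonal, and OLS the case where V is the identity. The sketch below is generic textbook GLS, not the authors' variance model; the function name `gls_fit`, the quartet design matrix `X`, and the covariance `V` are all assumptions chosen for illustration.

```python
import numpy as np

def gls_fit(X, d, V):
    """Generalised least squares: minimise (d - X b)' V^{-1} (d - X b)."""
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    V = np.asarray(V, dtype=float)
    # Solve linear systems instead of forming V^{-1} explicitly
    Vinv_X = np.linalg.solve(V, X)
    Vinv_d = np.linalg.solve(V, d)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_d)

# Hypothetical quartet ((A,B),(C,D)): columns are branches a, b, c, d and
# the internal branch e; rows are the distances AB, AC, AD, BC, BD, CD.
X = np.array([[1, 1, 0, 0, 0],
              [1, 0, 1, 0, 1],
              [1, 0, 0, 1, 1],
              [0, 1, 1, 0, 1],
              [0, 1, 0, 1, 1],
              [0, 0, 1, 1, 0]], dtype=float)
branches_true = np.array([1.0, 2.0, 3.0, 4.0, 0.5])
dist = X @ branches_true                  # exact (noise-free) distances

V = 0.5 * np.eye(6) + 0.5 * np.ones((6, 6))   # assumed correlated covariance
branches_gls = gls_fit(X, dist, V)            # GLS
branches_ols = gls_fit(X, dist, np.eye(6))    # OLS as the special case V = I
```

With noise-free additive distances, every choice of positive-definite V recovers the same branch lengths; the variants differ only once the distances carry estimation error.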

5 Two types of tree
Ultrametric time tree (axis: Time (mya))
Non-ultrametric divergence tree (axis: Divergence)
Divergence = “true distance” = integrated rate of evolution = path length
[Figure callout: more evolution]
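
The distinction between the two tree types can be checked directly on a distance matrix. The sketch below uses the standard three-point condition (ultrametric) and four-point condition (additive, i.e. path lengths on some tree); it assumes the input is already a symmetric metric with zero diagonal. The function names and both example matrices are hypothetical.

```python
from itertools import combinations

def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple the two largest distances are equal."""
    n = len(d)
    for i, j, k in combinations(range(n), 3):
        _, mid, top = sorted((d[i][j], d[i][k], d[j][k]))
        if top - mid > tol:
            return False
    return True

def is_additive(d, tol=1e-9):
    """Four-point condition: of the three pairwise sums, the two largest are equal."""
    n = len(d)
    for i, j, k, l in combinations(range(n), 4):
        sums = sorted((d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]))
        if sums[2] - sums[1] > tol:
            return False
    return True

# Hypothetical time tree ((A:1,B:1):2,(C:2,D:2):1): all leaves at equal depth
time_tree = [[0, 2, 6, 6],
             [2, 0, 6, 6],
             [6, 6, 0, 4],
             [6, 6, 4, 0]]

# Hypothetical divergence tree: same topology, unequal branch lengths
# (a=1, b=3, c=2, d=2, internal branch e=1)
divergence_tree = [[0, 4, 4, 4],
                   [4, 0, 6, 6],
                   [4, 6, 0, 4],
                   [4, 6, 4, 0]]
```

Here `time_tree` passes both checks, while `divergence_tree` is additive but not ultrametric, which is the situation the slide describes.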

6 Which tree type to assume?
Ultrametric tree makes stronger assumptions
Different methods for estimating each type
But both types are in principle correct!
Our method coherently integrates both types
Produces rooted tree, no need for outgroup

7 An agglomerative stage
[Figure: time tree (axis: Time (mya)) and divergence tree (axis: Divergence), each with nodes A, B, C, D, E]

8 Divergence additivity
[Figure: divergence tree with nodes A, B, C, D, E; the slide's equations were not preserved in the transcript]
... and for X = C, D, …
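
One plausible reading of additivity on this slide, assuming E denotes the node at which A and B are joined, is that the divergence between two nodes equals the sum of the divergences along the path connecting them. The symbol δ is introduced here for illustration and is not taken from the slides.

```latex
\begin{align*}
\delta_{AB} &= \delta_{AE} + \delta_{BE},\\
\delta_{AX} &= \delta_{AE} + \delta_{EX},\\
\delta_{BX} &= \delta_{BE} + \delta_{EX}
  \qquad \text{for } X = C, D, \dots
\end{align*}
% Adding the last two lines and subtracting the first gives
\[
\delta_{EX} = \tfrac{1}{2}\left(\delta_{AX} + \delta_{BX} - \delta_{AB}\right).
\]
```

Under this reading, the divergence from the new node E to every other node X follows from the divergences already known for A and B.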

9 Distances are estimated divergences
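Regression model
[Figure: divergence tree with nodes A, B, C, D, E; the regression equations were not preserved in the transcript]
Callouts: parameters; mean zero
... and for X = C, D, …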

10 Divergences are distorted times
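Random effects model
[Figure: time tree with leaves A, B, C, D, E (axis: Time (mya)); the model equation was not preserved in the transcript]
Callouts: parameter; mean zero; uncorrelated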

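11 Variance assumptions
[Variance formulas with callouts; the formulas were not preserved in the transcript]
Callouts: variance parameters; controls noise; controls distortion; function of clade A structure; clade A size; shared node A; elapsed time
Chakraborty (1977), Nei et al (1985), Bulmer (1991)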

12 Estimation
Time tree and divergence tree are estimated simultaneously by GLS (Hasegawa, 1985; Bulmer, 1991)
Always choose the most recent agglomeration
Estimated divergences become the distances for the next stage
Variance formula accommodates estimation-induced correlations

13 Notes
Can estimate variance parameters σ² and ν
Computationally efficient algorithm
same time-complexity as BioNJ
we call it StatTree

14 Simulations
16 taxa, unbalanced topology, 100 simulations
Mean topological correctness (StatTree / BioNJ):

          ν=1%         ν=5%         ν=10%
σ=5%      95% / 83%    89% / 81%    85% / 77%
σ=10%     72% / 50%    71% / 48%    67% / 53%
σ=20%     44% / 28%    45% / 26%    43% / (BioNJ value missing)

