Published by Marianne Sundström. Modified over 5 years ago.
1
Incorporating uncertainty in distance-matrix phylogenetics
Wally Gilks (Leeds University), Tom Nye (Newcastle University), Pietro Liò (Cambridge University)
Isaac Newton Institute, December 17, 2007
2
Distance-based methods
- Larger trees
- Faster algorithms
- Less model-dependent
- Genome-scale evolutionary rearrangements
3
Agglomerative distance methods
- NJ (Saitou and Nei, 1987)
- BioNJ (Gascuel, 1997)
- Weighbor (Bruno et al., 2000)
- MVR (Gascuel, 2000)
- FastME (Desper and Gascuel, 2004)
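As a rough illustration of how an agglomerative distance method picks the next pair to join, here is a minimal sketch of the standard neighbour-joining Q-criterion (Saitou and Nei, 1987), Q(i, j) = (n - 2) d(i, j) - Σₖ d(i, k) - Σₖ d(j, k). The function name `nj_pick_pair` and the 4-taxon matrix are hypothetical; the other methods listed use their own selection and update rules, which are not shown.

```python
import numpy as np


def nj_pick_pair(d) -> tuple[int, int]:
    """Return the pair (i, j), i < j, that minimises the neighbour-joining Q-criterion."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    r = d.sum(axis=1)                          # r[i] = sum over k of d[i, k]
    q = (n - 2) * d - r[:, None] - r[None, :]
    np.fill_diagonal(q, np.inf)                # never join a taxon with itself
    i, j = np.unravel_index(np.argmin(q), q.shape)
    return int(min(i, j)), int(max(i, j))


# Hypothetical additive distances for the tree ((A,B),(C,D)), taxa ordered A, B, C, D.
d = [[0, 3, 5, 5],
     [3, 0, 6, 6],
     [5, 6, 0, 2],
     [5, 6, 2, 0]]
print(nj_pick_pair(d))  # (0, 1)
```

On a 4-taxon tree both cherries, (A, B) and (C, D), tie on Q; `argmin` returns the first one it meets.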
4
Variance models
Independent distances:
- Ordinary Least Squares (OLS)
- Weighted Least Squares (WLS)
- NJ, Weighbor, FastME
Correlated distances:
- shared evolutionary paths (Chakraborty, 1977)
- computed from shared sequences: BioNJ
- induced by the estimation process (we show)
- Generalised Least Squares (GLS): Hasegawa (1985), Bulmer (1991), MVR
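The idea that distances sharing evolutionary paths are correlated can be made concrete with a common simplified model, assumed here rather than taken from the slides: each branch contributes independent noise with variance proportional to its length, so Cov(d_xy, d_zw) = σ² × (total length of the branches shared by the two paths). The tree, branch lengths, `sigma2` and the function name are hypothetical.

```python
import numpy as np

# Hypothetical unrooted tree ((A,B),(C,D)): branch name -> branch length.
EDGE_LENGTHS = {"a": 1.0, "b": 2.0, "m": 3.0, "c": 1.0, "d": 1.0}
# Branches lying on the path between each pair of taxa.
PATHS = {
    ("A", "B"): {"a", "b"},
    ("A", "C"): {"a", "m", "c"},
    ("A", "D"): {"a", "m", "d"},
    ("B", "C"): {"b", "m", "c"},
    ("B", "D"): {"b", "m", "d"},
    ("C", "D"): {"c", "d"},
}


def shared_path_covariance(paths, lengths, sigma2):
    """Covariance matrix of pairwise distances under the shared-branch noise model."""
    pairs = list(paths)
    cov = np.zeros((len(pairs), len(pairs)))
    for i, p in enumerate(pairs):
        for j, q in enumerate(pairs):
            shared = paths[p] & paths[q]       # branches common to both paths
            cov[i, j] = sigma2 * sum(lengths[e] for e in shared)
    return pairs, cov


pairs, cov = shared_path_covariance(PATHS, EDGE_LENGTHS, sigma2=0.1)
```

Here AC and BC share branches m and c, so they covary; AB and CD share no branch and are uncorrelated under this model.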
5
Two types of tree
- Ultrametric time tree
- Non-ultrametric divergence tree
[Figure: time tree (axis: Time, mya) beside divergence tree (axis: Divergence; longer paths mean more evolution)]
Divergence = "true distance" = integrated rate of evolution = path length
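The identity "divergence = path length" can be sketched directly: store a rooted tree as a parent map with the length of the branch above each node, then sum branch lengths from both nodes up to their most recent common ancestor (MRCA). The data layout and the names `parent`, `branch` and `path_length` are hypothetical illustrations.

```python
def path_length(parent, branch, x, y):
    """Divergence between nodes x and y: total branch length on the path joining them.

    parent[v] is the parent of v; branch[v] is the length of the branch above v.
    """
    # Cumulative distance from x to each of its ancestors (including x itself).
    up = {x: 0.0}
    node, total = x, 0.0
    while node in parent:
        total += branch[node]
        node = parent[node]
        up[node] = total
    # Climb from y until reaching an ancestor of x: that node is the MRCA.
    node, total = y, 0.0
    while node not in up:
        total += branch[node]
        node = parent[node]
    return total + up[node]


# Hypothetical rooted divergence tree ((A:1, B:2)U:3, C:4)R.
parent = {"A": "U", "B": "U", "U": "R", "C": "R"}
branch = {"A": 1.0, "B": 2.0, "U": 3.0, "C": 4.0}
print(path_length(parent, branch, "A", "B"))  # 3.0
print(path_length(parent, branch, "A", "C"))  # 8.0
```

This example tree is non-ultrametric: the root-to-tip path lengths are 4 for A, 5 for B and 4 for C.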
6
Which tree type to assume?
- An ultrametric tree makes stronger assumptions.
- Different methods are used to estimate each type.
- But both types are, in principle, correct!
- Our method coherently integrates both types.
- It produces a rooted tree, so no outgroup is needed.
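One way to see why the ultrametric assumption is stronger: distances on a time tree (twice the time back to the MRCA, when all taxa are sampled at the present) must satisfy the three-point condition, under which the two largest of any three pairwise distances are equal. A divergence tree need not. A minimal check, with a hypothetical function name and hypothetical data:

```python
import itertools

import numpy as np


def is_ultrametric(d, tol=1e-9):
    """True if every triple of taxa satisfies the three-point (ultrametric) condition."""
    d = np.asarray(d, dtype=float)
    for i, j, k in itertools.combinations(range(d.shape[0]), 3):
        _, second, largest = sorted((d[i, j], d[i, k], d[j, k]))
        if largest - second > tol:  # the two largest distances must coincide
            return False
    return True


# Hypothetical time tree: A and B split 1 mya, C split 3 mya (distance = 2 x time).
time_distances = [[0, 2, 6], [2, 0, 6], [6, 6, 0]]
# Path lengths on a non-ultrametric divergence tree over the same taxa.
divergences = [[0, 3, 8], [3, 0, 9], [8, 9, 0]]
print(is_ultrametric(time_distances))  # True
print(is_ultrametric(divergences))     # False
```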
7
An agglomerative stage
[Figure: time tree (axis: Time, mya) and divergence tree (axis: Divergence), each over taxa A, B, C, D, E]
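To illustrate what a single agglomerative stage on a time tree looks like, the sketch below uses plain UPGMA, which is not StatTree: it merges the closest pair (on ultrametric data, the most recent split), records the merge height as half that distance, and replaces the pair by a size-weighted average distance for the next stage. StatTree instead estimates both trees jointly by GLS; the function name and data here are hypothetical.

```python
import numpy as np


def upgma(d, labels):
    """Plain UPGMA. Returns merges as (cluster_1, cluster_2, height), most recent first."""
    d = np.asarray(d, dtype=float).copy()
    clusters, sizes = list(labels), [1] * len(labels)
    merges = []
    while len(clusters) > 1:
        # Mask the diagonal so a cluster is never merged with itself.
        masked = d + np.diag(np.full(len(clusters), np.inf))
        i, j = sorted(np.unravel_index(np.argmin(masked), masked.shape))
        merges.append((clusters[i], clusters[j], d[i, j] / 2))
        # Size-weighted average distance from the merged cluster to every cluster.
        new_row = (sizes[i] * d[i] + sizes[j] * d[j]) / (sizes[i] + sizes[j])
        keep = [k for k in range(len(clusters)) if k not in (i, j)]
        nxt = np.zeros((len(keep) + 1, len(keep) + 1))
        nxt[:-1, :-1] = d[np.ix_(keep, keep)]
        nxt[-1, :-1] = nxt[:-1, -1] = new_row[keep]
        d = nxt
        clusters = [clusters[k] for k in keep] + [f"({clusters[i]},{clusters[j]})"]
        sizes = [sizes[k] for k in keep] + [sizes[i] + sizes[j]]
    return merges


d = [[0, 2, 8, 8],
     [2, 0, 8, 8],
     [8, 8, 0, 3],
     [8, 8, 3, 0]]
for a, b, height in upgma(d, ["A", "B", "C", "D"]):
    print(a, b, height)
# A B 1.0
# C D 1.5
# (A,B) (C,D) 4.0
```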
8
Divergence additivity
[Figure: divergence tree over taxa A, B, C, D, E]
… and for X = C, D, …
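Additivity is what makes an agglomerative step solvable. If A and B are joined at a new node U, then d(A, X) - d(B, X) = δ(A, U) - δ(B, U) for every other taxon X, and d(A, B) = δ(A, U) + δ(B, U). The sketch below solves for the two new branch lengths (averaging the difference over X, as neighbour joining does) and returns δ(U, X), the divergences from the new node. The δ notation and the function name `join_cherry` are hypothetical, not the slide's own.

```python
import numpy as np


def join_cherry(d, a, b):
    """Branch lengths to the new node U joining taxa a and b, plus U's divergences to the rest.

    Assumes d is (approximately) additive on a tree in which a and b form a cherry.
    """
    d = np.asarray(d, dtype=float)
    others = [x for x in range(d.shape[0]) if x not in (a, b)]
    diff = np.mean(d[a, others] - d[b, others])  # estimates delta(a,U) - delta(b,U)
    delta_aU = (d[a, b] + diff) / 2
    delta_bU = d[a, b] - delta_aU
    delta_UX = d[a, others] - delta_aU           # additivity: d(a,X) = delta(a,U) + delta(U,X)
    return float(delta_aU), float(delta_bU), dict(zip(others, delta_UX.tolist()))


# Hypothetical additive distances for ((A:1, B:2):3, (C:1, D:1)), taxa ordered A, B, C, D.
d = [[0, 3, 5, 5],
     [3, 0, 6, 6],
     [5, 6, 0, 2],
     [5, 6, 2, 0]]
print(join_cherry(d, 0, 1))  # (1.0, 2.0, {2: 4.0, 3: 4.0})
```

On exactly additive data every X gives the same difference; on noisy distances the unweighted mean is only one possible choice.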
9
Distances are estimated divergences
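Regression model: distance = divergence + error, where the error has mean zero; this also holds for X = C, D, …. The divergences are the parameters.
[Figure: divergence tree over taxa A, B, C, D, E]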
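Under this regression view, with a fixed topology each divergence is a sum of branch lengths, so the distances follow a linear model d = Xβ + ε, where X marks which branches lie on each path and β holds the branch lengths. A minimal ordinary-least-squares fit, using a hypothetical 4-taxon tree and simulated mean-zero errors (the noise level 0.1 is arbitrary):

```python
import numpy as np

# Hypothetical unrooted tree ((A,B),(C,D)) with branches a, b, m (internal), c, d.
PAIRS = ["AB", "AC", "AD", "BC", "BD", "CD"]
# X[p, e] = 1 if branch e lies on the path between the taxa of pair p.
X = np.array([
    [1, 1, 0, 0, 0],  # AB: a, b
    [1, 0, 1, 1, 0],  # AC: a, m, c
    [1, 0, 1, 0, 1],  # AD: a, m, d
    [0, 1, 1, 1, 0],  # BC: b, m, c
    [0, 1, 1, 0, 1],  # BD: b, m, d
    [0, 0, 0, 1, 1],  # CD: c, d
], dtype=float)


def ols_branch_lengths(distances):
    """Ordinary least squares estimate of the branch lengths (the model parameters)."""
    beta, *_ = np.linalg.lstsq(X, distances, rcond=None)
    return beta


true_lengths = np.array([1.0, 2.0, 3.0, 1.0, 1.0])
divergences = X @ true_lengths                                   # exact path lengths
rng = np.random.default_rng(0)
distances = divergences + rng.normal(0.0, 0.1, size=len(PAIRS))  # mean-zero errors
print(ols_branch_lengths(distances))                             # close to [1, 2, 3, 1, 1]
```

OLS treats the six errors as independent with equal variance, which is exactly the assumption that correlated-distance models relax.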
10
Divergences are distorted times
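[Figure: time tree over taxa A, B, C, D, E (axis: Time, mya)]
Random effects model: each divergence is a distorted time. The times on the time tree are the parameters; the random distortions have mean zero and are uncorrelated.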
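A toy simulation of "divergences are distorted times", under assumptions that are ours rather than the slides': each branch's divergence equals its elapsed time (evolutionary rate fixed at 1) plus a random effect with mean zero, independent across branches, with hypothetical variance `nu**2 * time`. Averaged over many draws the divergences recover the times, and the branches are uncorrelated. All names and values are hypothetical.

```python
import numpy as np


def distort_times(times, nu, rng):
    """One draw of branch divergences: elapsed time plus a mean-zero random effect.

    HYPOTHETICAL variance model: Var(effect) = nu**2 * elapsed time, independent per branch.
    """
    times = np.asarray(times, dtype=float)
    return times + rng.normal(0.0, nu * np.sqrt(times))


rng = np.random.default_rng(1)
branch_times = np.array([1.0, 1.0, 2.0, 3.0])      # elapsed times on a time tree (mya)
draws = np.array([distort_times(branch_times, 0.1, rng) for _ in range(10_000)])
print(draws.mean(axis=0))                          # approximately branch_times
```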
11
Variance assumptions
- Variance parameters: one controls noise, the other controls distortion
- Variances are a function of clade A structure: clade A size, shared node A, elapsed time
- Chakraborty (1977), Nei et al. (1985), Bulmer (1991)
12
Estimation
- The time tree and the divergence tree are estimated simultaneously by GLS (Hasegawa, 1985; Bulmer, 1991).
- Always choose the most recent agglomeration.
- Estimated divergences become the distances for the next stage.
- The variance formula accommodates estimation-induced correlations.
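The generalised least squares estimator is β̂ = (XᵀV⁻¹X)⁻¹XᵀV⁻¹d, where V is the covariance matrix of the distances. The sketch below applies it to a hypothetical 4-taxon tree; the covariance combines a shared-path term (σ² times shared branch length) with a small independent term `tau2`, added here only to keep V invertible. These modelling choices are assumptions for illustration, not the StatTree variance formula.

```python
import numpy as np

# Hypothetical tree ((A,B),(C,D)); rows AB, AC, AD, BC, BD, CD; columns a, b, m, c, d.
X = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 1, 1],
], dtype=float)


def gls(X, y, V):
    """Generalised least squares: solve (X^T V^-1 X) beta = X^T V^-1 y."""
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)


lengths = np.array([1.0, 2.0, 3.0, 1.0, 1.0])
sigma2, tau2 = 0.01, 0.001                    # hypothetical variance parameters
# Entry (p, q) of the first term = sigma2 * total length of branches shared by paths p and q.
V = sigma2 * X @ np.diag(lengths) @ X.T + tau2 * np.eye(len(X))
rng = np.random.default_rng(2)
y = rng.multivariate_normal(X @ lengths, V)   # correlated noisy distances
print(gls(X, y, V))                           # estimated branch lengths
```

With V equal to the identity, the same function reduces to ordinary least squares.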
13
Notes
- The variance parameters σ² and ν can be estimated.
- Computationally efficient algorithm, with the same time complexity as BioNJ; we call it StatTree.
14
Simulations: 16 taxa, unbalanced topology, 100 simulations
Mean topological correctness (StatTree / BioNJ)
| | ν = 1% | ν = 5% | ν = 10% |
|---|---|---|---|
| σ = 5% | 95% / 83% | 89% / 81% | 85% / 77% |
| σ = 10% | 72% / 50% | 71% / 48% | 67% / 53% |
| σ = 20% | 44% / 28% | 45% / 26% | 43% / n/a |