Uncorrelated and Autocorrelated Relaxed Phylogenetics
Michaël Defoin-Platel and Alexei Drummond
bioinf.cs.auckland.ac.nz, June 2008

2. (BAYESIAN) RELAXED PHYLOGENETICS
Relaxed phylogenetics allows divergence times to be co-estimated together with the phylogeny.
Compare:
- an unrooted tree with branch lengths b1 to b5 (2n-3 parameters), with
- a rooted tree under a strict clock with node times t0, t1, t2 (n-1 divergence times).
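The two parameter counts can be checked with a minimal Python sketch (the function names are illustrative, not taken from BEAST). For n = 4 taxa it gives the five branch lengths and three node times of the figure; it also adds the 2n-2 branch rates that a relaxed clock introduces, as counted on the UC and AC implementation slides.

```python
def unrooted_branch_lengths(n: int) -> int:
    """Free branch lengths of an unrooted binary tree with n taxa."""
    return 2 * n - 3


def strict_clock_node_times(n: int) -> int:
    """Free divergence times (internal node heights) of a rooted binary
    tree with n contemporaneous tips under a strict clock."""
    return n - 1


def relaxed_clock_parameters(n: int) -> int:
    """Node times plus one rate per branch (2n - 2 branches in a rooted tree)."""
    return strict_clock_node_times(n) + (2 * n - 2)


for n in (4, 36):  # 4 = the figure's example; 36 = taxa in the lemur dataset
    print(n, unrooted_branch_lengths(n), strict_clock_node_times(n),
          relaxed_clock_parameters(n))
```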

3. TIME, SUBSTITUTIONS AND RATES
The expected number of substitutions per site on a particular branch i, of duration T_i, is
$b_i = \int_0^{T_i} R(t)\,dt$
The substitution rate R(t) cannot be observed directly!
→ Only the product of rate and time is identifiable.
→ Without information external to the data, rate and time cannot be separated.
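To make the identifiability point concrete, here is a small Python sketch that assumes the rate is piecewise constant along a branch, so the expected number of substitutions becomes a sum of rate × duration. The rates and durations are hypothetical values.

```python
def expected_substitutions(rates, durations):
    """Expected substitutions per site on one branch when the rate is
    piecewise constant: rates[k] applies for durations[k] time units."""
    return sum(r * d for r, d in zip(rates, durations))


# Hypothetical branch: rate 0.002 for 10 time units, then 0.004 for 5 units.
b = expected_substitutions([0.002, 0.004], [10.0, 5.0])

# Doubling every rate while halving every duration gives the same b,
# so data that only inform b cannot separate rate from time.
b_rescaled = expected_substitutions([0.004, 0.008], [5.0, 2.5])
print(b, b_rescaled)
```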

4. MOLECULAR CLOCK HYPOTHESIS
Molecular Clock Hypothesis (MCH) (Zuckerkandl and Pauling 1965): DNA and protein sequences change at a rate that is constant over time.
First the substitution rate is estimated; time then corresponds to sequence divergence divided by the rate.
→ Estimation of relative rates and relative divergence times.
Calibration: a time reference provides the scaling.
Bayesian phylogenetics: priors on node heights or on tips.
→ Transform relative rates into absolute rates.
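The clock logic (divergence divided by rate, with a calibration turning relative values into absolute ones) can be sketched in Python. This assumes a strict clock, contemporaneous tips and pairwise distances, which accumulate along both lineages (hence the factor 2). The distances are hypothetical; only the 16 MYA human-orangutan calibration is taken from the primate dataset slide.

```python
def rate_from_calibration(pairwise_distance, calibration_time):
    """Strict clock: d = 2 * rate * t, so rate = d / (2 * t)."""
    return pairwise_distance / (2.0 * calibration_time)


def divergence_time(pairwise_distance, rate):
    """Strict clock: t = d / (2 * rate)."""
    return pairwise_distance / (2.0 * rate)


# Hypothetical distances in substitutions per site.
d_human_orangutan = 0.032
d_human_macaque = 0.064

rate = rate_from_calibration(d_human_orangutan, 16.0)   # subst/site/Myr
t_macaque = divergence_time(d_human_macaque, rate)      # Myr
print(rate, t_macaque)
```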

5. MOLECULAR CLOCK HYPOTHESIS
The substitution rate depends on natural selection, population size, body mass, generation time, mutation rate, mutation pattern, …
→ The MCH is often violated!
How to deal with non-clock-like data:
- Keep them!
- Remove them!
- Relax the MCH:
  → allow the rate of evolution to vary
  → make assumptions about the variations

6. RELAXING THE MCH
Modeling the "rate of evolution of the rate of evolution":
- Sanderson "nonparametric" model
- (Random) local clock model
- Uncorrelated relaxed clock model
- Autocorrelated relaxed clock model
- Compound Poisson process
The implementation of relaxed clock models in BEAST allows co-estimation of:
- the substitution parameters
- the clock parameters
- the ancestral phylogenies
- the demography, …
→ Relaxed phylogenetics

7. UNCORRELATED RELAXED CLOCK (UC), Drummond et al. 2006
Hypothesis: the rate of evolution is probably never exactly the same for all evolutionary lineages.
Rates follow a given distribution (prior on rates).
→ The distribution of the rates is given by the hyperparameters μ and σ².
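A minimal Python sketch of this prior, assuming a lognormal rate distribution with log-scale mean μ and variance σ² (one common choice for the UC model); the hyperparameter values are hypothetical. Each branch rate is drawn independently of every other, which is what makes the clock "uncorrelated".

```python
import math
import random


def sample_uc_rates(n_taxa, mu, sigma, rng):
    """One rate per branch of a rooted tree (2n - 2 branches), each drawn
    independently from LogNormal(mu, sigma^2); the tree shape plays no role."""
    return [rng.lognormvariate(mu, sigma) for _ in range(2 * n_taxa - 2)]


rng = random.Random(1)
mu, sigma = math.log(0.003), 0.4   # hypothetical hyperparameters
rates = sample_uc_rates(4, mu, sigma, rng)
print([round(r, 5) for r in rates])
print("prior mean rate:", math.exp(mu + sigma ** 2 / 2))  # mean of a lognormal
```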

8. UNCORRELATED RELAXED CLOCK (UC), Drummond et al. 2006
Implementation: different rates in a tree, but a constant rate per branch.
On a given rooted tree of n species: 2n-2 rates and n-1 divergence times.
The distribution is discretized; each branch of the tree is assigned a rate category.
Category mixing: swapped, drawn (uniform), random walk.
[Figure: rooted tree with node times t0, t1, t2 and branch rates r0 to r5; distribution of the relative rate r]
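The discretization and the three category moves can be sketched in Python. Assumptions: a lognormal base distribution, K equal-probability bins each represented by the quantile at the bin's midpoint probability, and K = 2n-2 categories; these are illustrative choices, not necessarily the exact BEAST scheme. Each move proposes a new branch-to-category assignment; the MCMC acceptance step is not shown.

```python
import math
import random
from statistics import NormalDist


def lognormal_categories(mu, sigma, k):
    """Rates for k equal-probability bins of LogNormal(mu, sigma^2),
    each represented by the quantile at the bin's midpoint probability."""
    std_normal = NormalDist()
    return [math.exp(mu + sigma * std_normal.inv_cdf((i + 0.5) / k))
            for i in range(k)]


def swap_move(cats, k, rng):
    """Exchange the categories of two distinct branches (k unused, kept
    so that all moves share one signature)."""
    new = list(cats)
    i, j = rng.sample(range(len(new)), 2)
    new[i], new[j] = new[j], new[i]
    return new


def uniform_move(cats, k, rng):
    """Redraw one branch's category uniformly from 0..k-1."""
    new = list(cats)
    new[rng.randrange(len(new))] = rng.randrange(k)
    return new


def random_walk_move(cats, k, rng):
    """Shift one branch's category by +/-1, staying within 0..k-1."""
    new = list(cats)
    i = rng.randrange(len(new))
    new[i] = min(k - 1, max(0, new[i] + rng.choice((-1, 1))))
    return new


rng = random.Random(0)
n_taxa = 4
k = 2 * n_taxa - 2
cat_rates = lognormal_categories(0.0, 0.5, k)
assignment = [rng.randrange(k) for _ in range(2 * n_taxa - 2)]
for move in (swap_move, uniform_move, random_walk_move):
    proposal = move(assignment, k, rng)
    print(move.__name__, proposal, [round(cat_rates[c], 3) for c in proposal])
```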

9. AUTOCORRELATED RELAXED CLOCK (AC), Thorne and Kishino 1998, 2001, 2002
Hypothesis: the rate is probably never exactly the same for all evolutionary lineages, and for closely related lineages the rates should be similar.
Prior on rates: the log of the rates follows a Normal distribution, and the expectation of a rate r is its ancestor's rate r_A.
→ The rate at the root node is given by the hyperparameter μ.
→ The amount of variation is given by the hyperparameter σ².
[Figure: rate r on a branch of duration t descending from an ancestor with rate r_A]
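A minimal Python sketch of drawing one branch rate given its ancestor's rate. It assumes the shifted log-normal log r ~ N(log r_A − σ²/2, σ²), which is one way to make the expectation of r equal r_A as the slide requires; r_A and σ² are hypothetical values.

```python
import math
import random


def sample_child_rate(ancestor_rate, sigma2, rng):
    """Draw r with log r ~ Normal(log r_A - sigma2 / 2, sigma2), so E[r] = r_A."""
    mean_log = math.log(ancestor_rate) - sigma2 / 2.0
    return math.exp(rng.gauss(mean_log, math.sqrt(sigma2)))


rng = random.Random(42)
r_A, sigma2 = 0.003, 0.1          # hypothetical ancestor rate and variance
draws = [sample_child_rate(r_A, sigma2, rng) for _ in range(200_000)]
print("Monte Carlo mean:", sum(draws) / len(draws), "target r_A:", r_A)
```

With this shift the mean child rate equals r_A, while the median is slightly lower, at r_A·exp(−σ²/2).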

10. AUTOCORRELATED RELAXED CLOCK (AC), Thorne and Kishino 1998, 2001, 2002
Implementation: different rates in a tree, but a constant rate per branch.
On a given rooted tree of n species: 2n-2 rates and n-1 divergence times.
Episodic vs time-dependent:
- episodic: variance = σ²
- time-dependent: variance = tσ²
[Figure: rooted tree with node times t0, t1, t2 and branch rates r0 to r5]
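Both variants can be sketched in Python by walking down a hypothetical four-taxon tree from the root and drawing each branch rate from its parent's rate, with variance σ² (episodic) or tσ² (time-dependent, t = branch duration). The tree, root rate and σ² are hypothetical, and the log-normal shift that keeps E[r] equal to the parent rate is an assumption.

```python
import math
import random

# Hypothetical rooted tree: branch -> (parent branch or "root", duration t).
# Parents are listed before their children; tips A-D all end at time 0.
TREE = {
    "n1": ("root", 1.0), "n2": ("root", 2.0),
    "A": ("n1", 1.5), "B": ("n1", 1.5),
    "C": ("n2", 0.5), "D": ("n2", 0.5),
}


def autocorrelated_rates(root_rate, sigma2, time_dependent, rng):
    """One rate per branch (2n - 2 = 6 here), each conditioned on its parent's rate."""
    rates = {"root": root_rate}
    for branch, (parent, t) in TREE.items():
        var = sigma2 * t if time_dependent else sigma2   # t*sigma^2 vs sigma^2
        mean_log = math.log(rates[parent]) - var / 2.0   # keeps E[r] = parent rate
        rates[branch] = math.exp(rng.gauss(mean_log, math.sqrt(var)))
    del rates["root"]
    return rates


rng = random.Random(7)
print("episodic:      ", autocorrelated_rates(0.003, 0.1, False, rng))
print("time-dependent:", autocorrelated_rates(0.003, 0.1, True, rng))
```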

11. GOALS OF THIS TALK
- Validation of the models' implementation
- Comparison of models:
  - fit to the data
  - dealing with calibrations
  - estimates of divergence times
  - estimates of rates
  - reconstruction of the tree topology

12. PHYLOGENETIC ANALYSIS
Dataset 1: Lemurs (Yoder et al. 2000)
- 36 species (lemurs + mammalian outgroup)
- alignment of 1,812 nucleotides (2 genes)
- 7 calibration points
Settings:
- HKY substitution model + gamma rate heterogeneity
- Yule tree prior
- 4 independent MCMC runs of 20 million steps for each setting

13. PHYLOGENETIC ANALYSIS
Dataset 2: Primates (Peter Waddell)
- 7 primate species: human, chimp, gorilla, orangutan, gibbon, macaque and marmoset
- alignment of 1,362,261 nucleotides from non-coding regions
- calibration: human-orangutan divergence time of 16 MYA
Settings:
- GTR substitution model + gamma rate heterogeneity + invariant sites
- coalescent or Yule tree prior
- 4 independent MCMC runs of 50 million steps for each setting

14. PHYLOGENETIC ANALYSIS
Dataset 3: Yeast (Rokas et al. 2003)
- 8 yeast species
- alignment of 127,026 nucleotides (106 genes)
- calibration: Normal prior on the root height, N(1, 0.025)
Settings:
- GTR substitution model + gamma rate heterogeneity + invariant sites
- Yule tree prior
- 4 independent MCMC runs of 50 million steps for each setting

15. PHYLOGENETIC ANALYSIS
Dataset 4: Dengue (Rambaut 2000)
- 17 sequences of serotype 4
- alignment of 1,485 nucleotides
- serial sampling (1956-1994)
Settings:
- HKY substitution model
- coalescent tree prior
- 4 independent MCMC runs of 10 million steps for each setting

16. PHYLOGENETIC ANALYSIS
Dataset 5: Influenza A virus (Drummond et al. 2006)
- 69 sequences, each representing a consensus of the viral population
- alignment of 98 nucleotides
- serial sampling (1981-1998)
Settings:
- HKY substitution model + gamma rate heterogeneity
- coalescent tree prior with constant population size
- 4 independent MCMC runs of 20 million steps for each setting

17. MODEL COMPARISON
Bayes factor (Kass and Raftery 1995; Marc Suchard 2005): quantifies the support for one of two competing hypotheses given the observed data.
→ Ratio of the marginal likelihoods of two models M1 and M2:
$BF_{12} = \frac{p(D \mid M_1)}{p(D \mid M_2)}$
→ Bayesian analogue of the likelihood ratio test (LRT).
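Because marginal likelihoods are reported on the log scale, the Bayes factor is computed as a difference of log marginal likelihoods. The Python sketch below does this and maps 2 ln BF onto the evidence categories of Kass and Raftery (1995); the example log marginal likelihoods are hypothetical.

```python
import math


def log_bayes_factor(log_ml_1, log_ml_2):
    """ln BF_12 = ln p(D | M1) - ln p(D | M2)."""
    return log_ml_1 - log_ml_2


def kass_raftery_evidence(log_bf):
    """Evidence for M1 over M2 on the 2 ln BF scale of Kass and Raftery (1995)."""
    two_ln_bf = 2.0 * log_bf
    if two_ln_bf <= 0:
        return "favours M2 (or no difference)"
    if two_ln_bf < 2:
        return "not worth more than a bare mention"
    if two_ln_bf < 6:
        return "positive"
    if two_ln_bf < 10:
        return "strong"
    return "very strong"


ln_bf = log_bayes_factor(-100.0, -104.0)   # hypothetical log marginal likelihoods
print(ln_bf, math.exp(ln_bf), kass_raftery_evidence(ln_bf))
```

Exponentiating is only safe for small differences: a gap of several hundred log units overflows a double once it exceeds about 709, so comparisons are best kept on the log scale.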

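18. MARGINAL LOG LIKELIHOOD
SC = strict clock; UC = uncorrelated relaxed clock; AC and eAC = autocorrelated relaxed clock variants.

| Dataset | SC | UC | AC | eAC |
|---|---|---|---|---|
| Lemurs | -31 524.7 | -31 349.3 | -31 355.4 | -31 352.3 |
| Primates | -3 090 089.90 | -3 089 592.76 | -3 089 591.72 | -3 089 591.37 |
| Yeast | -684 380.8 | -683 754.6 | -683 754.4 | -683 754.6 |
| Dengue | -3 861.7 | -3 861.5 | -3 861.9 | -3 861.7 |
| Influenza | -4 288.8 | -4 263.9 | -4 272.1 | -4 275.7 |

| Dataset | A priori clock-like | Correlated | Calibrations |
|---|---|---|---|
| Lemurs | No | ? | 7 internal (hard) |
| Primates | Nearly | Yes | 1 internal (soft) |
| Yeast | No | ? | root node (soft) |
| Dengue | Yes | | serial sampling |
| Influenza | No | | serial sampling |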
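The table can be read as log Bayes factors by subtracting each strict-clock (SC) value. The Python sketch below copies the values from the table and reports, for each dataset, the model with the highest marginal log likelihood and the ln BF of each relaxed model against SC; it adds no information beyond the table.

```python
LOG_ML = {
    "Lemurs":    {"SC": -31524.7, "UC": -31349.3, "AC": -31355.4, "eAC": -31352.3},
    "Primates":  {"SC": -3090089.90, "UC": -3089592.76,
                  "AC": -3089591.72, "eAC": -3089591.37},
    "Yeast":     {"SC": -684380.8, "UC": -683754.6, "AC": -683754.4, "eAC": -683754.6},
    "Dengue":    {"SC": -3861.7, "UC": -3861.5, "AC": -3861.9, "eAC": -3861.7},
    "Influenza": {"SC": -4288.8, "UC": -4263.9, "AC": -4272.1, "eAC": -4275.7},
}


def summarise(log_ml):
    """Per dataset: (best model, {relaxed model: ln BF against SC})."""
    out = {}
    for dataset, row in log_ml.items():
        best = max(row, key=row.get)
        ln_bf = {m: round(v - row["SC"], 2) for m, v in row.items() if m != "SC"}
        out[dataset] = (best, ln_bf)
    return out


for dataset, (best, ln_bf) in summarise(LOG_ML).items():
    print(f"{dataset:<10} best: {best:<4} ln BF vs SC: {ln_bf}")
```

For Dengue, which is a priori clock-like, the four models lie within 0.4 log units of one another. For the other four datasets every relaxed clock improves on SC by at least 13 log units (Influenza, eAC), while for Primates and Yeast the three relaxed models lie within 1.4 log units of one another.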

19. INFLUENZA DATASET: CONSENSUS TREES
[Figure: consensus trees under the uncorrelated and the autocorrelated relaxed clock]

20. DIVERGENCE TIMES

21. DIVERGENCE TIMES
BEAST: means of the posterior distributions; error bars show the lower and upper bounds of the 95% HPD intervals.
Glazko et al.: error bars show ± one standard error.
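A 95% HPD interval can be computed from MCMC samples as the shortest interval containing 95% of the sorted samples. A minimal Python sketch, assuming a unimodal posterior (for a multimodal one the HPD region need not be a single interval); the gamma-distributed samples are a hypothetical stand-in for the posterior of a divergence time.

```python
import math
import random


def hpd_interval(samples, mass=0.95):
    """Shortest interval covering `mass` of the samples (unimodal posterior assumed)."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n - 1e-9))      # number of samples inside the interval
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def equal_tailed_interval(samples, mass=0.95):
    """For comparison: drop (1 - mass) / 2 of the sorted samples from each tail."""
    xs = sorted(samples)
    n = len(xs)
    cut = int((1 - mass) / 2 * n)
    return xs[cut], xs[n - 1 - cut]


rng = random.Random(3)
times = [rng.gammavariate(2.0, 5.0) for _ in range(10_000)]  # hypothetical skewed posterior
print("mean:", sum(times) / len(times))
print("95% HPD:", hpd_interval(times))
print("95% equal-tailed:", equal_tailed_interval(times))
```

For a skewed posterior the HPD interval is shifted towards the mode and is never wider than the equal-tailed interval covering the same number of samples.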

22. DIVERGENCE TIMES
[Figure: divergence times for human, chimp, gorilla, orangutan, gibbon, macaque and marmoset under the uncorrelated relaxed clock and under the autocorrelated relaxed clock]

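23. RATE OF EVOLUTION

| Dataset | Model | Mean | External | Coefficient of rate variation | Correlation |
|---|---|---|---|---|---|
| Lemurs | SC | 0.00297 | - | - | - |
| Lemurs | UC | 0.00309 | 0.00357 | 0.39 | 0.01 |
| Lemurs | AC | 0.00325 | 0.00419 | 0.37 | 0.88 |
| Lemurs | eAC | 0.00325 | 0.00472 | 0.49 | 0.88 |
| Primates | SC | 0.00095 | - | - | - |
| Primates | UC | 0.00098 | 0.00099 | 0.12 | -0.14 |
| Primates | AC | 0.00105 | 0.00100 | 0.11 | 0.56 |
| Primates | eAC | 0.00104 | 0.00099 | 0.11 | 0.74 |
| Yeast | SC | 1.03 | - | - | - |
| Yeast | UC | 0.87 | 0.83 | 0.46 | -0.13 |
| Yeast | AC | 0.83 | 0.79 | 0.37 | 0.19 |
| Yeast | eAC | 0.90 | 0.98 | 0.44 | 0.33 |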

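24. RATE OF EVOLUTION

| Dataset | Model | Mean | External | Coefficient of rate variation | Correlation |
|---|---|---|---|---|---|
| Dengue | SC | 0.00080 | - | - | - |
| Dengue | UC | 0.00081 | 0.00082 | 0.06 | -0.03 |
| Dengue | AC | 0.00079 | 0.00080 | 0.06 | 0.69 |
| Dengue | eAC | 0.00079 | 0.00081 | 0.05 | 0.69 |
| Influenza | SC | 0.0048 | - | - | - |
| Influenza | UC | 0.0050 | 0.0061 | 0.58 | -0.01 |
| Influenza | AC | 0.0050 | 0.0052 | 0.37 | 0.87 |
| Influenza | eAC | 0.0045 | 0.0052 | 0.38 | 0.89 |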

25. RATE OF EVOLUTION

26. RATE OF EVOLUTION

27. GENE RATE VS SPECIES RATE
[Figure: mean rate per "locus", primates and yeast]

28. NAÏVE MULTIPLE-LOCUS APPROACH
Supermatrix (SM) → genes share the same divergence times.
Multiple locus (mL) → perform a relaxed phylogenetic analysis for each gene.
Marginal log likelihood:

| Dataset | SC | UC | AC | eAC |
|---|---|---|---|---|
| Yeast (SM) | -684 380.8 | -683 754.6 | -683 754.4 | -683 754.6 |
| Yeast (mL) | -672 854.3 | -672 135.5 | -672 115.8 | -672 128.86 |
| Primates (SM) | -3 090 089.90 | -3 089 592.76 | -3 089 591.72 | -3 089 591.37 |
| Primates (mL) | -3 078 315.48 | -3 077 756.50 | -3 077 784.95 | -3 078 136.58 |

29. GENE DIVERGENCE TIMES VS SPECIES DIVERGENCE TIMES

30. GENE DIVERGENCE TIMES VS SPECIES DIVERGENCE TIMES
[Figure: root height in the primates dataset]

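31. GENE RATE VS SPECIES RATE

| Dataset | Model | Coefficient of variation (supermatrix) | Coefficient of variation (multiple locus) | Coefficient of correlation (supermatrix) | Coefficient of correlation (multiple locus) |
|---|---|---|---|---|---|
| Yeast | UC | 0.46 | 0.75 | -0.13 | -0.07 |
| Yeast | AC | 0.37 | 0.71 | 0.19 | 0.39 |
| Yeast | eAC | 0.44 | 0.77 | 0.33 | 0.34 |
| Primates | UC | 0.12 | 0.16 | -0.14 | -0.08 |
| Primates | AC | 0.11 | 0.10 | 0.56 | 0.44 |
| Primates | eAC | 0.11 | 0.03 | 0.74 | 0.49 |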

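32. GENE TREE VS SPECIES TREE

| Dataset | Model | % true tree in 95% credible set | Size of 95% credible set | True tree posterior |
|---|---|---|---|---|
| Yeast | SC | 64.7 | 2.9 | 25.4 |
| Yeast | UC | 92.4 | 24.7 | 20.6 |
| Yeast | AC | 88.6 | 17.8 | 15.7 |
| Yeast | eAC | 88.6 | 15.1 | 19.1 |
| Primates | SC | 86.7 | 1.1 | 79.4 |
| Primates | UC | 87.5 | 1.3 | 75.7 |
| Primates | AC | 87.5 | 1.2 | 77.7 |
| Primates | eAC | 87.5 | 1.1 | 79.1 |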
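Quantities like these can be computed from a posterior sample of tree topologies. A minimal Python sketch, assuming the topologies are already in a canonical string form so that identical trees compare equal, and that the 95% credible set is built by adding the most frequent topologies until their cumulative frequency reaches 95%. How the slide aggregates these values over loci is not stated, so the sketch handles a single sample. The topologies and counts are hypothetical.

```python
from collections import Counter


def credible_set(topologies, mass=0.95):
    """Most frequent topologies, in order, until their cumulative frequency reaches `mass`."""
    counts = Counter(topologies)
    total = len(topologies)
    chosen, cumulative = [], 0.0
    for topology, count in counts.most_common():
        if cumulative >= mass:
            break
        chosen.append(topology)
        cumulative += count / total
    return chosen


def summarise(topologies, true_tree, mass=0.95):
    """(true tree in credible set?, size of credible set, posterior of true tree)."""
    cred = credible_set(topologies, mass)
    posterior = Counter(topologies)[true_tree] / len(topologies)
    return true_tree in cred, len(cred), posterior


# Hypothetical posterior sample of 100 topologies in a canonical Newick form.
sample = (["((A,B),(C,D));"] * 70 + ["((A,C),(B,D));"] * 20
          + ["((A,D),(B,C));"] * 6 + ["(((A,B),C),D);"] * 4)
print(summarise(sample, "((A,B),(C,D));"))
```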

33. GENE TREE VS SPECIES TREE

34. CONCLUSIONS
- Validation of the implementation in BEAST
- Model comparison:
  - fit to the data
  - uncorrelated vs autocorrelated: prior knowledge
  - calibrations
  - estimates of rates: they disagree in the multiple-locus approach
  - reconstruction of the tree topology

35. THANKS

