bioinf.cs.auckland.ac.nz, June 2008. Uncorrelated and Autocorrelated relaxed phylogenetics. Michaël Defoin-Platel and Alexei Drummond.

Slide 2: (Bayesian) RELAXED PHYLOGENETICS
Relaxed phylogenetics allows divergence times to be co-estimated together with the phylogenetic reconstruction. It should be compared with:
  Unrooted tree (2n-3 parameters): branch lengths b1, b2, b3, b4, b5
  Rooted tree with a strict clock (n-1 divergence times): t0, t1, t2 on the time axis

Slide 3: TIME, SUBSTITUTIONS, and RATES
Expected number of substitutions per site on a particular branch i
[Figure: branch i on a time axis running from 0 to T]
The substitution rate R(t) cannot be directly observed!
  → Only the product of rate and time is identifiable
  → Without information external to the data, rate and time cannot be separated
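To illustrate why only the product of rate and time is identifiable, here is a minimal sketch. It assumes a Jukes-Cantor (JC69) substitution model and a rate held constant along the branch; both are assumptions made for illustration, since the slide does not name a model. The function name `jc69_prob_same` is hypothetical.

```python
import math

def jc69_prob_same(rate, time):
    """Probability that a site carries the same nucleotide at both ends of a
    branch under JC69, for a constant rate (substitutions/site/unit time)."""
    branch_length = rate * time  # expected substitutions per site
    return 0.25 + 0.75 * math.exp(-4.0 * branch_length / 3.0)

# Two different (rate, time) pairs with the same product (illustrative values).
slow_and_long = jc69_prob_same(rate=0.01, time=10.0)
fast_and_short = jc69_prob_same(rate=0.02, time=5.0)

# The sequence data depend on rate * time only, so the two scenarios
# give the same site probabilities and cannot be told apart.
print(math.isclose(slow_and_long, fast_and_short))
```

Any likelihood built from such probabilities is flat along the curve where rate times time is constant, which is why external information such as a calibration is needed.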

Slide 4: MOLECULAR CLOCK HYPOTHESIS
Molecular Clock Hypothesis (MCH) (Zuckerkandl and Pauling 1965)
  DNA and protein sequences change at a rate that is constant over time
  First the substitution rate is estimated; time then corresponds to sequence divergence divided by the rate
  → Estimation of relative rates and relative divergence times
Calibration
  Time reference, scaling
  Bayesian phylogenetics: priors on node heights or on tips
  → Transforms relative rates into absolute rates
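The calibration step can be sketched as simple arithmetic under a strict clock. In this sketch, "divergence" is taken to be the node-to-tip distance in substitutions per site, which is an assumption made to avoid the factor of two that pairwise distances would need. All numbers and function names are hypothetical.

```python
def calibrate_rate(height_subs, age):
    """Absolute rate implied by a node of known age (strict clock)."""
    return height_subs / age

def date_node(height_subs, rate):
    """Age of a node: divergence divided by the rate."""
    return height_subs / rate

# Hypothetical calibration: a node 0.05 substitutions/site above the tips,
# known from external evidence to be 10 million years old.
rate = calibrate_rate(height_subs=0.05, age=10.0)

# Date an uncalibrated node that sits 0.02 substitutions/site above the tips.
other_age = date_node(height_subs=0.02, rate=rate)
print(rate, other_age)
```

Without the calibration, only the ratio of the two node heights (here 0.4) would be recoverable, which is the sense in which the clock alone yields relative times.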

Slide 5: MOLECULAR CLOCK HYPOTHESIS
The substitution rate depends on natural selection, population size, body mass, generation time, mutation rate, mutation pattern, …
  → The MCH is often violated!
How to deal with non-clock-like data
  Keep them!
  Remove them!
  Relax the MCH
    → Allow the rate of evolution to vary
    → Make assumptions about the variation

Slide 6: RELAXING THE MCH
Modeling the “rate of evolution of the rate of evolution”
  Sanderson “nonparametric” model
  (Random) local clock model
  Uncorrelated relaxed clock model
  Autocorrelated relaxed clock model
  Compound Poisson process
The implementation of relaxed clock models in BEAST allows co-estimation of
  the substitution parameters
  the clock parameters
  the ancestral phylogenies
  the demography
  …
  → Relaxed phylogenetics

Slide 7: UNCORRELATED RELAXED CLOCK (UC)
Drummond et al 2006
Hypothesis
  The rate of evolution is probably never exactly the same for all evolutionary lineages
  Rates follow a given distribution
Prior on rates
  → The distribution of the rates is given by the hyperparameters μ and σ² or …
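As a rough illustration of the uncorrelated prior, the sketch below draws one independent rate per branch from a lognormal distribution. The choice of a lognormal and its parameterization (mean `mu` and standard deviation `sigma` of the log rate) are assumptions made for this example, and `sample_uncorrelated_rates` is a hypothetical name, not a BEAST function.

```python
import math
import random

def sample_uncorrelated_rates(n_species, mu, sigma, seed=None):
    """Draw one rate per branch of a rooted tree with n_species tips.

    mu and sigma are the mean and standard deviation of log(rate)
    (assumed parameterization). Rates are independent of each other,
    so neighbouring branches carry no information about one another.
    """
    rng = random.Random(seed)
    n_branches = 2 * n_species - 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]

# Illustrative values only: median rate 0.01, moderate spread.
rates = sample_uncorrelated_rates(n_species=4, mu=math.log(0.01), sigma=0.5, seed=1)
print(len(rates))
```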

Slide 8: UNCORRELATED RELAXED CLOCK (UC)
Drummond et al 2006
Implementation
  Different rates in a tree, but a constant rate per branch
  On a given rooted tree of n species: 2n-2 rates, n-1 divergence times
  The distribution is discretized
  Each branch of the tree is assigned a given rate category
  Category mixing: swapped, drawn (uniform), random walk
[Figure: rooted tree with divergence times t0, t1, t2 on the time axis and branch rates r0 to r5; inset axis labelled "relative rate r"]
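The discretization and the "swapped" move can be sketched as follows. This assumes a lognormal rate distribution, as many categories as branches, and category rates placed at the quantiles (k + 0.5) / K; these are assumptions made for illustration rather than details stated on the slide. Both function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def discretized_lognormal_rates(n_categories, mu, sigma):
    """Rate of category k is the lognormal quantile at (k + 0.5) / n_categories."""
    std_normal = NormalDist()
    return [
        math.exp(mu + sigma * std_normal.inv_cdf((k + 0.5) / n_categories))
        for k in range(n_categories)
    ]

def swap_categories(assignment, rng):
    """Propose a new assignment by swapping the categories of two branches."""
    proposal = list(assignment)
    i, j = rng.sample(range(len(proposal)), 2)
    proposal[i], proposal[j] = proposal[j], proposal[i]
    return proposal

n_species = 4
n_branches = 2 * n_species - 2  # 2n-2 rates on a rooted tree

# mu = 0 gives a median relative rate of 1 (illustrative values).
category_rates = discretized_lognormal_rates(n_branches, mu=0.0, sigma=0.5)

rng = random.Random(0)
assignment = list(range(n_branches))  # branch index -> category index
rng.shuffle(assignment)

proposal = swap_categories(assignment, rng)
branch_rates = [category_rates[c] for c in proposal]
print(branch_rates)
```

Because the move only permutes category labels, the set of rates in the tree stays fixed; only which branch carries which rate changes.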

Slide 9: AUTOCORRELATED RELAXED CLOCK (AC)
Thorne and Kishino 1998, 2001, 2002
Hypothesis
  The rate is probably never exactly the same for all evolutionary lineages
  For closely related lineages the rates should be similar
Prior on rates
  The log of the rates follows a Normal distribution
  The expectation of a rate r is its ancestor's rate rA
  → The rate at the root node is given by the hyperparameter μ
  → The amount of variation is given by the hyperparameter σ²
[Figure: ancestor rate rA and descendant rate r separated by time t]
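The two statements about the prior (log rates are Normal, and the expected rate equals the ancestor's rate) can be combined into one formula. What follows is a reconstruction from the slide's wording, not the authors' own equation; the symbol v is a placeholder introduced here for the variance of the log rate.

```latex
\log r \mid r_A \;\sim\; \mathcal{N}\!\left(\log r_A - \tfrac{v}{2},\; v\right)
\quad\Longrightarrow\quad
\mathbb{E}[\,r \mid r_A\,]
  = \exp\!\left(\log r_A - \tfrac{v}{2} + \tfrac{v}{2}\right)
  = r_A
```

The shift of v/2 in the mean of the log rate is what makes the expectation of the rate itself, rather than of its logarithm, equal to the ancestral rate.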

Slide 10: AUTOCORRELATED RELAXED CLOCK (AC)
Thorne and Kishino 1998, 2001, 2002
Implementation
  Different rates in a tree, but a constant rate per branch
  On a given rooted tree of n species: 2n-2 rates, n-1 divergence times
  Episodic vs time dependent
    Episodic: variance = σ²
    Time dependent: variance = t σ²
[Figure: rooted tree with divergence times t0, t1, t2 on the time axis and branch rates r0 to r5]
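The difference between the episodic and time-dependent variants can be sketched as a single sampling step from ancestor to descendant. The mean shift of minus half the variance is an assumption added here so that the expected rate equals the ancestor's rate; `sample_child_rate` and all numeric values are hypothetical.

```python
import math
import random

def sample_child_rate(parent_rate, branch_time, sigma2, time_dependent, rng):
    """Draw a branch rate given the rate of its ancestor.

    The variance of log(rate) is sigma2 in the episodic variant and
    branch_time * sigma2 in the time-dependent variant. Subtracting
    variance / 2 from the log mean keeps E[rate] equal to parent_rate.
    """
    variance = branch_time * sigma2 if time_dependent else sigma2
    mean_log = math.log(parent_rate) - variance / 2.0
    return rng.lognormvariate(mean_log, math.sqrt(variance))

rng = random.Random(42)
root_rate = 0.01  # illustrative value

# Many draws for one branch of duration 2.0 under the time-dependent variant.
draws = [
    sample_child_rate(root_rate, branch_time=2.0, sigma2=0.1,
                      time_dependent=True, rng=rng)
    for _ in range(20000)
]
mean_rate = sum(draws) / len(draws)
print(mean_rate)
```

Under the time-dependent variant, long branches allow the rate to drift further from the ancestral value, whereas the episodic variant applies the same amount of change per branch regardless of its duration.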

Slide 11: GOALS of this TALK
Validation of the implementation of the models
Comparison of models
  Fit the data
  Deal with calibrations
  Estimate of divergence times
  Estimate of rates
  Reconstruct the tree topology

Slide 12: PHYLOGENETIC ANALYSIS
Dataset 1: Lemurs (Yoder et al 2000)
  36 species (lemurs + mammal outgroup)
  Alignment of 1812 nucleotides (2 genes)
  7 calibration points
Settings
  HKY substitution model + gamma rate heterogeneity
  Yule tree prior
  4 independent runs of 20 M MCMC steps for each setting

Slide 13: PHYLOGENETIC ANALYSIS
Dataset 2: Primates (Peter Waddell)
  7 species of primates: human, chimp, gorilla, orangutan, gibbon, macaque and marmoset
  Alignment of 1,362,261 nucleotides
  Non-coding regions
  Calibration: 16 MYA divergence time of human and orangutan
Settings
  GTR substitution model + gamma rate heterogeneity + invariant sites
  Coalescent or Yule tree prior
  4 independent runs of 50 M MCMC steps for each setting

Slide 14: PHYLOGENETIC ANALYSIS
Dataset 3: Yeast (Rokas et al 2003)
  8 species of yeast
  Alignment of 127,026 nucleotides (106 genes)
  Calibration: Normal prior on the root height, N(1, 0.025)
Settings
  GTR substitution model + gamma rate heterogeneity + invariant sites
  Yule tree prior
  4 independent runs of 50 M MCMC steps for each setting

Slide 15: PHYLOGENETIC ANALYSIS
Dataset 4: Dengue (Rambaut 2000)
  17 serotype 4 sequences
  Alignment of 1,485 nucleotides
  Serial sampling ( )
Settings
  HKY substitution model
  Coalescent tree prior
  4 independent runs of 10 M MCMC steps for each setting

Slide 16: PHYLOGENETIC ANALYSIS
Dataset 5: Influenza A virus (Drummond et al 2006)
  69 sequences; each sequence represents a consensus of the viral population
  Alignment of 98 nucleotides
  Serial sampling ( )
Settings
  HKY substitution model + gamma rate heterogeneity
  Coalescent tree prior, constant population size
  4 independent runs of 20 M MCMC steps for each setting

Slide 17: MODEL COMPARISON
Bayes Factor (Kass and Raftery 1995, Marc Suchard 2005)
  Quantifies the relative support for two competing hypotheses given the observed data
  → Ratio of the marginal likelihoods of two models M1 and M2
  → Bayesian analogue of the likelihood ratio test (LRT)
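Since marginal likelihoods are usually reported on the log scale, the Bayes factor is most easily computed as a difference of logs. The sketch below does that and maps the result onto the evidence categories of Kass and Raftery (1995), which are expressed on the 2 ln(BF) scale. The marginal log likelihood values are hypothetical, as are the function names.

```python
def log_bayes_factor(log_ml_1, log_ml_2):
    """ln BF_12 = ln P(D | M1) - ln P(D | M2)."""
    return log_ml_1 - log_ml_2

def kass_raftery_label(log_bf):
    """Evidence for M1 over M2 on the 2 ln(BF) scale of Kass and Raftery (1995)."""
    two_ln_bf = 2.0 * log_bf
    if two_ln_bf < 0:
        return "favours M2"
    if two_ln_bf <= 2:
        return "not worth more than a bare mention"
    if two_ln_bf <= 6:
        return "positive"
    if two_ln_bf <= 10:
        return "strong"
    return "very strong"

# Hypothetical marginal log likelihoods for two clock models.
log_ml_uc = -5000.0
log_ml_sc = -5004.0

log_bf = log_bayes_factor(log_ml_uc, log_ml_sc)
print(log_bf, kass_raftery_label(log_bf))
```

Working with differences of logs avoids exponentiating very negative numbers, which would underflow for alignments of realistic size.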

Slide 18: MARGINAL LOG LIKELIHOOD
[Table: marginal log likelihood under SC, UC, AC and eAC for Lemurs, Primates, Yeast, Dengue and Influenza]

A priori
            Clock-like   Correlated   Calibrations
Lemurs      No           ?            7 internal (hard)
Primates    Nearly       Yes          1 internal (soft)
Yeast       No           ?            root node (soft)
Dengue      Yes                       Serial sampling
Influenza   No                        Serial sampling

Slide 19: Influenza dataset
[Figure: consensus trees under the Uncorrelated and Autocorrelated models]

Slide 20: DIVERGENCE TIMES

Slide 21: DIVERGENCE TIMES
BEAST: mean of the posterior distributions; error bars are the lower and upper bounds of the 95% HPD intervals
Glazko et al: error bars are +/- standard error

Slide 22: DIVERGENCE TIMES
[Figure: dated trees of Human, Chimp, Gorilla, Orang, Gibbon, Macaque and Marmoset under the Uncorrelated Relaxed Clock and the Autocorrelated Relaxed Clock]

Slide 23: RATE OF EVOLUTION
[Table with header "Mean External Coefficient of Rate Variation Correlation"; rows SC, UC, AC, eAC for each of Lemurs, Primates and Yeast]

Slide 24: RATE OF EVOLUTION
[Table with header "Mean External Coefficient of Rate Variation Correlation"; rows SC, UC, AC, eAC for each of Dengue and Influenza]

Slide 25: RATE OF EVOLUTION

Slide 26: RATE OF EVOLUTION

Slide 27: GENES RATE VS SPECIES RATE
[Figure: mean rate per “locus” for Primates and Yeast]

Slide 28: NAÏVE MULTIPLE LOCUS APPROACH
Super Matrix (SM)
  → Genes share the same divergence times
Multiple Locus (mL)
  → Perform a relaxed phylogenetic analysis for each “gene”
[Table: SC, UC, AC, eAC for Yeast (SM), Yeast (mL), Primates (SM) and Primates (mL)]

Slide 29: GENES DIVERGENCE TIMES VS SPECIES DIVERGENCE TIMES

Slide 30: GENES DIVERGENCE TIMES VS SPECIES DIVERGENCE TIMES
[Figure: root height in the primates dataset]

Slide 31: GENES RATE VS SPECIES RATE
[Table: Coefficient of Variation and Coefficient of Correlation, each under Super Matrix and Multiple Locus; rows UC, AC, eAC for each of Yeast and Primates]

Slide 32: GENES TREE VS SPECIES TREE
[Table with header "% True Tree in / Size of / True Tree / 95% Cred Set / Posterior"; rows SC, UC, AC, eAC for each of Yeast and Primates]

Slide 33: GENES TREE VS SPECIES TREE

Slide 34: Conclusions
Validation of the implementation in BEAST
Model comparison
  Fit the data
    Uncorrelated vs Autocorrelated: prior knowledge
  Calibrations
  Estimate of rates
    Disagree in the multiple locus approach
  Reconstruct the tree topology

Slide 35: THANKS