Lecture 16 – Molecular Clocks

Slides:



Advertisements
Similar presentations
Introduction to molecular dating methods. Principles Ultrametricity: All descendants of any node are equidistant from that node For extant species, branches,
Advertisements

Sampling distributions of alleles under models of neutral evolution.
 Aim in building a phylogenetic tree is to use a knowledge of the characters of organisms to build a tree that reflects the relationships between them.
1 General Phylogenetics Points that will be covered in this presentation Tree TerminologyTree Terminology General Points About Phylogenetic TreesGeneral.
Phylogenetic reconstruction
Molecular Evolution Revised 29/12/06
Lecture 7 – Algorithmic Approaches Justification: Any estimate of a phylogenetic tree has a large variance. Therefore, any tree that we can demonstrate.
The Tree of Life From Ernst Haeckel, 1891.
Ln(7.9* ) –ln(6.2* ) is  2 – distributed with (n-2) degrees of freedom Output from Likelihood Method. Likelihood: 6.2*  = 0.34.
Probabilistic methods for phylogenetic trees (Part 2)
Lecture 13 – Performance of Methods Folks often use the term “reliability” without a very clear definition of what it is. Methods of assessing performance.
Topic : Phylogenetic Reconstruction I. Systematics = Science of biological diversity. Systematics uses taxonomy to reflect phylogeny (evolutionary history).
Phylogenetic trees Sushmita Roy BMI/CS 576
Phylogeny Estimation: Traditional and Bayesian Approaches Molecular Evolution, 2003
Comparative methods: Using trees to study evolution.
Terminology of phylogenetic trees
Molecular phylogenetics
Why Models of Sequence Evolution Matter Number of differences between each pair of taxa vs. genetic distance between those two taxa. The x-axis is a proxy.
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
PAML: Phylogenetic Analysis by Maximum Likelihood Ziheng Yang Depart of Biology University College London
Molecular phylogenetics 4 Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections
Calculating branch lengths from distances. ABC A B C----- a b c.
Lecture 16 – Molecular Clocks Up until recently, studies such as this one relied on sequence evolution to behave in a clock-like fashion, with a uniform.
The star-tree paradox in Bayesian phylogenetics Bengt Autzen Department of Philosophy, Logic and Scientific Method LSE.
Modern Navigation Thomas Herring MW 11:00-12:30 Room
Chapter 10 Phylogenetic Basics. Similarities and divergence between biological sequences are often represented by phylogenetic trees Phylogenetics is.
Selecting Genomes for Reconstruction of Ancestral Genomes Louxin Zhang Department of Mathematics National University of Singapore.
Ayesha M.Khan Spring Phylogenetic Basics 2 One central field in biology is to infer the relation between species. Do they possess a common ancestor?
Lecture 3: MLE, Bayes Learning, and Maximum Entropy
Lecture 15: Reconstruction of Phylogeny Adaptive characters: 1.May indicate derived character (special adaptation) e.g. Raptorial forelegs in mantids 2.May.
Evaluating the Fossil Record with Model Phylogenies Cladistic relationships can be determined without ideas about stratigraphic completeness; implied gaps.
Bioinf.cs.auckland.ac.nz Juin 2008 Uncorrelated and Autocorrelated relaxed phylogenetics Michaël Defoin-Platel and Alexei Drummond.
Comparative methods wrap-up and “key innovations”.
Distance-based methods for phylogenetic tree reconstruction Colin Dewey BMI/CS 576 Fall 2015.
Lecture 19 – Species Tree Estimation
Estimating standard error using bootstrap
Section 2: Modern Systematics
Lecture 15 - Hypothesis Testing
Lecture 14 – Consensus Trees & Nodal Support
From: On the Origin of Darwin's Finches
Phylogenetics LLO9 Maximum Likelihood and Its Applications
Physics 114: Lecture 13 Probability Tests & Linear Fitting
Evolutionary genomics can now be applied beyond ‘model’ organisms
Root Finding Methods Fish 559; Lecture 15 a.
DTC Quantitative Methods Bivariate Analysis: t-tests and Analysis of Variance (ANOVA) Thursday 20th February 2014  
Distance based phylogenetics
Lecture 6B – Optimality Criteria: ML & ME
Inferring a phylogeny is an estimation procedure.
Maximum likelihood (ML) method
Figure 1. Recent estimates of Neoaves phylogeny. a) The Jarvis et al
Section 2: Modern Systematics
Clustering methods Tree building methods for distance-based trees
Models of Sequence Evolution
The Tree of Life From Ernst Haeckel, 1891.
Parsimony is Computationally Intensive
Why Models of Sequence Evolution Matter
18.2 Modern Systematics I. Traditional Systematics
Reading Phylogenetic Trees
Lecture 6B – Optimality Criteria: ML & ME
Chapter 19 Molecular Phylogenetics
The Most General Markov Substitution Model on an Unrooted Tree
Lecture 7 – Algorithmic Approaches
Morphological Phylogenetics in the Genomic Age
Lecture 14 – Consensus Trees & Nodal Support
The Examination of Residuals
Molecular data assisted morphological analyses
Modern Comparative Methods
But what if there is a large amount of homoplasy in the data?
Incorporating uncertainty in distance-matrix phylogenetics
Michael S.Y. Lee, Julien Soubrier, Gregory D. Edgecombe 
Presentation transcript:

Lecture 16 – Molecular Clocks Fossil calibration Up until recently, studies such as this one relied on sequence evolution to behave in a clock-like fashion, with a uniform rate across the topology.

Method of Calibration Matters a Great Deal (dos Reis et al. 2014. Biol. Lett. 10:20131003)

II. Tests of the Molecular Clock. Two identical topologies, and they differ in (2n-3) branch lengths. These trees converge when the following 2 conditions are met: 1) a = b; c = d; a + f = g + c 2) The root occurs along branch e, such that e’’ = e’ + g + c So in the clock tree, we focus on times of (n-1) nodes w, x, y, & z.

A. LRT of the Molecular Clock The clock tree represents the special case (with branch lengths constrained as above) and the non-clock tree represents the general model. d = 2 [ lnL(non-clock) – lnL(clock)] We use the asymptotic c2-approximation, where d.f. equal to the difference in the number of parameters between the two models. In the non-clock tree there are 2n – 3 branch lengths, whereas in the clock tree there are n – 1 node times to estimate. (2n – 3) – (n – 1) = n – 2 Although I’ve only seen it done once (Lanfear et al. 2010. PNAS), one could use AIC or BIC to compare clock to non-clock models in the same manner.

A. LRT of the Molecular Clock The test statistic is: d = 32.34. With 13 d.f., the p-value from the c2-distribution is 0.0021.

A. LRT of the Molecular Clock To conduct the parametric bootstrap, we would use the clock tree as the true tree and simulate data under a molecular clock.  This difference could be the result of either (or both) of two things. It may be that in this case the asymptotic approximation of the c2-distribution doesn’t work very well. It also may be attributable to the fact that this data set mixes intraspecific and interspecific comparisons.

B. Relative-Rates Tests y x m1 m2 Does d(A-E) - d(B-E) = 0 ? Tajima (1993): If a clock holds, the number of sites that show the pattern yxx (m1) should equal the number of sites showing the pattern xyx (m2). (m1 – m2)2 / (m1 + m2) Muse & Weir (1992) developed an RRT that uses LRT and also uses a c2 with one d.f. Does d(A-E) – d(C-E) = 0 ? Does d(B-E) – d(D-E) = 0 ? Does d(A-E) – d(D-E) = 0 ? Does d(C-E) – d(D-E) = 0 ? Does d(B-E) – d(C-E) = 0 ? Does d(A-E) – d(B-E) = 0 ?

III. Estimating Divergence Times in the Absence of a Clock A. Linearized Trees - RRT’s to identify offending species or a clade . B. Local Clocks – Rate shifts may be pretty rare (Yoder & Yang. 2000. MB&E) So here we have three local clocks and we would date divergences, say within the 4-taxon tree using rate 3 (Figure from Welch & Bromham, 2005. TREE, 20:320). We need to locate rate shifts on the tree.

Random Local Clocks Drummond & Suchard (2010. BMCBiol.) developed a RLC model that locates rate changes. The rate of branch k: rk = cr x rpa(k) x fk cr is a rate scaling constant rpa(k) is the rate of the parent branch of k fk is the branch-specific rate multiplier.

C. Autocorrelation of Rates Rates of closely related taxa are expected to be more similar than rates of more distantly related taxa. Non-parametric rate smoothing (Sanderson 1997) - The sum of squared differences in local rates that are minimized, as follows: Since rk = bk/tk, NPRS uses parsimony reconstructions as proxies for bk’s and finds the combination of internal node times (t1, t2, … tn-1) that minimizes W. Fully parametric versions have also been implemented.

D. Uncorrelated Rates A group of approaches have relaxed the assumption of correlated rates. Every branch can have a unique rate, these are drawn from some distribution Uncorrelated LogNormal (UCLN) approach of Drummond et al. (2006. PLoSBiology, 4:699.) rates are drawn from a discretized LogNormal Distribution

models for 1056 mammalian data sets. D. Uncorrelated Rates Distributions other than LogNormal have been proposed: Inverse Gaussian, Exponential Li & Drummond (2012. MB&E) developed an rjMCMC to treat the relaxed clock model as a random variable and assess posterior probabilities of each of these three models for 1056 mammalian data sets. Most of these data sets don’t voice any preference for the models, so the model averaging this allows in divergence-time estimation is potentially important.

E. Inclusion of Fossils with Molecular Data In using fossil dates to set priors on node ages, we need some idea of the topology of the tree, and we need to where the calibration points are. An emerging approach that circumvents this uses RevBayes (Hohna et al. 2016. Syst. Biol. 65:726) to include fossil taxa in the character by taxon matrix. The data then include both morphological and molecular characters and separate models are applied to each data type in a single analysis. One of the huge advances is that the locations of the dated nodes in the phylogeny (i.e., the ones pertaining to the fossils) are estimated from the data, and the uncertainty in these placements is incorporated into divergence time estimates across the chronogram (i.e., dated phylogeny).

Gavryushkina et al. (2016. Syst. Biol. 66:57)

adoption of the relaxed-clock approaches. F. A Final Note of Caution Divergence times are important and interesting, and there seems to be euphoric adoption of the relaxed-clock approaches. Data simulated under a clock, but with biased detection of rate variation.