Estimating the reliability of a tree

Estimating the reliability of a tree: reconstructed phylogenetic trees are almost certainly wrong. They are estimates of the true tree. But how reliable are they?

Reliability: most of the time, 'reliability' refers to the topology, not to branch lengths. Reliability = the probability that the members of a given clade are always members of that clade.

Methods: phylogeneticists use different methods to test the reliability of trees: bootstrapping, jackknife, permutation tests, and likelihood ratio tests ((a)LRT).

Bootstrapping: bootstrapping uses random sampling with replacement to obtain properties of an estimator.

[Figure: the sample is resampled many times, the estimator x is recomputed on each resample, and the resulting values of x form a frequency distribution (f).]
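The idea of resampling with replacement can be sketched in a few lines of Python. This is only an illustration: the function name `bootstrap_distribution`, the number of replicates and the measurements in `data` are assumptions made for the example and do not come from the slides.

```python
import random
import statistics

def bootstrap_distribution(sample, estimator, n_replicates=1000, seed=0):
    """Return the estimator computed on n_replicates resamples of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(n_replicates):
        # Draw n observations WITH replacement from the original sample.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    return replicates

# Hypothetical measurements, used only for illustration.
data = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4]
means = bootstrap_distribution(data, statistics.mean)

# The spread of the bootstrap replicates estimates the standard error of the mean.
standard_error = statistics.stdev(means)
print(f"mean = {statistics.mean(data):.2f}, bootstrap SE = {standard_error:.2f}")
```

The list `means` plays the role of the frequency distribution of the estimator: its spread is one "property of the estimator" that the bootstrap makes available.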

Bootstrapping: in phylogenetic bootstrapping, the alignment is resampled.

Original alignment (rows are sequences, columns are sites; the slide labels the tenth site 0):

site  1 2 3 4 5 6 7 8 9 0
      A A T G G G A T T T
      C C C G G G G C C G
      A A T G G A G A T T
      T A T G G C G G T T
      T G T G G G C A T T

Pseudo alignment (sites drawn with replacement from the original):

site  1 4 8 3 6 0 2 4 9 5
      A G T T G T A G T G
      C G C C G G C G C G
      A G A T A T A G T G
      T G G T C T A G T G
      T G A T G T G G T G
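Resampling an alignment means drawing whole columns (sites) with replacement until the pseudo alignment is as long as the original. A minimal sketch follows; the five sequences are one reading of the slide's figure, and the function names `take_columns` and `bootstrap_alignment` are hypothetical helpers introduced for this example.

```python
import random

def take_columns(alignment, columns):
    """Build a new alignment from the given 0-based column indices."""
    return ["".join(seq[c] for c in columns) for seq in alignment]

def bootstrap_alignment(alignment, seed=None):
    """Draw as many columns as the alignment has, with replacement."""
    rng = random.Random(seed)
    n_sites = len(alignment[0])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return columns, take_columns(alignment, columns)

# Alignment as read from the slide (assumption: rows are the five sequences).
alignment = [
    "AATGGGATTT",
    "CCCGGGGCCG",
    "AATGGAGATT",
    "TATGGCGGTT",
    "TGTGGGCATT",
]

columns, pseudo = bootstrap_alignment(alignment, seed=1)
print("sampled sites:", [c + 1 for c in columns])
for row in pseudo:
    print(row)
```

Each pseudo alignment is then analysed with the same tree-building method as the original alignment, which yields one bootstrapped tree per replicate.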

Bootstrapping [Figure: the original tree, with leaves in the order A, B, C, D, E, F, next to a bootstrapped tree with leaves in the order A, C, B, D, E, F.]

Bootstrapping [Figure: the original tree next to several bootstrapped trees, one per pseudo alignment.]
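Once a set of bootstrapped trees is available, the support for a clade is usually reported as the fraction of bootstrapped trees that contain it. The sketch below assumes each tree is represented as a set of clades, with each clade a `frozenset` of leaf names. The trees and the function name `clade_support` are hypothetical and are not taken from the slide's figure.

```python
def clade_support(reference_clades, bootstrap_trees):
    """Fraction of bootstrapped trees that contain each reference clade."""
    n_trees = len(bootstrap_trees)
    return {
        clade: sum(clade in tree for tree in bootstrap_trees) / n_trees
        for clade in reference_clades
    }

# Clades of a hypothetical original tree on the leaves A to F.
reference_clades = [frozenset("BC"), frozenset("ABC"), frozenset("EF")]

# Four hypothetical bootstrapped trees, each given as its set of clades.
bootstrap_trees = [
    {frozenset("BC"), frozenset("ABC"), frozenset("EF")},
    {frozenset("AC"), frozenset("ABC"), frozenset("EF")},
    {frozenset("BC"), frozenset("ABC"), frozenset("DE")},
    {frozenset("BC"), frozenset("ABC"), frozenset("EF")},
]

support = clade_support(reference_clades, bootstrap_trees)
for clade, value in support.items():
    print("".join(sorted(clade)), value)
```

In practice hundreds or thousands of replicates are used rather than four; the small number here only keeps the example readable.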

Jackknife methods: the jackknife procedure uses random sampling without replacement to obtain properties of an estimator.
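For an alignment, sampling without replacement amounts to keeping a random subset of the columns, so no site can appear twice. The sketch below is one way to do this. The `keep_fraction` of 0.5, the function name `jackknife_alignment` and the example sequences are assumptions made for illustration; the slides do not state which fraction of sites is retained.

```python
import random

def jackknife_alignment(alignment, keep_fraction=0.5, seed=None):
    """Keep a random subset of columns, drawn WITHOUT replacement."""
    rng = random.Random(seed)
    n_sites = len(alignment[0])
    n_keep = max(1, round(n_sites * keep_fraction))
    # random.sample never returns the same column twice.
    kept = sorted(rng.sample(range(n_sites), n_keep))
    return kept, ["".join(seq[c] for c in kept) for seq in alignment]

# Hypothetical five-sequence alignment, for illustration only.
alignment = [
    "AATGGGATTT",
    "CCCGGGGCCG",
    "AATGGAGATT",
    "TATGGCGGTT",
    "TGTGGGCATT",
]

kept, reduced = jackknife_alignment(alignment, keep_fraction=0.5, seed=2)
print("kept sites:", [c + 1 for c in kept])
for row in reduced:
    print(row)
```

Unlike a bootstrap pseudo alignment, the jackknifed alignment is shorter than the original, because every retained site occurs exactly once.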

Permutation methods: permutation tests are standard in non-parametric statistics. They reorder the data to obtain a null distribution.

Permutation methods [Example, observed data: group 1 with N=18 and mean x=20; group 2 with N=10 and mean x=25; difference = 5.]

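Permutation methods [Example, after one permutation of the observations between the groups: group 1 with N=18 and mean x=23; group 2 with N=10 and mean x=19.6; difference = 3.4.]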

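Permutation methods [Figure: frequency distribution (f) of the differences obtained from the permutations, with the 5% largest and the 5% smallest differences marked and the actual difference indicated.]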
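A permutation test for a difference between two group means can be sketched as follows. The observations are pooled, reshuffled between the groups many times, and the actual difference is compared with the differences obtained by chance. The data values, the function name `permutation_test` and the choice of a two-sided p-value are assumptions for this example; the slides give only group sizes and means.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reorder the pooled data, then split it into groups of the original sizes.
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add 1 to both counts so the observed arrangement is included.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical observations, for illustration only.
group_a = [18, 21, 19, 22, 20, 17, 23, 20]
group_b = [24, 26, 23, 27, 25]

observed, p_value = permutation_test(group_a, group_b)
print(f"observed difference = {observed:.2f}, p = {p_value:.4f}")
```

A small p-value means the actual difference lies in the tails of the permutation distribution, which is the situation marked in the figure above.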

Permutation methods: in phylogenetics, species can be permuted within characters.

[Figure: the alignment of species 1 to 5. The states in each column (character) are reshuffled among the species, one column after another, until every column has been permuted.]
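Permuting species within characters can be sketched by shuffling each alignment column independently. This keeps the composition of every character but destroys any signal shared between characters. The function name `permute_within_characters` and the example sequences are assumptions made for illustration.

```python
import random

def permute_within_characters(alignment, seed=None):
    """Shuffle the states of every column independently among the species."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*alignment)]
    for col in columns:
        rng.shuffle(col)
    # Transpose back so that each row is again one species.
    return ["".join(row) for row in zip(*columns)]

# Hypothetical alignment of five species, for illustration only.
alignment = [
    "AATGGGATTT",
    "CCCGGGGCCG",
    "AATGGAGATT",
    "TATGGCGGTT",
    "TGTGGGCATT",
]

permuted = permute_within_characters(alignment, seed=4)
for species, row in enumerate(permuted, start=1):
    print(f"species {species}: {row}")
```

Trees built from many permuted alignments give a null distribution against which a statistic from the original alignment (tree length, for example) can be compared.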

Likelihood ratio tests [Figure: a tree with leaves A to L, grouped into four subtrees X (ABCDEF), Y (GHI), W (J) and Z (KL) around the internal branch under test.]

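Likelihood ratio tests: standard likelihood ratio tests compare trees with and without the branch. [Figure: the tree with the branch between the subtrees X, Y and W, Z has likelihood $L_1$; the tree without the branch has likelihood $L_0$.] Support for the branch is measured by the test statistic $2(\ln L_1 - \ln L_0)$: the larger the statistic, the stronger the evidence that the branch exists.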
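The statistic is straightforward to compute from two log-likelihoods. The sketch below also converts it to a tail probability under a chi-square distribution with one degree of freedom. That reference distribution is an assumption of this example and is not stated on the slide; some treatments halve this tail probability because a branch length of zero lies on the boundary of the parameter space. The log-likelihood values are hypothetical.

```python
import math

def lrt_statistic(ln_l1, ln_l0):
    """2 * (ln L1 - ln L0) for the trees with and without the branch."""
    return 2.0 * (ln_l1 - ln_l0)

def chi2_1df_tail(x):
    """P(chi-square with 1 degree of freedom > x), via the complementary error function."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))

# Hypothetical log-likelihoods, for illustration only.
ln_l1 = -1000.0   # tree with the branch
ln_l0 = -1002.0   # tree without the branch

statistic = lrt_statistic(ln_l1, ln_l0)
print(f"statistic = {statistic:.2f}, tail probability = {chi2_1df_tail(statistic):.4f}")
```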

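Approximate likelihood ratio test: aLRT is fast, accurate and powerful. [Figure: the three possible arrangements of the subtrees X, Y, W and Z around the branch, with likelihoods $L_1$, $L_2$ and $L_3$.] The approximate test statistic for the branch is $2(\ln L_1 - \ln L_2)$, where $L_1$ is the likelihood of the best arrangement and $L_2$ that of the second best.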
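Given the log-likelihoods of the three arrangements around a branch, the aLRT statistic compares the best with the second best. A minimal sketch, with hypothetical log-likelihood values and a hypothetical function name `alrt_statistic`:

```python
def alrt_statistic(log_likelihoods):
    """2 * (best - second best) over the three arrangements around a branch."""
    if len(log_likelihoods) != 3:
        raise ValueError("expected one log-likelihood per arrangement (three in total)")
    best, second_best = sorted(log_likelihoods, reverse=True)[:2]
    return 2.0 * (best - second_best)

# Hypothetical log-likelihoods of the three arrangements, for illustration only.
arrangements = {
    "XY|WZ": -1000.0,
    "XZ|WY": -1003.5,
    "XW|YZ": -1006.0,
}

statistic = alrt_statistic(list(arrangements.values()))
print(f"aLRT statistic = {statistic:.2f}")
```

Only the arrangements around the branch are evaluated, so no resampling of the alignment and no repeated tree search is needed, which is consistent with the slide's claim that the test is fast.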