Coalescent Consequences for Consensus Cladograms
16 September 2007
J. H. Degnan (1), M. DeGiorgio (2), D. Bryant (3), and N. A. Rosenberg (1,2)
(1) Dept. of Human Genetics, U. of Michigan
(2) Bioinformatics Program, U. of Michigan
(3) Dept. of Mathematics, U. of Auckland

Outline
- Species trees vs. gene trees
- Consensus tree background
- Asymptotic consensus trees
- Finite sample consensus trees
- Consistency results
- Conclusions

Gene trees vary across the genome

Why? Incomplete lineage sorting, horizontal gene transfer, sampling, etc.

Gene tree discordance
- From one true species tree, we expect different gene trees at different loci as a result of lineage sorting, independently of problems due to estimation or sampling error.
- Gene tree discordance depends especially on the branch lengths in the species tree, measured in generations scaled by effective population size, t / (2N).
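To make the dependence on branch length concrete, here is a minimal sketch for the simplest case, a three-species tree ((A,B),C) with one sampled lineage per species. It uses the standard coalescent result that the two lineages from A and B fail to coalesce along the internal branch with probability exp(-T), where T = t / (2N) as on the slide. The function name and arguments are illustrative assumptions, not something taken from the talk.

```python
import math

def three_taxon_gene_tree_probs(t_generations, n_effective):
    """Gene tree topology probabilities for species tree ((A,B),C).

    Assumes one lineage per species and an internal branch of
    t_generations generations, scaled as T = t / (2N) (the slide's scaling).
    Returns (P(gene tree matches species tree), P(each discordant topology)).
    """
    T = t_generations / (2 * n_effective)
    p_no_coalescence = math.exp(-T)  # A and B lineages both reach the root population
    # If they do not coalesce on the internal branch, the three lineages
    # coalesce in random order, so each topology gets 1/3 of that mass.
    p_match = 1 - (2 / 3) * p_no_coalescence
    p_each_discordant = p_no_coalescence / 3
    return p_match, p_each_discordant

# Shorter internal branches (relative to 2N) give more discordance.
for t in (1_000, 10_000, 100_000):
    p_match, p_disc = three_taxon_gene_tree_probs(t, n_effective=10_000)
    print(f"t={t:>7}: P(match)={p_match:.3f}, P(each discordant)={p_disc:.3f}")
```

As the internal branch shrinks toward zero, all three topologies approach probability 1/3, which is why short branches are the difficult case for any method that summarizes gene trees.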

Consensus (majority-rule)

Asymptotic consensus trees
- Consensus trees are usually statistics, that is, functions of the data, like the sample mean x-bar.
- We consider replacing the observed (estimated) gene tree proportions with their theoretical probabilities under the coalescent and determining the resulting consensus tree.
- Motivation: if there are a large number of independent loci, observed clade proportions should approximate their theoretical probabilities.

Types of consensus trees
- Strict: only clades that appear in every observed tree are in the consensus tree. In the coalescent model, all clades have probability > 0, so conflicting clades are expected to appear as the number of loci grows.
- Democratic vote: use the gene tree that occurs most frequently.
- Majority rule: the consensus tree has all clades that were observed in > 50% of trees.
- Greedy: sort clades by their proportions. Accept the most frequently observed clades one at a time, provided each is compatible with the clades already accepted. Continue until the tree is fully resolved.
- R*: for each set of 3 taxa, find the most commonly occurring rooted triple, e.g., (AB)C, (AC)B, or (BC)A. Build the tree from the most commonly occurring triples.
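The majority-rule and greedy rules above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: rooted trees are represented as sets of nontrivial clades, the function names are invented for this example, the gene tree proportions are made-up numbers, and ties between equally frequent clades are broken arbitrarily.

```python
from collections import defaultdict

def clade_proportions(tree_freqs):
    """Sum the frequency of every clade over all gene trees containing it."""
    props = defaultdict(float)
    for clades, freq in tree_freqs.items():
        for clade in clades:
            props[clade] += freq
    return dict(props)

def compatible(a, b):
    """Two clades can share a rooted tree if they are nested or disjoint."""
    return a.isdisjoint(b) or a <= b or b <= a

def majority_rule(props):
    """Keep every clade found in more than 50% of the trees."""
    return {clade for clade, p in props.items() if p > 0.5}

def greedy(props, n_taxa):
    """Accept clades in decreasing order of frequency while they stay compatible."""
    accepted = []
    for clade, _ in sorted(props.items(), key=lambda kv: -kv[1]):
        if all(compatible(clade, other) for other in accepted):
            accepted.append(clade)
        if len(accepted) == n_taxa - 2:  # a rooted binary tree has n - 2 nontrivial clades
            break
    return set(accepted)

# Hypothetical gene tree proportions on taxa A, B, C, D (illustrative only).
tree_freqs = {
    frozenset({frozenset("AB"), frozenset("ABC")}): 0.30,  # ((A,B),C),D
    frozenset({frozenset("AB"), frozenset("CD")}): 0.25,   # (A,B),(C,D)
    frozenset({frozenset("BC"), frozenset("BCD")}): 0.25,  # ((B,C),D),A
    frozenset({frozenset("AC"), frozenset("ACD")}): 0.20,  # ((A,C),D),B
}
props = clade_proportions(tree_freqs)
print("majority rule:", [sorted(c) for c in majority_rule(props)])
print("greedy:       ", sorted(sorted(c) for c in greedy(props, n_taxa=4)))
```

With these numbers only the clade {A,B} passes the 50% threshold, so the majority-rule tree is partly unresolved, while the greedy rule goes on to accept {A,B,C} at 30% support and returns a fully resolved tree.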

Unresolved zone for majority-rule and too-greedy zone

What about finite samples?
- If you sample 10 loci, you could have:
  - All 10 match the species tree
  - 9 match the species tree, 1 disagrees
  - 8 match the species tree, 2 disagree, etc.
- You can treat gene trees as categories and use multinomial probabilities for the probability of your sample.
- By enumerating all multinomial samples, you can compute the probability of every possible consensus tree.
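The enumeration described above can be sketched for the smallest case, a three-species tree ((A,B),C), where there are only three gene tree topologies and each carries a single nontrivial clade. The sketch assumes the standard three-taxon coalescent probabilities (matching topology 1 - (2/3)exp(-T), each discordant topology (1/3)exp(-T), with T the internal branch length in coalescent units); the function name and the "unresolved" label are choices made for this example.

```python
import math
from collections import defaultdict

def majority_rule_distribution(n_loci, T):
    """Probability of each majority-rule consensus tree from n_loci independent loci.

    Assumes species tree ((A,B),C) with internal branch length T in
    coalescent units and one lineage sampled per species.
    """
    p_disc = math.exp(-T) / 3
    topologies = ["(AB)C", "(AC)B", "(BC)A"]
    probs = [1 - 2 * p_disc, p_disc, p_disc]

    result = defaultdict(float)
    # Enumerate every way to split n_loci among the three topologies.
    for k1 in range(n_loci + 1):
        for k2 in range(n_loci - k1 + 1):
            counts = (k1, k2, n_loci - k1 - k2)
            coefficient = math.factorial(n_loci)
            for k in counts:
                coefficient //= math.factorial(k)
            sample_prob = float(coefficient)
            for k, p in zip(counts, probs):
                sample_prob *= p ** k
            # Majority rule: a clade is kept only if it is in > 50% of the loci.
            consensus = "unresolved"
            for k, name in zip(counts, topologies):
                if 2 * k > n_loci:
                    consensus = name
            result[consensus] += sample_prob
    return dict(result)

for name, p in sorted(majority_rule_distribution(n_loci=10, T=1.0).items()):
    print(f"{name:>10}: {p:.4f}")
```

For larger trees the same idea applies, but the number of gene tree categories and of multinomial samples grows quickly, so full enumeration is only practical for small numbers of taxa and loci.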

Are consensus trees inconsistent estimators of species trees?
- Theorem 1. Majority-rule asymptotic consensus trees (MACTs) do not have any clades not on the species tree.
- Theorem 2. Greedy asymptotic consensus trees (GACTs) can be misleading estimators of species trees for the 4-taxon asymmetric tree and for any species tree with n > 4 species.
- Theorem 3. R* asymptotic consensus trees (RACTs) always match the species tree.
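One way to see why Theorem 3 is plausible is the standard three-taxon coalescent result, stated here as background rather than as the authors' proof. For any three species whose relationship in the species tree is (AB)C, let T > 0 be the length, in coalescent units, of the path between the ancestor of A and B and the ancestor of all three (the symbol T is notation introduced for this note). With one lineage sampled per species:

```latex
P\big((AB)C\big) = 1 - \tfrac{2}{3}e^{-T},
\qquad
P\big((AC)B\big) = P\big((BC)A\big) = \tfrac{1}{3}e^{-T},
\qquad
1 - \tfrac{2}{3}e^{-T} > \tfrac{1}{3}e^{-T}
\iff e^{-T} < 1 \iff T > 0 .
```

So for every set of three taxa the most probable rooted triple is the one displayed by the species tree, and these are the triples from which the R* tree is built.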

Conclusions
- Coalescent gene tree probabilities are useful for understanding the asymptotic behavior of consensus trees constructed from independent gene trees.
- R* consensus trees are consistent and more resolved than majority-rule consensus trees.
- Greedy consensus trees can be misleading, but are quicker to approach the species tree than majority-rule or R* when outside of the too-greedy zone.