Quantifying uncertainty in species discovery with approximate Bayesian computation (ABC): single samples and recent radiations. Mike Hickerson, University of California, Berkeley.

Presentation transcript:

Quantifying uncertainty in species discovery with approximate Bayesian computation (ABC): single samples and recent radiations. Mike Hickerson, Chris Meyer, Craig Moritz. University of California, Berkeley; Museum of Vertebrate Zoology.

Outline: 1. Introduction - Species discovery. 2. Potential problems - Simulations. 3. Potential problems - Empirical data. 4. Potential statistical solutions.

New specimen in the field

Match new specimen’s DNA “barcode” to voucher specimens with barcodes in database

Organizes an enormous flood of data

Proposed genetic thresholds for discovery (comparing the sample to its closest sister taxon in the reference database): 1. Hebert’s 10X rule: between-species divergence must be > 10 times the average within-species divergence. 2. Reciprocal monophyly.
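
The 10X rule on this slide can be illustrated with a minimal sketch. It assumes aligned, equal-length barcode sequences and uses the uncorrected proportion of differing sites as the divergence measure; the function names `p_distance` and `passes_10x_rule` are hypothetical and are not taken from any barcoding software.

```python
from itertools import combinations
from statistics import mean


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return diffs / len(seq_a)


def passes_10x_rule(query: str, reference: list[str], factor: float = 10.0) -> bool:
    """Apply the threshold: between-species divergence > factor * within-species divergence.

    `reference` holds at least two sequences from the closest reference species.
    Within-species divergence is the mean over all reference pairs; between-species
    divergence is the mean distance from the query to each reference sequence.
    """
    if len(reference) < 2:
        raise ValueError("need at least two reference sequences")
    within = mean(p_distance(a, b) for a, b in combinations(reference, 2))
    between = mean(p_distance(query, r) for r in reference)
    return between > factor * within


# Toy 20 bp sequences (illustrative only, not real barcode data).
reference_species = [
    "AAAAAAAAAAAAAAAAAAAA",
    "AAAAAAAAAAAAAAAAAAAT",
    "AAAAAAAAAAAAAAAAAAAA",
]
divergent_query = "CCCCCCCCCCAAAAAAAAAA"
similar_query = "CAAAAAAAAAAAAAAAAAAA"
```

With these toy sequences, `passes_10x_rule(divergent_query, reference_species)` is true and `passes_10x_rule(similar_query, reference_species)` is false. If the reference sequences are all identical, the within-species average is zero and any nonzero divergence passes, which is one way a fixed threshold can over-discover.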

[Figure: species tree of Species A, Species B and Species C, with 4 sampled individuals] Species tree ≠ gene tree. Usually a “near miss”. Noisy problem.

Genetic threshold (mtDNA barcode locus) vs. species delimitation criteria (mental construct?): equal? Moving target. Doubly noisy problem.

Genetic threshold (mtDNA barcode locus) vs. species delimitation criteria: equal? Moving target. Not sensitive enough: under-discovery. Too sensitive: over-discovery. Doubly noisy problem.

Joint simulation exploration. DNA-barcode gene (mtDNA, CO1, 690 bp): coalescent model. Reproductive isolation: simple BDM (Bateson-Dobzhansky-Muller) model. Problematic parameter space? Potential statistical solutions?

BDM (Bateson-Dobzhansky-Muller) model. Genotypes: A,b; a,b; A,B; a,B (each labelled “Bad” or “OK”). Neutral and divergent selection (Gavrilets 2004). Speciation events: Poisson process.

[Figure: BDM loci and barcode locus (mtDNA) under island/continent (peripatric) divergence; axis: divergence time (generations)]

Hickerson et al. (in press; Systematic Biology)

Reciprocal monophyly Threshold

Coyne and Orr 1997

Not Species Coyne and Orr X

Not Species Coyne and Orr 1997

Presgraves 2002; Zigler et al. 2005; Sasa et al.; Mendelson 2003; Bolnick and Near 2005

Move beyond “Yes/No” answers (Nielsen and Metz 2005): Bayesian posterior probabilities with ABC. - Answers with quantified uncertainty. - Very fast (< 30 seconds per query). - Flexible (parameter threshold, model and prior change according to taxonomic group). [Figure axes: migration, isolation time; example result = moderate support for new species]

Prior, parameter threshold and operative model are adjustable as appropriate for the particular taxonomic group: Mymarommatid wasps (10 rare living fossil species)? African cichlids (recent radiation)?

Ongoing work. Testing: simulated data (Yule model; stochastic speciation/extinction) and empirical data (Chris Meyer; marine taxa). Extension of msBayes software pipeline. Determining appropriate priors, thresholds and models.

Simulated data: Yule model. Speciation and extinction follow a random birth/death process. [Figure labels: time, speciation, extinction]
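
A random birth/death process of the kind named on this slide can be sketched with a simple event-by-event simulation. This is an illustrative sketch only, not the simulator used in the talk: the function name `simulate_birth_death`, the rate parameters and the time units are all assumptions. Setting `death_rate` to 0 gives a pure-birth process.

```python
import random


def simulate_birth_death(birth_rate, death_rate, t_max, seed=None, n_start=1):
    """Simulate lineage counts through time under constant per-lineage rates.

    Returns a list of (time, number_of_lineages) pairs, starting at time 0.
    Each event is a speciation (+1 lineage) or an extinction (-1 lineage).
    """
    if birth_rate < 0 or death_rate < 0 or birth_rate + death_rate <= 0:
        raise ValueError("rates must be non-negative and not both zero")
    rng = random.Random(seed)
    t, n = 0.0, n_start
    history = [(t, n)]
    while n > 0:
        # Waiting time to the next event is exponential with total rate n * (b + d).
        t += rng.expovariate(n * (birth_rate + death_rate))
        if t > t_max:
            break
        # The event is a speciation with probability b / (b + d), else an extinction.
        if rng.random() < birth_rate / (birth_rate + death_rate):
            n += 1
        else:
            n -= 1
        history.append((t, n))
    return history


# Hypothetical rates and duration, chosen only to keep the example small.
pure_birth = simulate_birth_death(birth_rate=1.0, death_rate=0.0, t_max=3.0, seed=1)
with_extinction = simulate_birth_death(birth_rate=0.5, death_rate=1.0, t_max=50.0, seed=2)
```

The sketch tracks only the number of lineages. Recording which lineage splits or dies at each event would be needed to recover sister pairs and orphans from the resulting phylogeny.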

Test = what % of sisters and orphans are detected as new species “discoveries”? [Figure labels: orphan, sister-pair] Test data: 1. Closest divergence times: sisters and orphans. 2. Population sizes: gamma distributed, 50K-2.5M. 3. Single specimens from “new” species; 3, 5, 10, 20 and 40 specimens from reference species.

Yule model: 100 lineages per clade. Empirical data (cowries): 135 lineages. Reference species vs. discovery: is it a new species? A function of the posterior probability of divergence time and gene flow.

msBayes software pipeline: SIMULATE 1,000,000 draws from the model (flexible, pre-simulated prior); compare with the observed data; ABC accepts 0.2%; result is a posterior probability surface. Run time: ~< 1 minute.
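
The simulate-then-accept step on this slide can be sketched as a plain ABC rejection sampler. This is a toy illustration, not the msBayes implementation: the summary statistic (number of differences between a query and a reference barcode), the Poisson simulator, the mutation rate `MU`, the prior bound `T_MAX` and all function names are assumptions. Only the 690 bp length and the 0.2% acceptance fraction come from the slides, and the number of draws is reduced from 1,000,000 to keep the example quick.

```python
import math
import random

SITES = 690          # barcode length (bp), from the slides
MU = 1e-8            # hypothetical per-site, per-generation mutation rate
T_MAX = 2_000_000    # hypothetical upper bound of a uniform prior on divergence time


def sample_prior(rng):
    """Draw a divergence time (generations) from a uniform prior."""
    return rng.uniform(0, T_MAX)


def poisson(lam, rng):
    """Knuth's multiplication method for a Poisson draw (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_differences(t_div, rng):
    """Toy simulator: differences accumulate along both lineages since divergence."""
    return poisson(2 * MU * SITES * t_div, rng)


def abc_rejection(observed, n_draws, accept_frac, rng):
    """Keep the prior draws whose simulated statistic is closest to the observed one."""
    draws = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        stat = simulate_differences(theta, rng)
        draws.append((abs(stat - observed), theta))
    draws.sort(key=lambda pair: pair[0])
    n_keep = max(1, int(n_draws * accept_frac))
    return [theta for _, theta in draws[:n_keep]]


rng = random.Random(42)
accepted = abc_rejection(observed=14, n_draws=20_000, accept_frac=0.002, rng=rng)
posterior_mean = sum(accepted) / len(accepted)
```

The accepted values approximate the posterior distribution of divergence time given 14 observed differences. Because the simulated draws do not depend on the observed data, they can be generated once and reused for many queries, which is the idea behind a pre-simulated prior.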

Approximate Bayesian Computation (ABC) Prior Posterior

Prior Posterior Parameter threshold?

Bayes factor: a way to compare evidence for these 2 discrete models. $M_1$ = yes, new species; $M_2$ = no, same old species.

$$\text{Bayes Factor} = \frac{f(M_1 \mid \text{Data})}{f(M_2 \mid \text{Data})} \cdot \frac{\text{prior}(M_2)}{\text{prior}(M_1)}$$
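
As a minimal sketch of how this ratio could be computed from ABC output, the code below assumes that model M1 (new species) corresponds to a parameter such as divergence time exceeding a chosen threshold, so that the posterior and prior probabilities of M1 are the fractions of accepted draws and of prior draws above that threshold. The function names, the threshold and the sample values are hypothetical.

```python
import math


def prob_above(samples, threshold):
    """Fraction of samples strictly above the threshold (estimates P(M1))."""
    return sum(1 for s in samples if s > threshold) / len(samples)


def bayes_factor(posterior_m1, prior_m1):
    """Posterior odds of M1 over M2 multiplied by prior odds of M2 over M1.

    With two models, P(M2) = 1 - P(M1) for both the posterior and the prior.
    """
    if not 0.0 < prior_m1 < 1.0:
        raise ValueError("prior probability of M1 must be strictly between 0 and 1")
    if posterior_m1 >= 1.0:
        return math.inf  # every accepted draw supports M1
    posterior_odds = posterior_m1 / (1.0 - posterior_m1)
    prior_odds_m2_over_m1 = (1.0 - prior_m1) / prior_m1
    return posterior_odds * prior_odds_m2_over_m1


# Hypothetical divergence times (generations) and threshold, for illustration only.
prior_draws = [100_000, 300_000, 500_000, 700_000, 900_000, 1_100_000]
accepted_draws = [650_000, 800_000, 450_000, 900_000, 1_000_000]
threshold = 500_000

bf = bayes_factor(
    posterior_m1=prob_above(accepted_draws, threshold),
    prior_m1=prob_above(prior_draws, threshold),
)
```

Here 4 of 5 accepted draws and 3 of 6 prior draws exceed the threshold, so the posterior odds are 4 to 1, the prior odds are even, and the Bayes factor is about 4 in favour of M1.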

From simulated Yule phylogenies

Sample size optimized at 5 (so far)

Very near future: 1. Better priors: species divergence time AND intra-species coalescence. 2. Incorporate migration. 3. Hierarchical model: hyper-parameter = new species status (with hyper-prior); Yes: Prior(T, N); No: Prior(N, T=0).

ACKNOWLEDGEMENTS. Coauthors: C. Meyer, C. Moritz. Discussion: C. Moritz, C. Meyer, T. Mendelson, K. Zigler, N. Rosenberg, J. Degnan. CPU resources: J. McGuire, Museum of Vertebrate Zoology. Funding: NSF, DIMACS.