Quantifying uncertainty in species discovery with approximate Bayesian computation (ABC): single samples and recent radiations. Mike Hickerson, Chris Meyer, Craig Moritz. University of California, Berkeley; Museum of Vertebrate Zoology
Outline: Introduction (species discovery); Potential problems (simulations); Potential problems (empirical data); Potential statistical solutions
New specimen in the field
Match new specimen’s DNA “barcode” to voucher specimens with barcodes in database
Organizes an enormous flood of data
Proposed genetic thresholds for discovery
Comparing sample to closest sister taxon in reference database:
1. Hebert’s 10X rule: between-species divergence must be > 10 times the average within-species divergence
2. Reciprocal monophyly
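A minimal sketch of how the 10X rule could be checked for a single new specimen. It assumes aligned sequences of equal length, uses the uncorrected proportion of differing sites as the divergence measure, and treats the mean distance from the query to the reference specimens as the between-species divergence. The function names and the toy sequences are hypothetical and are not taken from any barcoding software.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def mean_within(reference_seqs):
    """Average pairwise divergence within the reference species."""
    pairs = list(combinations(reference_seqs, 2))
    if not pairs:
        raise ValueError("need at least two reference specimens")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

def passes_10x_rule(query, reference_seqs, factor=10.0):
    """True if query-to-reference divergence exceeds factor * within-species divergence."""
    within = mean_within(reference_seqs)
    between = sum(p_distance(query, r) for r in reference_seqs) / len(reference_seqs)
    return between > factor * within

# Toy 20 bp alignment (hypothetical data, for illustration only)
reference = [
    "AAAAAAAAAAAAAAAAAAAA",
    "AAAAAAAAAAAAAAAAAAAT",
    "AAAAAAAAAAAAAAAAAAAA",
]
divergent_query = "CCCCCCCCCCAAAAAAAAAA"  # 10 differences from the first reference
similar_query = "CCAAAAAAAAAAAAAAAAAA"    # 2 differences from the first reference
```

With a single query specimen the within-species average comes entirely from the reference species, which is one reason the threshold is noisy when few reference specimens are available.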
Species tree ≠ gene tree: usually a “near miss”. Noisy problem. [Figure: Species A, Species B, Species C; 4 sampled individuals, Species C]
Doubly noisy problem: is the genetic threshold (mtDNA barcode locus) equal to the species delimitation criteria (a moving target; mental construct?)?
Doubly noisy problem: is the genetic threshold (mtDNA barcode locus) equal to the species delimitation criteria (a moving target)? Threshold too sensitive: over-discovery. Threshold not sensitive enough: under-discovery.
Joint simulation exploration: coalescent model of the DNA-barcode gene (mtDNA, CO1, 690 bp) with a simple BDM (Bateson-Dobzhansky-Muller) model of reproductive isolation. Problematic parameter space? Potential statistical solutions?
BDM (Bateson-Dobzhansky-Muller) model: neutral and divergent selection (Gavrilets 2004). Speciation events: Poisson process. [Figure: genotypes A, b; a, b; A, B; a, B, labelled Bad or OK]
[Figure: BDM loci and barcode locus (mtDNA); island/continent (peripatric) divergence; divergence time (generations)]
Hickerson et al. (in press; Systematic Biology)
Reciprocal monophyly Threshold
Coyne and Orr 1997
Not Species Coyne and Orr X
Not Species Coyne and Orr 1997
Presgraves 2002; Zigler et al. 2005; Sasa et al.; Mendelson 2003; Bolnick and Near 2005
Move beyond “Yes/No” answers (Nielsen and Matz 2005): Bayesian posterior probabilities with ABC
- answers with quantified uncertainty
- very fast (< 30 seconds per query)
- flexible (parameter threshold, model and prior change according to taxonomic group)
[Figure: migration and isolation time; example result = moderate support for new species]
Prior, parameter threshold and operative model are adjustable as appropriate for the particular taxonomic group: Mymarommatid wasps (10 rare living fossil species); African cichlids (recent radiation); ?
Ongoing Work
Determining appropriate priors, thresholds and models
Testing: simulated data - Yule model (stochastic speciation/extinction); empirical data - Chris Meyer (marine taxa)
Extension of msBayes software pipeline
Simulated data - Yule model: speciation and extinction follow a random birth/death process. [Figure: tree through time showing speciation and extinction events]
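A minimal sketch of a random birth/death process of the kind described on this slide. It tracks only the number of lineages through time, not the tree topology, and is not the simulator used in the study. The function name, the per-lineage rates and the time horizon are hypothetical. Setting the extinction rate to zero gives a pure-birth process, which is the strict sense of a Yule model.

```python
import random

def simulate_birth_death(birth, death, t_max, rng, n_start=1):
    """Simulate lineage counts under constant per-lineage birth and death rates.

    Returns (events, n_final), where events is a list of
    (time, "speciation" | "extinction", lineages_after_event).
    """
    if birth < 0 or death < 0 or birth + death <= 0:
        raise ValueError("rates must be non-negative and not both zero")
    t, n = 0.0, n_start
    events = []
    while n > 0:
        # Waiting time to the next event across all n lineages
        t += rng.expovariate(n * (birth + death))
        if t > t_max:
            break
        # Choose the event type in proportion to its rate
        if rng.random() < birth / (birth + death):
            n += 1
            events.append((t, "speciation", n))
        else:
            n -= 1
            events.append((t, "extinction", n))
    return events, n

# Hypothetical rates per lineage per unit time
events, n_lineages = simulate_birth_death(1.0, 0.5, 3.0, random.Random(2))
```

The loop stops either at the time horizon or when every lineage has gone extinct, so some replicates return zero lineages when the extinction rate is positive.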
Test = what % of sisters and orphans are detected as new species “discoveries”?
Test data:
1. Closest divergence times - sisters and orphans
2. Population sizes - gamma distributed, 50K-2.5M
3. Single specimens from “new” species; 3, 5, 10, 20 and 40 specimens from reference species
[Figure: orphan and sister-pair lineages]
Yule model: 100 lineages per clade. Empirical data (cowries): 135 lineages. Reference species vs. discovery: is it a new species? A function of the posterior probability of divergence time and gene flow.
msBayes software pipeline: SIMULATE 1,000,000 draws from model (flexible, pre-simulated prior), then ABC against the observed data (accept 0.2%, ~< 1 minute), giving a posterior probability surface
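The pipeline on this slide follows the general pattern of rejection ABC with a pre-simulated prior, which can be sketched as below. This is a toy illustration, not msBayes code: the uniform prior on divergence time, the Gaussian stand-in simulator, the single summary statistic and all function names are assumptions. It uses 20,000 draws instead of 1,000,000 so that it runs quickly, with the same 0.2% acceptance fraction.

```python
import random

def presimulate(prior_sampler, simulator, n_draws, rng):
    """Draw parameters from the prior and simulate one summary statistic for each."""
    table = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        table.append((theta, simulator(theta, rng)))
    return table

def abc_accept(table, observed, accept_frac):
    """Keep the draws whose simulated statistic lies closest to the observed one."""
    ranked = sorted(table, key=lambda row: abs(row[1] - observed))
    n_keep = max(1, round(len(table) * accept_frac))
    return [theta for theta, _ in ranked[:n_keep]]

# Toy stand-ins (assumptions): divergence time tau with a uniform prior,
# and a simulator that returns tau plus Gaussian noise.
def toy_prior(rng):
    return rng.uniform(0.0, 10.0)

def toy_simulator(tau, rng):
    return rng.gauss(tau, 1.0)

rng = random.Random(42)
table = presimulate(toy_prior, toy_simulator, 20_000, rng)   # done once, reused per query
posterior = abc_accept(table, observed=4.0, accept_frac=0.002)
posterior_mean = sum(posterior) / len(posterior)
```

Because the table is simulated once and only the accept step is repeated for each observed specimen, a new query costs a sort over stored statistics, not a fresh round of simulation.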
Approximate Bayesian Computation (ABC) Prior Posterior
Prior Posterior Parameter threshold?
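One way to turn a posterior sample into a quantified answer is to report the fraction of accepted draws that fall above a chosen parameter threshold. The sketch below assumes the posterior is available as a list of accepted divergence times. The function name, the threshold value and the example numbers are hypothetical.

```python
def posterior_prob_above(samples, threshold):
    """Fraction of posterior draws that exceed the parameter threshold."""
    if not samples:
        raise ValueError("need at least one posterior draw")
    return sum(s > threshold for s in samples) / len(samples)

# Hypothetical accepted divergence times and a hypothetical threshold
accepted_times = [0.5, 1.5, 2.5, 3.5]
support_for_new_species = posterior_prob_above(accepted_times, threshold=1.0)
```

The threshold is left as an argument because the slides treat it as adjustable by taxonomic group.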
Bayes factor: a way to compare evidence for these 2 discrete models
$M_1$ = yes, new species; $M_2$ = no, same old species
$\text{Bayes Factor} = \dfrac{f(M_1 \text{ given Data})}{f(M_2 \text{ given Data})} \times \dfrac{\text{prior}(M_2)}{\text{prior}(M_1)}$
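The Bayes factor on this slide is the posterior odds of the two models multiplied by the inverse of their prior odds. A small sketch of that arithmetic follows. The function name and the example probabilities are hypothetical, and in an ABC setting the posterior model probabilities would themselves be estimates.

```python
def bayes_factor(post_m1, post_m2, prior_m1, prior_m2):
    """Posterior odds of M1 over M2, multiplied by the inverse prior odds."""
    if min(post_m1, post_m2, prior_m1, prior_m2) <= 0:
        raise ValueError("all probabilities must be positive")
    return (post_m1 / post_m2) * (prior_m2 / prior_m1)

# M1 = new species, M2 = same old species (hypothetical numbers)
bf_equal_priors = bayes_factor(0.75, 0.25, 0.5, 0.5)   # equal prior weight on both models
bf_skeptical = bayes_factor(0.5, 0.5, 0.2, 0.8)        # prior favoured M2
```

In the second call the posterior is split evenly even though the prior favoured "same old species", so the data shifted support toward a new species and the Bayes factor exceeds 1.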
From simulated Yule phylogenies
Sample size optimized at 5 (so far)
Very Near Future
1. Better priors: species divergence time AND intra-species coalescence
2. Incorporate migration
3. Hierarchical model: hyper-prior on the hyper-parameter “new species status”. Yes: Prior(T, N). No: Prior(N, T=0)
ACKNOWLEDGEMENTS
Coauthors: C. Meyer, C. Moritz
Discussion: C. Moritz, C. Meyer, T. Mendelson, K. Zigler, N. Rosenberg, J. Degnan
CPU resources: J. McGuire, Museum of Vertebrate Zoology
Funding: NSF, DIMACS