Gene Trees and Species Trees: Lessons from morning glories
Lauren A. Eserman & Richard E. Miller
Department of Biological Sciences, Southeastern Louisiana University
Introduction
– DNA sequences are an important source of data for phylogenetic reconstruction
– Single-gene trees were considered exciting and sufficient at one time
– Example: Chase and 41 other authors, 1993, phylogeny of angiosperms using rbcL
Introduction
Next, additional gene regions were sequenced
– Philosophical argument for “total evidence”
– More data will strengthen the ability to determine species relationships
– Concatenated datasets were used to implement this idea
– Concatenation still dominates the way species trees are estimated
Introduction Population genetics and coalescent theory emphasize that genes have unique histories – Gene trees do not always reflect the true species history
Introduction
Gene tree heterogeneity can come about by (Edwards, 2009):
– Gene duplication events
– Horizontal gene transfer
– Incomplete lineage sorting (deep coalescence)
– Branch length heterogeneity
This provides evidence against concatenation
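Incomplete lineage sorting can be quantified in the simplest case. As a minimal sketch, assuming three species, one sampled lineage per species, and a standard neutral coalescent, the probability that a gene tree topology disagrees with the species tree is (2/3)·e^(−T), where T is the internal branch length in coalescent units. The function name and the example branch lengths are illustrative and do not come from the talk.

```python
import math


def discordance_probability(internal_branch: float) -> float:
    """P(gene tree topology != species tree) for three taxa.

    internal_branch is the length of the internal species-tree branch
    in coalescent units (assumed input, not a value from the talk).
    """
    if internal_branch < 0:
        raise ValueError("branch length must be non-negative")
    # With probability e^-T the two sister lineages fail to coalesce in the
    # internal branch; all three topologies are then equally likely, and two
    # of the three disagree with the species tree.
    return (2.0 / 3.0) * math.exp(-internal_branch)


# Illustrative branch lengths: short internal branches give more discordance.
for t in (0.1, 0.5, 1.0, 2.0):
    print(f"T = {t}: P(discordant) = {discordance_probability(t):.3f}")
```

Short internal branches (rapid successive speciation) leave many gene trees in conflict with the species tree even without duplication or horizontal transfer.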
Introduction Paradigm shift in systematics? (Edwards, 2009) – Moving away from notion that gene trees show true species relationships – Promotes synergism of phylogenetic systematics with population genetics and coalescent theory
Introduction Application of the paradigm shift: – Use collective information from multiple gene trees to estimate a species tree – Consider conflicting results, valid alternative hypotheses for species relationships
Research Objectives
1. Explore how gene trees with different phylogenetic signal influence the estimated species tree
– Using 28 gene trees
– Effects of concatenation on estimated species tree
2. Alternative objective is to obtain an understanding of species relationships for the organisms of interest (not discussed here)
Study Organisms
Morning glories are generally species of the genus Ipomoea (not monophyletic)
Focus on tribe Ipomoeeae
– Ipomoea + 9 other genera
– c. 900 species
– Distributed throughout the subtropics and tropics of the world
[Image: Ipomoea nil]
Methods 1. Bayesian phylogenetic analysis of 28 gene trees
– Obtained 28 gene regions for species of Ipomoeeae based on our research and additional genes from GenBank
– Number of taxa ranged from 6 to 129
– Alignments using MAFFT and manually adjusted
– Models of nucleotide substitution chosen using jModelTest
– Gene trees constructed using MrBayes v3.1.2
  – 4 runs, 4 chains sampling every generations
  – Runs were continued until stationary distribution was estimated
  – Burn-in determined as asymptote in plots of total tree length by generations
  – Convergence criteria: same topology among 4 runs; PP of clade support ±3% among 4 runs
  – Majority rule consensus tree constructed from a combination of post-burn-in trees from all 4 runs
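The clade-support convergence criterion can be sketched as a simple check. This is an illustration only: it assumes "±3%" means that the highest and lowest posterior probability (PP) for any clade differ by no more than 0.03 across runs, and that a clade absent from a run counts as PP 0. The function name, data layout, and taxon labels are hypothetical and are not MrBayes output.

```python
from typing import Dict, FrozenSet, List

Clade = FrozenSet[str]


def clade_support_converged(runs: List[Dict[Clade, float]],
                            tolerance: float = 0.03) -> bool:
    """True if every clade's PP spread across runs is within tolerance."""
    # Collect every clade seen in any run.
    all_clades = set().union(*(run.keys() for run in runs))
    for clade in all_clades:
        # A clade missing from a run is treated as PP 0 (assumption).
        supports = [run.get(clade, 0.0) for run in runs]
        if max(supports) - min(supports) > tolerance:
            return False
    return True


# Hypothetical PP values for two clades in four runs.
ab = frozenset({"sp1", "sp2"})
abc = frozenset({"sp1", "sp2", "sp3"})
four_runs = [
    {ab: 0.97, abc: 1.00},
    {ab: 0.98, abc: 1.00},
    {ab: 0.96, abc: 0.99},
    {ab: 0.97, abc: 1.00},
]
print(clade_support_converged(four_runs))
```

A check like this covers only the support criterion; agreement of the consensus topology among runs would still need to be confirmed separately.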
Methods 1. Synthesis of 28 gene trees
ITS tree used as working hypothesis
– Densest taxon sampling (129 species)
– Good intrageneric resolution
– NOTE: Not assuming this is the species tree; rather, a working hypothesis to compare to other gene trees
Topology and clade support of 27 other gene trees compared to ITS gene tree
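Comparing a gene tree to the ITS working hypothesis can be illustrated by listing the clades two trees share and the clades unique to each, restricted to the taxa they have in common (taxon sampling differed among genes). This is a minimal sketch assuming rooted trees written as nested tuples; the function names and taxon labels are hypothetical, and the talk does not state how the comparison was actually carried out.

```python
def leaf_sets(tree):
    """Return (leaves under this node, all clades in this subtree)."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    clades = set()
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades |= child_clades
    clades.add(leaves)
    return leaves, clades


def compare_trees(tree_a, tree_b):
    """Return (shared clades, clades only in A, clades only in B)."""
    taxa_a, clades_a = leaf_sets(tree_a)
    taxa_b, clades_b = leaf_sets(tree_b)
    shared_taxa = taxa_a & taxa_b

    def informative(clades):
        # Prune each clade to the shared taxa, then drop trivial groups
        # (single taxa and the group containing every shared taxon).
        pruned = {clade & shared_taxa for clade in clades}
        return {c for c in pruned if 1 < len(c) < len(shared_taxa)}

    a, b = informative(clades_a), informative(clades_b)
    return a & b, a - b, b - a


# Hypothetical trees with different taxon sampling.
reference_tree = ((("sp1", "sp2"), "sp3"), ("sp4", "sp5"))
gene_tree = ((("sp1", "sp3"), "sp2"), "sp4")
common, only_reference, only_gene = compare_trees(reference_tree, gene_tree)
print("shared:", [sorted(c) for c in common])
print("reference only:", [sorted(c) for c in only_reference])
print("gene tree only:", [sorted(c) for c in only_gene])
```

Clades found in only one of the two trees are the candidates for the conflicts and alternative hypotheses discussed in the results.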
Results 1. Same relationships between ITS and other genes
[Figure: myb1 tree with labels PHAR, MINA; DFRB-2 tree with labels PHAR, MINA]
Results 2. Individual species with unique positions not shown in any other gene tree
[Figure: bHLH3 tree with label PHAR; DFRB-2 tree with labels PHAR, MINA]
Results 3. Major alternative topology in CHSE
[Figure: CHSE tree with labels PHAR, MINA]
Results 4. Identify new unnamed clades
[Figure: bHLH2 tree with labels PHAR, MINA, BATA, ‘VIOL’; waxy 1 tree with labels ‘OBSC’, MINA, PHAR, TRIC, CALO, ‘AMNI’, BATA]
Methods 2. Concatenated dataset
– To address the issue of concatenation, constructed a concatenated dataset using 10 genes
– All gene trees showed similar topologies
[Figure: gene trees for DFRB-1, UF3GT, CHI]
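Concatenation itself is mechanically simple, which is part of its appeal. The sketch below joins per-gene alignments end to end for each taxon and pads taxa missing from a gene with "?". It is a minimal illustration: the function name, data layout, and toy sequences are hypothetical, and the talk does not say which tool was used to build the concatenated matrix.

```python
from typing import Dict, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence


def concatenate(alignments: Dict[str, Alignment]
                ) -> Tuple[Dict[str, str], Dict[str, Tuple[int, int]]]:
    """Return (supermatrix, 1-based inclusive column range per gene)."""
    taxa = sorted(set().union(*(aln.keys() for aln in alignments.values())))
    supermatrix = {taxon: "" for taxon in taxa}
    partitions = {}
    start = 1
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not of equal length")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa not sampled for this gene are filled with missing data.
            supermatrix[taxon] += aln.get(taxon, "?" * length)
        partitions[gene] = (start, start + length - 1)
        start += length
    return supermatrix, partitions


# Toy sequences only; gene names are borrowed from the talk for readability.
toy = {
    "DFRB-1": {"sp1": "ACGT", "sp2": "ACGA", "sp3": "ACTA"},
    "CHI": {"sp1": "GGC", "sp3": "GGT"},
}
matrix, parts = concatenate(toy)
for taxon, seq in matrix.items():
    print(taxon, seq)
print(parts)
```

Once joined, every site is treated as evidence for a single shared tree, so a gene with a strong conflicting signal can pull the whole analysis toward its own topology.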
Results 10-gene concatenated dataset Maintains topologies of individual gene trees
Concatenated dataset
What happens when one more gene is added?
Add CHSE to 10-gene concatenated dataset
– Alternative topology
– All coding region
– No indels
Results 11-gene concatenated dataset Exhibits topology of CHSE – new gene overwhelms this analysis
[Figure: 10-gene concatenated dataset tree beside 11-gene concatenated dataset tree]
BEST Analysis
Bayesian Estimation of Species Trees (Liu, 2008)
– Incorporates a multispecies coalescent model to estimate species tree from many gene trees
Methods:
– 11-gene concatenated dataset
– 2 runs, 4 chains
– 8 million generations (did not reach convergence on topology)
BEST Analysis Results: – Clade present in CHSE appears in BEST tree – Overall topology differs – Species pairs supported throughout
Discussion Analysis of 28 gene trees – Provides an estimate of species tree – Alternative hypotheses for species relationships have emerged – Overall congruence among gene trees
Discussion – Concatenated datasets
Total evidence is philosophically justified but can give misleading results because of gene tree heterogeneity
– Shown clearly in 11-gene concatenated dataset
Left with the idea that we have two alternative hypotheses of species relationships
– Two estimates of the species tree
Discussion – Concatenated datasets Can now appreciate how a single gene can overwhelm results of a concatenated dataset – Topology of CHSE dominated
Acknowledgements
Seed Donations: M. Clegg, M. Rausher, J. A. McDonald, J. Miller, P. Tiffin, B. Zufall, S.M. Chang
Research Assistants: A. McDaniel, K. Robichaux, W. Terry, S. Major, H. Echlin, F. St. Cyr
[Image: Ipomoea purpurea]