Genomic diversity and population structure in switchgrass, Panicum virgatum: Genotyping-by-sequencing and population genomics Geoff Morris*, Paul Grabowski,

Presentation transcript:

Genomic diversity and population structure in switchgrass, Panicum virgatum: Genotyping-by-sequencing and population genomics Geoff Morris*, Paul Grabowski, Justin Borevitz Dept. of Ecology and Evolution University of Chicago

Genomic diversity and population structure Geographic patterns of genomic diversity reflect: drift, migration, and adaptation Genomic diversity: nucleotide variation and insertions/deletions across many loci in the nuclear and organellar genomes. Leads to design of mapping populations for quantitative genetics and molecular breeding

Genomic diversity and natural history Emerson et al. PNAS 2010 Example: Pitcher plant mosquito (Wyeomyia smithii)

Ecotypic diversity in switchgrass Switchgrass and other wide-ranging grassland species have many ecotypes Great variability in size, shape, color, and habitat preference Example: Upland/lowland divergence Upland (Michigan): adapted to shorter growing season, drier climates Lowland (Oklahoma): adapted to long growing season, wet climates

Effects of ecotype diversity on productivity Three-year plot (6 m²) experiment at Fermilab ~20% overyield in switchgrass mixtures compared to monocultures

“Genomic diversity and population structure in switchgrass, Panicum virgatum: from the continental scale to a dune landscape” Morris, Grabowski, and Borevitz Accepted, Molecular Ecology

Biogeography of Indiana Dunes flora Coastal Plain flora: e.g. Seaside spurge, Marramgrass Boreal flora: e.g. Jack Pine, Bearberry Great Plains flora: e.g. Sandreed, Little Bluestem Eastern deciduous flora: e.g. Tulip tree Recolonized post-glaciation: ~10,000 years ago

Switchgrass gene pools Zhang et al ?

Landscapes in Indiana Dunes Landscape features are dynamic and can be dated: 100s – 1000s of years for dunes 10s – 100s of years for blowouts Big blowout ~ 150 years old

Study questions Can switchgrass population structure be confirmed with a genome-wide sample of non-ascertained markers? In a hierarchical sample of switchgrass, how much diversity is there on a landscape, regional, and continental scale? Did multiple switchgrass gene pools contribute to the Indiana Dunes populations? Is there genomic diversity in a single landscape feature (blowout)? Is there local (private) genetic diversity in the Indiana Dunes?

Switchgrass plant samples Switchgrass cultivated varieties (cultivars) – Kanlow (Oklahoma - lowland) – Blackwell (Oklahoma - upland) – High Tide (Maryland - Coastal) – Forestburg and Sunburst (South Dakota) – Dacotah (North Dakota) – Cave-in-Rock (Illinois) – Southlow (Southern Michigan “ecopool”) Indiana Dunes switchgrass – Big Blowout – Jack pine savanna – Interdune

Problems with traditional marker systems Locus sampling: – Typically only a few kb are sequenced in a few loci (rDNA, cp introns) – Large stochastic error and locus-specific bias – e.g. Plant chloroplast has 100X lower rate of evolution than animal mitochondria Ascertainment bias: – Occurs whenever markers are discovered and typed separately – Worst when the ascertainment panel is a geographically restricted subpopulation – e.g. Inferred genetic diversity in Africans is spuriously low when European markers are used

Genomic diversity from de novo sequencing [Figure legend: marker = restriction site] 1) PstI digest of genomic DNA 2) End-polish, blunt-end ligation; Illumina barcodes 3) PCR amplify and pool fragments from multiple samples 4) Assemble and map reads to “stacks” and call SNPs Reduced representation + multiplexing = more samples 10,000+ candidate SNPs No reference genome needed Data here from 76 or 100 bp paired-end reads 40 billion base pair data set
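The PstI digest in step 1 can be illustrated in silico. This is a minimal sketch, not the authors' pipeline: it assumes a linear sequence and a complete digest, and it ignores the star activity and random shearing noted elsewhere in the talk. The function name `psti_fragments` is hypothetical. PstI recognizes CTGCAG and cuts after the A (CTGCA^G).

```python
import re

PSTI_SITE = "CTGCAG"   # PstI recognition sequence
PSTI_CUT_OFFSET = 5    # PstI cuts after the A: CTGCA^G


def psti_fragments(seq):
    """Return the fragments of a complete PstI digest of a linear sequence.

    Illustrative only: a real digest is incomplete and the library
    prep adds size selection, neither of which is modeled here.
    """
    seq = seq.upper()
    # Cut position = start of each recognition site + offset into the site.
    cuts = [m.start() + PSTI_CUT_OFFSET
            for m in re.finditer("(?=" + PSTI_SITE + ")", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    for fragment in psti_fragments("AAACTGCAGTTTCTGCAGGG"):
        print(len(fragment), fragment)
```

Only reads that start at the fragment ends are sequenced, which is why the representation of the genome is reduced and more samples fit in one run.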

Plastome sequence in RRLs Nuclear whole genome shotgun sequence is too light (<<1X) for assembly Plastome WGS is very high (>>1X) 1) PstI digest of genomic DNA, with star activity and random shearing 2) End-polish, blunt-end ligation

Analysis of chloroplast data Chloroplast genome sequence (plastome) included in data Random (shotgun) sequence + 20 PstI sites Switchgrass chloroplast reference available (Upland and Lowland) Mapped reads to both ~140,000 base pair chloroplast genomes Coverage (# of times each position is read): 1X – 786X
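Coverage, as defined on this slide, is the number of reads overlapping each position. The sketch below computes it from mapped read intervals. It assumes alignments are given as 0-based, half-open `(start, end)` pairs on a linear reference; the function name `per_base_coverage` and the input format are illustrative assumptions, not part of the pipeline described in the talk.

```python
def per_base_coverage(genome_length, alignments):
    """Return a list with the read depth at every reference position.

    alignments: iterable of (start, end) pairs, 0-based and half-open
    (an assumed input format for this sketch).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, genome_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # A running sum of the difference array gives the depth per position.
    coverage = []
    depth = 0
    for delta in diff[:genome_length]:
        depth += delta
        coverage.append(depth)
    return coverage


if __name__ == "__main__":
    cov = per_base_coverage(10, [(0, 4), (2, 6), (2, 3)])
    print(cov, "min:", min(cov), "max:", max(cov))
```

The minimum and maximum of the returned list correspond to the kind of range quoted on the slide (1X – 786X).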

Chloroplast coverage and polymorphisms [Figure: chloroplast genome coverage plotted against position (kb)]

Chloroplast phylogeny Neighbor joining tree based on 140kb Named haplogroups have >50% bootstrap Unfilled lines indicate low-coverage sample

Chloroplast phylogeny

Population analysis of nuclear loci Create “pseudoreference” of RRL loci with de novo assembly Map reads to pseudoreference to create stacks ( reads) Map reads to switchgrass chloroplast and sorghum mitochondria, and drop stacks that match organelles Select single-nucleotide variants that: – Have high sequence quality (PHRED score < for both alleles) – Vary in frequency across samples (chi-square < 0.01) – Are nearest to restriction site, closest to beginning of read Randomly select one allele per sample (weighted by observed frequency)
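The last step above, drawing one allele per sample with probability proportional to its observed frequency, can be sketched as follows. The nested-dictionary input of per-allele read counts and the names `sample_allele` and `sample_genotype_matrix` are assumptions made for illustration; the talk does not specify the data structures used.

```python
import random


def sample_allele(allele_counts, rng):
    """Draw one allele, weighted by its observed read count.

    Returns None when the sample has no reads at the locus.
    """
    alleles = sorted(allele_counts)
    weights = [allele_counts[a] for a in alleles]
    if sum(weights) == 0:
        return None
    return rng.choices(alleles, weights=weights, k=1)[0]


def sample_genotype_matrix(read_counts, seed=0):
    """read_counts: {sample: {locus: {allele: read_count}}} (assumed layout).

    Returns {sample: {locus: allele}} with one randomly drawn allele each.
    """
    rng = random.Random(seed)
    return {
        sample: {locus: sample_allele(counts, rng)
                 for locus, counts in loci.items()}
        for sample, loci in read_counts.items()
    }


if __name__ == "__main__":
    counts = {
        "Kanlow": {"locus1": {"A": 9, "G": 1}, "locus2": {"C": 0, "T": 4}},
        "Dacotah": {"locus1": {"A": 0, "G": 0}, "locus2": {"C": 3, "T": 3}},
    }
    print(sample_genotype_matrix(counts, seed=1))
```

Drawing a single allele avoids calling heterozygotes from low-coverage data; repeating the draw with different seeds gives replicate samples that can be averaged.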

Coding sequence variation in the chloroplast 77 coding genes in chloroplast (including Rubisco, ribosome, etc.) – 60 kb of coding sequence Constraint on non-synonymous (NS) vs. synonymous (S) variation provides biological validation for SNPs Upland vs. Lowland (~1 million years): – 23 NS : 16 S (ratio = 1.4) Within upland (< 0.5 million years): – 16 NS : 3 S (ratio = 5.3)
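The two ratios quoted on this slide follow directly from the counts. The snippet below only reproduces that arithmetic; the helper name `ns_to_s_ratio` is illustrative, and no significance test is implied.

```python
def ns_to_s_ratio(nonsynonymous, synonymous):
    """Ratio of non-synonymous to synonymous variant counts."""
    if synonymous == 0:
        raise ValueError("ratio is undefined with zero synonymous variants")
    return nonsynonymous / synonymous


if __name__ == "__main__":
    # Counts taken from the slide.
    print("Upland vs. lowland:", round(ns_to_s_ratio(23, 16), 1))
    print("Within upland:", round(ns_to_s_ratio(16, 3), 1))
```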

Nuclear genome: Multidimensional scaling ~11000 nuclear loci, mean of 100 random allele samples

Nuclear loci: Structure analysis Bayesian clustering algorithm ~11000 nuclear loci, random allele sample, Burn-in 10K, Run 10K

Conclusions Confirmed upland vs. lowland differentiation and differentiated a local population using non-ascertained markers Lake Michigan switchgrass is distinct from the broader upland population in the Midwest and Great Plains Post-glacial gene flow into the Indiana Dunes included genotypes from across the Great Plains and Midwest The chloroplast diversity in the Indiana Dunes did not evolve in the current Midwestern population, but originated one or more glacial cycles ago A single blowout in the dunes can have as much chloroplast diversity as the Midwest

New GBS methods for population genomics For true population analysis we need 10+ individuals in multiple populations Illumina multiplexing is too expensive – separate prep cost for each library adds $100s/sample Read count overdispersion (up to ~200X more than Poisson) requires technical replicates to even out counts Sticky-end ligation increases specificity and removes random sequence (including plastome)
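One common way to quantify read count overdispersion is the variance-to-mean ratio (index of dispersion), which is about 1 for Poisson-distributed counts. The sketch below assumes that reading of the slide's "~200X" figure; the talk does not say how the value was computed, and `dispersion_index` is a hypothetical helper.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio of read counts.

    About 1 for Poisson sampling; values well above 1 indicate
    overdispersion. Uses the sample variance (n - 1 denominator).
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is 0")
    return variance(counts) / m


if __name__ == "__main__":
    # Made-up read counts for one locus across eight libraries.
    even = [18, 22, 20, 19, 21, 23, 17, 20]
    uneven = [0, 3, 150, 2, 0, 1, 4, 0]
    print("even libraries:", round(dispersion_index(even), 2))
    print("uneven libraries:", round(dispersion_index(uneven), 2))
```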

Genotype-By-Sequencing (GBS) Based on Elshire et al. 2011, PLoS ONE

GBS on continental + dunes switchgrass

New population genomic studies with GBS 1. Continental population structure (126 individuals) – 50/50 deep diversity and shallow diversity based on chloroplast markers and SSRs 2. Tetraploid cultivars (24 each for TX, OK, NE, ND cultivars) – Ploidy differences may be confounded with genetic diversity – High sample size should allow traditional pop gen analyses (Fst, etc.) 3. Dune half-sibs (4 mothers and 10 offspring each) – True SNPs will segregate in the offspring while homeologous substitutions will not
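As an example of the traditional population genetic statistics mentioned in item 2, Wright's Fst for one biallelic locus can be computed from per-population allele frequencies as (H_T - H_S) / H_T. This is a textbook sketch that assumes equal population sizes and known frequencies; it is not a sample-size-corrected estimator, and it is not taken from the authors' analysis.

```python
def fst_biallelic(freqs):
    """Wright's Fst for one biallelic locus.

    freqs: frequency of the same allele in each population.
    Populations are weighted equally (an assumption of this sketch).
    """
    n = len(freqs)
    p_bar = sum(freqs) / n
    # Expected heterozygosity of the pooled population.
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # locus is monomorphic overall
    # Mean expected heterozygosity within populations.
    h_within = sum(2 * p * (1 - p) for p in freqs) / n
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    print(fst_biallelic([0.2, 0.8]))
    print(fst_biallelic([0.5, 0.5]))
```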

Bioinformatics overview No software package for population genomic analysis on GBS Stacks (U. Oregon) comes closest, but its multinomial sampling model expects high-frequency SNPs (e.g. mapping population) Buckler lab TASSEL package (Java) may be appropriate We’ve been using a custom pipeline (CLC, MySQL, R) for analysis –