Genome-Wide SNP Genotyping in Grape – What is Next? Part of National Genetic Trait Index Project CRIS# D USDA-ARS Geneva, Cornell, Davis, Cold Spring Harbor Acknowledgement: Sean Myles/Doreen Ware
Team
Edward Buckler and Sean Myles – Genomics and statistical analysis
Doreen Ware, Jer-Ming Chia, Bonnie Hurwitz – Bioinformatics
Charles Simon, Gan-Yuan Zhong, Mallikarjuna Aradhya, Bernard Prins – Germplasm
Genus Vitis
- Contains over 60 species, mostly found in temperate regions of the northern hemisphere (both Old and New World distributions) – ~3500 accessions
- Vitis vinifera is the most important domesticated species, cultivated for table grapes and wine making (~1300 accessions)
- The wild grape Vitis sylvestris is considered the progenitor of the domesticated grape
- Highly heterozygous, with low LD (~200 bp)
Genetic Diversity in the Domesticated Grape
Cluster Density, Cluster Size, Berry Size, Berry Shape
? Genetic Diversity
Objectives
1. Grape as a Model Crop for National Genetic Trait Index (NGTI)
2. Characterization of Molecular Diversity – Functional Variability
3. Genome-wide Association Mapping
4. Identify Markers Associated With Economic Traits
5. Develop Strategies for Marker Assisted Breeding – Juvenile Selection in Perennial/Tree Crops
Steps
Step 1: SNP Discovery - Next-Generation Sequencing to sample diversity
– DNA preparation, sequencing method and analysis of sequencing reads for variation
– Characterization of SNPs: position, allele support, and coverage
– 10k SNP array development
Step 2: Genotype and Assemble Data for Analysis
Step 3: Phenotyping
Step 1: Discovery of genetic variants (SNPs)
Diverse Samples
- 10 cultivated Vitis varieties (Vitis vinifera)
- 6 wild Vitis species
Genome complexity reduction
- Digestion with HpaII restriction enzyme
Illumina/Solexa sequencing
- Sequencing by synthesis
- 60 million sequences
- Total: 2 billion base pairs of sequence
Discovery of >1 million SNPs
Make data available
- Integrate SNP data into public grape genome browser
SNP Discovery Panel
Goal: Capture recent variation in the genus Vitis
RRLs constructed from 10 domesticated cultivars and 6 wild species
1. Ehrenfelser
2. French Colombard
3. Gewurztraminer
4. Kadarka
5. Malvasia
6. Muscat of Alexandria
7. Pinot Noir
8. Plavac Mali
9. Thompson Seedless
10. White Riesling
11. Vitis amurensis
12. Vitis cinerea
13. Vitis labrusca
14. Vitis palmata
15. Vitis rotundifolia
16. Vitis sylvestris
17. Inbred Pinot Noir (Reference Genome)
Library Construction Protocol: Reducing the complexity of the Genome
- DNA Extraction
- Whole Genome Amplification*
- Genome Complexity Reduction: Restriction enzyme digest
- Size Selection from Gel: bp
- Addition of 'A' Base to 3' ends
- Ligation of Solexa Adaptors
- Solexa Genome Analyzer
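The complexity-reduction step can be illustrated with an in silico digest. This is a minimal sketch, not the protocol used in the project: it assumes the HpaII recognition site CCGG with the cut after the first C (C^CGG), and the size-selection window (`min_len`, `max_len`) and the toy sequence are hypothetical, since the slide does not give the gel size range.

```python
import re

HPAII_SITE = "CCGG"  # HpaII recognition sequence
CUT_OFFSET = 1       # HpaII cuts after the first C: C^CGG


def digest(sequence, site=HPAII_SITE, cut_offset=CUT_OFFSET):
    """Cut `sequence` at every occurrence of `site` and return the fragments."""
    sequence = sequence.upper()
    # A lookahead pattern finds every site, including overlapping ones.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments, min_len, max_len):
    """Keep fragments whose length falls inside the (hypothetical) gel window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy sequence with three HpaII sites (illustrative only, not grape sequence).
toy_genome = "ATAT" + "CCGG" + "TTTTAAAA" + "CCGG" + "GCGCATATATAT" + "CCGG" + "AT"

fragments = digest(toy_genome)
selected = size_select(fragments, min_len=10, max_len=20)

print([len(f) for f in fragments])
print(selected)
```

Only the fragments inside the window would go forward to adaptor ligation and sequencing, which is what reduces the fraction of the genome that is sampled.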
Next-Generation Sequence Analysis Workflow
[Flowchart. Stages shown: image files from Solexa GA; base calling (Firecrest, Bustard); sequence and base quality; data storage and data accessibility; read mapping (ungapped alignment, gapped alignment; decision: mapped to genome? yes/no); alignments; consensus and quality; variation discovery; filters; called SNPs]
Deciphering Genetic Diversity From High-Throughput Sequencing
Overview of the Solexa SNP pipeline
1. 56 million reads (1.8 billion bp) are aligned to the reference genome
– The divergence within V. vinifera and with other Vitis is so great that we need to develop other algorithms to map the reads
2. Million regions of the genome have potential SNPs, which are statistically evaluated for genotypic basis
3. 50,000 high probability SNPs are identified
4. A small subset of the data is validated empirically
5. With improved algorithms and increased knowledge of grape diversity, we may be able to extract hundreds of thousands of SNPs
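The idea of evaluating candidate positions by coverage and allele support can be sketched as follows. This is an illustrative toy, not the pipeline's actual statistical model: it assumes ungapped alignments given as (start, sequence) pairs, and the thresholds `min_coverage` and `min_allele_support` are hypothetical.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_coverage=4, min_allele_support=2):
    """Report positions where a non-reference base has enough read support.

    aligned_reads: list of (start, sequence) ungapped alignments, 0-based.
    """
    # One base counter per reference position (a minimal pileup).
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        coverage = sum(counts.values())
        if coverage < min_coverage:
            continue  # too few reads to evaluate this position
        for allele, support in sorted(counts.items()):
            if allele != reference[pos] and support >= min_allele_support:
                snps.append({"pos": pos, "ref": reference[pos], "alt": allele,
                             "support": support, "coverage": coverage})
    return snps


reference = "ACGTACGTAC"
reads = [
    (0, "ACGTAC"),  # matches the reference
    (0, "ACGAAC"),  # A at position 3
    (2, "GAACGT"),  # A at position 3
    (2, "GAACGT"),  # A at position 3
    (3, "TACGTA"),  # matches the reference
    (4, "ACCTAC"),  # single mismatch at position 6 (treated as an error)
]

snps = call_snps(reference, reads)
print(snps)
```

The mismatch at position 6 is seen in only one read and is filtered out, while the A allele at position 3 is supported by three of five reads and is kept.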
Mapping statistics for reads from each germplasm sample aligned to the reference Vitis genome
10K SNPs: Consequence within Genomic Sequence
- SNP consequence data were obtained by integrating the SNP calls with the genome annotation through Ensembl
- The selected 10K SNPs are enriched for genic SNPs. In contrast, the genome as a whole is 46% genic space and 41% repetitive/transposable elements
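As a rough illustration of how SNP positions can be intersected with an annotation to measure genic enrichment, here is a minimal sketch. It does not use the Ensembl API; the gene intervals, gene names, and SNP positions are hypothetical, and genes are assumed to be non-overlapping, 0-based, half-open intervals on a single chromosome.

```python
from bisect import bisect_right


def build_index(genes):
    """genes: list of (start, end, name), non-overlapping, 0-based half-open."""
    genes = sorted(genes)
    starts = [g[0] for g in genes]
    return genes, starts


def annotate(pos, index):
    """Return the name of the gene containing `pos`, or None if intergenic."""
    genes, starts = index
    i = bisect_right(starts, pos) - 1  # last gene starting at or before pos
    if i >= 0 and genes[i][0] <= pos < genes[i][1]:
        return genes[i][2]
    return None


# Hypothetical annotation and SNP positions, for illustration only.
index = build_index([(500, 650, "geneB"), (100, 200, "geneA")])
snp_positions = [50, 150, 199, 200, 600]

labels = [annotate(p, index) for p in snp_positions]
genic_fraction = sum(label is not None for label in labels) / len(labels)

print(labels)
print(genic_fraction)
```

Comparing `genic_fraction` for a SNP set against the genome-wide genic share (46% on the slide) is one simple way to express enrichment for genic SNPs.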
10K SNPs: Segregation Patterns
Step 2: Genotyping the grape germplasm repository
SNP selection
- Choose 10,000 high quality SNPs from the 500,000 Solexa SNPs
10K SNP chip
- Production of custom 10,000 (8898) SNP genotyping array
Genotype the germplasm repository
- cultivated species (Vitis vinifera)
- wild species
- 21 million genotypes
Analyses
- Establish core germplasm collection
- Identify synonyms and homonyms
- Association mapping
- Estimate population genetic parameters
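One simple way to flag putative synonyms (differently named accessions with the same genotype) is pairwise genotype concordance. The sketch below is an assumption about how such a screen could look, not the method used in the project: genotypes are coded as 0/1/2 allele counts with `None` for missing calls, and the accession labels, toy data, and 0.99 threshold are hypothetical.

```python
from itertools import combinations


def genotype_concordance(a, b):
    """Fraction of identical calls over SNPs scored in both accessions."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None  # no SNPs scored in both accessions
    return sum(x == y for x, y in shared) / len(shared)


def find_synonyms(genotypes, threshold=0.99):
    """Return (name1, name2, concordance) for pairs at or above `threshold`."""
    pairs = []
    for (n1, g1), (n2, g2) in combinations(sorted(genotypes.items()), 2):
        c = genotype_concordance(g1, g2)
        if c is not None and c >= threshold:
            pairs.append((n1, n2, c))
    return pairs


# Toy genotype calls (0/1/2 allele counts, None = missing); labels are made up.
genotypes = {
    "accession_A": [0, 1, 2, 1, 0, 2, 1, None],
    "accession_B": [0, 1, 2, 1, 0, 2, 1, 1],
    "accession_C": [2, 1, 0, 1, 2, 0, 1, 1],
}

synonyms = find_synonyms(genotypes)
print(synonyms)
```

The same concordance measure run in the other direction (same name, low concordance) would point to possible homonyms. A threshold slightly below 1.0 leaves room for genotyping error.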
PCA of array-scored SNPs shows clustering of the different germplasm
PCA is able to discriminate between the wild species
1809 samples 6907 SNPs
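A PCA of a samples-by-SNPs genotype matrix (1809 x 6907 on the slide) can be sketched with a singular value decomposition. This is a minimal illustration under stated assumptions: it uses NumPy, genotypes coded as 0/1/2 allele counts with no missing data, and two simulated groups with made-up allele frequencies rather than the actual array data.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """PCA of a samples x SNPs matrix of allele counts via SVD."""
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)  # center each SNP column
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates
    explained = (S ** 2) / np.sum(S ** 2)            # variance share per PC
    return scores, explained[:n_components]


# Simulated data: two groups of 20 samples with different allele frequencies.
rng = np.random.default_rng(0)
group1 = rng.binomial(2, 0.1, size=(20, 200))
group2 = rng.binomial(2, 0.9, size=(20, 200))
genotypes = np.vstack([group1, group2])

scores, explained = genotype_pca(genotypes)
print(scores[:3])
print(explained)
```

With strongly differentiated groups like these, the first principal component separates the two sets of samples, which is the kind of clustering the PCA slides show for the germplasm.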
Eurasian wild Vitis American wild Vitis
Differentiation between rotundifolia subspecies
MDS plot: Vitis vinifera & Vitis sylvestris, 6907 SNPs
MDS plot: Vitis vinifera, 6907 SNPs. Error or biologically interesting?
Phenotyping Economic Traits / Key Secondary Metabolites of Grapes
[Figure: HPLC-DAD chromatograms profiling anthocyanins (525 nm) and other phenolics in grapes; panels at 525 nm, 365 nm, and 280 nm]
- Phenotyping the USDA-ARS Vitis collections will be the next critical step for maximizing the value of the current genotyping effort
- A pilot project has been initiated for phenotyping key secondary metabolites of the Vitis collections from both Davis, CA and Geneva, NY
- About 400 V. vinifera and 200 North American collections will be phenotyped for 50 different phenolics, including anthocyanins
Funding: CRIS: D Thanks