Creating an Allele Index for NPGS: Bioinformatic Issues. Edward Buckler, USDA-ARS at Cornell University, Ithaca, NY
AIM: Make more useful plants by conserving, finding, and combining better alleles. NEED: The National Plant Germplasm System (NPGS) conserves 464,000 accessions and may contain 100,000,000 distinct alleles, but there is no index.
Genetic mapping is the basis of the index, and QTL mapping approaches now exist for virtually all types of populations. [Figure labels: population structure; familial relatedness.] Near-gene-level resolution has been achieved in multiple species. Genes controlling flowering, starch, nutrients, and wood quality have been identified. Positive results in: maize, rice, Arabidopsis, conifers.
What needs to happen? Genotyping (0.5 million data points per accession); phenotyping (500 data points per accession); bioinformatics (GRIN); mapping tools; breeder decision tools.
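To give a sense of scale, the per-accession targets above can be multiplied by the 464,000 accessions quoted on the NEED slide. This is only a back-of-the-envelope sketch: it assumes "Mdp" means million data points and "dp" means data points, and that every accession would be genotyped and phenotyped at the stated density. The variable names are illustrative.

```python
# Rough scale estimate from the figures on the slides.
# Assumption: "0.5Mdp" = 0.5 million data points, "500dp" = 500 data points.
ACCESSIONS = 464_000                 # accessions conserved (NEED slide)
GENOTYPES_PER_ACCESSION = 500_000    # genotypic data points per accession
PHENOTYPES_PER_ACCESSION = 500       # phenotypic data points per accession

total_genotypes = ACCESSIONS * GENOTYPES_PER_ACCESSION
total_phenotypes = ACCESSIONS * PHENOTYPES_PER_ACCESSION

print(f"genotypic data points:  {total_genotypes:.2e}")   # 2.32e+11
print(f"phenotypic data points: {total_phenotypes:.2e}")  # 2.32e+08
```

Under these assumptions the genotypic data outnumber the phenotypic data by a factor of 1,000, which is why storage and transfer efficiency matter most on the genotype side.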
What data is currently available outside NPGS? Several large NSF Plant Genome projects on diversity, with NPGS germplasm at the heart of these projects. Numerous smaller projects (however, most data from these gets lost over time). Millions of genotypic and phenotypic data points in the maize, wheat, and rice projects alone. Database-aware analysis tools (e.g. TASSEL).
[Architecture diagram. Layers: DBs, Middleware, Analysis, Display. Components shown: GDPDM, Gramene, Panzea (Maize), Rice Evol., Germinate, GRIN (marked "GRIN?"), GDPC, GDPC Data Browser, TASSEL, other analysis tools, alignment & SNP display, Panzea web data access, upload tools.]
GDPDM (Genomic Diversity and Phenotype Data Model): Germplasm, Genotype, Phenotype, Environment. Used by maize, wheat, and rice diversity projects.
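The four domains named on this slide can be pictured as linked record types. The sketch below is not the GDPDM schema: every class, field, and function name is hypothetical and chosen only to show how germplasm, genotype, phenotype, and environment records might reference one another so that alleles can be paired with trait values.

```python
from dataclasses import dataclass

# Hypothetical record types; NOT the actual GDPDM tables or columns.
@dataclass(frozen=True)
class Germplasm:
    accession_id: str
    taxon: str

@dataclass(frozen=True)
class Environment:
    env_id: str
    location: str
    year: int

@dataclass(frozen=True)
class Genotype:
    accession_id: str   # refers to Germplasm.accession_id
    marker: str
    allele: str

@dataclass(frozen=True)
class Phenotype:
    accession_id: str   # refers to Germplasm.accession_id
    env_id: str         # refers to Environment.env_id
    trait: str
    value: float

def alleles_with_trait(genotypes, phenotypes, marker, trait):
    """Pair each accession's allele at `marker` with its values for `trait`."""
    allele_by_accession = {
        g.accession_id: g.allele for g in genotypes if g.marker == marker
    }
    return [
        (allele_by_accession[p.accession_id], p.value)
        for p in phenotypes
        if p.trait == trait and p.accession_id in allele_by_accession
    ]

# Toy records with made-up identifiers and values, for illustration only.
genos = [Genotype("ACC-1", "m1", "A"), Genotype("ACC-2", "m1", "G")]
phenos = [
    Phenotype("ACC-1", "E1", "days_to_flower", 70.0),
    Phenotype("ACC-2", "E1", "days_to_flower", 82.0),
]
print(alleles_with_trait(genos, phenos, "m1", "days_to_flower"))
```

Keying both genotypes and phenotypes on the accession identifier is what lets an allele index be queried by trait; the environment key keeps measurements from different locations or years apart.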
Purpose: GDPC (Genomic Diversity and Phenotype Connection) simplifies access to the large genomic and phenotypic datasets that are becoming available in plant biology.
GDPC Data Flow Diagram
Databases: where has GDPC been mapped? Panzea (GDPDM schema); Gramene (GDPDM schema); Germinate (generic schema); GRIN (passport data).
GDPC: Select Data Service
GDPC: Select Taxa
GDPC: Nucleotide Data
GDPC: Trait Data
GDPC Browser Demo. GDPC: Marker Data
Current GDPC limitations: XML is not efficient for large datasets (several avenues are possible for improving efficiency). More visualization and analysis tools need to be developed: linkage mapping, breeder decision tools, geographic interfaces, pedigree interfaces (in progress).
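The XML overhead mentioned here can be illustrated by serializing the same genotype calls as XML and as tab-delimited text and comparing byte counts. This is a sketch under stated assumptions: the element names (`genotypes`, `call`, `accession`, `marker`, `allele`) and the synthetic rows are hypothetical and do not reproduce GDPC's actual message format. It shows markup overhead only, not parsing or network cost.

```python
import xml.etree.ElementTree as ET

def to_xml(rows):
    """Serialize (accession, marker, allele) rows as XML bytes.
    Element names are hypothetical, not GDPC's wire format."""
    root = ET.Element("genotypes")
    for accession, marker, allele in rows:
        call = ET.SubElement(root, "call")
        ET.SubElement(call, "accession").text = accession
        ET.SubElement(call, "marker").text = marker
        ET.SubElement(call, "allele").text = allele
    return ET.tostring(root, encoding="utf-8")

def to_tsv(rows):
    """Serialize the same rows as tab-delimited bytes, one call per line."""
    return "\n".join("\t".join(row) for row in rows).encode("utf-8")

# Synthetic calls: 1,000 accessions x 10 markers (made-up identifiers).
rows = [
    (f"ACC{i:05d}", f"m{j}", "ACGT"[(i + j) % 4])
    for i in range(1000)
    for j in range(10)
]

xml_bytes = to_xml(rows)
tsv_bytes = to_tsv(rows)
print(f"XML: {len(xml_bytes)} bytes, TSV: {len(tsv_bytes)} bytes, "
      f"ratio: {len(xml_bytes) / len(tsv_bytes):.1f}x")
```

With one element per field, most of the XML payload is repeated tag names rather than data, so the gap grows in proportion to the number of calls.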
What should GRIN consider? Becoming the lead repository for genotypic and phenotypic diversity data. Leading efforts to consolidate community diversity data. Implementing several middleware or web services standards (e.g. GDPC and perhaps others, such as IRRI's). Collaborating on the development of data visualization tools.
All of the software can be accessed through