1
database of Genotype and Phenotype
Kim Pruitt (for Matt Mailman) NCBI
2
Overview Phenotype Genotype Genotype X Phenotype Association
3
Overview Phenotype Genotype Genotype X Phenotype Association
Data tables
  Columns are phenotypes
  Rows are individuals
Documents (e.g., protocols, data collection forms)
  Parts of documents linked to variables
Data dictionary
Genotype
Genotype X Phenotype Association
Notes: Question for Matt: Is the data dictionary a glossary describing the variables? The data dictionary provides context information and is not made public as a standalone object; only parts of it, the variable descriptions, are public. There is no format requirement at this time, so it arrives in a variety of formats; format restrictions are expected in the future. Not all values are phenotypes (e.g., a date), so the data are called "variables".
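The table layout described on this slide (one row per individual, one column per variable, with a data dictionary describing each variable) can be sketched as follows. This is only an illustration: the column names, values, and descriptions are hypothetical and do not reflect any actual dbGaP submission format.

```python
import csv
import io

# Hypothetical phenotype table: one row per individual, one column per variable.
PHENOTYPE_CSV = """\
subject_id,visit_date,age,case_status
S001,2006-03-14,71,case
S002,2006-03-15,68,control
S003,2006-04-02,75,case
"""

# Hypothetical data dictionary: variable name -> public description.
DATA_DICTIONARY = {
    "subject_id": "De-identified subject identifier",
    "visit_date": "Date of visit (a variable, but not a phenotype)",
    "age": "Age in years at visit",
    "case_status": "Case or control",
}


def load_table(text):
    """Parse the table into a list of dicts, one dict per individual."""
    return list(csv.DictReader(io.StringIO(text)))


def undocumented_variables(rows, dictionary):
    """Return the column names that have no data dictionary entry."""
    columns = list(rows[0].keys()) if rows else []
    return [name for name in columns if name not in dictionary]


rows = load_table(PHENOTYPE_CSV)
missing = undocumented_variables(rows, DATA_DICTIONARY)
print(len(rows), "individuals;", "undocumented variables:", missing)
```

Checking the table against the dictionary like this is one simple way to confirm that every distributed column has a description.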
4
Overview Phenotype Genotype Genotype X Phenotype Association
Genotype files directly from vendor
Intensity files (e.g., .CEL)
Genotype X Phenotype Association
Notes: Oligo microarrays measure the allelic variant of biallelic SNPs at a position (aa/ab/bb). Illumina and Affymetrix are the two platforms that data currently come from. CEL files will be distributed, for authorized access only. Determining the intensity is mature technology, but calling the genotype is an area of research, so with the CEL files you can recalculate the genotypes yourself using newer technology.
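To make the aa/ab/bb notation concrete, here is a minimal sketch that summarizes the genotype calls for one biallelic SNP. The input list and the use of `None` for a failed call are assumptions made for illustration; real vendor genotype files have their own formats.

```python
from collections import Counter

# Number of "b" alleles carried by each genotype call.
B_ALLELE_COUNT = {"aa": 0, "ab": 1, "bb": 2}


def summarize_calls(calls):
    """Summarize calls for one SNP; None marks a sample with no call."""
    called = [c for c in calls if c is not None]
    counts = Counter(called)
    call_rate = len(called) / len(calls) if calls else 0.0
    n_alleles = 2 * len(called)
    b_freq = (
        sum(B_ALLELE_COUNT[c] for c in called) / n_alleles if n_alleles else 0.0
    )
    return counts, call_rate, b_freq


# Hypothetical calls for eight samples at a single SNP.
calls = ["aa", "ab", "bb", "ab", None, "aa", "aa", "ab"]
counts, call_rate, b_freq = summarize_calls(calls)
print(dict(counts), call_rate, round(b_freq, 3))
```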
5
Overview Phenotype Genotype Genotype X Phenotype Association
Various statistical models and methods
P-value or LOD score for each marker
Filters by P-value, HWE, minor allele frequency
Map phenotypes onto genomic sequence
Notes: Question for Matt: Does the "phenotype" protocol document also describe the association method and statistical model used?
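The three filters named on this slide can be sketched as below. This is not dbGaP's implementation: the thresholds and marker records are hypothetical, and the HWE check uses a simple 1-degree-of-freedom chi-square test, where an exact test might be preferred in practice.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of the rarer allele, from genotype counts."""
    n = n_aa + n_ab + n_bb
    p_b = (n_ab + 2 * n_bb) / (2 * n)
    return min(p_b, 1 - p_b)


def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p_a = (2 * n_aa + n_ab) / (2 * n)
    p_b = 1 - p_a
    expected = (n * p_a * p_a, 2 * n * p_a * p_b, n * p_b * p_b)
    if min(expected) == 0:
        return 1.0  # monomorphic marker: nothing to test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_filters(marker, max_p=1e-4, min_hwe_p=1e-3, min_maf=0.05):
    """Keep markers that are significant, in HWE, and not too rare."""
    counts = marker["counts"]
    return (
        marker["p_value"] <= max_p
        and hwe_p_value(*counts) >= min_hwe_p
        and minor_allele_frequency(*counts) >= min_maf
    )


# Hypothetical markers: association P-value and (aa, ab, bb) counts.
markers = [
    {"name": "marker_1", "p_value": 1e-6, "counts": (25, 50, 25)},
    {"name": "marker_2", "p_value": 1e-6, "counts": (50, 0, 50)},   # fails HWE
    {"name": "marker_3", "p_value": 1e-6, "counts": (98, 2, 0)},    # fails MAF
    {"name": "marker_4", "p_value": 0.3, "counts": (25, 50, 25)},   # not significant
]
passed = [m["name"] for m in markers if passes_filters(m)]
print(passed)
```

The filters are applied together here, so a marker is reported only if it passes all three.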
6
Overview Phenotype Genotype Genotype X Phenotype Association
Obvious expansion potential:
  More species; different types of association data (QTL)
Critically important to archive all data:
  Submit primary data to appropriate public archive!
    Probe DB: primers, resequencing amplicons
    dbSTS: STS markers
    Maps: UniSTS; Map Viewer
    GenBank: ESTs
7
dbGaP Web Site
Two levels of access: open and controlled
Open access to non-sensitive data
  study summaries and documents
  measured variables and data elements
  analysis reports
  genome browser
Controlled access provides oversight and accountability for use of sensitive datasets involving personal information
  De-identified phenotypes and genotypes for individual subjects
  Pedigrees
8
Browse Studies http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gap
Link back to dbGaP homepage
Instructions
Description of dbGaP
Link to study report
List of variables in study
List of documents in study
Automated query to PubMed for genome-wide association study articles
Notes: Question: What do you expect to have for "type of study"? What is a sub-study definition (longitudinal, etc.)? Currently all studies are whole genome association; this will be expanded to other types in the future and can expand to other organisms.
9
Browse Studies by Disease
Expand/collapse Link to Terms from MeSH vocabulary Link to study report
10
Advanced Search Fields to be searched
Add any number of search criteria
11
Study Report
Citeable unique stable identifier
Genotype x phenotype association or linkage analyses
Search this study
Link to variable report
History
Publications
Attribution
Access Rules
Links back to submitter website
Criteria for inclusion/exclusion
12
Variable Report
Citeable unique stable identifier
Documents containing a section that has been linked to this variable
Statistical summary of values for this variable
P-value is red if cases differ from controls
13
Variable Report (continued)
Document name
Section of document that has been linked to this variable
Link to document
14
Analysis Report
Link back to report for measured or derived variable that was analyzed
Genome browser of analysis results
15
Genome Browser of Analysis Results
Slider filters out results less significant than threshold
2 Mb bins colored to represent the most significantly associated marker
Click on bin of interest to zoom in and see association in context with other objects mapped to the same genomic region
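The binning behind this overview can be sketched as follows, assuming each result is a (chromosome, position, P-value) tuple. The positions and P-values are hypothetical, and the browser's actual logic is not described in the slides; this only illustrates keeping the most significant marker per 2 Mb bin after a threshold is applied.

```python
BIN_SIZE = 2_000_000  # 2 Mb bins, as on the slide


def bin_most_significant(markers, threshold):
    """Map (chrom, bin index) -> (position, p_value) of the best marker.

    Markers less significant than the threshold (p_value > threshold)
    are dropped first, mimicking the slider.
    """
    best = {}
    for chrom, pos, p_value in markers:
        if p_value > threshold:
            continue
        key = (chrom, pos // BIN_SIZE)
        if key not in best or p_value < best[key][1]:
            best[key] = (pos, p_value)
    return best


# Hypothetical association results on one chromosome.
markers = [
    ("1", 500_000, 1e-3),
    ("1", 1_900_000, 1e-7),
    ("1", 2_100_000, 0.2),   # removed by the threshold
    ("1", 4_500_000, 1e-4),
]
best = bin_most_significant(markers, threshold=0.01)
for (chrom, index), (pos, p_value) in sorted(best.items()):
    print(f"chr{chrom} bin {index}: best marker at {pos}, P = {p_value}")
```

A display layer could then color each bin by the P-value stored for it; bins with no surviving marker are simply absent from the result.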
16
Genome Browser – Higher Resolution
Collapse table
P-value of genotyped marker
Scroll via boxes above
Add maps
CFH gene has been associated with AMD in several studies
17
Coming Soon…
Studies
  Early 2007
    Michael J. Fox Foundation Parkinson’s Disease Study (LEAPS)
    NINDS Stroke and ALS
  Spring 2007
    GAIN (Genetic Association Information Network)
    Framingham SHARe – first two generations
    NIDDK GoKinD and EDIC
  Summer 2007
    Framingham SHARe – third generation
  Late Early 2008
    GEI (Genes and Environment Initiative)
Features
  Search analysis results by:
    Gene
    SNP or microsatellite marker
    Genomic region
  Filter analysis results by:
    P-value
    HWE
    Minor allele frequency
    Call rate?
  Download
    Public summaries
    Authorized access for individual-level data
  Other associations with phenotype (expression data..)
18
Acknowledgements
Phenotype: Rinat Bagoutdinov, Luning Hao, Mas Kimura, Jimmy Jin, Natasha Popova, Stephanie Pretels, Karl Sirotkin, Jack Wang, Matt Mailman
Genotype: Mike Feolo, Lon Phan, David Shao, Ming Ward, Steve Sherry
XML: Kim Tryka, Laura Kelly, Jeff Beck
Authorized Access: Steve Sherry, Eugene Yaschenko, Valdimir Soussov, Misha Kimmelman, Don Preuss, Al Graeff, Jim Ostell
19
Document HTML
20
Document PDF
21
Multiple maps can be displayed to elucidate what is already known in
a particular genomic region