Published by Isabel McCormick. Modified over 8 years ago.
1
Michael Feolo
2
Outline
  What is dbGaP
  How to get your study registered
  How to submit data
Not Covered
  SRA Submission
3
National Center for Biotechnology Information
  PubMed and PubMed Central
  Categories: Genome, Variation, Phenotype, Medicine
  Resources: dbGaP, PheGenI, GTR, MedGen, dbVar, ClinVar, SRA, OMIM, GenBank, dbSNP, GEO, Gene, RefSeq, Trace Archive, dbMHC, Genome Browser
4
Purpose of dbGaP
  Permanent archive
  Public accessions
  Data objects are versioned
  Two modes of access:
    Public web pages with summary-level data
    Controlled access for individual-level data
5
Phenotype
  Demographic, clinical, biomarker and exposure variables
  Images: MRI, CT, eye images
  Study documents (questionnaires and protocols)
Molecular Data
  Array-based SNP/CNV
  Imputed genotypes
  Sequence derived: .maf, .vcf
  Next-gen sequence brokered by SRA
  Methylation
  Expression
Association Analysis Results
6
GDS Policy Website: gds.nih.gov
7
dbGaP Home Page: www.ncbi.nlm.nih.gov/gap
9
General Instructions
Association Results
Molecular Data
Study Level Metadata
Subject Phenotypes
Sample Attributes
Subject Consent
Subject/Sample Map
Pedigree
Used to map controlled-access data to public NCBI archives
10
Submission of Phenotype Data
  Compilation of files (Submitter)
    Study configuration, documents
    “Core” data files: subject, sample, pedigree and dictionaries
    Phenotype data files and dictionaries
  Files transferred using Submission Portal (Submitter)
    Protected servers at NCBI with limited access
    dbGaP curatorial staff assigned on upload
  Data file QC (dbGaP Staff)
    QC checking
    Database upload
    Variable summaries sent back to PIs for verification
11
Study Metadata File
12
Subject/Sample “Core” Files
  Subject Listing
    Consent groups
    Master list of study participants
    May reference existing subjects (cell or DNA repository)
  Subject Sample mapping table
    Matches a study participant to any number of samples
  Sample Attributes file
    Any number of sample-level variables
  Pedigree
    LINKAGE format
    Required when relationships are known/collected by study
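The relationship between the subject listing and the subject sample mapping table can be checked before upload. The following is a minimal sketch, not dbGaP's own tooling: it assumes tab-delimited files with a header row, and the column names (SUBJECT_ID, CONSENT, SAMPLE_ID) and the helper names are illustrative assumptions rather than something stated on the slide.

```python
import csv
import io


def read_rows(text):
    """Parse tab-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def check_core_files(subject_rows, mapping_rows):
    """Cross-check a subject listing against a subject sample mapping table.

    Returns (unknown_subjects, duplicated_samples):
      unknown_subjects   - subject IDs used in the mapping table but absent
                           from the subject listing
      duplicated_samples - sample IDs that appear on more than one row
    """
    known = {row["SUBJECT_ID"] for row in subject_rows}
    mapped = {row["SUBJECT_ID"] for row in mapping_rows}
    unknown_subjects = sorted(mapped - known)

    seen, duplicated = set(), set()
    for row in mapping_rows:
        sample = row["SAMPLE_ID"]
        if sample in seen:
            duplicated.add(sample)
        seen.add(sample)
    return unknown_subjects, sorted(duplicated)


# Hypothetical file contents (assumed layout, for illustration only).
SUBJECTS = "SUBJECT_ID\tCONSENT\nS1\t1\nS2\t2\n"
MAPPING = "SUBJECT_ID\tSAMPLE_ID\nS1\tA1\nS1\tA2\nS3\tA3\n"

unknown_subjects, duplicated_samples = check_core_files(
    read_rows(SUBJECTS), read_rows(MAPPING)
)
print(unknown_subjects)    # subjects in the mapping but not in the listing
print(duplicated_samples)  # sample IDs listed more than once
```

One subject may map to several samples (S1 above), so only repeated sample IDs and unlisted subjects are reported.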
13
Subject Consent File and Dictionary Templates
14
Phenotype Data Files
  Rectangular
  Files accessioned with pht#
  Subjects or samples are rows
  Variables are columns and accessioned with phv#
  Longitudinal example: subject ID, time point for each measured value
  Limited summaries will be available on public pages
  Matched to data dictionary
    Data types
    Coded values
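The longitudinal layout mentioned above (one row per subject and time point) can be illustrated with a small sketch. The variable (systolic blood pressure), the visit numbering, and the function name are assumptions made up for this example; the slide only states that each measured value carries a subject ID and a time point.

```python
from collections import defaultdict

# Hypothetical long-format rows: (subject ID, visit number, systolic BP).
rows = [
    ("S1", 1, 120),
    ("S1", 2, 118),
    ("S2", 1, 135),
]


def by_subject(rows):
    """Group long-format rows into {subject: {time point: value}}.

    Raises ValueError if the same subject/time point pair occurs twice,
    since the table would then no longer be rectangular in this sense.
    """
    series = defaultdict(dict)
    for subject_id, time_point, value in rows:
        if time_point in series[subject_id]:
            raise ValueError(
                f"duplicate row for subject {subject_id}, time point {time_point}"
            )
        series[subject_id][time_point] = value
    return dict(series)


grouped = by_subject(rows)
print(grouped["S1"])  # all measurements recorded for subject S1
```

Keeping the time point as its own column means a new visit adds rows rather than columns, so the matching dictionary does not have to change.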
15
Phenotype File and Dictionary Templates
16
Phenotype Data Checks
  Data checked against the corresponding dictionary
    Data type
    Missing value encoding
    Description
    Units
  Core tables checked with phenotype files
  Computer scripts run to detect HIPAA violations
  Issues resolved working with the submitter
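A data-versus-dictionary check of the kind listed above might look like the sketch below. It is not the dbGaP QC code: the in-memory dictionary structure, the type names ("integer", "decimal", "encoded"), and the missing-value code "-9" are all assumptions chosen for illustration.

```python
def check_value(raw, spec):
    """Return None if `raw` is valid under `spec`, else a short problem string."""
    if raw in spec.get("missing", ()):
        return None  # declared missing-value codes are always accepted
    if spec["type"] == "integer":
        try:
            int(raw)
        except ValueError:
            return "not an integer"
    elif spec["type"] == "decimal":
        try:
            float(raw)
        except ValueError:
            return "not a decimal"
    elif spec["type"] == "encoded":
        if raw not in spec["codes"]:
            return "unknown code"
    return None


def check_table(rows, dictionary):
    """Return a list of (row number, variable, problem) tuples."""
    problems = []
    for row_no, row in enumerate(rows, start=1):
        for name, raw in row.items():
            if name not in dictionary:
                problems.append((row_no, name, "not in dictionary"))
                continue
            issue = check_value(raw, dictionary[name])
            if issue:
                problems.append((row_no, name, issue))
    return problems


# Hypothetical dictionary and data rows (assumed names and codes).
dictionary = {
    "AGE": {"type": "integer", "missing": {"-9"}},
    "SEX": {"type": "encoded", "codes": {"1": "male", "2": "female"},
            "missing": {"-9"}},
}
rows = [
    {"AGE": "54", "SEX": "1"},
    {"AGE": "-9", "SEX": "3"},
    {"AGE": "fifty", "SEX": "2"},
]

problems = check_table(rows, dictionary)
for problem in problems:
    print(problem)
```

Reporting the row number and variable name together makes it easier to resolve each issue with the submitter, which is the last step on the slide.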
17
Document Processing
  Protocols, data collection instruments
  Annotation of documents
    Portions of the documents are manually annotated by a Scientific Curator to specific columns of data
  XML markup
    Vendors
    In-house markup and review
  Allows end users to quickly browse and search from documents to variables and back
18
Example Incoming Data Flow
19
Genotype Submission
  Read the Genotype Submission Guide
  Individual-level “Omics” data
    PLINK, .VCF, .MAF
    Vendor software
    Raw files such as .IDAT, .CEL
  QC Checks
    Consistency checks with phenotype submission
    Gender check
    Cryptic duplicates
    Inheritance check with pedigree file
    Concordance with existing genotypes
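The inheritance check against the pedigree file can be sketched for a single parent-offspring trio. This is a simplified illustration under stated assumptions, not the pipeline used by dbGaP: genotypes are two-letter strings for autosomal biallelic markers, "00" marks a missing call, and the function names are hypothetical.

```python
MISSING = "00"  # assumed code for a missing genotype call


def mendel_consistent(child, father, mother):
    """True if the child's two alleles can come one from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def count_mendel_errors(child, father, mother):
    """Count markers where the trio is inconsistent, skipping missing calls."""
    errors = 0
    for c, f, m in zip(child, father, mother):
        if MISSING in (c, f, m):
            continue
        if not mendel_consistent(c, f, m):
            errors += 1
    return errors


# Hypothetical genotypes at four markers for one trio.
child = ["AG", "GG", "CT", "00"]
father = ["AA", "AA", "CC", "AT"]
mother = ["GG", "AG", "TT", "AT"]

errors = count_mendel_errors(child, father, mother)
print(errors)  # number of markers that violate Mendelian inheritance
```

A high error count for a trio usually points to a sample swap or an incorrect pedigree record rather than to many independent genotyping errors, which is why this check is paired with the pedigree file.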
20
Upload to Submission Portal
  When registration is complete
    Data submitter or PI will receive an email invite for upload
    This should be accepted within 1 week
  Once accepted
    Access does not expire
21
Internal Data Flow
[Flowchart. Inputs: registration information from the GPA; phenotype files and documents from the phenotype submitter; raw genotype files from the genotype submitter or DCC; core files (SSM, SCL, PED). Decision points: Consent groups loaded? Samples match? Pass QC? (phenotype and genotype). Steps: phenotype curation, initial PLINK and QC report, genotype cleaning, sample loading, genotype metadata loading, data loading, document loading, annotation markup, consent division, packing scripts (packed pht, phd), resolving sample counts and sample use, release notes, final QC of genotype and phenotype data, preview site. Outputs: public FTP release and public web display.]
22
Accessioning
The NCBI dbGaP database of genotypes and phenotypes. Mailman M., Feolo M. et al., Nat Genet. 2007 Oct;39(10):1181-6
23
Jimmy Jin, Masato Kimura, Rinat Bagoutdinov, Luning Hao, Anne Sturcke, Natalia Popova, Stephanie Pretel, Lora Ziyabari, Jack Wang, Moira Lee, Ming Xu, Nataliya Sharopova, Stefan Stefanov, Svetlana Dracheva, Jose Mena, Wendy Wu, Monica Bihan
NIH GPA’s and DAC’s
Submitting PI’s and Staff
Many Other NCBI’ers