1 Michael Feolo

2 Outline
• What is dbGaP
• How to get your study registered
• How to submit data
• Not covered:
  • SRA submission

3 National Center for Biotechnology Information (NCBI) resources across Genome, Variation, Phenotype and Medicine: PubMed and PubMed Central, dbGaP, PheGenI, GTR, MedGen, dbVar, ClinVar, SRA, OMIM, GenBank, dbSNP, GEO, Gene, RefSeq, Trace Archive, dbMHC, Genome Browser

4 Purpose of dbGaP
• Permanent archive
• Public accessions
• Data objects are versioned
• Two modes of access:
  • Public web pages with summary-level data
  • Controlled access for individual-level data

5  Phenotype  Demographic, Clinical, Biomarker and Exposure variables  Images --MRI, CT, Eye images  Study Documents (questionnaires and protocols)  Molecular Data  Array based SNP/CNV  Imputed Genotypes  Sequence derived.maf,.vcf  Next Gen Sequence  brokered by SRA  Methylation  Expression  Association Analysis Results

6 GDS Policy Website: gds.nih.gov

7 dbGaP Home Page: www.ncbi.nlm.nih.gov/gap


9
• General Instructions
• Association Results
• Molecular Data
• Study Level Metadata
• Subject Phenotypes
• Sample Attributes
• Subject Consent
• Subject/Sample Map
• Pedigree
Used to map controlled-access data to public NCBI archives

10 Submission of Phenotype Data
• Compilation of files (submitter)
  • Study configuration, documents
  • "Core" data files: subject, sample, pedigree and dictionaries
  • Phenotype data files and dictionaries
• Files transferred using the Submission Portal (submitter)
  • Protected servers at NCBI with limited access
  • dbGaP curatorial staff assigned on upload
• Data file QC (dbGaP staff)
  • QC checking
  • Database upload
  • Variable summaries sent back to PIs for verification

11 Study Metadata File

12 Subject/Sample "Core" Files
• Subject listing
  • Consent groups
  • Master list of study participants
  • May reference existing subjects (e.g., in a cell or DNA repository)
• Subject-sample mapping table
  • Matches a study participant to any number of samples
• Sample attributes file
  • Any number of sample-level variables
• Pedigree
  • LINKAGE format
  • Required when relationships are known or collected by the study
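The slide lists the core files but not how they must agree with one another. The following is a minimal, hypothetical sketch of such cross-file checks. It assumes simplified column names (SUBJECT_ID, CONSENT, SAMPLE_ID, FATHER, MOTHER) rather than the official dbGaP template headers. It also assumes that each sample belongs to exactly one subject and that pedigree members must appear in the subject listing, with "0" marking an absent parent as in LINKAGE-style files. The function `check_core_files` is illustrative only.

```python
"""Hypothetical consistency check across dbGaP-style "core" files.

Column names (SUBJECT_ID, CONSENT, SAMPLE_ID, FATHER, MOTHER) are
illustrative assumptions, not the official dbGaP template headers.
"""


def check_core_files(subjects, subject_sample_map, pedigree=()):
    """Return a list of human-readable problems found in the core files.

    subjects: list of dicts with SUBJECT_ID and CONSENT (the subject listing)
    subject_sample_map: list of dicts with SUBJECT_ID and SAMPLE_ID
    pedigree: optional list of dicts with SUBJECT_ID, FATHER, MOTHER,
              where "0" means the parent is not in the study (LINKAGE style)
    """
    problems = []
    subject_ids = set()
    for row in subjects:
        sid = row["SUBJECT_ID"]
        if sid in subject_ids:
            problems.append(f"duplicate subject {sid}")
        subject_ids.add(sid)

    # A subject may have many samples; assume each sample has one subject.
    sample_owner = {}
    for row in subject_sample_map:
        sid, smp = row["SUBJECT_ID"], row["SAMPLE_ID"]
        if sid not in subject_ids:
            problems.append(f"sample {smp} maps to unknown subject {sid}")
        previous = sample_owner.setdefault(smp, sid)
        if previous != sid:
            problems.append(f"sample {smp} maps to both {previous} and {sid}")

    # Every individual and non-missing parent should be a listed subject.
    for row in pedigree:
        for role in ("SUBJECT_ID", "FATHER", "MOTHER"):
            pid = row[role]
            if pid != "0" and pid not in subject_ids:
                problems.append(f"pedigree {role} {pid} not in subject listing")
    return problems
```

For example, mapping sample A1 to both S1 and S2 yields the problem "sample A1 maps to both S1 and S2". Returning a list of messages instead of stopping at the first error mirrors how issues are collected and resolved with the submitter.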

13 Subject Consent File and Dictionary Templates

14 Phenotype Data Files
• Rectangular files, accessioned with pht#
  • Subjects or samples are rows
  • Variables are columns, accessioned with phv#
• Longitudinal data
  • Example: subject ID and time point for each measured value
• Limited summaries will be available on public pages
• Matched to a data dictionary
  • Data types
  • Coded values
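A longitudinal study can be laid out in this rectangular form by keying each row on (subject ID, time point) and giving each variable its own column. The following is a minimal sketch under that assumption. The column names SUBJECT_ID and VISIT and the function `to_rectangular` are hypothetical. Cells that were never measured are left empty here, whereas a real file would use the missing-value code declared in its dictionary.

```python
"""Sketch: lay out longitudinal measurements as a rectangular phenotype table.

Assumption: each row is keyed by (subject ID, time point) and each variable
becomes a column. SUBJECT_ID and VISIT are hypothetical column names.
"""
import csv
import io


def to_rectangular(measurements, variables):
    """measurements: iterable of (subject_id, time_point, variable, value).

    variables: ordered list of variable names from the data dictionary.
    Returns tab-delimited text with one row per (subject, time point).
    """
    rows = {}
    for subject, time_point, variable, value in measurements:
        if variable not in variables:
            raise ValueError(f"variable {variable!r} is not in the data dictionary")
        row = rows.setdefault((subject, time_point), {})
        if variable in row and row[variable] != value:
            raise ValueError(f"conflicting {variable} for {subject} at {time_point}")
        row[variable] = value

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["SUBJECT_ID", "VISIT", *variables])
    for subject, time_point in sorted(rows):
        row = rows[(subject, time_point)]
        # Unmeasured cells are written empty in this sketch.
        writer.writerow([subject, time_point, *(row.get(v, "") for v in variables)])
    return out.getvalue()
```

Rejecting two different values for the same subject, time point and variable catches a common cause of non-rectangular longitudinal files before any dictionary checks run.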

15 Phenotype File and Dictionary Templates

16 Phenotype Data Checks
• Data checked against the corresponding dictionary
  • Data type
  • Missing-value encoding
  • Description
  • Units
• Core tables checked against phenotype files
• Computer scripts run to detect HIPAA violations
• Issues resolved by working with the submitter
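The check of data against the dictionary can be sketched for the data-type and missing-value items. This is a minimal illustration, not dbGaP's actual QC script. It assumes a simplified dictionary entry with hypothetical fields TYPE (integer, decimal, encoded or string), MISSING (declared missing-value codes) and VALUES (code-to-meaning map for encoded variables). A blank cell is only accepted when the blank itself is declared as a missing code.

```python
"""Sketch of checking one phenotype column against its dictionary entry.

The entry fields TYPE, MISSING and VALUES are simplified assumptions,
not the exact dbGaP data dictionary template.
"""
import re

INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)")  # no exponent form


def check_column(name, values, entry):
    """Return error strings for values that do not match the dictionary entry."""
    errors = []
    kind = entry["TYPE"]
    missing = set(entry.get("MISSING", []))  # e.g. {"-9"} or {""}
    codes = entry.get("VALUES", {})  # code -> meaning, for encoded variables
    for row_number, raw in enumerate(values, start=1):
        value = raw.strip()
        if value in missing:
            continue  # declared missing-value code
        if kind == "integer":
            ok = INTEGER.fullmatch(value) is not None
        elif kind == "decimal":
            ok = DECIMAL.fullmatch(value) is not None
        elif kind == "encoded":
            ok = value in codes
        else:  # free-text string: any value is accepted
            ok = True
        if not ok:
            errors.append(f"{name} row {row_number}: {raw!r} does not match TYPE={kind}")
    return errors
```

Regular expressions are used instead of `int()`/`float()` so that inputs Python would otherwise accept, such as "1_000" or "nan", are reported rather than silently passing.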

17 Document Processing
• Protocols, data collection instruments
• Annotation of documents
  • Portions of the documents are manually annotated by a scientific curator to specific columns of data
• XML markup
  • Vendors
  • In-house markup and review
• Allows end users to quickly browse and search from documents to variables and back

18 Example Incoming Data Flow

19 Genotype Submission
• Read the Genotype Submission Guide
• Individual-level "omics" data
  • PLINK, .VCF, .MAF, vendor software
  • Raw files such as .IDAT, .CEL
• QC checks
  • Consistency checks with the phenotype submission
  • Gender check
  • Cryptic duplicates
  • Inheritance check with the pedigree file
  • Concordance with existing genotypes
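The genotype QC checks named above can be illustrated with simplified versions of common techniques. This is a sketch, not dbGaP's pipeline. It assumes calls are coded as alternate-allele counts (0, 1, 2) with None for a missing call.

- Sex is inferred from heterozygosity on non-pseudoautosomal X markers.
- Cryptic duplicates are pairs whose genotype concordance exceeds a threshold.
- Inheritance is checked by flagging calls that cannot be produced from the parents' genotypes.

The thresholds and function names are hypothetical.

```python
"""Sketch of genotype QC checks: sex, cryptic duplicates, Mendelian errors.

Assumptions: calls are alternate-allele counts (0, 1, 2) with None for a
missing call; thresholds are illustrative, not dbGaP's values.
"""
from itertools import combinations

# Alternate alleles a parent can pass on, by genotype (0 hom ref, 1 het, 2 hom alt).
TRANSMITTED = {0: {0}, 1: {0, 1}, 2: {1}}


def inferred_sex(x_calls, male_max_het=0.05):
    """Guess sex from heterozygosity on non-pseudoautosomal X markers."""
    called = [c for c in x_calls if c is not None]
    if not called:
        return None
    het_rate = sum(c == 1 for c in called) / len(called)
    return "male" if het_rate <= male_max_het else "female"


def concordance(g1, g2):
    """Fraction of markers called in both samples where the calls agree."""
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def cryptic_duplicates(genotypes, threshold=0.99):
    """Sample pairs whose genotypes are near-identical (possible duplicates)."""
    return [
        (s1, s2)
        for s1, s2 in combinations(sorted(genotypes), 2)
        if concordance(genotypes[s1], genotypes[s2]) >= threshold
    ]


def mendelian_errors(child, father, mother):
    """Marker indices where the child's call cannot come from the parents."""
    errors = []
    for i, (c, f, m) in enumerate(zip(child, father, mother)):
        if c is None or f is None or m is None:
            continue  # skip markers with any missing call
        possible = {a + b for a in TRANSMITTED[f] for b in TRANSMITTED[m]}
        if c not in possible:
            errors.append(i)
    return errors
```

In use, the result of `inferred_sex` would be compared with the sex reported in the phenotype submission. `concordance` applies equally to comparing a new sample with previously deposited genotypes for the same subject.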

20 Upload to Submission Portal
• When registration is complete:
  • The data submitter or PI will receive an email invitation to upload
  • The invitation should be accepted within 1 week
  • Once accepted, access does not expire

21 Internal Data Flow [flowchart: phenotype and genotype submissions move from registration and consent-group loading through QC, curation, consent division and packing (pht, phd) to production release, public FTP release and public web display]

22 Accessioning
Mailman M, Feolo M, et al. The NCBI dbGaP database of genotypes and phenotypes. Nat Genet. 2007 Oct;39(10):1181-6.
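Because data objects are versioned, tools that consume dbGaP releases often need to split an accession into its parts. The following is a minimal sketch assuming the pattern prefix + number + ".v" version + optional ".p" participant set, as in a string like "phs000007.v29.p10". The prefix meanings (phs study, pht dataset, phv variable, phd document) and the name `parse_accession` are assumptions for illustration, not an official NCBI API.

```python
"""Sketch: split dbGaP-style accessions such as "phs000007.v29.p10".

Assumed pattern: prefix (phs study, pht dataset, phv variable, phd document),
a zero-padded number, ".v" version and an optional ".p" participant set.
"""
import re
from typing import NamedTuple, Optional

ACCESSION = re.compile(r"(phs|pht|phv|phd)([0-9]+)\.v([0-9]+)(?:\.p([0-9]+))?")


class Accession(NamedTuple):
    prefix: str
    number: int
    version: int
    participant_set: Optional[int]


def parse_accession(text):
    """Parse an accession string; raise ValueError if it does not match."""
    match = ACCESSION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a recognised accession: {text!r}")
    prefix, number, version, pset = match.groups()
    return Accession(prefix, int(number), int(version), int(pset) if pset else None)


def is_newer(a, b):
    """True if accession a is a later version of the same object as b."""
    return (a.prefix, a.number) == (b.prefix, b.number) and a.version > b.version
```

Keeping the number and version as separate fields means a new release of the same object (same prefix and number, higher version) can be recognised without string comparison.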

23 Jimmy Jin, Masato Kimura, Rinat Bagoutdinov, Luning Hao, Anne Sturcke, Natalia Popova, Stephanie Pretel, Lora Ziyabari, Jack Wang, Moira Lee, Ming Xu, Nataliya Sharopova, Stefan Stefanov, Svetlana Dracheva, Jose Mena, Wendy Wu, Monica Bihan; NIH GPAs and DACs; submitting PIs and staff; many other NCBI'ers


Download ppt "Michael Feolo. 2013200820102012200920072011 Outline  What is dbGaP  How to get your study registered  How to submit data  Not Covered  SRA Submission."

Similar presentations


Ads by Google