TOPMed Analysis Workshop Genetic Analysis Center Biostatistics Department University of Washington TOPMed Data Coordinating Center August 7-9, 2017 Introduction.

1 TOPMed Analysis Workshop
Genetic Analysis Center, Biostatistics Department, University of Washington
TOPMed Data Coordinating Center
August 7-9, 2017
Introduction to TOPMed and Data Sharing
Cathy Laurie

4 Consent Types

5 To access the web site: Sign and return non-disclosure agreement –from TOPMed DCC –

6 *~50 Participating Studies currently in TOPMed (not shown)

7 TOPMed WGS Overview
[Flow diagram. Entities: Study, Sequencing Center, IRC (Michigan), NCBI (dbGaP, SRA*), Study Coordinating Center, DCC (UW), Working Groups (COPD, atherosclerosis, asthma, etc.), study analysis teams (Study A, Study B, etc.), Scientific Community. Data flows: DNA samples; sequence data; joint genotype call sets; harmonized sequence data; phenotypes; harmonized phenotypes; phenotypes, genotypes, sequence data. Outputs: cross-study publications, study-focused publications, personalized medicine.]
*SRA is being replaced with an NIH cloud as repository for aligned sequence data

8 Data Organization on dbGaP
Parent study accessions*
- Phenotype data
- Prior SNP array and other non-TOPMed genotype data
- Some have ‘omics data
- Most are currently released (available to general scientific community)
TOPMed accessions*
- Exchange Area: accessible only by authorized TOPMed investigators
  - Whole genome sequence data and genotype call sets
  - Some have phenotype data
  - Some have files contributed by Working Groups for sharing
- Released accessions
  - Phase 1 studies currently being released (~18k samples)
  - Include SRA/BAM, VCF, annotation and phenotype files
*click to find accession numbers

9 TOPMed Exchange Area Organization and Current Content
Common Exchange Area
- Genotype call sets (cross-study)
- Variant & sample annotation
Study-specific Exchange Areas: Study A, Study B, Study C
Study-specific TOPMed EA content:
- Sample files: sample-subject mapping, subject consent, sample attributes, pedigrees
- Study-specific phenotype files: submitted specifically for TOPMed (many studies already have their phenotypes in a released parent study accession that is publicly available)
- Harmonized phenotype files: some from DCC and others contributed by Working Group members
- BAM/SRA files
- Misc files (e.g. prior SNP array data for some studies)

10 Sequence Data
Joint call sets from IRC
- Freeze 4 (current version): alignment to build 37; includes all samples from phase 1 studies (except SAFS), ~18k samples
- Next freeze (August 2017): alignment to build 38; to include all phase 1 samples and a large fraction of phase 2 samples

11 Phenotype Data
Study-specific phenotypes
- Parent study accessions: some have thousands of phenotypes; see website document for tips on how to find what you need in released accessions
- TOPMed accessions: most current data are in the Exchange Area; phase 1 study releases can be searched like other released accessions
Cross-study harmonized phenotypes
- DCC is performing harmonization for a limited set of traits based on data in released dbGaP accessions; currently this includes blood cell counts and basic demographics; the harmonized data are in the study-specific Exchange Areas
- Working Groups are exchanging files through the Exchange Areas for their own harmonization efforts

12 Other -Omics Data
Many studies have some prior (i.e. non-TOPMed) -omics data
- See survey results
- Heterogeneous platforms
- Much of this is not currently available on dbGaP
TOPMed Omics Pilot (MESA), currently underway
- RNASeq
- Metabolomics
- Array-based methylation
- Proteomics
TOPMed plans for additional -omics data generation and analysis
- PAR : Omics Phenotypes of Heart, Lung, and Blood Disorders (X01)
- RFA-HL : Integrative Computational Biology for Analysis of NHLBI TOPMed Data (R01)

13 Access & Use Permissions for Data from TOPMed Exchange Areas
Accessing Data
- Only 6 individuals per study are eligible to apply for access (named by PI)
- Data Access Requests (DARs) are submitted to dbGaP using the TOPMed-generic application
- Successful applicants may share data with others at their institution
- A group of applicants with coordinated DARs may share data in a cloud environment
Using Data
- Data may NOT be used for any purpose without an APPROVED paper proposal
  - Exception: study investigators may use their own study’s data as they wish
- Paper proposals originate in the TOPMed Working Groups
IMPORTANT: Access to TOPMed Exchange Area data does NOT confer permission to use it in analysis
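The difference between having access and having permission to use the data can be sketched as a small check. This is only an illustration of the rule stated on this slide, not a TOPMed or dbGaP tool; the `Investigator` class, its fields, and `may_analyze` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Investigator:
    """Hypothetical record of one investigator's standing (illustrative only)."""
    name: str
    own_study: str  # the study the investigator is affiliated with
    # True if the person has Exchange Area access, either through an approved
    # DAR of their own or through sharing within their institution.
    has_access: bool = False
    # Studies covered by this person's APPROVED paper proposal(s).
    approved_proposal_studies: Set[str] = field(default_factory=set)


def may_analyze(inv: Investigator, study: str) -> bool:
    """Return True if `inv` may use `study` data in analysis under the slide's rule."""
    # Exception: investigators may use their own study's data as they wish.
    if study == inv.own_study:
        return True
    # Otherwise access alone is not enough: an approved paper proposal
    # covering the study is also required.
    return inv.has_access and study in inv.approved_proposal_studies
```

For example, an investigator from Study A with Exchange Area access still gets `False` for Study B until an approved paper proposal lists Study B.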

14 Data Access Mechanisms for Exchange Areas
- Each study PI and his/her nominees from other institutions apply to dbGaP for access to multiple TOPMed studies (6 total per study)
- NHLBI DAC reviews and approves/disapproves applications
- Data are downloaded by each approved investigator to their own institution’s IT system
- Analysts gain access to cross-study data through the PI/nominee of the study with which they are affiliated
- Currently, several PIs/nominees have approved access
- See section “Data sharing”

15 How data sharing via the Exchange Area works
[Diagram. Labels: Uploaders (Study A: phenotypes, Study B: phenotypes, Study C: phenotypes); dbGaP Exchange Areas; Cross-study genotype call set; Downloader (Study B); Local Study Storage; Local Study Computers; Cross-study Association analysis.]
Uploading requires study registration
Downloading requires Data Access Request approved by NHLBI DAC

16 Other Data-Sharing Mechanisms
- Sharing Exchange Area data in a Cloud Environment requires coordinated dbGaP applications and a Cloud management plan
- Study investigators may share their own study’s TOPMed data outside of dbGaP
  - Data Transfer Agreements are generally required
  - DCC’s focus is dbGaP sharing, so it is not able to help with these kinds of arrangements

17 Principles of Data Sharing & Publication in TOPMed
- PIs and other investigators who obtain dbGaP approval to download TOPMed data are responsible for how it is used, i.e. for making sure that consents and Data Use Limitations are respected by everyone with whom they share the data (generally only within an institution)
- Investigators obtain access to data through the PI or other senior investigator of the study with which they are affiliated
- Investigators may begin analyzing TOPMed data only after they have an approved paper proposal
- Paper proposals must be approved by a TOPMed Working Group prior to submission
- Each paper proposal must specify which studies’ data it intends to use, and the proposers must form a collaboration with investigators from those studies. PI approval of data use is required prior to submitting the proposal. The person submitting the proposal must also select specific study-consent groups as they become available and sign off on their agreement to abide by the Data Use Limitations

18 Paper Proposal Process
Steps for TOPMed paper proposal development and approval:
1. Develop proposal within a TOPMed Working Group (WG), including selection of studies to be analyzed. Approval by this WG is required before proceeding.
2. Discuss data access mechanisms with leaders of the TOPMed study with which you are affiliated.
3. Request initial approval from PIs for use of data from selected studies. Approval (or failure to respond within 2 weeks) is required before proceeding.
4. Submit the paper proposal for scientific review using the online form; this will be reviewed by the TOPMed Publications Committee.
5. Submit data set selection for review; this will be reviewed by the PIs of the selected data sets.
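The PI approval rule above (approval, or no response within 2 weeks) can be sketched as a short check. This is a minimal sketch under stated assumptions: "2 weeks" is read as 14 elapsed calendar days, an explicit refusal is treated as blocking, and the function name, the `responses` mapping and its `True`/`False`/`None` encoding are all hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Optional

# Assumption: "2 weeks" means 14 elapsed calendar days from the request date.
RESPONSE_WINDOW = timedelta(weeks=2)


def pi_approval_step_cleared(
    requested_on: date,
    responses: Dict[str, Optional[bool]],
    today: date,
) -> bool:
    """Return True if the proposer may proceed past the PI approval step.

    `responses` maps each selected study to True (approved),
    False (declined) or None (no response yet).
    """
    window_closed = today - requested_on >= RESPONSE_WINDOW
    for approved in responses.values():
        if approved is True:
            continue  # explicit approval
        if approved is None and window_closed:
            continue  # no response within the window counts as cleared
        return False  # declined, or still waiting inside the window
    return True
```

With a request sent on August 1, a study that has not answered blocks the step on August 7 but no longer blocks it on August 15.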

19 Data set selection sample
Excerpt from
- Each data set corresponds to a single dbGaP consent group within one TOPMed component or main study that has completed TOPMed dbGaP registration
- By selecting a data set, the proposer agrees to its displayed Data Use Limitations (DUL)

20 Check your paper proposal to see for which consent groups you have approval. Your manuscript will not be approved for publication if it uses data sets not listed here as approved!
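Before submitting a manuscript, the check described here amounts to comparing the data sets actually used against those approved for the proposal. The following is a minimal sketch of that comparison; the function name and the data set labels are hypothetical, and the real approval list is the one shown on the paper proposal page.

```python
from typing import Iterable, List


def unapproved_data_sets(used: Iterable[str], approved: Iterable[str]) -> List[str]:
    """Return the data sets used in a manuscript that are not approved for the proposal."""
    # Each label stands for one study consent group (hypothetical identifiers).
    return sorted(set(used) - set(approved))


# Illustrative labels only; real data sets are dbGaP consent groups within a study.
used = ["Study A / group 1", "Study B / group 2", "Study C / group 1"]
approved = ["Study A / group 1", "Study B / group 2"]

missing = unapproved_data_sets(used, approved)
if missing:
    print("Not approved for this proposal:", ", ".join(missing))
```

An empty result means every data set in the analysis is covered by the proposal's approvals.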

21 Links to resources
- Guide for Working Groups
- Data Sharing through the TOPMed Exchange Areas
- Paper Proposal Instructions
- Many other pages on the TOPMed web site
Questions: Contact the DCC program coordinators and they will answer or route your query to the appropriate person

22 Extras

23 Paper proposal submission for scientific review

24 Agree to Data Use Limitations (DULs)
After you select your data sets on this page, you will be asked to agree to the Data Use Limitations (DULs)

25 Your dashboard https://www.nhlbiwgs.org/paperproposals/dashboard
Your paper proposals will appear here. You can also use this to access the data set selection page.

