Published by Kerrie Mason
1
Systems Biology Data Dissemination Working Group 25FEB2015
2
Goal of the SysBio DDWG
– Coordinate approaches for sharing and dissemination of systems biology project data within the program and to the broader infectious disease and systems biology research communities
– Share best practices in data management
– Leverage external data resources for data dissemination, including the NIAID Bioinformatics Resource Centers
3
DDWG Activities
Monthly 1 hr conference calls
Annual workshops
Membership
– Representatives from FluDyNeMo, Fluomics, OMICS-LHV, Omics4TB, MaHPIC
– Representatives from EuPathDB, PATRIC, ViPR/IRD
– Representatives from DMID
Co-Chairs
– Michelle Craft, OMICS-LHV
– Richard Scheuermann, ViPR/IRD
5
Workshop Agenda
A. SysBio data management best practices (Michelle Craft, presenter)
* Data Management Best Practice Highlights
* Overview of Data Carpentry and Software Carpentry
B. SysBio center plans for project websites (Richard Scheuermann, moderator)
* Presentation of highlights by each center (5 min each)
* Discussion of general internal data sharing strategies and short term public dissemination plans
* Discussion of long term dissemination plans
C. Relevant public data archives (Jessie Kissinger, presenter)
* Which existing public archives could be used for long term dissemination of SysBio data
* What SysBio data types are not currently supported by public data archives
* Discussion of long term dissemination plans
D. Transcriptomic data derived from RNA-seq (Brian Aevermann, presenter)
* Determine if new transcriptomic (meta)data needs to be captured for the new SysBio program
* Determine which aspects of RNA-seq data are not covered by current microarray data support
* Decide how to support data processing (meta)data: structured data fields vs free text protocols
* Determine which RNA-seq data should be disseminated and where
6
Best Practices Overview
File Management
– Descriptive Names
– Metadata
– Sensitive Data
– Data Versions
File Content
– Rows vs Columns
– Spreadsheet Mistakes
– File Formats
Working with Data
– Find useful tools
– Quality control, data manipulation
– Software and Analysis Versions
Courtesy of Michelle Craft
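The "Descriptive Names" and "Data Versions" items could be combined in a file naming helper. The sketch below is only an illustration: the field order, the separators, and the function name `descriptive_name` are assumptions, not a convention stated on the slide.

```python
import re
from datetime import date


def descriptive_name(project: str, assay: str, sample: str,
                     run_date: date, version: int, ext: str) -> str:
    """Build a file name from metadata fields (hypothetical convention)."""
    def clean(part: str) -> str:
        # Lowercase, then collapse every run of other characters to a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-")

    fields = [
        clean(project),
        clean(assay),
        clean(sample),
        run_date.isoformat(),   # ISO dates sort chronologically as text
        f"v{version:02d}",      # explicit data version, zero padded
    ]
    return "_".join(fields) + "." + ext.lstrip(".")


# Example values are made up for illustration.
name = descriptive_name("OMICS-LHV", "RNA-seq", "Lung day 3",
                        date(2015, 2, 25), 2, "tsv")
print(name)
```

Keeping the version inside the name means an older file is never silently overwritten when a dataset is regenerated.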
7
Project Websites
Informational content using content management systems, e.g. WordPress, Drupal
Data sharing portal
– Within consortium
– Public
8
Previous Data Submission Workflows
[Diagram. Data items: Study metadata, Experiment metadata, Primary results, Analysis metadata, Processed data matrix, Free text metadata, Host factor biosets. Destinations: GEO/PeptideAtlas/SRA/MetaboLights, ViPR/IRD/PATRIC, Systems Biology sites. Arrow labels: submission, pointer.]
9
Experimental metadata
– Study
– Subject
– Biological Sample
– Experiment
– Bioset
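One way to picture these five metadata levels is as linked records that can be checked for broken references. The sketch below assumes a strict chain in which each record points to exactly one parent at the level above (Study, then Subject, then Biological Sample, then Experiment, then Bioset). That chain, the `Record` class, and the IDs are assumptions made for illustration; the actual standard may allow richer relationships.

```python
from dataclasses import dataclass
from typing import List, Optional

# Level names taken from the slide; the strict ordering is an assumption.
LEVELS = ["Study", "Subject", "Biological Sample", "Experiment", "Bioset"]


@dataclass(frozen=True)
class Record:
    record_id: str
    level: str                 # one of LEVELS
    parent_id: Optional[str]   # record one level up; None for a Study


def broken_links(records: List[Record]) -> List[str]:
    """Return IDs of records whose parent is missing or at the wrong level."""
    by_id = {r.record_id: r for r in records}
    bad = []
    for r in records:
        depth = LEVELS.index(r.level)
        if depth == 0:
            # A Study is the top of the chain and should have no parent.
            if r.parent_id is not None:
                bad.append(r.record_id)
            continue
        parent = by_id.get(r.parent_id)
        if parent is None or parent.level != LEVELS[depth - 1]:
            bad.append(r.record_id)
    return bad


# Hypothetical records; BI2 points at an experiment that does not exist.
records = [
    Record("ST1", "Study", None),
    Record("SU1", "Subject", "ST1"),
    Record("BS1", "Biological Sample", "SU1"),
    Record("EX1", "Experiment", "BS1"),
    Record("BI1", "Bioset", "EX1"),
    Record("BI2", "Bioset", "EX9"),
]
print(broken_links(records))
```

A check of this kind can be run before submission so that every bioset can be followed back to its study.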
10
Data standards background
Ontology for Biomedical Investigations (OBI)
– Peters, Bjoern and OBI Consortium, The. Ontology for Biomedical Investigations. Available from Nature Precedings (2009).
– Ryan R Brinkman, et al. “Modeling biomedical experimental processes with OBI”. Journal of Biomedical Semantics (2010).
OBX data standard
– Developed for ImmPort using OBI structure and implemented in a relational database
– Y. Megan Kong, et al. “Toward an Ontology-Based Framework for Clinical Research Databases”. J Biomed Inform (2011).
Systems Biology data standard
– Derived from OBX/ImmPort and extended to capture data transformations and derived data (Biosets)
12
Serial Challenge Timeline
[Figure. Two study designs are shown: Sequential Sampling Studies and Serial/Longitudinal Studies. Time axis in days, with tick labels -2, 0, 1, 3, 5, 8, 14. Label: A/California/07/2009.]
Courtesy of Elodie Ghedin
13
Serial Challenge Timeline
[Figure. Two study designs are shown: Sequential Sampling Studies and Serial/Longitudinal Studies, with n=4 ferrets at each time point. Time axis in days, with tick labels -2, 0, 1, 3, 5, 8, 14. Samples listed at the time points: Nasal Wash, Serum, FACS Whole Blood, Blood in RNAlater, Lungs, Bronchial Lavage.]
Courtesy of Elodie Ghedin
14
Experiment Types
15
Generalized Experiment Workflow
[Diagram. Node labels: subject organism, treatment agent, treatment process, treated organism, specimen isolation 1 to 3, isolated specimen 1 to 3, omics assay 1 to 3, primary data 1 to 3, data transformation 1 to 3, processed data 1 to 3, sacrifice process, sacrificed organism, physical assessment data. Time points: T1, T2, T3, T4, T5.]
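The workflow diagram can be read as a provenance graph, where each data item is traced back to the material it came from. The sketch below encodes one branch of it as input-to-output edges and walks upstream from a processed data set. The diagram's arrows are not recoverable from the slide text, so the edge directions are an assumption, and the process nodes (treatment process, specimen isolation, omics assay, data transformation) are collapsed into the edges for brevity.

```python
from collections import deque
from typing import Dict, List

# Assumed edges, each pointing from an input to what is derived from it.
EDGES: Dict[str, List[str]] = {
    "subject organism": ["treated organism"],
    "treatment agent": ["treated organism"],
    "treated organism": ["isolated specimen 1"],
    "isolated specimen 1": ["primary data 1"],
    "primary data 1": ["processed data 1"],
}


def upstream(node: str, edges: Dict[str, List[str]]) -> List[str]:
    """List everything `node` was derived from, nearest ancestors first."""
    # Invert the edge map so each node knows its direct inputs.
    parents: Dict[str, List[str]] = {}
    for src, dsts in edges.items():
        for dst in dsts:
            parents.setdefault(dst, []).append(src)

    seen: List[str] = []
    queue = deque([node])
    while queue:  # breadth-first walk towards the sources
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


lineage = upstream("processed data 1", EDGES)
print(lineage)
```

The same traversal would extend to the second and third specimen branches by adding their edges to `EDGES`.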
19
Courtesy of Adolfo Garcia-Sastre
20
Typical RNA-seq Data Processing Workflow
[Diagram. Data files: Raw data: fastq*; Ref Genome: fasta (version), ENSEMBL version; Mapped reads: SAM/BAM; Assembled transcripts: SAM/BAM; Transcript abundance values: text*; Differentially expressed genes: text*. Analysis steps: TopHat analysis; Cufflinks analysis; Scaling and norm (cuffMerge); Differential Expression analysis (edgeR). Data archiving targets: SRA Record, GEO, BRC.]
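The agenda question of structured data fields versus free text protocols can be made concrete with a small record per processing step. In the sketch below the `ProcessingStep` class and its field names are hypothetical, and the pairing of each tool with input and output formats is one reading of the flattened diagram, not something the slide states outright. Tool versions are not given on the slide, so they are left empty and flagged by the completeness check.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class ProcessingStep:
    name: str
    tool: str
    tool_version: str    # not given on the slide, left empty here
    input_format: str
    output_format: str


def missing_fields(step: ProcessingStep) -> List[str]:
    """Return the names of fields left empty in a step record."""
    return [key for key, value in asdict(step).items() if not value]


# Assumed pairing of tools and formats; labels are taken from the slide.
steps = [
    ProcessingStep("read mapping", "TopHat", "", "fastq", "SAM/BAM"),
    ProcessingStep("transcript assembly", "Cufflinks", "", "SAM/BAM", "SAM/BAM"),
    ProcessingStep("differential expression", "edgeR", "", "text", "text"),
]

# Structured records serialize cleanly for submission alongside the data.
records_json = json.dumps([asdict(s) for s in steps], indent=2)
print(records_json)

for step in steps:
    print(step.name, "is missing:", missing_fields(step))
```

With free text protocols, a gap such as an unrecorded tool version has to be found by reading; with structured fields it can be detected automatically.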
21
Future Directions
– Finalize core generic (meta)data modules for treatments, specimen sampling, organism assessments, omics assays, data processing
– Determine if additional assay-specific data fields are needed
– Decide which results data should be captured for public dissemination
– Decide which public data archives should be used
– Ensure appropriate linkage between related data