Support for Systems Biology Data in IRD/ViPR - Proteomics
Richard H. Scheuermann, Ph.D.
November 5, 2012
Projects with Host Factor Data
Four systems biology groups funded by NIAID, including:
– Systems Virology (Michael Katze group, Univ. Washington): influenza H1N1 and H5N1 and SARS coronavirus; statistical models, algorithms and software, raw and processed gene expression data, and proteomics data
– Systems Influenza (Alan Aderem group, Institute for Systems Biology/Seattle Biomed): various influenza viruses; microarray, mass spectrometry, and lipidomics data
ViPR Driving Biological Projects
– Abraham Brass, Mass. General Hospital: dengue virus host factor database from an RNAi screen
– Lynn Enquist / Moriah Szpara, Princeton University: deep sequencing and neuronal microarrays for functional genomic analysis of herpes simplex virus
– Richard Kuhn, Purdue University: metabolomics data of dengue virus infection of human cells and mosquitoes
– Mike Diamond, Washington University: identification of inhibitory interferon-stimulated genes (ISGs) against flaviviruses and noroviruses using shRNA knockdown; determining the mechanism of action of individual inhibitory ISGs
Strategy for Handling “Omics” Data
“Omics” data management (MIBBI vs MIBBI-DB)
– Project metadata (1 template): title, PI, abstract, publications
– Experiment metadata (~6 templates): biosamples, treatments, reagents, protocols, subjects
– Primary results data: raw expression values
– Data processing metadata (1 template): normalization and summarization methods
– Processed data: data matrix of fold changes and p-values
– Data interpretation metadata (1 template): fold change and p-value cutoffs used
– Interpreted results (host factor biosets): interesting gene, protein and metabolite lists
Visualize biosets in the context of biological pathways and networks
Statistical analysis of pathway/sub-network overrepresentation
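The interpretation steps in this strategy (processed data matrix, cutoffs recorded as interpretation metadata, host factor bioset) can be sketched as below. This is a minimal illustration, not the IRD/ViPR implementation: the class and function names are hypothetical, fold changes are assumed to be log2-transformed, and the toy values are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class InterpretationMetadata:
    """Cutoffs captured by a (hypothetical) data interpretation template."""
    min_abs_log2_fold_change: float  # assumption: fold changes are log2-transformed
    max_p_value: float


@dataclass(frozen=True)
class Bioset:
    """Interpreted result: features passing the cutoffs, plus the cutoffs used."""
    name: str
    features: Tuple[str, ...]
    interpretation: InterpretationMetadata


def derive_bioset(
    name: str,
    matrix: Dict[str, Tuple[float, float]],
    cutoffs: InterpretationMetadata,
) -> Bioset:
    """Keep features whose |log2 fold change| and p-value both pass the cutoffs.

    `matrix` maps a gene, protein or metabolite ID to (log2_fold_change, p_value),
    i.e. one row of a processed data matrix.
    """
    selected = sorted(
        feature
        for feature, (log2_fc, p_value) in matrix.items()
        if abs(log2_fc) >= cutoffs.min_abs_log2_fold_change
        and p_value <= cutoffs.max_p_value
    )
    return Bioset(name, tuple(selected), cutoffs)


# Toy processed data matrix (invented values).
processed = {
    "GENE_A": (3.2, 0.001),
    "GENE_B": (2.5, 0.02),
    "GENE_C": (0.1, 0.80),
    "GENE_D": (-1.4, 0.03),
}
bioset = derive_bioset("example_bioset", processed, InterpretationMetadata(1.0, 0.05))
print(bioset.features)  # ('GENE_A', 'GENE_B', 'GENE_D')
```

Storing the cutoffs inside the bioset mirrors the separation in the template list: the bioset is only interpretable together with the interpretation metadata that produced it.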
Data Submission Workflows
[Figure: submission workflow diagram. Components: Systems Biology sites; study metadata, experiment metadata, primary results, analysis metadata, processed data matrix, free text metadata; GEO/PRIDE/PNNL/SRA/MetaboLights; ViPR/IRD/PATRIC; host factor bioset; pointer submission]
Metadata Submission Template Examples
Host Factor Data
8 Studies To Date
Host Factor Bioset
Transcriptomics => Proteomics
Metadata fields are largely reusable, with some exceptions
– Exp_sample_template (protein).xls
Results data differences
– Peptide-level and protein-level data: IM005_Peptide_normalization_matrix.V2.xlsx, IM005_Protein Normalization matrix.xlsx
– Statistical measures: Results_matrix_ IM005_sig Protein_RM.xlsx
Metadata Field Changes
GEO GSM ID => Primary Data Archive + Primary Data Archive ID
Semi-structured Experiment Variable => Structured Experiment Variable
– Free text ("1 day") => value/unit pairs in separate fields (value 1, unit day; value 10^4, unit plaque forming units)
Multiple processed data matrix files
– Concatenated IDs separated by (; |)
Reagents and protocols are different but should not require submission template changes
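A minimal sketch of these field changes, assuming hypothetical class and function names (nothing here is a ViPR/IRD API): an archive-plus-accession pointer in place of a GEO-only GSM ID, a parser that turns free text such as "1 day" into a value/unit pair, and a splitter for concatenated matrix IDs. The accepted text pattern and the treatment of ";" and "|" as interchangeable delimiters are assumptions.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PrimaryDataPointer:
    """Archive name plus accession, replacing a GEO-only 'GEO GSM ID' field."""
    archive: str     # e.g. "GEO", "PRIDE", "SRA"
    archive_id: str  # accession within that archive


@dataclass(frozen=True)
class ExperimentVariable:
    """Structured experiment variable: value and unit stored in separate fields."""
    value: float
    unit: str


# "<number>[^<exponent>] <unit>", e.g. "1 day" or "10^4 plaque forming units".
# Non-negative values only; this pattern is an assumption, not a ViPR rule.
_VALUE_UNIT = re.compile(r"^\s*(\d+(?:\.\d+)?)(?:\^(\d+))?\s+(.+?)\s*$")


def parse_experiment_variable(text: str) -> ExperimentVariable:
    """Convert a semi-structured free-text entry into a value/unit pair."""
    match = _VALUE_UNIT.match(text)
    if match is None:
        raise ValueError(f"expected '<number> <unit>', got {text!r}")
    base, exponent, unit = match.groups()
    value = float(base) ** int(exponent) if exponent else float(base)
    return ExperimentVariable(value, unit)


def split_concatenated_ids(field: str) -> List[str]:
    """Split concatenated processed-data-matrix IDs on ';' and/or '|'."""
    return [part.strip() for part in re.split(r"[;|]", field) if part.strip()]


print(parse_experiment_variable("10^4 plaque forming units"))
# ExperimentVariable(value=10000.0, unit='plaque forming units')
print(split_concatenated_ids("matrix_1; matrix_2|matrix_3"))
# ['matrix_1', 'matrix_2', 'matrix_3']
```

Dropping empty fragments means a combined "; |" separator and either single delimiter all give the same list, which sidesteps the ambiguity of the delimiter convention.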
Normalized Data Archive at BRC (standard format?)
– Peptide normalized data
– Protein normalized data
– Results matrix of significant proteins
BRCs derive bioset lists from the results matrix
– Handling different significance measures: t-test flag, t-test p-value, g-test flag, g-test p-value, log10 ratio
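One way a BRC might derive bioset lists from a results matrix that carries several significance measures is sketched below. The column names and the combination rule are assumptions, not a documented convention: a protein counts as significant if either the t-test or the g-test flag is set (optionally also requiring that test's p-value to pass a cutoff), and the sign of the log10 ratio splits it into an "up" or "down" list.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ResultRow:
    """One protein row of a results matrix; field names are hypothetical."""
    protein_id: str
    t_test_flag: bool
    t_test_p_value: Optional[float]
    g_test_flag: bool
    g_test_p_value: Optional[float]
    log10_ratio: float


def _passes(flag: bool, p_value: Optional[float], max_p_value: Optional[float]) -> bool:
    """A test counts only if its flag is set and, optionally, its p-value passes."""
    if not flag:
        return False
    if max_p_value is None:
        return True
    return p_value is not None and p_value <= max_p_value


def derive_directional_biosets(
    rows: List[ResultRow], max_p_value: Optional[float] = None
) -> Tuple[List[str], List[str]]:
    """Split significant proteins into (up, down) lists by the sign of log10 ratio."""
    up, down = [], []
    for row in rows:
        significant = _passes(row.t_test_flag, row.t_test_p_value, max_p_value) or _passes(
            row.g_test_flag, row.g_test_p_value, max_p_value
        )
        if not significant or row.log10_ratio == 0:
            continue  # not significant, or no direction to report
        (up if row.log10_ratio > 0 else down).append(row.protein_id)
    return sorted(up), sorted(down)


# Toy rows (invented values); a missing g-test p-value is represented as None.
rows = [
    ResultRow("PROT_1", True, 0.01, False, None, 0.5),
    ResultRow("PROT_2", False, 0.40, True, 0.002, -0.8),
    ResultRow("PROT_3", False, 0.30, False, 0.60, 1.2),
    ResultRow("PROT_4", True, 0.08, False, None, 0.3),
]
print(derive_directional_biosets(rows))                    # (['PROT_1', 'PROT_4'], ['PROT_2'])
print(derive_directional_biosets(rows, max_p_value=0.05))  # (['PROT_1'], ['PROT_2'])
```

Keeping the optional p-value cutoff separate from the flags lets the same code either trust the submitter's significance calls or re-apply a BRC-chosen threshold.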
Host Factor Bioset
On Deck
– Metabolomics and lipidomics data
– Integration of RNA expression, protein abundance and metabolite abundance
– Pathway/network visualization and analysis
Acknowledgements
Lynn Law, U. Washington
Richard Green, U. Washington
Peter Askovich, Seattle Biomed
Brett Pickett, U.T. Southwestern/JCVI
Jyothi Noronha, U.T. Southwestern
Eva Sadat, U.T. Southwestern
Entire Systems Biology Data Dissemination Task Force, especially Jeremy Zucker
NIAID (Alison Yao and Valentina DiFrancesco)
Future Development Plans
[Figure panels: GO enrichment; network visualization]
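GO enrichment is one form of the pathway/sub-network overrepresentation analysis listed in the omics strategy. A common way to score it is a one-sided hypergeometric test on the overlap between a bioset and the genes annotated to a GO term. The sketch below assumes that test (the slides do not say which test IRD/ViPR uses), uses a hypothetical function name, and applies no multiple-testing correction across terms.

```python
from math import comb
from typing import Iterable


def hypergeometric_enrichment_p(
    bioset: Iterable[str], term_genes: Iterable[str], background: Iterable[str]
) -> float:
    """One-sided P(X >= k) for the overlap of a bioset with a GO term.

    N = background size, K = term genes in the background,
    n = bioset genes in the background, k = observed overlap.
    """
    universe = set(background)
    term = set(term_genes) & universe
    hits = set(bioset) & universe
    N, K, n = len(universe), len(term), len(hits)
    k = len(hits & term)
    # Sum the hypergeometric probabilities for overlaps k..min(K, n).
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


background = [f"G{i}" for i in range(10)]
term = ["G0", "G1", "G2", "G3"]
p = hypergeometric_enrichment_p(["G0", "G1", "G2"], term, background)
print(round(p, 4))  # 0.0333
```

Restricting both the term and the bioset to the background keeps N, K and n consistent, which the test requires.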