Sage Bionetworks
A non-profit organization with a vision to enable networked team approaches to building better models of disease
BIOMEDICINE INFORMATION COMMONS INCUBATOR
  Data Repository
  Discovery Platform
  Building Disease Maps
  Commons Pilots
Two approaches to building common scientific and technical knowledge
  Text summary of the completed project
    Assembled after the fact
  Social Coding
    Every code change versioned
    Every issue tracked
    Every project the starting point for new work
    All evolving and accessible in real time
Synapse is GitHub for Biomedical Data
  Social Science
    Data and code versioned
    Analysis history captured in real time
    Work anywhere, and share the results with anyone
  Social Coding
    Every code change versioned
    Every issue tracked
    Every project the starting point for new work
    All evolving and accessible in real time
Data Analysis with Synapse
  Run Any Tool On Any Platform
  Record in Synapse
  Share with Anyone
Demo
Analysis Records and Visualizations
Scalable Analysis Pipelines Full case study at
Additional Data Analysis Tools and Capabilities
Digital Publication Builder
The Solution: Competitions to crowd-source research in biology and other fields
Why competitions?
  Objective assessments
  Acceleration of progress
  Transparency
  Reproducibility
  Extensible, reusable models
  Intensity and focus
  Parallel efforts
Competitions in biomedical research
  CASP (protein structure)
  Foldit / EteRNA (protein / RNA structure)
  CAGI (genome annotation)
  Assemblathon / Alignathon (genome assembly / alignment)
  SBV Improver (industrial methodology benchmarking)
  DREAM (co-organizer of Sage/DREAM competition)
Generic competition platforms
  Kaggle, InnoCentive, MLComp
Sage/DREAM Challenge: Details and Timing
Phase 1: July through end of September 2012
  Training data: 2,000 breast cancer samples from the METABRIC cohort
    Gene expression
    Copy number
    Clinical covariates
    10-year survival
  Synapse hosting Sage-curated and normalized data
  Data available via download or API (R)
  Models implemented in R and conforming to interface
  2,000-core cloud computing integration with Google Compute Engine (donation)
  Real-time scoring of model predictions and posting to leaderboard
  Will evaluate accuracy of models to predict survival in:
    Held-out samples from METABRIC
    Other datasets
Phase 2: October 1 through November 12, 2012
  Evaluation of models in novel dataset
  Validation data: ~500 fresh frozen tumors from Norway group with:
    Clinical covariates
    10-year survival
  Gene expression and copy number data to be generated for model evaluation
    Sent to Cancer Research UK to generate data at same facility as METABRIC
  Models built on training data evaluated on newly generated data
  Winners announced at November 12 DREAM conference
  Winners to be published in Science Translational Medicine
    Synapse as alternative to traditional peer review
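The slide describes real-time scoring of survival predictions and posting to a leaderboard, but it does not name the scoring metric. As an illustration only, the sketch below assumes a concordance index (a common choice for right-censored survival data) and ranks models by it. The function names `concordance_index` and `rank_leaderboard` and the toy data are hypothetical, not part of Synapse or the Challenge interface, and the sketch is in Python although Challenge models were implemented in R.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly.

    times: observed follow-up time per patient
    events: 1 if death was observed, 0 if the patient was censored
    risk_scores: model output, higher meaning shorter expected survival
    Assumed metric; pairs with tied times are skipped for simplicity.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def rank_leaderboard(times, events, predictions_by_model):
    """Score each model and return (name, score) pairs, best first."""
    scores = {
        name: concordance_index(times, events, preds)
        for name, preds in predictions_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy held-out set, invented for illustration (not METABRIC data)
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 1, 1]
board = rank_leaderboard(
    times,
    events,
    {"model_a": [4, 3, 2, 1], "model_b": [1, 2, 3, 4]},
)
```

Under this metric a score of 1.0 means every comparable pair is ordered correctly and 0.5 is what a constant prediction would get, so a leaderboard sorted this way puts the most discriminating model first.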
Sage-DREAM Breast Cancer Prognosis Challenge: one month of building better disease models together
  Challenge Launch: July 17
    154 participants; 27 countries
  August 17 Status
    268 participants; 32 countries
    290 models posted to Leaderboard
  breast cancer data
Funding Acknowledgements
Synapse Team: Chris Bare, Matt Furia, John Hill, Jay Hodgson, Bruce Hoff, Michael Kellen, Bennett Ng, Geoff Shannon, Xavier Schildwachter, Eric Wu
Slide Backups
What is the problem?
  Our current models of disease biology are primitive and limit doctors' understanding and ability to treat patients
  Current incentives reward those who silo information and work in closed systems
Iterative Networked Approaches to Generating, Analyzing, and Supporting New Models
  Biological System
  Data
  Analysis
Uncouple the automatic linkage between the data generators, analyzers, and validators