SysMO-DB: Sharing and Exchanging Data and Models in Systems Biology Katy Wolstencroft University of Manchester.

SysMO-DB approach; Linking data to models; SysMO-DB, the e-Laboratory

SysMO-DB: a data access, model handling and data integration platform for Systems Biology. It supports and manages the diversity of data, models and experimental protocols held in local data management systems, and promotes shared understanding by using a common platform and common technologies.

Systems Biology Challenges: interdisciplinary work; heterogeneous data and models. Modellers and experimentalists have different skills, training and experience, and different vocabularies and jargon, yet they have to work together.

Systems Biology of Microorganisms (SysMO): a pan-European collaboration of eleven individual projects and 91 institutes, with different research outcomes, studying a cross-section of microorganisms, including bacteria, archaea and yeast. The aims are to record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way, to present these processes in the form of computerised mathematical models, and to pool research capacities and know-how. SysMO has been running since April 2007 and runs for 3-5 years. This year, 2 new projects join and 6 leave.

The Problem: no single concept of experimentation or modelling; no planned, shared infrastructure for pooling

Types of data: multiple omics (genomics, transcriptomics, proteomics, metabolomics, fluxomics, reactomics); images; molecular biology; reaction kinetics. Models: metabolic, gene network, kinetic. Relationships between data sets and experiments: procedures, experiments, data, results and models. Analysis of data.

Linking and using data. Models are constructed from experimental data, or by using parameters from the literature. Data are analysed, compared and integrated using statistics, pipelines and workflows. This requires identifying the same entities in different data sets, identifying where data sets overlap, and knowing the experimental context.
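A minimal sketch of the "where do data sets overlap" step, assuming entities are plain identifier strings and that trimming whitespace and ignoring case is enough to match them (real data would need proper identifier mapping). The gene names and the functions `normalise` and `shared_entities` are illustrative only.

```python
def normalise(identifier: str) -> str:
    """Assumed normalisation rule: trim whitespace and ignore case."""
    return identifier.strip().lower()


def shared_entities(dataset_a, dataset_b):
    """Return the normalised identifiers that appear in both data sets."""
    return {normalise(x) for x in dataset_a} & {normalise(x) for x in dataset_b}


# Illustrative identifiers from two hypothetical experiments
transcriptomics = ["pykF", "zwf ", "gapA"]
proteomics = ["GAPA", "pykF", "eno"]
print(sorted(shared_entities(transcriptomics, proteomics)))  # ['gapa', 'pykf']
```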

SysMO-DB: started in June 2008. A web-based solution to facilitate the exchange of data, models and processes (intra- and inter-consortia); search for data, models and processes across the initiative; maximisation of the "shelf life" and utility of the data, models and processes generated; and dissemination of results.

SysMO-DB Team. Institutions: University of Manchester, UK; University of Stellenbosch, South Africa; HITS, Germany. People: Jacky Snoep, Isabel Rojas, Olga Krebs, Wolfgang Müller, Carole Goble, Stuart Owen, Katy Wolstencroft, Finn Bacall. Resources: SABIO-RK, JWS Online, Taverna, myExperiment.

SysMO-DB PALS team: "power contributors", 21 postdocs and PhD students. A design and technical collaboration team with intense collaboration, organised into UK and Continental PALS chapters. Audits and sharing of methods, data, models, standards, software, schemas, spreadsheets, SOPs... "20 questions". Deployment into projects.

Principles: a series of small victories; be realistic; don't reinvent; sustainable and extensible; migrate to standards; provide instant gratification; incremental development; fit in with normal lab practices.

The Lowest Hanging Fruit: SysMO SEEK, a catalogue of assets. SysMO Yellow Pages: the people and their expertise; the institutions and their facilities. Data: experimental data sets, analysed results, external reference data sets. Models. Processes: laboratory protocols and bioinformatics analyses. Publications. The catalogue references assets held elsewhere.
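The key design point of the catalogue is that it stores references, not the assets themselves. A minimal sketch under that assumption; the class names, fields, project names and URLs are hypothetical and are not SEEK's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """Metadata about an asset plus a pointer to where it is actually stored."""
    title: str
    asset_type: str  # e.g. "data", "model", "sop", "publication"
    owner: str
    location: str    # URL in the asset's home data store (hypothetical)


class Catalogue:
    """Holds references only; asset content stays in its home store."""

    def __init__(self):
        self._entries = []

    def register(self, entry: CatalogueEntry) -> None:
        self._entries.append(entry)

    def find(self, asset_type=None, owner=None):
        """Return entries matching every filter that is given."""
        return [
            e for e in self._entries
            if (asset_type is None or e.asset_type == asset_type)
            and (owner is None or e.owner == owner)
        ]


catalogue = Catalogue()
catalogue.register(CatalogueEntry("Glycolysis model", "model", "Project A",
                                  "https://store-a.example.org/models/42"))
catalogue.register(CatalogueEntry("Growth curves", "data", "Project B",
                                  "https://store-b.example.org/data/7"))
for entry in catalogue.find(asset_type="model"):
    print(entry.title, "->", entry.location)
```

Searching the catalogue is cheap and central, while retrieving the asset means following `location` back to the project's own store.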

[Screenshot of SysMO-SEEK]

[Figure: harvesters connecting project data stores (COSMIC, BaCell-SysMO, SysMOLab, MOSES, Alfresco, Wiki, and other data stores)]

Why not a central warehouse? Scientists are protective of models in progress versus published models. Access and version management. Curator-rival conflict. Reluctance to share data, even within their own projects. Legacy spreadsheets dominate. Curation practices vary. Take-up of centralised archives. Point-to-point exchange: people don't mind sharing methods, and people want to advertise publications. (Nature 461, 145, 10 Sept 2009)

Access permissions: just enough sharing, reusing myExperiment.
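"Just enough sharing" means the owner decides who can see each asset. A minimal sketch assuming a hypothetical three-level policy (private, project, public); this is an illustration of the idea, not the permission model actually reused from myExperiment, which is richer.

```python
from enum import Enum


class Sharing(Enum):
    PRIVATE = "private"   # only the uploader
    PROJECT = "project"   # members of the owning project
    PUBLIC = "public"     # anyone


def can_view(policy, owner, owner_project, user, user_projects):
    """Decide whether `user` may see an asset under a given sharing policy."""
    if user == owner or policy is Sharing.PUBLIC:
        return True
    if policy is Sharing.PROJECT:
        return owner_project in user_projects
    return False  # PRIVATE: nobody but the owner


print(can_view(Sharing.PROJECT, "alice", "P1", "bob", {"P1", "P3"}))  # True
```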

SysMO-DB Architecture [diagram]: the SysMO-SEEK web interface; the Assets and Yellow Pages catalogues; the JERM; data, models and processes.

Making use of the assets: understanding the content of the data; linking assets together; linking assets to experimental context; running comparisons between data files; running model simulations; running data analysis pipelines.

What is the JERM? The JERM ("Just Enough Results Model") is the minimum information needed to exchange data. What type of data is it (microarray, growth curve, enzyme activity...)? What was measured (gene expression, OD, metabolite concentration...)? What do the values in the data sets mean (units, time series, repeats...)? Which experiment does it relate to? How does it relate to models? How was the data created (SOPs and protocols)?
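The JERM questions above map naturally onto a small record type. A minimal sketch in which each field answers one question; the field names, identifiers and values are assumptions for illustration, not the published JERM schema.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class JermRecord:
    """Hypothetical JERM-style metadata for one data file (field names assumed)."""
    data_type: str      # What type of data is it?
    measured: str       # What was measured?
    units: str          # What do the values mean?
    time_series: bool
    replicates: int
    assay_id: str       # Which experiment does it relate to?
    sop_ids: list = field(default_factory=list)    # How was the data created?
    model_ids: list = field(default_factory=list)  # How does it relate to models?


record = JermRecord(
    data_type="enzyme activity",
    measured="specific activity",
    units="U/mg protein",
    time_series=False,
    replicates=3,
    assay_id="assay-001",
    sop_ids=["sop-enzyme-assay"],
)
print(asdict(record)["units"])  # U/mg protein
```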

Minimum Information Models:
CIMR: Core Information for Metabolomics Reporting
MIABE: Minimal Information About a Bioactive Entity
MIACA: Minimal Information About a Cellular Assay
MIAME: Minimum Information About a Microarray Experiment
MIAME/Env: MIAME / Environmental transcriptomic experiment
MIAME/Nutr: MIAME / Nutrigenomics
MIAME/Plant: MIAME / Plant transcriptomics
MIAME/Tox: MIAME / Toxicogenomics
MIAPA: Minimum Information About a Phylogenetic Analysis
MIAPAR: Minimum Information About a Protein Affinity Reagent
MIAPE: Minimum Information About a Proteomics Experiment
MIARE: Minimum Information About a RNAi Experiment
MIASE: Minimum Information About a Simulation Experiment
MIENS: Minimum Information about an ENvironmental Sequence
MIFlowCyt: Minimum Information for a Flow Cytometry Experiment
MIGen: Minimum Information about a Genotyping Experiment
MIGS: Minimum Information about a Genome Sequence
MIMIx: Minimum Information about a Molecular Interaction Experiment
MIMPP: Minimal Information for Mouse Phenotyping Procedures
MINI: Minimum Information about a Neuroscience Investigation
MINIMESS: Minimal Metagenome Sequence Analysis Standard
MINSEQE: Minimum Information about a high-throughput SeQuencing Experiment
MIPFE: Minimal Information for Protein Functional Evaluation
MIQAS: Minimal Information for QTLs and Association Studies
MIqPCR: Minimum Information about a quantitative Polymerase Chain Reaction experiment
MIRIAM: Minimal Information Required In the Annotation of biochemical Models
MISFISHIE: Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments
STRENDA: Standards for Reporting Enzymology Data
TBC: Tox Biology Checklist
BioPAX: Biological Pathways Exchange
FuGE: Functional Genomics Experiment
MGED: Microarray Experimental Conditions

The Idea. For each data type (transcriptomics, proteomics, metabolomics, single-cell data), define a JERM through a top-down analysis of standards and a bottom-up analysis of practice, based on ISA-TAB. Then generate and apply: a JERM template; a JERM extractor for the data host; a subset registered in SEEK; access and export through a JERM interface or template.

SEEK + JERM [diagram], based on ISA-TAB (Investigation, Study, Assay): experimental data; metadata; people; projects; experimental conditions; factors studied; models; SOPs; workflows; homogenised terminology and values in the data sets themselves.

For publishing, JERM data needs to be related to SOPs, experimental context (ISA) and other data. JERM must be "MIBBI" compliant for exporting to public repositories; e.g. microarray data needs to be MIAME compliant.
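Checking compliance before export amounts to comparing a record against a per-data-type checklist. A minimal sketch; the checklist contents and the helper `missing_for_export` are hypothetical placeholders, not the actual MIAME or other MIBBI checklist items.

```python
# Hypothetical per-data-type checklists. The field names are placeholders,
# NOT the actual MIAME or other MIBBI checklist items.
REQUIRED_FOR_EXPORT = {
    "microarray": ["array_design", "sample_description",
                   "normalisation", "raw_data_file"],
}


def missing_for_export(data_type, metadata):
    """List required fields that are absent or empty for this data type."""
    required = REQUIRED_FOR_EXPORT.get(data_type, [])
    return [name for name in required if not metadata.get(name)]


print(missing_for_export("microarray",
                         {"array_design": "A-1", "raw_data_file": "run1.cel"}))
# ['sample_description', 'normalisation']
```

Data types with no registered checklist report nothing missing here; a stricter system might instead refuse to export them.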

ISA-TAB: relating data to its experimental context. ISA = Investigation, Study, Assay; TAB = tabular, a format suitable for spreadsheets.
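Because ISA-TAB is tab-separated text, an assay table can be written with nothing more than the standard `csv` module. A minimal sketch; the column headers are modelled loosely on ISA-TAB assay files and are an assumption, so check the ISA-TAB specification for the exact required columns.

```python
import csv
import io

# Column headers modelled loosely on ISA-TAB assay files (assumption).
ASSAY_COLUMNS = ["Sample Name", "Protocol REF", "Assay Name", "Raw Data File"]


def assay_table(rows):
    """Serialise assay rows (dicts keyed by column header) as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(ASSAY_COLUMNS)
    for row in rows:
        writer.writerow([row.get(col, "") for col in ASSAY_COLUMNS])
    return buf.getvalue()


text = assay_table([
    {"Sample Name": "S1", "Protocol REF": "RNA extraction",
     "Assay Name": "A1", "Raw Data File": "s1.cel"},
])
print(text)
```

The same file opens directly in a spreadsheet, which is the point of the "TAB" in ISA-TAB.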

ISA provides a common framework for relating different types of data (e.g. microarrays and proteomics) and facilitates submission to international public repositories of genomics, transcriptomics and proteomics studies.

Identifying biological objects. What do you have in your data? Proteins/enzymes, genes/expression levels, metabolites. Where and how do these objects interact? Pathways, flux, experimental conditions. What models describe these interactions? Answering these questions is possible when using common frameworks, naming schemes and controlled vocabularies.
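Why controlled vocabularies matter: two labs naming the same metabolite differently cannot be compared until both names map to one preferred term. A minimal sketch with a hypothetical hand-written synonym table; in practice the terms would come from a community ontology rather than a dictionary like this.

```python
# Hypothetical mini-vocabulary mapping lower-cased synonyms to one preferred label.
SYNONYMS = {
    "g6p": "glucose 6-phosphate",
    "glucose-6-phosphate": "glucose 6-phosphate",
    "glucose 6-phosphate": "glucose 6-phosphate",
    "atp": "ATP",
    "adenosine triphosphate": "ATP",
}


def canonical(term):
    """Map a term to its preferred label; unknown terms pass through unchanged."""
    cleaned = term.strip()
    return SYNONYMS.get(cleaned.lower(), cleaned)


lab_a = ["G6P", "ATP"]
lab_b = ["glucose-6-phosphate", "Adenosine triphosphate", "pyruvate"]
print({canonical(t) for t in lab_a} & {canonical(t) for t in lab_b})
```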

BioPortal integration for searching. BioPortal is a repository for submitting and sharing biological ontologies, and lets you search for concepts across all or selected ontologies. BioPortal provides a number of RESTful web services (search, concept lookup, visualisation), and is integrated within SEEK as a plugin.
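A concept search against a RESTful service boils down to building a query URL. A minimal sketch that only constructs the URL (no network call); it assumes the present-day BioPortal REST search endpoint and its `q`, `ontologies` and `apikey` parameters, which may differ from the service SEEK's plugin used at the time. `YOUR_API_KEY` is a placeholder.

```python
from urllib.parse import urlencode

# Assumption: the present-day BioPortal REST search endpoint.
BIOPORTAL_SEARCH = "https://data.bioontology.org/search"


def search_url(term, ontologies=None, api_key="YOUR_API_KEY"):
    """Build a concept-search URL, optionally restricted to selected ontologies."""
    params = {"q": term, "apikey": api_key}
    if ontologies:
        params["ontologies"] = ",".join(ontologies)
    return BIOPORTAL_SEARCH + "?" + urlencode(params)


print(search_url("glucose", ontologies=["CHEBI", "GO"]))
```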

Tools to help manage data: annotation standards by stealth; a controlled vocabulary plugin using BioPortal.

Following standards. We recommend formats but we do not enforce them: protocols and SOPs (Nature Protocols); data (JERM models and community minimum information models); models (SBML and related standards); publications (PubMed and DOI). If you follow the prescribed formats, you get more out; if you don't, you can still participate, which lowers the adoption barrier.

Off the shelf. Except for the JERM, we have only used community resources, vocabularies and services. You can get a long way by implementing community practices and providing ways to integrate them.

SysMO-DB and Models

Nicolas Le Novere, Data Integration in the Life Sciences, Manchester, 2009

Models: incentives for using standards. Models can be shared in SysMO-SEEK in any format, but SBML is the recommended format, and we also recommend MIRIAM compliance and SBO annotation. If you use SBML, you can use JWS Online to run simulations in SEEK.
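Accepting any format while rewarding SBML requires recognising SBML on upload. A minimal sketch, assuming detection by the XML root element; `sbml_level_version` is a hypothetical helper and does not validate the model against the SBML specification.

```python
import xml.etree.ElementTree as ET


def sbml_level_version(text):
    """Return (level, version) if the root element is <sbml>, otherwise None."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return None  # not XML at all, e.g. a CSV export
    # ElementTree writes namespaced tags as "{uri}sbml"; keep the local name.
    if root.tag.rsplit("}", 1)[-1] != "sbml":
        return None
    return int(root.get("level", "0")), int(root.get("version", "0"))


doc = ('<sbml xmlns="http://www.sbml.org/sbml/level2/version4" '
       'level="2" version="4"><model id="glycolysis"/></sbml>')
print(sbml_level_version(doc))  # (2, 4)
```

A non-`None` result would be the cue to offer simulation; anything else is still stored and shared, just without that extra.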

[Screenshot of JWS Online] JWS Online plugin: an online simulator that runs in your browser; upload models in SBML format; web-service enabled; SBGN schemas, with annotations and external links.

Falko Krause, Humboldt-University, Berlin

Model resources. Models can be published in public repositories (JWS Online, BioModels) and annotated (SBML, MIRIAM, SBO). There are currently no public resources for sharing models with associated data, or for loading new data into models.

Linking data to models. Relating data and models: Where did the data used to develop the model come from? Where did the data used to validate the model come from? What were the results of model simulations?

Current functionality in SEEK: all data used for construction is shown together with the model, so that the process can be repeated; uploaded models are loaded with this data by default; users can manually alter parameters and run simulations.
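The "defaults from data, manually alter, re-run" loop can be illustrated with a toy model. A minimal sketch assuming a hypothetical single-reaction Michaelis-Menten model, dS/dt = -Vmax·S/(Km + S), integrated with forward Euler; this stands in for the real simulator (JWS Online) and the parameter values are made up.

```python
def simulate(params, s0, t_end, dt=0.01):
    """Forward-Euler integration of dS/dt = -Vmax*S/(Km + S); returns S at t_end."""
    s = s0
    for _ in range(int(round(t_end / dt))):
        s -= params["Vmax"] * s / (params["Km"] + s) * dt
    return s


# Parameters "loaded with the data by default" (hypothetical values)
defaults = {"Vmax": 1.0, "Km": 0.5}
baseline = simulate(defaults, s0=5.0, t_end=2.0)
# Manually alter one parameter and re-run
faster = simulate({**defaults, "Vmax": 2.0}, s0=5.0, t_end=2.0)
print(round(baseline, 3), round(faster, 3))
```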

Next steps: model validation. Test and compare the model with experimental data for the complete system: find data in SEEK or upload data from elsewhere, automatically load it into the model, then run simulations and compare with the original results. A JERM for models. Mapping tools that allow you to identify the columns/rows in spreadsheets containing the right information.
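A mapping tool has to decide which spreadsheet column feeds which model variable. A minimal sketch assuming header matching against a hypothetical synonym table and a CSV export of the spreadsheet; real sheets would also need unit handling and row-wise layouts.

```python
import csv
import io

# Hypothetical mapping from model variable names to spreadsheet headers
# that are accepted as meaning the same thing (compared in lower case).
HEADER_SYNONYMS = {
    "time": {"time", "t", "time (min)"},
    "glucose": {"glucose", "glc", "glucose (mm)"},
}


def map_columns(header_row):
    """Return {model variable: column index} for every header we recognise."""
    mapping = {}
    for index, name in enumerate(header_row):
        key = name.strip().lower()
        for variable, accepted in HEADER_SYNONYMS.items():
            if key in accepted and variable not in mapping:
                mapping[variable] = index
    return mapping


def extract(csv_text):
    """Pull the recognised columns out of a CSV export as float lists."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    mapping = map_columns(rows[0])
    return {var: [float(row[i]) for row in rows[1:]] for var, i in mapping.items()}


sheet = "Sample,Time (min),Glc\nA,0,10\nA,5,7.5\n"
print(extract(sheet))  # {'time': [0.0, 5.0], 'glucose': [10.0, 7.5]}
```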

ISA for models. Modelling and experimental work intersect: Investigation, Study, Assay... or modelling analysis. Modelling analysis types: metabolic models, gene networks. Modelling types: ODE, algebraic. Studies are combinations of experimental assays, modelling analyses and informatics analyses.

SysMO-DB, the e-Laboratory: an e-Laboratory is an information system for bringing together people, data and analytical methods at the point of investigation or decision-making.

Current status: finding things so that we can compare them; understanding who has what; understanding what can be compared with what (the experimental context).

Where we are going: a dynamic resource for analysis as well as browsing; automatic comparison of data from inside files; understanding where and how data and models are linked; running simulations with new experimental data; running analyses and workflows over the data and models.

Workflows from myExperiment: data preparation, annotation and analysis. Systems Biology workflow pack on myExperiment: microarray analysis and text mining, created by Afsaneh Maleki-Dizaji from SUMO, University of Sheffield, based on previous work by Paul Fisher, University of Manchester.

SEEK as a data analysis and meta-analysis service. SBML model construction and population: a calibration workflow (Peter Li). Data requirements: a parameterised SBML model, and experimental data (metabolite concentrations from the key results database). Calibration is performed by a COPASI web service.

Data analysis and meta-analysis: a SEEK Analysis Service with pre-cooked analysis tools, running the same calibration workflow. [Screenshot: "Load model", "Load data" and "GO" controls]
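Calibration means choosing parameter values that make the model reproduce the experimental data. A minimal sketch of that idea using a plain grid search on a hypothetical first-order decay model, S(t) = S0·exp(-k·t), with synthetic noise-free data; this is a swapped-in illustration, not COPASI's parameter-estimation methods.

```python
import math


def sse(k, s0, times, observed):
    """Sum of squared residuals between S(t) = s0*exp(-k*t) and the data."""
    return sum((s0 * math.exp(-k * t) - y) ** 2 for t, y in zip(times, observed))


def calibrate(s0, times, observed, k_grid):
    """Pick the rate constant on the grid that best fits the observations."""
    return min(k_grid, key=lambda k: sse(k, s0, times, observed))


times = [0.0, 1.0, 2.0, 4.0]
observed = [10.0 * math.exp(-0.3 * t) for t in times]  # synthetic, noise-free
k_grid = [i / 100 for i in range(1, 101)]
print(calibrate(10.0, times, observed, k_grid))  # 0.3
```

A real workflow would use a proper optimiser, noisy replicate data and several parameters at once; the grid only makes the objective easy to see.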

New Directions

Opening SysMO out: using SysMO as a dissemination space for the SysMO consortium; supplementary material in publications; data citation. Packaging software so that others can use it: it is easy to install a SEEK for yourself. Packaging and exchanging JERM templates. Helping with standardisation: promotion of, and example work with, SBRML and the linkage of data and models.

The SysMO-DB approach in other projects: SysMO2 (new projects and legacy); EraSysBio+; Lungsys and SBCancer; Virtual Liver.

New considerations: eukaryotic organisms; interactions between host and pathogen; human disease; multicellular interactions (tissues, organs); multiscale modelling.

Outstanding issues: keeping data at project sites brings responsibilities. Reliability: sites must be available continuously and respond promptly. Support: they must be proof against virus attacks, etc. Archiving: data must be kept beyond the lifetime of the project.

How it works: find a solution that fits in with current practices. Start simple, show benefits, then add more. Engage with the people actually doing the work: PhD students and postdocs. Let the scientists retain control over their data and over who can see it. Don't reinvent: use available vocabularies and minimal model standards. Help prevent people duplicating work by linking the people as well as the resources.

Acknowledgements: SysMO-DB Team; SysMO-PALS; myGrid, HITS and JWS Online teams; EMBL-EBI; MCISB.