SysMO-DB: Towards “just enough” data exchange for the SysMO Consortium Katy Wolstencroft, University of Manchester, UK.


Systems Biology of Microorganisms
- Pan-European collaboration
- Eleven individual projects, 91 institutes
- Different research outcomes
- A cross-section of microorganisms, incl. bacteria, archaea and yeast
- Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way
- Present these processes in the form of computerized mathematical models
- Pool research capacities and know-how
- Already running since April 2007
- Runs for 3-5 years

The Problem
- No one concept of experimentation or modelling
- No planned, shared infrastructure for pooling

SysMO-DB
- Started July 2008, 3 years, 3 staff + 3 investigators, 3 teams over 3 sites
- Sensitively retrofit a data access, model handling and data integration platform
- Support and manage the diversity of data, models and competencies
- Web-based solution:
  - exchange of data, models and processes (intra- and inter-consortia)
  - search for data, models and processes across the initiative
  - dissemination of results

- Own solutions: own data solutions and collaboration environments. Wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke commercial … files and spreadsheets.
- Suspicion: suspicion and caution over sharing. Interesting interplay between modellers, experimentalists and bioinformaticians.
- Data issues: many do not follow the standards that exist or know who is doing what.
- Resource issues: no extra resources for the consortia. 91 institutes, 11 consortia, some overlapping.

Types of data
- Multiple omics: genomics, transcriptomics, proteomics, metabolomics
- Images
- Reaction kinetics
- Models
- Relationships between data sets/experiments
- Procedures, experiments, data, results and models
- Analysis of data
- The same across many Systems Biology projects

Principles…
- A series of small victories
- Realistic
- Don't reinvent
- Sustainable and extensible
- Migrate to standards
- Provide instant gratification
- Address doubt and anxiety
- Incremental development

The Lowest Hanging Fruit
- A catalogue of SysMO assets
- SysMO Yellow Pages
- The people and their expertise
- The institutions and their facilities
- Data – experimental data sets
- Data – analysed results
- Data – external reference data sets
- Models
- Processes – laboratory protocols and bioinformatics analyses
- The catalogue references assets held elsewhere

Technical Approach
[Diagram labels: Data, Models, Processes; SysMO-DB; SysMO-SEEK web interface; Assets and Yellow Pages catalogues; JERM]

Social Approach
- PALS: 21 postdocs and PhD students
- Experimentalists, modellers and bioinformaticians
- Our design and technical collaboration team
- Very intense face-to-face and virtual collaboration
- UK and Continental PALS chapters
- Audits and sharing: methods, data, models, standards, software, schemas, spreadsheets, SOPs…

Communication via PALS
[Diagram labels: DB team, PALS, Projects. Arrow labels: Show what is there; Suggest what is possible; Ask for requirements; Give requirements; Tell priorities; Rate outcomes; Suggest improvements; Double check; Transmit; Disseminate; Collect answers]

Discovery: SysMO-SEEK
- Single, web-based access point
- Access control & versioning management
- Yellow pages (“who is who”): people, expertise, equipment
- Assets catalogue (“who has what”): SOPs, spreadsheets, pre-published models
- Metadata about data held by projects
- Access to other repositories: models (JWS Online), workflows (myExperiment), public web services (BioCatalogue)
- Call out to external resources, e.g. PubMed
- Does not hold data and results
- Holds metadata on results and links to results
- A component for SysMO groups to incorporate in their own environments and applications

Sharing Policies
- Default private until you say otherwise
- Project defaults
- Private
- Share with the group
- Share with project
- Share with SysMO
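The sharing levels on this slide can be sketched as a small access check. This is only an illustration: the `Scope`, `Asset`, `User`, `effective_scope` and `can_view` names are hypothetical, not the SEEK implementation, and it assumes every viewer is already a SysMO member and that an asset without its own setting falls back to its project's default.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Scope(IntEnum):
    """Sharing levels from the slide, narrowest to widest (hypothetical names)."""
    PRIVATE = 0
    GROUP = 1
    PROJECT = 2
    SYSMO = 3


@dataclass
class Asset:
    owner: str
    group: str
    project: str
    scope: Optional[Scope] = None  # None means "fall back to the project default"


@dataclass
class User:
    name: str
    group: str
    project: str


def effective_scope(asset, project_defaults):
    """An explicit asset setting wins, then the project default, then private."""
    if asset.scope is not None:
        return asset.scope
    return project_defaults.get(asset.project, Scope.PRIVATE)


def can_view(user, asset, project_defaults):
    """Assumes `user` is a SysMO member; the owner can always see their own asset."""
    if user.name == asset.owner:
        return True
    scope = effective_scope(asset, project_defaults)
    if scope == Scope.SYSMO:
        return True
    if scope == Scope.PROJECT:
        return user.project == asset.project
    if scope == Scope.GROUP:
        return user.project == asset.project and user.group == asset.group
    return False  # PRIVATE: owner only
```

With a project default of `Scope.PROJECT`, a colleague in the same project can view an asset that has no setting of its own, while a member of another project cannot.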

“Just Enough” Exchange of SysMO Assets

Experimental Processes: Protocols and SOPs
- Nature Protocols format recommendation
- Protocol template fields: Title, Authors, Keywords, Abstract, Materials, Reagents, Reagent Set Up, Equipment, Time Taken, Procedure, Troubleshooting, Critical Steps, Anticipated Results, References
- You can upload protocols in any format, but if you use this one, we will index it and make searching easier
- Encouraging standardisation
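To illustrate why a fixed template makes protocols easier to index, here is a minimal sketch that splits a plain-text protocol into the template fields listed on the slide and reports which ones are absent. The function names and the assumption that each heading sits on its own line are illustrative only; the slides do not describe how SEEK actually indexes protocols.

```python
# Template fields as listed on the slide.
TEMPLATE_FIELDS = [
    "Title", "Authors", "Keywords", "Abstract", "Materials", "Reagents",
    "Reagent Set Up", "Equipment", "Time Taken", "Procedure",
    "Troubleshooting", "Critical Steps", "Anticipated Results", "References",
]


def split_sections(text):
    """Return {field: body} for lines that exactly match a template heading.

    Assumption: each heading is on its own line, optionally followed by ':'.
    """
    headings = {name.lower(): name for name in TEMPLATE_FIELDS}
    sections = {}
    current = None
    for line in text.splitlines():
        key = line.strip().rstrip(":").lower()
        if key in headings:
            current = headings[key]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def missing_fields(sections):
    """Template fields that are absent or empty, in template order."""
    return [name for name in TEMPLATE_FIELDS if not sections.get(name)]
```

A protocol that follows the template yields one searchable entry per field; a free-form upload simply yields few or no sections and can still be stored.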

Bioinformatics Processes: Workflows
- Workflow Management System
- Data preparation, annotation and analysis pipelines
- SBML model construction and population
- Linking together data sets, Web Services, R scripts, BioMART, Java libraries, Grid Services (MATLAB in beta)
- Workflows as a mechanism for linking inside SEEK
- Free and Open Source

Libraries of SysMO workflows

Models
- SBML is the recommended format
- Not all models are SBML
- JWS Online allows storing and simulation of SBML models
- But all models need to be shared
- JWS Online doesn’t have version and access control
- Models can be shared in SEEK instead of directly in JWS Online
- Can still connect to JWS Online and run simulations

Models
- JWS Online: a database of curated models and a model simulator
- Web service enabled to run from workflows
- Used and accessed through SEEK…
- Special instance of JWS Online for SysMO
- Store, validate and run models from SysMO-SEEK and publish later
- Access to other model resources: BioModels, COPASI and Semantic SBML

Data Comparison and Exchange
- Public data sources
- Model organism databases (e.g. SGD)
- BRENDA…
- Data produced by SysMO
- SABIO-RK, iChiP, MeMo…
- Local databases & files
- Excel spreadsheets: the most common format for experimental data
- Variable descriptions of data
- Little adoption of community controlled vocabulary terms
[Figure labels: Proteomics; Metadata; Metabolomics; Microarray; Proteomics; Single Cell Data]

JERM: “Just Enough Results Model”
- Minimum information to exchange data
- What type of data is it? Microarray, growth curve, enzyme activity…
- What was measured? Gene expression, OD, metabolite concentration…
- What do the values in the datasets mean? Units, time series, repeats…
- Which experiment does it relate to?
- How was the data created? SOPs and protocols
- Harvesting standards, current practice and consortium schemas and spreadsheets
- Inspired by MCISB Key Results initiative and SBRML [Paton]
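The questions above map naturally onto a small record type. The sketch below is one possible reading, not the JERM schema itself: the `JermRecord` class, its field names and the `location` link are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class JermRecord:
    """Hypothetical 'just enough' description of one data set."""
    data_type: str                 # what type of data: microarray, growth curve, ...
    measured: str                  # what was measured: gene expression, OD, ...
    units: str                     # what the values mean
    experiment_id: str             # which experiment it relates to
    time_series: bool = False
    repeats: int = 1
    sop_refs: list = field(default_factory=list)  # how the data was created
    location: str = ""             # link to the data held at the project site

    def missing(self):
        """Names of fields still empty, in declaration order."""
        return [name for name, value in asdict(self).items() if value in ("", [])]
```

A record can be registered with only the core questions answered and completed later, for example `JermRecord("growth curve", "OD", "OD600", "exp-1").missing()` lists the SOP references and the data location as still open (the values here are made up).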

The Idea
- For each data type… Transcriptomics, Proteomics, Metabolomics, Single Cell Data
- Generate and apply… JERM template; JERM extractor for data host; subset registered in SEEK; access / export through JERM interface / template
- Define a JERM… Top-down analysis of standards; bottom-up analysis of practice; ISA-TAB

JERM Adaptors
[Diagram labels: COSMIC; BaCell-SysMO; SysMOLab; MOSES; Alfresco; Wiki; another data store]

Just Enough Results Model Tools
- JERM Source Extractor Generator
- New spreadsheets adopt JERM templates
- Legacy spreadsheet JERM mapper
- Databases have JERM mapper
- Spreadsheet Ontology Annotator: restrict the values that a range of fields can have
[Diagram labels: Metadata; SABIO-RK; BRENDA; myDB; mySpreadSheet; JERM Web Service Access Interface; Access Control; JERM Extractor and Access Wrapper Layer; JERM Template; Source Access and Harvester; Source Extractor]
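Two of the tools named above, the legacy spreadsheet mapper and the annotator that restricts field values, can be sketched together. Everything here is hypothetical: the column names, the vocabularies (taken from the examples on the JERM slide) and both function names are illustrative, not the SysMO-DB tools.

```python
# Allowed values per field; terms borrowed from the examples on the JERM slide.
ALLOWED = {
    "data_type": {"microarray", "growth curve", "enzyme activity"},
    "measured": {"gene expression", "OD", "metabolite concentration"},
}

# Hypothetical mapping from one legacy spreadsheet's headers to JERM-style fields.
LEGACY_COLUMNS = {"Type": "data_type", "What": "measured", "Unit": "units"}


def map_legacy_row(row, column_map):
    """Rename known legacy columns and drop columns the mapping does not cover."""
    return {column_map[key]: value for key, value in row.items() if key in column_map}


def validate(record, allowed):
    """Return one message per field whose value is outside its vocabulary."""
    errors = []
    for field_name, vocabulary in allowed.items():
        value = record.get(field_name)
        if value not in vocabulary:
            errors.append(f"{field_name}: {value!r} is not in the allowed vocabulary")
    return errors
```

Mapping first and validating second keeps legacy spreadsheets usable as they are, while new spreadsheets built from a template would pass validation without any mapping step.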

SEEK + JERM
[Diagram labels: Experimental Data; Metadata; People; Projects; Investigation, Study, Assay (based on ISA-TAB); Experimental conditions; Factors studied; Models; SOPs; Workflows; Homogenised terminology and values in the datasets themselves]

Incremental Annotation
- Metadata can be added to assets at any time
- Extracted from JERM templates
- Added by the data owner through SEEK
- Added by another SysMO consortium member with editing permission

In Practice for Spreadsheets
[Diagram labels: Native; JERM Template; JERMed]

Now
[Diagram labels: Register; Extract; Matched to the JERM; Adding metadata; browse; search; Whole record]

Near future
[Diagram labels: Register; Extract; Matched to the JERM; Adding metadata here; browse; search; Whole record; Filtered record; Enriched record]

Future
[Diagram labels: Register; Extract; Matched to the JERM; Adding metadata here; browse; search; Collections of Records + Meta-analysis]

What we have done…
[Diagram labels: SysMO-DB; SysMO-SEEK web interface; Search; Assets Catalogue; Yellow Pages; JERM; Spreadsheet Repository; Models Repository; SOP Repository; Workflow Repository; Consortium data, models and processes; SOPs and workflows; JWS Online; Public data; SBML; Nature Protocols; Workflow Management System]

Outstanding Issues
- Keeping data at project sites has responsibilities
  - Reliability: sites available continuously and promptly
  - Support: must be proof against virus attacks, etc.
  - Archiving: beyond the lifetime of the project
- What happens when a project is no longer part of the SysMO consortium?

Lessons
- Find a solution that fits in with current practices
- Start simple, show benefits, add more
- Engage with the people actually doing the work: PhD students, post-docs
- Let the scientists retain control over their data and who can see it
- Don’t reinvent. Use available vocabularies, minimal model standards
- Help prevent people duplicating work by linking the people as well as the resources

Acknowledgements
- SysMO-DB Team
- SysMO-PALS
- myGrid, EML and JWS Online teams
- OMII-UK, Uni Southampton
- EMBL-EBI, MCISB