SysMO-DB: Supporting Data Access and Integration


SysMO-DB: Supporting Data Access and Integration
Carole Goble, University of Manchester, UK
Jacky Snoep, University of Manchester / Stellenbosch, South Africa
Isabel Rojas, EML Research gGmbH, Germany

Goal of SysMO
- Eleven individual projects
- Different research outcomes
- A cross-section of microorganisms, including bacteria, archaea and yeast
- Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way
- Present these processes in the form of computerized mathematical models
- Pool research capacities and know-how

The crunch
- No one concept of experimentation or modelling
- No planned, shared infrastructure for pooling

SysMO-DB
Retrofit a data access, model handling and data integration platform:
- To support and manage the diversity of
  - Data and Models
  - Competencies
- That promotes shared understanding
- Using a common platform and common technologies

SysMO-DB
Web-based solution to facilitate:
- exchange of data, models and processes (intra- and inter-consortia)
- search for data, models and processes across the initiative
- maximisation of the "shelf life" and utility of the data, models and processes generated
- dissemination of results

Our experimental conditions…
- Progressive and incremental
- Something in it for me all the way along
- Low hanging fruit immediately
- Return that matches investment
- Realistic
- Eases pressure points and concerns of the groups
- Lower barriers of engagement
- Sustainable
- Flexible, extensible and open

SysMO-DB Concept (diagram). Labels: Experimental data; Models; Processes; SysMO-DB; SysMO-HUB web interface

SysMO-DB Team
- Prof Jacky Snoep (University of Stellenbosch, South Africa; University of Manchester, UK): Models
- Prof Isabel Rojas (EML Research gGmbH, Germany): Data, Metadata
- Prof Carole Goble (University of Manchester, UK): Processes (Workflow), Portal, Infrastructure

SysMO-DB Team
- Stuart Owen, Software Engineer (University of Manchester, UK): Workflow, Portal, Infrastructure
- Katy Wolstencroft, Bioinformatician (University of Manchester, UK): Workflow, Metadata
- Isabel Rojas and Olga Krebs (EML Research gGmbH, Germany): Databases, Metadata

Backed up by the Rest

…..and more

…and more

Modelling cycle (diagram). Labels: Construct pathway model in SBML; Model analysis; New hypothesis; Experimental validation; Data analysis & integration; Model update; Model Building; New data; Simulation; Validation; Predict

Modelling cycle (diagram, reduced). Labels: Construct pathway model in SBML; Model analysis; New hypothesis; Experimental validation; Data analysis & integration; Model update; New data; Predict

Modelling cycle with supporting resources (diagram). Labels: Construct pathway model in SBML; Model analysis; New hypothesis; Experimental validation; Data analysis & integration; Model update; Workflows; New data; JWS Online; COPASI; SysMO Data; SABIO-RK; External Data and Applications; Predict

SysMO-Hub Portal Data Models Workflows

SysMO-Hub Portal Data Models Workflows External Resources

SysMO-Hub Portal My Stuff Data Models Workflows External Resources Private Access Controlled publication

SysMO-Hub Portal My Stuff Data Models Workflows External Resources Private Access Controlled publication Metadata SysMO-SEEK Access Control

Stitching it together
- Metadata on everything: recommendations, MIBBI, our own controlled vocabularies that incrementally evolve
- Web services: simple interfaces that incrementally evolve
- Web 2.0 style: Atom feeds, blogs, wikis, mash-ups, REST

Architecture (diagram). Labels: SysMO HUB Portal (Liferay); SysMO SEEK; JERM Web Service Access Interface; JERM Extractor; SysMO Data; Models; Metadata; Taverna Workflows; External Resources; Repositories & Resources; Service Interface; Integration; Discovery, Access, Annotation & Collaboration; Results Cache; myExperiment; JWS Online; SABIO-RK; BioCatalogue; Access Control

Major Technologies
- Access: SysMO-Hub Portal (using Liferay)
- Discovery: SysMO-SEEK
  - SysMO Data Assets and Projects registry
  - Search over myExperiment, JWS Online, BioCatalogue…
- Data Management: local solutions, recommended data-specific solutions (e.g. SABIO-RK for reaction data)
- Data Publishing: Just Enough Results Model web interface
- Models: publishing, management and running (using JWS Online)
- Integration and population: Workflows (using Taverna)

Web Access - SysMO-Hub
- Customised web portal
- Unified access to SysMO resources, and integrated queries across data, workflow and model catalogues, and repositories
- A common entry to the information created by the SysMO partners
- Pre-cooked queries and processes
- Umbrella for eGroupWare, OpenWetWare, wikis and other solutions
- Liferay portal framework

Data Exchanges
Use existing community standards, e.g.:
- MIRIAM: Minimum Information Requested In the Annotation of biochemical Models
- MIAME: Minimum Information About a Microarray Experiment
- MIAPE: Minimum Information About a Proteomics Experiment
- SBML: Systems Biology Markup Language
Definition of minimal sets for information exchange within the consortia

Data and Metadata
- "Just Enough Results Model" (JERM): minimum metadata for exchange
- Where storage solutions exist: expose through JERM
- Where storage solutions do not exist: SABIO-RK, iChiP, BRENDA, MeMo and many more
- JWS Online, BioModels
- COPASI
- myExperiment
- Ontologies, catalogues and controlled vocabularies for annotation
- SysMO SEEK: Registry
Diagram labels: JERM Web Service Access Interface; Metadata; SysMO Data; JERM Extractor; SABIO-RK; Access Control
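The idea of "minimum metadata for exchange" can be illustrated with a small completeness check. This is only a sketch: the slides do not list the actual JERM fields, so the field names in `REQUIRED_FIELDS` and the example record are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical minimum field set; the real JERM fields are not given in the slides.
REQUIRED_FIELDS = ("title", "project", "contributor", "data_type", "organism")


def missing_fields(record: Dict[str, Optional[str]]) -> List[str]:
    """Return the required fields that are absent or empty in a metadata record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Made-up record for illustration; it deliberately omits "organism".
record = {
    "title": "Glucose uptake time series",
    "project": "ExampleProject",
    "contributor": "A. Researcher",
    "data_type": "kinetic",
}

print(missing_fields(record))
```

A check like this could run before a dataset is exposed for exchange, so that only records meeting the agreed minimum are published.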

Discovery: SysMO-SEEK
- Self-curated, access-controlled catalogue of assets to promote cooperation
- Metadata database (who has what)
  - Progressive refinement
  - Projects, Group, Provenance, Files
  - It will NOT hold results
- Meta catalogue
  - Search over other catalogues: BioCatalogue, myExperiment, JWS Online, BioModels
- Is itself a web service
  - Incorporate in your own groupware environments and applications

SysMO SEEK
- Is there any group generating kinetic data? Is this data available?
- Who is working with which organism?
- What methods are being used to determine enzyme activity?
- Under which experimental conditions are my partners measuring glucose concentration?
- …and many more
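Questions like these amount to filtered lookups over a "who has what" catalogue that respects access control. The sketch below is an assumption-laden illustration, not the SEEK data model: the `Asset` fields, the three visibility levels and the catalogue entries are all invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Asset:
    """One catalogue entry describing an asset; the results themselves are not stored."""
    title: str
    project: str
    organism: str
    data_type: str
    visibility: str  # illustrative levels: "private", "project", "consortium"


def visible_to(asset: Asset, user_project: str) -> bool:
    """Decide whether a member of user_project may see this catalogue entry."""
    if asset.visibility == "consortium":
        return True
    if asset.visibility == "project":
        return asset.project == user_project
    return False  # "private": hidden from everyone in this sketch (owners are not modelled)


def find_assets(catalogue: List[Asset], user_project: str, **criteria: str) -> List[Asset]:
    """Return visible assets whose attributes equal every given criterion."""
    return [
        asset
        for asset in catalogue
        if visible_to(asset, user_project)
        and all(getattr(asset, key) == value for key, value in criteria.items())
    ]


# Made-up catalogue entries for illustration only.
catalogue = [
    Asset("Enzyme kinetics run 1", "ProjectA", "Saccharomyces cerevisiae", "kinetic", "consortium"),
    Asset("Microarray batch 3", "ProjectA", "Saccharomyces cerevisiae", "transcriptomic", "project"),
    Asset("Draft flux measurements", "ProjectB", "Bacillus subtilis", "kinetic", "private"),
]

# "Is there any group generating kinetic data? Is this data available?"
for hit in find_assets(catalogue, user_project="ProjectB", data_type="kinetic"):
    print(hit.project, "-", hit.title)
```

Keeping the visibility check separate from the search criteria means the same query returns different answers for different projects, which matches the slide's point that some groups are not yet ready to share with all of SysMO.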

Models
- Publish, manage, run, validate
- JWS Online
  - Database of curated models and a simulator
  - Web service enabled
- Each SysMO project will have a separate password-protected website.

Processes - Workflows
- Applications and services become accessible to the workflow machinery as Web services or Java applications.
- Data and application integration and analysis
- Model construction and population
- Repeatable and shareable plan
- Transparent provenance log
- Taverna Workflow Management System
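The notions of a "repeatable plan" and a "transparent provenance log" can be shown with a toy pipeline runner. This is not Taverna and does not use its API; `run_workflow` and the two steps are hypothetical names chosen for the illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def run_workflow(steps: List[Step], data: Any) -> Tuple[Any, List[Dict[str, Any]]]:
    """Run the steps in order, recording each step's input and output."""
    provenance = []
    for name, func in steps:
        result = func(data)
        provenance.append({"step": name, "input": data, "output": result})
        data = result  # the output of one step is the input of the next
    return data, provenance


# The plan is plain data, so it can be stored, shared and re-run unchanged.
steps = [
    ("parse", lambda text: [float(x) for x in text.split(",")]),
    ("mean", lambda values: sum(values) / len(values)),
]

result, log = run_workflow(steps, "1.0,2.0,3.0")
print(result)
for entry in log:
    print(entry["step"], entry["input"], "->", entry["output"])
```

Because every intermediate value is logged, anyone rerunning the plan can see how a final result was derived from the original input.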

Processes Technology: Taverna, myExperiment.org

Example - Manipulation of SBML models in workflows
- Using libSBML
  - For data integration
  - For constructing and annotating SBML models
- libSBML written in C then wrapped with a Java API
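To give a feel for what a workflow step reads from an SBML model, here is a minimal sketch that uses only Python's standard `xml.etree` module in place of libSBML. It performs none of libSBML's validation or unit checking, and the toy model (`toy_glycolysis_step`) is invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 2 Version 4 documents.
SBML_NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}

# A made-up, minimal SBML document with two species and one reaction.
SBML_TEXT = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_glycolysis_step">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="glucose" compartment="cell" initialConcentration="5"/>
      <species id="g6p" compartment="cell" initialConcentration="0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="hexokinase" reversible="false">
        <listOfReactants><speciesReference species="glucose"/></listOfReactants>
        <listOfProducts><speciesReference species="g6p"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def summarise(sbml_text):
    """Return (model id, species ids, {reaction id: (reactants, products)})."""
    root = ET.fromstring(sbml_text)
    model = root.find("sbml:model", SBML_NS)
    species = [s.get("id") for s in model.findall("sbml:listOfSpecies/sbml:species", SBML_NS)]
    reactions = {}
    for reaction in model.findall("sbml:listOfReactions/sbml:reaction", SBML_NS):
        reactants = [
            ref.get("species")
            for ref in reaction.findall("sbml:listOfReactants/sbml:speciesReference", SBML_NS)
        ]
        products = [
            ref.get("species")
            for ref in reaction.findall("sbml:listOfProducts/sbml:speciesReference", SBML_NS)
        ]
        reactions[reaction.get("id")] = (reactants, products)
    return model.get("id"), species, reactions


model_id, species, reactions = summarise(SBML_TEXT)
print(model_id, species, reactions)
```

For real models, libSBML is the better choice because it understands all SBML levels and versions and reports consistency errors; the plain XML approach here is only meant to show the structure being manipulated.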

Related Activities
- BioCatalogue
  - Community and Expert Curated Catalogue of Life Science Web Services
  - Started June
- Target Practice
  - Informatic and metabolomic assessment of biological network changes and of drug-cell interactions
  - Utopia, Taverna workflows
- Solutions held by SysMO partners
  - eGroupWare, PHProjekt, Basecamp, wikis etc.

Training, Consultancy, Know-how
- Us:
  - Training on databases, models, workflow systems and web services, and best practice for the annotation of resources by metadata
  - Kick-starting, toolkits, templates
- You:
  - Social networking for shared content, know-how and best practice
  - Contribution
  - Best of breed solutions in place already
- User Focus Group of PALS: PhD Students and Post Docs

Pals

| No. | Name | Project | Role | Location |
|---|---|---|---|---|
| 1 | Falko Krause | TRANSLUCENT | Bioinformatics | Berlin, Germany |
| 2 | Leif Steil | BaCell-SysMO | Experimentalist, databases | ?? |
| 3 | Walter Glaser | MOSES | Bioinformatician | Vienna, Austria |
| 4 | Malkhey Verma | MOSES + SulfoSys | Experiment/modeller interface | Manchester, UK |
| 5 | Femke Mensonides | MOSES + SulfoSys | Experiment/modeller interface | Vrije University, Netherlands |
| 6 | Hanan Messiha Girgis | MOSES | Experimentalist | Manchester, UK |
| 7 | Pawel Sierocinski | SulfoSys | Experimentalist | Wageningen, Netherlands |
| 8 | Maria Rodrigues | KOSMOBAC | Modeller | Vigo, Spain |
| 9 | Afsaneh Maleki-Dizaji | SUMO | Bioinformatician | Sheffield, UK |
| 10 | John Heap | COSMIC | Experimentalist | Nottingham, UK |
| 11 | Walid Omar | STREAM | Experimentalist | Warwick, UK |
| 12 | Elon Correa | Valla | Modeller | Manchester, UK |
| 13 | Renate Kania | SysMO-LAB | Database | EML, Germany |
| 14 | Mark Musters | SysMO-LAB | Modeller | Wageningen, Netherlands |
| 15 | Terry McGenity's postdoc? | PSysMO | Experimentalist | Essex, UK |
| 16 | Maksim Zakhartsev | MOSES | Experimentalist | Stuttgart, Germany |

Hands On
- Which data do you need to exchange? I.e. what do you need, what can you give?
- What are the minimal exchange formats?
- How best to annotate your data (giving semantics to your data)?
- How to cross-relate different types of data (e.g. genomic, transcriptomic, proteomic, metabolomic, kinetic and modelling data)?
- What should be in the SysMO SEEK?
- What should the portal look like?

Steps so far… Questionnaire
- Current situation in each project
- Contribute to design of work packages
- Responses from:
  - Project 1: BaCell-SysMO
  - Project 2: COSMIC
  - Project 3: SUMO
  - Project 4: KOSMOBAC
  - Project 6: PSysMO
  - Project 7: Pseudomonas fluorescens
  - Project 9: TRANSLUCENT
  - Project 10: Streptomyces coelicolor
  - Project 11: Silicon cell model

…Results
- A spectrum of resources and of data management and integration expertise
- Each project is concerned with data, models and processes, but each partner may not do all
- All projects are concerned with sharing between their sites.
- Some are not yet ready to share with all of SysMO.
- Respect privacy.
- Governance.

Data

| Storage | Access | Annotation | Discovery |
|---|---|---|---|
| Produced on site and stored in files or Excel spreadsheets | Not consolidated between group members or project members. No common database solutions | Only who produced it and when. Does not conform to existing minimum metadata 'omics standards | Google search over basic indexing |
| Common format, or in a database or repository | Group members and project partners, but not the rest of SysMO or outside | Annotation of data may be free text, but may not conform to existing standards | Google search over basic indexing and annotations |
| Stored and indexed in relational databases from the consortium or in other formats | Project partners & SysMO but not outside. Some web service interface access to data resources | Minimum metadata standards | Fully searchable |
| Stored and indexed in relational databases, using databases from the consortium or using other formats | Project partners, SysMO & the Systems Biology community via web services and data services. Some data exported to public repositories | Minimum metadata standards | Fully searchable |

Model

| Representation | Annotation | Access |
|---|---|---|
| Data but no models | | |
| Models are developed in a non-SBML format and are not converted to SBML | None | Models are submitted to JWS Online in their native format |
| Models are developed in SBML, or in another format and converted into SBML | Little or no annotation of the models using current standards, such as MIRIAM | Models are submitted to JWS Online in SBML |
| Models are developed in SBML | Fully annotated using MIRIAM | Models are submitted to JWS Online |

Processes
- All processes are manual with no scripted pipelines or workflows
- Some data may also be gathered from external sources
- Data is produced and stored locally
- No automation of routine processes
- No reference to external resources
- Data used for models by other groups in the same project or locally
- Some data gathering or model population uses automated workflows
- Some web service interfaces to locally generated data and tools
- Some data gathering or model population is mediated by workflows
- Verifying simulation results against experimental data is mediated by workflows
- Web service interfaces to all locally generated data and tools
- Workflows are annotated and published on myExperiment for SysMO consortium members

Silver or Gold Pilots
- Project 1 BaCell-SysMO: produce datasets, use models, workflow ready
- Project 7 Pseudomonas fluorescens and Project 6 PSysMO: Pseudomonas organisms, use third party data sets and produce their own, model ready, workflow ready
- Project 10 Streptomyces coelicolor: omics and standards compliant, use third party standard data, workflow ready, model standards skeptic but use models
- Project 3 SUMO: produce own data and own models, have their own wiki for sharing data, workflow ready and model ready, using COPASI
- MOSES (though no questionnaire): local, using models, produce their own data, similar work in Target Practice using UTOPIA and Taverna workflows already
- SulfoSYS: data solutions, eGroupWare
- Project 9 TRANSLUCENT: our first Pal! Protein-protein interaction data. PHProjekt
- SysMOLab and MeMo (though no questionnaire): wikis, SABIO-RK, etc.

Bronze Data Pilot
- Data storage solutions for project partners who need it
- Many work mainly with Excel or flat files
- Need data storage first to disseminate to others and start collaborating
- KOSMOBAC (Booth) Group

Development Approach
- You already have something; we will not reinvent it.
- Development and deployment of all components will be incremental
  - Metadata specs
  - SW rapid prototyping
- Leverage
- Limited -> Sophisticated
- Cater for different levels of readiness
- Customised for each project

First Steps: end October 2008
- Comprehensive up-to-date audit and list of meetings
- First cut Hub and SEEK
  - Project areas set up & access control scheme
  - SysMO-SEEK of data assets & projects with interface
  - Collection of queries/use cases for SEEK and Hub
- Data
  - With Gold and Silver pals define the first cut JERM
  - With Bronze pal identify storage solution
  - Establish best practices on data annotation
  - Prepare two or three SysMO datasets for workflow readiness
- Models and Workflows
  - Access to JWS Online and myExperiment
  - Seed with SysMO-specific workflows and models
  - Identify useful workflow packs
- Engagement
  - Project web site and wiki
  - Build up our PALS team
  - Visits and training timetable
  - JERM and SEEK workshop

JERM and SEEK Workshop
- First Pals face 2 face
- September 2008
- EML, Heidelberg, Germany
- Facilitated
- Preparation:
  - Audit
  - Sweet spots & pains
- In Meeting:
  - SysMO-SEEK
  - Just Enough Results Model for Exchange

Audit
- The repositories you use now and plan to use to store your experimental data: home grown; standard; public; private
- The other repositories you use now and plan to use
- The data formats you use now and plan to use
- The SOPs you have in place or plan to
- The software you use for data management, groupware & project management, model simulation etc.: e.g. Rosetta, Oracle, Matlab, R, Mathematica, eGroupWare, PHProjekt, wikis
- Software you have that would be of benefit to all, and are willing to share, e.g. Falko's Semantic SBML tool, MCISB SBML annotation tool
- The programming and software environments you use, e.g. Java, Python, C++, Ruby on Rails, Perl
- Your local expertise available for data management, e.g. full time bioinformatician, database manager, commercial outsourcing, none
- What facilities do you have for coping with external access? How do you export data now?
- Design a systematic collection mechanism with two of the pals
- Wiki mining

Sweet spots and Pains
Confidentially… in your humble opinion…
- What would be the first three low hanging fruit for your project?
- And what are three obstacles / barriers?
- Tell us about your experimentalists, modellers and bioinformaticians
- What doesn't work right now? What does?

SysMO-SEEK - not the results themselves
- What schemas or metadata do you have for groupware, projects, SOPs, procedures we can use as a basis for the SEEK model and for sourcing the content?
- How do you know who has what and what they are doing?
- What controlled vocabularies do you use for this, if any?
- Which data do you need / would like to know from others in SysMO and outside SysMO?
- What data would you be willing to give?
- What is the data release policy of your project? Availability, conditions of use, permissions, credit etc.
- What is the lifecycle of your data? Versioning policy,

Just Enough Results Model Exchange
- Which data do you need / would like to know from others?
- What data would you be willing to give?
- How do you annotate your data?
- How do you cross-relate different types of data?
- What standards for data do you already use and know about?

First Steps: end March 2009
- Revised Hub and SEEK
  - Enhanced SysMO-SEEK of data assets & projects
- Data
  - Gold and Silver: the first JERM interface
  - Access to a few data sets through Hub using JERM
  - Bronze: established a storage solution
  - JERM-based SysMO datasets for workflow readiness
  - Disseminate best practices on data annotation
- Models and Workflows
  - Models and Workflows on JWS Online and myExperiment
  - Demoed workflow using data sets through JERM interface
  - Useful workflow packs & launch workflows from portal
- Engagement
  - Devising next steps with PALS team
  - Visits and training timetable

Diagram labels: Back up Teams; PALS and DMG (Data Management Group); SysMO-DB Delivery Team; Back up Technical Teams; SysMO-Pals; Funders; Steering; Review; Governance; Hands on engagement; SysMO Projects

Realism

- Light touch
- Incremental
- One size will not fit all
- Use what is already there