SysMO-DB: Just Enough Exchange for Systems Biology Data and Models Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs - University of Manchester.

Presentation transcript:

SysMO-DB: Just Enough Exchange for Systems Biology Data and Models. Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs (University of Manchester); Wolfgang Müller, O. Krebs, Isabel Rojas (EML Research gGmbH, a not-for-profit); Jacky Snoep (University of Stellenbosch). MS eScience Workshop, Pittsburgh, PA.

SysMO = SYStems biology of MicroOrganisms: 11 projects, 91 partners, 9 countries, started 2007. [Map of partners per country; counts shown: 2, 29, 22, 9, 4, 1]

SysMO-DB: Started July 2008; 3 years; 3 staff + 3 investigators; 3 teams over 3 sites. Sensitively retrofit a data access, model handling and data integration platform. Support and manage the diversity of data, models and competencies. Web-based solution for: exchange of data, models and processes (intra- and inter-consortia); search for data, models and processes across the initiative; dissemination of results.

SysMO-DB Team. University of Stellenbosch, South Africa / University of Manchester, UK: Jacky Snoep. EML Research gGmbH, Germany: Isabel Rojas, Olga Krebs, Wolfgang Müller. University of Manchester, UK: Carole Goble, Stuart Owen, Katy Wolstencroft, Sergejs Aleksejevs.

Connect projects, connect to outside. [Diagram: project-specific solutions; internally used tools and data; outside data and tools. Sharing levels: personal (my disk: data, models, workflows); project; SysMO-DB, inter-project; public.]

Own solutions: Own data solutions and collaboration environments: wikis, e-Groupware, PHProject, BaseCamp, PLONE, Alfresco, bespoke, commercial … files and spreadsheets. Suspicion: Suspicion and caution over sharing. Interesting interplay between modellers, experimentalists and bioinformaticians. Data issues: Many do not have data, do not follow the standards that exist, or do not know who is doing what. Much of the data cannot be compared: different organisms, different strains. Resource issues: No extra resources for the consortia. 91 institutes, 11 consortia, some overlapping.

Principles: Go for a series of small victories. Realistic. Don't reinvent. Migrate to standards. Sustainable and extensible. Provide instant gratification. Address doubt and anxiety. Build it.

Three types of people: modellers, experimentalists and bioinformaticians, with exchange between them.

"Natural" collaboration within SysMO. Short, simplified, black and white: Collaboration during project design. Varying methods of collaboration during the project: binomes (one modeller, one experimentalist); groups collaborating with groups (occasional or formalized exchange of information). Varying success → need for a watering hole/meeting point → an application where experimentalists, bioinformaticians and modellers meet. Trying to make experimentalists, modellers and bioinformaticians peacefully share resources. (Image: "Hot Watering Hole Action" by betty x1138, NYC, USA, via Flickr.)

Some numbers and some consequences. Staff: 1 software engineer, 1 bioinformatician, 1 bio-database specialist. 11 projects, 91 partners: 20 programmer days/year/project, 2.5 programmer days/year/partner. → "Just in case" approach impossible. → Focus on real needs: "just in time", "just enough". → The right 20%. → Help people help themselves. → Communication! 80/20 rule: 80% of the features won't be used anyway; build the useful features.
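The per-project and per-partner figures follow from simple division. A rough check, assuming about 220 working days per year for the single software engineer (the 220 figure is an assumption, not stated in the slides):

```python
# Rough check of the programmer-time budget quoted on the slide.
# ASSUMPTION: one software engineer works about 220 days per year;
# this number is illustrative and does not come from the presentation.
WORKING_DAYS_PER_YEAR = 220
PROJECTS = 11   # from the slide
PARTNERS = 91   # from the slide

days_per_project = WORKING_DAYS_PER_YEAR / PROJECTS
days_per_partner = WORKING_DAYS_PER_YEAR / PARTNERS

print(f"{days_per_project:.1f} programmer days/year/project")
print(f"{days_per_partner:.1f} programmer days/year/partner")
```

Under that assumption the result is 20 days per project and roughly 2.4 days per partner, in line with the rounded 2.5 on the slide.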

Social approach. Questionnaires. PALs (Project Area Liaisons): 21 postdocs and PhD students (bio/bioinf/modeller); our design and technical collaboration team; very intense face-to-face and virtual collaboration; UK and Continental PALs chapters. Audits and sharing: methods, data, models, standards, software, schemas, spreadsheets, SOPs…

Communication via PALs: DB team ↔ PALs ↔ Projects. Show what is there; suggest what is possible; ask for requirements; give requirements; tell priorities; rate outcomes; suggest improvements; double check; transmit; disseminate; collect answers.

Outcome of first PALs meeting: Need to find the guy who does xyz: Yellow pages. Need to store Standard Operating Procedures. Almost all our data is Excel.

What's there: SysMO-SEEK screenshots

Yellow pages: tag clouds, bookmarks, Yellow pages tabs, ISA tabs

Standard Operating Procedures

JWS connection for modellers

View Study

New Assay (ISA)

Rights and sharing

Rights and sharing: create group

So much for the webapp: rights + sharing; connection to modellers' tools; Yellow pages; SOPs

Almost there: Improved Excel support (Matthew Horridge)

Towards Just-Enough Exchange: Incremental steps from beta to beta

Towards Just-Enough Exchange: Largely a story about how to handle Excel sheets for users' benefit

SysMO Just Enough Exchange. [Diagram: COSMIC (Alfresco), BaCell-SysMO (Alfresco), MOSES (wiki), SysMO-LAB (wiki), spreadsheets, BASE, SABIO-RK and public resources.]

Need for tradeoff. Huge number of systems. Huge number of standards (MIBBI, OBO…), some of them big standards. Too much for a few people to cope with, but: comparison needs standardisation; search needs standardisation. → Need to move incrementally to just-enough standard implementation.

Path = goal. The journey is part of the reward. Let people use what they use anyway. If changes are necessary, be as unintrusive as possible. Be aware of legacy data. Nudge people towards best practices. Give instantly useful added value to as many users as possible: simple search, simple exchange, simple tool use.

A roadmap. Provide convincing Web 2.0 functionality for use and as an appetizer: Yellow pages, SOPs. Upload service: hand-triggered upload of link/file; hand-added metadata. Harvesting + change detection service: automatic download; hand-added metadata. Support for Excel templates: promote internal standards by use + tooling; mappers + parsers; classifiers. Use other data types where appropriate: SBML, Matlab, Mathematica…
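One common way to implement the change detection step of a harvesting service is to compare content digests between harvest runs. The sketch below is only an illustration of that idea; the function names and the in-memory file dictionaries are assumptions, not part of SysMO-SEEK.

```python
import hashlib


def content_digest(data):
    """Return a SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def detect_changes(previous_digests, current_files):
    """Compare the digests stored at the last harvest with current content.

    previous_digests: dict of file name -> digest from the last run
    current_files:    dict of file name -> bytes downloaded in this run
    (Both structures are hypothetical stand-ins for a project repository.)
    Returns the new digests and a report of what changed.
    """
    digests = {}
    report = {"new": [], "changed": [], "unchanged": [], "removed": []}
    for name, data in current_files.items():
        digest = content_digest(data)
        digests[name] = digest
        if name not in previous_digests:
            report["new"].append(name)
        elif previous_digests[name] != digest:
            report["changed"].append(name)
        else:
            report["unchanged"].append(name)
    report["removed"] = [n for n in previous_digests if n not in current_files]
    for key in report:
        report[key].sort()
    return digests, report


# First harvest: everything is new.
first_digests, first_report = detect_changes({}, {"a.xls": b"1", "b.xls": b"2"})
# Second harvest: b.xls was edited and c.xls was added.
second_digests, second_report = detect_changes(
    first_digests, {"a.xls": b"1", "b.xls": b"3", "c.xls": b"4"}
)
print(second_report)
```

Only the files reported as new or changed would then need hand-added metadata or re-parsing.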

Stability hierarchy (increasing stability): single group (template for a group of experiments) → single SysMO project (project-level template) → whole SysMO (template best practice; more stable JERM data model). Parsers/annotators enter into that. Use mappers where needed.

JERM Extraction Architecture. [Diagram components: project repositories, harvester, data handler, classifier/dispatcher, template recognizer, extractor, mapper, parser; outputs: data and metadata.]
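To make the roles of these components concrete, here is a minimal sketch of how a classifier/dispatcher, template recognizer, extractor and mapper could cooperate on one spreadsheet. Everything in it is hypothetical: the template names, column names and common field names are invented for illustration and are not the JERM model or the SysMO-SEEK code.

```python
# Hypothetical templates: project column name -> common field name.
TEMPLATES = {
    "growth_curve_v1": {
        "Strain": "strain",
        "Time (h)": "time_h",
        "OD600": "optical_density",
    },
    "enzyme_assay_v1": {"Enzyme": "enzyme", "Vmax": "vmax", "Km": "km"},
}


def recognize_template(header):
    """Template recognizer: match a header row against known templates."""
    for name, mapping in TEMPLATES.items():
        if set(mapping) <= set(header):
            return name
    return None


def extract(rows, template):
    """Extractor: pull the template's columns out of the sheet rows."""
    header, body = rows[0], rows[1:]
    columns = {col: header.index(col) for col in TEMPLATES[template]}
    return [{col: row[i] for col, i in columns.items()} for row in body]


def map_to_common(records, template):
    """Mapper: rename project-specific columns to the common field names."""
    mapping = TEMPLATES[template]
    return [{mapping[k]: v for k, v in rec.items()} for rec in records]


def dispatch(rows):
    """Classifier/dispatcher: route a sheet to the matching extractor."""
    template = recognize_template(rows[0])
    if template is None:
        # Unknown layout: keep the file, rely on hand-added metadata.
        return {"template": None, "records": []}
    records = map_to_common(extract(rows, template), template)
    return {"template": template, "records": records}


sheet = [["Strain", "Time (h)", "OD600"], ["WT", 0, 0.05], ["WT", 1, 0.11]]
result = dispatch(sheet)
print(result["template"], result["records"][0])
```

Sheets that match no template fall through untouched, which fits the "just enough" idea: data is still stored, and structure is added only where a template exists.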

Oops: Some projects not prolonged. Need all project data in the system fast, so…

JERM Extraction Architecture (revisited). [Diagram components: project repositories, harvester, data handler, classifier/dispatcher, template recognizer, extractor, mapper, parser; outputs: data and metadata.]

Lessons we're learning: Some interesting bits along the way

Subsetting: Don't overwhelm. Standards need to be comprehensive. Goal: "Minimum information"… (MIBBI). Tends to be a superset of what is needed for a project. Examples of non-applicable attributes: tissue of a single cell; gender. → Useful to use adapted subset templates. Experimental design selection list.

From biofolksonomy to ontology. Observation: fast-growing set of standards; standards are a moving target. Incremental approach: keyword annotation; controlled selection lists; home-brewed taxonomies; use of/contribution to standard ontologies. Provide migration tools: tags + suggestions; home-brewed taxonomy.
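The "tags + suggestions" step can be pictured as matching free-text tags against a controlled list and offering the closest terms. The sketch below uses Python's standard `difflib` for fuzzy matching; the vocabulary and the `suggest_terms` helper are assumptions made for illustration only.

```python
import difflib

# Hypothetical controlled selection list; a real one would come from the
# projects' home-brewed taxonomies or a standard ontology.
CONTROLLED_TERMS = ["glycolysis", "glucose", "chemostat", "Bacillus subtilis"]


def suggest_terms(tag, terms=CONTROLLED_TERMS, limit=3):
    """Suggest controlled terms that closely resemble a free-text tag."""
    lowered = {term.lower(): term for term in terms}
    matches = difflib.get_close_matches(
        tag.lower(), list(lowered), n=limit, cutoff=0.6
    )
    return [lowered[m] for m in matches]


print(suggest_terms("glycolisis"))   # a misspelt user tag
print(suggest_terms("zzzz"))         # nothing similar: keep as a free tag
```

Tags with no close match simply stay as keywords, so users are nudged towards the controlled list without being blocked by it.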

A word on software. Template tooling: Excel, Java. SysMO-SEEK (open source under Apache license): Ruby on Rails (convention over configuration); libraries and plugins, some Rails-specific (e.g. acts_as_authenticated); Solr & Lucene introduce Java/Ruby. Database: MySQL, also tested with SQLite (to exclude DB dependencies).

Summary. SysMO-DB as a virtual meeting point for different flavours of systems biologists. SysMO-DB's mantra: just enough, just in time. Flexible JERM extraction architecture. Just enough metadata (incremental). A lot done, still a lot to do.

Challenges ahead… Social: PALs work great and are motivated. Now need moremoremore datadatadata. Technical: publishing into public repositories; search + exploration (the test for data quality); hierarchical faceted search; distributed search via Taverna workflows; more workflows via SysMO-SEEK; improve modelling support.

Bonus track: what if… …the average data quality is below par? → "Nagging functionality": remind people of potentially faulty metadata; give suggestions on what to improve and how; give the possibility to create automatic mappings.
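A nagging function of this kind could be as simple as checking each metadata record against a short list of expectations and turning the gaps into reminders. The following is a sketch under assumed field names and an assumed organism list; none of it is taken from SysMO-SEEK.

```python
# Hypothetical expectations for a metadata record.
REQUIRED_FIELDS = ("title", "organism", "contact")
KNOWN_ORGANISMS = {"Bacillus subtilis", "Saccharomyces cerevisiae"}


def nag(record):
    """Return reminder messages for missing or suspicious metadata."""
    messages = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            messages.append(f"'{field}' is missing: please add it.")
    organism = record.get("organism")
    if organism and organism not in KNOWN_ORGANISMS:
        messages.append(
            f"Organism '{organism}' is not in the selection list: "
            "please check the spelling or propose a new entry."
        )
    return messages


record = {"title": "Growth curves, run 3", "organism": "B. subtilis"}
for message in nag(record):
    print(message)
```

Each message says what to improve and how, matching the slide's point that reminders should come with suggestions.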

Thanks. EML people: Isabel, Olga. UMAN people: Carole, Katy, Finn, Stuart, Sergejs. Jacky at Stellenbosch. BBSRC, BMBF, KTF. …and Microsoft for sponsoring this workshop.

End + questions

END