A BioCatalogue Cataloguing Web Services for the Life Science Community Carole Goble, Khalid Belhajjame, Robert Stevens, Jiten Bhagat, Franck Tanoh, Katy.


BioCatalogue: Cataloguing Web Services for the Life Science Community
Carole Goble, Khalid Belhajjame, Robert Stevens, Jiten Bhagat, Franck Tanoh, Katy Wolstencroft, Steve Pettifer
University of Manchester, UK
Rodrigo Lopez, Thomas Laurent, Hamish McWilliams, Eric Nzuobontane
European Bioinformatics Institute, UK
David De Roure, myExperiment

Web Services in the Life Sciences
Programmatic interfaces to services are on the rise.
EMBL-European Bioinformatics Institute
–3 million/month accesses to Web Service APIs
–1 million/month compute jobs; > 50% are over WS
Guesstimate services.
Why?
–Specialisation and segregation of methods from monolithic servers.
–How one should publish data.
–Automated Life Science applications, like workflow systems: Taverna, Kepler, Triana, Trident, KNIME, BPEL ...

Chain stores and Boutiques
Major data centres and national centres
–EMBL-EBI (UK), DDBJ, PDBJ (Japan), NCBI, SDSC PDB (USA)
Investigator and community projects
–Kanehisa Laboratory, Kyoto, Japan
–BASIS, University of Newcastle, UK
–Biomolecular Interaction Network Database, BIND, University of Toronto, Canada
–Institute of Bioinformatics, Tsinghua University, China
–EMAP, Edinburgh Mouse Atlas Project, UK
–The Chemical Informatics and Cyberinfrastructure Collaboratory (CICC), Indiana University, USA
and more and more...
Variable sustainable stewardship

Service Flavours
Generalist
–SOAP
–REST
Specialist
–DAS (Distributed Annotation System)
–BioMOBY

Web Services in the Wild
Visible? Findable?
–“EMMA” is the ClustalW multiple sequence alignment program from the EMBOSS suite.
–Poor adoption for providers.
–Forum for advertising and shopping.
Executable?
–WSDL, WADL, WSDL2, other kinds of services.
–Transcend the specific grounding.

Web Services in the Wild
Understandable?
–Input0: string, Output0: string?
–What does the SeqRet actually do?
–Examples? Example data? Example parameter configurations? Input-output correlations?
–Adequate documentation for anonymous reuse.
Usable? Available?
–Quality of Service, robustness, test scripts?
–Stability and dependability (see BioMART)?
–Licensing, execution restrictions?
–Trust and risk.
–Monitoring and intelligence gathering.

Metadata from a WSDL
Pathport Web service from the Virginia Bioinformatics Institute
–Name of the service
–Uninformative names for parameters
–What kind of string?
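To make the point about thin WSDL metadata concrete, here is a minimal sketch that pulls the service name, operations and message parts out of a WSDL 1.1 document using only the Python standard library. The sample WSDL is invented for illustration (it is not the Pathport description), and the function names and the regular expression for "uninformative" names are assumptions of this sketch.

```python
import re
import xml.etree.ElementTree as ET

# WSDL 1.1 namespace (the standard one).
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical WSDL fragment, invented for this example.
SAMPLE_WSDL = """<definitions name="ExampleService"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <message name="runRequest">
    <part name="input0" type="xsd:string"/>
    <part name="input1" type="xsd:string"/>
  </message>
  <message name="runResponse">
    <part name="output0" type="xsd:string"/>
  </message>
  <portType name="ExamplePort">
    <operation name="run">
      <input message="runRequest"/>
      <output message="runResponse"/>
    </operation>
  </portType>
</definitions>"""

# Assumed heuristic: names like "input0" or "arg1" say nothing about content.
GENERIC_NAME = re.compile(r"^(in|input|out|output|arg|param|return)\d*$", re.IGNORECASE)


def extract_metadata(wsdl_text):
    """Return the service name, operation names and (message, part, type) triples."""
    root = ET.fromstring(wsdl_text)
    parts = []
    for message in root.findall("wsdl:message", WSDL_NS):
        for part in message.findall("wsdl:part", WSDL_NS):
            parts.append((message.get("name"), part.get("name"), part.get("type")))
    operations = [
        op.get("name")
        for op in root.findall("wsdl:portType/wsdl:operation", WSDL_NS)
    ]
    return {"service": root.get("name"), "operations": operations, "parts": parts}


def uninformative_parts(metadata):
    """List the part names that match the generic-name heuristic."""
    return [name for _, name, _ in metadata["parts"] if GENERIC_NAME.match(name)]


metadata = extract_metadata(SAMPLE_WSDL)
print(metadata["service"], metadata["operations"])
print(uninformative_parts(metadata))
```

Everything recovered here is a name and an XML Schema type; nothing says what kind of string is expected, which is the gap the catalogue's annotations are meant to fill.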

Result? Reinvention and reduce time to insights….

Cataloguing Services
Investigator and project specific registries
–EMBRACE, BioSapien, Stargate Portal
Community lists
–Bioinformatics Links Directory, BioLinks, BioPlanet
Project specialist registries
–BioMOBY Central, DAS Registry, myGrid Registry, Sswap
General catalogues and search engines
–SeekDa!, Web Services List, XMethods
Sustainability and curation
Accessibility
Rich annotation & customisation
Provider engagement

A reliable, trusted, up to date and sustained catalogue customised for the Life Sciences.
–EBI curation and service commitment
Discovery interface for decision support.
–Drawing on myExperiment and EBI legacies
Community and specialist curation.
–Pooled and accumulative annotation.
–A platform for service monitoring and analytics.
Incorporated into applications and mashups.
–Itself a web service, with a (REST) API.
Let's Pool our Knowledge

Started June 08
Closed pilot Dec 08
Pilot release April 09
BioCatalogue-Friends focus group
Perpetual beta
Three year award

Influences

Curation Model Versioning Quantitative Content Tags Service Model Semantic Content Ontologies Functional Capabilities Provenance Operational Capabilities Operational Metrics Use Policy Social Standing Ratings Usage Statistics Attribution Service Profile Wheel Free text Searching Statistics

WADL External Descriptions Service Profile Discovery WSDL WSDL2 A.N. Other SAWSDL SA-REST Analytics Sorting Browse/Shop Search Customised Services Workflows Monitoring Profiles Ranking Validating Parse Generate Parse Invoke Searches Matchmaking

Discovery Decision Support Effective (anonymous) Reuse -> Palpability Automated service composition and validation Decision Making Pain Gain Modelling Functional Capability WSMO OWL-S SAWSDL ……. Tags Ontology myGrid Service Ontology Text Descriptions [Lord et al 2004]

Grounding WSDL Informatics Bioinformatics Molecular Biology Formats Tasks Inputs Outputs Operations Domain Content Service features Task Method Resource Service myGrid Functional Capability Ontology W3C OWL and RDFS Number of classes ~750 myGrid and BioMOBY [Wroe 2003]

Example
BLAST from the DDBJ
Performs task: Alignment
–Uses Method: Similarity Search Algorithm
–Uses Resources: DNA/Protein sequence databases
–Inputs: biological sequence (and format), database name (and format), blast program (and format)
–Outputs: Blast Report
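The BLAST description above can be written down as a small structured record. This is only a sketch: the class and field names (FunctionalProfile, Parameter, find_by_task) are hypothetical and are not the myGrid ontology or the BioCatalogue data model. Formats are left unset because the slide does not name them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Parameter:
    """One input or output; the format is optional because it may be unknown."""
    name: str
    data_format: Optional[str] = None


@dataclass
class FunctionalProfile:
    """Hypothetical record for the task / method / resource / input / output facets."""
    service: str
    provider: str
    task: str
    method: str
    resources: List[str] = field(default_factory=list)
    inputs: List[Parameter] = field(default_factory=list)
    outputs: List[Parameter] = field(default_factory=list)


# Values taken from the slide; formats are not stated there, so they stay None.
blast = FunctionalProfile(
    service="BLAST",
    provider="DDBJ",
    task="Alignment",
    method="Similarity Search Algorithm",
    resources=["DNA/Protein sequence databases"],
    inputs=[
        Parameter("biological sequence"),
        Parameter("database name"),
        Parameter("blast program"),
    ],
    outputs=[Parameter("Blast Report")],
)


def find_by_task(profiles, task):
    """Return the profiles whose task matches, ignoring case."""
    return [p for p in profiles if p.task.lower() == task.lower()]


print([p.service for p in find_by_task([blast], "alignment")])
```

A record like this is what lets a search ask for "anything that performs Alignment" rather than matching on operation names.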

Free text and tagging in the user’s language
Smart interfaces for people
Semantically annotated services for driving interfaces and automated processing

Workflows and Services Experts Social by User Community refine validate refine validate Self by Service Providers seed refine validate seed Automated refine validate seed Content Capture and Curation

People-Powered Registration
By Provider and by Proxy. Ownership.
Incentives
Completeness vs Cost.
Relative rankings feedback.
Visibility and reputation (which may not always be flattering).
Do not presume that providers are unhelpful.

People-Powered Curation
Third party and Provider
Incentives.
–Quick and easy.
–Credit (and Blame).
Incremental and partial descriptions.
Peer review.
The Wisdom of the Wisdom of the Crowd
–Quality, Slander Content.
Distributed Human Grid of Annotators.
Annotation Jamborees. T-shirts.

Expert Curation
Added value of BioCatalogue
–Review
–Quality assurance and Trust
Enriched annotations
A curation pipeline.
–Tags to Ontologies.
–Ontology husbandry
A Sweatshop.
–How do we make this smarter?

Uniform Annotation model
Minimum for discovery and invocation
Partial annotations
Multiple annotations
Polymorphic: text, tags, statistics, ontologies
Annotation provenance
Trust
Curation pipeline and monitoring
Multiple providers
Multiple versions
Multiple deployments
Model elements: Service, Annotation, Assertion, Value (free text, tag term, ontology term), Provenance
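One way to picture the uniform annotation model is as a service that accumulates independent assertions, each carrying a typed value and its provenance. The sketch below is an assumption-laden illustration: the class names, field names and the source_type vocabulary are hypothetical and are not taken from the BioCatalogue code base.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ValueKind(Enum):
    """The three value kinds named on the slide."""
    FREE_TEXT = "free text"
    TAG = "tag term"
    ONTOLOGY_TERM = "ontology term"


@dataclass(frozen=True)
class Provenance:
    """Who asserted the annotation (hypothetical fields)."""
    source: str
    source_type: str  # assumed vocabulary: "provider", "user", "curator", "automated"


@dataclass(frozen=True)
class Annotation:
    """A single assertion about one attribute of a service."""
    attribute: str
    kind: ValueKind
    value: str
    provenance: Provenance


@dataclass
class Service:
    name: str
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, attribute, kind, value, provenance):
        """Add an assertion; earlier ones are kept, never overwritten."""
        self.annotations.append(Annotation(attribute, kind, value, provenance))

    def values_for(self, attribute):
        """All assertions made about one attribute, from any source."""
        return [a for a in self.annotations if a.attribute == attribute]


service = Service("BLAST")
service.annotate("task", ValueKind.TAG, "alignment",
                 Provenance("some user", "user"))
service.annotate("task", ValueKind.ONTOLOGY_TERM, "Alignment",
                 Provenance("some curator", "curator"))

for annotation in service.values_for("task"):
    print(annotation.kind.value, annotation.value, annotation.provenance.source_type)
```

Keeping every assertion, rather than one value per attribute, is what allows partial and multiple annotations to coexist and lets trust be judged from provenance afterwards.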

Ranking, Sorting, Filtering and Comparing
Grading: bronze -> platinum
Presence, quantity and quality
Judgement by the users, not us.
Usable and Useful
Understandable
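A grading scheme based on presence, quantity and quality of annotation could be sketched as below. The slide gives only the endpoints (bronze and platinum); the intermediate tiers, the thresholds and the use of ontology-backed attributes as the quality signal are all assumptions made for this example.

```python
# Assumed tier names; only "bronze" and "platinum" appear on the slide.
GRADES = ["bronze", "silver", "gold", "platinum"]


def grade(annotations):
    """Grade a service from its annotations.

    annotations: list of (attribute, kind) pairs, where kind is one of
    "text", "tag" or "ontology". Thresholds are illustrative only.
    """
    if not annotations:
        return None  # presence: nothing annotated, so no grade at all

    # Quantity: how many distinct attributes have been described.
    attributes = {attribute for attribute, _ in annotations}
    # Quality (assumed proxy): attributes described with an ontology term.
    ontology_backed = {
        attribute for attribute, kind in annotations if kind == "ontology"
    }

    level = 0
    if len(attributes) >= 3:
        level = 1
    if len(attributes) >= 3 and ontology_backed:
        level = 2
    if len(attributes) >= 5 and len(ontology_backed) >= 3:
        level = 3
    return GRADES[level]


example = [("description", "text"), ("task", "ontology"), ("input", "tag")]
print(grade(example))
```

The grade is only a sortable summary of how well a service is described; in keeping with the slide, the judgement about whether to use the service stays with the user.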

Auto Curation
Auto scavenging
–SeekDa!
Auto Annotation
–Specialist parsing
–Auto-tagging
–Text mining
–Inferring service descriptions from myExperiment workflows (Quasar framework)
Auto Monitoring
–Test Workflows / scripts
–Service monitoring
–Feeds from applications and third parties: dial home diagnostics, customer reports, predicted down times
Auto Usage Analytics
–Workflow usage
–Search patterns

Quasar
Quality Assurance of Semantic Annotations for Services
Using mismatch-free workflows to infer information about the semantics of linked parameters [K. Belhajjame 2008, 2006]
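The idea of learning about a parameter from what it is connected to can be illustrated with a toy propagation step. This is not the Quasar algorithm itself, only a simplified sketch of the intuition: if a workflow is known to be free of mismatches, an unannotated parameter linked to an annotated one is probably compatible with the same concept. The parameter names and the function name are hypothetical.

```python
from collections import defaultdict


def infer_candidates(links, known):
    """Suggest concepts for unannotated parameters.

    links: iterable of (output_parameter, input_parameter) pairs taken from
           workflows assumed to be mismatch-free.
    known: dict mapping parameter name to its semantic annotation.
    Returns a dict mapping each unannotated parameter to candidate concepts.
    """
    candidates = defaultdict(set)
    for source, sink in links:
        if source in known and sink not in known:
            # The input accepted data of this concept without a mismatch.
            candidates[sink].add(known[source])
        elif sink in known and source not in known:
            # The output was accepted by an input expecting this concept.
            candidates[source].add(known[sink])
    return dict(candidates)


# Hypothetical workflow links and annotations.
links = [
    ("fetch.sequence", "align.query"),
    ("align.report", "parse.input"),
]
known = {
    "fetch.sequence": "biological sequence",
    "parse.input": "Blast Report",
}

for parameter, concepts in sorted(infer_candidates(links, known).items()):
    print(parameter, sorted(concepts))
```

The output is a set of candidates for a curator to confirm, not an authoritative annotation; a real system would also use the ontology's subsumption hierarchy to narrow or widen each suggestion.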

Users Services Discovery Curation Monitoring Integration registration registration test scripts registration dashboard scavenging tagging WSDL parsing seeded controlled vocab. text search sorting on criteria and categories ownership REST API Open Search myExperiment Identity management account management profile management WSDL monitoring live tests SOAP services versions instances 500 services 250 full curated specialist parsers. bookmarking notification QoS app feeds browse and drill down recommendations ratings recommendations. tag search usage-based Identity management Content Batch migration Provider engagement Policy identification Pilot

SeekDa! BioMOBY Central DAS Registry Feed Migrate EMBRACE BioSapien myGrid Feed and Cross-link BioLinks Scrap myExperiment Code Base Content for Pilot

Workflow analytics Alternative access Discovery access Curation application Service use feeds Workflow Management System Integration Pilots REST API

So why is it taking so damn long to get here?
The final 9 yards and 80:20 rule.
All or nothing.
Dedicated resources and best intentions.
Content, content, content.
Being too damn, and unnecessarily, clever.
A social activity

BioCatalogue Team
Thomas Laurent, Hamish McWilliams, Franck Tanoh, Jiten Bhagat, Carole Goble, Rodrigo Lopez, Eric Nzuobontane, Mark Wilkinson, Holger Lausen

Further information
Join our friends
Supply technology!
Carole Goble, Robert Stevens, Duncan Hull, Katy Wolstencroft, Rodrigo Lopez, Data Curation + Process Curation = Data Integration + Science, Briefings in Bioinformatics, in press