Service Discovery in myGrid and the BioCatalogue, a Life Science Service Registry
Katy Wolstencroft, myGrid, University of Manchester


Lots of Resources
- NAR 2008: over 1000 databases

Taverna Workflow Workbench
- Design and execution of workflows
- Access to local and remote resources and analysis tools
- Automation of data flow
- Iteration over large data sets
- Part of the myGrid project

Who Uses Taverna?
- Access to public service operations
- 55,000+ SourceForge downloads
- 10,000+ downloads of v[...]
- [...] downloads per day
- Ranked 148 for SourceForge activity (11 Nov 2008)
- 350+ known organisations
- 17 known commercial active users at any one time
- Users throughout UK, USA, Europe, SE Asia and South America

Known user organisations:
- Netherlands Bioinformatics Centre
- Genome Canada Bioinformatics Platform
- BioMOBY
- US iPlant Consortium
- US FLOSS social science program
- RENCI
- French SIGENAE farm animals project
- ThaiGrid
- CARMEN Neuroscience project
- SPINE consortium
- EU ENFIN, EMBRACE, BioSapiens, CASIMIR
- EU SysMO Consortium
- NEBC, the NERC Environmental Bioinformatics Centre
- Bergen Centre for Computational Biology
- Max Planck Institute for Plant Breeding Research
- Genoa Cancer Research Centre
- AstroGrid
- caBIG/caGRID

What do Scientists use Taverna for?
- Data gathering, annotation and model building
- Data analysis from distributed tools
- Data mining and knowledge management
  - Hypothesis generation and modelling, and text mining
- Data curation and warehouse population
- Parameter sweeps and simulation

Application areas:
- Systems biology model building
- Proteomics
- Sequence analysis
- Protein structure prediction
- Gene/protein annotation
- Microarray data analysis
- QTL studies
- QSAR studies
- Chemoinformatics
- Medical image analysis
- Public health care epidemiology
- Heart model simulations
- High throughput screening
- Phenotype studies
- Phylogeny
- Statistical analysis
- Text mining
- Astronomy, Music, Meteorology

Open Source Workflow Environment for Scientists
- Create and run workflows
- Create and manage services as components (API Consumer)
- Share, discover and reuse workflows
- Manage the metadata needed and generated (RDF, OWL)
- Discover and reuse services (Feta)

Workflow Reuse
- Workflows allow high throughput experiments and automation
- Workflows are encapsulations of experiments
- Workflows developed for one experiment can be reused for others
- Easier to share, reuse and repurpose
- The METHODS section of a scientific publication

Recycling, Reuse, Repurposing
- Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle.
- Paul meets Jo. Jo is investigating mouse Whipworm infection.
- Jo reuses one of Paul's workflows without change.
- Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite.
- Previously, a manual two-year study by Jo had failed to do this.

Where are the Services From?
- Over 3500 services available
- Major service providers
  - European Bioinformatics Institute
  - DNA Data Bank of Japan
  - NCBI (USA)
- 'Boutique' services
  - Individual research labs producing public data sets
  - Specialist tools for niche experiments
- We are not service providers

What types of services?
- HTML
- WSDL Web Services
- BioMart
- R-processor
- BioMoby
- Soaplab
- Local Java services
- Beanshell
- Workflows
- Coming soon: REST, Matlab
- Variable or non-existent documentation or help

Taverna in an 'open' world
Advantages
- Connection to lots of resources
- Flexible system
- Can adapt to new technologies
Disadvantages
- Services are developed for other purposes
- We can't control how they work
- We have to deal with the heterogeneity

Finding Services
When using services, scientists need to:
- Find them: in distributed locations, produced by different host institutions
- Interpret them: what do the services do, and what experiments can they perform using them?
- Know how to invoke them: what data and initial parameters do they need to supply?

Metadata from a WSDL
- Pathport Web service from the Virginia Bioinformatics Institute
- Name of the service
- Uninformative names for parameters
- What kind of string?
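To make the problem concrete, here is a minimal Python sketch that lists the message parts a WSDL declares. The WSDL fragment, the service name `ExampleSearchService` and the parameter names are invented for illustration; they are not taken from the Pathport service. The point is only that names like `in0` typed as `xsd:string` tell a scientist nothing about what data to supply.

```python
import xml.etree.ElementTree as ET

# Namespace of WSDL 1.1 documents.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical WSDL fragment, invented for this example only.
SAMPLE_WSDL = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             name="ExampleSearchService">
  <message name="searchRequest">
    <part name="in0" type="xsd:string"/>
    <part name="in1" type="xsd:string"/>
  </message>
  <message name="searchResponse">
    <part name="searchReturn" type="xsd:string"/>
  </message>
</definitions>
"""


def list_message_parts(wsdl_text):
    """Return (message name, part name, declared type) for every message part."""
    root = ET.fromstring(wsdl_text.strip())
    parts = []
    for message in root.findall("wsdl:message", WSDL_NS):
        for part in message.findall("wsdl:part", WSDL_NS):
            parts.append((message.get("name"), part.get("name"), part.get("type")))
    return parts


for message_name, part_name, part_type in list_message_parts(SAMPLE_WSDL):
    print(f"{message_name}: {part_name} ({part_type})")
```

Everything the interface exposes here is syntactic: a sequence, an accession number and a free-text query would all appear as the same kind of string.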

Semantics and Web Services
- SAWSDL: Semantic Annotations for WSDL working group
- Virtually no uptake by bioinformatics service providers
- Doesn't address non-WSDL services
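For readers who have not seen SAWSDL, the sketch below shows roughly what a provider would have to add: a `sawsdl:modelReference` attribute pointing at an ontology concept. The schema fragment and the `http://example.org/ontology#...` URI are assumptions made up for this example; only the SAWSDL namespace and attribute name come from the W3C recommendation.

```python
import xml.etree.ElementTree as ET

SAWSDL = "http://www.w3.org/ns/sawsdl"
XSD = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment; the ontology URI is a placeholder.
SAMPLE_SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sawsdl="http://www.w3.org/ns/sawsdl">
  <xsd:element name="in0" type="xsd:string"
      sawsdl:modelReference="http://example.org/ontology#protein_sequence"/>
  <xsd:element name="in1" type="xsd:string"/>
</xsd:schema>
"""


def model_references(schema_text):
    """Map each element name to the list of concept URIs it is annotated with."""
    root = ET.fromstring(schema_text.strip())
    refs = {}
    for element in root.iter(f"{{{XSD}}}element"):
        # modelReference holds a whitespace-separated list of URIs.
        value = element.get(f"{{{SAWSDL}}}modelReference", "")
        refs[element.get("name")] = value.split()
    return refs


print(model_references(SAMPLE_SCHEMA))
```

An unannotated element such as `in1` yields an empty list, which is the situation for almost all bioinformatics services described on this slide.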

Adding Semantics: Annotating Services
- Find services by their function instead of their name
- The services might be distributed, but a registry of service descriptions can be central and queried
- We need to annotate services with semantics
- In myGrid, we use the Feta Semantic Discovery tool, a semantic annotation tool, and expert curation

myGrid Ontology
Logically separated into two parts:
- Service ontology: physical and operational features of (web) services
- Domain ontology (Semantic Content Model): annotation vocabulary for core bioinformatics data, data types and their relationships

Service Ontology
- Models services from the point of view of the scientist
  - Where is it?
  - How many inputs/outputs?
  - Who hosts it?
- Invocation details are hidden by the Taverna workbench
- Differs from related initiatives in this respect

Domain Ontology
- Informatics: captures the key concepts of data, data structures, databases and metadata.
- Bioinformatics: the domain-specific data sources (e.g. the model organism sequencing databases) and domain-specific algorithms for searching and analyzing data (e.g. the sequence alignment algorithm ClustalW).
- Molecular biology: concepts include, for example, protein sequence and nucleic acid sequence.
- Formats: a hierarchy describing bioinformatics file formats, for example FASTA format for sequence data or PHYLIP format for phylogenetic data.
- Tasks: a hierarchy describing the generic tasks a service operation can perform. Examples include retrieving, displaying and aligning.

Example Service Annotation
Example: BLAST from the DDBJ
- Performs task: Alignment
- Uses method: Similarity Search Algorithm
- Uses resources: DNA/Protein sequence databases
- Inputs:
  - biological sequence (and format)
  - database name (and format)
  - blast program (and format)
- Outputs: Blast Report
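The annotation above can be pictured as a small structured record. The following Python sketch is one possible in-memory shape for it; the class name `ServiceAnnotation` and its field names are hypothetical and do not reflect how Feta or the myGrid ontology actually store annotations. The values are the ones listed on the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceAnnotation:
    """Hypothetical record holding the annotation properties named on the slide."""
    name: str
    provider: str
    performs_task: str
    uses_method: str
    uses_resources: List[str] = field(default_factory=list)
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


ddbj_blast = ServiceAnnotation(
    name="BLAST",
    provider="DDBJ",
    performs_task="Alignment",
    uses_method="Similarity Search Algorithm",
    uses_resources=["DNA/Protein sequence databases"],
    inputs=["biological sequence", "database name", "blast program"],
    outputs=["Blast Report"],
)


def accepts(annotation, input_concept):
    """True if the service is annotated as taking the given kind of input."""
    return input_concept in annotation.inputs


print(accepts(ddbj_blast, "biological sequence"))
```

With descriptions in this form, a registry can answer "which services take a biological sequence?" without anyone having to read parameter names.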

myGrid Ontology
- First version of the ontology ~2002
- Originally developed in DAML+OIL
- Now developed in OWL, with a version exported to RDFS
- Number of classes in the ontology ~750
- Domain and service ontology used by myGrid users and developers of myGrid-related plugins
- Service ontology also used by BioMoby
- W3C compliant with respect to ontology modelling

How do we use the ontology?
Two methods of service description:
1. Decision Support (querying)
   - Composite matches to ontology terms
   - Multiple terms are used to query the annotations
2. Decision Making (reasoning)
   - Single description: whole service model
   - Enables automated detection of service mismatches
   - Enables possibility of automated addition of services
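As a rough illustration of the querying approach, the sketch below matches a multi-term query against annotations, treating a term as satisfied by any of its subclasses. The tiny `PARENT` hierarchy, the service names and the property names are all invented for the example; the real ontology has around 750 classes and is queried through Feta, not through code like this.

```python
# Hypothetical fragment of a class hierarchy: child -> parent.
PARENT = {
    "protein_sequence": "biological_sequence",
    "nucleic_acid_sequence": "biological_sequence",
    "pairwise_aligning": "aligning",
    "aligning": "task",
    "retrieving": "task",
}

# Hypothetical annotated services.
SERVICES = {
    "service_a": {"task": "pairwise_aligning", "input": "protein_sequence"},
    "service_b": {"task": "retrieving", "input": "database_id"},
    "service_c": {"task": "aligning", "input": "nucleic_acid_sequence"},
}


def is_a(term, ancestor):
    """True if term equals ancestor or is one of its descendants."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


def find_services(**query):
    """Return services whose annotations satisfy every term in the query."""
    return sorted(
        name
        for name, annotation in SERVICES.items()
        if all(is_a(annotation.get(prop), term) for prop, term in query.items())
    )


print(find_services(task="aligning", input="biological_sequence"))
```

Each extra query term narrows the result, which is the "composite match" idea: no single term has to describe the whole service.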

Curation Sweatshop
- Steady increase in numbers of services and workflows
- Users able to find annotated services
BUT
- Time-consuming and expensive
- More and more services built daily
SO
- Should we encourage service providers to add value?
- Should we get users involved?

- Collaboration between University of Manchester and EBI
- Drawing on 6 years' experience in Taverna of semantic annotation of services using RDF and OWL ontologies
- Drawing on experience at EBI in service provision
- Drawing on experience of social curation and networking from myExperiment
- First pilot December 2008

Getting the Minimum
Community annotation:
- Must be easy and quick
- Must allow partial descriptions
- Multiple annotations of the same service
What is the minimum information to enable
- service discovery
- service invocation

Grading Services
- Bronze: enough to locate the service; example of service invocation
- Silver
- Gold
- Platinum: full description
  - All properties annotated, including dependencies between them
  - Reliability metrics
  - AND CHECKED AND VERIFIED BY A CURATOR

Automatic Annotation
- Inferring service descriptions from workflows
- Gathering usage data
  - How many workflows use this service
- Gathering reliability data (monitoring)
  - When is this service available
  - How many times does it fail
- Helps with "shopping" for services
  - People who used this service also used this service
  - Top 10 services
  - Services that do the same things
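The usage statistics mentioned here are simple to derive once workflows are available as lists of the services they call. The sketch below is a toy version under that assumption; the workflow and service names are made up, and nothing here reflects how BioCatalogue or myExperiment actually compute these figures.

```python
from collections import Counter

# Hypothetical workflows, each listed with the services it invokes.
WORKFLOWS = {
    "wf1": ["blast", "fetch_sequence", "clustalw"],
    "wf2": ["blast", "fetch_sequence"],
    "wf3": ["clustalw", "render_tree"],
}


def usage_counts(workflows):
    """Count how many workflows use each service (once per workflow)."""
    counts = Counter()
    for services in workflows.values():
        counts.update(set(services))
    return counts


def also_used(workflows, service):
    """Count services that appear in the same workflows as the given service."""
    together = Counter()
    for services in workflows.values():
        unique = set(services)
        if service in unique:
            together.update(unique - {service})
    return together


print(usage_counts(WORKFLOWS)["blast"])
print(also_used(WORKFLOWS, "blast").most_common(1))
```

Sorting `usage_counts` gives a "top 10" style ranking, and `also_used` is the "people who used this service also used" list; neither needs any manual curation.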

Annotation Provenance
- Who said what about what?
- Harvesting community annotation
- Verifying and augmenting by a curator
- 'Trust' models
- Annotation versions
  - In a workflow context
  - As stand-alone services
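One way to read "who said what about what" is as a record that keeps the author and any curator sign-off next to each annotation. The sketch below assumes that shape; the `Annotation` class, its fields and the "prefer curator-verified values" rule are illustrative guesses, not the trust model the slide refers to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record: who (author) said what (prop, value) about what (service)."""
    service: str
    prop: str
    value: str
    author: str
    verified_by: Optional[str] = None  # curator who checked it, if any


def preferred_values(annotations, service, prop):
    """Return curator-verified values if any exist, else all community values."""
    relevant = [a for a in annotations if a.service == service and a.prop == prop]
    verified = [a for a in relevant if a.verified_by is not None]
    chosen = verified or relevant
    return sorted({a.value for a in chosen})


annotations = [
    Annotation("blast", "task", "aligning", author="user1", verified_by="curator1"),
    Annotation("blast", "task", "searching", author="user2"),
    Annotation("blast", "output", "Blast Report", author="user2"),
    Annotation("blast", "output", "report", author="user3"),
]

print(preferred_values(annotations, "blast", "task"))
print(preferred_values(annotations, "blast", "output"))
```

Keeping every annotation, rather than overwriting, is what lets multiple descriptions of the same service coexist until a curator verifies one.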

Feta Model
- Semantic Content Model
- Service Model

BioCatalogue Service Profile
[Diagram labels: Curation Model, Quantitative Content, Tags, Service Model, Semantic Content Model, Ontologies, Functional, Provenance, Operational Metrics, Conditions of Use, Social Standing]

[Diagram labels] A.N. Other Curation Quant've Service Model Semantic Content Model Execution Host Service Profile Finding WSDL WADL S-A.N. Other SAWSDL SA-REST Analytics Ranking Browse/Shop Search Customised Service Workflow

Annotation Process

BioCatalogue: The pilot
Features:
- User Registration
- Service Registration
- Search
- Annotation
- Notification
- Integration with myExperiment

For More Information
- BioCatalogue website
- BioCatalogue wiki
- myGrid website

myGrid Team

Services
[Diagram labels] Interface Neutral Functional Conditions of Use Operational Social Standing Operational Metrics Provenance Multiply described Third Party Aggregated Feeds Monitoring Multiple Sources Multiple Versions Dynamic Multiple Instances Discovery Interoperability Composition Reuse Trusted Authorities Policies Ranking