Selected Workflow and Semantic Experiences from myGrid
Professor Carole Goble, The University of Manchester, UK

2 myGrid: a UK e-Science project to build middleware for in silico experiments by individual life scientists who are stuck in under-resourced labs and use other people's applications: sequence analysis, microarray analysis, proteomics, chemoinformatics, image processing, rendering Dilbert cartoons. (Slide background: a raw DNA sequence.)

3 Two tiers of services. The myGrid services, neat and controlled: workflow, data management, provenance management, browser clients, service discovery, etc., built on an open, extensible service-oriented architecture (Web services, APIs, e-Science events, messages, a plug-in framework, an information model). The domain services, scruffy and independent: BioMart, BioMOBY, NCBI, EMBL-EBI, R packages, SeqHound, EMBOSS, PubMed, caBIG, etc. None of them are ours, and not many of them offer WSDL.

4 The open-world burden: independent third-party service providers; independent, unknown users; no expectation of compatibility or compliance between domain services; no single application (the work is data-pipeline focused); no common domain data model. So: lightweight, and jam today.

5 Workflows: an explicit, exposed description for the scientist of how to do things, of what was done, and of the provenance of the results. Workflows are easier to explain, share, relocate, reuse and repurpose. They take the user's viewpoint. Pattern books and workflow catalogues: a market of workflows.


7 How do we hide the complexity of interoperating these domain services? Bury it. (Diagram: the Freefluo workflow enactor calls out through processors, one per kind of service: plain Web service, Soaplab, local Java application, another enactor, BioMOBY, WSRF, BioMart, Styx client and R package.)
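The "bury it" design, one enactor reaching many kinds of service through interchangeable processor plug-ins, can be sketched as below. This is a minimal Python illustration under stated assumptions, not Freefluo's actual API: `Processor`, `LocalProcessor`, `RemoteServiceProcessor`, `enact` and `fake_transport` are hypothetical names, and the "remote" call is faked locally so the example runs offline.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Tuple


class Processor(ABC):
    """The only interface the enactor depends on (hypothetical)."""

    @abstractmethod
    def invoke(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Consume named input ports and return named output ports."""


class LocalProcessor(Processor):
    """Wraps an in-process callable (cf. the 'local Java app' processor)."""

    def __init__(self, fn: Callable[..., Dict[str, Any]]):
        self.fn = fn

    def invoke(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return self.fn(**inputs)


class RemoteServiceProcessor(Processor):
    """Hypothetical remote-service plug-in. A real one would serialise inputs,
    call the service and parse the reply; `transport` stands in for that."""

    def __init__(self, endpoint: str,
                 transport: Callable[[str, Dict[str, Any]], Dict[str, Any]]):
        self.endpoint = endpoint
        self.transport = transport

    def invoke(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return self.transport(self.endpoint, inputs)


# (step name, processor, {input port: key of an earlier value})
Step = Tuple[str, Processor, Dict[str, str]]


def enact(steps: List[Step], data: Dict[str, Any]) -> Dict[str, Any]:
    """Run steps in order, storing each output as '<step>.<port>'."""
    data = dict(data)
    for name, processor, bindings in steps:
        inputs = {port: data[key] for port, key in bindings.items()}
        for port, value in processor.invoke(inputs).items():
            data[f"{name}.{port}"] = value
    return data


def fake_transport(endpoint: str, inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Offline stand-in for a remote GC-content service at `endpoint`."""
    seq = inputs["sequence"].lower()
    return {"gc": (seq.count("g") + seq.count("c")) / len(seq)}


workflow: List[Step] = [
    ("upper", LocalProcessor(lambda sequence: {"out": sequence.upper()}),
     {"sequence": "input"}),
    ("gc", RemoteServiceProcessor("http://example.org/gc", fake_transport),
     {"sequence": "upper.out"}),
]
result = enact(workflow, {"input": "acatttctac"})
print(result["upper.out"], result["gc.gc"])  # ACATTTCTAC 0.3
```

Because the enactor only ever calls `invoke()`, supporting a new kind of service means writing one more processor rather than changing the enactor.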

8 How do we cope with data incompatibility between services? Fix up the services to be compatible, using shims: libraries of adapters.
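A shim is an adapter between the format one service emits and the format the next one expects. The sketch below assumes a hypothetical registry keyed by (produced format, expected format); the format names and the single-record FASTA handling are illustrative, and a real shim library would need to cover far more cases.

```python
from typing import Callable, Dict, Tuple

# Hypothetical shim library: adapters keyed by (produced format, expected format).
SHIMS: Dict[Tuple[str, str], Callable[[str], str]] = {}


def shim(produced: str, expected: str):
    """Decorator registering an adapter from `produced` to `expected` format."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        SHIMS[(produced, expected)] = fn
        return fn
    return register


@shim("fasta", "raw_sequence")
def fasta_to_raw(text: str) -> str:
    """Drop the '>' header line and join the sequence lines (single record assumed)."""
    return "".join(
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.startswith(">")
    )


@shim("raw_sequence", "fasta")
def raw_to_fasta(seq: str) -> str:
    """Wrap a bare sequence in a minimal FASTA record with a placeholder header."""
    return f">unnamed\n{seq}\n"


def adapt(value: str, produced: str, expected: str) -> str:
    """Pass data through if the formats agree, otherwise apply a registered shim."""
    if produced == expected:
        return value
    try:
        adapter = SHIMS[(produced, expected)]
    except KeyError:
        raise ValueError(f"no shim from {produced!r} to {expected!r}") from None
    return adapter(value)


fasta = ">seq1 example\nacatttctac\ncaacagtgga\n"
print(adapt(fasta, "fasta", "raw_sequence"))  # acatttctaccaacagtgga
```

Keeping the adapters in one registry is one way to "hide 'em": the workflow author only states formats, and the matching shim is looked up rather than wired in by hand.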

9 Experience report: workflows and fragments of workflows are popular and get exchanged. Buy-in depends on the availability of MY service. A user-oriented workflow language hides a multitude of sins. Shims are OK, and we should hide them. Results management is a killer. We need workflow patterns and best practice. We did not use BPEL.

10 … services? 100s of workflows? How do I find anything? How do I know what works with what, and what it does? (Diagram: service model and ontology.)

11 Experience report: OWL reasoning to classify and match services. Capturing and curating the content is the bottleneck. Descriptions for people vs. descriptions for machines: for people, a little semantics goes a long way. Don't be too clever. Semantic Web Service models (OWL-S, WSMO, WSDL-S) are immature.
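The matching idea behind ontology-based service discovery can be illustrated without a full reasoner. The sketch below swaps in a much simpler technique than OWL reasoning, transitive subsumption over a hand-written is-a hierarchy: a service fits if it accepts what you have (or something more general) and produces what you want (or something more specific). The concepts and service descriptions are hypothetical, and an OWL reasoner would infer far more than named is-a links.

```python
from typing import Dict, List, Set

# Hypothetical mini-ontology: concept -> its direct parents (is-a).
IS_A: Dict[str, Set[str]] = {
    "DNA_sequence": {"Biological_sequence"},
    "Protein_sequence": {"Biological_sequence"},
    "SwissProt_seq": {"Protein_sequence"},
    "Blast_report": {"Report"},
}

# Hypothetical service descriptions: the concept each consumes and produces.
SERVICES: Dict[str, Dict[str, str]] = {
    "blastp": {"input": "Protein_sequence", "output": "Blast_report"},
    "blastn": {"input": "DNA_sequence", "output": "Blast_report"},
    "translate": {"input": "DNA_sequence", "output": "Protein_sequence"},
}


def ancestors(concept: str) -> Set[str]:
    """The concept itself plus everything that transitively subsumes it."""
    seen, stack = {concept}, [concept]
    while stack:
        for parent in IS_A.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumed_by(specific: str, general: str) -> bool:
    return general in ancestors(specific)


def match(have: str, want: str) -> List[str]:
    """Services that accept `have` and whose output is a kind of `want`."""
    return sorted(
        name for name, s in SERVICES.items()
        if subsumed_by(have, s["input"]) and subsumed_by(s["output"], want)
    )


print(match("SwissProt_seq", "Report"))               # ['blastp']
print(match("DNA_sequence", "Biological_sequence"))   # ['translate']
```

In the spirit of "a little semantics goes a long way", even this shallow hierarchy lets a user holding a SwissProt sequence find a Blast service described only in terms of protein sequences.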

12 Workflow outcomes: a record of the outcome data and its provenance. Store each data outcome under a unique ID, a Life Science Identifier (LSID), and link the outcomes together in a typed graph. In fact, store all provenance as a graph!
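LSIDs are URNs of the form `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`. The parser below is a minimal sketch of that syntax only: it does not resolve anything against an authority, and the example identifier uses a hypothetical `example.org` authority.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

    def __str__(self) -> str:
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.revision is not None:
            parts.append(self.revision)
        return ":".join(parts)


def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:<authority>:<namespace>:<object>[:<revision>] into parts."""
    parts = text.strip().split(":")
    # "urn" and the "lsid" namespace identifier are compared case-insensitively.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    return LSID(*parts[2:])


lsid = parse_lsid("urn:lsid:example.org:data:f2:1")
print(lsid.authority, lsid.namespace, lsid.object_id, lsid.revision)  # example.org data f2 1
print(str(lsid) == "urn:lsid:example.org:data:f2:1")  # True
```

Splitting on colons assumes no component itself contains a colon; the shorthand identifiers on the slides (such as `urn:data1`) are not full LSIDs and would be rejected.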

13 (Diagram: an example provenance graph in three layers: data generated by services/workflows, services, and concepts. LSID-named data items (urn:data1, urn:data2, urn:data:f1, urn:data:f2, urn:data:3) connect to service invocations (urn:BlastNInvocation3, urn:compareinvocation3, urn:invocation5) via [input] and [output], and to one another via [directlyDerivedFrom] and [distantlyDerivedFrom]. Items are typed via [instanceOf] and [type] against concepts such as Blast_report, SwissProt_seq, Sequence_hit, LSDatum and DatumCollection. Other properties shown: [hasHits] and [contains] (linking to hits urn:hit1 … urn:hit50), [similar_sequence_to], [performsTask] (Find similar sequence) and [hasName] (literals such as New sequence and Missed sequence).)
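Storing provenance as typed triples turns lineage questions into graph traversal. The sketch reuses identifiers from the diagram, but which node links to which is assumed for illustration; `lineage` shows one way a `distantlyDerivedFrom` relation could be inferred as the transitive closure of `directlyDerivedFrom`, which is itself an assumption about how the two relations relate.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Illustrative triples; the identifiers echo the slide, the edges are assumed.
TRIPLES: List[Triple] = [
    ("urn:BlastNInvocation3", "input", "urn:data1"),
    ("urn:BlastNInvocation3", "output", "urn:data2"),
    ("urn:data2", "instanceOf", "Blast_report"),
    ("urn:data2", "directlyDerivedFrom", "urn:data1"),
    ("urn:compareinvocation3", "input", "urn:data2"),
    ("urn:compareinvocation3", "output", "urn:data:f2"),
    ("urn:data:f2", "directlyDerivedFrom", "urn:data2"),
]


def index(triples: List[Triple]) -> DefaultDict[Tuple[str, str], Set[str]]:
    """Map (subject, property) to the set of objects."""
    idx: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
    for s, p, o in triples:
        idx[(s, p)].add(o)
    return idx


def lineage(item: str, triples: List[Triple]) -> Set[str]:
    """Everything `item` is transitively derived from."""
    idx = index(triples)
    seen: Set[str] = set()
    stack = [item]
    while stack:
        for parent in idx.get((stack.pop(), "directlyDerivedFrom"), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def produced_by(item: str, triples: List[Triple]) -> List[str]:
    """Invocations that list `item` as an output."""
    return sorted(s for s, p, o in triples if p == "output" and o == item)


print(sorted(lineage("urn:data:f2", TRIPLES)))  # ['urn:data1', 'urn:data2']
print(produced_by("urn:data:f2", TRIPLES))      # ['urn:compareinvocation3']
```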

14 Fusion between different data models using shared concepts or data. (Diagram: the provenance graph of slide 13 joined to a second graph through shared LSIDs such as urn:data2. The second graph links runs (urn:run5, urn:run7), Blast_service, GenBank and UniProt records (urn:genbank1, urn:genbank2, …), and the items urn:williamsA and urn:williamsB, using the properties outputOf, inputOf, createdFrom, createdBy, runOf, instanceOf and contains_similar_seq_to, with concepts such as Blast_report and DNA_sequence.) Add assertions, add rules, and reason over the assertions.
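When two graphs name the same datum with the same identifier, fusion can be as simple as a set union, after which rules add new assertions. The triples and the `createdBy` rule below are hypothetical, and `saturate` is naive forward chaining to a fixed point, a sketch rather than a production reasoner.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Set[Triple]]

# Two independently produced graphs (illustrative). Both mention urn:data2,
# so a plain set union joins them.
provenance: Set[Triple] = {
    ("urn:data2", "outputOf", "urn:run5"),
    ("urn:run5", "runOf", "Blast_service"),
}
annotations: Set[Triple] = {
    ("urn:data2", "instanceOf", "Blast_report"),
    ("urn:data2", "inputOf", "urn:run7"),
}


def created_by_rule(graph: Set[Triple]) -> Set[Triple]:
    """Hypothetical rule: (?d outputOf ?r) and (?r runOf ?s) => (?d createdBy ?s)."""
    runs = {(s, o) for s, p, o in graph if p == "runOf"}
    return {
        (d, "createdBy", svc)
        for d, p, r in graph if p == "outputOf"
        for run, svc in runs if run == r
    }


def saturate(graph: Set[Triple], rules: Iterable[Rule]) -> Set[Triple]:
    """Apply every rule until no new triples appear."""
    rules = list(rules)
    graph = set(graph)
    while True:
        new = set().union(*(rule(graph) for rule in rules)) - graph
        if not new:
            return graph
        graph |= new


fused = saturate(provenance | annotations, [created_by_rule])
print(("urn:data2", "createdBy", "Blast_service") in fused)  # True
```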

15 Experience report: classification and reasoning over results; graph matching. User provenance plus machine provenance. An extensible, non-prescriptive model. Maturity of standards (LSID). Scalability and maturity of tools. RDF graphs are not for humans: we need customised presentation tools.
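Graph matching over results amounts to finding variable bindings that satisfy a set of triple patterns, what SPARQL calls a basic graph pattern. The naive nested-loop matcher below is a sketch over a small hypothetical graph; real RDF stores index their triples and order joins far more carefully.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical result graph.
GRAPH: List[Triple] = [
    ("urn:data2", "hasHits", "urn:hit1"),
    ("urn:data2", "hasHits", "urn:hit2"),
    ("urn:hit1", "instanceOf", "Sequence_hit"),
    ("urn:hit2", "instanceOf", "Sequence_hit"),
    ("urn:hit1", "similar_sequence_to", "urn:genbank1"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:  # variable already bound elsewhere
                return None
        elif p != t:
            return None
    return result


def query(patterns: List[Triple], graph: List[Triple]) -> List[Binding]:
    """All bindings that satisfy every pattern (naive nested-loop join)."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for b in bindings:
            for t in graph:
                b2 = match_triple(pattern, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return bindings


hits = query([("?report", "hasHits", "?hit"),
              ("?hit", "similar_sequence_to", "?seq")], GRAPH)
print(hits)  # [{'?report': 'urn:data2', '?hit': 'urn:hit1', '?seq': 'urn:genbank1'}]
```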

16 Take home: workflows and Semantic Web technologies are powerful tools, especially for scruffies. Both are about description, and both help us be flexible.