Recording and Reasoning Over Data Provenance in Web and Grid Services Martin Szomszor and Luc Moreau University of Southampton.


Contents
• A definition of provenance
• Example 1: Aerospace engineering
• Example 2: Organ transplant management
• Example 3: Bioinformatics grid
• Provenance architecture
• Provenance service
• Conclusion

The Grid and Virtual Organisations
• The Grid problem is defined as coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organisations [FKT01].
• Effort is required to allow users to place their trust in the data produced by such virtual organisations.
• Understanding how a given service is likely to modify data flowing into it, and how this data has been generated, is crucial.

Provenance and Virtual Organisations
• Given a set of services in an open grid environment that decide to form a virtual organisation with the aim of producing a given result:
• How can we determine the process that generated the result, especially after the virtual organisation has been disbanded?
• The lack of information about the origin of results makes it harder for users to trust such open environments.

Provenance and Workflows
• Workflow enactment has become popular in the Web Services and Grid communities.
• Workflow enactment can be seen as a scripted form of virtual organisation.
• The problem is similar: how can we determine the origin of enactment results?

Provenance: Definition
• Provenance is an annotation able to explain how a particular result has been derived.
• In a service-oriented architecture, provenance identifies what data is passed between services, what services are available, and what results are generated for particular sets of input values, etc.
• Using provenance, a user can trace the “process” that led to the aggregation of services producing a particular output.

Provenance in Aerospace Engineering
Aerospace engineering requires scientific simulations, data pre- and post-processing, and visualisation, composed in complex workflows.

Provenance in Aerospace Engineering
• Provenance is crucially required in this context: maintaining a historical record of outputs from each sub-system is an important requirement for many customers that use the end result of simulations.
• For instance, provenance data for an aircraft needs to be kept for up to 99 years when the aircraft is sold to some countries.
• Currently, however, little direct support is available for this.

Provenance in Organ Transplant Management
• Medical information systems, and in particular decision support systems for organ and tissue transplant, rely on a wide range of data sources, patient data, and knowledge added by doctors, surgeons and other individuals using the systems.

Provenance in Organ Transplant Management
• Such a domain is heavily regulated.
• European, national, regional and site-specific rules govern how decisions are made.
• Application of these rules must be ensured and auditable, and the rules may change over time.
• Patient recovery is highly dependent on:
  • organ allocation choice,
  • extraction and insertion methods,
  • care/recovery regime.

Provenance in Organ Transplant Management
• Tracking back previous decisions in any one centre to identify whether the best match was made, who was involved in the decision, and what the context was.
• Maximise the efficiency of matching and the recovery rate of patients.

Provenance in a Bioinformatics Grid (myGrid)
myGrid aims to build a personalised problem-solving environment, in which the scientist can:
• construct in silico experiments,
• find and adapt others,
• store results in data repositories,
• have their own view on public repositories,
• be better informed as to the provenance and the currency of the tools and data directly relevant to their experimental space.

Provenance in a Bioinformatics Grid (myGrid)
Two major forms of provenance [Greenwood03]:
• The derivation path records the process by which results are generated from input data.
• Derivation data answers questions about what initial data was used for a result, and how the transformation from initial data to result was achieved.
• The FDA requires drug companies to keep a record of the provenance of drug discovery for as long as the drug is in use (sometimes up to 50 years).
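
As an illustration of the two derivation questions above (what initial data was used, and how it was transformed), here is a minimal Python sketch. The `Step` record, the `derivation_path` function and the service names are hypothetical and are not part of myGrid; the sketch assumes each data item is produced by at most one invocation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    """One recorded service invocation (hypothetical record layout)."""
    service: str
    inputs: Tuple[str, ...]
    output: str


def derivation_path(result, steps):
    """Return (initial_data, ordered_steps) that led to `result`."""
    produced_by = {s.output: s for s in steps}
    initial, path, seen = [], [], set()

    def visit(item):
        if item in seen:
            return
        seen.add(item)
        step = produced_by.get(item)
        if step is None:
            initial.append(item)  # no step produced it, so it is initial data
            return
        for name in step.inputs:
            visit(name)
        path.append(step)  # post-order: steps for the inputs come first

    visit(result)
    return initial, path


# Illustrative workflow; the service and data names are made up.
steps = [
    Step("sequence_fetch", ("gene_id",), "sequence"),
    Step("blast_search", ("sequence", "database"), "hits"),
    Step("report_builder", ("hits",), "report"),
]
initial, path = derivation_path("report", steps)
print(initial)                    # ['gene_id', 'database']
print([s.service for s in path])  # ['sequence_fetch', 'blast_search', 'report_builder']
```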

Provenance in a Bioinformatics Grid (myGrid)
Two major forms of provenance [Greenwood03]:
• Annotations are attached to objects, or collections of objects.
• Annotation data provides more contextual information that might be of interest: who performed an experiment and when, whether they supplied any comments on the specific methods and materials used, when an object was created and last updated, who owns it, and its format.
• Useful for providing a personalised environment.

Other Provenance Requirements and Uses
• Standard lineage representation, automated lineage recording, unobtrusive information collecting [Frew and Brose 02]
• To give reliability and quality, justification and audit, re-usability, reproducibility and repeatability, change and evolution, ownership, security, credit and copyright [Goble02]

What is the problem?
• Provenance recording should be part of the infrastructure, so that users can elect to enable it when they execute their complex tasks over the Grid or in Web Services environments.
• Currently, the Web Services protocol stack and the Open Grid Services Architecture do not provide any support for recording provenance.

Our Contributions
• A service-oriented architecture for provenance support in Grid and Web Services environments, based on the idea of a provenance service;
• A client-side API for recording provenance data for Web Service invocation;
• A data model for storing provenance data;
• A server-side interface for querying provenance data;
• Two components making use of provenance: provenance browsing and provenance validation.

Overall Architecture

• Provenance gathering is a collaborative process that involves multiple entities, including the workflow enactment engine, the enactment engine's client, the service directory, and the invoked services.
• Provenance data will be submitted to one or more “provenance repositories” acting as storage for provenance data.
• Upon user request, analysis, navigation and reasoning over provenance data can be undertaken.

Overall Architecture
• Storage could be achieved by a provenance service.
• A library, optionally hosted in the provenance service, would perform the analysis, navigation or reasoning.
• A client-side library would submit provenance data to the provenance service.
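
The split between a storing provenance service and a submitting client-side library can be sketched as follows. This is a minimal in-memory illustration in Python, not the interface presented in the talk: `ProvenanceService`, `ProvenanceClient` and their methods are hypothetical names, and a real deployment would expose the service over Web Service calls rather than direct method calls.

```python
class ProvenanceService:
    """Hypothetical in-memory stand-in for a provenance repository."""

    def __init__(self):
        self._records = []

    def submit(self, record):
        # Store a copy so later changes by the caller do not alter the record.
        self._records.append(dict(record))

    def query(self, **criteria):
        # Return every record whose fields match all given criteria.
        return [r for r in self._records
                if all(r.get(k) == v for k, v in criteria.items())]


class ProvenanceClient:
    """Hypothetical client-side library: invokes a service and records the call."""

    def __init__(self, provenance_service):
        self._service = provenance_service

    def invoke(self, name, func, *args):
        result = func(*args)
        self._service.submit({"service": name, "inputs": args, "output": result})
        return result


store = ProvenanceService()
client = ProvenanceClient(store)
client.invoke("double", lambda x: 2 * x, 21)
client.invoke("negate", lambda x: -x, 42)
print(store.query(service="double"))
# [{'service': 'double', 'inputs': (21,), 'output': 42}]
```

Keeping the query method on the service side mirrors the slide's suggestion that analysis can optionally be hosted in the provenance service itself.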

System Overview

Sequence Diagram
• Identifies the interactions between the provenance service, the client-side library and the enactment engine.
• Creation of a session.
• Needs to support the most complex workflows, including conditional branching, iteration, recursion and parallel execution.
• Supports asynchronous submission of provenance data, so that provenance submission does not delay workflow execution.
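
One way to picture session creation and asynchronous submission is a background worker that drains a queue, so the workflow thread only pays for an enqueue. The Python sketch below is an assumption about how this could be done, not the protocol from the talk: `AsyncRecorder`, its method names and the `sink` callable are hypothetical, and error handling in the worker is omitted.

```python
import queue
import threading
import uuid


class AsyncRecorder:
    """Hypothetical client-side recorder that submits records off the caller's thread."""

    def __init__(self, sink):
        self._sink = sink  # callable that stores one (session_id, event) pair
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def create_session(self):
        # A session groups all records that belong to one workflow run.
        return str(uuid.uuid4())

    def record(self, session_id, event):
        # Only enqueues, so the workflow is not delayed by the submission itself.
        self._queue.put((session_id, event))

    def _drain(self):
        while True:
            item = self._queue.get()
            try:
                if item is not None:
                    self._sink(item)
            finally:
                self._queue.task_done()
            if item is None:  # sentinel placed by close()
                break

    def close(self):
        # Flush everything still queued, then stop the worker.
        self._queue.put(None)
        self._queue.join()
        self._worker.join()


stored = []
recorder = AsyncRecorder(stored.append)
session = recorder.create_session()
for i in range(3):
    recorder.record(session, {"step": i})
recorder.close()
print(len(stored))  # 3
```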

Sequence Diagram

Provenance Data Model
• Must support recording of all information necessary to replay execution.
• Must support all complex forms of workflows (recursion, iterations, parallel execution).
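
To make the two requirements concrete, the Python sketch below records each invocation with a link to the invocation that triggered it, which is one possible way to represent recursion, loop bodies or parallel branches, and then replays the log. The `Invocation` layout and the `replay` helper are hypothetical and are not the data model shown in the talk; the replay check also assumes the services are deterministic.

```python
import math
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class Invocation:
    invocation_id: int
    parent_id: Optional[int]  # enclosing invocation, or None for the top-level call
    service: str
    inputs: Tuple[Any, ...]
    output: Any


def factorial(n, log, parent=None):
    """Recursive 'service' that logs each call with a link to its caller."""
    inv_id = len(log)
    log.append(None)  # reserve the slot so ids follow call order
    result = 1 if n <= 1 else n * factorial(n - 1, log, inv_id)
    log[inv_id] = Invocation(inv_id, parent, "factorial", (n,), result)
    return result


def replay(log, services):
    """Re-run every recorded invocation; return ids whose output differs."""
    return [inv.invocation_id for inv in log
            if services[inv.service](*inv.inputs) != inv.output]


log = []
factorial(3, log)
print([(i.invocation_id, i.parent_id, i.inputs, i.output) for i in log])
# [(0, None, (3,), 6), (1, 0, (2,), 2), (2, 1, (1,), 1)]
print(replay(log, {"factorial": math.factorial}))  # []
```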

Provenance Data Model

Discussion
• In order for provenance data to be useful, we expect such a protocol to support some “classical” properties of distributed algorithms.
• Using mutual authentication, an invoked service can ensure that it submits data to a specific provenance server and, vice versa, a provenance server can ensure that it receives data from a given service.
• With non-repudiation, we can retain evidence of the fact that a service has committed to executing a particular invocation and has produced a given result.
• We anticipate that cryptographic techniques will be useful to ensure such properties.
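
As a hedged illustration of the kind of cryptographic technique anticipated here, the Python sketch below tags a provenance record with an HMAC so that a provenance server sharing the key can check that the record came from the expected service and was not altered. This is a swapped-in, simpler technique: a shared-key HMAC gives message authentication only, not non-repudiation, which would require asymmetric digital signatures. The function names and the example key are hypothetical.

```python
import hashlib
import hmac
import json


def sign_record(record, key):
    """Return a hex HMAC-SHA256 tag over a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_record(record, tag, key):
    """Check the tag in constant time; False if the record or key differs."""
    return hmac.compare_digest(sign_record(record, key), tag)


shared_key = b"example-key-agreed-out-of-band"  # assumption: pre-shared secret
record = {"service": "blast_search", "inputs": ["sequence"], "output": "hits"}
tag = sign_record(record, shared_key)

print(verify_record(record, tag, shared_key))    # True
tampered = dict(record, output="other")
print(verify_record(tampered, tag, shared_key))  # False
```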

• The purpose of the PASOA project is to investigate provenance in Grid architectures.
• Funded by EPSRC under the “fundamental computer science for e-Science” call.
• In collaboration with Cardiff.

Conclusion
• Provenance is a rather unexplored domain.
• It is strategic for bringing trust to open environments.
• Our provenance service is the first attempt to incorporate provenance in the infrastructure of Web and Grid services.
• There is a need to further investigate the algorithmic foundations of provenance, which will lead to scalable and secure industrial solutions.

Acknowledgements
• Syd Chapman, IBM
• Omer Rana, Cardiff
• Andreas Schreiber and Rolf Hempel, DLR
• Laszlo Varga, SZTAKI
• Ulises Cortes and Steven Willmott, UPC
• Mark Greenwood, Carole Goble, Manchester