Provenance: Problem, Architectural issues, Towards Trust


Provenance: Problem, Architectural issues, Towards Trust Luc Moreau L.Moreau@ecs.soton.ac.uk University of Southampton

Contents A definition of provenance; Example 1: Aerospace engineering; Example 2: Organ transplant management; Example 3: Bioinformatics grid; Provenance architecture; Towards Trust; Conclusion

The Grid and Virtual Organisations The Grid problem is defined as coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organisations [FKT01]. Effort is required to allow users to place their trust in the data produced by such virtual organisations. Understanding how a given service is likely to modify data flowing into it, and how this data has been generated, is crucial.

Provenance and Virtual Organisations Given a set of services in an open grid environment that decide to form a virtual organisation with the aim of producing a given result, how can we determine the process that generated the result, especially after the virtual organisation has been disbanded? The lack of information about the origin of results makes it harder for users to trust such open environments.

Provenance and Workflows Workflow enactment has become popular in the Grid and Web Services communities. Workflow enactment can be seen as a scripted form of virtual organisation. The problem is similar: how can we determine the origin of enactment results?

Provenance: Definition Provenance is an annotation able to explain how a particular result has been derived. In a service-oriented architecture, provenance identifies what data is passed between services, what services are available, and what results are generated for particular sets of input values, etc. Using provenance, a user can trace the “process” that led to the aggregation of services producing a particular output.
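As a rough illustration of this definition, the sketch below models each service invocation as a record of its inputs and output, then walks the records backwards from a result to recover the process that produced it. The names `Invocation` and `trace`, and the example services, are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    """One service call: which service ran, on what inputs, giving what output."""
    service: str
    inputs: tuple[str, ...]
    output: str


def trace(result: str, log: list[Invocation]) -> list[Invocation]:
    """Return the invocations that led to `result`, earliest first."""
    by_output = {inv.output: inv for inv in log}
    ordered: list[Invocation] = []
    seen: set[str] = set()

    def visit(data_id: str) -> None:
        inv = by_output.get(data_id)
        if inv is None or data_id in seen:
            return  # raw input data, or an item already handled
        seen.add(data_id)
        for parent in inv.inputs:
            visit(parent)
        ordered.append(inv)

    visit(result)
    return ordered


# Hypothetical log: two chained services plus one unrelated call.
log = [
    Invocation("align", ("seq_a", "seq_b"), "alignment"),
    Invocation("unrelated", ("seq_c",), "other"),
    Invocation("score", ("alignment",), "report"),
]
steps = [inv.service for inv in trace("report", log)]
print(steps)
```

Only the invocations that contributed to the requested result are returned; the unrelated call is left out of the trace.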

Provenance in Aerospace Engineering Provenance requirement: to maintain a historical record of outputs from each sub-system involved in simulations. Provenance data for an aircraft needs to be kept for up to 99 years when the aircraft is sold to some countries. Currently, little direct support is available for this.

Provenance in Organ Transplant Management Decision support systems for organ and tissue transplant rely on a wide range of data sources, patient data, and doctors’ and surgeons’ knowledge. Heavily regulated domain: European, national, regional and site-specific rules govern how decisions are made. Application of these rules must be ensured and auditable, and the rules may change over time. Provenance allows tracking previous decisions, which is crucial in maximising the efficiency of matching and the recovery rate of patients. Tracking back previous decisions in any one centre makes it possible to identify whether the best match was made, who was involved in the decision, and what the context was.

Provenance in a Bioinformatics Grid (myGrid) myGrid builds a personalised problem-solving environment that helps bioinformaticians find, adapt, construct and execute in silico experiments. It keeps the scientist informed as to the provenance of data relevant to their experiment space. Provenance in the drug discovery process: the FDA requires drug companies to keep a record of the provenance of drug discovery for as long as the drug is in use (sometimes up to 50 years).

What is the problem? Provenance recording should be part of the infrastructure, so that users can elect to enable it when they execute their complex tasks over the Grid or in Web Services environments. Currently, the Web Services protocol stack and the Open Grid Services Architecture do not provide any support for recording provenance.

Architectural Vision

Architectural Vision Provenance gathering is a collaborative process that involves multiple entities, including the workflow enactment engine, the enactment engine's client, the service directory, and the invoked services. Provenance data will be submitted to one or more “provenance repositories” acting as storage for provenance data. At the user's request, some analysis, navigation and reasoning over provenance data can be undertaken.

Architectural Vision Storage could be achieved by a provenance service. The provenance service would provide support for analysis, navigation or reasoning over provenance. Client-side support would be provided for submitting provenance data to the provenance service.

A First Prototype (Szomszor, Moreau 03) A service-oriented architecture for provenance support in Grid and Web Services environments, based on the idea of a provenance service; a client-side API for recording provenance data for Web Service invocation; a data model for storing provenance data; a server-side interface for querying provenance data; two components making use of provenance: provenance browsing and provenance validation.
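The division of labour between the client-side recording API and the server-side query interface can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the prototype's actual API: `ProvenanceService`, `with_provenance`, `record` and `query` are hypothetical names, and a plain Python function stands in for a Web Service invocation.

```python
from collections import defaultdict
from functools import wraps


class ProvenanceService:
    """In-memory stand-in for a provenance repository (hypothetical)."""

    def __init__(self):
        self._records = defaultdict(list)

    def record(self, service, inputs, output):
        """Store one invocation record; called from the client side."""
        self._records[service].append({"inputs": inputs, "output": output})

    def query(self, service):
        """Return a copy of all records stored for a service."""
        return list(self._records.get(service, []))


def with_provenance(store, service_name):
    """Client-side wrapper: submit a record for every call made through it."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args):
            result = func(*args)
            store.record(service_name, args, result)
            return result
        return wrapper
    return decorator


store = ProvenanceService()


@with_provenance(store, "adder")
def add(x, y):
    # Stand-in for a remote service invocation.
    return x + y


add(2, 3)
add(10, 5)
records = store.query("adder")
print(records)
```

Wrapping the call on the client side keeps the invoked service unchanged, which is one way to let users elect to enable recording without modifying the services themselves.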

Prototype Overview

Prototype Sequence Diagram

Prototype Provenance Data Model

Prototype Provenance Browser

Discussion For provenance data to be useful, we expect the protocol used to record it to support some “classical” properties of distributed algorithms. Using mutual authentication, an invoked service can ensure that it submits data to a specific provenance server and, vice versa, a provenance server can ensure that it receives data from a given service. With non-repudiation, we can retain evidence that a service has committed to executing a particular invocation and has produced a given result. We anticipate that cryptographic techniques will be useful to ensure such properties.
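The talk does not say which cryptographic techniques would be used. As a hedged illustration only, the sketch below tags a provenance record with an HMAC from Python's standard library, so that a provenance server sharing the key can check that a record came from a key holder and was not altered. This is a stand-in: a shared-key HMAC gives integrity and origin authentication between the two parties, but not non-repudiation, which would need asymmetric digital signatures. The function names, key and record fields are hypothetical.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_record(record: dict, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_record(record, key), tag)


# Illustration-only key; a real deployment would manage keys securely.
key = b"shared-secret-for-illustration-only"
record = {"service": "align", "inputs": ["seq_a", "seq_b"], "output": "alignment"}
tag = sign_record(record, key)

ok = verify_record(record, tag, key)

# A record altered after tagging no longer verifies.
tampered = dict(record, output="forged")
rejected = not verify_record(tampered, tag, key)
print(ok, rejected)
```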

Towards Trust

Towards Trust Using the provenance of data, trust metrics for the data can be derived from: the trust the user places in invoked services; the trust the user places in the input data; the trust the user places in the enacted workflow; the trust the user places in the provenance service.
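The slide lists the sources of trust but not how they are combined. One simple possibility, shown here purely as an assumption, is to express each as a value in [0, 1], treat them as independent, and multiply them. The function name `derived_trust` and the example values are hypothetical.

```python
from math import prod


def derived_trust(service_trust, input_trust, workflow_trust, provenance_trust):
    """Combine the four trust sources into one score in [0, 1].

    Assumption (not from the talk): every value lies in [0, 1] and the
    factors are independent, so the overall score is their product.
    """
    factors = [*service_trust, *input_trust, workflow_trust, provenance_trust]
    if not all(0.0 <= f <= 1.0 for f in factors):
        raise ValueError("trust values must lie in [0, 1]")
    return prod(factors)


# Two invoked services and one input, as found in the provenance of a result.
score = derived_trust(
    service_trust=[1.0, 0.5],
    input_trust=[0.5],
    workflow_trust=1.0,
    provenance_trust=1.0,
)
print(score)
```

Under this assumption a single poorly trusted service or input lowers the trust in every result derived from it, which is why the provenance of the result is needed to compute the score at all.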

The purpose of the PASOA project is to investigate provenance in Grid architectures. Funded by EPSRC under the “fundamental computer science for e-Science” call. In collaboration with Cardiff. www.pasoa.org

Conclusion Provenance is a rather unexplored domain. It is strategic for bringing trust to open environments. There is a need to design a configurable architecture capable of supporting multiple requirements from very different application domains. There is also a need to further investigate the algorithmic foundations of provenance, which will lead to scalable and secure industrial solutions.

Publications [SM03] Martin Szomszor and Luc Moreau. Recording and reasoning over data provenance in web and grid services. In International Conference on Ontologies, Databases and Applications of SEmantics (ODBASE'03), volume 2888 of Lecture Notes in Computer Science, pages 603-620, Catania, Sicily, Italy, November 2003. [MCS+03] Luc Moreau, Syd Chapman, Andreas Schreiber, Rolf Hempel, Omer Rana, Laszlo Varga, Ulises Cortes, and Steven Willmott. Provenance-based trust for grid computing - position paper. 2003.

Acknowledgements Martin Szomszor, Southampton; Syd Chapman, IBM; Omer Rana, Cardiff; Andreas Schreiber and Rolf Hempel, DLR; Laszlo Varga, SZTAKI; Ulises Cortes and Steven Willmott, UPC; Mark Greenwood and Carole Goble, Manchester