Provenance: concepts, architecture and envisioned tools Professor Luc Moreau University of Southampton

Slides:



Advertisements
Similar presentations
Andrea Maurino Web Service Design Methodology Batini, De Paoli, Maurino, Grega, Comerio WP2-WP3 Roma 24/11/2005.
Advertisements

Fujitsu Laboratories of Europe © 2004 What is a (Grid) Resource? Dr. David Snelling Fujitsu Laboratories of Europe W3C TAG - Edinburgh September 20, 2005.
Brief Introduction to Provenance "As data becomes plentiful, verifiable truth becomes scarce
Chapter 19 – Service-oriented Architecture
Service Description: WSDL COMP6017 Topics on Web Services Dr Nicholas Gibbins –
Database System Concepts and Architecture
Web Service Ahmed Gamal Ahmed Nile University Bioinformatics Group
Open Provenance Model Tutorial Session 2: OPM Overview and Semantics Luc Moreau University of Southampton.
An Open Provenance Model for Scientific Workflows Professor Luc Moreau University of Southampton
UK e-Science All Hands Meeting 2005 Paul Groth, Simon Miles, Luc Moreau.
Architecture Tutorial Summary and Conclusions. Architecture Tutorial The Provenance Architecture.
ARCH-05 Application Prophecy UML 101 Peter Varhol Principal Product Manager.
Provenance in Distr. Organ Transplant Management Applying Provenance in Distributed Organ Management Sergio Álvarez, Javier Vázquez-Salceda, Tamás Kifor,
PrIMe PrIMe : Provenance Incorporating Methodology Steve Munroe The EU Grid Provenance Project University of Southampton UK
Architecture Tutorial 1 Overview of Today’s Talks Provenance Data Structures Recording and Querying Provenance –Break (30 minutes) Distribution and Scalability.
1 Introduction to XML. XML eXtensible implies that users define tag content Markup implies it is a coded document Language implies it is a metalanguage.
NextGRID & OGSA Data Architectures: Example Scenarios Stephen Davey, NeSC, UK ISSGC06 Summer School, Ischia, Italy 12 th July 2006.
Provenance Challenges and Technologies for Grids Luc Moreau University of Southampton
Architectural Design Establishing the overall structure of a software system Objectives To introduce architectural design and to discuss its importance.
TIBCO Designer TIBCO BusinessWorks is a scalable, extensible, and easy to use integration platform that allows you to develop, deploy, and run integration.
MDC Open Information Model West Virginia University CS486 Presentation Feb 18, 2000 Lijian Liu (OIM:
Introduction To System Analysis and design
Architecture Tutorial Overview of Today’s Talks Provenance Data Structures Recording and Querying Provenance –Break (30 minutes) Distribution and Scalability.
Electronically Querying for the Provenance of Entities Simon Miles Provenance-Aware Service-Oriented Architectures.
Katanosh Morovat.   This concept is a formal approach for identifying the rules that encapsulate the structure, constraint, and control of the operation.
T Network Application Frameworks and XML Web Services and WSDL Sasu Tarkoma Based on slides by Pekka Nikander.
UK e-Science All Hands Meeting 2005 Paul Groth, Simon Miles, Luc Moreau.
The GRIMOIRES Service Registry Weijian Fang and Luc Moreau School of Electronics and Computer Science University of Southampton.
Usage of `provenance’: A Tower of Babel Luc Moreau.
Provenance Aware Service Oriented Architecture (1 year on) Professor Luc Moreau University of Southampton
Architecture Tutorial Provenance: overview Professor Luc Moreau University of Southampton
Miguel Branco CERN/University of Southampton Enabling provenance on large-scale e-Science applications.
Architecture Tutorial 1 Overview of Today’s Talks Provenance Data Structures Recording and Querying Provenance –Break (30 minutes) Distribution and Scalability.
Provenance: an open approach to experiment validation in e- Science Professor Luc Moreau University of Southampton
Programming in Java Unit 3. Learning outcome:  LO2:Be able to design Java solutions  LO3:Be able to implement Java solutions Assessment criteria: 
Information System Development Courses Figure: ISD Course Structure.
Lecture 7: Requirements Engineering
Provenance: an open approach to experiment validation in e- Science Professor Luc Moreau University of Southampton
Sommerville 2004,Mejia-Alvarez 2009Software Engineering, 7th edition. Chapter 8 Slide 1 System models.
Security Issues in a SOA- based Provenance System Victor Tan, Paul Groth, Simon Miles, Sheng Jiang, Steve Munroe, Sofia Tsasakou and Luc Moreau PASOA/EU.
Distribution and components. 2 What is the problem? Enterprise computing is Large scale & complex: It supports large scale and complex organisations Spanning.
State Key Laboratory of Resources and Environmental Information System China Integration of Grid Service and Web Processing Service Gao Ang State Key Laboratory.
Project Database Handler The Project Database Handler dbCCP4i is a brokering application that mediates interactions between the project database and an.
July 27, 2005High Performance Distributed Computing 05 Recording and Using Provenance in a Protein Compressibility Experiment Paul Groth, Simon Miles,
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE User Forum, Manchester, 10 May ‘07 Nicola Venuti
OPODIS'04 A protocol for recording provenance in service-oriented Grids Paul Groth, Michael Luck, Luc Moreau University of Southampton.
Formalising a protocol for recording provenance in Grids Paul Groth – University of Southampton.
Architecture View Models A model is a complete, simplified description of a system from a particular perspective or viewpoint. There is no single view.
Recording Actor Provenance in Scientific Workflows Ian Wootten, Shrija Rajbhandari, Omer Rana Cardiff University, UK.
Architecture Tutorial 1 Overview of Today’s Talks Provenance Data Structures Recording and Querying Provenance –Break (30 minutes) Distribution and Scalability.
Provenance in Distr. Organ Transplant Management EU PROVENANCE project: an open provenance architecture for distributed.
Tools for Navigating and Analysis of Provenance Information Vikas Deora, Arnaud Contes and Omer Rana.
Metadata Driven Aspect Specification Ricardo Ferreira, Ricardo Raminhos Uninova, Portugal Ana Moreira Universidade Nova de Lisboa, Portugal 7th International.
Recording and Reasoning Over Data Provenance in Web and Grid Services Martin Szomszor and Luc Moreau University of Southampton.
Provenance: an open approach to experiment validation in e-Science
Provenance: Problem, Architectural issues, Towards Trust
Sabri Kızanlık Ural Emekçi
Some Basics of Globus Web Services
Distribution and components
Web Ontology Language for Service (OWL-S)
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 2 Database System Concepts and Architecture.
Chapter 2 Database Environment Pearson Education © 2009.
Service-centric Software Engineering
Chapter 5 Designing the Architecture Shari L. Pfleeger Joanne M. Atlee
CSSSPEC6 SOFTWARE DEVELOPMENT WITH QUALITY ASSURANCE
Distributed Systems through Web Services
WEB SERVICES From Chapter 19, Distributed Systems
Chapter 2 Database Environment Pearson Education © 2009.
Chapter 2 Database Environment Pearson Education © 2009.
SDMX IT Tools SDMX Registry
Presentation transcript:

Provenance: concepts, architecture and envisioned tools Professor Luc Moreau University of Southampton

Provenance Team University of Southampton Luc Moreau, Victor Tan, Paul Groth, Simon Miles, Luc Moreau IBM UK (Project Coordinator) John Ibbotson, Neil Hardman, Alexis Biller University of Wales, Cardiff Omer Rana, Arnaud Contes, Vikas Deora Universitad Politecnica de Catalunya (UPC) Steven Willmott, Javier Vazquez SZTAKI Laszlo Varga, Arpad Andics German Aerospace Andreas Schreiber, Guy Kloss, Frank Danneman

Overview Context Provenance Concepts & Definitions Architectural Design Provenance tools Conclusions

Context: Importance of Past Processes

Context (1) Aerospace engineering: maintain a historical record of design processes, up to 99 years. Organ transplant management: tracking of previous decisions, crucial to maximise the efficiency in matching and recovery rate of patients

Context (2) High Energy Physics: tracking, analysing, verifying data sets in the ATLAS Experiment of the Large Hadron Collider (CERN) Bioinformatics: verification and auditing of “experiments” (e.g. for drug approval)

Concepts & Definitions

Provenance: dictionary definition Oxford English Dictionary: the fact of coming from some particular source or quarter; origin, derivation the history or pedigree of a work of art, manuscript, rare book, etc.; concretely, a record of the ultimate derivation and passage of an item through its various owners. Concept vs representation

Provenance Definition Our definition of provenance in the context of applications for which process matters to end users:  The provenance of a piece of data is the process that led to that piece of data Our aim is to conceive a computer-based representation of provenance that allows us to perform useful analysis and reasoning to support our use cases

Provenance “Lifecycle” Application Results Provenance Store Record Documentation of Execution Query Provenance of Data Administer Store and its contents Core Interfaces to Provenance Store

Nature of Documentation We represent the provenance of some data by documenting the process that led to the data: documentation can be complete or partial; it can be accurate or inaccurate; it can present conflicting or consensual views of the actors involved; it can provide operational details of execution or it can be abstract.

p-assertion A given element of process documentation will be referred to as a p-assertion p-assertion: is an assertion that is made by an actor and pertains to a process.

Service Oriented Architecture Broad definition of service as component that takes some inputs and produces some outputs. Services are brought together to solve a given problem typically via a workflow definition that specifies their composition. Interactions with services take place with messages that are constructed according to services interface specification. The term actor denotes either a client or a service in a SOA. A process is defined as execution of a workflow

M1 M2 M3 M4 Actor 1 Actor 2 I received M1, M4 I sent M2, M3 I received M3 I sent M4 From these p-assertions, we can derive that M3 was sent by Actor 1 and received by Actor 2 (and likewise for M4) If actors are black boxes, these assertions are not very useful because we do not know dependencies between messages Process Documentation (1)

M1 M2 M3 M4 Actor 1 Actor 2 M2 is in reply to M1 M3 is caused by M1 M2 is caused by M4 M4 is in reply to M3 These assertions help identify order of messages, but not how data were computed Process Documentation (2)

f M1 M2 M3 M4 Actor 1 Actor 2 f1 f2 M3 = f1(M1) M2 = f2(M1,M4)M4 = f(M3) These assertions help identify how data is computed, but provide no information about non-functional characteristics of the computation (time, resources used, etc) Process Documentation (3)

M1 M2 M3 M4 Actor 1 Actor 2 I used 386 cluster Request sat in queue for 6min I used sparc processor I used algorithm x version x.y.z Process Documentation (4)

Types of p-assertions (1) Interaction p-assertion: is an assertion of the contents of a message by an actor that has sent or received that message I received M1, M4 I sent M2, M3

Types of p-assertions (2) Relationship p-assertion: is an assertion, made by an actor, that describes how the actor obtained output data or the whole message sent in an interaction by applying some function to input data or messages from other interactions. M2 is in reply to M1 M3 is caused by M1 M2 is caused by M4 M3 = f1(M1) M2 = f2(M1,M4)

Types of p-assertions (3) Actor state p-assertion: assertion made by an actor about its internal state in the context of a specific interaction I used sparc processor I used algorithm x version x.y.z

Data flow Interaction p-assertions allow us to specify a flow of data between actors Relationship p-assertions allow us to characterise the flow of data “inside” an actor Overall data flow (internal + external) constitutes a DAG, which characterises the process that led to a result

Architectural Design

Interfaces to Provenance Store Application Results Provenance Store Record Documentation of Execution Query Provenance of Data Administer Store and its contents

Provenance Tools

Five core deliverables Data model and schema Provenance store Client side libraries Generic Provenance tools Methodology

Provenance Modelling

Provenance Store Reference Implementation Implementation of recording, querying and managing interface Provenance store implemented as a Web Service Client side libraries for using Provenance Store Axis Handler for automatically recording communication between Axis-based Web Services

Axis Handler Axis Handler Provenance Store OGSA DAI Interface Exist DB2 … Backend Stores PS Client Side Library PS Client Side Library Web Service WS Client Query Actor WS PS Client Side Library WS Calls Java Calls Implementation Diagram

Implementation Details Currently functional prototype is a pure Web Services solution (based on Tomcat/AXIS) Security will be based on WS-Security WSRF offers a number of interesting opportunities, and we are considering mapping the (technology-neutral) architecture on to a WSRF-oriented stack.

Query Interface Purpose Obtain the provenance of some specific data Allow for “navigation” of the documentation of execution Abstract interface Allows us to view the provenance store as if containing XML data structures Independent of technology used for running application and internal store representation Seamless navigation of application dependent and application independent provenance representation

Structure of Documentation The documentation of processes recorded by actors can be categorised into a hierarchy All documentation Message exchange Message sender’s viewMessage receiver’s view Message contentState of actor during exchangeRelationships

XML Query Languages Two existing query languages provide ways of navigating hierarchical data: XPath and XQuery For instance, we can use XPath to refer to: The message exchange with ID 345 The client’s view of that exchange The body of the message exchanged // messageExchange [id=“345”] / clientView / messageContent

Navigating Message Content If message content is in XML format, or can be mapped to it, then XPath and XQuery can be used to navigate into the message content For example, we can add application-specific navigation to the previous XPath: The SOAP envelope that encloses the message The body of the message within the envelope The customer name within the body // messageExchange [id=“345”] / clientView / messageContent / soap:envelope / soap:body // customerName

Other Query Requirements Execution Filtering: include/exclude all p- assertions that are marked as part of an execution by a single actor. Functionality Filtering: include/exclude p- assertions that have one of a given set of operation types. Process Filtering: include/exclude p- assertions that belong to a given (set of) process(es).

Navigation Relations ComparisonConflict Analysis Assertion Engine Visualisation uses AB A makes use of B Generic Tools

Analysis: constraint satisfaction over p- assertions and their content Comparison: comparison between assertions Conflict detection: detect conflicts between assertions Rule engine: verify that provenance of some data satisfy some constraints Visualisation: Implemented as a Portlet (using the eXo Portal Framework – JSR 168 compliant

Methodology How to design applications (whether legacy or new) so that they become provenance aware Sets of useful schema Guidelines on what to record

Key Deliverables NOW: First functional prototype NOW: Architecture (technology independent), first public version 04/06: Set of tools 04/06: Final Architecture 09/06: Web Service standardisation proposal 09/06: Full implementation, secure and scalable 09/06: Methodology: how to make your application provenance-aware

Conclusions

Provenance Information Record Applying Provenance Query Compliance Reproduction Analysis Standardising the documentation of Business Processes Provenance Architecture Methodology Apply Healthcare Distribution Finance Aerospace Automobile Pharmaceutical

Conclusions Mostly unexplored area that is crucial to develop trusted systems Definition of provenance Specification of provenance representation Architecture Tools Data models Provenance Store Client side tools Generic tools Methodology

Conclusions Current work: System and protocol designing, architecture specification, generic support for use cases Pursue the deployment in concrete application and performance evaluation Work towards a standardisation proposal Methodology Software soon to be available Tell us about your use cases: we are keen to find new collaborations in this space! Download the architecture definition from