Open Provenance Model Tutorial Session 1: Background Luc Moreau University of Southampton.



Session 1: Aims
In this session, you will learn about:
– The notion of provenance
– The Open Provenance Vision
– The Provenance Challenge Series
– The birth of OPM

Session 1: Contents
– Brief introduction to provenance
– The Open Provenance Vision
– The Provenance Challenge Series
– W3C XG-Prov
– Conclusions
– Further reading

PROVENANCE 101

Provenance Use Cases
Organ Transplant Management (Vazquez Salceda, Willmott 05-07):
– Which doctor was involved in a decision?
– Why was an organ rejected for transplant?
– Was an organ allocated according to rules?
Auditing of private data processing (Rocio Aldeco Perez 08):
– Was the data used in a manner compatible with the purpose it was captured for?
– Was the latest data used in the computation?
– Was the data deleted after its use?
For an extensive catalogue of provenance use cases, see the W3C incubator.

The Problem
Processes matter:
– To validate experimental results
– To reproduce scientific experiments
– To check compliance
– To audit applications
Computers are good at producing results quickly.
Computers are bad at explaining their past actions.
Is there a principled way of addressing this problem?

Provenance Definition
Oxford English Dictionary:
– the fact of coming from some particular source or quarter; origin, derivation;
– the history or pedigree of a work of art, manuscript, rare book, etc.;
– concretely, a record of the passage of an item through its various owners.
The provenance of a piece of data is the process that led to that piece of data.

THE OPEN PROVENANCE VISION

Context: heterogeneous environments Applications consist of compositions of loosely coupled, multi-institutional, heterogeneous components How to trace the origin of data in such environments?

The Science Lifecycle
[Figure labels: scientists; experimentation; Local Web Repositories; Digital Libraries; Virtual Learning Environment; Graduate Students; Undergraduate Students; Next Generation Researchers; Technical Reports; Reprints; Preprints & Metadata; Peer-Reviewed Journal & Conference Papers; Certified Experimental Results & Analyses; Data, Metadata, Provenance, Scripts, Workflows, Services, Ontologies, Blogs, ...]
Adapted from David De Roure's slides.

[Figure: the same Science Lifecycle diagram.]
Finding the provenance of research outputs across all the systems data transited through

Provenance in a Single Application
[Figure labels: Application; Provenance Store; data; Record process assertions; Query and reason over provenance of data; Feedback (notifications, alarms, continuous audit)]

Provenance in a Single Application
We're becoming good at tracking provenance in a single (monolithic) application:
– Provenance in databases (e.g., Perm, Trio, theory)
– Provenance in workflow systems (e.g., Taverna, Kepler, VisTrails)
– Provenance in operating systems (e.g., PASS)
– Provenance in some applications (e.g., R, browser)

Provenance Across Applications
How to understand the provenance of data products derived by all these applications?

Provenance Across Applications
[Figure labels: Application; Provenance Inter-Operability Layer; The Open Provenance Model (OPM)]

Provenance Inter-Operability Layer

Open Provenance Vision
The Open Provenance Vision is a set of architectural guidelines to support provenance inter-operability, consisting of:
– a controlled vocabulary,
– serialization formats, and
– APIs.
The Open Provenance Vision allows provenance from individual systems to be expressed, connected in a coherent fashion, and queried seamlessly.

Export/Import Approach (PC3)
Steps:
– Convert PS i content to OPM
– Import OPM into PS
– Run queries over PS
Properties:
– N+1 conversions
– Centralisation (scalability, security concerns)
– Running queries is easy
[Figure labels: PS1; PS2; PS3; PS4; Provenance Inter-Operability Layer; PS]
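The three steps of the export/import approach can be sketched as follows. This is a minimal illustration, not an OPM implementation: the two native record formats, the function names, and the triple representation are all assumptions made for the example. Only the edge names (used, wasGeneratedBy, wasDerivedFrom) come from the OPM vocabulary.

```python
# Simplified stand-in for OPM: a set of (effect, relation, cause) triples.
# The native formats of the two stores below are invented for illustration.

def convert_ps1(records):
    """Hypothetical store 1 logs rows of (process, input, output)."""
    edges = set()
    for process, inp, out in records:
        edges.add((process, "used", inp))
        edges.add((out, "wasGeneratedBy", process))
    return edges


def convert_ps2(records):
    """Hypothetical store 2 logs dicts describing direct data derivations."""
    return {(r["derived"], "wasDerivedFrom", r["source"]) for r in records}


def import_all(*opm_graphs):
    """Import every converted graph into one central store (a plain set here)."""
    central = set()
    for graph in opm_graphs:
        central |= graph
    return central


def direct_causes(central, node):
    """A query that only needs to consult the central store."""
    return sorted(cause for effect, _, cause in central if effect == node)


ps1 = [("align", "image1", "warp1")]
ps2 = [{"derived": "atlas", "source": "warp1"}]
central = import_all(convert_ps1(ps1), convert_ps2(ps2))
print(direct_causes(central, "atlas"))
```

Each store needs its own converter, while the query code is written once against the central store, which is why queries are easy here but centralisation becomes the bottleneck.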

Distributed Query Approach
Steps:
– Offer OPM based Query API
– Federated query component
Properties:
– Query API not specified
– N query APIs to implement
– Running queries is challenging
– Better scalability
[Figure labels: PS1; PS2; PS3; PS4; Query API; Federated Queries]
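A federated query component might look roughly like the sketch below. Since the slide notes that the query API is not specified, the `StoreQueryAPI` class and its `direct_causes` method are purely hypothetical; the point is only that provenance stays in each store and the federated component follows causes from store to store.

```python
class StoreQueryAPI:
    """Hypothetical per-store query API returning the direct causes of a node."""

    def __init__(self, edges):
        self._edges = edges  # {effect: [causes]}, held locally by one store

    def direct_causes(self, node):
        return self._edges.get(node, [])


def federated_provenance(stores, node):
    """Follow causes across all stores until no store reports anything new."""
    seen, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for store in stores:
            for cause in store.direct_causes(current):
                if cause not in seen:
                    seen.add(cause)
                    frontier.append(cause)
    return seen


ps1 = StoreQueryAPI({"warp1": ["align"], "align": ["image1"]})
ps2 = StoreQueryAPI({"atlas": ["warp1"]})
print(sorted(federated_provenance([ps1, ps2], "atlas")))
```

Every node visited costs one call per store, which hints at why running queries is the challenging part of this approach even though no data has to be centralised.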

Provenance Inter-Operability Layer
Common Tools: Visualisation, Reasoning, Conversion

BACKGROUND: PROVENANCE CHALLENGES

Provenance Challenge 1
– Idea came after the IPAW'06 standardisation discussion
– Set up to be informative rather than competitive
– Aims to provide a forum for the community to understand the capabilities of different provenance systems and the expressiveness of their provenance representations

fMRI Workflow

Provenance Questions
1. Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is.
2. Find the process that led to Atlas X Graphic, excluding everything prior to the averaging of images with softmean.
3. Find the Stage 3, 4 and 5 details of the process that led to Atlas X Graphic.
4. Find all invocations of procedure align_warp using a twelfth order nonlinear 1365 parameter model that ran on a Monday.
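Questions 1 and 2 amount to walking a causality graph backwards, with question 2 adding a cut-off. The sketch below uses a deliberately simplified toy graph: only Atlas X Graphic, softmean and align_warp are named in the questions, and every other node name and the graph shape are assumptions for illustration.

```python
# Toy causality graph: each node maps to the nodes that directly caused it.
CAUSES = {
    "Atlas X Graphic": ["convert"],
    "convert": ["Atlas X Slice"],
    "Atlas X Slice": ["slicer"],
    "slicer": ["Atlas Image"],
    "Atlas Image": ["softmean"],
    "softmean": ["Resliced Image 1", "Resliced Image 2"],
    "Resliced Image 1": ["align_warp 1"],
    "Resliced Image 2": ["align_warp 2"],
}


def upstream(node, stop_at=None):
    """Everything that caused `node`; nothing behind `stop_at` is explored."""
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        if current == stop_at:
            continue  # keep the boundary node itself, ignore what led to it
        for cause in CAUSES.get(current, []):
            if cause not in found:
                found.append(cause)
                stack.append(cause)
    return found


everything = upstream("Atlas X Graphic")                          # question 1
after_softmean = upstream("Atlas X Graphic", stop_at="softmean")  # question 2
print(after_softmean)
```

Questions 3 and 4 additionally need annotations on the nodes (stage numbers, parameters, timestamps), which a bare graph like this one does not carry.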

Participating Teams
– REDUX, MSR
– Karma, Indiana U.
– myGrid, U. of Manchester
– Gridprovenance, Cardiff U.
– Zoom, U. of Pennsylvania
– DAKS, UC Davis
– SDG, PNNL
– UChicago, U. of Chicago
– USC/ISI, ISI
– MINDSWAP, U. of Maryland
– JP, CESNET
– VisTrails, U. of Utah
– ES3, UCSB
– RWS, UC Davis and SDSC
– PASS, Harvard
– NcsaD2k and NcsaCi, NCSA
– PASOA, U. of Southampton

PC1 outcomes
Challenge 1:
– Provenance questions and expected answers were not precise enough
– Difficult to validate whether the results returned are correct or even comparable
Challenge 2 aimed at establishing inter-operability of systems by exchanging provenance information.

Provenance Challenge 2 Stage 1 Stage 2 Stage 3

Participating Teams
– MyGrid, U. of Manchester
– SDG, PNNL
– Karma, Indiana U.
– OntoGrid, OntoGrid project
– VisTrails, U. of Utah
– NCSA, NCSA
– ISIwithPASOA, ISI
– PASOA, U. of Southampton
– MINDSWAP, U. of Maryland
– Lineage for JOpera, ETH Zurich
– CESNET, CESNET
– ES3, UCSB
– PASS, Harvard

Outcomes
– Differences between "process provenance" and "data provenance" easily bridged.
– Integrating two or three systems' provenance data meant interpreting where an identifier produced by one system referred to the same entity as another identifier produced by a different system.
– Provenance must, at least, contain a causality graph, i.e. the process that occurred, the derivation of data, etc.
– It must be an annotated causality graph, in order to capture the details and not just the structure of the provenance.
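The identifier problem and the need for annotations can be illustrated together. The sketch below is an assumption-laden toy: the graph layout, the identifiers, the annotation keys and the `same_as` mapping are all hypothetical, and in practice establishing which identifiers are equivalent is the hard part that the mapping simply takes as given.

```python
def merge(graphs, same_as):
    """Merge annotated causality graphs, rewriting equivalent identifiers."""

    def canon(ident):
        # Map an identifier to its agreed canonical form, if one is known.
        return same_as.get(ident, ident)

    edges, annotations = set(), {}
    for graph in graphs:
        for effect, cause in graph["edges"]:
            edges.add((canon(effect), canon(cause)))
        for ident, notes in graph["annotations"].items():
            annotations.setdefault(canon(ident), {}).update(notes)
    return {"edges": edges, "annotations": annotations}


system_a = {
    "edges": {("a:img42", "a:softmean")},
    "annotations": {"a:img42": {"format": "hdr"}},
}
system_b = {
    "edges": {("b:graphic", "b:file-7")},
    "annotations": {"b:file-7": {"size_bytes": 1024}},
}
# Assumed to be supplied by whoever integrates the two systems.
same_as = {"b:file-7": "a:img42"}

merged = merge([system_a, system_b], same_as)
print(sorted(merged["edges"]))
```

Once the two identifiers are unified, the causal chain from b:graphic back to a:softmean becomes visible, and the annotations recorded by both systems end up on the same node.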

OPM: the Open Provenance Model
– OPM v1.00 (Dec 2007): Luc Moreau, Juliana Freire, Joe Futrelle, Robert E. McGrath, Jim Myers, Patrick Paulson
– OPM v1.01 (Jul 2008): Luc Moreau, Beth Plale, Simon Miles, Carole Goble, Paolo Missier, Roger Barga, Yogesh Simmhan, Joe Futrelle, Robert E. McGrath, Jim Myers, Patrick Paulson, Shawn Bowers, Bertram Ludaescher, Natalia Kwasnikowska, Jan Van den Bussche, Tommy Ellkvist, Juliana Freire, Paul Groth

Provenance Challenge 3
– Identify weaknesses and strengths of the OPM specification
– Encourage the development of concrete bindings for OPM in a variety of languages
– Determine how well OPM can represent provenance for a variety of technologies (scientific workflow, databases, etc.)
– Demonstrate that a complex data product's provenance can be constructed from process assertions produced by multiple combinations of heterogeneous applications
– Bring together the community to further discuss the interoperability of provenance systems

PC3 Workflow
The Pan-STARRS project is building and operating the next generation sky survey.
The load workflow used in PC3, appearing at the handoff between the image pipeline and the object data management, ingests incoming CSV files into a SQL database.

PC3 Objectives
– Implement the Load workflow
– Implement queries:
  – For a given detection, which CSV files contributed to it?
  – The user considers a table to contain values they do not expect. Was the range check (IsMatchTableColumnRanges) performed for this table?
– Export provenance to OPM
– Import other teams' OPM outputs
– Run queries over other teams' provenance

Participating Teams
– NCSA, National Center for Supercomputing Applications
– Swift, U. Chicago
– Trident, Microsoft Research
– UCDGC, UC Davis Genome Center
– SotonUSCISIPc3, University of Southampton and USC/ISI
– UCSBtake3, University of California, Santa Barbara
– UoM, University of Manchester, UK
– TetherlessPC3, Rensselaer Polytechnic Institute/Tetherless World Constellation
– UvA/VL-e, University of Amsterdam, NL
– SDSCPc3, San Diego Supercomputer Center
– VisTrails3, University of Utah
– KCL, King's College London
– PASS3, Harvard
– Karma3, Indiana University
– UTEP, University of Texas at El Paso

Outcomes
– Open source governance model for OPM
– Promotion of "profiles" to specialize OPM to specific application domains
– Towards OPM 1.1, allowing us to achieve the desired inter-operability for PC3
– PC4:
  – Less workflow centric
  – Focusing more on retrieving/querying the provenance of data produced by several systems

OPM: the Open Provenance Model OPM v1.1 (July 2010): Luc Moreau, Ben Clifford, Juliana Freire, Joe Futrelle, Yolanda Gil, Paul Groth, Natalia Kwasnikowska, Simon Miles, Paolo Missier, Jim Myers, Beth Plale, Yogesh Simmhan, Eric Stephan, and Jan Van den Bussche.

W3C Incubator on Provenance

Provenance Challenge 4

Open Provenance Model
– Issued from a community effort
– Open source governance model
– Exploited by teams in the Provenance Challenge Series
– Being used, studied and adopted beyond
... but what is OPM? Meet us in Session 2!