
Linking Multiple Workflow Provenance Traces for Interoperable Collaborative Science
Paolo Missier (1), Bertram Ludäscher (2), Shawn Bowers (3), Saumen Dey (2), Anandarup Sarkar (3), Biva Shrestha (4), Ilkay Altintas (5), Manish Kumar Anand (5), Carole Goble (1)
(1) School of Computer Science, University of Manchester
(2) Dept. of Computer Science, University of California, Davis
(3) Dept. of Computer Science, Gonzaga University
(4) Dept. of Computer Science, Appalachian State University
(5) San Diego Supercomputer Center, University of California, San Diego
WORKS’10, New Orleans

Linking Provenance Traces … P Missier, B Ludäscher et al. WORKS’10
Context: Data Sharing
Implicit collaboration through data sharing:
- Alice uses an n-th generation input dataset x and produces an (n+1)-st generation output dataset z
- ... as part of run $R_A$ of workflow $W_A$
- ... output z is published in some data-space
- Bob uses Alice's output z and produces an (n+2)-nd generation dataset v
- ... using workflow $W_B$, possibly with pre-processing f
- Alice and Bob may not know each other

Motivation: Virtual Joint Experiments
- How do we ensure that Charlie gets a complete account of the history of $W_C$'s outputs?
- How do we ensure that Alice gets her due (partial) credit when Charlie uses Bob's data v?
→ traces $T_A$ and $T_B$ will be critical
→ we need to compose them to obtain $T_C$
We can view the composition $W_C$ as a new, virtual workflow.

Provenance Composition: the Data Tree of Life (DToL)
We can formulate our questions in terms of the provenance of the datasets produced by the virtual workflow $W_C$:
- What is the complete provenance of v?
Answering this question requires tracing v's derivation all the way back to x. To achieve this, we need to ensure that:
- $T_A$ and $T_B$ are properly connected
- provenance queries run seamlessly over and across $T_A$ and $T_B$

Test scenario: 1st Provenance Challenge Workflow
DataONE Summer-of-Code Project:
- Split the First Provenance Challenge workflow at various points
- Publish Part-I from system X, use it as input for Part-II on system Y
- X, Y in { Kepler/SDF, Kepler/COMAD, Taverna }

Common Model of Provenance (approx. OPM)
Data provenance for a single workflow run is well understood.
Workflow spec: digraph $W = (V_W, E_W)$
- $V_W = A \cup C$: actors $A$ (processors), channels $C$ (FIFO data buffers)
- $E_W = E_{in} \cup E_{out}$: in edges $E_{in} \subseteq C \times A$, out edges $E_{out} \subseteq A \times C$
Trace graph: acyclic digraph $T = (V_T, E_T)$
- $V_T = I \cup D$ (invocations $I$, data $D$)
- $E_T = E_{read} \cup E_{write}$: read edges $E_{read} \subseteq D \times I$, write edges $E_{write} \subseteq I \times D$
$T_A$ is a trace instance of $W_A$: there is a homomorphism $h: T_A \to W_A$, e.g.
$h(x_1 \to a_1) = h(x_2 \to a_2) = X \to A$, $h(a_1 \to y_1) = h(a_2 \to y_2) = A \to Y$, ...
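The trace/workflow relationship on this slide can be made concrete with a small sketch. It assumes that in-edges run from a channel to an actor, which is the orientation implied by $h(x_1 \to a_1) = X \to A$. The class names `Workflow` and `Trace` and the function `is_homomorphism` are illustrative only and do not come from Kepler, Taverna, or OPM.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    # Assumption: in-edges are (channel, actor), out-edges are (actor, channel).
    in_edges: set = field(default_factory=set)
    out_edges: set = field(default_factory=set)


@dataclass
class Trace:
    read: set = field(default_factory=set)   # (data, invocation) pairs
    write: set = field(default_factory=set)  # (invocation, data) pairs


def is_homomorphism(trace, workflow, h):
    """Return True if h maps every trace edge onto a workflow edge."""
    reads_ok = all((h[d], h[i]) in workflow.in_edges for d, i in trace.read)
    writes_ok = all((h[i], h[d]) in workflow.out_edges for i, d in trace.write)
    return reads_ok and writes_ok


# The example from the slide: two invocations a1, a2 of actor A.
W_A = Workflow(in_edges={("X", "A")}, out_edges={("A", "Y")})
T_A = Trace(
    read={("x1", "a1"), ("x2", "a2")},
    write={("a1", "y1"), ("a2", "y2")},
)
h = {"x1": "X", "x2": "X", "a1": "A", "a2": "A", "y1": "Y", "y2": "Y"}
print(is_homomorphism(T_A, W_A, h))
```

A mapping that sends a written data item to the wrong channel fails the check, which is one cheap way to validate a recorded trace against its workflow specification.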

Data and Invocation Dependencies (ddep, idep)
- read and write are the natural observables for a workflow run
- possible additional relations (recorded explicitly or inferred from read and write):
  - invocation dependencies (idep): "$a_2$ depends on $a_1$" because $a_1$ has written data $d$ and $a_2$ has read $d$
  - data dependencies (ddep): "$d_2$ depends on $d_1$" because some actor invocation $a$ read $d_1$ prior to writing $d_2$
(Note: in some models of computation the rules above are not correct)
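The two inference rules described in prose here can be sketched as set comprehensions over the observed read and write edges. This is a minimal illustration under one simplifying assumption: the "prior to writing" ordering is not modelled, so any read by an invocation is taken to precede all of its writes. The function names mirror the slide's relation names; the example data is made up.

```python
def idep(read, write):
    """Pairs (a2, a1): invocation a2 read a data item that a1 wrote."""
    return {
        (a2, a1)
        for (a1, d_written) in write
        for (d_read, a2) in read
        if d_written == d_read
    }


def ddep(read, write):
    """Pairs (d2, d1): some invocation read d1 and wrote d2.

    Assumption: reads are treated as happening before writes of the
    same invocation; event ordering is not checked.
    """
    return {
        (d2, d1)
        for (d1, reader) in read
        for (writer, d2) in write
        if reader == writer
    }


# Hypothetical two-step run: a1 turns x into y, a2 turns y into z.
read = {("x", "a1"), ("y", "a2")}
write = {("a1", "y"), ("a2", "z")}
print(idep(read, write))
print(ddep(read, write))
```

As the slide notes, these rules do not hold in every model of computation, so a real system may need to record the dependencies explicitly instead of inferring them.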

Provenance queries
Local ("non-closure") queries on a trace $T$:
(1) Find the data and traces published by Alice / Bob
(2) Find the inputs, outputs, and intermediate data products of $T$
(3) Find (selected) actors and channels used in $T$
(4) Find the inputs and outputs of an invocation $a_i$ in $T$
These are easy and not very interesting, e.g. the answer to (3) is just the set of nodes in $h(T)$.
Closure queries operate on the transitive closure ddep* over ddep:
- suppose ddep* spans multiple traces $T_A$, $T_B$
- then the standard closure query must be defined so that it operates on the composition of $T_A$ and $T_B$
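For a single trace, a closure query amounts to a graph traversal over ddep. The sketch below assumes ddep is given as a set of pairs (d2, d1) meaning "d2 depends on d1"; the function name `ddep_closure` and the example data are hypothetical.

```python
def ddep_closure(ddep, start):
    """Return every data item that `start` transitively depends on."""
    parents = {}
    for d2, d1 in ddep:
        parents.setdefault(d2, set()).add(d1)

    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical single trace: z depends on y, y depends on x.
ddep_A = {("y", "x"), ("z", "y")}
print(ddep_closure(ddep_A, "z"))
```

Run on one trace this is straightforward. Run on the plain union of two traces, the traversal stops wherever the two traces do not share a data identifier, which is the difficulty the following slides address.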

Issues in Provenance Composition
Closure queries now must span multiple provenance traces
- heterogeneity of both workflow and provenance models
Main problems and approaches:
I - Trace disconnect:
- problem: traces that should "join" on the shared data are really disconnected
- approach: make the data sharing process itself provenance-aware
II - Model heterogeneity:
- problem: different workflow and provenance models
- approach: common provenance model with local → global mapping
III - Data identifiers mismatch:
- problem: different workflows adopt different data identification schemes
- approach: assert data equivalence as part of provenance

Part I - Provenance Stitching
The missing link: make every data copy step provenance-aware
- $r$: data reference in store $S$
- trace-equivalence of data items $d$ in $S$, $d'$ in $S'$: $d \simeq d'$ if $d'$ is obtained by copying $d$ from $S$ to $S'$
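One way to picture a provenance-aware copy step is a copy function that records the equivalence between the source and target references as a side effect. The sketch below is an assumption-laden illustration: the `Store` class, its reference format, and the `equivalences` set are invented here and are not the prototype's actual interfaces.

```python
import itertools


class Store:
    """Toy data store that mints store-specific references (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.items = {}
        self._counter = itertools.count(1)

    def put(self, value):
        ref = f"{self.name}:{next(self._counter)}"
        self.items[ref] = value
        return ref


def copy(r, src, dst, equivalences):
    """Copy the item referenced by r from src to dst and record r ≃ r'."""
    r_new = dst.put(src.items[r])
    equivalences.add((r, r_new))  # the trace-equivalence assertion
    return r_new


S = Store("S")
S_prime = Store("S_prime")
equivalences = set()

r = S.put("dataset z")
r_copy = copy(r, S, S_prime, equivalences)
print(r, r_copy, equivalences)
```

The point of the design is that the equivalence is captured at the moment of copying, so nobody has to reconstruct it later by comparing data contents.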

Part II - Mapping to a Common Provenance Model
Mapping rules (= code, queries) are defined from the Kepler and Taverna provenance models to the common model (details omitted):
- In the result $T_P$, each reference $r$ found in $T_S$ is replaced with $\rho(r)$
- OPM is used as the intermediate target model
- ... it doesn't "nail" everything, which is a mixed blessing ...
- ... but team-work made it work!
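The renaming step, in which each reference $r$ in $T_S$ is replaced with $\rho(r)$, can be sketched as follows. The actual mapping rules are omitted on the slide, so this only illustrates the substitution itself. It assumes the trace is a pair of read/write edge sets and that references missing from the map are left unchanged; both are assumptions, and the names `rename_trace` and `rho` are illustrative.

```python
def rename_trace(read, write, rho):
    """Apply the renaming map rho to every data reference in a trace.

    Invocation identifiers are left untouched. Assumption: a reference
    with no entry in rho keeps its original name.
    """
    new_read = {(rho.get(d, d), i) for d, i in read}
    new_write = {(i, rho.get(d, d)) for i, d in write}
    return new_read, new_write


# Hypothetical store-specific trace T_S and renaming map.
read_S = {("S:1", "b1")}
write_S = {("b1", "S:2")}
rho = {"S:1": "pub:z"}

read_P, write_P = rename_trace(read_S, write_S, rho)
print(read_P, write_P)
```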

Part III - Data Identifier Reconciliation
We have seen that the copy operation $r' = \mathrm{copy}(r, S, S')$ on shared data store $S$ generates a data equivalence assertion.
It also keeps track of ID mappings, which are added to a renaming map from a set of $S$-specific references to a set of public references.

Extended (across-runs) Provenance Queries
Closure queries are redefined on the extended provenance trace, which includes the trace-equivalences $d \simeq d'$.
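A minimal sketch of a closure query over the extended trace is given below. The slide's formal definition is not available in this transcript, so the sketch makes its own assumption: each equivalence $d \simeq d'$ is treated as a symmetric link that the traversal may cross in either direction. The function name and the example identifiers are hypothetical.

```python
def extended_closure(ddep, equiv, start):
    """Transitive dependencies of `start` across traces joined by equiv.

    ddep:  set of (d2, d1) pairs, "d2 depends on d1"
    equiv: set of (d, d_copy) pairs, assumed symmetric here
    """
    links = {}
    for d2, d1 in ddep:
        links.setdefault(d2, set()).add(d1)
    for d, d_copy in equiv:
        links.setdefault(d, set()).add(d_copy)
        links.setdefault(d_copy, set()).add(d)

    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in links.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Alice's trace: z depends on x. Bob's trace: v depends on his copy of z.
ddep_all = {("z", "x"), ("v", "z_copy")}
equiv = {("z", "z_copy")}

print(extended_closure(ddep_all, set(), "v"))  # stops at the copy
print(extended_closure(ddep_all, equiv, "v"))  # reaches Alice's input x
```

Without the equivalence, the provenance of v ends at Bob's local copy. With it, the query reaches x, which is what giving Alice her due credit requires.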

Prototype Architecture

Conclusions 1/2
- In theory, provenance interoperability should be solved (or easy) using e.g. OPM
- In practice it isn't (cf. Provenance Challenge workshops), e.g.
  - different mappings to OPM
  - different identifier schemes
  - traces broken "at the seams"
- The Summer-of-Code DToL prototype demonstrates the feasibility of provenance-aware collaboration / workflow interoperation through data
  - Extends the potential of provenance analysis beyond isolated workflow-based experiments
- Findings relevant for data preservation in …
  - Tracing data access is key

Conclusions 2/2
DataONE:
- Data Tree-of-Life (DToL Summer Project)
- Runtime interoperability of workflow (wf) systems can be very hard
- ... and the benefits are not clear (unless a "layered" approach with different roles for wf systems is used)
→ wf provenance interoperability to the rescue!
Next Steps:
- DataONE Working Group on Provenance for Scientific Workflows
- Develop DOPM (DataONE Provenance Model; OPM++)