Karma Provenance Framework v2
Provenance Challenge Workshop/GGF18
Yogesh L. Simmhan
Beth Plale, Dennis Gannon, Srinath Perera
Indiana University


2 [2/25] 2006-09-13 Outline
• Architecture of Karma
• Workflow Setup & Collecting Provenance
• Provenance Traces
• “Canonical” Challenge Queries
• Suggested Variations

3 Provenance Collection: Challenges & Uses
• Linked Environments for Atmospheric Discovery (LEAD) project
  - Weather & severe storm prediction applications
• Provenance on workflow (process) & data products at fine granularity
• Dynamic, long-running workflows
• Helps scientists to search for workflows & data products, estimate data quality, track workflow execution, and analyze & mine data products from runs

4 Karma Provenance Framework
• Lightweight: does not duplicate existing metadata cataloging effort
  - myLEAD personal metadata catalog
  - ResCat service & data registry
• Glue to integrate metadata on data & services with runtime workflow information
• Scalability
  - 1–500 users, 100s of workflows, 10,000s of data products [1]
[1] Performance Evaluation of the Karma Provenance Framework, Simmhan, Y., et al.; IPAW, 2006

5 Karma Provenance Framework
• Key provenance activities generated during the lifetime of a workflow
  - Workflow | Service Invoked
  - Data Consumed
  - Data Produced
  - Sending Response
• Activities modeled as XML messages
• Published asynchronously by service | workflow | client
  - Presently uses the WS-Eventing messaging system
• Activities stored in a relational database
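The activity model above can be illustrated with a minimal Python sketch. It is not Karma's actual message schema or client library: the XML element names, the `build_activity` and `store_activity` helpers, and the data product ID are hypothetical. Only the table and column names (`notification_table`, `source_id`, `notification_type`, `notification_xml`) are borrowed from the SQL shown on the Q4 slide, and an in-memory SQLite database stands in for the relational store.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_activity(activity_type, source_id, timestamp, data_ids=()):
    """Serialize one provenance activity as XML (hypothetical element names)."""
    root = ET.Element("activity", {"type": activity_type})
    ET.SubElement(root, "source").text = source_id
    ET.SubElement(root, "timestamp").text = timestamp
    for data_id in data_ids:
        ET.SubElement(root, "dataProduct").text = data_id
    return ET.tostring(root, encoding="unicode")


def store_activity(conn, activity_type, source_id, xml_text):
    """Persist one activity, as a listener might do for each notification."""
    conn.execute(
        "INSERT INTO notification_table "
        "(source_id, notification_type, notification_xml) VALUES (?, ?, ?)",
        (source_id, activity_type, xml_text),
    )


# In-memory database standing in for the activity DB (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notification_table "
    "(source_id TEXT, notification_type TEXT, notification_xml TEXT)"
)

# Hypothetical DataProduced activity published by a service.
xml_text = build_activity(
    "DataProduced",
    "SoftmeanService",
    "2006-09-13T10:00:00Z",
    ["lead:uuid:example-output"],
)
store_activity(conn, "DataProduced", "SoftmeanService", xml_text)
print(xml_text)
```

Keeping the full XML alongside a few indexed columns is one way to support both API-level queries and ad hoc SQL over the raw messages.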

6 Karma Architecture
[Figure: architecture diagram. Components shown: Workflow Engine; Service 1, Service 2, …, Service 9, Service 10; Message Bus (WS-Eventing); WS-Messenger Notification Broker; Karma Provenance Service (Service API, Provenance Listener, Activity DB, Provenance Query API); Provenance Browser Client]
• 1 workflow instance, 10 data products consumed & produced by each service (10P/10C)
• Publish provenance activities as async notifications
  - ServiceInvoked & SendingResponse, DataProduced & DataConsumed activities
  - WorkflowInvoked & SendingResponse activities
• Subscribe & listen to activity notifications
• Query for workflow, process, & data provenance
[1] A Framework for Collecting Provenance in Data-Centric Scientific Workflows, Simmhan, Y., et al., Submitted to ICWS Conference, 2006

7 Provenance Challenge Workflow
• Applications modeled as web services
  - Generic Factory toolkit creates web-service wrappers for command-line applications
  - Service invokes a shell script/application, passing command-line arguments
  - Created services are automatically instrumented to generate provenance using the Karma client library
• Workflow composed as GPEL* script
  - XBaya workflow composer GUI
  - Central GPEL workflow engine orchestrates execution
* Grid Process Execution Language, an extension of the Business Process Execution Language (BPEL)

8 Provenance Challenge Workflow

9 Provenance Traces – Building Block Queries
• Data Provenance: get[Recursive]DataProvenance
  - What (ID), where (URL), when (Timestamp)
  - How (Process, inputs)
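As a rough illustration of what a data provenance record and its recursive form might carry, here is a minimal Python sketch. The `DataProvenance` class, the in-memory `records` dictionary, the URLs, and every service name and data ID other than 'SoftmeanService' and 'lead:uuid:1157946992-atlas-x.gif' are assumptions made for the example; this is not the Karma API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataProvenance:
    data_id: str                # what
    url: str                    # where
    timestamp: str              # when
    produced_by: Optional[str]  # how: producing process
    inputs: List[str] = field(default_factory=list)  # how: input data IDs


def get_recursive_data_provenance(records, data_id, seen=None):
    """Return the record for data_id followed by those of all its ancestors."""
    seen = set() if seen is None else seen
    if data_id in seen or data_id not in records:
        return []  # already visited, or no provenance recorded for this ID
    seen.add(data_id)
    record = records[data_id]
    trace = [record]
    for input_id in record.inputs:
        trace.extend(get_recursive_data_provenance(records, input_id, seen))
    return trace


# Hypothetical three-step chain ending in the Atlas X Graphic.
GIF = "lead:uuid:1157946992-atlas-x.gif"
PGM = "lead:uuid:example-atlas-x.pgm"
HDR = "lead:uuid:example-atlas.hdr"
records = {
    GIF: DataProvenance(GIF, "http://example.org/atlas-x.gif",
                        "2006-09-13T10:03:00Z", "ConvertService", [PGM]),
    PGM: DataProvenance(PGM, "http://example.org/atlas-x.pgm",
                        "2006-09-13T10:02:00Z", "SlicerService", [HDR]),
    HDR: DataProvenance(HDR, "http://example.org/atlas.hdr",
                        "2006-09-13T10:01:00Z", "SoftmeanService", []),
}

trace = get_recursive_data_provenance(records, GIF)
for record in trace:
    print(record.data_id, "<-", record.produced_by)
```

The `seen` set guards against revisiting a data product that feeds more than one downstream step.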

10 Provenance Traces – Building Block Queries
• Process Provenance: getProcessProvenance
  - What (ID), when (Timestamp), who (Invoker)
  - State (execution/completion status)
  - Input & output data products

11 Provenance Traces – Building Block Queries
• Workflow Trace: getWorkflowTrace
  - What (ID), when (Timestamp), who (Invoker)
  - State (execution/completion status)
  - Process provenance of workflow steps


13 Provenance Challenge Queries
• ! Answered by Karma Service API directly
• Answered by Karma Service API, with post-processing by client
• ~ Answered by access to backend DB (SQL)
•  Not answered
Query 1 2 3 4 5 6 7 8 9
Result ! ! ~ ~ ~ ~ 

14 Provenance Challenge Queries: Q1
• Find everything that caused Atlas X Graphic to be as it is
• ! Answered by Karma Service API directly
  - This is the recursive data provenance of the Atlas X Graphic file
  - A call to getRecursiveDataProvenance('lead:uuid:1157946992-atlas-x.gif') returns this [www]

15 Provenance Challenge Queries: Q2
• Find the process that led to Atlas X Graphic, excluding all prior to softmean
• Answered by Karma Service API, with post-processing by client
  1. First call getDataProvenance
  2. Then recursively get data provenance till 'SoftmeanService' is seen
  - Returns this [www]

    1. let $dataList := ['lead:uuid:1157946992-atlas-x.gif']
    2. while ($dataList != empty) do
         // get data provenance for this level
         a. $dataProvenance = karma.getDataProvenance($dataList[0])
         // print process information & remove data from list
         b. print $dataProvenance; $dataList.delete(0)
         // found Softmean; stop
         c. if ($dataProvenance.getProducedBy() == 'SoftmeanService') break
         // get input data used by this data & recurse up the tree
         d. foreach ($inputData in $dataProvenance.getUsingData()) do
              i. $dataList.add($inputData)
    3. end
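The client-side loop above can be sketched as runnable Python. `StubKarmaClient` is a hypothetical stand-in for the real Karma client, and every service name and data ID other than 'SoftmeanService' and the Atlas X Graphic ID is invented for the example. As in the pseudocode, reaching Softmean stops the whole traversal, which suits a linear chain of steps.

```python
from collections import deque


class StubKarmaClient:
    """Hypothetical stand-in for a Karma client; serves canned provenance."""

    def __init__(self, provenance):
        self._provenance = provenance

    def get_data_provenance(self, data_id):
        # Returns None when no producer is recorded for this data product.
        return self._provenance.get(data_id)


def process_trace_until(client, start_id, stop_service):
    """Walk up from start_id, collecting producers until stop_service is seen."""
    pending = deque([start_id])
    trace = []
    while pending:
        prov = client.get_data_provenance(pending.popleft())
        if prov is None:
            continue  # raw input with no recorded producer
        trace.append(prov["produced_by"])
        if prov["produced_by"] == stop_service:
            break  # found the stop service; ignore everything before it
        pending.extend(prov["using_data"])
    return trace


# Hypothetical provenance records for a linear chain of services.
client = StubKarmaClient({
    "lead:uuid:1157946992-atlas-x.gif": {
        "produced_by": "ConvertService",
        "using_data": ["lead:uuid:example-atlas-x.pgm"],
    },
    "lead:uuid:example-atlas-x.pgm": {
        "produced_by": "SlicerService",
        "using_data": ["lead:uuid:example-atlas.hdr"],
    },
    "lead:uuid:example-atlas.hdr": {
        "produced_by": "SoftmeanService",
        "using_data": ["lead:uuid:example-resliced.img"],
    },
})

trace = process_trace_until(
    client, "lead:uuid:1157946992-atlas-x.gif", "SoftmeanService"
)
print(trace)
```

For a workflow with several branches, a variant that prunes only the branch where the stop service appears, rather than breaking out of the loop, may be the better fit.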

16 Provenance Challenge: Q4
• Find all invocations of align_warp with parameter "-m 12" that ran on a Monday
• ~ Answered by access to backend DB (SQL)
  1. Use SQL query to get matching invocations
  2. Call getProcessProvenance to get description of align_warp
  - Returns this [www]

    SELECT invokee.workflow_id, invokee.service_id,
           invokee.workflow_node_id, invokee.workflow_timestep,
           invoker.workflow_id, invoker.service_id,
           invoker.workflow_node_id, invoker.workflow_timestep
    FROM invocation_state_table invocation,
         entity_table invokee,
         entity_table invoker,
         notification_table notifications
    WHERE invokee.entity_id = invocation.invokee_id
      AND invoker.entity_id = invocation.invoker_id
      AND notifications.source_id = invocation.invokee_id
      AND notifications.notification_type = 'ServiceInvoked'
      AND invokee.service_id = 'urn:qname:'
      AND notifications.notification_xml LIKE '% 12 %'
      AND DayOfWeek(invocation.request_receive_time) = 2;  -- 1=Sunday, 2=Monday,...
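The two filters in this query (a parameter match inside the stored message and a day-of-week test on the request time) can be tried out with a small Python and SQLite sketch. The single simplified `invocation` table, its rows, and the `<args>` element are assumptions for illustration and do not reflect the Karma schema. Note the numbering difference: MySQL's DAYOFWEEK() returns 1 for Sunday and 2 for Monday, while SQLite's strftime('%w', ...) returns '0' for Sunday and '1' for Monday.

```python
import sqlite3

# Simplified, hypothetical table; the real schema joins several tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invocation "
    "(service_id TEXT, request_receive_time TEXT, notification_xml TEXT)"
)
rows = [
    ("align_warp", "2006-09-11 09:30:00", "<args>-m 12 -q</args>"),  # Monday
    ("align_warp", "2006-09-12 09:30:00", "<args>-m 12 -q</args>"),  # Tuesday
    ("align_warp", "2006-09-11 10:00:00", "<args>-m 6 -q</args>"),   # Monday, other value
]
conn.executemany("INSERT INTO invocation VALUES (?, ?, ?)", rows)

# Keep invocations whose message contains "-m 12 " and that ran on a Monday.
matches = conn.execute(
    "SELECT service_id, request_receive_time FROM invocation "
    "WHERE service_id = 'align_warp' "
    "AND notification_xml LIKE '%-m 12 %' "
    "AND strftime('%w', request_receive_time) = '1'"
).fetchall()
print(matches)
```

Matching on the raw XML with LIKE is fragile; if parameters were stored in their own column, the filter could be an exact comparison instead.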

17 Provenance Challenge: Q9
• Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files.
•  Not answered
  - We do not expect to answer such queries through the provenance system
  - We push the provenance information to external metadata management systems such as myLEAD, which can answer such “join” queries on data product metadata and provenance

18 Variations of Workflow
• Workflows with loops
• Workflows whose structure changes dynamically or, as a simpler case, workflows with conditional branches
• Hierarchical composition of workflows
  - Workflows invoking other workflows
  - ~ Similar to user-views (UPenn), nested workflows (myGrid), …

19 Variations of Queries
• Find all [workflows | processes] with a particular execution status [completed | failed | waiting for input]
  - Dynamic attribute of provenance?
• Query for client view and service view of the provenance
  - Check for differences

20 Acknowledgements
• Alek Slominski (GPEL Engine)
• Satoshi Shirasuna (XBaya Composer)
• LEAD Members
• NSF
Questions

21 Sample Activities Published
• More here [www]

22 Karma DB Schema

