Karma Provenance Framework v2
Provenance Challenge Workshop/GGF18
Yogesh L. Simmhan, Beth Plale, Dennis Gannon, Srinath Perera
Indiana University

[2/25] Outline
- Architecture of Karma
- Workflow Setup & Collecting Provenance
- Provenance Traces
- “Canonical” Challenge Queries
- Suggested Variations

[3/25] Provenance Collection: Challenges & Uses
- Linked Environments for Atmospheric Discovery (LEAD) project
  - Weather & severe storm prediction applications
- Provenance on workflows (processes) & data products at fine granularity
- Dynamic, long-running workflows
- Helps scientists search for workflows & data products, track workflow execution, and analyze & mine data products from runs

[4/25] Karma Provenance Framework
- Lightweight: does not duplicate existing metadata cataloging efforts
  - myLEAD personal metadata catalog
  - ResCat service & data registry
- Glue to integrate metadata on data & services with runtime workflow information
- Scalability [1]: 500 users, 100s of workflows, 10,000s of data products

[1] Simmhan, Y., et al., "Performance Evaluation of the Karma Provenance Framework", IPAW, 2006.

[5/25] Karma Architecture [2]
- Workflow instance: Services 1 to 10, orchestrated by the Workflow Engine; 10 data products are consumed & produced by each service
- Workflow Engine publishes Workflow–Started & –Finished activities
- Services publish Application–Started & –Finished and Data–Produced & –Consumed activities
- Message bus (WS-Eventing service API): the WS-Messenger Notification Broker carries the provenance activities, published as notifications
- Karma Provenance Service: the Provenance Listener subscribes & listens to activity notifications and stores them in the Activity DB
- Provenance Query API: the Provenance Browser Client queries it for workflow, process & data provenance

[2] Simmhan, Y., et al., "A Framework for Collecting Provenance in Data-Centric Scientific Workflows", submitted to ICWS, 2006.
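The publish/subscribe flow in this architecture can be sketched in-process. This is a minimal sketch, not Karma's implementation: the hypothetical `NotificationBroker` stands in for the WS-Messenger broker, the hypothetical `ProvenanceListener` keeps activities in a list in place of the Activity DB, and activities are plain dicts with illustrative type names rather than WS-Eventing XML messages.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]

class NotificationBroker:
    """Hypothetical stand-in for the WS-Messenger notification broker."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, activity: dict) -> None:
        # Deliver the activity to every subscriber of the topic.
        for handler in self._subscribers[topic]:
            handler(activity)

class ProvenanceListener:
    """Hypothetical listener; a list plays the role of the Activity DB."""

    def __init__(self, broker: NotificationBroker, topic: str) -> None:
        self.activity_db: List[dict] = []
        broker.subscribe(topic, self.activity_db.append)

broker = NotificationBroker()
listener = ProvenanceListener(broker, topic="provenance")

# The workflow engine publishes workflow activities; an instrumented
# service publishes application and data activities.
for activity in [
    {"type": "WorkflowStarted", "workflow": "wf-1"},
    {"type": "ApplicationStarted", "service": "Service1"},
    {"type": "DataConsumed", "service": "Service1", "data": "d1"},
    {"type": "DataProduced", "service": "Service1", "data": "d2"},
    {"type": "ApplicationFinished", "service": "Service1"},
    {"type": "WorkflowFinished", "workflow": "wf-1"},
]:
    broker.publish("provenance", activity)

print([a["type"] for a in listener.activity_db])
```

In this sketch the publishers never reference the listener directly; they only know the broker and a topic name.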

[6/25] Provenance Challenge Workflow
- Applications modeled as web services
  - GFac toolkit creates a service for each command-line application
  - The service invokes a shell-script wrapper of the application, passing command-line arguments
  - Created services are automatically instrumented to generate provenance using the Karma client library
- Workflow composed as a GPEL* script
  - XBaya workflow composer GUI
  - Central GPEL workflow engine orchestrates execution

*Grid Process Execution Language, an extension of the Business Process Execution Language (BPEL)
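The wrapping and instrumentation step can be illustrated with a short sketch. The hypothetical `run_instrumented` helper plays the role of a GFac-generated service: it runs a command-line application as a subprocess and publishes Application–Started/Finished and Data–Consumed/Produced activities around it. The `publish` callback, activity field names, and file names are illustrative assumptions, not the Karma client library API, and the command is a placeholder Python one-liner instead of a real shell-script wrapper.

```python
import subprocess
import sys
from datetime import datetime, timezone
from typing import Callable, List

def run_instrumented(service: str, argv: List[str], inputs: List[str],
                     outputs: List[str], publish: Callable[[dict], None]) -> int:
    """Run a command-line application and publish provenance activities
    around it (hypothetical stand-in for a GFac-generated service)."""
    def emit(kind: str, **fields) -> None:
        publish({"type": kind, "service": service,
                 "time": datetime.now(timezone.utc).isoformat(), **fields})

    emit("ApplicationStarted")
    for data_id in inputs:
        emit("DataConsumed", data=data_id)
    # Invoke the application, passing its command-line arguments.
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode == 0:
        # Only report outputs when the application succeeded.
        for data_id in outputs:
            emit("DataProduced", data=data_id)
    emit("ApplicationFinished", status=result.returncode)
    return result.returncode

activities: List[dict] = []
rc = run_instrumented(
    "align_warp",
    [sys.executable, "-c", "print('aligned')"],  # placeholder command
    inputs=["anatomy1.img", "reference.img"],
    outputs=["warp1.warp"],
    publish=activities.append,
)
print(rc, [a["type"] for a in activities])
```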

[7/25] Provenance Challenge Workflow

[8/25] Provenance Traces
- Data Provenance: get[Recursive]DataProvenance
  - What (ID), where (URL), when (timestamp)
  - How (process, inputs)
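The four questions a data provenance record answers map onto a small record type. A minimal sketch, assuming a hypothetical `DataProvenance` dataclass and an in-memory index; `get_data_provenance` only mirrors the one-level shape of getDataProvenance, and the URLs, timestamps, and service names are sample values.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional

@dataclass
class DataProvenance:
    data_id: str                # what
    url: str                    # where
    timestamp: datetime         # when
    produced_by: Optional[str]  # how: producing process (None for raw inputs)
    inputs: List[str] = field(default_factory=list)  # how: data that process consumed

# Hypothetical in-memory index keyed by data product ID (sample values).
STORE: Dict[str, DataProvenance] = {
    "atlas-x.pgm": DataProvenance(
        "atlas-x.pgm", "file:///data/atlas-x.pgm", datetime(2006, 1, 1, 12, 5),
        "SlicerService", ["atlas.hdr", "atlas.img"]),
    "anatomy1.img": DataProvenance(
        "anatomy1.img", "file:///data/anatomy1.img", datetime(2006, 1, 1, 9, 0), None),
}

def get_data_provenance(data_id: str) -> Optional[DataProvenance]:
    """One level of data provenance: what, where, when, and how."""
    return STORE.get(data_id)

rec = get_data_provenance("atlas-x.pgm")
if rec is not None:
    print(rec.produced_by, rec.inputs)
```

Following `inputs` back to their own records, level by level, gives the recursive variant.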

[9/25] Provenance Traces
- Process Provenance: getProcessProvenance
  - What (ID), when (timestamp), who (invoker)
  - State (execution/completion status)
  - Input & output data products
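A process provenance record can be derived by folding the activity notifications published for one invocation. A sketch under stated assumptions: the `ProcessProvenance` fields mirror the slide, while the activity dict format, the exit-code `status` convention, and `process_provenance` itself are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List

class State(Enum):
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"

@dataclass
class ProcessProvenance:
    process_id: str      # what
    timestamp: datetime  # when (earliest activity seen)
    invoker: str         # who
    state: State         # execution/completion status
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)

def process_provenance(process_id: str, invoker: str,
                       activities: List[dict]) -> ProcessProvenance:
    """Fold one process's activity notifications into a provenance record."""
    proc = ProcessProvenance(process_id, min(a["time"] for a in activities),
                             invoker, State.RUNNING)
    for a in activities:
        if a["type"] == "DataConsumed":
            proc.inputs.append(a["data"])
        elif a["type"] == "DataProduced":
            proc.outputs.append(a["data"])
        elif a["type"] == "ApplicationFinished":
            proc.state = State.COMPLETED if a["status"] == 0 else State.FAILED
    return proc

t0 = datetime(2006, 1, 1, 12, 0)
acts = [
    {"type": "ApplicationStarted", "time": t0},
    {"type": "DataConsumed", "time": t0, "data": "warp1.warp"},
    {"type": "DataProduced", "time": t0.replace(minute=3), "data": "resliced1.img"},
    {"type": "ApplicationFinished", "time": t0.replace(minute=4), "status": 0},
]
p = process_provenance("reslice-1", "workflow-engine", acts)
print(p.state.value, p.inputs, p.outputs)
```

A process with no ApplicationFinished activity stays RUNNING, which is one way to surface invocations that never completed.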

[10/25] Provenance Traces
- Workflow Trace: getWorkflowTrace
  - What (ID), when (timestamp), who (invoker)
  - State (execution/completion status)
  - Process provenance of workflow steps
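A workflow trace bundles the process provenance of its steps with workflow-level what/when/who/state. A minimal sketch, assuming hypothetical `StepProvenance`/`WorkflowTrace` types and a hypothetical `get_workflow_trace` helper; the rule used to derive the workflow state from its steps is an illustrative choice, not Karma's.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class StepProvenance:
    node_id: str
    service: str
    started: datetime
    state: str  # "running", "completed" or "failed"

@dataclass
class WorkflowTrace:
    workflow_id: str             # what
    invoker: str                 # who
    steps: List[StepProvenance]  # process provenance of workflow steps

    @property
    def started(self) -> Optional[datetime]:  # when
        return min((s.started for s in self.steps), default=None)

    @property
    def state(self) -> str:
        """Derive workflow status from its steps (an illustrative rule)."""
        if not self.steps:
            return "not started"
        if any(s.state == "failed" for s in self.steps):
            return "failed"
        if all(s.state == "completed" for s in self.steps):
            return "completed"
        return "running"

def get_workflow_trace(workflow_id: str, invoker: str,
                       steps: List[StepProvenance]) -> WorkflowTrace:
    """Assemble a trace with steps ordered by start time."""
    return WorkflowTrace(workflow_id, invoker, sorted(steps, key=lambda s: s.started))

t = datetime(2006, 1, 1, 12, 0)
trace = get_workflow_trace("wf-42", "alice", [
    StepProvenance("n2", "SoftmeanService", t.replace(minute=10), "running"),
    StepProvenance("n1", "AlignWarpService", t, "completed"),
])
print([s.service for s in trace.steps], trace.state, trace.started)
```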

[11/25]

[12/25] Provenance Challenge Queries
- ! Answered by Karma Service API directly
- Answered by Karma Service API, with post-processing by client
- ~ Answered by access to backend DB (SQL)
- Not answered

Query Result: ! ! ~ ~ ~ ~

[13/25] Provenance Challenge Queries: Q1
- Find everything that caused Atlas X Graphic to be as it is
- ! Answered by Karma Service API directly
  - This is the recursive data provenance of the Atlas X Graphic file
  - A call to getRecursiveDataProvenance('lead:uuid: atlas-x.gif') returns it
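Recursive data provenance is a walk from the product back through every producing process and that process's inputs. A minimal sketch over a hypothetical in-memory graph (`PRODUCED_BY` maps a data ID to its producing process, `INPUTS` maps a process to the data it consumed); the IDs follow the Provenance Challenge workflow but keep only two of its branches, and the function only mimics the intent of getRecursiveDataProvenance.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical provenance graph (two of the align_warp/reslice branches).
PRODUCED_BY: Dict[str, str] = {
    "atlas-x.gif": "convert",
    "atlas-x.pgm": "slicer",
    "atlas.img": "softmean",
    "resliced1.img": "reslice1",
    "resliced2.img": "reslice2",
    "warp1.warp": "align_warp1",
    "warp2.warp": "align_warp2",
}
INPUTS: Dict[str, List[str]] = {
    "convert": ["atlas-x.pgm"],
    "slicer": ["atlas.img"],
    "softmean": ["resliced1.img", "resliced2.img"],
    "reslice1": ["warp1.warp"],
    "reslice2": ["warp2.warp"],
    "align_warp1": ["anatomy1.img", "reference.img"],
    "align_warp2": ["anatomy2.img", "reference.img"],
}

def recursive_data_provenance(data_id: str) -> Tuple[Set[str], Set[str]]:
    """Return (all ancestor data products, all processes) behind data_id."""
    data_seen: Set[str] = set()
    processes: Set[str] = set()
    stack = [data_id]
    while stack:
        current = stack.pop()
        process = PRODUCED_BY.get(current)
        if process is None or process in processes:
            continue  # raw input, or process already expanded
        processes.add(process)
        for inp in INPUTS[process]:
            if inp not in data_seen:
                data_seen.add(inp)
                stack.append(inp)
    return data_seen, processes

data, procs = recursive_data_provenance("atlas-x.gif")
print(sorted(procs))
print(sorted(data))
```

Tracking expanded processes keeps shared inputs such as `reference.img` from being visited twice.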

[14/25] Provenance Challenge Queries: Q2
- Find the process that led to Atlas X Graphic, excluding everything prior to softmean
- Answered by Karma Service API, with post-processing by client
  1. First call getDataProvenance
  2. Then recursively get data provenance until 'SoftmeanService' is seen

  1. let $dataList := ['lead:uuid: atlas-x.gif']
  2. while ($dataList != empty) do
       // get data provenance for this level
       a. $dataProvenance := karma.getDataProvenance($dataList[0])
       // print process information & remove data from list
       b. print $dataProvenance; $dataList.delete(0)
       // found Softmean: do not recurse past this data product
       c. if ($dataProvenance.getProducedBy() == 'SoftmeanService') continue
       // get input data used by this data & recurse up the tree
       d. foreach ($inputData in $dataProvenance.getUsingData()) do
            i. $dataList.add($inputData)
  3. end
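The client-side post-processing for Q2 can be written as a runnable breadth-first walk. This sketch assumes a hypothetical `get_data_provenance(data_id)` that returns the producing service and the input data IDs, stubbed here with a dict whose service names other than `SoftmeanService` are illustrative. It stops expanding a branch, rather than ending the whole walk, once a product made by `SoftmeanService` is reached, so both softmean outputs are reported.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical stub for karma.getDataProvenance: data ID -> (producer, inputs).
PROVENANCE: Dict[str, Tuple[str, List[str]]] = {
    "atlas-x.gif": ("ConvertService", ["atlas-x.pgm"]),
    "atlas-x.pgm": ("SlicerService", ["atlas.hdr", "atlas.img"]),
    "atlas.hdr": ("SoftmeanService", ["resliced1.img", "resliced2.img"]),
    "atlas.img": ("SoftmeanService", ["resliced1.img", "resliced2.img"]),
    "resliced1.img": ("ResliceService", ["warp1.warp"]),
}

def get_data_provenance(data_id: str) -> Optional[Tuple[str, List[str]]]:
    return PROVENANCE.get(data_id)

def provenance_down_to(data_id: str, stop_service: str) -> List[Tuple[str, str]]:
    """Walk up from data_id, not expanding past products of stop_service."""
    visited: List[Tuple[str, str]] = []
    queue = deque([data_id])
    seen = {data_id}
    while queue:
        current = queue.popleft()
        record = get_data_provenance(current)
        if record is None:
            continue  # raw input: nothing produced it
        producer, inputs = record
        visited.append((current, producer))
        if producer == stop_service:
            continue  # found Softmean: do not recurse past it
        for inp in inputs:
            if inp not in seen:
                seen.add(inp)
                queue.append(inp)
    return visited

for data_id, producer in provenance_down_to("atlas-x.gif", "SoftmeanService"):
    print(f"{data_id} <- {producer}")
```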

[15/25] Provenance Challenge: Q4
- Find all invocations of align_warp (with parameter "-m 12") that ran on a Monday
- ~ Answered by access to backend DB (SQL)
  1. Use an SQL query to get matching invocations
  2. Call getProcessProvenance to get the description of each matching align_warp invocation

    SELECT invokee.workflow_id, invokee.service_id, invokee.workflow_node_id, invokee.workflow_timestep,
           invoker.workflow_id, invoker.service_id, invoker.workflow_node_id, invoker.workflow_timestep
    FROM invocation_state_table invocation, entity_table invokee,
         entity_table invoker, notification_table notifications
    WHERE invokee.entity_id = invocation.invokee_id
      AND invoker.entity_id = invocation.invoker_id
      AND notifications.source_id = invocation.invokee_id
      AND notifications.notification_type = 'ServiceInvoked'
      AND invokee.service_id = 'urn:qname:…'
      AND notifications.notification_xml LIKE '% 12 %'
      AND DayOfWeek(invocation.request_receive_time) = 2; -- 1=Sunday, 2=Monday, ...
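The Monday filter depends on the database's day-of-week function: the query above relies on a `DayOfWeek` that numbers 1 = Sunday and 2 = Monday, as MySQL's DAYOFWEEK does. A self-contained sketch with Python's built-in `sqlite3`, assuming a heavily simplified hypothetical `invocations` table rather than the Karma schema and sample rows; SQLite has no `DAYOFWEEK`, so it uses `strftime('%w', ...)`, where 0 = Sunday and 1 = Monday.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE invocations (
        id INTEGER PRIMARY KEY,
        service_id TEXT,
        request_receive_time TEXT,  -- ISO-8601 timestamp
        notification_xml TEXT
    )""")
rows = [
    (1, "align_warp", "2006-09-11 10:00:00", "<args>-m 12 -q</args>"),  # Monday
    (2, "align_warp", "2006-09-12 10:00:00", "<args>-m 12 -q</args>"),  # Tuesday
    (3, "align_warp", "2006-09-18 09:30:00", "<args>-m 9 -q</args>"),   # Monday, other -m
    (4, "reslice",    "2006-09-11 11:00:00", "<args>-m 12</args>"),     # Monday, other service
]
conn.executemany("INSERT INTO invocations VALUES (?, ?, ?, ?)", rows)

query = """
    SELECT id, service_id, request_receive_time
    FROM invocations
    WHERE service_id = ?
      AND notification_xml LIKE ?
      AND strftime('%w', request_receive_time) = '1'  -- 0 = Sunday, 1 = Monday
"""
# Bind parameters instead of splicing strings into the SQL.
matches = conn.execute(query, ("align_warp", "%-m 12 %")).fetchall()
print(matches)
```

Matching on the full `-m 12` argument rather than a bare ` 12 ` makes the text filter less likely to catch unrelated numbers in the notification XML.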

[16/25] Provenance Challenge: Q9
- Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files
- Not answered
  - We do not expect to answer such queries through the provenance system
  - We push the provenance information to external metadata management systems such as myLEAD, which can answer such “join” queries on data product metadata and provenance
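Q9 is a join between provenance and descriptive annotations, which the slide delegates to an external catalog such as myLEAD. A minimal sketch of that join, assuming two hypothetical in-memory inputs: `produced_by` (provenance facts that identify graphical atlas files as outputs of convert) and `annotations` (standing in for the metadata catalog). Annotation keys other than studyModality, and all values, are sample data.

```python
from typing import Dict

# Hypothetical provenance facts: data ID -> producing process.
produced_by: Dict[str, str] = {
    "atlas-x.gif": "convert",
    "atlas-y.gif": "convert",
    "atlas-z.gif": "convert",
    "atlas.img": "softmean",
}
# Hypothetical annotations held by the metadata catalog.
annotations: Dict[str, Dict[str, str]] = {
    "atlas-x.gif": {"studyModality": "speech", "site": "site-A"},
    "atlas-y.gif": {"studyModality": "visual"},
    "atlas-z.gif": {"studyModality": "tactile", "site": "site-B"},
    "atlas.img": {"studyModality": "audio"},
}
WANTED = {"speech", "visual", "audio"}

def graphical_atlases_by_modality() -> Dict[str, Dict[str, str]]:
    """Join provenance (is it a graphic?) with metadata (which modality?)."""
    result: Dict[str, Dict[str, str]] = {}
    for data_id, ann in annotations.items():
        is_graphic = produced_by.get(data_id) == "convert"
        if is_graphic and ann.get("studyModality") in WANTED:
            # Return every other annotation on the matching file.
            result[data_id] = {k: v for k, v in ann.items() if k != "studyModality"}
    return result

print(graphical_atlases_by_modality())
```

Neither input alone answers the query: `atlas.img` has a wanted modality but is not a graphic, which only provenance reveals.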

[17/25] Variations of Workflow
- Workflows with loops
- Workflows whose structure changes dynamically or, as a simpler case, workflows with conditional branches
- Hierarchical composition of workflows: workflows invoking other workflows

[18/25] Variations of Queries
- Find all [workflows | processes] with a particular execution status [completed | failed | waiting for input]
- Show the client view and service view of the provenance and check for differences
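Both query variations reduce to simple operations once provenance is available as records. A sketch under stated assumptions: `STATUS`, `find_by_status`, and the two views are hypothetical in-memory structures, where each view is the set of (invocation, direction, data ID) facts one side of an invocation reported; a set difference then exposes discrepancies such as an output the invoker saw but the service never reported.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical execution status per (kind, id).
STATUS: Dict[Tuple[str, str], str] = {
    ("workflow", "wf-1"): "completed",
    ("workflow", "wf-2"): "failed",
    ("workflow", "wf-3"): "waiting for input",
    ("process", "align_warp-1"): "completed",
    ("process", "softmean-1"): "waiting for input",
}

def find_by_status(kind: str, status: str) -> List[str]:
    """Find all workflows or processes with a given execution status."""
    return sorted(i for (k, i), s in STATUS.items() if k == kind and s == status)

# Each view: (invocation, direction, data ID) triples reported by one side.
client_view: Set[Tuple[str, str, str]] = {
    ("reslice-1", "in", "warp1.warp"),
    ("reslice-1", "out", "resliced1.img"),
}
service_view: Set[Tuple[str, str, str]] = {
    ("reslice-1", "in", "warp1.warp"),
}

only_client = client_view - service_view   # reported by the invoker only
only_service = service_view - client_view  # reported by the service only
print(find_by_status("process", "waiting for input"), only_client, only_service)
```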

[19/25] Acknowledgements
- Alek Slominski (GPEL Engine)
- Satoshi Shirasuna (XBaya Composer)
- LEAD Members
- NSF

Questions

[20/25] Sample Activities Published

[21/25] Karma DB Schema