Automatic Generation of Workflow Execution Provenance
Roger S. Barga
Database Group, Microsoft Research (MSR)

My interest in scientific workflow and provenance…
In a previous life…
- Research Scientist, PNNL, DOE National Laboratory
- Machine learning, pattern recognition over large data sets
- Scientific experiment management system (EMSL)
- Electronic laboratory notebook for experiment capture
More recently…
- Database Group, Microsoft Research in Redmond, WA
- ImmortalDB (ICDE’06, SIGMOD’06), Event Processing, Phoenix
- Extend commercial software to support scientific research
- Tailor software for the sciences, provide free of charge
- Serve as a positive force in the community (Tony Hey)
- Practical value, challenging information management research issues…

Objectives for this initial effort
- Provenance capture that is automatic & transparent
  - Should persist provenance data for a fixed period of time
- Support multiple levels of representation
  - WF description → logical log (o & p) → deviations → step-by-step trace
- Version and lock the executables
- Efficient representation and management
  - Opportunity to significantly reduce execution provenance storage costs
An enactment engine for scientific workflows that documents all steps linking original inputs with final results, so an experiment (execution) can be verified, reproduced or rerun.

Issues NOT considered in our initial effort
- Annotations and provenance of the workflow
- How to include external provenance
- Evaluate our prototype on actual scientific workflows
- Provide query and analysis support over execution provenance traces…
Focus on mechanism: implement something simple but useful, and consider how to manage this virtual data product.
- Provenance capture that is automatic & transparent
- Support multiple levels of representation
- Version and lock the executables
- Efficient representation and management

Types of Provenance to Capture in Workflow Execution
- Experiment Design: serialize the workflow schedule (XOML)
- Invocation Record: invocation of specific activities, events and rules; deviations from the defined schedule (shims, etc.)
- Interaction Provenance: input variables, runtime parameters, activation inputs; external services invoked, return value(s), etc.
- Job Provenance: start/complete time, etc.
Diagram labels: a workflow schedule (sequential, event, rule driven); an activity.
What about internal state?
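The four provenance types above could be modeled as tagged records. The following is a minimal sketch under stated assumptions, not the prototype's actual schema: `ProvenanceKind`, `ProvenanceRecord`, `group_by_kind` and all field names are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any, Dict, List


class ProvenanceKind(Enum):
    """Hypothetical tags for the four provenance types named on the slide."""
    EXPERIMENT_DESIGN = "experiment_design"  # serialized workflow schedule
    INVOCATION = "invocation"                # activities, events, rules, deviations
    INTERACTION = "interaction"              # inputs, parameters, service calls
    JOB = "job"                              # start/complete times


@dataclass
class ProvenanceRecord:
    """One captured fact about a workflow execution (assumed layout)."""
    kind: ProvenanceKind
    activity: str                 # name of the activity the record refers to
    timestamp: datetime
    payload: Dict[str, Any] = field(default_factory=dict)


def group_by_kind(records: List[ProvenanceRecord]) -> Dict[ProvenanceKind, List[ProvenanceRecord]]:
    """Bucket records by provenance type, keeping their original order."""
    grouped: Dict[ProvenanceKind, List[ProvenanceRecord]] = {k: [] for k in ProvenanceKind}
    for record in records:
        grouped[record.kind].append(record)
    return grouped
```

Keeping one record type with a `kind` tag, rather than four unrelated classes, is only one possible design; it makes a single time-ordered stream easy to store and filter.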

Architecture Overview (diagram components)
- Problem Solving Environment
- Workflow Enactment Engine (WinWF)
- Logical Logging Utility
- Provenance Storage Service Interface (PSI)
- Workflow Execution Provenance Storage Service (built using CLFS)
- Query and Management Interface (QMI)
- Client Query Library
- Management Routines
- Provenance Services: trace execution, difference analysis, reload runtime state, …
- HPC Job Scheduler: CreateJOB(XOML), ExecuteTask(JID, Act)

Implementation: extending base activity classes
- Activities are the basic building blocks
  - They are the unit of execution, re-use and composition
  - The root of the entire workflow is itself an activity
- Composite activities contain other activities
  - E.g.: Sequence, Parallel, Synchronize, Exclusive Choice, Merge, …
- Basic activities are steps within a workflow
- Activities are simply classes
  - Properties and events are introduced to intercept and pass control to the provenance capture service at runtime…
  - Each class defines provenance persistence methods that are invoked by the workflow runtime
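The idea of putting provenance capture into the base activity class can be sketched as follows. This is an illustration in Python, not the Windows Workflow Foundation API: `ProvenanceSink`, `Activity`, `Sequence`, `Double` and `AddOne` are hypothetical names, and the real prototype hooks properties and events rather than a wrapper method.

```python
class ProvenanceSink:
    """Hypothetical stand-in for the provenance capture service."""

    def __init__(self):
        self.records = []

    def persist(self, activity_name, event, detail=None):
        self.records.append((activity_name, event, detail))


class Activity:
    """Base class: subclasses implement execute(); run() adds the interception."""

    def __init__(self, name, sink):
        self.name = name
        self.sink = sink

    def run(self, data):
        # Capture happens here, so concrete activities need no provenance code.
        self.sink.persist(self.name, "started", data)
        result = self.execute(data)
        self.sink.persist(self.name, "completed", result)
        return result

    def execute(self, data):
        raise NotImplementedError


class Sequence(Activity):
    """Composite activity: runs its children in order, piping data through."""

    def __init__(self, name, sink, children):
        super().__init__(name, sink)
        self.children = list(children)

    def execute(self, data):
        for child in self.children:
            data = child.run(data)
        return data


class Double(Activity):
    def execute(self, data):
        return data * 2


class AddOne(Activity):
    def execute(self, data):
        return data + 1
```

Because the root workflow is itself an activity, running a `Sequence` records the start and completion of the root and of every child without any change to `Double` or `AddOne`.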

Workflow Execution
Host application (My Experiment): rt.StartWorkflow(typeof(WF1));
1. App calls StartWorkflow(…).
2. Instance Manager loads the workflow type, creates an instance, and enqueues WF1 with the Scheduler.
3. Scheduler dequeues WF1, serializes the XOML, and calls Executor (SequentialWorkflow base), which enqueues the Sequence activity.
4. Scheduler dequeues Sequence and calls Executor, which serializes ActRec and enqueues OnEvent1.
5. Scheduler dequeues OnEvent1, serializes ActRec, and calls Executor, which subscribes to the event.
6. InstanceMgr calls Flush() on WF1 (Activity base class) to flush provenance records and gets back a stream.
7. InstanceMgr calls the Provenance service, passing the serialized stream; the Provenance Storage service saves it to disk.
Diagram labels: Instance Manager, Scheduler, WF1 Instance, MyWF.dll, Invoke1, Sequence, OnEvent1, Create instance, Execute Sequence, Execute OnEvent1, Execute until idle, Save SequentialWorkflow, Persist provenance to disk, Base Activity Library, Runtime Engine, Runtime Services.
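The dequeue, serialize, enqueue, flush cycle in the steps above can be approximated with a simple work queue. This is a rough sketch under assumptions: the nested-dict workflow shape, `run_workflow` and `flush` are invented for illustration, and the real runtime waits on events rather than draining the queue in one pass.

```python
import json
from collections import deque


def run_workflow(workflow):
    """Drain a scheduler queue, buffering one provenance record per activity.

    `workflow` is a nested dict such as
    {"name": "WF1", "children": [{"name": "Sequence", "children": [...]}]}.
    """
    queue = deque([workflow])
    buffer = []  # provenance records held in memory until flushed
    while queue:
        activity = queue.popleft()
        # Record the activity before handing it to the (omitted) executor.
        buffer.append({"seq": len(buffer) + 1, "activity": activity["name"]})
        # Executing a composite activity enqueues its children.
        queue.extend(activity.get("children", []))
    return buffer


def flush(buffer, storage):
    """Serialize buffered records to one stream and hand it to storage."""
    stream = json.dumps(buffer)
    storage.append(stream)  # stand-in for the provenance storage service
    buffer.clear()
    return stream
```

Buffering records and flushing them as a single stream once the workflow goes idle mirrors steps 6 and 7; a list stands in for the on-disk store.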

Transparent Interception and Logical Logging
Diagram: a SEQUENCE activity containing Workflow Activity 1 … Workflow Activity N.
- Each activity creates an operation history: a time-ordered stream of provenance records.
- Each record represents a change in operational state, such as a sequence advancing, a synchronize or branch being taken, or activities passing data via method calls.
- Replay of the log is an accurate repeated history of state changes, up to and including the “present” state.
- The Provenance Service “weaves” these records into the workflow XOML, recording LSNs for individual activities, insertions (shims), etc.
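A logical log of state changes, each stamped with a log sequence number (LSN), can be replayed to rebuild the state at any point. The sketch below assumes a very simple record shape (activity, key, value); `LogicalLog` and its methods are hypothetical and say nothing about how the prototype or CLFS lays out records.

```python
class LogicalLog:
    """Append-only log of state changes, addressed by LSN (assumed layout)."""

    def __init__(self):
        self._records = []

    def append(self, activity, key, value):
        """Record one state change and return its LSN (1-based)."""
        lsn = len(self._records) + 1
        self._records.append((lsn, activity, key, value))
        return lsn

    def replay(self, up_to_lsn=None):
        """Rebuild per-activity state by re-applying records in LSN order.

        With up_to_lsn=None the result is the "present" state; otherwise it is
        the state as of that LSN.
        """
        state = {}
        for lsn, activity, key, value in self._records:
            if up_to_lsn is not None and lsn > up_to_lsn:
                break
            state.setdefault(activity, {})[key] = value
        return state
```

Replaying up to an earlier LSN gives a step-by-step view of the execution, which is one way the multiple levels of representation could be derived from a single log.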

Provenance Capture Integrated into Runtime Engine and Services
Diagram: a host process (My Experiment) hosting Workflow Foundation.
- Base Activity Library: classes augmented with provenance capture
- Runtime Engine: provides intrinsic behaviors to activities (Tracking Infrastructure, State Management, Workflow Execution, Provenance Management)
- Runtime Services: hosting flexibility through pluggable implementations (with defaults): Provenance Storage (PSI), Communication, Tracking, …

Query Support (initial)
Individual Workflow Execution Trace
- Display a graphical trace of the execution
- Query for skipped steps, inserted steps, etc.
- Query for the codes (activities) invoked
- Query for machine execution stati
Multiple Workflow Execution Traces
- Comparative trace (shallow versus deep compare)
Still “early days” for our query support over a workflow execution provenance trace store.
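Queries for skipped and inserted steps amount to a difference between the defined schedule and the recorded trace. The sketch below is an assumption-laden illustration: a trace is taken to be a list of (activity name, output) pairs, and the talk does not define "shallow" and "deep" compare, so here shallow means the same activities in the same order and deep means the outputs also match. All function names are hypothetical.

```python
def deviations(schedule, trace):
    """Return (skipped, inserted) activity names for one execution.

    schedule: ordered list of activity names in the workflow definition.
    trace: ordered list of (activity name, output) pairs that actually ran.
    """
    executed = [name for name, _ in trace]
    skipped = [name for name in schedule if name not in executed]
    inserted = [name for name in executed if name not in schedule]
    return skipped, inserted


def shallow_compare(trace_a, trace_b):
    """Assumed meaning: same activities invoked in the same order."""
    return [n for n, _ in trace_a] == [n for n, _ in trace_b]


def deep_compare(trace_a, trace_b):
    """Assumed meaning: same activities and the same recorded outputs."""
    return list(trace_a) == list(trace_b)
```

For example, a schedule of load, align, score with a trace that ran load, a conversion shim, and score would report align as skipped and the shim as inserted.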

An Issue to Consider…
- It may not be possible to rerun an experiment, either to validate or to recreate a result, because the original workflow is lost (activities have been updated).
- Assign a version identifier (strong name) to the workflow assembly so it can be associated with the result; retain that version only if the provenance is retained.
- Updating any activity in the workflow changes this version number, resulting in a new version being created.
- The user can rerun the experiment by invoking the workflow through the fully specified reference found in the provenance record.
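The versioning idea can be illustrated with a content-derived identifier: any change to any activity yields a new identifier, and the provenance record keeps the identifier needed to find the exact version again. This sketch uses a SHA-256 content hash as a stand-in; it is not how .NET strong names are computed, and `workflow_version` and `VersionStore` are hypothetical names.

```python
import hashlib


def workflow_version(activities):
    """Derive a version identifier from the contents of every activity.

    activities: dict mapping activity name to its code as bytes.
    """
    digest = hashlib.sha256()
    for name in sorted(activities):  # sorted so the result is order-independent
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(activities[name])
        digest.update(b"\0")
    return digest.hexdigest()[:16]


class VersionStore:
    """Keeps every registered workflow version so old runs can be repeated."""

    def __init__(self):
        self._versions = {}

    def register(self, activities):
        version = workflow_version(activities)
        self._versions.setdefault(version, dict(activities))
        return version

    def lookup(self, version):
        """Fetch the exact activities referenced by a provenance record."""
        return self._versions[version]
```

A provenance record would then store the value returned by `register`, and a rerun would call `lookup` with it instead of loading whatever the current workflow happens to be.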

To Sum Up…
- Extended Windows Workflow Foundation
- Transparently capture execution trace leading to a result
- Towards a layered provenance model
- Initial query facility built over this provenance data
- This summer: evaluation and necessary extensions, analysis support
- Luciano Digiampietri (UniCamp/Brazil), project intern
- Tying provenance to code versioning
- In general, how to manage provenance data and code so the scientist simply doesn’t have to worry about it…
- An interesting data management challenge
- Provenance as a first class derived data item

Closing Comments…
- Provenance presents many, many open questions, but offers so much potential…
- Execution provenance (sadly) is just the tip…
- Is this even provenance, and where do we draw the line?
- Shall we revel in complexity, or focus on the low-hanging fruit? Can’t we do both?
- Standards (agreements) on representation/protocols
- Try to reach a “tipping point”
- We welcome your feedback and suggestions, and are open to opportunities to collaborate on this problem…