Workflow Management Chris A. Mattmann OODT Component Working Group.

What is Workflow Management?
Modeling, executing, and monitoring groups of one or more Workflow Tasks
Tasks could be
– A script file
– A Java process
– An external command
– A call to a web service
– Many more…

Workflow
Workflow has many definitions
– It’s typically represented as a graph
– In traditional science data pipeline systems, this graph is constrained to be a sequential set of process nodes
– Taxonomy of Workflow Management Systems
– Workflow Patterns

The State of Things
The existing CAS was able to handle sequential science data pipelines very well
– It handles them as a set of individual tasks that are mapped to a product type
– Tasks are kicked off on ingestion of a product
   Or by other tasks
However, the approach to executing pipelines and tasks was ad hoc
– A task can kick off another task, but only by communicating directly with the database to insert its “id” in the “next task” table
– Tasks are grouped only by product type, so you need a product type to have a group of associated tasks
Additionally, the approach didn’t allow for parallel execution of tasks
– Tasks were put into a global queue
   Tasks from different “workflows” can compete against one another because the queue is global
   Control patterns are also ad hoc; standard control flow is not supported

New Requirements and Drivers
Workflow should be represented as a graph. This will allow for true parallelism.
Workflow Management should support identified workflow patterns, especially control-flow
– The current level of support for control-flow has to a large extent been relegated to tasks. A collection of tasks is associated with a product ingestion, and there is only a priority to sort out the order of execution.
Data-flow should be captured. The workflow should be able to minimally hook together input and output streams between tasks.
Workflow need not have any interaction with a database
– What if I want to persist a workflow in XML?
– Or as a flat file, or some other lightweight format?
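
To make the graph requirement concrete, here is a minimal sketch of how a graph representation could expose parallelism: tasks whose dependencies are all satisfied are grouped into "waves" that could run concurrently. The class `WorkflowGraphSketch`, its methods, and the task names are hypothetical and are not part of the cas-workflow component; the grouping uses a standard topological sort (Kahn's algorithm), which is an assumption about one reasonable approach rather than the design presented in these slides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch (not the cas-workflow API): a workflow as a DAG of task ids. */
class WorkflowGraphSketch {
    private final Map<String, Set<String>> successors = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    void addTask(String id) {
        successors.putIfAbsent(id, new LinkedHashSet<>());
        inDegree.putIfAbsent(id, 0);
    }

    /** Declares that task {@code to} may start only after task {@code from} finishes. */
    void addDependency(String from, String to) {
        addTask(from);
        addTask(to);
        if (successors.get(from).add(to)) {
            inDegree.merge(to, 1, Integer::sum);
        }
    }

    /** Groups tasks into waves; every task in a wave can run in parallel. */
    List<List<String>> parallelWaves() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<List<String>> waves = new ArrayList<>();
        int scheduled = 0;
        while (!ready.isEmpty()) {
            waves.add(ready);
            scheduled += ready.size();
            List<String> next = new ArrayList<>();
            for (String task : ready) {
                for (String succ : successors.get(task)) {
                    // A successor becomes ready once its last dependency is scheduled.
                    if (remaining.merge(succ, -1, Integer::sum) == 0) {
                        next.add(succ);
                    }
                }
            }
            ready = next;
        }
        if (scheduled != successors.size()) {
            throw new IllegalStateException("workflow graph contains a cycle");
        }
        return waves;
    }
}
```

A purely sequential pipeline is the special case where every wave contains exactly one task.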

New Requirements and Drivers
You can read/add to the list
– Available at: low+Management
Please, speak your mind!

Architectural Implications
Workflow Repositories
– Places to go and fetch an “abstract” workflow description from
Workflow Execution Engines
– Give it an abstract workflow, and let it rip
   Turns an abstract workflow into a “Workflow Instance”
– Should allow monitoring of the workflow instance
System interface
– Associate abstract workflows with “events”
– This way, workflows can be tied to things other than just product ingestion

Workflow Data Structures

Workflow Repository

How is this different from the existing CAS?
The Workflow Repository need not be a relational database
– It could be a flat file
– A (set of) XML file(s)
– An object database
– Factories create Workflow Repositories, which create Workflows
Tasks are associated with “Workflows”, not “Product Types”
– This decouples workflow from the File Management aspects of the CAS
Conditions can be pre or post
– As opposed to the existing CAS, where “Rules” are effectively pre-conditions on a task, and there is no concept of a post-condition
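
The factory-and-repository split and the pre/post condition idea can be sketched as follows. All names here (`RepositorySketch`, `RepositoryFactorySketch`, `MapRepositoryFactory`, `ConditionedTask`) are hypothetical illustrations, not the interfaces of the cas-workflow component, and the in-memory map merely stands in for whatever backend (flat file, XML, object database) a real factory would wrap.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical repository: returns the task names of a named workflow. */
interface RepositorySketch {
    List<String> taskNamesFor(String workflowName);
}

/** Hypothetical factory: hides which backend the repository uses. */
interface RepositoryFactorySketch {
    RepositorySketch createRepository();
}

/** Stand-in for a flat-file or XML backend: definitions live in a plain map. */
class MapRepositoryFactory implements RepositoryFactorySketch {
    private final Map<String, List<String>> definitions;

    MapRepositoryFactory(Map<String, List<String>> definitions) {
        this.definitions = new HashMap<>(definitions);
    }

    @Override
    public RepositorySketch createRepository() {
        return name -> definitions.getOrDefault(name, List.of());
    }
}

/** A task guarded by a pre-condition and checked by a post-condition. */
class ConditionedTask {
    private final Predicate<Map<String, String>> pre;
    private final Consumer<Map<String, String>> body;
    private final Predicate<Map<String, String>> post;

    ConditionedTask(Predicate<Map<String, String>> pre,
                    Consumer<Map<String, String>> body,
                    Predicate<Map<String, String>> post) {
        this.pre = pre;
        this.body = body;
        this.post = post;
    }

    String run(Map<String, String> metadata) {
        if (!pre.test(metadata)) {
            return "SKIPPED";      // pre-condition not met: the body never runs
        }
        body.accept(metadata);
        return post.test(metadata) ? "SUCCEEDED" : "FAILED";
    }
}
```

Because callers only see the factory interface, swapping the map for an XML-backed implementation would not change any code that consumes the repository.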

How is this different from the existing CAS?
Workflows are interfaces
– They could be backed by a directed graph, by an iterator (i.e., a sequential pipeline), or by a HashMap
Workflow Tasks have clearly separated dynamic and static metadata, and they can share metadata
– Dynamic metadata is passed via the Workflow Engine between all the tasks in a workflow
   They can all read/write to it
– Static metadata is associated with each workflow task
Workflow Events are captured and delivered via Workflow Listeners, which are interfaces
– Many different backend implementations of Workflow Listeners
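
As an illustration of the metadata split and the listener idea, the sketch below runs an iterator-backed (sequential) workflow, threads one shared dynamic metadata map through every task, keeps static metadata private to each task, and reports events to listeners. `ListenerSketch`, `TaskSketch`, and `SequentialEngineSketch` are hypothetical names, and the event strings are invented for the example; none of this is taken from the cas-workflow code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical listener interface; backends could log, email, store, etc. */
interface ListenerSketch {
    void onEvent(String event);
}

/** Hypothetical task: static metadata is fixed per task, dynamic metadata is shared. */
class TaskSketch {
    final String name;
    final Map<String, String> staticMetadata;
    private final BiConsumer<Map<String, String>, Map<String, String>> body;

    TaskSketch(String name, Map<String, String> staticMetadata,
               BiConsumer<Map<String, String>, Map<String, String>> body) {
        this.name = name;
        this.staticMetadata = Map.copyOf(staticMetadata); // immutable per-task config
        this.body = body;
    }

    void run(Map<String, String> dynamicMetadata) {
        body.accept(staticMetadata, dynamicMetadata);
    }
}

/** Runs any Iterable of tasks in order, i.e. a sequential pipeline. */
class SequentialEngineSketch {
    private final List<ListenerSketch> listeners = new ArrayList<>();

    void addListener(ListenerSketch listener) {
        listeners.add(listener);
    }

    Map<String, String> execute(Iterable<TaskSketch> workflow) {
        Map<String, String> dynamic = new HashMap<>(); // shared by all tasks
        for (TaskSketch task : workflow) {
            publish("started:" + task.name);
            task.run(dynamic);
            publish("finished:" + task.name);
        }
        return dynamic;
    }

    private void publish(String event) {
        for (ListenerSketch listener : listeners) {
            listener.onEvent(event);
        }
    }
}
```

Accepting an `Iterable` rather than a concrete list is one way to keep the workflow abstract, so a graph-backed workflow could supply its own traversal order.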

Workflow Execution
Once you’ve got a Workflow, how do you execute it and turn it into a Workflow Instance?
You hand it off to a Workflow Engine

Workflow Engine

What does the Workflow Engine do?
Workflow Engine manages:
– A configurable, extensible thread pool
   “Worker Threads” are used to process the Workflow Instance they are each handed
– A queue of worker threads if there aren’t any available workers in the thread pool to process a Workflow
– Monitoring which Workers are handling which Workflow Instances, and the state and status of each Workflow Instance
Workflow Engines execute instances of Workflows
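
The engine responsibilities listed above map fairly naturally onto `java.util.concurrent`. The sketch below is an assumption-laden illustration, not the actual engine: `ThreadPoolEngineSketch` and its status strings are hypothetical, a workflow instance is reduced to a `Runnable`, and queuing is delegated to the fixed thread pool's internal work queue, which holds instances until a worker is free.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical engine sketch: a worker pool plus per-instance status tracking. */
class ThreadPoolEngineSketch {
    private final ExecutorService pool;
    private final Map<String, String> status = new ConcurrentHashMap<>();

    ThreadPoolEngineSketch(int workerCount) {
        // Instances submitted while all workers are busy wait in the pool's queue.
        this.pool = Executors.newFixedThreadPool(workerCount);
    }

    void start(String instanceId, Runnable workflowBody) {
        status.put(instanceId, "QUEUED");
        pool.execute(() -> {
            status.put(instanceId, "RUNNING");
            try {
                workflowBody.run();
                status.put(instanceId, "FINISHED");
            } catch (RuntimeException e) {
                status.put(instanceId, "FAILED");
            }
        });
    }

    /** Monitoring hook: the last known state of a workflow instance. */
    String statusOf(String instanceId) {
        return status.getOrDefault(instanceId, "UNKNOWN");
    }

    /** Stops accepting work and waits for queued instances to drain. */
    boolean shutdownAndWait(long seconds) {
        pool.shutdown();
        try {
            return pool.awaitTermination(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A fuller version would also record which worker holds which instance; here only the instance state is tracked to keep the example short.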

What’s the external interface to the system?
Event-based
– Event names come into the Workflow Manager
– The Workflow Manager looks up any Workflows associated with the event name
– The Workflow Manager then calls the Workflow Repository to obtain representations of the Workflow
– The Workflow Manager then hands off Workflow representations to the Workflow Engine for execution
Current implementation uses XML-RPC, but it’s an interface, so it could use REST/HTTP/SOAP/etc.

The Workflow Manager
So, how do we put all of these things together?
Well, something like:
– A Workflow Manager has
   One or more Workflow Repositories to obtain abstract Workflow descriptions from
   One or more Workflow Engines to execute Workflows on
   One or more external interfaces
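
The event-driven flow through the Workflow Manager can be sketched roughly as below: an event name arrives, the manager finds the workflows registered for it, asks its repositories for each workflow's description, and hands whatever it finds to an engine. `RepoSketch`, `EngineSketch`, `WorkflowManagerSketch`, and the event and workflow names are all hypothetical; for brevity the sketch uses a single engine and omits the XML-RPC layer entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical repository view: a workflow is reduced to its list of task names. */
interface RepoSketch {
    Optional<List<String>> findWorkflow(String workflowName);
}

/** Hypothetical engine view: executes a workflow description. */
interface EngineSketch {
    void execute(String workflowName, List<String> tasks);
}

class WorkflowManagerSketch {
    private final Map<String, List<String>> workflowsByEvent = new HashMap<>();
    private final List<RepoSketch> repositories;
    private final EngineSketch engine;

    WorkflowManagerSketch(List<RepoSketch> repositories, EngineSketch engine) {
        this.repositories = repositories;
        this.engine = engine;
    }

    /** Associates a workflow with an event name. */
    void register(String eventName, String workflowName) {
        workflowsByEvent.computeIfAbsent(eventName, k -> new ArrayList<>()).add(workflowName);
    }

    /** Handles one incoming event; returns how many workflows were started. */
    int handleEvent(String eventName) {
        int started = 0;
        for (String workflowName : workflowsByEvent.getOrDefault(eventName, List.of())) {
            for (RepoSketch repository : repositories) {
                Optional<List<String>> tasks = repository.findWorkflow(workflowName);
                if (tasks.isPresent()) {
                    engine.execute(workflowName, tasks.get());
                    started++;
                    break; // first repository that knows the workflow wins
                }
            }
        }
        return started;
    }
}
```

Since events are just names, an external interface such as XML-RPC or REST only needs to forward a string to `handleEvent`.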

What’s implemented so far
The basic components of the architecture
Several implementations of the interfaces
– DataSourceBased WorkflowEngine backed by a ThreadPooling infrastructure provided by Doug Lea’s java.util.concurrent package
– DataSourceBased WorkflowRepository
– Iterative Workflow Processor Thread, and Iterative Workflow Instance Model
– External XML-RPC interface

What needs to be done?
A lot!
– Check out and log in with your JPL Username and Password. Navigate to “SVN”, and check out the cas-workflow component.
– Modify the code
– Look for bugs
– Contribute!
   I find new bugs every day
– Feel free to talk to me about it
– Create issues in JIRA
   Bug fixes, RFIs, new features, you name it!
Be sure to check out the apidocs
– You can build these yourself by checking out cas-filemgr from our SVN repository, and then typing: maven site
– Or you can visit: workflow/

Questions?