A Chemical Workflow Engine for Scientific Workflows with Dynamicity Support

Manuel Caeiro: CoreGRID Post Doc (IRISA, Rennes, France; MTA SZTAKI, Budapest, Hungary); Associated Teacher, University of Vigo, Spain
Zsolt Nemeth: MTA SZTAKI, Budapest, Hungary
Thierry Priol: IRISA, Rennes, France

Manuel Caeiro / A Chemical Workflow Engine for Scientific Workflows with Dynamicity Support

Outline of the Presentation
1. Introduction
   - Scientific Workflows
   - The Chemical Computation Model
2. Proposal
   - The Scientific Workflow Language
   - The Chemical Workflow Engine
   - Dynamicity Support
3. Validation
4. Conclusions and Future Work

1. Introduction
This work has been performed in the context of the CoreGRID Excellence Network:
- IRISA (Rennes): December 2007 – March 2008
- SZTAKI (Budapest): April 2008 – August 2008
[Figure: map with the labels Vigo, Rennes and Budapest]

1. Introduction: Scientific Workflows
Scientific applications and experiments involve:
- A large number of operations
- Large data sets
- Complex algorithms
Application areas: Earth Sciences, Biology, Medical Image Analysis, Astronomy, Weather Prediction, Sub-atomic Physics

1. Introduction: Scientific Workflows
Dynamicity is intrinsic to Scientific Workflows:
- Scientists usually introduce modifications and variations in their experiments
- Scientific workflows are not always completely specified
- Data is known dynamically during execution
- Data is distributed and mobile
- The resources are not fixed; they change during workflow execution

1. Introduction: Scientific Workflows
Dynamicity Requirements (1/2)
- Monitoring
  - To observe the progress of the workflow
  - To obtain the partial and final results
- Automatic Control
  - To support the detection of errors and problems
  - To support the control of data values and events
- Reproducibility
  - To enable the reproduction of the execution
  - It is important for validating the results
- Smart "re-runs"
  - To be able to re-start at an already performed stage
- Version Management
  - To support and distinguish different "attempts"

1. Introduction: Scientific Workflows
Dynamicity Requirements (2/2)
- User Steering
  - VCR-like: pause, play, roll-back, etc.
  - Checkpoints
- User Manipulation
  - To be able to change the abstract workflow descriptions
  - To be able to change the data and the parameters
- Adaptation in the Workflow Language
  - Controlled change of workflows
  - Parametric studies
- Adaptation in the Workflow Management System
  - Support execution with different resources
  - Support changes in the assignment of tasks to resources and service instances
[Diagram labels: User Driven, Autonomous]

1. Introduction: The Chemical Computation Model
Main Idea: Computation as chemical reactions
Programs are conceived as chemical solutions: sets of molecules of different types that react with one another according to specific reaction conditions and actions.

1. Introduction: The Chemical Computation Model
Molecule types:
- Variables (data)
- Reaction conditions and Actions (instructions)
- Molecule Aggregations (pairs)
- Solutions: a solution is a container of molecules where chemical computations can take place
Computation:
1. A molecule with a reaction condition "matches" another molecule (or set of molecules) that satisfies its condition
2. The molecules react and the actions are performed
   - The matched molecules are consumed
   - New molecules are created
3. Return to step 1 until the solution is inert

1. Introduction: The Chemical Computation Model
An example: Compute the maximum value of a set of numbers
- Chemical solution:
  - Numbers (passive molecules): 1, 2, 7, 8, 9
  - Reaction condition and action (active molecule): match x, y; if x > y then replace x, y by x
[Figure: a chemical solution containing the passive number molecules and the active molecule holding the reaction condition and action]

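The maximum example can be sketched in a few lines of Python. This is only an illustration of the reaction rule on the slide, not the engine's implementation: the function name `react_max` and the seeded random choice of the reacting pair (standing in for the model's non-determinism) are assumptions made here.

```python
import random


def react_max(numbers, seed=0):
    """Apply 'match x, y; if x > y then replace x, y by x' until the solution is inert.

    `react_max` and `seed` are hypothetical names used only for this sketch.
    """
    rng = random.Random(seed)
    solution = list(numbers)  # the passive molecules
    while True:
        # Every ordered pair of distinct molecules that satisfies the reaction condition.
        candidates = [
            (i, j)
            for i in range(len(solution))
            for j in range(len(solution))
            if i != j and solution[i] > solution[j]
        ]
        if not candidates:
            return solution  # no reaction is possible: the solution is inert
        # Non-determinism: any matching pair may react next.
        _, j = rng.choice(candidates)
        # x and y are consumed and x is produced, which amounts to removing y.
        del solution[j]


if __name__ == "__main__":
    print(react_max([1, 2, 7, 8, 9]))
```

Because the condition on the slide is a strict `x > y`, equal numbers never react with each other, so a solution with a duplicated maximum becomes inert with both copies still present.
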
1. Introduction: The Chemical Computation Model
Main properties of the chemical computation model:
- Inherently concurrent
- Natural parallelism: no serialization is imposed
- Non-determinism

2. Proposal
Goal: To develop a workflow engine for scientific applications based on the chemical computation model and supporting dynamicity
Steps:
- The Scientific Workflow Language
- The Chemical Workflow Engine
- The Support of Dynamicity

2. Proposal: The Scientific Workflow Language
No generally accepted Scientific Workflow Language:
- Several languages exist
- Two main approaches: control-flow and data-flow
- Specific data operators:
  o SCUFL: one-to-one, all-to-all
  o ASKALON: large data set loops
Solution Adopted: To propose a new workflow language that includes the most common constructs

2. Proposal: The Scientific Workflow Language
Main Features:
- It is an extension to Event-driven Process Chains (EPCs)
- Events represent the state
- Data Elements are related to Events (inputs and outputs of Functions)
- Resources are used to process Functions
- Connector Types: AND/OR/XOR-split/join, Sub-process, Loops, Data-Loops, O2O, A2A
[Figure legend: Function, Connector, Event, Data Element, Resource]

2. Proposal: The Scientific Workflow Language
An Example: The WIEN2k workflow from ASKALON
[Figure: workflow diagram with the elements Init, Event1, Data1, LAPW0, R1, Data-LOOP-split, LAPW1-K1, LAPW1-K2, ..., LAPW1-Kn, Event21 ... Event2n, Event31 ... Event3n, Data21, Data31, Data-LOOP-join, R2]

2. Proposal: The Chemical Workflow Engine
Two main kinds of molecules:
- Active Molecules: Function, Connector
- Passive Molecules: Event, Data Element, Resource
Reactions:
- Function + Event + Data Element(s) + Resource(s) -> Event + Data Element(s) + Resource(s)
- Connector + Event(s) + Data Element(s) -> Event(s) + Data Element(s)

2. Proposal: The Chemical Workflow Engine
Functions evolve through 4 states:
- Disabled: the function is not activated; it has not yet matched its input Event
- Enabled: the function has not yet matched its input Data Elements
- Ready: the function has not yet been assigned to appropriate Resources
- Initiated: the function is being performed
Each state is represented by a different molecule:
- Disabled Function + Event -> Disabled Function + Enabled Function
- Enabled Function + Data Element(s) -> Ready Function + Data Element(s)
- Ready Function + Resource(s) -> Initiated Function
- Initiated Function -> Event + Data Element(s) + Resource(s)

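The four Function reactions can be sketched as a small rewriting loop in Python. This is a minimal sketch under stated assumptions, not the engine's actual code: the `Function` fields, the `step` and `run` helpers, and the use of plain strings for Events, Data Elements and Resources are all hypothetical, and actually performing the function is abstracted away.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Function:
    """Hypothetical Function molecule; the field names are assumptions of this sketch."""
    name: str
    in_event: str
    in_data: frozenset
    resource: str
    out_event: str
    out_data: frozenset
    state: str = "Disabled"


def step(functions, events, data, resources):
    """Apply one applicable reaction in place; return False when the solution is inert."""
    for f in list(functions):
        if f.state == "Disabled" and f.in_event in events:
            # Disabled Function + Event -> Disabled Function + Enabled Function
            events.remove(f.in_event)
            functions.append(replace(f, state="Enabled"))
            return True
        if f.state == "Enabled" and f.in_data <= data:
            # Enabled Function + Data Element(s) -> Ready Function + Data Element(s)
            functions.remove(f)
            functions.append(replace(f, state="Ready"))
            return True
        if f.state == "Ready" and f.resource in resources:
            # Ready Function + Resource(s) -> Initiated Function
            resources.remove(f.resource)
            functions.remove(f)
            functions.append(replace(f, state="Initiated"))
            return True
        if f.state == "Initiated":
            # Initiated Function -> Event + Data Element(s) + Resource(s)
            functions.remove(f)
            events.add(f.out_event)
            data.update(f.out_data)
            resources.append(f.resource)
            return True
    return False


def run(functions, events, data, resources):
    """React until inert and return the Function states observed after each reaction."""
    trace = []
    while step(functions, events, data, resources):
        trace.append(sorted(f.state for f in functions))
    return trace
```

Note that the Disabled Function is kept after the first reaction while the Event is consumed, and that Data Elements are matched without being consumed, as in the reactions listed on the slide.
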
2. Proposal: The Chemical Workflow Engine
[Figure: the chemical solution, containing Disabled Functions, Disabled Connectors, Events, Data Elements, Resources, an Enabled Function, a Ready Function and an Initiated Function, arranged by state: Disabled, Enabled, Ready, Initiated]

2. Proposal: The Chemical Workflow Engine
Connectors evolve through 2 states:
- Disabled: the connector is not activated; it has not yet matched its input Event(s)
- Enabled: the connector has not yet matched its input Data Elements
Each state is represented by a different molecule:
- Disabled Connector + Event(s) -> Disabled Connector + Enabled Connector
- Enabled Connector + Data Elements -> Event(s) + Data Elements

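As an illustration of the two Connector reactions, the following Python sketch models an AND-join style connector. The `Connector` fields and the `enable` and `complete` helpers are hypothetical names introduced for this example, and requiring all input Events before enabling is an assumption about AND-join behaviour rather than something stated on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    """Hypothetical Connector molecule; the field names are assumptions of this sketch."""
    name: str
    in_events: frozenset
    in_data: frozenset
    out_events: frozenset


def enable(connector, events):
    """Disabled Connector + Event(s) -> Disabled Connector + Enabled Connector.

    Returns an ("Enabled", connector) pair, or None when the input Events are not all present.
    The Disabled Connector itself is left untouched by this reaction.
    """
    if not connector.in_events <= events:
        return None
    events.difference_update(connector.in_events)  # the matched Events are consumed
    return ("Enabled", connector)


def complete(enabled, events, data):
    """Enabled Connector + Data Elements -> Event(s) + Data Elements.

    Returns True when the reaction fired. Data Elements are matched but not consumed.
    """
    state, connector = enabled
    if state != "Enabled" or not connector.in_data <= data:
        return False
    events.update(connector.out_events)
    return True
```

A split connector would follow the same two steps, with a single input Event and several output Events.
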
An HOCL Workflow Engine
[Figure: a Data One-to-One Connector example inside the chemical solution. Labels: Disabled Functions, Disabled Connectors, Events, Data Elements, Resources; functions F.A, F.B, F.C; events Ev.1, Ev.2, Ev.3.1 ... Ev.3.N; data D.A.1..n, D.B.1..n, Data A.1 ... N, Data B.1 ... N, Data C.1 ... N; connector instances 1, 2, ..., N]

2. Proposal: The Chemical Workflow Engine
Structure of the Chemical Workflow Engine:
- Divided into 4 sub-solutions, one for each state
- Molecules are transferred among sub-solutions
Operations in the Workflow Engine:
- Compilation: the molecules representing the Disabled Functions and Connectors of the process definition are introduced
- Data Population: the molecules representing the input Data Elements of a case are introduced
- Resource Population: the molecules representing the available Resources are introduced
- Instance Creation: the molecules representing the initial Events are introduced

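The structure and the four operations can be pictured with a small container class. This is a rough sketch only: the class name `EngineSketch`, its method names, and the choice to keep passive molecules in separate lists next to the four sub-solutions are assumptions, since the slides do not say where passive molecules are stored.

```python
class EngineSketch:
    """Hypothetical container mirroring the four sub-solutions and the four operations."""

    STATES = ("Disabled", "Enabled", "Ready", "Initiated")

    def __init__(self):
        # One sub-solution per state for the active molecules.
        self.sub_solutions = {state: [] for state in self.STATES}
        # Assumption: passive molecules are kept in their own lists.
        self.events = []
        self.data = []
        self.resources = []

    def compile(self, process_definition):
        """Compilation: introduce the Disabled Functions and Connectors."""
        self.sub_solutions["Disabled"].extend(process_definition)

    def populate_data(self, data_elements):
        """Data Population: introduce the input Data Elements of a case."""
        self.data.extend(data_elements)

    def populate_resources(self, resources):
        """Resource Population: introduce the available Resources."""
        self.resources.extend(resources)

    def create_instance(self, initial_events):
        """Instance Creation: introduce the initial Events."""
        self.events.extend(initial_events)

    def transfer(self, molecule, source, target):
        """Move an active molecule from one sub-solution to another."""
        self.sub_solutions[source].remove(molecule)
        self.sub_solutions[target].append(molecule)
```

A typical sequence would call `compile` once per process definition and then `populate_data`, `populate_resources` and `create_instance` for each case to be executed.
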
2. Proposal: The Chemical Workflow Engine
[Figure: the engine operations: Compilation, Data Population (from the Input Data), Resource Population and Instance Creation]

2. Proposal: The Chemical Workflow Engine
Identifiers:
- Element Identifier: distinguishes among the several elements included in a process specification.
- Process Schema Identifier: distinguishes among process specifications. It has two parts: a process number and a version number. Included in Functions, Connectors and Events.
- Instance Identifier: distinguishes among the several instances. It includes a thread identifier (numbered Data Elements). Included in Events and Data Elements, and also in Functions and Connectors in the Enabled, Ready and Initiated states.
Molecules can be matched if their Process Schema Identifier and Instance Identifier are the same.

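The matching rule on identifiers can be sketched as follows. The type names, field names and the `can_match` helper are hypothetical, and one interpretation is assumed: since not every molecule carries both identifiers (a Disabled Function has no Instance Identifier, for example), an identifier is compared only when both molecules carry it.

```python
from typing import NamedTuple, Optional


class SchemaId(NamedTuple):
    process: int  # process number
    version: int  # version number


class InstanceId(NamedTuple):
    instance: int  # instance number
    thread: int    # thread identifier


class Molecule(NamedTuple):
    kind: str      # e.g. "Function", "Connector", "Event", "Data Element"
    element: str   # Element Identifier
    schema: Optional[SchemaId] = None
    instance: Optional[InstanceId] = None


def can_match(a, b):
    """Return True when the identifiers carried by both molecules agree.

    Assumption of this sketch: an identifier missing on either side is not compared.
    """
    for left, right in ((a.schema, b.schema), (a.instance, b.instance)):
        if left is not None and right is not None and left != right:
            return False
    return True
```

With this rule, Events from two instances of the same process schema never react with each other's Enabled Functions, which is what allows several instances to share one chemical solution.
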
2. Proposal: Dynamicity Support
Dynamicity is supported in several ways:
- A workflow specification can be modified by changing the Functions and Connectors contained in the disabled sub-solution.
- The distinction between Event and Data Element molecules makes it possible to separate the workflow specification from the data to be processed.
- Several workflow instances can be initiated and executed in parallel, because Disabled molecules are not eliminated.
- The availability of Event molecules makes it possible to develop a steering facility.
- Data Element molecules are not eliminated. This enables the development of monitoring, "smart re-runs" and provenance solutions.

2. Proposal: Dynamicity Support
Addenda to the Identifiers:
- Addendum to the Process Schema Identifier: makes it possible to use modified versions of an existing process specification just by including the new molecules.
- Addendum to the Instance Identifier: makes it possible to use the data of another instance execution.
We support the 13 change patterns proposed in [18].

3. Validation
Developed in CLIPS:
- CLIPS provides an environment for the construction of rule-based expert systems
- CLIPS programming is performed by assertions and rules
- Assertions are used to maintain information
- Rules specify a certain action to be performed when a condition is satisfied
To validate the Chemical Workflow Engine (CWE) we used two kinds of assertions and specific rules:
- Active molecule assertions of two types (Function and Connector) and four possible states (Disabled, Enabled, Ready, Initiated)
- Passive molecule assertions of three types (Event, Data Element and Resource)

4. Conclusions
Summary:
- Scientific workflows are gaining great momentum
- Dynamicity is an intrinsic need in scientific workflows
- A workflow engine based on the Chemical Computation Model has been conceived to support dynamicity needs
[Diagram labels: Scientific Workflow, Chemical Workflow Engine, CLIPS]
Future Work:
- To provide an actual validation

4. Conclusions
Opportunities from the Chemical Computation Model:
- It is parallel in nature: it facilitates the distribution of computations
- Parallelization is obtained in a transparent way
  - Workflows can be specified in the same way
  - Execution of workflows is automatically parallelized
- Change of the role of resources:
  - Central "chemical solution" vs. central Workflow engine
  - Pull-oriented vs. Push-oriented

Questions and comments are welcome!