A Chemical Workflow Engine for Scientific Workflows with Dynamicity Support Manuel Caeiro Zsolt Nemeth Thierry Priol CoreGRID Post Doc IRISA, Rennes, France.

Presentation transcript:

A Chemical Workflow Engine for Scientific Workflows with Dynamicity Support
Manuel Caeiro: CoreGRID Post Doc (IRISA, Rennes, France; MTA SZTAKI, Budapest, Hungary); Associated Teacher, University of Vigo, Spain
Zsolt Nemeth: MTA SZTAKI, Budapest, Hungary
Thierry Priol: IRISA, Rennes, France

Manuel Caeiro / A Chemical Workflow Engine for Scientific Workflows with Dynamicity Support
Outline of the Presentation
1. Introduction
  – Scientific Workflows
  – The Chemical Computation Model
2. Proposal
  – The Scientific Workflow Language
  – The Chemical Workflow Engine
  – Dynamicity Support
3. Validation
4. Conclusions and Future Work

1. Introduction
This work was performed in the context of the CoreGRID Network of Excellence:
  – IRISA (Rennes): December 2007 – March 2008
  – SZTAKI (Budapest): April 2008 – August 2008
[Figure labels: Vigo, Rennes, Budapest]

1. Introduction: Scientific Workflows
Scientific applications and experiments involve:
  – A large number of operations
  – Large data sets
  – Complex algorithms
Example domains: Earth Sciences, Biology, Medical Image Analysis, Astronomy, Weather Prediction, Sub-atomic Physics

1. Introduction: Scientific Workflows
Dynamicity is intrinsic to scientific workflows:
  – Scientists usually introduce modifications and variations in their experiments
  – Scientific workflows are not always completely specified
  – Data becomes known dynamically during execution
  – Data is distributed and mobile
  – Resources are not fixed; they change during workflow execution

1. Introduction: Scientific Workflows
Dynamicity Requirements (1/2)
– Monitoring
    To observe the progress of the workflow
    To obtain the partial and final results
– Automatic Control
    To support the detection of errors and problems
    To support the control of data values and events
– Reproducibility
    To enable the reproduction of the execution, which is important for validating the results
– Smart “re-runs”
    To be able to re-start at an already performed stage
– Version Management
    To support and distinguish different “attempts”

1. Introduction: Scientific Workflows
Dynamicity Requirements (2/2)
– User Steering
    VCR-like: pause, play, roll-back, etc.
    Checkpoints
– User Manipulation
    To be able to change the abstract workflow descriptions
    To be able to change the data and the parameters
– Adaptation in the Workflow Language
    Controlled change of workflows
    Parametric studies
– Adaptation in the Workflow Management System
    Support execution with different resources
    Support changes in task assignment to resources and services’ instances
[Slide labels: User Driven, Autonomous]

1. Introduction: The Chemical Computation Model
Main idea: computation as chemical reactions.
Programs are conceived as chemical solutions containing a set of molecules of different types that react with one another according to specific reaction conditions and actions.

1. Introduction: The Chemical Computation Model
Molecule types:
– Variables → (data)
– Reaction conditions and Actions → (instructions)
– Molecule Aggregations → (pairs)
– Solutions
    A solution is a container of molecules in which chemical computations can take place
Computation:
1. A molecule with a reaction condition “matches” another molecule (or set of molecules) that satisfies its condition
2. The molecules react and the actions are performed
   – The matched molecules are consumed
   – New molecules are created
3. Return to step 1 until the solution is inert

1. Introduction: The Chemical Computation Model
An example: compute the maximum value of a set of numbers
– Chemical solution:
    Numbers: 1, 2, 7, 8, 9
    Reaction condition and action: match x, y; if x > y then replace x, y by x
[Figure: a chemical solution containing the numbers (passive molecules) and the reaction condition and action (active molecule)]
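The maximum example above can be sketched in Python as a small multiset-rewriting loop. This is only an illustration under stated assumptions: the function name `react_max` and the use of a seeded random generator to imitate the non-deterministic choice of reacting molecules are hypothetical and do not come from the slides.

```python
import random


def react_max(numbers, seed=None):
    """Apply 'match x, y; if x > y then replace x, y by x' until the solution is inert."""
    rng = random.Random(seed)      # hypothetical stand-in for non-deterministic matching
    solution = list(numbers)       # passive molecules
    while True:
        # All ordered pairs (x, y) that satisfy the reaction condition x > y
        pairs = [(i, j)
                 for i in range(len(solution))
                 for j in range(len(solution))
                 if solution[i] > solution[j]]
        if not pairs:              # no reaction can fire: the solution is inert
            return solution
        _, j = rng.choice(pairs)   # pick any reacting pair
        del solution[j]            # x and y are consumed, x is re-created


print(react_max([1, 2, 7, 8, 9]))  # [9]
```

Whatever order the pairs react in, only the smaller number of each pair disappears, so the inert solution always holds the maximum. With equal numbers the strict condition x > y never fires between them, so duplicates of the maximum remain in the solution.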

1. Introduction: The Chemical Computation Model
Main properties of the chemical computation model:
  – Inherently concurrent
  – Natural parallelism: no serialization is imposed
  – Non-determinism

2. Proposal
Goal: to develop a workflow engine for scientific applications, based on the chemical computation model and supporting dynamicity.
Steps:
  – The Scientific Workflow Language
  – The Chemical Workflow Engine
  – The Support of Dynamicity

2. Proposal: The Scientific Workflow Language
No generally accepted scientific workflow language:
  – Several languages exist
  – Two main approaches: control-flow and data-flow
  – Specific data operators:
      o SCUFL: one-to-one, all-to-all
      o ASKALON: large data set loops
Solution adopted: to propose a new workflow language involving the most common constructs

2. Proposal: The Scientific Workflow Language
Main features:
  – It is an extension to Event-driven Process Chains (EPCs)
  – Events represent the state
  – Data Elements are related to Events (inputs and outputs of Functions)
  – Resources are used to process Functions
  – Connector types: AND/OR/XOR-split/join, Sub-process, Loops, Data-Loops, O2O (one-to-one), A2A (all-to-all)
[Figure legend: Function, Connector, Event, Data Element, Resource]

2. Proposal: The Scientific Workflow Language
An example: the WIEN2k workflow from ASKALON
[Figure: workflow diagram. Labels: Init, LAPW0 (R1), Event1, Data1, Data-LOOP-split, LAPW1-K1 … LAPW1-Kn (R2), Event21 … Event2n, Data21, Event31 … Event3n, Data31, Data-LOOP-join]

2. Proposal: The Chemical Workflow Engine
Two main kinds of molecules:
  – Active molecules: Function, Connector
  – Passive molecules: Event, Data Element, Resource
Reactions:
  Function + Event + Data Element(s) + Resource(s) → Event + Data Element(s) + Resource(s)
  Connector + Event(s) + Data Element(s) → Event(s) + Data Element(s)

2. Proposal: The Chemical Workflow Engine
Functions evolve through 4 states:
  – Disabled: the function is not activated; it has not yet matched its input Event
  – Enabled: the input Event has been matched, but not yet the input Data Elements
  – Ready: the input Data Elements have been matched, but appropriate Resources have not yet been assigned
  – Initiated: the function is being performed
Each state is represented by a different molecule:
  Disabled Function + Event → Disabled Function + Enabled Function
  Enabled Function + Data Element(s) → Ready Function + Data Element(s)
  Ready Function + Resource(s) → Initiated Function
  Initiated Function → Event + Data Element(s) + Resource(s)
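The four Function reactions can be sketched in Python as a loop that fires rules until nothing more can react. This is a simplified illustration, not the engine described in the talk: `FunctionSpec`, `run_functions`, the "Completed" trace label, the single-instance setting, and the restriction to one input Event and one Resource per function are all assumptions made for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionSpec:          # hypothetical description of one Function molecule
    name: str
    in_event: str
    in_data: tuple
    resource: str
    out_event: str
    out_data: tuple


def run_functions(functions, events, data, resources):
    disabled = list(functions)   # Disabled molecules are never consumed
    events, data, resources = set(events), set(data), set(resources)
    enabled, ready, initiated, trace = [], [], [], []
    progress = True
    while progress:              # stop when the solution is inert
        progress = False
        for f in disabled:       # Disabled + Event -> Disabled + Enabled
            if f.in_event in events:
                events.remove(f.in_event)
                enabled.append(f)
                trace.append((f.name, "Enabled"))
                progress = True
        for f in list(enabled):  # Enabled + Data -> Ready + Data
            if set(f.in_data) <= data:
                enabled.remove(f)
                ready.append(f)
                trace.append((f.name, "Ready"))
                progress = True
        for f in list(ready):    # Ready + Resource -> Initiated
            if f.resource in resources:
                resources.remove(f.resource)
                ready.remove(f)
                initiated.append(f)
                trace.append((f.name, "Initiated"))
                progress = True
        for f in list(initiated):  # Initiated -> Event + Data + Resource
            initiated.remove(f)
            events.add(f.out_event)
            data.update(f.out_data)
            resources.add(f.resource)
            trace.append((f.name, "Completed"))
            progress = True
    return events, data, trace


fa = FunctionSpec("F.A", "Ev0", ("D0",), "R1", "Ev1", ("D1",))
fb = FunctionSpec("F.B", "Ev1", ("D1",), "R1", "Ev2", ("D2",))
final_events, final_data, trace = run_functions([fa, fb], {"Ev0"}, {"D0"}, {"R1"})
```

In this sketch the input Event is consumed when a function becomes Enabled, while Data Elements are only read, which mirrors the left and right sides of the reactions listed above.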

2. Proposal: The Chemical Workflow Engine
[Figure: a chemical solution containing Disabled Functions, Disabled Connectors, Events, Data Elements and Resources, with a Function moving through the states Disabled → Enabled → Ready → Initiated and finally producing an Event, a Data Element and a Resource]

2. Proposal: The Chemical Workflow Engine
Connectors evolve through 2 states:
  – Disabled: the connector is not activated; it has not yet matched its input Event(s)
  – Enabled: the input Event(s) have been matched, but not yet the input Data Elements
Each state is represented by a different molecule:
  Disabled Connector + Event(s) → Disabled Connector + Enabled Connector
  Enabled Connector + Data Elements → Event(s) + Data Elements
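The two Connector reactions can be illustrated with a short Python sketch. The `Connector` class, the `enable` and `fire` helpers, and the join semantics used here (an AND-join waits for all input Events, an XOR-join takes one) are assumptions based on the usual EPC reading of these connectors, not details taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:             # hypothetical description of one Connector molecule
    name: str
    kind: str                # "AND-join" or "XOR-join" in this sketch
    in_events: tuple
    in_data: tuple
    out_event: str


def enable(connector, events):
    """Disabled Connector + Event(s) -> Disabled Connector + Enabled Connector."""
    if connector.kind == "AND-join":
        needed = set(connector.in_events)
        matched = needed if needed <= events else set()
    elif connector.kind == "XOR-join":
        present = [e for e in connector.in_events if e in events]
        matched = set(present[:1])
    else:
        raise ValueError(f"unsupported connector kind: {connector.kind}")
    if not matched:
        return False
    events.difference_update(matched)   # matched Events are consumed
    return True


def fire(connector, events, data):
    """Enabled Connector + Data Elements -> Event(s) + Data Elements."""
    if not set(connector.in_data) <= data:
        return False                    # the Enabled Connector keeps waiting
    events.add(connector.out_event)     # Data Elements are left in the solution
    return True


join = Connector("J1", "AND-join", ("E1", "E2"), ("D1",), "E3")
events, data = {"E1", "E2"}, {"D1"}
if enable(join, events):
    fire(join, events, data)
```

After the two calls, `events` holds only the output Event `E3`, and `data` is unchanged.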

An HOCL Workflow Engine
[Figure: Data One-to-One Connector example inside a chemical solution holding Disabled Functions, Disabled Connectors, Events, Data Elements and Resources. Labels: F.A (Ev.1, D.A.1..n, Resource), F.B (Ev.2, D.B.1..n, Resource), F.C; Data A.1 … N, Data B.1 … N, Data C.1 … N; Ev.1, Ev.2, Ev.3.1 … 3.N; Connector, + 1 Connector, + 2 Connector, + N Connector]

2. Proposal: The Chemical Workflow Engine
Structure of the Chemical Workflow Engine:
  – Separated into 4 sub-solutions, one for each state
  – Molecules are transferred among sub-solutions
Operations in the Workflow Engine:
  – Compilation: the molecules representing the Disabled Functions and Connectors corresponding to the process definition are introduced
  – Data Population: the molecules representing the input Data Elements related to a case are introduced
  – Resource Population: the molecules representing the available Resources are introduced
  – Instance Creation: the molecules representing the initial Events are introduced
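The four engine operations can be pictured as methods that add molecules to a container. The sketch below is only illustrative: the class `WorkflowSolution`, its method names, the use of plain strings as molecules, and the choice to keep passive molecules in separate lists are all assumptions, since the slides do not say where passive molecules are stored.

```python
class WorkflowSolution:
    """Hypothetical container with one sub-solution per active-molecule state."""

    STATES = ("disabled", "enabled", "ready", "initiated")

    def __init__(self):
        self.sub = {state: [] for state in self.STATES}
        self.events, self.data, self.resources = [], [], []

    def compile(self, process_definition):
        # Compilation: introduce the Disabled Functions and Connectors
        self.sub["disabled"].extend(process_definition)

    def populate_data(self, case_id, elements):
        # Data Population: introduce the input Data Elements of a case
        self.data.extend((case_id, element) for element in elements)

    def populate_resources(self, resources):
        # Resource Population: introduce the available Resources
        self.resources.extend(resources)

    def create_instance(self, case_id, initial_events):
        # Instance Creation: introduce the initial Events
        self.events.extend((case_id, event) for event in initial_events)

    def transfer(self, molecule, src, dst):
        # Move an active molecule from one sub-solution to another
        self.sub[src].remove(molecule)
        self.sub[dst].append(molecule)


solution = WorkflowSolution()
solution.compile(["F.A", "F.B", "AND-join"])
solution.populate_data(case_id=1, elements=["D0"])
solution.populate_resources(["R1"])
solution.create_instance(case_id=1, initial_events=["Ev0"])
```

Keeping the operations separate means a second case can be started later by calling only `populate_data` and `create_instance`, without recompiling the process definition.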

2. Proposal: The Chemical Workflow Engine
[Figure: the engine operations: Compilation, Data Population (Input Data), Resource Population, Instance Creation]

2. Proposal: The Chemical Workflow Engine
Identifiers:
  – Element Identifier: distinguishes among the several elements included in a process specification.
  – Process Schema Identifier: distinguishes among process specifications. It has two parts: a process number and a version number. Included in Functions, Connectors and Events.
  – Instance Identifier: distinguishes among the several instances. It includes a thread identifier (numbered Data Elements). Included in Events and Data Elements, and also in Functions and Connectors in the Enabled, Ready and Initiated states.
Molecules can be matched if their Process Schema Identifier and Instance Identifier are the same.
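The matching rule on identifiers can be sketched as a small predicate. The `Molecule` class and `can_match` function are hypothetical, and the sketch makes one explicit assumption: an identifier that a molecule does not carry (for example, the Instance Identifier of a Disabled Function) is treated as a wildcard, since the slides do not spell out that case.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Molecule:
    kind: str                                       # e.g. "Function", "Event", "Data Element"
    element_id: str
    schema_id: Optional[Tuple[int, int]] = None     # (process number, version number)
    instance_id: Optional[Tuple[int, int]] = None   # (instance number, thread identifier)


def can_match(a, b):
    """Molecules match when the identifiers they both carry are equal."""
    for attr in ("schema_id", "instance_id"):
        left, right = getattr(a, attr), getattr(b, attr)
        # Assumption: a missing identifier does not block the match
        if left is not None and right is not None and left != right:
            return False
    return True


disabled_fa = Molecule("Function", "F.A", schema_id=(1, 1))
event_v1 = Molecule("Event", "Ev0", schema_id=(1, 1), instance_id=(7, 0))
event_v2 = Molecule("Event", "Ev0", schema_id=(1, 2), instance_id=(7, 0))
enabled_fa = Molecule("Function", "F.A", schema_id=(1, 1), instance_id=(7, 0))
other_data = Molecule("Data Element", "D0", instance_id=(8, 0))

print(can_match(disabled_fa, event_v1))   # True
print(can_match(disabled_fa, event_v2))   # False
print(can_match(enabled_fa, other_data))  # False
```

The second case shows how the version number keeps an Event of one schema version from triggering a Function of another, and the third shows how the Instance Identifier keeps the data of different cases apart.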

2. Proposal: Dynamicity Support
Dynamicity is supported in several ways:
  – A workflow specification can be modified by changing the Functions and Connectors contained in the disabled sub-solution.
  – The distinction between Event and Data Element molecules makes it possible to separate the workflow specification from the data to be processed.
  – Several workflow instances can be initiated and executed in parallel. Disabled molecules are not eliminated.
  – The availability of Event molecules makes it possible to develop a steering facility.
  – Data Element molecules are not eliminated. This enables the development of monitoring, “smart re-runs” and provenance solutions.

2. Proposal: Dynamicity Support
Addenda to the Identifiers:
  – Addendum to the Process Schema Identifier: makes it possible to use modified versions of an existing process specification just by adding the new molecules.
  – Addendum to the Instance Identifier: makes it possible to use the data of another instance execution.
We support the 13 change patterns proposed in [18].

3. Validation
Developed in CLIPS:
  – CLIPS provides an environment for the construction of rule-based expert systems
  – CLIPS programming is performed by means of assertions and rules
  – Assertions are used to maintain information
  – Rules specify an action to be performed when a condition is satisfied
To validate the CWE we used two kinds of assertions and specific rules:
  – Active molecule assertions of two types (Function and Connector) and four possible states (Disabled, Enabled, Ready, Initiated)
  – Passive molecule assertions of three types (Event, Data Element and Resource)

4. Conclusions
Summary:
  – Scientific workflows are gaining momentum
  – Dynamicity is an intrinsic need in scientific workflows
  – A workflow engine based on the Chemical Computation Model has been conceived to support dynamicity needs
  – Scientific Workflow → Chemical Workflow Engine → CLIPS
Future Work:
  – To provide an actual validation

4. Conclusions
Opportunities from the Chemical Computation Model:
  – It is parallel in nature: it facilitates the distribution of computations → parallelization is obtained in a transparent way
      Workflows can be specified in the same way
      Execution of workflows is automatically parallelized
  – Change of the role of resources:
      Central “chemical solution” vs. central workflow engine
      Pull-oriented vs. push-oriented

Questions and comments are welcome!