Design and Evaluation of an Autonomic Workflow Engine Thomas Heinis, Cesare Pautasso, Gustavo Alonso Dept. of Computer Science Swiss Federal Institute.

Presentation transcript:

Design and Evaluation of an Autonomic Workflow Engine. Thomas Heinis, Cesare Pautasso, Gustavo Alonso, Dept. of Computer Science, Swiss Federal Institute of Technology (ETHZ). The 2nd IEEE International Conference on Autonomic Computing (ICAC-05). March 15th, 2008. Seo, Dongmahn

2/47 Contents: Introduction; System Background; System Architecture; Autonomic Capabilities; System Evaluation; Conclusion.

4/47 Introduction: Motivation; Related Work; Contribution.

5/47 Motivation: Application areas of workflow management systems: e-commerce, virtual laboratories, DNA sequencing, scientific computing, Grid computing, and the idea of process-based Web service composition.

6/47 Motivation (cont.): Workflow engines operate in an open environment with an unknown workload, so it is difficult to choose between a centralized solution and a distributed implementation of the engine. A distributed engine raises the problem of configuring the system in an optimal way. Considering the number of parameters involved and the variability of the workload, having a system administrator in charge of manually monitoring and reconfiguring the system is NOT a feasible solution.

7/47 Related Work: Decentralization of workflow process execution is an important area of research. It supports business processes and can lead to higher scalability, but it introduces several problems: the lack of a global view over the process, and scalability and reliability problems per se. Work that addresses the problem: GOLIAT, autonomic computing techniques, self-optimizing computer systems, and autonomic computing principles in the context of distributed workflow engines.

8/47 Contribution: Goal: a workflow engine with self-tuning, self-configuration, and self-healing capabilities.

9/47 Contribution (cont.): The system is an extension to the JOpera engine, a Java-based service composition tool that combines a workflow engine with an open architecture to provide support for Web service composition, Grid computing, and specialized workflow engines. It has a flexible architecture built from components. Key system modules can be replicated to handle large workloads. Other modules can be paired with a backup to achieve fault tolerance. The autonomic controller can be configured by selecting different reconfiguration strategies.

10/47 Contribution (cont.): The key contributions of the paper: (1) the novel system architecture, which is generic and can be adopted by many engines operating under different models and languages; (2) the resulting scalability and fault tolerance, flexible enough to support the very large loads present in computational applications and large-scale Web service composition; (3) the independence of the underlying workflow model, which makes the system easily extensible to support many different kinds of services.

12/47 System Background: Requirements; Workload Assumptions; Deployment Environment.

13/47 Requirements: To support autonomic behavior, the workflow execution engine must feature self-configuration, self-tuning, and self-healing capabilities. Self-configuration means switching the system's configuration on the fly, without manual intervention and without disrupting the system. This requires the workflow execution engine to support changing the configuration dynamically and efficiently.

14/47 Requirements (cont.): Self-tuning means reconfiguring the system to the optimal configuration given the current workload. The workflow engine must give access to its internal state so that control algorithms can analyze current and past performance information to plan configuration changes in response to the current workload. Assumption: the characteristics of the workload affect the system's performance, so the self-tuning algorithm can optimally adapt the system to the workload by monitoring key performance indicators.

15/47 Requirements (cont.): Self-healing means being able to detect configuration changes due to external events, such as failures of nodes, and to take a recovery action. This requires mechanisms for detecting failures and configuration changes of the cluster and for querying the workflow execution state.

16/47 Workload Assumptions: The workload is assumed to be a collection of concurrent workflow processes, which is a worst-case scenario. The work does not deal with workload prediction issues; these are left as future work.

17/47 Deployment Environment: [Assumption] JOpera runs on a dedicated cluster of computers and can use these resources exclusively. The main goal of the autonomic features is to ensure the optimal configuration of the cluster: efficient resource utilization and a good allocation of the available nodes to the different system components. The cluster configuration is NOT static. The system could be extended to use shared nodes that are also used for other purposes.

19/47 System Architecture: Workflow Execution; Distributed Workflow Execution; Scalable Workflow Execution.

20/47 Workflow Execution: Workflow processes model interactions between different tasks by defining the data flow and control flow between them.
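As an illustration of the slide above, the following is a minimal sketch of a process described by control-flow and data-flow edges. The task names, the parameter names, and both helper functions are hypothetical; this is not JOpera's process model or API, only a way to see what "defining the data flow and control flow between tasks" can look like.

```python
from graphlib import TopologicalSorter

# Hypothetical process. Control flow: each task maps to the tasks it waits for.
CONTROL_FLOW = {
    "fetch": set(),
    "parse": {"fetch"},
    "analyze": {"parse"},
    "report": {"parse", "analyze"},
}

# Hypothetical data flow: (producer task, consumer task, parameter name).
DATA_FLOW = [
    ("fetch", "parse", "raw"),
    ("parse", "analyze", "records"),
    ("analyze", "report", "summary"),
]


def execution_order(control_flow):
    """Return one task order that satisfies every control-flow dependency."""
    return list(TopologicalSorter(control_flow).static_order())


def data_flow_respects_order(order, data_flow):
    """Check that every producer runs before the task consuming its output."""
    position = {task: i for i, task in enumerate(order)}
    return all(position[src] < position[dst] for src, dst, _ in data_flow)


order = execution_order(CONTROL_FLOW)
print(order)
print(data_flow_respects_order(order, DATA_FLOW))
```

In this sketch the control flow alone decides when a task may start, while the data flow is only checked for consistency against it.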

21/47 Distributed Workflow Execution

22/47 Scalable Workflow Execution: To address the scalability bottleneck, use several layers of caching between the tuple space and the threads producing and consuming tuples.
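The caching idea on this slide can be sketched with a single write-back cache layer placed in front of a shared store. The classes `TupleSpace` and `CachedSpace` are hypothetical stand-ins (a plain dictionary replaces the real tuple space server), and the slide mentions several cache layers where this sketch shows only one.

```python
class TupleSpace:
    """Stand-in for the global tuple space; counts accesses to show the savings."""

    def __init__(self):
        self.store = {}
        self.reads = 0
        self.writes = 0

    def read(self, key):
        self.reads += 1
        return self.store.get(key)

    def write(self, key, value):
        self.writes += 1
        self.store[key] = value


class CachedSpace:
    """Write-back cache: local reads and writes, global writes only on flush."""

    def __init__(self, space):
        self.space = space
        self.cache = {}
        self.dirty = set()

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.space.read(key)
        return self.cache[key]

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        """Push every locally modified entry to the global space, then empty the cache."""
        for key in sorted(self.dirty):
            self.space.write(key, self.cache[key])
        self.dirty.clear()
        self.cache.clear()


space = TupleSpace()
local = CachedSpace(space)
local.write("process-1", "running")
local.write("process-1", "finished")
local.flush()
print(space.store, space.writes)
```

Two local updates reach the shared store as a single write, which is the kind of load reduction a cache between the threads and the tuple space is meant to provide.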

24/47 Autonomic Capabilities: Self-Tuning (Information Strategy, Optimization Strategy, Selection Strategy); Self-Configuration (Reconfiguration Actions); Self-Healing.

25/47 Self-Tuning: Information Strategy: samples the current space size to detect imbalances in the system's configuration. Optimization Strategy: establishes a configuration such that the number of navigator and dispatcher threads is balanced. Selection Strategy: prioritizes nodes according to how well suited they are for a configuration change.
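The three strategies can be sketched as two small functions. The proportional split used in `plan` and the "least loaded first" ordering in `select_nodes` are assumptions chosen for illustration; the slides do not state the actual policies, and all names below are hypothetical.

```python
def plan(total_nodes, process_backlog, task_backlog):
    """Optimization strategy (assumed policy): split nodes in proportion to backlog.

    process_backlog and task_backlog are the sampled space sizes supplied by the
    information strategy. Returns (navigators, dispatchers), at least one of each.
    """
    if total_nodes < 2:
        raise ValueError("need at least one navigator and one dispatcher")
    backlog = process_backlog + task_backlog
    if backlog == 0:
        navigators = total_nodes // 2
    else:
        navigators = round(total_nodes * process_backlog / backlog)
    navigators = max(1, min(total_nodes - 1, navigators))
    return navigators, total_nodes - navigators


def select_nodes(load_by_node, count):
    """Selection strategy (assumed policy): prefer the least loaded nodes."""
    ranked = sorted(load_by_node, key=lambda node: (load_by_node[node], node))
    return ranked[:count]


print(plan(15, 200, 800))
print(select_nodes({"n1": 5, "n2": 0, "n3": 2}, 2))
```

Choosing idle nodes first keeps the amount of state that has to be migrated during a reconfiguration small, which is one plausible reading of "well suited for a configuration change".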

26/47 Self-Configuration: A closed feedback-loop controller. Reconfiguration Actions: Starting threads: done through the JOpera API. Stopping navigator threads: requires migrating the state of the processes the navigator thread is working on and redirecting the associated events, by flushing the locally cached state into the global tuple space.

27/47 Self-Configuration (cont.): Stopping dispatcher threads is a more difficult task, because it may involve the invocation of a local application or the interaction with a remote service provider on the Web. Items: metadata; the kill method, which immediately stops all active task executions and ensures all task invocations will be repeated on a different dispatcher thread; the stop method, which immediately ceases to take tuples from the task space.
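The difference between the two ways of stopping a dispatcher can be sketched as follows. The `Dispatcher` class is hypothetical and is not JOpera's API. Two details are assumptions: that active tasks keep running after `stop`, and that "repeated on a different dispatcher thread" can be modeled by putting the interrupted tasks back into the shared task space.

```python
from collections import deque


class Dispatcher:
    """Toy dispatcher that takes tasks from a shared task space (a deque)."""

    def __init__(self, task_space):
        self.task_space = task_space
        self.active = []
        self.accepting = True

    def take(self):
        """Start executing the next task, unless the dispatcher was stopped."""
        if self.accepting and self.task_space:
            self.active.append(self.task_space.popleft())

    def stop(self):
        """Cease taking tuples; tasks already running are left alone (assumption)."""
        self.accepting = False

    def kill(self):
        """Abort active tasks and return them so another dispatcher repeats them."""
        self.accepting = False
        self.task_space.extend(self.active)
        self.active.clear()


task_space = deque(["t1", "t2", "t3"])
dispatcher = Dispatcher(task_space)
dispatcher.take()
dispatcher.take()
dispatcher.stop()
dispatcher.take()  # ignored: the dispatcher no longer accepts work
dispatcher.kill()
print(list(task_space))
```

Under these assumptions `stop` is the gentle option, while `kill` frees the node at once at the price of repeating task invocations elsewhere.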

28/47 Self-Healing: Periodically monitors the nodes of the cluster. Handling dispatcher thread failures: the tasks that were managed by the failed thread are lost and have to be restarted, which is very similar to the case where the self-configuration component kills a dispatcher. Handling navigator thread failures: the state of the execution of the process is still available in the global process execution state space, so recovery simply removes the entries in the tuple routing table which point to the failed navigator.
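Recovery from a navigator failure, as described above, amounts to a cleanup of the routing table. The sketch below assumes the table is a mapping from process identifier to navigator name; both that layout and the function name are hypothetical.

```python
def handle_navigator_failure(routing_table, failed_navigator):
    """Remove every routing entry that points to the failed navigator.

    Returns the identifiers of the affected processes. Their execution state is
    assumed to remain in the global state space, so nothing else is restored here.
    """
    orphaned = sorted(
        process_id
        for process_id, navigator in routing_table.items()
        if navigator == failed_navigator
    )
    for process_id in orphaned:
        del routing_table[process_id]
    return orphaned


routing = {"p1": "nav-a", "p2": "nav-b", "p3": "nav-a"}
print(handle_navigator_failure(routing, "nav-a"))
print(routing)
```

Once the stale entries are gone, events for the affected processes are no longer routed to the failed navigator and can be picked up by a surviving one.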

30/47 System Evaluation: Experimental Setup; Base Line; Autonomic Behavior (Self-Configuration, Reconfiguration Overhead); Self-Healing; Discussion.

31/47 Experimental Setup: A cluster of up to 20 nodes, each a 1.0 GHz dual P-III with 1 GB of RAM, Linux (Kernel version ) and Sun's Java Development Kit version. One additional node hosts the global tuple space server, IBM's T-Spaces v2.1.3.

32/47 Base Line: Two different workloads: 1000 concurrent processes containing 10 parallel tasks with a duration of 0 seconds (workload 0), and 1000 processes containing 10 parallel tasks with a duration of 20 seconds (workload 20). A total of 15 nodes, configured from 14 navigators and 1 dispatcher up to 14 dispatchers and 1 navigator.
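The size of the baseline experiment follows directly from the numbers on this slide. The sketch below enumerates the static configurations and the number of task invocations per workload; it assumes that every split between the two extremes was measured, which the slide does not state explicitly.

```python
TOTAL_NODES = 15
PROCESSES = 1000
TASKS_PER_PROCESS = 10

# (navigators, dispatchers) from 14 + 1 up to 1 + 14.
configurations = [(n, TOTAL_NODES - n) for n in range(14, 0, -1)]

# Task duration in seconds for each named workload.
workloads = {"workload 0": 0, "workload 20": 20}

task_invocations = PROCESSES * TASKS_PER_PROCESS

print(len(configurations), configurations[0], configurations[-1])
print(task_invocations)
```

Each workload therefore submits 10,000 task invocations, and there are 14 candidate static configurations to compare against the autonomic controller.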

33/47 Base Line (cont.)

34/47 Base Line (cont.)

35/47 Autonomic Behavior Self-Configuration

36/47 Autonomic Behavior (cont.)

37/47 Autonomic Behavior (cont.)

38/47 Autonomic Behavior (cont.) Reconfiguration Overhead

39/47 Self-Healing: The system is initially set to use 15 nodes, with 5 of the assigned nodes to be replaced. The workload consists of four peaks of 500 processes occurring every 100 seconds; each process consists of 10 parallel tasks of 10 seconds duration. Node changes: the cluster grows to 20 nodes at t=90, is reduced by 5 nodes at t=140, and is reduced again by 5 nodes at t=230.
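The scenario on this slide can be written down as a small timeline. The sketch assumes that the first peak arrives at t=0 and that each node change takes effect exactly at the stated time; neither detail is given on the slide.

```python
# (time in seconds, number of nodes available from that time on)
NODE_CHANGES = [(0, 15), (90, 20), (140, 15), (230, 10)]

# Four peaks of 500 processes, one every 100 seconds (first peak at t=0 is assumed).
PEAK_TIMES = [i * 100 for i in range(4)]
PROCESSES_PER_PEAK = 500
TASKS_PER_PROCESS = 10


def nodes_at(t):
    """Return the number of nodes available at time t."""
    count = NODE_CHANGES[0][1]
    for start, nodes in NODE_CHANGES:
        if t >= start:
            count = nodes
    return count


nodes_per_peak = [nodes_at(t) for t in PEAK_TIMES]
total_tasks = len(PEAK_TIMES) * PROCESSES_PER_PEAK * TASKS_PER_PROCESS

print(nodes_per_peak)
print(total_tasks)
```

Under these assumptions each of the four peaks meets a different cluster size, so the same load has to be absorbed with 15, 20, 15, and finally 10 nodes.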

40/47 Self-Healing (cont.)

41/47 Self-Healing (cont.)

42/47 Self-Healing (cont.)

43/47 Self-Healing (cont.)

44/47 Discussion: Finding an optimal static configuration for a given workload is very difficult, because different workload characteristics lead to different optimal configurations. The autonomic controller was able to adapt the configuration of the workflow engine according to the variable characteristics of the workload. The self-healing experiment reflects a common situation in the lifetime of a cluster-based system.

46/47 Conclusion: The paper presents the design of an autonomic workflow engine, demonstrates its self-managing behavior, and evaluates its performance. It shows how to apply the autonomic computing paradigm to greatly simplify the deployment and the maintenance of such systems. The evaluation used a homogeneous workload; workloads with more complex characteristics are left as part of future work.
