Published by Morgan Ortega. Modified over 11 years ago.
1
Design and Evaluation of an Autonomic Workflow Engine. Thomas Heinis, Cesare Pautasso, Gustavo Alonso, Dept. of Computer Science, Swiss Federal Institute of Technology (ETHZ). The 2nd IEEE International Conference on Autonomic Computing (ICAC-05). March 15th, 2008, Seo, Dongmahn
2
2/47 Contents: Introduction; System Background; System Architecture; Autonomic Capabilities; System evaluation; Conclusion
3
3/47 Contents: Introduction (current section); System Background; System Architecture; Autonomic Capabilities; System evaluation; Conclusion
4
4/47 Introduction: Motivation; Related Work; Contribution
5
5/47 Motivation: Workflow management systems are used in e-commerce, virtual laboratories, DNA sequencing, scientific computing and Grid computing; the idea of process-based Web service composition.
6
6/47 Motivation (cont.): Workflow engines run in an open environment with an unknown workload, so it is difficult to choose between a centralized solution and a distributed implementation of the engine. A distributed engine raises the problem of configuring the system in an optimal way. Considering the number of parameters involved and the variability of the workload, having a system administrator in charge of manually monitoring and reconfiguring the system is NOT a feasible solution.
7
7/47 Related Work: Decentralization of workflow process execution is an important area of research in support of business processes. It leads to higher scalability but introduces several problems: the lack of a global view over the process, and scalability and reliability problems per se. To address the problem: GOLIAT, autonomic computing techniques, self-optimizing computer systems; autonomic computing principles in the context of distributed workflow engines.
8
8/47 Contribution. Goal: self-tuning, self-configuration and self-healing capabilities.
9
9/47 Contribution (cont.): The system is an extension to the JOpera engine, a Java-based service composition tool that combines a workflow engine with an open architecture to provide support for Web service composition, Grid computing and specialized workflow engines. It has a flexible, component-based architecture: key system modules can be replicated to handle large workloads; other modules can be paired with a backup to achieve fault tolerance; the autonomic controller can be configured by selecting different reconfiguration strategies.
10
10/47 Contribution (cont.): The key contributions of the paper: (1) the novel system architecture, which is generic and can be adopted by many engines operating under different models and languages; (2) the resulting scalability and fault tolerance, flexible enough to support the very large loads present in computational applications and large-scale Web service composition; (3) the independence of the underlying workflow model, which is easily extensible to support many different kinds of services.
11
11/47 Contents: Introduction; System Background (current section); System Architecture; Autonomic Capabilities; System evaluation; Conclusion
12
12/47 System Background: Requirements; Workload Assumptions; Deployment Environment
13
13/47 Requirements: To support autonomic behavior, the workflow execution engine must feature self-configuration, self-tuning and self-healing capabilities. Self-configuration: switching the system's configuration on the fly, without manual intervention and without disrupting the system. This requires the workflow execution engine to support changing the configuration dynamically and efficiently.
14
14/47 Requirements (cont.): Self-tuning: reconfiguring the system to the optimal configuration given the current workload. The workflow engine must give access to its internal state so that control algorithms can analyze current and past performance information to plan configuration changes in response to the current workload. Assumption: the characteristics of the workload affect the system's performance, so the self-tuning algorithm can optimally adapt the system to the workload by monitoring key performance indicators.
15
15/47 Requirements (cont.): Self-healing: the system must be able to detect configuration changes due to external events, such as failures of nodes, and take a recovery action. This requires mechanisms for detecting failures and configuration changes of the cluster, and for querying the workflow execution state.
16
16/47 Workload Assumptions: The workload is assumed to be a collection of concurrent workflow processes (a worst-case scenario). The work does not deal with workload prediction issues; these are left as future work.
17
17/47 Deployment Environment. Assumption: JOpera runs on a dedicated cluster of computers and can use these resources exclusively. The main goal of the autonomic features is to ensure the optimal configuration of the cluster: efficient resource utilization and a good allocation of the available nodes to the different system components. The cluster configuration is NOT static. The system could be extended to use shared nodes that are also used for other purposes.
18
18/47 Contents: Introduction; System Background; System Architecture (current section); Autonomic Capabilities; System evaluation; Conclusion
19
19/47 System Architecture: Workflow Execution; Distributed Workflow Execution; Scalable Workflow Execution
20
20/47 Workflow Execution: Workflow processes model interactions between different tasks by defining the data flow and control flow between them.
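A minimal sketch of a process defined by control flow and data flow, written in Python for brevity (JOpera itself is Java-based). The task names, the dictionary layout and the `run` helper are hypothetical and are not taken from JOpera; the sketch only illustrates that control flow fixes the execution order while data flow passes each task's output to its successors.

```python
from graphlib import TopologicalSorter

# Control flow (hypothetical example): task -> tasks that must finish first.
control_flow = {
    "fetch": set(),
    "left": {"fetch"},
    "right": {"fetch"},
    "merge": {"left", "right"},
}

# Data flow (hypothetical example): each task computes its output
# from the outputs of its predecessors.
tasks = {
    "fetch": lambda inputs: 10,
    "left": lambda inputs: inputs["fetch"] + 1,
    "right": lambda inputs: inputs["fetch"] * 2,
    "merge": lambda inputs: inputs["left"] + inputs["right"],
}


def run(control_flow, tasks):
    """Execute tasks in an order that respects the control flow."""
    results = {}
    for name in TopologicalSorter(control_flow).static_order():
        inputs = {dep: results[dep] for dep in control_flow[name]}
        results[name] = tasks[name](inputs)
    return results
```

Here "left" and "right" do not depend on each other, so an engine would be free to run them in parallel, as with the parallel tasks used in the evaluation workloads.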
21
21/47 Distributed Workflow Execution
22
22/47 Scalable Workflow Execution. Scalability bottleneck: use several layers of caching between the tuple space and the threads producing and consuming tuples.
23
23/47 Contents: Introduction; System Background; System Architecture; Autonomic Capabilities (current section); System evaluation; Conclusion
24
24/47 Autonomic Capabilities: Self-Tuning (Information Strategy, Optimization Strategy, Selection Strategy); Self-Configuration (Reconfiguration Actions); Self-Healing
25
25/47 Self-tuning. Information Strategy: detect imbalances in the system's configuration by sampling the current space size. Optimization Strategy: establish a configuration such that the number of navigator and dispatcher threads is balanced. Selection Strategy: prioritize nodes according to how well suited they are for a configuration change.
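The three strategies can be sketched as one tuning step. This is a minimal illustration in Python, not the controller from the paper: the `Node` fields, the two queue sizes used as the sampled information, the `threshold` factor and the "least loaded node first" rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str         # "navigator" or "dispatcher"
    active_work: int  # hypothetical load figure for this node


def plan_change(navigator_queue, dispatcher_queue, threshold=2.0):
    """Optimization strategy (assumed rule): shift one node toward the
    role whose sampled queue is much longer than the other."""
    if dispatcher_queue > threshold * max(navigator_queue, 1):
        return ("navigator", "dispatcher")
    if navigator_queue > threshold * max(dispatcher_queue, 1):
        return ("dispatcher", "navigator")
    return None  # balanced enough, no change


def select_node(nodes, from_role):
    """Selection strategy (assumed rule): prefer the least loaded node,
    and never give up the last node of a role."""
    candidates = [n for n in nodes if n.role == from_role]
    if len(candidates) <= 1:
        return None
    return min(candidates, key=lambda n: n.active_work)


def tune(nodes, navigator_queue, dispatcher_queue):
    """One step: sampled sizes in, at most one role change out."""
    change = plan_change(navigator_queue, dispatcher_queue)
    if change is None:
        return None
    from_role, to_role = change
    node = select_node(nodes, from_role)
    if node is not None:
        node.role = to_role
    return node
```

Changing at most one node per step keeps each reconfiguration small, which suits a controller that re-samples the system periodically.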
26
26/47 Self-Configuration: a closed feedback-loop controller. Reconfiguration Actions. Starting Threads: done through the JOpera API. Stopping Navigator Threads: migrate the state of the processes the navigator thread is working on and redirect the associated events, by flushing the locally cached state into the global tuple space.
27
27/47 Self-Configuration (cont.). Stopping Dispatcher Threads: a more difficult task, since it may involve the invocation of a local application or the interaction with a remote service provider on the Web; metadata. kill method: immediately stops all active task executions and ensures all task invocations will be repeated on a different dispatcher thread. stop method: immediately ceases to take tuples from the task space.
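The difference between the two methods can be sketched as follows. The `Dispatcher` class below is a hypothetical single-threaded model in Python, not JOpera's API; in particular, putting killed tasks back into the task space is only one assumed way of making sure they are repeated on a different dispatcher.

```python
import queue


class Dispatcher:
    """Hypothetical model of a dispatcher thread taking tasks from a task space."""

    def __init__(self, task_space):
        self.task_space = task_space
        self.active = []       # tasks currently being executed
        self.accepting = True

    def take(self):
        """Take the next task tuple, if this dispatcher still accepts work."""
        if not self.accepting:
            return None
        try:
            task = self.task_space.get_nowait()
        except queue.Empty:
            return None
        self.active.append(task)
        return task

    def stop(self):
        """Graceful: cease taking tuples; tasks already running continue."""
        self.accepting = False

    def kill(self):
        """Immediate: abort active tasks and return them to the task space
        so that another dispatcher repeats them (assumed mechanism)."""
        self.accepting = False
        for task in self.active:
            self.task_space.put(task)
        self.active.clear()
```

With `stop`, no work is repeated but the node is only released once its running tasks finish; with `kill`, the node is released at once at the cost of re-executing the aborted tasks.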
28
28/47 Self-Healing: periodically monitors the nodes of the cluster. Handling Dispatcher Thread Failures: the tasks that were managed by the failed thread are lost and have to be restarted; this is very similar to the case where the self-configuration component kills a dispatcher. Handling Navigator Thread Failures: the state of the execution of the process is still available in the global process execution state space, so recovery simply removes the entries in the tuple routing table which point to the failed navigator.
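Both recovery actions can be sketched with plain dictionaries. The data layout (a routing table mapping process ids to navigator names, a map of running tasks to dispatcher names, a list as the task space) and the function names are assumptions made for illustration only.

```python
def handle_navigator_failure(routing_table, failed):
    """Drop routing entries that point to the failed navigator.

    The process state itself is assumed to survive in the global state
    space, so nothing else has to be restored here. Returns the ids of
    the affected processes.
    """
    orphaned = [pid for pid, nav in routing_table.items() if nav == failed]
    for pid in orphaned:
        del routing_table[pid]
    return orphaned


def handle_dispatcher_failure(running, task_space, failed):
    """Re-queue the tasks that were running on the failed dispatcher.

    Their partial work is lost, so they go back into the task space to
    be restarted by another dispatcher. Returns the lost tasks.
    """
    lost = [task for task, disp in running.items() if disp == failed]
    for task in lost:
        del running[task]
        task_space.append(task)
    return lost
```

The asymmetry mirrors the slide: a navigator failure only needs routing cleanup, while a dispatcher failure costs the work of the tasks it was executing.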
29
29/47 Contents: Introduction; System Background; System Architecture; Autonomic Capabilities; System evaluation (current section); Conclusion
30
30/47 System evaluation: Experimental Setup; Base Line; Autonomic Behavior (Self-Configuration, Reconfiguration Overhead); Self-Healing; Discussion
31
31/47 Experimental Setup: a cluster of up to 20 nodes, each a 1.0 GHz dual P-III with 1 GB of RAM, Linux (kernel version 2.4.22) and Sun's Java Development Kit version 1.4.2; one additional node runs the global tuple space server, IBM's T-Spaces v2.1.3.
32
32/47 Base Line: two different workloads: 1000 concurrent processes containing 10 parallel tasks of duration 0 seconds (workload 0), and 1000 processes containing 10 parallel tasks of duration 20 seconds (workload 20). A total of 15 nodes, ranging from 14 navigators and 1 dispatcher up to 14 dispatchers and 1 navigator.
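The size of this baseline sweep can be worked out in a few lines of Python. That every intermediate navigator/dispatcher split was measured is an assumption; the slide only names the two extremes.

```python
TOTAL_NODES = 15
PROCESSES = 1000
TASKS_PER_PROCESS = 10

# Static configurations as (navigators, dispatchers), from 14+1 to 1+14.
configurations = [(nav, TOTAL_NODES - nav) for nav in range(14, 0, -1)]

# Each workload issues the same number of task executions.
total_tasks = PROCESSES * TASKS_PER_PROCESS

# Total task duration per workload, in seconds.
busy_seconds = {"workload 0": total_tasks * 0, "workload 20": total_tasks * 20}
```

Under this reading there are 14 static configurations, and each workload consists of 10,000 task executions; only workload 20 keeps dispatchers busy for a significant time (200,000 task-seconds in total).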
33
33/47 Base Line (cont.)
34
34/47 Base Line (cont.)
35
35/47 Autonomic Behavior Self-Configuration
36
36/47 Autonomic Behavior (cont.)
37
37/47 Autonomic Behavior (cont.)
38
38/47 Autonomic Behavior (cont.) Reconfiguration Overhead
39
39/47 Self-Healing: the system initially uses 15 nodes, with 5 of the assigned nodes to be replaced. The workload consists of four peaks of 500 processes occurring every 100 seconds; each of the processes consists of 10 parallel tasks of 10 seconds duration. Node changes: the cluster grows to 20 nodes at t=90, is reduced by 5 nodes at t=140, and again by 5 nodes at t=230.
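The node schedule of this experiment can be written down as a small Python timeline. Times are assumed to be in seconds, and the helper name `nodes_at` is hypothetical.

```python
INITIAL_NODES = 15

# (time, change in node count), as listed on the slide.
NODE_EVENTS = [(90, +5), (140, -5), (230, -5)]


def nodes_at(t):
    """Number of cluster nodes available at time t."""
    count = INITIAL_NODES
    for when, delta in NODE_EVENTS:
        if t >= when:
            count += delta
    return count


# Workload size: four peaks of 500 processes, 10 tasks each, 10 s per task.
total_tasks = 4 * 500 * 10
total_task_seconds = total_tasks * 10
```

The cluster therefore goes from 15 to 20, back to 15 and finally down to 10 nodes while a total of 20,000 tasks (200,000 task-seconds) are submitted.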
40
40/47 Self-Healing (cont.)
41
41/47 Self-Healing (cont.)
42
42/47 Self-Healing (cont.)
43
43/47 Self-Healing (cont.)
44
44/47 Discussion: Finding an optimal static configuration for a given workload is very difficult, since different workload characteristics lead to different optimal configurations. The autonomic controller was able to adapt the configuration of the workflow engine according to the variable characteristics of the workload. The self-healing experiment reflects a common situation in the lifetime of a cluster-based system.
45
45/47 Contents: Introduction; System Background; System Architecture; Autonomic Capabilities; System evaluation; Conclusion (current section)
46
46/47 Conclusion: The work presents the design of an autonomic workflow engine, demonstrates its self-managing behavior and evaluates its performance, showing how to apply the autonomic computing paradigm to greatly simplify the deployment and the maintenance of such systems. The evaluation used a homogeneous workload; workloads with more complex characteristics are left as part of future work.
47
47/47