Enabling Self-management of Component-based High-performance Scientific Applications Hua (Maria) Liu and Manish Parashar The Applied Software Systems Laboratory.


1 Enabling Self-management of Component-based High-performance Scientific Applications Hua (Maria) Liu and Manish Parashar The Applied Software Systems Laboratory Department of Electrical and Computer Engineering Rutgers University

2 Challenges
Emerging scientific applications are
–Distributed, heterogeneous, long-running, dynamic
  Changing user requirements
  Changing problem domains
  Changing context environments
Emerging execution environments are also
–Distributed, heterogeneous, dynamic
  Changing workload and communication capabilities

3 Solution
Applications should be aware of changes in application/system state and execution context, and respond to them.
–i.e., applications should be self-managing or autonomic
However, this requires a programming system that can support the development and execution of such autonomic self-managing applications.
–Extend computational elements (objects, components, and services) to support autonomic behaviors
–Define dynamic composition (interactions) of autonomic elements that responds to changing user requirements and execution context
–Provide a runtime infrastructure to achieve self-management

4 Outline
Challenges and solution
Conceptual model of Accord
Prototype implementation based on CCA Ccaffeine framework
Illustrative applications

5 Overview of Accord Programming System
Accord supports
–Dynamic specification of adaptation behaviors in rules
–Runtime enforcement of adaptation behaviors by invoking sensors and actuators
–Runtime conflict detection and resolution
Key contributions
–Accord provides programming abstractions to define the control port
–Accord enables applications to be context-aware and self-managing
–Accord enables element behavior adaptation and interaction adaptation at runtime

6 Autonomic Element
[Figure: an autonomic element. Labels: Computational Element; Functional Port; Control Port; Operational Port; Element Manager; Rules; Internal state; Contextual state; Event generation; Actuator invocation; Other interface invocation.]

7 The Accord Runtime Infrastructure
[Figure labels: Application workflow; Composition manager; Application strategies; Application requirements; Composition rules; Component rules.]

8 CCA and Ccaffeine Framework
[Figure: processes P0, P1, P2, P3. Components: blue, green, red. Framework: gray.]
Different components in the same process “talk to each other” via ports and the framework.
Instances of the same component in different processes talk to each other through their favorite communications layer (e.g., MPI, PVM, GA).
Each process is loaded with the same set of components, wired the same way.
Note: this slide is taken from the CCA tutorial – www.cca-forum.org
The characteristics of scientific applications
–These applications are component-based.
–The execution of these applications typically consists of a series of computational phases.

9 Accord-CCA: Extend Ccaffeine to Enable Self-Management Behaviors
[Figure labels: Controllable component; Component manager; Composition manager; Driver; Ccaffeine framework + TAU; components C1, C2, C3, C4.]

10 Manager Components
Component managers provide component-level adaptations via
–Adapting the runtime behaviors of individual components based on component rules
–Dynamically replacing components based on composition rules
Composition managers provide application-level adaptations via
–Coordinating component managers’ behaviors
[Figure labels: TAU; RulePort; events; C2; C3.]

11 Rule
Rule {
  on events; (component or system events)
  when conditions; (component or system sensors)
  do actions; (component or system actuators)
}
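The on/when/do structure can be illustrated with a minimal Python sketch. This is not Accord's rule syntax or API: the `Rule` class, the `fire` helper, the event name `temperature_changed`, and the 1000-degree threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]  # sensor name -> current reading (hypothetical representation)

@dataclass
class Rule:
    name: str
    on: List[str]                  # events that trigger the rule
    when: Callable[[State], bool]  # condition over sensor readings
    do: Dict[str, str]             # actuator name -> value to set

def fire(rules, event, state):
    """Return (rule name, actuator settings) for every rule triggered by `event`."""
    return [(r.name, dict(r.do)) for r in rules if event in r.on and r.when(state)]

# Two illustrative rules; the threshold is an assumption, not from the slides.
rules = [
    Rule("select_fast", ["temperature_changed"],
         lambda s: s["temperature"] > 1000, {"algorithm": "algorithm1"}),
    Rule("select_safe", ["temperature_changed"],
         lambda s: s["temperature"] <= 1000, {"algorithm": "algorithm2"}),
]

requests = fire(rules, "temperature_changed", {"temperature": 1200})
```

Here `requests` holds only the actuator settings asked for by rules whose event and condition both match; invoking the actuators is left to a separate step.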

12 The Rule Enforcement Engine
Batch condition inquiry
Condition evaluation in parallel
Conflict detection and resolution
Reconciliation
Batch action invocation
[Figure labels: Context; Internal state of elements; Pre-condition; Post-condition.]
Sensor-actuator conflict:
–Detection: Execution of some rules will change the pre-condition
–Resolution: Disable these rules
Actuator-actuator conflict:
–Detection: The post-condition contains multiple values for the same actuator
–Resolution: Relax the rule conditions, by incrementally deleting sensors in a user-specified sequence, until no actuators are invoked with different values
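The batch inquiry, parallel evaluation, and batch invocation phases might be sketched as follows. This is an illustrative sketch only: the dictionary-based rule format and the `enforce` function are assumptions, conflict detection and reconciliation are omitted, and a thread pool stands in for whatever parallelism the real engine uses.

```python
from concurrent.futures import ThreadPoolExecutor

def enforce(rules, sensors, actuators):
    # 1. Batch condition inquiry: read every needed sensor exactly once.
    needed = sorted({name for rule in rules for name in rule["sensors"]})
    snapshot = {name: sensors[name]() for name in needed}
    # 2. Evaluate all rule conditions in parallel against the same snapshot.
    with ThreadPoolExecutor() as pool:
        fired = list(pool.map(lambda rule: rule["when"](snapshot), rules))
    # 3. Collect requested actions. Conflict handling is omitted in this
    #    sketch: a later rule simply overwrites an earlier one.
    requested = {}
    for rule, ok in zip(rules, fired):
        if ok:
            requested.update(rule["do"])
    # 4. Batch action invocation: call each actuator once.
    for name, value in requested.items():
        actuators[name](value)
    return requested

# Hypothetical sensors, actuators, and thresholds for illustration.
calls = []
sensors = {"temperature": lambda: 1200.0, "bandwidth": lambda: 5.0}
actuators = {"algorithm": lambda value: calls.append(("algorithm", value))}
rules = [
    {"sensors": ["temperature"], "when": lambda s: s["temperature"] > 1000,
     "do": {"algorithm": "algorithm1"}},
    {"sensors": ["bandwidth"], "when": lambda s: s["bandwidth"] > 10,
     "do": {"algorithm": "algorithm2"}},
]
result = enforce(rules, sensors, actuators)
```

Evaluating every condition against one snapshot keeps the rules consistent with a single pre-condition, which is what the conflict checks on this slide reason about.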

13 Sensor-Actuator Conflict
Rule1: When high precision and input within range1, select algorithm1
Rule2: When input within range2, select low precision
range1 ∩ range2 = range3
Pre-condition: high precision; input within range3
Post-condition (both rules executed): Rule1: algorithm1; Rule2: low precision
Post-condition (after disabling Rule2, which would change the pre-condition): algorithm1
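The detection and resolution steps for this example can be sketched in Python. The function name and the dictionary encoding of the pre-condition and rule actions are hypothetical; the sketch only assumes that a rule conflicts when one of its actions overwrites a value the pre-condition depends on.

```python
def resolve_sensor_actuator(pre, fired):
    """Disable rules whose actions would change a value in the pre-condition.

    pre:   dict of sensor name -> value (the pre-condition)
    fired: dict of rule name -> dict of actuator name -> value
    """
    kept, disabled = {}, []
    for name, actions in fired.items():
        changes_pre = any(key in pre and pre[key] != value
                          for key, value in actions.items())
        if changes_pre:
            disabled.append(name)   # resolution: disable the rule
        else:
            kept[name] = actions
    return kept, disabled

pre = {"precision": "high", "input": "range3"}
fired = {
    "Rule1": {"algorithm": "algorithm1"},
    "Rule2": {"precision": "low"},   # would overwrite the "precision" sensor value
}
kept, disabled = resolve_sensor_actuator(pre, fired)
```

With the slide's inputs, Rule2 is disabled and only the algorithm1 selection from Rule1 remains.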

14 Actuator-Actuator Conflict
Rule1:
  When high precision
    when input within range 1: select algorithm1
    when input within range 2: select algorithm2
  When low precision
    when input within range 2: select algorithm1
    when input within range 1: select algorithm2
Rule2:
  When low memory space: select algorithm1
  else: select algorithm2
Pre-condition: high precision; input within range 2; low memory space
Post-condition: Rule1: algorithm2; Rule2: algorithm1
Relax the precondition by deleting precision
Post-condition: Rule1: algorithm1 or 2; Rule2: algorithm1
Post-condition: algorithm1
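The relaxation procedure can be sketched as follows, under the assumption that each rule is a table of (condition, value) pairs and that deleting a sensor means a rule may select any value consistent with the remaining sensors. The table encoding, the function names, and the modelling of Rule2's "else" branch as `memory == "high"` are assumptions made for illustration.

```python
RULE1 = [
    ({"precision": "high", "input": "range1"}, "algorithm1"),
    ({"precision": "high", "input": "range2"}, "algorithm2"),
    ({"precision": "low", "input": "range2"}, "algorithm1"),
    ({"precision": "low", "input": "range1"}, "algorithm2"),
]
RULE2 = [
    ({"memory": "low"}, "algorithm1"),
    ({"memory": "high"}, "algorithm2"),   # stands in for the "else" branch
]

def candidates(rule, state):
    """Values the rule may select, ignoring sensors deleted from `state`."""
    return {value for cond, value in rule
            if all(state[k] == v for k, v in cond.items() if k in state)}

def resolve(rules, state, deletion_order):
    """Delete sensors in the given order until all rules agree on a value."""
    state = dict(state)
    pending = list(deletion_order)
    while True:
        common = set.intersection(*(candidates(r, state) for r in rules))
        if common:
            return sorted(common), state
        if not pending:
            raise ValueError("conflict cannot be resolved by relaxation")
        state.pop(pending.pop(0), None)

pre = {"precision": "high", "input": "range2", "memory": "low"}
chosen, relaxed = resolve([RULE1, RULE2], pre, ["precision", "input"])
```

With the slide's pre-condition, the first pass yields algorithm2 from Rule1 and algorithm1 from Rule2; after deleting precision, Rule1 allows either algorithm and the two rules agree on algorithm1.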

15 Reconciliation
[Figure: nodes x, y, and z each run components C1 and C2; other labels: Algorithm 1, Algorithm 2, C3, C4.]
Case1: If the replacement on node z has a high priority and the other two have a low priority: propagate the replacement with C4. If multiple high priority replacements: error.
Case2: If all the replacements have a low priority, the replacement with the highest performance gain will be propagated.
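The two cases can be expressed as a short Python sketch. The proposal format, the `reconcile` function, and the gain numbers are hypothetical; only the decision logic follows the slide.

```python
def reconcile(proposals):
    """Pick the single replacement to propagate to all nodes.

    proposals: list of dicts with keys "node", "replacement",
               "priority" ("high" or "low"), and "gain".
    """
    high = [p for p in proposals if p["priority"] == "high"]
    if len(high) > 1:
        # The slide treats multiple high priority replacements as an error.
        raise RuntimeError("multiple high priority replacements")
    if high:
        return high[0]["replacement"]                 # Case 1
    # Case 2: all low priority, so take the highest performance gain.
    return max(proposals, key=lambda p: p["gain"])["replacement"]

# Illustrative proposals; the gain values are made up.
case1 = reconcile([
    {"node": "x", "replacement": "C3", "priority": "low", "gain": 0.30},
    {"node": "y", "replacement": "C3", "priority": "low", "gain": 0.25},
    {"node": "z", "replacement": "C4", "priority": "high", "gain": 0.10},
])
case2 = reconcile([
    {"node": "x", "replacement": "C3", "priority": "low", "gain": 0.30},
    {"node": "z", "replacement": "C4", "priority": "low", "gain": 0.10},
])
```

In Case 1 the high priority proposal wins regardless of gain; in Case 2 the proposal with the larger gain is propagated.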

16 The Self-managing CH4 Ignition Simulation: Self-optimizing Via Component Adaptation
[Figure labels: Component Manager; Rule Generator; Initializer; Executor; Cvode; Thermo; Chemistry; Ref. Annotation: export sensor “temperature” and actuator “algorithm”.]
A set of algorithms is provided to simulate a set of reaction processes. Some algorithms may not work at some temperatures. Further, these algorithms demonstrate different performance levels (execution time) at the same temperature. So algorithms have to be dynamically selected to avoid application crashes and/or optimize application execution.
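A rule driven by the “temperature” sensor and the “algorithm” actuator could look like the sketch below: choose the cheapest algorithm among those valid at the current temperature. The algorithm names, temperature ranges, and relative costs are invented for illustration and do not come from the simulation.

```python
ALGORITHMS = [
    # (name, min temperature, max temperature, relative execution time)
    # All values are hypothetical.
    ("algorithmA", 300.0, 1500.0, 1.0),
    ("algorithmB", 1000.0, 3000.0, 0.6),
    ("algorithmC", 300.0, 3000.0, 2.5),
]

def select_algorithm(temperature):
    """Return the fastest algorithm that is valid at `temperature`."""
    valid = [a for a in ALGORITHMS if a[1] <= temperature <= a[2]]
    if not valid:
        raise ValueError(f"no algorithm works at temperature {temperature}")
    return min(valid, key=lambda a: a[3])[0]

choices = [select_algorithm(t) for t in (500.0, 1200.0, 2000.0)]
```

Filtering by validity first avoids the crash case, and taking the minimum cost afterwards covers the optimization case.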

17 The Self-managing Shock Simulation: Self-optimizing Via Component Replacement
[Figure labels: Component Manager; Performance toolkit (TAU); GodunovFlux; EFMFlux.]
Rule: IF cache miss of GodunovFlux > value THEN REPLACE GodunovFlux EFMFlux
1. register cache miss event
2. collect cache miss of GodunovFlux
3. evaluate the rule
4. replace GodunovFlux with EFMFlux
EFMFlux will be used from the next computation.
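The four steps can be mimicked with a small Python sketch in which the replacement only takes effect for the next computation. It uses no CCA, Ccaffeine, or TAU APIs: the `ComponentManager` class, the stand-in flux functions, the threshold, and the cache-miss readings are all hypothetical.

```python
def godunov_flux(step):
    return ("GodunovFlux", step)   # stand-in for the real component

def efm_flux(step):
    return ("EFMFlux", step)       # stand-in for the replacement component

class ComponentManager:
    def __init__(self, active, alternative, threshold):
        self.active = active
        self.alternative = alternative
        self.threshold = threshold

    def on_cache_miss_event(self, miss_rate):
        """Evaluate the rule: IF cache miss > threshold THEN replace."""
        if miss_rate > self.threshold and self.active is not self.alternative:
            self.active = self.alternative
            return True
        return False

manager = ComponentManager(godunov_flux, efm_flux, threshold=0.2)
trace = []
for step, miss_rate in enumerate([0.05, 0.35, 0.10]):
    trace.append(manager.active(step)[0])    # current computation
    manager.on_cache_miss_event(miss_rate)   # affects the next computation
```

Because the rule is evaluated after each computation, the step that observes the high cache-miss rate still runs GodunovFlux, and EFMFlux is used from the following step.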

18 The Self-managing Shock Simulation: Self-optimizing Via Component Adaptation
[Figure labels: AMRMesh; Component Manager; Performance toolkit (TAU); algorithms x and y.]
Rule: IF bandwidth < threshold THEN algorithm x
1. export actuator “algorithm”
2. register communication bandwidth
3. collect current bandwidth
4. evaluate the rule
5. invoke algorithm with x
Algorithm x will be used from the next computation.

19 The Self-managing Shock Simulation: Self-healing Via Component Replacement
[Figure labels: Component Manager; GodunovFlux; EFMFlux.]
Rule: IF GodunovFlux error THEN REPLACE GodunovFlux EFMFlux
1. register execution error as a sensor
2. evaluate the rule
3. replace GodunovFlux with EFMFlux

20 Conclusion
The distribution, heterogeneity, and dynamism of emerging environments and applications impose new requirements on programming systems
–To support development and execution of autonomic self-managing applications
The Accord programming system extends the CCA Ccaffeine framework to meet these requirements
–Extends CCA components with component managers to form autonomic components
–Provides a runtime infrastructure to enforce adaptation behaviors and detect/resolve runtime conflicts

21 Additional Slides

22 Centralized vs Decentralized Reconciliation
Centralized approach: one instance collects proposals from other instances and propagates the reconciliation result
–Convergence rate = O(n)
–Low scalability
–Not robust
Decentralized approach: each instance only communicates with its neighbors to achieve local consensus
–Convergence rate = O(lg n)
–High scalability
–Robust
Problems to be solved
–Local rules used by individual component instances
–How to define neighbors
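The O(n) versus O(lg n) contrast can be illustrated with a toy simulation in which reconciliation means agreeing on the highest performance gain. The slides leave the definition of neighbors open; this sketch assumes a hypercube-style pairing (instance i exchanges with instance i XOR 2^k in round k) and a power-of-two instance count, both of which are assumptions for illustration only.

```python
def centralized_steps(gains):
    """One collector examines the other n - 1 proposals one after another."""
    best, steps = gains[0], 0
    for gain in gains[1:]:
        best = max(best, gain)
        steps += 1
    return best, steps

def decentralized_rounds(gains):
    """Pairwise exchanges; all instances hold the maximum after lg n rounds."""
    n = len(gains)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two count"
    values, rounds, step = list(gains), 0, 1
    while step < n:
        # In each round, instance i exchanges values with instance i XOR step.
        values = [max(values[i], values[i ^ step]) for i in range(n)]
        step *= 2
        rounds += 1
    return values, rounds

gains = [3, 9, 1, 4, 7, 2, 8, 5]   # made-up performance gains for 8 instances
central_best, central_steps = centralized_steps(gains)
local_values, local_rounds = decentralized_rounds(gains)
```

For eight instances the collector takes seven sequential steps, while the pairwise scheme needs three rounds before every instance holds the same result.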

