Use of Performance Prediction Techniques for Grid Management – Junwei Cao, University of Warwick, April 2002
Outline: Introduction; Performance Prediction; Local Grid Scheduling; Global Grid Management; A Case Study; Conclusions & Future Work
Introduction: Glossary; Overview; Related Work
Glossary: Grids – computational grids; Resources – multiprocessors or clusters; Applications (tasks) – MPI and PVM parallel programs; Users – developers and end users; Agents, Requests & Services; Performance – execution time
Overview (architecture figure; components: Grid Users, Grid Resources, Global Grid Management, Local Grid Scheduling, Application Tools, Performance Evaluation Engine, Resource Tools)
Related Work: Performance evaluation – POEMS, AppLeS, …; Local grid schedulers – Condor, LSF, Ninf, Nimrod, …; Global grid management – Globus, Legion, DPSS, …
Performance Prediction: Methodology; Implementation
PACE Methodology (layered framework; model parameters go in, a predicted execution time comes out): Application Layer – acts as the entry point to the performance study; Subtask Layer – describes the sequential parts within an application; Parallel Template Layer – describes the parallel characteristics of subtasks; Hardware Layer – characterises the communication and computation abilities of a particular system
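The way the four layers combine into one prediction can be illustrated with a toy model. This is a minimal sketch, not PACE itself (PACE has its own toolchain, including the PSL and HMCL compilers): the class and function names, the flop-time and latency/bandwidth hardware parameters, and the even-split parallel template are all hypothetical assumptions chosen only to show how each layer feeds the predicted execution time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HardwareModel:
    """Hardware layer: hypothetical computation and communication parameters."""
    flop_time: float   # seconds per floating-point operation
    latency: float     # seconds per message
    bandwidth: float   # bytes per second

    def compute(self, flops: float) -> float:
        return flops * self.flop_time

    def message(self, nbytes: float) -> float:
        return self.latency + nbytes / self.bandwidth


@dataclass
class Subtask:
    """Subtask layer: one sequential part of the application."""
    name: str
    flops: float            # operation count of the sequential code
    bytes_exchanged: float  # data each processor sends after computing


def parallel_template(sub: Subtask, hw: HardwareModel, p: int) -> float:
    """Parallel template layer (assumed pattern): split the work evenly over
    p processors, then exchange one message per processor if there is data."""
    comp = hw.compute(sub.flops / p)
    comm = hw.message(sub.bytes_exchanged) if p > 1 and sub.bytes_exchanged > 0 else 0.0
    return comp + comm


def predict(subtasks: List[Subtask], hw: HardwareModel, p: int) -> float:
    """Application layer: entry point that sums the subtask predictions."""
    return sum(parallel_template(s, hw, p) for s in subtasks)


hw = HardwareModel(flop_time=1e-8, latency=1e-4, bandwidth=1e7)
app = [Subtask("init", 1e6, 0.0), Subtask("solve", 1e9, 1e6)]
print(predict(app, hw, 1), predict(app, hw, 4))
```

Swapping in a different `HardwareModel` while keeping the same subtasks is what makes cross-platform comparison cheap in a layered model like this.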
PACE Toolkit (figure): Application Tools – Source Code Analysis, Object Editor, Object Library, PSL Compiler; Resource Tools – CPU, Network (MPI, PVM), Cache (L1, L2), HMCL Compiler; Evaluation Engine
Summary: Advantages – reasonable accuracy, rapid evaluation, easy cross-platform comparison; Limitations – application source code required, static resource configurations
Local Grid Scheduling: Algorithms; Implementation (Titan)
FIFO Algorithm (figure: Processors 1–8; for a resource with $n$ processors there are $2^n - 1$ possible processor allocations per task)
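A FIFO allocator of this kind can be sketched as a brute-force search over all 2^n − 1 non-empty processor subsets for each task, taken strictly in arrival order. This is an illustration, not Titan's code: `fifo_schedule` and the toy prediction model are hypothetical, and it assumes the prediction depends only on how many processors a task receives and that the task list is already sorted by arrival time.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Allocation = Tuple[Tuple[int, ...], float, float]  # (processors, start, end)


def fifo_schedule(tasks: List[Tuple[str, float]], n: int,
                  predict: Callable[[str, int], float]) -> Dict[str, Allocation]:
    """Allocate tasks in arrival order. For each task, try all 2**n - 1
    non-empty processor subsets and keep the earliest completion."""
    free_at = [0.0] * n              # time at which each processor becomes free
    plan: Dict[str, Allocation] = {}
    for name, arrival in tasks:      # tasks are assumed sorted by arrival time
        best = None
        for k in range(1, n + 1):
            for procs in combinations(range(n), k):
                # A parallel task starts only when all of its processors are free.
                start = max([arrival] + [free_at[i] for i in procs])
                end = start + predict(name, k)
                if best is None or end < best[2]:
                    best = (procs, start, end)
        procs, start, end = best
        for i in procs:
            free_at[i] = end
        plan[name] = best
    return plan


# Hypothetical prediction: 8 s of perfectly parallel work plus
# 0.5 s of overhead for each extra processor.
model = lambda name, k: 8.0 / k + 0.5 * (k - 1)
print(fifo_schedule([("A", 0.0), ("B", 0.0)], 4, model))
```

The exhaustive subset loop is exponential in n, which is one motivation for a heuristic search once resources grow beyond a handful of processors.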
Genetic Algorithm: heuristic, evolutionary; produces near-optimal schedules with respect to makespan, idle time and deadlines
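A compact genetic algorithm for the same allocation problem might look like the following. Everything here is an illustrative assumption rather than Titan's actual encoding: each chromosome holds a task order plus a processor bitmask per task, the cost is a weighted sum of makespan, idle time and deadline overrun with arbitrary weights, and selection is truncation with two elite survivors.

```python
import random
from typing import Callable, List, Tuple

Chromosome = Tuple[List[int], List[int]]  # (task order, processor bitmask per task)
Predict = Callable[[int, int], float]     # (task index, processor count) -> seconds


def decode(ch: Chromosome, n: int, predict: Predict, deadlines: List[float]):
    """Build the schedule a chromosome encodes; return (makespan, idle, lateness)."""
    order, masks = ch
    free_at = [0.0] * n
    busy = late = 0.0
    for t in order:
        procs = [i for i in range(n) if (masks[t] >> i) & 1]
        start = max(free_at[i] for i in procs)
        end = start + predict(t, len(procs))
        for i in procs:
            free_at[i] = end
        busy += len(procs) * (end - start)
        late += max(0.0, end - deadlines[t])
    makespan = max(free_at)
    return makespan, n * makespan - busy, late


def cost(ch, n, predict, deadlines, weights=(1.0, 0.1, 10.0)) -> float:
    """Weighted makespan + idle time + deadline overrun (weights are assumptions)."""
    return sum(w * x for w, x in zip(weights, decode(ch, n, predict, deadlines)))


def random_chromosome(num_tasks: int, n: int, rng: random.Random) -> Chromosome:
    order = list(range(num_tasks))
    rng.shuffle(order)
    return order, [rng.randint(1, 2 ** n - 1) for _ in range(num_tasks)]


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    """Order crossover on the task order, uniform crossover on the bitmasks."""
    size = len(a[0])
    i, j = sorted(rng.sample(range(size), 2))
    order = [None] * size
    order[i:j] = a[0][i:j]
    rest = [t for t in b[0] if t not in a[0][i:j]]
    for k in range(size):
        if order[k] is None:
            order[k] = rest.pop(0)
    masks = [x if rng.random() < 0.5 else y for x, y in zip(a[1], b[1])]
    return order, masks


def mutate(ch: Chromosome, n: int, rng: random.Random, rate: float = 0.2) -> Chromosome:
    order, masks = list(ch[0]), list(ch[1])
    if rng.random() < rate:                   # swap two tasks in the order
        x, y = rng.sample(range(len(order)), 2)
        order[x], order[y] = order[y], order[x]
    for t in range(len(masks)):
        if rng.random() < rate:               # flip one processor bit
            flipped = masks[t] ^ (1 << rng.randrange(n))
            masks[t] = flipped or masks[t]    # never leave a task without processors
    return order, masks


def ga_schedule(num_tasks, n, predict, deadlines, pop_size=30, generations=100, seed=0):
    rng = random.Random(seed)
    fitness = lambda ch: cost(ch, n, predict, deadlines)
    pop = [random_chromosome(num_tasks, n, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite, parents = pop[:2], pop[: pop_size // 2]  # elitism + truncation selection
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), n, rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return min(pop, key=fitness)


work = [4.0, 6.0, 2.0, 8.0]                       # hypothetical sequential times
predict = lambda t, k: work[t] / k + 0.25 * (k - 1)
best = ga_schedule(4, 4, predict, [10.0] * 4)
print(best, cost(best, 4, predict, [10.0] * 4))
```

Because the two best schedules survive into every new generation, the best cost never gets worse from one generation to the next; the result is near-optimal at best, with no guarantee of optimality.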
Titan Implementation (figure; modules: Communication Module, PACE Evaluation Engine, Task Management, GA Scheduling, Resource Monitoring, Task Execution; flows: requests, results, service)
Global Grid Management: Methodology; Implementation (ARMS); Metrics
Agent-based Methodology: Agent structure – communication layer, decision-making layer, local management layer; Agent hierarchy – service advertisement, service discovery, Agent Capability Tables (ACTs) (figure: hierarchy of agents (A) serving a user)
Optimisation Strategies: Advertisement – data-push and data-pull, periodic and event-driven; Discovery – local services, services in ACTs, lower or upper agents; Optimisation – modelling, simulation (figure: agents (A), M and a user)
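The discovery order listed here (local services, then services recorded in the ACT, then lower or upper agents) can be sketched as a small agent hierarchy. This is a hypothetical illustration, not the ARMS implementation: the `Agent` class, recording an estimated completion time per service, and modelling only data-push advertisement to the upper agent (no periodic or event-driven triggers) are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

INF = float("inf")


@dataclass(eq=False)
class Agent:
    name: str
    # Hypothetical service information: application name -> estimated completion time.
    services: Dict[str, float]
    upper: Optional["Agent"] = None
    lower: List["Agent"] = field(default_factory=list)
    # Agent Capability Table: agent name -> services that agent has advertised.
    act: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def add_lower(self, child: "Agent") -> None:
        child.upper = self
        self.lower.append(child)

    def advertise(self) -> None:
        """Data-push advertisement: copy this agent's services into the upper agent's ACT."""
        if self.upper is not None:
            self.upper.act[self.name] = dict(self.services)

    def discover(self, app: str, deadline: float,
                 visited: Optional[Set[str]] = None) -> Optional[str]:
        """Return the name of an agent whose service meets the deadline, or None."""
        if visited is None:
            visited = set()
        visited.add(self.name)
        # 1. Local services.
        if self.services.get(app, INF) <= deadline:
            return self.name
        # 2. Services recorded in this agent's ACT.
        for other, offered in self.act.items():
            if offered.get(app, INF) <= deadline:
                return other
        # 3. Lower agents first, then the upper agent; skip agents already asked.
        neighbours = self.lower + ([self.upper] if self.upper is not None else [])
        for agent in neighbours:
            if agent.name not in visited:
                found = agent.discover(app, deadline, visited)
                if found is not None:
                    return found
        return None


root = Agent("A1", {})
a2, a3 = Agent("A2", {"fft": 50.0}), Agent("A3", {"fft": 5.0})
root.add_lower(a2)
root.add_lower(a3)
a2.advertise()
a3.advertise()
print(a2.discover("fft", 10.0))
```

An ACT entry is only as fresh as the last advertisement, so the trade-off between advertising often (fast discovery, more connections) and rarely (stale tables, more forwarding) shows up directly in a model like this.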
ARMS Implementation (figure: agents (A), M and T nodes, and a user): Service information – PACE models, makespan; Request information – application binary, PACE model, deadline; Matchmaking – estimation (FIFO) compared with the deadline
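Matchmaking can be sketched as a deadline check on a FIFO estimate: the request waits until the resource's current makespan, then runs for the shortest predicted time over the available processor counts. The `ServiceInfo` and `Request` fields, the `predict` hook standing in for a PACE evaluation, the resource names, and picking the earliest feasible completion are hypothetical assumptions, not the ARMS interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ServiceInfo:
    resource: str
    makespan: float                       # when the tasks already queued will finish
    nprocs: int
    predict: Callable[[str, int], float]  # hypothetical PACE hook: (app, k) -> seconds


@dataclass
class Request:
    app: str
    deadline: float


def fifo_estimate(service: ServiceInfo, request: Request, now: float) -> float:
    """FIFO estimate: wait behind the queue, then run on the best processor count."""
    run = min(service.predict(request.app, k) for k in range(1, service.nprocs + 1))
    return max(now, service.makespan) + run


def matchmake(services: List[ServiceInfo], request: Request,
              now: float = 0.0) -> Optional[str]:
    """Pick the resource with the earliest estimated completion that meets the deadline."""
    best = None
    for s in services:
        end = fifo_estimate(s, request, now)
        if end <= request.deadline and (best is None or end < best[1]):
            best = (s.resource, end)
    return None if best is None else best[0]


services = [
    ServiceInfo("idle_slow", 0.0, 16, lambda app, k: 400.0 / k),   # hypothetical
    ServiceInfo("busy_fast", 50.0, 16, lambda app, k: 100.0 / k),  # hypothetical
]
print(matchmake(services, Request("fft", 30.0)))
```

Returning `None` when no resource can meet the deadline is the point at which an agent would fall back to discovery elsewhere in the hierarchy.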
Metrics: Average advance time – how far, on average, application executions complete ahead of their required deadlines; Average processor utilisation rate – utilised time / total time; Load balancing level – mean square deviation of processor utilisation rates; Average discovery speed – number of requests / number of discovery connections; Average discovery efficiency – number of requests / (number of discovery connections + number of advertisement connections)
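These metrics translate directly into small functions. A sketch, assuming per-processor utilised times and the connection counts have already been collected; the function names are hypothetical. As defined on this slide, the load balancing level is the mean square deviation of the utilisation rates (a root-mean-square variant would take its square root).

```python
from typing import List


def average_advance_time(deadlines: List[float], completions: List[float]) -> float:
    """Mean of (deadline - completion); positive means tasks finish early."""
    return sum(d - c for d, c in zip(deadlines, completions)) / len(deadlines)


def utilisation_rates(utilised: List[float], total: float) -> List[float]:
    """Per-processor utilisation rate: utilised time / total time."""
    return [u / total for u in utilised]


def average_utilisation(rates: List[float]) -> float:
    return sum(rates) / len(rates)


def load_balancing_level(rates: List[float]) -> float:
    """Mean square deviation of utilisation rates (smaller means better balanced)."""
    mean = average_utilisation(rates)
    return sum((r - mean) ** 2 for r in rates) / len(rates)


def discovery_speed(requests: int, discovery_conns: int) -> float:
    return requests / discovery_conns


def discovery_efficiency(requests: int, discovery_conns: int, advert_conns: int) -> float:
    return requests / (discovery_conns + advert_conns)


rates = utilisation_rates([50.0, 100.0], 100.0)
print(average_utilisation(rates), load_balancing_level(rates))
```

Note that advertisement connections appear only in the efficiency denominator, so extra advertising can raise discovery speed while lowering efficiency.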
A Case Study: Design; Demonstrations; Results
Experiment Design: 12 resources (S1–S12), 16 processors each: S1, S2 – SGI Origin2000; S3, S4 – Sun Ultra10; S5, S6, S7 – Sun Ultra5; S8, S9, S10 – Sun Ultra1; S11, S12 – Sun SPARCstation 2. Applications: sweep3d, fft, improc, closure, jacobi, memsort, cpi
Experiment 1: FIFO
Experiment 2: GA
Experiment 3: GA
Application Execution: Both the GA and the agents contribute to improved application execution.
Resource Utilisation: S11 and S12 benefit mainly from the GA; S1 and S2 benefit mainly from the agents.
Load Balancing: The GA contributes more to load balancing within a local grid; the agents contribute more to load balancing across the global grid.
Discovery Speed & Efficiency: No advertisement – low speed, low efficiency; Reasonable advertisement – high speed, high efficiency; Too much advertisement – very high speed, very low efficiency (figure: discovery speed (×100) and discovery efficiency (×100))
Conclusions & Future Work
Conclusions: Performance prediction capabilities are essential to grid management. A genetic algorithm is applied to local grid scheduling. Global grid management is achieved using an agent-based methodology. The agent-based framework is scalable, flexible and extensible, and is easy to enhance further.
Future Work: Impact of prediction accuracy on grid management and scheduling; Transaction-based application performance modelling; Integration with Globus and NWS; Going beyond discovery to enable negotiation and coordination
JAG: Java Agents for Grids (figure; components: Agent Coordination, Agent Negotiation, Agent Discovery, Agent Communication, Knowledge Representation, Performance Prediction, Application Scheduling, Information Service, Grid Monitoring; related systems: PACE, AppLeS, Globus, NWS, …)
Questions are welcome …