Euro-Par 2008, Las Palmas, 27 August
DGSim: Comparing Grid Resource Management Architectures Through Trace-Based Simulation
Alexandru Iosup, Ozan Sonmez, and Dick Epema
PDS Group, Delft University of Technology, The Netherlands
A Grid Research Toolbox
  Hypothesis: (a) is better than (b).
  DGSim
  For scenario 1, …
The Problem with Grid Simulations
  Three decades of writing simulators in computer science → writing the simulator is not the problem
  The problem: getting from solution design to experimental results with an automated simulation tool
    Experimental setup
      Tool to generate realistic experimental setups
    Experiment support for grid resource management
      Tool to manage large numbers of related simulations
    Performance
      Not the simulation time (decades of optimizations there)
      Tool proved to work with large simulations (number of resources, workload size, etc.)
Outline
  1. Problem Statement
  2. The DGSim Framework
  3. DGSim Validation
  4. DGSim Examples
  5. Future Work
The DGSim Framework: Name, Goal, and Challenges
  DGSim = Delft Grid Simulator
  Simulate various grid resource management (GRM) architectures
    Multi-cluster grids
    Grids of grids (THE grid)
  Challenges
    Many types of architectures
    Generating and replaying grid workloads
    Management of the simulations
      Many repetitions of a simulation for statistical relevance
      Simulations with many parameters
      Managing results (e.g., analysis tools)
      Enabling collaborative experiments
  Figure: Two GRM architectures
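The simulation-management challenges listed above (many repetitions for statistical relevance, many parameters) amount to a parameter sweep. The following is a minimal sketch under stated assumptions, not DGSim's actual interface: `run_simulation`, the parameter names, and the returned metric are all hypothetical placeholders.

```python
import itertools
import random
import statistics


def run_simulation(architecture, load, seed):
    """Hypothetical stand-in for one simulation run; returns a made-up metric."""
    rng = random.Random(f"{architecture}-{load}-{seed}")
    return load * 100.0 + rng.uniform(-5.0, 5.0)


def sweep(parameters, repetitions):
    """Run every parameter combination `repetitions` times and summarize it."""
    names = sorted(parameters)
    results = {}
    for values in itertools.product(*(parameters[n] for n in names)):
        point = dict(zip(names, values))
        # One run per seed, so repetitions differ only in their random stream.
        samples = [run_simulation(seed=seed, **point) for seed in range(repetitions)]
        results[values] = (statistics.mean(samples), statistics.stdev(samples))
    return results


# Assumed example parameters: 2 architectures x 3 load levels = 6 design points.
PARAMETERS = {
    "architecture": ["centralized", "decentralized"],
    "load": [0.5, 0.7, 0.9],
}
RESULTS = sweep(PARAMETERS, repetitions=10)
```

Keying the results by parameter tuple keeps each design point's mean and standard deviation together, which is one simple way to hand the output to later analysis tools.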
The DGSim Framework: Overview
  Figure: Discrete-Event Simulator
The DGSim Framework: Model Details: Inter-Operation Architectures
  Independent
  Centralized
  Hierarchical
  Decentralized
  Hybrid hierarchical/decentralized
The DGSim Framework: Model Details: Resource Dynamics & Evolution
  Resource dynamics
    Short-term changes in resource availability status
  Resource evolution
    Long-term changes in number & … of resources
  A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, IEEE/ACM Grid, 2007.
The DGSim Framework: Workloads: Generation and Model(s)
  Parallel jobs
    Adapting the Lublin-Feitelson model to grids
  Bags-of-Tasks: groups of independent single-processor tasks
    Validated with seven long-term grid traces
  A. Iosup, O.O. Sonmez, S. Anoep, and D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems, IEEE HPDC.
  A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, and M. Livny, Inter-Operating Grids through Delegated MatchMaking, ACM/IEEE SuperComputing, 2007.
  Workload Generation
    Generate synthetic workload with realistic characteristics
    Iterative workload generation: incur specified load on a grid
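One way to read "iterative workload generation: incur specified load on a grid" is to keep adding synthetic jobs until the offered load reaches the target. The sketch below illustrates that reading only; the slides do not specify the procedure. The job-size and runtime distributions are arbitrary placeholders (not the Lublin-Feitelson model), and `generate_workload` is a hypothetical name.

```python
import random


def generate_workload(target_load, total_procs, duration_s, seed=0):
    """Add jobs until offered load (requested / available CPU-seconds) hits the target."""
    rng = random.Random(seed)
    capacity = total_procs * duration_s  # CPU-seconds available over the period
    jobs, work = [], 0.0
    while work < target_load * capacity:
        procs = rng.choice([1, 2, 4, 8])        # placeholder job sizes
        runtime = rng.expovariate(1.0 / 600.0)  # placeholder runtimes, mean 600 s
        arrival = rng.uniform(0.0, duration_s)
        jobs.append((arrival, procs, runtime))
        work += procs * runtime
    jobs.sort()  # replay order: by arrival time
    return jobs, work / capacity


# Assumed example: 70% load on a 100-processor grid over one day.
JOBS, ACHIEVED_LOAD = generate_workload(0.7, total_procs=100, duration_s=86_400)
```

The achieved load overshoots the target by at most the work of the last job added, so the loop stops as close to the requested load as the job granularity allows.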
DGSim Validation: Functional Validation
  Functional validation (simple scenario)
    Workload: 100 jobs of constant size 10,000, all arriving at t=0
    System: grid scheduler over one 10-resource cluster
    resource = 1 work unit/second, information delay = s
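The expected outcome of the functional validation scenario above can be checked by hand or with a few lines of code. The sketch below is not DGSim; it is a small event-driven calculation under three assumptions that the slide does not state: each job occupies exactly one resource, jobs are served first-come first-served, and the information delay is zero.

```python
import heapq


def simulate(num_jobs=100, job_size=10_000, num_resources=10, speed=1.0):
    """FCFS scheduling of identical jobs that all arrive at t=0."""
    free_at = [0.0] * num_resources  # min-heap: time at which each resource frees up
    heapq.heapify(free_at)
    waits = []
    makespan = 0.0
    for _ in range(num_jobs):
        start = heapq.heappop(free_at)     # earliest available resource
        finish = start + job_size / speed  # 1 work unit/second by default
        waits.append(start)                # arrival time is 0, so wait == start
        heapq.heappush(free_at, finish)
        makespan = max(makespan, finish)
    return makespan, sum(waits) / len(waits)


MAKESPAN, MEAN_WAIT = simulate()
```

Under these assumptions the jobs run in 10 waves of 10 jobs, each wave lasting 10,000 s, so the makespan is 100,000 s and the mean waiting time is 45,000 s. A simulator that reports different values for this scenario would point to a modeling or implementation difference.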
DGSim Validation: Real vs. Simulated DAS-3 Multi-Cluster Grid
  Simulator setup
    Application: synthetic parallel, communication-intensive (all-gather)
    Measured: runtime for various configurations (co-allocation)
    System: heterogeneous clusters, Koala co-allocating scheduler
    Workload: 300 jobs, submitted over a period of 6 hours
    All jobs submitted through central cluster gateways
  Results
    Scheduling algorithm leads to similar results in real and simulated environments → can use simulator for analyzing scheduling trends
    Under-estimation of waiting time (failures lead to more contention)
DGSim Examples: Sample 1/3
  Investigate mechanisms for inter-operating grids
    New mechanism: DMM
  Trace-based performance evaluation through simulations
    Real and model-based traces
    Largest trace: 1.4M jobs
    Simulate Grid’5000+DAS-2
  Explored a design space of over 1 million design points
  A. Iosup, D.H.J. Epema, T. Tannenbaum, M. Farrellee, and M. Livny, Inter-Operating Grids through Delegated MatchMaking, ACM/IEEE SuperComputing, 2007.
DGSim Examples: Sample 2/3
  What is the performance impact of the dynamic grid resource availability?
    Four models for grid resource availability information
  Trace-based performance evaluation through simulations
    Real traces
    Simulate Grid’5000
  Result: KA = AMA > HMA >> SA
  A. Iosup, M. Jan, O. Sonmez, and D.H.J. Epema, On the Dynamic Resource Availability in Grids, IEEE/ACM Grid, 2007.

  Resource availability   Availability information delay   Model
  Static                  n/a                              SA
  Dynamic                 On-time (0)                      KA
  Dynamic                 Short period                     AMA
  Dynamic                 Long period                      HMA

  Figure: Average normalized goodput [cpuseconds/day/proc] per model (SA, KA, AMA 60s, AMA 1h, HMA 1w, HMA 1mo, HMA Never). Goodput decreases with intervention delay.
DGSim Examples: Sample 3/3
  Analyze performance of bag-of-tasks scheduling algorithms
    Information availability framework: Known, Unknown, Historical records
  Trace-based performance evaluation through simulations
    Real and model-based traces
    Simulate Grid’5000+DAS
  Evaluated 8 scheduling algorithms
  Explored a design space of over 2 million design points
  A. Iosup, O.O. Sonmez, S. Anoep, and D.H.J. Epema, The Performance of Bags-of-Tasks in Large-Scale Distributed Computing Systems, IEEE HPDC.
  Table: scheduling algorithms classified by task information (K, H, U) and resource information (K, H, U): ECT, FPLT, FPF, ECT-P, DFPLT, MQD, STFR, RR, WQR
Conclusion and Future Work
  The DGSim framework
    Tool to generate realistic experimental setups
    Tool to manage large numbers of grouped simulations
    Tool proved to work with large simulations
  Validated underlying models and assumptions
    Resource dynamics and evolution model
    Workload model
  Comparing grid resource management architectures
    Proven in various settings
  Future work
    More scenarios
    Library of ready-to-use scenarios
Thank you! Questions? Remarks? Observations?
  Contact: [google “Iosup”]
  Web sites:
    http:// : VL-e project
    http:// : PDS group articles & software