University of Dortmund June 30, On Grid Performance Evaluation using Synthetic Workloads JSSPP 2006 Alexandru Iosup, Dick Epema PDS Group, ST/EWI, TU Delft Carsten Franke, Alexander Papaspyrou, Lars Schley, Baiyi Song, and Ramin Yahyapour UniDo
Outline
- A Brief Introduction to Grid Computing
- On Grid Performance Evaluation
- Experimental Environments
- Performance Indicators
- General Workload Modeling
- Grid-Specific Workload Modeling
- The GrenchMark Framework
- Future Work
- Conclusions

A Brief Introduction to Grid Computing
Typical grid environment
- Applications [!]: unitary, composite
- Data
- Resources: compute (clusters), storage (dedicated), network
- Virtual Organizations, Projects, Groups, Users
Grids vs. parallel production environments
- Dynamic
- Heterogeneous
- Very large-scale (world)
- No central administration
→ Most resource management problems are NP-hard

Experimental Environments: Real-World Testbeds
Real-world testbeds: DAS, NorduGrid, Grid3/OSG, Grid’5000, …
Pros
- True performance, also shows “it works!”
- Infrastructure in place
Cons
- Time-intensive
- Exclusive access (repeatability)
- Controlled environment problem (limited scenarios)
- Workload structure (little or no realistic data)
- What to measure (new environment)

Experimental Environments: Simulated and Emulated Testbeds
Simulated and emulated testbeds: GridSim, SimGrid, GangSim, MicroGrid, …
Essentially a trade-off between precision and speed
Pros
- Exclusive access (repeatability)
- Controlled environment (unlimited scenarios)
Cons
- Synthetic grids: What to generate? How to generate? Clusters, disks, network, VOs, groups, users, applications, etc.
- Workload structure (little or no realistic data)
- What to measure (new environment)
- Validity of results (accuracy vs. time)

Grid Performance Evaluation: Current Practice
Performance Indicators
- Define my own metrics, or use utilization (U) and average wait/response time (AWT/ART), or both
Workload Structure
- Run my own workload, or use traces that are not validated by peer researchers; do not make comparisons!
- Run benchmarks from typical parallel production environments
- Mostly under an “all users are created equal” assumption
→ Need a common performance evaluation framework for Grid

Grid Performance Evaluation: Current Issues
Performance Indicators
- What should be the metrics for the new environment?
Workload Structure
- Which general aspects could be important?
- Which Grid-specific aspects need to be addressed?
→ Need a common performance evaluation framework for Grid

Performance Indicators
Time-, Resource-, and System-Related Metrics
- Traditional: utilization, A(W)RT, A(W)WT, A(W)SD
- New: waste, fairness (or service quality reliability)
Workload Completion and Failure Metrics
- “In Grids, functionality may be even more important than performance”
- Workload Completion (WC)
- Task and Enabled Task Completion (TC, ETC)
- System Failure Factor (SFF)

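The traditional time-related metrics can be illustrated with a minimal Python sketch. It assumes the usual readings of the abbreviations: A(W)RT, A(W)WT, and A(W)SD as average (weighted) response time, wait time, and slowdown. The `Job` record, the helper names, and the choice of weighting each job by its resource consumption (processors × run time) are assumptions made for illustration; the slides do not define them.

```python
from dataclasses import dataclass


@dataclass
class Job:
    submit: float  # submission time (seconds)
    start: float   # start time (seconds)
    end: float     # completion time (seconds)
    procs: int     # number of processors used

    @property
    def wait(self) -> float:
        return self.start - self.submit

    @property
    def run(self) -> float:
        return self.end - self.start

    @property
    def response(self) -> float:
        return self.end - self.submit

    @property
    def area(self) -> float:
        # Resource consumption: processors x run time (assumed weight).
        return self.procs * self.run


def average(jobs, value, weight=lambda j: 1.0):
    """Weighted average of value(j); unweighted when weight is constant."""
    total_weight = sum(weight(j) for j in jobs)
    return sum(weight(j) * value(j) for j in jobs) / total_weight


def utilization(jobs, total_procs):
    """Consumed processor-seconds over those available in the observed span."""
    span = max(j.end for j in jobs) - min(j.submit for j in jobs)
    return sum(j.area for j in jobs) / (total_procs * span)


# Hypothetical three-job schedule on a 4-processor system.
jobs = [Job(0, 0, 100, 4), Job(10, 100, 150, 2), Job(20, 150, 350, 4)]

art = average(jobs, lambda j: j.response)                        # ART
awt = average(jobs, lambda j: j.wait)                            # AWT
asd = average(jobs, lambda j: j.response / j.run)                # ASD
awrt = average(jobs, lambda j: j.response, lambda j: j.area)     # AWRT
util = utilization(jobs, total_procs=4)
```

Weighting by consumption makes large jobs count more, so AWRT and ART can rank two schedulers differently on the same workload.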
General Aspects for Workload Modeling
User/Group/VO model
- Detailed modeling for the top 5 to 10 users, then clustering (use the squashed area to group)
Submission patterns
- Yearly, monthly, weekly, daily
- Do daily patterns exist? (Are Grids truly global?)
Temporal patterns
- Repeated submission (batches of jobs)
- Job dependencies (composite applications common in Grid(?))
Feedback
- Empirical rules (don’t submit jobs when the system is busy)
- But: reactive submission tools, co-allocators, evolving applications, etc.

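A daily submission pattern can be sketched as a Poisson arrival process whose rate follows a 24-hour cycle, sampled by thinning. This is only one possible model, not the one used by the authors; the rate function, its parameters (`base`, `amplitude`, `peak_hour`), and the function names are hypothetical.

```python
import math
import random


def daily_rate(t_hours, base=2.0, amplitude=1.5, peak_hour=14.0):
    """Hypothetical arrival rate (jobs/hour) with a 24-hour cycle."""
    phase = 2.0 * math.pi * (t_hours - peak_hour) / 24.0
    return base + amplitude * math.cos(phase)


def generate_arrivals(horizon_hours, rate=daily_rate, rate_max=3.5, seed=0):
    """Sample arrival times (hours) by thinning a rate_max Poisson process.

    rate_max must be an upper bound on rate(t); here base + amplitude = 3.5.
    """
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_max)        # candidate arrival
        if t >= horizon_hours:
            return arrivals
        if rng.random() < rate(t) / rate_max:  # accept with prob. rate/rate_max
            arrivals.append(t)


# Thirty simulated days of submissions.
arrivals = generate_arrivals(24 * 30)
day = sum(1 for t in arrivals if 8 <= t % 24 < 20)
night = len(arrivals) - day
```

In a truly global grid, users in different time zones would shift `peak_hour` per site, which flattens the aggregate cycle; that is one way to read the question "Do daily patterns exist?".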
Grid-Specific Workload Modeling: Computation Management
Processor co-allocation
- Fixed, non-fixed, semi-fixed jobs
Job flexibility and composition
- Moldable, evolvable, flexible, etc.
- Batches, workflows, other dependencies
Other aspects
- Background load: define top jobs (by consumption), model the rest as background load
- Project stage

Grid-Specific Workload Modeling: Data and Network Management
Clearly Defined I/O Requirements
- Files, streams, others
- Data location and size
- Replicas, replica location
- Other aspects: HDD occupancy
Clearly Defined Network Requirements
- Bandwidth, latency
- Communication pattern
- Special situations: dedicated paths, other QoS
- Other aspects: background load

Grid-Specific Workload Modeling: Locality/Origin Management
Job issuer and execution site
- Not all VOs are created equal!
- Two-level view: Which VO generates the next job? Within a VO, which user generates the next job?
- Three-level view, multi-level view (Project, VO, Group, User)
(Usage) Service Level Agreements
- Use my system 50% for 7 days, or 20% for 30 days
- Dedicated paths, other QoS
Other aspects
- Background load pertaining to the same (u)SLA

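The two-level view can be sketched as two successive weighted draws: first the VO, then a user within that VO. The VO names, user names, and weights below are hypothetical, and a real model would fit them to traces.

```python
import random
from collections import Counter


def next_job_origin(vo_weights, user_weights, rng):
    """Two-level draw: pick the VO, then a user inside that VO."""
    vos = list(vo_weights)
    vo = rng.choices(vos, weights=[vo_weights[v] for v in vos])[0]
    users = list(user_weights[vo])
    user = rng.choices(users, weights=[user_weights[vo][u] for u in users])[0]
    return vo, user


# Hypothetical shares: not all VOs (or users) are created equal.
vo_weights = {"vo-physics": 0.7, "vo-bio": 0.2, "vo-misc": 0.1}
user_weights = {
    "vo-physics": {"alice": 0.8, "bob": 0.2},
    "vo-bio": {"carol": 0.5, "dave": 0.5},
    "vo-misc": {"erin": 1.0},
}

rng = random.Random(42)
origins = [next_job_origin(vo_weights, user_weights, rng) for _ in range(10000)]
jobs_per_vo = Counter(vo for vo, _ in origins)
```

A multi-level view (Project, VO, Group, User) follows the same pattern with one weighted draw per level.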
Grid-Specific Workload Modeling: Failure Modeling
Error level
- Infrastructure
- Middleware
- Application
- User
Fault tolerance scheme for submitted jobs
- Capture the system feedback in the model
Other aspects
- Cascading errors

Grid-Specific Workload Modeling: Economic Models
Utility
- Resource utility
- Application utility
Pricing policies
- Time-dependent pricing: pay less during off-peak hours
- Load-dependent pricing: pay less for unused resources
- Package pricing: pay less for bundles of resources
- Trust-building pricing: pay less as a long-standing user
Other aspects
- Available information
- Penalty / user satisfaction

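The four pricing policies can be sketched as independent price multipliers. Every number here (discount sizes, peak window, caps) and the decision to combine the policies by multiplication are assumptions for illustration; the slides name the policies but do not specify them.

```python
def time_dependent(hour, off_peak_discount=0.5, peak=(8, 20)):
    """Full price during peak hours, discounted otherwise."""
    return 1.0 if peak[0] <= hour < peak[1] else 1.0 - off_peak_discount


def load_dependent(load, min_factor=0.4):
    """load in [0, 1]: an idle system charges min_factor, a full one 1.0."""
    return min_factor + (1.0 - min_factor) * load


def package(n_resources, step=0.05, max_discount=0.3):
    """Each extra resource in a bundle adds a discount, up to a cap."""
    return 1.0 - min(max_discount, step * (n_resources - 1))


def trust_building(account_age_days, per_year=0.1, max_discount=0.3):
    """Long-standing users earn a capped discount per year of membership."""
    return 1.0 - min(max_discount, per_year * account_age_days / 365.0)


def quote(base_price, hour, load, n_resources, account_age_days):
    """Combine the policies multiplicatively (an assumed design choice)."""
    return (base_price
            * time_dependent(hour)
            * load_dependent(load)
            * package(n_resources)
            * trust_building(account_age_days))


peak_price = quote(100.0, hour=14, load=1.0, n_resources=1, account_age_days=0)
night_price = quote(100.0, hour=2, load=1.0, n_resources=1, account_age_days=0)
```

A workload model that includes such prices lets user behavior react to cost, for example by shifting flexible jobs to off-peak hours.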
GrenchMark: a Framework for Analyzing, Testing, and Comparing grids
What’s in a name?
- grid benchmark → working towards a generic tool for the whole community: help standardize the testing procedures; it is too early for benchmarks, so we use synthetic grid workloads instead
What’s it about?
- A systematic approach to analyzing, testing, and comparing grid settings, based on synthetic workloads
- A set of metrics for analyzing grid settings
- A set of representative grid applications, both real and synthetic
- Easy-to-use tools to create synthetic grid workloads
- Flexible, extensible framework

GrenchMark Overview: Easy to Generate and Run Synthetic Workloads

… but More Complicated Than You Think
Workload structure
- User-defined and statistical models
- Dynamic job arrivals
- Burstiness and self-similarity
- Feedback, background load
- Machine usage assumptions
- Users, VOs
Metrics
- A(W) Run/Wait/Resp. Time
- Efficiency, MakeSpan
- Failure rate [!]
(Grid) notions
- Co-allocation, interactive jobs, malleable, moldable, …
Measurement methods
- Long workloads
- Saturated / non-saturated system
- Start-up, production, and cool-down scenarios
- Scaling workload to system
Applications
- Synthetic
- Real
Workload definition language
- Base language layer
- Extended language layer
Other
- Can use the same workload for both simulations and real environments
GrenchMark may become a vehicle for proving research (on performance indicators, workload modeling) in dynamic, heterogeneous, very large-scale environments

GrenchMark: Iterative Research Roadmap
- Simple functional system: A. Iosup, J. Maassen, R.V. van Nieuwpoort, D.H.J. Epema, Synthetic Grid Workloads with Ibis, KOALA, and GrenchMark, CoreGRID IW, Nov 2005.
- Complex extensible system: A. Iosup, D.H.J. Epema, GrenchMark: A Framework for Analyzing, Testing, and Comparing Grids, IEEE CCGrid'06, May 2006.
- This work
- Open-GrenchMark: community effort

Take home message
Performance Evaluation of Grid Systems
- need a common performance evaluation framework for grids
- need real grid traces (scheduling, accounting, monitoring, etc.)
- need more research on workload modeling and performance indicators
Performance indicators
- failure metrics as important as traditional performance metrics
Workload modeling
- generic workload modeling needs validation based on real grid traces
- computation/data/network management
- locality/origin management
- failure modeling
- economic models
GrenchMark
- generic tool for the whole community
- generates diverse grid workloads
- easy-to-use, flexible, portable, extensible, …

Thank you! Questions? Remarks? Observations? All welcome!

Representative Grid applications (3/4): Composite, DAG-based
DAG-based applications
- Real DAG
- Chain of tools
- Try to model real or predicted (use) cases
[Figure: example DAG. Legend: input, output, user task, linker, identity (one task’s output = other’s input, unmodified). Chain: App1 > Linker1 > App2 > Final result. File labels: param1.in, param2.in, some-list.in, out_1-1.dat, out_1-2.dat, huge-data.out, perf1.dat, perf2.dat, l1p.dat, out2.res]
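The chain App1 > Linker1 > App2 can be sketched as a small DAG executed in dependency order, with the linker acting as an identity that passes one task's output unmodified to the next. The task bodies and the data they exchange are placeholders, and the sketch assumes each task has at most one predecessor; it uses Python's standard `graphlib` module (Python 3.9+), not GrenchMark itself.

```python
from graphlib import TopologicalSorter


def app1(_):
    # Placeholder for the first user task.
    return "raw-output"


def linker1(data):
    # Identity linker: one task's output becomes the next task's input.
    return data


def app2(data):
    # Placeholder for the second user task.
    return f"final({data})"


tasks = {"App1": app1, "Linker1": linker1, "App2": app2}
# Each task maps to the set of tasks it depends on.
deps = {"App1": set(), "Linker1": {"App1"}, "App2": {"Linker1"}}


def run_dag(tasks, deps):
    """Run tasks in topological order, feeding each its predecessor's result."""
    results = {}
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        preds = sorted(deps[name])
        arg = results[preds[0]] if preds else None
        results[name] = tasks[name](arg)
    return order, results


order, results = run_dag(tasks, deps)
```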