Published by Richard Webb. Modified over 9 years ago.
1
System Utilization Benchmark on the Cray T3E and IBM SP
Adrian Wong, Leonid Oliker, William Kramer, Teresa Kaltz, Therese Enright and David Bailey
National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory
2
Scientific Supercomputer Workload
– Long-running batch jobs (hours)
– Typically 64 nodes per job
– Often a long list of queued jobs
– Job turnaround may be days
3
Motivations
– Ability to fully utilize a large computer is almost as important as the speed of the computer.
– Large capability mainframes rarely have idle cycles - need to maximize users' productivity.
– Need a way to measure potential day-to-day utilization.
– No metric to gauge configuration changes other than anecdotal.
– Increased complexity of scheduling with parallel platforms.
A test to assess system capabilities & configuration effects on utilization: Effective System Performance (ESP)
4
Parallel Job Scheduling
– Optimization problem in packing with space (processor) and time constraints
– Dynamic situation
– Tradeoffs in turnaround, utilization & fairness
5
Scheduling Strategies
[Diagram labels: Job Queue, Hole, Order of Submission]
Best-Fit-First
– Scan queue for best fit
– Starvation of large jobs
First-Come-First-Serve
– Wait for right size hole
– May idle system
– Respects submission order
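The difference between the two strategies can be sketched in a few lines of Python. This is an illustrative sketch only, not the schedulers used on either machine: the function names are hypothetical, the queue is reduced to a list of processor requests, and "best fit" is assumed to mean the largest queued job that still fits in the hole.

```python
from typing import List, Optional

def pick_fcfs(queue: List[int], hole: int) -> Optional[int]:
    """First-Come-First-Serve: only the head of the queue may start.

    Returns index 0 if the head job fits in the hole, otherwise None,
    meaning the scheduler waits (and the hole stays idle).
    """
    if queue and queue[0] <= hole:
        return 0
    return None

def pick_best_fit(queue: List[int], hole: int) -> Optional[int]:
    """Best-Fit-First: scan the whole queue for the job that fills the hole best.

    Assumption: "best" means the largest request that fits; ties go to the
    earlier submission. Returns None if nothing fits.
    """
    best = None
    for i, size in enumerate(queue):
        if size <= hole and (best is None or size > queue[best]):
            best = i
    return best

# Processor requests in order of submission (hypothetical values).
queue = [256, 64, 128, 32]
hole = 128  # free processors

fcfs_choice = pick_fcfs(queue, hole)     # head job needs 256: must wait
bff_choice = pick_best_fit(queue, hole)  # picks the 128-processor job
```

In this example FCFS leaves the hole idle until 256 processors are free, while best-fit starts the 128-processor job immediately. Applied repeatedly, the best-fit rule can keep postponing the 256-processor job, which is the starvation effect named on the slide.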
6
Key OS System Capabilities
– Swapping / Gang-scheduling
– Job migration / compaction
– Priority preemption
– Backfill
– Disjoint partitions
– Checkpoint / restart
– Dynamically adjustable queue structures
7
ESP Design Goals & Attributes
– Transferable metric(s) / Valid comparisons
– Reproducible
– Easily interpreted results
– Portable
– Platform size and speed independent
– Capture essence of real workload
– Compact and easily distributed
– Easy to run (< 12 hours)
– Automated / no human intervention
– Focus on utilization / factor out CPU speed
– Test responsiveness & adaptability of scheduler
8
ESP Design
– Start with throughput test
– Profile of jobs determined by historical accounting data
– Find applications with appropriate size and time
– Use two full configuration jobs to encapsulate change of operational mode (e.g. interactive to batch)
– Submit jobs in three blocks in pseudo-random order
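A reproducible pseudo-random submission order could be generated along the following lines. This is a minimal sketch under stated assumptions, not the ESP distribution's own script: the function name, the fixed seed, and the even split into blocks are all hypothetical, and the placement of the two full configuration jobs between blocks is left out.

```python
import random

def build_submission_blocks(jobs, n_blocks=3, seed=0):
    """Shuffle the job list with a fixed seed and split it into blocks.

    A fixed seed keeps the order pseudo-random but identical on every run,
    which is what makes results comparable across platforms.
    """
    rng = random.Random(seed)   # private generator, does not touch global state
    order = list(jobs)
    rng.shuffle(order)

    # Split as evenly as possible; the first `extra` blocks get one more job.
    size, extra = divmod(len(order), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        end = start + size + (1 if b < extra else 0)
        blocks.append(order[start:end])
        start = end
    return blocks

# Hypothetical job identifiers 1..10.
blocks = build_submission_blocks(range(1, 11), n_blocks=3, seed=42)
```

Calling the function twice with the same seed yields the same three blocks, so two sites running the test submit work in the same order.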
9
ESP Test Schematic
[Figure: timeline of the test, total time < 12 hours. Labels: regular jobs, full config #1, full config #2, > 10%, shutdown/reboot (optional). Variant: vanilla (throughput).]
10
Individual Applications in Jobmix
11
Jobmix Application Elapsed Times
[Figure: elapsed times on the T3E and SP, in order of increasing partition size]
12
Platforms Tested
Cray T3E
– 512 processors
– 450 MHz Alpha EV56
– Microkernel MPP OS
– NQS & Global Resource Mgr
– Oversubscription possible
– BFF strategy w/ dynamic queue configs
IBM SP
– 512 processors
– 200 MHz Power3
– Semiautonomous monolithic OSes
– LoadLeveler batch queues
– FCFS w/ backfill (backfill disabled in 1st attempt)
13
T3E Chronology (with swap)
– Insufficient work; tail-end dilemma
– Starvation of large jobs
– Normalized = Elapsed / Theoretical Min
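The normalization on this slide can be worked through numerically. The slide does not define the theoretical minimum, so the sketch below assumes it is the total work in processor-seconds divided by the machine size, that is, the elapsed time of a perfect packing with no idle processors. The job list and function names are hypothetical.

```python
def theoretical_min(jobs, total_procs):
    """Lower bound on elapsed time, assuming perfect packing.

    jobs: iterable of (processors, runtime_seconds) pairs.
    Assumption: bound = sum(processors * runtime) / total_procs.
    """
    return sum(procs * runtime for procs, runtime in jobs) / total_procs

def normalized_elapsed(elapsed, jobs, total_procs):
    """Normalized = Elapsed / Theoretical Min (1.0 would be a perfect run)."""
    return elapsed / theoretical_min(jobs, total_procs)

# Hypothetical jobmix on a 512-processor machine.
jobs = [(256, 1000.0), (128, 2000.0), (512, 500.0)]
t_min = theoretical_min(jobs, 512)           # 768000 / 512 = 1500.0 seconds
ratio = normalized_elapsed(2000.0, jobs, 512)  # 2000 / 1500, about 1.33
```

Under this assumption the reciprocal of the normalized value is the fraction of processor-time spent on useful work, so a ratio of about 1.33 corresponds to roughly 75% utilization.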
14
T3E Chronology (without swap)
– Slight decrease in utilization w/o swap capability
– Higher overall efficiency - significant overhead w/ swap
15
SP Chronology
– Waiting for machine to idle
16
Queue Wait Times (normalized)
[Figure panels: T3E Swap, T3E NoSwap, SP. Jobs sorted by partition size & submit time.]
– BFF - larger jobs = longer wait
– FCFS - less dependence on size
– Swap permits more simultaneous jobs running = shorter wait times
– Idling twice causes 3 distinct regimes of wait times
17
Restoring Backfill on the SP
– Recognized that backfill is the standard mode for LoadLeveler
– Have problems with backfill and ESP stipulations
– However… interesting data from invalid test shot
18
Backfill Effect I (Chronology)
[Figure panels: SP FCFS, SP FCFS w/ backfill]
– Highly efficient, but violates test
– Need to selectively backfill
19
Backfill Effect II (Queue Wait Times)
[Figure panels: SP FCFS, SP FCFS w/ backfill]
20
Backfill and Flaw in ESP test
[Diagram labels: FC job submitted; All jobs finish except one; Guaranteed FC runtime; time axis]
– Backfill is working as expected, but a long-running job negates the effect of the reservation time - need finer granularity jobs
Stipulation for FC jobs?
1. Run immediately (possibly premature termination of running jobs): T3E
2. Run after current jobs finish: SP w/ backfill
3. No further jobs launched until FC finishes: SP
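The interaction between the reservation and one long-running job can be illustrated with a small sketch. It is a simplified backfill rule written for this explanation, not LoadLeveler's actual algorithm: the function names are hypothetical, and the full configuration (FC) job is assumed to need every processor, so its reserved start is the moment the last running job ends.

```python
def full_config_start(running_end_times, now):
    """Earliest time the FC job can start: when the whole machine is idle."""
    return max(running_end_times, default=now)

def can_backfill(job_procs, job_runtime, idle_procs, now, reserved_start):
    """A queued job may jump ahead only if it fits in the idle processors
    and is guaranteed to finish before the FC job's reserved start."""
    return job_procs <= idle_procs and now + job_runtime <= reserved_start

now = 50.0
# All running jobs finish soon except one long-running job (hypothetical times).
running_end_times = [100.0, 120.0, 5000.0]
reserved = full_config_start(running_end_times, now)   # 5000.0

# A 600-second job on 64 idle processors fits well before the reservation.
ok_long_tail = can_backfill(64, 600.0, 64, now, reserved)

# Without the long-running job the reservation is near, and backfill is refused.
near = full_config_start([100.0, 120.0], now)          # 120.0
ok_short_tail = can_backfill(64, 600.0, 64, now, near)
```

With the long-running job present, the reservation sits so far in the future that almost any queued job qualifies for backfill, so the FC job effectively waits behind ordinary work. Shorter (finer granularity) jobs would pull the reservation closer and restore its effect.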
21
Further Design Issues
– How to end the test?
– Possible to use backfill (globally or selectively)?
– Can we formulate a turnaround metric?
– Scalability in size and speed
– Finer granularity of jobs cf. overall test
– Perhaps need additional vanilla throughput test to evaluate purely scheduler performance
22
Conclusions & Observations
– SP - Can achieve very high utilization with backfill and no topology constraints
– SP - Lack of adaptability with dynamic workload - run ASAP mode
– T3E - Swapping with high overhead degrades utilization
– T3E - Can adapt to dynamic workload requirements
23
Ongoing and Future Work
– Scheduled test run on 512-way Origin 2K & Compaq SC
– Vanilla throughput runs on T3E and SP
– Redesign for next version of ESP
– Distribute ESP to other interested sites