1 Using Multiple Energy Gears in MPI Programs on a Power-Scalable Cluster
  Vincent W. Freeh, David K. Lowenthal, Feng Pan, and Nandani Kappiah
  Presented by: Huaxia Xia, CSAG, CSE of UCSD
2 of 17 Introduction
  Power-aware computing: HPC uses large-scale systems and has high power consumption.
  Two extremes: performance-at-all-costs vs. low-performance but more energy-efficient.
  This paper aims to save energy with little performance penalty.
3 of 17 Related Work
  Server/desktop systems:
    Minimize the number of servers needed to handle the load, and put the other servers into a low-energy state (standby or power-off).
    Set node voltage independently.
    Disk: modulate disk speed dynamically, improve cache policies, and aggregate disk accesses into bursts of requests.
  Mobile systems:
    Energy-aware OS; voltage-scalable CPU; disk spindown; memory; network.
4 of 17 Assumptions
  HPC applications: performance is the primary concern; behavior is highly regular and predictable.
  CPU has multiple "gears": variable frequency and variable voltage.
  CPU is a major power consumer; energy consumption of disks, memory, and network is not considered.
5 of 17 Methodology: Profile-Directed
  1. Get a program trace.
  2. Divide the program into blocks.
  3. Merge the blocks into phases.
  4. Heuristically search for the best gear for each phase.
6 of 17 Divide Code into "Blocks"
  Rule 1: Any MPI operation demarcates a block boundary.
  Rule 2: If the memory pressure changes abruptly, a block boundary occurs at that change.
  Use operations per miss (OPM) as the measure of memory pressure (see the sketch after this slide).
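The slides name OPM but do not show how it is computed from raw counts. Below is a minimal sketch, assuming the profiler already has per-sample totals of retired operations and L2 cache misses; the choice of counter events and the 2x change factor used for Rule 2 are assumptions, not values from the paper.

    /* Sketch: operations per miss (OPM) for one sample, given raw counter
     * totals.  Which hardware events back "operations" and "L2 misses"
     * is an assumption; the slides only name the OPM metric. */
    double sample_opm(unsigned long long ops, unsigned long long l2_misses)
    {
        if (l2_misses == 0)                /* avoid dividing by zero for  */
            return (double)ops;            /* samples with no misses seen */
        return (double)ops / (double)l2_misses;
    }

    /* Rule 2 sketch: treat an abrupt OPM change between consecutive
     * samples as a block boundary.  The 2x factor is an assumed
     * threshold, not one stated in the slides. */
    int abrupt_opm_change(double prev_opm, double cur_opm)
    {
        return cur_opm > 2.0 * prev_opm || prev_opm > 2.0 * cur_opm;
    }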
7 of 17 Merge "Blocks" into "Phases"
  Two adjacent blocks are merged into the same phase if their memory pressures (OPM values) are within a threshold of each other (see the sketch after this slide).
  Figure: OPM over the trace of LU (Class C).
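The slide does not state the merge threshold or the exact comparison; the sketch below merges adjacent blocks whose OPM values differ by less than an assumed 10% relative threshold and assigns each block a phase id.

    #include <math.h>

    #define OPM_THRESHOLD 0.10   /* assumed 10% relative difference; the
                                    slide does not give the real value */

    /* Walks the blocks in program order: a block joins the previous
     * block's phase when their OPM values are within the threshold,
     * otherwise it starts a new phase.  Returns the phase count. */
    int merge_blocks_into_phases(const double opm[], int nblocks,
                                 int phase_of[])
    {
        int nphases = 0;
        for (int b = 0; b < nblocks; b++) {
            if (b > 0 &&
                fabs(opm[b] - opm[b - 1]) <= OPM_THRESHOLD * opm[b - 1])
                phase_of[b] = nphases - 1;  /* similar pressure: merge     */
            else
                phase_of[b] = nphases++;    /* pressure changed: new phase */
        }
        return nphases;
    }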
8 of 17 Data Collection
  Use MPI-jack:
    Intercepts any MPI call transparently.
    Can execute arbitrary code before/after an intercepted call.
    Inserts pseudo MPI calls at non-MPI phase boundaries.
  Collects time, operation counts, and L2 misses.
  Question: is there a mutual dependence between the trace data and the block boundaries?
  (An interception sketch follows this slide.)
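MPI-jack's internals are not shown on the slide; the sketch below illustrates the same idea using the standard PMPI profiling interface, wrapping MPI_Send so that code runs before and after the real call. The logged line stands in for the real trace record (time, operations, L2 misses).

    #include <mpi.h>
    #include <stdio.h>

    /* Profiling wrapper: the linker resolves the application's MPI_Send
     * to this function, which runs pre/post hooks around PMPI_Send.
     * A real tracer would also read hardware counters here. */
    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        double t0 = MPI_Wtime();                        /* pre-hook  */
        int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
        double t1 = MPI_Wtime();                        /* post-hook */

        int rank;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        fprintf(stderr, "rank %d: block boundary at MPI_Send, %.6f s in call\n",
                rank, t1 - t0);
        return rc;
    }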
9 of 17 Solution Search (1)
  Metric: energy-time tradeoff
    Normalized energy and time; total system energy.
    A larger negative number indicates a near-vertical slope and a significant energy saving.
  Question: how can energy consumption be measured accurately?
  (One possible formula follows this slide.)
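The slide names the energy-time tradeoff metric but not its formula. One reading consistent with "normalized energy and time" and a "near-vertical slope" is the ratio of relative changes when moving from the base (top) gear to a lower gear; this formula is an interpretation, not quoted from the paper:

    % Interpretation of the slide's normalized energy-time slope metric
    % (not quoted from the paper): relative energy change over relative
    % time change between the base gear and a candidate gear.
    \[
      \text{slope}
        = \frac{(E_{\text{gear}} - E_{\text{base}})/E_{\text{base}}}
               {(T_{\text{gear}} - T_{\text{base}})/T_{\text{base}}}
    \]
    % A large negative value means energy drops a lot while time grows
    % only a little, i.e., a near-vertical slope on the normalized
    % energy-vs-time plot.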
10 of 17 Solution Search (2)
  Phase prioritization: sort the phases in order of OPM (low to high).
    Question: why is sorting necessary?
  "Novel" heuristic search: find the locally optimal gear for each phase, one phase at a time.
    Running time is at most n x g trial runs (n phases, g gears); a sketch follows this slide.
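A minimal sketch of the search described on this slide, assuming the phases are already sorted by OPM. run_and_measure() is a hypothetical helper, not from the paper, that re-runs the program with the given per-phase gear assignment and returns the energy-time metric (more negative is better).

    #include <float.h>

    /* Hypothetical helper (not from the paper): runs the program with
     * gear[p] applied to phase p and returns the energy-time slope
     * metric; a more negative value is a better tradeoff. */
    double run_and_measure(const int gear[], int nphases);

    /* Greedy per-phase search: phases are visited in OPM order.  For each
     * phase, every gear is tried while the other phases keep their current
     * settings, and the locally best gear is kept, giving the slide's
     * bound of at most n x g trial runs. */
    void search_gears(int gear[], int nphases, int ngears)
    {
        for (int p = 0; p < nphases; p++) {
            int best_gear = gear[p];
            double best_metric = DBL_MAX;

            for (int g = 0; g < ngears; g++) {
                gear[p] = g;
                double m = run_and_measure(gear, nphases);
                if (m < best_metric) {
                    best_metric = m;
                    best_gear = g;
                }
            }
            gear[p] = best_gear;   /* freeze this phase, move to the next */
        }
    }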
11 of 17 Solution Search (3)
12 of 17 Experiments
  10 AMD Athlon-64 CPUs:
    Frequency-scalable: 800-2000 MHz.
    Voltage-scalable: 0.9-1.5 V.
    1 GB main memory; 128 KB L1 cache; 512 KB L2 cache.
    100 Mb/s network.
  The CPU consumes 45-55% of overall system energy.
  Benchmarks: NAS Parallel Benchmarks (NPB).
  (A gear-switching sketch follows this slide.)
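The slides do not say how a gear is actually selected on the cluster nodes; one common mechanism on Linux is the cpufreq userspace governor, sketched below. The sysfs path and the requirement that the userspace governor be active are assumptions about the environment, not details from the paper.

    #include <stdio.h>

    /* Sketch: request a CPU "gear" by writing a target frequency (kHz)
     * to the Linux cpufreq userspace-governor interface.  Assumes the
     * userspace governor is selected and the caller may write the sysfs
     * file; this mechanism is not stated in the slides. */
    int set_gear_khz(int cpu, unsigned long freq_khz)
    {
        char path[128];
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_setspeed",
                 cpu);

        FILE *f = fopen(path, "w");
        if (f == NULL)
            return -1;
        fprintf(f, "%lu\n", freq_khz);
        fclose(f);
        return 0;
    }

For example, set_gear_khz(0, 800000) would request the 800 MHz gear on CPU 0 under these assumptions.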
13 of 17 Results: Multiple Gear Benefit
  IS: 16% energy saving with 1% extra time.
  BT: 10% energy saving with 5% extra time.
  MG: 11% energy saving with 4% extra time.
14 of 17 Results: Single Gear Benefit
  The order of phases matters!
  CG: 8% energy saving with 3% extra time.
  SP: 15% energy saving with 7% extra time.
15 of 17 Results: No Benefit
16 of 17 Conclusions and Future Work
  A profile-directed method achieves a good energy-time tradeoff for HPC applications.
  Future work:
    Enhance profile-directed techniques.
    Consider inter-node bottlenecks.
    Automate the entire process.
17 of 17 Discussion
  How important is power consumption to HPC? Is a 10% energy saving worth 5% extra time?
  Is the profile-directed method practical?
    It is effective for applications that run repeatedly.
    How much of the process can be automated?
  Is OPM (operations per miss) a good metric for finding phases?
    Its key purpose is to identify CPU utilization.
    Other options: instructions per second, CPU usage.
  Is OPM a good metric for sorting phases?