Using Multiple Energy Gears in MPI Programs on a Power-Scalable Cluster
Vincent W. Freeh, David K. Lowenthal, Feng Pan, and Nandani Kappiah

Slide 1 of 17: Title slide. Presented by Huaxia Xia (CSAG, CSE, UCSD)

Slide 2 of 17: Introduction
Power-aware computing
- HPC uses large-scale systems and has high power consumption.
- Two extremes: performance at all costs, or lower performance but better energy efficiency.
- This paper aims to save energy with little performance penalty.

Slide 3 of 17: Related Work
Server/desktop systems
- Minimize the number of servers needed to handle the load, and put the remaining servers into a low-energy state (standby or power-off).
- Set node voltages independently.
- Disks: modulate disk speed dynamically, improve cache policies, and aggregate disk accesses into bursts of requests.
Mobile systems
- Energy-aware OS
- Voltage-scalable CPU
- Disk spindown
- Memory
- Network

Slide 4 of 17: Assumptions
- HPC applications: performance is the primary concern, and the applications are highly regular and predictable.
- The CPU has multiple “gears”: variable frequency and variable voltage.
- The CPU is a major power consumer; the energy consumption of disks, memory, and the network is not considered.

Slide 5 of 17: Methodology: Profile-Directed
1. Get a program trace.
2. Divide the program into blocks.
3. Merge the blocks into phases.
4. Heuristically search for the best gear for each phase.

Slide 6 of 17: Divide Code into “Blocks”
- Rule 1: Any MPI operation demarcates a block boundary.
- Rule 2: If the memory pressure changes abruptly, a block boundary occurs at that change.
- Operations per miss (OPM) is used as the measure of memory pressure.
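A minimal sketch of the two block-division rules, assuming the trace is a list of sampling intervals, each with an operation count, an L2-miss count, and a flag marking whether an MPI call ends it. The `Sample` type, `divide_into_blocks`, and the `jump_factor` parameter (here, a 2x change in OPM between consecutive intervals counts as "abrupt") are hypothetical; the slide does not say how an abrupt change is quantified.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One sampling interval of a program trace (hypothetical format)."""
    ops: int                # operations executed in the interval
    misses: int             # L2 misses in the interval
    mpi_call: bool = False  # True if an MPI operation ends this interval


def opm(ops: int, misses: int) -> float:
    """Operations per miss; an interval with no misses is treated as one miss."""
    return ops / max(misses, 1)


def divide_into_blocks(samples: List[Sample], jump_factor: float = 2.0) -> List[List[Sample]]:
    blocks: List[List[Sample]] = []
    current: List[Sample] = []
    for s in samples:
        if current:
            prev = opm(current[-1].ops, current[-1].misses)
            cur = opm(s.ops, s.misses)
            low = max(min(prev, cur), 1e-12)  # avoid dividing by a zero OPM
            # Rule 2: an abrupt change in memory pressure starts a new block.
            if max(prev, cur) / low >= jump_factor:
                blocks.append(current)
                current = []
        current.append(s)
        # Rule 1: any MPI operation closes the current block.
        if s.mpi_call:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


example = [
    Sample(1000, 10), Sample(1100, 10),                  # OPM 100, 110
    Sample(1000, 100), Sample(900, 100, mpi_call=True),  # OPM 10, 9, then an MPI call
    Sample(1000, 10),                                    # OPM 100
]
blocks = divide_into_blocks(example)
print([len(b) for b in blocks])
```

The example yields three blocks: the OPM drop from 110 to 10 triggers Rule 2, and the MPI call on the fourth interval triggers Rule 1. This sketch compares each interval only with the one just before it; comparing against a running block average would be an equally plausible reading of the rule.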

Slide 7 of 17: Merge “Blocks” into “Phases”
- Two adjacent blocks are merged into a phase if their memory pressures (OPM) are within a threshold of each other.
[Figure: OPM in the trace of LU (Class C)]
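A sketch of the merge step, assuming each block has been reduced to its total operation and L2-miss counts and that "within a threshold" means the two OPM values differ by at most a constant factor. The factor 1.5 and the function names are hypothetical; the slide gives neither the threshold nor whether it is absolute or relative.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int]  # (operations, L2 misses) summed over one block


def block_opm(block: Block) -> float:
    ops, misses = block
    return ops / max(misses, 1)


def merge_into_phases(blocks: List[Block], threshold: float = 1.5) -> List[List[Block]]:
    """Merge adjacent blocks whose OPM values are within a factor of `threshold`."""
    phases: List[List[Block]] = []
    prev: Optional[float] = None
    for block in blocks:
        cur = block_opm(block)
        if prev is not None and max(prev, cur) / max(min(prev, cur), 1e-12) <= threshold:
            phases[-1].append(block)   # similar memory pressure: same phase
        else:
            phases.append([block])     # pressure differs too much: new phase
        prev = cur
    return phases


# Block OPMs: 100, 120, 10, 9, 200
blocks = [(1000, 10), (1200, 10), (500, 50), (450, 50), (2000, 10)]
phases = merge_into_phases(blocks)
print([len(p) for p in phases])
```

Because each block is compared only with its neighbor, a slow drift in OPM could chain many dissimilar blocks into one phase; comparing against the phase's aggregate OPM instead would prevent that at the cost of a little bookkeeping.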

Slide 8 of 17: Data Collection
- Uses MPI-jack, which:
  - intercepts any MPI call transparently;
  - can execute arbitrary code before/after an intercepted call.
- Inserts pseudo MPI calls at non-MPI phase boundaries.
- Collects time, operation counts, and L2 misses.
- Question: is there a mutual dependence between the trace data and the block boundaries?
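MPI-jack's own interface is not shown on the slide, so the sketch below only illustrates the interception idea in Python: a hypothetical `intercept` decorator runs code before and after each wrapped call and records the elapsed time. `fake_allreduce` stands in for an MPI call and `pseudo_mpi_boundary` for a pseudo MPI call inserted at a non-MPI phase boundary; both are invented for illustration. The operation and L2-miss counts the slide mentions would come from hardware counters, which this sketch does not read.

```python
import time
from functools import wraps
from typing import Callable, List, Tuple

trace: List[Tuple[str, float]] = []  # (call name, elapsed seconds)


def intercept(fn: Callable) -> Callable:
    """Wrap fn so extra code runs before and after every call."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()   # "before" hook: take a timestamp
        result = fn(*args, **kwargs)  # the original call proceeds unchanged
        trace.append((fn.__name__, time.perf_counter() - start))  # "after" hook
        return result
    return wrapper


@intercept
def fake_allreduce(values: List[int]) -> int:
    # Hypothetical stand-in for an MPI collective; it just sums locally.
    return sum(values)


@intercept
def pseudo_mpi_boundary() -> None:
    # Hypothetical no-op marking a non-MPI phase boundary, so the same
    # hooks fire there as at real MPI calls.
    return None


total = fake_allreduce([1, 2, 3])
pseudo_mpi_boundary()
print(total, [name for name, _ in trace])
```

Routing phase boundaries through a no-op call is what lets a single interception mechanism delimit both MPI and non-MPI blocks.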

Slide 9 of 17: Solution Search (1)
- Metric: the energy-time tradeoff, using normalized energy and time.
- Energy is total system energy.
- A larger negative value indicates a nearly vertical slope, i.e., a significant energy saving for little extra time.
- Question: how can energy consumption be measured accurately?
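The slide does not give the formula, but one reading consistent with "a larger negative number indicates a near vertical slope" is the slope of the normalized energy-time curve relative to a baseline run, assumed here to be the run with every phase at the top gear: (E/E_base - 1) / (T/T_base - 1). The function name is hypothetical; the example inputs are the IS and BT results reported later in the slides.

```python
def energy_time_slope(energy: float, time: float,
                      base_energy: float, base_time: float) -> float:
    """Slope of normalized (time, energy) relative to a baseline run.

    Energy is total system energy. Both axes are normalized so the baseline
    sits at (1, 1); a large negative result means a big energy saving for a
    small slowdown (a nearly vertical line on an energy-time plot).
    """
    e = energy / base_energy
    t = time / base_time
    if t == 1.0:
        raise ValueError("time unchanged: slope is undefined")
    return (e - 1.0) / (t - 1.0)


# IS: 16% less energy for 1% more time.
is_slope = energy_time_slope(84.0, 101.0, 100.0, 100.0)
# BT: 10% less energy for 5% more time.
bt_slope = energy_time_slope(90.0, 105.0, 100.0, 100.0)
print(round(is_slope, 2), round(bt_slope, 2))
```

Under this reading IS scores about -16 and BT about -2, so IS is by far the better tradeoff even though both save a similar order of energy.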

Slide 10 of 17: Solution Search (2)
- Phase prioritization: sort the phases by OPM (low to high).
  - Question: why is sorting necessary?
- “Novel” heuristic search: find the locally optimal gear for each phase, one phase at a time.
  - Running time is at most n × g (n phases, g gears).
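A sketch of one reading of this search: phases are visited in increasing-OPM order, and for each phase every other gear is tried with the remaining phases held fixed, keeping the gear that most improves the normalized energy-time slope against the all-top-gear run. `run` is a hypothetical callback standing in for an actual profiled execution, gear 0 is assumed to be the fastest, and the toy energy/time table is invented for illustration. The number of runs is 1 + n(g - 1), within the slide's n × g bound.

```python
from typing import Callable, List, Sequence, Tuple

RunFn = Callable[[List[int]], Tuple[float, float]]  # gears per phase -> (energy, time)


def slope(e: float, t: float, base_e: float, base_t: float) -> float:
    """Normalized energy-time slope relative to the all-top-gear baseline."""
    de, dt = e / base_e - 1.0, t / base_t - 1.0
    if dt <= 0.0:
        # No slowdown: a pure energy win is best; anything else is useless.
        return float("-inf") if de < 0.0 else float("inf")
    return de / dt


def greedy_gear_search(opms: Sequence[float], num_gears: int, run: RunFn):
    n = len(opms)
    gears = [0] * n                       # start with every phase at top speed
    base_e, base_t = run(gears)
    evaluations = 1
    best_s = slope(base_e, base_t, base_e, base_t)  # inf: the baseline itself
    for i in sorted(range(n), key=lambda k: opms[k]):  # low OPM first
        best_g = gears[i]
        for g in range(1, num_gears):     # try every slower gear for phase i
            trial = gears.copy()
            trial[i] = g
            e, t = run(trial)
            evaluations += 1
            s = slope(e, t, base_e, base_t)
            if s < 0.0 and s < best_s:    # keep only genuine improvements
                best_s, best_g = s, g
        gears[i] = best_g                 # fix this phase before moving on
    return gears, evaluations


# Hypothetical per-phase (energy, time) at gears 0, 1, 2; totals are additive.
TABLE = {
    0: [(50.0, 10.0), (42.0, 10.2), (40.0, 11.0)],  # memory-bound phase
    1: [(50.0, 10.0), (46.0, 12.0), (45.0, 14.0)],  # CPU-bound phase
}


def toy_run(gears: List[int]) -> Tuple[float, float]:
    e = sum(TABLE[p][g][0] for p, g in enumerate(gears))
    t = sum(TABLE[p][g][1] for p, g in enumerate(gears))
    return e, t


gears, evals = greedy_gear_search([5.0, 500.0], 3, toy_run)
print(gears, evals)
```

In the toy run the memory-bound phase drops one gear and the CPU-bound phase stays at top speed. Because each phase is decided with the earlier choices already fixed, visiting the phases in a different order can produce a different assignment, which is one possible answer to the slide's question about why sorting matters.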

Slide 11 of 17: Solution Search (3)

Slide 12 of 17: Experiments
- 10 AMD Athlon 64 CPUs
  - Frequency-scalable: 800-2000 MHz
  - Voltage-scalable: 0.9-1.5 V
  - 1 GB main memory
  - 128 KB L1 cache, 512 KB L2 cache
- 100 Mb/s network
- The CPU consumes 45-55% of overall system energy.
- Benchmarks: NAS Parallel Benchmarks (NPB)
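Two back-of-the-envelope calculations with the hardware numbers above, under assumptions the slide does not state: dynamic CPU power is taken to follow the textbook CMOS model P ∝ V² f (ignoring leakage and the longer run time at low gears), the 800 MHz gear is assumed to run at 0.9 V, and a CPU energy saving is assumed to reduce system energy only in proportion to the CPU's 45-55% share. The 20% CPU saving is an illustrative number, and the function names are hypothetical.

```python
def relative_dynamic_power(v: float, f: float, v_max: float, f_max: float) -> float:
    """Dynamic power relative to the top gear, assuming P ~ C * V^2 * f."""
    return (v / v_max) ** 2 * (f / f_max)


def system_saving(cpu_share: float, cpu_saving: float) -> float:
    """Whole-system energy saving when only CPU energy is reduced."""
    return cpu_share * cpu_saving


# Lowest gear (0.9 V, 800 MHz) vs. top gear (1.5 V, 2000 MHz).
low_gear_power = relative_dynamic_power(0.9, 800.0, 1.5, 2000.0)

# Illustrative 20% CPU energy cut, with the CPU at 45% and 55% of system energy.
savings = [system_saving(share, 0.20) for share in (0.45, 0.55)]
print(round(low_gear_power, 3), [round(s, 3) for s in savings])
```

Under these assumptions the slowest gear draws roughly 14% of the top gear's dynamic power, yet a 20% CPU saving shrinks to about 9-11% at the system level, which is the scale of the savings reported in the results.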

Slide 13 of 17: Results: Multiple-Gear Benefit
- IS: 16% energy saving with 1% extra time
- BT: 10% energy saving with 5% extra time
- MG: 11% energy saving with 4% extra time

Slide 14 of 17: Results: Single-Gear Benefit
- The order of phases matters!
- CG: 8% energy saving with 3% extra time
- SP: 15% energy saving with 7% extra time

Slide 15 of 17: Results: No Benefit

Slide 16 of 17: Conclusions and Future Work
- A profile-directed method achieves a good energy-time tradeoff for HPC applications.
- Future work:
  - Enhance profile-directed techniques
  - Consider inter-node bottlenecks
  - Automate the entire process

Slide 17 of 17: Discussion
- How important is power consumption to HPC? Is saving 10% energy worth 5% more time?
- Is the profile-directed method practical? It is effective for applications that run repeatedly. How much of the process can be automated?
- Is OPM (operations per miss) a good metric for finding phases?
  - Key purpose: to identify CPU utilization.
  - Other options: instructions per second, CPU usage.
- Is OPM a good metric for sorting phases?
