1 High-Performance, Power-Aware Computing
Vincent W. Freeh, Computer Science, NCSU, vin@csc.ncsu.edu
2 Acknowledgements
Students: Mark E. Femal (NCSU), Nandini Kappiah (NCSU), Feng Pan (NCSU), Robert Springer (Georgia)
Faculty: Vincent W. Freeh (NCSU), David K. Lowenthal (Georgia)
Sponsor: IBM UPP Award
3 The case for power management in HPC
Power/energy consumption is a critical issue:
- Energy means heat, and heat dissipation is costly
- The power supply is limited
- Both cost a non-trivial amount of money
Consequences: performance is limited by available power, and fewer nodes can operate concurrently.
Opportunity: bottlenecks. A bottleneck component limits the performance of the other components, so reducing the power of non-bottleneck components need not reduce overall performance. Today the CPU is the major power consumer (~100 W), is rarely the bottleneck, and is scalable in power/performance through frequency and voltage: power/performance "gears".
4 Is CPU scaling a win?
Two reasons it can be:
1. Frequency and voltage scaling: the performance reduction is less than the power reduction, since CPU dynamic power is P = ½CV²f.
2. Application throughput: the throughput reduction is less than the CPU performance reduction, because throughput gains diminish with frequency.
Assumptions: the CPU is a large power consumer and the driver of the other components. (Charts: power vs. performance (frequency), and application throughput vs. performance; a worked instance follows.)
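A quick worked instance of point (1), using the frequency/voltage endpoints quoted on slide 5 (2000 MHz at 1.5 V down to 800 MHz at 1.1 V); the arithmetic is ours, not from the talk:

```latex
\frac{P_{\mathrm{low}}}{P_{\mathrm{high}}}
  = \frac{V_{\mathrm{low}}^{2}\, f_{\mathrm{low}}}{V_{\mathrm{high}}^{2}\, f_{\mathrm{high}}}
  = \frac{1.1^{2} \times 800}{1.5^{2} \times 2000}
  \approx 0.22
```

Frequency (and thus peak CPU performance) falls to 40% of nominal, but dynamic CPU power falls to roughly 22%: the power reduction outpaces the performance reduction.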
5 AMD Athlon-64
- x86 ISA, 64-bit technology
- HyperTransport technology: fast memory bus
Performance: slower clock frequency and a shorter pipeline (12 vs. 20 stages). SPEC2K results: a 2 GHz AMD-64 is comparable to a 2.8 GHz P4; the P4 is better on average by 10% (INT) and 30% (FP).
Frequency and voltage scaling: 2000 to 800 MHz, 1.5 to 1.1 V.
6 LMBench results
LMBench is a benchmarking suite of low-level micro measurements; we test each "gear" (a cpufreq sketch follows the table):

Gear  Frequency (MHz)  Voltage (V)
0     2000             1.5
1     1800             1.4
2     1600             1.3
3     1400             1.2
4     1200             1.1
5      800             0.9
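A minimal sketch of what setting a "gear" can look like on Linux, assuming the cpufreq "userspace" governor is active and the program runs as root; the sysfs path is the standard cpufreq interface, but none of this code is from the talk (the hardware's P-state machinery sets the matching voltage when the frequency changes, which is why only a frequency is written):

```c
#include <stdio.h>

/* Gear table from the slide; frequencies in kHz, as cpufreq expects. */
static const int gear_khz[] = { 2000000, 1800000, 1600000,
                                1400000, 1200000,  800000 };

/* Write the gear's frequency to cpufreq's userspace-governor knob. */
static int set_gear(int cpu, int gear)
{
    if (gear < 0 || gear > 5)
        return -1;
    char path[128];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_setspeed", cpu);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;   /* not root, or governor is not "userspace" */
    fprintf(f, "%d", gear_khz[gear]);
    return fclose(f);
}

int main(void)
{
    return set_gear(0, 2) == 0 ? 0 : 1;   /* drop CPU 0 to gear 2 (1600 MHz) */
}
```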
7 Operations
8 Operating system functions
9 Communication
10 Energy-time tradeoff in HPC
Measure application performance: it differs from micro-benchmark behavior and differs between applications. We look at NAS, a standard suite of several HPC applications, scientific and regular.
11 Single node – EP
CPU bound: big time penalty, little or no energy savings.
(Chart annotations, time/energy deltas: +11%/−2%, +45%/+8%, +150%/+52%, +25%/+2%, +66%/+15%.)
12 Single node – CG
Not CPU bound: little time penalty, large energy savings.
(Chart annotations: +1% time/−9% energy; +10% time/−20% energy.)
13 Operations per miss
A metric for memory pressure; it must be independent of time. It uses hardware performance counters. Micro-operations: each x86 instruction becomes one or more micro-operations, a better measure of CPU activity.
Operations per miss (subset of NAS):

EP    BT    LU    MG    SP    CG
844   79.6  73.5  70.6  49.5  8.60

Suggestion: shift to a slower gear as ops/miss decreases (see the counter sketch below).
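A sketch of how an ops/miss figure can be collected on a modern Linux kernel via perf_event_open. The talk counted x86 micro-operations through raw hardware counters; the generic retired-instruction and cache-miss events used here are a portable approximation, so the absolute numbers will differ:

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <stdint.h>

static int open_counter(uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = config;
    attr.exclude_kernel = 1;              /* count user-level work only */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void)
{
    int ops  = open_counter(PERF_COUNT_HW_INSTRUCTIONS);
    int miss = open_counter(PERF_COUNT_HW_CACHE_MISSES);
    if (ops < 0 || miss < 0) { perror("perf_event_open"); return 1; }

    /* Region of interest: a compute phase would go here. */
    volatile double x = 0;
    for (long i = 0; i < 10000000; i++) x += (double)i;

    long long n_ops = 0, n_miss = 0;
    read(ops, &n_ops, sizeof n_ops);
    read(miss, &n_miss, sizeof n_miss);
    printf("ops/miss = %.1f\n", n_miss ? (double)n_ops / n_miss : 0.0);
    return 0;
}
```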
14 Single node – LU
Modest memory pressure: gears offer an energy-time tradeoff.
(Chart annotations: +4% time/−8% energy; +10% time/−10% energy.)
15 Ops per miss, LU
16 Results – LU
Shift 0/1:  +1% time, −6% energy
Auto shift: +3% time, −8% energy
Gear 1:     +5% time, −8% energy
Gear 2:     +10% time, −10% energy
Shift 1/2:  +1% time, −6% energy
Shift 0/2:  +5% time, −8% energy
17 Bottlenecks
Intra-node: memory, disk.
Inter-node: communication, load (im)balance.
18 Multiple nodes – EP
S2 = 2.0, S4 = 4.0, S8 = 7.9; E = 1.02.
Perfect speedup: energy stays constant as N increases.
19 Multiple nodes – LU
S2 = 1.9 (E2 = 1.03), S4 = 3.3 (E4 = 1.15), S8 = 5.8 (E8 = 1.28).
Gear 2: S8 = 5.3 (E8 = 1.16).
Good speedup: an energy-time tradeoff appears as N increases.
20 Multiple nodes – MG
S2 = 1.2 (E2 = 1.41), S4 = 1.6 (E4 = 1.99), S8 = 2.7 (E8 = 2.29).
Poor speedup: energy increases as N increases.
21 Normalized – MG
With a communication bottleneck, the energy-time tradeoff improves as N increases.
22 Jacobi iteration
Increasing N can decrease both T and E.
23 Future work
We are working on inter-node bottlenecks.
24 Safe overprovisioning
25 The problem
There is a peak power limit P, set by rack power, room/utility capacity, and heat dissipation.
The static solution deploys N = P/Pmax servers, where Pmax is the maximum power of an individual node.
Problems:
- Peak power exceeds average power (Pmax > Paverage)
- Not all power is used: N × (Pmax − Paverage) goes unused
- The cluster underperforms: performance is proportional to N
- Power consumption is not predictable
26 Safe overprovisioning in a cluster
Allocate and manage power among M > N nodes. Pick M > N, e.g. M = P/Paverage; then M × Pmax > P, so cap each node at Plimit = P/M.
Goal: use more power while staying safely under the limit. Reduce the power (and peak CPU performance) of individual nodes to increase overall application performance. (Figures: power vs. time before and after, showing Pmax, Paverage, Plimit, and P(t); a worked sizing example follows.)
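To make the sizing concrete, here is a small worked example with invented numbers (a 10 kW budget, nodes that peak at 250 W but average 180 W); the formulas are from the slide, the figures are not:

```latex
N = \frac{P}{P_{\max}} = \frac{10000}{250} = 40 \ \text{nodes (static)}
\qquad
M = \frac{P}{P_{\mathrm{average}}} = \frac{10000}{180} \approx 55 \ \text{nodes}
\qquad
P_{\mathrm{limit}} = \frac{P}{M} \approx 182\ \mathrm{W}
```

Fifteen extra nodes run under the same 10 kW budget, each capped slightly above its average draw.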
27 Safe overprovisioning in a cluster
Benefits: less "unused" power/energy, more efficient power use, and more performance under the same power limitation.
Let perf be the performance of a node at full power and perf* its performance under the reduced limit. More overall performance means M × perf* > N × perf, i.e. perf*/perf > N/M = Plimit/Pmax.
(Figures: power vs. time before and after, with the unused energy shaded.)
28 When is this a win?
It wins when perf*/perf > N/M = Plimit/Pmax: in words, when the power reduction exceeds the performance reduction. Two reasons this holds: (1) frequency and voltage scaling, and (2) application throughput.
(Charts: power and application throughput vs. performance (frequency), marking the regions perf*/perf < Paverage/Pmax and perf*/perf > Paverage/Pmax.)
29 Feedback-directed, adaptive power control
Uses feedback to control power/energy consumption: given a power goal, monitor energy consumption and adjust the power/performance of the CPU. Paper: [COLP '02].
Several policies (a minimal loop sketch follows):
- Average power
- Maximum power
- Energy efficiency: select the slowest gear g that still satisfies the efficiency criterion
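A minimal sketch of the maximum-power flavor of this loop. read_power() and set_gear() are hypothetical stand-ins for the talk's power meter and the cpufreq interface sketched earlier (the stubs below only make the sketch compile), and the thresholds and interval are illustrative:

```c
#include <stdio.h>
#include <unistd.h>

static double read_power(void) { return 80.0; }           /* hypothetical meter, Watts */
static void   set_gear(int g)  { printf("gear %d\n", g); } /* see the cpufreq sketch */

static void control_power(double p_limit)
{
    int gear = 0;                              /* 0 = fastest, 5 = slowest */
    for (;;) {
        double p = read_power();
        if (p > p_limit && gear < 5)
            gear++;                            /* over budget: shift down */
        else if (p < 0.9 * p_limit && gear > 0)
            gear--;                            /* headroom: shift back up */
        set_gear(gear);
        usleep(100 * 1000);                    /* interval: responsiveness vs. overhead */
    }
}

int main(void) { control_power(75.0); }        /* hypothetical 75 W node limit */
```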
30 Implementation
Two components, integrated into one daemon process per node. Each daemon broadcasts its information at intervals, receives the other nodes' information, calculates its power limit Pi(k) for the next interval k, and controls power locally.
Research issues:
- Controlling local power: add a guarantee, i.e. a bound on instantaneous power
- Interval length: shorter gives a tighter bound on power and is more responsive; longer has less overhead
- The allocation function f(L0, ..., LM): depends on the relationship between power and performance
Here Pi(k) is the individual power limit for node i at interval k.
31 Results – fixed gear
(Chart; per-gear series.)
32 Results – dynamic power control
(Chart; per-gear series.)
33 Results – dynamic power control (2)
(Chart; per-gear series.)
34 Summary
35 End
36 Summary
Safe overprovisioning: deploy M > N nodes for more performance, less "unused" power, and more efficient power use.
Two autonomic managers: local (built on prior research) and global (a new, distributed algorithm).
Implementation: Linux, AMD.
Contact: Vince Freeh, 513-7196, vin@csc.ncsu.edu
37 Autoshift
38 Phases
39 Allocate power based on energy efficiency
Allocate power to maximize throughput: maximize the number of tasks completed per unit energy, using energy-time profiles. Statically generate a table of (gear, energy/task) tuples for each task.
Modifications: nodes exchange pending tasks, and Pi is determined from the table and the population of tasks.
Benefit: maximizes task throughput. Problem: starvation must be avoided. (A gear-selection sketch follows.)
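A sketch of the gear-selection step, assuming the statically generated profile has been reduced to an energy-per-task figure for each gear; the numbers below are invented placeholders, not measurements from the talk:

```c
#include <stdio.h>

/* Hypothetical (gear, energy/task) profile for one task type, in joules. */
static const double joules_per_task[6] = { 50.0, 47.0, 45.0, 44.0, 46.0, 49.0 };

static int most_efficient_gear(void)
{
    int best = 0;
    for (int g = 1; g < 6; g++)
        if (joules_per_task[g] < joules_per_task[best])
            best = g;               /* fewest joules/task = most tasks per joule */
    return best;
}

int main(void)
{
    printf("most efficient gear: %d\n", most_efficient_gear());
    return 0;
}
```

Note that the most efficient gear is often neither the fastest nor the slowest: very slow gears stretch task time enough that static energy dominates.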
40 Memory bandwidth
41 Power management (speaker note: "ICK: need a better 1st slide")
What: controlling power to achieve a desired goal.
Why: conserve energy, contain instantaneous power consumption, reduce heat generation, good engineering.
42 Related work: Energy conservation
Goal: conserve energy; performance degradation is acceptable. Usually in mobile environments with a finite energy source (a battery).
- Primary goal: extend battery life
- Secondary goal: re-allocate energy to increase the "value" of energy use
- Tertiary goal: increase energy efficiency, i.e. more tasks per unit energy
Example: feedback-driven energy conservation that controls average power usage, Pave = (E0 − Ef)/T.
(Charts: energy vs. time showing E0, Ef, and T; power vs. frequency.)
43 Related work: Realtime DVS
Goal: reduce energy consumption with no performance degradation. Mechanism: eliminate slack time in the system.
Savings: Eidle with frequency scaling, plus an additional Etask − Etask′ with voltage scaling.
(Charts: power vs. time before and after scaling, marking Etask, Etask′, Eidle, Pmax, and the deadline.)
44 Related work: Fixed installations
Goal: reduce cost (in heat generation or dollars); the goal is not to conserve a battery.
Mechanisms: scaling (fine-grain DVS, coarse-grain power-down) and load balancing.
45 Single node – MG
46 Single node – EP
47 Single node – LU
48 Power, energy, heat – oh, my
Relationships: E = P × T, and heat tracks energy (H ∝ E); thus, control power.
Goals: conserve (reduce) energy consumption, reduce heat generation, regulate instantaneous power consumption.
Situations (benefits): mobile/embedded computing (finite energy store), desktops (save money), servers and the like (increase performance).
49 Power usage
CPU power is dominated by dynamic power (CMOS dynamic power equation: P = ½CfV²), and system power is dominated by the CPU, followed by disk and memory.
CPU notes: it is scalable, it drives the other system components, and its frequency is a measure of performance.
(Chart: power vs. performance (frequency).)
50 Power management in HPC
Goals: reduce heat generation (and cost); increase performance.
Mechanisms: scaling, feedback, load balancing.
51 Single node – MG
Modest memory pressure: gears offer an energy-time tradeoff.
(Chart annotations: +6% time/−7% energy; +12% time/−8% energy.)
52 Power management vs. energy conservation
Power management is the mechanism; energy conservation is a policy. Two elements:
- Energy efficiency, i.e. decrease the energy consumed per task
- (Instantaneous) power consumption, i.e. limit the maximum watts used
There is a power-performance tradeoff: less power means less performance, so ultimately it is an energy-time tradeoff.
Power management platform: a 2 GHz to 800 MHz AMD system with 6 gears.
53 Autonomic managers
The implementation uses two autonomic managers: local (power control) and global (power allocation).
Local: builds on a prior research project (new implementation) with a new policy; a daemon process reads the power meter and adjusts the processor's performance gear (frequency).
Global: at regular intervals, collects the appropriate information from all nodes and allocates the power budget for the next quantum, optimizing for one of several objectives.
54 Example: Load imbalance
Uniform allocation of power sets Pi = Plimit = P/M for every node i. This is not ideal if nodes are unevenly loaded: tasks execute more slowly on busy nodes, and lightly loaded nodes may not use all their power.
Instead, allocate power based on load*: at regular intervals, nodes exchange load information, and each computes its individual power limit Pi(k) for the next interval k, ensuring the limits sum to no more than P (a sketch follows).
*Note: load is one of several possible objective functions.
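A sketch of the global per-interval step under the load objective: giving each node a budget proportional to its reported load keeps the budgets summing exactly to the cluster limit P, which satisfies the constraint above. The function, its signature, and the example loads are ours, not the talk's:

```c
#include <stdio.h>

/* Split the cluster budget p_total across m nodes in proportion to load. */
static void allocate_power(const double load[], double p_limit[], int m, double p_total)
{
    double sum = 0;
    for (int i = 0; i < m; i++) sum += load[i];
    for (int i = 0; i < m; i++)
        p_limit[i] = (sum > 0) ? p_total * load[i] / sum
                               : p_total / m;      /* all idle: uniform split */
}

int main(void)
{
    double load[4] = { 2.0, 1.0, 0.5, 0.5 };       /* hypothetical node loads */
    double p_limit[4];
    allocate_power(load, p_limit, 4, 400.0);       /* hypothetical 400 W budget */
    for (int i = 0; i < 4; i++)
        printf("node %d: %.0f W\n", i, p_limit[i]);
    return 0;
}
```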