Slide 1: Hardware Architectures for Power and Energy Adaptation
Phillip Stanley-Marbell
Slide 2: Outline
- Motivation
- Related Research
- Architecture
- Experimental Evaluation
- Extensions
- Summary and Future Work
Slide 3: Motivation
- Power consumption is becoming a limiting factor as technology scales to smaller feature sizes:
  - Mobile/battery-powered computing applications
  - Thermal issues in high-end servers
- Low-power design is not enough: power- and energy-aware design
  - Adapt to non-uniform application behavior
  - Use only as many resources as the application requires
- This talk: exploit the processor-memory performance gap to save power, with limited performance degradation
Slide 4: Related Research
- Reducing power dissipation in on-chip caches
  - Reducing instruction cache leakage power dissipation [Powell et al., TVLSI '01]
  - Reducing dynamic power in set-associative caches and on-chip buffer structures [Dropsho et al., PACT '02]
- Reducing power dissipation of the CPU core
  - Compiler-directed dynamic voltage scaling of the CPU core [Hsu, Kremer, Hsiao, ISLPED '01]
Slide 5: Target Application Class: Memory-Bound Applications
- Memory-bound applications are limited by memory system performance
- Single-issue in-order processors: limited overlap of main memory access and computation
[Figure: execution timelines comparing CPU @ V_dd and CPU @ V_dd/2]
Slide 6: Power-Performance Tradeoff
- Detect memory-bound execution phases
  - Maintain sufficient information to determine the compute/stall time ratio
- Pros: scaling down the CPU core voltage yields significant energy savings (Energy ∝ V_dd²)
- Cons: performance hit (Delay ∝ 1/V_dd, to first order)
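The arithmetic behind this tradeoff can be sketched numerically. The first-order models below (energy ∝ V_dd², delay ∝ 1/V_dd) follow the slide; the specific voltages are illustrative only, not taken from the talk.

```python
# Illustrative first-order CMOS scaling model:
# dynamic energy scales with Vdd^2; gate delay grows roughly as 1/Vdd
# once Vdd is well above the threshold voltage.

def energy_ratio(v_new, v_old):
    """Dynamic energy ratio E_new / E_old after a voltage change."""
    return (v_new / v_old) ** 2

def delay_ratio(v_new, v_old):
    """First-order delay ratio D_new / D_old (delay ~ 1/Vdd)."""
    return v_old / v_new

# Halving Vdd: energy drops to 25%, but each operation takes 2x as long.
print(energy_ratio(0.9, 1.8))  # 0.25
print(delay_ratio(0.9, 1.8))   # 2.0
```

On a memory-bound phase, much of that doubled compute time hides behind memory stalls, which is exactly the opportunity the PAU exploits.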
Slide 7: Power Adaptation Unit (PAU)
- Maintains information to determine the ratio of compute to stall time
- Entries are allocated for instructions which cause CPU stalls
- Intuitively, one table entry is required per program loop
- Fields [from S-M et al., PACS 2002]:
  - State (I, A, T, V)
  - Number of instructions executed (NINSTR)
  - Distance between stalls (STRIDE)
  - Saturating 'quality' counter (Q)
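The fields listed above can be sketched as a record type. This is a hypothetical software model, not the hardware layout: field widths, defaults, and the expansion of the state letters are assumptions.

```python
from dataclasses import dataclass

# Hypothetical model of one PAU table entry. Field names follow the
# slide (STATE, NINSTR, STRIDE, Q); everything else is assumed.

@dataclass
class PAUEntry:
    state: str = "I"   # one of I, A, T, V per the slide (meanings not
                       # spelled out in the deck)
    ninstr: int = 0    # instructions executed between stalls
    stride: int = 0    # distance between successive stalls
    q: int = 0         # saturating 'quality' counter

entry = PAUEntry()     # a freshly allocated (invalid) entry
```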
Slide 8: PAU Table Entry State Machine
- If the CPU is running at-speed, slow it down
- Slowdown factor ∂ for a target 1% performance degradation:
  ∂ = 1 + 0.01 · (STRIDE + NINSTR) / NINSTR
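The formula above is a one-liner in software; a minimal sketch, where the leading 1 in the factor is reconstructed from the derivation on the backup slide (slide 23) rather than stated verbatim in the deck:

```python
def slowdown_factor(stride, ninstr, target=0.01):
    """Slowdown factor for a target fractional performance loss.

    Follows the no-overlap formula from the deck:
        d = 1 + target * (STRIDE + NINSTR) / NINSTR
    STRIDE and NINSTR come from a PAU table entry; target defaults
    to the talk's 1% degradation budget.
    """
    return 1 + target * (stride + ninstr) / ninstr

# e.g. 100 instructions of compute per 300-cycle stall distance
# allows roughly a 4% clock-period stretch.
d = slowdown_factor(stride=300, ninstr=100)
```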
Slide 9: Example

for (x = 100;;) {
    if (x-- > 0) a = i;
    b = *n;
    c = *p++;
}

- PAU table entries are created for each assignment
- After 100 iterations, the assignment to a stops
- Entries for b or c can take over immediately
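The takeover behavior can be illustrated with a toy single-entry model. The replacement policy below (matching stalls bump a quality counter Q, mismatches decay it, and a new stall PC claims the entry once Q reaches zero) is this sketch's assumption, loosely based on the Q field from slide 7, not the deck's exact state machine.

```python
Q_MAX = 3  # saturation limit for the quality counter (assumed)

class OneEntryPAU:
    """Toy single-entry PAU tracking the PC of one stalling instruction."""

    def __init__(self):
        self.pc = None   # PC of the tracked stalling instruction
        self.q = 0       # saturating quality counter

    def observe_stall(self, pc):
        if self.pc == pc:
            self.q = min(self.q + 1, Q_MAX)   # pattern confirmed
        elif self.q == 0:
            self.pc, self.q = pc, 1           # entry taken over
        else:
            self.q -= 1                       # pattern decaying
```

Once the stalls from the assignment to a cease, repeated stalls from b or c drain Q and claim the entry within a few observations.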
Slide 10: Experimental Methodology
- Simulated the PAU as part of a single-issue embedded processor
  - Used the Myrmigki simulator [S-M et al., ISLPED 2001]
  - Models a Hitachi SH RISC embedded processor: 5-stage in-order pipeline, 8K unified L1, 100-cycle latency to main memory
  - Empirical instruction power model, from an SH7708 device
  - Voltage scaling penalty of 1024 cycles, 14 µJ
- Investigated the effect of PAU table size on performance and power
  - Intuitively, PAU table entries track program loops with repeated stalls
Slide 11: Effect of Table Size on Energy Savings
- A single-entry PAU table provides a 27% reduction in energy, on average
- Scaling up to a 64-entry PAU table provides only an additional 4%
Slide 12: Effect of Table Size on Performance
- A single-entry PAU table incurs 0.75% performance degradation, on average
- A large PAU table leads to more aggressive behavior and an increased penalty
Slide 13: Overall Effect of Table Size: Energy-Delay Product
- Considering both performance and power, there is little benefit from larger PAU table sizes
Slide 14: Extending the PAU Structure
- Multiprogramming environments
- Superscalar architectures
- Slowdown factor computation
Slide 15: PAU in Multiprogramming Environments
- Only a single entry is necessary per application
- Amortizes memory-bound phase detection: it would be wasteful to flush the PAU at each context switch (~10 ms)
- Extend PAU entries with an ID field; CURID and IDMASK fields are written by the OS
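The ID tagging above can be sketched as a masked comparison. The match rule (entry ID equals CURID under the OS-programmed mask, with masked-out bits ignored) and the bit widths are assumptions of this sketch; the deck only names the CURID and IDMASK fields.

```python
def entry_matches(entry_id, curid, idmask):
    """Does a PAU entry belong to the currently running process?

    Assumed rule: compare the entry's ID field against the CURID
    register under IDMASK; bits cleared in IDMASK are ignored,
    letting the OS group related processes onto one entry.
    """
    return (entry_id & idmask) == (curid & idmask)

# Exact match with a full mask:
entry_matches(0b1010, 0b1010, 0b1111)
# Low bit masked out, so IDs 0b1010 and 0b1011 share entries:
entry_matches(0b1011, 0b1010, 0b1110)
```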
Slide 16: PAU in Superscalar Architectures
- Dependent computations are 'stretched out'; FUs with no dependent instructions are unduly slowed down
- Maintain separate instruction counters per FU
- Drawback: requires the ability to run FUs in the core at different voltages
[Figure: execution timelines comparing CPU @ V_dd and CPU @ V_dd/2]
Slide 17: Slowdown Factor Computation
- Computation is only performed on an application phase change; a hardware solution would be wasteful
- Solution: computation by a software ISR
  - Compute ∂, then look up a discrete V_dd/frequency pair by indexing into a lookup table
- A similar software handler solution was proposed in [Dropsho et al., 2002]
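The ISR's table lookup might look like the sketch below. The table contents (voltages, frequencies, slowdown thresholds) are invented for illustration; only the mechanism (compute ∂, index a table of discrete operating points) comes from the slide.

```python
# Hypothetical table of discrete operating points, ordered from
# fastest/highest-power to slowest/lowest-power:
# (max tolerable slowdown factor, Vdd in volts, frequency in MHz)
OP_POINTS = [
    (1.0, 1.8, 200),
    (1.5, 1.4, 133),
    (2.0, 1.1, 100),
    (4.0, 0.9, 50),
]

def select_operating_point(slowdown):
    """Pick the lowest-power point whose slowdown stays within budget.

    Called from the (hypothetical) phase-change ISR with the computed
    slowdown factor d; returns the chosen (threshold, Vdd, MHz) tuple.
    """
    chosen = OP_POINTS[0]            # default: full speed
    for point in OP_POINTS:
        if point[0] <= slowdown:     # this point still meets the budget
            chosen = point
    return chosen
```

Keeping the table in software lets the same hardware serve different performance budgets; only the ∂ inputs come from the PAU.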
Slide 18: Summary & Future Work
- PAU: hardware identifies program regions (loops) with a compute/memory-stall mismatch
- Due to the nature of most programs, even a single-entry PAU is effective: it can achieve 27% energy savings with only 0.75% performance degradation
- Proposed extensions to the PAU architecture
- Future work:
  - Evaluations with smaller miss penalties
  - Implementation of the proposed extensions
  - More extensive evaluation across applications
Slide 19: Questions
Slide 20: Extended PAU Structure
- Extensions to the original PAU design are shown shaded light blue
Slide 21: Role of the PAU
- Controls the operating voltage of the CPU core or memory
- The granularity of scaling can't be too small
- Assumes memory and the CPU core can be run at different voltages
Slide 22: PAU - Power Adaptation Unit
- A hardware structure that detects memory- and CPU-bound application regions
- Determines how much to scale down the voltage of the processor core
- The determination is made with a performance criterion (e.g. max 1% slowdown)
Slide 23: Slowdown Factor for Single-Issue (no overlap)
- Must define an acceptable slowdown (e.g. 1%):
  (T_new − T_old) / T_old < 0.01
  (∂_no-overlap · T_cpu − T_cpu) / (T_mem + T_cpu) < 0.01
  ∂_no-overlap < 1 + 0.01 · (STRIDE + NINSTR) / NINSTR
Slide 24: Slowdown Factor for Multiple-Issue (ideal overlap)
- Ideal case: computation and memory stalls can be perfectly overlapped
- Assuming any reduction from peak IPC is due to memory stalls, 'stretch' computations to match the performance of memory:
  ∂_ideal-overlap = Peak IPC / (NINSTR / STRIDE)
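A minimal sketch of the ideal-overlap factor, reading the fraction as peak IPC divided by the achieved IPC (NINSTR instructions per STRIDE cycles); the example numbers are illustrative, not from the deck.

```python
def slowdown_ideal_overlap(peak_ipc, ninstr, stride):
    """Slowdown factor when compute and memory stalls overlap perfectly.

    Achieved IPC is NINSTR / STRIDE; the core clock can be stretched
    by peak_ipc / achieved_ipc so compute just fills the stall time.
    """
    achieved_ipc = ninstr / stride
    return peak_ipc / achieved_ipc

# e.g. a 2-wide core retiring 100 instructions per 200 cycles
# (achieved IPC = 0.5) can be slowed by a factor of 4.
d = slowdown_ideal_overlap(peak_ipc=2, ninstr=100, stride=200)
```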
Slide 25: Effect of Table Size on Energy Consumption