Published by Madeleine Lynch. Modified over 9 years ago.
1 Speculative Software Management of Datapath-width for Energy Optimization
G. Pokam, O. Rochecouste, A. Seznec, and F. Bodin
IRISA, Campus de Beaulieu, 35042 Rennes Cedex, France
2 Context
Embedded applications often operate on 8-/16-bit data
– more than 50% of program instructions in some cases
New opportunities for energy reduction …
– clock-gating at finer granularity, i.e. at the operand level
3 Exploiting narrow-width operands
Dynamic approach (Brooks et al., HPCA-99)
1. cycle-by-cycle operand gating
2. complex hardware mechanisms required
Compiler approach (Stephenson et al., PLDI 2000)
1. based on static data flow analysis
2. must be overly conservative to preserve program correctness
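Both approaches ultimately ask the same question of each operand: does it fit in 8 or 16 bits? A minimal sketch of that test follows, assuming signed two's-complement values in a 32-bit word. The function names are illustrative and do not come from the talk.

```python
def operand_width(value: int) -> int:
    """Return the smallest of 8, 16 or 32 bits that holds `value` as a signed integer."""
    for bits in (8, 16, 32):
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        if low <= value <= high:
            return bits
    raise ValueError("value does not fit in a signed 32-bit word")


def is_narrow(value: int) -> bool:
    """An operand counts as narrow-width here when 8 or 16 bits are enough."""
    return operand_width(value) <= 16
```

A hardware scheme evaluates this test every cycle on live values; a static compiler analysis has to prove it for every possible input, which is why it ends up conservative.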
4 Our approach
Don't want to pay the cost of a hardware scheme to detect when to clock-gate
Don't want to rely on static data flow analysis to discover bit-width ranges
Dynamic approach
– take advantage of the dynamic approach to expose dynamic narrow-width operands to the compiler (via profiling)
Compiler approach
– use the compiler approach to switch from normal to narrow-width mode and vice versa (via a reconfiguration instruction)
Narrow-width execution mode is speculative: exception management allows recovery to the correct mode
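The compiler side of this scheme can be pictured as follows: a profile gives, per region, how many executed instructions needed 8, 16 or 32 bits, and the compiler emits a reconfiguration instruction wherever the chosen mode changes. This is only a sketch. The `setwidth` mnemonic, the 95% coverage threshold and the data layout are assumptions made for illustration, not details given in the talk.

```python
def choose_mode(width_counts, threshold=0.95):
    """Pick the narrowest mode that covers at least `threshold` of profiled instructions."""
    total = sum(width_counts.values())
    if total == 0:
        return 32  # no profile data: stay in normal mode
    covered = 0
    for mode in (8, 16, 32):
        covered += width_counts.get(mode, 0)
        if covered >= threshold * total:
            return mode
    return 32


def insert_reconfig(regions, profile, threshold=0.95):
    """Emit a hypothetical `setwidth` instruction whenever the mode changes between regions."""
    out = []
    current = 32  # assume the program starts in normal (32-bit) mode
    for name, instrs in regions:
        mode = choose_mode(profile[name], threshold)
        if mode != current:
            out.append(f"setwidth {mode}")
            current = mode
        out.extend(instrs)
    return out
```

Because the mode is chosen from a profile rather than proven safe, the remaining wide operands are left to the exception mechanism.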
5 Bit-width distribution analysis
[Figure: cumulative distribution of narrow-width operand occurrence (Powerstone benchmarks); panels: one operand, two operands]
6 Bit-width distribution analysis
[Figure: dynamic distribution of narrow-width operands at basic block level (adpcm)]
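A distribution like the one plotted per basic block could be gathered from an execution trace roughly as below. The trace format (pairs of block identifier and operand values) and the function names are assumptions. Each traced instruction is assumed to carry at least one operand and is classified by its widest operand.

```python
from collections import Counter, defaultdict


def signed_width(value):
    """Smallest of 8/16/32 bits holding `value` as a signed integer."""
    for bits in (8, 16, 32):
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return bits
    raise ValueError("value does not fit in 32 bits")


def profile_blocks(trace):
    """Build a per-basic-block histogram of the width each executed instruction needed."""
    hist = defaultdict(Counter)
    for block_id, operands in trace:
        hist[block_id][max(signed_width(v) for v in operands)] += 1
    return hist


def narrow_fraction(counts):
    """Share of executed instructions whose operands all fit in 8 or 16 bits."""
    total = sum(counts.values())
    return (counts[8] + counts[16]) / total if total else 0.0
```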
7 Outline
Motivation
Micro-architectural support
Narrow-width regions formation
Simulation platform
Evaluation
Conclusions
8 Register file model
Prior work to reduce the energy consumption in the register file
– limited port connectivity
– limited number of registers
We address a new dimension:
– reduce register file activity by reducing register file width
We propose the byte-slice register file approach
1. logically split
2. low-power mode via drowsy technique (preserves the register cell contents), Flautner et al., ISCA-29
[Figure: byte-slice register file with tag bits, slice enable signal and row decoder; 8-bit, 16-bit and 32-bit slices]
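The byte-slice idea can be modelled in a few lines: each register is stored as four byte slices plus a tag recording the true width of its value, and a read in 8/16/32 mode enables only the slices that mode needs. This is a behavioural sketch under assumptions. The tag encoding (the width in bits), the class name and the activity counter are illustrative and are not the hardware design from the talk. Only reads are counted, to keep the example short.

```python
def signed_width(value):
    """Smallest of 8/16/32 bits holding `value` as a signed integer."""
    for bits in (8, 16, 32):
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return bits
    raise ValueError("value does not fit in 32 bits")


class ByteSliceRegisterFile:
    def __init__(self, num_regs=64):
        self.slices = [[0, 0, 0, 0] for _ in range(num_regs)]  # least significant byte first
        self.tags = [8] * num_regs       # true data width of each stored value
        self.slices_activated = 0        # proxy for register file read activity

    def write(self, reg, value):
        self.tags[reg] = signed_width(value)
        raw = value & 0xFFFFFFFF
        self.slices[reg] = [(raw >> (8 * i)) & 0xFF for i in range(4)]

    def read(self, reg, mode):
        """Read in 8/16/32 mode; return (sign-extended value, whether the tag fits the mode)."""
        active = mode // 8
        self.slices_activated += active
        raw = 0
        for i in range(active):
            raw |= self.slices[reg][i] << (8 * i)
        if raw >= 1 << (mode - 1):       # sign-extend from `mode` bits
            raw -= 1 << mode
        return raw, self.tags[reg] <= mode
```

When the second element of the result is false, the value read is truncated, which is the situation the exception mechanism has to catch.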
9 Reconfigurable data-path
Data-path resizable to accommodate the bit-width execution mode (via clock-gating)
– pipeline latches
– ALU
Clock-gating at coarser granularity
[Figure: data-path with slice-enable signal, bypass, ALU, LSU and write-back, each operating in 8/16/32 mode]
10 Exception management
Data-path width misprediction may occur due to a dynamic event
Simple recovery scheme
– the tag bits indicate the true data-width
– upon a misprediction: trigger an exception, then recover to the correct execution mode
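The recovery scheme can be simulated in software as follows. A region of additions runs in a predicted mode; when an operand or result needs more bits than the current mode provides, an exception is raised, the handler switches to the required width, and the instruction is re-executed. The exception class, the restriction to additions, and the choice to stay in the wider mode for the rest of the region are assumptions of this sketch.

```python
class WidthMisprediction(Exception):
    def __init__(self, required):
        super().__init__(f"needs {required}-bit mode")
        self.required = required


def signed_width(value):
    for bits in (8, 16, 32):
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return bits
    raise ValueError("value does not fit in 32 bits")


def wrap32(value):
    """Wrap to a signed 32-bit value, as a 32-bit data-path would."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def add_in_mode(a, b, mode):
    result = wrap32(a + b)
    required = max(signed_width(a), signed_width(b), signed_width(result))
    if required > mode:
        raise WidthMisprediction(required)
    return result


def run_region(operations, predicted_mode):
    """Return (results, final mode, number of exceptions taken)."""
    mode, exceptions, results = predicted_mode, 0, []
    for a, b in operations:
        try:
            results.append(add_in_mode(a, b, mode))
        except WidthMisprediction as exc:
            exceptions += 1
            mode = exc.required                       # recover to the correct mode
            results.append(add_in_mode(a, b, mode))   # re-execute the faulting instruction
    return results, mode, exceptions
```

For example, `run_region([(1, 2), (100, 100), (3, 4)], 8)` takes one exception on the second addition, whose result 200 does not fit in 8 bits.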
11 Address instructions
Special care must be taken with address instructions
– separate address calculation from memory access
Use of dedicated registers for address computation
– accumulator registers with additional ISA support (see paper for details)
12 Outline
Motivation
Micro-architectural support
Narrow-width regions formation
Simulation platform
Evaluation
Conclusions
13 A two-step process
[Figure: two-step flow; labels: machine, input data sets, annotated .s file, address transformation, modified .s file; Step 1, Step 2]
14 Profiling
Bit-width characteristics of selected regions
[Figure: weight of regions in program, 0% to 100%; categories: 8/16 bits (narrow-width operands), LD/ST with 32 bits, 32 bits other]
15 Address instructions transformation
Problem
– transform memory instructions into equivalent accumulator-based instructions
A graph partitioning formulation:
– G, DDG of a BB
– edge (n, m) iff there is a def-use relation between n and m
Select (n, m) such that n has a 32-bit width operand and m is a LD/ST instr
Replace m with accumulator-based instructions
Minimize cut-size, i.e. the number of instructions needed to move data from the regfile to the accumulators
[Figure: DDG in which add1 feeds load and add2; after transformation: add -> Rx, mov Rx -> ACC, LDACC Ry, add2]
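The selection step can be illustrated on the slide's small example (add1 feeding both a load and add2). The sketch below only enumerates the def-use edges that cross from a 32-bit producer to a LD/ST and counts one move per distinct producer. It does not perform the cut-size minimisation itself, and the data structures and the one-move-per-producer cost are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    opcode: str
    width: int  # widest operand, in bits


MEMORY_OPCODES = {"load", "store"}


def select_cut_edges(instrs, edges):
    """Def-use edges (n, m) where n has a 32-bit operand and m is a LD/ST."""
    by_name = {i.name: i for i in instrs}
    return [
        (n, m)
        for n, m in edges
        if by_name[n].width == 32 and by_name[m].opcode in MEMORY_OPCODES
    ]


def moves_needed(cut_edges):
    """Assumed cost: one regfile-to-accumulator move per distinct producer."""
    return len({n for n, _ in cut_edges})
```

On the slide's graph only the edge from add1 to the load is selected, so a single `mov Rx -> ACC` is needed before the accumulator-based load.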
16 Instruction reordering
Problem:
– reorder instructions in a basic block such that operations with 32-bit operands are moved around 8/16-bit operations
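One simple way to approach this reordering is a greedy list scheduler that respects data dependences and, among ready instructions, prefers one of the same width class as the instruction just emitted, so that narrow operations cluster together. This is a heuristic stand-in chosen for illustration, not the algorithm from the talk. It assumes an acyclic dependence graph, as within a basic block.

```python
def reorder(instrs, deps):
    """instrs: list of (name, width) in program order; deps: set of (producer, consumer)."""
    preds = {name: set() for name, _ in instrs}
    for producer, consumer in deps:
        preds[consumer].add(producer)
    narrow = {name: width <= 16 for name, width in instrs}
    order = [name for name, _ in instrs]
    scheduled, done, current = [], set(), None
    while len(scheduled) < len(order):
        # Ready = not yet scheduled and all producers already scheduled.
        ready = [n for n in order if n not in done and preds[n] <= done]
        same_class = [n for n in ready if narrow[n] == current]
        pick = (same_class or ready)[0]
        scheduled.append(pick)
        done.add(pick)
        current = narrow[pick]
    return scheduled


def mode_switches(schedule, widths):
    """Count transitions between narrow (8/16-bit) and 32-bit instructions."""
    narrow = [widths[n] <= 16 for n in schedule]
    return sum(1 for prev, cur in zip(narrow, narrow[1:]) if prev != cur)
```

Fewer transitions mean fewer reconfiguration instructions around each narrow-width region.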
17 Outline
Motivation
Micro-architectural support
Narrow-width regions formation
Evaluation
Conclusions
18 Simulation platform
Lx processor platform
– in-order
– 4-issue width
– 64 32-bit GPR
– 8 1-bit CBR
– 6-stage pipeline
– 4 ALUs, 1 LSU
– 2 MULs
Tools
– CACTI: register file access energy
– HotLeakage: leakage energy
19 Analytical energy model
Dynamic energy [equation not preserved]
– CACTI to determine …
Static energy [equation not preserved]
– HotLeakage to determine …
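The slide's equations did not survive extraction, so the following is not the authors' model. It is a generic sketch of how such a model is commonly structured: dynamic energy as access counts times a per-access energy for each mode, and static energy as leakage accumulated over time, with inactive byte slices leaking at a lower drowsy rate. All parameter names are assumptions. In a real study the per-access and leakage figures would come from tools such as CACTI and HotLeakage.

```python
def dynamic_energy(accesses_by_mode, energy_per_access):
    """Sum over modes of (number of accesses) x (energy of one access in that mode)."""
    return sum(count * energy_per_access[mode] for mode, count in accesses_by_mode.items())


def static_energy(cycles_by_mode, leak_active, leak_drowsy, slices=4):
    """Leakage per cycle: active slices leak at `leak_active`, the rest at `leak_drowsy`."""
    total = 0.0
    for mode, cycles in cycles_by_mode.items():
        active = mode // 8
        total += cycles * (active * leak_active + (slices - active) * leak_drowsy)
    return total
```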
20 Summary of results
[Figure: IPC degradation with varying misprediction penalty and varying bit-width convergence]
21 Summary of results
[Figure: dynamic energy reduction]
22 Summary of results
[Figure: register file static energy savings]
23 Outline
Motivation
Micro-architectural support
Narrow-width regions formation
Evaluation
Conclusions
24 Conclusions
Contribution to power-aware compilation
– speculative management of processor data-path in software
– simple exception management scheme to repair a software misprediction
Evaluation results
– 17% data-path dynamic energy savings
– 22% register file static energy savings
– performance impact varies with implementation cost of the recovery scheme
Future work
– evaluation with larger granularity (e.g. trace)
  can reduce number of mispredictions
  can reduce amount of reconfiguration instructions
25 Thanks! Questions …