Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture. G. Pokam, F. Bodin. CPC 2004, Chiemsee, Germany, July 7-9.

1 Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture
G. Pokam, F. Bodin
CPC 2004, Chiemsee, Germany, July 7-9

2 Motivation
- Source of complexity on high-performance VLIW processors:
  → hardware duplication
    - many FUs of different types (ALUs, LSUs, FPUs, BR, etc.)
    - need for a large register file
- Power growth factor: compiler, architecture complexity

3 Motivation
- Assume a fixed […]: does compiling for higher ILP result in dissipating less power?
- Which issues (architecture, software, etc.) affect power when compiling for ILP?
Try to figure out what happens analytically!

4 Agenda
- Motivation
- Used metrics
- Energy model
- Tradeoff analysis
- Hyperblock example
- Experiments
- Conclusions

5 Metric
- Performance-to-energy ratio (PTE) [Gonzales, R. et al.], defined in terms of:
  - the number of operations per basic block
  - the average number of operations per bundle
  - the energy per basic block
- Higher is better.
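One way to read the metric, as a rough sketch: treat performance as the inverse of the number of bundles needed to issue a basic block (operations divided by operations per bundle) and divide it by the block's energy. This exact form is an assumption made for illustration, not the definition used in the talk, and the function and parameter names are hypothetical.

```python
def pte(num_ops: float, ipc: float, energy: float) -> float:
    """Performance-to-energy ratio of one basic block (ASSUMED form).

    num_ops: number of operations in the basic block
    ipc:     average number of operations per bundle
    energy:  energy dissipated by the basic block (arbitrary units)
    """
    if num_ops <= 0 or ipc <= 0 or energy <= 0:
        raise ValueError("all inputs must be positive")
    bundles = num_ops / ipc        # assumed delay: bundles needed to issue the block
    performance = 1.0 / bundles    # assumed performance: inverse of that delay
    return performance / energy    # higher is better


def improves_energy_efficiency(pte_original: float, pte_transformed: float) -> bool:
    """A transformation pays off when it does not lower the ratio."""
    return pte_transformed >= pte_original


# Made-up numbers: the transformed block has more operations and costs more
# energy, but packs them into fewer bundles.
baseline = pte(num_ops=12, ipc=1.5, energy=40.0)
transformed = pte(num_ops=14, ipc=2.8, energy=42.0)
print(improves_energy_efficiency(baseline, transformed))
```

Under this reading, extra operations are acceptable only as long as the gain in operations per bundle outweighs the added energy.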

6 Agenda
- Motivation
- Used metrics
- Energy model
- Tradeoff analysis
- Hyperblock example
- Experiments
- Conclusions

7 Energy Model
- The execution of a bundle dissipates an energy with the following terms:
  - energy base cost
  - energy due to execution of the bundle
  - energy due to D-cache misses
  - energy due to I-cache misses
- Consider loop-intensive kernels …
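The four terms listed for a bundle can be sketched as a small record, assuming they simply add up. The additive form, the class name `BundleEnergy`, and the helper `block_energy` are assumptions for illustration; the numeric values are invented.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BundleEnergy:
    """Energy terms of one executed bundle (arbitrary units)."""
    base: float         # energy base cost
    execution: float    # energy due to execution of the bundle
    dcache_miss: float  # energy due to D-cache misses
    icache_miss: float  # energy due to I-cache misses

    def total(self) -> float:
        # ASSUMPTION: the terms are combined by plain addition.
        return self.base + self.execution + self.dcache_miss + self.icache_miss


def block_energy(bundles: Iterable[BundleEnergy]) -> float:
    """Energy of a basic block, taken as the sum over its bundles."""
    return sum(b.total() for b in bundles)


example = [
    BundleEnergy(base=1.0, execution=2.0, dcache_miss=0.5, icache_miss=0.25),
    BundleEnergy(base=1.0, execution=1.5, dcache_miss=0.0, icache_miss=0.0),
]
print(block_energy(example))
```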

8 Agenda
- Motivation
- Used metrics
- Energy model
- Tradeoff analysis
- Hyperblock example
- Experiments
- Conclusions

9 Analysis
- Use […] as a lever for power exploration
- Assume R is a CFG region to be transformed into an ILP region H
  - a sufficient condition for this is given by […]

10 Analysis
- Idea:
  → keep track of IPC values that improve energy efficiency
  → solve the PTE inequality at […], in terms of:
    - the average number of operations in the transformed region
    - the average number of operations in the CFG region R

11 Analysis
where
- f: execution frequency
- N: number of operations
- n: number of bundles
- s: number of stalls due to D-cache misses
- m: number of basic blocks in the region
C is a measure of extra work!
The shape of the ILP transform function depends on the sign of C.

12 […] vs. […]
- C < 0: exponential shape; means high extra work! (dependence height mismatch, resource contention); e.g. hyperblock: compensation code
- C = 0: linear shape; negligible extra work
- C > 0: logarithmic shape; optimal scenario; e.g. hyperblock: instruction merging

13 Agenda
- Motivation
- Used metrics
- Energy model
- Tradeoff analysis
- Hyperblock example
- Experiments
- Conclusions

14 Hyperblock framework
- predication model via the select instruction: slct dest = cond, src1, src2
- only hammock regions are considered
- single-entry, single-exit hyperblock
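As an illustration of select-based predication, the sketch below if-converts a small hammock (an if/else that rejoins at a single point): both arms are computed unconditionally and a select picks the result. It assumes `slct dest = cond, src1, src2` yields `src1` when `cond` is true and `src2` otherwise; the Python function names are hypothetical stand-ins, not compiler output.

```python
def slct(cond: bool, src1: int, src2: int) -> int:
    """Model of the select instruction (ASSUMED semantics:
    dest = src1 if cond is true, else src2)."""
    return src1 if cond else src2


def hammock_branching(c: bool, a: int, b: int) -> int:
    """Original hammock region: two arms guarded by a branch."""
    if c:
        x = a + b
    else:
        x = a - b
    return x


def hammock_if_converted(c: bool, a: int, b: int) -> int:
    """Same region as a single block: both arms execute, slct merges them."""
    t_then = a + b   # operation from the 'then' arm, now unconditional
    t_else = a - b   # operation from the 'else' arm, now unconditional
    return slct(c, t_then, t_else)


# Both versions agree on every input tried here.
for c in (True, False):
    assert hammock_branching(c, 7, 3) == hammock_if_converted(c, 7, 3)
```

The converted version removes the branch but always executes the operations of both arms, which is the kind of extra work the energy analysis has to account for.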

15 Transformation heuristic
1. build the loop tree
2. traverse the loop tree from innermost to outermost loop
3. evaluate profit for each candidate loop region
4. propagate profit to CFG after transformation
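The four steps above can be sketched as a post-order walk over a loop tree. This is a minimal sketch under assumptions: `LoopNode`, `select_regions`, and the `profit` callback are hypothetical names, the profit test is left to the caller, and step 4 is modelled only by marking a node as transformed so that enclosing loops can see the decision when their own profit is evaluated.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopNode:
    """One node of the loop tree; children are the directly nested loops."""
    name: str
    children: List["LoopNode"] = field(default_factory=list)
    transformed: bool = False


def select_regions(root: LoopNode,
                   profit: Callable[[LoopNode], float]) -> List[str]:
    """Visit loops innermost first and transform the profitable ones."""
    chosen: List[str] = []

    def visit(node: LoopNode) -> None:
        for child in node.children:      # step 2: inner loops before outer
            visit(child)
        if profit(node) > 0.0:           # step 3: evaluate the candidate
            node.transformed = True      # step 4 (simplified): record the
            chosen.append(node.name)     # decision where outer loops see it

    visit(root)
    return chosen


# Step 1 (building the tree) is done by hand for this toy example.
inner = LoopNode("inner")
outer = LoopNode("outer", [inner])
tree = LoopNode("function", [outer])

# Hypothetical profit: only the innermost loop is worth transforming.
print(select_regions(tree, lambda n: 1.0 if not n.children else -1.0))
```

Visiting children first means the profit estimate of an enclosing loop can take into account which nested loops were already transformed.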

16 Agenda
- Motivation
- Used metrics
- Energy model
- Tradeoff analysis
- Hyperblock example
- Experiments
- Conclusions

17 Platform
- Lx platform from STMicroelectronics
  - 4-issue VLIW machine
  - 64 GPRs, 8 CBRs
  - 4 ALUs, 1 LD/ST, 2 MULs, 1 BU
- Instruction-based energy model from STMicroelectronics
- Lx compiler
  - prefetch disabled
  - only scalar optimizations (-O2)

18 Methodology
- Post-pass optimization
- Tool flow (diagram): source, Lx compiler, .s file, SALTO, absciss; phase 1, phase 2
- Instrumentation: BB frequency, D-cache misses per BB
- Hyperblock formation
- Hyperblock optimization: instruction promotion, instruction merging, instruction renaming
- Code versions: original CFG, selective hyperblock, all hyperblock

19 Results
- negligible IPC improvement
- relatively larger increase in operation count and static schedule length?

20 Agenda
- Motivation
- Used metrics
- Energy model
- Tradeoff analysis
- Hyperblock example
- Experiments
- Conclusions

21 Conclusions
- Analytical scheme to understand the impact of ILP compilation on energy
- Heuristic shows 17% energy-delay improvement on a restricted hyperblock scheme
  → programs suffer from limited ILP, which quickly turns into wasted energy
  → need to go beyond compiler-centric approaches in order to overcome ILP limitations
- What is missing:
  - the impact of post-optimization passes has not been determined
  - only a restricted hyperblock scheme has been evaluated

22 Thanks!

