Power and Frequency Analysis for Data and Control Independence in Embedded Processors
Farzad Samie, Sharif University of Technology
Amirali Baniasadi, University of Victoria
This Work
Goal
- Power and frequency analysis of control-independent and data-independent instructions in embedded processors
Motivation
- Embedded processors are becoming more complex
- Modern embedded processors use speculation
- Mis-speculation incurs performance and power penalties
- Power is a major concern in embedded processors
- Aim: save power and gain performance
This Work (cont.)
Our Approach
- Reduce the energy and time wasted on mispredictions.
- How? Identify and bypass Control Independent (CI) and Data Independent (DI) instructions.
- CIs: instructions that execute regardless of the branch outcome.
- CI-DIs: CI instructions that execute with the same operands.
Key Result: up to 12% processor energy reduction.
Background: Branch Prediction
[Figure: a Branch Predictor takes the Branch History and the Program Counter as inputs and produces the Predicted direction and the Predicted target address.]
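The slide does not say which predictor organization is used. As a minimal sketch only, the following assumes a gshare-style predictor (2-bit saturating counters indexed by the program counter XORed with global branch history) plus a small target table. The class name `GsharePredictor`, the parameter `index_bits`, and the dictionary-based target buffer are all hypothetical choices made for illustration.

```python
class GsharePredictor:
    """Illustrative gshare predictor; not the configuration used in the study."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # 2-bit saturating counters, initialised to "weakly not taken" (1).
        self.counters = [1] * (1 << index_bits)
        self.history = 0   # global branch history register
        self.btb = {}      # branch PC -> last seen taken target (assumed unbounded)

    def _index(self, pc):
        # Combine program counter and branch history into a table index.
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        """Return (predicted direction, predicted target address or None)."""
        taken = self.counters[self._index(pc)] >= 2
        target = self.btb.get(pc) if taken else None
        return taken, target

    def update(self, pc, taken, target):
        """Train with the resolved branch outcome."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
            self.btb[pc] = target
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

A misprediction is the case where `predict` disagrees with the outcome later passed to `update`; everything fetched in between is wrong-path work.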
Background (cont.)
[Figure: instruction stream I1 to I12 around a mispredicted branch, with Not taken and Taken paths. On misprediction detection the Wrong Path is squashed and fetch restarts on the Right Path. Instructions I7, I8 and I9 appear on both paths and are the Control Independent Instructions (CIs).]
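As a rough sketch of the idea, CIs can be pictured as the wrong-path instructions from the point where the two paths re-converge. The function below is hypothetical and simplified: it assumes each path is given as a list of instruction addresses, and that once the wrong path reaches an address that also lies on the right path, both paths continue identically.

```python
def control_independent(wrong_path, right_path):
    """Return the wrong-path instructions from the re-convergence point onward.

    Both arguments are lists of instruction addresses (an assumed encoding).
    """
    right_set = set(right_path)
    for pos, pc in enumerate(wrong_path):
        if pc in right_set:
            # First address shared with the right path: re-convergence point.
            return wrong_path[pos:]
    return []  # the paths never re-converge within the observed window


# Hypothetical addresses: instructions 5 and 6 are wrong-path only,
# 10 to 12 are right-path only, and 7 to 9 follow the re-convergence point.
cis = control_independent([5, 6, 7, 8, 9], [10, 11, 12, 7, 8, 9])
```

Here `cis` holds the three shared instructions; the remaining wrong-path instructions are the ones that truly must be squashed.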
Background (cont.)
[Figure: example code with a branch "If (R4=0)" and its Not taken and Taken paths; control independent instructions are marked Data Independent (CI-DI) or Data Dependent (CI-DD). Instructions in the figure: R1←R1+R2, R4←R1, R2←R4-R1, R5←R2-R3, R3←0, R5←R4+1, R1←R1-1, R3←0, R4←R6+R4, R1←R4+R1, R5←R5-2, R3←R3-R4. Highlighted: R1←R1-1, R5←R2-R3, R5←R4+1.]
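The CI-DI versus CI-DD split can be sketched with a conservative register-level rule. This is an assumption made for illustration, not the mechanism from the study: a CI instruction is treated as data dependent if any source register may have been written by a control-dependent instruction on either path, or by an earlier CI-DD instruction. The `Instr` type, the function name, and the register values below are hypothetical and do not reproduce the slide's example.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # destination register
    srcs: Tuple[str, ...]   # source registers


def classify_ci(wrong_path, right_path, ci_instrs):
    """Label each CI instruction as "CI-DI" or "CI-DD" (conservative sketch)."""
    # Registers whose value may differ between the wrong and the right path.
    tainted = {i.dest for i in wrong_path} | {i.dest for i in right_path}
    labels = []
    for ins in ci_instrs:
        if any(s in tainted for s in ins.srcs):
            labels.append("CI-DD")
            tainted.add(ins.dest)       # the result may differ as well
        else:
            labels.append("CI-DI")
            tainted.discard(ins.dest)   # overwritten with a path-independent value
    return labels


wrong = [Instr("R2", ("R4", "R1"))]     # executed only on the wrong path
right = [Instr("R3", ())]               # executed only on the right path
ci = [
    Instr("R1", ("R1",)),        # reads only an untouched register
    Instr("R5", ("R2", "R3")),   # reads registers written on the paths
    Instr("R6", ("R5",)),        # reads the result of a CI-DD instruction
    Instr("R7", ("R4",)),        # reads only an untouched register
]
labels = classify_ci(wrong, right, ci)
```

The rule is conservative in that it compares register names, not values, so an instruction whose operands happen to hold the same values on both paths would still be labeled CI-DD.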
CI-DI vs. CI-DD
- Bypassing CI-DIs saves more energy: no need to read operands or execute again
- Bypassing CI-DIs provides higher performance: no time is wasted on reading operands or executing
[Figure: pipeline stages Fetch, Issue, Dispatch, Execute, Write Back, marked for CI-DD and CI-DI.]
Methodology
- Modified SimpleScalar
- Wattch for power measurement
- MiBench: embedded benchmark suite
Distribution
- Wrong Path: 12%, CI: 5%, CI-DI: 2%
CI Power Reduction in Different Units
- Max: branch predictor unit; Min: instruction cache
CI Power Reduction in Stages
- Rijndael: low misprediction rate, few wrong-path instructions, few CIs
Power Sensitivity to RUU Size
[Figure panels: CI, CI-DI]
- Higher power dissipation for bigger RUU sizes
Power Sensitivity to Execution Bandwidth
[Figure panels: CI, CI-DI]
- Higher power dissipation for wider execution bandwidth
Power Sensitivity to Branch Predictor Size
- Little sensitivity to branch predictor size
Related Work
- Rotenberg et al.: studied control independence in superscalar processors, HPCA 1999.
- Collins et al.: suggested a mechanism to predict the re-convergent point, MICRO 2004.
- Lam and Wilson: studied the impact of CIs on instruction-level parallelism, ISCA 1992.
- Gandhi et al.: selective branch misprediction recovery, HPCA 2004.
Conclusion
- Categorized CIs into CI-DI and CI-DD
- Potential power saving of up to 12% from bypassing CI and CI-DI instructions
- High sensitivity to RUU size
- High sensitivity to execution bandwidth
- Little sensitivity to branch predictor size
Questions?
Thank you