On-demand solution to minimize I-cache leakage energy
Group members: Chenyu Lu and Tzyy-Juin Kao
Motivation
- High power dissipation causes thermal problems and raises packaging, power-delivery, and cooling costs
- In 70 nm technology, leakage may constitute as much as 50% of total energy dissipation
- Use the super-drowsy leakage-saving technique
  - Lower the supply voltage to a level (0.25 V) near the threshold voltage (0.2 V)
  - Data is retained but cannot be accessed
  - Waking from the leakage-saving mode to the active mode costs a one-cycle penalty
- Use the on-demand wakeup policy on the I-cache
  - Only the cache lines currently in use need to be awake
  - The branch predictor accurately predicts the next cache line
  - On most branch mispredictions, the extra wakeup stage overlaps with the misprediction recovery
Overview
- Super-drowsy cache line
  - A Schmitt trigger inverter controls the cache-line voltage in the leakage-saving mode
  - Replaces multiple supply voltage sources
- Wakeup prediction policy
  - Enables on-demand wakeup
  - The branch predictor already identifies which line needs to be woken up
  - No additional wakeup-prediction structure is needed
Methodology
- Leakage energy = drowsy_energy + active_energy + turn_on_energy
- Monitor two quantities every cycle in sim-outorder: active_lines and turn_on
- Add a wake_bit to every block (four states):
  - 0: drowsy this cycle
  - 1: active this cycle
  - 2: active this cycle and the next cycle
  - 3: drowsy this cycle, active next cycle
- Update the wake_bit and count the active_lines every cycle using Update_wakeup()
- Change the wake_bit on every instruction fetch using fetch_line()
- Improved strategy
  - Keep a line active when Interval * Active_Power < Interval * Drowsy_Power + Turn_On_Energy
  - Speculate with a list of recently accessed cache lines
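The wake_bit bookkeeping and the energy sum on this slide can be sketched in Python as follows. This is an illustrative model, not the project's sim-outorder code. The class name ICacheLeakageModel, the per-cycle energy parameters, the helper break_even_interval(), and the exact transitions chosen in fetch_line() (a drowsy line moves to state 3, an active line to state 2) are assumptions. Only the four state meanings, the names Update_wakeup() and fetch_line(), and the two formulas come from the slide.

```python
from dataclasses import dataclass, field

# wake_bit states, as listed on the slide
DROWSY, ACTIVE, ACTIVE_STAY, WAKING = 0, 1, 2, 3


@dataclass
class ICacheLeakageModel:
    """Hypothetical per-cycle leakage accounting for an I-cache."""
    num_lines: int
    active_power: float    # energy per active line per cycle (placeholder unit)
    drowsy_power: float    # energy per drowsy line per cycle
    turn_on_energy: float  # energy per drowsy-to-active transition
    wake_bit: list = field(default_factory=list)
    active_energy: float = 0.0
    drowsy_energy: float = 0.0
    wake_energy: float = 0.0  # accumulated turn-on energy

    def __post_init__(self):
        # Every line starts in drowsy mode.
        self.wake_bit = [DROWSY] * self.num_lines

    def fetch_line(self, line):
        """Mark a line as needed next cycle (assumed transition rule)."""
        state = self.wake_bit[line]
        if state == DROWSY:
            self.wake_bit[line] = WAKING       # drowsy now, active next cycle
        elif state == ACTIVE:
            self.wake_bit[line] = ACTIVE_STAY  # active now and next cycle

    def update_wakeup(self):
        """Account for the current cycle, then advance every wake_bit."""
        active_lines = sum(1 for s in self.wake_bit if s in (ACTIVE, ACTIVE_STAY))
        turn_on = sum(1 for s in self.wake_bit if s == WAKING)
        drowsy_lines = self.num_lines - active_lines  # includes waking lines
        self.active_energy += active_lines * self.active_power
        self.drowsy_energy += drowsy_lines * self.drowsy_power
        self.wake_energy += turn_on * self.turn_on_energy
        # Lines not requested again fall back to drowsy on the next cycle.
        next_state = {DROWSY: DROWSY, ACTIVE: DROWSY,
                      ACTIVE_STAY: ACTIVE, WAKING: ACTIVE}
        self.wake_bit = [next_state[s] for s in self.wake_bit]
        return active_lines, turn_on

    def leakage_energy(self):
        # Leakage energy = drowsy_energy + active_energy + turn_on_energy
        return self.drowsy_energy + self.active_energy + self.wake_energy


def break_even_interval(active_power, drowsy_power, turn_on_energy):
    """Largest interval (in cycles) for which staying active is cheaper.

    Rearranges Interval * Active_Power < Interval * Drowsy_Power + Turn_On_Energy.
    """
    if active_power <= drowsy_power:
        raise ValueError("active_power must exceed drowsy_power")
    return turn_on_energy / (active_power - drowsy_power)
```

With placeholder values active_power=1.0, drowsy_power=0.25, and turn_on_energy=2.0 (arbitrary units, not measured data), break_even_interval() gives 2.0 / 0.75, about 2.67 cycles: a line expected to be fetched again within two cycles is cheaper to leave awake than to put into drowsy mode and wake again.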
Results
Change block size
Change interval
Future Work
- One extra cycle of latency on a target-address misprediction (0.08% performance drop according to the paper)
- Apply the on-demand policy to the data cache
  - No prediction
  - Extra latency can be hidden by locality and out-of-order execution