CS 7810 Lecture 13
Pipeline Gating: Speculation Control For Energy Reduction
S. Manne, A. Klauser, D. Grunwald
Proceedings of ISCA-25, June 1998
Cost of Speculation
Mispredict rates
Pipeline Gating
Low-confidence branches throttle instruction fetch until they are resolved
Pipeline gating usually lasts for fewer than five cycles
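A minimal sketch of the gating decision, assuming the front end keeps a count of unresolved low-confidence branches and stalls fetch while that count is at or above a threshold. The class name `GatingControl`, its method names, and the threshold parameter are hypothetical, chosen only for illustration.

```python
class GatingControl:
    """Decides whether instruction fetch should be gated.

    Assumption: fetch is stalled while at least `threshold`
    low-confidence branches are still unresolved in the pipeline.
    """

    def __init__(self, threshold=1):
        self.threshold = threshold
        self.unresolved_low_conf = 0

    def on_branch_fetched(self, low_confidence):
        # Only low-confidence branches contribute to the gating count.
        if low_confidence:
            self.unresolved_low_conf += 1

    def on_branch_resolved(self, low_confidence):
        # A resolved branch no longer holds fetch back.
        if low_confidence:
            self.unresolved_low_conf -= 1

    def fetch_gated(self):
        return self.unresolved_low_conf >= self.threshold


gate = GatingControl(threshold=2)
gate.on_branch_fetched(low_confidence=True)
one_pending = gate.fetch_gated()      # below the threshold, fetch continues
gate.on_branch_fetched(low_confidence=True)
two_pending = gate.fetch_gated()      # threshold reached, fetch is gated
gate.on_branch_resolved(low_confidence=True)
after_resolve = gate.fetch_gated()    # one branch resolved, fetch resumes
```

Gating ends as soon as enough low-confidence branches resolve, which fits the observation that a gating episode is usually short.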
Metrics
SPEC (specificity): fraction of all mispredicted branches that the confidence estimator flags as low-confidence (coverage)
PVN (predictive value of a negative test): probability that a low-confidence branch is in fact mispredicted (accuracy)
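The two metrics can be computed from three branch counts. This is a sketch: the function name `spec_and_pvn` and the counts in the example are made up for illustration, not measurements from the paper.

```python
def spec_and_pvn(low_conf_mispredicted, low_conf_correct, high_conf_mispredicted):
    """Return (SPEC, PVN) from branch counts.

    SPEC: share of all mispredicted branches that were flagged low-confidence.
    PVN:  share of all low-confidence branches that were mispredicted.
    """
    all_mispredicted = low_conf_mispredicted + high_conf_mispredicted
    all_low_conf = low_conf_mispredicted + low_conf_correct
    spec = low_conf_mispredicted / all_mispredicted
    pvn = low_conf_mispredicted / all_low_conf
    return spec, pvn


# Illustrative counts only: 100 mispredicts in total, 200 low-confidence branches.
spec, pvn = spec_and_pvn(
    low_conf_mispredicted=60,
    low_conf_correct=140,
    high_conf_mispredicted=40,
)
# spec is 60/100 and pvn is 60/200
```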
Confidence Estimators
Perfect: used to gauge the potential benefits
Static: branches that have low prediction rates
JRS: if a branch has yielded N successive correct predictions, it has high confidence
Saturating counters: an unbiased counter value, or disagreement between two predictors, indicates low confidence
Distance: mispredicts are clustered, hence the first 4 branches after a mispredict have low confidence
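A minimal sketch of a JRS-style estimator, assuming a table of resetting counters indexed by the branch PC. The table size, the threshold N = 4, the counter ceiling, and the simple modulo indexing are assumptions made for this example; a real design may index differently.

```python
class JRSEstimator:
    """Resetting-counter confidence estimator (illustrative sketch)."""

    def __init__(self, table_size=1024, threshold=4, counter_max=15):
        self.table_size = table_size
        self.threshold = threshold        # N successive correct predictions
        self.counter_max = counter_max    # counters saturate at this value
        self.counters = [0] * table_size

    def _index(self, pc):
        # Assumed indexing: plain modulo of the branch address.
        return pc % self.table_size

    def is_high_confidence(self, pc):
        return self.counters[self._index(pc)] >= self.threshold

    def update(self, pc, prediction_correct):
        i = self._index(pc)
        if prediction_correct:
            self.counters[i] = min(self.counters[i] + 1, self.counter_max)
        else:
            self.counters[i] = 0          # a mispredict resets the streak


est = JRSEstimator(threshold=4)
pc = 0x4000
initially_high = est.is_high_confidence(pc)         # no history yet
for _ in range(4):
    est.update(pc, prediction_correct=True)
high_after_streak = est.is_high_confidence(pc)      # 4 correct in a row
est.update(pc, prediction_correct=False)
high_after_mispredict = est.is_high_confidence(pc)  # streak reset
```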
SPEC and PVN
It is easier to achieve a high SPEC value than a high PVN value
A high PVN value can be achieved by using N low-confidence branches to invoke gating
– if PVN is 30%, re-defining low-confidence as two low-confidence branches increases PVN to 51%
SPEC (coverage): fraction of mispredicted branches detected by the low-confidence estimator
PVN (accuracy): % of low-confidence branches that are branch mispredicts
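The 30% to 51% figure is consistent with treating the N low-confidence branches as independent, so that gating is justified when at least one of them is mispredicted. The independence assumption and the function name `gating_pvn` are introduced here for illustration.

```python
def gating_pvn(pvn, n):
    """Probability that at least one of n low-confidence branches is
    mispredicted, assuming each is mispredicted independently with
    probability `pvn`."""
    return 1.0 - (1.0 - pvn) ** n


single_branch = gating_pvn(0.30, 1)   # unchanged: 0.30
two_branches = gating_pvn(0.30, 2)    # 1 - 0.7 * 0.7, approximately 0.51
```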
Perfect
Gating Results
Results
Can gating improve performance?
– only if cache pollution is significant
Less than 1% performance loss and up to 38% reduction in extra work
Energy consumption could go up
– some work is independent of the number of executed instructions (clock distribution)
– increased execution time can increase energy
Pipeline gating should reduce power consumption
Results
CS 7810 Lecture 13
Cache Decay: Exploiting Generational Behavior to Reduce Cache Leakage Power
S. Kaxiras, Z. Hu, M. Martonosi
Proceedings of ISCA-28, July 2001
Leakage Power Trends
Circuit delay ∝ 1/(V – Vth)
Leakage ∝ num transistors (incr), supply voltage (decr), (exp) low threshold voltage (incr)
L1 and L2 caches are the biggest contributors (high transistor budgets)
Vdd-Gating
Leakage can be reduced by gating off the supply voltage to the circuit
When applied to a cache, the contents of the SRAM cell are lost
Cache decay: apply Vdd-gating when you do not care about the cache contents
Lifetime of a Cache Line
Overheads
Hardware to determine when to decay
Introduces additional cache misses
Normalized cache leakage power = Active ratio (fraction of cache that is powered on) + (Counter overhead : Leak) x activity + (L2 access energy : Leak) x num-misses
Increased execution time (< 0.7%)
L2 access/leakage ratio is ~9
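The normalized leakage expression is a plain weighted sum, sketched below. The function and parameter names are hypothetical, and every input in the example except the L2 access/leakage ratio of 9 is a made-up value. It is also an assumption that the activity and miss terms are expressed per cycle, so that all three terms are comparable.

```python
def normalized_leakage(active_ratio, counter_leak_ratio, counter_activity,
                       l2_leak_ratio, extra_misses):
    """Normalized cache leakage power as the sum of three terms:
    powered-on fraction, counter overhead, and extra L2 accesses."""
    powered_on_term = active_ratio
    counter_term = counter_leak_ratio * counter_activity
    extra_miss_term = l2_leak_ratio * extra_misses
    return powered_on_term + counter_term + extra_miss_term


# Illustrative inputs only (not results from the paper), apart from the ratio of 9.
result = normalized_leakage(
    active_ratio=0.5,          # half of the cache stays powered on
    counter_leak_ratio=0.1,    # assumed counter overhead : leak ratio
    counter_activity=0.01,     # assumed counter activity
    l2_leak_ratio=9,           # L2 access : leak ratio of ~9
    extra_misses=0.001,        # assumed decay-induced misses
)
# 0.5 + 0.001 + 0.009, approximately 0.51
```

With these inputs the powered-on fraction dominates. The extra-miss term only matters when decay is aggressive enough to cause many additional misses.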
Skier’s Dilemma
New skis: $400
Ski rentals: $20
Heuristic: buy skis after rental cost = purchase price
Ski trips:
Optimal:   $100 $200 $300 $400 $400 $400
Heuristic: $100 $200 $300 $800 $800 $800
Likewise, decay a cache line when the cost of an additional miss equals the leakage dissipated so far
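A small sketch of the rent-or-buy heuristic, using the $400 purchase price and $20 rental fee from the slide. The function names are hypothetical, and the rule "buy on the first trip after cumulative rentals have reached the purchase price" is one reading of the heuristic. The point of the sketch is the bound: under this rule the heuristic never pays more than twice the optimal cost.

```python
BUY_PRICE = 400
RENT_PRICE = 20


def optimal_cost(trips):
    """Cost with perfect knowledge of the number of trips."""
    return min(trips * RENT_PRICE, BUY_PRICE)


def heuristic_cost(trips):
    """Rent until rentals paid so far equal the purchase price, then buy."""
    paid = 0
    for _ in range(trips):
        if paid >= BUY_PRICE:
            return paid + BUY_PRICE   # buy now; later trips cost nothing
        paid += RENT_PRICE
    return paid


few_trips = (optimal_cost(5), heuristic_cost(5))      # both rent: (100, 100)
many_trips = (optimal_cost(30), heuristic_cost(30))   # (400, 800)
worst_ratio = max(heuristic_cost(n) / optimal_cost(n) for n in range(1, 101))
```

The cache-decay analogy maps rental fees to leakage spent keeping a line powered, and the purchase price to the cost of one extra miss.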
Tracking Dead Time
Each line has a 2-bit counter that is reset on every access and incremented every 2500 cycles by a global signal (negligible overhead)
After 10,000 clock cycles, the counter reaches the max value and triggers a decay
Adaptive decay: start with a short decay period; if you have a quick miss, double the period; if there is no miss, halve the period
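A minimal sketch of the per-line counter and the adaptive rule. It assumes a line is switched off by the global tick that arrives when its 2-bit counter is already at the maximum, so four ticks (10,000 cycles at 2500 cycles per tick) elapse after the last access. The names `DecayLine` and `adapt_interval` are hypothetical, and bounds on the adaptive period are omitted.

```python
TICK_CYCLES = 2500      # global signal period
COUNTER_MAX = 3         # largest value of a 2-bit counter


class DecayLine:
    """One cache line with a 2-bit decay counter (illustrative sketch)."""

    def __init__(self):
        self.counter = 0
        self.powered = True

    def access(self):
        """Return True on a hit; a decayed line misses and is re-powered."""
        hit = self.powered
        self.counter = 0
        self.powered = True
        return hit

    def tick(self):
        """Called once per global signal, i.e. every TICK_CYCLES cycles."""
        if not self.powered:
            return
        if self.counter == COUNTER_MAX:
            self.powered = False      # decay: gate off the supply voltage
        else:
            self.counter += 1


def adapt_interval(interval, quick_miss):
    """Double the decay period after a quick miss, otherwise halve it."""
    return interval * 2 if quick_miss else max(1, interval // 2)


line = DecayLine()
line.access()
for _ in range(3):
    line.tick()
powered_after_3_ticks = line.powered   # 7,500 cycles idle: still on
line.tick()
powered_after_4_ticks = line.powered   # 10,000 cycles idle: decayed
hit_after_decay = line.access()        # decay-induced miss
```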
Results
Overheads
Other Results
The L2 cache is equally suitable for decay techniques
– lifetimes are scaled by a factor of 10, and an extra miss also costs a lot more
For their experiments, there is little interference from multiprogramming
Some instructions can easily be identified as last touches to a cache block
– potential for early cache decay
Can this apply to the branch predictor or the register file?