1 Drowsy Caches: Simple Techniques for Reducing Leakage Power
Krisztián Flautner (krisztian.flautner@arm.com)
Nam Sung Kim (kimns@eecs.umich.edu)
Steve Martin (stevenmm@eecs.umich.edu)
David Blaauw (blaauw@eecs.umich.edu)
Trevor Mudge (tnm@eecs.umich.edu)
2 Motivation
On-chip caches are responsible for 15% to 20% of the total power.
Leakage power can exceed 50% of total cache power, according to our projection using Berkeley Predictive Models.
Leakage power keeps increasing as feature size shrinks:
–Vt scales down, causing an exponential increase in leakage power.
3 Processor power trends
Based on ITRS roadmap and transistor count estimates.
The total power in this projection is not achievable in practice.
4 An observation about data caches
L1 data caches.
Working set: the fraction of cache lines accessed in a time window.
Window size = 2000 cycles.
Only a small fraction of lines is accessed in a window.
Figure series: working set of the current window; working set of the current + 1, 8, and 32 previous windows.
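The working-set measurement described above can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function name, the `(cycle, line_index)` trace format, and the `history` parameter (number of previous windows merged in) are assumptions made for this example.

```python
def working_set_fractions(accesses, num_lines, window=2000, history=0):
    """Fraction of cache lines touched per window.

    accesses: iterable of (cycle, line_index) pairs (hypothetical trace format).
    history:  number of previous windows to merge with the current one.
    """
    touched = {}  # window index -> set of line indices accessed in it
    last = -1
    for cycle, line in accesses:
        w = cycle // window
        touched.setdefault(w, set()).add(line)
        last = max(last, w)

    fractions = []
    for w in range(last + 1):
        lines = set()
        # Union of the current window and up to `history` previous windows.
        for prev in range(max(0, w - history), w + 1):
            lines |= touched.get(prev, set())
        fractions.append(len(lines) / num_lines)
    return fractions
```

Calling it with `history=1`, `8`, or `32` corresponds to the "current + N previous windows" series.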
5 The Drowsy Cache approach
Optimize across the circuit-microarchitecture boundary:
–Using the appropriate circuit technique enables simplified microarchitectural control.
Requirement: state preservation in low-leakage mode.
Instead of being sophisticated about predicting the working set, reduce the penalty for being wrong.
Algorithm:
–Periodically put all lines in the cache into drowsy mode.
–When a line is accessed, wake it up.
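The algorithm above can be sketched as a toy model. The class name, the default window, and the one-cycle wake-up penalty are assumptions for illustration; the model tracks only per-line drowsy flags and ignores tags, data, and replacement.

```python
class DrowsyCache:
    """Toy model of the periodic policy: one drowsy flag per cache line."""

    def __init__(self, num_lines, window=4000, wake_penalty=1):
        self.drowsy = [False] * num_lines  # all lines start awake (assumption)
        self.window = window
        self.wake_penalty = wake_penalty
        self.next_sweep = window

    def access(self, cycle, line):
        """Return the extra cycles this access pays for waking the line."""
        # The global counter expired since the last access: every line went drowsy.
        if cycle >= self.next_sweep:
            self.drowsy = [True] * len(self.drowsy)
            self.next_sweep = (cycle // self.window + 1) * self.window
        if self.drowsy[line]:
            self.drowsy[line] = False  # wake the line; its state was preserved
            return self.wake_penalty
        return 0
```

No per-line counter or predictor appears anywhere in the model; the only state beyond the drowsy flags is the single global sweep deadline.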
6 Access control flow – Awake tags
Flowchart (awake tags):
–Hit: awake tag match → line wake-up → line access.
–Miss: awake tag miss → replacement (memory) → line wake-up.
A drowsy hit or miss adds at most 1 cycle of latency.
Access to an awake line is not penalized.
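The awake-tags flow can be sketched for a single cache set as below. The `Line` type, the function name, and the caller-supplied `victim` index are hypothetical; the sketch only counts the extra wake-up cycle and does not model the memory access itself.

```python
from dataclasses import dataclass


@dataclass
class Line:
    tag: int = -1        # -1 marks an invalid line (assumption for this sketch)
    drowsy: bool = False


def access_awake_tags(cache_set, tag, victim=0):
    """Return (hit, extra_cycles) for one access to one set.

    Tags are always awake, so the tag compare itself is never delayed;
    only waking a drowsy data line costs an extra cycle.
    """
    for line in cache_set:
        if line.tag == tag:
            extra = 1 if line.drowsy else 0
            line.drowsy = False
            return True, extra
    # Miss: replace the victim line, waking it first if it is drowsy.
    line = cache_set[victim]
    extra = 1 if line.drowsy else 0
    line.tag, line.drowsy = tag, False
    return False, extra
```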
7 Access control flow – Drowsy tags
The drowsy tags implementation is more complicated.
Is the complexity worth it?
–Tags use about 7% of data bits (32-bit address).
–Only a small incremental leakage reduction.
Worst case: 3 cycles of extra latency.
Flowchart (drowsy tags):
–Tag wake-up precedes the tag compare.
–Hit: awake tag match → line wake-up → line access.
–Miss: awake tag miss → replacement (memory) → line wake-up.
–Unneeded tags and lines go back to drowsy.
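The "about 7%" figure can be reproduced with a short calculation. The cache geometry used here (32 KB, 4-way, 32-byte lines) is an assumption chosen for illustration and is not stated on the slide; valid and dirty bits are ignored.

```python
import math


def tag_overhead(cache_bytes, line_bytes, ways, address_bits=32):
    """Tag bits per line as a fraction of data bits per line."""
    sets = cache_bytes // (line_bytes * ways)
    index_bits = int(math.log2(sets))         # bits selecting the set
    offset_bits = int(math.log2(line_bytes))  # bits selecting the byte in a line
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits / (line_bytes * 8)


# Assumed geometry: 256 sets, 19 tag bits, 256 data bits per line.
print(f"{tag_overhead(32 * 1024, 32, 4):.1%}")
```

With so small a share of the bits in the tags, putting them into drowsy mode can only recover a small additional share of leakage.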
8 Low-leakage circuit techniques
Circuit | Pros | Cons
Gated-VDD | Largest leakage reduction; fast mode switching; easy implementation | Loses cell state
ABB-MTCMOS | Retains cell state | Slow mode switching
DVS | Retains cell state; fast mode switching; more power reduction than ABB | More susceptible to SEU noise
9 Drowsy memory using DVS
Low supply voltage for inactive memory cells:
–Low voltage reduces leakage current too!
–Quadratic reduction in leakage power (P = I V).
Figure labels: leakage path; supply voltage for drowsy mode; supply voltage for normal mode.
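One way to read the quadratic claim is as a first-order argument: if leakage current scales roughly linearly with supply voltage, then P = I V scales with the square of the voltage. The linear-current approximation, the constant k, and the symbols V_drowsy and V_normal are assumptions introduced here for illustration; real devices may deviate from this simple model.

```latex
P_{\text{leak}} = I_{\text{leak}}\, V, \qquad
I_{\text{leak}} \approx k\, V
\;\Rightarrow\;
P_{\text{leak}} \approx k\, V^{2}, \qquad
\frac{P_{\text{leak}}(V_{\text{drowsy}})}{P_{\text{leak}}(V_{\text{normal}})}
\approx \left( \frac{V_{\text{drowsy}}}{V_{\text{normal}}} \right)^{2}
```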
10 Leakage reduction using DVS
High-Vt devices for access transistors:
–reduce leakage power;
–increase the access time of the cache.
Right trade-off point: 91% leakage reduction, 6% cycle time increase.
Projections for 0.07μm process.
11 Drowsy cache line architecture
12 Energy reduction
Projections for 0.07μm process.
High leakage: lines have to be powered up when accessed.
Drowsy circuit:
–Without high-Vt device (in SRAM): 6x leakage reduction, no access delay.
–With high-Vt device: 10x leakage reduction, 6% access time increase.
13 1 cycle vs. 2 cycle wake up
Fast wake-up is important, but easy to accomplish!
–Cache access time: 0.57ns (for 0.07μm from CACTI using 0.18μm baseline).
–Speed depends on voltage controller size: 64 x Leff: 0.28ns (half cycle at 4 GHz); 32 x Leff: 0.42ns; 16 x Leff: 0.77ns.
The impact of drowsy tags is quite similar to that of double-cycle wake-up.
14 Policy comparison
Figure series: simple 2000, simple 4000, noaccess 4000 (policy name and window size in cycles).
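The two policy names can be contrasted with a small sketch of what happens at the end of each window. The slide does not define the policies, so the reading used here is an assumption: "simple" puts every line into drowsy mode, while "noaccess" puts only the lines that were not accessed during the window into drowsy mode. The function name and the list-of-flags representation are hypothetical.

```python
def sweep(drowsy, accessed, policy):
    """Return the new per-line drowsy flags at the end of a window.

    drowsy:   current drowsy flag per line.
    accessed: whether each line was accessed during the window just ended.
    """
    if policy == "simple":
        # Every line goes drowsy, regardless of recent use.
        return [True] * len(drowsy)
    if policy == "noaccess":
        # Lines touched in this window stay as they are; untouched lines go drowsy.
        return [d or not a for d, a in zip(drowsy, accessed)]
    raise ValueError(f"unknown policy: {policy}")
```

Under this reading, "noaccess" needs one accessed bit per line, whereas "simple" needs only the global window counter.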
15 Energy reduction
Theoretical minimum assumes zero leakage in drowsy mode.
Total energy reduction is within 0.1 of the theoretical minimum:
–Diminishing returns for better leakage reduction techniques.
Configuration | Normalized total energy (DVS) | Normalized total energy (theoretical min.) | Normalized leakage energy (DVS) | Normalized leakage energy (theoretical min.) | Run-time increase
Awake tags | 0.46 | 0.35 | 0.29 | 0.15 | 0.41%
Drowsy tags | 0.42 | 0.31 | 0.24 | 0.09 | 0.84%
The figures in the table assume 6x leakage reduction; 10x is possible with a small additional run-time impact.
Summary: > 50% total energy reduction, > 70% leakage energy reduction.
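The summary claims can be checked against the reported numbers with a few lines of arithmetic. The values are transcribed from this slide; the dictionary keys and function names are hypothetical. `modeled_leakage` is an assumed simplification, not the authors' model: it treats the theoretical-minimum leakage as the share contributed by awake lines and divides the remainder by the 6x reduction factor, ignoring transition energy and run-time increase.

```python
# Values transcribed from the results table on this slide.
RESULTS = {
    "awake tags":  {"total": 0.46, "total_min": 0.35, "leak": 0.29, "leak_min": 0.15},
    "drowsy tags": {"total": 0.42, "total_min": 0.31, "leak": 0.24, "leak_min": 0.09},
}


def summarize(row):
    """Energy reductions and distance from the theoretical minimum."""
    return {
        "total_reduction": round(1 - row["total"], 2),
        "leak_reduction": round(1 - row["leak"], 2),
        "gap_to_min": round(row["total"] - row["total_min"], 2),
    }


def modeled_leakage(leak_min, reduction_factor=6.0):
    """Assumed model: awake share leaks fully, drowsy share leaks 1/factor."""
    awake_share = leak_min
    return awake_share + (1 - awake_share) / reduction_factor


for name, row in RESULTS.items():
    print(name, summarize(row), round(modeled_leakage(row["leak_min"]), 2))
```

Under these assumptions the simple model gives 0.29 and 0.24 after rounding, matching the reported DVS leakage columns to two decimals.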
16 Conclusions
Simple circuit technique:
–Needs high-Vt transistors and a low Vdd supply.
Simple architecture:
–No need to keep counter/predictor state for each line.
–Periodic global counter asserts the drowsy signal.
–Window size (for the periodic drowsy transition) depends on the core: ~4000 cycles has a good E-delay trade-off.
Technique also works well on in-order processors:
–Memory subsystem is already latency tolerant.
Drowsy circuit is good enough:
–Diminishing returns on further leakage reduction.
–Focus is again on dynamic energy.