1 Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines
Moinuddin K. Qureshi, M. Aater Suleman, Yale N. Patt
HPCA 2007


2 Introduction
• Caches are organized at line-size granularity
• This helps when spatial locality is high
• When spatial locality is low, many words in a line go unused
• Unused words occupy space without contributing to cache hits
• Filtering out unused words allows the cache to store more lines

3 Problem: Not all words are useful
• Setup: 1MB 8-way L2 cache; each 64B cache line is divided into 8 words of 8B each
• On average, fewer than 60% of the words in a line are used (4.7 of 8)
[Figure: average number of words used per line]
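The "words used per line" metric can be computed from a per-line footprint: an 8-bit mask with one bit per 8B word, set when that word is accessed. A minimal sketch, assuming footprints are available as plain integers; the function names and the sample footprints are hypothetical, not measured data from the talk.

```python
WORDS_PER_LINE = 8  # 64B line / 8B words


def words_used(footprint: int) -> int:
    """Number of used words in a line: the popcount of its 8-bit footprint."""
    return bin(footprint & 0xFF).count("1")


def average_usage(footprints: list[int]) -> tuple[float, float]:
    """Return (average words used per line, fraction of words used)."""
    if not footprints:
        raise ValueError("need at least one footprint")
    avg = sum(words_used(f) for f in footprints) / len(footprints)
    return avg, avg / WORDS_PER_LINE


# Hypothetical sample: one fully used line and one line that touched only words 0 and 7.
print(average_usage([0xFF, 0b1000_0001]))  # (5.0, 0.625)
```

Plugging in the slide's average gives 4.7 / 8 ≈ 0.59, which is where "fewer than 60%" comes from. In an idealized cache that stored only used words, the same space could hold roughly 8 / 4.7 ≈ 1.7 times as many lines, ignoring tag overhead.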

4 Goal: Improving cache performance
• A smaller line size can result in fewer unused words
• But a smaller line size degrades cache performance
  – A 32B line size increases MPKI for 14 of 16 benchmarks
  – Average MPKI increases by 25%
Insight: Word usage stabilizes as a line moves from the MRU to the LRU position
Goal: Improve cache performance by filtering unused words
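Results in this talk are reported as L2 MPKI (misses per kilo-instruction) and as percentage changes relative to a baseline. For reference, the usual definitions are given below; that the talk's percentages follow exactly this relative-change formula is an assumption.

```latex
\mathrm{MPKI} = \frac{\text{L2 misses}}{\text{instructions}} \times 1000
\qquad
\Delta_{\mathrm{MPKI}}\,(\%) = \frac{\mathrm{MPKI}_{\mathrm{new}} - \mathrm{MPKI}_{\mathrm{base}}}{\mathrm{MPKI}_{\mathrm{base}}} \times 100
```

An increase such as the 25% for 32B lines corresponds to a positive change; the reductions reported for LDIS correspond to negative changes.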

5 Insight
• Footprint: 8 bits per line (one per word) that track word usage
• Most footprint updates occur early in the recency stack
[Figure: maximum recency position (MRU, Pos 1 to Pos 6, LRU) reached before a footprint update; bar labels 78%, 5%, 6%, 11%]
Line Distillation (LDIS): evict the unused words of a line when it crosses a certain recency position
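The footprint and the distillation point can be sketched as a small software model. This is an illustration under stated assumptions, not the paper's hardware: one set is a Python list ordered MRU-first, `FootprintStack` and `distill_pos` are hypothetical names, and "distilling" a line here simply means recording which of its words were used once it is pushed past the chosen recency position.

```python
WORDS_PER_LINE = 8


class FootprintStack:
    """One cache set modeled as an LRU recency stack (index 0 = MRU).

    Each resident line carries an 8-bit footprint: bit i is set once word i is accessed.
    When a line is pushed past `distill_pos` (hypothetical parameter), only its used words are kept.
    """

    def __init__(self, distill_pos: int = 6):
        self.distill_pos = distill_pos                       # recency position where lines are distilled
        self.stack: list[int] = []                           # line tags, MRU first
        self.footprint: dict[int, int] = {}                  # tag -> 8-bit footprint
        self.distilled: list[tuple[int, list[int]]] = []     # (tag, indices of used words)

    def access(self, tag: int, word: int) -> None:
        if tag in self.footprint:
            self.stack.remove(tag)            # hit: pull the line out of its current position
        else:
            self.footprint[tag] = 0           # miss: a new line starts with an empty footprint
        self.stack.insert(0, tag)             # the accessed line becomes MRU
        self.footprint[tag] |= 1 << word      # record the word that was touched
        # Lines pushed beyond the distillation point keep only their used words.
        while len(self.stack) > self.distill_pos:
            victim = self.stack.pop()
            fp = self.footprint.pop(victim)
            used = [i for i in range(WORDS_PER_LINE) if (fp >> i) & 1]
            self.distilled.append((victim, used))
```

With `distill_pos=6`, crossing the distillation point corresponds to leaving the 6 LOC ways used in the evaluation configuration.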

6 Outline
• Background
• Line Distillation
• Experimental Evaluation
• Interaction with Compression
• Related Work and Summary

7 Framework for LDIS
[Figure: a processor with an ICACHE and a DCACHE above the L2 Distill Cache; the Distill Cache consists of a Line Organized Cache (LOC) and a Word Organized Cache (WOC); other labels: footprint, valid bits (sectored), line from memory]

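8 Distill Cache (Operation)
[Figure: a traditional 4-way cache compared with a distill cache split into LOC and WOC, each ordered from MRU to LRU; labels include lines B, A, C, D and the distilled words A0, A7]
Four cases:
1. Cache miss (access to line D): install line D in the LOC and update the LRU state. The line evicted from the LOC (line A) is distilled: its unused words A[1:6] are evicted, and its used words A0 and A7 are installed in the WOC.
2. LOC hit (access to line B): same as a traditional cache.
3. WOC hit (access to line A, word A0): send A0 and A7 to the L1, along with valid bits.
4. Hole miss (access to line A, word A1): invalidate all words of A in the WOC, fetch A from memory, and install it in the LOC.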
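The four cases can be modeled for a single set. This is a simplified sketch, not the paper's design: `DistillSet` is a hypothetical name, the footprint is updated only on LOC hits, and when the WOC overflows it simply evicts its oldest distilled lines (the talk's actual WOC replacement is variable-sized and appears in the backup slides).

```python
from collections import OrderedDict

WORDS_PER_LINE = 8


class DistillSet:
    """One set of a distill cache: an LRU-managed LOC plus a word-granular WOC."""

    def __init__(self, loc_ways: int = 6, woc_ways: int = 2):
        self.loc_ways = loc_ways
        self.woc_capacity = woc_ways * WORDS_PER_LINE   # WOC capacity in words
        self.loc = OrderedDict()  # tag -> footprint bitmask; last item is MRU
        self.woc = OrderedDict()  # tag -> tuple of used word indices; first item is oldest

    def access(self, tag: int, word: int) -> str:
        if tag in self.loc:                       # case 2: LOC hit, same as a traditional cache
            self.loc.move_to_end(tag)
            self.loc[tag] |= 1 << word
            return "loc_hit"
        if tag in self.woc:
            if word in self.woc[tag]:             # case 3: WOC hit, stored words + valid bits go to L1
                return "woc_hit"
            del self.woc[tag]                     # case 4: hole miss, invalidate the line's words in WOC
            self._install(tag, word)              # ... then refetch the full line into the LOC
            return "hole_miss"
        self._install(tag, word)                  # case 1: cache miss
        return "miss"

    def _install(self, tag: int, word: int) -> None:
        self.loc[tag] = 1 << word                 # new line enters at MRU
        if len(self.loc) > self.loc_ways:
            victim, footprint = self.loc.popitem(last=False)   # LRU line leaves the LOC
            used = tuple(i for i in range(WORDS_PER_LINE) if (footprint >> i) & 1)
            self._install_woc(victim, used)       # only the used words are kept

    def _install_woc(self, tag: int, used: tuple) -> None:
        self.woc[tag] = used
        while sum(len(w) for w in self.woc.values()) > self.woc_capacity:
            self.woc.popitem(last=False)          # simplification: evict the oldest distilled line
```

With `loc_ways=3, woc_ways=1` the model mirrors the 4-way example above: after line A (words 0 and 7 used), B, and C are brought in, an access to D returns `"miss"` and leaves `(0, 7)` in the WOC for A; a later access to A0 is a WOC hit and one to A1 is a hole miss.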

9 Median Threshold Filtering
• A line with many used words can evict several lines from the WOC
  – Example: the WOC holds eight single-word lines (A0, B0, ..., H0); installing line X, which has all 8 words (X0 to X7) used, evicts all 8 lines
• Increase the number of lines in the WOC by not installing lines whose used-word count exceeds a threshold K
• K = median number of words used per LOC line (computed at runtime)
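One way to compute K at runtime is to keep a small histogram of used-word counts and read the median off its running sum. The histogram approach, the name `MedianThreshold`, taking the lower median, and observing lines as they leave the LOC are assumptions made for illustration; the slide only states that K is the median number of words used in LOC lines.

```python
WORDS_PER_LINE = 8


class MedianThreshold:
    """Tracks used-word counts of LOC lines and filters WOC installs by K = median."""

    def __init__(self):
        # histogram[n] = number of observed lines with exactly n used words (n = 0..8)
        self.histogram = [0] * (WORDS_PER_LINE + 1)

    def observe(self, used_words: int) -> None:
        self.histogram[used_words] += 1

    def threshold(self) -> int:
        """K = (lower) median of the observed used-word counts."""
        total = sum(self.histogram)
        if total == 0:
            return WORDS_PER_LINE          # no data yet: do not filter anything
        target = (total + 1) // 2          # rank of the lower median
        seen = 0
        for n, count in enumerate(self.histogram):
            seen += count
            if seen >= target:
                return n
        return WORDS_PER_LINE

    def should_install_in_woc(self, used_words: int) -> bool:
        """Install a distilled line only if its used-word count does not exceed K."""
        return used_words <= self.threshold()
```

A histogram with only nine counters keeps the median computation cheap, which is why it is used for this sketch.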

10 Outline
• Background
• Line Distillation
• Experimental Evaluation
• Interaction with Compression
• Related Work and Summary

11 Methodology
• Configuration:
  – L2 cache: 1MB, 8-way, 64B line size (the distill cache gives 6 ways to the LOC and 2 ways to the WOC)
  – Out-of-order processor with 16KB 2-way L1s
  – 400-cycle memory
• Benchmarks: 15 SPEC2K benchmarks + health from the Olden suite (a 250M-instruction slice chosen using SimPoint for SPEC2K)
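A little arithmetic shows what the 6 + 2 way split means for one set under this configuration. A sketch only: `distill_geometry` is a hypothetical helper, and the "max lines" figure is an upper bound reached only if every distilled line keeps a single word.

```python
KB = 1024
MB = 1024 * KB


def distill_geometry(size_bytes: int, ways: int, line_bytes: int,
                     loc_ways: int, word_bytes: int = 8) -> dict[str, int]:
    """Per-set capacity of a distill cache that splits `ways` into LOC and WOC ways."""
    sets = size_bytes // (line_bytes * ways)
    words_per_line = line_bytes // word_bytes
    woc_ways = ways - loc_ways
    woc_words = woc_ways * words_per_line      # WOC capacity per set, in words
    return {
        "sets": sets,
        "loc_lines_per_set": loc_ways,
        "woc_words_per_set": woc_words,
        # Upper bound: every distilled line keeps exactly one word.
        "max_lines_per_set": loc_ways + woc_words,
    }


print(distill_geometry(1 * MB, ways=8, line_bytes=64, loc_ways=6))
# {'sets': 2048, 'loc_lines_per_set': 6, 'woc_words_per_set': 16, 'max_lines_per_set': 22}
```

A traditional set holds 8 lines; a distill cache set holds 6 full lines plus anywhere from 2 distilled lines (if each keeps all 8 words) to 16 (if each keeps one word).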

12 Results
[Figure: % reduction in L2 MPKI for LDIS without MT and LDIS with MT (MT = median threshold filtering)]
• LDIS (MT) reduces MPKI by 25%

13 Reverter Circuit (RC)
• Tournament selection: distill cache vs. traditional cache
• Dynamic set sampling with 32 sets [Qureshi+ ISCA’06]
• Storage overhead of the ATD: 1KB
[Figure: an LRU auxiliary tag directory (ATD-LRU) and the distill cache; the sampled sets (B, E, G) update a saturating counter SCTR (+/−); the remaining sets (A, C, D, F, H) follow SCTR]
For sets A, C, D, F, H: if SCTR > 75%, enable LDIS; if SCTR < 25%, disable LDIS
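The SCTR decision rule can be sketched as a saturating counter with hysteresis. Several details are assumptions, not from the slide: the counter width (`bits`), its starting value, LDIS starting enabled, and the sign convention (a miss in the ATD-LRU sampled sets counts in favor of LDIS, a miss in the distill cache's sampled sets counts against it). `ReverterCircuit` is a hypothetical name.

```python
class ReverterCircuit:
    """Saturating counter (SCTR) choosing between LDIS and a traditional LRU cache."""

    def __init__(self, bits: int = 10):
        self.max = (1 << bits) - 1
        self.sctr = self.max // 2      # assumption: start in the middle (no preference)
        self.ldis_enabled = True       # assumption: start with LDIS enabled

    def on_sampled_miss(self, in_atd_lru: bool) -> None:
        """Update SCTR on a miss in a sampled set (sets B, E, G on the slide).

        Assumed convention: an ATD-LRU miss means LDIS would have done better (+1);
        a distill cache miss means LRU would have done better (-1).
        """
        if in_atd_lru:
            self.sctr = min(self.sctr + 1, self.max)
        else:
            self.sctr = max(self.sctr - 1, 0)
        self._update_policy()

    def _update_policy(self) -> None:
        # Hysteresis: flip only when the counter is clearly on one side.
        if self.sctr > 0.75 * self.max:
            self.ldis_enabled = True
        elif self.sctr < 0.25 * self.max:
            self.ldis_enabled = False
        # Between 25% and 75% the current decision is kept for the follower sets.
```

The gap between the 25% and 75% thresholds keeps the follower sets from toggling LDIS on every few misses when the two policies perform similarly.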

14 Results with RC
[Figure: % reduction in L2 MPKI for LDIS (MT, no RC) and LDIS (MT, RC)]
• RC disables LDIS when it increases MPKI
• LDIS (MT, RC) reduces MPKI by 30%

15 Overheads
• Storage: tags for the WOC + footprint bits: 12.2% overhead
• Latency: tag access (LOC + WOC) increases by one cycle; WOC hits incur two cycles to rearrange words
• Power: additional power for the WOC tag store

16 IPC Results
[Figure: % IPC improvement]
• LDIS improves average IPC by 12%

17 Outline
• Background
• Line Distillation
• Experimental Evaluation
• Interaction with Compression
• Related Work and Summary

18 Compression vs. LDIS
• Several proposals increase capacity via compression
• Compression and LDIS are fundamentally different
  – Compression exploits redundancy in the stored data
  – LDIS exploits unused words to create spare capacity
• Footprint Aware Compression (FAC) combines both
  – FAC compresses the used words before installing them in the WOC

19 Results for FAC
[Figure: % reduction in L2 MPKI (axis 0 to 50) for LDIS, compression, and FAC]
• Compression and LDIS interact positively: FAC reduces MPKI by 50%

20 Outline
• Background
• Line Distillation
• Experimental Evaluation
• Interaction with Compression
• Related Work and Summary

21 Related Work
• Spatial-Temporal Cache: Gonzalez+ [ICS’95]
• Spatial Locality Prediction: Johnson+ [ISCA’97]
• Variable Line Size Cache: Veidenbaum+ [ICS’99]
• Spatial Footprint Prediction: Kumar+ [ISCA’98], Pujara+ [HPCA’06]
• Spatial Pattern Prediction: Chen+ [HPCA’05]
LDIS is particularly suited to large caches, and it outperforms predictor-based techniques without requiring a separate structure to track the spatial footprint.

22 Contributions
• Line Distillation: filters unused words without a separate footprint predictor
• Distill cache: utilizes the extra capacity created by LDIS
• Median Threshold Filtering and Reverter Circuit: improve the performance and robustness of LDIS
  – Result: LDIS (MT + RC) reduces MPKI by 30%
• Footprint Aware Compression: LDIS + compression
  – Result: FAC reduces MPKI by 50%

23 Questions

24 Result comparing capacity

25 Line Size vs. MPKI

26 Distribution of Hit-Miss

27 Average word usage (detailed)

28 Result for 3 types of LDIS

29 Replacement
• LRU in the LOC
• The WOC needs variable-sized replacement
  – Only power-of-two sizes are allowed in the WOC
  – Placement is constrained to an alignment boundary
  – Random selection in case of multiple candidates

30 Background (pictorial)

31 Result LDIS vs. FAC (detailed)

32 Comparison with SFP

33 Appendix A: Other SPEC Benchmarks

34 Appendix B: Cache Size vs. Density

35 Summary
• Many words in cache lines remain unused
• Unused words are unlikely to be accessed in the less recent part of the LRU stack
• Line Distillation (LDIS)
  – The distill cache utilizes the extra capacity created by LDIS
  – LDIS reduces MPKI by 30% and improves IPC by 12%
• “Footprint Aware Compression” combines LDIS and compression to reduce MPKI by 50%

