Published by Ashtyn Alles. Modified over 10 years ago.
1 Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines
Moinuddin K. Qureshi, M. Aater Suleman, Yale N. Patt
HPCA 2007
2 Introduction
Caches are organized at line-size granularity.
This helps when spatial locality is high.
When spatial locality is low, many words in a line go unused.
Unused words occupy space without contributing to cache hits.
Filtering unused words allows the cache to store more cache lines.
3 Problem: Not all words are useful
A cache line (64B) is divided into 8 words of 8B each (1 MB 8-way L2 cache).
On average, fewer than 60% of the words are used (4.7 of 8, about 59%).
[Chart: words used per line (avg)]
4 Goal: Improving cache performance
A smaller line size can result in fewer unused words, but it degrades cache performance.
A line size of 32B increases MPKI for 14 of 16 benchmarks; average MPKI increases by 25%.
Insight: Word usage stabilizes as a line traverses from MRU to LRU.
Goal: Improve cache performance by filtering unused words.
5 Insight
Footprint = 8 bits per line that track word usage.
Most footprint updates occur early in the recency stack.
[Chart: max recency position before footprint update, over the recency stack (MRU, Pos 1 to Pos 6, LRU); bar labels 78%, 5%, 6%, 11%]
Line Distillation (LDIS): Evict unused words when a line crosses a certain recency position.
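The footprint idea can be illustrated with a minimal sketch, assuming one bit per 8B word of a 64B line. The function names `touch` and `used_words` are hypothetical and not taken from the talk.

```python
WORDS_PER_LINE = 8   # 64B line split into 8B words, as on the slides
WORD_BYTES = 8

def touch(footprint, byte_offset):
    """Return the footprint with the bit for the accessed word set."""
    word = (byte_offset // WORD_BYTES) % WORDS_PER_LINE
    return footprint | (1 << word)

def used_words(footprint):
    """List the word indices whose footprint bit is set."""
    return [w for w in range(WORDS_PER_LINE) if footprint & (1 << w)]

# Two accesses fall in word 0 and one in word 7; the other six words stay unused.
fp = 0
for offset in (0, 4, 56):
    fp = touch(fp, offset)
```

When the line leaves the recency positions where footprint updates are common, only the words listed by `used_words(fp)` would be worth keeping.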
6 Outline
Background
Line Distillation
Experimental Evaluation
Interaction with Compression
Related Work and Summary
7 Framework for LDIS
[Diagram labels: PROCESSOR; ICACHE; DCACHE; valid bits (sectored); footprint; L2 Cache = Distill Cache, made of the LOC (Line Organized Cache) and the WOC (Word Organized Cache); line from memory]
8 Distill Cache (Operation)
[Diagram: a traditional 4-way cache next to a Distill cache split into LOC and WOC; the LOC holds lines A, B, C with the MRU and LRU ends marked]
Four cases:
1. Cache Miss (access to line D): Install line D in the LOC and update LRU state. The evicted line A is checked for used words: A0 and A7 were used, so A[1:6] are evicted and A0, A7 are installed in the WOC.
2. LOC Hit (access to line B): Same as a traditional cache.
3. WOC Hit (access to line A, word A0): Send A0 and A7 to L1 along with the valid bits.
4. Hole Miss (access to line A, word A1): Invalidate all words of A in the WOC. Fetch A from memory and install it in the LOC.
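The four cases can be traced with a small model of a single cache set. This is a sketch under stated assumptions, not the authors' implementation: the class `DistillSet`, its parameters, and the oldest-first eviction in the WOC are all hypothetical simplifications.

```python
from collections import OrderedDict

class DistillSet:
    """One set of a distill cache: LOC keeps whole lines, WOC keeps used words."""

    def __init__(self, loc_ways=3, woc_words=8):
        self.loc_ways = loc_ways
        self.woc_words = woc_words
        self.loc = OrderedDict()  # line -> set of used words; last item is MRU
        self.woc = OrderedDict()  # line -> set of kept words; first item is oldest

    def _distill(self, line, used):
        # Assumption: make room by dropping the oldest WOC lines first.
        while self.woc and sum(len(w) for w in self.woc.values()) + len(used) > self.woc_words:
            self.woc.popitem(last=False)
        if len(used) <= self.woc_words:
            self.woc[line] = set(used)

    def _install_in_loc(self, line, word):
        if len(self.loc) == self.loc_ways:
            victim, used = self.loc.popitem(last=False)  # evict the LRU line
            self._distill(victim, used)                  # keep only its used words
        self.loc[line] = {word}

    def access(self, line, word):
        if line in self.loc:                 # case 2: LOC hit
            self.loc.move_to_end(line)
            self.loc[line].add(word)
            return "loc_hit"
        if line in self.woc:
            if word in self.woc[line]:       # case 3: WOC hit
                return "woc_hit"
            del self.woc[line]               # case 4: hole miss, invalidate A in WOC
            self._install_in_loc(line, word)
            return "hole_miss"
        self._install_in_loc(line, word)     # case 1: cache miss
        return "miss"

# Replay the slide's example: A0 and A7 are used before line D pushes A out.
s = DistillSet()
trace = [("A", 0), ("A", 7), ("C", 0), ("B", 0),
         ("D", 0), ("B", 0), ("A", 0), ("A", 1)]
results = [s.access(line, word) for line, word in trace]
```

The last four accesses reproduce the four cases in order: a miss on D, a LOC hit on B, a WOC hit on A0, and a hole miss on A1.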
9 Median Threshold Filtering
A line with many used words can evict several lines from the WOC.
Example: the WOC holds one word from each of 8 lines (A0, B0, C0, D0, E0, F0, G0, H0). Line X has all 8 words used (X0 to X7), so installing it evicts 8 lines from the WOC.
Increase the number of lines in the WOC by not installing lines whose used-word count exceeds a threshold “K”.
K = median number of words used per LOC line (computed at runtime).
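A minimal sketch of the filter follows. The slide does not say how the median is computed at runtime, so taking `statistics.median_low` over a list of per-line used-word counts is an assumption, and both function names are hypothetical.

```python
from statistics import median_low

def median_threshold(used_counts):
    """K = median number of used words among LOC lines (hypothetical helper)."""
    return median_low(used_counts)

def should_install_in_woc(used, k):
    """Skip lines whose used-word count exceeds the threshold K."""
    return used <= k

# Used-word counts observed for lines in the LOC (made-up illustration values).
k = median_threshold([1, 2, 2, 8, 8, 3, 1])
```

With these counts, a line with 2 used words would be installed in the WOC, while a fully used line like X (8 words) would be dropped instead of evicting many smaller entries.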
10 Outline
Background
Line Distillation
Experimental Evaluation
Interaction with Compression
Related Work and Summary
11 Methodology
Configuration:
L2 cache: 1MB, 8-way, 64B line size (the Distill cache gives 6 ways to the LOC and 2 ways to the WOC)
Out-of-order processor with 16KB 2-way L1s
400-cycle memory
Benchmarks:
15 SPEC2K benchmarks + health from the Olden suite (a 250M-instruction slice selected using SimPoint for SPEC2K)
12 Results
[Chart: (%) reduction in L2 MPKI for LDIS (No MT) and LDIS (with MT)]
LDIS (MT) reduces MPKI by 25%.
13 Reverter Circuit (RC)
Tournament selection: Distill cache vs. traditional cache.
Dynamic set sampling with 32 sets [Qureshi+ ISCA’06].
[Diagram: an ATD-LRU holding the sampled sets B, E, G next to the Distill cache with sets A to H; the sampled sets drive the counter SCTR (- / +)]
For sets A, C, D, F, H: if (SCTR > 75%) enable LDIS; if (SCTR < 25%) disable LDIS.
(Storage overhead of ATD: 1KB)
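The enable/disable rule can be sketched as a saturating counter with hysteresis. Several details here are assumptions, not stated on the slide: the counter width, the starting value, the initial LDIS state, and the direction of the updates (a miss in the ATD-LRU counts up, a miss in the sampled distill-cache sets counts down). The class name `ReverterCircuit` is hypothetical.

```python
class ReverterCircuit:
    """Saturating counter that switches LDIS on above 75% and off below 25%."""

    def __init__(self, bits=10):
        self.max = (1 << bits) - 1
        self.sctr = self.max // 2    # start mid-range (assumption)
        self.ldis_enabled = True     # initial state (assumption)

    def record_miss(self, in_atd_lru):
        # Assumed direction: ATD-LRU misses favor LDIS, distill misses count against it.
        if in_atd_lru:
            self.sctr = min(self.max, self.sctr + 1)
        else:
            self.sctr = max(0, self.sctr - 1)
        if self.sctr > 0.75 * self.max:
            self.ldis_enabled = True
        elif self.sctr < 0.25 * self.max:
            self.ldis_enabled = False
        return self.ldis_enabled     # unchanged between 25% and 75%

# A 4-bit counter (0..15) keeps the example short.
rc = ReverterCircuit(bits=4)
for _ in range(4):                   # distill-cache sampled sets miss repeatedly
    rc.record_miss(in_atd_lru=False)
disabled_after_distill_misses = not rc.ldis_enabled
```

Between the two thresholds the previous decision is kept, so a short burst of misses on either side does not flip the policy for the non-sampled sets.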
14 Results with RC
[Chart: (%) reduction in L2 MPKI for LDIS (MT, No RC) and LDIS (MT, RC)]
RC disables LDIS when it increases MPKI.
LDIS (MT, RC) reduces MPKI by 30%.
15 Overheads
Storage: Tags for WOC + footprint bits: 12.2% overhead
Latency: Tag access (LOC+WOC) increases by one cycle; WOC hits incur two cycles to rearrange words
Power: Additional power of the WOC tag store
16 IPC Results
[Chart: (%) IPC improvement]
LDIS improves average IPC by 12%.
17 Outline
Background
Line Distillation
Experimental Evaluation
Interaction with Compression
Related Work and Summary
18 Compression vs. LDIS
Several proposals increase cache capacity via compression.
Compression and LDIS are fundamentally different:
Compression exploits redundancy in stored data.
LDIS leverages unused words for spare capacity.
Footprint Aware Compression (FAC) combines both.
FAC compresses used words before installing them in the WOC.
19 Results for FAC
[Chart: (%) reduction in L2 MPKI for LDIS, Compression, and FAC]
Compression and LDIS interact positively.
FAC reduces MPKI by 50%.
20 Outline
Background
Line Distillation
Experimental Evaluation
Interaction with Compression
Related Work and Summary
21 Related Work
Spatial-Temporal Cache - Gonzales+ [ICS’95]
Spatial Locality Prediction - Johnson+ [ISCA’97]
Variable Linesize Cache - Veidenbaum+ [ICS’99]
Spatial Footprint Prediction - Kumar+ [ISCA’98], Pujara+ [HPCA’06]
Spatial Pattern Prediction - Chen+ [HPCA’05]
LDIS is particularly suited for large caches and outperforms predictor-based techniques without requiring a separate structure for tracking the spatial footprint.
22 Contributions
Line Distillation: Filter unused words without a separate footprint predictor.
Distill cache: Utilize the extra capacity created by LDIS.
Median Threshold Filtering and Reverter Circuit: Improve the performance and robustness of LDIS.
Result: LDIS (MT+RC) reduces MPKI by 30%.
Footprint Aware Compression: LDIS + compression.
Result: FAC reduces MPKI by 50%.
23 Questions
24 Result comparing capacity
25 Line Size vs. MPKI
26 Distribution of Hit-Miss
27 Average words usage (detailed)
28 Result for 3 types of LDIS
29 Replacement
The LOC uses LRU replacement.
The WOC needs variable-sized replacement:
Only power-of-two sizes are allowed in the WOC.
Placement is constrained to an alignment boundary.
Random selection is used when there are multiple candidates.
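One way to read the WOC placement rules is sketched below. It assumes that a line's used words are rounded up to the next power of two, that a WOC way holds 8 words, and that a slot must start at a multiple of its size. All names are hypothetical, and the talk may implement the constraints differently.

```python
import random

WOC_WAY_WORDS = 8   # assumption: one WOC way holds as many words as a 64B line

def slot_size(used_words):
    """Round the number of used words up to a power of two."""
    size = 1
    while size < used_words:
        size *= 2
    return size

def candidate_slots(used_words):
    """All (start, size) slots in one way that respect the alignment boundary."""
    size = slot_size(used_words)
    return [(start, size) for start in range(0, WOC_WAY_WORDS, size)]

def pick_slot(used_words, rng=random):
    """Choose randomly among the aligned candidates, as the slide describes."""
    return rng.choice(candidate_slots(used_words))
```

Under these assumptions, a line with 3 used words takes a 4-word slot that can start only at word 0 or word 4, and whatever currently occupies the chosen slot is the replacement victim.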
30 Background (pictorial)
31 Result LDIS vs. FAC (detailed)
32 Comparison with SFP
33 Appendix A: Other SPEC Benchmarks
34 Appendix B: Cache Size vs. Density
35 Summary
Many words in cache lines remain unused.
Unused words are unlikely to be accessed in the less recent part of the LRU stack.
Line Distillation (LDIS):
The Distill cache utilizes the extra capacity created by LDIS.
LDIS reduces MPKI by 30% and improves IPC by 12%.
“Footprint Aware Compression” combines LDIS and compression to reduce MPKI by 50%.