Adaptive Subset Based Replacement Policy for High Performance Caching
Liqiang He, Yan Sun, Chaozhong Zhang
College of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia, P. R. China
JWAC-1: Cache Replacement Championship, ISCA-2010
Background
The cache replacement policy plays an important role in cache design, and LRU is widely used in today's microprocessors. However, the last-level cache (LLC) sees poor temporal locality because the L1 caches have already filtered it out, and LRU thrashes whenever the working set is larger than the cache.
Possible solutions
If the working set is larger than the cache, retain part of the working set in the cache [Qureshi et al., ISCA'07], and record part of a longer cache access history.
How do we do it? We group the blocks of a cache set and keep part of the access history in each group. The idea is inspired by Pierre's thread-migration paper at HPCA'04.
[Figure: an L2 cache with cores C0, C1, ..., Cn and groups g0, g1, ..., gn]
Overview
Proposal: Subset Based Replacement Policy (SRP). SRP successfully reduces misses by retaining part of a longer access history in the groups. However, a static SRP does not suit all programs. To adapt to the diversity of programs and to behavior changes within a program, we propose the Adaptive SRP policy (ASRP). ASRP obtains a geometric-mean miss reduction of 4.5% over LRU.
Outline
Introduction
Static Subset Based Replacement Policy
Adaptive Subset Based Replacement Policy
Summary
Static Subset Based Replacement Policy
Each cache set is divided into subsets, and each subset maintains its own local LRU stack. One subset is active and accepts insertions; the other subsets are non-active and accept no new blocks.
Insertion scheme in SRP
Insertion occurs only in the active subset. The victim is the block at the LRU position of the active subset, and the incoming block is NOT promoted to MRU: it takes the LRU position. Example (MRU to LRU): the active subset holds a b c d; a reference to 'i' evicts d and yields a b c i.
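A minimal Python sketch of this insertion rule, assuming a subset is stored as a list ordered MRU to LRU with `None` marking an empty way. The function name `insert_on_miss` and the rule that empty ways are filled first (starting from the LRU end) are assumptions, not part of the slides.

```python
from typing import List, Optional


def insert_on_miss(slots: List[Optional[str]], block: str) -> Optional[str]:
    """Insert `block` into the active subset without promoting it to MRU.

    `slots` is the subset's local LRU stack: index 0 = MRU, last index = LRU.
    Assumption: an empty (None) way is used before any valid block is
    evicted, taking the empty slot closest to the LRU end.
    Returns the evicted block, or None if an empty way was filled.
    """
    empty = [i for i, b in enumerate(slots) if b is None]
    victim_pos = empty[-1] if empty else len(slots) - 1  # LRU position when full
    evicted = slots[victim_pos]
    slots[victim_pos] = block  # new block stays at the victim's position (no MRU promotion)
    return evicted


subset = ["a", "b", "c", "d"]  # MRU -> LRU, as on the slide
evicted = insert_on_miss(subset, "i")
print(subset, evicted)  # ['a', 'b', 'c', 'i'] d
```

Because the new block sits at LRU, a block that is never reused is the next victim in its subset, so one-shot blocks cannot push out the rest of the subset.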
Operation on cache hit in SRP
On a hit in any subset, active or non-active, the block moves to the MRU position of its own local LRU stack. Example (MRU to LRU): the subset holds a b c d; a reference to 'c' yields c a b d.
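The hit operation can be sketched in the same list representation (MRU first). `on_hit` is a hypothetical name; only the subset that contains the block is reordered.

```python
from typing import List


def on_hit(slots: List[str], block: str) -> None:
    """Move a hit block to the MRU position (index 0) of its own subset.

    Works the same for active and non-active subsets; other subsets are untouched.
    """
    pos = slots.index(block)
    slots.insert(0, slots.pop(pos))  # blocks above the old position shift one step toward LRU


subset = ["a", "b", "c", "d"]  # MRU -> LRU
on_hit(subset, "c")
print(subset)  # ['c', 'a', 'b', 'd']
```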
Changing the active subset
When the number of misses in a set exceeds a threshold X, the set changes its active subset. Consequently:
A. X consecutive misses replace only blocks in the active subset.
B. With N subsets, a subset becomes active again ONLY after (N-1)*X misses.
C. The larger X is, the longer blocks in non-active subsets can stay in the set.
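A sketch of the per-set miss counter, assuming the active subset advances round-robin and the switch happens once X misses have gone to the current subset, so each subset receives exactly X consecutive misses (point A). `ActiveSubsetSelector` and `subset_per_miss` are hypothetical names.

```python
from typing import List


class ActiveSubsetSelector:
    """Per-set miss counter that rotates the active subset every X misses.

    Assumption: subsets are activated in round-robin order; the slide only
    states that the active subset changes.
    """

    def __init__(self, n_subsets: int, x: int) -> None:
        self.n_subsets = n_subsets
        self.x = x
        self.active = 0
        self.misses = 0

    def on_miss(self) -> int:
        """Return the subset that receives this miss, then advance the counter."""
        target = self.active
        self.misses += 1
        if self.misses == self.x:  # X misses done: switch to the next subset
            self.active = (self.active + 1) % self.n_subsets
            self.misses = 0
        return target


def subset_per_miss(n_subsets: int, x: int, n_misses: int) -> List[int]:
    sel = ActiveSubsetSelector(n_subsets, x)
    return [sel.on_miss() for _ in range(n_misses)]


seq = subset_per_miss(n_subsets=4, x=6, n_misses=30)
print(seq[:7])  # [0, 0, 0, 0, 0, 0, 1]
# Subset 0 takes misses 0..5 and is active again only at miss 24,
# i.e. after (N-1)*X = 3*6 = 18 misses have gone to the other subsets.
print(seq.index(0, 6))  # 24
```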
Thrashing access pattern in SRP
Assume the working set is 24 blocks (b1, b2, ..., b24), the LLC is 16-way with 4 subsets of 4 blocks each, and X = 6.
[Figure: subset 0 holds b1 b2 b3 b4 (LRU to MRU) after the first four misses and b6 b2 b3 b4 after b5 and b6 are inserted at its LRU position; the following misses b7, ..., b12 go to subset 1]
After one pass, the blocks in the set under SRP are: b2 b3 b4 b6 b8 b9 b10 b12 b14 b15 b16 b18 b20 b21 b22 b24.
Under LRU they are: b9, ..., b24.
When b2, b3, b4, b6, b8 are accessed again, SRP hits but LRU misses.
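The example can be replayed with a small simulation. This is a sketch under stated assumptions, not the authors' code: each subset is a list ordered MRU to LRU, empty ways are filled starting from the LRU end, inserted blocks are not promoted, hits move a block to its local MRU, and the active subset advances round-robin after every X misses. Under these assumptions it reproduces the block lists on this slide. `SRPSet` and `lru_contents` are hypothetical names.

```python
from typing import List, Optional


class SRPSet:
    """Minimal model of one SRP cache set (sketch under the assumptions above)."""

    def __init__(self, n_subsets: int, ways_per_subset: int, x: int) -> None:
        self.subsets: List[List[Optional[int]]] = [
            [None] * ways_per_subset for _ in range(n_subsets)
        ]
        self.x = x
        self.active = 0
        self.misses = 0

    def access(self, block: int) -> bool:
        """Return True on a hit, False on a miss."""
        for stack in self.subsets:  # a hit can occur in any subset
            if block in stack:
                stack.insert(0, stack.pop(stack.index(block)))  # local MRU
                return True
        stack = self.subsets[self.active]  # a miss inserts into the active subset only
        empty = [i for i, b in enumerate(stack) if b is None]
        pos = empty[-1] if empty else len(stack) - 1
        stack[pos] = block  # no promotion to MRU
        self.misses += 1
        if self.misses == self.x:  # X misses done: rotate the active subset
            self.active = (self.active + 1) % len(self.subsets)
            self.misses = 0
        return False

    def contents(self) -> List[int]:
        return sorted(b for s in self.subsets for b in s if b is not None)


def lru_contents(trace: List[int], ways: int) -> List[int]:
    """Blocks left in a plain LRU set of `ways` ways after running `trace`."""
    stack: List[int] = []  # MRU at index 0
    for b in trace:
        if b in stack:
            stack.remove(b)
        stack.insert(0, b)
        del stack[ways:]  # evict anything beyond the associativity
    return sorted(stack)


trace = list(range(1, 25))  # b1 .. b24, one pass
srp = SRPSet(n_subsets=4, ways_per_subset=4, x=6)
for b in trace:
    srp.access(b)
print(srp.contents())
# [2, 3, 4, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20, 21, 22, 24]
print(lru_contents(trace, ways=16) == list(range(9, 25)))  # True: LRU keeps b9 .. b24
print([srp.access(b) for b in [2, 3, 4, 6, 8]])  # [True, True, True, True, True]
```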
Case study of a thrashing workload
Different static thresholds differ in how much they reduce misses.
Hardware implementation
[Figure: hardware organization of the subset LRU stacks, MRU to LRU]
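The figure for this slide is not recoverable. One common way to hold a local LRU stack in hardware, shown here purely as an assumption, is a per-way age counter: with 4 ways per subset each way needs 2 bits, a hit resets that way's age to 0 and ages every younger way by one, and the victim is the way with the maximum age. Because SRP does not promote inserted blocks, an insertion can simply leave the victim's age unchanged. `touch`, `victim`, and `WAYS_PER_SUBSET` are hypothetical names.

```python
from typing import List

WAYS_PER_SUBSET = 4  # assumption: 4 ways per subset, so ages fit in 2 bits


def touch(ages: List[int], way: int) -> None:
    """Hit: make `way` the local MRU (age 0); every way younger than it ages by one."""
    old = ages[way]
    for w, a in enumerate(ages):
        if a < old:
            ages[w] = a + 1
    ages[way] = 0


def victim(ages: List[int]) -> int:
    """Miss in the active subset: evict the way holding the LRU age.

    The incoming block is written into that way and keeps the LRU age,
    which matches SRP's "do not promote to MRU" rule.
    """
    return ages.index(WAYS_PER_SUBSET - 1)


ages = [0, 1, 2, 3]  # ways 0..3 hold a, b, c, d: a is MRU, d is LRU
touch(ages, 2)  # hit on c
print(ages)  # [1, 2, 0, 3], i.e. stack order c a b d
print(victim(ages))  # 3, the way holding d
```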
Results
SRP reduces misses for thrashing workloads but increases them for LRU-friendly ones. No single threshold suits all benchmarks.
Adaptive SRP policy
Different programs prefer different thresholds. In the ASRP policy:
Victim selection and the insertion policy are the same as in SRP.
The ONLY difference is that the threshold is selected dynamically from a pool of values, choosing the one that causes the fewest misses.
The maximum threshold is 128, and eight values are used: 2^0, 2^1, ..., 2^7.
The best threshold value is applied to the cache.
ASRP policy via "Set Dueling"
The cache is divided into two types of sets: sampling sets (eight thresholds × 4 sets per threshold = 32 sets) and follower sets.
Eight counters, Cntr_0 to Cntr_7, track the thresholds: a miss in a sampling set of threshold 2^x increments Cntr_x.
The follower sets use the threshold whose counter has the smallest value.
[Figure: Thres-2^0 sets, Thres-2^1 sets, ..., Thres-2^7 sets send their misses to Cntr_0, ..., Cntr_7, which select the threshold for the follower sets]
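A sketch of the set-dueling bookkeeping, assuming 1024 sets (a 1MB, 16-way LLC with 64-byte lines) and a hypothetical layout in which every 32nd set is a sampling set, assigned round-robin to the eight thresholds so that each threshold gets 4 sets. The slides do not say which sets are sampled or how ties between counters are broken; here the lowest threshold index wins a tie. `SetDuel`, `sampling_threshold_id`, and `SAMPLE_STRIDE` are hypothetical names.

```python
from typing import List

THRESHOLDS: List[int] = [1 << i for i in range(8)]  # 2^0 .. 2^7 = 1 .. 128
NUM_SETS = 1024  # assumption: 1MB / (16 ways * 64B lines)
SAMPLE_STRIDE = 32  # hypothetical: sets 0, 32, 64, ... are sampling sets


def sampling_threshold_id(set_index: int) -> int:
    """Threshold index a sampling set is dedicated to, or -1 for a follower set."""
    if set_index % SAMPLE_STRIDE != 0:
        return -1
    return (set_index // SAMPLE_STRIDE) % len(THRESHOLDS)


class SetDuel:
    def __init__(self) -> None:
        self.cntr = [0] * len(THRESHOLDS)  # Cntr_0 .. Cntr_7

    def on_miss(self, set_index: int) -> None:
        t = sampling_threshold_id(set_index)
        if t >= 0:  # only sampling sets train the counters
            self.cntr[t] += 1

    def threshold_for(self, set_index: int) -> int:
        t = sampling_threshold_id(set_index)
        if t >= 0:
            return THRESHOLDS[t]  # a sampling set always uses its own X
        best = min(range(len(self.cntr)), key=self.cntr.__getitem__)
        return THRESHOLDS[best]  # followers use the X with the fewest misses


duel = SetDuel()
for s in range(0, NUM_SETS, SAMPLE_STRIDE):  # one miss in every sampling set...
    if sampling_threshold_id(s) != 4:  # ...except those dedicated to X = 16
        duel.on_miss(s)
print(duel.threshold_for(1))  # 16 (set 1 is a follower)
print(duel.threshold_for(64))  # 4 (set 64 samples X = 4)
```

Only 32 of the 1024 sets pay the cost of trying a possibly poor threshold; the remaining sets follow whichever threshold is currently winning.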
Resetting mechanism
To avoid the accumulated effect of one large value in a particular Cntr_x, we record how many times the follower sets select the same threshold. When this count exceeds a threshold, all Cntr_x are reset.
[Figure: flowchart: if last_follow = global_follow, the count is incremented (Y: ++), otherwise decremented (N: --); if the count exceeds the threshold, Cntr_0, ..., Cntr_7 are reset]
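A sketch of the resetting logic as read from the flowchart. The slides give neither the reset limit nor how often the check runs; here `reset_limit` is a hypothetical parameter, `check()` is assumed to be called whenever the follower threshold is re-evaluated, the decrement on a change of threshold is floored at zero, and the repeat count is cleared together with the counters. `CounterReset` is a hypothetical name.

```python
from typing import List


class CounterReset:
    """Track how often the followers keep the same threshold; reset Cntr_x when it persists."""

    def __init__(self, n_counters: int, reset_limit: int) -> None:
        self.cntr: List[int] = [0] * n_counters  # Cntr_0 .. Cntr_(n-1)
        self.reset_limit = reset_limit  # hypothetical value; not given on the slide
        self.last_follow = -1  # threshold index chosen at the previous check
        self.repeat = 0

    def check(self) -> int:
        """Pick the follower threshold index and apply the reset rule."""
        global_follow = min(range(len(self.cntr)), key=self.cntr.__getitem__)
        if global_follow == self.last_follow:
            self.repeat += 1  # Y branch: same threshold selected again
        else:
            self.repeat = max(0, self.repeat - 1)  # N branch (floored at 0: assumption)
        self.last_follow = global_follow
        if self.repeat > self.reset_limit:
            self.cntr = [0] * len(self.cntr)  # clear stale history in all counters
            self.repeat = 0
        return global_follow


r = CounterReset(n_counters=8, reset_limit=3)
r.cntr = [500, 40, 900, 900, 900, 900, 900, 900]  # Cntr_1 is smallest
picks = [r.check() for _ in range(5)]
print(picks, r.cntr)  # [1, 1, 1, 1, 1] [0, 0, 0, 0, 0, 0, 0, 0]
```

Clearing the counters lets the sampling sets re-compete from a clean slate, so a program phase change is not masked by misses accumulated earlier.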
Budget
45K bits in total: only 70% of the budget used by the LRU policy, and 35% of the total budget provided by this championship.
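As a rough cross-check of these percentages, assume 64-byte lines (so the 1MB, 16-way LLC used in the results has 1024 sets) and a true-LRU baseline that stores a 4-bit stack position per way. Under these assumptions the 70% figure follows directly, and the 35% figure implies a championship budget of roughly 128K bits:

```latex
\begin{aligned}
\text{sets} &= \frac{2^{20}\,\text{B}}{16 \times 64\,\text{B}} = 1024,\\
B_{\text{LRU}} &= 1024 \times 16 \times \log_2 16 = 65\,536\ \text{bits} = 64\text{K bits},\\
\frac{B_{\text{ASRP}}}{B_{\text{LRU}}} &= \frac{45\text{K}}{64\text{K}} \approx 0.70,
\qquad \frac{45\text{K}}{0.35} \approx 128\text{K bits (implied total budget)}.
\end{aligned}
```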
Results
For a 1MB, 16-way LLC, ASRP achieves a geometric-mean speedup of 4.5% over LRU.
Analysis
[Figure: results for xalancbmk and GemsFDTD]
The sampling mechanism does help ASRP find the best threshold for different programs.
Conclusion
Keeping part of the working set in the cache helps reduce misses when the cache suffers from thrashing.
The partial, longer access history helps SRP capture frequently used blocks more accurately.
Different programs, and different phases of a program, prefer different thresholds to maximize cache hits.
"Set Dueling" helps ASRP dynamically select a suitable threshold.
The experimental results show the effectiveness of the ASRP policy.
Thank you! Any questions?
Results on a multi-core processor
Case study of an LRU-friendly workload
Explanation of active-subset changing
A simple example of the SRP policy