Adaptive Cache Replacement Policy
By: Neslisah Torosdagli, Awrad Mohammed Ali, Stacy Gramajo
Spring 2015
Introduction
What is Adaptive Cache Replacement?
- Given more than one cache replacement algorithm, one is chosen for each memory access based on recent memory accesses
Motivations
- Cache performance is critical for overall system performance
- Changing the hardware (e.g., increasing the cache size) can only go so far
- Different replacement policies work best in different situations
- Combining different replacement policies can improve performance
Research Paper Implementation
Author: Yannis Smaragdakis, ACM 2004 [1]
Hardware Modifications
- First structure: two parallel tag arrays
  - Each is the same size as the regular tag array of the adaptive cache
  - Each tracks the cache contents that one replacement policy would keep
- Second structure: miss history buffer
  - Tracks the past miss performance of each replacement policy
Replacement Algorithm
- On every memory reference, the parallel tag arrays and the miss history buffer are updated
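A minimal C sketch of this mechanism, assuming hypothetical helpers access_lru() and access_lfu() that simulate one policy on its parallel (shadow) tag array and return 1 on a miss; this is an illustration of the idea, not the paper's hardware or our SimpleScalar code:

```c
#include <stdint.h>

enum policy { POLICY_A_LRU = 0, POLICY_B_LFU = 1 };

/* Hypothetical per-policy simulators on the shadow tag arrays: each
 * updates its own tag array for this reference and returns 1 on a
 * miss, 0 on a hit. */
extern int access_lru(uint64_t addr);
extern int access_lfu(uint64_t addr);

static uint64_t miss_count[2];  /* miss history, one counter per policy */

/* Called on every memory reference; returns the policy that should
 * drive the real cache's eviction decision. */
enum policy adaptive_access(uint64_t addr)
{
    /* Both shadow tag arrays are updated on every reference. */
    miss_count[POLICY_A_LRU] += access_lru(addr);
    miss_count[POLICY_B_LFU] += access_lfu(addr);

    /* Prefer the policy with fewer accumulated misses. */
    return (miss_count[POLICY_A_LRU] <= miss_count[POLICY_B_LFU])
               ? POLICY_A_LRU : POLICY_B_LFU;
}
```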
Research Paper Extension
Assume we are running a program of 2000 instructions in which:
- the first 1000 instructions perform well with policy A
- the last 1000 instructions perform well with policy B
At instruction 1000, the miss counts are:
- policy A: 250
- policy B: 750
Since policy A's miss count is lower, policy A will keep being selected for instructions 1001, 1002, ..., 1500, even though policy B now performs better: the accumulated success of policy A masks the phase change.
Can we use an algorithm similar to the tournament predictor from branch prediction to solve this unfair selection?
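A sketch of what such a tournament-style chooser could look like, borrowing the 2-bit saturating counter from branch prediction; hit_a and hit_b are the per-reference outcomes from the two shadow tag arrays, and all names are illustrative:

```c
#include <stdint.h>

enum policy { POLICY_A = 0, POLICY_B = 1 };

static int chooser = 1;  /* 0-1 favor policy A, 2-3 favor policy B */

/* Called once per reference with each policy's hit/miss outcome.
 * The counter moves toward whichever policy did better, so a phase
 * change flips the selection within a few references instead of
 * waiting for cumulative miss counts to cross. */
enum policy tournament_select(int hit_a, int hit_b)
{
    if (hit_a && !hit_b && chooser > 0) chooser--;
    if (hit_b && !hit_a && chooser < 3) chooser++;
    return (chooser >= 2) ? POLICY_B : POLICY_A;
}
```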
Our Implementation - Software
- Used SimpleScalar
- Implemented the adaptive cache policy, LFU, and branch-prediction-like selection
Added Features
- LRU tag array
- LFU tag array
- Adaptive tag array
- Local histories: each records the hit/miss history of one policy
- Global history: keeps track of which replacement algorithm was used
- Miss counts
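A rough sketch of the bookkeeping state these features add; the field names are illustrative, not the identifiers from our SimpleScalar patch:

```c
#include <stdint.h>

enum { N_POLICIES = 2 };  /* policy A (LRU) and policy B (LFU) */

struct adaptive_state {
    uint32_t global_history;            /* which policy was used, 1 bit per reference */
    uint32_t local_history[N_POLICIES]; /* hit (1) / miss (0), 1 bit per reference */
    uint64_t miss_count[N_POLICIES];    /* cumulative misses per policy */
};
```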
Research Paper Extension
Global History Buffer (32 bits)
- 0: policy A was used
- 1: policy B was used
Local History Buffer (32 bits, one per policy)
- 0: the policy missed
- 1: the policy hit
[Figure: bit-level layout of the 32-bit global and local history buffers]
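Under the encodings above, each buffer can be kept as a 32-bit shift register; a minimal sketch, with illustrative names:

```c
#include <stdint.h>

static uint32_t global_history;     /* bit = which policy was used (0 = A, 1 = B) */
static uint32_t local_history[2];   /* bit = hit (1) or miss (0), per policy */

/* Called once per memory reference: shift the newest outcome in at
 * bit 0, letting the oldest of the 32 recorded outcomes fall off. */
void record_reference(int policy_used, int hit_a, int hit_b)
{
    global_history   = (global_history   << 1) | (policy_used & 1);
    local_history[0] = (local_history[0] << 1) | (hit_a & 1);
    local_history[1] = (local_history[1] << 1) | (hit_b & 1);
}
```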
Issues
- In the initial implementation, the history buffers were initialized to 0: since 0 refers to policy A, voting unfairly selected policy A.
- Next, the history buffers were filled randomly: this gave inconsistent results, directly proportional to the quality of the randomization function.
- Finally, "history ready" flags were added to the implementation:
  - The voting algorithm uses only the miss counts until the history buffers are completely filled.
  - Once the history buffers are ready, the voting algorithm uses both the history buffers and the miss counts to make a decision.
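A sketch of the resulting two-stage vote, assuming the state from the earlier sketches; counting the set bits of a local history approximates that policy's recent hit rate (__builtin_popcount is a GCC builtin, and all names are illustrative):

```c
#include <stdint.h>

enum policy { POLICY_A = 0, POLICY_B = 1 };

enum policy vote(uint64_t miss_count[2], uint32_t local_history[2],
                 int history_ready)
{
    if (!history_ready) {
        /* Warm-up: fall back on cumulative miss counts alone. */
        return (miss_count[POLICY_A] <= miss_count[POLICY_B])
                   ? POLICY_A : POLICY_B;
    }
    /* Recent hits per policy, from the 32-bit local histories. */
    int recent_a = __builtin_popcount(local_history[POLICY_A]);
    int recent_b = __builtin_popcount(local_history[POLICY_B]);
    if (recent_a != recent_b)
        return (recent_a > recent_b) ? POLICY_A : POLICY_B;
    /* Tie on recent behavior: decide by cumulative miss counts. */
    return (miss_count[POLICY_A] <= miss_count[POLICY_B])
               ? POLICY_A : POLICY_B;
}
```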
Our Implementation - Software
Added three additional arrays:
- Global history ready (one array)
- Local history ready (two arrays, one per policy)
These ready flags prevent the randomization problem from occurring.
Adaptive Replacement UML Diagram
Policies
The adaptive cache mimics either LFU or the user-selected cache mechanism, chosen via SimpleScalar command-line arguments:
LRU: evicts the block that was least recently used
- Strong when accesses are made mainly to the most recent items, such as an application computing the average temperature of the last 2 hours
LFU: evicts the block with the lowest reference frequency, tracked with a per-block counter
- Strong when large regions of blocks are used only once alongside commonly accessed data
- Fails when an item was accessed frequently in the past but is not accessed anymore: LFU does not evict it, so newly added blocks are more likely to be evicted instead
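A minimal sketch contrasting the two victim-selection rules on a single cache set; the metadata fields are illustrative, not SimpleScalar's:

```c
#include <stdint.h>

struct way {
    uint64_t last_use;   /* timestamp of most recent access (for LRU) */
    uint64_t ref_count;  /* number of accesses since fill (for LFU) */
};

/* LRU: evict the way touched longest ago. */
int lru_victim(struct way *set, int assoc)
{
    int v = 0;
    for (int i = 1; i < assoc; i++)
        if (set[i].last_use < set[v].last_use)
            v = i;
    return v;
}

/* LFU: evict the way with the fewest references; a block that was
 * hot long ago keeps a high count and is never chosen, which is
 * exactly the failure mode noted above. */
int lfu_victim(struct way *set, int assoc)
{
    int v = 0;
    for (int i = 1; i < assoc; i++)
        if (set[i].ref_count < set[v].ref_count)
            v = i;
    return v;
}
```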
Configurations for Simulations
First Configuration
- L1 data and instruction caches: 16 KB, 64 B blocks, 4-way associative
- Unified L2 cache: 512 KB, 64 B blocks, 8-way associative
Second Configuration
- L1 data and instruction caches: 16 KB, 64 B blocks, 2-way associative
- Unified L2 cache: 512 KB, 64 B blocks, 4-way associative
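For reference, the first configuration maps onto SimpleScalar's standard cache specifier name:nsets:bsize:assoc:repl roughly as below (16 KB / 64 B / 4-way gives 64 sets; 512 KB / 64 B / 8-way gives 1024 sets). The trailing l selects SimpleScalar's stock LRU; the letter that selects our adaptive policy is part of our patch and is not shown on the slides, so it is omitted here:

```
sim-cache \
  -cache:il1 il1:64:64:4:l \
  -cache:dl1 dl1:64:64:4:l \
  -cache:il2 dl2 \
  -cache:dl2 ul2:1024:64:8:l \
  benchmark
```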
Results: Comparison
MPKI (Misses Per Thousand Instructions) = (# accesses / # instructions) × miss rate × 1000
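A quick worked example with hypothetical numbers (not taken from our results): 2,000,000 cache accesses over 5,000,000 instructions with a 5% miss rate give

\[
\mathrm{MPKI} = \frac{2{,}000{,}000}{5{,}000{,}000} \times 0.05 \times 1000 = 20
\]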
More Results
Success and Difficulties
- The project was implemented in an iterative manner, shaped by the issues we faced
- SimpleScalar is slow, and some benchmarks do not work correctly
- Our results are consistent with those reported in the paper
Conclusion
- The adaptive cache replacement policy delivered good performance with a small percentage of error
- We ran our experiments using different configurations on different benchmarks to show that the adaptive policy performs well across multiple setups
Future Recommendations
- Implement other replacement algorithms, such as Pseudo-LRU (PLRU) and Segmented LRU (SLRU)
- Adaptively decide among more than two replacement policies
References
[1] Y. Smaragdakis. "General Adaptive Replacement Policies." In Proceedings of the 4th International Symposium on Memory Management (ISMM), ACM, 2004.
[2] R. Subramanian, Y. Smaragdakis, and G. H. Loh. "Adaptive Caches: Effective Shaping of Cache Behavior to Workloads." In Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), IEEE Computer Society, 2006.
[3] E. G. Hallnor and S. K. Reinhardt. "A Fully Associative Software-Managed Cache Design." In Proceedings of the 27th International Symposium on Computer Architecture (ISCA), Vancouver, Canada, June 2000.
Questions?