Performance Evaluation of Cache Replacement Policies for the SPEC CPU2000 Benchmark Suite
Hussein Al-Zoubi
Overview
- Introduction
- Common cache replacement policies
- Experimental methodology
- Evaluating cache replacement policies: questions and answers
- Conclusion
Introduction
- Increasing speed gap between processor and memory
- Modern processors include multiple levels of caches, and cache associativity is increasing
- Replacement policy: which block to discard when the cache set is full
Introduction … cont.
- Optimal Replacement (OPT) algorithm: replace the cache block whose next reference is farthest in the future
- Infeasible in practice, since it requires knowledge of future references
- State-of-the-art processors employ various practical policies instead
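Although OPT cannot be built in hardware, it is easy to simulate offline on a recorded reference trace, which is how it serves as a lower bound on misses. A minimal sketch (fully associative cache assumed; the function name is ours):

```python
def opt_misses(trace, capacity):
    """Simulate Belady's OPT policy on a fully associative cache.

    On a miss with a full cache, evict the resident block whose next
    reference lies farthest in the future (or never occurs again).
    """
    cache, misses = set(), 0
    for i, block in enumerate(trace):
        if block in cache:
            continue  # hit
        misses += 1
        if len(cache) < capacity:
            cache.add(block)
            continue

        def next_use(b):
            # Distance to the next reference of b; infinity if none.
            for j in range(i + 1, len(trace)):
                if trace[j] == b:
                    return j
            return float("inf")

        cache.remove(max(cache, key=next_use))  # evict farthest-future block
        cache.add(block)
    return misses
```

Running the same trace through an LRU simulator gives more misses, which is exactly the LRU-to-OPT gap the paper measures.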
Introduction … cont.
- Random
- LRU (Least Recently Used)
- Round-robin (FIFO: First-In-First-Out)
- PLRU (Pseudo Least Recently Used): reduces hardware cost by approximating the LRU mechanism
Introduction … cont.
Our goal: explore and evaluate common cache replacement policies
- How do existing policies relate to OPT?
- What is their effect on instruction and data caches?
- How good are pseudo-LRU techniques at approximating true LRU?
Common cache replacement policies: LRU
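True LRU orders the blocks of a set from least to most recently used and evicts the head of that order. A minimal sketch of one set (class name is ours):

```python
from collections import OrderedDict

class LRUSet:
    """One set of an LRU-managed cache, ordered oldest -> most recent."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # insertion order doubles as LRU order

    def access(self, tag):
        """Return True on a hit. On a miss with a full set, evict the
        least recently used block before inserting the new tag."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)  # mark as most recently used
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)  # evict the LRU block
        self.blocks[tag] = None
        return False
```

Hardware implementations need the full recency order per set, which is why the cost grows with associativity and motivates the pseudo-LRU schemes below.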
Common cache replacement policies … cont.
- Random policy: simpler, but at the expense of performance; implemented with a Linear Feedback Shift Register (LFSR)
- Round-robin (FIFO) replacement: replaces the oldest block in the cache set; implemented with a circular counter
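Both mechanisms are cheap in hardware: an LFSR is a shift register with a few XOR taps, and round-robin needs only a per-set counter. A sketch of each (class names are ours; the tap mask is one common choice for a long-period 8-bit sequence):

```python
class LFSR8:
    """8-bit Galois LFSR: a cheap hardware pseudo-random source."""

    def __init__(self, seed=0x1):
        self.state = seed & 0xFF  # must be nonzero

    def next(self):
        lsb = self.state & 1
        self.state >>= 1
        if lsb:
            self.state ^= 0xB8  # XOR taps fed back on shift-out
        return self.state

def random_victim(lfsr, ways):
    # Low-order bits of the LFSR select the way to replace.
    return lfsr.next() % ways

class FIFOSet:
    """Round-robin (FIFO) replacement: a circular counter per set
    always points at the oldest way."""

    def __init__(self, ways):
        self.ways, self.ptr = ways, 0

    def victim(self):
        v = self.ptr
        self.ptr = (self.ptr + 1) % self.ways  # advance past the evicted way
        return v
```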
Common cache replacement policies … cont.
- PLRUt: tree-based pseudo-LRU; one bit per internal node of a binary tree over the ways, each bit pointing away from the recently used half
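PLRUt needs only ways-1 bits per set instead of a full recency order. On each access the bits along the path to the touched way are flipped to point at the other half; the victim is found by following the bits from the root. A sketch (class name is ours; ways assumed a power of two):

```python
class PLRUt:
    """Tree-based pseudo-LRU over a power-of-two number of ways."""

    def __init__(self, ways):
        assert ways >= 2 and ways & (ways - 1) == 0
        self.ways = ways
        self.bits = [0] * (ways - 1)  # one bit per internal tree node

    def touch(self, way):
        """On an access, make each bit on the path point away from `way`."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1            # victim now in right half
                node, hi = 2 * node + 1, mid   # descend left
            else:
                self.bits[node] = 0            # victim now in left half
                node, lo = 2 * node + 2, mid   # descend right

    def victim(self):
        """Follow the bits from the root to the pseudo-LRU way."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo
```

Note the approximation: after accessing ways 0, 1, 2, 3 and then 0 again, true LRU would evict way 1, but the tree points at way 2.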
Common cache replacement policies … cont.
- PLRUm: MRU-bit pseudo-LRU; one bit per way, set on access and cleared in bulk when all bits would become set
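PLRUm tracks only which ways were recently used, one bit each. A sketch (class name is ours):

```python
class PLRUm:
    """MRU-bit pseudo-LRU: one bit per way, set on access."""

    def __init__(self, ways):
        self.bits = [0] * ways

    def touch(self, way):
        self.bits[way] = 1
        if all(self.bits):
            # All ways marked recently used: clear the others,
            # keeping only the current access marked.
            self.bits = [0] * len(self.bits)
            self.bits[way] = 1

    def victim(self):
        # Replace the first way whose MRU bit is clear.
        return self.bits.index(0)
```

At ways bits per set, PLRUm is slightly costlier than PLRUt's ways-1 bits, but the MRU bits are also directly reusable for way prediction.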
Experimental methodology
- sim-cache and sim-cheetah simulators from the Alpha version of the SimpleScalar toolset
- Original simulators modified to support additional pseudo-LRU replacement policies
- sim-cache simulator modified to print interval statistics per specified number of instructions
Evaluating cache replacement policies: questions and answers
Q: How much associativity is enough for state-of-the-art benchmarks?
A:
- For the data cache, the performance gain comes from the transition from a direct-mapped to a two-way set-associative cache
- For the instruction cache, only the OPT replacement policy benefits from increased associativity
- Realistic policies don't exploit more than 8 ways, and in some cases not even more than 2 ways
Evaluating cache replacement policies: questions and answers … cont.
Q: How much room for improvement is there for each specific benchmark and cache configuration?
Evaluating cache replacement policies: questions and answers … cont.
Q: Do replacement policies behave differently for different types of memory references, such as instructions and data?
A: In general, the LRU policy performs better than FIFO and Random, with some exceptions
Evaluating cache replacement policies: questions and answers … cont.
Q: Can dynamically changing the replacement policy reduce the total number of cache misses?
A: If one policy is better than another, it stays consistently better throughout execution, so dynamic switching offers little benefit
Evaluating cache replacement policies: questions and answers … cont.
Q: Can we use most-recently-used information for cache way prediction?
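The idea behind MRU-based way prediction: probe only the predicted (most recently used) way first, and fall back to a full set lookup on a mispredict, saving tag comparisons and energy on correct predictions. An illustrative sketch under that assumption (class and method names are ours):

```python
class MRUWayPredictor:
    """Predict that the next access to a set hits its MRU way."""

    def __init__(self, nsets):
        self.mru = [0] * nsets  # last-hit way per set, reset to way 0

    def predict(self, set_index):
        # The way to probe first on the next access to this set.
        return self.mru[set_index]

    def update(self, set_index, way):
        # Record the way that actually hit (or was filled).
        self.mru[set_index] = way
```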
Evaluating cache replacement policies: questions and answers … cont.
Q: How good are pseudo-LRU techniques at approximating true LRU?
A: PLRUm and PLRUt are very efficient at approximating the LRU policy and stay close to LRU during the whole program execution
Conclusion
- Eliminating cache misses is extremely important for improving overall processor performance
- Cache replacement policies gain more significance in set-associative caches
- The gap between the LRU and OPT replacement policies is up to 50%; new research to close this gap is needed