Analysis of and Dynamic Page Remapping Technique to Reduce L2 Misses in an SMT Processor CSE 240B Class Project Spring 2005, UCSD Subhradyuti Sarkar Siddhartha Saha
Motivation ● Cache misses carry a considerable penalty. An L2 miss that goes to main memory typically costs about an order of magnitude more cycles than an L1 miss that hits in L2. ● SMT processors may be more vulnerable to L2 misses because: – If more than one identical thread runs on the processor, their accesses will always collide in the same cache pages. – Even different threads, if compiled by the same compiler, will have similar virtual address ranges for their stack, heap, and data segments.
Introduction ● In this work, we examine a hybrid hardware/software technique to reduce L2 cache misses in an SMT processor. ● We use a set of hardware counters for every cache page to track the relative hotness/coldness of cache pages. ● If the miss rate and/or access rate across the cache pages becomes skewed beyond a certain threshold, we use an adaptive algorithm that tries to even out cache utilization.
Contribution Summary ● An adaptive algorithm that detects variation in utilization among cache pages. ● A second algorithm that evens out cache utilization, possibly improving cache performance.
Hot/Cold Detection Algorithm ● Short Term History – We classify each cache page i as hot, cold, or neutral in the last epoch: – COLD: ● access_count[i] < (total_access_count / N) * t_cold – HOT: ● miss_rate[i] > t_miss && access_count[i] > (total_access_count / N) * t_hot – NEUTRAL: ● otherwise
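The short-term rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: N is taken to be the number of cache pages, miss_rate[i] is taken to be misses divided by accesses for page i, and the function name `classify_epoch` and the threshold values in the usage lines are hypothetical, not taken from the project.

```python
from enum import Enum


class State(Enum):
    COLD = "cold"
    NEUTRAL = "neutral"
    HOT = "hot"


def classify_epoch(access_count, miss_count, t_cold, t_hot, t_miss):
    """Classify every cache page for one epoch.

    Assumption: N in the slide formula is the number of cache pages,
    so total_access_count / N is the mean access count per page.
    """
    n = len(access_count)
    if n == 0:
        return []
    mean = sum(access_count) / n
    states = []
    for acc, miss in zip(access_count, miss_count):
        # Assumption: per-page miss rate = misses / accesses (0 if unused).
        miss_rate = miss / acc if acc else 0.0
        if acc < mean * t_cold:
            states.append(State.COLD)
        elif miss_rate > t_miss and acc > mean * t_hot:
            states.append(State.HOT)
        else:
            states.append(State.NEUTRAL)
    return states


# Hypothetical counters for four cache pages and hypothetical thresholds.
states = classify_epoch(
    access_count=[100, 5, 40, 55],
    miss_count=[60, 1, 2, 50],
    t_cold=0.5, t_hot=1.5, t_miss=0.3,
)
```

With these made-up numbers the mean access count is 50, so only the first page is both heavily accessed and missing often enough to be HOT, and the second is accessed rarely enough to be COLD.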
Hot/Cold Detection Algorithm ● Long Term History – We keep an N-element circular history that records the state of each cache page over the last N epochs. – In our simulations, we used N = 4. ● Based on the Long Term History, we determine when to classify a page as HOT or COLD. ● If the number of HOT pages and the number of COLD pages are both non-zero, we call the re-coloring algorithm.
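A minimal sketch of the circular history follows. The slides do not state how the long-term decision is derived from the N recorded states, so the rule used here (a page counts as HOT or COLD only if it had that state in all of the last N epochs) is an assumption, as are the class and method names.

```python
from collections import deque


class PageHistory:
    """Per-page circular history of the last `depth` epoch states."""

    def __init__(self, num_pages, depth=4):
        self.depth = depth
        # deque(maxlen=depth) drops the oldest entry automatically,
        # which gives the circular-buffer behaviour.
        self.history = [deque(maxlen=depth) for _ in range(num_pages)]

    def record_epoch(self, states):
        for page_history, state in zip(self.history, states):
            page_history.append(state)

    def long_term(self, state):
        # Assumed rule: the page held `state` in every recorded epoch
        # and the history is full.
        return [
            i for i, h in enumerate(self.history)
            if len(h) == self.depth and all(s == state for s in h)
        ]

    def should_recolor(self):
        # Re-color only when both HOT and COLD pages exist.
        return bool(self.long_term("HOT")) and bool(self.long_term("COLD"))


hist = PageHistory(num_pages=3, depth=4)
for _ in range(4):
    hist.record_epoch(["HOT", "COLD", "NEUTRAL"])
trigger = hist.should_recolor()
```

A stricter or looser rule (for example, a majority of the last N epochs) would fit the same structure by changing only `long_term`.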
Re-coloring Algorithm ● For each cache page, keep track of the virtual pages that access it most frequently. ● From each HOT page, move all but one of the frequently accessed virtual pages to a COLD page in the cache. ● We exit when the number of HOT pages or COLD pages becomes zero – or when a maximum number of pages have been re-colored.
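The loop described above might look like the following sketch. Several details are assumptions made for illustration: the virtual page kept in place is the most frequently accessed one, each COLD page receives exactly one virtual page and is then no longer treated as COLD, and the names `recolor`, `top_vpages`, and `max_recolor` are hypothetical.

```python
def recolor(hot_pages, cold_pages, top_vpages, max_recolor):
    """Return a mapping {virtual_page: new_cache_page}.

    top_vpages[p] lists the virtual pages that access cache page p
    most frequently, most frequent first.
    """
    remap = {}
    cold = list(cold_pages)  # copy so the caller's list is untouched
    for hot in hot_pages:
        # Keep the first (most frequent) virtual page where it is
        # and try to move the others.
        for vpage in top_vpages[hot][1:]:
            if not cold or len(remap) >= max_recolor:
                return remap  # out of COLD pages or budget exhausted
            remap[vpage] = cold.pop(0)
    return remap  # every HOT page has been processed


# Hypothetical input: cache pages 0 and 1 are HOT, 5, 6 and 7 are COLD.
moves = recolor(
    hot_pages=[0, 1],
    cold_pages=[5, 6, 7],
    top_vpages={0: [10, 11, 12], 1: [20, 21]},
    max_recolor=8,
)
```

Here virtual pages 10 and 20 stay in place, and 11, 12, and 21 are assigned to the three COLD pages in order.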
Re-Coloring ● Ideally, the page in memory should be moved. ● Following the idea of Calder et al., we can achieve the same effect by modifying the TLB. ● We simulated this in SMTSIM by implementing an address remap module.
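One way to picture an address remap module is a small table consulted before the L2 index is computed. The sketch below is not the SMTSIM implementation; the page size, the number of cache pages (colors), and the names `RemapTable`, `recolor`, and `l2_color` are all assumptions chosen for illustration.

```python
PAGE_BITS = 12                       # assumed 4 KiB pages
COLOR_BITS = 6                       # assumed 64 cache pages (colors)
COLOR_MASK = (1 << COLOR_BITS) - 1


class RemapTable:
    """Maps a page number to the cache page (color) it should use."""

    def __init__(self):
        self.table = {}

    def recolor(self, page_number, new_color):
        self.table[page_number] = new_color & COLOR_MASK

    def l2_color(self, addr):
        # Default color comes from the low bits of the page number;
        # a remapped page uses the color stored in the table instead.
        # The tag is left as the full page number, so two pages that
        # share a color after remapping remain distinguishable.
        page = addr >> PAGE_BITS
        return self.table.get(page, page & COLOR_MASK)


remap = RemapTable()
addr = (0x1234 << PAGE_BITS) | 0x56
before = remap.l2_color(addr)
remap.recolor(0x1234, 7)
after = remap.l2_color(addr)
```

Only the remapped page changes color; every other page keeps the color given by its own address bits.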
Changes to SMTSIM [Figure: two block diagrams. Original: Processor, IC, DC, MAF, L2 Cache. Modified: Processor, IC, DC, MAF, L2 Cache, plus a Translation Unit with Hot/Cold Detection and Remap.]
Result
Future Work ● These experiments did not produce very good results, but there is further scope for improvement. ● The re-coloring algorithm admits many more design choices; ours was a basic one. ● Based on the HOT/COLD information, the effect of skewed cache indexing may also be investigated.