Increasing the Cache Efficiency by Eliminating Noise
Presented by Philip A. Marshall
Outline
- Background
- Motivation for Noise Prediction
- Concepts of Noise Prediction
- Implementation of Noise Prediction
- Related Work
- Prefetching
- Data Profiling
- Conclusion
Background
- Cache fetch: words enter the cache on a cache miss or by prefetching
- Prefetching exploits spatial locality:
  - Cache words are fetched in blocks
  - Neighboring block(s) are also fetched on a cache miss
  - Results in fewer cache misses
  - But fetches words that aren't needed
Background
- Cache noise: words that are fetched into the cache but never used
- Cache utilization: the fraction of words in the cache that are used; represents how efficiently the cache is used
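The two definitions above amount to a simple ratio. A minimal sketch of the accounting (per-block usage bits and block size are illustrative assumptions, not details from the paper):

```python
# Hypothetical sketch: a block of 8 words is fetched on a miss, and a
# per-word "used" bit records which words the program touched before
# eviction. Utilization is the used fraction; the rest is cache noise.

WORDS_PER_BLOCK = 8

def block_utilization(used_bits):
    """Fraction of fetched words that were actually accessed."""
    return sum(used_bits) / len(used_bits)

# Suppose only words 0, 1 and 4 were accessed before eviction:
used = [1, 1, 0, 0, 1, 0, 0, 0]
print(block_utilization(used))  # 0.375; the other 5 words are cache noise
```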
Motivation for Noise Prediction
- Level 1 data cache utilization is ~57% for SPEC2K benchmarks [2]
- Fetching unused words:
  - Increases bandwidth requirements between cache levels
  - Increases hardware and power requirements
  - Wastes valuable cache space

[2] D. Burger et al., "Memory Bandwidth Limitations of Future Microprocessors," Proc. ISCA-23, 1996
Motivation for Noise Prediction
- Cache block size is a trade-off:
  - Larger blocks exploit spatial locality better and reduce cache tag overhead, but increase bandwidth requirements
  - Smaller blocks reduce cache noise
  - Any fixed block size results in suboptimal performance
Motivation for Noise Prediction
- Sub-blocking: only portions of a cache block are fetched
  - Decreases tag overhead by associating one tag with many sub-blocks
  - Fetched words must lie in contiguous sub-blocks of fixed size
  - High miss rate and cache noise for non-contiguous access patterns
Motivation for Noise Prediction
- By predicting which words will actually be used, cache noise can be reduced
- But fetching fewer words could increase the number of cache misses
Concepts of Noise Prediction
- Selective fetching:
  - For each block, fetch only the words that are predicted to be accessed
  - If no prediction is available, fetch the entire block
  - A per-word valid bit and a per-word usage bit track which words were fetched and which have been used
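A minimal sketch of selective fetching with per-word valid and usage bits (class and field names are hypothetical; the paper specifies only the bits, not this structure):

```python
# Hypothetical sketch of selective fetching. Each cache block carries a
# per-word valid bit (was the word fetched?) and a per-word usage bit
# (was it accessed?). A 4-word block is an illustrative assumption.

WORDS_PER_BLOCK = 4

class CacheBlock:
    def __init__(self, fetch_mask):
        # fetch_mask: which words the predictor says to fetch;
        # None means no prediction, so the entire block is fetched.
        if fetch_mask is None:
            fetch_mask = [True] * WORDS_PER_BLOCK
        self.valid = list(fetch_mask)          # which words were fetched
        self.used = [False] * WORDS_PER_BLOCK  # which words were accessed

    def access(self, word):
        if not self.valid[word]:
            return False   # word was predicted away: treated as a miss
        self.used[word] = True
        return True

blk = CacheBlock([True, False, True, False])
print(blk.access(0))  # True  (word was fetched)
print(blk.access(1))  # False (not fetched; causes a miss)
```

On eviction, the `used` bits would be fed back to train the predictor.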
Concepts of Noise Prediction
- Cache noise predictors:
  - Phase Context Predictor (PCP): based on the usage pattern of the most recently evicted block
  - Memory Context Predictor (MCP): based on the MSBs of the memory address
  - Code Context Predictor (CCP): based on the MSBs of the PC
Concepts of Noise Prediction
- Prediction table size:
  - Larger tables decrease the probability of "no prediction"
  - Smaller tables use less power
- A prediction is considered successful if all the needed words are fetched; fetching extra words still counts as a success
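A minimal sketch of how a context-predictor table and this success criterion might fit together, using the code-context (PC MSBs) flavor. The table size, context width, and all names are illustrative assumptions, not the paper's parameters:

```python
# Hypothetical sketch of a code-context predictor (CCP): the table is
# indexed by the most-significant bits of the PC and stores the word-usage
# bitmap observed for that context. Widths below are assumptions.

TABLE_BITS = 6       # 64-entry table (assumed size)
CONTEXT_SHIFT = 10   # drop the low PC bits, keep the MSBs (assumed width)

table = {}

def context(pc):
    return (pc >> CONTEXT_SHIFT) % (1 << TABLE_BITS)

def predict(pc):
    # Returns a usage bitmap, or None if no prediction is available
    return table.get(context(pc))

def train(pc, usage_bitmap):
    table[context(pc)] = usage_bitmap

def prediction_successful(predicted, needed):
    # Success iff every needed word was fetched; fetching extra
    # words still counts as a success.
    return (needed & ~predicted) == 0

train(0x4008_1234, 0b0011)
p = predict(0x4008_1000)        # same PC MSBs -> same table entry
print(prediction_successful(p, 0b0001))  # True: all needed words covered
print(prediction_successful(p, 0b0100))  # False: word 2 was not fetched
```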
Concepts of Noise Prediction
- Improving prediction: Miss Initiator Based History (MIBH)
  - Keeps separate histories according to which word in the block caused the miss
  - Improves predictability when the relative position of the accessed words is fixed
  - Example: looping through an array of structs and accessing only one field
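The MIBH idea can be sketched as a two-level lookup keyed on both the context and the miss-initiating word (all names and the table shape are illustrative assumptions):

```python
# Hypothetical sketch of Miss Initiator Based History (MIBH): one usage
# history per (context, miss-initiating word) pair, so a loop that always
# misses on the same struct field gets its own stable history.

from collections import defaultdict

histories = defaultdict(dict)  # context -> {initiator word -> usage bitmap}

def train(ctx, initiator, usage_bitmap):
    histories[ctx][initiator] = usage_bitmap

def predict(ctx, initiator):
    return histories[ctx].get(initiator)

# Looping over an array of structs, touching only one field: the miss is
# always initiated by word 0 and only word 0 is used.
train(ctx=5, initiator=0, usage_bitmap=0b0001)
print(predict(5, 0))  # 1 (bitmap 0b0001): fetch just the needed word
print(predict(5, 2))  # None: no history for this initiator yet
```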
Concepts of Noise Prediction
- Improving prediction: OR-ing Previous Two Histories (OPTH)
  - Increases predictability by looking at more than the most recent access
  - Reduces cache utilization
  - OR-ing more than two accesses reduces utilization substantially
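OPTH is a one-line bitwise operation; a minimal sketch (function name is hypothetical):

```python
# Hypothetical sketch of OR-ing Previous Two Histories (OPTH): the fetch
# mask is the OR of the two most recent usage bitmaps, trading some extra
# noise for fewer under-fetches.

def opth_predict(prev_usage, prev_prev_usage):
    return prev_usage | prev_prev_usage

# Two recent evictions used words {0, 1} and {0, 3}:
mask = opth_predict(0b0011, 0b1001)
print(bin(mask))  # 0b1011: words 0, 1 and 3 are fetched; coverage improves,
                  # but a word seen in only one history may become noise
```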
Results
- Empirically, CCP provides the best results
- MIBH greatly increases predictability
- OPTH improves predictability only marginally while increasing cache noise
- Cache utilization increased from 57% to 92%
Related Work
- Existing work focuses on reducing cache misses, not on improving utilization
- Sub-blocked caches are used mainly to decrease tag overhead
- Some existing work predicts which sub-blocks to load in a sub-blocked cache
- No existing techniques predict and fetch non-contiguous words
Prefetching
- Prefetching improves the cache miss rate
- Commonly implemented by also fetching the next block on a cache miss
- Increases noise and bandwidth requirements
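The common scheme mentioned above (next-block, or one-block-lookahead, prefetching) is trivially small; a sketch for concreteness (function name is hypothetical):

```python
# Hypothetical sketch of next-block prefetching: on a miss to block b,
# also fetch block b+1. More blocks fetched means more potential noise.

def blocks_to_fetch(miss_block, prefetch=True):
    return [miss_block, miss_block + 1] if prefetch else [miss_block]

print(blocks_to_fetch(7))         # [7, 8]
print(blocks_to_fetch(7, False))  # [7]
```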
Prefetching
- Noise prediction enables more intelligent prefetching, but requires extra hardware
- On average, prefetching with noise prediction consumes less energy
- In the worst case, energy requirements increase
Data Profiling
- For some benchmarks, few predictions are made: the predictor table is too small to hold all the word-usage histories
- Instead of increasing the table size, profile the data
- Profiling increases the prediction rate by ~7%
- Gains aren't as high as expected
Analysis of Noise Prediction
- Pros:
  - Small increase in miss rate (0.1%)
  - Decreased power requirements in most cases
  - Decreased bandwidth requirements between cache levels
  - Adapts effective block size to access patterns
  - Dynamic technique, but profiling can be used
  - Scalable to different predictor sizes
Analysis of Noise Prediction
- Cons:
  - Increased hardware overhead
  - Increased power in the worst case
  - Not all programs benefit
  - Profiling provides limited improvement
Other Thoughts
- How were the benchmarks chosen? Only 6 of 12 integer and 8 of 14 floating-point SPEC2K benchmarks were used
- Not all predictors were examined equally: a 22-bit MCP performed slightly worse than a 28-bit CCP; how would a 28-bit MCP compare?
- How can the efficiency of the prediction table be increased?