MICRO-48, 2015 Computer System Lab, Kim Jeong Won
Motivation
· First, in most high-volume CPU designs, the program counter (PC) is unavailable at this level of the cache hierarchy.
· Second, a prefetcher located at the last-level cache must deal with physical addresses directly, without the benefit of a TLB or other page table information.
Idea
· Address pattern within a page → (A, A-24, A+1, A-23, A+2, A-22, A+3)
· Extracted delta pattern → (-24, +25), repeating
· Five common delta sequences found in LBM
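The delta extraction above can be sketched in a few lines. This is a minimal illustration, not the paper's hardware: the function name `extract_deltas` and the base offset `A = 30` are assumptions chosen only to make the slide's example concrete.

```python
def extract_deltas(offsets):
    """Return the differences between consecutive page offsets."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


# Slide example with a hypothetical base offset A = 30 (in cache lines).
A = 30
accesses = [A, A - 24, A + 1, A - 23, A + 2, A - 22, A + 3]
deltas = extract_deltas(accesses)
print(deltas)  # [-24, 25, -24, 25, -24, 25]
```

The access stream looks irregular, but its delta stream is a simple two-element repeating pattern, which is what a delta-history predictor can learn.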
Proposal_ Variable Length Delta Prefetcher (VLDP)
A key innovation of VLDP → the use of multiple Delta Prediction Tables (DPTs)
Features of VLDP
· Enables the prediction of complex multi-delta access patterns
· Works on a per-page basis, and can prefetch a different complex pattern for each page
· Uses multiple global prediction tables that can learn common access patterns across many pages
· These prediction tables are indexed by varying lengths of delta histories
Proposal_ Delta History Buffer (DHB)
The Delta History Buffer (DHB) tracks delta histories for recently accessed pages. These histories, in turn, are used to look up the DPT and predict future memory requests.
Fields of a DHB entry
· Page Num. - page number
· Last Add. - page offset of the last address accessed in this page
· Last 4 Deltas - sequence of up to 4 recently observed deltas
· Last Predictor - the DPT level used for the latest delta prediction
· Num. Times Used - the number of times this page has been used
· Last Four Prefetched Offsets - sequence of up to 4 recently prefetched offsets
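One way to picture a DHB entry in software is as a small record. The sketch below is an assumption-laden model, not the hardware layout: the class name `DHBEntry` and the Python field names are hypothetical, and field widths are not modelled. Bounded deques stand in for the two "up to 4" sequences.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


def _last_four():
    # Bounded queue: appending a fifth item silently drops the oldest one.
    return deque(maxlen=4)


@dataclass
class DHBEntry:
    page_num: int                          # Page Num.
    last_addr: int                         # Last Add. (page offset)
    last_deltas: deque = field(default_factory=_last_four)      # Last 4 Deltas
    last_predictor: Optional[int] = None   # Last Predictor (DPT level)
    num_times_used: int = 0                # Num. Times Used
    last_prefetched: deque = field(default_factory=_last_four)  # Last Four Prefetched Offsets


entry = DHBEntry(page_num=0x1A2B, last_addr=30)
for d in (1, 2, 3, 4, 5):
    entry.last_deltas.append(d)
print(list(entry.last_deltas))  # [2, 3, 4, 5]
```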
Proposal_ Prefetch Activation Events (PAE)
When a PAE occurs → a fully associative search of the DHB looks for an entry with a matching page number
On a DHB miss
1. A DHB entry is evicted and assigned to the new page number
2. The page offset of the cache line is recorded in the last address field
On subsequent hits to this page in the DHB
3. The delta is computed
4. The delta is added to the delta sequence (last 4 deltas)
5. The last address field is updated
6. Only the 4 most recent deltas are maintained
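The miss and hit steps above can be sketched as a small software model. Several details here are assumptions not stated on the slides: the class name `DHB`, the capacity of 16 entries, LRU as the eviction policy, and the dictionary layout of an entry.

```python
from collections import OrderedDict, deque

DHB_ENTRIES = 16  # assumed capacity, not taken from the slides


class DHB:
    def __init__(self, capacity=DHB_ENTRIES):
        self.capacity = capacity
        self.entries = OrderedDict()  # page_num -> {"last_addr", "deltas"}

    def access(self, page_num, offset):
        """Handle one PAE; return the delta history, or None on a DHB miss."""
        entry = self.entries.get(page_num)
        if entry is None:
            # Miss: evict a victim (LRU here, an assumption) and
            # assign the entry to the new page (steps 1-2).
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)
            self.entries[page_num] = {"last_addr": offset,
                                      "deltas": deque(maxlen=4)}
            return None
        # Hit: compute the delta, append it, update the last address
        # (steps 3-5); maxlen=4 keeps only the 4 most recent (step 6).
        delta = offset - entry["last_addr"]
        entry["deltas"].append(delta)
        entry["last_addr"] = offset
        self.entries.move_to_end(page_num)
        return tuple(entry["deltas"])


dhb = DHB()
dhb.access(7, 30)         # first touch of page 7: miss
print(dhb.access(7, 6))   # (-24,)
print(dhb.access(7, 31))  # (-24, 25)
```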
Proposal_ Prefetch Activation Events (PAE)
When a PAE occurs → a fully associative search of the DHB looks for an entry with a matching page number
On a DHB hit (after the DHB entry has been updated with the most recent delta)
1. The newly updated delta history is used to index the DPT
2. The DHB entry stores the ID of the DPT table that made the prediction (the Last Predictor field)
Proposal_ Offset Prediction Table (OPT)
Fields of an OPT entry: Offset | Delta prediction | Accuracy (1 bit)
· Offset - page offset
· Delta prediction - predicted delta for the second page access
· Accuracy - 1-bit accuracy field
Accuracy bit update (old → new)
· OPT prediction = delta (match): 1 → 1
· OPT prediction = delta (match): 0 → 1
· OPT prediction ≠ delta (no match): 1 → 0
· OPT prediction ≠ delta (no match): 0 → 0 *
* If the accuracy bit was already 0, the old predicted delta is replaced with the new observed delta.
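The four accuracy-bit transitions can be written as a short update routine. This is a sketch under assumptions: the function name `update_opt`, the use of a dictionary keyed by offset, and allocating a new entry with accuracy 0 are illustrative choices, not details given on the slide.

```python
def update_opt(opt, offset, observed_delta):
    """Apply the 1-bit accuracy rule to the OPT entry for `offset`."""
    entry = opt.get(offset)
    if entry is None:
        # Assumption: a fresh entry starts with the accuracy bit cleared.
        opt[offset] = {"delta": observed_delta, "accuracy": 0}
    elif entry["delta"] == observed_delta:
        entry["accuracy"] = 1              # match: 1 -> 1, 0 -> 1
    elif entry["accuracy"] == 1:
        entry["accuracy"] = 0              # no match: 1 -> 0, keep the delta
    else:
        entry["delta"] = observed_delta    # no match at 0: replace the delta


opt = {}
for observed in (2, 2, 3, 3):
    update_opt(opt, 5, observed)
print(opt[5])  # {'delta': 3, 'accuracy': 0}
```

The single accuracy bit gives a prediction one chance: a stored delta has to miss twice in a row before it is replaced.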
Proposal_ Delta Prediction Table (DPT)
A key feature of the DPT → it is not just a single table, but rather a set of cascaded tables
Fields of a DPT entry
· Deltas - delta history (obtained from the DHB), used as the key
· Pred - delta prediction, used as the value
· Accuracy - 2-bit accuracy counter
· nMRU - 1-bit nMRU (not most recently used) value
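A lookup across the cascaded tables might look like the sketch below. It assumes that table i is keyed by the last i deltas and that the table with the longest matching history is preferred; the slides only say the tables are indexed by varying history lengths, so both points are assumptions, as are the function name `dpt_lookup`, the use of three tables, and the table contents.

```python
def dpt_lookup(dpt_tables, history):
    """Return (level, predicted_delta) from the longest matching history.

    dpt_tables[i] maps a tuple of (i + 1) deltas to a predicted delta.
    Returns None if no table holds a matching entry.
    """
    longest = min(len(dpt_tables), len(history))
    for level in range(longest, 0, -1):
        key = tuple(history[-level:])
        pred = dpt_tables[level - 1].get(key)
        if pred is not None:
            return level, pred
    return None


# Hypothetical table contents for illustration only.
tables = [
    {(25,): -24, (-24,): 25},  # level 1: keyed by the last delta
    {(1, 1): 1},               # level 2: keyed by the last two deltas
    {},                        # level 3: keyed by the last three deltas
]
print(dpt_lookup(tables, [-24, 25]))  # (1, -24)
print(dpt_lookup(tables, [3, 1, 1]))  # (2, 1)
```

The returned level corresponds to what the DHB records in its Last Predictor field.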
Proposal_ Delta Prediction Table (DPT)
The DPT is updated on each PAE
· Any new delta patterns are allocated in the DPT
· Accuracy bits can be updated
· If the prediction accuracy is sufficiently low, the delta prediction field may be updated to reflect the new delta
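The update rules above can be sketched with a 2-bit saturating counter. The slides do not define "sufficiently low", so this sketch assumes the prediction is replaced only when the counter is already 0; the function name `update_dpt_entry`, the initial counter value, and the omission of nMRU replacement are likewise assumptions.

```python
ACC_MAX = 3  # largest value of a 2-bit counter


def update_dpt_entry(table, key, observed_delta):
    """Train one DPT table with the delta that actually followed `key`."""
    entry = table.get(key)
    if entry is None:
        # New delta pattern: allocate it (assumed initial accuracy 0).
        table[key] = {"pred": observed_delta, "accuracy": 0}
    elif entry["pred"] == observed_delta:
        # Correct prediction: strengthen, saturating at ACC_MAX.
        entry["accuracy"] = min(entry["accuracy"] + 1, ACC_MAX)
    elif entry["accuracy"] > 0:
        # Wrong prediction: weaken but keep the stored delta.
        entry["accuracy"] -= 1
    else:
        # Accuracy already at the assumed threshold: adopt the new delta.
        entry["pred"] = observed_delta


table = {}
for observed in (25, 25, 25, 7, 7, 7):
    update_dpt_entry(table, (-24,), observed)
print(table[(-24,)])  # {'pred': 7, 'accuracy': 0}
```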
Proposal_ Multi-Degree Prefetch
Result_ Simulator Parameters
Result_ Performance Evaluation
VLDP performs
· 17.2% better than FDP
· 8.5% better than SBP
· 5.8% better than AMPM
Result_ Comparing VLDP to Prefetchers that use the Program Counter
Accuracy
· VLDP has an accuracy of 61%
· GHB has an accuracy of 33%
Performance of VLDP
· 7.1% better than GHB PC/DC
· 7.6% better than SMS
Result_ Cache Misses and Prefetcher Coverage
Result_ Prefetcher Accuracy and DRAM accesses
DRAM accesses
· FDP has 3.7%
· SMS has 60.5%
· SBP has 22.6%
· GHB has 5.4%
· AMPM has 13.4%
· VLDP has 17.2%
Result_ Sensitivity Analysis