Published by Lisa Sullivan. Modified over 8 years ago.
1
MICRO-48, 2015 Computer System Lab, Kim Jeong Won
2
Motivation
· First, in most high-volume CPU designs, the program counter (PC) is unavailable at this level in the cache hierarchy.
· Second, a prefetcher located at the last-level cache must deal with physical addresses directly, without the benefit of a TLB or other page table information.
3
Idea
· Address pattern in a page → (A, A-24, A+1, A-23, A+2, A-22, A+3)
· Extracted delta pattern → (-24, +25), repeating
· Five common delta sequences found in the LBM benchmark
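The delta extraction on this slide can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name `deltas` and the base offset `A = 100` are hypothetical and do not come from the slides.

```python
def deltas(offsets):
    """Return the differences between consecutive accesses within one page."""
    return [later - earlier for earlier, later in zip(offsets, offsets[1:])]


A = 100  # hypothetical base offset; any value gives the same deltas
accesses = [A, A - 24, A + 1, A - 23, A + 2, A - 22, A + 3]

pattern = deltas(accesses)
print(pattern)  # [-24, 25, -24, 25, -24, 25]
```

The raw addresses never repeat, but the deltas alternate between -24 and +25, which is the regularity a delta-based prefetcher can learn.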
4
Proposal_ Variable Length Delta Prefetcher (VLDP)
A key innovation of VLDP → the use of multiple Delta Prediction Tables (DPTs)
Features of VLDP
· enables the prediction of complex multi-delta access patterns
· works on a per-page basis, and can prefetch a different complex pattern for each page
· uses multiple global prediction tables that can learn common access patterns across many pages
· these prediction tables are indexed by varying lengths of delta histories
5
Proposal_ Delta History Buffer (DHB)
The Delta History Buffer (DHB) tracks delta histories for recently accessed pages. These histories, in turn, are used to look up the DPT and predict future memory requests.
· Page Num. - page number
· Last Add. - page offset of the last address accessed in this page
· Last 4 Deltas - sequence of up to 4 recently observed deltas
· Last Predictor - the DPT level used for the latest delta prediction
· Num. Times Used - the number of times this page has been used
· Last Four Prefetched Offsets - sequence of up to 4 recently prefetched offsets
6
Proposal_ Prefetch Activation Events (PAE)
PAE occurs → a fully associative search in the DHB to find an entry with a matching page number
If DHB miss:
1. A DHB entry is evicted and assigned to the new page number
2. The page offset of the cache line is recorded in the last address field
On subsequent hits to this page in the DHB:
3. The delta is computed
4. It is added to the delta sequence (last 4 deltas)
5. The last address field is updated
6. Only the 4 most recent deltas are maintained
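The miss and hit steps above can be sketched as follows. This is a minimal model under stated assumptions, not the hardware design: the class names, the capacity of 16 entries, and the LRU eviction policy are hypothetical choices, since the slides do not specify how the DHB picks a victim. The entry fields mirror the ones listed on the DHB slide.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DHBEntry:
    last_offset: int                                      # Last Add.
    last_deltas: list = field(default_factory=list)       # Last 4 Deltas
    last_predictor: Optional[int] = None                  # Last Predictor (DPT level)
    times_used: int = 0                                   # Num. Times Used
    last_prefetched: list = field(default_factory=list)   # Last Four Prefetched Offsets


class DeltaHistoryBuffer:
    def __init__(self, capacity=16):  # capacity is an assumption
        self.capacity = capacity
        self.entries = OrderedDict()  # page number -> DHBEntry, oldest first

    def access(self, page, offset):
        """Handle one prefetch activation event for (page, offset)."""
        entry = self.entries.get(page)
        if entry is None:
            # DHB miss: evict an entry (LRU here, an assumed policy) and
            # record the offset in the last address field.
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)
            entry = DHBEntry(last_offset=offset, times_used=1)
            self.entries[page] = entry
            return entry
        # DHB hit: compute the delta, append it, keep only the 4 newest.
        self.entries.move_to_end(page)
        delta = offset - entry.last_offset
        entry.last_deltas = (entry.last_deltas + [delta])[-4:]
        entry.last_offset = offset
        entry.times_used += 1
        return entry


dhb = DeltaHistoryBuffer(capacity=2)
for off in [10, 11, 13, 16, 20, 25]:
    e = dhb.access(page=7, offset=off)
print(e.last_deltas)  # [2, 3, 4, 5]
```

Using a dictionary keyed by page number stands in for the fully associative search; real hardware would compare all tags in parallel.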
8
Proposal_ Prefetch Activation Events (PAE)
PAE occurs → a fully associative search in the DHB to find an entry with a matching page number
DHB hit (after the DHB entry has been updated with the most recent delta):
1. The newly updated delta history is used to index the DPT
2. The DHB entry stores the ID of the DPT table used for the prediction
9
Proposal_ Offset Prediction Table (OPT)
Offset Prediction Table fields: Offset | Delta prediction | Accuracy (1 bit)
· Offset - page offset
· Delta prediction - predicted delta for the second access to the page
· Accuracy - 1-bit accuracy field
Accuracy bit update (old → new)
· OPT prediction = delta: 1 → 1
· OPT prediction = delta: 0 → 1
· OPT prediction ≠ delta: 1 → 0
· OPT prediction ≠ delta: *0 → 0
* if the accuracy bit was already 0, the old predicted delta is replaced with the new observed delta
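The four accuracy-bit transitions can be written as a small update function. This is a sketch under assumptions: the function name `update_opt`, the dictionary layout, and the choice to allocate a new entry with accuracy 0 are hypothetical, because the slide only describes updates to an existing entry.

```python
def update_opt(table, first_offset, observed_delta):
    """Update the OPT entry for a page's first offset with the observed delta."""
    entry = table.get(first_offset)
    if entry is None:
        # Assumed allocation state: the slide does not give an initial accuracy.
        table[first_offset] = {"delta": observed_delta, "accuracy": 0}
        return
    if entry["delta"] == observed_delta:
        entry["accuracy"] = 1              # match: 1 -> 1 and 0 -> 1
    elif entry["accuracy"] == 1:
        entry["accuracy"] = 0              # mismatch: 1 -> 0, prediction kept
    else:
        entry["delta"] = observed_delta    # mismatch at 0: replace the delta


opt = {}
for delta in [3, 3, 4, 4]:
    update_opt(opt, first_offset=5, observed_delta=delta)
print(opt[5])  # {'delta': 4, 'accuracy': 0}
```

The single bit gives one mismatch of hysteresis: a stored delta survives one wrong prediction and is only replaced on the second.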
10
Proposal_ Delta Prediction Table (DPT)
A key feature of the DPT → it is not just a single table, but rather a set of cascaded tables
· Deltas - delta history (obtained from the DHB) used as the keys
· Pred - delta predictions used as the values
· Accuracy - 2-bit accuracy counter
· nMRU - 1-bit nMRU value
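A cascaded lookup over tables keyed by delta histories of increasing length might look like the sketch below. Two points are assumptions rather than statements from the slides: that the table with the longest matching history takes priority, and that there are three tables. The function name `lookup_dpt` and the table contents are hypothetical, and accuracy counters and nMRU bits are left out to keep the example short.

```python
def lookup_dpt(tables, history):
    """Return (level, predicted_delta) from the longest matching history, or None.

    tables[i] maps a tuple of the last i+1 deltas to a predicted next delta.
    """
    deepest = min(len(tables), len(history)) - 1
    for level in range(deepest, -1, -1):        # try the longest history first
        key = tuple(history[-(level + 1):])
        if key in tables[level]:
            return level, tables[level][key]
    return None


# Hypothetical contents learned from the (-24, +25) pattern.
tables = [
    {(25,): -24, (-24,): 25},   # level 0: keyed by 1 delta
    {(-24, 25): -24},           # level 1: keyed by 2 deltas
    {},                         # level 2: keyed by 3 deltas
]

print(lookup_dpt(tables, [-24, 25]))  # (1, -24)
print(lookup_dpt(tables, [1, 25]))    # (0, -24)
print(lookup_dpt(tables, [7]))        # None
```

The returned level corresponds to the value a DHB entry would record in its Last Predictor field.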
11
Proposal_ Delta Prediction Table (DPT)
DPT updated by PAE
· any new delta patterns will be allocated in the DPT
· accuracy bits can be updated
· if the prediction accuracy is sufficiently low, the delta prediction field may be updated to reflect the new delta
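The update rules above can be sketched with a saturating 2-bit counter. The slide does not say what "sufficiently low" means, so this example assumes the prediction is replaced only when the counter is already 0 on a mismatch; the initial counter value of 0 and the name `update_dpt` are also assumptions. Table capacity and nMRU replacement are not modeled.

```python
def update_dpt(table, key, observed_delta):
    """Update one DPT table for the delta history `key` with the observed delta."""
    entry = table.get(key)
    if entry is None:
        # New delta pattern: allocate it (initial accuracy of 0 is assumed).
        table[key] = {"pred": observed_delta, "accuracy": 0}
        return
    if entry["pred"] == observed_delta:
        entry["accuracy"] = min(entry["accuracy"] + 1, 3)  # 2-bit saturating up
    elif entry["accuracy"] == 0:
        entry["pred"] = observed_delta    # accuracy low enough (assumed threshold)
    else:
        entry["accuracy"] -= 1


dpt = {}
for delta in [25, 25, 25, 7, 7, 7]:
    update_dpt(dpt, (-24,), delta)
print(dpt[(-24,)])  # {'pred': 7, 'accuracy': 0}
```

In the run shown, two correct predictions raise the counter to 2, two mismatches drain it back to 0, and the third mismatch replaces the predicted delta.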
12
Proposal_ Multi-Degree Prefetch
13
Result_ Simulator Parameters
14
Result_ Performance Evaluation
VLDP performs
· 17.2% better than FDP
· 8.5% better than SBP
· 5.8% better than AMPM
15
Result_ Comparing VLDP to Prefetchers that use the Program Counter
· VLDP has an accuracy of 61%
· GHB has an accuracy of 33%
· VLDP performs 7.1% better than GHB PC/DC
· VLDP performs 7.6% better than SMS
16
Result_ Cache Misses and Prefetcher Coverage
17
Result_ Prefetcher Accuracy and DRAM accesses
DRAM accesses
· FDP has 3.7%
· SMS has 60.5%
· SBP has 22.6%
· GHB has 5.4%
· AMPM has 13.4%
· VLDP has 17.2%
18
Result_ Sensitivity Analysis