
1 Enhancing Signature Path Prefetching with Perceptron Prefetch Filtering
Eshan Bhatia (1), Gino Chacon (1), Elvira Teran (2), Paul V. Gratz (1), Daniel A. Jiménez (3)
(1) Texas A&M University
(2) Texas A&M International University
(3) Texas A&M University / Barcelona Supercomputing Center

2 Introduction
Design Space: Standalone L1D, L2C and LLC Prefetchers
- Distribution of hardware budget across three prefetchers
- Interaction among the prefetchers
- Control over placing the incoming prefetch line (L1D vs L2C vs LLC)

3 Key Ideas
- Aggressive L2C Prefetching
  - Signature Path Prefetcher (SPP) [Kim, MICRO '16]
  - Perceptron-based Prefetch Filtering (PPF) [Bhatia, ISCA '19]
- Optimizing Prefetch Queue Sharing
  - Page based resource sharing
- Minimal LLC Prefetching
  - Lack of information
  - LLC is a shared resource among cores
- Coordination between levels
  - Minimizing impact of noisy prefetches on lower level prefetchers

4 Page Based Resource Sharing
- Prefetch Queue (PQ) limited in number
  - Valuable resource for L1D / L2C
- Aggressive (but still accurate) prefetching
  - Takes the current page deep into the speculation path
  - Blocks PQ resources for other pages
  - Timing disparity between multiple pages with interleaved accesses
- Efficient Resource Utilization
  - Track number of distinct pages in last few memory accesses
  - Divide PQ resource over those pages
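The resource-sharing idea on this slide can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the class name `PageBudget`, the 4 KiB page size, the 16-entry queue, and the 8-access history window are all assumptions, since the slide gives no concrete values.

```python
from collections import deque

PAGE_SHIFT = 12    # assumption: 4 KiB pages
PQ_ENTRIES = 16    # assumption: prefetch queue capacity
HISTORY_LEN = 8    # assumption: how many "last few" accesses are tracked


class PageBudget:
    """Hypothetical helper that splits PQ slots across recently active pages."""

    def __init__(self, pq_entries=PQ_ENTRIES, history_len=HISTORY_LEN):
        self.pq_entries = pq_entries
        # Sliding window of the page numbers touched by recent accesses.
        self.recent_pages = deque(maxlen=history_len)

    def record_access(self, addr):
        self.recent_pages.append(addr >> PAGE_SHIFT)

    def budget_per_page(self):
        # Divide the queue evenly over the distinct pages in the window,
        # always leaving at least one slot per page.
        distinct = len(set(self.recent_pages))
        if distinct == 0:
            return self.pq_entries
        return max(1, self.pq_entries // distinct)
```

With a single active page the whole queue is available to it; as more pages interleave, each page's share shrinks so that one page running deep into its speculation path cannot starve the others.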

5 L1D Prefetcher: Next-N-Lines
- Fetches N consecutive lines relative to the current demand address
  - N determined through PQ resource availability
- Page-level throttling
  - Tracks the per-page access pattern for the last two accesses
  - Scores each page as +1-delta friendly or averse
  - Throttles prefetching for averse pages
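One possible reading of the throttled Next-N-Lines scheme is sketched below. The slide does not spell out the scoring rule, so several things here are assumptions: a page counts as "friendly" when the delta between its last two accesses is exactly +1 line, unseen pages start out friendly, prefetching stops at the page boundary, and lines are 64 bytes in 4 KiB pages. The class name `NextNLine` is hypothetical.

```python
BLOCK_SHIFT = 6    # assumption: 64-byte cache lines
PAGE_SHIFT = 12    # assumption: 4 KiB pages


class NextNLine:
    """Hypothetical next-N-line prefetcher with per-page throttling."""

    def __init__(self):
        self.last_line = {}   # page -> line number of the previous access
        self.friendly = {}    # page -> True if the last delta was +1

    def access(self, addr, free_pq_slots):
        """Return the byte addresses to prefetch for this demand access.

        free_pq_slots plays the role of N: it is bounded by PQ availability.
        """
        line = addr >> BLOCK_SHIFT
        page = addr >> PAGE_SHIFT

        # Score the page from its last two accesses.
        prev = self.last_line.get(page)
        if prev is not None:
            self.friendly[page] = (line - prev == 1)
        self.last_line[page] = line

        # Throttle: issue nothing for pages scored as averse.
        # Assumption: pages with no history yet are treated as friendly.
        if not self.friendly.get(page, True):
            return []

        prefetches = []
        for i in range(1, free_pq_slots + 1):
            candidate = (line + i) << BLOCK_SHIFT
            if candidate >> PAGE_SHIFT != page:
                break  # assumption: do not cross the page boundary
            prefetches.append(candidate)
        return prefetches
```

A real design would likely use a small bounded table rather than unbounded dictionaries; the dictionaries only keep the sketch short.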

6 L2C Underlying Prefetcher: SPP
- Lookahead prefetcher
  - Uses the previous prefetch suggestion to trigger new speculation
  - Recursively iterates and keeps compounding the confidence
  - Stops when the confidence falls below a certain threshold
- Threshold (hyperparameter) is an indication of aggressiveness
  - Lower threshold -> more aggressive -> more coverage -> less accuracy
  - Pre-defined trade-off between coverage and accuracy
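The lookahead loop can be illustrated with a heavily simplified sketch. It is not the SPP implementation: the pattern table is a plain dictionary, only the single most likely delta is followed at each step, and the signature hash, the 12-bit signature width, the 64-line page, and the depth limit are assumptions made for illustration. The names `next_signature` and `lookahead` are hypothetical.

```python
SIG_MASK = (1 << 12) - 1   # assumption: 12-bit signatures
LINES_PER_PAGE = 64        # assumption: 4 KiB pages, 64-byte lines


def next_signature(sig, delta):
    # Illustrative hash that folds the newest delta into the signature.
    return ((sig << 3) ^ (delta & 0x7F)) & SIG_MASK


def lookahead(pattern_table, sig, offset, threshold, max_depth=16):
    """Walk the speculation path starting from (sig, offset).

    pattern_table maps signature -> {delta: occurrence count}.
    Returns a list of (line offset within page, path confidence).
    """
    suggestions = []
    confidence = 1.0
    for _ in range(max_depth):
        deltas = pattern_table.get(sig)
        if not deltas:
            break
        total = sum(deltas.values())
        delta, count = max(deltas.items(), key=lambda kv: kv[1])
        # Compound the confidence along the path.
        confidence *= count / total
        if confidence < threshold:
            break
        offset += delta
        if not 0 <= offset < LINES_PER_PAGE:
            break
        suggestions.append((offset, confidence))
        # The suggestion just made becomes the trigger for the next step.
        sig = next_signature(sig, delta)
    return suggestions
```

Running the same table with a lower threshold yields a longer path, which is the coverage-versus-accuracy knob described on the slide:

```python
table = {0: {1: 9, 2: 1}, 1: {1: 8, 3: 2}, 9: {1: 5, 2: 5}}
deep = lookahead(table, sig=0, offset=10, threshold=0.3)
shallow = lookahead(table, sig=0, offset=10, threshold=0.5)
print(len(deep), len(shallow))
```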

7 Enhanced SPP
Decoupled coverage and accuracy concerns
- SPP enhanced to its most aggressive extreme
  - Helps capture complex memory access patterns
  - Increases coverage
- Perceptron Filtering (PPF) takes care of accuracy

8 Hashed Perceptron Model
- Use feature values to index into distinct tables
  - Example: PC, memory address, etc.
- Prediction: lookup, summation, threshold
  - Use each x_i value to index into the table of the corresponding w_i
- Learning occurs when ground truth is known
  - Positive outcome: increment each feature's partial prediction weight
  - Negative outcome: decrement each feature's partial prediction weight
- No multiplication, no division, no complex back-propagation
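A minimal sketch of a hashed perceptron, under stated assumptions, might look like the following. The class name `HashedPerceptron`, the use of a plain modulo as the index hash, and the table sizes are illustrative choices; the default 5-bit weight width follows the PPF Architecture slide.

```python
class HashedPerceptron:
    """Hypothetical hashed perceptron: one weight table per feature."""

    def __init__(self, table_sizes, weight_bits=5):
        self.tables = [[0] * size for size in table_sizes]
        # Signed saturating range, e.g. -16..15 for 5 bits.
        self.w_max = (1 << (weight_bits - 1)) - 1
        self.w_min = -(1 << (weight_bits - 1))

    def _indices(self, features):
        # Assumption: a modulo stands in for whatever hash hardware uses.
        return [f % len(t) for f, t in zip(features, self.tables)]

    def predict(self, features):
        # Lookup and summation only; the caller compares against a threshold.
        return sum(t[i] for t, i in zip(self.tables, self._indices(features)))

    def train(self, features, positive):
        # Increment or decrement every selected weight, with saturation.
        step = 1 if positive else -1
        for t, i in zip(self.tables, self._indices(features)):
            t[i] = max(self.w_min, min(self.w_max, t[i] + step))
```

Because each feature value selects its weight directly, prediction needs no multiplications: the sum of the selected weights is the perceptron output.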

9 PPF Architecture
- Baseline prefetcher: SPP
  - Modified for high coverage
- Perceptron Weights Tables
  - Tables of 5-bit up-down saturating counters
  - 1 table per feature
  - Variable depth, independent indexing
- Prefetch and Reject Tables
  - Record prefetches for future training

10 PPF Design
- Prefetch suggestions tested using PPF
  - Outcome and indexing metadata recorded in Prefetch / Reject Table
- Subsequent feedback of a prior prefetch
  - Same perceptron weights re-indexed and updated by +1 / -1

11 Putting Pieces Together
Single Core Configuration
- L1D: Enhanced Next-N-line
- L2C: PPF with SPP
  - Triggered on all accesses to L2C
  - Can place prefetches in L2C or LLC
- LLC: Next Line prefetcher
  - Triggered on demand accesses and only last prefetch from L1D reaching LLC
  - Uses the metadata communication path between the prefetchers
- Overhead: KBs
Multi Core Configuration
- L1D: No Prefetching
- L2C: PPF with SPP
  - Triggered on all accesses to L2C
  - Can place prefetches in L2C or LLC
- LLC: SPP (without PPF)
  - Separate tables for each core
  - Modified to be less aggressive than the original SPP (LLC is a shared resource)
- Overhead: KBs

12 Results
Improvement reported over no prefetching
- Single Core: 40.4%
- Multi Core: 20.3%

13 Future Work
- Better baseline prefetchers for PPF
- Interaction between the prefetchers
- Metadata communication path between the levels

14 Thank you!

15 Backup Slides

16 L2C Underlying Prefetcher: SPP

