Aamer Jaleel, Kevin B. Theobald, Simon C. Steely Jr. , Joel Emer

Presentation transcript:

Aamer Jaleel, Kevin B. Theobald, Simon C. Steely Jr., Joel Emer (Intel Corporation). High Performance Cache Replacement Using Re-Reference Interval Prediction (RRIP). The ACM/IEEE International Symposium on Computer Architecture (ISCA), June 19–23, 2010, Saint-Malo, France. Presented by Chien-Chih (Paul) Chao and Chih-Chiang (Michael) Chang. Instructor: Dr. Ann Gordon-Ross

Agenda: Motivation. Background: Least Recently Used (LRU) policy, Dynamic Insertion Policy (DIP), Least Frequently Used (LFU) policy, Not Recently Used (NRU) policy. Re-Reference Interval Prediction (RRIP): Static RRIP, Dynamic RRIP. Experimental Methodology. Results and Analysis. Conclusion.

Cache Replacement Policies Cache memory stores the data items that are frequently required by the processor. When the cache is full, the replacement algorithm must choose which items to discard to make room for new ones. This builds on the "memory hierarchy" concept ("the librarian example"). http://en.wikipedia.org/wiki/Cache_algorithms

Motivation Efficient last-level cache (LLC) utilization avoids long-latency cache misses to main memory. We need a practical cache replacement policy that is not only thrash-resistant but also scan-resistant. (The LLC is also known as the L3 cache.)

Background LRU Replacement Policies. DIP (an improvement of LRU). Cache Access Patterns. The LRU/LFU hybrid replacement policy. Comparison of DIP and Hybrid (LRU/LFU). Both industry and academia have produced an impressive amount of research dedicated to improving the performance of replacement policies. While we cannot describe all replacement policies that exist in the literature, we summarize the prior art that most closely relates to improving LLC performance by targeting cache blocks that are dead upon cache insertion.

LRU Replacement Policies Least Recently Used (LRU) is a commonly used cache replacement policy. It maintains an LRU chain from the MRU position to the LRU position; viewed through the RRIP lens, this chain represents the order in which blocks are predicted to be re-referenced: the MRU position predicts a near-immediate re-reference, the LRU position a distant one. LRU performs well with high data locality, but poorly when re-references only occur in the distant future.
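The LRU chain can be sketched in a few lines of Python (an illustrative model, not the slides' hardware implementation; the class name and interface are assumptions):

```python
from collections import OrderedDict

class LRUSet:
    """One cache set managed as an LRU chain: MRU at one end, LRU at the other."""
    def __init__(self, ways):
        self.ways = ways
        self.chain = OrderedDict()  # insertion order tracks recency

    def access(self, block):
        if block in self.chain:           # hit: move block to the MRU position
            self.chain.move_to_end(block)
            return True
        if len(self.chain) >= self.ways:  # miss on a full set: evict the LRU block
            self.chain.popitem(last=False)
        self.chain[block] = None          # insert the new block at the MRU position
        return False
```

On a 2-way set, the access stream a, b, a, c, b hits only on the second a: c evicts b (the LRU block), so the later b misses again.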

Dynamic Insertion Policy (DIP) improves LRU replacement in situations where the re-reference interval is in the distant future by dynamically changing the re-reference prediction from a near-immediate re-reference interval to a distant re-reference interval. However, both DIP and LRU fail to make accurate predictions when mixed re-reference patterns occur. A scan is a burst of references to data whose re-reference interval is in the distant future, i.e., any sequence of one-time-use requests.

Cache Access Patterns Recency-friendly Access Pattern Thrashing Access Pattern Streaming Access Pattern Mixed Access Pattern

Cache Access Patterns (cont.) Recency-friendly Access Pattern

Cache Access Patterns (cont.) Thrashing Access Pattern

Cache Access Patterns (cont.) Streaming Access Pattern

Cache Access Patterns (cont.) Mixed Access Pattern

LRU / LFU Replacement Policy Least Frequently Used (LFU) predicts that frequently accessed blocks will be re-referenced in the near-immediate future and infrequently accessed blocks in the distant future; frequency is measured by a per-block counter. Features: DIP is thrash-resistant, while the LRU/LFU hybrid is scan-resistant.
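A minimal LFU sketch, assuming a per-block frequency counter and eviction of the least-frequent block (the class name and tie-breaking are illustrative choices, not the paper's exact mechanism):

```python
class LFUSet:
    """One cache set under LFU: evict the block with the lowest access count."""
    def __init__(self, ways):
        self.ways = ways
        self.count = {}  # block -> access-frequency counter

    def access(self, block):
        if block in self.count:
            self.count[block] += 1  # hit: frequent blocks accumulate high counts
            return True
        if len(self.count) >= self.ways:
            victim = min(self.count, key=self.count.get)  # evict least-frequent block
            del self.count[victim]
        self.count[block] = 1       # new blocks start with a count of 1
        return False
```

This is why LFU is scan-resistant: one-time-use scan blocks never raise their counters above 1, so they evict each other rather than the frequently reused working set.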

Comparison of DIP and Hybrid (LRU/LFU)

Re-Reference Interval Prediction Not Recently Used (NRU) replacement policy. Static RRIP: SRRIP with Hit Priority, SRRIP with Frequency Priority. Dynamic RRIP. Behavior for a mixed access pattern. Experimental methodology and results: simulator, benchmarks.

NRU replacement policy Motivation: LRU performs poorly on mixed access patterns, and chain-based LRU is impractical for highly associative caches. The nru-bit: a value of '0' implies the block was recently used and is predicted to be re-referenced in the near-immediate future; a value of '1' implies the block was not recently used and is predicted to be re-referenced in the distant future.

Static RRIP Motivation: one bit of information is not enough; NRU cannot identify non-scan blocks in a mixed access pattern. M-bit Re-Reference Prediction Values (RRPV): 2^M possible RRPVs enable intermediate re-reference interval predictions. Hit Priority (HP): updates the RRPV to near-immediate on a hit, prioritizing replacement of blocks that receive no hits. Frequency Priority (FP): decrements the RRPV register on cache hits, prioritizing replacement of infrequently re-referenced blocks.

Behavior of LRU Mixed Access Pattern: a1, a2, a2, a1, b1, b2, b3, b4, a1, a2. Cache Hit: move the block to the MRU position. Cache Miss: replace the LRU block and move the new block to the MRU position.

Behavior of NRU Mixed Access Pattern: a1, a2, a2, a1, b1, b2, b3, b4, a1, a2. Cache Hit: set the block's nru-bit to '0'. Cache Miss: (1) search for the first '1' from the left; (2) if a '1' is found, go to step (5); (3) set all nru-bits to '1'; (4) go to step (1); (5) replace the block and set its nru-bit to '0'.
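The NRU steps above can be modeled in a short Python sketch (illustrative only; the parallel-list layout and function name are assumptions):

```python
def nru_access(blocks, bits, tag):
    """One NRU set access. blocks/bits are parallel per-way lists;
    nru-bit 0 = recently used, nru-bit 1 = eviction candidate."""
    if tag in blocks:                  # hit: mark the block recently used
        bits[blocks.index(tag)] = 0
        return True
    while True:                        # miss: find the first '1' from the left
        for i, b in enumerate(bits):
            if b == 1:
                blocks[i] = tag        # replace and mark recently used
                bits[i] = 0
                return False
        bits[:] = [1] * len(bits)      # no '1' found: reset all nru-bits and retry
```

Driving this model with the slide's mixed pattern on a 4-way set yields only 2 hits: like LRU, NRU lets the scan b1–b4 displace the active blocks a1 and a2.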

Behavior of 2-bit SRRIP with HP Mixed Access Pattern: a1, a2, a2, a1, b1, b2, b3, b4, a1, a2. Cache Hit: set the block's RRPV to '0'. Cache Miss: (1) search for the first '3' from the left; (2) if a '3' is found, go to step (5); (3) increment all RRPVs; (4) go to step (1); (5) replace the block and set its RRPV to '2'.
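The same kind of sketch for 2-bit SRRIP with Hit Priority (again an illustrative model; the data layout is an assumption):

```python
MAX_RRPV = 3  # 2-bit RRPV: values 0..3

def srrip_hp_access(blocks, rrpv, tag):
    """One set access under 2-bit SRRIP-HP. rrpv holds each way's
    Re-Reference Prediction Value (0 = near-immediate, 3 = distant)."""
    if tag in blocks:
        rrpv[blocks.index(tag)] = 0      # hit: predict near-immediate re-reference
        return True
    while True:
        for i, v in enumerate(rrpv):
            if v == MAX_RRPV:            # find a block predicted 'distant'
                blocks[i] = tag
                rrpv[i] = 2              # insert with a long re-reference interval
                return False
        rrpv[:] = [v + 1 for v in rrpv]  # no candidate: age every block
```

On the slide's mixed pattern with a 4-way set this scores 4 hits versus LRU's 2: because a1 and a2 were promoted to RRPV 0 on their hits, the scan blocks (inserted at RRPV 2) age out and evict each other, leaving a1 and a2 resident.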

Dynamic RRIP Motivation: SRRIP is not thrash-resistant. Bimodal RRIP (BRRIP), similar to the Bimodal Insertion Policy of DIP, inserts the majority of cache blocks with a distant re-reference prediction and only infrequently inserts blocks with a long re-reference interval. Set Dueling: choose between scan-resistant SRRIP and thrash-resistant BRRIP by using two Set Dueling Monitors and a single policy selection counter.
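The set-dueling selection logic can be sketched as follows (names, counter width, and the MSB convention are illustrative assumptions in the spirit of DIP-style set dueling, not the paper's exact design):

```python
class SetDueling:
    """A few 'leader' sets always use SRRIP, a few always use BRRIP; a single
    saturating counter (PSEL) tracks which leader group misses less."""
    def __init__(self, bits=10):
        self.max = (1 << bits) - 1
        self.psel = self.max // 2          # start undecided

    def miss_in_srrip_leader(self):        # SRRIP leader missed: vote for BRRIP
        self.psel = min(self.psel + 1, self.max)

    def miss_in_brrip_leader(self):        # BRRIP leader missed: vote for SRRIP
        self.psel = max(self.psel - 1, 0)

    def follower_policy(self):
        # the high half of PSEL's range picks the winner for all follower sets
        return "BRRIP" if self.psel > self.max // 2 else "SRRIP"
```

All remaining ("follower") sets simply use whichever policy the PSEL counter currently favors, so the cache adapts at runtime to whichever of the two policies is missing less.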

Experimental Methodology Simulator: CMP$im, a 4-way out-of-order core with a 128-entry reorder buffer and a 3-level cache hierarchy. Benchmarks: 5 workloads from SPEC CPU2006 and 9 "real world" workloads (PC games, multimedia, server).

Results of SRRIP-FP Reduces MPKI by 5-18% and outperforms LRU by an average of 2.5%.

Results of SRRIP-HP Reduces MPKI by 5-15% and outperforms LRU by an average of 5%.

SRRIP-HP Sensitivity to Cache Size SRRIP is insensitive to RRPV width when M > 3; wider RRPVs retain blocks for longer periods, and a 2-bit or 3-bit RRPV is sufficient to be scan-resistant.

DRRIP Performance Improves performance by an average of 5% over SRRIP.

Comparing to Other Policies Based on a single-core processor with a 16-way 2MB LLC: RRIP requires less hardware than LRU yet outperforms LRU on average, and RRIP requires 2.5X less hardware than HYB.
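The hardware claim can be sanity-checked with quick arithmetic (a back-of-the-envelope sketch under the common assumption that true LRU encodes each block's position in the recency chain; the paper's exact bookkeeping may differ):

```python
import math

ways = 16                                        # 16-way LLC from the comparison above
lru_bits_per_block = math.ceil(math.log2(ways))  # encode a position in the LRU chain
rrip_bits_per_block = 2                          # 2-bit RRPV per block

lru_set_bits = lru_bits_per_block * ways         # replacement state per set, LRU
rrip_set_bits = rrip_bits_per_block * ways       # replacement state per set, RRIP
```

Under this accounting, LRU needs 4 bits per block (64 bits per set) while 2-bit RRIP needs only 2 bits per block (32 bits per set), consistent with RRIP requiring less hardware than LRU.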

Conclusion RRIP predicts intermediate re-reference intervals between the near-immediate and distant re-reference intervals. SRRIP needs only 2 bits per block to be scan-resistant. DRRIP is both scan-resistant and thrash-resistant. SRRIP and DRRIP outperform LRU by an average of 4% and 10%, respectively.