ECE8833 Polymorphous and Many-Core Computer Architecture Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Lecture 6 Fair Caching Mechanisms.


ECE8833 Polymorphous and Many-Core Computer Architecture Prof. Hsien-Hsin S. Lee School of Electrical and Computer Engineering Lecture 6 Fair Caching Mechanisms for CMP

ECE8833 H.-H. S. Lee Cache Sharing in CMP [Kim, Chandra, Solihin, PACT'04]
(Figure sequence: two processor cores, each with a private L1 cache, sharing an L2 cache. Thread t1 first fills the shared L2; when t2 starts running, t1's resident data crowds it out, and t2's throughput is significantly reduced due to unfair cache sharing. Slide courtesy: Seongbeom Kim, D. Chandra, and Y. Solihin)

Shared L2 Cache Space Contention (Slide courtesy: Seongbeom Kim, D. Chandra, and Y. Solihin)

Impact of Unfair Cache Sharing
Compare uniprocessor scheduling with 2-core CMP scheduling: gzip will get more time slices than the other threads if it is set to run at a higher priority, yet with unfair cache sharing it could still run slower than the others (priority inversion). This can further slow down the other processes (starvation), and the overall throughput is reduced. The goal is instead a uniform slowdown across threads. (Figure: example time slices of threads t1..t4 on processors P1 and P2.)

Stack Distance Profiling Algorithm [Qureshi+, MICRO-39]
(Figure: the cache tag array is augmented with one hit counter per recency position, CTR Pos 0 (MRU) through CTR Pos 3 (LRU) for a 4-way example; a hit at recency position i increments CTR Pos i, and accesses that hit in no position are counted as misses, e.g., Misses = 25.)

Stack Distance Profiling
– One counter C_i per cache way; C_{>A} is the counter for misses.
– The counters show the reuse frequency of each recency position in the cache.
– They can be used to predict the misses for any associativity smaller than A. For example, on an 8-way cache, misses for a 2-way cache for gzip = C_{>A} + Σ C_i, i = 3 to 8.
– art does not need all of its space, likely because of poor temporal locality. What happens if art's space is halved and given to gzip?
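To make the prediction step concrete, here is a small sketch (my own, not from the lecture; the counter values are illustrative) of how stack-distance counters predict the miss count for a smaller associativity:

```python
# Sketch: predicting misses for a smaller associativity from
# stack-distance counters. hits[i] is the hit count at recency
# position i (0 = MRU) in an A-way cache; misses is the C_{>A} counter.
def predicted_misses(hits, misses, ways):
    """Miss count if the cache had only `ways` ways (ways <= len(hits))."""
    # Hits beyond recency position ways-1 would have been misses.
    return misses + sum(hits[ways:])

# Hypothetical 8-way profile.
hits = [50, 20, 10, 5, 4, 3, 2, 1]
c_gt_a = 25                                 # misses in the full 8-way cache
assert predicted_misses(hits, c_gt_a, 8) == 25   # full size: unchanged
assert predicted_misses(hits, c_gt_a, 2) == 50   # 25 + (10+5+4+3+2+1)
```

This is exactly why a single profiling pass at full associativity suffices: LRU is a stack algorithm, so every smaller size can be evaluated from the same counters.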

Fairness Metrics [Kim et al. PACT'04]
The goal is uniform slowdown: each thread's execution time when it shares the cache with others, divided by its execution time when it runs alone, should be equal across threads, i.e. T_shared,i / T_alone,i = T_shared,j / T_alone,j.
Since execution time is hard to track online, miss rates stand in for it. Ideally, try to equalize the ratio of miss increase of each thread: with X_i = MissRate_shared,i / MissRate_alone,i, we want to minimize the differences |X_i − X_j| across threads. (Slide courtesy: Seongbeom Kim, D. Chandra, and Y. Solihin)
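A small sketch (my own paraphrase; the precise metric definitions are in the PACT'04 paper) of computing such a fairness metric from per-thread miss rates:

```python
# Sketch: an M3-style fairness metric. X_i = MissRate_shared_i /
# MissRate_alone_i; the metric sums the pairwise differences
# |X_i - X_j|, which is zero under perfectly uniform slowdown.
def fairness_m3(alone, shared):
    x = [s / a for s, a in zip(shared, alone)]
    return sum(abs(x[i] - x[j])
               for i in range(len(x)) for j in range(i + 1, len(x)))

# Miss rates in percent. Fair: both threads' rates tripled.
assert fairness_m3([20, 5], [60, 15]) == 0.0
# Unfair: only P2's rate tripled while P1's stayed flat.
assert fairness_m3([20, 5], [20, 15]) == 2.0
```

The partitioner's job is then to move capacity so that this sum shrinks toward zero.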

Partitionable Cache Hardware
Modified LRU cache replacement policy with per-thread counters [G. E. Suh et al., HPCA 2002]. Example: the current partition is P1: 448B, P2: 576B while the target partition is P1: 384B, P2: 640B. On a P2 miss, P1 is over its target, so P1's LRU line is replaced, bringing the current partition to the target (P1: 384B, P2: 640B). Partition granularity could be as coarse as one entire cache way. (Slide courtesy: Seongbeom Kim, D. Chandra, and Y. Solihin)
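The replacement decision above can be sketched as follows (a minimal sketch under my own naming; the real hardware tracks occupancy with per-thread counters, not a dictionary):

```python
# Sketch: modified LRU replacement with per-thread occupancy counters.
# current[t] is thread t's bytes resident in the set/cache; target[t]
# is its target partition size.
def pick_victim_thread(miss_thread, current, target):
    # Evict from any thread holding more than its target partition...
    for t, occupied in current.items():
        if occupied > target[t]:
            return t               # evict that thread's LRU line
    # ...otherwise the miss-causing thread replaces its own LRU line.
    return miss_thread

# P1 is over its 384B target, so a P2 miss evicts one of P1's lines.
assert pick_victim_thread("P2", {"P1": 448, "P2": 576},
                          {"P1": 384, "P2": 640}) == "P1"
```

Each eviction of an over-quota thread nudges the current partition toward the target without any bulk data movement.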

Dynamic Fair Caching Algorithm (example: optimizing the M3 metric)
Each thread keeps three sets of counters: MissRate alone (the miss rate when the process runs alone, obtained from stack distance profiling), MissRate shared (the dynamic miss rate while running with the shared cache), and the Target Partition size. The counters are evaluated once per repartitioning interval; an interval of 10K accesses was found to be the best. (Slide courtesy: Seongbeom Kim, D. Chandra, and Y. Solihin)

Dynamic Fair Caching: Example Walkthrough
1st interval: MissRate alone is P1: 20%, P2: 5%; the measured MissRate shared is P1: 20%, P2: 15%; the Target Partition starts at P1: 256KB, P2: 256KB.
Repartition! Evaluate M3: P1: 20%/20% = 1.0 vs. P2: 15%/5% = 3.0, so capacity shifts to P2: the Target Partition becomes P1: 192KB, P2: 320KB (partition granularity: 64KB).
2nd interval: with the new target, the measured MissRate shared becomes P1: 20%, P2: 10%.
Repartition! Evaluate M3: P1: 20%/20% = 1.0 vs. P2: 10%/5% = 2.0; the Target Partition becomes P1: 128KB, P2: 384KB.
3rd interval: the measured MissRate shared becomes P1: 25%, P2: 9%.

Dynamic Fair Caching: Repartition with Rollback
Compute Δ = MR_old − MR_new for the thread that received extra capacity; if P2's Δ < T_rollback, the gain was too small to justify hurting P1, so roll back to the previous Target Partition (P1: 192KB, P2: 320KB). The best T_rollback threshold was found to be 20%.
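The rollback test can be sketched as below. Note one assumption: the slide does not say whether the 20% threshold is absolute or relative to the old miss rate; this sketch treats it as an absolute miss-rate delta.

```python
# Sketch: after growing a thread's partition, roll back if that
# thread's miss rate did not improve by at least the threshold.
T_ROLLBACK = 0.20   # best threshold per the slides (interpretation: absolute)

def should_rollback(mr_old, mr_new, threshold=T_ROLLBACK):
    delta = mr_old - mr_new          # improvement in miss rate
    return delta < threshold

# 3rd interval of the example: P2 improved only from 10% to 9%.
assert should_rollback(0.10, 0.09)       # tiny gain: roll back
assert not should_rollback(0.30, 0.05)   # large gain: keep new partition
```

Rollback keeps the algorithm from oscillating when shifting capacity no longer helps the disadvantaged thread.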

Generic Repartitioning Algorithm
– Sort the processes by how much they are hurt (their miss-increase ratio); pick the largest and the smallest as a pair for repartitioning, shifting capacity from the least-hurt to the most-hurt process.
– Repeat for all remaining candidate processes.
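A sketch of the pairing step (my paraphrase of the slide; the 64KB step reuses the granularity from the earlier example):

```python
# Sketch: pair the most- and least-hurt processes and shift one
# partition unit between them, then repeat inward over the sorted list.
GRANULARITY = 64  # KB, per the earlier example

def repartition(ratios, targets, step=GRANULARITY):
    """ratios: {proc: MissRate_shared/MissRate_alone}; targets: {proc: KB}."""
    order = sorted(ratios, key=ratios.get)   # least hurt ... most hurt
    new = dict(targets)
    i, j = 0, len(order) - 1
    while i < j:
        donor, receiver = order[i], order[j]
        if ratios[receiver] > ratios[donor] and new[donor] > step:
            new[donor] -= step               # take one unit from the donor
            new[receiver] += step            # give it to the hurt process
        i, j = i + 1, j - 1
    return new

# Reproduces the dual-core example's first repartitioning.
assert repartition({"P1": 1.0, "P2": 3.0},
                   {"P1": 256, "P2": 256}) == {"P1": 192, "P2": 320}
```

With only two processes this degenerates to the dual-core case; with more, the outermost pairs are balanced first.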

Utility-Based Cache Partitioning (UCP)

Running Processes on a Dual-Core [Qureshi & Patt, MICRO-39]
(Figure: misses vs. number of ways given, 1 to 16, for equake and vpr.)
– LRU: in real runs, on average 7 ways were allocated to equake and 9 to vpr.
– UTIL: how much you use (in a set) is how much you will get. Ideally, 3 ways to equake and 13 to vpr.

Defining Utility
Utility U(a, b) = Misses with a ways − Misses with b ways.
(Figure: misses per 1000 instructions vs. number of ways from a 16-way 1MB L2, illustrating low-utility, high-utility, and saturating-utility curves. Slide courtesy: Moin Qureshi, MICRO-39)
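Given a misses-vs-ways curve like the one in the figure, the utility of extra ways falls out directly (a minimal sketch; the curve values are made up):

```python
# Sketch: utility of growing from a ways to b ways, computed from a
# misses-vs-ways curve where misses[w-1] = misses with w ways.
def utility(misses, a, b):
    return misses[a - 1] - misses[b - 1]   # U(a, b), with a < b

# Hypothetical saturating-utility curve: extra ways stop helping.
curve = [100, 60, 40, 40, 40, 40, 40, 40]
assert utility(curve, 1, 2) == 40   # high utility early on
assert utility(curve, 3, 8) == 0    # saturated: more ways gain nothing
```

UCP allocates ways where this marginal utility is highest rather than where demand (LRU pressure) is highest.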

Framework for UCP
Three components:
– Utility Monitors (UMON), one per core
– Partitioning Algorithm (PA)
– Replacement support to enforce partitions
(Figure: two cores with private I$ and D$ sharing an L2 cache in front of main memory; UMON1 and UMON2 feed the PA. Slide courtesy: Moin Qureshi, MICRO-39)

Utility Monitors (UMON)
– For each core, simulate the LRU policy using an Auxiliary Tag Directory (ATD).
– UMON-global: one set of way-counters shared across all sets.
– Hit counters in the ATD count hits per recency position, H0 (MRU) through H15 (LRU).
– LRU is a stack algorithm, so hit counts give utility directly: e.g., hits(2 ways) = H0 + H1.

Utility Monitors (UMON): Reducing Overhead
– The extra ATD tags incur hardware and power overhead; Dynamic Set Sampling (DSS) reduces it [Qureshi et al. ISCA'06].
– 32 sets are sufficient, based on Chebyshev's inequality; the paper samples every 32nd set (simple static sampling).
– Storage < 2KB per UMON (or 0.17% of the L2).

Partitioning Algorithm (PA)
– Evaluate all possible partitions and select the best. With a ways to core1 and (16 − a) ways to core2:
  Hits_core1 = H0 + H1 + … + H(a−1)  (from UMON1)
  Hits_core2 = H0 + H1 + … + H(16−a−1)  (from UMON2)
– Select the a that maximizes (Hits_core1 + Hits_core2).
– Partitioning is done once every 5 million cycles.
– After each partitioning interval, the hit counters in all UMONs are halved, to retain some past information.
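The steps above can be sketched directly (dual-core, 16-way, per the paper; function names are mine):

```python
# Sketch: UCP's partitioning step. h1 and h2 are the per-recency-
# position hit counters from the two cores' UMONs (index 0 = MRU).
def best_partition(h1, h2, ways=16):
    best_a, best_hits = 1, -1
    for a in range(1, ways):                 # each core gets >= 1 way
        hits = sum(h1[:a]) + sum(h2[:ways - a])
        if hits > best_hits:
            best_a, best_hits = a, hits
    return best_a                            # ways for core1; core2 gets rest

# Halve counters after each interval to retain some history.
def age(counters):
    return [c // 2 for c in counters]

# Core1 hits heavily at every position, core2 barely: core1 wins ways.
assert best_partition([10] * 16, [1] * 16) == 15
assert age([10, 5]) == [5, 2]
```

For two cores the exhaustive search is only 15 candidates; with more cores the paper's greedy/refined variants avoid the combinatorial blowup.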

Replacement Policy to Reach the Desired Partition
– Use way partitioning [Suh+ HPCA'02, Iyer ICS'04]: each line contains core-id bits.
– On a miss, count ways_occupied in the set by the miss-causing app.
– Binary decision for dual-core (in this paper): if ways_occupied < ways_given, the victim is the LRU line of the other app; otherwise, the victim is the LRU line of the miss-causing app.
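A sketch of that binary decision (names are mine; the hardware reads the per-line core-id bits rather than a Python list):

```python
# Sketch: dual-core way-partitioning victim choice. set_lines holds
# the core-id of each way's current owner in the indexed set.
def victim_owner(set_lines, miss_core, ways_given):
    ways_occupied = sum(1 for owner in set_lines if owner == miss_core)
    if ways_occupied < ways_given:
        return "other"        # under quota: evict the other core's LRU line
    return miss_core          # at/over quota: evict own LRU line

# Core 0 holds 3 of 8 ways but is entitled to 5: take from core 1.
assert victim_owner([0, 0, 0, 1, 1, 1, 1, 1], 0, 5) == "other"
# Core 1 holds 5 ways but is entitled to 3: it replaces its own line.
assert victim_owner([0, 0, 0, 1, 1, 1, 1, 1], 1, 3) == 1
```

Like the earlier fair-caching hardware, this enforces the partition lazily, one eviction at a time.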

UCP Performance (Weighted Speedup): UCP improves the average weighted speedup by 11% (dual core).

UCP Performance (Throughput): UCP improves the average throughput by 17%.

Dynamic Insertion Policy

Conventional LRU
(Figure: an incoming block is inserted at the MRU position and slowly drifts toward LRU. A block that is never reused still occupies one cache block for a long time with no benefit! Slide source: Yuejian Xie)

LIP: LRU Insertion Policy [Qureshi et al. ISCA'07]
– The incoming block is inserted at the LRU position instead of at MRU.
– A useless block is evicted at the next eviction; a useful block is moved to the MRU position when it is reused.
– LIP is not entirely new: Intel tried this in 1998 when designing "Timna" (which integrated the CPU and a graphics accelerator sharing the L2). (Slide source: Yuejian Xie)

BIP: Bimodal Insertion Policy [Qureshi et al. ISCA'07]
LIP may not age older lines, so BIP infrequently inserts lines at the MRU position. With e = the bimodal throttle parameter:
if ( rand() < e )
    Insert at MRU position;  // the LRU replacement policy
else
    Insert at LRU position;  // the LIP policy
A line is promoted to MRU if reused.
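As a runnable sketch of that pseudocode (the epsilon value is illustrative, not from the paper; the recency stack is modeled as a list with index 0 = MRU):

```python
# Sketch of BIP insertion and hit promotion on a per-set recency stack.
import random

EPSILON = 1 / 32          # bimodal throttle parameter (illustrative value)

def bip_insert(stack, block, eps=EPSILON, rng=random.random):
    stack.pop()                       # evict the LRU block
    if rng() < eps:
        stack.insert(0, block)        # rare: insert at MRU (LRU policy)
    else:
        stack.append(block)           # common: insert at LRU (LIP policy)
    return stack

def promote_on_hit(stack, block):
    stack.remove(block)               # a reused block goes to MRU
    stack.insert(0, block)
    return stack

s = bip_insert(["a", "b", "c", "d"], "x", rng=lambda: 0.9)
assert s == ["a", "b", "c", "x"]              # common case: LRU insertion
assert promote_on_hit(s, "x") == ["x", "a", "b", "c"]
```

The occasional MRU insertion is what lets BIP age out a stale working set that pure LIP would pin forever.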

DIP: Dynamic Insertion Policy [Qureshi et al. ISCA'07]
Workloads come in two types: LRU-friendly or BIP-friendly. DIP can be implemented by:
1. Monitoring both policies (LRU and BIP)
2. Choosing the best-performing policy
3. Applying the best policy to the cache
This needs a cost-effective implementation: "Set Dueling". (Figure: DIP selects between LRU and BIP, where BIP inserts at MRU with probability ε and, like LIP, at LRU with probability 1 − ε.)

Set Dueling for DIP [Qureshi et al. ISCA'07]
Divide the cache sets in three groups: dedicated LRU sets, dedicated BIP sets, and follower sets (which use the winner of LRU vs. BIP). A single n-bit saturating counter implements monitor → choose → apply: misses to the LRU sets increment the counter, misses to the BIP sets decrement it, and the counter's MSB decides the policy for the follower sets (MSB = 0: use LRU; MSB = 1: use BIP). (Slide source: Moin Qureshi)
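The single-counter mechanism can be sketched as below (the counter width is illustrative; the paper's point is only that one small saturating counter suffices):

```python
# Sketch of DIP set dueling: one n-bit saturating counter tracks
# which dedicated-set group is missing more.
N_BITS = 10
PSEL_MAX = (1 << N_BITS) - 1

class SetDuel:
    def __init__(self):
        self.psel = PSEL_MAX // 2          # start in the middle
    def miss(self, set_kind):              # "lru", "bip", or "follower"
        if set_kind == "lru":
            self.psel = min(self.psel + 1, PSEL_MAX)   # LRU sets missing
        elif set_kind == "bip":
            self.psel = max(self.psel - 1, 0)          # BIP sets missing
    def policy(self):                      # MSB decides the follower sets
        return "bip" if self.psel >> (N_BITS - 1) else "lru"

d = SetDuel()
assert d.policy() == "lru"                 # ambivalent at first
for _ in range(600):                       # LRU sets keep missing...
    d.miss("lru")
assert d.policy() == "bip"                 # ...so followers switch to BIP
```

Only the dedicated sets pay the cost of running the losing policy; the vast majority of sets simply follow the counter.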

Promotion/Insertion Pseudo Partitioning

PIPP [Xie & Loh ISCA'09]
What's PIPP? Promotion/Insertion Pseudo Partitioning, achieving both capacity management (as in UCP) and dead-time management (as in DIP).
– Eviction: the LRU block is the victim.
– Insertion: a new block is inserted the core's quota worth of positions away from LRU (e.g., insert position = 3 for a target allocation of 3).
– Promotion: on a hit, a block moves toward MRU by only one position. (Slide source: Yuejian Xie)
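A sketch of PIPP's two rules on a list-based recency stack (my own modeling, index 0 = MRU, last = LRU; the paper also adds a promotion probability that this sketch omits):

```python
# Sketch: PIPP insertion at quota-distance from LRU, single-step promotion.
def pipp_insert(stack, block, quota):
    stack.pop()                             # evict the LRU block
    pos = len(stack) + 1 - quota            # quota positions from the LRU end
    stack.insert(max(pos, 0), block)
    return stack

def pipp_promote(stack, block):
    i = stack.index(block)                  # on a hit, move up by one
    if i > 0:
        stack[i - 1], stack[i] = stack[i], stack[i - 1]
    return stack

s = ["a", "b", "c", "d", "e", "f", "g", "h"]       # 8-way set, a = MRU
pipp_insert(s, "X", quota=3)                       # Core's quota is 3 ways
assert s == ["a", "b", "c", "d", "e", "X", "f", "g"]
pipp_promote(s, "X")                               # a hit nudges X up one
assert s == ["a", "b", "c", "d", "X", "e", "f", "g"]
```

A small-quota core's blocks start near LRU and must earn their way up hit by hit, which is how one mechanism yields both the partition and the dead-block filtering.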

PIPP Example (Core0 quota: 5 blocks, Core1 quota: 3 blocks)
(Figure sequence: a request from Core1 (block D) is inserted 3 positions from LRU, matching Core1's quota; requests from Core0 (blocks 6 and 7) are inserted 5 positions from LRU, matching Core0's quota; on a hit, block D is promoted one position toward MRU. Slide source: Yuejian Xie)

How PIPP Does Both Managements
(Figure: four cores with quotas Core0: 6, Core1: 4, Core2: 4, Core3: 2; a core with a smaller quota inserts closer to the LRU position. Slide source: Yuejian Xie)

Pseudo Partitioning Benefits
(Figure: with a strict partition, Core0 and Core1 are held exactly to their quotas of 5 and 3 blocks; with PIPP's pseudo partition, the boundary can flex with demand, e.g., Core1 "stole" a line from Core0. Slide source: Yuejian Xie)

Single Reuse Blocks
(Figure: with TADIP, a new block inserted near LRU is promoted directly to MRU on its first reuse; with PIPP, it is promoted by only one position, so a block that is reused just once drifts out of the cache quickly instead of lingering at MRU. Slide source: Yuejian Xie)

Algorithm Comparison (Slide source: Yuejian Xie)
Algorithm    | Capacity Management | Dead-time Management | Note
LRU          | No                  | No                   | Baseline, no explicit management
UCP          | Yes                 | No                   | Strict partitioning
DIP / TADIP  | No                  | Yes                  | Insert at LRU and promote to MRU on hit
PIPP         | Yes                 | Yes                  | Pseudo-partitioning and incremental promotion