Published by Lucia Somers. Modified over 10 years ago.
1
Yuejian Xie, Gabriel H. Loh
2
[Figure: Core0 and Core1, each with private IL1 and DL1 caches, share the Last Level Cache (LLC), which holds both Core0's data and Core1's data.]
3
Capacity Management
– Cores need different amounts of cache space, so allocate the proper amount of space to each core.
– Guo-MICRO07, Kim-PACT04, Srikantaiah-ASPLOS09, Qureshi-MICRO06 (UCP), …
Dead Time Management
– Evict dead lines (blocks with no reuse) sooner.
– Kaxiras-ISCA01, Qureshi-ISCA07, Jaleel-PACT07 (TADIP), …
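Capacity management needs a rule for turning each core's demand into a way count. UCP (Qureshi-MICRO06) derives allocations from per-core hit curves using its lookahead algorithm; the Python sketch below substitutes a simpler greedy, marginal-utility allocation and is not UCP's method. The function name and the hit curves are hypothetical, with numbers chosen only so that the result matches the 5/3 split used in later slides.

```python
def greedy_allocation(hit_curves: list[list[int]], ways: int) -> list[int]:
    # hit_curves[c][w] = hits core c would see with w ways (w = 0..ways).
    # Simplified greedy allocation, NOT UCP's lookahead algorithm.
    alloc = [1] * len(hit_curves)            # every core keeps at least one way
    for _ in range(ways - len(hit_curves)):  # hand out the remaining ways one by one
        gains = [curve[a + 1] - curve[a] for curve, a in zip(hit_curves, alloc)]
        best = max(range(len(gains)), key=gains.__getitem__)  # ties go to the lower core id
        alloc[best] += 1
    return alloc


# Hypothetical hit curves for an 8-way set (illustrative numbers only).
core0 = [0, 10, 18, 25, 31, 36, 38, 39, 39]
core1 = [0, 12, 21, 25, 26, 26, 26, 26, 26]
print(greedy_allocation([core0, core1], ways=8))  # [5, 3]
```

Greedy allocation can miss gains that only appear after several extra ways, which is the case UCP's lookahead is designed to handle.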
4
[Figure: one cache set split between Core0 and Core1.] Core 0 gets 5 ways; Core 1 gets 3 ways.
5
[Figure: LRU recency stack, MRU to LRU, with an incoming block.]
6
[Figure: LRU recency stack, MRU to LRU.] A block that is never reused occupies one cache block for a long time with no benefit!
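To make the cost concrete, here is a minimal Python sketch, assuming a single set modeled as a list ordered MRU-first, conventional LRU insertion at the MRU position, and a stream of unrelated misses. The function name is hypothetical. It counts how many later misses pass before a never-reused block is finally evicted.

```python
def misses_until_eviction_lru(ways: int) -> int:
    """Count later misses needed to evict a dead block inserted at MRU."""
    stack = [f"old{i}" for i in range(ways)]  # index 0 = MRU, index -1 = LRU
    stack.pop()                               # the dead block's miss evicts the LRU block
    stack.insert(0, "dead")                   # conventional LRU inserts at MRU
    misses = 0
    while "dead" in stack:
        misses += 1
        stack.pop()                           # each later miss evicts the LRU block
        stack.insert(0, f"new{misses}")       # and inserts its own block at MRU
    return misses


print(misses_until_eviction_lru(8))   # 8
print(misses_until_eviction_lru(16))  # 16
```

The dead block has to travel the whole stack, so it survives one miss per way before it is evicted.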
7
[Figure: LRU recency stack, MRU to LRU, with an incoming block.]
8
[Figure: LRU recency stack, MRU to LRU.] Useless block: evicted at the next eviction. Useful block: moved to the MRU position.
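The behavior above corresponds to inserting new blocks at the LRU position (LRU insertion, as in the insertion policies of Qureshi-ISCA07 that TADIP builds on). The following is a minimal Python sketch of one such set, assuming a hypothetical class name and a list ordered LRU-first. A block that is never reused is evicted at the next miss, while a reused block jumps to MRU.

```python
class LRUInsertionSet:
    """One cache set that inserts new blocks at LRU and promotes hits to MRU."""

    def __init__(self, ways: int):
        self.ways = ways
        self.stack: list[str] = []  # index 0 = LRU, index -1 = MRU

    def access(self, tag: str) -> bool:
        if tag in self.stack:
            # Hit: a reused (useful) block moves straight to MRU.
            self.stack.remove(tag)
            self.stack.append(tag)
            return True
        if len(self.stack) == self.ways:
            self.stack.pop(0)       # miss in a full set: evict the LRU block
        self.stack.insert(0, tag)   # insert the new block at LRU
        return False


s = LRUInsertionSet(ways=4)
for t in ["a", "b", "c", "d"]:
    s.access(t)
s.access("x")   # miss: x is inserted at LRU
s.access("y")   # miss: x, never reused, is evicted immediately
s.access("y")   # hit: y moves to MRU
print(s.stack)  # ['c', 'b', 'a', 'y']
```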
10
PIPP: Novel scheme for Promotion and Insertion
Eviction
– When replacing a block in a set, which block should be evicted?
Insertion
– Where should a new block be inserted?
Promotion
– When there is a hit in the cache, how should the block's position/priority be adjusted?
11
What's PIPP?
– Promotion/Insertion Pseudo Partitioning
– Achieves both capacity and dead-time management.
Eviction
– The LRU block is the victim.
Insertion
– The core's quota worth of blocks away from LRU.
Promotion
– Toward MRU by only one position.
[Figure: recency stack, MRU to LRU. The LRU block is marked "To Evict"; a hit block is promoted; a new block is inserted at position 3 (its core's target allocation).]
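The three PIPP rules fit in a few lines. The Python sketch below is a simplified, hypothetical model of one set, not the authors' implementation. Its assumptions: the set is a list ordered LRU-first, blocks are `(core, tag)` pairs, quotas are given as input, and promotion always happens on a hit (the slides say "by only one"; the backup slides also plot promotion probabilities, which are left out here).

```python
class PIPPSet:
    """Simplified model of one PIPP-managed cache set.

    stack[0] is the LRU position and stack[-1] the MRU position; each entry
    is a (core, tag) pair. quotas[core] is that core's target allocation.
    """

    def __init__(self, ways: int, quotas: dict[int, int]):
        self.ways = ways
        self.quotas = quotas
        self.stack: list[tuple[int, str]] = []

    def access(self, core: int, tag: str) -> bool:
        key = (core, tag)
        if key in self.stack:
            i = self.stack.index(key)
            if i + 1 < len(self.stack):
                # Promotion: swap with the neighbor one step closer to MRU.
                self.stack[i], self.stack[i + 1] = self.stack[i + 1], self.stack[i]
            return True
        if len(self.stack) == self.ways:
            self.stack.pop(0)  # Eviction: the LRU block is the victim.
        # Insertion: quota-many blocks away from LRU (clamped to the set size).
        self.stack.insert(min(self.quotas[core], len(self.stack)), key)
        return False

    def occupancy(self) -> dict[int, int]:
        counts = {core: 0 for core in self.quotas}
        for core, _ in self.stack:
            counts[core] += 1
        return counts


s = PIPPSet(ways=8, quotas={0: 5, 1: 3})
for i in range(8):
    s.access(0, f"c0_{i}")
s.access(1, "A")                 # miss: A lands 3 blocks above LRU
print(s.stack.index((1, "A")))   # 3
```

Because quotas only steer where blocks enter, no core is hard-limited to its quota; occupancy drifts with the access pattern.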
12
[Figure: PIPP walkthrough on one 8-block set, MRU to LRU. Core0 quota: 5 blocks; Core1 quota: 3 blocks. Core0's blocks are numbered and Core1's are lettered. Set: 1, 2, 3, 4, 5, A, B, C. Request: D (Core1's quota = 3).]
13
[Figure: set: 1, 2, 3, 4, 5, A, B, D. Request: 6 (Core0's quota = 5).]
14
[Figure: set: 1, 2, 3, 4, 6, A, B, D. Request: 7 (Core0's quota = 5).]
15
[Figure: set: 1, 2, 3, 4, 6, 7, A, D. Request: D.]
16
[Figure: set: 1, 2, 3, 4, 6, 7, A, D. Request: E (Core1's quota = 3).]
17
[Figure: set: 1, 2, 3, 6, 7, A, D, E. Request: 2.]
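The set contents on slides 12 to 17 can be replayed with a short Python sketch. The starting LRU-to-MRU order is an assumption: the extracted slide text keeps which blocks are in the set but not where they sit, so the order below was picked because every eviction it produces (C, then 5, then B, then 4) matches the blocks that disappear between slides. Helper names are hypothetical.

```python
from typing import Optional

WAYS = 8
QUOTAS = {0: 5, 1: 3}  # Core0 quota: 5 blocks, Core1 quota: 3 blocks


def owner(tag: str) -> int:
    # Slide convention: Core0's blocks are numbered, Core1's are lettered.
    return 0 if tag.isdigit() else 1


def pipp_access(stack: list[str], tag: str) -> Optional[str]:
    """Apply one PIPP access to `stack` (index 0 = LRU) and return the victim, if any."""
    if tag in stack:
        i = stack.index(tag)
        if i + 1 < len(stack):
            stack[i], stack[i + 1] = stack[i + 1], stack[i]  # hit: promote by one
        return None
    victim = stack.pop(0) if len(stack) == WAYS else None   # miss: evict the LRU block
    stack.insert(min(QUOTAS[owner(tag)], len(stack)), tag)  # insert at the owner's quota
    return victim


# ASSUMED starting order, LRU first; the slide text does not preserve positions.
stack = ["C", "5", "B", "4", "3", "2", "A", "1"]
for req in ["D", "6", "7", "D", "E", "2"]:
    victim = pipp_access(stack, req)
    occ = [sum(owner(t) == core for t in stack) for core in (0, 1)]
    print(f"request {req}: evicted {victim}, occupancy {occ}")
```

This prints evictions C, 5, B, None, 4, None. After request 7 the occupancy is [6, 2], so Core0 temporarily holds one block more than its quota; that is the {6,2} actual occupancy compared against the {5,3} target on slide 32.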
18
[Figure: four cores share one set, MRU to LRU, with quotas Core0 = 6, Core1 = 4, Core2 = 4, Core3 = 2.] Smaller quotas mean insertion closer to the LRU position.
19
[Figure: Strict Partition. Core0 quota: 5 blocks; Core1 quota: 3 blocks. Each core has its own stack (MRU 0 to LRU 0 for Core0, MRU 1 to LRU 1 for Core1), and a new block enters its own core's partition.]
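For contrast with PIPP, the sketch below enforces a strict partition at replacement time. This is one common way to do it and an assumption here, not something the slide spells out: a core below its quota reclaims the LRU-most block of a core above its quota; otherwise it replaces its own LRU block. A single recency-ordered list stands in for the per-core stacks, since filtering it by core yields each core's own LRU order. All function names are hypothetical.

```python
def occupancy(stack: list[tuple[int, str]], quotas: dict[int, int]) -> dict[int, int]:
    counts = {core: 0 for core in quotas}
    for core, _ in stack:
        counts[core] += 1
    return counts


def strict_victim(stack: list[tuple[int, str]], quotas: dict[int, int], requester: int) -> int:
    counts = occupancy(stack, quotas)
    if counts[requester] < quotas[requester]:
        # Below quota: reclaim the LRU-most block of a core that is over its quota.
        for i, (core, _) in enumerate(stack):
            if counts[core] > quotas[core]:
                return i
    # At quota: replace the requester's own LRU block.
    for i, (core, _) in enumerate(stack):
        if core == requester:
            return i
    return 0  # fallback: the global LRU block


def strict_access(stack, quotas, ways, core, tag) -> bool:
    key = (core, tag)  # stack is ordered LRU first
    if key in stack:
        stack.remove(key)
        stack.append(key)  # hit: move to MRU
        return True
    if len(stack) == ways:
        stack.pop(strict_victim(stack, quotas, core))
    stack.append(key)      # miss: the new block enters at MRU
    return False


stack: list[tuple[int, str]] = []
quotas = {0: 5, 1: 3}
for i in range(8):
    strict_access(stack, quotas, 8, 0, f"c0_{i}")  # Core0 fills the empty set
for t in "ABCD":
    strict_access(stack, quotas, 8, 1, t)
print(occupancy(stack, quotas))  # {0: 5, 1: 3}
```

Once the set converges, Core1's fourth block D replaces Core1's own A instead of growing past its 3 ways, so occupancy never drifts from the quota.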
20
[Figure: Pseudo Partition. Core0 quota: 5 blocks; Core1 quota: 3 blocks. Both cores share one MRU-to-LRU stack, and a new block is inserted into it.]
21
[Figure: on a hit, TADIP moves the block directly to MRU, while PIPP promotes it by one position toward MRU.]
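The difference between the two promotion rules shows up in how many hits a block needs before it reaches MRU. The Python sketch below assumes a hypothetical helper and a block that starts at a given position (LRU = 0) with nothing else in the set moving it in between.

```python
def hits_to_reach_mru(ways: int, promote_to_mru: bool, start: int = 0) -> int:
    """Hits needed to move a block from `start` (0 = LRU) to MRU (ways - 1)."""
    pos, hits = start, 0
    while pos < ways - 1:
        pos = ways - 1 if promote_to_mru else pos + 1  # to MRU vs. by one
        hits += 1
    return hits


print(hits_to_reach_mru(16, promote_to_mru=True))            # 1  (directly to MRU)
print(hits_to_reach_mru(16, promote_to_mru=False))           # 15 (promote by one)
print(hits_to_reach_mru(16, promote_to_mru=False, start=3))  # 12 (inserted at quota 3)
```

Under promote-by-one, a block has to be reused repeatedly to climb, so a single lucky hit does not shield a mostly dead block.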
22
Algorithm | Capacity Management | Dead-time Management | Note
LRU | No | No | Baseline, no explicit management
UCP | Yes | No | Strict partitioning
TADIP | No | Yes | Insert at LRU and promote to MRU on hit
PIPP | Yes | Yes | Pseudo-partitioning and incremental promotion
23
Simulation environment
– SimpleScalar-Zesto, out-of-order, Intel Core2-like
– 32KB 8-way DL1 and IL1, 4MB 16-way LLC, 1.6GHz DDR2
Workload classification
– UCP2-5: UCP-friendly, 2-core, 5th workload
– DIP4-3: TADIP-friendly, 4-core, 3rd workload
24
[Figure: results for TADIP-friendly and UCP-friendly workloads.] PIPP outperforms LRU by 19.0%, UCP by 10.6%, and TADIP by 10.1%. PIPP is too cautious here.
25
[Figure: results for TADIP-friendly and UCP-friendly workloads.] PIPP outperforms LRU by 21.9%, UCP by 12.1%, and TADIP by 17.5%.
26
[Figure panels: Occupancy Control; Insertion Behavior; Pseudo-Partition Benefit.] TADIP inserts no-reuse lines at an average position of 1.7, while PIPP inserts them at 1.3 (the LRU position is 0).
27
Novel proposal on insertion and promotion
– A single unified mechanism provides both capacity and dead-time management.
– Outperforms prior UCP and TADIP.
In the full paper:
– Special version of PIPP for streaming applications
– Reducing hardware overhead
– Sensitivity analysis
32
E.g., Target Partition {5,3} – Actual Occupancy {6,2} = 1
34
Streaming Application Detection
– #Accesses, #Misses, and MissRate each exceed a threshold
Insertion
– At a fixed position (independent of quota)
– #Streaming Apps blocks away from the LRU position
Promotion
– Promote by 1 with probability p_stream
– p_stream ≪ 1
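The three streaming rules above can be sketched as small helpers. This is a Python illustration under stated assumptions: the function names, threshold values, and the `p_general` parameter (standing in for the general-application promotion probability the backup slides plot, whose value is not given here) are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class CoreStats:
    accesses: int
    misses: int

    @property
    def miss_rate(self) -> float:
        return self.misses / self.accesses if self.accesses else 0.0


def is_streaming(s: CoreStats, t_accesses: int, t_misses: int, t_miss_rate: float) -> bool:
    # All three counters must exceed their (hypothetical) thresholds.
    return s.accesses > t_accesses and s.misses > t_misses and s.miss_rate > t_miss_rate


def insert_position(core: int, quotas: dict[int, int], streaming: set[int]) -> int:
    # Streaming cores ignore their quota: insert #streaming-apps blocks above LRU.
    if core in streaming:
        return len(streaming)
    return quotas[core]


def should_promote(core: int, streaming: set[int], p_stream: float,
                   p_general: float, rng: random.Random) -> bool:
    # Promote-by-one happens only with some probability (p_stream is much smaller than 1).
    p = p_stream if core in streaming else p_general
    return rng.random() < p


stats = {0: CoreStats(10_000, 500), 1: CoreStats(10_000, 9_800)}
streaming = {c for c, s in stats.items() if is_streaming(s, 1_000, 1_000, 0.9)}
quotas = {0: 5, 1: 3}
print(streaming)                              # {1}
print(insert_position(1, quotas, streaming))  # 1
print(insert_position(0, quotas, streaming))  # 5
```

Tying the streaming insertion position to the number of streaming applications keeps their blocks just above LRU regardless of what quota a partitioning step would have given them.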
36
[Figure panels: Promotion Prob for General App; Promotion Prob for Streaming App.]