LRU-PEA: A Smart Replacement Policy for NUCA caches on Chip Multiprocessors
Javier Lira, Dept. Arquitectura de Computadors, Universitat Politècnica de Catalunya, Barcelona, Spain (javier.lira@ac.upc.edu)
Carlos Molina, Dept. Enginyeria Informàtica, Universitat Rovira i Virgili, Tarragona, Spain (carlos.molina@urv.net)
Antonio González, Intel Barcelona Research Center, Intel Labs - UPC, Barcelona, Spain (antonio.gonzalez@intel.com)
ICCD 2009, Lake Tahoe, CA (USA), October 6, 2009
Outline: Introduction, Methodology, LRU-PEA, Results, Conclusions
Introduction
- CMPs have emerged as a dominant paradigm in system design:
  1. They keep improving performance while reducing power consumption.
  2. They take advantage of thread-level parallelism.
- Commercial CMPs are currently available.
- CMPs incorporate larger, shared last-level caches.
- Wire delay is a key constraint.
NUCA
- Non-Uniform Cache Architecture (NUCA) was first proposed by Kim et al. at ASPLOS 2002 [1].
- NUCA divides a large cache into smaller and faster banks.
- Banks close to the cache controller have smaller latencies than banks further away.
[1] C. Kim, D. Burger and S.W. Keckler. An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Caches. ASPLOS '02.
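The non-uniform latency idea can be sketched as a simple cost model: total access time grows with the bank's distance from the controller. A minimal sketch, assuming a mesh layout with the controller at grid position (0, 0); the 4-cycle bank latency and 1-cycle router and wire delays follow the methodology figures later in the talk, everything else here is an assumption, not the authors' model.

```python
# Sketch of non-uniform bank access latency in a NUCA mesh (illustrative).
# Bank, router, and wire delays follow the methodology slide (4/1/1 cycles);
# placing the cache controller at grid position (0, 0) is an assumption.

def nuca_access_latency(bank_row, bank_col,
                        bank_latency=4, router_delay=1, wire_delay=1):
    """Cycles to access the bank at (bank_row, bank_col)."""
    hops = bank_row + bank_col  # Manhattan distance from the controller
    return bank_latency + hops * (router_delay + wire_delay)

close = nuca_access_latency(0, 0)  # nearest bank: 4 cycles
far = nuca_access_latency(8, 7)    # 15 hops away: 4 + 15 * 2 = 34 cycles
```

The point of the model is only that `far` is several times `close`: with hundreds of banks, which bank holds the data dominates the hit latency.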
NUCA Policies
- Bank Placement Policy
- Bank Access Policy
- Bank Replacement Policy
- Bank Migration Policy
Outline: Introduction, Methodology, LRU-PEA, Results, Conclusions
Methodology
Simulation tools: Simics + GEMS, CACTI v6.0, PARSEC Benchmark Suite.
- Number of cores: 8 (UltraSPARC IIIi)
- Frequency: 1.5 GHz
- Main memory size: 4 GBytes
- Memory bandwidth: 512 Bytes/cycle
- L1 cache latency: 3 cycles
- NUCA bank latency: 4 cycles
- Router delay: 1 cycle
- On-chip wire delay: 1 cycle
- Main memory latency: 250 cycles (from core)
- Private L1 caches: 8 x 32 KBytes, 2-way
- Shared L2 NUCA cache: 8 MBytes, 256 banks
- NUCA bank: 32 KBytes, 8-way
Baseline NUCA cache architecture
- CMP-DNUCA [2]
- 8 cores
- 256 banks
- Non-inclusive
[2] B. M. Beckmann and D. A. Wood. Managing Wire Delay in Large Chip-Multiprocessor Caches. MICRO '04.
Outline: Introduction, Methodology, LRU-PEA (Background, How does it work?), Results, Conclusions
Background
Data reaches a NUCA bank in two ways:
- Entrance into the NUCA: from off-chip memory, or as L1 cache replacements.
- Migration movements: promotion and demotion.
Data categories
1. Off-chip
2. L1 cache replacements
3. Promoted data
4. Demoted data
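The four categories can be written down as a small enumeration. This is an illustrative sketch; the identifier names are mine, not the authors':

```python
from enum import Enum

class DataCategory(Enum):
    """How a line most recently arrived at its current NUCA bank."""
    OFF_CHIP = 1        # brought in from off-chip memory
    L1_REPLACEMENT = 2  # evicted from a core's private L1 cache
    PROMOTED = 3        # migrated toward the requesting core
    DEMOTED = 4         # migrated away from a core
```

Tagging each line with its category is what lets the replacement policy below treat a demoted line differently from a freshly promoted one.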
LRU-PEA: LRU with Priority Eviction Approach
- A replacement policy for CMP-NUCA architectures, composed of two parts:
- Data Eviction Policy: chooses the data to evict from a NUCA bank.
- Data Target Policy: determines the destination bank of the evicted data.
- Globalizes replacement decisions to the whole NUCA cache.
Data Eviction Policy
- Based on the LRU replacement policy, with a static prioritisation of the NUCA data categories: the lowest-priority data is evicted from the NUCA bank.
- PROBLEM: the highest-priority category could monopolize the NUCA cache, so the category comparison is restricted to the LRU and LRU-1 positions.

Priority        Local banks        Central banks
+ (highest)     L1 Replacements    Promoted
                Promoted           Off-chip
                Off-chip           Demoted
- (lowest)      Demoted            L1 Replacements
Data Eviction Policy example (NUCA bank, 4-way**):
- Set content, MRU to LRU: @A (Promoted), @B (Demoted), @C (Off-chip), @D (Promoted).
- An incoming L1 Replacement needs a way in the set.
- Plain LRU would evict @D, but LRU-PEA compares the categories of the two least-recently-used lines: @C (Off-chip) has lower priority than @D (Promoted), so @C is evicted and its way becomes available.
** The set associativity assumed in this work for NUCA banks is 8-way.
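The selection step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the numeric priority map assumes the local-bank ordering (L1 replacements highest, then promoted, off-chip, and demoted lowest):

```python
# Sketch of the Data Eviction Policy (illustrative, not the authors' code).
# PRIORITY assumes the local-bank ordering of the four NUCA data categories.

PRIORITY = {"l1_replacement": 3, "promoted": 2, "offchip": 1, "demoted": 0}

def choose_victim(lru_set, priority=PRIORITY):
    """lru_set: (tag, category) pairs ordered MRU -> LRU.
    The category comparison is restricted to the LRU and LRU-1
    positions, so one high-priority category cannot monopolize
    the whole bank."""
    lru, lru_1 = lru_set[-1], lru_set[-2]
    # Of the two least-recently-used lines, evict the lower-priority one;
    # on a tie, fall back to plain LRU (the LRU line goes).
    return lru if priority[lru[1]] <= priority[lru_1[1]] else lru_1

# The 4-way example: @C (off-chip) is chosen over the true LRU line @D
# (promoted), which plain LRU would have evicted.
ways = [("@A", "promoted"), ("@B", "demoted"),
        ("@C", "offchip"), ("@D", "promoted")]
victim = choose_victim(ways)  # -> ("@C", "offchip")
```

Restricting the comparison to two positions is the cheap compromise the slide describes: it biases eviction by category without ever letting a category pin more than the tail of the recency stack.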
Data Target Policy
- Migration movements provoke bank-usage imbalance in the NUCA cache, so the most-accessed banks suffer an unfair share of the replacements.
- LRU-PEA globalizes replacement decisions in order to evict the most appropriate data from the NUCA cache as a whole.
Data Target Policy example (256 NUCA banks, 16 possible placements), cascade mode:
- Current eviction: Off-chip (P2, Central)
- Step 1: L1 Replac. (P1, Local)
- Step 2: Off-chip (P2, Central)
- Step 3: Demoted (P4, Local)
- ... until the current eviction is Demoted (P4, Local)
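Cascade mode can be sketched as repeated application of the eviction choice: the line evicted from one bank displaces a line in another candidate bank, and the cascade continues until a line of equal or lower priority than the incoming one finally leaves the NUCA. This sketch is an illustration only; the round-robin choice of the next bank, the 15-step cap, and the priority values are my assumptions, and the real policy chooses among the 16 possible placements of the set.

```python
# Sketch of the Data Target Policy in cascade mode (illustrative only).
# Round-robin bank selection and the 15-step cap are assumptions.

PRIORITY = {"l1_replacement": 3, "promoted": 2, "offchip": 1, "demoted": 0}

def cascade_insert(banks, start, incoming, max_steps=15):
    """banks: per-bank lists of (tag, category) ordered MRU -> LRU.
    Inserts `incoming` into bank `start`; each displaced line moves
    on to another bank until some line finally leaves the NUCA.
    Returns (dropped_line, steps_taken)."""
    line, bank = incoming, start
    for step in range(1, max_steps + 1):
        ways = banks[bank]
        # Same choice as the Data Eviction Policy: compare LRU and LRU-1.
        lru, lru_1 = ways[-1], ways[-2]
        victim = lru if PRIORITY[lru[1]] <= PRIORITY[lru_1[1]] else lru_1
        ways.remove(victim)
        ways.insert(0, line)            # inserted line becomes MRU
        if PRIORITY[victim[1]] <= PRIORITY[line[1]]:
            return victim, step         # lower-priority data leaves NUCA
        line, bank = victim, (bank + 1) % len(banks)  # cascade continues

    return line, max_steps              # cap reached: drop current line

banks = [[("@A", "promoted"), ("@B", "offchip")],
         [("@C", "demoted"), ("@D", "promoted")]]
dropped, steps = cascade_insert(banks, 0, ("@X", "demoted"))
# @B (off-chip) is displaced from bank 0, then displaces @C (demoted)
# in bank 1, which leaves the NUCA after 2 steps.
```

The trade-off this exposes is the one measured in the congestion table that follows: each extra cascade step keeps higher-priority data on chip at the cost of one more on-chip data movement.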
Outline: Introduction, Methodology, LRU-PEA, Results, Conclusions
Increasing network congestion (values in percentage):

           No Cascade   Cascade Enabled
                        Direct   Provoked
 1 step        64         54        20
 2 steps       12          7         7
 3 steps        4          2         4
 4 steps        3          2         4
 5 steps        3          2         3
 6 steps        2          1         4
 7 steps        2          1         3
 8 steps        2          1         4
 9 steps        1          1         3
10 steps        1          1         4
11 steps        1          1         3
12 steps        1          1         6
13 steps        1          1         6
14 steps        1          1        30
15 steps        3         21         -
NUCA miss rate analysis
Performance analysis
Dynamic EPI analysis
Outline: Introduction, Methodology, LRU-PEA, Results, Conclusions
Conclusions
- LRU-PEA is proposed as an alternative to the traditional LRU replacement policy in CMP-NUCA architectures.
- It defines four novel NUCA data categories and prioritises them to find the most appropriate data to evict.
- In a D-NUCA architecture, data movements provoke unfair replacements in the most-accessed banks; LRU-PEA globalizes replacement decisions taken in a single bank to the whole NUCA cache.
- Compared to the traditional LRU policy, LRU-PEA reduces the miss rate, increases performance with parallel applications, and reduces the energy consumed per instruction.
LRU-PEA: A Smart Replacement Policy for NUCA caches on Chip Multiprocessors
Questions?