The Migration Prefetcher
Javier Lira (Intel-UPC, Spain) – javierx.lira@intel.com
Timothy M. Jones (U. of Cambridge, UK) – timothy.jones@cl.cam.ac.uk
Carlos Molina (URV, Spain) – carlos.molina@urv.net
Antonio González (Intel-UPC, Spain) – antonio.gonzalez@intel.com
HiPEAC 2012, Paris (France) – January 23, 2012
CMPs have become the dominant paradigm and incorporate large shared last-level caches. Access latency in large caches is dominated by wire delays. Examples: Intel® Nehalem (24 MBytes), IBM® POWER7 (32 MBytes), Tilera® Tile-GX (32 MBytes).
NUCA divides a large cache into smaller, faster banks. The cache access latency consists of the routing latency plus the bank access latency, so banks close to the cache controller have lower latencies than banks further away. [1] Kim et al. An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Architectures. ASPLOS'02
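To make the latency structure concrete, here is a minimal sketch assuming banks laid out on a mesh addressed by (x, y) coordinates, using the per-hop router and wire delays and the bank latency from the methodology slide later in the talk; the mesh layout and coordinates are illustrative assumptions, not the paper's exact topology.

#include <cstdlib>
#include <iostream>

// Sketch: non-uniform access latency in a banked (NUCA) cache.
// Bank/router/wire latencies follow the methodology slide; the mesh
// layout and coordinates are illustrative assumptions.
struct BankCoord { int x, y; };

// Manhattan distance in hops between the cache controller and a bank.
int hops(BankCoord controller, BankCoord bank) {
    return std::abs(controller.x - bank.x) + std::abs(controller.y - bank.y);
}

// Total NUCA access latency = routing latency + bank access latency.
int nuca_access_latency(BankCoord controller, BankCoord bank) {
    const int kRouterDelay = 1;  // cycles per router traversed
    const int kWireDelay   = 1;  // cycles per link traversed
    const int kBankLatency = 4;  // cycles for the bank lookup itself
    int h = hops(controller, bank);
    return h * (kRouterDelay + kWireDelay) + kBankLatency;
}

int main() {
    BankCoord controller{0, 0};
    std::cout << nuca_access_latency(controller, {0, 1}) << "\n";  // nearby bank
    std::cout << nuca_access_latency(controller, {7, 7}) << "\n";  // distant bank
}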
In D-NUCA, data can be mapped to multiple banks (unlike S-NUCA, where each address maps to a single bank), and migration allows data to adapt to the application's behaviour. Migration movements are effective, but about 50% of hits still occur in non-optimal banks.
Outline: Introduction · Methodology · The Migration Prefetcher · Analysis of results · Conclusions
Baseline D-NUCA organisation [2] on an 8-core CMP: Placement – 16 possible positions per data; Access – partitioned multicast; Migration – gradual promotion; Replacement – LRU + zero-copy. [2] Beckmann and Wood. Managing Wire Delay in Large Chip-Multiprocessor Caches. MICRO'04
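A simplified sketch of the access and migration policies, assuming a 16-bank bankset where bank 0 is closest to the requesting core; the partition boundaries, data structures and function names are illustrative, not the exact baseline implementation.

#include <array>
#include <cstdint>
#include <optional>
#include <unordered_set>
#include <utility>

// Sketch: partitioned multicast lookup plus gradual promotion in a
// D-NUCA bankset of 16 banks (bank 0 closest to the requesting core).
struct Bank { std::unordered_set<uint64_t> lines; };
using BankSet = std::array<Bank, 16>;

// Partitioned multicast: multicast the request to the closest partition
// first, and only fall through to the next partition on a miss.
std::optional<int> lookup(const BankSet& banks, uint64_t addr) {
    constexpr std::pair<int, int> partitions[] = {{0, 4}, {4, 8}, {8, 16}};
    for (auto [begin, end] : partitions)
        for (int b = begin; b < end; ++b)     // in hardware, the banks of one
            if (banks[b].lines.count(addr))   // partition are probed in parallel
                return b;
    return std::nullopt;                      // NUCA miss
}

// Gradual promotion: on a hit in bank b, move the line one bank closer
// to the core rather than jumping straight to the closest bank.
void promote_one_step(BankSet& banks, uint64_t addr, int hit_bank) {
    if (hit_bank == 0) return;
    banks[hit_bank].lines.erase(addr);
    banks[hit_bank - 1].lines.insert(addr);
}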
Simulation infrastructure: Simics (8 x UltraSPARC IIIi, Solaris 10) with GEMS (Ruby, Garnet, Orion). Workloads: PARSEC and SPEC CPU2006.
Number of cores: 8 (UltraSPARC IIIi)
Frequency: 1.5 GHz
Main memory size: 4 GBytes
Memory bandwidth: 512 Bytes/cycle
Private L1 caches: 8 x 32 KBytes, 2-way
Shared L2 NUCA cache: 8 MBytes, 128 banks
NUCA bank: 64 KBytes, 8-way
L1 cache latency: 3 cycles
NUCA bank latency: 4 cycles
Router delay: 1 cycle
On-chip wire delay: 1 cycle
Main memory latency: 250 cycles (from core)
Outline: Introduction · Methodology · The Migration Prefetcher · Analysis of results · Conclusions
The Migration Prefetcher applies prefetching principles to data migration. It is not a traditional prefetcher: ◦ It does not bring data from main memory. ◦ Its potential benefit is therefore more limited. ◦ It requires only simple data correlation.
Mechanism overview (figure): on an access, the Next Address Table (NAT) is looked up with the current address; each entry stores the predicted next address and the NUCA bank where that block was last found, so the prefetch request can be sent ahead to that bank.
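A rough sketch of such a table, assuming it is indexed by the low 12 bits of the block address (matching the 12 addressable bits discussed later) and that each entry holds the predicted next address, the bank that last served it, and a single confidence bit; the field widths and training policy are assumptions.

#include <cstdint>
#include <optional>
#include <vector>

// Sketch: a Next Address Table (NAT) indexed by 12 address bits.
// Entry contents (next address, last bank, 1-bit confidence) follow the
// talk; the exact layout and update policy are assumptions.
struct NatEntry {
    uint64_t next_addr = 0;      // block predicted to be accessed next
    uint8_t  bank      = 0;      // NUCA bank that last responded for it
    bool     confident = false;  // single confidence bit
    bool     valid     = false;
};

class NextAddressTable {
public:
    NextAddressTable() : entries_(1u << kIndexBits) {}

    // On an access to `addr`, return a prefetch target if the entry is trusted.
    std::optional<NatEntry> predict(uint64_t addr) const {
        const NatEntry& e = entries_[index(addr)];
        if (e.valid && e.confident) return e;
        return std::nullopt;
    }

    // After observing that `next` followed `prev` (served by `bank`),
    // train the entry: reinforce a repeated pattern, otherwise replace it.
    void train(uint64_t prev, uint64_t next, uint8_t bank) {
        NatEntry& e = entries_[index(prev)];
        if (e.valid && e.next_addr == next) {
            e.confident = true;  // pattern repeated: trust it
            e.bank = bank;       // remember where the block was found last
        } else {
            e = NatEntry{next, bank, false, true};  // new pattern, not yet trusted
        }
    }

private:
    static constexpr unsigned kIndexBits = 12;  // 4096 entries per table
    static size_t index(uint64_t addr) { return addr & ((1u << kIndexBits) - 1); }
    std::vector<NatEntry> entries_;
};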
Prefetch accuracy: fraction of prefetch requests that ended up being useful. A single confidence bit is effective; using more than one bit is not worthwhile.
NAT aliasing: percentage of prefetch requests issued using another address's information. With 12 to 14 index bits, about 25% of requests use erroneous (aliased) information. A NAT with 12 addressable bits takes 232 KBytes in total (8 cores x 29 KBytes per table).
Search accuracy: percentage of prefetch requests that find the data in the NUCA cache. Predicting the data's location based on its last appearance provides about 50% accuracy; accuracy increases when the local bank is also accessed.
The realistic Migration Prefetcher uses: ◦ 1-bit confidence for data patterns. ◦ A NAT with 12 addressable bits (29 KBytes/table). ◦ Last responder + Local as the search scheme. Total hardware overhead is 264 KBytes. Latency: 2 cycles.
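A minimal sketch of the "last responder + local" search, where `probe` stands in for a lookup message sent to a single NUCA bank; the function names and the fallback behaviour are illustrative assumptions.

#include <cstdint>
#include <functional>
#include <optional>

// Sketch: "last responder + local" search for the predicted block.
// `probe(bank, addr)` stands in for sending a lookup over the on-chip
// network to one bank and returns true if that bank holds the block.
std::optional<int> find_predicted_block(
        uint64_t addr, int last_responder_bank, int local_bank,
        const std::function<bool(int, uint64_t)>& probe) {
    // First try the bank that answered for this block last time (from the NAT).
    if (probe(last_responder_bank, addr)) return last_responder_bank;
    // Then try the requesting core's local bank, where migration tends to
    // have concentrated frequently used data.
    if (local_bank != last_responder_bank && probe(local_bank, addr))
        return local_bank;
    // Prediction missed: give up and let the normal NUCA lookup proceed.
    return std::nullopt;
}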
Outline: Introduction · Methodology · The Migration Prefetcher · Analysis of results · Conclusions
Achieves an overall performance improvement of 4% on average, and up to 17%. The NUCA cache is up to 25% faster with the Migration Prefetcher, and its average latency is reduced by 15%.
This technique does not increase energy consumption. The prefetcher introduces extra traffic into the network, but on a hit it significantly reduces the number of messages.
Outline: Introduction · Methodology · The Migration Prefetcher · Analysis of results · Conclusions
Existing migration techniques effectively concentrate the most accessed data in banks close to the cores, yet about 50% of hits in the NUCA cache still occur in non-optimal banks. The Migration Prefetcher anticipates migrations based on past access behaviour. It reduces the average NUCA latency by 15%, outperforms the baseline configuration by 4% on average, and does not increase energy consumption.
Questions? HiPEAC 2012, Paris (France) – January 23, 2012