
1 Javier Lira (Intel-UPC, Spain), javierx.lira@intel.com
Timothy M. Jones (U. of Cambridge, UK), timothy.jones@cl.cam.ac.uk
Carlos Molina (URV, Spain), carlos.molina@urv.net
Antonio González (Intel-UPC, Spain), antonio.gonzalez@intel.com
HiPEAC 2012, Paris (France) – January 23, 2012

2  CMPs have become the dominant paradigm.  They incorporate large shared last-level caches.  Access latency in large caches is dominated by wire delays.  Examples: Intel® Nehalem (24 MBytes), IBM® POWER7 (32 MBytes), Tilera® Tile-GX (32 MBytes).

3  NUCA divides a large cache into smaller and faster banks.  Cache access latency consists of the routing latency plus the bank access latency.  Banks close to the cache controller have smaller latencies than more distant banks. [1] Kim et al. An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Architectures. ASPLOS'02
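The latency model on this slide can be sketched numerically. This is our illustration, not code from the talk; the per-hop and bank delays are taken from the methodology slide later in the presentation, and the function name is an assumption:

```python
# Illustrative sketch: NUCA access latency = routing latency + bank access
# latency. Delay values below come from the talk's methodology slide.

BANK_LATENCY = 4   # cycles per NUCA bank access
ROUTER_DELAY = 1   # cycles per router traversal
WIRE_DELAY = 1     # cycles per on-chip link

def nuca_access_latency(hops: int) -> int:
    """Total latency for a bank that is `hops` network hops away."""
    return hops * (ROUTER_DELAY + WIRE_DELAY) + BANK_LATENCY

print(nuca_access_latency(1))   # a bank next to the cache controller: 6 cycles
print(nuca_access_latency(8))   # a distant bank: 20 cycles
```

The gap between near and far banks is exactly why migrating hot data toward the requesting core pays off.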

4  Data can be mapped to multiple banks (S-NUCA vs. D-NUCA).  Migration allows data to adapt to the application's behaviour.  Migration movements are effective, but about 50% of hits still happen in non-optimal banks.

5  Introduction  Methodology  The Migration Prefetcher  Analysis of results  Conclusions

6 Baseline D-NUCA organization (8 cores, Core 0 to Core 7) [2]:  Placement: 16 possible positions per data block.  Access: partitioned multicast.  Migration: gradual promotion.  Replacement: LRU + zero-copy. [2] Beckmann and Wood. Managing Wire Delay in Large Chip-Multiprocessor Caches. MICRO'04
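The gradual-promotion migration policy listed above can be sketched as follows. The list-based bankset model and the function name are our simplification of the Beckmann and Wood scheme, not the talk's implementation:

```python
# Sketch of D-NUCA "gradual promotion": on a hit, the accessed block swaps
# with the block in the neighbouring bank one step closer to the requester.
# The bankset is modeled as a list ordered from closest (index 0) to farthest.

def access_with_promotion(bankset, addr):
    """Return the hit position, promoting the block one bank closer on a hit."""
    for i, block in enumerate(bankset):
        if block == addr:
            if i > 0:  # gradual promotion: swap with the next-closer bank
                bankset[i - 1], bankset[i] = bankset[i], bankset[i - 1]
            return i
    return None  # miss in this bankset

bankset = ["A", "B", "C", "D"]
access_with_promotion(bankset, "C")  # C moves one bank closer per hit
print(bankset)  # ['A', 'C', 'B', 'D']
```

Repeated hits gradually walk a hot block toward the closest bank, which is why frequently used data ends up near the requesting core.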

7 Simulation parameters:
Number of cores: 8 (UltraSPARC IIIi)
Frequency: 1.5 GHz
Main memory size: 4 GBytes
Memory bandwidth: 512 Bytes/cycle
Private L1 caches: 8 x 32 KBytes, 2-way
Shared L2 NUCA cache: 8 MBytes, 128 banks
NUCA bank: 64 KBytes, 8-way
L1 cache latency: 3 cycles
NUCA bank latency: 4 cycles
Router delay: 1 cycle
On-chip wire delay: 1 cycle
Main memory latency: 250 cycles (from core)
Tools and workloads: GEMS (Ruby, Garnet, Orion), Simics (8 x UltraSPARC IIIi, Solaris 10), PARSEC and SPEC CPU2006.

8  Introduction  Methodology  The Migration Prefetcher  Analysis of results  Conclusions

9  Uses prefetching principles for data migration.  This is not a traditional prefetcher: ◦ It does not bring data from main memory. ◦ Its potential benefits are therefore more limited.  It requires only simple data correlation.

10 [Diagram: 8-core CMP (Core 0 to Core 7) with the prefetcher (PS) attached to the NUCA cache. Each Next Address Table (NAT) entry holds a next address and the bank (Bank @) where that data block was last found.]
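The NAT-based correlation this slide illustrates can be sketched in a few lines. This is a hypothetical simplification under our own naming (a plain dictionary standing in for the hardware table, no confidence bits yet), not the paper's exact design:

```python
# Sketch of the Next Address Table (NAT) idea: each entry correlates an
# address with the next address observed after it and the NUCA bank where
# that next block was last found, so its migration can be anticipated.

class MigrationPrefetcher:
    def __init__(self):
        self.nat = {}          # addr -> (next_addr, last_bank)
        self.last_addr = None  # previously accessed address

    def on_access(self, addr, bank):
        # Learn: record that the previous access is followed by this one.
        if self.last_addr is not None:
            self.nat[self.last_addr] = (addr, bank)
        self.last_addr = addr
        # Predict: if a likely successor is known, return it so it can be
        # prefetched (i.e., migrated toward the requesting core) early.
        return self.nat.get(addr)

pf = MigrationPrefetcher()
pf.on_access(0xA0, bank=5)
pf.on_access(0xB0, bank=2)
print(pf.on_access(0xA0, bank=5))  # predicts (0xB0, 2): B follows A
```

Because the predicted block is already somewhere in the NUCA cache, the "prefetch" is really an early migration toward the requester rather than a fetch from memory.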

11  Fraction of prefetching requests that ended up being useful.  A single confidence bit is effective; more than one bit is not worthwhile.
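The 1-bit confidence scheme the slide finds effective can be sketched as follows; the function names are ours:

```python
# Sketch of 1-bit confidence: a NAT entry's prefetch is only issued while
# its confidence bit is set, and one wrong prediction clears the bit again.

def update_confidence(prediction_correct: bool) -> int:
    # 1-bit "counter": set on a correct prediction, cleared otherwise.
    return 1 if prediction_correct else 0

def should_prefetch(confidence_bit: int) -> bool:
    return confidence_bit == 1

bit = 0                         # start with no confidence
bit = update_confidence(True)   # a correct prediction sets the bit
print(should_prefetch(bit))     # True
bit = update_confidence(False)  # a single miss clears it
print(should_prefetch(bit))     # False
```

With only one bit there is no hysteresis, which matches the slide's finding that wider saturating counters add cost without improving accuracy here.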

12  Percentage of prefetching requests submitted using another address's information (aliasing in the NAT).  With 12-14 addressable bits, about 25% of prefetches use erroneous information.  A NAT with 12 addressable bits takes 232 KBytes in total.
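The storage figure can be checked with back-of-envelope arithmetic. This is our calculation, assuming the 29 KBytes-per-table figure quoted later in the talk and one table per core:

```python
# A NAT indexed by 12 address bits has 2**12 entries. At 29 KBytes per
# table that is 58 bits per entry, and one table per core in the 8-core
# CMP gives the quoted 232 KBytes total.

entries = 2 ** 12              # 12 addressable bits -> 4096 entries
table_bytes = 29 * 1024        # 29 KBytes per table (from slide 14)
bits_per_entry = table_bytes * 8 / entries
total_kbytes = 8 * 29          # one NAT per core

print(bits_per_entry)   # 58.0
print(total_kbytes)     # 232
```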

13  Percentage of prefetching requests that are found in the NUCA cache.  Predicting the data location based on its last appearance provides 50% accuracy.  Accuracy increases when the local bank is also searched.
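The "last responder + local" search scheme adopted later can be sketched as follows; the dictionary model of bank contents and the function name are our assumptions:

```python
# Sketch of the search scheme: a prefetch first probes the bank that
# responded last time and the requester's local bank, falling back to a
# full lookup only if both likely banks miss.

def search_banks(addr, last_responder, local_bank, bank_contents):
    """Return the bank holding addr, probing the two likely banks first."""
    for bank in (last_responder, local_bank):
        if addr in bank_contents[bank]:
            return bank
    for bank, contents in bank_contents.items():  # fallback: full search
        if addr in contents:
            return bank
    return None

banks = {0: {"X"}, 3: {"Y"}, 5: {"Z"}}
print(search_banks("Y", last_responder=3, local_bank=0, bank_contents=banks))  # 3
print(search_banks("Z", last_responder=3, local_bank=0, bank_contents=banks))  # 5
```

Probing the two likely banks first keeps prefetch lookups cheap while recovering the accuracy lost when a block has migrated since its last appearance.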

14  The realistic Migration Prefetcher uses: ◦ 1-bit confidence for data patterns. ◦ A NAT with 12 addressable bits (29 KBytes per table). ◦ Last responder + local bank as the search scheme.  Total hardware overhead is 264 KBytes.  Latency: 2 cycles.

15  Introduction  Methodology  The Migration Prefetcher  Analysis of results  Conclusions

16 [Performance results figure]

17  Achieves overall performance improvements of 4% on average, and up to 17%.  The NUCA cache is up to 25% faster with the Migration Prefetcher.  Reduces NUCA cache latency by 15%, on average.

18  This technique does not increase energy consumption.  The prefetcher introduces extra traffic into the network.  On a prefetch hit, however, it significantly reduces the number of messages.

19  Introduction  Methodology  The Migration Prefetcher  Analysis of results  Conclusions

20  Existing migration techniques effectively concentrate the most frequently accessed data in banks close to the cores.  Still, about 50% of hits in the NUCA cache occur in non-optimal banks.  The Migration Prefetcher anticipates migrations based on past access patterns.  It reduces the average NUCA latency by 15%.  It outperforms the baseline configuration by 4%, on average, and does not increase energy consumption.

21 Questions? HiPEAC 2012, Paris (France) – January 23, 2012

