1 Lecture 15: Large Cache Design
Topics: innovations for multi-megabyte cache hierarchies
Reminders: Assignment 5 posted
2 Intel Montecito Cache
Two cores, each with a private 12 MB L3 cache and 1 MB L2
Naffziger et al., Journal of Solid-State Circuits, 2006
3 Intel 80-Core Prototype – Polaris
Prototype chip with an entire die of SRAM cache stacked upon the cores
4 Example Intel Studies
L3 cache sizes up to 32 MB
[Figure: four cores, each with private L1 and L2 caches, connected over an interconnect to a shared L3, a memory interface, and an IO interface]
From Zhao et al., CMP-MSI Workshop 2007
5 Shared Vs. Private Caches in Multi-Core
What are the pros/cons to a shared L2 cache?
[Figure: four cores (P1-P4) with private L1 caches, shown once with a single shared L2 and once with private per-core L2s]
6 Shared Vs. Private Caches in Multi-Core
Advantages of a shared cache:
  Space is dynamically allocated among cores
  No wastage of space because of replication
  Potentially faster cache coherence (and easier to locate data on a miss)
Advantages of a private cache:
  Small L2 → faster access time
  Private bus to L2 → less contention
7 UCA and NUCA
The small-sized caches so far have all been uniform cache access (UCA): the latency for any access is a constant, no matter where the data is found
For a large multi-megabyte cache, it is expensive to limit access time by the worst-case delay: hence, non-uniform cache architecture (NUCA)
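A minimal sketch (not from the slides) of the distinction, assuming a hypothetical 4x4 grid of cache banks and illustrative cycle counts: a UCA design charges every access the worst-case delay, while a NUCA design charges a latency that grows with the distance to the bank holding the block.

```python
# Illustrative only: grid size and cycle counts are assumptions, not lecture figures.

GRID = 4                 # 4x4 grid of cache banks
BANK_ACCESS = 5          # cycles to access a single bank
HOP_LATENCY = 2          # cycles per hop on the on-chip network
CONTROLLER = (0, 0)      # cache controller sits at one corner

def hops(bank):
    """Manhattan distance from the controller to a bank (row, col)."""
    return abs(bank[0] - CONTROLLER[0]) + abs(bank[1] - CONTROLLER[1])

def nuca_latency(bank):
    """Non-uniform: latency depends on which bank holds the block (round trip)."""
    return BANK_ACCESS + 2 * HOP_LATENCY * hops(bank)

def uca_latency():
    """Uniform: every access pays the worst-case (farthest-bank) delay."""
    farthest = (GRID - 1, GRID - 1)
    return nuca_latency(farthest)

if __name__ == "__main__":
    for bank in [(0, 0), (1, 2), (3, 3)]:
        print(f"bank {bank}: NUCA {nuca_latency(bank)} cycles, UCA {uca_latency()} cycles")
```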
8 Large NUCA
[Figure: CPU attached to a large cache divided into many banks]
Issues to be addressed for Non-Uniform Cache Access:
  Mapping
  Migration
  Search
  Replication
9 Static and Dynamic NUCA
Static NUCA (S-NUCA)
  The address index bits determine where the block is placed
  Page coloring can help here as well to improve locality
Dynamic NUCA (D-NUCA)
  Blocks are allowed to move between banks
  The block can be anywhere: need some search mechanism
    Each core can maintain a partial tag structure so it has an idea of where the data might be (complex!)
    Every possible bank is looked up and the search propagates (either in series or in parallel) (complex!)
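A minimal sketch of the two mapping policies, assuming a hypothetical cache with 64-byte blocks, 16 banks, and an 8-bit partial tag per entry; the bit positions and widths are illustrative, not taken from the slides.

```python
# Illustrative parameters (assumptions, not from the lecture):
BLOCK_OFFSET_BITS = 6      # 64-byte blocks
NUM_BANKS = 16             # 16 NUCA banks
BANK_BITS = 4              # log2(NUM_BANKS)
PARTIAL_TAG_BITS = 8       # width of per-core partial tags (D-NUCA)

def snuca_bank(addr):
    """S-NUCA: a fixed slice of the address index bits picks the bank,
    so a block can live in exactly one place."""
    return (addr >> BLOCK_OFFSET_BITS) & (NUM_BANKS - 1)

class DNucaPartialTags:
    """D-NUCA hint structure: a core remembers short partial tags per bank;
    a match suggests (but does not guarantee) which banks to search first."""
    def __init__(self):
        self.tags = [set() for _ in range(NUM_BANKS)]

    def _partial_tag(self, addr):
        return (addr >> (BLOCK_OFFSET_BITS + BANK_BITS)) & ((1 << PARTIAL_TAG_BITS) - 1)

    def record(self, addr, bank):
        self.tags[bank].add(self._partial_tag(addr))

    def candidate_banks(self, addr):
        """Banks whose partial tags match; fall back to searching every bank."""
        ptag = self._partial_tag(addr)
        hits = [b for b in range(NUM_BANKS) if ptag in self.tags[b]]
        return hits if hits else list(range(NUM_BANKS))

if __name__ == "__main__":
    addr = 0x12345640
    print("S-NUCA bank:", snuca_bank(addr))
    hints = DNucaPartialTags()
    hints.record(addr, bank=7)          # block migrated to bank 7
    print("D-NUCA search order:", hints.candidate_banks(addr))
```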
10 Where Should Data be Placed?
[Figure: eight CPUs (CPU 0 through CPU 7), each with private L1I and L1D caches]
11 Example Organization
[Figure: banked cache layout with access latencies ranging from 13-17 cycles for nearby banks to 65 cycles for distant banks]
Data must be placed close to the center-of-gravity of requests
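A minimal sketch of the center-of-gravity idea, assuming hypothetical core coordinates and per-core access counts: the preferred bank is the one closest to the access-weighted centroid of the requesting cores.

```python
# Illustrative only: core/bank coordinates and access counts are assumptions.

def center_of_gravity(accesses):
    """accesses: {(x, y) core position: number of requests to the block}.
    Returns the access-weighted centroid of the requesters."""
    total = sum(accesses.values())
    cx = sum(x * n for (x, _), n in accesses.items()) / total
    cy = sum(y * n for (_, y), n in accesses.items()) / total
    return cx, cy

def best_bank(accesses, banks):
    """Pick the bank closest (Manhattan distance) to the centroid."""
    cx, cy = center_of_gravity(accesses)
    return min(banks, key=lambda b: abs(b[0] - cx) + abs(b[1] - cy))

if __name__ == "__main__":
    # Two cores on the left issue most of the requests, one on the right a few.
    accesses = {(0, 0): 40, (0, 3): 30, (3, 2): 10}
    banks = [(x, y) for x in range(4) for y in range(4)]
    print("place block in bank", best_bank(accesses, banks))
```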
12 Examples: Frequency of Accesses
[Figure: per-bank access heat maps; darker shading indicates more accesses]
OLTP (on-line transaction processing)
Ocean (scientific code)
13 Organizations in 3D
Possibly some inter-mingling of cache and cores?
[Figure: alternative arrangements of cache (C) and processor (P) tiles across stacked dies]