Multiprocessor cache coherence. Caching terms and definitions: cache line, line size, cache size; degree of associativity (direct-mapped, set-associative, and fully associative).


Multiprocessor cache coherence

Caching: terms and definitions
– cache line, line size, cache size
– degree of associativity: direct-mapped, set-associative, fully associative
– placement, replacement, location (tags)
– hits and misses
– clean vs. dirty entries
– write-through vs. write-back
– multi-level inclusion
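These terms correspond directly to the per-line bookkeeping a cache keeps in hardware. A minimal sketch in C; the field names and the 64-byte line size are illustrative assumptions, not from the slides:

    #include <stdint.h>

    #define LINE_SIZE 64           /* bytes per line (assumed) */

    struct cache_line {
        uint64_t tag;              /* which memory block the line holds   */
        uint8_t  valid;            /* does the line hold anything at all? */
        uint8_t  dirty;            /* modified since fill? (write-back)   */
        uint8_t  data[LINE_SIZE];  /* the cached bytes themselves         */
    };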

Cache line, line size, and cache size
– caches hold multi-byte lines
– a line's bytes come from sequential locations in memory (called a memory block)
– the number of bytes in a line is usually a multiple of the bus width
– lines are identified by “tags”
– cache size = # lines * # bytes per line
– tags are not included in the cache size
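As a worked example with assumed numbers: a cache with 64 lines of 64 bytes each holds 64 * 64 = 4096 bytes (4 KB) of data; the tag array takes extra storage on top of that but does not count toward the 4 KB.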

Degree of associativity
– how many “ways”, or places, can we store a given block from memory in the cache?
– direct-mapped => 1 “way”, i.e. exactly one place to store a given block
– set-associative => multiple “ways”, grouped into sets
– fully associative => a block from memory can be stored in any line in the cache
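For instance, with an assumed 64-line cache, the three organizations differ only in how the lines are grouped: direct-mapped is 64 sets of 1 way each, 4-way set-associative is 16 sets of 4 ways, and fully associative is a single set of 64 ways; in every case, # sets = # lines / # ways.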

Placement
– when we bring a block in from memory, where do we put it?
– use the address of the first byte of the block
– break the address into offset, index, and tag:
  – the low-order log2(# bytes per block) bits are the offset
  – the middle log2(# lines per way) bits select the line; they are called the “index”
  – the remaining high-order bits are the tag
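A sketch of this split in C, reusing the assumed 64-byte lines from above and additionally assuming 64 lines per way, so 6 offset bits and 6 index bits:

    #include <stdint.h>

    #define OFFSET_BITS 6   /* log2(64 bytes per line)  -- assumed sizes */
    #define INDEX_BITS  6   /* log2(64 lines per way)                    */

    static inline uint64_t offset_of(uint64_t addr) {
        return addr & ((1u << OFFSET_BITS) - 1);                 /* low bits    */
    }
    static inline uint64_t index_of(uint64_t addr) {
        return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); /* middle bits */
    }
    static inline uint64_t tag_of(uint64_t addr) {
        return addr >> (OFFSET_BITS + INDEX_BITS);               /* high bits   */
    }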

Replacement
– placement selects the same line number in each way
– if one way has an empty line at that location, use it
– if all ways have valid lines at that location, one will need to be victimized
– choose the victim with LRU, clock, random, …
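One possible sketch of victim selection, using the cache_line struct from earlier and assuming LRU is tracked with a per-way last-access timestamp (the scheme and names are illustrative):

    /* Prefer an invalid line; otherwise evict the least-recently-used way. */
    static int pick_victim(const struct cache_line set[], int num_ways,
                           const unsigned last_used[]) {
        int victim = 0;
        for (int w = 0; w < num_ways; w++) {
            if (!set[w].valid)
                return w;                        /* empty line: just use it     */
            if (last_used[w] < last_used[victim])
                victim = w;                      /* older access: better victim */
        }
        return victim;
    }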

Locating a block: tags
– how do we know whether a given block is in the cache?
– calculate the index and tag from the address
– check the tags for that index in each way
– requires a separate memory for the tag array and circuitry for the tag comparisons
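A software stand-in for the lookup, reusing the earlier sketches; note that real hardware compares the tags of all ways in parallel, where this loop is sequential:

    #define NUM_LINES 64   /* lines per way, matching the 6 index bits above */

    /* Return the way that hits for this address, or -1 on a miss. */
    static int lookup(struct cache_line ways[][NUM_LINES], int num_ways,
                      uint64_t addr) {
        uint64_t idx = index_of(addr);
        uint64_t tag = tag_of(addr);
        for (int w = 0; w < num_ways; w++)
            if (ways[w][idx].valid && ways[w][idx].tag == tag)
                return w;                        /* tag match: hit */
        return -1;                               /* miss */
    }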

Hit vs. miss
– hit == the block is in the cache
– read hit / miss: the processor wants to read one or more bytes in the block
– write hit / miss: similarly, for writes
– we want high hit rates

Clean vs. dirty
– has the value in the cache been modified since it was placed in the cache?
– one dirty bit per line
– similarly, one bit per line for valid / not valid

Write-through vs. write-back
– write-through: on a write, update the cache and also write to memory
  – more traffic to memory
  – no need to stall on replacement
– write-back: hold writes in the cache, mark the line dirty
  – less memory traffic
  – coalesces multiple writes to the line
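A sketch contrasting the two policies on a write hit, again reusing the cache_line struct; memory_write() is a hypothetical stand-in for the bus transaction to memory:

    #include <stddef.h>
    #include <string.h>

    void memory_write(uint64_t addr, const void *val, size_t n); /* hypothetical */

    /* Write-through: the store goes to both the cache line and memory. */
    static void write_through_store(struct cache_line *line, uint64_t addr,
                                    const void *val, size_t n, size_t off) {
        memcpy(line->data + off, val, n);
        memory_write(addr, val, n);   /* memory is updated on every store */
    }

    /* Write-back: the store only touches the cache; the dirty line is
       written to memory when it is eventually evicted. */
    static void write_back_store(struct cache_line *line,
                                 const void *val, size_t n, size_t off) {
        memcpy(line->data + off, val, n);
        line->dirty = 1;
    }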

Multi-level inclusion (MLI)
– L1 (the child) holds a subset of L2 (the parent); L2 holds a subset of main memory
– affects the servicing of misses and invalidations (see section 6.3.1 in the text)
– constrains the organizations we can build if we want MLI; we can switch to MLE instead, though
– ensures there will be an allocated line in the parent with the same contents as the child's line (if clean), or which can receive a dirty line from the child when it is replaced

Uniprocessor MLI
– assuming write-back caches
– Ap, Bp are the parent's associativity and line size, respectively; Ac and Bc are the child's
– we are constrained to: Ap >= (Bp / Bc) * Ac
– the associativity must at least cover the ratio of the line sizes

Multiprocessor MLI
– a parent may have k children: Ap >= k * (Bp / Bc) * Ac
– e.g. the parent is 32 KB with 16 B lines; each child is 1 KB with 4 B lines and direct-mapped
– so Bp / Bc = 4 and Ac = 1: if k = 1, Ap >= 4; if k = 4, Ap >= 16
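A quick check of the constraint with the slide's numbers (a throwaway sketch, not part of the course material):

    #include <assert.h>

    /* Does the multiprocessor MLI constraint Ap >= k * (Bp / Bc) * Ac hold? */
    static int mli_ok(int Ap, int Bp, int Ac, int Bc, int k) {
        return Ap >= k * (Bp / Bc) * Ac;
    }

    int main(void) {
        /* Bp = 16 B, Bc = 4 B, child direct-mapped (Ac = 1), so Bp / Bc = 4 */
        assert( mli_ok(4,  16, 1, 4, 1));   /* k = 1: Ap = 4 suffices   */
        assert(!mli_ok(4,  16, 1, 4, 4));   /* k = 4: Ap = 4 is too low */
        assert( mli_ok(16, 16, 1, 4, 4));   /* k = 4: needs Ap >= 16    */
        return 0;
    }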

Coherence
– MLI makes it easy to keep an L1 / L2 pair consistent, or “coherent”
– what about multiple caches in a multiprocessor or multi-core system?

Shared memory machines

What’s the problem?

    Time  Event            Cache A  Cache B  Memory
    0                                        X = 10
    1     CPU A reads X    10
    2     CPU B reads X             10
    3     CPU A writes X   20       10

After CPU A writes X = 20 at time 3, CPU B’s cached copy of X is stale: it still holds 10.
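A tiny model of the table: each CPU keeps a private copy of X, and nothing propagates CPU A’s write (a toy illustration, assuming write-back so memory is not updated either):

    #include <stdio.h>

    int main(void) {
        int memory_X = 10;        /* time 0: X = 10 in memory             */
        int cacheA_X = memory_X;  /* time 1: CPU A reads X into its cache */
        int cacheB_X = memory_X;  /* time 2: CPU B reads X into its cache */
        cacheA_X = 20;            /* time 3: CPU A writes X; neither memory
                                     nor cache B sees the new value       */
        printf("A: %d  B: %d  memory: %d\n", cacheA_X, cacheB_X, memory_X);
        return 0;                 /* prints  A: 20  B: 10  memory: 10     */
    }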

Coherence (formally)
– coherence determines the value returned by a read
– a coherent memory system satisfies:
  – if P writes to X and then reads X, with no writes to X by other processors in between, the read returns the value written by P
  – if P1 writes to X and then P2 reads from X, and the read and write are “sufficiently” separated in time, the read returns the value written by P1
  – writes to the same location are serialized: two writes to the same location by any two processors are seen in the same order by all processors

Consistency
– consistency determines when a written value will be available to be read
  – this is what “sufficiently separated” meant on the previous slide
– various consistency models are possible (covered later)

Think / group / share
– how can we ensure coherence in a shared-memory multiprocessor?

Reading assignment
– section 7.3, pages 281 to 290, in Baer; covers synchronization
– example 1, pages 269 to 270