Cache Coherence for Shared Memory Multiprocessors


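Cache Coherence Problem Example
[Figure: three processors P1, P2, P3, each with a cache ($), share a bus with memory and I/O devices. Memory initially holds u = 5. Events 1 and 2: two caches read u and load the value 5. Event 3: one processor writes u = 7. Events 4 and 5: later reads of u return u = ?]
Processors see different values for u after event 3.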
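The problem can be reproduced with a toy model in which caches have no coherence mechanism at all. This is only a sketch: the `read`, `write`, and `run` helpers are hypothetical, and the assignment of events to processors (P1 and P3 read first, P3 writes, then P1 and P2 read) is an assumption, since the flattened figure does not make it fully explicit.

```python
# Toy model: each cache is a plain dict and nothing keeps the copies coherent.
memory = {"u": 5}
caches = {"P1": {}, "P2": {}, "P3": {}}


def read(proc, var):
    """Return the cached value if present, otherwise fetch it from memory."""
    cache = caches[proc]
    if var not in cache:
        cache[var] = memory[var]
    return cache[var]


def write(proc, var, value, write_through):
    """Update the local copy; update memory only for a write-through cache."""
    caches[proc][var] = value
    if write_through:
        memory[var] = value


def run(write_through):
    """Replay events 1-5 and return what P1 and P2 see in events 4 and 5."""
    memory["u"] = 5
    for cache in caches.values():
        cache.clear()
    read("P1", "u")                      # event 1 (assumed reader: P1)
    read("P3", "u")                      # event 2 (assumed reader: P3)
    write("P3", "u", 7, write_through)   # event 3 (assumed writer: P3)
    return read("P1", "u"), read("P2", "u")  # events 4 and 5


print(run(write_through=True))   # P1 hits its stale copy, P2 misses and reads memory
print(run(write_through=False))  # memory is stale too, so both see the old value
```

With write-through, P1 still returns its stale cached copy while P2 gets the new value from memory. With write-back, memory is stale as well, so neither reader sees the write.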

Bus Snooping
- A coherence technique for bus-based shared memory multiprocessors
- A snoopy cache controller (SCC) is inserted to do bus snooping
- Bus transactions are visible to all SCCs
[Figure: processors P1 ... Pn, each with a cache ($) and an SCC, attached to a shared bus along with memory and I/O devices]

Snooping for Write-Through Caches
When an SCC detects a relevant write transaction, it can either:
- Invalidate the block containing the relevant variable (write-invalidate approach)
- Update the value in its cache (write-update approach)

Write-Invalidate Protocol
- Two states per block in each cache, as in a uniprocessor
  - Hardware state bits are associated with blocks that are in the cache
  - The Invalid state is also used in place of a "not present" state
- Notation A/B: if A is observed, transaction B is generated

State transitions (I = Invalid, V = Valid):
  From  Event / Action   To
  I     PrRd / BusRd     V
  I     PrWr / BusWr     I
  V     PrRd / --        V
  V     PrWr / BusWr     V
  V     BusWr / --       I

- This is just one particular design, in which the processor writes to main memory on a write miss. Other designs may read the block first to validate it.
[Figure: processors P1 ... Pn, each with a cache ($) whose entries hold State, Tag, and Data, on a shared bus with memory and I/O devices]

Example
Three processors; consider the states of the blocks containing X. Each cache entry is shown as X / State.

  Operation     P1 $     P2 $     P3 $     Main memory
  Initially     ? / I    ? / I    ? / I    10
  P2 Rd X       ? / I    10 / V   ? / I    10
  P3 Rd X       ? / I    10 / V   10 / V   10
  P2 Wr X=15    ? / I    15 / V   10 / I   15
  P1 Rd X       15 / V   15 / V   10 / I   15
  P1 Wr X=3     3 / V    15 / I   10 / I   3
  P3 Wr X=6     3 / I    15 / I   10 / I   6

After P3 writes X = 6, P3's block remains invalid. Updating the value of X isn't enough to validate the whole block.
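The example can be replayed with a small simulator of the two-state write-through, write-invalidate design. This is a minimal sketch under stated assumptions: the class name `WriteThroughInvalidate` and its methods are hypothetical, each cache tracks one value per address instead of a multi-word block, and processors P1, P2, P3 are indexed 0, 1, 2.

```python
class WriteThroughInvalidate:
    """Two-state (I/V) write-through caches snooping a shared bus."""

    def __init__(self, num_procs, memory):
        self.memory = dict(memory)
        # Each cache maps address -> (value, state), with state "V" or "I".
        self.caches = [dict() for _ in range(num_procs)]

    def state(self, proc, addr):
        """Return (value, state); a block that is not present counts as I."""
        return self.caches[proc].get(addr, (None, "I"))

    def read(self, proc, addr):
        value, st = self.state(proc, addr)
        if st == "V":                      # PrRd / --
            return value
        value = self.memory[addr]          # PrRd / BusRd, I -> V
        self.caches[proc][addr] = (value, "V")
        return value

    def write(self, proc, addr, value):
        self.memory[addr] = value          # PrWr / BusWr in both states
        _, st = self.state(proc, addr)
        if st == "V":                      # a valid local copy is updated
            self.caches[proc][addr] = (value, "V")
        # Every other controller snoops the BusWr: V -> I.
        for other, cache in enumerate(self.caches):
            if other != proc and addr in cache:
                old_value, _ = cache[addr]
                cache[addr] = (old_value, "I")


sim = WriteThroughInvalidate(3, {"X": 10})
P1, P2, P3 = 0, 1, 2
sim.read(P2, "X")
sim.read(P3, "X")
sim.write(P2, "X", 15)
sim.read(P1, "X")
sim.write(P1, "X", 3)
sim.write(P3, "X", 6)
print([sim.state(p, "X") for p in (P1, P2, P3)], sim.memory["X"])
```

Note that the last write leaves P3's block invalid, because a write in state I goes to memory without loading the block.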

Snoopy Cache Controller
- Bus snooping advantages
  - No need to change the processor design
  - No explicit coherence statements added to the program
- The snoopy cache controller observes events from
  - The local processor
  - The bus
- Write operations
  - Write-invalidate vs. write-update
  - Write-through caches: see last lecture
  - Write-back caches
    - Now writes take place locally, so SCCs don't observe them
    - How can we handle this? Extra work has to be done
[Figure: snoopy cache controller]

Write-Back Caches
- Usually have a "dirty bit"
  - One bit of state per block
  - True: block has been modified
  - False: block unchanged
- Use for uniprocessors
  - The block has to be written back to memory upon replacement
- Use for multiprocessors
  - Same as uniprocessor, plus
  - It means the processor "owns" the block

The Extra Work
Before a processor writes into its cache, it performs an "ownership" transaction.
- Case 1: No other modified copies of the block in the system
  - Processor can write back
- Case 2: A modified copy exists somewhere in the system
  - Old owner
    - Writes the block to memory
    - Invalidates its local copy
  - New owner
    - Reads the block as it's being written back to memory
    - Performs the write
- What the new owner did is called a "read to own" (read to modify) transaction
- There is only one owner at a time
- Still don't get it? Wait until you see the MSI protocol!

Ownership Overhead
- Ownership transactions are overhead
- If one happens every time a write is needed
  - A block will be written back to memory every time
  - Then write-back caches would be no better than write-through caches
- Let's cross our fingers and count on the concept of locality
  - Spatial and temporal locality can do it for us
  - A processor owns the block and performs several writes consecutively
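A rough way to see the effect of locality is to count bus transactions for a sequence of writes to one block. The `bus_transactions` function below is a hypothetical sketch that makes two simplifying assumptions: a write-through cache puts every write on the bus, and a write-back cache needs exactly one ownership transaction each time the writer changes (the eventual write-back on replacement is ignored).

```python
def bus_transactions(writers, write_back):
    """Count bus transactions for writes to a single block.

    writers is the list of processor names in the order they write.
    """
    if not write_back:
        return len(writers)      # write-through: every write goes on the bus
    owner = None
    count = 0
    for proc in writers:
        if proc != owner:        # a non-owner must first obtain ownership
            count += 1
            owner = proc
    return count                 # writes by the current owner stay local


good_locality = ["P1", "P1", "P1", "P1"]
ping_pong = ["P1", "P2", "P1", "P2"]
print(bus_transactions(good_locality, write_back=True))
print(bus_transactions(ping_pong, write_back=True))
print(bus_transactions(ping_pong, write_back=False))
```

When one processor writes several times in a row, a single ownership transaction covers all of them. When ownership ping-pongs between processors, the write-back cache generates as many transactions as a write-through cache would.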

MSI Protocol: States
- Another write-invalidate protocol
- We need to differentiate between reads and writes
- Keep the Invalid state and split the Valid state into two states
  - I: Invalid
  - S: Shared (one or more caches hold the block, read only)
  - M: Modified or Dirty (only one cache holds the block and can write it)

MSI Protocol: Events/Actions
- Local processor events
  - PrRd: read
  - PrWr: write
- Bus transactions
  - BusRd: read with no intent to modify
  - BusRdX: read with intent to modify (read to own)
  - BusWB: update memory
- Possible actions
  - _: nothing
  - BusRd: send a read request over the bus
  - BusRdX: ownership (read to own) transaction
  - Flush: copy the modified block to memory

MSI Protocol: State Transitions
  From  Event / Action     To
  I     PrRd / BusRd       S
  I     PrWr / BusRdX      M
  S     PrRd, BusRd / _    S
  S     PrWr / BusRdX      M   (promote)
  S     BusRdX / _         I
  M     PrRd, PrWr / _     M
  M     BusRd / Flush      S   (demote)
  M     BusRdX / Flush     I

MSI Protocol: Example
Three processors; consider the states of the blocks containing X. Each cache entry is shown as X / State.

  Operation     P1 $     P2 $     P3 $     Main memory
  Initially     ? / I    ? / I    ? / I    10
  P2 Rd X       ? / I    10 / S   ? / I    10
  P3 Rd X       ? / I    10 / S   10 / S   10
  P2 Wr X=15    ? / I    15 / M   10 / I   10
  P1 Rd X       15 / S   15 / S   10 / I   15
  P1 Wr X=3     3 / M    15 / I   10 / I   15
  P1 Wr X=6     6 / M    15 / I   10 / I   15
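The MSI example can be replayed in code. The sketch below is an illustration under assumptions, not a hardware-accurate model: the class name `MSIBus` and its methods are hypothetical, each cache holds one value per address, replacement is not modeled, and P1, P2, P3 are indexed 0, 1, 2.

```python
class MSIBus:
    """Write-back caches kept coherent by the MSI write-invalidate protocol."""

    def __init__(self, num_procs, memory):
        self.memory = dict(memory)
        self.caches = [dict() for _ in range(num_procs)]  # addr -> (value, state)
        self.log = []                                     # bus transactions issued

    def state(self, proc, addr):
        """Return (value, state); a block that is not present counts as I."""
        return self.caches[proc].get(addr, (None, "I"))

    def _snoop(self, requester, addr, txn):
        """Let every other cache react to a bus transaction."""
        for other, cache in enumerate(self.caches):
            if other == requester or addr not in cache:
                continue
            value, st = cache[addr]
            if st == "M":
                self.memory[addr] = value        # BusRd/Flush or BusRdX/Flush
            if txn == "BusRd" and st == "M":
                cache[addr] = (value, "S")       # demote M -> S
            elif txn == "BusRdX" and st != "I":
                cache[addr] = (value, "I")       # invalidate S or M

    def read(self, proc, addr):
        value, st = self.state(proc, addr)
        if st == "I":                            # PrRd / BusRd, I -> S
            self.log.append("BusRd")
            self._snoop(proc, addr, "BusRd")
            value = self.memory[addr]
            self.caches[proc][addr] = (value, "S")
        return value                             # PrRd / _ in S or M

    def write(self, proc, addr, value):
        _, st = self.state(proc, addr)
        if st != "M":                            # PrWr / BusRdX from I or S
            self.log.append("BusRdX")
            self._snoop(proc, addr, "BusRdX")
        self.caches[proc][addr] = (value, "M")   # PrWr / _ when already M


msi = MSIBus(3, {"X": 10})
P1, P2, P3 = 0, 1, 2
msi.read(P2, "X")
msi.read(P3, "X")
msi.write(P2, "X", 15)
msi.read(P1, "X")
msi.write(P1, "X", 3)
msi.write(P1, "X", 6)
print([msi.state(p, "X") for p in (P1, P2, P3)], msi.memory["X"], msi.log)
```

The final write by P1 adds nothing to the log: P1 already owns the block in state M, so the write stays local.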

MESI Protocol: What's wrong with MSI?
- Another write-invalidate protocol
- Consider this MSI scenario
  - The block containing X isn't in any cache
  - P1 reads X: BusRd, state: S
  - P1 modifies X: BusRdX, state: M
  - The BusRdX is there to let everybody else know X is being modified
- This scenario takes 2 bus transactions
- There is no need for 2 transactions, since P1 is the only processor that knows about X!

MESI Protocol: States
- Same as MSI, except that S is split in 2
  - E: Exclusive clean (only one processor)
  - S: Shared clean (more than one processor)
- Let's consider the same scenario
  - The block containing X isn't in any cache
  - P1 reads X: BusRd, state: E
  - P1 modifies X: nothing, state: M
- In other words, P1 doesn't need to let anybody know about the modification

MESI Protocol: Hardware Support
- An additional bus signal is needed: the S signal (S for shared)
- This helps the processor know whether to load a block in the E or S state
- A cache controller asserts the S signal if the relevant block is in its cache
- The S bus signal is a wired-OR line
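The saving MESI offers over MSI in the read-then-write scenario can be sketched by listing the bus transactions one processor generates. The `read_then_write` function is hypothetical, and the `other_sharers` flag stands in for the wired-OR S signal: it is true when some other cache asserts S during the BusRd.

```python
def read_then_write(protocol, other_sharers=False):
    """Bus transactions and final state when a processor reads, then writes,
    a block it does not currently hold."""
    bus = ["BusRd"]                  # PrRd in state I always issues a BusRd
    if protocol == "MESI" and not other_sharers:
        state = "E"                  # S signal not asserted: load as Exclusive
    else:
        state = "S"                  # MSI, or MESI with the S signal asserted
    if state == "S":
        bus.append("BusRdX")         # PrWr in S needs an ownership transaction
    state = "M"                      # PrWr in E upgrades silently
    return bus, state


print(read_then_write("MSI"))
print(read_then_write("MESI"))
print(read_then_write("MESI", other_sharers=True))
```

MESI saves the second transaction only when no other cache holds the block. If the S signal is asserted, the block loads in S and MESI behaves like MSI.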

MESI Protocol: State Transitions
The diagram only shows labels for what's different from MSI:
  From  Event / Action                       To
  I     PrRd / BusRd (S asserted)            S
  I     PrRd / BusRd (S not asserted)        E
  E     PrRd / _                             E
  E     PrWr / _                             M   (promote)
  E     BusRd / Flush                        S   (demote)
  E     BusRdX / Flush                       I
  S     BusRdX / Flush'                      I
- Flushing a "clean" block is a fast way for the new reader to read the block
- While flushing a shared block, Flush' means only 1 processor is responsible
- Other protocol variations may not flush a clean block

Dragon Protocol
- Write-back update protocol
- States
  - Exclusive (E): 1 cache has a clean copy
  - Shared-clean (Sc): 2 or more caches have a clean copy; memory is up to date
  - Shared-modified (Sm): 1 cache just modified the block and some other caches also hold it; memory is outdated
  - Modified (M): 1 cache has a modified copy
- Added processor events: PrRdMiss, PrWrMiss (remember we don't have an I state)
- Added bus transaction: BusUpd
  - Broadcasts the word or byte written by the processor so other processors can update their copies

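Dragon Protocol: State Transitions
(S) means the shared signal is asserted; (not S) means it is not. The overbars that marked (not S) in the original diagram were lost in extraction and are restored here from the standard Dragon protocol.

  From        Event / Action                   To
  (miss)      PrRdMiss / BusRd(not S)          E
  (miss)      PrRdMiss / BusRd(S)              Sc
  (miss)      PrWrMiss / BusRd(not S)          M
  (miss)      PrWrMiss / (BusRd(S); BusUpd)    Sm
  E           PrRd / --                        E
  E           PrWr / --                        M
  E           BusRd / --                       Sc
  Sc          PrRd / --                        Sc
  Sc          BusUpd / Update                  Sc
  Sc          PrWr / BusUpd(S)                 Sm
  Sc          PrWr / BusUpd(not S)             M
  Sm          PrRd / --                        Sm
  Sm          PrWr / BusUpd(S)                 Sm
  Sm          BusRd / Flush                    Sm
  Sm          BusUpd / Update                  Sc
  Sm          PrWr / BusUpd(not S)             M
  M           PrRd / --                        M
  M           PrWr / --                        M
  M           BusRd / Flush                    Sm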
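The Dragon transitions can be written as a single lookup function. This is a sketch based on the standard Dragon protocol, with several assumptions: the function name `dragon_next` is hypothetical, the `shared` argument stands for the shared signal observed during the transaction, a missing block is passed as `None`, and a BusRd observed in Sc is assumed to leave the block in Sc with no action.

```python
def dragon_next(state, event, shared=False):
    """Return (next_state, action) for one Dragon protocol transition."""
    if event == "PrRdMiss":
        return ("Sc" if shared else "E", "BusRd")
    if event == "PrWrMiss":
        return ("Sm", "BusRd; BusUpd") if shared else ("M", "BusRd")
    if event == "PrRd":
        return (state, None)                  # read hits never use the bus
    if event == "PrWr":
        if state in ("E", "M"):
            return ("M", None)                # sole copy: write locally
        # Sc or Sm: broadcast the written word; stay shared only if some
        # other cache still holds the block.
        return ("Sm" if shared else "M", "BusUpd")
    if event == "BusRd":
        if state in ("E", "Sc"):
            return ("Sc", None)               # clean copy: memory supplies data
        return ("Sm", "Flush")                # Sm or M: this cache supplies data
    if event == "BusUpd":
        return ("Sc", "Update")               # another cache wrote: take its value
    raise ValueError(f"unknown event: {event}")


# One block, seen from a single cache: miss, remote read, local write, remote write.
state, _ = dragon_next(None, "PrRdMiss", shared=False)
trace = [state]
for event, shared in [("BusRd", False), ("PrWr", True), ("BusUpd", False)]:
    state, _ = dragon_next(state, event, shared)
    trace.append(state)
print(trace)
```

No transition leads to an invalid state, because written values are broadcast to other copies instead of invalidating them.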

Snoopy Protocol Taxonomy

  Cache \ Protocol    Write-invalidate    Write-update
  Write-through       IV                  Homework
  Write-back          MSI, MESI           Dragon