Cache Coherence for Shared Memory Multiprocessors
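
Cache Coherence Problem: Example
- Three processors (P1, P2, P3), each with a private cache ($), share memory and I/O devices over a bus. Memory initially holds u = 5.
- Events 1 and 2: two processors read u, and each caches u = 5
- Event 3: one processor writes u = 7
- Events 4 and 5: later reads of u (u = ?)
- Processors see different values for u after event 3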
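
The five events can be replayed in a toy model of private caches with no coherence mechanism. This is only a sketch: the `read` and `write` helpers are hypothetical, the caches are assumed to be write-through, and the assignment of events to processors (P1 and P3 read, P3 writes, then P1 and P2 read) is an assumption, since the figure itself does not survive here.

```python
# Toy model: private write-through caches with NO coherence mechanism.
memory = {"u": 5}
caches = {"P1": {}, "P2": {}, "P3": {}}

def read(proc, var):
    """Return the cached copy if present, otherwise fetch it from memory."""
    cache = caches[proc]
    if var not in cache:
        cache[var] = memory[var]
    return cache[var]

def write(proc, var, value):
    """Write-through: update the local copy and memory, but no other cache."""
    caches[proc][var] = value
    memory[var] = value

read("P1", "u")                 # event 1 (assumed P1): caches u = 5
read("P3", "u")                 # event 2 (assumed P3): caches u = 5
write("P3", "u", 7)             # event 3 (assumed P3): writes u = 7
seen_by_p1 = read("P1", "u")    # event 4: hits P1's stale copy
seen_by_p2 = read("P2", "u")    # event 5: misses and fetches from memory
print(seen_by_p1, seen_by_p2)   # prints "5 7"
```

P1 keeps reading its stale copy while P2 sees the new value, which is exactly the disagreement a coherence protocol has to prevent.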
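
Bus Snooping
- A coherence technique for bus-based shared memory multiprocessors
- Each cache gets a snoopy cache controller (SCC) that does the bus snooping
- Bus transactions are visible to all SCCs
[Figure: processors P1 ... Pn, each with a cache ($) and an SCC, attached to a shared bus along with memory and I/O devices]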

Snooping for Write-Through Caches
When an SCC detects a relevant write transaction, it can either
- invalidate the block containing the relevant variable (write-invalidate approach), or
- update the value in its cache (write-update approach)
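
The two reactions can be contrasted in a few lines. This is a minimal sketch, not a real controller: `snoop_write` is a hypothetical helper, and a cache is modeled as a plain dictionary in which a missing address stands for "invalid or not present".

```python
def snoop_write(cache, addr, value, policy):
    """React to another processor's write observed on the bus."""
    if addr not in cache:
        return                    # not a relevant write: we hold no copy
    if policy == "invalidate":
        del cache[addr]           # drop the copy; the next read will miss
    elif policy == "update":
        cache[addr] = value       # patch the copy in place
    else:
        raise ValueError(f"unknown policy: {policy}")

inv = {"u": 5}
upd = {"u": 5}
snoop_write(inv, "u", 7, "invalidate")
snoop_write(upd, "u", 7, "update")
print(inv, upd)                   # prints "{} {'u': 7}"
```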

Write-Invalidate Protocol
- Two states per block in each cache, as in a uniprocessor: Invalid (I) and Valid (V)
- Hardware state bits are associated with the blocks that are in the cache (each cache line holds State, Tag, Data)
- The Invalid state is also used in place of a "not present" state
- Notation A/B: if A is observed, transaction B is generated (-- means no transaction)
- Transitions
    I --PrRd / BusRd--> V
    I --PrWr / BusWr--> I
    V --PrRd / -- --> V
    V --PrWr / BusWr--> V
    V --BusWr / -- --> I
- This is just one particular design, in which the processor writes to main memory on a write miss. Other designs may read the block first to validate it.
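
Example
Three processors; consider the state of the block containing X in each cache (value / state; ? means unknown contents).

Operation      P1 $     P2 $     P3 $     Main memory
Initially      ? / I    ? / I    ? / I    10
P2 Rd X        ? / I    10 / V   ? / I    10
P3 Rd X        ? / I    10 / V   10 / V   10
P2 Wr X = 15   ? / I    15 / V   10 / I   15
P1 Rd X        15 / V   15 / V   10 / I   15
P1 Wr X = 3    3 / V    15 / I   10 / I   3
P3 Wr X = 6    3 / I    15 / I   10 / I   6

In the last step P3's block remains invalid: updating the value of X alone isn't enough to validate the whole block.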
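
The trace in this example can be replayed with a small simulator. It is a sketch under stated assumptions: `Block` and `run_write_through` are hypothetical names, the block holds a single word (X), and a write miss goes straight to memory without allocating the block, as in the design described on the previous slide.

```python
class Block:
    """One cache's copy of the block holding X: a value plus an I/V state."""
    def __init__(self):
        self.state = "I"
        self.value = None      # None plays the role of "?" in the table

def run_write_through(trace, memory=10, procs=("P1", "P2", "P3")):
    caches = {p: Block() for p in procs}
    for proc, op, value in trace:
        me = caches[proc]
        if op == "Rd":
            if me.state == "I":            # PrRd / BusRd: I -> V
                me.state, me.value = "V", memory
        else:
            memory = value                 # PrWr / BusWr: write through to memory
            if me.state == "V":            # write hit: keep the local copy current
                me.value = value
            for other in caches.values():  # snoopers observe BusWr: V -> I
                if other is not me:
                    other.state = "I"
    return {p: (b.value, b.state) for p, b in caches.items()}, memory

trace = [("P2", "Rd", None), ("P3", "Rd", None), ("P2", "Wr", 15),
         ("P1", "Rd", None), ("P1", "Wr", 3), ("P3", "Wr", 6)]
final_caches, final_memory = run_write_through(trace)
print(final_caches, final_memory)
```

After the last write every cached copy is invalid and memory holds 6; P3's own copy stays invalid because a write miss does not allocate the block in this design.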

Snoopy Cache Controller
- Advantages of bus snooping
    - No need to change the processor design
    - No explicit coherence statements added to the program
- The snoopy cache controller observes events from
    - the local processor
    - the bus
- Write operations
    - Write-invalidate vs. write-update
    - Write-through caches: see last lecture
    - Write-back caches: writes now take place locally, so other SCCs don't observe them
- How can we handle this? Extra work has to be done.

Write-Back Caches
- Usually have a "dirty bit": one state bit per block
    - True: block has been modified
    - False: block is unchanged
- Use in a uniprocessor: a dirty block has to be written back to memory upon replacement
- Use in multiprocessors: same as in a uniprocessor, plus it means the processor "owns" the block

The Extra Work ...
Before a processor writes into its cache, it performs an "ownership" transaction.
- Case 1: no other modified copy of the block exists in the system
    - The processor can go ahead and write
- Case 2: a modified copy exists somewhere in the system
    - Old owner: writes the block to memory and invalidates its local copy
    - New owner: reads the block as it is being written back to memory, then performs its write
- What the new owner did is called a "read to own" (read to modify) transaction
- There is only one owner at a time
- Still don't get it? Wait until you see the MSI protocol!

Ownership Overhead
- Ownership transactions are overhead
- If one were needed for every write, a block would be written back to memory every time, and write-back caches would be no better than write-through
- Let's cross our fingers and count on locality
    - Spatial and temporal locality can do it for us
    - A processor owns the block and performs several consecutive writes, paying for ownership only once
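
A rough way to see how locality pays off is to count ownership transactions for a sequence of writes to a single block. The helper `ownership_transactions` below is hypothetical, and the model is deliberately simplified: it ignores reads and replacements and assumes a write is free whenever the writer already owns the block.

```python
def ownership_transactions(writers):
    """Count bus ownership transactions for a sequence of writes to one block.

    writers lists, in order, which processor performs each write.
    """
    owner, count = None, 0
    for proc in writers:
        if proc != owner:     # must acquire ownership over the bus first
            count += 1
            owner = proc
    return count

burst = ["P1"] * 8                # good locality: one processor writes 8 times
ping_pong = ["P1", "P2"] * 4      # no locality: ownership changes on every write
print(ownership_transactions(burst), ownership_transactions(ping_pong))  # prints "1 8"
```

A write-through cache would put all 8 writes on the bus in both cases, so the write-back scheme only wins when consecutive writes come from the same owner.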

MSI Protocol: States
- We need to differentiate between reads and writes
- Split the Valid state into two states, giving three states in total
    - I: Invalid
    - S: Shared (one or more caches hold the block, read only)
    - M: Modified or Dirty (only one cache can write)
- This means it's another write-invalidate protocol

MSI Protocol: Events/Actions
- Local processor events
    - PrRd: read
    - PrWr: write
- Bus transactions
    - BusRd: read with no intent to modify
    - BusRdX: read with intent to modify (read to own)
    - BusWB: update memory
- Possible actions
    - _: nothing
    - BusRd: send a read request over the bus
    - BusRdX: ownership (read to own) transaction
    - Flush: copy the modified block to memory
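
MSI Protocol: State Transitions
Notation: observed event / action taken. Transitions toward M are promotions; transitions toward I are demotions.

From   Event          Action    To
I      PrRd           BusRd     S
I      PrWr           BusRdX    M
S      PrRd, BusRd    _         S
S      PrWr           BusRdX    M
S      BusRdX         _         I
M      PrRd, PrWr     _         M
M      BusRd          Flush     S
M      BusRdX         Flush     I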
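
MSI Protocol: Example
Three processors; consider the state of the block containing X in each cache (value / state; ? means unknown contents).

Operation      P1 $     P2 $     P3 $     Main memory
Initially      ? / I    ? / I    ? / I    10
P2 Rd X        ? / I    10 / S   ? / I    10
P3 Rd X        ? / I    10 / S   10 / S   10
P2 Wr X = 15   ? / I    15 / M   10 / I   10
P1 Rd X        15 / S   15 / S   10 / I   15
P1 Wr X = 3    3 / M    15 / I   10 / I   15
P1 Wr X = 6    6 / M    15 / I   10 / I   15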
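
The same trace can be run through a small MSI simulator that also records which bus transaction each operation generates. This is a sketch, not a reference implementation: `Line` and `run_msi` are hypothetical names, the block holds a single word (X), and replacements are not modeled.

```python
class Line:
    """One cache's copy of the block holding X."""
    def __init__(self):
        self.state = "I"
        self.value = None      # None plays the role of "?" in the table

def run_msi(trace, memory=10, procs=("P1", "P2", "P3")):
    caches = {p: Line() for p in procs}
    bus_log = []
    for proc, op, value in trace:
        me = caches[proc]
        others = [c for c in caches.values() if c is not me]
        bus = "_"
        if op == "Rd" and me.state == "I":
            bus = "BusRd"                       # PrRd / BusRd: I -> S
            for o in others:
                if o.state == "M":              # BusRd / Flush: M -> S
                    memory = o.value
                    o.state = "S"
            me.state, me.value = "S", memory
        elif op == "Wr":
            if me.state != "M":
                bus = "BusRdX"                  # PrWr / BusRdX: I or S -> M
                for o in others:
                    if o.state == "M":          # BusRdX / Flush: M -> I
                        memory = o.value
                    o.state = "I"               # BusRdX / _: S -> I
                me.state = "M"
            me.value = value                    # in M, writes stay local
        bus_log.append(bus)
    return {p: (c.value, c.state) for p, c in caches.items()}, memory, bus_log

trace = [("P2", "Rd", None), ("P3", "Rd", None), ("P2", "Wr", 15),
         ("P1", "Rd", None), ("P1", "Wr", 3), ("P1", "Wr", 6)]
msi_caches, msi_memory, msi_bus = run_msi(trace)
print(msi_caches, msi_memory, msi_bus)
```

The last write generates no bus transaction because P1 already owns the block, and memory still holds 15 because the modified value has not been flushed.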

MESI Protocol: What's Wrong with MSI?
- MESI is another write-invalidate protocol
- Consider this MSI scenario, where the block containing X isn't in any cache
    - P1 reads X: BusRd, state: S
    - P1 modifies X: BusRdX, state: M
- The BusRdX is there to let everybody else know X is being modified
- The scenario costs 2 bus transactions
- There is no need for 2 transactions, since P1 is the only processor to know about X!

MESI Protocol: States
- Same as MSI, except S is split in 2
    - E: Exclusive clean (only one processor)
    - S: Shared clean (more than one processor)
- Let's consider the same scenario, where the block containing X isn't in any cache
    - P1 reads X: BusRd, state: E
    - P1 modifies X: nothing, state: M
- In other words, P1 doesn't need to let anybody know about the modification
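
The saving can be made concrete by counting bus transactions for the read-then-write scenario under both protocols. The function `read_then_write` is a hypothetical helper, and it assumes the block starts out invalid in the requesting cache.

```python
def read_then_write(protocol, other_sharers=False):
    """Bus transactions when one processor reads X and then writes it."""
    bus = ["BusRd"]                        # the read miss always uses the bus
    if protocol == "MESI" and not other_sharers:
        state = "E"                        # nobody else holds the block
    else:
        state = "S"
    if state == "S":
        bus.append("BusRdX")               # must announce the intent to modify
    state = "M"                            # from E, the move to M is silent
    return bus, state

print(read_then_write("MSI"))              # prints "(['BusRd', 'BusRdX'], 'M')"
print(read_then_write("MESI"))             # prints "(['BusRd'], 'M')"
```

When another cache already holds the block, MESI loads it in S and needs the second transaction just as MSI does.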

MESI Protocol: Hardware Support
- An additional bus signal is needed: the S signal (S for shared)
- It helps the processor know whether to load the block in the E or the S state
- A cache controller asserts the S signal if the relevant block is in its cache
- The S bus signal is a wired-OR line
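
In software terms a wired-OR line behaves like `any()` over the snooping caches. The sketch below uses hypothetical names (`shared_signal`, `load_state`) and models each cache as the set of blocks it currently holds.

```python
def shared_signal(snooping_caches, block):
    """Wired-OR line: high if at least one snooping cache asserts it."""
    return any(block in cache for cache in snooping_caches)

def load_state(snooping_caches, block):
    """State in which the requester loads the block after its BusRd."""
    return "S" if shared_signal(snooping_caches, block) else "E"

others = [{"X"}, set()]            # hypothetical contents of the other caches
print(load_state(others, "X"), load_state(others, "Y"))   # prints "S E"
```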

MESI Protocol: State Transitions
The diagram only shows labels for what differs from MSI.

From   Event    Action          To
I      PrRd     BusRd(S)        S   (S signal asserted)
I      PrRd     BusRd(not S)    E   (S signal not asserted)
E      PrRd     _               E
E      PrWr     _               M   (promote)
E      BusRd    Flush           S   (demote)
E      BusRdX   Flush           I
S      BusRdX   Flush'          I

- Flushing a "clean" block is a fast way for the new reader to read the block
- Flush' means that, while a shared block is being flushed, only 1 processor is responsible for supplying it
- Other protocol variations may not flush a clean block

Dragon Protocol
- A write-back update protocol
- States
    - Exclusive (E): 1 cache has a clean copy
    - Shared-clean (Sc): 2 or more caches have a clean copy; memory is up to date
    - Shared-modified (Sm): 1 cache just modified the block and some other caches hold copies; memory is outdated
    - Modified (M): 1 cache has a modified copy
- Added processor events: PrRdMiss, PrWrMiss (remember, we don't have an I state)
- Added bus transaction: BusUpd, which broadcasts the word or byte written by the processor so other processors can update their copies
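
Dragon Protocol: State Transitions
Notation: observed event / action (-- means no action). BusRd(S) and BusUpd(S) mean the shared signal was asserted during the transaction; (not S) means it was not. A block that is not in the cache is shown as (none).

From     Event        Action             To
(none)   PrRdMiss     BusRd(not S)       E
(none)   PrRdMiss     BusRd(S)           Sc
(none)   PrWrMiss     BusRd(not S)       M
(none)   PrWrMiss     BusRd(S); BusUpd   Sm
E        PrRd         --                 E
E        PrWr         --                 M
E        BusRd        --                 Sc
Sc       PrRd         --                 Sc
Sc       BusUpd       Update             Sc
Sc       PrWr         BusUpd(S)          Sm
Sc       PrWr         BusUpd(not S)      M
Sm       PrRd         --                 Sm
Sm       BusRd        Flush              Sm
Sm       PrWr         BusUpd(S)          Sm
Sm       PrWr         BusUpd(not S)      M
Sm       BusUpd       Update             Sc
M        PrRd, PrWr   --                 M
M        BusRd        Flush              Sm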
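
The Dragon transitions can be written down as a processor-side function plus a snoop-side lookup table. This is a sketch with hypothetical names (`on_processor`, `SNOOP`); which arcs carry the shared and not-shared variants is an assumption based on the usual description of the Dragon protocol, and `None` stands for a block that is not in the cache.

```python
def on_processor(state, event, shared):
    """Return (next_state, bus_action) for a local processor event.

    shared is the value of the shared signal seen during the bus transaction.
    """
    if state is None:                       # block not present: only misses occur
        if event == "PrRdMiss":
            return ("Sc" if shared else "E"), "BusRd"
        if event == "PrWrMiss":
            return ("Sm", "BusRd; BusUpd") if shared else ("M", "BusRd")
        raise ValueError((state, event))
    if event == "PrRd":
        return state, "--"                  # read hits never use the bus
    if event == "PrWr":
        if state in ("E", "M"):
            return "M", "--"                # sole copy: write locally
        return ("Sm" if shared else "M"), "BusUpd"   # from Sc or Sm
    raise ValueError((state, event))

# Snoop side: (state, observed bus transaction) -> (next_state, action)
SNOOP = {
    ("E", "BusRd"): ("Sc", "--"),
    ("Sc", "BusUpd"): ("Sc", "Update"),
    ("Sm", "BusRd"): ("Sm", "Flush"),
    ("Sm", "BusUpd"): ("Sc", "Update"),
    ("M", "BusRd"): ("Sm", "Flush"),
}

print(on_processor(None, "PrRdMiss", shared=False))   # prints "('E', 'BusRd')"
print(on_processor("Sc", "PrWr", shared=True))        # prints "('Sm', 'BusUpd')"
print(SNOOP[("M", "BusRd")])                          # prints "('Sm', 'Flush')"
```

Unlike the invalidate protocols, no snooped transaction ever removes a copy here: other caches are updated in place, and at most one of them (in Sm or M) is responsible for supplying the block.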

Snoopy Protocol Taxonomy

Protocol           Write-back cache   Write-through cache
Write-invalidate   MSI, MESI          IV
Write-update       Dragon             Homework