1 Lecture 20: Coherence Protocols
Topics: snooping and directory-based coherence protocols (Sections 4.1-4.3)

2 SMP Example
[Figure: processors A, B, C, and D, each with its own caches, sharing a bus to main memory and the I/O system]
Access sequence: A: Rd X, B: Rd X, C: Rd X, A: Wr X, C: Wr X, B: Rd X, A: Rd X, A: Rd Y, B: Wr X, B: Rd Y, B: Wr X, B: Wr Y

3 SMP Example
The same access sequence, now tracked per cache: for each request, what state does block X (or Y) end up in at A, B, and C? The completed table is on the next slide.

4 SMP Example
The states below assume an invalidate-based protocol and a single block per cache, so a reference to Y replaces X in that cache. A dash means the cache has not yet held the block.

Request     A        B        C
A: Rd X     S        -        -
B: Rd X     S        S        -
C: Rd X     S        S        S
A: Wr X     E        I        I
C: Wr X     I        I        E
B: Rd X     I        S        S
A: Rd X     S        S        S
A: Rd Y     S (Y)    S (X)    S (X)
B: Wr X     S (Y)    E (X)    I
B: Rd Y     S (Y)    S (Y)    I
B: Wr X     S (Y)    E (X)    I
B: Wr Y     I        E (Y)    I
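
Below is a minimal C++ sketch (not part of the original slides) that replays this trace under the same assumptions: an invalidate-based snooping protocol with states I, S, and E, and a one-block cache per processor, so a reference to Y evicts X. A remote exclusive copy is downgraded to shared on another processor's read, and a write invalidates every other copy. The processor/block names and the printing harness are only for illustration.

// Replay the SMP example trace and print the resulting cache states.
#include <cstdio>
#include <map>
#include <tuple>
#include <vector>

struct Line { char block = '-'; char state = 'I'; };   // one block per cache

int main() {
    std::map<char, Line> cache = {{'A', {}}, {'B', {}}, {'C', {}}};
    // (processor, 'R'ead or 'W'rite, block) for the trace on the slide
    std::vector<std::tuple<char, char, char>> trace = {
        {'A','R','X'}, {'B','R','X'}, {'C','R','X'}, {'A','W','X'},
        {'C','W','X'}, {'B','R','X'}, {'A','R','X'}, {'A','R','Y'},
        {'B','W','X'}, {'B','R','Y'}, {'B','W','X'}, {'B','W','Y'}};

    printf("%-10s%-8s%-8s%-8s\n", "Request", "A", "B", "C");
    for (auto [p, op, blk] : trace) {
        Line &me = cache[p];
        if (op == 'R') {
            if (me.block != blk || me.state == 'I') {        // read miss
                for (auto &[id, ln] : cache)                 // snoop: downgrade a remote E copy
                    if (id != p && ln.block == blk && ln.state == 'E') ln.state = 'S';
                me = {blk, 'S'};                             // fill (evicts the old block)
            }
        } else {                                             // write
            for (auto &[id, ln] : cache)                     // invalidate all other copies
                if (id != p && ln.block == blk) ln.state = 'I';
            me = {blk, 'E'};                                 // writer holds the only valid copy
        }
        char req[16];
        snprintf(req, sizeof req, "%c: %s %c", p, op == 'R' ? "Rd" : "Wr", blk);
        printf("%-10s", req);
        for (auto &[id, ln] : cache)
            if (ln.state == 'I') printf("%-8s", "I");
            else                 printf("%c (%c)   ", ln.state, ln.block);
        printf("\n");
    }
    return 0;
}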

5 Design Issues
[Figure: the same bus-based SMP, with per-processor caches, main memory, and the I/O system]
- Invalidate
- Find data
- Writeback / writethrough
- Cache block states
- Contention for tags
- Enforcing write serialization

6 Cache Coherence Protocols
Directory-based: a single location (the directory) keeps track of the sharing status of a block of memory
Snooping: every cache block is accompanied by the sharing status of that block; all cache controllers monitor the shared bus so they can update the sharing status of the block, if necessary
- Write-invalidate: a processor gains exclusive access to a block before writing by invalidating all other copies
- Write-update: when a processor writes, it updates other shared copies of that block
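
As a rough illustration (mine, not from the slides), the toy C++ sketch below contrasts the two write policies: write-invalidate removes every other copy so the writer holds the only one, while write-update pushes the new value into the other copies. Plain std::map "caches" stand in for hardware cache lines purely to make the difference visible.

#include <cstdio>
#include <map>
#include <vector>

using Cache = std::map<char, int>;   // block id -> cached value (toy model)

// Write-invalidate: gain exclusive access by removing every other copy.
void write_invalidate(std::vector<Cache> &caches, int writer, char blk, int val) {
    for (int i = 0; i < (int)caches.size(); ++i)
        if (i != writer) caches[i].erase(blk);
    caches[writer][blk] = val;
}

// Write-update: push the new value into every cache that holds a copy.
void write_update(std::vector<Cache> &caches, int writer, char blk, int val) {
    caches[writer][blk] = val;                    // the writer always gets the new value
    for (auto &c : caches)
        if (c.count(blk)) c[blk] = val;           // update every existing copy
}

int main() {
    std::vector<Cache> inv(3), upd(3);
    for (auto *group : {&inv, &upd})
        for (auto &c : *group) c['X'] = 1;        // all three caches start sharing X = 1

    write_invalidate(inv, 0, 'X', 2);             // caches 1 and 2 lose X entirely
    write_update(upd, 0, 'X', 2);                 // caches 1 and 2 now see X = 2

    printf("invalidate: cache 1 still holds X? %s\n", inv[1].count('X') ? "yes" : "no");
    printf("update:     cache 1 sees X = %d\n", upd[1]['X']);
    return 0;
}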

7 Example Protocol

Request      Source   Block state   Action
Read hit     Proc     Shared/excl   Read data in cache
Read miss    Proc     Invalid       Place read miss on bus
Read miss    Proc     Shared        Conflict miss: place read miss on bus
Read miss    Proc     Exclusive     Conflict miss: write back block, place read miss on bus
Write hit    Proc     Exclusive     Write data in cache
Write hit    Proc     Shared        Place write miss on bus
Write miss   Proc     Invalid       Place write miss on bus
Write miss   Proc     Shared        Conflict miss: place write miss on bus
Write miss   Proc     Exclusive     Conflict miss: write back, place write miss on bus
Read miss    Bus      Shared        No action; allow memory to respond
Read miss    Bus      Exclusive     Place block on bus; change to shared
Write miss   Bus      Shared        Invalidate block
Write miss   Bus      Exclusive     Write back block; change to invalid
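
The table can be read as a state machine. The sketch below (my own encoding, not from the slides) turns it into a function: given the request type, where it was observed (the local processor or the bus), and the block's current state, it returns the action from the table and the state the block moves to. The next states for the processor-side rows are my assumption about where the requested block ends up once the miss completes; the table itself only gives them for the bus-side rows.

#include <cstdio>
#include <string>

enum class State { Invalid, Shared, Exclusive };
enum class Req   { ReadHit, ReadMiss, WriteHit, WriteMiss };
enum class Src   { Proc, Bus };            // who triggered the request

struct Outcome { std::string action; State next; };

// One cache controller's response, following the table row by row.
Outcome controller(Req r, Src s, State cur) {
    if (s == Src::Proc) {
        switch (r) {
        case Req::ReadHit:
            return {"read data in cache", cur};
        case Req::ReadMiss:
            if (cur == State::Invalid) return {"place read miss on bus", State::Shared};
            if (cur == State::Shared)  return {"conflict miss: place read miss on bus", State::Shared};
            return {"conflict miss: write back block, place read miss on bus", State::Shared};
        case Req::WriteHit:
            if (cur == State::Exclusive) return {"write data in cache", State::Exclusive};
            return {"place write miss on bus", State::Exclusive};
        case Req::WriteMiss:
            if (cur == State::Invalid) return {"place write miss on bus", State::Exclusive};
            if (cur == State::Shared)  return {"conflict miss: place write miss on bus", State::Exclusive};
            return {"conflict miss: write back, place write miss on bus", State::Exclusive};
        }
    } else {  // request snooped on the bus; only relevant if this cache holds the block
        if (r == Req::ReadMiss)
            return cur == State::Shared
                 ? Outcome{"no action; allow memory to respond", State::Shared}
                 : Outcome{"place block on bus; change to shared", State::Shared};
        return cur == State::Shared
             ? Outcome{"invalidate block", State::Invalid}
             : Outcome{"write back block; change to invalid", State::Invalid};
    }
    return {"", cur};   // not reached
}

int main() {
    Outcome o = controller(Req::WriteMiss, Src::Bus, State::Exclusive);
    printf("snooped write miss while Exclusive -> %s\n", o.action.c_str());
    return 0;
}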

8 Performance Improvements
What determines performance on a multiprocessor:
- What fraction of the program is parallelizable?
- How does memory hierarchy performance change?
New form of cache miss: coherence miss – such a miss would not have happened if another processor had not written to the same cache line
False coherence miss (false sharing): the second processor writes to a different word in the same cache line – this miss would not have happened if the line size equaled one word
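
A short experiment (my own, not from the slides) that makes false sharing visible: two threads increment different counters. When the counters sit in the same cache line, every write on one core invalidates the line in the other core's cache, so the run is noticeably slower than with padded counters. It assumes 64-byte cache lines and at least two cores; build with something like g++ -O2 -pthread.

#include <chrono>
#include <cstdio>
#include <thread>

struct SameLine { volatile long a = 0; volatile long b = 0; };        // a and b share a cache line
struct Separate { alignas(64) volatile long a = 0;
                  alignas(64) volatile long b = 0; };                 // a and b on different lines

template <typename Counters>
double run(Counters &c) {
    auto start = std::chrono::steady_clock::now();
    std::thread t1([&] { for (long i = 0; i < 100000000L; ++i) c.a = c.a + 1; });
    std::thread t2([&] { for (long i = 0; i < 100000000L; ++i) c.b = c.b + 1; });
    t1.join();
    t2.join();
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
}

int main() {
    SameLine s;
    Separate p;
    printf("counters in the same line: %.2f s\n", run(s));   // coherence (false-sharing) misses
    printf("counters padded apart:     %.2f s\n", run(p));   // typically several times faster
    return 0;
}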

9 How do Cache Misses Scale?
[Table: the four miss categories – compulsory, capacity, conflict, and coherence (true and false sharing) – versus the effect of increasing cache capacity, processor count, block size, and associativity]

10 Simplifying Assumptions
All transactions on a read or write are atomic – on a write miss, the miss is sent on the bus, a block is fetched from memory or a remote cache, and the block is marked exclusive
Potential problem if the actions are non-atomic: P1 sends a write miss on the bus, then P2 sends a write miss on the bus; since the block is still invalid in P1, P2 does not realize that it should perform its write after receiving the block from P1 – instead, it receives the block from memory
Most problems are fixable by keeping track of more state: for example, don't acquire the bus unless all outstanding transactions for the block have completed

11 Directory-Based Cache Coherence
The physical memory is distributed among all processors
The directory is also distributed along with the corresponding memory
The physical address is enough to determine the location of the memory (and hence the directory entry) for a block
The (many) processing nodes are connected with a scalable interconnect (not a bus) – hence, messages are no longer broadcast but routed from sender to receiver; since the processing nodes can no longer snoop, the directory keeps track of sharing state
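
A tiny sketch of the "address is enough" point: with memory interleaved across nodes at block granularity (my assumed convention; the block size and node count below are made-up parameters), any node can compute a block's home node, and therefore where its directory entry lives, from the physical address alone.

#include <cstdint>
#include <cstdio>

constexpr uint64_t kBlockBytes = 64;   // assumed cache block size
constexpr uint64_t kNodes      = 16;   // assumed number of processing nodes

// Home node of a physical address: the node whose local memory (and
// directory slice) holds that block, here by simple block interleaving.
uint64_t home_node(uint64_t paddr) {
    return (paddr / kBlockBytes) % kNodes;
}

int main() {
    uint64_t addr = 0x42A80;
    printf("address 0x%llx -> home node %llu\n",
           (unsigned long long)addr, (unsigned long long)home_node(addr));
    return 0;
}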

12 Distributed Memory Multiprocessors
[Figure: four nodes, each with a processor and caches, local memory, a directory alongside the memory, and I/O, connected by an interconnection network]

13 Cache Block States
What are the different states a block of memory can have within the directory?
Note that we need information for each cache so that invalidate messages can be sent
The block state is also stored in the cache for efficiency
The directory now serves as the arbitrator: if multiple write attempts happen simultaneously, the directory determines the ordering
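
One common way to hold the per-cache information mentioned above is a memory-based directory with a full bit vector: each entry stores the block's state, one presence bit per node, and the owner when the block is exclusive. The layout below is a sketch under that assumption, not any particular machine's format.

#include <bitset>
#include <cstdio>

constexpr int kNodes = 64;                    // assumed machine size

enum class DirState { Uncached, Shared, Exclusive };

// One directory entry per memory block, kept at the block's home node.
struct DirEntry {
    DirState state = DirState::Uncached;
    std::bitset<kNodes> sharers;              // presence bit per node: who gets invalidates
    int owner = -1;                           // node holding the dirty copy (Exclusive only)
};

int main() {
    printf("directory overhead per block: %zu bytes for %d nodes\n",
           sizeof(DirEntry), kNodes);
    return 0;
}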

14 Directory-Based Example
[Figure: three nodes, each with a processor and caches, memory, and I/O, connected by an interconnection network; block X is homed at one node's directory and block Y at another's]
Access sequence: A: Rd X, B: Rd X, C: Rd X, A: Wr X, C: Wr X, B: Rd X, A: Rd X, A: Rd Y, B: Wr X, B: Rd Y, B: Wr X, B: Wr Y

15 Directory Actions
If block is in uncached state:
- Read miss: send data, make block shared
- Write miss: send data, make block exclusive
If block is in shared state:
- Read miss: send data, add node to sharers list
- Write miss: send data, invalidate sharers, make block exclusive
If block is in exclusive state:
- Read miss: ask owner for data, write to memory, send data, make block shared, add node to sharers list
- Data write back: write to memory, make block uncached
- Write miss: ask owner for data, write to memory, send data, update identity of new owner, remain exclusive
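
A sketch (mine, not the textbook's) of these actions as code: the home node's handlers for read misses, write misses, and write-backs, with message sends stubbed out as prints and memory updates left implicit. It follows the state/action list above; a real directory would also track pending transactions.

#include <bitset>
#include <cstdio>

constexpr int kNodes = 8;                       // assumed node count

enum class DState { Uncached, Shared, Exclusive };

struct Entry {
    DState state = DState::Uncached;
    std::bitset<kNodes> sharers;                // nodes that may hold a copy
    int owner = -1;                             // valid only in Exclusive state
};

void send(const char *msg, int node) { printf("  %-20s -> node %d\n", msg, node); }

void read_miss(Entry &e, int node) {
    switch (e.state) {
    case DState::Uncached:                      // memory has the only copy
        send("data reply", node);
        e.state = DState::Shared;
        break;
    case DState::Shared:                        // just add the requester as a sharer
        send("data reply", node);
        break;
    case DState::Exclusive:                     // get the dirty copy back from the owner
        send("fetch", e.owner);                 // owner writes the block back to memory
        send("data reply", node);
        e.state = DState::Shared;
        e.sharers.set(e.owner);                 // old owner keeps a shared copy
        e.owner = -1;
        break;
    }
    e.sharers.set(node);
}

void write_miss(Entry &e, int node) {
    switch (e.state) {
    case DState::Uncached:
        send("data reply", node);
        break;
    case DState::Shared:                        // invalidate every other sharer
        for (int i = 0; i < kNodes; ++i)
            if (e.sharers.test(i) && i != node) send("invalidate", i);
        send("data reply", node);
        break;
    case DState::Exclusive:                     // pull the block from the old owner
        send("fetch & invalidate", e.owner);
        send("data reply", node);
        break;
    }
    e.sharers.reset();
    e.state = DState::Exclusive;                // requester becomes the new owner
    e.owner = node;
}

void data_write_back(Entry &e) {                // owner evicts its dirty block
    e.state = DState::Uncached;
    e.sharers.reset();
    e.owner = -1;
}

int main() {
    Entry x;
    printf("node 0 reads X:\n");  read_miss(x, 0);
    printf("node 1 reads X:\n");  read_miss(x, 1);
    printf("node 2 writes X:\n"); write_miss(x, 2);
    data_write_back(x);
    return 0;
}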
