CS252/Patterson Lec 11.1 2/23/01 CS213 Parallel Processing Architecture Lecture 7: Multiprocessor Cache Coherency Problem.

Slides:



Advertisements
Similar presentations
L.N. Bhuyan Adapted from Patterson’s slides
Advertisements

The University of Adelaide, School of Computer Science
CMSC 611: Advanced Computer Architecture
Multi-core systems System Architecture COMP25212 Daniel Goodman Advanced Processor Technologies Group.
1 Lecture 4: Directory Protocols Topics: directory-based cache coherence implementations.
Cache Optimization Summary
The University of Adelaide, School of Computer Science
CIS629 Coherence 1 Cache Coherence: Snooping Protocol, Directory Protocol Some of these slides courtesty of David Patterson and David Culler.
1 Lecture 2: Snooping and Directory Protocols Topics: Snooping wrap-up and directory implementations.
BusMultis.1 Review: Where are We Now? Processor Control Datapath Memory Input Output Input Output Memory Processor Control Datapath  Multiprocessor –
Chapter 6 Multiprocessors and Thread-Level Parallelism 吳俊興 高雄大學資訊工程學系 December 2004 EEF011 Computer Architecture 計算機結構.
1 Lecture 20: Coherence protocols Topics: snooping and directory-based coherence protocols (Sections )
1 Lecture 18: Coherence Protocols Topics: coherence protocols for symmetric and distributed shared-memory multiprocessors (Sections )
CS252/Patterson Lec /28/01 CS 213 Lecture 9: Multiprocessor: Directory Protocol.
1 Lecture 5: Directory Protocols Topics: directory-based cache coherence implementations.
1 Lecture 23: Multiprocessors Today’s topics:  RAID  Multiprocessor taxonomy  Snooping-based cache coherence protocol.
DAP Spr.‘98 ©UCB 1 Lecture 18: Review. DAP Spr.‘98 ©UCB 2 Cache Organization (1) How do you know if something is in the cache? (2) If it is in the cache,
1 Lecture 3: Directory-Based Coherence Basic operations, memory-based and cache-based directories.
CPE 731 Advanced Computer Architecture Snooping Cache Multiprocessors Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of.
CS252/Patterson Lec /28/01 CS 213 Lecture 10: Multiprocessor 3: Directory Organization.
Snooping Cache and Shared-Memory Multiprocessors
Lecture 37: Chapter 7: Multiprocessors Today’s topic –Introduction to multiprocessors –Parallelism in software –Memory organization –Cache coherence 1.
1 Shared-memory Architectures Adapted from a lecture by Ian Watson, University of Machester.
Multiprocessor Cache Coherency
Spring 2003CSE P5481 Cache Coherency Cache coherent processors reading processor must get the most current value most current value is the last write Cache.
1 Cache coherence CEG 4131 Computer Architecture III Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini.
Shared Address Space Computing: Hardware Issues Alistair Rendell See Chapter 2 of Lin and Synder, Chapter 2 of Grama, Gupta, Karypis and Kumar, and also.
CS492B Analysis of Concurrent Programs Coherence Jaehyuk Huh Computer Science, KAIST Part of slides are based on CS:App from CMU.
CSIE30300 Computer Architecture Unit 15: Multiprocessors Hsin-Chou Chi [Adapted from material by and
EECS 252 Graduate Computer Architecture Lec 13 – Snooping Cache and Directory Based Multiprocessors David Patterson Electrical Engineering and Computer.
Cache Control and Cache Coherence Protocols How to Manage State of Cache How to Keep Processors Reading the Correct Information.
Lecture 13: Multiprocessors Kai Bu
Ch4. Multiprocessors & Thread-Level Parallelism 2. SMP (Symmetric shared-memory Multiprocessors) ECE468/562 Advanced Computer Architecture Prof. Honggang.
Effects of wrong path mem. ref. in CC MP Systems Gökay Burak AKKUŞ Cmpe 511 – Computer Architecture.
Cache Coherence Protocols A. Jantsch / Z. Lu / I. Sander.
Multiprocessors— Flynn Categories, Large vs. Small Scale, Cache Coherency Professor Alvin R. Lebeck Computer Science 220 Fall 2001.
Additional Material CEG 4131 Computer Architecture III
CMSC 611: Advanced Computer Architecture Shared Memory Most slides adapted from David Patterson. Some from Mohomed Younis.
The University of Adelaide, School of Computer Science
Multi Processing prepared and instructed by Shmuel Wimer Eng. Faculty, Bar-Ilan University June 2016Multi Processing1.
1 Lecture 8: Snooping and Directory Protocols Topics: 4/5-state snooping protocols, split-transaction implementation details, directory implementations.
1 Computer Architecture & Assembly Language Spring 2001 Dr. Richard Spillman Lecture 26 – Alternative Architectures.
CS 704 Advanced Computer Architecture
Lecture 8: Snooping and Directory Protocols
תרגול מס' 5: MESI Protocol
The University of Adelaide, School of Computer Science
The University of Adelaide, School of Computer Science
CS 704 Advanced Computer Architecture
Multiprocessor Cache Coherency
The University of Adelaide, School of Computer Science
CMSC 611: Advanced Computer Architecture
Example Cache Coherence Problem
Flynn’s Taxonomy Flynn classified by data and control streams in 1966
The University of Adelaide, School of Computer Science
CPE 631 Session 21: Multiprocessors (Part 2)
Chapter 5 Multiprocessor and Thread-Level Parallelism
CMSC 611: Advanced Computer Architecture
11 – Snooping Cache and Directory Based Multiprocessors
Chapter 5 Exploiting Memory Hierarchy : Cache Memory in CMP
High Performance Computing
Professor David A. Patterson Computer Science 252 Spring 1998
Lecture 8: Directory-Based Examples
The University of Adelaide, School of Computer Science
Lecture 17 Multiprocessors and Thread-Level Parallelism
Lecture 24: Virtual Memory, Multiprocessors
Coherent caches Adapted from a lecture by Ian Watson, University of Machester.
Lecture 17 Multiprocessors and Thread-Level Parallelism
CPE 631 Lecture 20: Multiprocessors
The University of Adelaide, School of Computer Science
Lecture 17 Multiprocessors and Thread-Level Parallelism
Presentation transcript:

CS252/Patterson Lec /23/01 CS213 Parallel Processing Architecture Lecture 7: Multiprocessor Cache Coherency Problem

CS252/Patterson Lec /23/01 Symmetric Multiprocessor (SMP) Memory: centralized with uniform access time (“uma”) and bus interconnect Examples: Sun Enterprise 5000, SGI Challenge, Intel SystemPro

CS252/Patterson Lec /23/01 Small-Scale—Shared Memory Caches serve to: –Increase bandwidth versus bus/memory –Reduce latency of access –Valuable for both private data and shared data What about cache consistency?

CS252/Patterson Lec /23/01 What Does Coherency Mean? Informally: –“Any read must return the most recent write” –Too strict and too difficult to implement Better: –“Any write must eventually be seen by a read” –All writes are seen in proper order (“serialization”) Two rules to ensure this: –“If P writes x and P1 reads it, P’s write will be seen by P1 if the read and write are sufficiently far apart” –Writes to a single location are serialized: seen in one order »Latest write will be seen »Otherwise could see writes in illogical order (could see older value after a newer value)

CS252/Patterson Lec /23/01 Potential HW Coherency Solutions Snooping Solution (Snoopy Bus): –Send all requests for data to all processors –Processors snoop to see if they have a copy and respond accordingly –Requires broadcast, since caching information is at processors –Works well with bus (natural broadcast medium) –Dominates for small scale machines (most of the market) Directory-Based Schemes (discussed later) –Keep track of what is being shared in 1 centralized place (logically) –Distributed memory => distributed directory for scalability (avoids bottlenecks) –Send point-to-point requests to processors via network –Scales better than Snooping –Actually existed BEFORE Snooping-based schemes

CS252/Patterson Lec /23/01 Basic Snoopy Protocols Write Invalidate Protocol: –Multiple readers, single writer –Write to shared data: an invalidate is sent to all caches which snoop and invalidate any copies –Read Miss: »Write-through: memory is always up-to-date »Write-back: snoop in caches to find most recent copy »Write-once: Write only once to the memory – caches invalidate their copies – Good for a write run Write Broadcast Protocol (typically write through): –Write to shared data: broadcast on bus, processors snoop, and update any copies –Read miss: memory is always up-to-date

CS252/Patterson Lec /23/01 Basic Snoopy Protocols Write Invalidate versus Broadcast: –Invalidate requires one transaction per write-run –Invalidate uses spatial locality: one transaction per block –Broadcast has lower latency between write and read Write serialization: bus serializes requests! –Bus is single point of arbitration –A single point of control through directory if no bus

CS252/Patterson Lec /23/01 An Example Snoopy Protocol Invalidation protocol, write-back cache Each block of memory is in one state: –Clean in all caches and up-to-date in memory (Shared) –OR Dirty in exactly one cache (Exclusive) –OR Not in any caches Each cache block is in one state (track these): –Shared : block can be read –OR Exclusive : cache has only copy, its writeable, and dirty –OR Invalid : block contains no data Read misses: cause all caches to snoop bus Writes to clean line are treated as misses

CS252/Patterson Lec /23/01 Snoopy-Cache State Machine-I State machine for CPU requests for a cache block (not an address). Hence, CPU read miss to a shared or exclusive block means a conflict miss => Missed Block in another cache Invalid Shared (read/only) Exclusive (read/write) CPU Read CPU Write CPU Read hit Place read miss on bus Place Write Miss on bus CPU read miss Write back block, Place read miss on bus CPU Write Place Write Miss on Bus CPU Read miss (conflict) Place read miss on bus CPU Write Miss (Conflict) Write back cache block Place write miss on bus CPU read hit CPU write hit Cache Block State

CS252/Patterson Lec /23/01 Snoopy-Cache State Machine-II State machine for bus requests for each cache block Appendix E? gives details of bus requests Invalid Shared (read/only) Exclusive (read/write) Write Back Block; (abort memory access) Write miss for this block Read miss for this block Write miss for this block Write Back Block; (abort memory access)

CS252/Patterson Lec /23/01 Place read miss on bus Snoopy-Cache State Machine-III State machine for CPU requests for each cache block and for bus requests for each cache block Invalid Shared (read/only) Exclusive (read/write) CPU Read CPU Write CPU Read hit Place Write Miss on bus CPU read miss Write back block, Place read miss on bus CPU Write Place Write Miss on Bus CPU Read miss Place read miss on bus CPU Write Miss Write back cache block Place write miss on bus CPU read hit CPU write hit Cache Block State Write miss for this block Write Back Block; (abort memory access) Write miss for this block Read miss for this block Write Back Block; (abort memory access)

CS252/Patterson Lec /23/01 Implementation Complications Write Races: –Cannot update cache until bus is obtained »Otherwise, another processor may get bus first, and then write the same cache block! –Two step process: »Arbitrate for bus »Place miss on bus and complete operation –If miss occurs to block while waiting for bus, handle miss (invalidate may be needed) and then restart. –Split transaction bus: »Bus transaction is not atomic: can have multiple outstanding transactions for a block »Multiple misses can interleave, allowing two caches to grab block in the Exclusive state »Must track and prevent multiple misses for one block Must support interventions and invalidations by creating transient states. See Appendix I

CS252/Patterson Lec /23/01

CS252/Patterson Lec /23/01 Implementing Snooping Caches Multiple processors must be on bus, access to both addresses and data Add a few new commands to perform coherency, in addition to read and write Processors continuously snoop on address bus –If address matches tag, either invalidate or update Since every bus transaction checks cache tags, could interfere with CPU just to check: –solution 1: duplicate set of tags for L1 caches just to allow checks in parallel with CPU –solution 2: L2 cache already duplicate, provided L2 obeys inclusion with L1 cache »block size, associativity of L2 affects L1

CS252/Patterson Lec /23/01 Implementing Snooping Caches Bus serializes writes, getting bus ensures no one else can perform memory operation On a miss in a write back cache, may have the desired copy and its dirty, so must reply Add extra state bit to cache to determine shared or not Add 4th state (MESI) – See next transparency

CS252/Patterson Lec /23/01 Snooping Cache Variations Berkeley Protocol Owned Exclusive Owned Shared Shared Invalid Basic Protocol Exclusive Shared Invalid Illinois Protocol Private Dirty Private Clean Shared Invalid Owner can update via bus invalidate operation Owner must write back when replaced in cache If read sourced from memory, then Private Clean if read sourced from other cache, then Shared Can write in cache if held private clean or dirty MESI Protocol Modfied (private,!=Memory) eXclusive (private,=Memory) Shared (shared,=Memory) Invalid

CS252/Patterson Lec /23/01 Remote Read Place Data on Bus? The MESI Protocol Extensions: –Fourth State: Ownership Remote Write or Miss due to address conflict Write back block Remote Write or Miss due to address conflict Invalid Shared (read/only) Modified (read/write) CPU Read hit CPU Read CPU Write Place Write Miss on bus CPU Write CPU read hit CPU write hit Exclusive (read/only) CPU Write Place Write Miss on Bus? CPU Read hit Remote Read Write back block –Shared-> Modified, need invalidate only (upgrade request), don’t read memory Berkeley Protocol –Clean exclusive state (no miss for private data on write) MESI Protocol –Cache supplies data when shared state (no memory access) Illinois Protocol Place read miss on bus Place Write Miss on Bus