1 Lecture 1: Introduction
Course organization:
- 4 lectures on cache coherence and consistency
- 2 lectures on transactional memory
- 2 lectures on interconnection networks
- 4 lectures on caches
- 4 lectures on memory systems
- 4 lectures on core design
- 2 lectures on parallel algorithms
- 5 lectures: student paper presentations
- 2 lectures: student project presentations
CS for those who want to sign up for 1 credit

2 Logistics
Texts:
- Parallel Computer Architecture, Culler, Singh, Gupta (a more recent reference is Fundamentals of Parallel Computer Architecture, Yan Solihin)
- Principles and Practices of Interconnection Networks, Dally & Towles
- Introduction to Parallel Algorithms and Architectures, Leighton
- Transactional Memory, Larus & Rajwar
- Memory Systems: Cache, DRAM, Disk, Jacob et al.
- Multi-Core Cache Hierarchies, Balasubramonian et al.

3 More Logistics
Projects: simulation-based, creative, teams of up to 4 students; be prepared to spend time towards the middle and end of the semester – more details on simulators in a few weeks
Final project report due in late April (will undergo conference-style peer reviewing); also watch out for workshop deadlines for ISCA
Grading:
- 70% project
- 10% paper presentation
- 20% take-home final

4 Multi-Core Cache Organizations
[Figure: eight processor/cache (P, C) pairs connected by a bus to a multi-banked L2]
- Private L1 caches
- Shared L2 cache
- Bus between L1s and single L2 cache controller
- Snooping-based coherence between L1s

5 Multi-Core Cache Organizations
[Figure: eight processor/cache (P, C) tiles connected by a scalable network]
- Private L1 caches
- Shared L2 cache, but physically distributed
- Scalable network
- Directory-based coherence between L1s

6 Multi-Core Cache Organizations
[Figure: four processor/cache (P, C) pairs and four L2 banks on a shared bus]
- Private L1 caches
- Shared L2 cache, but physically distributed
- Bus connecting the four L1s and four L2 banks
- Snooping-based coherence between L1s

7 Multi-Core Cache Organizations
[Figure: eight processor/cache (P, C) tiles and a directory (D) connected by a scalable network]
- Private L1 caches
- Private L2 caches
- Scalable network
- Directory-based coherence between L2s (through a separate directory)
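The four organizations above differ along a few independent axes: how the L2 is organized, what interconnect ties the tiles together, and which coherence mechanism is used. The C sketch below is purely illustrative (the type and field names are assumptions, not from the lecture) and simply encodes those axes and the four example configurations:

```c
/* Illustrative sketch, not from the lecture: the design axes that
 * distinguish the four multi-core cache organizations shown above. */
#include <stdio.h>

typedef enum { L2_SHARED_CENTRAL, L2_SHARED_DISTRIBUTED, L2_PRIVATE } l2_org_t;
typedef enum { INTERCONNECT_BUS, INTERCONNECT_NETWORK } interconnect_t;
typedef enum { COHERENCE_SNOOPING, COHERENCE_DIRECTORY } coherence_t;

typedef struct {
    const char    *name;
    int            cores;          /* each core has a private L1 */
    l2_org_t       l2;
    interconnect_t interconnect;
    coherence_t    coherence;
} cache_org_t;

int main(void)
{
    cache_org_t orgs[] = {
        { "slide 4", 8, L2_SHARED_CENTRAL,     INTERCONNECT_BUS,     COHERENCE_SNOOPING  },
        { "slide 5", 8, L2_SHARED_DISTRIBUTED, INTERCONNECT_NETWORK, COHERENCE_DIRECTORY },
        { "slide 6", 4, L2_SHARED_DISTRIBUTED, INTERCONNECT_BUS,     COHERENCE_SNOOPING  },
        { "slide 7", 8, L2_PRIVATE,            INTERCONNECT_NETWORK, COHERENCE_DIRECTORY },
    };
    for (unsigned i = 0; i < sizeof orgs / sizeof orgs[0]; i++)
        printf("%s: %d cores, l2=%d, interconnect=%d, coherence=%d\n",
               orgs[i].name, orgs[i].cores, orgs[i].l2,
               orgs[i].interconnect, orgs[i].coherence);
    return 0;
}
```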

8 Shared-Memory vs. Message Passing
Shared-memory:
- single copy of (shared) data in memory
- threads communicate by reading/writing to a shared location
Message-passing:
- each thread has a copy of data in its own private memory that other threads cannot access
- threads communicate by passing values with SEND/RECEIVE message pairs
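As a concrete illustration of the shared-memory model, here is a minimal sketch (an assumed example, not from the lecture) in which two pthreads communicate through a single shared variable; in a message-passing program, the worker would instead SEND the value and the main thread would RECEIVE it:

```c
/* Minimal sketch (assumed example): two threads communicating through
 * shared memory with pthreads.  In a message-passing model the worker
 * would SEND the value and main would RECEIVE it; here both threads
 * simply read/write the same location. */
#include <pthread.h>
#include <stdio.h>

static int shared_value = 0;          /* single copy, visible to all threads */

static void *producer(void *arg)
{
    (void)arg;
    shared_value = 42;                /* write to the shared location */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    pthread_join(t, NULL);            /* join orders the write before the read */
    printf("consumer read %d\n", shared_value);
    return 0;
}
```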

9 Cache Coherence
A multiprocessor system is cache coherent if:
- a value written by a processor is eventually visible to reads by other processors – write propagation
- two writes to the same location by two processors are seen in the same order by all processors – write serialization
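The write-propagation requirement is what makes a simple flag-based handoff work: the spinning reader is guaranteed to eventually observe the writer's update. A hedged sketch (illustrative only; C11 atomics are used so the example is also race-free at the language level):

```c
/* Illustrative sketch: write propagation guarantees the spinning reader
 * eventually observes the writer's update. */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static atomic_int flag = 0;

static void *writer(void *arg)
{
    (void)arg;
    atomic_store(&flag, 1);           /* P0 writes; coherence propagates it  */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, writer, NULL);
    while (atomic_load(&flag) == 0)   /* P1 keeps reading its cached copy;   */
        ;                             /* write propagation ensures it sees 1 */
    puts("update became visible");
    pthread_join(t, NULL);
    return 0;
}
```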

10 Cache Coherence Protocols
Directory-based: a single location (directory) keeps track of the sharing status of a block of memory
Snooping: every cache block is accompanied by the sharing status of that block – all cache controllers monitor the shared bus so they can update the sharing status of the block, if necessary
- Write-invalidate: a processor gains exclusive access to a block before writing by invalidating all other copies
- Write-update: when a processor writes, it updates other shared copies of that block
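For the directory-based approach, a common organization (a sketch under assumed parameters, not a specific machine from the lecture) keeps one entry per memory block holding its state and a presence bit vector of sharers:

```c
/* Illustrative directory entry for directory-based coherence; field sizes
 * and names (MAX_NODES, dir_entry_t, ...) are assumptions for the sketch. */
#include <stdint.h>
#include <stdio.h>

#define MAX_NODES 64                      /* assumed system size */

typedef enum { DIR_UNCACHED, DIR_SHARED, DIR_EXCLUSIVE } dir_state_t;

typedef struct {
    dir_state_t state;                    /* sharing status of the block */
    uint64_t    sharers;                  /* one presence bit per node   */
} dir_entry_t;

/* Record that 'node' now holds a read-only copy of the block. */
static void dir_add_sharer(dir_entry_t *e, int node)
{
    e->sharers |= (uint64_t)1 << node;
    e->state    = DIR_SHARED;
}

/* Grant exclusive ownership to 'node'; the caller must first send
 * invalidations to every other node whose presence bit is set. */
static void dir_make_exclusive(dir_entry_t *e, int node)
{
    e->sharers = (uint64_t)1 << node;
    e->state   = DIR_EXCLUSIVE;
}

int main(void)
{
    dir_entry_t e = { DIR_UNCACHED, 0 };
    dir_add_sharer(&e, 3);                /* node 3 reads the block            */
    dir_add_sharer(&e, 7);                /* node 7 reads the block            */
    dir_make_exclusive(&e, 7);            /* node 7 writes: others invalidated */
    printf("state=%d sharers=0x%llx\n", e.state,
           (unsigned long long)e.sharers);
    return 0;
}
```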

11 Protocol-I: MSI
A 3-state write-back invalidation bus-based snooping protocol
Each block can be in one of three states – invalid, shared, modified (exclusive)
A processor must acquire the block in exclusive state in order to write to it – this is done by placing an exclusive read request on the bus – every other cached copy is invalidated
When some other processor tries to read an exclusive block, the block is demoted to shared
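The transitions described above can be summarized as a per-block state machine. The sketch below is purely illustrative (the function and type names are invented for this example, and bus actions are reduced to printed messages):

```c
/* Hypothetical sketch of the per-block MSI state machine described above. */
#include <stdio.h>

typedef enum { INVALID, SHARED, MODIFIED } msi_state_t;

/* Processor write: must first obtain exclusive access. */
static msi_state_t on_processor_write(msi_state_t s)
{
    switch (s) {
    case INVALID:  puts("bus: read-exclusive (BusRdX)");          return MODIFIED;
    case SHARED:   puts("bus: upgrade, invalidate other copies");  return MODIFIED;
    case MODIFIED: return MODIFIED;   /* silent hit, no bus traffic */
    }
    return s;
}

/* Another processor's read observed on the bus (snoop). */
static msi_state_t on_bus_read(msi_state_t s)
{
    if (s == MODIFIED) {
        puts("flush dirty data; demote to SHARED");
        return SHARED;
    }
    return s;                         /* SHARED and INVALID are unaffected */
}

/* Another processor's read-exclusive observed on the bus (snoop). */
static msi_state_t on_bus_read_exclusive(msi_state_t s)
{
    if (s == MODIFIED)
        puts("flush dirty data");
    return INVALID;                   /* every other copy is invalidated */
}

int main(void)
{
    msi_state_t s = INVALID;
    s = on_processor_write(s);        /* INVALID  -> MODIFIED via BusRdX  */
    s = on_bus_read(s);               /* MODIFIED -> SHARED, data flushed */
    s = on_bus_read_exclusive(s);     /* SHARED   -> INVALID              */
    printf("final state: %d\n", s);
    return 0;
}
```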

12 Design Issues, Optimizations
When does memory get updated?
- on a demotion from modified to shared?
- on a move from modified in one cache to modified in another?
Who responds with data? – memory or a cache that has the block in exclusive state – does it help if sharers respond?
We can assume that bus, memory, and cache state transitions are atomic – if not, we will need more states
A transition from shared to modified only requires an upgrade request and no transfer of data

13 Reporting Snoop Results
In a multiprocessor, memory has to wait for the snoop result before it chooses to respond – this needs 3 wired-OR signals: (i) a cache has a copy, (ii) a cache has a modified copy, (iii) the snoop has not yet completed
Ensuring timely snoops: the time to respond could be fixed or variable (using the third wired-OR signal)
Tags are usually duplicated if they are frequently accessed by both the processor (regular ld/sts) and the bus (snoops)
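A hedged sketch of how per-cache snoop responses combine on the three wired-OR lines (the structure and names are assumptions for illustration, not a real bus specification):

```c
/* Illustrative combining of per-cache snoop responses on three wired-OR lines. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool has_copy;       /* line (i):   some cache holds the block            */
    bool has_modified;   /* line (ii):  some cache holds a modified copy      */
    bool not_done;       /* line (iii): some cache has not finished snooping  */
} snoop_result_t;

/* Each cache drives its three signals; the bus wire-ORs them together. */
static snoop_result_t combine(const snoop_result_t *per_cache, int n)
{
    snoop_result_t bus = { false, false, false };
    for (int i = 0; i < n; i++) {
        bus.has_copy     |= per_cache[i].has_copy;
        bus.has_modified |= per_cache[i].has_modified;
        bus.not_done     |= per_cache[i].not_done;
    }
    return bus;
}

int main(void)
{
    snoop_result_t caches[3] = {
        { true,  false, false },   /* cache 0: clean shared copy */
        { false, false, false },   /* cache 1: no copy           */
        { true,  true,  false },   /* cache 2: modified copy     */
    };
    snoop_result_t bus = combine(caches, 3);
    /* Memory responds only once not_done is low and no modified copy exists. */
    printf("copy=%d modified=%d pending=%d -> %s supplies data\n",
           bus.has_copy, bus.has_modified, bus.not_done,
           bus.has_modified ? "a cache" : "memory");
    return 0;
}
```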

14 4- and 5-State Protocols
Multiprocessors execute many single-threaded programs; a read followed by a write will generate bus transactions to acquire the block in exclusive state even though there are no sharers (this leads to the MESI protocol)
Also, to promote cache-to-cache sharing, a cache must be designated as the responder (this leads to the MOESI protocol)
Note that we can optimize protocols by adding more states – this increases design/verification complexity

15 MESI Protocol
The new state is exclusive-clean – the cache can service read requests and no other cache has the same block
When the processor attempts a write, the block is upgraded to exclusive-modified without generating a bus transaction
When a processor makes a read request, it must detect whether it has the only cached copy – the interconnect must include an additional signal that is asserted by each cache if it has a valid copy of the block
When a block is evicted, it may be exclusive-clean, but the rest of the system will not realize it has been dropped
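A small sketch of the two MESI-specific decisions described above – the read-miss outcome based on the "shared" signal, and the silent E-to-M upgrade. The names are illustrative, not from a real implementation:

```c
/* Hedged sketch of the MESI additions described above. */
#include <stdbool.h>
#include <stdio.h>

typedef enum { I, S, E, M } mesi_state_t;

/* Read miss: the wired-OR "shared" line tells us whether any other cache
 * holds a valid copy. */
static mesi_state_t on_read_miss(bool other_cache_has_copy)
{
    return other_cache_has_copy ? S : E;  /* exclusive-clean if no sharers */
}

/* Processor write. */
static mesi_state_t on_processor_write(mesi_state_t s)
{
    if (s == E)
        return M;                     /* silent upgrade: no bus transaction */
    if (s == S || s == I)
        puts("bus: read-exclusive / upgrade");
    return M;
}

int main(void)
{
    mesi_state_t s = on_read_miss(false); /* no sharers -> E              */
    s = on_processor_write(s);            /* E -> M without bus traffic   */
    printf("final state: %d\n", s);
    return 0;
}
```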

16 MOESI Protocol
The first reader or the last writer is usually designated as the owner of a block
The owner is responsible for responding to requests from other caches
There is no need to update memory when a block transitions from the M to the S state
The cache holding the block in O state is responsible for writing the dirty block back to memory when it is evicted
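A hedged sketch of the owner's role on a snooped read (names invented for illustration; MOESI variants differ in details): the owner supplies the data instead of memory, and a modified block becomes owned rather than forcing a memory update:

```c
/* Illustrative MOESI snoop handling for a bus read. */
#include <stdbool.h>
#include <stdio.h>

typedef enum { MOESI_I, MOESI_S, MOESI_E, MOESI_O, MOESI_M } moesi_state_t;

/* Called in the cache that snoops another processor's read request.
 * Returns the new local state; *supply_data tells whether this cache
 * (rather than memory) responds with the block. */
static moesi_state_t on_bus_read(moesi_state_t s, bool *supply_data)
{
    switch (s) {
    case MOESI_M: *supply_data = true;  return MOESI_O; /* keep ownership, no memory update */
    case MOESI_O: *supply_data = true;  return MOESI_O; /* owner keeps responding           */
    case MOESI_E: *supply_data = false; return MOESI_S; /* clean: memory can respond        */
    default:      *supply_data = false; return s;       /* S stays S, I stays I             */
    }
}

int main(void)
{
    bool supply = false;
    moesi_state_t s = on_bus_read(MOESI_M, &supply);
    printf("M -> %d, cache supplies data: %d\n", s, supply);
    return 0;
}
```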
