Cache Coherence Constructive Computer Architecture Arvind

Slides:



Advertisements
Similar presentations
Implementation and Verification of a Cache Coherence protocol using Spin Steven Farago.
Advertisements

1 Lecture 6: Directory Protocols Topics: directory-based cache coherence implementations (wrap-up of SGI Origin and Sequent NUMA case study)
1 Lecture 4: Directory Protocols Topics: directory-based cache coherence implementations.
May 11, ACL2 Panel: What is the Future of Theorem Proving? Arvind Computer Science & Artificial Intelligence Laboratory.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Directory-Based Caches II Steve Ko Computer Sciences and Engineering University at Buffalo.
ECE669 L20: Evaluation and Message Passing April 13, 2004 ECE 669 Parallel Computer Architecture Lecture 20 Evaluation and Message Passing.
1 Lecture 4: Directory-Based Coherence Details of memory-based (SGI Origin) and cache-based (Sequent NUMA-Q) directory protocols.
1 Lecture 22: Fault Tolerance Papers: Token Coherence: Decoupling Performance and Correctness, ISCA’03, Wisconsin A Low Overhead Fault Tolerant Coherence.
NUMA coherence CSE 471 Aut 011 Cache Coherence in NUMA Machines Snooping is not possible on media other than bus/ring Broadcast / multicast is not that.
1 Lecture 3: Directory-Based Coherence Basic operations, memory-based and cache-based directories.
April 18, 2011CS152, Spring 2011 CS 152 Computer Architecture and Engineering Lecture 19: Directory-Based Cache Protocols Krste Asanovic Electrical Engineering.
Realistic Memories and Caches Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology March 21, 2012L13-1
Constructive Computer Architecture Cache Coherence Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology November.
1 Lecture 24: Fault Tolerance Papers: Token Coherence: Decoupling Performance and Correctness, ISCA’03, Wisconsin A Low Overhead Fault Tolerant Coherence.
Constructive Computer Architecture Cache Coherence Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology Includes.
Constructive Computer Architecture Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Realistic Memories and Caches – Part III Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 4, 2012L15-1.
1 Lecture 8: Snooping and Directory Protocols Topics: 4/5-state snooping protocols, split-transaction implementation details, directory implementations.
Symmetric Multiprocessors: Synchronization and Sequential Consistency
6.175 Final Project Part 0: Understanding Non-Blocking Caches and Cache Coherency Answers.
Operating systems Deadlocks.
PROTOCOL CORRECTNESS Tutorial 3 Theoretical
The Echo Algorithm The echo algorithm can be used to collect and disperse information in a distributed system It was originally designed for learning network.
Concurrency Control.
12.4 Memory Organization in Multiprocessor Systems
Multiprocessor Cache Coherency
Double-Ended Priority Queues
Caches-2 Constructive Computer Architecture Arvind
Notation Addresses are ordered triples:
Cache Coherence Constructive Computer Architecture Arvind
CMSC 611: Advanced Computer Architecture
Krste Asanovic Electrical Engineering and Computer Sciences
Example Cache Coherence Problem
Architecture of Parallel Computers CSC / ECE 506 Summer 2006 Scalable Programming Models Lecture 11 6/19/2006 Dr Steve Hunter.
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Lecture 9: Directory-Based Examples II
CS5102 High Performance Computer Systems Distributed Shared Memory
Lecture 2: Snooping-Based Coherence
Constructive Computer Architecture Tutorial 7 Final Project Overview
Cache Coherence Constructive Computer Architecture Arvind
Inter Process Communication (IPC)
Interconnect with Cache Coherency Manager
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Multiprocessors - Flynn’s taxonomy (1966)
Cache Coherence Constructive Computer Architecture Arvind
Lecture 8: Directory-Based Cache Coherence
CC protocol for blocking caches
Lecture 7: Directory-Based Cache Coherence
Caches-2 Constructive Computer Architecture Arvind
Operating systems Deadlocks.
Chapter 15 : Concurrency Control
Mutual Exclusion CS p0 CS p1 p2 CS CS p3.
Caches and store buffers
Lecture 25: Multiprocessors
Lecture 9: Directory-Based Examples
High Performance Computing
Lecture 8: Directory-Based Examples
Typically for using the shared memory the processes should:
Lecture 25: Multiprocessors
Lecture 24: Virtual Memory, Multiprocessors
Lecture 24: Multiprocessors
Lecture: Coherence Topics: wrap-up of snooping-based coherence,
Update : about 8~16% are writes
Coherent caches Adapted from a lecture by Ian Watson, University of Machester.
Cache Coherence Constructive Computer Architecture Arvind
Lecture 18: Coherence and Synchronization
CSE 486/586 Distributed Systems Cache Coherence
Caches-2 Constructive Computer Architecture Arvind
Lecture 10: Directory-Based Examples II
Multiprocessors and Multi-computers
Presentation transcript:

Cache Coherence Constructive Computer Architecture Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology November 21, 2014 http://www.csg.csail.mit.edu/6.175

Further issues Are these rules enough, i.e., complete? Effect of blocking vs non-blocking caches Communication systems and buffer requirements to avoid deadlocks November 21, 2014 http://www.csg.csail.mit.edu/6.175

Are the rules exhaustive? Parent rules 2. Parent to Child: Upgrade-to-y response (j, m.waitc[j][a]=No) & c2m.msg=<Req,cm,a,y,-> & (i≠c, IsCompatible(m.child[i][a],y))  m2c.enq(<Resp, mc, a, y, (if (m.child[c][a]=I) then m.data[a] else -)>); m.child[c][a]:=y; c2m.deq; What if the guard fails because 1.some child is not in compatible state? or 2. some child is in wait state? if condition 1 holds then rule 4 can be invoked if condition 2 holds then rule 4 must have been invoked and the each child will eventually send a response November 21, 2014 http://www.csg.csail.mit.edu/6.175

Is every rule necessary? Consider rule 7 for cache 7. Child receiving downgrade-to-y request (m2c.msg=<Req, mc, a, y, - >) & (c.state[a]≤y)  m2c.deq; A downgrade request comes but the cache is already in the downgraded state Can happen because of voluntary downgrade 8. Child to Parent: Downgrade-to-y response (vol) (c.waitp[a]=No) & (c.state[a]>y)  c2m.enq(<Resp, c->m, a, y, (if (c.state[a]=M) then c.data[a] else - )>); c.state[a]:=y; November 21, 2014 http://www.csg.csail.mit.edu/6.175

More rules? How about a voluntary upgrade rule from parent? Parent to Child: Upgrade-to-S response (vol) (m.waitc[c][a]=No) & (m.cstate[c][a]=S)  m2c.enq(<Resp, m->c, a, M, -); m.cstate[c][a]:=M; The child could have simultaneously evicted the line, in which case the parent eventually makes m.cstate[c][a] = I while the child makes its c.state[a] = M. This breaks our invariant A cc protocol is like a Swiss watch, even the smallest change can easily (and usually does) introduce bugs November 21, 2014 http://www.csg.csail.mit.edu/6.175

More rules? How about a “silent drop” 8a. Child to Parent: Downgrade-S-to-I response (vol) (c.waitp[a]=No) & (c.state[a]=S)  c2m.enq(<Resp, c->m, a, y, (if (c.state[a]=M) then c.data[a] else - )>); c.state[a]:=I; November 21, 2014 http://www.csg.csail.mit.edu/6.175

A Directory-based Protocol an abstract view interconnect PP P c2m m2c L1 p2m m2p m in out Each cache has 2 pairs of queues (c2m, m2c) to communicate with the memory (p2m, m2p) to communicate with the processor Message format: <cmd, srcdst, a, s, data> FIFO message passing between each (srcdst) pair except a Req cannot block a Resp Req messages from p to m cannot block Req messages from m to p Req/Resp address state November 21, 2014 http://www.csg.csail.mit.edu/6.175

Communication Network Interconnect P P P P L1 L1 L1 L1 Mem Two virtual networks: For requests and responses from cache to memory For requests and responses from memory to caches Each network has H and L priority messages - a L message can never block an H message other than that messages are delivered in FIFO order November 21, 2014 http://www.csg.csail.mit.edu/6.175

H and L Priority Messages At the memory, unprocessed request messages cannot block reply messages. H and L messages can share the same wires but must have separate queues H L An L message can be processed only if H queue is empty November 21, 2014 http://www.csg.csail.mit.edu/6.175

FIFO property of queues If FIFO property is not enforced, then the protocol can either deadlock or update with wrong data A deadlock scenario: msg1: Child 1 requests (I -> M) upgrade msg2: Parent responds to Child 1 with upgrade (I -> M) msg3: Child 2 requests (I -> M) upgrade msg4: Parent requests Child 1 (M -> I) downgrade msg4 overtakes msg2 Child 1 sees msg4 and drops it Parent never gets a response from Child 1 for msg4 November 21, 2014 http://www.csg.csail.mit.edu/6.175

Deadlocks due to buffer space A cache or memory always accepts a response, thus responses will always drain from the network From the children to the parent, two buffers are needed to implement the H-L priority. A child’s req can be blocked and generate more requests From parent to all the children, just one buffer is needed for both requests and responses because a parent’s req only generates responses November 21, 2014 http://www.csg.csail.mit.edu/6.175