Cache Coherence Constructive Computer Architecture Arvind

Slides:

Advertisements

Similar presentations

Implementation and Verification of a Cache Coherence protocol using Spin Steven Farago.

Advertisements

1 Lecture 6: Directory Protocols Topics: directory-based cache coherence implementations (wrap-up of SGI Origin and Sequent NUMA case study)

1 Lecture 4: Directory Protocols Topics: directory-based cache coherence implementations.

May 11, ACL2 Panel: What is the Future of Theorem Proving? Arvind Computer Science & Artificial Intelligence Laboratory.

CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Directory-Based Caches II Steve Ko Computer Sciences and Engineering University at Buffalo.

ECE669 L20: Evaluation and Message Passing April 13, 2004 ECE 669 Parallel Computer Architecture Lecture 20 Evaluation and Message Passing.

1 Lecture 4: Directory-Based Coherence Details of memory-based (SGI Origin) and cache-based (Sequent NUMA-Q) directory protocols.

1 Lecture 22: Fault Tolerance Papers: Token Coherence: Decoupling Performance and Correctness, ISCA’03, Wisconsin A Low Overhead Fault Tolerant Coherence.

NUMA coherence CSE 471 Aut 011 Cache Coherence in NUMA Machines Snooping is not possible on media other than bus/ring Broadcast / multicast is not that.

1 Lecture 3: Directory-Based Coherence Basic operations, memory-based and cache-based directories.

April 18, 2011CS152, Spring 2011 CS 152 Computer Architecture and Engineering Lecture 19: Directory-Based Cache Protocols Krste Asanovic Electrical Engineering.

Realistic Memories and Caches Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology March 21, 2012L13-1

Constructive Computer Architecture Cache Coherence Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology November.

1 Lecture 24: Fault Tolerance Papers: Token Coherence: Decoupling Performance and Correctness, ISCA’03, Wisconsin A Low Overhead Fault Tolerant Coherence.

Constructive Computer Architecture Cache Coherence Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology Includes.

Constructive Computer Architecture Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.

Realistic Memories and Caches – Part III Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 4, 2012L15-1.

1 Lecture 8: Snooping and Directory Protocols Topics: 4/5-state snooping protocols, split-transaction implementation details, directory implementations.

Symmetric Multiprocessors: Synchronization and Sequential Consistency

6.175 Final Project Part 0: Understanding Non-Blocking Caches and Cache Coherency Answers.

Operating systems Deadlocks.

PROTOCOL CORRECTNESS Tutorial 3 Theoretical

The Echo Algorithm The echo algorithm can be used to collect and disperse information in a distributed system It was originally designed for learning network.

Concurrency Control.

12.4 Memory Organization in Multiprocessor Systems

Multiprocessor Cache Coherency

Double-Ended Priority Queues

Caches-2 Constructive Computer Architecture Arvind

Notation Addresses are ordered triples:

Cache Coherence Constructive Computer Architecture Arvind

CMSC 611: Advanced Computer Architecture

Krste Asanovic Electrical Engineering and Computer Sciences

Example Cache Coherence Problem

Architecture of Parallel Computers CSC / ECE 506 Summer 2006 Scalable Programming Models Lecture 11 6/19/2006 Dr Steve Hunter.

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Lecture 9: Directory-Based Examples II

CS5102 High Performance Computer Systems Distributed Shared Memory

Lecture 2: Snooping-Based Coherence

Constructive Computer Architecture Tutorial 7 Final Project Overview

Cache Coherence Constructive Computer Architecture Arvind

Inter Process Communication (IPC)

Interconnect with Cache Coherency Manager

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Multiprocessors - Flynn’s taxonomy (1966)

Cache Coherence Constructive Computer Architecture Arvind

Lecture 8: Directory-Based Cache Coherence

CC protocol for blocking caches

Lecture 7: Directory-Based Cache Coherence

Caches-2 Constructive Computer Architecture Arvind

Operating systems Deadlocks.

Chapter 15 : Concurrency Control

Mutual Exclusion CS p0 CS p1 p2 CS CS p3.

Caches and store buffers

Lecture 25: Multiprocessors

Lecture 9: Directory-Based Examples

High Performance Computing

Lecture 8: Directory-Based Examples

Typically for using the shared memory the processes should:

Lecture 25: Multiprocessors

Lecture 24: Virtual Memory, Multiprocessors

Lecture 24: Multiprocessors

Lecture: Coherence Topics: wrap-up of snooping-based coherence,

Update : about 8~16% are writes

Coherent caches Adapted from a lecture by Ian Watson, University of Machester.

Cache Coherence Constructive Computer Architecture Arvind

Lecture 18: Coherence and Synchronization

CSE 486/586 Distributed Systems Cache Coherence

Caches-2 Constructive Computer Architecture Arvind

Lecture 10: Directory-Based Examples II

Multiprocessors and Multi-computers

Presentation transcript:

Cache Coherence Constructive Computer Architecture Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology November 21, 2014 http://www.csg.csail.mit.edu/6.175

Further issues Are these rules enough, i.e., complete? Effect of blocking vs non-blocking caches Communication systems and buffer requirements to avoid deadlocks November 21, 2014 http://www.csg.csail.mit.edu/6.175

Are the rules exhaustive? Parent rules 2. Parent to Child: Upgrade-to-y response (j, m.waitc[j][a]=No) & c2m.msg=<Req,cm,a,y,-> & (i≠c, IsCompatible(m.child[i][a],y))  m2c.enq(<Resp, mc, a, y, (if (m.child[c][a]=I) then m.data[a] else -)>); m.child[c][a]:=y; c2m.deq; What if the guard fails because 1.some child is not in compatible state? or 2. some child is in wait state? if condition 1 holds then rule 4 can be invoked if condition 2 holds then rule 4 must have been invoked and the each child will eventually send a response November 21, 2014 http://www.csg.csail.mit.edu/6.175

Is every rule necessary? Consider rule 7 for cache 7. Child receiving downgrade-to-y request (m2c.msg=<Req, mc, a, y, - >) & (c.state[a]≤y)  m2c.deq; A downgrade request comes but the cache is already in the downgraded state Can happen because of voluntary downgrade 8. Child to Parent: Downgrade-to-y response (vol) (c.waitp[a]=No) & (c.state[a]>y)  c2m.enq(<Resp, c->m, a, y, (if (c.state[a]=M) then c.data[a] else - )>); c.state[a]:=y; November 21, 2014 http://www.csg.csail.mit.edu/6.175

More rules? How about a voluntary upgrade rule from parent? Parent to Child: Upgrade-to-S response (vol) (m.waitc[c][a]=No) & (m.cstate[c][a]=S)  m2c.enq(<Resp, m->c, a, M, -); m.cstate[c][a]:=M; The child could have simultaneously evicted the line, in which case the parent eventually makes m.cstate[c][a] = I while the child makes its c.state[a] = M. This breaks our invariant A cc protocol is like a Swiss watch, even the smallest change can easily (and usually does) introduce bugs November 21, 2014 http://www.csg.csail.mit.edu/6.175

More rules? How about a “silent drop” 8a. Child to Parent: Downgrade-S-to-I response (vol) (c.waitp[a]=No) & (c.state[a]=S)  c2m.enq(<Resp, c->m, a, y, (if (c.state[a]=M) then c.data[a] else - )>); c.state[a]:=I; November 21, 2014 http://www.csg.csail.mit.edu/6.175

A Directory-based Protocol an abstract view interconnect PP P c2m m2c L1 p2m m2p m in out Each cache has 2 pairs of queues (c2m, m2c) to communicate with the memory (p2m, m2p) to communicate with the processor Message format: <cmd, srcdst, a, s, data> FIFO message passing between each (srcdst) pair except a Req cannot block a Resp Req messages from p to m cannot block Req messages from m to p Req/Resp address state November 21, 2014 http://www.csg.csail.mit.edu/6.175

Communication Network Interconnect P P P P L1 L1 L1 L1 Mem Two virtual networks: For requests and responses from cache to memory For requests and responses from memory to caches Each network has H and L priority messages - a L message can never block an H message other than that messages are delivered in FIFO order November 21, 2014 http://www.csg.csail.mit.edu/6.175

H and L Priority Messages At the memory, unprocessed request messages cannot block reply messages. H and L messages can share the same wires but must have separate queues H L An L message can be processed only if H queue is empty November 21, 2014 http://www.csg.csail.mit.edu/6.175

FIFO property of queues If FIFO property is not enforced, then the protocol can either deadlock or update with wrong data A deadlock scenario: msg1: Child 1 requests (I -> M) upgrade msg2: Parent responds to Child 1 with upgrade (I -> M) msg3: Child 2 requests (I -> M) upgrade msg4: Parent requests Child 1 (M -> I) downgrade msg4 overtakes msg2 Child 1 sees msg4 and drops it Parent never gets a response from Child 1 for msg4 November 21, 2014 http://www.csg.csail.mit.edu/6.175

Deadlocks due to buffer space A cache or memory always accepts a response, thus responses will always drain from the network From the children to the parent, two buffers are needed to implement the H-L priority. A child’s req can be blocked and generate more requests From parent to all the children, just one buffer is needed for both requests and responses because a parent’s req only generates responses November 21, 2014 http://www.csg.csail.mit.edu/6.175