CC protocol for blocking caches

Slides:

Advertisements

Similar presentations

Cache Coherence. Memory Consistency in SMPs Suppose CPU-1 updates A to 200. write-back: memory and cache-2 have stale values write-through: cache-2 has.

Advertisements

Symmetric Multiprocessors: Synchronization and Sequential Consistency.

1 Lecture 20: Synchronization & Consistency Topics: synchronization, consistency models (Sections )

1 Lecture 4: Directory Protocols Topics: directory-based cache coherence implementations.

© Krste Asanovic, 2014CS252, Spring 2014, Lecture 12 CS252 Graduate Computer Architecture Spring 2014 Lecture 12: Synchronization and Memory Models Krste.

CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Synchronization and Consistency II Steve Ko Computer Sciences and Engineering University at.

CS 7810 Lecture 19 Coherence Decoupling: Making Use of Incoherence J.Huh, J. Chang, D. Burger, G. Sohi Proceedings of ASPLOS-XI October 2004.

1 Lecture 2: Snooping and Directory Protocols Topics: Snooping wrap-up and directory implementations.

Realistic Memories and Caches Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology March 21, 2012L13-1

Cache Control and Cache Coherence Protocols How to Manage State of Cache How to Keep Processors Reading the Correct Information.

Constructive Computer Architecture Cache Coherence Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology November.

Realistic Memories and Caches – Part II Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 2, 2012L14-1.

Constructive Computer Architecture Tutorial 8 Final Project Part 2: Coherence Sizhuo Zhang TA Nov 25, 2015T08-1http://csg.csail.mit.edu/6.175.

1 Lecture 19: Scalable Protocols & Synch Topics: coherence protocols for distributed shared-memory multiprocessors and synchronization (Sections )

Constructive Computer Architecture Cache Coherence Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology Includes.

1 Lecture 20: Speculation Papers: Is SC+ILP=RC?, Purdue, ISCA’99 Coherence Decoupling: Making Use of Incoherence, Wisconsin, ASPLOS’04.

Non-blocking Caches Arvind (with Asif Khan) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology May 14, 2012L25-1

Realistic Memories and Caches – Part III Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 4, 2012L15-1.

1 Lecture 8: Snooping and Directory Protocols Topics: 4/5-state snooping protocols, split-transaction implementation details, directory implementations.

Multiprocessors – Locks

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Cache Coherence Constructive Computer Architecture Arvind

6.175 Final Project Part 0: Understanding Non-Blocking Caches and Cache Coherency Answers.

Caches-2 Constructive Computer Architecture Arvind

Realistic Memories and Caches

Constructive Computer Architecture Tutorial 6 Cache & Exception

CS 152 Computer Architecture and Engineering Lecture 18: Snoopy Caches

Realistic Memories and Caches

Lecture 19: Coherence and Synchronization

Lecture 18: Coherence and Synchronization

Project Part 2: Coherence

Caches-2 Constructive Computer Architecture Arvind

Notation Addresses are ordered triples:

The University of Adelaide, School of Computer Science

Cache Coherence Constructive Computer Architecture Arvind

CMSC 611: Advanced Computer Architecture

CMSC 611: Advanced Computer Architecture

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Krste Asanovic Electrical Engineering and Computer Sciences

Lecture 2: Snooping-Based Coherence

Constructive Computer Architecture Tutorial 7 Final Project Overview

Realistic Memories and Caches

Cache Coherence Constructive Computer Architecture Arvind

Designing Parallel Algorithms (Synchronization)

Lecture 5: Snooping Protocol Design Issues

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Cache Coherence Constructive Computer Architecture Arvind

Lecture 21: Synchronization and Consistency

Caches-2 Constructive Computer Architecture Arvind

Lecture: Coherence and Synchronization

Chapter 5 Exploiting Memory Hierarchy : Cache Memory in CMP

Caches and store buffers

Morgan Kaufmann Publishers Memory Hierarchy: Cache Basics

Lecture 25: Multiprocessors

Lecture 4: Synchronization

Lecture 25: Multiprocessors

Lecture 17 Multiprocessors and Thread-Level Parallelism

Lecture 24: Virtual Memory, Multiprocessors

CS 152 Computer Architecture and Engineering Lecture 20: Snoopy Caches

Lecture 24: Multiprocessors

Lecture: Coherence, Synchronization

Lecture: Coherence Topics: wrap-up of snooping-based coherence,

Lecture 17 Multiprocessors and Thread-Level Parallelism

Lecture: Coherence and Synchronization

Lecture 19: Coherence and Synchronization

CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture Lecture 22 Synchronization Krste Asanovic Electrical Engineering and.

Cache Coherence Constructive Computer Architecture Arvind

Lecture 18: Coherence and Synchronization

Caches-2 Constructive Computer Architecture Arvind

Lecture 17 Multiprocessors and Thread-Level Parallelism

Presentation transcript:

CC protocol for blocking caches Extension to the cache rules for Blocking L1 design discussed in lecture L15 Code is somewhat simplified by assuming that cache-line size = one word Some pseudo syntax is used November 20, 2017 http://www.csg.csail.mit.edu/6.175

Processing misses: Requests and Responses Cache 1,5,8 3,5,7 Cache 1,5,8 3,5,7 3 5 1 2 4 1 Up-req send (cache) 2 Up-req proc, Up resp send (memory) 3 Up-resp proc (cache) 4 Dn-req send (memory) 5 Dn-req proc, Dn resp send (cache) 6 Dn-resp proc (memory) 7 Dn-req proc, drop (cache) 8 Voluntary Dn-resp (cache) 6 Memory 2,6 2,4 November 20, 2017 http://www.csg.csail.mit.edu/6.175

Req method hit processing method Action req(MemReq r) if(mshr == Ready); let a = r.addr; let hit = contains(state, a); if(hit) begin let slot = getSlot(state, a); let x = dataArray[slot]; if(r.op == Ld) hitQ.enq(x); else // it is store if (isStateM(state[slot]) dataArray[slot] <= r.data; else begin missReq <= r; mshr <= SendFillReq; missSlot <= slot; end end else begin missReq <= r; mshr <= StartMiss; end // (1) endmethod PP P c2m m2c L1 p2m m2p November 20, 2017 http://www.csg.csail.mit.edu/6.175

Start-miss and Send-fill rules Rdy -> StrtMiss -> SndFillReq -> WaitFillResp -> Resp -> Rdy rule startMiss(mshr == StartMiss); let slot = findVictimSlot(state, missReq.addr); if(!isStateI(state[slot])) begin // write-back (Evacuate) let a = getAddr(state[slot]); let d = (isStateM(state[slot])? dataArray[slot]: -); state[slot] <= (I, _); c2m.enq(<Resp, c->m, a, I, d>); end mshr <= SendFillReq; missSlot <= slot; endrule rule sendFillReq (mshr == SendFillReq); // must have a reserved slot to receive reply let upg = (missReq.op == Ld)? S : M; c2m.enq(<Req, c->m, missReq.addr, upg, - >); mshr <= WaitFillResp; endrule // (1) November 20, 2017 http://www.csg.csail.mit.edu/6.175

Wait-fill rule and Proc Resp rule Rdy -> StrtMiss -> SndFillReq -> WaitFillResp -> Resp -> Rdy rule waitFillResp ((mshr == WaitFillResp) &&& (m2c.firstResp matches <Resp, m->c, .a, .cs, .d>)); let slot = missSlot; dataArray[slot] <= (missReq.op == Ld)? d : missReq.data; state[slot] <= (cs, a); m2c.deqResp; mshr <= Resp; endrule // (3) Is there a problem with waitFill? What if the hitQ is blocked? Should we not at least write it in the cache? rule sendProc(mshr == Resp); if(missReq.op == Ld) begin let slot = missSlot; c2p.enq(dataArray[slot]); end mshr <= Ready; endrule November 20, 2017 http://www.csg.csail.mit.edu/6.175

Parent Responds rule parentResp (c2m.firstReq matches <Req,.c->m,.a,.y,.*>); let slot = getSlot(state, a); // in a 2-level // system a has to be present in the memory let statea = state[slot]; if((i≠c, isCompatible(statea.dir[i],y)) && (statea.waitc[c]=No)) begin let d =(statea.dir[c]=I)? dataArray[slot]: -); m2c.enq(<Resp, m->c, a, y, d>); state[slot].dir[c]:=y; c2m.deq; end endrule IsCompatible(M, M) = False IsCompatible(M, S) = False IsCompatible(S, M) = False All other cases = True November 20, 2017 http://www.csg.csail.mit.edu/6.175

Parent (Downgrade) Requests rule dwn (c2m.firstReq matches <Req,c->m,.a,.y,.*>); let slot = getSlot(state, a); let statea = state[slot]; if (findChild2Dwn(statea) matches (Valid .i)) begin state[slot].waitc[i] <= Yes; m2c.enq(<Req, m->i, a, (y==M?I:S), ? >); end; endrule // (4) This rule will execute as long some child cache is not compatible with the incoming request November 20, 2017 http://www.csg.csail.mit.edu/6.175

Parent receives Response rule dwnRsp (c2m.firstResp matches <Resp, c->m, .a, .y, .data>); c2m.deqResp; let slot = getSlot(state, a); let statea = state[slot]; if(statea.dir[c]=M) dataArray[slot]<=data; state[slot].dir[c]<=y; state[slot].waitc[c]<=No; endrule // (6) November 20, 2017 http://www.csg.csail.mit.edu/6.175

Child Responds Incoming downgrade requests in L1 rule dng ((mshr != Resp) &&& (m2c.firstReq matches <Req,m->c,.a,.y,.*>); let slot = getSlot(state,a); if(getCacheState(state[slot])>y) begin let d = (isStateM(state[slot])? dataArray[slot]: -); c2m.enq(<Resp, c->m, a, y, d>); state[slot] <= (y,a); end // the address has already been downgraded m2c.deqReq; endrule // (5) and (7) November 20, 2017 http://www.csg.csail.mit.edu/6.175

Child Voluntarily downgrades rule startMiss(mshr == Ready); let slot = findVictimSlot(state); if(!isStateI(state[slot])) begin // write-back (Evacuate) let a = getAddr(state[slot]); let d = (isStateM(state[slot])? dataArray[slot]: -); state[slot] <= (I, _); c2m.enq(<Resp, c->m, a, I, d>); end endrule // (8) Rules 1 to 8 are complete - cover all possibilities and cannot deadlock or violate cache invariants November 20, 2017 http://www.csg.csail.mit.edu/6.175

MSI protocol: some issues It never makes sense to have two outstanding requests for the same address from the same processor/cache It is possible to have multiple requests for the same address from different processors. Hence there is a need to arbitrate requests A cache needs to be able to evict an address in order to make room for a different address Voluntary downgrade Memory system (higher-level cache) should be able to force a lower-level cache to downgrade caches need to keep track of the state of their children’s caches November 20, 2017 http://www.csg.csail.mit.edu/6.175

Invariants for a CC-protocol design Directory state is always a conservative estimate of a child’s state E.g., if directory thinks that a child cache is in S state then the cache has to be in either I or S state For every request there is a corresponding response, though sometimes it is generated even before the request is processed Communication system has to ensure that responses cannot be blocked by requests a request cannot overtake a response for the same address At every merger point for requests, we will assume fair arbitration to avoid starvation November 20, 2017 http://www.csg.csail.mit.edu/6.175

Nonblocking Synchronization Load-reserve & Store-conditional Special register(s) to hold reservation flag and address, and the outcome of store-conditional Load-reserve R, m: <flag, adr>  <1, m>; R  M[m]; Store-conditional m, R: if <flag, adr> == <1, m> then cancel other procs’ reservation on m; M[m] R; status succeed; else status fail; try: Load-reserve Rhead, head spin: Load Rtail, tail if Rhead==Rtail goto spin Load R, (Rhead) Rhead = Rhead + 1 Store-conditional head, Rhead if (status==fail) goto try process(R) The corresponding instructions in RISC V are called lr and sc, respectively November 20, 2017 http://www.csg.csail.mit.edu/6.175

Nonblocking Synchronization Load-reserve R, (m): <flag, adr>  <1, m>; R  M[m]; Store-conditional (m), R: if <flag, adr> == <1, m> then cancel other procs’ reservation on m; M[m] R; status succeed; else status fail; The flag is cleared in other processors on a Store using the CC protocol’s invalidation mechanism Usually address m is not remembered by Load-reserve; the flag is cleared on any invalidation works as long as the Load-reserve instructions are not used in a nested manner These instructions won’t work properly if Loads and Stores can be reordered dynamically November 20, 2017 http://www.csg.csail.mit.edu/6.175