CC protocol for blocking caches

Slides:



Advertisements
Similar presentations
Cache Coherence. Memory Consistency in SMPs Suppose CPU-1 updates A to 200. write-back: memory and cache-2 have stale values write-through: cache-2 has.
Advertisements

Symmetric Multiprocessors: Synchronization and Sequential Consistency.
1 Lecture 20: Synchronization & Consistency Topics: synchronization, consistency models (Sections )
1 Lecture 4: Directory Protocols Topics: directory-based cache coherence implementations.
© Krste Asanovic, 2014CS252, Spring 2014, Lecture 12 CS252 Graduate Computer Architecture Spring 2014 Lecture 12: Synchronization and Memory Models Krste.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Synchronization and Consistency II Steve Ko Computer Sciences and Engineering University at.
CS 7810 Lecture 19 Coherence Decoupling: Making Use of Incoherence J.Huh, J. Chang, D. Burger, G. Sohi Proceedings of ASPLOS-XI October 2004.
1 Lecture 2: Snooping and Directory Protocols Topics: Snooping wrap-up and directory implementations.
Realistic Memories and Caches Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology March 21, 2012L13-1
Cache Control and Cache Coherence Protocols How to Manage State of Cache How to Keep Processors Reading the Correct Information.
Constructive Computer Architecture Cache Coherence Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology November.
Realistic Memories and Caches – Part II Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 2, 2012L14-1.
Constructive Computer Architecture Tutorial 8 Final Project Part 2: Coherence Sizhuo Zhang TA Nov 25, 2015T08-1http://csg.csail.mit.edu/6.175.
1 Lecture 19: Scalable Protocols & Synch Topics: coherence protocols for distributed shared-memory multiprocessors and synchronization (Sections )
Constructive Computer Architecture Cache Coherence Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology Includes.
1 Lecture 20: Speculation Papers: Is SC+ILP=RC?, Purdue, ISCA’99 Coherence Decoupling: Making Use of Incoherence, Wisconsin, ASPLOS’04.
Non-blocking Caches Arvind (with Asif Khan) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology May 14, 2012L25-1
Realistic Memories and Caches – Part III Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 4, 2012L15-1.
1 Lecture 8: Snooping and Directory Protocols Topics: 4/5-state snooping protocols, split-transaction implementation details, directory implementations.
Multiprocessors – Locks
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Cache Coherence Constructive Computer Architecture Arvind
6.175 Final Project Part 0: Understanding Non-Blocking Caches and Cache Coherency Answers.
Caches-2 Constructive Computer Architecture Arvind
Realistic Memories and Caches
Constructive Computer Architecture Tutorial 6 Cache & Exception
CS 152 Computer Architecture and Engineering Lecture 18: Snoopy Caches
Realistic Memories and Caches
Lecture 19: Coherence and Synchronization
Lecture 18: Coherence and Synchronization
Project Part 2: Coherence
Caches-2 Constructive Computer Architecture Arvind
Notation Addresses are ordered triples:
The University of Adelaide, School of Computer Science
Cache Coherence Constructive Computer Architecture Arvind
CMSC 611: Advanced Computer Architecture
CMSC 611: Advanced Computer Architecture
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Krste Asanovic Electrical Engineering and Computer Sciences
Lecture 2: Snooping-Based Coherence
Constructive Computer Architecture Tutorial 7 Final Project Overview
Realistic Memories and Caches
Cache Coherence Constructive Computer Architecture Arvind
Designing Parallel Algorithms (Synchronization)
Lecture 5: Snooping Protocol Design Issues
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Cache Coherence Constructive Computer Architecture Arvind
Lecture 21: Synchronization and Consistency
Caches-2 Constructive Computer Architecture Arvind
Lecture: Coherence and Synchronization
Chapter 5 Exploiting Memory Hierarchy : Cache Memory in CMP
Caches and store buffers
Morgan Kaufmann Publishers Memory Hierarchy: Cache Basics
Lecture 25: Multiprocessors
Lecture 4: Synchronization
Lecture 25: Multiprocessors
Lecture 17 Multiprocessors and Thread-Level Parallelism
Lecture 24: Virtual Memory, Multiprocessors
CS 152 Computer Architecture and Engineering Lecture 20: Snoopy Caches
Lecture 24: Multiprocessors
Lecture: Coherence, Synchronization
Lecture: Coherence Topics: wrap-up of snooping-based coherence,
Lecture 17 Multiprocessors and Thread-Level Parallelism
Lecture: Coherence and Synchronization
Lecture 19: Coherence and Synchronization
CS 152 Computer Architecture and Engineering CS252 Graduate Computer Architecture Lecture 22 Synchronization Krste Asanovic Electrical Engineering and.
Cache Coherence Constructive Computer Architecture Arvind
Lecture 18: Coherence and Synchronization
Caches-2 Constructive Computer Architecture Arvind
Lecture 17 Multiprocessors and Thread-Level Parallelism
Presentation transcript:

CC protocol for blocking caches Extension to the cache rules for Blocking L1 design discussed in lecture L15 Code is somewhat simplified by assuming that cache-line size = one word Some pseudo syntax is used November 20, 2017 http://www.csg.csail.mit.edu/6.175

Processing misses: Requests and Responses Cache 1,5,8 3,5,7 Cache 1,5,8 3,5,7 3 5 1 2 4 1 Up-req send (cache) 2 Up-req proc, Up resp send (memory) 3 Up-resp proc (cache) 4 Dn-req send (memory) 5 Dn-req proc, Dn resp send (cache) 6 Dn-resp proc (memory) 7 Dn-req proc, drop (cache) 8 Voluntary Dn-resp (cache) 6 Memory 2,6 2,4 November 20, 2017 http://www.csg.csail.mit.edu/6.175

Req method hit processing method Action req(MemReq r) if(mshr == Ready); let a = r.addr; let hit = contains(state, a); if(hit) begin let slot = getSlot(state, a); let x = dataArray[slot]; if(r.op == Ld) hitQ.enq(x); else // it is store if (isStateM(state[slot]) dataArray[slot] <= r.data; else begin missReq <= r; mshr <= SendFillReq; missSlot <= slot; end end else begin missReq <= r; mshr <= StartMiss; end // (1) endmethod PP P c2m m2c L1 p2m m2p November 20, 2017 http://www.csg.csail.mit.edu/6.175

Start-miss and Send-fill rules Rdy -> StrtMiss -> SndFillReq -> WaitFillResp -> Resp -> Rdy rule startMiss(mshr == StartMiss); let slot = findVictimSlot(state, missReq.addr); if(!isStateI(state[slot])) begin // write-back (Evacuate) let a = getAddr(state[slot]); let d = (isStateM(state[slot])? dataArray[slot]: -); state[slot] <= (I, _); c2m.enq(<Resp, c->m, a, I, d>); end mshr <= SendFillReq; missSlot <= slot; endrule rule sendFillReq (mshr == SendFillReq); // must have a reserved slot to receive reply let upg = (missReq.op == Ld)? S : M; c2m.enq(<Req, c->m, missReq.addr, upg, - >); mshr <= WaitFillResp; endrule // (1) November 20, 2017 http://www.csg.csail.mit.edu/6.175

Wait-fill rule and Proc Resp rule Rdy -> StrtMiss -> SndFillReq -> WaitFillResp -> Resp -> Rdy rule waitFillResp ((mshr == WaitFillResp) &&& (m2c.firstResp matches <Resp, m->c, .a, .cs, .d>)); let slot = missSlot; dataArray[slot] <= (missReq.op == Ld)? d : missReq.data; state[slot] <= (cs, a); m2c.deqResp; mshr <= Resp; endrule // (3) Is there a problem with waitFill? What if the hitQ is blocked? Should we not at least write it in the cache? rule sendProc(mshr == Resp); if(missReq.op == Ld) begin let slot = missSlot; c2p.enq(dataArray[slot]); end mshr <= Ready; endrule November 20, 2017 http://www.csg.csail.mit.edu/6.175

Parent Responds rule parentResp (c2m.firstReq matches <Req,.c->m,.a,.y,.*>); let slot = getSlot(state, a); // in a 2-level // system a has to be present in the memory let statea = state[slot]; if((i≠c, isCompatible(statea.dir[i],y)) && (statea.waitc[c]=No)) begin let d =(statea.dir[c]=I)? dataArray[slot]: -); m2c.enq(<Resp, m->c, a, y, d>); state[slot].dir[c]:=y; c2m.deq; end endrule IsCompatible(M, M) = False IsCompatible(M, S) = False IsCompatible(S, M) = False All other cases = True November 20, 2017 http://www.csg.csail.mit.edu/6.175

Parent (Downgrade) Requests rule dwn (c2m.firstReq matches <Req,c->m,.a,.y,.*>); let slot = getSlot(state, a); let statea = state[slot]; if (findChild2Dwn(statea) matches (Valid .i)) begin state[slot].waitc[i] <= Yes; m2c.enq(<Req, m->i, a, (y==M?I:S), ? >); end; endrule // (4) This rule will execute as long some child cache is not compatible with the incoming request November 20, 2017 http://www.csg.csail.mit.edu/6.175

Parent receives Response rule dwnRsp (c2m.firstResp matches <Resp, c->m, .a, .y, .data>); c2m.deqResp; let slot = getSlot(state, a); let statea = state[slot]; if(statea.dir[c]=M) dataArray[slot]<=data; state[slot].dir[c]<=y; state[slot].waitc[c]<=No; endrule // (6) November 20, 2017 http://www.csg.csail.mit.edu/6.175

Child Responds Incoming downgrade requests in L1 rule dng ((mshr != Resp) &&& (m2c.firstReq matches <Req,m->c,.a,.y,.*>); let slot = getSlot(state,a); if(getCacheState(state[slot])>y) begin let d = (isStateM(state[slot])? dataArray[slot]: -); c2m.enq(<Resp, c->m, a, y, d>); state[slot] <= (y,a); end // the address has already been downgraded m2c.deqReq; endrule // (5) and (7) November 20, 2017 http://www.csg.csail.mit.edu/6.175

Child Voluntarily downgrades rule startMiss(mshr == Ready); let slot = findVictimSlot(state); if(!isStateI(state[slot])) begin // write-back (Evacuate) let a = getAddr(state[slot]); let d = (isStateM(state[slot])? dataArray[slot]: -); state[slot] <= (I, _); c2m.enq(<Resp, c->m, a, I, d>); end endrule // (8) Rules 1 to 8 are complete - cover all possibilities and cannot deadlock or violate cache invariants November 20, 2017 http://www.csg.csail.mit.edu/6.175

MSI protocol: some issues It never makes sense to have two outstanding requests for the same address from the same processor/cache It is possible to have multiple requests for the same address from different processors. Hence there is a need to arbitrate requests A cache needs to be able to evict an address in order to make room for a different address Voluntary downgrade Memory system (higher-level cache) should be able to force a lower-level cache to downgrade caches need to keep track of the state of their children’s caches November 20, 2017 http://www.csg.csail.mit.edu/6.175

Invariants for a CC-protocol design Directory state is always a conservative estimate of a child’s state E.g., if directory thinks that a child cache is in S state then the cache has to be in either I or S state For every request there is a corresponding response, though sometimes it is generated even before the request is processed Communication system has to ensure that responses cannot be blocked by requests a request cannot overtake a response for the same address At every merger point for requests, we will assume fair arbitration to avoid starvation November 20, 2017 http://www.csg.csail.mit.edu/6.175

Nonblocking Synchronization Load-reserve & Store-conditional Special register(s) to hold reservation flag and address, and the outcome of store-conditional Load-reserve R, m: <flag, adr>  <1, m>; R  M[m]; Store-conditional m, R: if <flag, adr> == <1, m> then cancel other procs’ reservation on m; M[m] R; status succeed; else status fail; try: Load-reserve Rhead, head spin: Load Rtail, tail if Rhead==Rtail goto spin Load R, (Rhead) Rhead = Rhead + 1 Store-conditional head, Rhead if (status==fail) goto try process(R) The corresponding instructions in RISC V are called lr and sc, respectively November 20, 2017 http://www.csg.csail.mit.edu/6.175

Nonblocking Synchronization Load-reserve R, (m): <flag, adr>  <1, m>; R  M[m]; Store-conditional (m), R: if <flag, adr> == <1, m> then cancel other procs’ reservation on m; M[m] R; status succeed; else status fail; The flag is cleared in other processors on a Store using the CC protocol’s invalidation mechanism Usually address m is not remembered by Load-reserve; the flag is cleared on any invalidation works as long as the Load-reserve instructions are not used in a nested manner These instructions won’t work properly if Loads and Stores can be reordered dynamically November 20, 2017 http://www.csg.csail.mit.edu/6.175