Caches-2 Constructive Computer Architecture Arvind

Slides:



Advertisements
Similar presentations
Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology October 13, 2009http://csg.csail.mit.edu/koreaL12-1.
Advertisements

Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1
Computer Architecture: A Constructive Approach Branch Direction Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab.
March, 2007http://csg.csail.mit.edu/arvindIPlookup-1 IP Lookup Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
September 24, L08-1 IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab.
IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 4, 2013
Constructive Computer Architecture Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Realistic Memories and Caches Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology March 21, 2012L13-1
Cache Control and Cache Coherence Protocols How to Manage State of Cache How to Keep Processors Reading the Correct Information.
Copyright © Bluespec Inc Bluespec SystemVerilog™ Design Example A DMA Controller with a Socket Interface.
Caches and in-order pipelines Arvind (with Asif Khan) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology May 11, 2012L24-1.
Constructive Computer Architecture: Guards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September 24, 2014.
Realistic Memories and Caches – Part II Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 2, 2012L14-1.
DECStation 3100 Block Instruction Data Effective Program Size Miss Rate Miss Rate Miss Rate 1 6.1% 2.1% 5.4% 4 2.0% 1.7% 1.9% 1 1.2% 1.3% 1.2% 4 0.3%
Elastic Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 28, 2011L08-1http://csg.csail.mit.edu/6.375.
Computer Architecture: A Constructive Approach Next Address Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts.
Constructive Computer Architecture Sequential Circuits - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Constructive Computer Architecture: Control Hazards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.
Constructive Computer Architecture Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Non-blocking Caches Arvind (with Asif Khan) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology May 14, 2012L25-1
Realistic Memories and Caches – Part III Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 4, 2012L15-1.
Constructive Computer Architecture Store Buffers and Non-blocking Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.
October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.
Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 1, 2010
Cache Coherence Constructive Computer Architecture Arvind
6.175 Final Project Part 0: Understanding Non-Blocking Caches and Cache Coherency Answers.
Caches-2 Constructive Computer Architecture Arvind
Address – 32 bits WRITE Write Cache Write Main Byte Offset Tag Index Valid Tag Data 16K entries 16.
Realistic Memories and Caches
CSE 351 Section 9 3/1/12.
Constructive Computer Architecture Tutorial 6 Cache & Exception
Bluespec-3: A non-pipelined processor Arvind
Concurrency properties of BSV methods and rules
Realistic Memories and Caches
Bluespec-6: Modeling Processors
Folded “Combinational” circuits
Scheduling Constraints on Interface methods
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Caches-2 Constructive Computer Architecture Arvind
Notation Addresses are ordered triples:
Cache Coherence Constructive Computer Architecture Arvind
Pipelining combinational circuits
Constructive Computer Architecture: Guards
Constructive Computer Architecture Tutorial 7 Final Project Overview
Pipelining combinational circuits
Realistic Memories and Caches
Cache Coherence Constructive Computer Architecture Arvind
Constructive Computer Architecture: Well formed BSV programs
Modules with Guarded Interfaces
Pipelining combinational circuits
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Cache Coherence Constructive Computer Architecture Arvind
Adapted from slides by Sally McKee Cornell University
CC protocol for blocking caches
Bluespec-3: A non-pipelined processor Arvind
Caches-2 Constructive Computer Architecture Arvind
Modular Refinement - 2 Arvind
Caches and store buffers
Morgan Kaufmann Publishers Memory Hierarchy: Cache Basics
Constructive Computer Architecture: Guards
Control Hazards Constructive Computer Architecture: Arvind
CS 3410, Spring 2014 Computer Science Cornell University
Modeling Processors Arvind
Modular Refinement Arvind
Bluespec SystemVerilog™
Modular Refinement Arvind
Realistic Memories and Caches
Cache Coherence Constructive Computer Architecture Arvind
Caches III CSE 351 Spring 2019 Instructor: Ruth Anderson
Presentation transcript:

Caches-2 Constructive Computer Architecture Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology November 3, 2014 http://csg.csail.mit.edu/6.175

Blocking vs. Non-Blocking cache At most one outstanding miss Cache must wait for memory to respond Cache does not accept requests in the meantime Non-blocking cache: Multiple outstanding misses Cache can continue to process requests while waiting for memory to respond to misses We will first design a write-back, Write-miss allocate, Direct-mapped, blocking cache November 3, 2014 http://csg.csail.mit.edu/6.175

Blocking Cache Interface req status memReq mReqQ missReq DRAM or next level cache Processor cache memResp resp hitQ mRespQ interface Cache; method Action req(MemReq r); method ActionValue#(Data) resp; method ActionValue#(MemReq) memReq; method Action memResp(Line r); endinterface November 3, 2014 http://csg.csail.mit.edu/6.175

Interface dynamics The cache either gets a hit and responds immediately, or it gets a miss, in which case it takes several steps to process the miss Reading the response dequeues it Requests and responses follow the FIFO order Methods are guarded, e.g., the cache may not be ready to accept a request because it is processing a miss A status register keeps track of the state of the cache while it is processing a miss typedef enum {Ready, StartMiss, SendFillReq, WaitFillResp} CacheStatus deriving (Bits, Eq); November 3, 2014 http://csg.csail.mit.edu/6.175

Blocking Cache code structure module mkCache(Cache); RegFile#(CacheIndex, Line) dataArray <- mkRegFileFull; … rule startMiss … endrule; method Action req(MemReq r) … endmethod; method ActionValue#(Data) resp … endmethod; method ActionValue#(MemReq) memReq … endmethod; method Action memResp(Line r) … endmethod; endmodule November 3, 2014 http://csg.csail.mit.edu/6.175

Extracting cache tags & index tag index L 2 Cache size in bytes Byte addresses Processor requests are for a single word but internal communications are in line sizes (2L words, typically L=2) AddrSz = CacheTagSz + CacheIndexSz + LS + 2 Need getIdx, getTag, getOffset functions function CacheIndex getIdx(Addr addr) = truncate(addr>>4); function Bit#(2) getOffset(Addr addr) = truncate(addr >> 2); function CacheTag getTag(Addr addr) = truncateLSB(addr); truncate = truncateMSB November 3, 2014 http://csg.csail.mit.edu/6.175

Blocking cache state elements RegFile#(CacheIndex, Line) dataArray <- mkRegFileFull; RegFile#(CacheIndex, Maybe#(CacheTag)) tagArray <- mkRegFileFull; RegFile#(CacheIndex, Bool) dirtyArray <- mkRegFileFull; Fifo#(1, Data) hitQ <- mkBypassFifo; Reg#(MemReq) missReq <- mkRegU; Reg#(CacheStatus) status <- mkReg(Ready); Fifo#(2, MemReq) memReqQ <- mkCFFifo; Fifo#(2, Line) memRespQ <- mkCFFifo; Tag and valid bits are kept together as a Maybe type CF Fifos are preferable because they provide better decoupling. An extra cycle here may not affect the performance by much November 3, 2014 http://csg.csail.mit.edu/6.175

Req method hit processing It is straightforward to extend the cache interface to include a cacheline flush command method Action req(MemReq r) if(status == Ready); let idx = getIdx(r.addr); let tag = getTag(r.addr); Bit#(2) wOffset = truncate(r.addr >> 2); let currTag = tagArray.sub(idx); let hit = isValid(currTag)? fromMaybe(?,currTag)==tag : False; if(hit) begin let x = dataArray.sub(idx); if(r.op == Ld) hitQ.enq(x[wOffset]); else begin x[wOffset]=r.data; dataArray.upd(idx, x); dirtyArray.upd(idx, True); end else begin missReq <= r; status <= StartMiss; end endmethod overwrite the appropriate word of the line November 3, 2014 http://csg.csail.mit.edu/6.175

Rest of the methods Memory side methods method ActionValue#(Data) resp; hitQ.deq; return hitQ.first; endmethod method ActionValue#(MemReq) memReq; memReqQ.deq; return memReqQ.first; method Action memResp(Line r); memRespQ.enq(r); Memory side methods November 3, 2014 http://csg.csail.mit.edu/6.175

Start-miss and Send-fill rules Ready -> StartMiss -> SendFillReq -> WaitFillResp -> Ready rule startMiss(status == StartMiss); let idx = getIdx(missReq.addr); let tag=tagArray.sub(idx); let dirty=dirtyArray.sub(idx); if(isValid(tag) && dirty) begin // write-back let addr = {fromMaybe(?,tag), idx, 4'b0}; let data = dataArray.sub(idx); memReqQ.enq(MemReq{op: St, addr: addr, data: data}); end status <= SendFillReq; endrule Ready -> StartMiss -> SendFillReq -> WaitFillResp -> Ready rule sendFillReq (status == SendFillReq); memReqQ.enq(missReq); status <= WaitFillResp; endrule November 3, 2014 http://csg.csail.mit.edu/6.175

Wait-fill rule Ready -> StartMiss -> SendFillReq -> WaitFillResp -> Ready rule waitFillResp(status == WaitFillResp); let idx = getIdx(missReq.addr); let tag = getTag(missReq.addr); let data = memRespQ.first; tagArray.upd(idx, Valid (tag)); if(missReq.op == Ld) begin dirtyArray.upd(idx,False);dataArray.upd(idx, data); hitQ.enq(data[wOffset]); end else begin data[wOffset] = missReq.data; dirtyArray.upd(idx,True); dataArray.upd(idx, data); end memRespQ.deq; status <= Ready; endrule Is there a problem with waitFill? What if the hitQ is blocked? Should we not at least write it in the cache? November 3, 2014 http://csg.csail.mit.edu/6.175

Hit and miss performance Combinational read/write, i.e. 0-cycle response Requires req and resp methods to be concurrently schedulable, which in turn requires hitQ.enq < {hitQ.deq, hitQ.first} i.e., hitQ should be a bypass Fifo Miss No evacuation: memory load latency plus combinational read/write Evacuation: memory store followed by memory load latency plus combinational read/write Adding an extra cycle here and there in the miss case should not have a big negative performance impact November 3, 2014 http://csg.csail.mit.edu/6.175

Non-blocking cache cache req req proc mReq mReqQ Processor cache resp mResp Out-of-Order responses mRespQ Requests have to be tagged because responses come out-of-order (OOO) We will assume that all tags are unique and the processor is responsible for reusing tags properly November 3, 2014 http://csg.csail.mit.edu/6.175

Non-blocking Cache Behavior to be described by 2 concurrent FSMs to process input requests and memory responses, respectively St req goes into StQ and waits until data can be written into the cache req hitQ resp V D W Tag Data St Q Ld Buff load reqs waiting for data An extra bit in the cache to indicate if the data for a line is present wbQ mRespQ November 3, 2014 http://csg.csail.mit.edu/6.175 mReqQ 14

Incoming req Type of request st ld cache state V? In StQ? yes no yes bypass hit Cache state V? StQ empty? yes no yes no hit Cache W? yes no Write in cache Put in StQ Cache state W? Put in LdBuf Put in LdBuf send memReq set W If (evacuate) send wbResp yes no Put in StQ Put in StQ send memReq set W If (evacuate) send wbResp November 3, 2014 http://csg.csail.mit.edu/6.175

Mem Resp (line) 1. Update cache line (set V and unset W) 2. Process all matching ldBuff entries and send responses 3. L: If cachestate(oldest StQ entry address) = V then update the cache word with StQ entry; remove the oldest entry; Loop back to L else if cachestate(oldest StQ entry address) = !W then if(evacuate) wbResp; memReq for this store entry; set W November 3, 2014 http://csg.csail.mit.edu/6.175