Caches-2 Constructive Computer Architecture Arvind

Slides:

Advertisements

Similar presentations

Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology October 13, 2009http://csg.csail.mit.edu/koreaL12-1.

Advertisements

Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1

Computer Architecture: A Constructive Approach Branch Direction Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab.

March, 2007http://csg.csail.mit.edu/arvindIPlookup-1 IP Lookup Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.

September 24, L08-1 IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab.

IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 4, 2013

Constructive Computer Architecture Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.

Realistic Memories and Caches Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology March 21, 2012L13-1

Cache Control and Cache Coherence Protocols How to Manage State of Cache How to Keep Processors Reading the Correct Information.

Copyright © Bluespec Inc Bluespec SystemVerilog™ Design Example A DMA Controller with a Socket Interface.

Caches and in-order pipelines Arvind (with Asif Khan) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology May 11, 2012L24-1.

Constructive Computer Architecture: Guards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September 24, 2014.

Realistic Memories and Caches – Part II Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 2, 2012L14-1.

DECStation 3100 Block Instruction Data Effective Program Size Miss Rate Miss Rate Miss Rate 1 6.1% 2.1% 5.4% 4 2.0% 1.7% 1.9% 1 1.2% 1.3% 1.2% 4 0.3%

Elastic Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 28, 2011L08-1http://csg.csail.mit.edu/6.375.

Computer Architecture: A Constructive Approach Next Address Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts.

Constructive Computer Architecture Sequential Circuits - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.

Constructive Computer Architecture: Control Hazards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.

Constructive Computer Architecture Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.

Non-blocking Caches Arvind (with Asif Khan) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology May 14, 2012L25-1

Realistic Memories and Caches – Part III Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 4, 2012L15-1.

Constructive Computer Architecture Store Buffers and Non-blocking Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.

October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.

Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 1, 2010

Cache Coherence Constructive Computer Architecture Arvind

6.175 Final Project Part 0: Understanding Non-Blocking Caches and Cache Coherency Answers.

Caches-2 Constructive Computer Architecture Arvind

Address – 32 bits WRITE Write Cache Write Main Byte Offset Tag Index Valid Tag Data 16K entries 16.

Realistic Memories and Caches

CSE 351 Section 9 3/1/12.

Constructive Computer Architecture Tutorial 6 Cache & Exception

Bluespec-3: A non-pipelined processor Arvind

Concurrency properties of BSV methods and rules

Realistic Memories and Caches

Bluespec-6: Modeling Processors

Folded “Combinational” circuits

Scheduling Constraints on Interface methods

Sequential Circuits - 2 Constructive Computer Architecture Arvind

Caches-2 Constructive Computer Architecture Arvind

Notation Addresses are ordered triples:

Cache Coherence Constructive Computer Architecture Arvind

Pipelining combinational circuits

Constructive Computer Architecture: Guards

Constructive Computer Architecture Tutorial 7 Final Project Overview

Pipelining combinational circuits

Realistic Memories and Caches

Cache Coherence Constructive Computer Architecture Arvind

Constructive Computer Architecture: Well formed BSV programs

Modules with Guarded Interfaces

Pipelining combinational circuits

Sequential Circuits - 2 Constructive Computer Architecture Arvind

Cache Coherence Constructive Computer Architecture Arvind

Adapted from slides by Sally McKee Cornell University

CC protocol for blocking caches

Bluespec-3: A non-pipelined processor Arvind

Caches-2 Constructive Computer Architecture Arvind

Modular Refinement - 2 Arvind

Caches and store buffers

Morgan Kaufmann Publishers Memory Hierarchy: Cache Basics

Constructive Computer Architecture: Guards

Control Hazards Constructive Computer Architecture: Arvind

CS 3410, Spring 2014 Computer Science Cornell University

Modeling Processors Arvind

Modular Refinement Arvind

Bluespec SystemVerilog™

Modular Refinement Arvind

Realistic Memories and Caches

Cache Coherence Constructive Computer Architecture Arvind

Caches III CSE 351 Spring 2019 Instructor: Ruth Anderson

Presentation transcript:

Caches-2 Constructive Computer Architecture Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology November 3, 2014 http://csg.csail.mit.edu/6.175

Blocking vs. Non-Blocking cache At most one outstanding miss Cache must wait for memory to respond Cache does not accept requests in the meantime Non-blocking cache: Multiple outstanding misses Cache can continue to process requests while waiting for memory to respond to misses We will first design a write-back, Write-miss allocate, Direct-mapped, blocking cache November 3, 2014 http://csg.csail.mit.edu/6.175

Blocking Cache Interface req status memReq mReqQ missReq DRAM or next level cache Processor cache memResp resp hitQ mRespQ interface Cache; method Action req(MemReq r); method ActionValue#(Data) resp; method ActionValue#(MemReq) memReq; method Action memResp(Line r); endinterface November 3, 2014 http://csg.csail.mit.edu/6.175

Interface dynamics The cache either gets a hit and responds immediately, or it gets a miss, in which case it takes several steps to process the miss Reading the response dequeues it Requests and responses follow the FIFO order Methods are guarded, e.g., the cache may not be ready to accept a request because it is processing a miss A status register keeps track of the state of the cache while it is processing a miss typedef enum {Ready, StartMiss, SendFillReq, WaitFillResp} CacheStatus deriving (Bits, Eq); November 3, 2014 http://csg.csail.mit.edu/6.175

Blocking Cache code structure module mkCache(Cache); RegFile#(CacheIndex, Line) dataArray <- mkRegFileFull; … rule startMiss … endrule; method Action req(MemReq r) … endmethod; method ActionValue#(Data) resp … endmethod; method ActionValue#(MemReq) memReq … endmethod; method Action memResp(Line r) … endmethod; endmodule November 3, 2014 http://csg.csail.mit.edu/6.175

Extracting cache tags & index tag index L 2 Cache size in bytes Byte addresses Processor requests are for a single word but internal communications are in line sizes (2L words, typically L=2) AddrSz = CacheTagSz + CacheIndexSz + LS + 2 Need getIdx, getTag, getOffset functions function CacheIndex getIdx(Addr addr) = truncate(addr>>4); function Bit#(2) getOffset(Addr addr) = truncate(addr >> 2); function CacheTag getTag(Addr addr) = truncateLSB(addr); truncate = truncateMSB November 3, 2014 http://csg.csail.mit.edu/6.175

Blocking cache state elements RegFile#(CacheIndex, Line) dataArray <- mkRegFileFull; RegFile#(CacheIndex, Maybe#(CacheTag)) tagArray <- mkRegFileFull; RegFile#(CacheIndex, Bool) dirtyArray <- mkRegFileFull; Fifo#(1, Data) hitQ <- mkBypassFifo; Reg#(MemReq) missReq <- mkRegU; Reg#(CacheStatus) status <- mkReg(Ready); Fifo#(2, MemReq) memReqQ <- mkCFFifo; Fifo#(2, Line) memRespQ <- mkCFFifo; Tag and valid bits are kept together as a Maybe type CF Fifos are preferable because they provide better decoupling. An extra cycle here may not affect the performance by much November 3, 2014 http://csg.csail.mit.edu/6.175

Req method hit processing It is straightforward to extend the cache interface to include a cacheline flush command method Action req(MemReq r) if(status == Ready); let idx = getIdx(r.addr); let tag = getTag(r.addr); Bit#(2) wOffset = truncate(r.addr >> 2); let currTag = tagArray.sub(idx); let hit = isValid(currTag)? fromMaybe(?,currTag)==tag : False; if(hit) begin let x = dataArray.sub(idx); if(r.op == Ld) hitQ.enq(x[wOffset]); else begin x[wOffset]=r.data; dataArray.upd(idx, x); dirtyArray.upd(idx, True); end else begin missReq <= r; status <= StartMiss; end endmethod overwrite the appropriate word of the line November 3, 2014 http://csg.csail.mit.edu/6.175

Rest of the methods Memory side methods method ActionValue#(Data) resp; hitQ.deq; return hitQ.first; endmethod method ActionValue#(MemReq) memReq; memReqQ.deq; return memReqQ.first; method Action memResp(Line r); memRespQ.enq(r); Memory side methods November 3, 2014 http://csg.csail.mit.edu/6.175

Start-miss and Send-fill rules Ready -> StartMiss -> SendFillReq -> WaitFillResp -> Ready rule startMiss(status == StartMiss); let idx = getIdx(missReq.addr); let tag=tagArray.sub(idx); let dirty=dirtyArray.sub(idx); if(isValid(tag) && dirty) begin // write-back let addr = {fromMaybe(?,tag), idx, 4'b0}; let data = dataArray.sub(idx); memReqQ.enq(MemReq{op: St, addr: addr, data: data}); end status <= SendFillReq; endrule Ready -> StartMiss -> SendFillReq -> WaitFillResp -> Ready rule sendFillReq (status == SendFillReq); memReqQ.enq(missReq); status <= WaitFillResp; endrule November 3, 2014 http://csg.csail.mit.edu/6.175

Wait-fill rule Ready -> StartMiss -> SendFillReq -> WaitFillResp -> Ready rule waitFillResp(status == WaitFillResp); let idx = getIdx(missReq.addr); let tag = getTag(missReq.addr); let data = memRespQ.first; tagArray.upd(idx, Valid (tag)); if(missReq.op == Ld) begin dirtyArray.upd(idx,False);dataArray.upd(idx, data); hitQ.enq(data[wOffset]); end else begin data[wOffset] = missReq.data; dirtyArray.upd(idx,True); dataArray.upd(idx, data); end memRespQ.deq; status <= Ready; endrule Is there a problem with waitFill? What if the hitQ is blocked? Should we not at least write it in the cache? November 3, 2014 http://csg.csail.mit.edu/6.175

Hit and miss performance Combinational read/write, i.e. 0-cycle response Requires req and resp methods to be concurrently schedulable, which in turn requires hitQ.enq < {hitQ.deq, hitQ.first} i.e., hitQ should be a bypass Fifo Miss No evacuation: memory load latency plus combinational read/write Evacuation: memory store followed by memory load latency plus combinational read/write Adding an extra cycle here and there in the miss case should not have a big negative performance impact November 3, 2014 http://csg.csail.mit.edu/6.175

Non-blocking cache cache req req proc mReq mReqQ Processor cache resp mResp Out-of-Order responses mRespQ Requests have to be tagged because responses come out-of-order (OOO) We will assume that all tags are unique and the processor is responsible for reusing tags properly November 3, 2014 http://csg.csail.mit.edu/6.175

Non-blocking Cache Behavior to be described by 2 concurrent FSMs to process input requests and memory responses, respectively St req goes into StQ and waits until data can be written into the cache req hitQ resp V D W Tag Data St Q Ld Buff load reqs waiting for data An extra bit in the cache to indicate if the data for a line is present wbQ mRespQ November 3, 2014 http://csg.csail.mit.edu/6.175 mReqQ 14

Incoming req Type of request st ld cache state V? In StQ? yes no yes bypass hit Cache state V? StQ empty? yes no yes no hit Cache W? yes no Write in cache Put in StQ Cache state W? Put in LdBuf Put in LdBuf send memReq set W If (evacuate) send wbResp yes no Put in StQ Put in StQ send memReq set W If (evacuate) send wbResp November 3, 2014 http://csg.csail.mit.edu/6.175

Mem Resp (line) 1. Update cache line (set V and unset W) 2. Process all matching ldBuff entries and send responses 3. L: If cachestate(oldest StQ entry address) = V then update the cache word with StQ entry; remove the oldest entry; Loop back to L else if cachestate(oldest StQ entry address) = !W then if(evacuate) wbResp; memReq for this store entry; set W November 3, 2014 http://csg.csail.mit.edu/6.175