Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology January 19, 2012L9-1

Slides:

Advertisements

Similar presentations

Multi-cycle Implementations Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology January 13, 2012L6-1

Advertisements

Data Hazards and Multistage Pipeline Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology January 18, 2012L8-1.

Computer Architecture: A Constructive Approach Instruction Representation Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.

Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology October 13, 2009http://csg.csail.mit.edu/koreaL12-1.

Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1

Interrupts / Exceptions / Faults Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 30, 2012L21-1

Constructive Computer Architecture Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.

Realistic Memories and Caches Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology March 21, 2012L13-1

Caches and in-order pipelines Arvind (with Asif Khan) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology May 11, 2012L24-1.

Realistic Memories and Caches – Part II Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 2, 2012L14-1.

Constructive Computer Architecture: Branch Prediction Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.

Constructive Computer Architecture Virtual Memory and Interrupts Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.

Elastic Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 28, 2011L08-1http://csg.csail.mit.edu/6.375.

Computer Architecture: A Constructive Approach Branch Prediction - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of.

Computer Architecture: A Constructive Approach Next Address Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts.

Constructive Computer Architecture: Control Hazards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.

1 Tutorial: Lab 4 Again Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.

Constructive Computer Architecture Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.

Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 8,

October 22, 2009http://csg.csail.mit.edu/korea Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.

Non-blocking Caches Arvind (with Asif Khan) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology May 14, 2012L25-1

Realistic Memories and Caches – Part III Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 4, 2012L15-1.

Constructive Computer Architecture Store Buffers and Non-blocking Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.

Computer Architecture: A Constructive Approach Multi-Cycle and 2 Stage Pipelined SMIPS Implementations Teacher: Yoav Etsion Taken (with permission) from.

6.375 Tutorial 4 RISC-V and Final Projects Ming Liu March 4, 2016http://csg.csail.mit.edu/6.375T04-1.

October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.

Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 1, 2010

Caches-2 Constructive Computer Architecture Arvind

Address – 32 bits WRITE Write Cache Write Main Byte Offset Tag Index Valid Tag Data 16K entries 16.

Realistic Memories and Caches

Constructive Computer Architecture Tutorial 6 Cache & Exception

6.175: Constructive Computer Architecture Tutorial 5 Epochs, Debugging, and Caches Quan Nguyen (Troubled by the two biggest problems in computer science…

Bluespec-3: A non-pipelined processor Arvind

Control Hazards Constructive Computer Architecture: Arvind

Realistic Memories and Caches

Bluespec-6: Modeling Processors

Branch Prediction Constructive Computer Architecture: Arvind

Branch Prediction Constructive Computer Architecture: Arvind

Caches-2 Constructive Computer Architecture Arvind

Multistage Pipelined Processors and modular refinement

in Pipelined Processors

Non-Pipelined Processors - 2

Realistic Memories and Caches

Modular Refinement Arvind

Modular Refinement Arvind

Non-Pipelined and Pipelined Processors

in Pipelined Processors

Multi-cycle SMIPS Implementations

Control Hazards Constructive Computer Architecture: Arvind

Branch Prediction Constructive Computer Architecture: Arvind

Bypassing Computer Architecture: A Constructive Approach Joel Emer

Multistage Pipelined Processors and modular refinement

Modular Refinement Arvind

Realistic Memories and Caches

Bluespec-3: A non-pipelined processor Arvind

Branch Prediction: Direction Predictors

Caches-2 Constructive Computer Architecture Arvind

Modular Refinement - 2 Arvind

Caches and store buffers

Multistage Pipelined Processors and modular refinement

Pipelined Processors Arvind

Pipelined Processors Constructive Computer Architecture: Arvind

Modeling Processors Arvind

Modular Refinement Arvind

Control Hazards Constructive Computer Architecture: Arvind

Modular Refinement Arvind

Realistic Memories and Caches

Branch Predictor Interface

Caches-2 Constructive Computer Architecture Arvind

Presentation transcript:

Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology January 19, 2012L9-1

Three-Stage SMIPS PC Inst Memory Decode Register File Execute Data Memory +4 fr Epoch wbr stall? The use of magic memories makes this design unrealistic January 19, 2012 L9-2http://csg.csail.mit.edu/SNU

A Simple Memory Model Reads and writes are always completed in one cycle a Read can be done any time (i.e. combinational) If enabled, a Write is performed at the rising clock edge (the write address and data must be stable at the clock edge) MAGIC RAM ReadData WriteData Address WriteEnable Clock In a real DRAM the data will be available several cycles after the address is supplied January 19, 2012 L9-3http://csg.csail.mit.edu/SNU

Memory Hierarchy size: RegFile << SRAM << DRAM why? latency:RegFile << SRAM << DRAM why? bandwidth:on-chip >> off-chip why? On a data access: hit (data  fast memory)  low latency access miss (data  fast memory)  long latency access (DRAM) Small, Fast Memory SRAM CPU RegFile Big, Slow Memory DRAM AB holds frequently used data January 19, 2012 L9-4http://csg.csail.mit.edu/SNU

Plan What do simple caches look like Incorporating caches in processor pipleline January 19, 2012 L9-5http://csg.csail.mit.edu/SNU

Data Cache - Interface interface DCache; method Action req(MemReq r); method ActionValue#(MemResp) resp; method ActionValue#(MemReq) memReq; method Action memResp(MemResp r); endinterface cache req guard resp guard memReq guard memResp guard Processor DRAM hitQ mReqQ mRespQ missReq January 19, 2012 L9-6http://csg.csail.mit.edu/SNU

Direct-Mapped Cache Tag Data Block V = Offset TagIndex t k b t HIT Data Word or Byte 2 k lines Block number Block offset What is a bad reference pattern? Strided at size of cache req address January 19, 2012 L9-7http://csg.csail.mit.edu/SNU

Data Cache – code structure module mkDCache(DCache); ---state declarations; Vector#(Rows, Reg#(Bool)) vArray <- replicateM(mkReg(False)); … rule doMiss … endrule; method Action req(MemReq r) … endmethod; method ActionValue#(MemResp) resp … endmethod; method ActionValue#(MemReq) memReq … endmethod; method Action memResp(MemResp r) … endmethod; endmodule January 19, 2012 L9-8http://csg.csail.mit.edu/SNU

Data Cache state declarations Vector#(Rows, Reg#(Bool)) vArray <- replicateM(mkReg(False)); Vector#(Rows, Reg#(Tag)) tagArray <- replicateM(mkRegU); Vector#(Rows, Reg#(Data)) dataArray <- replicateM(mkRegU); FIFOF#(MemReq) mReqQ <- mkUGFIFOF1; FIFOF#(MemResp) mRespQ <- mkUGFIFOF1; PipeReg#(MemReq) hitQ <- mkPipeReg; Reg#(MemReq) missReq <- mkRegU; Reg#(Bit#(2)) status <- mkReg(0); January 19, 2012 L9-9http://csg.csail.mit.edu/SNU

Data Cache processor-side methods method Action req(MemReq req) if (status==0); Index idx = truncate(req.addr>>2); Tag tag = truncateLSB(req.addr); Bool valid = vArray[idx]; Bool tagMatch = tagArray[idx]==tag; if(valid && tagMatch && hitQ.notFull) hitQ.enq(req); else begin missReq <= req; status <= 1; end endmethod method ActionValue#(MemResp) resp if(hitQ.notEmpty && status==0); hitQ.deq; let r = hitQ.first; Index idx = truncate(r.addr>>2); if(r.op==St) dataArray[idx] <= r.data; return dataArray[idx]; endmethod January 19, 2012 L9-10http://csg.csail.mit.edu/SNU

Data Cache memory-side methods method ActionValue#(MemReq) memReq if (mReqQ.notEmpty); mReqQ.deq; return mReqQ.first; endmethod method Action memResp(MemResp res) if (mRespQ.notFull); mRespQ.enq(res); endmethod January 19, 2012 L9-11http://csg.csail.mit.edu/SNU

Data Cache Rule to process a cache-miss rule doMiss (status!=0); Index idx = truncate(missReq.addr>>2); if(status==1 && mReqQ.notFull) begin if(vArray[idx]) mReqQ.enq( MemReq{op:St, addr:{tagArray[idx],idx,2'b00}, data:dataArray[idx]}); status <= 2; end if(status==2 && mReqQ.notFull && (!vArray[idx] || mRespQ.notEmpty)) begin if(vArray[idx]) mRespQ.deq; mReqQ.enq(MemReq{op:Ld, addr:missReq.addr, data:?}); status <= 3; end January 19, 2012 L9-12http://csg.csail.mit.edu/SNU

Data Cache Rule to process a cache-miss rule doMiss (status!=0); … if(status==3 && mRespQ.notEmpty && hitQ.notFull) begin let data = mRespQ.first; mRespQ.deq; Tag tag = truncateLSB(missReq.addr); vArray[idx] <= True; tagArray[idx] <= tag; dataArray[idx] <= data; hitQ.enq(missReq); status <= 0; end endrule January 19, 2012 L9-13http://csg.csail.mit.edu/SNU

Five-Stage SMIPS PC Inst Memory Decode Register File Execute Data Memory +4 fr Epoch wbr stall? dr er In this organization memory can take any amount of time January 19, 2012 L9-14http://csg.csail.mit.edu/SNU

Five-Stage SMIPS state elements module mkProc(Proc); Reg#(Addr) pc <- mkRegU; Reg#(Bool) epoch <- mkRegU; RFile rf <- mkRFile; Memory mem <- mkTwoPortedMemory; let iMem = mem.iport; let dMem = mem.dport; PipeReg#(FBundle) fr <- mkPipeReg; PipeReg#(DBundle) dr <- mkPipeReg; PipeReg#(EBundle) er <- mkPipeReg; PipeReg#(WBBundle) wbr <- mkPipeReg; rule doProc; … January 19, 2012 L9-15http://csg.csail.mit.edu/SNU

Five-Stage SMIPS instruction fetch rule doProc; Bool iAcc = False; if(fr.notFull && iMem.notFull) begin iMem.req(MemReq{op:Ld, addr:pc, data:?}); iAcc = True; fr.enq(FBundle{pc:pc, epoch:epoch}); end enque instruction fetch request if the request can not be enqued then we must remember not to change the pc to pc+4 (iAcc) January 19, 2012 L9-16http://csg.csail.mit.edu/SNU

if(fr.notEmpty && dr.notFull && iMem.notEmpty) begin let dInst = decode(iMem.resp); dr.enq(DBundle{pc:fr.first.pc, epoch:fr.first.epoch, dInst:dInst}); fr.deq; iMem.deq; end Five-Stage SMIPS decode decode the fetched instruction January 19, 2012 L9-17http://csg.csail.mit.edu/SNU

Addr redirPc = ?; Bool redirPCvalid = False; if(dr.notEmpty && er.notFull && (!memType(dr.first.dInst.iType) || dMem.notFull)) begin if(fr.first.epoch==epoch) begin Bool eStall = … Bool wbStall = … if(!eStall && !wbStall) begin let eInst = exec(dInst, …); if(memType(eInst.iType)) dMem.req(…); if(eInst.brTaken) begin redirPC … end; er.enq(EBundle{…}; dr.deq; end else dr.deq; end Five-Stage SMIPS code structure for killing wrongly fetched instructions kill successful January 19, 2012 L9-18http://csg.csail.mit.edu/SNU

Addr redirPc = ?; Bool redirPCvalid = False; if(dr.notEmpty && er.notFull && (!memType(dr.first.dInst.iType) || dMem.notFull)) begin if(fr.first.epoch==epoch) begin let dInst = dr.first.dInst; Bool eStall = er.notEmpty && er.first.rDstValid && ((dInst.rSrc1Valid && dInst.rSrc1==er.first.rDst) || (dInst.rSrc2Valid && dInst.rSrc2==er.first.rDst)); Bool wbStall = wbr.notEmpty && wbr.first.rDstValid && ((dInst.rSrc1Valid && dInst.rSrc1==wbr.first.rDst) || (dInst.rSrc2Valid && dInst.rSrc2==wbr.first.rDst)); Five-Stage SMIPS stall signal January 19, 2012 L9-19http://csg.csail.mit.edu/SNU

if(!eStall && !wbStall) begin Data rVal1 = rf.rd1(dInst.rSrc1); Data rVal2 = rf.rd2(dInst.rSrc2); let eInst = exec(dInst, rVal1, rVal2, dr.first.pc); if(memType(eInst.iType)) dMem.req(MemReq{op:eInst.iType==Ld ? Ld : St, addr:eInst.addr, data:eInst.data}); if(eInst.brTaken) begin redirPC = eInst.addr; redirPCvalid = True; end er.enq(EBundle{iType:eInst.iType, rDst:eInst.rDst, data:eInst.data}); dr.deq; end end else dr.deq; end Five-Stage SMIPS Execute if not stall successful execution January 19, 2012 L9-20http://csg.csail.mit.edu/SNU

if(er.notEmpty && wbr.notFull && (!memType(er.first.iType) || dMem.notEmpty)) begin wbr.enq(WBBundle{iType:er.first.iType, rDst:er.first.rDst, data:er.first.iType==Ld ? dMem.resp : er.first.data}); er.deq; if(dMem.notEmpty) dMem.deq; end Five-Stage SMIPS execute and memory responses to WB January 19, 2012 L9-21http://csg.csail.mit.edu/SNU

if(wbr.notEmpty) begin if(regWriteType(wbr.first.iType)) rf.wr(wbr.first.rDst, wbr.first.data); wbr.deq; end pc <= redirPCvalid ? redirPC : iAcc ? pc + 4 : pc; epoch <= redirPCvalid ? !epoch : epoch; endrule endmodule Five-Stage SMIPS writeback January 19, 2012 L9-22http://csg.csail.mit.edu/SNU

Summary Lot of room for making errors  verification and testing is essential Memory systems or dealing with load latencies is a major aspect of computer design The 5-stage design presented here is different from H&P and is much more realistic next Branch prediction January 19, 2012 L9-23http://csg.csail.mit.edu/SNU