Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology January 19, 2012L9-1
Three-Stage SMIPS PC Inst Memory Decode Register File Execute Data Memory +4 fr Epoch wbr stall? The use of magic memories makes this design unrealistic January 19, 2012 L9-2http://csg.csail.mit.edu/SNU
A Simple Memory Model Reads and writes are always completed in one cycle a Read can be done any time (i.e. combinational) If enabled, a Write is performed at the rising clock edge (the write address and data must be stable at the clock edge) MAGIC RAM ReadData WriteData Address WriteEnable Clock In a real DRAM the data will be available several cycles after the address is supplied January 19, 2012 L9-3http://csg.csail.mit.edu/SNU
Memory Hierarchy size: RegFile << SRAM << DRAM why? latency:RegFile << SRAM << DRAM why? bandwidth:on-chip >> off-chip why? On a data access: hit (data fast memory) low latency access miss (data fast memory) long latency access (DRAM) Small, Fast Memory SRAM CPU RegFile Big, Slow Memory DRAM AB holds frequently used data January 19, 2012 L9-4http://csg.csail.mit.edu/SNU
Plan What do simple caches look like Incorporating caches in processor pipleline January 19, 2012 L9-5http://csg.csail.mit.edu/SNU
Data Cache - Interface interface DCache; method Action req(MemReq r); method ActionValue#(MemResp) resp; method ActionValue#(MemReq) memReq; method Action memResp(MemResp r); endinterface cache req guard resp guard memReq guard memResp guard Processor DRAM hitQ mReqQ mRespQ missReq January 19, 2012 L9-6http://csg.csail.mit.edu/SNU
Direct-Mapped Cache Tag Data Block V = Offset TagIndex t k b t HIT Data Word or Byte 2 k lines Block number Block offset What is a bad reference pattern? Strided at size of cache req address January 19, 2012 L9-7http://csg.csail.mit.edu/SNU
Data Cache – code structure module mkDCache(DCache); ---state declarations; Vector#(Rows, Reg#(Bool)) vArray <- replicateM(mkReg(False)); … rule doMiss … endrule; method Action req(MemReq r) … endmethod; method ActionValue#(MemResp) resp … endmethod; method ActionValue#(MemReq) memReq … endmethod; method Action memResp(MemResp r) … endmethod; endmodule January 19, 2012 L9-8http://csg.csail.mit.edu/SNU
Data Cache state declarations Vector#(Rows, Reg#(Bool)) vArray <- replicateM(mkReg(False)); Vector#(Rows, Reg#(Tag)) tagArray <- replicateM(mkRegU); Vector#(Rows, Reg#(Data)) dataArray <- replicateM(mkRegU); FIFOF#(MemReq) mReqQ <- mkUGFIFOF1; FIFOF#(MemResp) mRespQ <- mkUGFIFOF1; PipeReg#(MemReq) hitQ <- mkPipeReg; Reg#(MemReq) missReq <- mkRegU; Reg#(Bit#(2)) status <- mkReg(0); January 19, 2012 L9-9http://csg.csail.mit.edu/SNU
Data Cache processor-side methods method Action req(MemReq req) if (status==0); Index idx = truncate(req.addr>>2); Tag tag = truncateLSB(req.addr); Bool valid = vArray[idx]; Bool tagMatch = tagArray[idx]==tag; if(valid && tagMatch && hitQ.notFull) hitQ.enq(req); else begin missReq <= req; status <= 1; end endmethod method ActionValue#(MemResp) resp if(hitQ.notEmpty && status==0); hitQ.deq; let r = hitQ.first; Index idx = truncate(r.addr>>2); if(r.op==St) dataArray[idx] <= r.data; return dataArray[idx]; endmethod January 19, 2012 L9-10http://csg.csail.mit.edu/SNU
Data Cache memory-side methods method ActionValue#(MemReq) memReq if (mReqQ.notEmpty); mReqQ.deq; return mReqQ.first; endmethod method Action memResp(MemResp res) if (mRespQ.notFull); mRespQ.enq(res); endmethod January 19, 2012 L9-11http://csg.csail.mit.edu/SNU
Data Cache Rule to process a cache-miss rule doMiss (status!=0); Index idx = truncate(missReq.addr>>2); if(status==1 && mReqQ.notFull) begin if(vArray[idx]) mReqQ.enq( MemReq{op:St, addr:{tagArray[idx],idx,2'b00}, data:dataArray[idx]}); status <= 2; end if(status==2 && mReqQ.notFull && (!vArray[idx] || mRespQ.notEmpty)) begin if(vArray[idx]) mRespQ.deq; mReqQ.enq(MemReq{op:Ld, addr:missReq.addr, data:?}); status <= 3; end January 19, 2012 L9-12http://csg.csail.mit.edu/SNU
Data Cache Rule to process a cache-miss rule doMiss (status!=0); … if(status==3 && mRespQ.notEmpty && hitQ.notFull) begin let data = mRespQ.first; mRespQ.deq; Tag tag = truncateLSB(missReq.addr); vArray[idx] <= True; tagArray[idx] <= tag; dataArray[idx] <= data; hitQ.enq(missReq); status <= 0; end endrule January 19, 2012 L9-13http://csg.csail.mit.edu/SNU
Five-Stage SMIPS PC Inst Memory Decode Register File Execute Data Memory +4 fr Epoch wbr stall? dr er In this organization memory can take any amount of time January 19, 2012 L9-14http://csg.csail.mit.edu/SNU
Five-Stage SMIPS state elements module mkProc(Proc); Reg#(Addr) pc <- mkRegU; Reg#(Bool) epoch <- mkRegU; RFile rf <- mkRFile; Memory mem <- mkTwoPortedMemory; let iMem = mem.iport; let dMem = mem.dport; PipeReg#(FBundle) fr <- mkPipeReg; PipeReg#(DBundle) dr <- mkPipeReg; PipeReg#(EBundle) er <- mkPipeReg; PipeReg#(WBBundle) wbr <- mkPipeReg; rule doProc; … January 19, 2012 L9-15http://csg.csail.mit.edu/SNU
Five-Stage SMIPS instruction fetch rule doProc; Bool iAcc = False; if(fr.notFull && iMem.notFull) begin iMem.req(MemReq{op:Ld, addr:pc, data:?}); iAcc = True; fr.enq(FBundle{pc:pc, epoch:epoch}); end enque instruction fetch request if the request can not be enqued then we must remember not to change the pc to pc+4 (iAcc) January 19, 2012 L9-16http://csg.csail.mit.edu/SNU
if(fr.notEmpty && dr.notFull && iMem.notEmpty) begin let dInst = decode(iMem.resp); dr.enq(DBundle{pc:fr.first.pc, epoch:fr.first.epoch, dInst:dInst}); fr.deq; iMem.deq; end Five-Stage SMIPS decode decode the fetched instruction January 19, 2012 L9-17http://csg.csail.mit.edu/SNU
Addr redirPc = ?; Bool redirPCvalid = False; if(dr.notEmpty && er.notFull && (!memType(dr.first.dInst.iType) || dMem.notFull)) begin if(fr.first.epoch==epoch) begin Bool eStall = … Bool wbStall = … if(!eStall && !wbStall) begin let eInst = exec(dInst, …); if(memType(eInst.iType)) dMem.req(…); if(eInst.brTaken) begin redirPC … end; er.enq(EBundle{…}; dr.deq; end else dr.deq; end Five-Stage SMIPS code structure for killing wrongly fetched instructions kill successful January 19, 2012 L9-18http://csg.csail.mit.edu/SNU
Addr redirPc = ?; Bool redirPCvalid = False; if(dr.notEmpty && er.notFull && (!memType(dr.first.dInst.iType) || dMem.notFull)) begin if(fr.first.epoch==epoch) begin let dInst = dr.first.dInst; Bool eStall = er.notEmpty && er.first.rDstValid && ((dInst.rSrc1Valid && dInst.rSrc1==er.first.rDst) || (dInst.rSrc2Valid && dInst.rSrc2==er.first.rDst)); Bool wbStall = wbr.notEmpty && wbr.first.rDstValid && ((dInst.rSrc1Valid && dInst.rSrc1==wbr.first.rDst) || (dInst.rSrc2Valid && dInst.rSrc2==wbr.first.rDst)); Five-Stage SMIPS stall signal January 19, 2012 L9-19http://csg.csail.mit.edu/SNU
if(!eStall && !wbStall) begin Data rVal1 = rf.rd1(dInst.rSrc1); Data rVal2 = rf.rd2(dInst.rSrc2); let eInst = exec(dInst, rVal1, rVal2, dr.first.pc); if(memType(eInst.iType)) dMem.req(MemReq{op:eInst.iType==Ld ? Ld : St, addr:eInst.addr, data:eInst.data}); if(eInst.brTaken) begin redirPC = eInst.addr; redirPCvalid = True; end er.enq(EBundle{iType:eInst.iType, rDst:eInst.rDst, data:eInst.data}); dr.deq; end end else dr.deq; end Five-Stage SMIPS Execute if not stall successful execution January 19, 2012 L9-20http://csg.csail.mit.edu/SNU
if(er.notEmpty && wbr.notFull && (!memType(er.first.iType) || dMem.notEmpty)) begin wbr.enq(WBBundle{iType:er.first.iType, rDst:er.first.rDst, data:er.first.iType==Ld ? dMem.resp : er.first.data}); er.deq; if(dMem.notEmpty) dMem.deq; end Five-Stage SMIPS execute and memory responses to WB January 19, 2012 L9-21http://csg.csail.mit.edu/SNU
if(wbr.notEmpty) begin if(regWriteType(wbr.first.iType)) rf.wr(wbr.first.rDst, wbr.first.data); wbr.deq; end pc <= redirPCvalid ? redirPC : iAcc ? pc + 4 : pc; epoch <= redirPCvalid ? !epoch : epoch; endrule endmodule Five-Stage SMIPS writeback January 19, 2012 L9-22http://csg.csail.mit.edu/SNU
Summary Lot of room for making errors verification and testing is essential Memory systems or dealing with load latencies is a major aspect of computer design The 5-stage design presented here is different from H&P and is much more realistic next Branch prediction January 19, 2012 L9-23http://csg.csail.mit.edu/SNU