Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology January 19, 2012 L9-1


1 Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology January 19, 2012 L9-1 http://csg.csail.mit.edu/SNU

2 Three-Stage SMIPS
[Figure: pipeline datapath with PC, +4, Epoch, Inst Memory, Decode, Register File, Execute, Data Memory, the pipeline registers fr and wbr, and a stall? signal.]
The use of magic memories makes this design unrealistic.

3 A Simple Memory Model
Reads and writes are always completed in one cycle. A read can be done at any time (i.e., it is combinational). If enabled, a write is performed at the rising clock edge (the write address and data must be stable at the clock edge).
[Figure: MAGIC RAM block with ports Address, WriteData, WriteEnable, Clock, and ReadData.]
In a real DRAM, the data will be available several cycles after the address is supplied.
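The read/write timing of the magic memory can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `MagicRam` and its method names are hypothetical and do not come from the slides, and one call to `clock_edge()` stands in for one rising clock edge.

```python
class MagicRam:
    """Software model of a single-cycle memory (hypothetical helper, not from the slides)."""

    def __init__(self, size_words):
        self.mem = [0] * size_words
        self._pending = None  # write that will be committed at the next clock edge

    def read(self, addr):
        # Combinational read: returns whatever is stored right now.
        return self.mem[addr]

    def set_write(self, addr, data, enable=True):
        # Present address, data and WriteEnable; nothing is stored yet.
        self._pending = (addr, data) if enable else None

    def clock_edge(self):
        # Rising clock edge: commit the write if one was enabled.
        if self._pending is not None:
            addr, data = self._pending
            self.mem[addr] = data
            self._pending = None


ram = MagicRam(16)
ram.set_write(3, 42)
before = ram.read(3)  # the write has not happened yet
ram.clock_edge()
after = ram.read(3)   # the write is visible after the edge
```

A model of a real DRAM would instead return read data several calls to `clock_edge()` after the address is supplied, which is the gap the rest of the lecture addresses.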

4 Memory Hierarchy
[Figure: CPU (RegFile), Small, Fast Memory (SRAM, holds frequently used data), and Big, Slow Memory (DRAM), with links labeled A and B.]
size: RegFile << SRAM << DRAM (why?)
latency: RegFile << SRAM << DRAM (why?)
bandwidth: on-chip >> off-chip (why?)
On a data access:
hit (data ∈ fast memory) ⇒ low-latency access
miss (data ∉ fast memory) ⇒ long-latency access (DRAM)

5 Plan
What do simple caches look like?
Incorporating caches in the processor pipeline

6 Data Cache - Interface
  interface DCache;
    method Action req(MemReq r);
    method ActionValue#(MemResp) resp;
    method ActionValue#(MemReq) memReq;
    method Action memResp(MemResp r);
  endinterface
[Figure: the cache sits between the Processor and DRAM. The processor side uses req and resp, the DRAM side uses memReq and memResp, and each method has a guard. Internal state: hitQ, mReqQ, mRespQ, missReq.]

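7 Direct-Mapped Cache
[Figure: the req address is split into Tag (t bits), Index (k bits), and Offset (b bits); Tag and Index form the block number, and Offset is the block offset. The index selects one of 2^k lines, each holding a valid bit V, a Tag, and a Data Block. The stored tag is compared (=) with the address tag to produce HIT, and the offset selects the data word or byte.]
What is a bad reference pattern? Strided at size of cache.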
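The tag/index/offset split and the "strided at size of cache" pattern can be sketched in a few lines of Python. The bit widths `K_BITS` and `B_BITS` and the helpers `split` and `count_misses` are assumptions chosen for illustration; the slide leaves t, k, and b symbolic.

```python
K_BITS = 4  # index bits -> 2**4 = 16 lines (assumed for illustration)
B_BITS = 2  # offset bits -> 4-byte blocks (assumed for illustration)


def split(addr):
    """Split an address into (tag, index, offset)."""
    offset = addr & ((1 << B_BITS) - 1)
    index = (addr >> B_BITS) & ((1 << K_BITS) - 1)
    tag = addr >> (B_BITS + K_BITS)
    return tag, index, offset


def count_misses(addrs):
    """Count misses of a direct-mapped cache on a sequence of addresses."""
    lines = {}  # index -> stored tag; a missing key means the line is invalid
    misses = 0
    for a in addrs:
        tag, index, _ = split(a)
        if lines.get(index) != tag:
            misses += 1
            lines[index] = tag  # fill the line, evicting the previous block
    return misses


CACHE_BYTES = (1 << K_BITS) * (1 << B_BITS)  # 64 bytes with the widths above

# Two addresses exactly one cache size apart share an index but differ in tag,
# so alternating between them evicts the line on every access.
strided = [0, CACHE_BYTES] * 4
# Two neighbouring words use different lines and hit after the first touch.
sequential = [0, 4] * 4
```

With these widths, `count_misses(strided)` reports a miss on all eight accesses, while `count_misses(sequential)` reports only the two cold misses.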

8 Data Cache - code structure
  module mkDCache(DCache);
    ---state declarations;
    Vector#(Rows, Reg#(Bool)) vArray <- replicateM(mkReg(False));
    …
    rule doMiss … endrule;
    method Action req(MemReq r) … endmethod;
    method ActionValue#(MemResp) resp … endmethod;
    method ActionValue#(MemReq) memReq … endmethod;
    method Action memResp(MemResp r) … endmethod;
  endmodule

9 Data Cache state declarations
  Vector#(Rows, Reg#(Bool)) vArray <- replicateM(mkReg(False));
  Vector#(Rows, Reg#(Tag)) tagArray <- replicateM(mkRegU);
  Vector#(Rows, Reg#(Data)) dataArray <- replicateM(mkRegU);
  FIFOF#(MemReq) mReqQ <- mkUGFIFOF1;
  FIFOF#(MemResp) mRespQ <- mkUGFIFOF1;
  PipeReg#(MemReq) hitQ <- mkPipeReg;
  Reg#(MemReq) missReq <- mkRegU;
  Reg#(Bit#(2)) status <- mkReg(0);

10 Data Cache processor-side methods
  method Action req(MemReq req) if (status==0);
    Index idx = truncate(req.addr>>2);   // drop the 2 byte-offset bits, keep the index bits
    Tag tag = truncateLSB(req.addr);     // the tag is the most significant bits
    Bool valid = vArray[idx];
    Bool tagMatch = tagArray[idx]==tag;
    if(valid && tagMatch && hitQ.notFull) hitQ.enq(req);
    else begin missReq <= req; status <= 1; end   // hand the request to rule doMiss
  endmethod

  method ActionValue#(MemResp) resp if(hitQ.notEmpty && status==0);
    hitQ.deq;
    let r = hitQ.first;
    Index idx = truncate(r.addr>>2);
    if(r.op==St) dataArray[idx] <= r.data;
    return dataArray[idx];
  endmethod

11 Data Cache memory-side methods
  method ActionValue#(MemReq) memReq if (mReqQ.notEmpty);
    mReqQ.deq;
    return mReqQ.first;
  endmethod

  method Action memResp(MemResp res) if (mRespQ.notFull);
    mRespQ.enq(res);
  endmethod

12 Data Cache rule to process a cache miss
  rule doMiss (status!=0);
    Index idx = truncate(missReq.addr>>2);
    // status 1: if the line is valid, send its contents back to memory
    if(status==1 && mReqQ.notFull) begin
      if(vArray[idx])
        mReqQ.enq(MemReq{op:St, addr:{tagArray[idx],idx,2'b00}, data:dataArray[idx]});
      status <= 2;
    end
    // status 2: once the write-back response (if any) has arrived, request the missing word
    if(status==2 && mReqQ.notFull && (!vArray[idx] || mRespQ.notEmpty)) begin
      if(vArray[idx]) mRespQ.deq;
      mReqQ.enq(MemReq{op:Ld, addr:missReq.addr, data:?});
      status <= 3;
    end
    (continued on the next slide)

13 Data Cache rule to process a cache miss (continued)
  rule doMiss (status!=0);
    …
    // status 3: fill the line with the loaded word and replay the request as a hit
    if(status==3 && mRespQ.notEmpty && hitQ.notFull) begin
      let data = mRespQ.first;
      mRespQ.deq;
      Tag tag = truncateLSB(missReq.addr);
      vArray[idx] <= True;
      tagArray[idx] <= tag;
      dataArray[idx] <= data;
      hitQ.enq(missReq);
      status <= 0;
    end
  endrule
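The hit check in req and the three steps of doMiss can be traced with a Python model that collapses the multi-cycle handshake into one call. It is a sketch under several assumptions that are not in the slides: `ROWS = 4`, the tag is every address bit above the index, DRAM is a dictionary in which unwritten addresses read as 0, and the names `DCacheModel`, `access`, and `log` are hypothetical. Queues, guards, and timing are deliberately not modeled.

```python
ROWS = 4      # assumed number of cache lines (one word per line)
IDX_BITS = 2  # log2(ROWS)


class DCacheModel:
    """Behavioural model of the slides' data cache (hypothetical helper)."""

    def __init__(self, memory):
        self.memory = memory  # dict: address -> word, stands in for DRAM
        self.valid = [False] * ROWS
        self.tag = [0] * ROWS
        self.data = [0] * ROWS
        self.log = []         # memory requests in the order they are issued

    def _split(self, addr):
        idx = (addr >> 2) & (ROWS - 1)  # like truncate(addr >> 2)
        tag = addr >> (2 + IDX_BITS)    # like truncateLSB(addr), by assumption
        return tag, idx

    def access(self, op, addr, data=None):
        tag, idx = self._split(addr)
        if not (self.valid[idx] and self.tag[idx] == tag):
            # status 1: a valid line is always written back (there is no dirty bit)
            if self.valid[idx]:
                old_addr = (self.tag[idx] << (2 + IDX_BITS)) | (idx << 2)
                self.memory[old_addr] = self.data[idx]
                self.log.append(("St", old_addr))
            # status 2 and 3: load the missing word and fill the line
            self.log.append(("Ld", addr))
            self.valid[idx] = True
            self.tag[idx] = tag
            self.data[idx] = self.memory.get(addr, 0)
        # resp: the value read is the one held before a store updates the line
        old = self.data[idx]
        if op == "St":
            self.data[idx] = data
        return old


dram = {0x00: 11, 0x10: 22}
cache = DCacheModel(dram)
r1 = cache.access("Ld", 0x00)      # miss: fill line 0
r2 = cache.access("St", 0x00, 99)  # hit: only the cached copy changes
r3 = cache.access("Ld", 0x10)      # same index, new tag: write back, then fill
```

After the three accesses, `cache.log` holds a load of 0x00, a store to 0x00, and a load of 0x10, and `dram[0x00]` is 99, which mirrors the write-back-then-refill order of the rule.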

14 Five-Stage SMIPS
[Figure: pipeline datapath with PC, +4, Epoch, Inst Memory, Decode, Register File, Execute, Data Memory, the pipeline registers fr, dr, er, and wbr, and a stall? signal.]
In this organization, memory can take any amount of time.

15 Five-Stage SMIPS state elements
  module mkProc(Proc);
    Reg#(Addr) pc <- mkRegU;
    Reg#(Bool) epoch <- mkRegU;
    RFile rf <- mkRFile;
    Memory mem <- mkTwoPortedMemory;
    let iMem = mem.iport;
    let dMem = mem.dport;
    PipeReg#(FBundle) fr <- mkPipeReg;
    PipeReg#(DBundle) dr <- mkPipeReg;
    PipeReg#(EBundle) er <- mkPipeReg;
    PipeReg#(WBBundle) wbr <- mkPipeReg;
    rule doProc;
    …

16 Five-Stage SMIPS instruction fetch
  rule doProc;
    Bool iAcc = False;
    if(fr.notFull && iMem.notFull) begin
      iMem.req(MemReq{op:Ld, addr:pc, data:?});
      iAcc = True;
      fr.enq(FBundle{pc:pc, epoch:epoch});
    end
Enqueue the instruction fetch request. If the request cannot be enqueued, we must remember not to change the pc to pc+4 (iAcc).

17 Five-Stage SMIPS decode
    if(fr.notEmpty && dr.notFull && iMem.notEmpty) begin
      let dInst = decode(iMem.resp);
      dr.enq(DBundle{pc:fr.first.pc, epoch:fr.first.epoch, dInst:dInst});
      fr.deq;
      iMem.deq;
    end
Decode the fetched instruction.

18 Five-Stage SMIPS code structure for killing wrongly fetched instructions
    Addr redirPC = ?;
    Bool redirPCvalid = False;
    if(dr.notEmpty && er.notFull && (!memType(dr.first.dInst.iType) || dMem.notFull)) begin
      if(dr.first.epoch==epoch) begin
        Bool eStall = …
        Bool wbStall = …
        if(!eStall && !wbStall) begin
          let eInst = exec(dInst, …);
          if(memType(eInst.iType)) dMem.req(…);
          if(eInst.brTaken) begin redirPC … end;
          er.enq(EBundle{…});
          dr.deq;   // successful
        end
      end
      else dr.deq;   // kill
    end

19 Five-Stage SMIPS stall signal
    Addr redirPC = ?;
    Bool redirPCvalid = False;
    if(dr.notEmpty && er.notFull && (!memType(dr.first.dInst.iType) || dMem.notFull)) begin
      if(dr.first.epoch==epoch) begin
        let dInst = dr.first.dInst;
        // stall if a source register is the destination of the instruction in er
        Bool eStall = er.notEmpty && er.first.rDstValid &&
          ((dInst.rSrc1Valid && dInst.rSrc1==er.first.rDst) ||
           (dInst.rSrc2Valid && dInst.rSrc2==er.first.rDst));
        // stall if a source register is the destination of the instruction in wbr
        Bool wbStall = wbr.notEmpty && wbr.first.rDstValid &&
          ((dInst.rSrc1Valid && dInst.rSrc1==wbr.first.rDst) ||
           (dInst.rSrc2Valid && dInst.rSrc2==wbr.first.rDst));
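The stall condition is the same test applied to two pipeline registers, which a short Python function makes explicit. This is an illustrative sketch: the dataclasses `Decoded` and `InFlight` and the function `must_stall` are hypothetical names, and an empty pipeline register is represented by `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decoded:
    """Source-register fields of the instruction waiting in dr."""
    rSrc1Valid: bool
    rSrc1: int
    rSrc2Valid: bool
    rSrc2: int


@dataclass
class InFlight:
    """Destination-register fields of an instruction held in er or wbr."""
    rDstValid: bool
    rDst: int


def must_stall(d_inst: Decoded, stage: Optional[InFlight]) -> bool:
    """True if d_inst reads a register that the older instruction will write."""
    if stage is None or not stage.rDstValid:
        return False  # empty register, or the older instruction writes nothing
    return ((d_inst.rSrc1Valid and d_inst.rSrc1 == stage.rDst) or
            (d_inst.rSrc2Valid and d_inst.rSrc2 == stage.rDst))


inst = Decoded(rSrc1Valid=True, rSrc1=1, rSrc2Valid=True, rSrc2=2)
e_stall = must_stall(inst, InFlight(rDstValid=True, rDst=2))   # er writes r2
wb_stall = must_stall(inst, InFlight(rDstValid=True, rDst=3))  # wbr writes r3
empty = must_stall(inst, None)                                 # nothing in flight
```

In the slide's terms, the instruction executes only when both calls, one for er and one for wbr, return False.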

20 Five-Stage SMIPS execute if not stalled
        if(!eStall && !wbStall) begin
          Data rVal1 = rf.rd1(dInst.rSrc1);
          Data rVal2 = rf.rd2(dInst.rSrc2);
          let eInst = exec(dInst, rVal1, rVal2, dr.first.pc);
          if(memType(eInst.iType))
            dMem.req(MemReq{op:eInst.iType==Ld ? Ld : St, addr:eInst.addr, data:eInst.data});
          if(eInst.brTaken) begin
            redirPC = eInst.addr;
            redirPCvalid = True;
          end
          er.enq(EBundle{iType:eInst.iType, rDst:eInst.rDst, data:eInst.data});
          dr.deq;   // successful execution
        end
      end
      else dr.deq;
    end

21 Five-Stage SMIPS execute and memory responses to WB
    if(er.notEmpty && wbr.notFull && (!memType(er.first.iType) || dMem.notEmpty)) begin
      wbr.enq(WBBundle{iType:er.first.iType, rDst:er.first.rDst,
                       data:er.first.iType==Ld ? dMem.resp : er.first.data});
      er.deq;
      if(dMem.notEmpty) dMem.deq;
    end

22 Five-Stage SMIPS writeback
    if(wbr.notEmpty) begin
      if(regWriteType(wbr.first.iType)) rf.wr(wbr.first.rDst, wbr.first.data);
      wbr.deq;
    end
    pc <= redirPCvalid ? redirPC : iAcc ? pc + 4 : pc;
    epoch <= redirPCvalid ? !epoch : epoch;
  endrule
  endmodule
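The final pc and epoch updates encode a priority: a redirect from execute wins, then a successful fetch advances the pc, otherwise the pc holds. The Python sketch below restates that priority; `next_state` and `is_killed` are hypothetical helper names, and the epoch is modeled as a Python bool.

```python
def next_state(pc, epoch, redir_valid, redir_pc, i_acc):
    """Return (next_pc, next_epoch) following the slide's two assignments."""
    if redir_valid:
        # Taken branch: jump to the target and flip the epoch so that
        # instructions fetched down the wrong path can be recognised.
        return redir_pc, not epoch
    if i_acc:
        # The fetch request was accepted this cycle: move to the next instruction.
        return pc + 4, epoch
    # The fetch could not be enqueued: retry the same pc next cycle.
    return pc, epoch


def is_killed(inst_epoch, current_epoch):
    """An instruction tagged with a stale epoch is dropped at execute."""
    return inst_epoch != current_epoch


sequential = next_state(100, False, redir_valid=False, redir_pc=0, i_acc=True)
redirected = next_state(100, False, redir_valid=True, redir_pc=200, i_acc=True)
held = next_state(100, True, redir_valid=False, redir_pc=0, i_acc=False)
```

An instruction fetched with epoch False before the redirect above would satisfy `is_killed(False, redirected[1])` and be discarded without executing.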

23 Summary
There is a lot of room for making errors ⇒ verification and testing are essential.
Memory systems, and dealing with load latencies in particular, are a major aspect of computer design.
The 5-stage design presented here is different from H&P and is much more realistic.
Next: branch prediction

