Modular Refinement Arvind

Slides:



Advertisements
Similar presentations
Constructive Computer Architecture: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.
Advertisements

Constructive Computer Architecture: Branch Prediction: Direction Predictors Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.
Data Hazards and Multistage Pipeline Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology January 18, 2012L8-1.
Computer Architecture: A Constructive Approach Branch Prediction - 1 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of.
Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology October 13, 2009http://csg.csail.mit.edu/koreaL12-1.
Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1
Constructive Computer Architecture: Branch Prediction: Direction Predictors Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.
Interrupts / Exceptions / Faults Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 30, 2012L21-1
Constructive Computer Architecture Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Realistic Memories and Caches Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology March 21, 2012L13-1
Constructive Computer Architecture: Branch Prediction Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.
Caches and in-order pipelines Arvind (with Asif Khan) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology May 11, 2012L24-1.
Constructive Computer Architecture: Guards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September 24, 2014.
Constructive Computer Architecture: Branch Prediction Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.
Constructive Computer Architecture Virtual Memory and Interrupts Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Elastic Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 28, 2011L08-1http://csg.csail.mit.edu/6.375.
Computer Architecture: A Constructive Approach Branch Prediction - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of.
Computer Architecture: A Constructive Approach Next Address Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts.
Constructive Computer Architecture: Control Hazards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.
1 Tutorial: Lab 4 Again Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 8,
October 22, 2009http://csg.csail.mit.edu/korea Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Constructive Computer Architecture Tutorial 6: Five Details of SMIPS Implementations Andy Wright 6.S195 TA October 7, 2013http://csg.csail.mit.edu/6.s195T05-1.
Computer Architecture: A Constructive Approach Multi-Cycle and 2 Stage Pipelined SMIPS Implementations Teacher: Yoav Etsion Taken (with permission) from.
Computer Architecture: A Constructive Approach Data Hazards and Multistage Pipelines Teacher: Yoav Etsion Taken (with permission) from Arvind et al.*,
October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.
Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 1, 2010
Elastic Pipelines: Concurrency Issues
Caches-2 Constructive Computer Architecture Arvind
Control Hazards Constructive Computer Architecture: Arvind
Bluespec-6: Modeling Processors
Bluespec-6: Modules and Interfaces
Scheduling Constraints on Interface methods
Tutorial 7: SMIPS Epochs Constructive Computer Architecture
Constructive Computer Architecture Tutorial 6: Discussion for lab6
Branch Prediction Constructive Computer Architecture: Arvind
Branch Prediction Constructive Computer Architecture: Arvind
Caches-2 Constructive Computer Architecture Arvind
Multistage Pipelined Processors and modular refinement
in Pipelined Processors
Non-Pipelined Processors - 2
Modular Refinement Arvind
Modular Refinement Arvind
Constructive Computer Architecture Tutorial 5 Epoch & Branch Predictor
Lab 4 Overview: 6-stage SMIPS Pipeline
Non-Pipelined and Pipelined Processors
in Pipelined Processors
Control Hazards Constructive Computer Architecture: Arvind
Branch Prediction Constructive Computer Architecture: Arvind
6.375 Tutorial 5 RISC-V 6-Stage with Caches Ming Liu
Multistage Pipelined Processors and modular refinement
Modular Refinement Arvind
Realistic Memories and Caches
Branch Prediction: Direction Predictors
Modular Refinement - 2 Arvind
Multistage Pipelined Processors and modular refinement
in Pipelined Processors
Pipelined Processors Arvind
Control Hazards Constructive Computer Architecture: Arvind
Pipelined Processors Constructive Computer Architecture: Arvind
Tutorial 4: RISCV modules Constructive Computer Architecture
Modeling Processors Arvind
Modeling Processors Arvind
Modular Refinement Arvind
Control Hazards Constructive Computer Architecture: Arvind
Tutorial 7: SMIPS Labs and Epochs Constructive Computer Architecture
Branch Predictor Interface
Bluespec-8: Modules and Interfaces
Pipelined Processors: Control Hazards
Presentation transcript:

Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 18, 2013 http://csg.csail.mit.edu/6.375

Successive refinement & Modular Structure fetch execute iMem rf CPU decode memory pc write- back dMem fetch & decode execute pc rf CPU bu Can we derive a multi-stage pipeline by successive refinement of a 2-stage pipeline? March 18, 2013 http://csg.csail.mit.edu/6.375

Architectural refinements Separating Fetch and Decode Replace magic memory by multicycle memory Multicycle functional units … Nirav Dave, M.C. Ng, M. Pellauer, Arvind [Memocode 2010] A design flow based on modular refinement March 18, 2013 http://csg.csail.mit.edu/6.375

A 2-stage Processor Pipeline doFetch doExecute d2e redirect Register File Scoreboard remove search insert wr rd1 rd2 Suppose we encapsulate each rule into its own module and restrict the communication between modules only via FIFOs This will require us to include the rf and sb into some module. If rf and sb are included in the Fetch module then Execute must send some extra information to Fetch to update the rf and sb March 18, 2013 http://csg.csail.mit.edu/6.375

A modular structure Fetch & Decode Execute d2e redirect rf sb pc fEpoch iMem dMem wbQ eEpoch We have encapsulated iMem within the Fetch and dMem within the Execute modules Execute instead of updating rf and sb, sends the relevant information to Fetch via wbQ Atomicity requirements for updating rf and sb? An instruction should be deleted from sb only after rf has been updated March 18, 2013 http://csg.csail.mit.edu/6.375

Modular Processor module mkModularProc(Proc); IMemory iMem <- mkIMemory; DMemory dMem <- mkDMemory; Fifo#(Decode2Execute) d2e <- mkPipelineFifo; Fifo#(Addr) execRedirect <- mkBypassFifo; Fifo#(Tuple2#(Maybe#(FullIndx),Data) wbQ<-mkBypassFifo; Fetch fetch <- mkFetch(iMem, d2e, execRedirect, wbQ); Execute execute <- mkExecute(dMem, d2e, execRedirect, wbQ); endmodule Remember that an address is enqued in execRedirect whenever there is a misprediction. We will assume the eEpoch has been modified on a misprediction and fEpoch would be modified when pc is updated. March 18, 2013 http://csg.csail.mit.edu/6.375 6

Fetch Module module mkFetch(Imemory iMem, Fifo#(2, Decode2Execute) d2e, Fifo#(1, Addr) execRedirect, Fifo#(1, Tuple2#(Maybe#(FullIndx), Data)) wbQ); Reg#(Addr) pc <- mkRegU; Reg#(Bool) fEpoch <- mkReg(False); RFile rf <- mkBypassRFile; Scoreboard#(1) sb <- mkPipelineScoreboard; rule writeback; match {.idx, .val} = wbQ.first; if(isValid(idx)) rf.write(fromMaybe(?, idx), val); wbQ.deq; sb.remove; endrule rule fetch ; if(execRedirect.notEmpty) begin .... endrule March 18, 2013 http://csg.csail.mit.edu/6.375

Fetch Module continued rule fetch ; if(execRedirect.notEmpty) begin fEpoch <= !fEpoch; pc <= execRedirect.first; execRedirect.deq; end else begin let inst = iMem.req(pc); let dInst = decode(inst); let ppc = nextAddrPredictor(pc); let stall = sb.search1(dInst.src1)|| sb.search2(dInst.src2); || sb.search3(dInst.dst); if(!stall) begin let rVal1 = rf.rd1(validRegValue(dInst.src1)); let rVal2 = rf.rd2(validRegValue(dInst.src2)); d2e.enq(Decode2Execute{pc: pc, ppc: ppc, dIinst: dInst, epoch: fEpoch, rVal1: rVal1, rVal2: rVal2}); sb.insert(dInst.dst); pc <= ppc; end end endrule March 18, 2013 http://csg.csail.mit.edu/6.375

Execute Module module mkExecute(Dmemory dMem, Fifo#(2, Decode2Execute) d2e, Fifo#(1, Addr) execRedirect, Fifo#(1, Tuple2#(Maybe#(FullIndx), Data)) wbQ); Reg#(Bool) eEpoch <- mkReg(False); rule doExecute; let x=d2e.first; let dInst=x.dInst; let pc=x.pc; let ppc=x.ppc; let epoch = x.epoch; let rVal1 = x.rVal1; let rVal2 = x.rVal2; if(epoch == eEpoch) begin let eInst = exec(dInst, rVal1, rVal2, pc, ppc); if(eInst.iType == Ld) eInst.data <- dMem.req(MemReq{op:Ld, addr:eInst.addr, data:?}); else if (eInst.iType == St) let d <- dMem.req(MemReq{op:St, addr:eInst.addr, data:eInst.data}); wbQ.enq(tuple2(eInst.dst, eInst.data); if(eInst.mispredict) begin execRedirect.enq(eInst.addr); eEpoch <= !eEpoch; end end else wbQ.enq(tuple2(Invalid, ?)); d2e.deq; endrule March 18, 2013 http://csg.csail.mit.edu/6.375

Interface issues For better safety only partial interfaces should to be passed to a module, e.g., Fetch module needs only deq and first methods of execRedirect and wbQ, and enq method of d2e interface FifoEnq#(t); method Action enq(t x); endinterface interface FifoDeq#(t); method Action deq; method t first; endinterface function FifoEnq#(t) getFifoEnq(Fifo#(n, t) f); return interface FifoEnq#(t); method Action enq(t x) = f.enq(x); endinterface endfunction function FifoDeq#(t) getFifoDeq(Fifo#(n, t) f); return interface FifoDeq#(t); method Action deq=f.deq; method t first=f.first; endinterface endfunction module mkModularProc(Proc); Fetch fetch <- mkFetch(iMem, getFifoEnq(d2e), getFifoDeq(execRedirect), getFifoDeq(wbQ)); March 18, 2013 http://csg.csail.mit.edu/6.375

Observation The way we modified the 2-stage code for modularization is very similar to what we did to go from 2-stage to 3-stage code, except that we used pipeline fifo’s instead of bypass fifo’s to communicate from execute to commit stage March 18, 2013 http://csg.csail.mit.edu/6.375

Modular refinement: Separating Fetch and Decode iMem rf decode pc March 18, 2013 http://csg.csail.mit.edu/6.375

Fetch Module refinement module mkFetch(...iMem, ...d2e, ...execRedirect, ...wbQ); Reg#(Addr) pc <- mkRegU; Reg#(Bool) fEpoch <- mkReg(False); RFile rf <- mkBypassRFile; Scoreboard#(1) sb <- mkPipelineScoreboard; Fifo#(Fetch2Decode) f2d <- mkPipelineFifo; rule writeback; match {.idx, .val} = wbQ.first; if(isValid(idx)) rf.write(fromMaybe(?, idx), val); wbQ.deq; sb.remove; endrule rule fetch ; .... endrule rule decode ; .... endrule March 18, 2013 http://csg.csail.mit.edu/6.375

Fetch Module: Fetch rule rule fetch ; if(execRedirect.notEmpty) begin fEpoch <= !fEpoch; pc <= execRedirect.first; execRedirect.deq; end else begin let inst = iMem.req(pc); let ppc = nextAddrPredictor(pc); f2d.enq(Fetch2Decode{pc: pc, ppc: ppc, inst: inst, epoch: fEpoch); pc <= ppc end endrule March 18, 2013 http://csg.csail.mit.edu/6.375

Fetch Module: Decode rule rule decode ; let x = f2d.first; let inst = x.inst; let inPC = x.pc; let ppc = x.ppc let inEp = x.epoch let dInst = decode(inst); let stall = sb.search1(dInst.src1)|| sb.search2(dInst.src2); || sb.search3(dInst.dst); if(!stall) begin let rVal1 = rf.rd1(validRegValue(dInst.src1)); let rVal2 = rf.rd2(validRegValue(dInst.src2)); d2e.enq(Decode2Execute{pc: inPC, ppc: ppc, dIinst: dInst, epoch: inEp; rVal1: rVal1, rVal2: rVal2}); sb.insert(dInst.dst); f2d.deq end endrule March 18, 2013 http://csg.csail.mit.edu/6.375

Concurrency Can writeback, fetch and decode rules execute concurrently? decode < fetch because of pipelined d2e writeback CF fetch writeback < decode because bypass rf and pipeline sb Interaction between writeback and decode is subtle. What is the issue? Depending upon the choice of d2e, execRedirect and wbQ FIFOs, either Fetch&Decode CF Execute Execute < Fetch&Decode Fetch&Decode < Execute (non-pipelined system) March 18, 2013 http://csg.csail.mit.edu/6.375

Separate refinement Notice our refine Fetch&Decode module should work correctly with the old Execute module or its refinements This is a very important aspect of modular refinements March 18, 2013 http://csg.csail.mit.edu/6.375

Modular refinement: Replace magic memory by multicycle memory pc fetch2 fetch1 iMem March 18, 2013 http://csg.csail.mit.edu/6.375

Memory and Caches Suppose iMem is replaced by a cache which takes 0 or 1 cycle in case of a hit and unknown number of variable cycles on a cache miss View iMem as a request/response system and split the fetch stage into two rules – to send a req and to receive a res PC iMem Decode f2d Epoch Next Addr Pred March 18, 2013 http://csg.csail.mit.edu/6.375

Splitting the fetch stage To split the fetch stage into two rules, insert a bypass FIFO’s to deal with (0,n) cycle memory response PC iMem Decode f2d Epoch Next Addr Pred f12f2 March 18, 2013 http://csg.csail.mit.edu/6.375

Fetch Module: 2nd refinement PC iMem f2d Epoch Pred f12f2 Fetch Module: 2nd refinement module mkFetch(...iMem,...d2e,...execRedirect,...wbQ); Reg#(Addr) pc <- mkRegU; Reg#(Bool) fEpoch <- mkReg(False); RFile rf <- mkBypassRFile; Scoreboard#(1) sb <- mkPipelineScoreboard; Fifo#(Fetch2Decode) f2d <- mkPipelineFifo; Fifo#(Fetch2Decode) f12f2 <- mkBypassFifo; rule writeback; ... endrule rule fetch1 ; .... endrule rule fetch2 ; .... endrule rule decode ; .... endrule March 18, 2013 http://csg.csail.mit.edu/6.375

Fetch Module: Fetch1 rule PC iMem f2d Epoch Pred f12f2 Fetch Module: Fetch1 rule rule fetch1 ; if(execRedirect.notEmpty) begin fEpoch <= !fEpoch; pc <= execRedirect.first; execRedirect.deq; end else begin let ppc = nextAddrPredictor(pc); pc <= ppc; iCache.req(MemReq{op: Ld, addr: pc, data:?}); f12f2.enq(Fetch2Decoode{pc: pc, ppc: ppc, inst: ?, epoch: fEpoch}); end endrule March 18, 2013 http://csg.csail.mit.edu/6.375

Fetch Module: Fetch2 rule PC iMem f2d Epoch Pred f12f2 Fetch Module: Fetch2 rule rule doFetch2; let inst <- iCache.resp; let x = f12f2.first; x.inst = inst; f12f2.deq; f2d.enq(x); endrule March 18, 2013 http://csg.csail.mit.edu/6.375

Takeaway Modular refinement is a powerful idea; lets different teams work on different modules with only an early implementation of other modules BSV compiler currently does not permit separate compilation of modules with interface parameters Recursive call structure amongst modules is supported by the compiler in a limited way. The syntax is complicated Compiler detects and rejects recursive call structures March 18, 2013 http://csg.csail.mit.edu/6.375