Modular Refinement - 2 Arvind

Slides:



Advertisements
Similar presentations
Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology October 13, 2009http://csg.csail.mit.edu/koreaL12-1.
Advertisements

Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1
Computer Architecture: A Constructive Approach Branch Direction Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab.
March, 2007http://csg.csail.mit.edu/arvindIPlookup-1 IP Lookup Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
September 24, L08-1 IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab.
IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L06-1.
IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 4, 2013
Pipelining combinational circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February 20, 2013http://csg.csail.mit.edu/6.375L05-1.
Realistic Memories and Caches Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology March 21, 2012L13-1
1 Tutorial: Lab 4 Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Constructive Computer Architecture: Guards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September 24, 2014.
September 22, 2009http://csg.csail.mit.edu/koreaL07-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab.
Elastic Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 28, 2011L08-1http://csg.csail.mit.edu/6.375.
Computer Architecture: A Constructive Approach Next Address Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts.
Constructive Computer Architecture: Control Hazards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.
February 20, 2009http://csg.csail.mit.edu/6.375L08-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
1 Tutorial: Lab 4 Again Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 8,
October 22, 2009http://csg.csail.mit.edu/korea Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Realistic Memories and Caches – Part III Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 4, 2012L15-1.
October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.
Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 1, 2010
Elastic Pipelines: Concurrency Issues
Bluespec-3: A non-pipelined processor Arvind
Control Hazards Constructive Computer Architecture: Arvind
Bluespec-6: Modeling Processors
Bluespec-6: Modules and Interfaces
Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind
Sequential Circuits: Constructive Computer Architecture
Caches-2 Constructive Computer Architecture Arvind
Multistage Pipelined Processors and modular refinement
Performance Specifications
Pipelining combinational circuits
Constructive Computer Architecture: Guards
Pipelining combinational circuits
Modular Refinement Arvind
Modular Refinement Arvind
Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind
Bluespec-7: Scheduling & Rule Composition
Control Hazards Constructive Computer Architecture: Arvind
Modeling Processors: Concurrency Issues
Modules with Guarded Interfaces
Pipelining combinational circuits
Bypassing Computer Architecture: A Constructive Approach Joel Emer
Multistage Pipelined Processors and modular refinement
Modular Refinement Arvind
Elastic Pipelines: Concurrency Issues
Bluespec-3: A non-pipelined processor Arvind
Caches-2 Constructive Computer Architecture Arvind
IP Lookup: Some subtle concurrency issues
Elastic Pipelines: Concurrency Issues
Pipelined Processors Arvind
Elastic Pipelines and Basics of Multi-rule Systems
Bluespec-5: Modeling Processors
Constructive Computer Architecture: Guards
Elastic Pipelines and Basics of Multi-rule Systems
Bluespec-7: Scheduling & Rule Composition
Control Hazards Constructive Computer Architecture: Arvind
Pipelined Processors Constructive Computer Architecture: Arvind
Simple Synchronous Pipelines
IP Lookup: Some subtle concurrency issues
Tutorial 4: RISCV modules Constructive Computer Architecture
Modeling Processors Arvind
Modeling Processors Arvind
Modular Refinement Arvind
Control Hazards Constructive Computer Architecture: Arvind
Modular Refinement Arvind
Caches-2 Constructive Computer Architecture Arvind
Bluespec-8: Modules and Interfaces
Presentation transcript:

Modular Refinement - 2 Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 25, 2009 http://csg.csail.mit.edu/6.375

Successive refinement & Modular Structure fetch execute iMem rf CPU decode memory pc write- back dMem fetch & decode execute pc rf CPU bu Can we derive the 5-stage pipeline by successive refinement of a 2-stage pipeline? February 25, 2009 http://csg.csail.mit.edu/6.375

A Modular organization CPU iMem RFile rf FIFO bu dMem modules call each other (recursive) execute WB fetch & decode setPc pc stall enqIt Suppose we include rf and pc in Fetch and bu in Execute Fetch delivers decoded instructions to Execute and needs to consult Execute for the stall condition Execute writes back data in rf and supplies the pc value in case of a branch misprediction February 25, 2009 http://csg.csail.mit.edu/6.375

Recursive modular organization module mkCPU2#(Mem iMem, Mem dMem)(); Execute execute <- mkExecute(dMem, fetch); Fetch fetch <- mkFetch(iMem, execute); endmodule interface Fetch; method Action setPC (Iaddress cpc); method Action writeback (RName dst, Value v); endinterface interface Execute; method Action enqIt(InstTemplate it); method Bool stall(Instr instr) recursive calls This first method describe what happens in the fetch of this pipeline. This methods says that given any processor term, we can always rewrite it to a new term wehre, and we enqueue the current instuction into bf. And at the same time the pc field is incremende by one It may look like this method can fire forever, but because bf is bounded, when a method enqueue into bf, there is an implied predicate that say bf must not be full. Unfortunately, recursive module syntax is not as simple February 25, 2009 http://csg.csail.mit.edu/6.375 8 8 6 8

Fetch Module module mkFetch#(IMem iMem, Execute execute) (Fetch); Instr instr = iMem.read(pc); Iaddress predIa = pc + 1; Reg#(Iaddress) pc <- mkReg(0); RegFile#(RName, Bit#(32)) rf <- mkBypassRegFile(); rule fetch_and_decode (!execute.stall(instr)); execute.enqIt(newIt(instr,rf)); pc <= predIa; endrule method Action writeback(RName rd, Value v); rf.upd(rd,v); endmethod method Action setPC(Iaddress newPC); pc <= newPC; endmodule February 25, 2009 http://csg.csail.mit.edu/6.375

Execute Module module mkExecute#(DMem dMem, Fetch fetch) (Execute); SFIFO#(InstTemplate) bu <- mkSLoopyFifo(findf); InstTemplate it = bu.first; rule execute … method Action enqIt(InstTemplate it); bu.enq(it); endmethod method Bool stall(Instr instr); return (stallFunc(instr, bu)); endmodule February 25, 2009 http://csg.csail.mit.edu/6.375

Execute Module Rule rule execute (True); case (it) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin fetch.writeback(rd, va+vb); bu.deq(); end tagged EBz {cond:.cv,addr:.av}: if (cv == 0) then begin fetch.setPC(av); bu.clear(); end else bu.deq(); tagged ELoad{dst:.rd,addr:.av}: begin fetch.writeback(rd, dMem.read(av)); bu.deq(); end tagged EStore{value:.vv,addr:.av}: begin dMem.write(av, vv); bu.deq(); endcase endrule February 25, 2009 http://csg.csail.mit.edu/6.375

Fetch Module Refinement Separating Fetch and Decode module mkFetch#(IMem iMem, Execute execute) (Fetch); FIFO#(Instr) buD <- mkLoopyFIFO(); Instr instr = iMem.read(pc); Iaddress predIa = pc + 1; rule fetch(True); pc <= predIa; buD.enq(instr); endrule rule decode(!execute.stall(buD.first())); execute.enqIt(newIt(buD.first(),rf)); buD.deq(); method Action writeback(RName rd, Value v); rf.upd(rd,v); endmethod method Action setPC(Iaddress newPC); pc <= newPC; buD.clear(); endmodule February 25, 2009 http://csg.csail.mit.edu/6.375

Execute Module Refinement module mkExecute#(DMem dMem, Fetch fetch) (Execute); SFIFO#(InstTemplate) bu <- mkSLoopyFifo(findf); InstTemplate it = bu.first; rule execute … method Action enqIt(InstTemplate it); bu.enq(it); endmethod method Bool stall(Instr instr); return (stallFunc(instr, bu)); endmodule Introduce two more queues and break this rule into execute, memory operation and commit rules February 25, 2009 http://csg.csail.mit.edu/6.375

Execute Module Refinement module mkExecute#(DMem dMem, Fetch fetch) (Execute); SFIFO#(InstTemplate) buE <- mkSLoopyFifo(findfE); SFIFO#(InstTemplateM) buM <- mkSLoopyFIFO(findfM); SFIFO#(InstTemplateC) buC <- mkSLoopyFIFO(findfC); rule execute … rule memory … rule commit … method Action enqIt(InstTemplate it); buE.enq(it); endmethod method Bool stall(Instr instr); return(stallFuncE(instr, buE) && stallFuncM(instr, buM) && stallFuncC(instr, buC)); endmodule February 25, 2009 http://csg.csail.mit.edu/6.375

New Types for Instruction Templates typedef union tagged { struct {RName dst; Value val;} MWB; void MNop; struct {RName dst; IAddress addr;} MLoad; struct {IAddress addr; Value val} MStore; } InstTemplateM deriving(Bits, Eq); struct {RName dst; Value val;} CWB; void CNop; InstTemplateC deriving(Bits, Eq); February 25, 2009 http://csg.csail.mit.edu/6.375

The New Execute Rule Does not do mem-ops and writebacks rule execute(True); case (buE.first()) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin buM.enq(tagged MWB{dst:rd, val:va+vb}); buE.deq();end tagged EBz {cond:0,addr:.av}: begin fetch.setPC(av); buE.clear(); buM.enq(tagged MNop{});end tagged EBz {cond:.cv,addr:.av}: begin buM.enq(tagged MNop{}); buE.deq(); end tagged ELoad{dst:.rd,addr:.av}: begin buM.enq(tagged MLoad{dst:rd,addr:av}); tagged EStore{value:.vv,addr:.av}: begin buM.enq(tagged MStore{addr:av,val:vv}); endcase endrule February 25, 2009 http://csg.csail.mit.edu/6.375

The Memory Rule rule memory(True); case (buM.first()) matches tagged MWB{dst:.rd,val:.v}: begin buC.enq(tagged CWB{dst:rd, val:v}); buM.deq(); end tagged MNop {}: begin buC.enq(tagged CNop); tagged MLoad{dst:.rd,addr:.av}: begin buC.enq(tagged CWB{dst:rd, val:dMem.read(av)}); buM.deq();end tagged MStore{addr:.av, val:vv}}: begin buC.enq(tagged CNop{}); dMem.write(av,vv); buM.deq();end endcase endrule February 25, 2009 http://csg.csail.mit.edu/6.375

The Commit Rule rule commit(True); case (buC.first()) matches tagged CWB{dst:.rd,val:.v}: fetch.writeback(rd,v); tagged CNop {}: noAction; endcase buC.deq(); endrule February 25, 2009 http://csg.csail.mit.edu/6.375

Stall Functions Same as before, save variation in types in Queue. February 25, 2009 http://csg.csail.mit.edu/6.375

Synchronous Refinements pc rf fetch decode execute memory write- back iMem dMem CPU Realistic processor Pipelines often have synchronous pieces connected asynchronously, i.e., using FIFOs February 25, 2009 http://csg.csail.mit.edu/6.375

Sync. Execute Module module mkExecute#(DMem dMem, Fetch fetch) (Execute); SFIFO#(InstTemplate) buE <- mkSLoopyFifo(findf); Reg#(Maybe#(InstTemplateM) buRegM <- mkReg(Invalid); Reg#(Maybe#(InstTemplateC) buRegC <- mkReg(Invalid); rule executePipe … method Action enqIt(InstTemplate it); buE.enq(it); endmethod method Bool stall(Instr instr); return(stallFuncE(instr, buE) && stallFuncM(instr, buRegM) && stallFuncC(instr, buRegC)); endmodule February 25, 2009 http://csg.csail.mit.edu/6.375

Sync. Execute Pipeline From Lecture 5 rule sync-pipeline (True); if (inQ.notEmpty()) begin sReg1 <= Valid f1(inQ.first()); inQ.deq(); end else sReg1 <= Invalid; case (sReg1) matches tagged Valid .sx1: sReg2 <= Valid f2(sx1); tagged Invalid: sReg2 <= Invalid; endcase case (sReg2) matches tagged Valid .sx2: outQ.enq(f3(sx2)); endcase endrule Use the following Mapping: inQ -> buE sReg1 -> buRegM sReg2 -> buRegC outQ.enq -> fetch.writeback February 25, 2009 http://csg.csail.mit.edu/6.375

Sync. Execute Pipeline rule executePipe (True); if (buE.notEmpty()) begin buRegM <= Valid f1(buE.first()); buE.deq(); end else buRegM <= Invalid; case (buRegM) matches tagged Valid .sx1: buRegC <= Valid f2(sx1); tagged Invalid: buRegC <= Invalid; endcase case (buRegC) matches tagged Valid .sx2: fetch.writeback(f3(sx2)); endcase endrule Exec Action Mem Action Commit Action In Processor pipelines, the stage functions require additional side effects (actions) e.g. setPC/dMem.write/etc. For clarity, we will show these actions one by one February 25, 2009 http://csg.csail.mit.edu/6.375

Execute Action “Extra effects” if (buE.notEmpty()) begin buRegM <= Valid f1(buE.first()); buE.deq(); end else buRegM <= Invalid; case (buE.first()) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin buRegM <= Valid MWB{dst:rd, val:va+vb}; buE.deq();end tagged EBz {cond:0,addr:.av}: begin fetch.setPC(av); buE.clear(); buRegM <= Valid MNop{};end tagged EBz {cond:.cv,addr:.av}: begin buRegM <= Valid MNop{}; buE.deq(); end tagged ELoad{dst:.rd,addr:.av}: begin buRegM <= Valid MLoad{dst:rd,addr:av}; buE.deq();end tagged EStore{value:.vv,addr:.av}: begin buRegM <= Valid MStore{addr:av,val:vv}; buE.deq();end endcase “Extra effects” February 25, 2009 http://csg.csail.mit.edu/6.375

Memory Action case (itM) matches tagged MWB{dst:.rd,val:.v}: case (buRegM) matches tagged Valid .itM: buRegC <= Valid f2(itM); tagged Invalid: buRegC <= Invalid; case (itM) matches tagged MWB{dst:.rd,val:.v}: buRegC <= tagged CWB{dst:rd, val:v}; tagged MNop {}: buRegC <= tagged CNop {}; tagged MLoad{dst:.rd,addr:.av}: buRegC <= tagged CWB{dst:rd, val:dMem.read(av)}; tagged MStore{addr:.av, val:vv}}: begin buRegC <= tagged CNop{}; dMem.write(av,vv);end endcase “Extra effects” February 25, 2009 http://csg.csail.mit.edu/6.375

Commit Action case (itC) matches tagged CWB{dst:.rd,val:.v}: case (buRegC) matches tagged Valid .itC: fetch.writeback(f3(itC)); endcase case (itC) matches tagged CWB{dst:.rd,val:.v}: fetch.writeback(rd,v); tagged CNop {}: noAction; endcase February 25, 2009 http://csg.csail.mit.edu/6.375

Sync. Execute Pipeline Execute Action; Memory Action; Commit Action; rule executePipe (True); if (buE.notEmpty()) begin buRegM <= Valid f1(buE.first()); buE.deq(); end else buRegM <= Invalid; case (buRegM) matches tagged Valid .sx1: buRegC <= Valid f2(sx1); tagged Invalid: buRegC <= Invalid; endcase case (buRegC) matches tagged Valid .sx2: fetch.writeback(f3(sx2)); endcase endrule Execute Action; Memory Action; Commit Action; yes February 25, 2009 http://csg.csail.mit.edu/6.375

Refining the Memory Subsystem Non-combinational memory reads February 25, 2009 http://csg.csail.mit.edu/6.375

Realistic memory systems No combinational reads: The request to read the memory is a different event from the event when the response comes back The response to the request should be treated as an asynchronous event because its time can vary (cache misses) Request and response events or actions have to be dealt in different rules iMem.read(pc) or dMem.read(addr) are no longer read methods – rules using them have to be split up February 25, 2009 http://csg.csail.mit.edu/6.375

Request-Response Interface for Synchronous Memory Data Ack Ready req deq peek Addr Ready ctr (ctr > 0) ctr++ ctr-- deq Enable enq Synch Mem Latency N interface Mem#(type addrT, type dataT); method Action req(addrT x); method Action deq(); method dataT peek(); endinterface Making a synchronous component latency- insensitive February 25, 2009 http://csg.csail.mit.edu/6.375

Request-Response Memory Interface interface Mem#(type addrT, type dataT); method Action req(MemReq#(addrT,dataT) x); method Action deq(); method dataT peek(); endinterface typedef union tagged { struct {addrT addr; dataT val;} Write; struct {addrT addr;} Read; } MemReq#(type addrT, type dataT) deriving(Bits, Eq); We will assume that the memory processes read and write requests in the order received February 25, 2009 http://csg.csail.mit.edu/6.375

Non-combinational reads dMem.read(a) execute write-back dMem execute dMem.req(a) dMem.peek/deq write-back dMem execute dMem.req(a) write-back dMem.peek/deq dMem February 25, 2009 http://csg.csail.mit.edu/6.375

The Modified Execute Rule rule execute(True); case (buE.first()) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin buC.enq(tagged CWB{dst:rd, val:va+vb}); buE.deq();end tagged EBz {cond:0,addr:.av}: begin fetch.setPC(av); buE.clear(); buC.enq(tagged CNop{});end tagged EBz {cond:.cv,addr:.av}: begin buC.enq(tagged CNop{}); buE.deq(); end tagged ELoad{dst:.rd,addr:.av}: begin buC.enq(tagged CLoad{dst:rd}); dMem.req(Read{addr:av}); tagged EStore{value:.vv,addr:.av}: dMem.req(Write{addr:av, val:vv}); endcase endrule February 25, 2009 http://csg.csail.mit.edu/6.375

The modified Commit Rule rule commit(True); case (buC.first()) matches tagged CWB{dst:.rd,val:.v}: fetch.writeback(rd,v); tagged CNop {}: noAction; tagged CLoad{dst:.rd}: begin fetch.writeback(rd,dMem.peek()); dMem.deq(); end endcase buC.deq(); endrule February 25, 2009 http://csg.csail.mit.edu/6.375

Fetch Module Are any changes needed in the methods? module mkFetch#(IMem iMem, Execute execute) (Fetch); FIFO#(void) buD <- mkLoopyFIFO(); Instr instr = iMem.peek(); Iaddress predIa = pc + 1; … rule fetch(True); pc <= predIa; iMem.req(Read{addr:pc}); buD.enq(void); endrule rule decode (!execute.stall(instr)); execute.enqIt(newIt(instr,rf)); iMem.deq(); buD.deq(); method Action setPC … method Action writeback … endmodule Are any changes needed in the methods? February 25, 2009 http://csg.csail.mit.edu/6.375

Fetch Module Refinement module mkFetch#(IMem iMem, Execute execute) (Fetch); FIFO#(void) buD <- mkFIFO(); … Iaddress predIa = pc + 1; method Action writeback(RName rd, Value v); rf.upd(rd,v); endmethod method Action setPC(Iaddress newPC); pc <= newPC; buD.clear(); endmodule Does this work? Must kill or drain the instruction fetches already in progress February 25, 2009 http://csg.csail.mit.edu/6.375

Fetch Module module mkFetch#(IMem iMem, Execute execute) (Fetch) Reg#(Bool) drainingIMem <- mkReg(False); … rule fetch(!drainingIMem); pc <= predIa; iMem.req(Read{addr:pc}); buD.enq(void); endrule rule decode(!drainingIMem && !execute.stall(instr)); iMem.deq(); buD.deq(); execute.enqIt(newIt(instr,rf)); rule drain(drainingIMem); if(buD.notEmpty)begin iMem.deq(); buD.deq(); end else drainingIMem <= False; method Action setPC(Iaddress newPC); pc <= newPC; drainingIMem <= buD.notEmpty(); endmethod method Action writeback … endmodule February 25, 2009 http://csg.csail.mit.edu/6.375

Power of Modular Refinement Any Fetch-Decode module refinements we’ve shown, will work with each of the Execute module refinements we’ve shown Some refinements require interface changes (e.g. magic memory vs. realistic memory) The Correctness of refinements often depends on extra-linguistic properties (architectural reasoning) It should be easy to add caches to new memory abstraction February 25, 2009 http://csg.csail.mit.edu/6.375