Bluespec-6: Modeling Processors

Slides:

Advertisements

Similar presentations

Constructive Computer Architecture: Multirule systems and Concurrent Execution of Rules Arvind Computer Science & Artificial Intelligence Lab. Massachusetts.

Advertisements

Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology October 13, 2009http://csg.csail.mit.edu/koreaL12-1.

Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1

Computer Architecture: A Constructive Approach Branch Direction Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab.

March, 2007http://csg.csail.mit.edu/arvindIPlookup-1 IP Lookup Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.

September 24, L08-1 IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab.

Constructive Computer Architecture: Guards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September 24, 2014.

September 22, 2009http://csg.csail.mit.edu/koreaL07-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab.

Constructive Computer Architecture Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology

Elastic Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 28, 2011L08-1http://csg.csail.mit.edu/6.375.

Computer Architecture: A Constructive Approach Next Address Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts.

Constructive Computer Architecture: Control Hazards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.

February 20, 2009http://csg.csail.mit.edu/6.375L08-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts.

1 Tutorial: Lab 4 Again Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.

Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 8,

October 22, 2009http://csg.csail.mit.edu/korea Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.

October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.

Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 1, 2010

Elastic Pipelines: Concurrency Issues

February 28, 2005http://csg.csail.mit.edu/6.884/L09-1 Bluespec-3: Modules & Interfaces Arvind Computer Science & Artificial Intelligence Lab Massachusetts.

Bluespec-3: A non-pipelined processor Arvind

Concurrency properties of BSV methods and rules

Control Hazards Constructive Computer Architecture: Arvind

Bluespec-6: Modules and Interfaces

Scheduling Constraints on Interface methods

Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind

Sequential Circuits Constructive Computer Architecture Arvind

Sequential Circuits: Constructive Computer Architecture

Caches-2 Constructive Computer Architecture Arvind

Performance Specifications

Pipelining combinational circuits

Multirule Systems and Concurrent Execution of Rules

Constructive Computer Architecture: Guards

Sequential Circuits Constructive Computer Architecture Arvind

Pipelining combinational circuits

Modular Refinement Arvind

Modular Refinement Arvind

EHR: Ephemeral History Register

Constructive Computer Architecture: Well formed BSV programs

Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind

Bluespec-7: Scheduling & Rule Composition

Control Hazards Constructive Computer Architecture: Arvind

Modeling Processors: Concurrency Issues

Modules with Guarded Interfaces

Pipelining combinational circuits

Bypassing Computer Architecture: A Constructive Approach Joel Emer

Elastic Pipelines: Concurrency Issues

Bluespec-3: A non-pipelined processor Arvind

Caches-2 Constructive Computer Architecture Arvind

Modular Refinement - 2 Arvind

IP Lookup: Some subtle concurrency issues

Elastic Pipelines: Concurrency Issues

Elastic Pipelines and Basics of Multi-rule Systems

Bluespec-5: Modeling Processors

Constructive Computer Architecture: Guards

Elastic Pipelines and Basics of Multi-rule Systems

Bluespec-7: Scheduling & Rule Composition

Control Hazards Constructive Computer Architecture: Arvind

Pipelined Processors Constructive Computer Architecture: Arvind

IP Lookup: Some subtle concurrency issues

Bluespec-5: Scheduling & Rule Composition

Tutorial 4: RISCV modules Constructive Computer Architecture

Modeling Processors Arvind

Modeling Processors Arvind

Modular Refinement Arvind

Control Hazards Constructive Computer Architecture: Arvind

Implementing for Correct Concurrency

Modular Refinement Arvind

Caches-2 Constructive Computer Architecture Arvind

Bluespec-8: Modules and Interfaces

Presentation transcript:

Bluespec-6: Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 10, 2005 http://csg.csail.mit.edu/6.375/

Instruction set typedef enum {R0;R1;R2;…;R31} RName; typedef union tagged { struct {RName dst; RName src1; RName src2;} Add; struct {RName cond; RName addr;} Bz; struct {RName dst; RName addr;} Load; struct {RName value; RName addr;} Store } Instr deriving(Bits, Eq); typedef Bit#(32) Iaddress; typedef Bit#(32) Daddress; typedef Bit#(32) Value; An instruction set can be implemented using many different microarchitectures March 10, 2006 http://csg.csail.mit.edu/6.375/

Processors Non-pipelined processor Two-stage pipeline Performance issues In the rest of the talk, I want to say a little bit more about how to describe hardware as an TRS Next I will explain how to synthesize these operation-centric descriptions. I have a few more design that I can show you in the result section. Finally, I will say a fews things about related work and future work March 10, 2006 http://csg.csail.mit.edu/6.375/

Non-pipelined Processor pc rf CPU fetch & execute iMem dMem module mkCPU#(Mem iMem, Mem dMem)(); Reg#(Iaddress) pc <- mkReg(0); RegFile#(RName, Bit#(32)) rf <- mkRegFileFull(); Instr instr = iMem.read(pc); Iaddress predIa = pc + 1; rule fetch_Execute ... endmodule March 10, 2006 http://csg.csail.mit.edu/6.375/

Non-pipelined processor rule rule fetch_Execute (True); case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: begin rf.upd(rd, rf[ra]+rf[rb]); pc <= predIa end tagged Bz {cond:.rc,addr:.ra}: begin pc <= (rf[rc]==0) ? rf[ra] : predIa; tagged Load {dest:.rd,addr:.ra}: begin rf.upd(rd, dMem.read(rf[ra])); pc <= predIa; tagged Store {value:.rv,addr:.ra}: begin dMem.write(rf[ra],rf[rv]); pc <= predIa; endcase endrule my syntax rf[r]  rf.sub(r) Assume “magic memory”, i.e. responds to a read request in the same cycle and a write updates the memory at the end of the cycle March 10, 2006 http://csg.csail.mit.edu/6.375/

Processor Pipelines and FIFOs pc rf decode execute memory write- back fetch iMem dMem CPU March 10, 2006 http://csg.csail.mit.edu/6.375/

SFIFO (glue between stages) interface SFIFO#(type t, type tr); method Action enq(t); // enqueue an item method Action deq(); // remove oldest entry method t first(); // inspect oldest item method Action clear(); // make FIFO empty method Bool find(tr); // search FIFO endinterface n enab enq rdy not full n = # of bits needed to represent the values of type “t“ m = # of bits needed values of type “tr" enab rdy deq SFIFO module not empty n first not empty rdy enab more on searchable FIFOs later clear bool m find March 10, 2006 http://csg.csail.mit.edu/6.375/

Two-Stage Pipeline module mkCPU#(Mem iMem, Mem dMem)(Empty); fetch & decode execute pc rf CPU bu module mkCPU#(Mem iMem, Mem dMem)(Empty); Reg#(Iaddress) pc <- mkReg(0); RegFile#(RName, Bit#(32)) rf <- mkRegFileFull(); SFIFO#(InstTemplate, RName) bu <- mkSFifo(findf); Instr instr = iMem.read(pc); Iaddress predIa = pc + 1; InstTemplate it = bu.first(); rule fetch_decode ... endmodule This first method describe what happens in the fetch of this pipeline. This methods says that given any processor term, we can always rewrite it to a new term wehre, and we enqueue the current instruction into bf. And at the same time the pc field is incremende by one It may look like this method can fire forever, but because bf is bounded, when a method enqueue into bf, there is an implied predicate that say bf must not be full. March 10, 2006 http://csg.csail.mit.edu/6.375/ 8 8 6 8

Instructions & Templates typedef union tagged { struct {RName dst; RName src1; RName src2} Add; struct {RName cond; RName addr} Bz; struct {RName dst; RName addr} Load; struct {RName value; RName addr} Store; } Instr deriving(Bits, Eq); typedef union tagged { struct {RName dst; Value op1; Value op2} EAdd; struct {Value cond; Iaddress tAddr} EBz; struct {RName dst; Daddress addr} ELoad; struct {Value data; Daddress addr} EStore; } InstTemplate deriving(Eq, Bits); typedef Bit#(32) Iaddress; typedef Bit#(32) Daddress; typedef Bit#(32) Value; March 10, 2006 http://csg.csail.mit.edu/6.375/

Rules for Add CPU implicit check: pc rf fetch & decode execute bu rule decodeAdd(instr matches Add{dst:.rd,src1:.ra,src2:.rb}) bu.enq (EAdd{dst:rd,op1:rf[ra],op2:rf[rb]}); pc <= predIa; endrule implicit check: This first method describe what happens in the fetch of this pipeline. This methods says that given any processor term, we can always rewrite it to a new term wehre, and we enqueue the current instruction into bf. And at the same time the pc field is incremende by one It may look like this method can fire forever, but because bf is bounded, when a method enqueue into bf, there is an implied predicate that say bf must not be full. bu notfull rule executeAdd(it matches EAdd{dst:.rd,op1:.va,op2:.vb}) rf.upd(rd, va + vb); bu.deq(); endrule bu notempty March 10, 2006 http://csg.csail.mit.edu/6.375/ 8 6 8 8

Fetch & Decode Rule: Reexamined execute pc rf CPU bu rule decodeAdd (instr matches Add{dst:.rd,src1:.ra,src2:.rb}) bu.enq (EAdd{dst:rd, op1:rf[ra], op2:rf[rb]}); pc <= predIa; endrule This first method describe what happens in the fetch of this pipeline. This methods says that given any processor term, we can always rewrite it to a new term wehre, and we enqueue the current instruction into bf. And at the same time the pc field is incremende by one It may look like this method can fire forever, but because bf is bounded, when a method enqueue into bf, there is an implied predicate that say bf must not be full. Wrong! Because instructions in bu may be modifying ra or rb stall ! March 10, 2006 http://csg.csail.mit.edu/6.375/ 8 6 8 8

Fetch & Decode Rule: corrected execute pc rf CPU bu rule decodeAdd (instr matches Add{dst:.rd,src1:.ra,src2:.rb} bu.enq (EAdd{dst:rd, op1:rf[ra], op2:rf[rb]}); pc <= predIa; endrule &&& !bu.find(ra) &&& !bu.find(rb)) This first method describe what happens in the fetch of this pipeline. This methods says that given any processor term, we can always rewrite it to a new term wehre, and we enqueue the current instruction into bf. And at the same time the pc field is incremende by one It may look like this method can fire forever, but because bf is bounded, when a method enqueue into bf, there is an implied predicate that say bf must not be full. March 10, 2006 http://csg.csail.mit.edu/6.375/ 8 8 6 8

Rules for Branch CPU rule-atomicity ensures that pc update, and fetch & decode execute pc rf CPU bu rule-atomicity ensures that pc update, and discard of pre- fetched instrs in bu, are done consistently rule decodeBz(instr matches Bz{cond:.rc,addr:.addr}) &&& !bu.find(rc) &&& !bu.find(addr)); bu.enq (EBz{cond:rf[rc],addr:rf[addr]}); pc <= predIa; endrule This first method describe what happens in the fetch of this pipeline. This methods says that given any processor term, we can always rewrite it to a new term wehre, and we enqueue the current instruction into bf. And at the same time the pc field is incremende by one It may look like this method can fire forever, but because bf is bounded, when a method enqueue into bf, there is an implied predicate that say bf must not be full. rule bzTaken(it matches EBz{cond:.vc,addr:.va}) &&& (vc==0)); pc <= va; bu.clear(); endrule rule bzNotTaken (it matches EBz{cond:.vc,addr:.va}) &&& (vc != 0)); bu.deq; endrule March 10, 2006 http://csg.csail.mit.edu/6.375/ 8 6 8 8

The Stall Signal Bool stall = case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return (bu.find(ra) || bu.find(rb)); tagged Bz {cond:.rc,addr:.addr}: return (bu.find(rc) || bu.find(addr)); tagged Load {dst:.rd,addr:.addr}: return (bu.find(addr)); tagged Store {value:.v,addr:.addr}: return (bu.find(v)) || bu.find(addr)); endcase; This first rule describe what happens in the fetch of this pipeline. This rules says that given any processor term, we can always rewrite it to a new term wehre, and we enqueue the current instruction into bf. And at the same time the pc field is incremende by one It may look like this rule can fire forever, but because bf is bounded, when a rule enqueue into bf, there is an implied predicate that say bf must not be full. Need to extend the fifo interface with the “find” method where “find” searches the fifo using the findf function March 10, 2006 http://csg.csail.mit.edu/6.375/ 8 8 6 8

Parameterization: The Stall Function function Bool stallfunc (Instr instr, SFIFO#(InstTemplate, RName) bu); case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return (bu.find(ra) || bu.find(rb)); tagged Bz {cond:.rc,addr:.addr}: return (bu.find(rc) || bu.find(addr)); tagged Load {dst:.rd,addr:.addr}: return (bu.find(addr)); tagged Store {value:.v,addr:.addr}: return (bu.find(v)) || bu.find(addr)); endcase endfunction This first rule describe what happens in the fetch of this pipeline. This rules says that given any processor term, we can always rewrite it to a new term wehre, and we enqueue the current instruction into bf. And at the same time the pc field is incremende by one It may look like this rule can fire forever, but because bf is bounded, when a rule enqueue into bf, there is an implied predicate that say bf must not be full. We need to include the following call in the mkCPU module Bool stall = stallfunc(instr, bu); no extra gates! March 10, 2006 http://csg.csail.mit.edu/6.375/ 8 8 6 8

The findf function function Bool findf (RName r, InstrTemplate it); case (it) matches tagged EAdd{dst:.rd,op1:.ra,op2:.rb}: return (r == rd); tagged EBz {cond:.c,addr:.a}: return (False); tagged ELoad{dst:.rd,addr:.a}: return (r == rd); tagged EStore{value:.v,addr:.a}: return (False); endcase endfunction This first rule describe what happens in the fetch of this pipeline. This rules says that given any processor term, we can always rewrite it to a new term wehre, and we enqueue the current instruction into bf. And at the same time the pc field is incremende by one It may look like this rule can fire forever, but because bf is bounded, when a rule enqueue into bf, there is an implied predicate that say bf must not be full. SFIFO#(InstrTemplate, RName) bu <- mkSFifo(findf); mkSFifo can be parameterized by the search function! March 10, 2006 http://csg.csail.mit.edu/6.375/ no extra gates! 8 8 6 8

Fetch & Decode Rule rule fetch_and_decode(!stall); case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: bu.enq(EAdd{dst:rd,op1:rf[ra],op2:rf[rb]}); tagged Bz {cond:.rc,addr:.addr}: bu.enq(EBz{cond:rf[rc],addr:rf[addr]}); tagged Load {dst:.rd,addr:.addr}: bu.enq(ELoad{dst:rd,addr:rf[addr]}); tagged Store{value:.v,addr:.addr}: bu.enq(EStore{value:rf[v],addr:rf[addr]}); endcase pc<= predIa; endrule March 10, 2006 http://csg.csail.mit.edu/6.375/

Fetch & Decode Rule another style InstrTemplate newIt = case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return EAdd{dst:rd,op1:rf[ra],op2:rf[rb]}; tagged Bz {cond:.rc,addr:.addr}: return EBz{cond:rf[rc],addr:rf[addr]}; tagged Load {dst:.rd,addr:.addr}: return ELoad{dst:rd,addr:rf[addr]}; tagged Store{value:.v,addr:.addr}: return EStore{value:rf[v],addr:rf[addr]}; endcase; rule fetch_and_decode (!stall); bu.enq(newIt); pc <= predIa; endrule March 10, 2006 http://csg.csail.mit.edu/6.375/

Execute Rule rule execute (True); case (it) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin rf.upd(rd, va+vb); bu.deq(); end tagged EBz {cond:.cv,addr:.av}: if (cv == 0) then begin pc <= av; bu.clear(); end else bu.deq(); tagged ELoad{dst:.rd,addr:.av}: begin rf.upd(rd, dMem.read(av)); bu.deq(); end tagged EStore{value:.vv,addr:.av}: begin dMem.write(av, vv); bu.deq(); endcase endrule March 10, 2006 http://csg.csail.mit.edu/6.375/

Searchable FIFOs and Concurrency Issues March 10, 2005 http://csg.csail.mit.edu/6.375/

One-Element Searchable FIFO parameterized by findf module mkSFIFO1#(function Bool findf(tr rd, t x)) (SFIFO#(type t, type tr)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); method Action enq(t x) if (!full); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; method t first() if (full); return (data); method Action clear(); method Bool find(tr rd); return full && findf(rd, data); endmodule Conflict Matrix enq2 first2 deq2 clear2 find2 enq1 C > C < > first1 < CF < < CF deq1 C > C < > clear1 > > > C > find1 < CF < < CF March 10, 2006 http://csg.csail.mit.edu/6.375/

Two-Element Searchable FIFO module mkSFIFO2#(function Bool findf(tr rd, t x)) (SFIFO#(type t, type tr)); Reg#(t) data0 <-mkRegU; Reg#(Bool) full0 <- mkReg(False); Reg#(t) data1 <-mkRegU; Reg#(Bool) full1 <- mkReg(False); method Action enq(t x) if (!full1); if (!full0) begin data0 <= x; full0 <= True; end else begin data1 <= x; full1 <= True; end endmethod // Shift register implementation; method Action deq() if (full0); full0 <= full1; data0 <= data1; full1 <= False; endmethod method t first() if (full0); return data0; method Action clear(); full0 <= False; full1 <= False; method Bool find(tr rd); return ((full0 && findf(rd, data0) || (full1 && findf(rd, data1)); endmethod endmodule Conflict Matrix is the same as one-element SFIFO March 10, 2006 http://csg.csail.mit.edu/6.375/

Concurrency requirements rule fetch_and_decode (!stall); bu.enq(tuple2(pc, newIt)); pc <= predIa; endrule The two rules must fire together whenever possible, and execute < fetch_and_decode rule execute (True); case (it) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin rf.upd(rd, va+vb); bu.deq(); end tagged EBz {cond:.cv,addr:.av}: if (cv == 0) then begin pc <= av; bu.clear(); end else bu.deq(); tagged ELoad{dst:.rd,addr:.av}: begin rf.upd(rd, dMem.read(av)); bu.deq(); end tagged EStore{value:.vv,addr:.av}: begin dMem.write(av, vv); bu.deq(); end endcase endrule March 10, 2006 http://csg.csail.mit.edu/6.375/

Intra-Rule Requirements fetch_and_decode rule two read methods (ports) on the rf register file (rf[ra],rf[rb]) two find methods on the bu SFIFO (bu.find(ra), bu.find(rb)) read methods must come before action methods, e.g. bu.find < bu.enq execute rule read methods must come before action methods, e.g., bu.first < bu.deq March 10, 2006 http://csg.csail.mit.edu/6.375/

Inter-rule concurrency requirements: case analysis rule execute (True); case (it) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin rf.upd(rd, va+vb); bu.deq(); end ... endcase endrule rf.upd < rf.sub and rf needs two read ports and one write port bu.first < {bu.deq, bu.clear} < bu.find < bu.enq rule fetch_and_decode (!stall); bu.enq(tuple2(pc, newIt)); pc <= predIa; endrule InstrTemplate newIt = case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return EAdd{dst:rd,op1:rf[ra],op2:rf[rb]}; ... endcase; March 10, 2006 http://csg.csail.mit.edu/6.375/

Numbering the methods worksheet rule fetch_and_decode (!stall); bu.enq(newIt); pc <= predIa; endrule execute < fetch_and_decode  rf.upd0 < rf.sub1 bu.first0 < {bu.deq0, bu.clear0} < bu.find1 < bu.enq1 rule execute (True); case (it) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin rf.upd(rd, va+vb); bu.deq(); end tagged EBz {cond:.cv,addr:.av}: if (cv == 0) then begin pc <= av; bu.clear(); end else bu.deq(); tagged ELoad{dst:.rd,addr:.av}: begin rf.upd(rd, dMem.read(av)); bu.deq(); end tagged EStore{value:.vv,addr:.av}: begin dMem.write(av, vv); bu.deq(); end endcase endrule March 10, 2006 http://csg.csail.mit.edu/6.375/

Numbering the methods rule fetch_and_decode (!stall1); bu.enq1(newIt); pc <= predIa; endrule execute < fetch_and_decode  rf.upd0 < rf.sub1 bu.first0 < {bu.deq0, bu.clear0} < bu.find1 < bu.enq1 rule execute (True); case (it) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin rf.upd0(rd, va+vb); bu.deq0(); end tagged EBz {cond:.cv,addr:.av}: if (cv == 0) then begin pc <= av; bu.clear0(); end else bu.deq0(); tagged ELoad{dst:.rd,addr:.av}: begin rf.upd0(rd, dMem.read(av)); bu.deq0(); end tagged EStore{value:.vv,addr:.av}: begin dMem.write(av, vv); bu.deq0(); end endcase endrule March 10, 2006 http://csg.csail.mit.edu/6.375/