Modeling Processors Arvind

Slides:

Advertisements

Similar presentations

Constructive Computer Architecture: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.

Advertisements

Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Pipeline Hazards See: P&H Chapter 4.7.

Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology October 13, 2009http://csg.csail.mit.edu/koreaL12-1.

Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1

Computer Architecture: A Constructive Approach Branch Direction Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab.

Pipelining combinational circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February 20, 2013http://csg.csail.mit.edu/6.375L05-1.

1 Tutorial: Lab 4 Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.

Elastic Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 28, 2011L08-1http://csg.csail.mit.edu/6.375.

Computer Architecture: A Constructive Approach Next Address Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts.

Computer Architecture: A Constructive Approach Branch Direction Prediction – Pipeline Integration Joel Emer Computer Science & Artificial Intelligence.

Constructive Computer Architecture: Control Hazards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.

February 20, 2009http://csg.csail.mit.edu/6.375L08-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts.

1 Tutorial: Lab 4 Again Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.

Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 8,

October 22, 2009http://csg.csail.mit.edu/korea Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.

6.375 Tutorial 4 RISC-V and Final Projects Ming Liu March 4, 2016http://csg.csail.mit.edu/6.375T04-1.

October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.

Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 1, 2010

Computer Architecture: A Constructive Approach Implementing SMIPS Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.

February 27, 2006http://csg.csail.mit.edu/6.375/L08-1 Bluespec-2: Types Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of.

Elastic Pipelines: Concurrency Issues

Bluespec-3: A non-pipelined processor Arvind

CSCI206 - Computer Organization & Programming

Control Hazards Constructive Computer Architecture: Arvind

Bluespec-6: Modeling Processors

Bluespec-6: Modules and Interfaces

Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind

Constructive Computer Architecture Tutorial 6: Discussion for lab6

Morgan Kaufmann Publishers The Processor

Caches-2 Constructive Computer Architecture Arvind

Performance Specifications

Lecture 6: Advanced Pipelines

Pipelining review.

Non-Pipelined Processors - 2

Computer Organization “Central” Processing Unit (CPU)

Modular Refinement Arvind

Modular Refinement Arvind

Lab 4 Overview: 6-stage SMIPS Pipeline

Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind

Non-Pipelined and Pipelined Processors

Bluespec-7: Scheduling & Rule Composition

Pipelining in more detail

Control Hazards Constructive Computer Architecture: Arvind

CSCI206 - Computer Organization & Programming

Modeling Processors: Concurrency Issues

Modules with Guarded Interfaces

Pipelining Basic concept of assembly line

Bypassing Computer Architecture: A Constructive Approach Joel Emer

Pipeline control unit (highly abstracted)

Elastic Pipelines: Concurrency Issues

Bluespec-3: A non-pipelined processor Arvind

Caches-2 Constructive Computer Architecture Arvind

Modular Refinement - 2 Arvind

Elastic Pipelines: Concurrency Issues

Pipelined Processors Arvind

Bluespec-5: Modeling Processors

Bluespec-7: Scheduling & Rule Composition

Control Hazards Constructive Computer Architecture: Arvind

Pipelined Processors Constructive Computer Architecture: Arvind

Pipeline control unit (highly abstracted)

Pipeline Control unit (highly abstracted)

Pipelining Basic concept of assembly line

Modeling Processors Arvind

Modular Refinement Arvind

Control Hazards Constructive Computer Architecture: Arvind

Modular Refinement Arvind

Caches-2 Constructive Computer Architecture Arvind

COMS 361 Computer Organization

Bluespec-8: Modules and Interfaces

Presentation transcript:

Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 1, 2010 http://csg.csail.mit.edu/6.375 1

Instruction set typedef enum {R0;R1;R2;…;R31} RName; typedef union tagged { struct {RName dst; RName src1; RName src2;} Add; struct {RName cond; RName addr;} Bz; struct {RName dst; RName addr;} Load; struct {RName value; RName addr;} Store } Instr deriving(Bits, Eq); typedef Bit#(32) Iaddress; typedef Bit#(32) Daddress; typedef Bit#(32) Value; An instruction set can be implemented using many different microarchitectures March 1, 2010 http://csg.csail.mit.edu/6.375 2

Deriving Bits typedef struct { … } Foo deriving (Bits); To store datatypes in register, fifo, etc. we need to know how to represent them as bits (pack) and interpret their bit representation (unpack) Deriving annotation automatically generates the “pack” and “unpack” operations on the type (simple concatenation of bit representations of components) It is possible to customize the pack/unpack operations to any specific desired representation March 1, 2010 http://csg.csail.mit.edu/6.375 3

Tagged Unions: Bit Representation typedef union tagged { struct {RName dst; RName src1; RName src2;} Add; struct {RName cond; RName addr;} Bz; struct {RName dst; RName addr;} Load; struct {RName dst; Immediate imm;} AddImm; } Instr deriving(Bits, Eq); 00 dst src1 src2 01 cond addr 10 dst addr 11 dst imm Automatically derived representation; can be customized by the user written pack and unpack functions March 1, 2010 http://csg.csail.mit.edu/6.375 4

The Plan Non-pipelined processor Two-stage Inelastic pipeline Two-stage Elastic pipeline – next lecture In the rest of the talk, I want to say a little bit more about how to describe hardware as an TRS Next I will explain how to synthesize these operation-centric descriptions. I have a few more design that I can show you in the result section. Finally, I will say a fews things about related work and future work Some understanding of simple processor pipelines is needed to follow this lecture March 1, 2010 http://csg.csail.mit.edu/6.375 5

Non-pipelined Processor pc rf CPU fetch & execute iMem dMem module mkCPU#(Mem iMem, Mem dMem)(); Reg#(Iaddress) pc <- mkReg(0); RegFile#(RName, Bit#(32)) rf <- mkRegFileFull(); Instr instr = iMem.read(pc); Iaddress predIa = pc + 1; rule fetch_Execute ... endmodule March 1, 2010 http://csg.csail.mit.edu/6.375 6

Non-pipelined processor rule rule fetch_Execute (True); case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: begin rf.upd(rd, rf[ra]+rf[rb]); pc <= predIa end tagged Bz {cond:.rc,addr:.ra}: begin pc <= (rf[rc]==0) ? rf[ra] : predIa; tagged Load {dest:.rd,addr:.ra}: begin rf.upd(rd, dMem.read(rf[ra])); pc <= predIa; tagged Store {value:.rv,addr:.ra}: begin dMem.write(rf[ra],rf[rv]); pc <= predIa; endcase endrule my syntax rf[r]  rf.sub(r) Assume “magic memory”, i.e. responds to a read request in the same cycle and a write updates the memory at the end of the cycle March 1, 2010 http://csg.csail.mit.edu/6.375 7

Register File How many read ports? How many write ports? Concurrency properties? March 1, 2010 http://csg.csail.mit.edu/6.375

The Plan Non-pipelined processor Two-stage Inelastic pipeline  Two-stage Elastic pipeline In the rest of the talk, I want to say a little bit more about how to describe hardware as an TRS Next I will explain how to synthesize these operation-centric descriptions. I have a few more design that I can show you in the result section. Finally, I will say a fews things about related work and future work March 1, 2010 http://csg.csail.mit.edu/6.375 9

Two-stage Inelastic Pipeline fetch & decode execute buReg pc rf dMem time t0 t1 t2 t3 t4 t5 t6 t7 . . . . FDstage FD1 FD2 FD3 FD4 FD5 EXstage EX1 EX2 EX3 EX4 EX5 Actions to be performed in parallel every cycle: Fetch Action: Decodes the instruction at the current pc and fetches operands from the register file and stores the result in buReg Execute Action: Performs the action specified in buReg and updates the processor state (pc, rf, dMem) rule InelasticPipeline2(True); fetchAction; executeAction; endrule March 1, 2010 http://csg.csail.mit.edu/6.375 10

Instructions & Templates buReg contains instruction templates, i.e., decoded instructions typedef union tagged { struct {RName dst; RName src1; RName src2} Add; struct {RName cond; RName addr} Bz; struct {RName dst; RName addr} Load; struct {RName value; RName addr} Store; } Instr deriving(Bits, Eq); typedef union tagged { struct {RName dst; Value op1; Value op2} EAdd; struct {Value cond; Iaddress tAddr} EBz; struct {RName dst; Daddress addr} ELoad; struct {Value value; Daddress addr} EStore; } InstTemplate deriving(Eq, Bits); March 1, 2010 http://csg.csail.mit.edu/6.375 11

Fetch & Decode Action Fills the buReg with a decoded instruction buReg <= newIt(instr); function InstrTemplate newIt(Instr instr); case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return EAdd{dst:rd,op1:rf[ra],op2:rf[rb]}; tagged Bz {cond:.rc,addr:.addr}: return EBz{cond:rf[rc],addr:rf[addr]}; tagged Load {dst:.rd,addr:.addr}: return ELoad{dst:rd,addr:rf[addr]}; tagged Store{value:.v,addr:.addr}: return EStore{value:rf[v],addr:rf[addr]}; endcase endfunction March 1, 2010 http://csg.csail.mit.edu/6.375 12

Execute Action: Reads buReg and modifies state (rf,dMem,pc) case (buReg) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin rf.upd(rd, va+vb); pc <= predIa; end tagged ELoad{dst:.rd,addr:.av}: begin rf.upd(rd, dMem.read(av)); tagged EStore{value:.vv,addr:.av}: begin dMem.write(av, vv); tagged EBz {cond:.cv,addr:.av}: if (cv != 0) then pc <= predIa; else begin pc <= av; Invalidate buReg end endcase What does this mean? March 1, 2010 http://csg.csail.mit.edu/6.375 13

Issues with buReg buReg may not always contain an instruction. Why? fetch & decode execute buReg pc rf dMem buReg may not always contain an instruction. Why? start cycle Execute stage may kill the fetched instructions because of branch misprediction Can’t update buReg in two concurrent actions fetchAction; executeAction March 1, 2010 http://csg.csail.mit.edu/6.375 14

Inelastic Pipeline first attempt fetch & decode execute pc rf CPU buReg Inelastic Pipeline first attempt rule SyncTwoStage (True); let instr = iMem.read(pc); let predIa = pc+1; Action fetchAction = action buReg <= Valid newIt(instr); pc <= predIa; endaction; case (buReg) matches each instruction execution calls fetchAction or puts Invalid in buReg … endcase endcase endrule March 1, 2010 http://csg.csail.mit.edu/6.375 15

Execute case (buReg) matches tagged Valid .it: case (it) matches fetch & decode execute pc rf CPU buReg Execute case (buReg) matches tagged Valid .it: case (it) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin rf.upd(rd, va+vb); fetchAction; end tagged ELoad{dst:.rd,addr:.av}: begin rf.upd(rd, dMem.read(av)); fetchAction; end tagged EStore{value:.vv,addr:.av}: begin dMem.write(av, vv); fetchAction; end tagged EBz {cond:.cv,addr:.av}: if (cv != 0) then fetchAction; else begin pc <= av; buReg <= Invalid; end endcase tagged Invalid: fetchAction; March 1, 2010 http://csg.csail.mit.edu/6.375 16

Pipeline Hazards time t0 t1 t2 t3 t4 t5 t6 t7 . . . . fetch & decode execute buReg pc rf dMem time t0 t1 t2 t3 t4 t5 t6 t7 . . . . FDstage FD1 FD2 FD3 FD4 FD5 EXstage EX1 EX2 EX3 EX4 EX5 I1 Add(R1,R2,R3) I2 Add(R4,R1,R2) I2 must be stalled until I1 updates the register file time t0 t1 t2 t3 t4 t5 t6 t7 . . . . FDstage FD1 FD2 FD2 FD3 FD4 FD5 EXstage EX1 EX2 EX3 EX4 EX5 March 1, 2010 http://csg.csail.mit.edu/6.375 17

Inelastic Pipeline corrected fetch & decode execute pc rf CPU buReg Inelastic Pipeline corrected rule SyncTwoStage (True); let instr = iMem.read(pc); let predIa = pc+1; Action fetchAction = action if stallFunc(instr, buReg) then buReg <=Invalid else begin buReg <= Valid newIt(instr); pc <= predIa; end endaction; case (buReg) matches … no change … endcase endcase endrule How do we detect stalls? March 1, 2010 http://csg.csail.mit.edu/6.375 18

The Stall Function function Bool stallFunc (Instr instr, Maybe#(InstTemplate) mit); case (mit) matches tagged Invalid: return False; tagged Valid .it: case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return (findf(ra,it) || findf(rb,it)); tagged Bz {cond:.rc,addr:.addr}: return (findf(rc,it) || findf(addr,it)); tagged Load {dst:.rd,addr:.addr}: return (findf(addr,it)); tagged Store {value:.v,addr:.addr}: return (findf(v,it) || findf(addr,it)); endcase endfunction This first rule describe what happens in the fetch of this pipeline. This rules says that given any processor term, we can always rewrite it to a new term wehre, and we enqueue the current instruction into bf. And at the same time the pc field is incremende by one It may look like this rule can fire forever, but because bf is bounded, when a rule enqueue into bf, there is an implied predicate that say bf must not be full. March 1, 2010 http://csg.csail.mit.edu/6.375 8 8 19 6 8

The findf function function Bool findf (RName r, InstrTemplate it); case (it) matches tagged EAdd{dst:.rd,op1:.v1,op2:.v2}: return (r == rd); tagged EBz {cond:.c,addr:.a}: return (False); tagged ELoad{dst:.rd,addr:.a}: tagged EStore{value:.v,addr:.a}: endcase endfunction March 1, 2010 http://csg.csail.mit.edu/6.375 20

Bypassing After decoding the newIt function must read the new register values if available (i.e., the values that are still to be committed in the register file) We pass the value being written to decoding action (the newIt function) March 1, 2010 http://csg.csail.mit.edu/6.375 21

Updated newIt function InstrTemplate newIt(Maybe#(RName) mrd, Value val, Instr instr); let nrf(a)=(Valid a == mrd) ? val: rf.sub(a); case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return EAdd{dst:rd,op1:nrf(ra),op2:nrf(rb)}; tagged Bz {cond:.rc,addr:.addr}: return EBz{cond:nrf(rc),addr:nrf(addr)}; tagged Load {dst:.rd,addr:.addr}: return ELoad{dst:rd,addr:nrf(addr)}; tagged Store{value:.v,addr:.addr}: return EStore{value:nrf(v),addr:nrf(addr)}; endcase endfunction March 1, 2010 http://csg.csail.mit.edu/6.375

New fetchAction function Action fetchAction(Maybe#(RName) mrd, Value data); action if stallFunc(instr, buReg) then buReg <= Invalid; else begin buReg <= Valid newIt(mrd, data, instr); pc <= predIa; end endaction endfunction March 1, 2010 http://csg.csail.mit.edu/6.375

New Execute Action case (buReg) matches tagged Valid .it: case (it) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin rf.upd(rd, va+vb); fetchAction(Valid rd, va+vb); end tagged ELoad{dst:.rd,addr:.av}: begin rf.upd(rd, dMem.read(av)); fetchAction(Valid rd, dMem.read(av)); end tagged EStore{value:.vv,addr:.av}: begin dMem.write(av, vv); fetchAction(Invalid, ?); end tagged EBz {cond:.cv,addr:.av}: if (cv != 0) then fetchAction(Invalid, ?); else begin pc <= av; buReg <= Invalid; end endcase tagged Invalid: fetchAction(Invalid, ?); March 1, 2010 http://csg.csail.mit.edu/6.375 24

Bypassing Now that we’ve correctly bypassed data, we do not have to stall as often The current stall function is correct, but inefficient Should not stall if the value is now being bypassed March 1, 2010 http://csg.csail.mit.edu/6.375 25

The stall function for the Inelastic pipeline function Bool newStallFunc (Instr instr, Reg#(Maybe#(InstTemplate)) buReg); case (buReg) matches tagged Invalid: return False; tagged Valid .it: case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return (findf(ra,it) || findf(rb,it)); … This first rule describe what happens in the fetch of this pipeline. This rules says that given any processor term, we can always rewrite it to a new term wehre, and we enqueue the current instruction into bf. And at the same time the pc field is incremende by one It may look like this rule can fire forever, but because bf is bounded, when a rule enqueue into bf, there is an implied predicate that say bf must not be full. March 1, 2010 http://csg.csail.mit.edu/6.375 8 8 26 6 8

Inelastic Pipelines Notoriously difficult to get right Imagine the cases to be analyzed if it was a five stage pipeline Difficult to refine for better clock timing elastic pipelines March 1, 2010 http://csg.csail.mit.edu/6.375 27