Bypassing Computer Architecture: A Constructive Approach Joel Emer

Slides:



Advertisements
Similar presentations
Constructive Computer Architecture: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.
Advertisements

Computer Architecture: A Constructive Approach Six Stage Pipeline/Bypassing Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts Institute.
February 28, 2012CS152, Spring 2012 CS 152 Computer Architecture and Engineering Lecture 11 - Out-of-Order Issue, Register Renaming, & Branch Prediction.
DLX Instruction Format
Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology October 13, 2009http://csg.csail.mit.edu/koreaL12-1.
Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1
Computer Architecture: A Constructive Approach Branch Direction Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab.
Realistic Memories and Caches Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology March 21, 2012L13-1
CMPE 421 Parallel Computer Architecture
Constructive Computer Architecture Virtual Memory and Interrupts Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Elastic Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 28, 2011L08-1http://csg.csail.mit.edu/6.375.
Computer Architecture: A Constructive Approach Next Address Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts.
Computer Architecture: A Constructive Approach Branch Direction Prediction – Pipeline Integration Joel Emer Computer Science & Artificial Intelligence.
Constructive Computer Architecture: Control Hazards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.
February 20, 2009http://csg.csail.mit.edu/6.375L08-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
1 Tutorial: Lab 4 Again Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 8,
October 22, 2009http://csg.csail.mit.edu/korea Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Realistic Memories and Caches – Part III Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 4, 2012L15-1.
Constructive Computer Architecture Tutorial 6: Five Details of SMIPS Implementations Andy Wright 6.S195 TA October 7, 2013http://csg.csail.mit.edu/6.s195T05-1.
Computer Architecture: A Constructive Approach Multi-Cycle and 2 Stage Pipelined SMIPS Implementations Teacher: Yoav Etsion Taken (with permission) from.
October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.
Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 1, 2010
Elastic Pipelines: Concurrency Issues
CDA3101 Recitation Section 8
CSCI206 - Computer Organization & Programming
Control Hazards Constructive Computer Architecture: Arvind
Bluespec-6: Modeling Processors
Constructive Computer Architecture Tutorial 6: Discussion for lab6
Smruti R. Sarangi Computer Science and Engineering, IIT Delhi
Decode and Operand Read
Caches-2 Constructive Computer Architecture Arvind
in Pipelined Processors
CS 5513 Computer Architecture Pipelining Examples
Lecture 6: Advanced Pipelines
Pipelining combinational circuits
Single-cycle datapath, slightly rearranged
Pipelining combinational circuits
Modular Refinement Arvind
Modular Refinement Arvind
Constructive Computer Architecture Tutorial 5 Epoch & Branch Predictor
Lab 4 Overview: 6-stage SMIPS Pipeline
in Pipelined Processors
Control Hazards Constructive Computer Architecture: Arvind
CSCI206 - Computer Organization & Programming
Modules with Guarded Interfaces
Pipelining combinational circuits
\course\cpeg323-05F\Topic6b-323
Realistic Memories and Caches
Pipeline control unit (highly abstracted)
Caches-2 Constructive Computer Architecture Arvind
Modular Refinement - 2 Arvind
Caches and store buffers
Lecture 17: Pipelining Today’s topics: 5-stage pipeline Hazards.
The Processor Lecture 3.5: Data Hazards
Bluespec-5: Modeling Processors
Control Hazards Constructive Computer Architecture: Arvind
Pipelined Processors Constructive Computer Architecture: Arvind
Pipeline control unit (highly abstracted)
Tutorial 4: RISCV modules Constructive Computer Architecture
Pipeline Control unit (highly abstracted)
Modeling Processors Arvind
Modeling Processors Arvind
Modular Refinement Arvind
Control Hazards Constructive Computer Architecture: Arvind
Modular Refinement Arvind
Tutorial 7: SMIPS Labs and Epochs Constructive Computer Architecture
CS 3853 Computer Architecture Pipelining Examples
Bluespec-8: Modules and Interfaces
Presentation transcript:

Bypassing Computer Architecture: A Constructive Approach Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology March 19, 2012 http://csg.csail.mit.edu/6.S078

Six Stage Pipeline F D R X M W Fetch Decode Reg Read Execute Memory Write- back pc F fr D dr R rr X xr M mr W Writeback feeds back directly to ReadReg rather than through FIFO March 19, 2012 http://csg.csail.mit.edu/6.S078

Register Read Stage D R W Decode Reg Read update scoreboard Read regs update scoreboard Will show new read_reg – but first a style update… RF Scoreboard update signaled immediately writeback register March 19, 2012 http://csg.csail.mit.edu/6.S078

Per stage architectural state typedef struct { MemResp instResp; } FBundle; Rindx r_dest; Rindx op1; Rindx op2; ... } DecBundle; Data src1; Data src2; } RegBundle; typedef struct { Bool cond; Addr addr; Data data; } Ebundle; No MemBundle – data added to Ebundle No WbBundle – nothing added in WB Updated style for handling architectural and micro-architectural state. March 19, 2012 http://csg.csail.mit.edu/6.S078 4

Per stage micro-architectural state Architectural state from prior stages typedef struct { FBundle fInst; DecBundle decInst; RegBundle regInst; Inum inum; Epoch epoch; } RegData deriving(Bits,Eq); function RegData newRegData(DecData decData); RegData regData = ?; regData.fInst = decData.fInst; regData.decInst = decData.decInst; regData.inum = decData.inum; regData.epoch = decData.epoch; return regData; endfunction New architectural state for this stage Passthrough and (optional) new micro-architectural state Utility function Struct per stage with state to use in current stage and pass to next stage. Copy architectural state from prior stages Copy uarch state from prior stages In “uarch_types” March 19, 2012 http://csg.csail.mit.edu/6.S078 5

Example of stage state use Initialize stage state rule doRegRead; let regData = newRegData(dr.first()); let decInst = regData.decInst; case (reg_read.readRegs(decInst)) matches tagged Valid .rfdata: begin if( writes_register(decInst)) reg_read.markUnavail(decInst.r_dest); regData.regInst.src1 = rfdata.src1; regData.regInst.src2 = rfdata.src2; rr.enq(regData); dr.deq(); end endcase endrule Set utility variables* Set architectural state Set uarch state if any Enqueue outgoing state Dequeue incoming state *Don’t try to update!!! March 19, 2012 http://csg.csail.mit.edu/6.S078 6

Resource-based diagram key epoch change (color change) 0 1 Fetch (Fuschia) ζ η ζ ε δ γ α pc redirect Decode (Denim) ε δ = killed Reg Read (Red) δ Set R1 not busy R1 busy Execute (Emerald) γ Memory (Mustard) β Writeback (Wheat) α α = stalled writeback feedback March 19, 2012 http://csg.csail.mit.edu/6.S078

Scoreboard clear bug! α β α α γ β δ γ β α ε δ γ β α ζ ε δ γ β α η ζ ε 0 1 2 3 4 5 6 7 8 9 η waits for write of R1 by ζ α β α α γ β δ γ β α ε δ γ β α ζ ε δ γ β α η ζ ε β η ζ η ζ η ζ F D R X M W α = 00: add r1,r0,r0 β = 04: j 40 γ = 08: add ... δ = 12: add ... ε = 16: add ... ζ = 40: add r1,r0, r0 η = 44: add r2, r1,r0 Old code with flash clear of scoreboard on epoch change is broken….if α is delayed Looks okay, but what if α is delayed? March 19, 2012 http://csg.csail.mit.edu/6.S078

Scoreboard clear bug! α β α γ β α δ γ β α ε δ γ β α ζ ε δ γ α η ζ ε α 0 1 2 3 4 5 6 7 8 9 η is released by earlier write of R1 α β α γ β α δ γ β α ε δ γ β α ζ ε δ γ α η ζ ε α η ζ α η ζ β α η ζ β F D R X M W α = 00: ld r1,100(r0) β = 04: j 40 γ = 08: add ... δ = 12: add ... ε = 16: add ... ζ = 40: add r1,r0, r0 η = 44: add r2, r1,r0 Old code with flash clear of scoreboard on epoch change is broken….if α is delayed Mistreatment of what did of data dependence? WAW March 19, 2012 http://csg.csail.mit.edu/6.S078

So don’t clear scoreboard! 0 1 2 3 4 5 6 7 8 9 ζ sees register is free by feedback, then sets it busy α β α γ β α δ γ β α ε δ γ β α ζ ε δ γ α η ζ ε δ α η ζ ε α ζ β α η ζ β F D R X M W α = 00: ld r1,100(r0) β = 04: j 40 γ = 08: add ... δ = 12: add ... ε = 16: add ... ζ = 40: add r1,r0, r0 η = 44: add r2, r1,r0 Old code with flash clear of scoreboard on epoch change is broken….if α is delayed Zeta executes because it sees the register is free (via arrow) and then sets it busy again… Looks okay, but what if R1 written in branch shadow? March 19, 2012 http://csg.csail.mit.edu/6.S078

Dead instruction deadlock! 0 1 2 3 4 5 6 7 8 9 ζ never sees register R1 α β α γ β α δ γ β α ε δ γ β α ζ ε δ γ β α η ζ ε δ β η ζ ε ζ ζ F D R X M W α = 00: add r3,r0,r0 β = 04: j 40 γ = 08: add ... δ = 12: add r1,... ε = 16: add,... ζ = 40: add r1,r0, r0 η = 44: add r2, r1,r0 So, what can we do? March 19, 2012 http://csg.csail.mit.edu/6.S078

Poison bit uarch state In exec: In subsequent stages: instead of dropping killed instructions mark them ‘poisoned’ In subsequent stages: DO NOT make architectural changes DO necessary bookkeeping March 19, 2012 http://csg.csail.mit.edu/6.S078

Poison-bit solution α β α γ β α δ γ β α ε δ γ β α ζ ε δ γ β α η ζ ε δ Poisoned δ frees register, but value same, ζ marks it busy. 0 1 2 3 4 5 6 7 8 9 α β α γ β α δ γ β α ε δ γ β α ζ ε δ γ β α η ζ ε δ γ β η ζ ε δ γ η ζ ε δ η ζ ε F D R X M W α = 00: add r3,r0,r0 β = 04: j 40 γ = 08: add ... δ = 12: add r1,... ε = 16: add,... ζ = 40: add r1,r0, r0 η = 44: add r2, r1,r0 Poisoned δ frees register, which is immediately made busy by ζ. Then η will wait for ζ. What stage generates poison bit? March 19, 2012 http://csg.csail.mit.edu/6.S078

Poison-bit generation Initialize stage state rule doExecute; let execData = newExecData(rr.first); let ...; if (! epochChange) begin execData.execInst = exec.exec(decInst,src1,src2); if (execInst.cond) ... end else execData.poisoned = True; xr.enq(execData); rr.deq(); endrule Set utility variables Set architectural state Set uarch state Enqueue outgoing state Dequeue incoming state What stages need to handle poison bit? March 19, 2012 http://csg.csail.mit.edu/6.S078 14

Poison-bit usage Architectural activity if not poisoned rule doMemory; let memData = newMemData(xr.first); let ...; if(! poisoned && memop(decInst)) memData.execInst.data <- mem.port(...); mr.enq(memData); xr.deq(); endrule rule doWriteBack; let wbData = newWBData(mr.first); reg_read.markAvailable(rdst); if (!poisoned && regwrite(decInst) rf.wr(rdst, data); mr.deq; Unconditionally do bookkeeping Architectural activity if not poisoned March 19, 2012 http://csg.csail.mit.edu/6.S078 15

Data dependence stalls 0 1 2 3 4 5 6 7 8 9 ζ η ζ η ζ η ζ η ζ η ζ η η η F D R X M W ζ = 40: add r1,r0, r0 η = 44: add r2, r1,r0 This is a data dependence. What kind? RAW March 19, 2012 http://csg.csail.mit.edu/6.S078

Bypass RegRead Execute What does RegRead need to do? rr Execute What does RegRead need to do? If scoreboard says register is not available then RegRead needs to stall, unless it sees what it needs on the bypass line…. March 19, 2012 http://csg.csail.mit.edu/6.S078

Register Read Stage D R X W Generate bypass Update scoreboard Decode Generate bypass Read regs Update scoreboard Bypass Net RF Writeback register March 19, 2012 http://csg.csail.mit.edu/6.S078

Bypass Network Does bypass solve WAW hazard No! typedef struct { Rindx regnum; Data value;} BypassValue; module mkBypass(BypassNetwork) Rwire#(BypassValue) bypass; method produceBypass(Rindx regnum, Data value); bypass.wset(BypassValue{regname: regnum, value:value}); endmethod method Maybe#(Data) consumeBypass1(Rindx regnum); if (bypass matches tagged Valid .b && b.regnum == regnum) return tagged Valid b.value; else return tagged Invalid; endmodule Does bypass solve WAW hazard No! March 19, 2012 http://csg.csail.mit.edu/6.S078

Register Read (with bypass) module mkRegRead#(RFile rf, BypassNetwork bypass)(RegRead); method Maybe#(Sources) readRegs(DecBundle decInst); let op1 = decInst.op1; let op2 = decInst.op2; let rfdata1 = rf.rd1(op1); let rfdata2 = rf.rd2(op2); let bypass1 = bypass.consumeBypass1(op1); let bypass2 = bypass.consumeBypass2(op2); let stall = isWAWhazard(scoreboard) || (isBusy(op1,scoreboard) && !isJust(bypass1)) || (isBusy(op2,scoreboard) && !isJust(bypass2))); let s1 = fromMaybe(rfdata1, bypass1); let s2 = fromMaybe(rfdata2, bypass2); if (stall) return tagged Invalid; else return tagged Valid Sources{src1:s1, src2:s2}; endmethod March 19, 2012 http://csg.csail.mit.edu/6.S078 20

Bypass generation in execute rule doExecute; let execData = newExecData(rr.first); let ...; if (! epochChange) begin execData.execInst = exec.exec(decInst,src1,src2); if (execInst.cond) ... if ( ) bypass.generateBypass(decInst.r_dst, execInst.data); end else begin execData.poisoned = True; end xr.enq(execData); rr.deq(); endrule decInst.i_type == ALU If have data destined for a register bypass it back to register read Does bypass solve all RAW delays? No – not for ld/st! March 19, 2012 http://csg.csail.mit.edu/6.S078 21

Full bypassing pc F fr D dr R rr X xr M mr W Every stage that generates register writes can generate bypass data March 19, 2012 http://csg.csail.mit.edu/6.S078

Multi-cycle execute D C η R X M B ζ A rr X xr M B m1 M2 m2 M3 M1 ζ A Incorporating a multi-cycle operation into a stage March 19, 2012 http://csg.csail.mit.edu/6.S078

Multi-cycle function unit module mkMultistage(Multistage); State1 m1 <- mkReg(); State2 m2 <- mkReg(); method Action request(Operand operand); m1.enq(doM1(operand)); endmethod rule s1; m2.enq(doM2(m1.first)); m1.deq(); endrule method ActionValue Result response(); return doM3(m2.first); m2.deq(); endmodule Perform first stage of pipeline Perform middle stage(s) of pipeline Perform last stage of pipeline and return result March 19, 2012 http://csg.csail.mit.edu/6.S078 24

Single/Multi-cycle pipe stage rule doExec; ... initialization let done = False; if (! waiting) begin if(isMulticycle(...)) begin multicycle.request(...); waiting <= True; end else begin result = singlecycle.exec(...); done = True; end result <- multicycle.response(); done = True; waiting <= False; if (done) ...finish up endrule If not busy start multicycle operation, or… …perform single cycle operation If busy, wait for response Finish up if have a result March 19, 2012 http://csg.csail.mit.edu/6.S078 25