in Pipelined Processors

Slides:



Advertisements
Similar presentations
Constructive Computer Architecture: Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.
Advertisements

EHRs: Designing modules with concurrent methods Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February 27,
Constructive Computer Architecture: Branch Prediction: Direction Predictors Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.
Data Hazards and Multistage Pipeline Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology January 18, 2012L8-1.
Computer Architecture: A Constructive Approach Branch Prediction - 1 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of.
Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology October 13, 2009http://csg.csail.mit.edu/koreaL12-1.
Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1
Constructive Computer Architecture: Branch Prediction: Direction Predictors Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute.
Interrupts / Exceptions / Faults Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 30, 2012L21-1
Constructive Computer Architecture Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Constructive Computer Architecture: Branch Prediction Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.
Caches and in-order pipelines Arvind (with Asif Khan) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology May 11, 2012L24-1.
Constructive Computer Architecture: Branch Prediction Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.
Constructive Computer Architecture Virtual Memory and Interrupts Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Elastic Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 28, 2011L08-1http://csg.csail.mit.edu/6.375.
Computer Architecture: A Constructive Approach Branch Prediction - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of.
Computer Architecture: A Constructive Approach Next Address Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts.
Constructive Computer Architecture: Control Hazards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.
1 Tutorial: Lab 4 Again Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 8,
October 22, 2009http://csg.csail.mit.edu/korea Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Constructive Computer Architecture Tutorial 6: Five Details of SMIPS Implementations Andy Wright 6.S195 TA October 7, 2013http://csg.csail.mit.edu/6.s195T05-1.
Computer Architecture: A Constructive Approach Multi-Cycle and 2 Stage Pipelined SMIPS Implementations Teacher: Yoav Etsion Taken (with permission) from.
Computer Architecture: A Constructive Approach Data Hazards and Multistage Pipelines Teacher: Yoav Etsion Taken (with permission) from Arvind et al.*,
October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.
Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 1, 2010
Elastic Pipelines: Concurrency Issues
EHRs: Designing modules with concurrent methods
EHRs: Designing modules with concurrent methods
Control Hazards Constructive Computer Architecture: Arvind
Bluespec-6: Modeling Processors
Scheduling Constraints on Interface methods
Tutorial 7: SMIPS Epochs Constructive Computer Architecture
Constructive Computer Architecture Tutorial 6: Discussion for lab6
Branch Prediction Constructive Computer Architecture: Arvind
Branch Prediction Constructive Computer Architecture: Arvind
Caches-2 Constructive Computer Architecture Arvind
Multistage Pipelined Processors and modular refinement
in Pipelined Processors
Non-Pipelined Processors - 2
Modular Refinement Arvind
Modular Refinement Arvind
Constructive Computer Architecture Tutorial 5 Epoch & Branch Predictor
Lab 4 Overview: 6-stage SMIPS Pipeline
Non-Pipelined and Pipelined Processors
in Pipelined Processors
Multi-cycle SMIPS Implementations
Constructive Computer Architecture Tutorial 3 RISC-V Processor
Control Hazards Constructive Computer Architecture: Arvind
Branch Prediction Constructive Computer Architecture: Arvind
Bypassing Computer Architecture: A Constructive Approach Joel Emer
Multistage Pipelined Processors and modular refinement
Modular Refinement Arvind
Realistic Memories and Caches
Branch Prediction: Direction Predictors
Modular Refinement - 2 Arvind
Multistage Pipelined Processors and modular refinement
Pipelined Processors Arvind
Control Hazards Constructive Computer Architecture: Arvind
Pipelined Processors Constructive Computer Architecture: Arvind
Tutorial 4: RISCV modules Constructive Computer Architecture
Modeling Processors Arvind
Modeling Processors Arvind
Modular Refinement Arvind
Control Hazards Constructive Computer Architecture: Arvind
Modular Refinement Arvind
Tutorial 7: SMIPS Labs and Epochs Constructive Computer Architecture
Branch Predictor Interface
Pipelined Processors: Control Hazards
Presentation transcript:

in Pipelined Processors Data Hazards in Pipelined Processors Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology March 13, 2013 http://csg.csail.mit.edu/6.375

A different 2-Stage pipeline: 2-Stage-DH pipeline Fetch, Decode, RegisterFetch Execute, Memory, WriteBack fEpoch Register File eEpoch redirect PC Decode Execute pred d2e Fifos Inst Memory Data Memory Use the same epoch solution for control hazards as before March 13, 2013 http://csg.csail.mit.edu/6.375 2

Data Hazards d2e pc rf dMem time t0 t1 t2 t3 t4 t5 t6 t7 . . . . fetch & decode execute d2e pc rf dMem time t0 t1 t2 t3 t4 t5 t6 t7 . . . . FDstage FD1 FD2 FD3 FD4 FD5 EXstage EX1 EX2 EX3 EX4 EX5 I1 Add(R1,R2,R3) I2 Add(R4,R1,R2) I2 must be stalled until I1 updates the register file time t0 t1 t2 t3 t4 t5 t6 t7 . . . . FDstage FD1 FD2 FD2 FD3 FD4 FD5 EXstage EX1 EX2 EX3 EX4 EX5 March 13, 2013 http://csg.csail.mit.edu/6.375 3

Dealing with data hazards Keep track of instructions in the pipeline and determine if the register values to be fetched are stale, i.e., will be modified by some older instruction still in the pipeline. This condition is referred to as a read-after-write (RAW) hazard Stall the Fetch from dispatching the instruction as long as RAW hazard prevails RAW hazard will disappear as the pipeline drains Scoreboard: A data structure to keep track of the instructions in the pipeline beyond the Fetch stage March 13, 2013 http://csg.csail.mit.edu/6.375

Data Hazard Data hazard depends upon the match between the source registers of the fetched instruction and the destination register of an instruction already in the pipeline The matching of source and destination must take into account whether the source and destination registers are valid function Bool isFound (Maybe#(RIndx) dst, Maybe#(RIndx) src); return isValid(dst) && isValid(src) && (validValue(dst)==validValue(src)); endfunction March 13, 2013 http://csg.csail.mit.edu/6.375

Scoreboard: Keeping track of instructions in execution Scoreboard: a data structure to keep track of the destination registers of the instructions beyond the fetch stage method insert: inserts the destination (if any) of an instruction in the scoreboard when the instruction is decoded method search1(src): searches the scoreboard for a data hazard method search2(src): same as search1 method remove: deletes the oldest entry when an instruction commits March 13, 2013 http://csg.csail.mit.edu/6.375

2-Stage-DH pipeline: Scoreboard and Stall logic fEpoch Register File eEpoch redirect PC Decode Execute pred d2e Inst Memory Data Memory scoreboard March 13, 2013 http://csg.csail.mit.edu/6.375 7

2-Stage-DH pipeline corrected module mkProc(Proc); Reg#(Addr) pc <- mkRegU; RFile rf <- mkRFile; IMemory iMem <- mkIMemory; DMemory dMem <- mkDMemory; Fifo#(Decode2Execute) d2e <- mkFifo; Reg#(Bool) fEpoch <- mkReg(False); Reg#(Bool) eEpoch <- mkReg(False); Fifo#(Addr) execRedirect <- mkFifo; Scoreboard#(1) sb <- mkScoreboard; // contains only one slot because Execute // can contain at most one instruction rule doFetch … rule doExecute … March 13, 2013 http://csg.csail.mit.edu/6.375 8

2-Stage-DH pipeline doFetch rule second attempt rule doFetch; let inst = iMem.req(pc); if(execRedirect.notEmpty) begin fEpoch <= !fEpoch; pc <= execRedirect.first; execRedirect.deq; end else begin let ppc = nextAddrPredictor(pc); pc <= ppc; let dInst = decode(inst); let stall = sb.search1(dInst.src1)|| sb.search2(dInst.src2); if(!stall) begin let rVal1 = rf.rd1(validRegValue(dInst.src1)); let rVal2 = rf.rd2(validRegValue(dInst.src2)); d2e.enq(Decode2Execute{pc: pc, ppc: ppc, dIinst: dInst, epoch: fEpoch, rVal1: rVal1, rVal2: rVal2}); sb.insert(dInst.rDst); end end endrule What should happen to pc when Fetch stalls? pc should change only when the instruction is enqueued in d2e March 13, 2013 http://csg.csail.mit.edu/6.375

2-Stage-DH pipeline doFetch rule corrected rule doFetch; let inst = iMem.req(pc); if(execRedirect.notEmpty) begin fEpoch <= !fEpoch; pc <= execRedirect.first; execRedirect.deq; end else begin let ppc = nextAddrPredictor(pc); pc <= ppc; let dInst = decode(inst); let stall = sb.search1(dInst.src1)|| sb.search2(dInst.src2); if(!stall) begin let rVal1 = rf.rd1(validRegValue(dInst.src1)); let rVal2 = rf.rd2(validRegValue(dInst.src2)); d2e.enq(Decode2Execute{pc: pc, ppc: ppc, dIinst: dInst, epoch: fEpoch, rVal1: rVal1, rVal2: rVal2}); sb.insert(dInst.rDst); end end endrule pc <= ppc; end March 13, 2013 http://csg.csail.mit.edu/6.375

2-Stage-DH pipeline doExecute rule corrected rule doExecute; let x = d2e.first; let dInst = x.dInst; let pc = x.pc; let ppc = x.ppc; let epoch = x.epoch; let rVal1 = x.rVal1; let rVal2 = x.rVal2; if(epoch == eEpoch) begin let eInst = exec(dInst, rVal1, rVal2, pc, ppc); if(eInst.iType == Ld) eInst.data <- dMem.req(MemReq{op:Ld, addr:eInst.addr, data:?}); else if (eInst.iType == St) let d <- dMem.req(MemReq{op:St, addr:eInst.addr, data:eInst.data}); if (isValid(eInst.dst)) rf.wr(validRegValue(eInst.dst), eInst.data); if(eInst.mispredict) begin execRedirect.enq(eInst.addr); eEpoch <= !eEpoch; end end d2e.deq; sb.remove; endrule March 13, 2013 http://csg.csail.mit.edu/6.375

Scoreboard method calls: concurrency and correctness issues rule doFetch; ... let dInst = decode(inst); let stall = sb.search1(dInst.src1)|| sb.search2(dInst.src2); if(!stall) begin ... sb.insert(dInst.rDst); pc <= ppc; end end endrule scoreboard must permit concurrent execution of search1, search2 and insert for this rule to execute. Should the result of search be affected by concurrent insert? possibly concurrent remove? No  search {<, CF} insert If search is affected by remove then the effect of removed instruction must have taken place. Otherwise make (search CF remove) March 13, 2013 http://csg.csail.mit.edu/6.375

Assumptions about method calls doFetch doExecute d2e redirect Register File Scoreboard remove search insert wr rd1 rd2 Within a rule d2e.first ? d2e.deq redirect.first ? redirect.deq sb.search ? sb.insert High-level correctness What must be true if sb.search > sb.remove ? < < < The effect of the removed instruction must have taken place, i.e., any updates to the register file must be visible to the subsequent register reads March 13, 2013 http://csg.csail.mit.edu/6.375

Concurrency analysis normal Register File (rf.rd < rf.wr) doFetch doExecute d2e redirect Register File Scoreboard remove search insert wr rd1 rd2 Assume both redirect and d2e are CF FIFOs, for concurrent execution  doFetch ? doExecute  {sb.search, sb.insert} ? sb.remove < In this case Bypass scoreboard, i.e., insert<remove is useless because the bypass path will never be used {<, CF} March 13, 2013 http://csg.csail.mit.edu/6.375

Concurrency analysis Bypass Register File (rf.rw < rf.wd) doFetch doExecute d2e redirect Register File Scoreboard remove search insert wr rd1 rd2 Assume both redirect and d2e are CF FIFOs, for concurrent execution  doExecute ? doFetch  sb.remove ? {sb.search, sb.insert} < {<, CF} March 13, 2013 http://csg.csail.mit.edu/6.375

Performance analysis Register File rf.wr < rf.rd doFetch doExecute d2e redirect Register File Scoreboard remove search insert wr rd1 rd2 rf.wr < rf.rd doExecute < doFetch {d2e.first, d2e.deq} {<, CF} d2e.enq Will pipelined d2e Fifo improve performance? redirect.enq {<, CF} {redirect.first, redirect.deq} Will bypassed redirect Fifo improve performance ? sb.remove {<, CF} {sb.search, sb.insert} Why CF Scoreboard is bad for performance? No Yes, by redirecting the PC a cycle earlier It negates the advantage of bypass register file by unnecessarily stalling on an instruction whose register updates are visible to fetch March 13, 2013 http://csg.csail.mit.edu/6.375

2-Stage-DH pipeline with proper specification of Fifos, rf, scoreboard module mkProc(Proc); Reg#(Addr) pc <- mkRegU; RFile rf <- mkBypassRFile; IMemory iMem <- mkIMemory; DMemory dMem <- mkDMemory; Fifo#(Decode2Execute) d2e <- mkPipelineFifo; Reg#(Bool) fEpoch <- mkReg(False); Reg#(Bool) eEpoch <- mkReg(False); Fifo#(Addr) execRedirect <- mkBypassFifo; Scoreboard#(1) sb <- mkPipelineScoreboard; // contains only one slot because Execute // can contain at most one instruction rule doFetch … rule doExecute … Can a destination register name appear more than once in the scoreboard ? March 13, 2013 http://csg.csail.mit.edu/6.375 17

Need to check for WAW hazards If multiple instructions in the scoreboard can update the register which the current instruction wants to read, then the current instruction has to read the update for the youngest of those instructions This issue can be avoided if only one instruction in the pipeline is allowed to update a particular register This means we have to stall if an instruction writes to the same destination as a previous instruction (WAW hazard) This may negate some advantage of bypassing A more advanced solution to avoid WAW hazards is register renaming WAW – Write-after-write Hazard March 13, 2013 http://csg.csail.mit.edu/6.375

Fetch rule with bypassing rule doFetch; let inst = iMem.req(pc); if(execRedirect.notEmpty) begin fEpoch <= !fEpoch; pc <= execRedirect.first; execRedirect.deq; end else begin let ppc = nextAddrPredictor(pc); let dInst = decode(inst); let stall = sb.search1(dInst.src1)|| sb.search2(dInst.src2); || sb.search3(dInst.dst); if(!stall) begin let rVal1 = rf.rd1(validRegValue(dInst.src1)); let rVal2 = rf.rd2(validRegValue(dInst.src2)); d2e.enq(Decode2Execute{pc: pc, ppc: ppc, dIinst: dInst, epoch: fEpoch, rVal1: rVal1, rVal2: rVal2}); sb.insert(dInst.rDst); pc <= ppc; end end endrule March 13, 2013 http://csg.csail.mit.edu/6.375

Multi-stage pipeline with Data Hazards March 13, 2013 http://csg.csail.mit.edu/6.375

3-Stage-DH pipeline Register File PC Decode Execute Inst Data Memory fEpoch Register File eEpoch nextPC e2c PC Decode Execute pred d2e Inst Memory Data Memory scoreboard Exec2Commit{Maybe#(RIndx)dst, Data data}; March 13, 2013 http://csg.csail.mit.edu/6.375 21

3-Stage-DH pipeline module mkProc(Proc); Reg#(Addr) pc <- mkRegU; RFile rf <- mkBypassRFile; IMemory iMem <- mkIMemory; DMemory dMem <- mkDMemory; Fifo#(1, Decode2Execute) d2e <- mkPipelineFifo; Fifo#(1, Exec2Commit) e2c <- mkPipelineFifo; Scoreboard#(2) sb <- mkPipelineScoreboard; // contains two instructions Reg#(Bool) fEpoch <- mkReg(False); Reg#(Bool) eEpoch <- mkReg(False); Fifo#(Addr) execRedirect <- mkBypassFifo; March 13, 2013 http://csg.csail.mit.edu/6.375 22

3-Stage-DH pipeline doFetch rule rule doFetch; let inst = iMem.req(pc); if(execRedirect.notEmpty) begin fEpoch <= !fEpoch; pc <= execRedirect.first; execRedirect.deq; end else begin let ppc = nextAddrPredictor(pc); let dInst = decode(inst); let stall = sb.search1(dInst.src1)|| sb.search2(dInst.src2) || sb.search3(dInst.dst);; if(!stall) begin let rVal1 = rf.rd1(validRegValue(dInst.src1)); let rVal2 = rf.rd2(validRegValue(dInst.src2)); d2e.enq(Decode2Execute{pc: pc, ppc: ppc, dIinst: dInst, epoch: fEpoch, rVal1: rVal1, rVal2: rVal2}); sb.insert(dInst.rDst); pc <= ppc; end end endrule Unchanged from 2-stage DH March 13, 2013 http://csg.csail.mit.edu/6.375

3-Stage-DH pipeline doExecute rule rule doExecute; let x = d2e.first; let dInst = x.dInst; let pc = x.pc; let ppc = x.ppc; let epoch = x.epoch; let rVal1 = x.rVal1; let rVal2 = x.rVal2; if(epoch == eEpoch) begin let eInst = exec(dInst, rVal1, rVal2, pc, ppc); if(eInst.iType == Ld) eInst.data <- dMem.req(MemReq{op:Ld, addr:eInst.addr, data:?}); else if (eInst.iType == St) let d <- dMem.req(MemReq{op:St, addr:eInst.addr, data:eInst.data}); if (isValid(eInst.dst)) rf.wr(validRegValue(eInst.dst), eInst.data); if(eInst.mispredict) begin execRedirect.enq(eInst.addr); eEpoch <= !eEpoch; end end d2e.deq; sb.remove; endrule e2c.enq(Exec2Commit{dst:eInst.dst, data:eInst.data}); else e2c.enq(Exec2Commit{dst:Invalid, data:?}); March 13, 2013 http://csg.csail.mit.edu/6.375

3-Stage-DH pipeline doCommit rule rule doCommit; let dst = eInst.first.dst; let data = eInst.first.data; if(isValid(dst)) rf.wr(tuple2(validValue(dst), data); e2c.deq; sb.remove; endrule March 13, 2013 http://csg.csail.mit.edu/6.375 25

6-Stage-DH pipeline Register File PC Decode Execute Inst Memory Data fEpoch Register File eEpoch nextPC e2c PC Decode Execute pred d2e Inst Memory Data Memory scoreboard Fetch Decode RegRead Execute Mem WB March 13, 2013 http://csg.csail.mit.edu/6.375 26

Take away Systematic application of pipelining requires a deep understanding of hazards especially in the presence of concurrent operations Performance issues are subtle For instance, the value of having a bypass network depends on how frequently it is exercised by programs March 13, 2013 http://csg.csail.mit.edu/6.375

Module implementations scoreboard can be implemented using searchable Fifos in a straightforward manner March 13, 2013 http://csg.csail.mit.edu/6.375

Normal Register File module mkRFile(RFile); Vector#(32,Reg#(Data)) rfile <- replicateM(mkReg(0)); method Action wr(RIndx rindx, Data data); if(rindx!=0) rfile[rindx] <= data; endmethod method Data rd1(RIndx rindx) = rfile[rindx]; method Data rd2(RIndx rindx) = rfile[rindx]; endmodule {rd1, rd2} < wr March 13, 2013 http://csg.csail.mit.edu/6.375 29

Bypass Register File using EHR module mkBypassRFile(RFile); Vector#(32,Ehr#(2, Data)) rfile <- replicateM(mkEhr(0)); method Action wr(RIndx rindx, Data data); if(rindex!==0) (rfile[rindex])[0] <= data; endmethod method Data rd1(RIndx rindx) = (rfile[rindx])[1]; method Data rd2(RIndx rindx) = (rfile[rindx])[1]; endmodule wr < {rd1, rd2} March 13, 2013 http://csg.csail.mit.edu/6.375 30

Bypass Register File with external bypassing module mkBypassRFile(BypassRFile); RFile rf <- mkRFile; Fifo#(1, Tuple2#(RIndx, Data)) bypass <- mkBypassSFifo; rule move; begin rf.wr(bypass.first); bypass.deq end; endrule method Action wr(RIndx rindx, Data data); if(rindex!==0) bypass.enq(tuple2(rindx, data)); endmethod method Data rd1(RIndx rindx) = return (!bypass.search1(rindx)) ? rf.rd1(rindx) : bypass.read1(rindx); method Data rd2(RIndx rindx) = return (!bypass.search2(rindx)) ? rf.rd2(rindx) : bypass.read2(rindx); endmodule {rf.rd1, rf.rd2} < rf.wr wr < {rd1, rd2} March 13, 2013 http://csg.csail.mit.edu/6.375 31

Scoreboard implementation using searchable Fifos function Bool isFound (Maybe#(RIndx) dst, Maybe#(RIndx) src); return isValid(dst) && isValid(src) && (validValue(dst)==validValue(src)); endfunction module mkCFScoreboard(Scoreboard#(size)); SFifo#(size, Maybe#(RIndx), Maybe#(RIndx)) f <- mkCFSFifo(isFound); method insert = f.enq; method remove = f.deq; method search1 = f.search1; method search2 = f.search2; endmodule March 13, 2013 http://csg.csail.mit.edu/6.375