Computer Architecture: A Constructive Approach Branch Prediction - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.


Computer Architecture: A Constructive Approach
Branch Prediction - 2
Arvind
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
April 11, 2012

Two-Stage pipeline: a robust two-rule solution
[Diagram: PC, Inst Memory, Decode, Register File, Execute, Data Memory, +4; pipeline register ir (Pipeline FIFO); nextPC (Bypass FIFO); fEpoch and eEpoch registers]
Either FIFO can be a normal (>1 element) FIFO.
http://csg.csail.mit.edu/6.S078

Decoupled Fetch and Execute
[Diagram: Fetch and Execute connected by ir (Fetch to Execute) and nextPC (Execute to Fetch)]
Properly decoupled systems permit greater freedom in the independent refinement of blocks.
FIFOs must permit concurrent enq and deq:
- For pipelined behavior, ir must behave as deq < enq.
- For proper scheduling, nextPC must behave as enq < deq (deq < enq would be just wrong).

Three one-element FIFOs
- Ordinary: no concurrent enq/deq.
- Pipeline: deq before enq; combinational path (notFull is ORed with deq).
- Bypass: enq before deq; combinational path (notEmpty is ORed with enq).
Pipeline and Bypass FIFOs can create combinational cycles in the presence of feedback.
[Diagram: Ordinary, Pipeline, and Bypass FIFOs, each with enq, deq, notFull and notEmpty signals]

Multi-element FIFOs
Normal FIFO:
- Permits concurrent enq and deq when notFull and notEmpty.
- Unlike a pipeline FIFO, does not permit enq when full, even if there is a concurrent deq.
- Unlike a bypass FIFO, does not permit deq when empty, even if there is a concurrent enq.
Normal FIFO implementations have at least two elements, but they have no combinational paths between enq and deq. This makes it easier to reduce critical paths, at the expense of area.
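The enq/deq orderings above can be observed with FIFO constructors from the Bluespec libraries: mkFIFOF1 and mkFIFOF (package FIFOF), and mkPipelineFIFOF and mkBypassFIFOF (package SpecialFIFOs). The testbench below is a hypothetical sketch, not lecture code: a producer rule and a consumer rule share one FIFO, and swapping the constructor should change how often both rules can fire in the same cycle. The module and register names are illustrative.

```bsv
import FIFOF        :: *;
import SpecialFIFOs :: *;

// Hypothetical testbench (not from the lecture): a producer and a consumer
// rule communicate through a single FIFO.
module mkFifoTb (Empty);
   // Swap the constructor to compare behaviours:
   //   mkFIFOF1        one element, ordinary: enq and deq never fire together
   //   mkPipelineFIFOF one element, deq < enq: enq allowed when full if deq also fires
   //   mkBypassFIFOF   one element, enq < deq: deq allowed when empty if enq also fires
   //   mkFIFOF         two elements, no combinational path between enq and deq
   FIFOF#(Bit#(8)) q <- mkPipelineFIFOF;

   Reg#(Bit#(8))  nextVal <- mkReg(0);
   Reg#(Bit#(32)) cycle   <- mkReg(0);

   // Free-running cycle counter; stop the simulation after 20 cycles.
   rule tick;
      cycle <= cycle + 1;
      if (cycle == 20) $finish(0);
   endrule

   // Producer: enq is implicitly guarded by the FIFO's notFull.
   rule produce (nextVal < 8);
      q.enq(nextVal);
      nextVal <= nextVal + 1;
   endrule

   // Consumer: first/deq are implicitly guarded by the FIFO's notEmpty.
   rule consume;
      $display("cycle %0d: consumed %0d", cycle, q.first);
      q.deq;
   endrule
endmodule
```

With mkPipelineFIFOF one would expect one value consumed per cycle once the FIFO has filled; with mkFIFOF1 the two rules should alternate, roughly halving throughput; with mkBypassFIFOF a value can be consumed in the same cycle it is produced.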

A decoupled solution using epochs
- Add fEpoch and eEpoch registers to the processor state; initialize them to the same value.
- The epoch changes whenever Execute determines that the pc prediction is wrong. This change is reflected immediately in eEpoch and eventually in fEpoch via the nextPC FIFO.
- Associate fEpoch with every instruction when it is fetched.
- In the execute stage, reject (i.e., kill) the instruction if its epoch does not match eEpoch.

Two-stage pipeline: decoupled

    module mkProc(Proc);
      Reg#(Addr) pc <- mkRegU;
      RFile rf <- mkRFile;
      IMemory iMem <- mkIMemory;
      DMemory dMem <- mkDMemory;
      PipeReg#(TypeFetch2Decode) ir <- mkPipeReg;
      Reg#(Bool) fEpoch <- mkReg(False);
      Reg#(Bool) eEpoch <- mkReg(False);
      FIFOF#(Tuple2#(Addr,Bool)) nextPC <- mkBypassFIFOF;

      rule doFetch … endrule
      rule doExecute … endrule
    endmodule

Two-stage pipeline: doFetch rule

    rule doFetch (ir.notFull);            // explicit guard
      let inst = iMem(pc);
      ir.enq(TypeFetch2Decode {pc:pc, epoch:fEpoch, inst:inst});
      if (nextPC.notEmpty) begin
        match {.ipc, .epoch} = nextPC.first;
        pc <= ipc;
        fEpoch <= epoch;
        nextPC.deq;
      end
      else pc <= pc + 4;                  // simple branch prediction
    endrule

Two-stage pipeline: doExecute rule

    rule doExecute (ir.notEmpty);
      let irpc = ir.first.pc;
      let inst = ir.first.inst;
      if (ir.first.epoch == eEpoch) begin
        let eInst = decodeExecute(irpc, inst, rf);
        let memData <- dMemAction(eInst, dMem);
        regUpdate(eInst, memData, rf);
        if (eInst.brTaken) begin
          let nepoch = !eEpoch;           // the one-bit epoch simply flips
          eEpoch <= nepoch;
          nextPC.enq(tuple2(eInst.addr, nepoch));
        end
      end
      ir.deq;                             // wrong-path instructions are dropped
    endrule
    endmodule

Two-Stage pipeline with a Branch Predictor
[Diagram: PC, Inst Memory, Decode, Register File, Execute, Data Memory; pipeline register ir carrying the predicted pc (ppc); nextPC; fEpoch and eEpoch; Branch Predictor]

Branch Predictor Interface

    interface NextAddressPredictor;
      method Addr prediction(Addr pc);
      method Action update(Addr pc, Addr target);
    endinterface

Example: Null Branch Prediction

    module mkNeverTaken(NextAddressPredictor);
      method Addr prediction(Addr pc);
        return pc+4;
      endmethod
      method Action update(Addr pc, Addr target);
        noAction;
      endmethod
    endmodule

Replaces PC+4 with …
- Already implemented in the pipeline
- Right most of the time. Why?

Example: Branch Target Prediction (BTB)

    module mkBTB(NextAddressPredictor);
      RegFile#(LineIdx, Addr) tagArr <- mkRegFileFull;
      RegFile#(LineIdx, Addr) targetArr <- mkRegFileFull;

      method Addr prediction(Addr pc);
        LineIdx index = truncate(pc >> 2);
        let tag = tagArr.sub(index);
        let target = targetArr.sub(index);
        if (tag == pc) return target;
        else return (pc+4);
      endmethod

      method Action update(Addr pc, Addr target);
        LineIdx index = truncate(pc >> 2);
        tagArr.upd(index, pc);
        targetArr.upd(index, target);
      endmethod
    endmodule
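To exercise the BTB in isolation, here is a minimal self-checking sketch. It assumes 32-bit addresses and a 64-entry table (the widths of Addr and LineIdx are not given on the slide), and it adds a per-entry valid bit, which is not on the slide: mkRegFileFull contents are undefined after reset, so a tag comparison alone could in principle match garbage. The testbench mkBTBTb and the check helper are hypothetical.

```bsv
import RegFile :: *;
import Vector  :: *;

typedef Bit#(32) Addr;     // assumption: 32-bit addresses
typedef Bit#(6)  LineIdx;  // assumption: 64-entry BTB

interface NextAddressPredictor;
   method Addr prediction(Addr pc);
   method Action update(Addr pc, Addr target);
endinterface

module mkBTB (NextAddressPredictor);
   RegFile#(LineIdx, Addr) tagArr    <- mkRegFileFull;
   RegFile#(LineIdx, Addr) targetArr <- mkRegFileFull;
   // Assumption (not on the slide): valid bits, cleared at reset.
   Vector#(64, Reg#(Bool)) valid <- replicateM(mkReg(False));

   method Addr prediction(Addr pc);
      LineIdx index = truncate(pc >> 2);   // word-aligned PCs: drop the low 2 bits
      if (valid[index] && tagArr.sub(index) == pc)
         return targetArr.sub(index);      // hit: predict the stored target
      else
         return pc + 4;                    // miss: fall back to pc+4
   endmethod

   method Action update(Addr pc, Addr target);
      LineIdx index = truncate(pc >> 2);
      tagArr.upd(index, pc);
      targetArr.upd(index, target);
      valid[index] <= True;
   endmethod
endmodule

// Hypothetical helper: print a message and stop with a non-zero status on failure.
function Action check(Bool ok, String msg);
   action
      if (!ok) begin
         $display("FAIL: %s", msg);
         $finish(1);
      end
   endaction
endfunction

// Hypothetical testbench: one step per cycle.
module mkBTBTb (Empty);
   NextAddressPredictor btb <- mkBTB;
   Reg#(Bit#(3)) step <- mkReg(0);

   rule s0 (step == 0);
      check(btb.prediction(32'h100) == 32'h104, "cold BTB must predict pc+4");
      step <= 1;
   endrule

   rule s1 (step == 1);
      btb.update(32'h100, 32'h200);   // e.g. a taken branch at 'h100
      step <= 2;
   endrule

   rule s2 (step == 2);
      check(btb.prediction(32'h100) == 32'h200, "trained entry must predict its target");
      step <= 3;
   endrule

   rule s3 (step == 3);
      // 'h200 maps to the same index as 'h100 (index 0), but the tag differs.
      check(btb.prediction(32'h200) == 32'h204, "aliasing PC must miss on the tag");
      step <= 4;
   endrule

   rule s4 (step == 4);
      $display("BTB checks passed");
      $finish(0);
   endrule
endmodule
```

The last check is why the full PC is stored as the tag: two PCs that share an index do not steal each other's prediction.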

Two-stage pipeline + BP

    module mkProc(Proc);
      Reg#(Addr) pc <- mkRegU;
      RFile rf <- mkRFile;
      IMemory iMem <- mkIMemory;
      DMemory dMem <- mkDMemory;
      PipeReg#(TypeFetch2Decode) ir <- mkPipeReg;
      Reg#(Bool) fEpoch <- mkReg(False);
      Reg#(Bool) eEpoch <- mkReg(False);
      FIFOF#(Tuple3#(Addr,Addr,Bool)) nextPC <- mkBypassFIFOF;
      NextAddressPredictor bpred <- mkNeverTaken;   // some target predictor

The definition of TypeFetch2Decode is changed to include the predicted pc:

    typedef struct {
      Addr pc;
      Addr ppc;
      Bool epoch;
      Data inst;
    } TypeFetch2Decode deriving (Bits, Eq);

Two-stage pipeline + BP: Fetch rule

    rule doFetch (ir.notFull);
      let ppc = bpred.prediction(pc);
      let inst = iMem(pc);
      ir.enq(TypeFetch2Decode {pc:pc, ppc:ppc, epoch:fEpoch, inst:inst});
      if (nextPC.notEmpty) begin
        match {.ipc, .ippc, .epoch} = nextPC.first;
        pc <= ippc;
        fEpoch <= epoch;
        nextPC.deq;
        bpred.update(ipc, ippc);
      end
      else pc <= ppc;
    endrule

Two-stage pipeline + BP: Execute rule

    rule doExecute (ir.notEmpty);
      let irpc = ir.first.pc;
      let inst = ir.first.inst;
      let irppc = ir.first.ppc;
      if (ir.first.epoch == eEpoch) begin
        let eInst = decodeExecute(irpc, irppc, inst, rf);
        let memData <- dMemAction(eInst, dMem);
        regUpdate(eInst, memData, rf);
        if (eInst.missPrediction) begin
          let nepoch = !eEpoch;
          eEpoch <= nepoch;
          nextPC.enq(tuple3(irpc, eInst.brTaken ? eInst.addr : irpc+4, nepoch));
        end
      end
      ir.deq;
    endrule
    endmodule

Requires changes in decodeExecute to return missPrediction in addition to brTaken.

Execute Function

    function ExecInst exec(DecodedInst dInst, Data rVal1, Data rVal2, Addr pc, Addr ppc);
      ExecInst einst = ?;
      let aluVal2 = dInst.immValid ? dInst.imm : rVal2;
      let aluRes = alu(rVal1, aluVal2, dInst.aluFunc);
      let brAddr = brAddrCal(pc, rVal1, dInst.iType, dInst.imm);
      einst.itype = dInst.iType;
      einst.addr = memType(dInst.iType) ? aluRes : brAddr;
      einst.data = dInst.iType == St ? rVal2 : aluRes;
      einst.brTaken = aluBr(rVal1, aluVal2, dInst.brComp);
      einst.missPrediction = einst.brTaken ? brAddr != ppc : (pc+4) != ppc;
      einst.rDst = dInst.rDst;
      return einst;
    endfunction
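The missPrediction expression above reduces to one comparison: compute the actual next PC (the branch target if taken, pc+4 otherwise) and compare it with the predicted ppc. The sketch below isolates that comparison as a hypothetical helper, nextPcMispredicted, with a small self-checking testbench; the helper name and the 32-bit Addr width are assumptions. Because non-branch instructions have brTaken False, any ppc other than pc+4 is also flagged for them.

```bsv
typedef Bit#(32) Addr;   // assumption: 32-bit addresses

// Hypothetical helper: same logic as einst.missPrediction in exec.
function Bool nextPcMispredicted(Bool brTaken, Addr brAddr, Addr pc, Addr ppc);
   Addr actualNext = brTaken ? brAddr : pc + 4;
   return actualNext != ppc;
endfunction

// Hypothetical helper: print a message and stop with a non-zero status on failure.
function Action check(Bool ok, String msg);
   action
      if (!ok) begin
         $display("FAIL: %s", msg);
         $finish(1);
      end
   endaction
endfunction

module mkMissPredTb (Empty);
   Reg#(Bool) done <- mkReg(False);

   rule run (!done);
      // Not taken, predicted pc+4: correct.
      check(!nextPcMispredicted(False, 32'h80, 32'h10, 32'h14), "not taken, ppc = pc+4");
      // Taken, but predicted pc+4: mispredicted.
      check(nextPcMispredicted(True, 32'h80, 32'h10, 32'h14), "taken, ppc = pc+4");
      // Taken, and the target was predicted: correct.
      check(!nextPcMispredicted(True, 32'h80, 32'h10, 32'h80), "taken, ppc = target");
      // Not taken, but the predictor said taken: mispredicted.
      check(nextPcMispredicted(False, 32'h80, 32'h10, 32'h80), "not taken, ppc = target");
      done <= True;
   endrule

   rule finish (done);
      $display("missPrediction checks passed");
      $finish(0);
   endrule
endmodule
```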

Multiple predictors
For multiple predictors to make sense, we first need a pipeline with more than two stages. With a slightly different pipeline (even a 2-stage one), we also need to resolve data hazards at the same time.
Plan:
- Present a different two-stage pipeline with data hazards
- Present a three-stage pipeline with
  - one branch predictor
  - two branch predictors

A different 2-Stage pipeline
[Diagram: PC, Inst Memory, Decode, Register File, Execute, Data Memory; pipeline register itr; nextPC; fEpoch and eEpoch; Branch Predictor; stall]

TypeDecode2Execute

    typedef struct {
      Addr pc;
      Addr ppc;
      Bool epoch;
      DecodedInst dInst;
      Data rVal1;
      Data rVal2;
    } TypeDecode2Execute deriving (Bits, Eq);

The operands are carried as values instead of register names.

The stall function

    function Bool stall(Maybe#(Rindx) src1, Maybe#(Rindx) src2,
                        PipeReg#(TypeDecode2Execute) itr);
      let dst = itr.first.dInst.rDst;
      return (itr.notEmpty && isValid(dst) &&
              ((validValue(dst) == validValue(src1) && isValid(src1)) ||
               (validValue(dst) == validValue(src2) && isValid(src2))));
    endfunction

src1, src2 and rDst in DecodedInst are changed from Rindx to Maybe#(Rindx) to determine the stall condition.
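The same stall condition can be written as a pure function over the Maybe values, which makes it easy to test without a pipeline register. In this hypothetical sketch the head of itr is passed in as two plain arguments (itrNotEmpty and itrDst) instead of the PipeReg interface; the helper names and the 5-bit Rindx width are assumptions.

```bsv
typedef Bit#(5) Rindx;   // assumption: 32 architectural registers

// True when src names the register that dst will write.
function Bool dependsOn(Maybe#(Rindx) src, Maybe#(Rindx) dst);
   return isValid(src) && isValid(dst) && validValue(src) == validValue(dst);
endfunction

// Hypothetical pure form of stall: itrNotEmpty and itrDst stand in for
// itr.notEmpty and itr.first.dInst.rDst.
function Bool stallOn(Maybe#(Rindx) src1, Maybe#(Rindx) src2,
                      Bool itrNotEmpty, Maybe#(Rindx) itrDst);
   return itrNotEmpty && (dependsOn(src1, itrDst) || dependsOn(src2, itrDst));
endfunction

// Hypothetical helper: print a message and stop with a non-zero status on failure.
function Action check(Bool ok, String msg);
   action
      if (!ok) begin
         $display("FAIL: %s", msg);
         $finish(1);
      end
   endaction
endfunction

module mkStallTb (Empty);
   Reg#(Bool) done <- mkReg(False);

   rule run (!done);
      check(stallOn(tagged Valid 3, tagged Invalid, True, tagged Valid 3), "RAW on src1 stalls");
      check(stallOn(tagged Invalid, tagged Valid 3, True, tagged Valid 3), "RAW on src2 stalls");
      check(!stallOn(tagged Valid 3, tagged Valid 4, True, tagged Valid 5), "independent registers do not stall");
      check(!stallOn(tagged Valid 3, tagged Invalid, True, tagged Invalid), "no destination, no stall");
      check(!stallOn(tagged Valid 3, tagged Invalid, False, tagged Valid 3), "empty itr never stalls");
      done <= True;
   endrule

   rule finish (done);
      $display("stall checks passed");
      $finish(0);
   endrule
endmodule
```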

A different 2-Stage pipeline

    module mkProc(Proc);
      Reg#(Addr) pc <- mkRegU;
      RFile rf <- mkConfigRFile;
      IMemory iMem <- mkIMemory;
      DMemory dMem <- mkDMemory;
      PipeReg#(TypeDecode2Execute) itr <- mkConfigPipeReg;
      Reg#(Bool) fEpoch <- mkReg(False);
      Reg#(Bool) eEpoch <- mkReg(False);
      FIFOF#(Tuple3#(Addr,Addr,Bool)) nextPC <- mkBypassFIFOF;
      NextAddressPredictor bpred <- mkNeverTaken;

A different 2-Stage pipeline: doFetch rule

    rule doFetch (itr.notFull);
      let inst = iMem(pc);
      let dInst = decode(inst);
      if (!stall(dInst.src1, dInst.src2, itr)) begin
        let ppc = bpred.prediction(pc);
        let rVal1 = rf.rd1(validValue(dInst.src1));
        let rVal2 = rf.rd2(validValue(dInst.src2));
        itr.enq(TypeDecode2Execute {pc:pc, ppc:ppc, epoch:fEpoch,
                dInst:dInst, rVal1:rVal1, rVal2:rVal2});
        if (nextPC.notEmpty) begin
          match {.ipc, .ippc, .epoch} = nextPC.first;
          pc <= ippc;
          fEpoch <= epoch;
          nextPC.deq;
          bpred.update(ipc, ippc);
        end
        else pc <= ppc;
      end
    endrule

A different 2-Stage pipeline: doExecute rule

    rule doExecute (itr.notEmpty);
      let itrpc = itr.first.pc;
      let dInst = itr.first.dInst;
      let itrppc = itr.first.ppc;
      let rVal1 = itr.first.rVal1;
      let rVal2 = itr.first.rVal2;
      if (itr.first.epoch == eEpoch) begin
        let eInst = exec(dInst, rVal1, rVal2, itrpc, itrppc);
        let memData <- dMemAction(eInst, dMem);
        regUpdate(eInst, memData, rf);
        if (eInst.missPrediction) begin
          let nepoch = !eEpoch;
          eEpoch <= nepoch;
          nextPC.enq(tuple3(itrpc, eInst.brTaken ? eInst.addr : itrpc+4, nepoch));
        end
      end
      itr.deq;
    endrule
    endmodule

Concurrency analysis
- nextPC bypass FIFO functionality: enq < deq. Hence doExecute happens before doFetch every cycle.
- itr pipeline FIFO functionality: deq < enq. Hence doExecute happens before doFetch every cycle.
- itr pipeline FIFO functionality: first < deq. Hence doFetch happens before doExecute every cycle, because doFetch reads itr.first to determine the stall condition. Use a config pipeline FIFO to remove this scheduling constraint.
- mkRFile functionality: {rd1, rd2} < wr. Hence doFetch happens before doExecute every cycle. Use mkConfigRFile to remove this scheduling constraint.
The first two items require doExecute < doFetch while the last two require doFetch < doExecute; the config versions remove the last two constraints so that a single consistent order exists.
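One way to obtain a register file without the {rd1, rd2} < wr ordering is to build it from configuration registers (mkConfigReg in the ConfigReg package), whose _read and _write are conflict-free: a read in the same cycle as a write returns the old value. The sketch below is an assumption about what mkConfigRFile might look like, not the course's implementation; the RFile interface shape, the 32-entry size, and hard-wiring register 0 to zero are illustrative choices. Returning the stale value should be safe in this pipeline because stall examines itr.first, which still holds the instruction whose result doExecute is writing back, so a dependent instruction waits a cycle.

```bsv
import Vector    :: *;
import ConfigReg :: *;

typedef Bit#(5)  Rindx;  // assumption: 32 registers
typedef Bit#(32) Data;   // assumption: 32-bit data

interface RFile;
   method Action wr(Rindx rindx, Data data);
   method Data rd1(Rindx rindx);
   method Data rd2(Rindx rindx);
endinterface

module mkConfigRFile (RFile);
   // Config registers: _read and _write are conflict-free, so rd1/rd2 and wr
   // impose no ordering between the rules that call them.
   Vector#(32, Reg#(Data)) rfile <- replicateM(mkConfigReg(0));

   method Action wr(Rindx rindx, Data data);
      if (rindx != 0) rfile[rindx] <= data;   // assumption: register 0 stays zero
   endmethod

   method Data rd1(Rindx rindx);
      return rfile[rindx];   // value from the start of the cycle
   endmethod

   method Data rd2(Rindx rindx);
      return rfile[rindx];
   endmethod
endmodule

// Hypothetical helper: print a message and stop with a non-zero status on failure.
function Action check(Bool ok, String msg);
   action
      if (!ok) begin
         $display("FAIL: %s", msg);
         $finish(1);
      end
   endaction
endfunction

// Hypothetical testbench.
module mkRFileTb (Empty);
   RFile rf <- mkConfigRFile;
   Reg#(Bit#(2)) step <- mkReg(0);

   rule s0 (step == 0);
      rf.wr(1, 42);
      check(rf.rd1(1) == 0, "same-cycle read returns the old value");
      step <= 1;
   endrule

   rule s1 (step == 1);
      check(rf.rd1(1) == 42, "next-cycle read sees the write");
      rf.wr(0, 7);
      step <= 2;
   endrule

   rule s2 (step == 2);
      check(rf.rd2(0) == 0, "register 0 ignores writes");
      step <= 3;
   endrule

   rule s3 (step == 3);
      $display("register file checks passed");
      $finish(0);
   endrule
endmodule
```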

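3-Stage pipeline – 1 predictor
[Diagram: PC, Inst Memory, Decode, Register File, Execute, Data Memory; pipeline registers ir (Fetch to Decode) and itr (Decode to Execute); two nextPC FIFOs (Execute to Decode, Decode to Fetch); fEpoch, dEpoch and eEpoch; Branch Predictor; stall]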

3-Stage pipeline – 1 predictor

    module mkProc(Proc);
      Reg#(Addr) pc <- mkRegU;
      RFile rf <- mkConfigRFile;
      IMemory iMem <- mkIMemory;
      DMemory dMem <- mkDMemory;
      PipeReg#(TypeFetch2Decode) ir <- mkPipeReg;
      PipeReg#(TypeDecode2Execute) itr <- mkConfigPipeReg;
      Reg#(Bool) fEpoch <- mkReg(False);
      Reg#(Bool) dEpoch <- mkReg(False);
      Reg#(Bool) eEpoch <- mkReg(False);
      FIFOF#(Tuple2#(Addr,Addr)) nextPCE2D <- mkBypassFIFOF;
      FIFOF#(Tuple2#(Addr,Addr)) nextPCD2F <- mkBypassFIFOF;
      NextAddressPredictor bpred <- mkNeverTaken;

3-Stage pipeline – 1 predictor: doFetch rule

    rule doFetch (ir.notFull);
      let inst = iMem(pc);
      let ppc = bpred.prediction(pc);
      ir.enq(TypeFetch2Decode {pc:pc, ppc:ppc, epoch:fEpoch, inst:inst});
      if (nextPCD2F.notEmpty) begin
        match {.ipc, .ippc} = nextPCD2F.first;
        pc <= ippc;
        fEpoch <= !fEpoch;
        nextPCD2F.deq;
        bpred.update(ipc, ippc);
      end
      else pc <= ppc;
    endrule

3-Stage pipeline – 1 predictor: doDecode rule

    rule doDecode (itr.notFull && ir.notEmpty);
      let irpc = ir.first.pc;
      let irppc = ir.first.ppc;
      let inst = ir.first.inst;
      if (nextPCE2D.notEmpty) begin
        dEpoch <= !dEpoch;
        nextPCD2F.enq(nextPCE2D.first);
        nextPCE2D.deq;
        ir.deq;
      end
      else if (ir.first.epoch == dEpoch) begin
        let dInst = decode(inst);
        if (!stall(dInst.src1, dInst.src2, itr)) begin
          let rVal1 = rf.rd1(validValue(dInst.src1));
          let rVal2 = rf.rd2(validValue(dInst.src2));
          itr.enq(TypeDecode2Execute {pc:irpc, ppc:irppc, epoch:dEpoch,
                  dInst:dInst, rVal1:rVal1, rVal2:rVal2});
          ir.deq;
        end
      end
      else ir.deq;
    endrule

3-Stage pipeline – 1 predictor: doExecute rule

    rule doExecute (itr.notEmpty);
      let itrpc = itr.first.pc;
      let dInst = itr.first.dInst;
      let itrppc = itr.first.ppc;
      let rVal1 = itr.first.rVal1;
      let rVal2 = itr.first.rVal2;
      if (itr.first.epoch == eEpoch) begin
        let eInst = exec(dInst, rVal1, rVal2, itrpc, itrppc);
        let memData <- dMemAction(eInst, dMem);
        regUpdate(eInst, memData, rf);
        if (eInst.missPrediction) begin
          nextPCE2D.enq(tuple2(itrpc, eInst.brTaken ? eInst.addr : itrpc+4));
          eEpoch <= !eEpoch;
        end
      end
      itr.deq;
    endrule
    endmodule