Computer Architecture: A Constructive Approach. Next Address Prediction – Six Stage Pipeline. Joel Emer, Computer Science & Artificial Intelligence Lab., Massachusetts Institute of Technology.

Presentation transcript:

Computer Architecture: A Constructive Approach
Next Address Prediction – Six Stage Pipeline
Joel Emer, Computer Science & Artificial Intelligence Lab., Massachusetts Institute of Technology
April 18, 2012 (L18-1)

Six Stage Pipeline (March 19, L12-2)
[Diagram: F Fetch, D Decode, R Reg Read, X Execute, M Memory, W Write-back]
Need to add next address prediction.

Next Address Prediction (April 18, 2012, L18-3, http://csg.csail.mit.edu/6.S078)
[Diagram: F Fetch, D Decode, R Reg Read, X Execute, M Memory, W Write-back, plus a Next Address Prediction block]
The feedback to Fetch is now a redirect plus prediction (training) feedback, not just the branch target PC.

Branch Target Buffer
- F stage: if (hit) then nPC = target, else nPC = PC+4.
- X stage: check the prediction. If it is wrong, kill the younger instructions and train the BTB (sometimes even if the prediction was correct).
[Diagram: the PC addresses IMEM and, in parallel, a Branch Target Buffer of 2^k entries indexed by k bits of the PC. Each entry holds a tag and a predicted target, and comparing the tag with the PC (=) produces hit.]
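The F-stage lookup and X-stage training just described fit in a few lines of Python. This is a behavioral sketch, not the lab's code. The class name `DirectMappedBTB` and its method names are hypothetical. The model assumes 4-byte-aligned instructions, decimal addresses as printed on the slides, and the full PC kept as the tag.

```python
class DirectMappedBTB:
    """Hypothetical behavioral model of a direct-mapped BTB with 2**k entries."""

    def __init__(self, k: int):
        self.mask = (1 << k) - 1
        self.tags = [None] * (1 << k)     # None marks an empty entry
        self.targets = [0] * (1 << k)

    def index(self, pc: int) -> int:
        # Drop the 2 byte-offset bits, keep the low k bits
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> int:
        """F stage: nPC = target on a hit, else PC+4."""
        i = self.index(pc)
        return self.targets[i] if self.tags[i] == pc else pc + 4

    def train(self, pc: int, correct: bool, real_target: int) -> None:
        """X stage: on a mispredict, install (or overwrite) the entry for pc."""
        if not correct:
            i = self.index(pc)
            self.tags[i] = pc
            self.targets[i] = real_target


btb = DirectMappedBTB(k=6)                   # 64 entries
print(btb.predict(0))                        # 4 (miss)
btb.train(0, correct=False, real_target=40)  # e.g. "j 40" at address 0
print(btb.predict(0))                        # 40 (hit)
```

Because the whole PC is the tag, an aliasing address such as 256 (same index as 0) misses instead of borrowing the wrong target. Training it then evicts the entry for 0.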

BTB Interface

    typedef Addr NaInfo;   // predictor-specific information saved now, used later to train the predictor
    typedef Tuple2#(Addr, NaInfo) Prediction;

    interface NextAddrPred;
        method ActionValue#(Prediction) predict(Addr addr);
        method Action train(NaInfo naInfo, Bool correct, Addr realTarget);
    endinterface

In the lab code, NaInfo has more elements and train() takes more arguments, to allow for more sophisticated predictors.

BTB State

    import RegFile::*;

    typedef 64 BTBRows;
    typedef Bit#(TLog#(BTBRows)) LineIndex;

    module mkNextAddrPred(NextAddrPred);
        // BTB state
        RegFile#(LineIndex, Addr) tagArray    <- mkRegFileFull();
        RegFile#(LineIndex, Addr) targetArray <- mkRegFileFull();

BTB Prediction

    method ActionValue#(Prediction) predict(Addr currentAddr);
        LineIndex index = truncate(currentAddr >> 2);   // drop the byte-offset bits
        let tag    = tagArray.sub(index);
        let target = targetArray.sub(index);
        Addr predNextAddr = ?;
        if (tag == currentAddr)
            predNextAddr = target;            // hit: predicted target
        else
            predNextAddr = currentAddr + 4;   // miss: fall through
        return tuple2(predNextAddr, currentAddr);   // NaInfo is the fetch PC
    endmethod

BTB Training

    method Action train(NaInfo naInfo, Bool correct, Addr target);
        let tag = naInfo;                          // NaInfo is the fetch PC
        LineIndex index = truncate(naInfo >> 2);
        if (!correct) begin                        // install or overwrite on a mispredict
            tagArray.upd(index, tag);
            targetArray.upd(index, target);
        end
    endmethod
    endmodule

Note: if the BTB had been 2-way set associative, naInfo would include the 'way', and train() would not need to do a lookup to do its job.
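The 2-way note can be made concrete with a small Python model. The class name `TwoWayBTB`, the 32-set default, and the alternating victim choice are assumptions for illustration, not the lab's design. In this model, predict() returns naInfo = (pc, way). On a hit, way names the way that hit. On a miss, it names the way that would be filled. train() can therefore write that way directly, with no tag search.

```python
class TwoWayBTB:
    """Hypothetical 2-way set-associative BTB whose naInfo records the way."""

    def __init__(self, sets: int = 32):
        self.sets = sets
        self.tags = [[None, None] for _ in range(sets)]
        self.targets = [[0, 0] for _ in range(sets)]
        self.victim = [0] * sets          # way to fill on the next miss (assumed policy)

    def _set(self, pc: int) -> int:
        return (pc >> 2) % self.sets

    def predict(self, pc: int):
        """Return (predicted next PC, naInfo), with naInfo = (pc, way)."""
        s = self._set(pc)
        for way in (0, 1):
            if self.tags[s][way] == pc:
                return self.targets[s][way], (pc, way)   # hit: remember the hitting way
        return pc + 4, (pc, self.victim[s])              # miss: remember the victim way

    def train(self, na_info, correct: bool, real_target: int) -> None:
        """Update exactly the way named in na_info; no tag search needed."""
        if correct:
            return
        pc, way = na_info
        s = self._set(pc)
        self.tags[s][way] = pc
        self.targets[s][way] = real_target
        if way == self.victim[s]:
            self.victim[s] ^= 1           # next miss in this set fills the other way


btb = TwoWayBTB()
_, info = btb.predict(0)              # miss in set 0: naInfo = (0, 0)
btb.train(info, correct=False, real_target=40)
_, info = btb.predict(128)            # same set, miss: naInfo = (128, 1)
btb.train(info, correct=False, real_target=200)
print(btb.predict(0), btb.predict(128))   # (40, (0, 0)) (200, (128, 1))
```

Unlike the direct-mapped version, 0 and 128 now coexist in set 0.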

Epoch management
[Pipeline diagram: instructions moving through F D R X M W over successive cycles, each tagged name.epoch: α.1, β.1, γ.1, δ.1, then ε.2, ζ.2, η.2]
Program: α = 00: j 40; β = 80: add …; γ = 84: add...; δ = 88: add...; ε = 40: add...; ζ = 44: add...; η = 48: add...
Next address mispredict on 'jmp', corrected in execute. β, γ and δ come from the mispredicted path in epoch 1. After the redirect, ε, ζ and η are fetched in epoch 2.

Pipeline feedback

    // Epoch state
    Reg#(Epoch) feEpoch <- mkReg(0);   // epoch at Fetch
    Reg#(Epoch) eeEpoch <- mkReg(0);   // epoch at Execute

    // Feedback information and mechanism
    typedef struct {
        Bool   correct;
        NaInfo naPredInfo;
        Addr   nextAddr;
    } Feedback deriving (Bits, Eq);

    FIFOF#(Tuple2#(Epoch, Feedback)) execFeedback <- mkFIFOF;

Integration into Fetch

    rule doFetch;
        function Action enqInst();
            action
                let d <- mem.side(MemReq{op: Ld, addr: fetchPc, data: ?});
                match {.nAddrPred, .naPredInfo} <- naPred.predict(fetchPc);
                FBundle fInst = FBundle{instResp: d};
                FData fData = FData{pc: fetchPc, fInst: fInst, inum: iNum,
                                    execEpoch: feEpoch,
                                    naPredInfo: naPredInfo,
                                    nextAddrPred: nAddrPred};
                iNum <= iNum + 1;
                fetchPc <= nAddrPred;
                fr.enq(fData);
            endaction
        endfunction

The path from fetchPc generation (the prediction) to fetchPc use is a tight dependency loop.

Fetch (continued)

        if (execFeedback.notEmpty) begin
            execFeedback.deq;
            match {.execEpoch, .fb} = execFeedback.first;
            naPred.train(fb.naPredInfo, fb.correct, fb.nextAddr);
            if (!fb.correct) begin
                // Train and redirect on a mispredict. Bubble!
                feEpoch <= execEpoch;
                fetchPc <= fb.nextAddr;
            end
            else enqInst();   // train and fetch the next instruction on a correct prediction
        end
        else enqInst();
    endrule

Because train() and predict() (inside enqInst()) run in the same cycle, naPredInfo helps avoid conflicts inside the predictor.

Execute

    rule doExecute;
        ExecData execData = newExecData(rr.first());
        let decInst = execData.decInst;
        execData.poisoned = (eeEpoch != execData.execEpoch);
        if (!execData.poisoned) begin
            // Instruction execution
            let src1 = execData.regInst.src1;
            let src2 = execData.regInst.src2;
            execData.execInst = exec.exec(decInst, src1, src2);
            // Check the predicted next address
            let cond   = execData.execInst.cond;
            let target = execData.execInst.addr;
            let nPC    = cond ? target : execData.pc + 4;
            let naPredInfo  = execData.naPredInfo;
            let correctPred = (nPC == execData.nextAddrPred);

Execute (continued)

            let newEeEpoch = eeEpoch;
            if (!correctPred) newEeEpoch = eeEpoch + 1;   // change epoch on a next address mispredict
            // Always send feedback, so correctly predicted next addresses can also train
            execFeedback.enq(tuple2(newEeEpoch,
                Feedback{correct: correctPred, naPredInfo: naPredInfo, nextAddr: nPC}));
            eeEpoch <= newEeEpoch;
        end // not poisoned
        xr.enq(execData);   // always pass the instruction to the next stage
        rr.deq();
    endrule

If !correctPred, which instructions are bad and must be dropped?
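One way to answer that question is to replay the jump example from the epoch-management slide through a Python model of doExecute's epoch logic. `Inst`, `execute_stream`, and the field names are hypothetical. Epochs are plain Python ints here; the width of the lab's `Epoch` type is not shown on these slides. Every younger instruction still carrying the old epoch is poisoned. It is not removed from the pipe, and it sends no feedback.

```python
from dataclasses import dataclass


@dataclass
class Inst:
    name: str
    epoch: int        # execEpoch stamped at Fetch
    pred_next: int    # nextAddrPred chosen at Fetch
    actual_next: int  # nPC computed in Execute


def execute_stream(insts, ee_epoch):
    """Replay doExecute's epoch logic over instructions in program order.

    Returns (name, poisoned) pairs and the (epoch, correct, nextAddr)
    feedback tuples that Execute would enqueue.
    """
    results, feedback = [], []
    for inst in insts:
        poisoned = inst.epoch != ee_epoch
        if not poisoned:
            correct = inst.actual_next == inst.pred_next
            if not correct:
                ee_epoch += 1                 # new exec epoch on a next address mispredict
            feedback.append((ee_epoch, correct, inst.actual_next))
        results.append((inst.name, poisoned))  # poisoned or not, it moves on
    return results, feedback


# Jump example: α = 00: j 40, whose next address was mispredicted as 80
stream = [
    Inst("α", 1, 80, 40),
    Inst("β", 1, 84, 84),   # wrong path, fetched in epoch 1
    Inst("γ", 1, 88, 88),
    Inst("δ", 1, 92, 92),
    Inst("ε", 2, 44, 44),   # fetched after the redirect to 40
    Inst("ζ", 2, 48, 48),
    Inst("η", 2, 52, 52),
]
results, feedback = execute_stream(stream, ee_epoch=1)
print([name for name, poisoned in results if poisoned])   # ['β', 'γ', 'δ']
```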

Next Address Prediction
[Diagram: F Fetch, D Decode, R Reg Read, X Execute, M Memory, W Write-back, plus a Next Address Prediction block]
Where else can we figure out that the prediction is wrong?

Feedback from decode
[Diagram: F Fetch, D Decode, R Reg Read, X Execute, M Memory, W Write-back, plus a Next Address Prediction block, with feedback from Decode]

Decode-detected mispredicts
- Non-branch: when nextPC != PC+4, use PC+4.
- Unconditional branch, target known at decode: when nextPC != known target, use the known target.
- Conditional branch: when nextPC is neither PC+4 nor the decoded target, use PC+4.
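The three rules above can be written as a single check. This Python sketch is illustrative only. `BrClass`, `decode_redirect`, and the `UNCOND_UNKNOWN` category (for jumps whose target decode cannot compute) are hypothetical names. The function returns the address decode should redirect to, or `None` when decode accepts the prediction.

```python
from enum import Enum, auto


class BrClass(Enum):
    NON_BRANCH = auto()
    COND_BRANCH = auto()
    UNCOND_KNOWN = auto()    # unconditional, target known at decode (e.g. j)
    UNCOND_UNKNOWN = auto()  # hypothetical: target not known at decode


def decode_redirect(br_class, pc, pred_next, target=None):
    """Return the corrected next address, or None if decode accepts pred_next."""
    pc_plus4 = pc + 4
    if br_class is BrClass.NON_BRANCH:
        decoded = pc_plus4
    elif br_class is BrClass.UNCOND_KNOWN:
        decoded = target
    elif br_class is BrClass.COND_BRANCH:
        # Either direction is plausible; anything else falls back to PC+4
        decoded = pred_next if pred_next in (pc_plus4, target) else pc_plus4
    else:
        decoded = pred_next  # cannot be checked here; Execute will catch it
    return None if decoded == pred_next else decoded


print(decode_redirect(BrClass.UNCOND_KNOWN, 0, 4, target=40))  # 40: "j 40" fixed in decode
print(decode_redirect(BrClass.NON_BRANCH, 0, 80))              # 4: "add" fixed in decode
print(decode_redirect(BrClass.COND_BRANCH, 0, 4, target=40))   # None: left to execute
```

For a conditional branch, decode cannot know the direction. It therefore only rejects predictions that match neither possible successor.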

Add a 'decode' epoch

    Reg#(Epoch) fdEpoch <- mkReg(0);   // decode epoch, as seen by Fetch
    Reg#(Epoch) feEpoch <- mkReg(0);   // exec epoch, as seen by Fetch
    Reg#(Epoch) ddEpoch <- mkReg(0);   // decode epoch, as seen by Decode
    Reg#(Epoch) deEpoch <- mkReg(0);   // exec epoch, as seen by Decode
    Reg#(Epoch) eeEpoch <- mkReg(0);   // exec epoch, as seen by Execute

    typedef struct {
        Bool   correct;
        NaInfo naPredInfo;
        Addr   nextAddr;
    } Feedback deriving (Bits, Eq);

    FIFOF#(Tuple3#(Epoch, Epoch, Feedback)) decFeedback  <- mkFIFOF;
    FIFOF#(Tuple2#(Epoch, Feedback))        execFeedback <- mkFIFOF;

Feedback from decode sends back both the exec and decode epochs.

NA mispredict - jmp
[Pipeline diagram: stages F D R X M W over successive cycles; instructions tagged name.execEpoch.decEpoch: α.1.1, β.1.1, γ.1.2, δ.1.2, ε.1.2, ζ.1.2, η.1.2]
Program: α = 00: j 40; β = 04: add …; γ = 40: add...; δ = 44: add...; ε = 48: add...; ζ = 52: add...; η = 56: add
Next address mispredict on 'jmp'. Corrected in decode!

NA mispredict - add
[Pipeline diagram: stages F D R X M W over successive cycles; instructions tagged name.execEpoch.decEpoch: α.1.1, β.1.1, γ.1.2, δ.1.2, ε.1.2, ζ.1.2, η.1.2]
Program: α = 00: add...; β = 80: add …; γ = 04: add...; δ = 08: add...; ε = 12: add...; ζ = 16: add...; η = 20: add
Next address mispredict on 'add', corrected in decode.

NA mispredict - beq
[Pipeline diagram: stages F D R X M W over successive cycles; instructions tagged name.execEpoch.decEpoch: α.1.1, β.1.1, γ.1.1, δ.1.1, ε.2.1, ζ.2.1, η.2.1]
Program: α = 00: beq r0,r0 40; β = 04: add …; γ = 08: add...; δ = 12: add...; ε = 40: add...; ζ = 44: add...; η = 48: add
Next address mispredict on 'beq'. Corrected in execute.

NA mispredict – late shadow
[Pipeline diagram: stages F D R X M W over successive cycles; instructions tagged name.execEpoch.decEpoch: α.1.1, β.1.1, γ.1.1, δ.1.1, ε.2.1, ζ.2.1, η.2.1]
Program: α = 00: beq r0,r0,40; β = 04: add …; γ = 08: add...; δ = 80: add...; ε = 40: add...; ζ = 16: add...; η = 20: add
Next address mispredict on 'beq', corrected in execute, with a next address mispredict late in its shadow.

NA mispredict – early shadow
[Pipeline diagram: stages F D R X M W over successive cycles; instructions tagged name.execEpoch.decEpoch: α.1.1, β.1.1, γ.1.1, δ.1.2, ε.2.2, ζ.2.2, η.2.2]
Program: α = 00: beq r0,r0,40; β = 04: add …; γ = 80: add...; δ = 84: add...; ε = 40: add...; ζ = 16: add...; η = 20: add
Next address mispredict on 'beq', corrected in execute, with a next address mispredict earlier in its shadow.

Epoch management
Fetch
- On an exec redirect: update to the new exec epoch.
- On a decode redirect: if it is for the current exec epoch, update to the new decode epoch.
Decode
- On a new exec epoch: update the exec and decode epochs.
- Otherwise, on a decode epoch mismatch: drop the instruction.
- Always, on a next address mispredict: move to a new decode epoch and redirect.
Execute
- On an exec epoch mismatch: poison the instruction.
- Otherwise, on a mispredict: move to a new exec epoch and redirect.
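The Decode rules above can be modeled in a few lines of Python. This is a hypothetical sketch: `DecodeEpochs` and `step` are illustrative names, and `mispredict` stands for "Decode found a wrong next address". The replayed sequence follows the epoch tags of the early-shadow example. Which instruction Decode flags is our assumption.

```python
class DecodeEpochs:
    """Hypothetical model of Decode's deEpoch/ddEpoch registers."""

    def __init__(self, de: int = 0, dd: int = 0):
        self.de = de  # exec epoch as last seen by Decode
        self.dd = dd  # current decode epoch

    def step(self, exec_epoch: int, dec_epoch: int, mispredict: bool):
        """Handle one instruction tagged (exec_epoch, dec_epoch) by Fetch.

        Returns the (exec, dec) tag it carries to the next stage, or None
        if it is dropped.
        """
        if exec_epoch != self.de or dec_epoch == self.dd:  # on the correct path
            if mispredict:
                dec_epoch += 1            # new decode epoch; redirect Fetch
            self.de, self.dd = exec_epoch, dec_epoch
            return (exec_epoch, dec_epoch)
        return None                       # stale decode epoch: drop


d = DecodeEpochs(de=1, dd=1)
tags = [(1, 1, False),   # α
        (1, 1, True),    # β: Decode detects a wrong next address
        (1, 1, False),   # γ: fetched down the wrong path, dropped
        (1, 2, False),   # δ: fetched after the decode redirect
        (2, 2, False)]   # ε: new exec epoch after Execute's redirect
print([d.step(*t) for t in tags])   # [(1, 1), (1, 2), None, (1, 2), (2, 2)]
```

A new exec epoch always passes Decode, whatever its decode epoch. This is why ε.2.2 is accepted even though Decode never saw decode epoch 2 paired with exec epoch 2.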

Decode with mispredict detect

    rule doDecode;
        let decData = newDecData(fr.first);
        // Is the incoming instruction on the good path?
        // Yes if it starts a new exec epoch, or if it is in the same decode epoch.
        let correctPath = (decData.execEpoch != deEpoch) || (decData.decEpoch == ddEpoch);
        let instResp = decData.fInst.instResp;
        let pcPlus4  = decData.pc + 4;
        if (correctPath) begin
            decData.decInst = decode(instResp, pcPlus4);
            let target = knownTargetAddr(decData.decInst);
            Addr decodedTarget = ?;
            let brClass = getBrClass(decData.decInst);
            let predTarget = decData.nextAddrPred;

Decode with mispredict detect

            if (brClass == NonBranch)
                decodedTarget = pcPlus4;
            else if (brClass == CondBranch)
                // Either direction is acceptable; any other prediction falls back to PC+4
                decodedTarget = (predTarget == target) ? target : pcPlus4;
            else if (brClass == UncondKnown)
                decodedTarget = target;
            else
                decodedTarget = predTarget;   // target not known at decode: keep the prediction
            if (decodedTarget != predTarget) begin            // wrong next address?
                decData.decEpoch = decData.decEpoch + 1;      // new decode epoch
                decData.nextAddrPred = decodedTarget;         // tell exec the address of the next instruction
                decFeedback.enq(tuple3(decData.execEpoch, decData.decEpoch,
                    Feedback{correct: False, naPredInfo: decData.naPredInfo,
                             nextAddr: decodedTarget}));      // send feedback
            end
            dr.enq(decData);   // enqueue to the next stage on the correct path
        end // of correct path

Decode with mispredict detect

        else begin   // incorrect path: preserve the current epochs
            decData.decEpoch  = ddEpoch;
            decData.execEpoch = deEpoch;
        end
        // decData.*Epoch have been set properly, so we always save them
        ddEpoch <= decData.decEpoch;
        deEpoch <= decData.execEpoch;
        fr.deq;
    endrule

Handling redirect from decode

        if (execFeedback.notEmpty) begin
            /* same as before */
        end
        else if (decFeedback.notEmpty) begin
            decFeedback.deq;
            match {.eEpoch, .dEpoch, .feedback} = decFeedback.first;
            if (eEpoch == feEpoch) begin   // respond only if the decode feedback is for the current exec epoch
                if (!feedback.correct) begin
                    fdEpoch <= dEpoch;
                    fetchPc <= feedback.nextAddr;
                end
                else enqInst();   // decode feedback for a correct prediction
            end
            else enqInst();       // decode feedback for a stale exec epoch
        end
        else enqInst();           // no feedback from anyone
    endrule

Note: there is no training here, since training is done by the feedback from exec.
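The priority and epoch check in doFetch can be modeled in Python as well. This is a hypothetical sketch: `FetchRedirect` and `Feedback` are illustrative names, and predictor training is omitted. The model also uses a stand-in predictor that always returns PC+4. Exec feedback wins over decode feedback. A decode redirect is honored only if its exec epoch matches feEpoch, so redirects issued in the shadow of an exec mispredict are ignored.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class Feedback(NamedTuple):
    correct: bool
    next_addr: int


class FetchRedirect:
    """Hypothetical model of doFetch's feedback handling (no predictor training)."""

    def __init__(self, pc: int, fe: int, fd: int,
                 predict: Callable[[int], int] = lambda pc: pc + 4):
        self.pc, self.fe, self.fd = pc, fe, fd   # fetchPc, feEpoch, fdEpoch
        self.predict = predict                   # stand-in next address predictor

    def cycle(self, exec_fb: Optional[Tuple[int, Feedback]] = None,
              dec_fb: Optional[Tuple[int, int, Feedback]] = None) -> bool:
        """One firing of doFetch; returns True if an instruction is fetched."""
        if exec_fb is not None:                  # exec feedback has priority
            epoch, fb = exec_fb
            if not fb.correct:
                self.fe, self.pc = epoch, fb.next_addr
                return False                     # exec redirect: bubble
        elif dec_fb is not None:                 # decode feedback only when no exec feedback
            e_epoch, d_epoch, fb = dec_fb
            if e_epoch == self.fe and not fb.correct:
                self.fd, self.pc = d_epoch, fb.next_addr
                return False                     # decode redirect: bubble
        self.pc = self.predict(self.pc)          # enqInst(): fetch, then advance
        return True


f = FetchRedirect(pc=0, fe=1, fd=1)
f.cycle()                                    # fetch at 0
f.cycle(dec_fb=(1, 2, Feedback(False, 40)))  # current exec epoch: redirect to 40
f.cycle(exec_fb=(2, Feedback(False, 100)))   # exec redirect to 100, epoch 2
f.cycle(dec_fb=(1, 3, Feedback(False, 8)))   # stale (exec epoch 1): ignored, fetch at 100
print(f.pc, f.fe, f.fd)                      # 104 2 2
```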