Computer Architecture: A Constructive Approach Branch Direction Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab.

Presentation transcript:

Computer Architecture: A Constructive Approach
Branch Direction Prediction – Six Stage Pipeline
Joel Emer
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
April 23, 2012 (L19-1)

NA pred with decode feedback

[Figure: six-stage pipeline with stages F (Fetch), D (Decode), R (Reg Read), X (Execute), M (Memory), and W (Write-back), with Next Address Prediction.]

http://csg.csail.mit.edu/6.S078

Decode detected mispredicts

Non-branch
  When nextPC != PC+4 => use PC+4
Unconditional branch, target known at decode
  When nextPC != known target => use known target
Conditional branch
  When nextPC is neither PC+4 nor the decoded target => use PC+4

Can we do better than PC+4?
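The three decode-time checks above can be sketched as a small behavioral model. This is a Python illustration, not the course's Bluespec code; the names `BrClass` and `decode_check` are hypothetical, and the function returns `None` when decode has no reason to redirect.

```python
from enum import Enum


class BrClass(Enum):
    """Hypothetical branch classes, named after the cases on the slide."""
    NON_BRANCH = 0
    UNCOND_KNOWN = 1   # unconditional, target known at decode
    COND_BRANCH = 2


def decode_check(br_class, pc, predicted_next, known_target=None):
    """Return the address decode redirects fetch to, or None if the
    next-address prediction is still plausible at decode."""
    pc_plus4 = pc + 4
    if br_class is BrClass.NON_BRANCH:
        return pc_plus4 if predicted_next != pc_plus4 else None
    if br_class is BrClass.UNCOND_KNOWN:
        return known_target if predicted_next != known_target else None
    # Conditional branch: either PC+4 or the decoded target is possible,
    # so only a prediction matching neither is certainly wrong.
    if predicted_next not in (pc_plus4, known_target):
        return pc_plus4
    return None
```

Without a direction predictor, decode has no basis for choosing between the two legal successors of a conditional branch, which is why the fallback is PC+4.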

Dynamic Branch Prediction

Branch direction prediction: learn and predict the direction a branch will go.

Standard prediction principles:
  Temporal correlation: the way a branch resolves may be a good predictor of the way it will resolve at the next execution.
  Spatial correlation: several branches may resolve in a highly correlated manner (a preferred path of execution).

One-bit predictor

[Figure: k bits of the fetch PC form the BHT index into a 2^k-entry BHT with 1 bit per entry, which answers Taken/¬Taken?; the instruction read from the I-cache supplies the opcode (Branch?) and the offset, which is added to the PC to form the target PC. Stage labels: Fetch, Decode.]

Predict that the branch will go the same direction it went last time.

One-bit predictor

// Interface
interface DirectionPred;
  method ActionValue#(Tuple2#(Bool, DirInfo)) predict(Addr addr);
  method Action train(DirInfo dirInfo, Bool taken);
endinterface

// Feedback information
typedef 64 BPRows;
typedef Bit#(TLog#(BPRows)) DirLineIndex;
typedef DirLineIndex DirInfo;

One-bit predictor (continued)

module mkDirectionPredictor(DirectionPred);
  // Array of prediction bits
  RegFile#(DirLineIndex, Bool) dirArray <- mkRegFileFull();

  method ActionValue#(Tuple2#(Bool, DirInfo)) predict(Addr addr);
    DirLineIndex index = truncate(addr >> 2);
    // Return prediction saved in array
    return tuple2(dirArray.sub(index), index);
  endmethod

  method Action train(DirInfo dirInfo, Bool taken);
    DirLineIndex index = dirInfo;
    // Update array with last actual behavior
    dirArray.upd(index, taken);
  endmethod
endmodule

When should we train?
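A behavioral sketch of the one-bit predictor in Python may help when experimenting outside the hardware flow. It mirrors the slide's `predict`/`train` pair; the class name `OneBitPredictor` and the all-not-taken initial contents are assumptions made for illustration (the slide's register file has no stated reset value).

```python
BP_ROWS = 64  # mirrors BPRows on the slide


class OneBitPredictor:
    """Software model of a direct-mapped, one-bit direction predictor."""

    def __init__(self, rows=BP_ROWS):
        self.rows = rows
        # Assumption: every entry starts as "not taken".
        self.dir_array = [False] * rows

    def predict(self, addr):
        # Drop the two byte-offset bits, then keep log2(rows) index bits.
        index = (addr >> 2) % self.rows
        # The index is returned as the feedback information (DirInfo).
        return self.dir_array[index], index

    def train(self, index, taken):
        # Remember the last actual direction of the branch.
        self.dir_array[index] = taken
```

Because only 6 address bits select the entry, two branches whose addresses differ by a multiple of 256 bytes share one prediction bit in this model.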

Two-bit Predictor (Smith, 1981)

Assume 2 direction prediction bits per instruction.

  State  Meaning
  11     Strongly taken
  10     Weakly taken
  01     Weakly ¬taken
  00     Strongly ¬taken

On taken the state moves one step toward 11; on ¬taken it moves one step toward 00.

How well does the one-bit predictor do on short trip count loops?
Implement using a saturating counter.

Saturating Counter

typedef Bit#(2) Counter;

function Counter updateCounter(Bool dir, Counter counter);
  return dir ? saturatingInc(counter) : saturatingDec(counter);
endfunction

function Counter saturatingInc(Counter counter);
  let plusOne = counter + 1;
  return (plusOne == 0) ? counter : plusOne;
endfunction

function Counter saturatingDec(Counter counter);
  return (counter == 0) ? 0 : counter - 1;
endfunction

How do we determine prediction from counter?
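The question about short trip count loops can be explored with a small Python experiment. This is a sketch under stated assumptions: a single branch with its own predictor entry, an initial state of 0 (not taken / strongly not taken), and a loop-closing branch that is taken three times and then falls through. The helper names are hypothetical.

```python
def update_counter(taken, counter):
    """2-bit saturating counter: count up on taken, down on not taken."""
    return min(counter + 1, 3) if taken else max(counter - 1, 0)


def count_mispredicts(outcomes, two_bit):
    """Count wrong predictions for one branch over a sequence of outcomes."""
    state = 0  # assumption: start at 0 for both predictor kinds
    wrong = 0
    for taken in outcomes:
        # Two-bit: the prediction is the high bit of the counter.
        # One-bit: the prediction is the last observed direction.
        pred = (state >> 1) == 1 if two_bit else state == 1
        wrong += pred != taken
        state = update_counter(taken, state) if two_bit else int(taken)
    return wrong


# Loop branch taken 3 times, then not taken; the loop is entered 10 times.
outcomes = ([True] * 3 + [False]) * 10
one_bit_wrong = count_mispredicts(outcomes, two_bit=False)  # 20
two_bit_wrong = count_mispredicts(outcomes, two_bit=True)   # 12
```

In steady state the one-bit predictor is wrong twice per loop execution (on the exit and again on re-entry), while the two-bit counter only drops from 11 to 10 on the exit and so is wrong once.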

Two-bit predictor

[Figure: k bits of the fetch PC form the BHT index into a 2^k-entry BHT with 2 bits per entry, which answers Taken/¬Taken?]

Two-bit predictor

typedef 64 BPRows;
typedef Bit#(TLog#(BPRows)) DirLineIndex;

// DirInfo data: feedback state for training
typedef struct {
  DirLineIndex index;
  Counter counter;
} DirInfo deriving(Bits, Eq);

module mkDirectionPredictor(DirectionPred);
  // Direction predictor state
  RegFile#(DirLineIndex, Counter) cntArray <- mkRegFileFull();

Two-bit predictor (continued)

  method ActionValue#(Tuple2#(Bool, DirInfo)) predict(Addr addr);
    // Training information is index and counter
    DirInfo info = ?;
    info.index = truncate(addr >> 2);
    info.counter = cntArray.sub(info.index);
    // Prediction is high bit of counter
    Bit#(1) pred = truncate(info.counter >> 1);
    Bool taken = (pred == 1);
    return tuple2(taken, info);
  endmethod

  method Action train(DirInfo info, Bool taken);
    // Train by updating counter
    cntArray.upd(info.index, updateCounter(taken, info.counter));
  endmethod
endmodule
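The same structure can be modeled in Python to see what carrying the counter inside the feedback information implies. The sketch below is an illustration only; the class name and the initial counter value of 1 (weakly not taken) are assumptions.

```python
from collections import namedtuple

# Feedback state handed from predict() to train(), as in the slide's DirInfo.
DirInfo = namedtuple("DirInfo", ["index", "counter"])


def update_counter(taken, counter):
    """2-bit saturating counter update."""
    return min(counter + 1, 3) if taken else max(counter - 1, 0)


class TwoBitPredictor:
    def __init__(self, rows=64):
        self.rows = rows
        self.cnt_array = [1] * rows  # assumption: start weakly not taken

    def predict(self, addr):
        index = (addr >> 2) % self.rows
        info = DirInfo(index, self.cnt_array[index])
        # Prediction is the high bit of the counter.
        return (info.counter >> 1) == 1, info

    def train(self, info, taken):
        # Uses the counter value captured at prediction time, so the
        # array is not read again during training.
        self.cnt_array[info.index] = update_counter(taken, info.counter)
```

One consequence of this design: if the same branch is predicted twice before either prediction is trained, both training updates start from the same stale counter value, so the second update does not build on the first.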

Exploiting Spatial Correlation (Yeh and Patt, 1992)

  if (x[i] < 7) then y += 1;
  if (x[i] < 5) then c -= 4;

If the first condition is false, the second condition is also false.

Implemented with a history register, ‘hist’, that records the direction of the last N branches executed by the processor.

Also works well for short trip count loops.

Ghist predictor

typedef 64 BPRows;
typedef Bit#(TLog#(BPRows)) DirLineIndex;
typedef Bit#(2) Counter;

// DirInfo data
typedef struct {
  DirLineIndex hist;
  Counter counter;
} DirInfo deriving(Bits, Eq);

module mkDirectionPredictor(DirectionPred);
  // Direction predictor state
  Reg#(DirLineIndex) hist <- mkReg(0);
  RegFile#(DirLineIndex, Counter) cntArray <- mkRegFileFull();

Global history predictor

  method ActionValue#(Tuple2#(Bool, DirInfo)) predict(Addr addr);
    // Calculate feedback information
    DirInfo info = ?;
    info.hist = hist;
    info.counter = cntArray.sub(hist);
    Bit#(1) pred = truncate(info.counter >> 1);
    // Shift new prediction into history register
    hist <= truncate(hist << 1 | zeroExtend(pred));
    return tuple2((pred == 1), info);
  endmethod

How good are predictions while waiting for training?

Global history predictor (continued)

  method Action train(DirInfo info, Bool taken);
    cntArray.upd(info.hist, updateCounter(taken, info.counter));
  endmethod

  // Restore history to the state it would be in after the desired prediction
  method Action repair(DirInfo info, Bool taken);
    hist <= truncate((info.hist << 1) | zeroExtend(pack(taken)));
  endmethod
endmodule

What is the state of ‘hist’ after redirects from decode and execute?
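The interplay of speculative history update, training, and repair is easier to follow in a software model. The Python sketch below is an illustration under assumptions: a 6-bit history (log2 of 64 rows), counters starting at 1, and a class name chosen here. As on the slides, the branch address is accepted by `predict` but not used for indexing.

```python
from collections import namedtuple

DirInfo = namedtuple("DirInfo", ["hist", "counter"])
HIST_BITS = 6  # log2(64), matching DirLineIndex on the slide


def update_counter(taken, counter):
    """2-bit saturating counter update."""
    return min(counter + 1, 3) if taken else max(counter - 1, 0)


class GlobalHistoryPredictor:
    def __init__(self):
        self.hist = 0
        self.cnt_array = [1] * (1 << HIST_BITS)  # assumption: weakly not taken

    @staticmethod
    def _shift(hist, bit):
        # Shift one direction bit in and keep only HIST_BITS bits.
        return ((hist << 1) | int(bit)) & ((1 << HIST_BITS) - 1)

    def predict(self, addr):
        info = DirInfo(self.hist, self.cnt_array[self.hist])
        pred = (info.counter >> 1) == 1
        # Speculatively shift the prediction into the history.
        self.hist = self._shift(self.hist, pred)
        return pred, info

    def train(self, info, taken):
        self.cnt_array[info.hist] = update_counter(taken, info.counter)

    def repair(self, info, taken):
        # Rebuild the history as if the branch had been predicted correctly;
        # bits shifted in by younger, wrong-path predictions are discarded.
        self.hist = self._shift(info.hist, taken)
```

After a repair, the history depends only on the snapshot taken when the mispredicted branch was predicted plus its actual direction, regardless of how many later predictions were made.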

NA pred with decode feedback

[Figure: six-stage pipeline with stages F (Fetch), D (Decode), R (Reg Read), X (Execute), M (Memory), and W (Write-back), with Next Address Prediction and Direction Prediction.]

Direction prediction recipe

Execute
  Send redirects on mispredicts (unchanged)
  Send direction prediction training
Decode
  Check if next address matches direction pred
  Send redirect if different
Fetch
  Generate prediction
  Learn from feedback
  Accept redirects from later stages

Add direction feedback

// Feedback needs information for training direction predictor
typedef struct {
  Bool correct;
  NaInfo naPredInfo;
  Addr nextAddr;
  DirInfo dirPredInfo;
  Bool taken;
} Feedback deriving (Bits, Eq);

FIFOF#(Tuple3#(Epoch, Epoch, Feedback)) decFeedback <- mkFIFOF;
FIFOF#(Tuple2#(Epoch, Feedback)) execFeedback <- mkFIFOF;

Execute (branch analysis)

// after executing instruction...
let nextEeEpoch = eeEpoch;
let cond = execData.execInst.cond;
let nextPc = cond ? execData.execInst.addr : execData.pc + 4;

// Recall: nextAddrPred may have been set in decode
if (nextPc != execData.nextAddrPred)
  nextEeEpoch = nextEeEpoch + 1;
eeEpoch <= nextEeEpoch;

// Always send feedback
execFeedback.enq(tuple2(nextEeEpoch, Feedback{
  correct: (nextPc == execData.nextAddrPred),
  taken: cond,
  dirPredInfo: execData.dirPredInfo,
  naPredInfo: execData.naPredInfo,
  nextAddr: nextPc}));
// enqueue instruction to next stage
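The branch analysis in execute reduces to a pure function of the resolved branch and the prediction it arrived with. A minimal Python sketch, assuming an unbounded integer epoch (the slides' `Epoch` type and its width are not shown) and a hypothetical function name:

```python
def execute_feedback(ee_epoch, pc, cond, target, next_addr_pred):
    """Resolve a branch and build the feedback sent to fetch.

    Returns (new_epoch, feedback). Feedback is produced for every
    instruction, whether or not the prediction was correct.
    """
    next_pc = target if cond else pc + 4
    correct = next_pc == next_addr_pred
    # A mispredict starts a new execute epoch, which marks all younger
    # instructions already in the pipeline as wrong-path.
    new_epoch = ee_epoch if correct else ee_epoch + 1
    feedback = {"correct": correct, "taken": cond, "nextAddr": next_pc}
    return new_epoch, feedback
```

Sending feedback on correct predictions as well is what lets fetch train the direction predictor on every resolved branch, not only on mispredicts.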

Decode with mispredict detect

rule doDecode;
  let decData = newDecData(fr.first);
  // Determine if epoch of incoming instruction is on good path:
  // new exec epoch, or same dec epoch
  let correctPath = (decData.execEpoch != deEpoch)
                 || (decData.decEpoch == ddEpoch);
  let instResp = decData.fInst.instResp;
  let pcPlus4 = decData.pc + 4;

  if (correctPath) begin
    decData.decInst = decode(instResp, pcPlus4);
    let target = knownTargetAddr(decData.decInst);
    let brClass = getBrClass(decData.decInst);
    let predTarget = decData.nextAddrPred;
    let predDir = decData.takenPred;

Decode with mispredict detect (continued)

    // Calculate target as best as decode can
    let decodedTarget = case (brClass)
      NonBranch:   pcPlus4;
      UncondKnown: target;
      CondBranch:  (predDir ? target : pcPlus4);
      default:     decData.nextAddrPred;
    endcase;

    // Wrong next addr?
    if (decodedTarget != predTarget) begin
      // New dec epoch
      decData.decEpoch = decData.decEpoch + 1;
      // Tell exec addr of next instruction!
      decData.nextAddrPred = decodedTarget;
      // Send feedback
      decFeedback.enq(
        tuple3(decData.execEpoch, decData.decEpoch,
               Feedback{correct: False,
                        naPredInfo: decData.naPredInfo,
                        nextAddr: decodedTarget,
                        dirPredInfo: decData.dirPredInfo,
                        taken: decData.takenPred}));
    end
    // Enqueue to next stage on correct path
    dr.enq(decData);
  end // of correct path

Decode with mispredict detect (continued)

  else begin // incorrect path
    // Preserve current epoch if instruction on incorrect path
    decData.decEpoch = ddEpoch;
    decData.execEpoch = deEpoch;
  end

  // decData.*Epoch have been set properly so we always save them.
  ddEpoch <= decData.decEpoch;
  deEpoch <= decData.execEpoch;
  fr.deq;
endrule
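The decode rule can be summarized as a function from the stage's two epoch registers and one incoming instruction to the updated registers, the forwarded instruction, and an optional redirect. The Python sketch below is a simplified illustration: the instruction is a plain dictionary with hypothetical field names borrowed from the slides, epochs are unbounded integers, and the feedback record is reduced to the redirect address.

```python
def decode_step(dd_epoch, de_epoch, inst):
    """Model one firing of the decode rule.

    Returns (dd_epoch, de_epoch, forwarded_inst_or_None, redirect_or_None).
    """
    # On the good path if execute started a new epoch, or if the decode
    # epoch still matches the one this stage expects.
    correct_path = inst["execEpoch"] != de_epoch or inst["decEpoch"] == dd_epoch
    if not correct_path:
        # Wrong-path instruction: drop it and preserve the current epochs.
        return dd_epoch, de_epoch, None, None

    inst = dict(inst)  # do not mutate the caller's record
    pc_plus4 = inst["pc"] + 4
    br_class = inst["brClass"]
    if br_class == "NonBranch":
        decoded = pc_plus4
    elif br_class == "UncondKnown":
        decoded = inst["target"]
    elif br_class == "CondBranch":
        decoded = inst["target"] if inst["takenPred"] else pc_plus4
    else:
        decoded = inst["nextAddrPred"]

    redirect = None
    if decoded != inst["nextAddrPred"]:
        inst["decEpoch"] += 1            # new decode epoch
        inst["nextAddrPred"] = decoded   # tell execute the corrected address
        redirect = decoded               # feedback to fetch
    return inst["decEpoch"], inst["execEpoch"], inst, redirect
```

A usage note: after a redirect the stage expects the incremented decode epoch, so instructions fetched before the redirect took effect still carry the old epoch and are filtered out by the `correct_path` test.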

Handling redirect from execute

if (execFeedback.notEmpty) begin
  match {.execEpoch, .fb} = execFeedback.first;
  execFeedback.deq;
  if (!fb.correct) begin
    // Train and repair on redirect
    dirPred.repair(fb.dirPredInfo, fb.taken);
    dirPred.train(fb.dirPredInfo, fb.taken);
    naPred.repair(fb.naPredInfo, fb.nextAddr);
    naPred.train(fb.naPredInfo, fb.nextAddr);
    feEpoch <= execEpoch;
    fetchPc <= fb.nextAddr;
  end
  else begin
    // Just train on correct prediction
    dirPred.train(fb.dirPredInfo, fb.taken);
    naPred.train(fb.naPredInfo, fb.nextAddr);
    enqInst;
  end
end

Handling redirect from decode

else if (decFeedback.notEmpty) begin
  decFeedback.deq;
  match {.execEpoch, .decEpoch, .fb} = decFeedback.first;
  if (execEpoch == feEpoch) begin
    if (!fb.correct) begin // epoch unchanged
      // Just repair, never train, on feedback from decode
      fdEpoch <= decEpoch;
      dirPred.repair(fb.dirPredInfo, fb.taken);
      naPred.repair(fb.naPredInfo, fb.nextAddr);
      fetchPc <= fb.nextAddr;
    end
    else // dec feedback on correct prediction
      enqInst;
  end
  else // dec feedback, but fetch is in new exec epoch
    enqInst;
end
else // no feedback
  enqInst;
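Fetch's priority among the two feedback sources can be captured in a short Python model. This is a sketch with hypothetical names: the fetch state is a dictionary, predictor calls are recorded as strings instead of being performed, and a feedback source that was not consumed is assumed to stay queued by the caller.

```python
def fetch_step(state, exec_fb=None, dec_fb=None):
    """Process at most one feedback message; return the actions taken.

    state: dict with 'feEpoch', 'fdEpoch', 'fetchPc'.
    exec_fb: (execEpoch, feedback) or None.
    dec_fb: (execEpoch, decEpoch, feedback) or None.
    """
    actions = []
    if exec_fb is not None:
        # Execute feedback has priority over decode feedback.
        exec_epoch, fb = exec_fb
        if not fb["correct"]:
            actions += ["repair", "train"]
            state["feEpoch"] = exec_epoch
            state["fetchPc"] = fb["nextAddr"]
        else:
            actions += ["train", "enqInst"]
    elif dec_fb is not None:
        exec_epoch, dec_epoch, fb = dec_fb
        if exec_epoch == state["feEpoch"] and not fb["correct"]:
            # Decode only guesses better; the branch is not resolved yet,
            # so repair the speculative state but do not train.
            actions.append("repair")
            state["fdEpoch"] = dec_epoch
            state["fetchPc"] = fb["nextAddr"]
        else:
            # Correct prediction, or feedback from an older execute epoch
            # that a later execute redirect has already superseded.
            actions.append("enqInst")
    else:
        actions.append("enqInst")
    return actions
```

The epoch comparison matters because a decode redirect issued before an execute mispredict refers to a path that execute has already abandoned; applying it would overwrite the newer, authoritative fetch PC.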