Computer Architecture: A Constructive Approach Branch Direction Prediction – Pipeline Integration Joel Emer Computer Science & Artificial Intelligence.

Presentation transcript:

Computer Architecture: A Constructive Approach
Branch Direction Prediction – Pipeline Integration
Joel Emer
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
April 23, 2012 (L20-1)

NA pred with decode feedback

Pipeline stages: F (Fetch), D (Decode), R (Reg Read), X (Execute), M (Memory), W (Write-back)
[Figure: six-stage pipeline annotated with Next Address Prediction and Direction Prediction]

April 23, 2012 L20-2 http://csg.csail.mit.edu/6.S078

Direction prediction recipe

Execute
- Send redirects on mispredicts (unchanged)
- Send direction prediction training

Decode
- Check if next address matches direction pred
- Send redirect if different (update naPred)

Fetch
- Generate prediction
- Learn from feedback
- Accept redirects from later stages

Epoch management recipe

Execute
- On exec epoch mismatch: poison instruction
- Otherwise, on mispredict: change exec epoch and redirect

Decode
- On new exec epoch: update local exec/decode epochs
- Otherwise, on decode epoch mismatch: drop instruction
- If not dropped, on next addr mispredict: change decode epoch and redirect

Fetch
- On exec redirect: update local exec epoch
- On decode redirect: if for current exec epoch then update local decode epoch

April 18, 2012 L20-4
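
The execute-stage part of this recipe can be sketched as a small behavioral model. This is Python rather than the lecture's Bluespec, and every name in it (`Inst`, `ExecuteStage`, `step`) is hypothetical. It also assumes an unbounded integer epoch, whereas a hardware `Epoch` would be a small wrapping counter.

```python
from dataclasses import dataclass


@dataclass
class Inst:
    pc: int
    exec_epoch: int   # exec epoch that fetch tagged this instruction with
    pred_next: int    # next address predicted when it was fetched
    actual_next: int  # next address resolved by execute


class ExecuteStage:
    def __init__(self):
        self.epoch = 0  # execute's own view of the current epoch

    def step(self, inst):
        """Return (poisoned, redirect); redirect is (new_epoch, pc) or None."""
        if inst.exec_epoch != self.epoch:
            # Fetched before an earlier redirect took effect: wrong path.
            return True, None
        if inst.actual_next != inst.pred_next:
            # Mispredict: start a new epoch so younger instructions go stale.
            self.epoch += 1
            return False, (self.epoch, inst.actual_next)
        return False, None


ex = ExecuteStage()
results = [
    ex.step(Inst(pc=0, exec_epoch=0, pred_next=4, actual_next=4)),
    ex.step(Inst(pc=4, exec_epoch=0, pred_next=8, actual_next=40)),
    ex.step(Inst(pc=8, exec_epoch=0, pred_next=12, actual_next=12)),
    ex.step(Inst(pc=40, exec_epoch=1, pred_next=44, actual_next=44)),
]
```

In this run the second instruction mispredicts and opens epoch 1, the third (still tagged with epoch 0) is poisoned, and the fourth, fetched after the redirect, executes normally.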

Add direction feedback

    typedef struct {
        Bool    correct;
        NaInfo  naPredInfo;
        Addr    nextAddr;
        DirInfo dirPredInfo;  // feedback needs information for
        Bool    taken;        // training the direction predictor
    } Feedback deriving (Bits, Eq);

    // (execute epoch, decode epoch, feedback)
    FIFOF#(Tuple3#(Epoch, Epoch, Feedback)) decFeedback <- mkFIFOF;
    // (execute epoch, feedback)
    FIFOF#(Tuple2#(Epoch, Feedback)) execFeedback <- mkFIFOF;

Execute (branch analysis)

    // after executing instruction...
    let nextEeEpoch = eeEpoch;
    let cond = execData.execInst.cond;
    let nextPc = cond ? execData.execInst.addr : execData.pc + 4;
    // Note: nextAddrPred may have been reset in decode
    let correctPred = (nextPc == execData.nextAddrPred);
    if (!correctPred) nextEeEpoch += 1;
    eeEpoch <= nextEeEpoch;
    // Always send feedback
    execFeedback.enq(
        tuple2(nextEeEpoch,
               Feedback{correct: correctPred,
                        taken: cond,
                        dirPredInfo: execData.dirPredInfo,
                        naPredInfo: execData.naPredInfo,
                        nextAddr: nextPc}));
    // enqueue instruction to next stage

Decode with mispredict detect

    rule doDecode;
        let decData = newDecData(fr.first);
        // Determine if epoch of incoming instruction is on good path:
        // either a new exec epoch or the same dec epoch
        let correctPath = (decData.execEpoch != deEpoch)
                       || (decData.decEpoch == ddEpoch);
        let instResp = decData.fInst.instResp;
        let pcPlus4 = decData.pc + 4;
        if (correctPath) begin
            decData.decInst = decode(instResp, pcPlus4);
            let target = knownTargetAddr(decData.decInst);
            let brClass = getBrClass(decData.decInst);
            let predTarget = decData.nextAddrPred;
            let predDir = decData.dirPred;

Decode with mispredict detect

            // Calculate target as best as decode can
            let decodedTarget = case (brClass)
                NonBranch:   pcPlus4;
                UncondKnown: target;
                CondBranch:  (predDir ? target : pcPlus4);
                default:     decData.nextAddrPred;
            endcase;
            // Wrong next addr?
            if (decodedTarget != predTarget) begin
                // New dec epoch
                decData.decEpoch = decData.decEpoch + 1;
                // Tell exec addr of next instruction!
                decData.nextAddrPred = decodedTarget;
                // Send feedback
                decFeedback.enq(
                    tuple3(decData.execEpoch, decData.decEpoch,
                           Feedback{correct: False,
                                    naPredInfo: decData.naPredInfo,
                                    nextAddr: decodedTarget,
                                    dirPredInfo: decData.dirPredInfo,
                                    taken: decData.takenPred}));
            end
            // Enqueue to next stage on correct path
            dr.enq(decData);
        end // of correct path

Decode with mispredict detect

        else begin // incorrect path
            // Preserve current epoch if instruction on incorrect path
            decData.decEpoch = ddEpoch;
            decData.execEpoch = deEpoch;
        end
        // decData.*Epoch have been set properly, so we always save them
        ddEpoch <= decData.decEpoch;
        deEpoch <= decData.execEpoch;
        fr.deq;
    endrule

Integration into Fetch

    rule doFetch();
        function Action enqInst();
            action
                let d <- mem.side(MemReq{op: Ld, addr: fetchPc, data: ?});
                match {.nAddrPred, .naPredInfo} <- naPred.predict(fetchPc);
                match {.dirPred, .dirPredInfo} <- dirPred.predict(fetchPc);
                FBundle fInst = FBundle{instResp: d};
                FData fData = FData{pc: fetchPc,
                                    fInst: fInst,
                                    inum: iNum,
                                    execEpoch: feEpoch,
                                    naPredInfo: naPredInfo,
                                    nextAddrPred: nAddrPred,
                                    dirPredInfo: dirPredInfo,
                                    dirPred: dirPred};
                iNum <= iNum + 1;
                fetchPc <= nAddrPred;
                fr.enq(fData);
            endaction
        endfunction

Handling redirect from execute

        if (execFeedback.notEmpty) begin
            match {.execEpoch, .fb} = execFeedback.first;
            execFeedback.deq;
            if (!fb.correct) begin
                // Train and repair on redirect
                dirPred.repair(fb.dirPredInfo, fb.taken);
                dirPred.train(fb.dirPredInfo, fb.taken);
                naPred.repair(fb.naPredInfo, fb.nextAddr);
                naPred.train(fb.naPredInfo, fb.nextAddr);
                feEpoch <= execEpoch;
                fetchPc <= fb.nextAddr;
            end else begin
                // Just train on correct prediction
                dirPred.train(fb.dirPredInfo, fb.taken);
                naPred.train(fb.naPredInfo, fb.nextAddr);
                enqInst;
            end
        end

Handling redirect from decode

        else if (decFeedback.notEmpty) begin
            decFeedback.deq;
            match {.execEpoch, .decEpoch, .fb} = decFeedback.first;
            if (execEpoch == feEpoch) begin
                if (!fb.correct) begin // epoch unchanged
                    // Just repair; never train on feedback from decode
                    fdEpoch <= decEpoch;
                    dirPred.repair(fb.dirPredInfo, fb.taken);
                    naPred.repair(fb.naPredInfo, fb.nextAddr);
                    fetchPc <= fb.nextAddr;
                end else // dec feedback on correct prediction
                    enqInst;
            end else // dec feedback, but fetch is in new exec epoch
                enqInst;
        end else // no feedback
            enqInst;

Immediate update issues

If the direction predictor does not update immediately on predictions, things are easy. But if the predictor updates immediately, we will predict and update the predictor on non-branches.

Possible solutions:
- Move direction prediction to decode, so we know not to update on non-branches. But this makes timing more critical.
- Simply use the direction predictor even on non-branch instructions.

Note: for superscalar issue designs this is a less significant problem.

Note: In the lab code we communicate the branch type of each instruction to allow training and repair to decide whether to perform updates based on instruction type.

Predictor Primitive

Indexed table holding values

Operations:
- Predict
- Update

Algebraic notation:

    Prediction = P[Width, Depth](Index; Update)

[Figure: table P of the given Width and Depth, with Index (I) and Update (U) inputs and a Prediction output]

October 24, 2011 L20-14 http://csg.csail.mit.edu/6.s078
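
A minimal Python sketch of the primitive, assuming that an out-of-range index wraps modulo the depth and that an update value is truncated to the entry width. Neither behavior is stated on the slide, and the class and method names are illustrative only.

```python
class Predictor:
    """Indexed table P[width, depth]: `depth` entries of `width` bits each."""

    def __init__(self, width, depth, init=0):
        self.width = width
        self.depth = depth
        self.table = [init] * depth

    def predict(self, index):
        # Read the entry selected by the index (assumed to wrap).
        return self.table[index % self.depth]

    def update(self, index, value):
        # Write the entry, keeping only the low `width` bits (assumed).
        self.table[index % self.depth] = value & ((1 << self.width) - 1)


p = Predictor(width=2, depth=8)
p.update(3, 0b10)
p.update(11, 0b111)  # 11 % 8 == 3, and 0b111 is truncated to 0b11
```

After these two updates entry 3 holds `0b11`, and every other entry still holds its initial value.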

One-bit Predictor

    A21064(PC; T) = P[1, 2K](PC; T)

[Figure: 1-bit table P indexed (I) by PC and updated (U) with Taken, producing Prediction]

Simple temporal prediction.

What happens on loop branches? At best, mispredicts twice for every use of loop.
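
The loop-branch behavior can be checked with a short simulation. This Python sketch assumes the table is indexed by the PC with its two low bits dropped and starts out predicting not-taken; the names `OneBitPredictor` and `count_mispredicts` are hypothetical.

```python
class OneBitPredictor:
    """P[1, depth]: one bit per entry that remembers the last outcome."""

    def __init__(self, depth=2048):
        self.depth = depth
        self.table = [0] * depth  # 0 = predict not-taken

    def _index(self, pc):
        return (pc >> 2) % self.depth  # assumed: drop byte offset, then wrap

    def predict(self, pc):
        return self.table[self._index(pc)] == 1

    def update(self, pc, taken):
        self.table[self._index(pc)] = 1 if taken else 0


def count_mispredicts(predictor, pc, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# One use of the loop: the backward branch is taken three times, then exits.
loop_use = [True, True, True, False]
one_bit_misses = count_mispredicts(OneBitPredictor(), 0x1000, loop_use * 3)
```

Over three uses of the loop the count is 6: one mispredict at each loop exit and one on re-entry, because the exit overwrote the only bit of state.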

Two-bit Predictor

    Counter[W,D](I; T) = P[W, D](I; if T then P+1 else P-1)
    A21164(PC; T) = MSB(Counter[2, 2K](PC; T))

[Figure: 2-bit table P indexed (I) by PC; a +/- adder driven by Taken feeds the update (U); the output is Prediction]
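
A Python sketch of the counter-based predictor on the same kind of loop branch. It assumes the counter saturates at 0 and 3 (the slide's formula shows only P+1 and P-1), starts at 0, and uses a PC-with-low-bits-dropped index; all names are illustrative.

```python
class TwoBitPredictor:
    """MSB(Counter[2, depth]): predict taken when the counter is 2 or 3."""

    def __init__(self, depth=2048):
        self.depth = depth
        self.table = [0] * depth  # 0 = strongly not-taken

    def _index(self, pc):
        return (pc >> 2) % self.depth  # assumed indexing

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2  # MSB of a 2-bit value

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.table[i]
        # Saturating increment or decrement (saturation is an assumption).
        self.table[i] = min(3, c + 1) if taken else max(0, c - 1)


def misses_per_use(predictor, pc, loop_use, uses):
    result = []
    for _ in range(uses):
        misses = 0
        for taken in loop_use:
            if predictor.predict(pc) != taken:
                misses += 1
            predictor.update(pc, taken)
        result.append(misses)
    return result


two_bit_misses = misses_per_use(
    TwoBitPredictor(), 0x1000, [True, True, True, False], uses=3)
```

After a warm-up of three mispredicts in the first use, each later use costs a single mispredict at the loop exit, since one not-taken outcome only moves the counter from 3 to 2.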

History Register

    History(PC, T) = P(PC; P || T)

[Figure: table P indexed (I) by PC; the stored value is concatenated with Taken and written back as the update (U); the output is History]
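
The update `P || T` shifts the newest outcome into the low end of the stored history. A small Python sketch, assuming a fixed history width so that the oldest bit falls off the top; `HistoryTable` and its methods are hypothetical names.

```python
class HistoryTable:
    """P[width, depth] whose update is (old value || taken), kept to `width` bits."""

    def __init__(self, width, depth):
        self.mask = (1 << width) - 1
        self.depth = depth
        self.table = [0] * depth

    def read(self, index):
        return self.table[index % self.depth]

    def shift_in(self, index, taken):
        i = index % self.depth
        # Concatenate the new outcome on the right, drop the oldest bit.
        self.table[i] = ((self.table[i] << 1) | int(taken)) & self.mask


h = HistoryTable(width=4, depth=4)
for taken in [True, False, True, True, True]:
    h.shift_in(1, taken)
history_bits = h.read(1)  # outcomes T,N,T,T,T leave the last four: 0b0111
```

A depth of 1 with a constant index gives a single global history register. A larger depth indexed by PC gives per-branch local histories.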

Global History

    GHist(;T) = MSB(Counter(History(0, T); T))
    Ind-Ghist(PC;T) = MSB(Counter(PC || Hist(GHist(;T);T)))

[Figure: a single Global History register (index 0, Concat with Taken) indexes a +/- counter table that produces Prediction]

Can we take advantage of a pattern at a particular PC?

Local History

    LHist(PC, T) = MSB(Counter(History(PC; T); T))

[Figure: a Local History table indexed by PC (Concat with Taken) indexes a +/- counter table that produces Prediction]

Can we take advantage of the global pattern at a particular PC?
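
A Python sketch of the local-history composition, using an alternating branch to show what the per-PC history buys. The table sizes, the zero initial state, the saturating counters, and the PC indexing are all assumptions, and the names are illustrative.

```python
class LocalHistoryPredictor:
    """MSB(Counter(History(PC; T); T)): per-PC history selects a 2-bit counter."""

    def __init__(self, hist_bits=2, depth=16):
        self.mask = (1 << hist_bits) - 1
        self.depth = depth
        self.histories = [0] * depth            # one history per PC slot
        self.counters = [0] * (1 << hist_bits)  # one counter per history value

    def _slot(self, pc):
        return (pc >> 2) % self.depth  # assumed indexing

    def predict(self, pc):
        return self.counters[self.histories[self._slot(pc)]] >= 2

    def update(self, pc, taken):
        slot = self._slot(pc)
        hist = self.histories[slot]
        c = self.counters[hist]
        self.counters[hist] = min(3, c + 1) if taken else max(0, c - 1)
        self.histories[slot] = ((hist << 1) | int(taken)) & self.mask


pred = LocalHistoryPredictor()
outcomes = [True, False] * 10  # a branch that strictly alternates
miss_positions = []
for i, taken in enumerate(outcomes):
    if pred.predict(0x40) != taken:
        miss_positions.append(i)
    pred.update(0x40, taken)
```

With these settings the predictor misses only while the counters warm up (positions 0, 2 and 4) and then follows the alternation exactly, because the history values `01` and `10` select different counters.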

Two-level Predictor

    2Level(PC, T) = MSB(Counter(History(0; T)||PC; T))

[Figure: Global History register (index 0, Concat with Taken) concatenated with PC to index a +/- counter table that produces Prediction]

Two-Level Branch Predictor

Pentium Pro uses the result from the last two branches to select one of the four sets of BHT bits (~95% correct).

[Figure: k bits of the Fetch PC index the BHT; a 2-bit global branch history shift register, which shifts in the Taken/¬Taken result of each branch, selects one of the four sets; the output is Taken/¬Taken?]

Gshare Predictor

    Gshare(PC, T) = MSB(Counter(History(0; T) xor PC; T))

[Figure: Global History register (index 0, Concat with Taken) xor-ed with PC to index a +/- counter table that produces Prediction]
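
A Python sketch of the xor indexing, assuming a 4-bit global history, a 16-entry table of saturating 2-bit counters that start weakly not-taken, and a PC with its two low bits dropped. None of these sizes come from the slide, and the names are hypothetical.

```python
class Gshare:
    """MSB(Counter(History xor PC; T)) with a single global history register."""

    def __init__(self, hist_bits=4):
        self.mask = (1 << hist_bits) - 1
        self.ghist = 0
        self.counters = [1] * (1 << hist_bits)  # 1 = weakly not-taken

    def index(self, pc):
        # The same history maps different PCs to different counters.
        return ((pc >> 2) ^ self.ghist) & self.mask

    def predict(self, pc):
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)
        self.ghist = ((self.ghist << 1) | int(taken)) & self.mask


g = Gshare()
idx_a = g.index(0x10)       # (0x10 >> 2) ^ 0b0000 = 4
idx_b = g.index(0x20)       # (0x20 >> 2) ^ 0b0000 = 8
g.update(0x10, True)        # counter 4 goes 1 -> 2; history becomes 0b0001
idx_a_after = g.index(0x10) # 4 ^ 1 = 5
```

Compared with concatenating history and PC, the xor keeps the table index at a fixed number of bits while still mixing in both sources.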

Choosing Predictors

    Chooser = MSB(P(PC; P + (A==T) - (B==T)))
or
    Chooser = MSB(P(GHist(PC; T); P + (A==T) - (B==T)))

[Figure: a Chooser selects between the LHist and GHist predictions to produce Prediction]
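
The chooser update `P + (A==T) - (B==T)` moves toward whichever component predictor was right and stays put when both agree with the outcome. A Python sketch of the PC-indexed variant, assuming 2-bit saturating entries that start weakly preferring B; the names are illustrative.

```python
class Chooser:
    """MSB(P(PC; P + (A==T) - (B==T))): MSB set means 'use predictor A'."""

    def __init__(self, depth=16):
        self.depth = depth
        self.table = [1] * depth  # 1 = weakly prefer B (assumed start)

    def _index(self, pc):
        return (pc >> 2) % self.depth  # assumed indexing

    def use_a(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, a_pred, b_pred, taken):
        i = self._index(pc)
        delta = int(a_pred == taken) - int(b_pred == taken)
        self.table[i] = max(0, min(3, self.table[i] + delta))  # saturate


ch = Chooser()
pc = 0x80
ch.update(pc, a_pred=True, b_pred=False, taken=True)  # A right, B wrong: 1 -> 2
ch.update(pc, a_pred=True, b_pred=False, taken=True)  # 2 -> 3
after_a_wins = ch.use_a(pc)
ch.update(pc, a_pred=True, b_pred=True, taken=True)   # both right: unchanged
after_tie = ch.table[ch._index(pc)]
for _ in range(3):
    ch.update(pc, a_pred=False, b_pred=True, taken=True)  # B right: 3 -> 0
after_b_wins = ch.use_a(pc)
```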

Tournament Branch Predictor (Alpha 21264)

- Choice predictor learns whether best to use local or global branch history in predicting next branch
- Global history is speculatively updated but restored on mispredict
- Claim % success on range of applications

Structures in the figure:
- Local history table (1,024x10b), indexed by PC
- Local prediction (1,024x3b)
- Global prediction (4,096x2b)
- Choice prediction (4,096x2b)
- Global history (12b)