Computer Architecture: A Constructive Approach
Branch Direction Prediction: Pipeline Integration

Joel Emer
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology

April 23, 2012 (L20-1)

NA pred with decode feedback

[Figure: pipeline stages F (Fetch), D (Decode), R (Reg Read), X (Execute), M (Memory), W (Write-back), annotated with Next Address Prediction and Direction Prediction.]

http://csg.csail.mit.edu/6.S078

Direction prediction recipe

Execute
  - Send redirects on mispredicts (unchanged)
  - Send direction prediction training
Decode
  - Check if next address matches direction pred
  - Send redirect if different (update naPred)
Fetch
  - Generate prediction
  - Learn from feedback
  - Accept redirects from later stages

Epoch management recipe

Execute
  - On exec epoch mismatch: poison instruction
  - Otherwise, on mispredict: change exec epoch and redirect
Decode
  - On new exec epoch: update local exec/decode epochs
  - Otherwise, on decode epoch mismatch: drop instruction
  - If not dropped, on next addr mispredict: change decode epoch and redirect
Fetch
  - On exec redirect: update local exec epoch
  - On decode redirect: if for current exec epoch, then update local decode epoch

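The Decode and Fetch bullets above can be read as two small predicates. The following Python sketch is a behavioral model only, not the lab's Bluespec code; the function and parameter names are hypothetical, and it assumes that an instruction whose exec epoch differs from decode's local copy always belongs to a newer exec epoch.

```python
def decode_keeps_instruction(inst_exec_epoch, inst_dec_epoch, de_epoch, dd_epoch):
    """Return True when decode should process the instruction.

    de_epoch / dd_epoch are decode's local copies of the exec and decode epochs.
    A new exec epoch always wins; otherwise the decode epoch must match.
    """
    new_exec_epoch = inst_exec_epoch != de_epoch
    same_dec_epoch = inst_dec_epoch == dd_epoch
    return new_exec_epoch or same_dec_epoch


def fetch_accepts_decode_redirect(redirect_exec_epoch, fe_epoch):
    """Fetch honors a decode redirect only if it is for the current exec epoch."""
    return redirect_exec_epoch == fe_epoch
```

For example, an instruction tagged with exec epoch 0 and decode epoch 2 is dropped by a decode stage whose local epochs are (0, 3), while one tagged with exec epoch 1 is accepted regardless of its decode epoch.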
Add direction feedback

  typedef struct {
    Bool    correct;
    NaInfo  naPredInfo;
    Addr    nextAddr;
    // Feedback needs information for training direction predictor
    DirInfo dirPredInfo;
    Bool    taken;
  } Feedback deriving (Bits, Eq);

  // Tuple3: execute epoch, decode epoch, feedback
  FIFOF#(Tuple3#(Epoch,Epoch,Feedback)) decFeedback <- mkFIFOF;
  // Tuple2: execute epoch, feedback
  FIFOF#(Tuple2#(Epoch,Feedback)) execFeedback <- mkFIFOF;

Execute (branch analysis)

  // after executing instruction...
  let nextEeEpoch = eeEpoch;
  let cond = execData.execInst.cond;
  let nextPc = cond ? execData.execInst.addr : execData.pc + 4;

  // Note: nextAddrPred may have been reset in decode
  let correctPred = (nextPc == execData.nextAddrPred);
  if (!correctPred) nextEeEpoch = nextEeEpoch + 1;
  eeEpoch <= nextEeEpoch;

  // Always send feedback
  execFeedback.enq(tuple2(nextEeEpoch,
      Feedback{correct:     correctPred,
               taken:       cond,
               dirPredInfo: execData.dirPredInfo,
               naPredInfo:  execData.naPredInfo,
               nextAddr:    nextPc}));
  // enqueue instruction to next stage

Decode with mispredict detect

  rule doDecode;
    let decData = newDecData(fr.first);

    // Determine if epoch of incoming instruction is on good path
    let correctPath = (decData.execEpoch != deEpoch)   // new exec epoch
                   || (decData.decEpoch == ddEpoch);   // same dec epoch

    let instResp = decData.fInst.instResp;
    let pcPlus4 = decData.pc + 4;

    if (correctPath) begin
      decData.decInst = decode(instResp, pcPlus4);
      let target = knownTargetAddr(decData.decInst);
      let brClass = getBrClass(decData.decInst);
      let predTarget = decData.nextAddrPred;
      let predDir = decData.dirPred;

Decode with mispredict detect

      // Calculate target as best as decode can
      let decodedTarget = case (brClass)
          NonBranch:   pcPlus4;
          UncondKnown: target;
          CondBranch:  (predDir ? target : pcPlus4);
          default:     decData.nextAddrPred;
        endcase;

      // Wrong next addr?
      if (decodedTarget != predTarget) begin
        // New dec epoch
        decData.decEpoch = decData.decEpoch + 1;
        // Tell exec addr of next instruction!
        decData.nextAddrPred = decodedTarget;
        // Send feedback
        decFeedback.enq(
          tuple3(decData.execEpoch, decData.decEpoch,
                 Feedback{correct:     False,
                          naPredInfo:  decData.naPredInfo,
                          nextAddr:    decodedTarget,
                          dirPredInfo: decData.dirPredInfo,
                          taken:       decData.takenPred}));
      end

      // Enqueue to next stage on correct path
      dr.enq(decData);
    end // of correct path

Decode with mispredict detect

    else begin // incorrect path
      // Preserve current epoch if instruction on incorrect path
      decData.decEpoch = ddEpoch;
      decData.execEpoch = deEpoch;
    end

    // decData.*Epoch have been set properly, so we always save them
    ddEpoch <= decData.decEpoch;
    deEpoch <= decData.execEpoch;
    fr.deq;
  endrule

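The target selection in doDecode can be modeled in a few lines. This Python sketch mirrors the case expression from the rule; the enum and function names are hypothetical stand-ins for the Bluespec types, and the OTHER class is an assumed label for whatever falls into the default arm (for example, branches whose target decode cannot know).

```python
from enum import Enum, auto


class BrClass(Enum):
    NON_BRANCH = auto()
    UNCOND_KNOWN = auto()
    COND_BRANCH = auto()
    OTHER = auto()  # assumed label for the default arm of the case expression


def decoded_target(br_class, pc, target, pred_dir, next_addr_pred):
    """Best next address decode can compute for the instruction at pc."""
    pc_plus4 = pc + 4
    if br_class is BrClass.NON_BRANCH:
        return pc_plus4
    if br_class is BrClass.UNCOND_KNOWN:
        return target
    if br_class is BrClass.COND_BRANCH:
        return target if pred_dir else pc_plus4
    return next_addr_pred  # decode has nothing better than fetch's guess


def needs_redirect(br_class, pc, target, pred_dir, next_addr_pred):
    """True when decode disagrees with the next-address prediction from fetch."""
    return decoded_target(br_class, pc, target, pred_dir, next_addr_pred) != next_addr_pred
```

A conditional branch at 0x100 with target 0x200 and a taken direction prediction triggers a redirect when fetch predicted 0x104, and none when fetch predicted 0x200.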
Integration into Fetch

  rule doFetch();
    function Action enqInst();
      action
        let d <- mem.side(MemReq{op: Ld, addr: fetchPc, data: ?});
        match {.nAddrPred, .naPredInfo} <- naPred.predict(fetchPc);
        match {.dirPred, .dirPredInfo} <- dirPred.predict(fetchPc);

        FBundle fInst = FBundle{instResp: d};
        FData fData = FData{pc:           fetchPc,
                            fInst:        fInst,
                            inum:         iNum,
                            execEpoch:    feEpoch,
                            naPredInfo:   naPredInfo,
                            nextAddrPred: nAddrPred,
                            dirPredInfo:  dirPredInfo,
                            dirPred:      dirPred};
        iNum <= iNum + 1;
        fetchPc <= nAddrPred;
        fr.enq(fData);
      endaction
    endfunction

Handling redirect from execute

    if (execFeedback.notEmpty) begin
      match {.execEpoch, .fb} = execFeedback.first;
      execFeedback.deq;
      if (!fb.correct) begin
        // Train and repair on redirect
        dirPred.repair(fb.dirPredInfo, fb.taken);
        dirPred.train(fb.dirPredInfo, fb.taken);
        naPred.repair(fb.naPredInfo, fb.nextAddr);
        naPred.train(fb.naPredInfo, fb.nextAddr);
        feEpoch <= execEpoch;
        fetchPc <= fb.nextAddr;
      end
      else begin
        // Just train on correct prediction
        dirPred.train(fb.dirPredInfo, fb.taken);
        naPred.train(fb.naPredInfo, fb.nextAddr);
        enqInst;
      end
    end

Handling redirect from decode

    else if (decFeedback.notEmpty) begin
      decFeedback.deq;
      match {.execEpoch, .decEpoch, .fb} = decFeedback.first;
      if (execEpoch == feEpoch) begin
        if (!fb.correct) begin
          // epoch unchanged
          // Just repair, never train, on feedback from decode
          fdEpoch <= decEpoch;
          dirPred.repair(fb.dirPredInfo, fb.taken);
          naPred.repair(fb.naPredInfo, fb.nextAddr);
          fetchPc <= fb.nextAddr;
        end
        else // dec feedback on correct prediction
          enqInst;
      end
      else // dec feedback, but fetch is in new exec epoch
        enqInst;
    end
    else // no feedback
      enqInst;
  endrule

Immediate update issues

If the direction predictor does not update immediately on predictions, things are easy. But if the predictor does update at prediction time, we will predict and update the predictor on non-branches as well.

Possible solutions:
  - Move direction prediction to decode, so we know not to update on non-branches. But this makes timing more critical.
  - Simply use the direction predictor even on non-branch instructions.

Note: for superscalar issue designs this is a less significant problem.

Note: In the lab code we communicate the branch type of each instruction to allow training and repair to decide whether to perform updates based on instruction type.

Predictor Primitive

Indexed table holding values

Operations
  - Predict
  - Update

Algebraic notation
  Prediction = P[Width, Depth](Index; Update)

[Figure: table P of Depth entries, each Width bits wide, with inputs Index (I) and Update (U) and output Prediction.]

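The primitive P[Width, Depth](Index; Update) can be modeled in software as a small table class. This is a minimal Python sketch under stated assumptions: the class name and method names are hypothetical, the index is reduced modulo Depth (hardware would typically select index bits instead), and the update is passed as a function of the old entry so that expressions such as P+1 or P || T can be plugged in.

```python
class PredictorTable:
    """Software model of P[Width, Depth]: Depth entries, each Width bits wide."""

    def __init__(self, width, depth, init=0):
        self.mask = (1 << width) - 1
        self.depth = depth
        self.table = [init & self.mask] * depth

    def predict(self, index):
        """Read the entry selected by index."""
        return self.table[index % self.depth]

    def update(self, index, update_fn):
        """Replace the selected entry P with update_fn(P), truncated to Width bits."""
        i = index % self.depth
        self.table[i] = update_fn(self.table[i]) & self.mask


p = PredictorTable(width=2, depth=4)
p.update(5, lambda old: old + 1)  # index 5 maps to entry 1
```

Note that this model wraps on overflow because of the Width mask; a saturating counter has to clamp inside its own update function.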
One-bit Predictor

Simple temporal prediction

  A21064(PC; T) = P[1, 2K](PC; T)

What happens on loop branches?
  At best, mispredicts twice for every use of loop.

[Figure: table P with 1-bit entries, indexed (I) by PC, updated (U) with Taken, output Prediction.]

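The claim about loop branches can be checked with a short simulation. This Python sketch models a single one-bit entry (one PC) that predicts whatever the branch did last time; the function name and the 4-iteration loop are illustrative assumptions.

```python
def one_bit_mispredicts(outcomes, initial=False):
    """Count mispredictions of a one-bit 'same as last time' predictor."""
    last = initial
    mispredicts = 0
    for taken in outcomes:
        if last != taken:
            mispredicts += 1
        last = taken  # the single bit simply records the latest outcome
    return mispredicts


# Back-edge branch of a loop: taken three times, then not taken at loop exit.
# The loop is executed three times in a row.
loop_runs = [True, True, True, False] * 3
total = one_bit_mispredicts(loop_runs)  # 6, i.e. two per use of the loop
```

Each use of the loop costs one misprediction at the exit and one more when the loop is re-entered, because the exit flipped the bit to not-taken.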
Two-bit Predictor

  Counter[W, D](I; T) = P[W, D](I; if T then P+1 else P-1)

  A21164(PC; T) = MSB(Counter[2, 2K](PC; T))

[Figure: table P with 2-bit entries, indexed (I) by PC; a +/- adder driven by Taken produces the update (U); output Prediction.]

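The same loop experiment can be run against a two-bit counter. This Python sketch models one counter entry and assumes the counter saturates at 0 and 3, which the slide's P+1 / P-1 notation does not state explicitly; the function name is hypothetical.

```python
def two_bit_mispredicts(outcomes, initial=0):
    """Count mispredictions of MSB(2-bit saturating counter) for one branch."""
    counter = initial
    mispredicts = 0
    for taken in outcomes:
        prediction = bool(counter >> 1)  # MSB of the 2-bit counter
        if prediction != taken:
            mispredicts += 1
        # if T then P+1 else P-1, clamped to the 2-bit range (assumption)
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return mispredicts


loop_runs = [True, True, True, False] * 3
cold = two_bit_mispredicts(loop_runs, initial=0)  # 5: two warm-up misses, then one per loop
warm = two_bit_mispredicts(loop_runs, initial=3)  # 3: one per use of the loop
```

Once warmed up, the counter only mispredicts the loop exit: a single not-taken outcome moves it from 3 to 2, which still predicts taken on re-entry.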
History Register

  History(PC; T) = P(PC; P || T)

[Figure: table P indexed (I) by PC; the update (U) concatenates the stored value with Taken; output History.]

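The update P || T behaves like a shift register: the newest outcome enters at the low end and the oldest bit falls off the top. The following Python sketch is one way to model History(PC; T); the class name is hypothetical, and indexing by pc modulo depth is an assumption made for brevity.

```python
class HistoryTable:
    """Per-index shift registers holding the last `bits` branch outcomes."""

    def __init__(self, bits, depth):
        self.mask = (1 << bits) - 1
        self.depth = depth
        self.table = [0] * depth

    def read(self, pc):
        return self.table[pc % self.depth]

    def record(self, pc, taken):
        """P || T: shift the old history left and append the new outcome."""
        i = pc % self.depth
        self.table[i] = ((self.table[i] << 1) | int(taken)) & self.mask


ht = HistoryTable(bits=3, depth=4)
for outcome in [True, True, False, True]:
    ht.record(8, outcome)
# The entry for pc 8 now holds 0b101: the three most recent outcomes.
```

With depth set to 1 (or a constant index of 0) the same structure acts as a single global history register.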
Global History

  GHist(; T) = MSB(Counter(History(0; T); T))

  Ind-Ghist(PC; T) = MSB(Counter(PC || Hist(GHist(;T);T)))

Can we take advantage of a pattern at a particular PC?

[Figure: a history register at constant index 0 concatenates Taken into the Global History, which indexes a +/- counter table; output Prediction.]

Local History

  LHist(PC; T) = MSB(Counter(History(PC; T); T))

Can we take advantage of the global pattern at a particular PC?

[Figure: a history table indexed by PC concatenates Taken into the Local History, which indexes a +/- counter table; output Prediction.]

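The formula LHist(PC; T) = MSB(Counter(History(PC; T); T)) chains a history entry into a counter table. This Python sketch models a single branch (one History entry) with 3 bits of history and 2-bit saturating counters; the function name, the sizes, and the initial counter value are assumptions chosen for illustration.

```python
def local_history_predictor(outcomes, hist_bits=3, init_counter=1):
    """Run one branch's outcomes through History -> Counter -> MSB.

    Returns a list of booleans: True where the prediction was correct.
    """
    hist_mask = (1 << hist_bits) - 1
    history = 0                                   # History entry for this PC
    counters = [init_counter] * (1 << hist_bits)  # counters indexed by history
    correct = []
    for taken in outcomes:
        ctr = counters[history]
        prediction = bool(ctr >> 1)               # MSB of the 2-bit counter
        correct.append(prediction == taken)
        counters[history] = min(3, ctr + 1) if taken else max(0, ctr - 1)
        history = ((history << 1) | int(taken)) & hist_mask
    return correct


# Loop branch: taken three times, then not taken, repeated ten times.
results = local_history_predictor([True, True, True, False] * 10)
```

After a short warm-up every outcome is predicted correctly, including the loop exit, because the history 111 is always followed by not-taken and gets its own counter.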
Two-level Predictor

  2Level(PC; T) = MSB(Counter(History(0; T) || PC; T))

[Figure: a history register at constant index 0 concatenates Taken into the Global History; the Global History is concatenated with PC to index a +/- counter table; output Prediction.]

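The index History(0; T) || PC is a plain bit concatenation. A minimal Python sketch of the index computation follows; the function name and field widths are hypothetical, and dropping the two low-order PC bits assumes 4-byte aligned instructions, as the pc+4 arithmetic elsewhere in the lecture suggests.

```python
def two_level_index(pc, ghist, hist_bits, pc_bits):
    """Concatenate global history (high bits) with PC bits (low bits)."""
    pc_field = (pc >> 2) & ((1 << pc_bits) - 1)   # skip the byte-offset bits
    hist_field = ghist & ((1 << hist_bits) - 1)
    return (hist_field << pc_bits) | pc_field


# 2 history bits and 3 PC bits index a 32-entry counter table.
index = two_level_index(pc=0x1C, ghist=0b10, hist_bits=2, pc_bits=3)
```

Concatenation keeps every (history, PC) pair in its own counter, at the cost of a table with 2^(hist_bits + pc_bits) entries.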
Two-Level Branch Predictor

Pentium Pro uses the result from the last two branches to select one of the four sets of BHT bits (~95% correct).

[Figure: k bits of the Fetch PC (above the low-order 00 bits) index the BHT; a 2-bit global branch history shift register, which shifts in the Taken/¬Taken result of each branch, selects among the four sets; output Taken/¬Taken?]

Gshare Predictor

  Gshare(PC; T) = MSB(Counter(History(0; T) xor PC; T))

[Figure: a history register at constant index 0 concatenates Taken into the Global History; the Global History is XORed with PC to index a +/- counter table; output Prediction.]

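Gshare replaces the concatenation of history and PC with an XOR, so both contribute to every index bit. This Python sketch shows only the index computation; the function name is hypothetical, and dropping the two low-order PC bits assumes 4-byte aligned instructions.

```python
def gshare_index(pc, ghist, index_bits):
    """XOR the global history with PC bits and keep index_bits of the result."""
    mask = (1 << index_bits) - 1
    return ((pc >> 2) ^ ghist) & mask


# pc >> 2 is 0b00111, history is 0b10101, so the index is 0b10010.
index = gshare_index(pc=0x1C, ghist=0b10101, index_bits=5)
```

Compared with concatenation, the same table size can use more history bits and more PC bits, at the price of occasional aliasing between different (history, PC) pairs.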
Choosing Predictors

  Chooser = MSB(P(PC; P + (A==T) - (B==T)))

or

  Chooser = MSB(P(GHist(PC; T); P + (A==T) - (B==T)))

[Figure: LHist and GHist predictions feed a selector controlled by the Chooser; output Prediction.]

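The chooser update P + (A==T) - (B==T) moves the counter only when the two component predictions disagree in correctness. The Python sketch below models one chooser entry; it assumes A and B are the predictions of the two component predictors, that the counter saturates, and that a set MSB selects A. The function names are hypothetical.

```python
def update_chooser(counter, a_pred, b_pred, taken, bits=2):
    """P + (A==T) - (B==T), clamped to the counter range (assumption)."""
    counter += int(a_pred == taken) - int(b_pred == taken)
    return max(0, min((1 << bits) - 1, counter))


def choose(counter, a_pred, b_pred, bits=2):
    """Select A's prediction when the chooser MSB is set, otherwise B's."""
    use_a = (counter >> (bits - 1)) & 1
    return a_pred if use_a else b_pred


c = 1                                      # weakly prefers B
c = update_chooser(c, True, False, True)   # A right, B wrong: c becomes 2
picked = choose(c, True, False)            # MSB set, so A's prediction is used
```

When both predictors are right, or both are wrong, the two terms cancel and the chooser keeps its current preference.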
Tournament Branch Predictor (Alpha 21264)

  - Choice predictor learns whether best to use local or global branch history in predicting next branch
  - Global history is speculatively updated but restored on mispredict
  - Claim % success on range of applications

[Figure: PC indexes the Local history table (1,024x10b), which feeds Local prediction (1,024x3b); Global History (12b) feeds Global Prediction (4,096x2b) and Choice Prediction (4,096x2b); the choice selects the final Prediction.]
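Speculative update with restore on mispredict can be modeled by handing each branch a checkpoint of the history at prediction time. This Python sketch is an illustration under stated assumptions, not the Alpha 21264 mechanism: the class and method names are hypothetical, and the checkpoint plays a role similar to the dirPredInfo value that the fetch rule passes back to dirPred.repair.

```python
class SpeculativeGlobalHistory:
    """Global history register updated at prediction time, repairable later."""

    def __init__(self, bits=12):
        self.mask = (1 << bits) - 1
        self.value = 0

    def predict_update(self, predicted_taken):
        """Shift in the predicted outcome; return the pre-update checkpoint."""
        checkpoint = self.value
        self.value = ((self.value << 1) | int(predicted_taken)) & self.mask
        return checkpoint

    def repair(self, checkpoint, actual_taken):
        """On a mispredict, rebuild history from the checkpoint and real outcome."""
        self.value = ((checkpoint << 1) | int(actual_taken)) & self.mask


h = SpeculativeGlobalHistory(bits=4)
c1 = h.predict_update(True)    # branch 1 predicted taken
c2 = h.predict_update(True)    # branch 2 predicted taken
c3 = h.predict_update(False)   # branch 3 predicted not taken (wrong path)
h.repair(c2, False)            # branch 2 was actually not taken
```

Repairing from the checkpoint of the mispredicted branch also discards the bits shifted in by younger wrong-path branches, here branch 3.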