Evaluation of Dynamic Branch Prediction Schemes in a MIPS Pipeline. Debajit Bhattacharya, Ali JavadiAbhari. ELE 475 Final Project, 9th May, 2012
Outline: Motivation; Branch Prediction; Simulation Setup & Testing Methodology; Dynamic Branch Prediction (Single-Bit Saturating Counter, Two-Bit Saturating Counter, Two-Level Local Branch History & Single-Bit Prediction, Two-Level Local Branch History & Two-Bit Prediction); Comparison of Performance; Conclusion; Future Work
Why Branch Prediction? Branches (conditional and unconditional) redirect the instruction stream, which creates dead cycles in the front end. The branch cost grows with super-pipelining, which delays branch resolution (e.g., Pentium 3 and Pentium 4 have branch penalties of 10 and 20 cycles, respectively). It also grows with superscalar issue, which multiplies the number of dead instructions (e.g., a 6-stage MIPS pipeline has 3 dead instructions in its one-way implementation and 7 in its two-way implementation).
Branch Prediction: Minimizes the dead cycles generated by a "taken" branch and is essential in modern processors to restore IPC. Prediction has two components: the direction/outcome of the branch (conditional branches only) and the target of the branch (all branches).
Simulation Setup & Testing Methodology: 5-stage MIPS pipeline; Parcv2 instruction set; Pv2byp configuration from the lab; our own assembly tests; micro-benchmarks from the lab (vector-vector add, complex multiply, binary search, masked filter).
Pv2byp Pipeline: The target addresses of J and JAL are known in the D stage. The target addresses of JR and JALR, and the branch direction/outcome, are known in the X stage. [Figure: Pv2byp pipeline diagram, stages D, X, M, W]
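As a rough illustration, the cost of an unpredicted redirect follows from the stage where it is resolved. This sketch assumes a one-way, in-order F D X M W pipeline in which a redirect squashes every younger instruction already fetched behind it (one per stage). The stage list, the `RESOLVE_STAGE` table, and `dead_instructions` are hypothetical names for illustration.

```python
# Hypothetical model of a one-way, in-order 5-stage pipeline (F D X M W).
STAGES = ["F", "D", "X", "M", "W"]

# Stage at which each kind of control transfer is resolved in Pv2byp (per the slide).
RESOLVE_STAGE = {
    "J": "D",
    "JAL": "D",
    "JR": "X",
    "JALR": "X",
    "BRANCH": "X",  # conditional branch direction/outcome
}


def dead_instructions(kind: str) -> int:
    """Wrong-path instructions squashed when `kind` redirects fetch.

    A redirect from stage s squashes the instructions in every stage before s,
    one instruction per stage in a one-way pipeline.
    """
    return STAGES.index(RESOLVE_STAGE[kind])


for kind in RESOLVE_STAGE:
    print(kind, dead_instructions(kind))
```

Under these assumptions, J/JAL cost one dead instruction and JR/JALR and conditional branches cost two. A correct prediction made in F avoids these cycles; a misprediction still pays them.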
Dynamic Branch Prediction: Performance = f(accuracy, cost of misprediction). One-level predictor (bimodal prediction): Branch History Table (BHT) and Branch Target Buffer (BTB). Two-level predictor: Branch History Register Table (BHR), Pattern History Table (PHT), and Branch Target Buffer (BTB). All tables are read in the F stage for prediction. They are written in either the D or the X stage, depending on where the branch resolves and whether the prediction was correct.
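One common way to make Performance = f(accuracy, cost of misprediction) concrete is the effective-CPI estimate below. It is a simplified model that assumes a one-way in-order pipeline, a fixed penalty per misprediction, and no other stalls. Here $f_{\text{br}}$ is the fraction of instructions that are branches, $a$ is the prediction accuracy, and $P$ is the misprediction penalty in cycles, which is set by the stage where the branch resolves.

```latex
\mathrm{CPI}_{\text{eff}} \approx \mathrm{CPI}_{\text{base}} + f_{\text{br}} \, (1 - a) \, P
```

Improving either factor helps: a more accurate predictor lowers $(1 - a)$, and resolving branches earlier lowers $P$.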
Hardware Description: BHT: indexed by the lower bits of the PC; holds the prediction bit(s) (1 or 2). BHR: indexed by the lower bits of the PC; holds the local branch history bits. PHT: indexed by the BHR bits; holds the prediction bit(s). BTB: indexed by the lower bits of the PC; holds the remaining PC bits as a tag, the branch target PC, and a valid bit for the two-level predictor.
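Hardware Description (figure): The BHT, indexed by PC[bht_IndexSize+1:2], supplies the prediction bits. The BTB, indexed by PC[btb_IndexSize+1:2], holds a valid bit, a tag, and a target. The stored tag is compared with PC[31:btb_IndexSize+2] to produce the BTB Hit signal.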
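The indexing on these slides can be sketched as a one-level (bimodal) predictor: a BHT of counters plus a direct-mapped BTB, both indexed by word-address PC bits, with the upper PC bits kept as the BTB tag. This is a minimal sketch under several assumptions. The class and method names are hypothetical. The BHT uses 2-bit counters (the slides allow 1 or 2 bits). The valid bit is assumed to gate the BTB hit. A taken prediction is assumed to redirect only on a BTB hit.

```python
from dataclasses import dataclass


@dataclass
class BTBEntry:
    valid: bool = False
    tag: int = 0
    target: int = 0


class OneLevelPredictor:
    """Hypothetical bimodal predictor: BHT of 2-bit counters plus a direct-mapped BTB."""

    def __init__(self, bht_index_size: int, btb_index_size: int):
        # Index widths mirror bht_IndexSize / btb_IndexSize on the slide.
        self.bht_index_size = bht_index_size
        self.btb_index_size = btb_index_size
        self.bht = [1] * (1 << bht_index_size)  # counters start weakly not taken
        self.btb = [BTBEntry() for _ in range(1 << btb_index_size)]

    def bht_index(self, pc: int) -> int:
        # PC[bht_IndexSize+1:2]: skip the two byte-offset bits of a word-aligned PC.
        return (pc >> 2) & ((1 << self.bht_index_size) - 1)

    def btb_index(self, pc: int) -> int:
        # PC[btb_IndexSize+1:2]
        return (pc >> 2) & ((1 << self.btb_index_size) - 1)

    def btb_tag(self, pc: int) -> int:
        # PC[31:btb_IndexSize+2]: the remaining upper bits form the tag.
        return (pc & 0xFFFFFFFF) >> (self.btb_index_size + 2)

    def predict(self, pc: int) -> int:
        """Predicted next PC, as it would be computed in the F stage."""
        entry = self.btb[self.btb_index(pc)]
        hit = entry.valid and entry.tag == self.btb_tag(pc)
        taken = self.bht[self.bht_index(pc)] >= 2
        return entry.target if (hit and taken) else pc + 4

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Train the tables once the branch resolves (D or X stage)."""
        i = self.bht_index(pc)
        self.bht[i] = min(3, self.bht[i] + 1) if taken else max(0, self.bht[i] - 1)
        if taken:
            self.btb[self.btb_index(pc)] = BTBEntry(True, self.btb_tag(pc), target)


p = OneLevelPredictor(bht_index_size=6, btb_index_size=4)
print(hex(p.predict(0x100)))  # 0x104: cold tables, fall through
p.update(0x100, taken=True, target=0x80)
print(hex(p.predict(0x100)))  # 0x80: BHT predicts taken and the BTB hits
```

Because the BHT and BTB are indexed independently, the BHT can be made larger than the BTB. In that case direction history is still trained for branches whose BTB entry has been evicted.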
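One-Bit Saturating Counter: Exploits temporal correlation using two states, T and NT. For a backward-branch loop that is executed repeatedly, it always mispredicts twice per loop execution: once at loop exit, and again on the first iteration of the next execution. [Figure: two-state FSM with states Predict T and Predict NT; transitions on branch outcomes T/NT]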
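The "two mispredicts per loop" behaviour can be checked with a tiny simulation of a single one-bit entry. The loop shape (a backward branch taken three times and then falling through, with the loop executed five times) and the function name are hypothetical.

```python
def count_mispredicts_1bit(outcomes, initial_taken=False):
    """Simulate one 1-bit predictor entry: always predict the last outcome seen."""
    state = initial_taken
    mispredicts = 0
    for taken in outcomes:
        if taken != state:
            mispredicts += 1
        state = taken  # one bit: remember only the most recent outcome
    return mispredicts


# Hypothetical inner loop: backward branch taken 3 times, then not taken at exit.
loop = [True, True, True, False]
# Run the loop 5 times (e.g., inside an outer loop).
print(count_mispredicts_1bit(loop * 5))
```

Each execution mispredicts at the exit (predicts T, sees NT) and again on the first iteration of the next execution (predicts NT, sees T). Five executions therefore give 10 mispredicts.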
Two-Bit Saturating Counter: From a strong state, two consecutive outcomes in the opposite direction are needed to change the prediction. It therefore tolerates one branch going in the unusual direction and still predicts the next branch correctly, so it works better than the one-bit counter in a nested loop. [Figure: four-state FSM; Strong Taken and Weak Taken predict T, Weak Not Taken and Strong Not Taken predict NT; transitions on T/NT outcomes]
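Running the same hypothetical loop through a single 2-bit saturating counter shows the benefit. The encoding is an assumption: 0 = Strong Not Taken, 1 = Weak Not Taken, 2 = Weak Taken, 3 = Strong Taken. The counter starts in Weak Not Taken.

```python
def count_mispredicts_2bit(outcomes, initial_counter=1):
    """Simulate one 2-bit saturating counter (0..3; predict taken when >= 2)."""
    counter = initial_counter
    mispredicts = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            mispredicts += 1
        # Saturate at 0 and 3 so one unusual outcome only weakens the state.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return mispredicts


loop = [True, True, True, False]  # hypothetical 4-iteration loop branch
print(count_mispredicts_2bit(loop * 5))
```

After the first execution, the single NT at loop exit only moves the counter from Strong Taken to Weak Taken. The next first iteration is therefore still predicted taken, leaving one mispredict per execution (6 in total here, versus 10 for the one-bit entry on the same pattern).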
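Two-Level Branch Predictor [Yeh & Patt, ’92]: Many branches execute repetitive patterns, which are captured in each branch's local (current) history. The counter values require an initial settling (warm-up) period. [Figure: a BHR entry (e.g., 111…01) indexes the PHT; the selected pattern history bit(s) feed FSM logic that produces the prediction bit and is updated with the branch result from the X stage]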
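A minimal sketch of the two-level local scheme follows. It is our reading of the slides: per-branch BHR entries selected by PC bits index a single shared PHT of 2-bit counters, which corresponds to what Yeh and Patt call a PAg organization. The class name, parameters, and branch pattern are hypothetical.

```python
class TwoLevelLocal:
    """Hypothetical two-level local predictor: per-branch BHRs index a shared PHT."""

    def __init__(self, bhr_index_size: int, history_bits: int):
        self.bhr_mask = (1 << bhr_index_size) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.bhr = [0] * (1 << bhr_index_size)  # local histories, initially all NT
        self.pht = [1] * (1 << history_bits)    # 2-bit counters, weakly not taken

    def _bhr_index(self, pc: int) -> int:
        return (pc >> 2) & self.bhr_mask

    def predict(self, pc: int) -> bool:
        history = self.bhr[self._bhr_index(pc)]
        return self.pht[history] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._bhr_index(pc)
        history = self.bhr[i]
        c = self.pht[history]
        self.pht[history] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the resolved outcome into this branch's local history.
        self.bhr[i] = ((history << 1) | int(taken)) & self.hist_mask


def count_mispredicts(pred, pc, outcomes):
    count = 0
    for taken in outcomes:
        if pred.predict(pc) != taken:
            count += 1
        pred.update(pc, taken)
    return count


pattern = [True, True, False] * 10  # hypothetical repeating T, T, NT branch
wide = count_mispredicts(TwoLevelLocal(bhr_index_size=4, history_bits=2), 0x40, pattern)
narrow = count_mispredicts(TwoLevelLocal(bhr_index_size=4, history_bits=1), 0x40, pattern)
print(wide, narrow)
```

With 2 history bits, each history maps to a unique next outcome. After the counters settle, the predictor is perfect (3 mispredicts, all during warm-up). With only 1 history bit, "last outcome was T" is followed by both T and NT, so the predictor keeps mispredicting twice per period (21 in total). This toy case illustrates why the BHR must be wide enough to capture the pattern.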
Comparison of Performance
Effect of BTB Size (One-Level, Two-Bit Predictor)
Effect of PHT Size (Two-Level, Two-Bit Predictor)
Conclusion: Larger predictors cost more hardware but give better prediction accuracy. Pairing a larger BHT with a smaller BTB reduces hardware cost, and the BHT still reuses branch history even when the branch's entry is not present in the BTB. With small BHTs, multiple branches alias to the same entry and prediction degrades. Once every branch maps to a unique BHT entry, accuracy saturates. In the two-level predictor, the BHR must be wide enough to capture a branch's repetitive pattern; otherwise the two-level predictor performs worse than the bimodal scheme.
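The hardware-cost argument can be made concrete by counting storage bits. This sketch assumes 32-bit PCs and direct-mapped tables. It also assumes the BTB stores a valid bit, the tag PC[31:index+2], and a word-aligned 30-bit target, while the two-level count covers the BHR table and PHT only (no BTB). The function names are hypothetical.

```python
def bht_bits(index_size: int, counter_bits: int = 2) -> int:
    """Storage for a BHT with 2**index_size entries of counter_bits each."""
    return (1 << index_size) * counter_bits


def btb_bits(index_size: int, pc_bits: int = 32) -> int:
    """Storage for a direct-mapped BTB with 2**index_size entries."""
    tag = pc_bits - 2 - index_size  # PC[31:index_size+2]
    target = pc_bits - 2            # assumed: word-aligned target, low 2 bits not stored
    valid = 1
    return (1 << index_size) * (valid + tag + target)


def two_level_bits(bhr_index_size: int, history_bits: int, counter_bits: int = 2) -> int:
    """BHR table (one history register per entry) plus a PHT indexed by history."""
    return (1 << bhr_index_size) * history_bits + (1 << history_bits) * counter_bits


print(bht_bits(10))          # 1024-entry 2-bit BHT
print(btb_bits(4))           # 16-entry BTB
print(btb_bits(10))          # 1024-entry BTB
print(two_level_bits(6, 8))  # 64 BHRs of 8 bits + 256-entry PHT
```

Under these assumptions, a 1024-entry BTB needs 52,224 bits while a 1024-entry 2-bit BHT needs 2,048. Growing the BHT while keeping the BTB small is therefore the cheaper way to add predictor capacity.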
Future Work: Global branch prediction (gshare and gselect) to exploit data-dependent correlation between branches, e.g., in nested loops; extending the predictors to the two-way superscalar pipeline (Pv2ssc).
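For reference, the two global schemes differ mainly in how they form the PHT index from the PC and a global history register (GHR). gshare XORs the two; gselect concatenates some PC bits with some history bits. This is a minimal sketch. The function names are hypothetical, and placing the PC bits in the high part of the gselect index is one possible convention.

```python
def update_ghr(ghr: int, taken: bool, hist_bits: int) -> int:
    """Shift the latest outcome into a global history register of hist_bits bits."""
    return ((ghr << 1) | int(taken)) & ((1 << hist_bits) - 1)


def gshare_index(pc: int, ghr: int, index_bits: int) -> int:
    """gshare: XOR word-address PC bits with the global history."""
    return ((pc >> 2) ^ ghr) & ((1 << index_bits) - 1)


def gselect_index(pc: int, ghr: int, pc_bits: int, hist_bits: int) -> int:
    """gselect: concatenate low PC bits (high part) with low history bits (low part)."""
    pc_part = (pc >> 2) & ((1 << pc_bits) - 1)
    hist_part = ghr & ((1 << hist_bits) - 1)
    return (pc_part << hist_bits) | hist_part


ghr = 0
for outcome in [True, False, True, True]:
    ghr = update_ghr(ghr, outcome, hist_bits=4)
print(bin(ghr))                        # 0b1011
print(gshare_index(0x40, ghr, 4))      # 11
print(gselect_index(0x44, ghr, 2, 2))  # 7
```

gshare lets every index bit depend on both the PC and the history, which spreads branches across the PHT. gselect splits a fixed index width between the PC and the history.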
Thank You! Q & A
Backup Slides