Prophet/Critic Hybrid Branch Prediction. Falcon, Stark, Ramirez, Lai, Valero. Presenter: Christian Wanamaker.

2 Outline  Overview & Motivation  Hybrid Branch Prediction  The Prophet/Critic Branch Predictor  Results  Conclusions

3 Overview  Better branch prediction is highly desirable because it does not require trade-offs between performance, power, and energy  Despite much research, branch prediction is by no means solved  Branch prediction is likely to become even more important as pipelines deepen and issue widths increase

4 Overview (continued)  Issue width = number of uOps (micro-ops) issued per clock cycle  The Branch Target Buffer (BTB) is used to look ahead for upcoming branches  Perceptrons – a simple neural network that can use a longer branch history than simple counters
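
To make the perceptron bullet concrete, here is a minimal sketch of a single-perceptron predictor over a global history. It is illustrative only and is not the configuration evaluated in the talk: the class name, the history length of 8, and the training threshold floor(1.93*h + 14) are assumptions chosen for the example.

```python
class PerceptronPredictor:
    """Toy single-perceptron branch predictor (illustrative sketch)."""

    def __init__(self, history_len=8, threshold=None):
        # Global history as +1 (taken) / -1 (not taken), oldest first.
        self.history = [-1] * history_len
        # weights[0] is the bias; weights[i + 1] pairs with history[i].
        self.weights = [0] * (history_len + 1)
        # Assumed training threshold; pass a value to override it.
        if threshold is None:
            threshold = int(1.93 * history_len + 14)
        self.threshold = threshold

    def _output(self):
        # Dot product of the weights with the history, plus the bias.
        return self.weights[0] + sum(
            w * h for w, h in zip(self.weights[1:], self.history)
        )

    def predict(self):
        # A non-negative output means "predict taken".
        return self._output() >= 0

    def update(self, taken):
        y = self._output()
        t = 1 if taken else -1
        # Train on a misprediction or when confidence is below the threshold.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            self.weights[0] += t
            for i, h in enumerate(self.history):
                self.weights[i + 1] += t * h
        # Shift the actual outcome into the history.
        self.history = self.history[1:] + [t]
```

Because the cost grows only linearly with the number of history bits (one weight per bit), a perceptron can afford a longer history than a table of counters indexed by the full history pattern.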

5 Hybrid Branch Predictors  Use two or more different branch prediction techniques  One may override another (either chosen by a third selector, or one always overrides)  The predictions may be combined, for instance by majority vote  Example: tournament predictors (often a branch prediction buffer and a correlating branch predictor, with a third predictor choosing which of the two to use in a given situation)
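
The tournament example can be sketched as follows. This is a simplified model under stated assumptions, not a description of any shipping design: the class names, table sizes, 2-bit counters, and the gshare-style index (PC XOR global history) for the correlating component are all choices made for illustration.

```python
class TwoBitTable:
    """Table of 2-bit saturating counters; a value >= 2 predicts taken."""

    def __init__(self, size):
        self.size = size
        self.ctr = [1] * size  # start weakly not-taken

    def predict(self, idx):
        return self.ctr[idx % self.size] >= 2

    def update(self, idx, taken):
        i = idx % self.size
        if taken:
            self.ctr[i] = min(3, self.ctr[i] + 1)
        else:
            self.ctr[i] = max(0, self.ctr[i] - 1)


class TournamentPredictor:
    def __init__(self, size=1024, history_bits=8):
        self.local = TwoBitTable(size)    # per-branch prediction buffer
        self.corr = TwoBitTable(size)     # correlating component
        self.chooser = TwoBitTable(size)  # >= 2 means "trust correlating"
        self.history = 0
        self.mask = (1 << history_bits) - 1

    def _corr_index(self, pc):
        # Assumed gshare-style hash of PC and global history.
        return pc ^ self.history

    def predict(self, pc):
        if self.chooser.predict(pc):
            return self.corr.predict(self._corr_index(pc))
        return self.local.predict(pc)

    def update(self, pc, taken):
        p_local = self.local.predict(pc)
        p_corr = self.corr.predict(self._corr_index(pc))
        # The chooser only learns when the two components disagree.
        if p_local != p_corr:
            self.chooser.update(pc, p_corr == taken)
        self.local.update(pc, taken)
        self.corr.update(self._corr_index(pc), taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

On a branch that strictly alternates taken/not-taken, the per-branch counter keeps mispredicting while the correlating component learns the pattern, so the chooser migrates to the correlating side.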

6 Prophet/Critic Branch Predictor  Basic idea – the Prophet makes a series of predictions for future branches; the Critic critiques them and, if necessary, alters them  The Prophet makes predictions based on the history of the branch  The Critic waits until the Prophet has predicted a certain number of branches ahead, then examines those predictions and critiques the prophecy

7 Prophet Critic Basics  The BTB looks ahead; the Prophet predicts whether the branches it finds are taken or not taken  Branch Outcome Register (BOR) – holds the Prophet's predictions that lie in the Critic's “future”; the number of such predictions it may hold is the number of future bits  More future bits give the Critic a more accurate view of the future

8 Prophet Critic Basics

9  The branch predictions are kept in the Fetch Target Queue (FTQ)  Once the future bits are available, the Critic makes its pronouncement  The Critic overrides the Prophet if it comes to a different conclusion  If so, the FTQ is purged of un-critiqued predictions, and the Prophet is redirected down the path chosen by the Critic
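
The FTQ handling on this slide can be sketched as a single critique step. Everything named here is hypothetical and chosen for illustration: the function `critique_oldest`, the dictionary layout of an FTQ entry, and the use of a plain dict as a stand-in for the Critic's pattern table. The sketch also assumes that the future bits consist of the Prophet's prediction for the branch being critiqued plus the predictions that follow it.

```python
from collections import deque


def critique_oldest(ftq, critic_table, future_bits):
    """Critique the oldest un-critiqued FTQ entry, if enough future exists.

    ftq: deque of {"pc": int, "pred": bool, "critiqued": bool}, oldest first.
    critic_table: dict mapping (pc, tuple of future predictions) -> bool.
    Returns (pc, corrected_direction) when the Prophet must be redirected,
    otherwise None.
    """
    idx = next((i for i, e in enumerate(ftq) if not e["critiqued"]), None)
    if idx is None or len(ftq) - idx < future_bits:
        return None  # nothing to critique, or future bits not yet available

    entry = ftq[idx]
    future = tuple(ftq[idx + k]["pred"] for k in range(future_bits))
    # With no matching pattern, the Critic simply agrees with the Prophet.
    verdict = critic_table.get((entry["pc"], future), entry["pred"])
    entry["critiqued"] = True
    if verdict == entry["pred"]:
        return None

    # Override: fix this prediction and purge the younger, un-critiqued ones.
    entry["pred"] = verdict
    while len(ftq) > idx + 1:
        ftq.pop()
    return entry["pc"], verdict


# Usage: the Critic has learned that this pattern at pc 0x10 is a mispredict.
ftq = deque(
    {"pc": pc, "pred": pred, "critiqued": False}
    for pc, pred in [(0x10, True), (0x20, False), (0x30, True)]
)
table = {(0x10, (True, False, True)): False}
redirect = critique_oldest(ftq, table, future_bits=3)
```

After the call, only the corrected entry for 0x10 remains in the queue, and the return value tells the caller where to restart the Prophet.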

10 Prophet/Critic Architecture (cont)

11 Prophet/Critic: How it works

12 Prophet Critic: How it works  The Prophet will mispredict A  The first time this misprediction occurs, the Critic records that the Prophet mispredicted in this situation  When the Critic later sees the same situation, it corrects the Prophet's prediction  More future bits increase prediction accuracy but reduce the amount of history available, so there is an important trade-off
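
The history-versus-future trade-off can be illustrated with a fixed-width register shared by both kinds of bits. This is a sketch under an assumption: that the Critic's register has a fixed width, so every additional future bit displaces one bit of past history. The function name `build_bor` and the list representation are hypothetical.

```python
def build_bor(history, future, width):
    """Pack past outcomes and future predictions into a fixed-width register.

    history: past branch outcomes, oldest first (1 = taken, 0 = not taken).
    future:  Prophet predictions, oldest first.
    width:   total number of bits the register can hold (assumed fixed).
    """
    if len(future) > width:
        raise ValueError("more future bits than the register can hold")
    room = width - len(future)
    # Keep only the most recent history bits that still fit.
    kept = list(history[-room:]) if room > 0 else []
    return kept + list(future)


history = [1, 0, 1, 1, 0, 1]
two_future = build_bor(history, [1, 1], width=6)          # 4 history bits kept
four_future = build_bor(history, [1, 1, 0, 0], width=6)   # 2 history bits kept
```

With two future bits the register still holds the four most recent outcomes; with four future bits only the two most recent outcomes survive.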

13 Prophet/Critic Filtering  The Critic can be limited by multiple branches contending for the same resources  In addition, the Critic is not always correct  So easy-to-predict branches should be filtered out  This is achieved with tags that are set when a mispredict occurs  If a branch is not tagged, the Critic's critique is ignored
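
The filtering rule can be sketched with a tagged table that allocates an entry only when the Prophet mispredicts. The class `FilteredCritic`, the 2-bit counters, and the use of the full (pc, BOR) pair as the tag are assumptions for illustration; real hardware would hash these down to a few tag bits in a finite table.

```python
class FilteredCritic:
    """Critic whose opinion counts only for branches the Prophet got wrong."""

    def __init__(self):
        # tag -> 2-bit counter; a value >= 2 means "taken".
        self.entries = {}

    @staticmethod
    def _tag(pc, bor):
        # Hypothetical tag: the untruncated (pc, BOR) pair.
        return (pc, bor)

    def critique(self, pc, bor, prophet_pred):
        ctr = self.entries.get(self._tag(pc, bor))
        if ctr is None:
            return prophet_pred  # tag miss: the Critic is ignored
        return ctr >= 2

    def train(self, pc, bor, prophet_pred, taken):
        tag = self._tag(pc, bor)
        if tag in self.entries:
            c = self.entries[tag]
            self.entries[tag] = min(3, c + 1) if taken else max(0, c - 1)
        elif prophet_pred != taken:
            # Allocate only on a Prophet mispredict, so easy branches
            # never occupy Critic entries.
            self.entries[tag] = 2 if taken else 1
```

Branches the Prophet always predicts correctly never allocate an entry, which both frees Critic capacity for hard branches and prevents the Critic from overriding predictions that were already right.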

14 Prophet/Critic Filtering

15 Testing and Results  Testing was done on a cycle-accurate IA32 simulator using long instruction traces  The simulator had to follow mispredicted (wrong-path) branches, as otherwise the Critic would not learn  Branch predictors: gshare, 2bc-gskew, perceptron  uPC is uOps per cycle  misp/kuops is mispredictions per thousand uops
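
The two metrics are simple ratios. The helper names and the sample numbers below are made up purely to show the arithmetic; they are not results from the talk.

```python
def upc(uops_retired, cycles):
    """uOps per cycle."""
    return uops_retired / cycles


def misp_per_kuops(mispredicts, uops_retired):
    """Branch mispredictions per thousand uops."""
    return 1000 * mispredicts / uops_retired


# Hypothetical run: 3000 uops in 2000 cycles with 6 mispredicted branches.
example_upc = upc(3000, 2000)
example_misp = misp_per_kuops(6, 3000)
```

Normalizing mispredictions by uops rather than by branches makes the metric comparable across traces with different branch densities.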

16 Hardware Budgets and predictor types

17 Results of varying Future Bits

18 Results



21 Conclusions  Speedup of up to 8% with 12 future bits (using the same amount of branch prediction space)  The mispredict rate can be reduced by up to 25-31%  Adding future bits helps, but more is not always better  Research suggests that the best number of future bits can be chosen dynamically

22 Thank you  Any questions?

