Combining Branch Predictors

CS Lecture 7: Combining Branch Predictors
Scott McFarling, WRL Tech. Report TN-36, 1993

Bimodal Branch Prediction
Identifies the most popular outcome for a branch in the recent past.
Updates happen during commit.
Figure: 10 bits of the PC index a table of 1024 2-bit saturating counters.
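A minimal Python sketch of the bimodal scheme, for illustration only. The class name `BimodalPredictor`, the initial counter value, and the use of the lowest PC bits as the index are assumptions, not details from the slide; commit timing is not modeled.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by low PC bits."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # Assumed starting state: every counter at 1 (weakly not taken).
        self.counters = [1] * (1 << index_bits)

    def predict(self, pc):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
        return self.counters[pc & self.mask] >= 2

    def update(self, pc, taken):
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bp = BimodalPredictor()
for outcome in [True, True, True, False, True]:
    bp.update(0x40, outcome)
prediction = bp.predict(0x40)
```

A single not-taken outcome in a mostly taken stream does not flip the prediction, which is the point of using two bits rather than one.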

Results
SPEC'89 programs were simulated for 10M instructions (modern studies use hard-to-predict programs).
A larger predictor reduces contention for counters.
Prediction rates saturate at 93.5% (at 2K bytes) (Fig. 3).

Local Predictors
Two-level predictor: the first level holds per-branch history, the second level holds saturating counters.
History gets updated immediately.
Figure: the PC supplies a 10-bit index into a 1024-entry table of 4-bit histories; each history selects one of 16 2-bit saturating counters.
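A hedged sketch of the two-level local scheme with the sizes from the figure (1024 histories of 4 bits, 16 counters). The class name `LocalPredictor` and the initial values are assumptions, and both tables are updated in one `update` call, so the "history updated immediately" timing is not modeled.

```python
class LocalPredictor:
    """Per-branch history table feeding a shared table of 2-bit counters."""

    def __init__(self, index_bits=10, history_bits=4):
        self.pc_mask = (1 << index_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.histories = [0] * (1 << index_bits)   # first level
        self.counters = [1] * (1 << history_bits)  # second level

    def predict(self, pc):
        history = self.histories[pc & self.pc_mask]
        return self.counters[history] >= 2

    def update(self, pc, taken):
        i = pc & self.pc_mask
        h = self.histories[i]
        c = self.counters[h]
        self.counters[h] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the new outcome into this branch's history.
        self.histories[i] = ((h << 1) | int(taken)) & self.hist_mask


# A loop branch that is taken three times, then falls through.
lp = LocalPredictor()
pattern = [True, True, True, False]
for _ in range(20):            # warm-up
    for outcome in pattern:
        lp.update(0x40, outcome)

misses = 0
for _ in range(5):             # measurement
    for outcome in pattern:
        misses += lp.predict(0x40) != outcome
        lp.update(0x40, outcome)
```

Once warmed up, each of the four history values seen by this branch maps to its own counter, so the loop exit is predicted correctly; a bimodal counter would miss it every time.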

Results
For small predictors, there can be contention at both levels, resulting in inaccurate predictions.
A local predictor also takes longer to warm up, a cost paid after every context switch.
It does very well for large predictors: accuracy saturates at 97.1%.

Global Predictors
A single history register is shared by all branches, because neighboring branches have correlated results.
However, the PC is not used.
Figure: a 10-bit global history indexes a table of 1024 2-bit saturating counters.
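A small sketch of the global scheme, assuming a 10-bit history register as in the figure. The class name `GlobalPredictor` and the initial values are illustrative assumptions. Note that `predict` takes no PC argument at all.

```python
class GlobalPredictor:
    """One shared history register indexing a table of 2-bit counters."""

    def __init__(self, history_bits=10):
        self.mask = (1 << history_bits) - 1
        self.history = 0
        self.counters = [1] * (1 << history_bits)

    def predict(self):
        return self.counters[self.history] >= 2

    def update(self, taken):
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the newest outcome into the shared history.
        self.history = ((self.history << 1) | int(taken)) & self.mask


# Two neighboring branches: the second always repeats the first,
# and the first alternates, giving the stream T T N N T T N N ...
gp = GlobalPredictor()
stream = [True, True, False, False]
for _ in range(20):            # warm-up
    for outcome in stream:
        gp.update(outcome)

misses = 0
for _ in range(5):             # measurement
    for outcome in stream:
        misses += gp.predict() != outcome
        gp.update(outcome)
```

The second branch is predicted from the outcome of the first, which is the correlation a per-branch counter cannot see.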

Do We Need PC?
Note that the global history often reveals which branch is being examined.
Hence, the global predictor outdoes bimodal predictors when the transistor budget is large (Fig. 7).
The local predictor does better still: it is more important to identify the PC and local history than the behavior of neighboring branches.

Gselect
Use a combination of PC and global history.
Bimodal and global prediction are special cases (Fig. 9).
Figure: n PC bits and m global history bits (5 in the figure) are concatenated into an (n+m)-bit index into a table of 1024 2-bit saturating counters.
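The index computation can be sketched as below. The function name `gselect_index`, the parameter names, and the choice to put the PC bits in the upper half of the index are assumptions made for illustration.

```python
def gselect_index(pc, history, pc_bits=5, history_bits=5):
    """Concatenate low PC bits (upper part) with low history bits (lower part)."""
    pc_part = pc & ((1 << pc_bits) - 1)
    hist_part = history & ((1 << history_bits) - 1)
    return (pc_part << history_bits) | hist_part


pc, history = 0b1011011, 0b0010110

idx = gselect_index(pc, history)                                    # 5 + 5 bits
bimodal_idx = gselect_index(pc, history, pc_bits=10, history_bits=0)
global_idx = gselect_index(pc, history, pc_bits=0, history_bits=10)
```

Setting `history_bits=0` leaves only PC bits, which is the bimodal index; setting `pc_bits=0` leaves only history bits, which is the global index.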

GShare
XORing 10 history bits with 10 PC bits captures more information than concatenating 5 bits of each, and more than either component alone.
Table columns: Branch Address, Global History, Gselect 4/4, Gshare 8/8.
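A short sketch comparing the two index functions on 8-bit inputs. The function names and the input values are illustrative assumptions chosen to show one aliasing case; they are not rows recovered from the slide's table.

```python
def gselect_4_4(pc, history):
    """Concatenate the low 4 PC bits with the low 4 history bits."""
    return ((pc & 0xF) << 4) | (history & 0xF)


def gshare_8_8(pc, history):
    """XOR 8 PC bits with 8 history bits."""
    return (pc ^ history) & 0xFF


pc = 0b11111111
hist_a = 0b00000000
hist_b = 0b10000000   # differs from hist_a only in the oldest bit

# gselect 4/4 drops the upper history bits, so both cases share one counter.
select_a, select_b = gselect_4_4(pc, hist_a), gselect_4_4(pc, hist_b)

# gshare 8/8 keeps all 8 bits of each input, so the cases are separated.
share_a, share_b = gshare_8_8(pc, hist_a), gshare_8_8(pc, hist_b)
```

For the same table size, gshare sees twice as many bits of each input, at the cost of some pairs colliding after the XOR.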

Terminology
GAG: Global history indexes into a global array of saturating counters.
PAG: Per-address history indexes into a global array.
GAP: Global history indexes into each PC's private array of counters (gselect).
PAP: Per-address history indexes into each PC's private array of counters.

Trade-Offs
Some predictors warm up faster than others.
Some programs benefit from global history, some from local history.
Some programs have branches that interfere with each other.
Note that a 64KB local predictor has fewer saturating counters than a 64KB bimodal predictor, so the former won't be better for every program.
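The counter-count remark can be checked with a little arithmetic. The local configuration below (8192 histories of 17 bits) is a hypothetical example chosen to fit the budget, not a configuration from the paper.

```python
BUDGET_BITS = 64 * 1024 * 8   # 64KB expressed in bits
COUNTER_BITS = 2


def bimodal_counters(budget_bits=BUDGET_BITS):
    """A bimodal predictor spends its whole budget on counters."""
    return budget_bits // COUNTER_BITS


def local_size_bits(history_entries, history_bits):
    """History table plus one counter per possible history value."""
    return history_entries * history_bits + COUNTER_BITS * (1 << history_bits)


# Hypothetical local configuration that fits in 64KB.
entries, hbits = 8192, 17
fits = local_size_bits(entries, hbits) <= BUDGET_BITS
local_counters = 1 << hbits
```

Because part of the budget goes to the history table, the local predictor is left with fewer counters than the bimodal predictor of the same size; with 18 history bits the counters alone would already consume the full 64KB.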

Combining Predictors
Use an array of saturating counters to pick the best available predictor for each PC.
Figure: the PC indexes a table of 1024 2-bit saturating counters, whose output selects between Predictor A and Predictor B.
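A sketch of the selection mechanism, assuming a bimodal predictor as A and a gshare predictor as B. All class names, the counter encoding (2 and 3 select B), and the rule of updating the chooser only when exactly one component is correct are stated assumptions of this sketch.

```python
def bump(counter, up):
    """Step a 2-bit saturating counter up or down."""
    return min(3, counter + 1) if up else max(0, counter - 1)


class Bimodal:
    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.table = [1] * (1 << bits)

    def predict(self, pc):
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        self.table[i] = bump(self.table[i], taken)


class Gshare:
    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.history = 0
        self.table = [1] * (1 << bits)

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        self.table[i] = bump(self.table[i], taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask


class Combined:
    def __init__(self, a, b, bits=10):
        self.a, self.b = a, b
        self.mask = (1 << bits) - 1
        # Assumed encoding: 0-1 select predictor A, 2-3 select predictor B.
        self.choice = [1] * (1 << bits)

    def predict(self, pc):
        use_b = self.choice[pc & self.mask] >= 2
        return self.b.predict(pc) if use_b else self.a.predict(pc)

    def update(self, pc, taken):
        a_ok = self.a.predict(pc) == taken
        b_ok = self.b.predict(pc) == taken
        if a_ok != b_ok:
            # Only a disagreement in correctness moves the chooser.
            i = pc & self.mask
            self.choice[i] = bump(self.choice[i], b_ok)
        self.a.update(pc, taken)
        self.b.update(pc, taken)


# A branch that alternates taken / not taken defeats the bimodal counter.
cp = Combined(Bimodal(), Gshare())
for k in range(100):                      # warm-up
    cp.update(0x40, k % 2 == 0)

misses = 0
for k in range(20):                       # measurement
    outcome = k % 2 == 0
    misses += cp.predict(0x40) != outcome
    cp.update(0x40, outcome)
```

On this branch the chooser counter drifts to the gshare side and stays there, while other PCs remain free to favor the bimodal component.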

Results
The combination of local and gshare increases the prediction accuracy to 98.1% (Fig. 16).
For smaller transistor budgets, the combination of bimodal and gshare is better (gshare is twice the size to make sure the total is a power of two).
A 1KB combined predictor does as well as a 16KB gselect predictor.

Future Work
Detect conflicts, correlations, and common predictions through profiling or compiler analysis.
Functions that compress information in the history or PC.
Pipelined predictions: predict two branches ahead.
Hierarchical predictors: get a quick prediction in one cycle and a more accurate one two cycles later.

Next Week's Paper
"Design Trade-Offs for the Alpha EV8 Conditional Branch Predictor", Seznec et al., ISCA'02
