1
Combining Branch Predictors
CS Lecture 7: Combining Branch Predictors
Scott McFarling, WRL Tech. Report TN-36, 1993
2
Bimodal Branch Prediction
Identifies the most popular prediction in the recent past
Updates happen during commit
[Diagram: the PC supplies a 10-bit index into a 1024-entry table of 2-bit saturating counters, which yields a 1-bit prediction]
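The bimodal scheme on this slide can be sketched in a few lines. This is a minimal illustration, not McFarling's simulator: the class name, the initial counter value of 1 (weakly not-taken), and the assumption of 4-byte-aligned instructions are choices made here.

```python
class BimodalPredictor:
    """1024 two-bit saturating counters indexed by low PC bits (sketch)."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # Counter values 0..3; 0-1 predict not-taken, 2-3 predict taken.
        # Starting at 1 is an arbitrary assumption.
        self.counters = [1] * (1 << index_bits)

    def _index(self, pc):
        # Assumes 4-byte-aligned instructions, so the two low bits are dropped.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        # Called at commit time with the resolved outcome.
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Because the counter saturates, a single not-taken outcome (such as a loop exit) does not flip a strongly-taken prediction.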
3
Results
SPEC’89 programs simulated for 10M instructions (modern studies use hard-to-predict programs)
A larger predictor reduces contention for counters
Prediction rates saturate at 93.5% (at 2K bytes) (Fig. 3)
4
Local Predictors
Two-level predictor: the first level holds history, the second level holds saturating counters
History gets updated immediately
[Diagram: a 10-bit PC index selects one of 1024 entries in a table of 4-bit histories; the selected history indexes a 16-entry table of 2-bit saturating counters]
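A rough sketch of the two-level local predictor with the sizes from the diagram (1024 histories of 4 bits, 16 counters). The class and method names are hypothetical. For simplicity the sketch updates history and counter together in `update`, whereas the slide notes that real hardware updates the history immediately.

```python
class LocalPredictor:
    """Two-level local predictor: per-PC history -> shared counters (sketch)."""

    def __init__(self, pc_bits=10, history_bits=4):
        self.pc_mask = (1 << pc_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.histories = [0] * (1 << pc_bits)      # level 1: 1024 x 4-bit
        self.counters = [1] * (1 << history_bits)  # level 2: 16 x 2-bit

    def _slot(self, pc):
        # Assumes 4-byte-aligned instructions.
        return (pc >> 2) & self.pc_mask

    def predict(self, pc):
        history = self.histories[self._slot(pc)]
        return self.counters[history] >= 2

    def update(self, pc, taken):
        slot = self._slot(pc)
        history = self.histories[slot]
        count = self.counters[history]
        self.counters[history] = min(3, count + 1) if taken else max(0, count - 1)
        # Shift the newest outcome into this branch's history.
        self.histories[slot] = ((history << 1) | int(taken)) & self.hist_mask
```

A branch that strictly alternates taken/not-taken defeats a single 2-bit counter, but here the two recurring histories select different counters, so the pattern is learned after a short warm-up.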
5
Results
For small predictors, there can be contention at both levels, resulting in inaccurate predictions
Also takes longer to warm up, which matters after every context switch
Does very well for large predictors: saturates at 97.1%
6
Global Predictors
A single history register: neighboring branches have correlated results
However, the PC is not used
[Diagram: a 10-bit global history indexes a 1024-entry table of 2-bit saturating counters]
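As a sketch of the global scheme, the only state besides the counters is one shared shift register of recent outcomes. The names and the initial counter value are assumptions made for illustration.

```python
class GlobalPredictor:
    """Counters indexed purely by a 10-bit global history (sketch)."""

    def __init__(self, history_bits=10):
        self.mask = (1 << history_bits) - 1
        self.history = 0                            # outcomes of the last 10 branches
        self.counters = [1] * (1 << history_bits)   # 1024 x 2-bit

    def predict(self, pc=None):
        # The PC is accepted for interface symmetry but deliberately ignored.
        return self.counters[self.history] >= 2

    def update(self, taken):
        count = self.counters[self.history]
        self.counters[self.history] = min(3, count + 1) if taken else max(0, count - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

Feeding it a repeating taken, taken, not-taken stream shows the idea: each phase of the pattern produces a distinct history, so each phase trains its own counter.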
7
Do We Need PC?
Note that the global history reveals which branch is being examined
Hence, the global predictor outdoes bimodal predictors when the transistor budget is large (Fig. 7)
The local predictor does better still: it is more important to identify the PC and local history than the behavior of neighboring branches
8
Gselect
Use a combination of PC and global history
Bimodal and global prediction are special cases (Fig. 9)
[Diagram: n PC bits concatenated with m global history bits (5 history bits shown) form an (n+m)-bit index into a 1024-entry table of 2-bit saturating counters]
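The gselect index is just a concatenation, which a small helper makes concrete. The function name and the choice to drop two low PC bits (4-byte-aligned instructions) are assumptions for this sketch.

```python
def gselect_index(pc, history, pc_bits=5, history_bits=5):
    """Concatenate n low PC bits with m global-history bits (sketch)."""
    pc_part = (pc >> 2) & ((1 << pc_bits) - 1)          # n bits of the PC
    hist_part = history & ((1 << history_bits) - 1)     # m bits of history
    return (pc_part << history_bits) | hist_part        # (n+m)-bit table index
```

Setting `history_bits=0` reduces the index to the bimodal one, and `pc_bits=0` reduces it to the pure global one, which is the sense in which both are special cases.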
9
GShare
XOR-ing 10 history bits and 10 PC bits carries more information than the concatenation of 5 bits of each, and more information than either component alone
[Table columns: Branch Address, Global History, Gselect 4/4, Gshare 8/8]
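To see why XOR can separate cases that concatenation merges, the two index functions can be compared on 8-bit inputs. The input values below are illustrative choices made here, not values recovered from the slide's table, and the function names are hypothetical.

```python
def gselect_index(addr, history, addr_bits=4, history_bits=4):
    """Gselect 4/4: low 4 address bits concatenated with low 4 history bits."""
    a = addr & ((1 << addr_bits) - 1)
    h = history & ((1 << history_bits) - 1)
    return (a << history_bits) | h


def gshare_index(addr, history, bits=8):
    """Gshare 8/8: XOR of 8 address bits with 8 history bits."""
    return (addr ^ history) & ((1 << bits) - 1)


# Same branch address, two histories that differ only in the oldest bit.
cases = [(0b11111111, 0b00000000), (0b11111111, 0b10000000)]
for addr, hist in cases:
    print(f"{gselect_index(addr, hist):08b}  {gshare_index(addr, hist):08b}")
```

With these inputs, gselect 4/4 discards the differing history bit and maps both cases to the same counter, while gshare 8/8 keeps them apart using the same 8-bit index width.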
10
Terminology
GAG: Global history indexes into a global array of saturating counters
PAG: Per-address history indexes into a global array
GAP: Global history indexes into each PC’s private array of counters (gselect)
PAP: Per-address history indexes into each PC’s private array of counters
11
Trade-Offs
Some predictors warm up faster than others
Some programs benefit from global history, some from local history
Some programs have branches that interfere with each other
Note that a 64KB local predictor has fewer saturating counters than a 64KB bimodal predictor, so the former won’t be better for every program
12
Combining Predictors
Use an array of saturating counters to pick the best available predictor for each PC
[Diagram: the PC indexes a 1024-entry table of 2-bit saturating counters, whose output selects between Predictor A and Predictor B]
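A minimal sketch of the selection mechanism, using bimodal as Predictor A and gshare as Predictor B. All class names are hypothetical, both components are given equal size for brevity, and the chooser is trained only when the two components disagree (when they agree, neither is the better choice, so the counter is left alone).

```python
class CounterTable:
    """2-bit saturating counters addressed by a caller-supplied index."""

    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.table = [1] * (1 << bits)

    def get(self, index):
        return self.table[index & self.mask]

    def bump(self, index, up):
        index &= self.mask
        value = self.table[index]
        self.table[index] = min(3, value + 1) if up else max(0, value - 1)


class CombinedPredictor:
    """Bimodal (A) and gshare (B) with a per-PC chooser table (sketch)."""

    def __init__(self, bits=10):
        self.bimodal = CounterTable(bits)
        self.gshare = CounterTable(bits)
        self.chooser = CounterTable(bits)  # >= 2 means "trust B (gshare)"
        self.history = 0

    def _components(self, pc):
        pc_idx = pc >> 2  # assumes 4-byte-aligned instructions
        a = self.bimodal.get(pc_idx) >= 2
        b = self.gshare.get(pc_idx ^ self.history) >= 2
        return pc_idx, a, b

    def predict(self, pc):
        pc_idx, a, b = self._components(pc)
        return b if self.chooser.get(pc_idx) >= 2 else a

    def update(self, pc, taken):
        pc_idx, a, b = self._components(pc)
        if a != b:
            # Exactly one component was right; move the chooser toward it.
            self.chooser.bump(pc_idx, b == taken)
        self.bimodal.bump(pc_idx, taken)
        self.gshare.bump(pc_idx ^ self.history, taken)
        self.history = ((self.history << 1) | int(taken)) & self.chooser.mask
```

On a strictly alternating branch the bimodal component mispredicts while gshare learns the pattern, so the chooser for that PC drifts toward gshare and the combined prediction follows it.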
13
Results
The combination of local and gshare increases the prediction accuracy to 98.1% (Fig. 16)
For smaller transistor budgets, the combination of bimodal and gshare is better (gshare is twice the size to make sure the total is a power of two)
A 1KB combined predictor does as well as a 16KB gselect predictor
14
Future Work
Detect conflicts, correlations, and common predictions through profiling/compiler analysis
Functions that compress information in history or PC
Pipeline predictions: predict two branches ahead
Hierarchical predictors: get a quick prediction in a cycle and a more accurate one two cycles later
15
Next Week’s Paper
“Design Trade-Offs for the Alpha EV8 Conditional Branch Predictor”, Seznec et al., ISCA’02