Published by Gwenda Fisher. Modified over 6 years ago.
1
COSC3330 Computer Architecture Lecture 15. Branch Prediction
Instructor: Weidong Shi (Larry), PhD Computer Science Department University of Houston
2
Topic: Branch Prediction
3
Bimodal Branch Prediction
2^N entries addressed by N-bit PC
Each entry keeps a counter (2-bit or more) for prediction
Counter update: the same as 2-bit counter
[Figure: N bits of the PC address index a table of 2^N entries, each holding a 2-bit counter; the selected counter gives the prediction, and the FSM update logic writes the counter back using the actual outcome.]
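The table described above can be sketched in a few lines of C++. This is a minimal illustration rather than the course's framework code: the class name `BimodalPredictor`, the initial counter value of 1 (weakly not taken), and the use of the low N bits of the PC as the index are all assumptions.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical bimodal predictor: 2^N two-bit saturating counters indexed by the PC.
class BimodalPredictor {
public:
    // n_bits is N; every counter starts at 1 (weakly not taken), which is an assumption.
    explicit BimodalPredictor(unsigned n_bits)
        : mask_((1u << n_bits) - 1), table_(1u << n_bits, 1) {}

    // Predict taken when the counter's MSB is set, i.e. the counter is 2 or 3.
    bool predict(std::uint32_t pc) const { return table_[pc & mask_] >= 2; }

    // 2-bit counter FSM: saturating increment on taken, saturating decrement on not taken.
    void update(std::uint32_t pc, bool taken) {
        std::uint8_t &c = table_[pc & mask_];
        if (taken) { if (c < 3) ++c; }
        else       { if (c > 0) --c; }
    }

private:
    std::uint32_t mask_;               // selects the low N bits of the PC
    std::vector<std::uint8_t> table_;  // one counter (0..3) per entry
};
```

Two branches whose PCs share the same low N bits use the same counter in this sketch, which is the aliasing effect that shows up again with the pattern history table.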
4
Gshare Branch Predictor
[Figure: the PC address and the global BHR select a PHT entry; the selected counter is 00, so MSB = 0 and the prediction is Not Taken.]
5
Pattern History Table
2^N entries addressed by N-bit BHR
Each entry keeps a counter (2-bit or more) for prediction
Counter update: the same as 2-bit counter
Can be initialized in alternate patterns (01, 10, 01, 10, ..)
Alias (or interference) problem
6
Idea: Track the History of a Branch
[Figure: each PC-indexed entry holds the branch's previous outcome and two 2-bit counters, one used if prev = 0 and one used if prev = 1. Counter states: 0 = SN, 1 = WN, 2 = WT, 3 = ST. The steps shown predict N, T, N, T.]
7
Idea: Track the History of a Branch
[Figure, continued: the same per-branch entry (previous outcome, counter if prev = 0, counter if prev = 1; states 0 = SN, 1 = WN, 2 = WT, 3 = ST) after further updates. The steps shown predict T.]
8
Deeper History Covers More Patterns
Each PC-indexed entry keeps the last 3 outcomes and one counter per history value: counter if prev=000, counter if prev=001, counter if prev=010, ..., counter if prev=111
What pattern has this branch predictor entry learned?
History  Prediction
001      1
011      0
110      0
100      1
Answer: (0011)*
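To see why a 3-outcome history is enough for the (0011)* pattern, the entry can be simulated directly. The sketch below is illustrative only: `HistoryEntry` and `mispredicts_in_last_round` are hypothetical names, and starting every counter at 0 (strongly not taken) is an assumption.

```cpp
#include <array>
#include <cstdint>

// Hypothetical predictor entry: the last 3 outcomes plus one 2-bit counter per history value.
struct HistoryEntry {
    std::uint8_t history = 0;                // last 3 outcomes, newest in bit 0
    std::array<std::uint8_t, 8> counters{};  // all start at 0 (an assumption)

    // The counter selected by the current history decides the prediction.
    bool predict() const { return counters[history] >= 2; }

    // Train the selected counter, then shift the outcome into the history.
    void update(bool taken) {
        std::uint8_t &c = counters[history];
        if (taken) { if (c < 3) ++c; }
        else       { if (c > 0) --c; }
        history = static_cast<std::uint8_t>(((history << 1) | (taken ? 1 : 0)) & 0x7);
    }
};

// Feed `rounds` repetitions of the outcome sequence 0,0,1,1 and count the
// mispredictions made during the final repetition only.
inline int mispredicts_in_last_round(HistoryEntry &e, int rounds) {
    const bool pattern[4] = {false, false, true, true};
    int misses = 0;
    for (int r = 0; r < rounds; ++r) {
        for (bool outcome : pattern) {
            if (r == rounds - 1 && e.predict() != outcome) ++misses;
            e.update(outcome);
        }
    }
    return misses;
}
```

After a few repetitions the counters for histories 001 and 100 sit in the taken half and those for 011 and 110 stay in the not-taken half, matching the table above, so the final repetition is predicted without error.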
9
Tournament Predictors
No predictor is clearly the best
Different branches exhibit different behaviors
Some “constant”, some global, some local
Idea: Let’s have a predictor to predict which predictor will predict better
10
Tournament Hybrid Predictors
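Meta-predictor: a table of 2-/3-bit counters
Meta update, by which component predictor was correct:
Pred0 wrong, Pred1 wrong: --- (no update)
Pred0 wrong, Pred1 correct: Inc
Pred0 correct, Pred1 wrong: Dec
Pred0 correct, Pred1 correct: --- (no update)
Final prediction: if meta-counter MSB = 0, use pred0, else use pred1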
11
Hybrid Branch Predictor [McFarling93]
Some branches are correlated to global history, some to local history
Only update the meta-predictor when the 2 predictors disagree
[Figure: the branch PC feeds predictors P0 and P1 and the choice (or meta) predictor, which selects the final prediction.]
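The selection and update rules for the choice (meta) predictor can be sketched as follows. This is a hedged illustration: `MetaChooser` is a hypothetical name, the table is assumed to be indexed by the low bits of the branch PC, 2-bit counters are assumed, and the initial value of 1 is arbitrary.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical choice table; pred0 and pred1 come from any two component predictors.
class MetaChooser {
public:
    // 2^n_bits two-bit meta-counters, each starting at 1 (an assumption).
    explicit MetaChooser(unsigned n_bits)
        : mask_((1u << n_bits) - 1), table_(1u << n_bits, 1) {}

    // MSB of the meta-counter: 0 selects pred0, 1 selects pred1.
    bool use_pred1(std::uint32_t pc) const { return table_[pc & mask_] >= 2; }

    bool final_prediction(std::uint32_t pc, bool pred0, bool pred1) const {
        return use_pred1(pc) ? pred1 : pred0;
    }

    // Train only when exactly one component was right, which for a binary
    // outcome is the same as the two predictions disagreeing.
    void update(std::uint32_t pc, bool pred0, bool pred1, bool taken) {
        const bool ok0 = (pred0 == taken);
        const bool ok1 = (pred1 == taken);
        if (ok0 == ok1) return;          // both right or both wrong: no update
        std::uint8_t &m = table_[pc & mask_];
        if (ok1) { if (m < 3) ++m; }     // pred1 was right: move toward pred1
        else     { if (m > 0) --m; }     // pred0 was right: move toward pred0
    }

private:
    std::uint32_t mask_;
    std::vector<std::uint8_t> table_;
};
```

The component predictors are still updated on every branch; only the chooser skips the cases where both agree.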
12
Compaq Alpha 21264
13
Compaq Alpha 21264
Four-issue superscalar
Out of order execution
Speculative execution
Branch predictors
Average branch misprediction penalty is 11 cycles
14
Compaq Alpha 21264 - Global Predictor
Global predictor has 4K entries and is indexed by the history of the last 12 branches
Each entry in the global predictor is a standard 2-bit predictor
12-bit pattern: ith bit 0 => ith prior branch not taken; ith bit 1 => ith prior branch taken
[Figure: the 12-bit branch history register indexes the Global Prediction table (4096 x 2), whose 2-bit state gives the prediction.]
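A global predictor of this shape (12 bits of history indexing 4096 two-bit counters) might look like the sketch below. The names are hypothetical, the counters are assumed to start at 0, and the newest outcome is assumed to be shifted into bit 0 of the history register.

```cpp
#include <array>
#include <cstdint>

constexpr unsigned kGlobalHistoryBits = 12;
constexpr unsigned kGlobalEntries = 1u << kGlobalHistoryBits;  // 4096

// Hypothetical global predictor: one shared history register, no PC bits in the index.
struct GlobalPredictor {
    std::uint16_t history = 0;                            // outcomes of the last 12 branches
    std::array<std::uint8_t, kGlobalEntries> counters{};  // 2-bit counters, start at 0 (assumption)

    // The history register alone selects the counter.
    bool predict() const { return counters[history] >= 2; }

    void update(bool taken) {
        std::uint8_t &c = counters[history];
        if (taken) { if (c < 3) ++c; }
        else       { if (c > 0) --c; }
        // Shift in the new outcome and keep only the last 12 bits.
        history = static_cast<std::uint16_t>(
            ((history << 1) | (taken ? 1u : 0u)) & (kGlobalEntries - 1));
    }
};
```

Because the index ignores the PC, every branch that executes under the same 12-outcome history shares one counter.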
15
Per Branch Pattern History Table (Two-Level Predictor)
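First level: find the history (pattern) for the branch
Second level: predict the branch for that pattern
Correlating predictors
Differs from the global pattern history table in that each branch has its own private history
[Figure: the PC selects a BHT entry holding the branch's history (110110); that history indexes the PHT of 2-bit saturating counters, whose state gives the prediction.]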
16
Compaq Alpha 21264 – Local Predictor
Local predictor (2-level predictor):
Top level: a local history table consisting of 1024 10-bit entries; each 10-bit entry corresponds to the most recent 10 branch outcomes for the entry. The 10-bit history allows patterns of up to 10 branches to be discovered and predicted.
Next level: the selected entry from the local history table is used to index a table of 1K entries consisting of 3-bit saturating counters, which provide the local prediction.
[Figure: the PC indexes the Local History Table (1024x10), whose selected entry indexes Local Prediction (1024x3).]
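The two levels can be sketched as two small arrays. This is an illustration under stated assumptions, not the 21264's actual logic: `LocalPredictor` and `slot` are hypothetical names, the history table is assumed to be indexed by PC bits [11:2], and the counters are assumed to start at 0.

```cpp
#include <array>
#include <cstdint>

// Hypothetical two-level local predictor sized like the slide: 1024x10 and 1024x3.
struct LocalPredictor {
    std::array<std::uint16_t, 1024> history{};  // level 1: 10-bit history per entry
    std::array<std::uint8_t, 1024> counters{};  // level 2: 3-bit saturating counters (0..7)

    // Assumption: drop the two low PC bits (4-byte instructions), keep the next 10.
    static unsigned slot(std::uint32_t pc) { return (pc >> 2) & 0x3FF; }

    // The branch's own history selects the counter; its MSB (value >= 4) means taken.
    bool predict(std::uint32_t pc) const { return counters[history[slot(pc)]] >= 4; }

    void update(std::uint32_t pc, bool taken) {
        std::uint16_t &h = history[slot(pc)];
        std::uint8_t &c = counters[h];
        if (taken) { if (c < 7) ++c; }
        else       { if (c > 0) --c; }
        // Shift the outcome into this branch's history and keep 10 bits.
        h = static_cast<std::uint16_t>(((h << 1) | (taken ? 1u : 0u)) & 0x3FF);
    }
};
```

A loop branch that is taken three times and then falls through produces four distinct steady-state histories, each followed by a single outcome, so the sketch learns it after a short warm-up.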
17
Compaq Alpha 21264
Local: previous executions of this branch
Global: previous executions of all branches
Tournament predictor: 4K 2-bit counters to choose from among a global predictor and a local predictor
Total size: 4K*2 + 4K*2 + 1K*10 + 1K*3 = 29K bits! (~180,000 transistors)
[Figure: the PC indexes the Local History Table (1024x10), which indexes Local Prediction (1024x3); the path history indexes Global Prediction (4096x2) and Choice Prediction (4096x2), and the choice selects the final prediction.]
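The storage total can be checked term by term. The constant names below are made up for this check; the table sizes are the ones given on the slide.

```cpp
// Bits of predictor state per table, using the sizes from the slide.
constexpr unsigned kK = 1024;
constexpr unsigned kChoiceBits       = 4 * kK * 2;   // Choice Prediction, 4096 x 2
constexpr unsigned kGlobalBits       = 4 * kK * 2;   // Global Prediction, 4096 x 2
constexpr unsigned kLocalHistoryBits = 1 * kK * 10;  // Local History Table, 1024 x 10
constexpr unsigned kLocalPredictBits = 1 * kK * 3;   // Local Prediction, 1024 x 3

constexpr unsigned kTotalBits =
    kChoiceBits + kGlobalBits + kLocalHistoryBits + kLocalPredictBits;

// 8K + 8K + 10K + 3K = 29K bits.
static_assert(kTotalBits == 29 * kK, "total predictor state is 29K bits");
```

That is 29,696 bits, or 3,712 bytes of predictor state.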
18
Branch Prediction Accuracy
Static branch prediction (compiler) - 70%
Per branch 2-bit saturating counters (no history) - 85%
Two-level predictor (with history) % accuracy
Tournament predictor – a little more accurate than two-level
19
Accuracy v. Size (SPEC89)
Slide from David Culler
20
HW2 Programming - Branch Predictor
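class local_predictor : public branch_predictor {
public:
    local_update u;

    local_predictor (void) { }

    // Starter behavior: always predict taken, with no target prediction.
    branch_update *predict (branch_info & b) {
        u.direction_prediction (true);
        u.target_prediction (0);
        return &u;
    }

    // Left empty on the slide: train the predictor with the actual outcome here.
    void update (branch_update *u, bool taken, unsigned int target) {
    }
};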
21
Gshare Implementation
#include <cstring>   // memset

class gshare_predictor : public branch_predictor {
public:
#define HISTORY_LENGTH 15
#define TABLE_BITS 15
    gshare_update u;
    branch_info bi;
    unsigned int history;                  // global branch history register
    unsigned char tab[1<<TABLE_BITS];      // table of 2-bit counters

    gshare_predictor (void) : history(0) {
        memset (tab, 0, sizeof (tab));
    }

    branch_update *predict (branch_info & b) {
        bi = b;
        if (b.br_flags & BR_CONDITIONAL) {
            // Index = global history XOR the low TABLE_BITS bits of the branch address.
            u.index = (history << (TABLE_BITS - HISTORY_LENGTH))
                    ^ (b.address & ((1<<TABLE_BITS)-1));
            // The MSB of the 2-bit counter is the direction prediction.
            u.direction_prediction (tab[u.index] >> 1);
        } else {
            u.direction_prediction (true);
        }
        u.target_prediction (0);
        return &u;
    }
22
Gshare Implementation
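    void update (branch_update *u, bool taken, unsigned int target) {
        if (bi.br_flags & BR_CONDITIONAL) {
            // Saturating update of the counter that made the prediction.
            unsigned char *c = &tab[((gshare_update*)u)->index];
            if (taken) {
                if (*c < 3) (*c)++;
            } else {
                if (*c > 0) (*c)--;
            }
            // Shift the outcome into the global history and keep HISTORY_LENGTH bits.
            history <<= 1;
            history |= taken;
            history &= (1<<HISTORY_LENGTH)-1;
        }
    }
};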
23
Pentium M
24
Branch Prediction
Hybrid branch outcome predictors
    Bimodal predictor
    Global predictor
    Loop predictor
Branch target predictors
    BTB
    iBTB
25
Reverse Engineering
26
Branch Prediction in Pentium M
27
Branch Prediction in Pentium M
28
Bimodal Predictor
A table of bimodal counters: 4096 counters
Indexed by the IP address bits [11:0]
29
Global Predictor
A 4-way cache structure with 2048 entries
Accessed with the hash function: PIR XOR conditional branch IP
Of the resulting bits, 9 are used as the index and 6 as the tag in the Global predictor
PIR organization: the PIR is the same PIR as the iBTB PIR
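The index and tag split can be illustrated with a short sketch. The slide gives only the widths, so the bit positions here are assumptions: the hash is taken as the PIR XORed with the low 15 bits of the branch IP, the low 9 bits as the set index, and the upper 6 bits as the tag. `GlobalLookup` and `global_lookup` are hypothetical names.

```cpp
#include <cstdint>

// 2048 entries in a 4-way structure gives 512 sets, hence a 9-bit index.
constexpr unsigned kGlobalWays = 4;
constexpr unsigned kGlobalSets = 2048 / kGlobalWays;
static_assert(kGlobalSets == (1u << 9), "512 sets need a 9-bit index");

struct GlobalLookup {
    unsigned index;  // selects one of the 512 sets
    unsigned tag;    // compared against the 4 ways in that set
};

// Assumed bit layout: low 9 bits of the 15-bit hash are the index, upper 6 the tag.
inline GlobalLookup global_lookup(std::uint32_t pir, std::uint32_t branch_ip) {
    const std::uint32_t hash = (pir ^ branch_ip) & 0x7FFF;  // 15-bit result
    return GlobalLookup{hash & 0x1FF, (hash >> 9) & 0x3F};
}
```

A lookup hits only if one of the four ways in the indexed set holds a matching tag, so branches without a matching entry are left to the other predictors.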
30
PIR Organization
Width: 15 bits
Affected by the 15 bits of the conditional taken branch IP address
Affected by the 15 bits combined from the indirect branch IP address and the indirect branch target address
PIR is shifted left by two bits prior to the update (XOR) with the newly occurred program branch
Unconditional, conditional not-taken, and call/return branches do not affect the PIR
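The PIR update rule can be sketched as a shift followed by an XOR. Which IP bits feed the update, and how the indirect branch IP and target are combined into 15 bits, are not given here, so the sketch takes the already-formed 15-bit contribution as a parameter. `BranchKind`, `affects_pir`, and `pir_update` are hypothetical names.

```cpp
#include <cstdint>

constexpr std::uint32_t kPirMask = 0x7FFF;  // the PIR is 15 bits wide

enum class BranchKind {
    ConditionalTaken,
    ConditionalNotTaken,
    Unconditional,
    Indirect,
    CallOrReturn
};

// Only taken conditional branches and indirect branches update the PIR.
inline bool affects_pir(BranchKind kind) {
    return kind == BranchKind::ConditionalTaken || kind == BranchKind::Indirect;
}

// bits15 is the 15-bit value contributed by the branch: IP bits for a taken
// conditional branch, or the IP/target combination for an indirect branch
// (how that combination is formed is left to the caller).
inline std::uint32_t pir_update(std::uint32_t pir, BranchKind kind, std::uint32_t bits15) {
    if (!affects_pir(kind)) return pir;
    return ((pir << 2) ^ (bits15 & kPirMask)) & kPirMask;
}
```

With a 15-bit register and a 2-bit shift per update, a branch's contribution is shifted out completely after eight further updates.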
31
Hash Function