COSC3330 Computer Architecture Lecture 15. Branch Prediction


COSC3330 Computer Architecture Lecture 15. Branch Prediction Instructor: Weidong Shi (Larry), PhD Computer Science Department University of Houston

Topic Branch Prediction

Bimodal Branch Prediction
A table of 2^N entries is addressed by N bits of the PC.
Each entry keeps a counter (2-bit or more) for prediction.
Counter update: the same as the 2-bit counter FSM.
[Figure: N bits of the PC address index the 2^N-entry counter table, which supplies the prediction; the update logic uses the actual outcome to update the selected entry.]
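The table and counter behaviour described above can be sketched in a few lines of C++. This is a minimal illustration, not the course's code: the class name `BimodalSketch`, the initial counter value (weakly not taken), and the use of the lowest PC bits as the index are all assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bimodal predictor sketch: 2^N two-bit saturating counters indexed by the
// low N bits of the PC. Counter values: 0 = SN, 1 = WN, 2 = WT, 3 = ST.
class BimodalSketch {
public:
    explicit BimodalSketch(unsigned n_bits)
        : mask_((1u << n_bits) - 1),
          table_(1u << n_bits, 1) {  // every counter starts at WN (assumption)
        assert(n_bits > 0 && n_bits < 31);
    }

    // Predict taken when the counter's MSB is set (value 2 or 3).
    bool predict(std::uint32_t pc) const { return table_[pc & mask_] >= 2; }

    // Saturating update with the actual outcome.
    void update(std::uint32_t pc, bool taken) {
        std::uint8_t &c = table_[pc & mask_];
        if (taken) { if (c < 3) ++c; }
        else       { if (c > 0) --c; }
    }

private:
    std::uint32_t mask_;               // selects the low N bits of the PC
    std::vector<std::uint8_t> table_;  // 2^N counters
};
```

Because the counter has two "taken" states, a single not-taken outcome after two taken outcomes does not flip the prediction.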

Gshare Branch Predictor
[Figure: the PC address is XORed with the global BHR to index the PHT. In the example, the selected counter is 00; its MSB = 0, so the prediction is Not Taken.]

Pattern History Table
2^N entries addressed by an N-bit BHR.
Each entry keeps a counter (2-bit or more) for prediction.
Counter update: the same as the 2-bit counter FSM.
Counters can be initialized in alternating patterns (01, 10, 01, 10, ...).
Alias (or interference) problem: different branches or histories can map to the same entry.
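The indexing schemes and the alias problem can be made concrete with a short sketch. The 12-bit table size and the function names `pht_index` and `gshare_index` are assumptions chosen for illustration.

```cpp
#include <cstdint>

constexpr unsigned kIndexBits = 12;  // hypothetical table size: 2^12 entries
constexpr std::uint32_t kMask = (1u << kIndexBits) - 1;

// PHT index when the table is addressed by the global BHR alone:
// every branch that sees the same history shares one counter.
constexpr std::uint32_t pht_index(std::uint32_t bhr) { return bhr & kMask; }

// Gshare-style index: XOR the PC bits with the global BHR, so that
// different branches with the same history usually use different counters.
constexpr std::uint32_t gshare_index(std::uint32_t pc, std::uint32_t bhr) {
    return (pc ^ bhr) & kMask;
}
```

With a BHR-only index, branches at PCs 0x100 and 0x200 with history 0x0F0 alias to the same entry; the XOR separates them (0x1F0 vs 0x2F0). Aliasing is reduced, not eliminated: PC 0x1F0 with history 0x000 still maps to 0x1F0.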

Idea: Track the History of a Branch
Each PC selects two counters: one used if the previous outcome was 0 (not taken) and one used if it was 1 (taken).
Counter states: 0 = SN, 1 = WN, 2 = WT, 3 = ST.
[Figure, two animation slides: for each execution of the branch, the previous outcome selects a counter, that counter gives the prediction (N or T), and the actual outcome then updates the selected counter.]

Deeper History Covers More Patterns
Each PC selects a set of counters indexed by the last 3 outcomes: one counter if prev=000, one if prev=001, one if prev=010, ..., one if prev=111.
What pattern has this branch predictor entry learned?

History   Prediction
001       1
011       0
110       0
100       1

Answer: 00110011001... = (0011)*
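A small simulation shows how an entry with a 3-outcome history learns the (0011)* pattern. This is a sketch under assumptions: the names `HistoryEntrySketch` and `mispredictions_after_warmup` are hypothetical, and all counters start at 0 (strongly not taken).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One predictor entry with a 3-outcome history: eight 2-bit counters,
// selected by the last three outcomes of this branch.
struct HistoryEntrySketch {
    std::array<std::uint8_t, 8> counters{};  // all start at 0 (SN), an assumption
    unsigned history = 0;                    // last three outcomes, newest in bit 0

    bool predict() const { return counters[history] >= 2; }

    void update(bool taken) {
        std::uint8_t &c = counters[history];
        if (taken) { if (c < 3) ++c; }
        else       { if (c > 0) --c; }
        history = ((history << 1) | (taken ? 1u : 0u)) & 0x7u;
    }
};

// Feed the repeating pattern 0,0,1,1 for `warmup` outcomes, then count
// mispredictions over the next `measured` outcomes.
inline int mispredictions_after_warmup(int warmup, int measured) {
    assert(warmup >= 0 && measured >= 0);
    HistoryEntrySketch e;
    int misses = 0;
    for (int i = 0; i < warmup + measured; ++i) {
        bool taken = (i % 4) >= 2;  // 0,0,1,1,0,0,1,1,...
        if (i >= warmup && e.predict() != taken) ++misses;
        e.update(taken);
    }
    return misses;
}
```

After a short warm-up the counters for histories 001 and 100 sit in a taken state and those for 011 and 110 in a not-taken state, which matches the table above.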

Tournament Predictors
No predictor is clearly the best.
Different branches exhibit different behaviors: some "constant", some global, some local.
Idea: have a predictor that predicts which predictor will predict better.

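Tournament Hybrid Predictors
The meta-predictor is a table of 2-/3-bit counters that selects between two component predictors, Pred0 and Pred1.

Pred0    Pred1    Meta update
wrong    wrong    ---
wrong    right    Inc
right    wrong    Dec
right    right    ---

Final prediction: if the meta-counter MSB = 0, use Pred0; else use Pred1.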

Hybrid Branch Predictor [McFarling93]
Some branches are correlated with global history, some with local history.
The branch PC indexes two component predictors, P0 and P1, and a choice (or meta) predictor that selects the final prediction.
Only update the meta-predictor when the 2 predictors disagree.
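The selection and update rule for the meta-predictor can be sketched for a single 2-bit meta-counter. The struct name `MetaCounterSketch` and the initial value are assumptions; a real design keeps a table of such counters.

```cpp
#include <cstdint>

// 2-bit meta-counter: MSB = 0 selects pred0, MSB = 1 selects pred1.
struct MetaCounterSketch {
    std::uint8_t value = 1;  // initial value is an assumption

    bool use_pred1() const { return value >= 2; }

    // Final prediction: forward the chosen component's prediction.
    bool choose(bool pred0, bool pred1) const {
        return use_pred1() ? pred1 : pred0;
    }

    // Update only when the two components disagree, i.e. exactly one was
    // right: move toward pred1 if it was right, toward pred0 otherwise.
    void update(bool pred0, bool pred1, bool taken) {
        bool ok0 = (pred0 == taken);
        bool ok1 = (pred1 == taken);
        if (ok0 == ok1) return;  // both right or both wrong: no change
        if (ok1) { if (value < 3) ++value; }
        else     { if (value > 0) --value; }
    }
};
```

Skipping the update when both components agree keeps the counter from drifting on branches that either predictor handles equally well.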

Compaq Alpha 21264

Compaq Alpha 21264
Four-issue superscalar with out-of-order execution, speculative execution, and branch predictors.
The average branch misprediction penalty is 11 cycles.

Compaq Alpha 21264 - Global Predictor
The global predictor has 4K entries (4096 x 2 bits) and is indexed by the history of the last 12 branches; each entry is a standard 2-bit predictor.
12-bit pattern: the ith bit is 0 if the ith prior branch was not taken and 1 if it was taken.
Example branch history register: 101101101101.

Per Branch Pattern History Table (Two-Level Predictor)
First level: find the history (pattern) of the branch.
Second level: predict the branch for that pattern.
These are correlating predictors.
Differs from a global pattern history table in that each branch has its own private history.
[Figure: the PC indexes the BHT to read a history such as 110110, which indexes the PHT of 2-bit saturating counters to produce the prediction.]

Compaq Alpha 21264 – Local Predictor
The local predictor is a 2-level predictor.
Top level: a local history table of 1024 10-bit entries, indexed by the PC; each 10-bit entry holds the most recent 10 branch outcomes for that entry. A 10-bit history allows patterns of up to 10 branches to be discovered and predicted.
Next level: the selected entry from the local history table indexes a table of 1K entries, each a 3-bit saturating counter, which provides the local prediction.
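The two-level lookup can be sketched with the table sizes quoted above. This is an illustration, not the 21264's actual logic: the class name `LocalPredictorSketch`, the PC bits used for indexing, and the zero initial state are assumptions.

```cpp
#include <array>
#include <cstdint>

// Two-level local predictor sketch: 1024 x 10-bit histories (level 1),
// then 1024 x 3-bit saturating counters indexed by the history (level 2).
class LocalPredictorSketch {
public:
    bool predict(std::uint32_t pc) const {
        std::uint16_t hist = history_[pc_index(pc)];
        return counters_[hist] >= 4;  // MSB of the 3-bit counter
    }

    void update(std::uint32_t pc, bool taken) {
        std::uint16_t &hist = history_[pc_index(pc)];
        std::uint8_t &c = counters_[hist];
        if (taken) { if (c < 7) ++c; }
        else       { if (c > 0) --c; }
        // Shift the outcome into the 10-bit history of this entry.
        hist = static_cast<std::uint16_t>(((hist << 1) | (taken ? 1u : 0u)) & 0x3FFu);
    }

private:
    // Assumption: drop two low PC bits (4-byte instructions), keep 10 bits.
    static std::uint32_t pc_index(std::uint32_t pc) { return (pc >> 2) & 0x3FFu; }

    std::array<std::uint16_t, 1024> history_{};  // level 1: per-entry history
    std::array<std::uint8_t, 1024> counters_{};  // level 2: indexed by history
};
```

A loop branch that is taken three times and then not taken has a period of 4, well within the 10-outcome history, so after warm-up every iteration, including the loop exit, is predicted correctly.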

Compaq Alpha 21264
Tournament predictor.
Local: previous executions of this branch (local history table 1024x10, local prediction 1024x3).
Global: previous executions of all branches (global prediction 4096x2).
Choice prediction (4096x2): 4K 2-bit counters, indexed by path history, choose between the global predictor and the local predictor.
Total size: 4K*2 + 4K*2 + 1K*10 + 1K*3 = 29K bits (~180,000 transistors).
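The storage total can be checked at compile time. The constant names below are hypothetical; the table dimensions are the ones quoted on the slide.

```cpp
#include <cstddef>

constexpr std::size_t kGlobalBits    = 4096 * 2;   // global prediction table
constexpr std::size_t kChoiceBits    = 4096 * 2;   // choice (meta) table
constexpr std::size_t kLocalHistBits = 1024 * 10;  // local history table
constexpr std::size_t kLocalPredBits = 1024 * 3;   // local prediction counters

constexpr std::size_t kTotalBits =
    kGlobalBits + kChoiceBits + kLocalHistBits + kLocalPredBits;

// 8K + 8K + 10K + 3K = 29K bits.
static_assert(kTotalBits == 29 * 1024, "total should be 29K bits");
```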

Branch Prediction Accuracy
Static branch prediction (compiler): 70%.
Per-branch 2-bit saturating counters (no history): 85%.
Two-level predictor (with history): 90-95%.
Tournament predictor: a little more accurate than two-level.

Accuracy v. Size (SPEC89) Slide from David Culler

HW2 Programming - Branch Predictor

    class local_predictor : public branch_predictor {
    public:
        local_update u;

        local_predictor (void) { }

        // Placeholder: always predict taken, with no target prediction.
        branch_update *predict (branch_info & b) {
            u.direction_prediction (true);
            u.target_prediction (0);
            return &u;
        }

        // To be filled in: train the predictor with the actual outcome.
        void update (branch_update *u, bool taken, unsigned int target) {
        }
    };

Gshare Implementation

    #include <cstring>  // memset

    class gshare_predictor : public branch_predictor {
    public:
    #define HISTORY_LENGTH 15
    #define TABLE_BITS 15
        gshare_update u;
        branch_info bi;
        unsigned int history;              // global branch history register
        unsigned char tab[1<<TABLE_BITS];  // table of 2-bit counters

        gshare_predictor (void) : history(0) {
            memset (tab, 0, sizeof (tab));
        }

        branch_update *predict (branch_info & b) {
            bi = b;
            if (b.br_flags & BR_CONDITIONAL) {
                // Index = global history XOR low bits of the branch address.
                u.index = (history << (TABLE_BITS - HISTORY_LENGTH))
                        ^ (b.address & ((1<<TABLE_BITS)-1));
                // The MSB of the 2-bit counter gives the direction.
                u.direction_prediction (tab[u.index] >> 1);
            } else {
                u.direction_prediction (true);
            }
            u.target_prediction (0);
            return &u;
        }
        // (class continues on the next slide)

Gshare Implementation (continued)

        void update (branch_update *u, bool taken, unsigned int target) {
            if (bi.br_flags & BR_CONDITIONAL) {
                // Saturating update of the counter chosen in predict().
                unsigned char *c = &tab[((gshare_update*)u)->index];
                if (taken) {
                    if (*c < 3) (*c)++;
                } else {
                    if (*c > 0) (*c)--;
                }
                // Shift the outcome into the global history register.
                history <<= 1;
                history |= taken;
                history &= (1<<HISTORY_LENGTH)-1;
            }
        }
    };

Pentium M

Branch Prediction
Hybrid branch outcome predictors: bimodal predictor, global predictor, loop predictor.
Branch target predictors: BTB, iBTB.

Reverse Engineering

Branch Prediction in Pentium M

Bimodal Predictor
A table of 4096 bimodal counters, indexed by IP address bits [11:0].

Global Predictor
A 4-way cache structure with 2048 entries.
Accessed with the hash function PIR XOR conditional branch IP.
Of the resulting 15 bits, 9 are used as the index and 6 as the tag in the global predictor.
The PIR is the same PIR used by the iBTB.
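The index/tag split can be sketched as follows. The slide gives only the widths, so several details here are assumptions: that the low 15 IP bits enter the hash, that the low 9 hash bits form the index, and that the high 6 bits form the tag. The names `GlobalLookupSketch` and `global_lookup` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct GlobalLookupSketch {
    std::uint16_t index;  // 9 bits: one of 512 sets (2048 entries / 4 ways)
    std::uint8_t tag;     // 6 bits: compared against the 4 ways of the set
};

// Hash the 15-bit PIR with the branch IP, then split the 15-bit result.
// Which bits become index and which become tag is an assumption.
inline GlobalLookupSketch global_lookup(std::uint16_t pir, std::uint32_t branch_ip) {
    assert(pir <= 0x7FFFu);  // PIR is 15 bits wide
    std::uint32_t hash = (pir ^ branch_ip) & 0x7FFFu;
    return { static_cast<std::uint16_t>(hash & 0x1FFu),
             static_cast<std::uint8_t>(hash >> 9) };
}
```

The 9-bit index follows from the geometry: 2048 entries in 4 ways gives 512 sets.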

PIR Organization
Width: 15 bits.
Updated with 15 bits of the IP address of a taken conditional branch.
Updated with 15 bits combined from the indirect branch IP address and the indirect branch target address.
The PIR is shifted left by two bits before the update (XOR) with the newly occurred branch.
Unconditional, conditional not-taken, and call/return branches do not affect the PIR.
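The PIR update rule can be sketched as below. The slide does not say which IP bits are used or how the indirect branch IP and target are combined, so the function takes the 15-bit contribution as an already-formed argument; `BranchKind`, `pir_update`, and `bits15` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

enum class BranchKind {
    ConditionalTaken, ConditionalNotTaken, Unconditional, CallOrReturn, Indirect
};

constexpr std::uint16_t kPirMask = 0x7FFF;  // 15-bit PIR

// `bits15` is the 15-bit value contributed by the branch; how it is derived
// from the IP (and the target, for indirect branches) is not modelled here.
inline std::uint16_t pir_update(std::uint16_t pir, BranchKind kind,
                                std::uint16_t bits15) {
    assert(pir <= kPirMask && bits15 <= kPirMask);
    // Only taken conditional branches and indirect branches affect the PIR.
    if (kind != BranchKind::ConditionalTaken && kind != BranchKind::Indirect) {
        return pir;
    }
    // Shift left by two bits, XOR in the new bits, keep 15 bits.
    return static_cast<std::uint16_t>(((pir << 2) ^ bits15) & kPirMask);
}
```

Shifting by two bits per update means the oldest contributions age out of the 15-bit register after a handful of branches.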

Hash Function