1 COMP 206: Computer Architecture and Implementation Montek Singh Wed., Oct. 8, 2003 Topic: Instruction-Level Parallelism (Dynamic Branch Prediction)



2 Reading  HP3, Section

3 Why Do We Need Branch Prediction?  Basic blocks are short, and we have done about all we can for them with dynamic scheduling
– control dependences now become the bottleneck
 Since branches disrupt the sequential flow of instrs…
– we need to be able to predict branch behavior to avoid stalling the pipeline
 What we must predict:
– Branch outcome (Is the branch taken?)
– Branch Target Address (What is the next non-sequential PC value?)

4 A General Model of Branch Prediction
 T: probability of the branch being taken
 p: fraction of branches that are predicted to be taken
 A: accuracy of prediction
 j, k, m, n: associated delays (penalties) for the four prediction/outcome events (n is usually 0)
 (The original slide's figures gave the branch penalty of a particular prediction method in terms of the predictor accuracy and the branch penalties.)
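The penalty formulas themselves were lost in transcription. A hedged reconstruction from the definitions above (the standard expected-penalty model over the four prediction/outcome events, not necessarily the slide's exact notation):

```latex
% Expected branch penalty per branch: weight each of the four
% prediction/outcome events by its delay j, k, m, n (n is usually 0).
\begin{aligned}
\text{Penalty} \;=\;& \Pr[\text{pred T},\,\text{T}]\;j
                 \;+\; \Pr[\text{pred T},\,\text{NT}]\;k \\
              &+\; \Pr[\text{pred NT},\,\text{T}]\;m
                 \;+\; \Pr[\text{pred NT},\,\text{NT}]\;n \\[4pt]
A \;=\;& \Pr[\text{pred T},\,\text{T}] \;+\; \Pr[\text{pred NT},\,\text{NT}]
\end{aligned}
```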

5 Theoretical Limits of Branch Prediction
 Best case: branches are perfectly predicted (A = 1)
– also assume that n = 0
– minimum branch penalty = j*T
 Let s be the pipeline stage where the BTA becomes known
– then j = s-1
– see static prediction methods in Lecture 7
 Thus, the performance of any branch prediction strategy is limited by
– s, the location of the pipeline stage that develops the BTA
– A, the accuracy of the prediction

6 Review: Static Branch Prediction Methods
 Several static prediction strategies:
– Predict all branches as NOT TAKEN
– Predict all branches as TAKEN
– Predict all branches with certain opcodes as TAKEN, and all others as NOT TAKEN
– Predict all forward branches as NOT TAKEN, and all backward branches as TAKEN
– Opcodes have default predictions, which the compiler may reverse by setting a bit in the instruction
 Review material in Lecture 7

7 Dynamic Branch Prediction
 Premise: the history of a branch instr's outcomes matters!
– whether a branch will be taken depends greatly on the way previous dynamic instances of the same branch were decided
 Dynamic prediction methods:
– take advantage of this fact by making their predictions dependent on the past behavior of the same branch instr
– such methods are called Branch History Table (BHT) methods

8 BHT Methods for Branch Prediction

9 A One-Bit Predictor
 Two states: State 0 (Predict Not Taken) and State 1 (Predict Taken); a taken branch moves the predictor to State 1, a not-taken branch to State 0
 Predictor misses twice on typical loop branches
– once at the exit of the loop
– once at the end of the 1st iteration of the next execution of the loop
 The outcome sequence NT-T-NT-T makes it miss all the time
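The behavior above can be checked with a small simulation (an illustrative sketch, not taken from the slides):

```python
def simulate_1bit(outcomes, state=1):
    """Count mispredictions of a 1-bit predictor.

    state 0 = predict Not Taken, state 1 = predict Taken;
    after each branch, the state is set to the actual outcome.
    """
    misses = 0
    for taken in outcomes:
        if (state == 1) != taken:
            misses += 1
        state = 1 if taken else 0
    return misses

# A loop branch taken 3 times then exiting, run twice, with the predictor
# warmed up to "predict taken": 3 misses -- one at each loop exit, plus
# one on the 1st iteration of the second execution.
print(simulate_1bit([True, True, True, False] * 2))   # -> 3

# The alternating sequence NT-T-NT-T... misses every single time:
print(simulate_1bit([False, True] * 4))               # -> 8
```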

10 A Two-Bit Predictor
 A four-state Moore machine: States 2 and 3 predict Taken, States 0 and 1 predict Not Taken
 Predictor misses only once on typical loop branches
– hence popular
 The outcome sequence NT-NT-T-T-NT-NT-T-T makes it miss all the time
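The Moore machine can be sketched directly from its transition table. The transitions below follow the Hennessy-Patterson variant of the 2-bit scheme (an assumption, since the slide's diagram is partly lost in transcription), which does miss on every branch of the quoted NT-NT-T-T sequence:

```python
# States 0,1 predict Not Taken; states 2,3 predict Taken.
# Transition table: (state, outcome) -> next state.
TRANS = {(0, False): 0, (0, True): 1,
         (1, False): 0, (1, True): 3,
         (2, False): 0, (2, True): 3,
         (3, False): 2, (3, True): 3}

def simulate_2bit(outcomes, state=3):
    """Count mispredictions, starting in state 3 (strongly taken)."""
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        state = TRANS[(state, taken)]
    return misses

# A loop branch taken 3 times then exiting, run twice:
# now only one miss per loop execution.
print(simulate_2bit([True, True, True, False] * 2))    # -> 2

# NT-NT-T-T repeated, starting from state 3, misses all the time:
print(simulate_2bit([False, False, True, True] * 2))   # -> 8
```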


12 Correlating Branch Outcome Predictors
 The history-based branch predictors seen so far base their predictions on the past history of the branch that is being predicted
 A completely different idea: the outcome of a branch may well be predicted successfully based on the outcomes of the last k branches executed, i.e., the path leading to the branch being predicted
 Much-quoted example from the SPEC92 benchmark eqntott:
if (aa == 2) /*b1*/ aa = 0;
if (bb == 2) /*b2*/ bb = 0;
if (aa != bb) /*b3*/ { … }
 TAKEN(b1) && TAKEN(b2) implies NOT-TAKEN(b3)
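The claimed implication can be verified exhaustively over small values of aa and bb (a sketch; "taken" here means the if-condition holds, matching the slide's reading):

```python
def branch_outcomes(aa, bb):
    """Model the three eqntott branches; True = condition holds ("taken")."""
    b1 = (aa == 2)
    if b1:
        aa = 0
    b2 = (bb == 2)
    if b2:
        bb = 0
    b3 = (aa != bb)
    return b1, b2, b3

# TAKEN(b1) && TAKEN(b2) implies NOT-TAKEN(b3): if both conditions held,
# aa and bb were both zeroed, so aa != bb must be false.
for aa in range(-3, 4):
    for bb in range(-3, 4):
        b1, b2, b3 = branch_outcomes(aa, bb)
        assert not (b1 and b2 and b3)
print("implication holds")
```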

13 Another Example of Branch Correlation
if (d == 0) //b1
    d = 1;
if (d == 1) //b2
    ...
 Assume multiple runs of the code fragment, with d alternating between 2 and 0
 How would a 1-bit predictor initialized to state 0 behave?
 Equivalent assembly:
    BNEZ R1, L1
    ADDI R1, R0, 1
L1: SUBI R3, R1, 1
    BNEZ R3, L2
    …
L2:

14 A Correlating Branch Predictor
 Think of having a pair of 1-bit predictors [p0, p1] for each branch, where we choose between predictors (and update them) based on the outcome of the most recent branch (i.e., b1 for b2, and b2 for b1)
– if the most recent branch was not taken, use and update (if needed) predictor p0
– if the most recent branch was taken, use and update (if needed) predictor p1
 How would such (1,1) correlating predictors behave if initialized to [0,0]?

15 Organization of an (m,n) Correlating Predictor
 Uses the results of the last m branches
– 2^m possible outcomes
– can be kept in an m-bit shift register
 Each entry is an n-bit "self-history" predictor
 BHT addressed using
– m bits of global history, which select a column (a particular predictor)
– some lower bits of the branch address, which select a row (a particular branch instr); the entry holds n bits of previous outcomes
 Aliasing can occur since the BHT uses only a portion of the branch instr address
– state in the various predictors in a single row may correspond to different branches at different points in time
 m = 0 is the ordinary BHT
 (The slide's figure shows the branch address and the global branch history together indexing a table of 2-bit branch predictors to produce the prediction.)
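A compact sketch of the (2,2) organization described above (the table size and the concatenated row/column indexing are illustrative assumptions; real designs vary):

```python
M, N = 2, 2                  # (m, n): history bits, counter bits
ROWS = 16                    # rows selected by low branch-address bits (assumed)

bht = [[1] * (1 << M) for _ in range(ROWS)]   # init: weakly not taken
history = 0                                   # m-bit global history register

def predict(pc):
    """Row from low PC bits, column from the m-bit global history."""
    return bht[pc % ROWS][history] >= (1 << (N - 1))

def update(pc, taken):
    """Saturating-counter update, then shift the outcome into the history."""
    global history
    row, col = pc % ROWS, history
    if taken:
        bht[row][col] = min(bht[row][col] + 1, (1 << N) - 1)
    else:
        bht[row][col] = max(bht[row][col] - 1, 0)
    history = ((history << 1) | int(taken)) & ((1 << M) - 1)

# After seeing one branch repeatedly taken, every history column it
# touched saturates upward and the predictor answers "taken":
for _ in range(8):
    update(0x40, True)
print(predict(0x40))   # -> True
```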

16 Improved Dynamic Branch Prediction
 Recall that, even with perfect prediction accuracy, the branch penalty of a prediction method is (s-1)*T
– s is the pipeline stage where the BTA is developed
– T is the frequency of taken branches
 Further improvements can be obtained only by using a cache storing BTAs, accessed simultaneously with the I-cache
– such a cache is called a Branch Target Buffer (BTB)
 BHT and BTB can be used together
– Coupled: one table holds all the information
– Uncoupled: two independent tables

17 Using BTB and BHT Together
 Uncoupled solution
– BTB stores only the BTAs of taken branches recently executed
– no separate branch outcome prediction (the presence of an entry in the BTB serves as an implicit prediction that the branch will be TAKEN next time)
– use the BHT in case of a BTB miss
 Coupled solution
– stores the BTAs of all branches recently executed
– has a separate branch outcome prediction for each table entry
– use the BHT in case of a BTB hit; predict NOT TAKEN otherwise
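The uncoupled lookup can be sketched as follows (the dict-based structures, 2-bit BHT threshold, and the +4 sequential PC are illustrative assumptions, not a real machine's interface):

```python
def next_fetch_pc(pc, btb, bht):
    """Uncoupled scheme: btb maps PCs of recently taken branches to their
    BTAs; bht maps PCs to 2-bit counters, consulted only on a BTB miss."""
    if pc in btb:
        return btb[pc]        # implicit "taken" prediction: fetch the BTA
    if bht.get(pc, 0) >= 2:
        return None           # predicted taken, but target unknown until decode
    return pc + 4             # predicted not taken: sequential fetch

btb = {0x100: 0x200}
bht = {0x104: 3}
print(hex(next_fetch_pc(0x100, btb, bht)))   # BTB hit -> 0x200
print(next_fetch_pc(0x104, btb, bht))        # BHT says taken, no target -> None
print(hex(next_fetch_pc(0x108, btb, bht)))   # not taken -> 0x10c (sequential)
```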

18 Parameters of Real Machines

19 Coupled BTB and BHT

20 Decoupled BTB and BHT

21 Reducing Misprediction Penalties
 Need to recover whenever the branch prediction is not correct
– discard all speculatively executed instructions
– resume execution along the alternative path (this is the costly step)
 Scenarios where recovery is needed
– predict taken, branch is taken, BTA wrong (case 7)
– predict taken, branch is not taken (cases 4 and 6)
– predict not taken, branch is taken (case 3)
 Preparing for recovery involves working on the alternative path
– at the instruction level: two fetch address registers per speculated branch (PPC 603 & 604), or two instruction buffers (IBM 360/91, SuperSPARC, Pentium)
– at the I-cache level: for predict-taken, also do next-line prefetching; for predict-not-taken, also do target-line prefetching

22 Predicting Dynamic BTAs
 The vast majority of dynamic BTAs come from procedure returns (85% for SPEC95)
 Since procedure call/return for the most part follows a stack discipline, a specialized return address buffer operated as a stack (a Return Address Stack, RAS) is appropriate for high prediction accuracy
– push the return address on a call
– pop the return address on a return
– the depth of the RAS should be as large as the maximum call depth to avoid mispredictions; 8-16 elements are generally sufficient
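A return address stack of the kind described can be sketched in a few lines (illustrative; depth 8 falls in the 8-16 range the slide cites as generally sufficient):

```python
class ReturnAddressStack:
    """Fixed-depth predictor stack for procedure return addresses."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def call(self, return_addr):
        """Push the return address; on overflow the oldest entry is lost,
        so that procedure's eventual return will mispredict."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)
        self.stack.append(return_addr)

    def ret(self):
        """Predicted return address, or None if the stack underflowed."""
        return self.stack.pop() if self.stack else None

ras = ReturnAddressStack(depth=2)
ras.call(0x100); ras.call(0x200); ras.call(0x300)   # 0x100 falls off
print(hex(ras.ret()))   # -> 0x300
print(hex(ras.ret()))   # -> 0x200
print(ras.ret())        # -> None (call depth exceeded stack depth)
```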