1 Lecture 8: Branch Prediction, Dynamic ILP
Topics: branch prediction, out-of-order processors (Sections 2.3-2.6)

2 Control Hazards
- In the 5-stage in-order processor: assume always taken or assume always not taken; if the branch goes the other way, squash the mis-fetched instructions (for the moment, forget about branch delay slots)
- Modern in-order and out-of-order processors: dynamic branch prediction; instead of a default not-taken assumption, either predict not-taken, or predict taken-to-X, or predict taken-to-Y
- Branch predictor: a cache of recent branch outcomes

3 Pipeline without Branch Predictor
[Figure labels: IF (br), PC, Reg Read, Compare, Br-target, PC + 4]
- In the 5-stage pipeline, a branch completes in two cycles
- If the branch went the wrong way, one incorrect instr is fetched
- One stall cycle per incorrect branch

4 Pipeline with Branch Predictor
[Figure labels: IF (br), PC, Reg Read, Compare, Br-target, Branch Predictor]
- In the 5-stage pipeline, a branch completes in two cycles
- If the branch went the wrong way, one incorrect instr is fetched
- One stall cycle per incorrect branch

5 Branch Mispredict Penalty
- Assume: no data or structural hazards; only control hazards; every 5th instruction is a branch; branch predictor accuracy is 90%
- Slowdown = 1 / (1 + stalls per instruction)
- Stalls per instruction = % branches x % mispreds x penalty = 20% x 10% x 1 = 0.02
- Slowdown = 1/1.02; if penalty = 20, stalls per instruction = 20% x 10% x 20 = 0.4 and slowdown = 1/1.4
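
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `slowdown` and its parameter names are made up for illustration, and the inputs are the slide's own assumptions (20% branches, 10% mispredictions).

```python
def slowdown(branch_fraction, mispredict_rate, penalty_cycles):
    """Return the slide's slowdown factor, 1 / (1 + stalls per instruction)."""
    stalls_per_instruction = branch_fraction * mispredict_rate * penalty_cycles
    return 1.0 / (1.0 + stalls_per_instruction)


# The two cases from the slide: a 1-cycle and a 20-cycle mispredict penalty.
for penalty in (1, 20):
    factor = slowdown(0.20, 0.10, penalty)
    print(f"penalty = {penalty:2d} cycles -> slowdown factor {factor:.3f}")
```

With a 1-cycle penalty the factor is 1/1.02 (about 0.980); with a 20-cycle penalty it drops to 1/1.4 (about 0.714), which is why deep pipelines depend on accurate predictors.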

6 1-Bit Bimodal Prediction
- For each branch, keep track of what happened last time and use that outcome as the prediction
- What are the prediction accuracies for branches 1 and 2 below?

    while (1) {
        for (i=0; i<10; i++) {
            branch-1
            …
        }
        for (j=0; j<20; j++) {
            branch-2
            …
        }
    }
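
One way to explore the question is a small simulation. The sketch below rests on an assumption the slide does not state: branch-1 and branch-2 are treated as the loop-closing branches, so each is taken on every iteration except the last one of its loop. The helper names are hypothetical, and each branch gets its own prediction bit (no sharing).

```python
def loop_branch_outcomes(trip_count, outer_iterations):
    """Outcomes of a loop-closing branch: taken trip_count-1 times, then not taken."""
    one_pass = [True] * (trip_count - 1) + [False]
    return one_pass * outer_iterations


def one_bit_accuracy(outcomes, initial_prediction=True):
    """Predict that the branch repeats its previous outcome; return fraction correct."""
    last_outcome = initial_prediction
    correct = 0
    for taken in outcomes:
        correct += (last_outcome == taken)
        last_outcome = taken  # the single bit remembers only the last outcome
    return correct / len(outcomes)


for name, trips in (("branch-1", 10), ("branch-2", 20)):
    accuracy = one_bit_accuracy(loop_branch_outcomes(trips, outer_iterations=1000))
    print(f"{name}: {accuracy:.3f}")
```

Under this assumption the 1-bit scheme mispredicts twice per pass through each loop (once at loop exit, once on re-entry), which gives about 80% for the 10-iteration loop and about 90% for the 20-iteration loop.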

7 2-Bit Bimodal Prediction
- For each branch, maintain a 2-bit saturating counter:
    if the branch is taken: counter = min(3, counter+1)
    if the branch is not taken: counter = max(0, counter-1)
- If (counter >= 2), predict taken, else predict not taken
- Advantage: a few atypical branches will not influence the prediction (a better measure of "the common case")
- Especially useful when multiple branches share the same counter (some bits of the branch PC are used to index into the branch predictor)
- Can be easily extended to N bits (in most processors, N=2)
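
The update rule above maps directly onto code. The sketch below is illustrative only: the class name `TwoBitCounter`, the starting value of 2, and the test pattern (a branch taken 9 times and then not taken once, repeated) are assumptions, not part of the slides.

```python
class TwoBitCounter:
    """A 2-bit saturating counter: values 0..3, predict taken when value >= 2."""

    def __init__(self, value=2):
        self.value = value  # assumed starting state: weakly taken

    def predict(self):
        return self.value >= 2

    def update(self, taken):
        if taken:
            self.value = min(3, self.value + 1)
        else:
            self.value = max(0, self.value - 1)


def accuracy(outcomes):
    """Run one counter over a sequence of outcomes; return fraction predicted correctly."""
    counter = TwoBitCounter()
    correct = 0
    for taken in outcomes:
        correct += (counter.predict() == taken)
        counter.update(taken)
    return correct / len(outcomes)


# Assumed pattern: taken 9 times, then not taken once, repeated 1000 times.
pattern = ([True] * 9 + [False]) * 1000
print(f"2-bit accuracy: {accuracy(pattern):.3f}")
```

On this pattern the single not-taken outcome only moves the counter from 3 to 2, so the next prediction is still taken and there is one mispredict per pass (90% accuracy). A 1-bit scheme would mispredict twice per pass on the same pattern.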

8 Bimodal 1-Bit Predictor
[Figure: 10 bits of the branch PC index a table of 1K entries; each entry is a bit]
- The table keeps track of what the branch did last time

9 Bimodal 2-Bit Predictor
[Figure: 10 bits of the branch PC index a table of 1K entries; each entry is a 2-bit sat. counter]
- The table keeps track of the common-case outcome for the branch

10 Correlating Predictors
- Basic branch prediction: maintain a 2-bit saturating counter for each entry (or use 10 branch PC bits to index into one of 1024 counters); this captures the recent "common case" for each branch
- Can we take advantage of additional information?
    - If a branch recently went 01111, expect 0; if it recently went 11101, expect 1; can we have a separate counter for each case?
    - If the previous branches went 01, expect 0; if the previous branches went 11, expect 1; can we have a separate counter for each case?
- Hence, build correlating predictors

11 Global Predictor
[Figure: 10 bits of the branch PC are XORed with the global history to index a table of 1K entries; each entry is a 2-bit sat. counter]
- The table keeps track of the common-case outcome for the branch/history combo
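
A minimal sketch of this organization, under stated assumptions: the class and function names are hypothetical, the two low PC bits are dropped on the assumption of 4-byte instructions, and counters start at 1 (weakly not taken). The test branch simply alternates taken / not taken, a pattern that a counter indexed by the PC alone cannot learn.

```python
class GlobalPredictor:
    """1K 2-bit counters indexed by (PC bits XOR global history)."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # assumed start: weakly not taken
        self.history = 0                         # most recent outcome in bit 0

    def _index(self, pc):
        # Assumption: 4-byte instructions, so the two low PC bits carry no information.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the new outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def simulate(predictor, pc, outcomes):
    """Return a list of booleans: was each prediction correct?"""
    results = []
    for taken in outcomes:
        results.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return results


alternating = [i % 2 == 0 for i in range(200)]  # taken, not taken, taken, ...
results = simulate(GlobalPredictor(), 0x400, alternating)
print("correct in last 100 predictions:", sum(results[-100:]))
```

Once the history register has filled, the two history patterns select two different counters, one trained toward taken and one toward not taken, so the tail of the run is predicted correctly.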

12 Local Predictor
[Figure: branch PC -> table of 64 entries of 14-bit histories, each for a single branch -> table of 16K entries of 2-bit saturating counters]
- Use 6 bits of the branch PC to index into the local history table
- The 14-bit history indexes into the next level
- Also a two-level predictor, one that uses only local histories at the first level
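
The two-level lookup can be sketched as follows. The table sizes follow the slide (64 histories of 14 bits, 16K counters); everything else is an assumption for illustration: the names, the dropped low PC bits, the initial counter value of 1, and the test branch (a loop branch taken 4 times and then not taken, repeated).

```python
class LocalPredictor:
    """Level 1: per-branch histories. Level 2: counters indexed by that history."""

    def __init__(self, pc_bits=6, history_bits=14):
        self.pc_mask = (1 << pc_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.histories = [0] * (1 << pc_bits)       # 64 local histories
        self.counters = [1] * (1 << history_bits)   # 16K 2-bit counters

    def _slot(self, pc):
        # Assumption: 4-byte instructions, so skip the two low PC bits.
        return (pc >> 2) & self.pc_mask

    def predict(self, pc):
        history = self.histories[self._slot(pc)]
        return self.counters[history] >= 2

    def update(self, pc, taken):
        slot = self._slot(pc)
        history = self.histories[slot]
        if taken:
            self.counters[history] = min(3, self.counters[history] + 1)
        else:
            self.counters[history] = max(0, self.counters[history] - 1)
        # Shift the outcome into this branch's own history only.
        self.histories[slot] = ((history << 1) | int(taken)) & self.history_mask


def simulate(predictor, pc, outcomes):
    """Return a list of booleans: was each prediction correct?"""
    results = []
    for taken in outcomes:
        results.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return results


loop_branch = ([True] * 4 + [False]) * 100   # assumed pattern with period 5
results = simulate(LocalPredictor(), 0x400, loop_branch)
print("correct in last 100 predictions:", sum(results[-100:]))
```

Because 14 bits of history cover more than two full periods of this pattern, each history value is always followed by the same outcome, and after warm-up even the loop exit is predicted correctly.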

13 Local Predictor
[Figure: 6 bits of the branch PC index a local history table of 64 entries, each a 10-bit history; the 10-bit local history is XORed with 10 bits of the branch PC to index a table of 1K entries; each entry is a 2-bit sat. counter]
- The table keeps track of the common-case outcome for the branch/local-history combo

14 Local/Global Predictors
- Instead of maintaining a counter for each branch to capture the common case, maintain a counter for each branch and surrounding pattern
- If the surrounding pattern belongs to the branch being predicted, the predictor is referred to as a local predictor
- If the surrounding pattern includes neighboring branches, the predictor is referred to as a global predictor

15 Tournament Predictors
- A local predictor might work well for some branches or programs, while a global predictor might work well for others
- Provide one of each and maintain another predictor to identify which predictor is best for each branch
[Figure: the branch PC indexes the tournament predictor, a table of 2-bit saturating counters, which drives a MUX that selects between the local predictor and the global predictor]
- Alpha 21264: local predictor with 1K entries in level-1 and 1K entries in level-2; global predictor with 4K entries and a 12-bit global history; tournament predictor with 4K entries
- Total capacity: ?
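
The selection mechanism can be sketched in a few lines. To keep the example short, a plain bimodal table stands in for the local predictor here; that substitution, all names, the 10-bit table sizes, and the alternating test branch are assumptions for illustration and do not describe the Alpha 21264.

```python
def bump(counter, up):
    """Move a 2-bit saturating counter one step up or down."""
    return min(3, counter + 1) if up else max(0, counter - 1)


class Bimodal:
    """2-bit counters indexed by PC bits (stand-in for the local predictor)."""

    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.table = [1] * (1 << bits)

    def predict(self, pc):
        return self.table[(pc >> 2) & self.mask] >= 2

    def update(self, pc, taken):
        i = (pc >> 2) & self.mask
        self.table[i] = bump(self.table[i], taken)


class Global:
    """2-bit counters indexed by PC bits XOR global history."""

    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.table = [1] * (1 << bits)
        self.history = 0

    def _index(self, pc):
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        self.table[i] = bump(self.table[i], taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask


class Tournament:
    """A table of 2-bit choosers: value >= 2 means 'trust the second predictor'."""

    def __init__(self, first, second, bits=10):
        self.first, self.second = first, second
        self.mask = (1 << bits) - 1
        self.chooser = [1] * (1 << bits)

    def predict(self, pc):
        use_second = self.chooser[(pc >> 2) & self.mask] >= 2
        return (self.second if use_second else self.first).predict(pc)

    def update(self, pc, taken):
        first_ok = self.first.predict(pc) == taken
        second_ok = self.second.predict(pc) == taken
        if first_ok != second_ok:  # train the chooser only when they disagree
            i = (pc >> 2) & self.mask
            self.chooser[i] = bump(self.chooser[i], second_ok)
        self.first.update(pc, taken)
        self.second.update(pc, taken)


def simulate(predictor, pc, outcomes):
    """Return a list of booleans: was each prediction correct?"""
    results = []
    for taken in outcomes:
        results.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return results


alternating = [i % 2 == 0 for i in range(200)]
results = simulate(Tournament(Bimodal(), Global()), 0x400, alternating)
print("correct in last 100 predictions:", sum(results[-100:]))
```

On the alternating branch the bimodal table is wrong every time while the global predictor learns the pattern, so the chooser for that branch saturates toward the global side and the combined prediction follows it.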

16 Branch Target Prediction
- In addition to predicting the branch direction, we must also predict the branch target address
- Branch PC indexes into a predictor table; indirect branches might be problematic
- Most common indirect branch: return from a procedure; can be easily handled with a stack of return addresses
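
The difference between a PC-indexed target table and a return-address stack shows up when one procedure is called from two places. The sketch below is illustrative only: the class names, table size, stack depth, and all addresses are made up, and a return address is assumed to be the call PC plus 4.

```python
class BranchTargetBuffer:
    """Direct-mapped table: branch PC -> last seen target."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.entries = [None] * (1 << index_bits)  # each entry: (tag, target)

    def lookup(self, pc):
        entry = self.entries[(pc >> 2) & self.mask]
        if entry is not None and entry[0] == pc:
            return entry[1]
        return None  # no prediction available

    def update(self, pc, target):
        self.entries[(pc >> 2) & self.mask] = (pc, target)


class ReturnAddressStack:
    """Push the return address on a call, pop it to predict the return target."""

    def __init__(self, depth=16):
        self.depth = depth
        self.stack = []

    def on_call(self, return_address):
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # overflow: discard the oldest entry
        self.stack.append(return_address)

    def predict_return(self):
        return self.stack.pop() if self.stack else None


btb = BranchTargetBuffer()
ras = ReturnAddressStack()
RETURN_PC = 0x2000                    # hypothetical address of the return instruction
call_sites = [0x1000, 0x1100] * 4     # two callers taking turns

btb_hits = ras_hits = 0
for call_pc in call_sites:
    return_address = call_pc + 4      # assumption: 4-byte call instruction
    ras.on_call(return_address)
    btb_hits += (btb.lookup(RETURN_PC) == return_address)
    ras_hits += (ras.predict_return() == return_address)
    btb.update(RETURN_PC, return_address)

print(f"BTB correct: {btb_hits}/8, return stack correct: {ras_hits}/8")
```

The table remembers only the previous return target, which always belongs to the other caller in this pattern, while the stack predicts every return correctly.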

17 An Out-of-Order Processor Implementation
[Figure components: Branch prediction and instr fetch; Instr Fetch Queue; Decode & Rename; Reorder Buffer (ROB); Issue Queue (IQ); ALU; Register File R1-R32]
- Instr Fetch Queue:
    R1 <- R1+R2
    R2 <- R1+R3
    BEQZ R2
    R3 <- R1+R2
    R1 <- R3+R2
- Reorder Buffer (ROB): entries Instr 1 to Instr 6, with tags T1 to T6
- Issue Queue (IQ), after Decode & Rename:
    T1 <- R1+R2
    T2 <- T1+R3
    BEQZ T2
    T4 <- T1+T2
    T5 <- T4+T2
- Results written to ROB and tags broadcast to IQ
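
The renaming step in this example can be reproduced with a small mapping table. This is a sketch under one assumption read off the figure: the i-th instruction in program order occupies ROB entry Ti, and a branch takes an entry but writes no register. The function name and the tuple encoding of instructions are hypothetical.

```python
def rename(instructions):
    """Rename (dest, sources) pairs; dest is None for instructions that write no register.

    Each instruction i gets ROB tag Ti. A source is replaced by the tag of the
    most recent earlier instruction that wrote that register, if there is one.
    """
    latest_tag = {}   # architectural register -> tag of its latest producer
    renamed = []
    for i, (dest, sources) in enumerate(instructions, start=1):
        new_sources = tuple(latest_tag.get(reg, reg) for reg in sources)
        tag = f"T{i}"
        new_dest = None
        if dest is not None:
            latest_tag[dest] = tag
            new_dest = tag
        renamed.append((new_dest, new_sources))
    return renamed


program = [
    ("R1", ("R1", "R2")),   # R1 <- R1+R2
    ("R2", ("R1", "R3")),   # R2 <- R1+R3
    (None, ("R2",)),        # BEQZ R2
    ("R3", ("R1", "R2")),   # R3 <- R1+R2
    ("R1", ("R3", "R2")),   # R1 <- R3+R2
]

for dest, sources in rename(program):
    print(dest, sources)
```

The output matches the issue-queue contents on the slide: T1 reads R1 and R2, T2 reads T1 and R3, the branch reads T2, T4 reads T1 and T2, and T5 reads T4 and T2. T3 is skipped as a destination because it belongs to the branch.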