Published by Herbert Antony Bailey. Modified over 7 years ago.
1
COSC3330 Computer Architecture Lecture 14. Branch Prediction
Instructor: Weidong Shi (Larry), PhD Computer Science Department University of Houston
2
Topic
Out-of-Order Execution
Branch Prediction
3
Superscalar Terminology
Superscalar: able to issue > 1 instruction / cycle
Superpipelined: deep, but not superscalar, pipeline
Issue Width: number of instructions issued per cycle
Out-of-order: able to execute instructions out of program order
Register Renaming: able to dynamically assign physical registers to instructions
Speculative Execution: able to run instructions speculatively (branch predictions)
4
A Dynamic Superscalar Processor
[Figure: dynamic superscalar pipeline]
IF → ID → RD (in order)
Dispatch Buffer
EX (out of order): ALU; MEM1 → MEM2; FP1 → FP2 → FP3; BR
Reorder Buffer
WB (in order)
5
Remember the Toll Booth?
[Figure: cars at a toll booth; timings shown: 5s, 30s]
Hands toll-booth agent a $100 bill; takes a while to count the change
One-at-a-time = 45s
OOO = 30s (OOO = Out of Order)
With a “4-Issue” Toll Booth: lanes L1, L2, L3, L4
We’ll add the equivalent of the “shoulder” to the CPU: the Re-Order Buffer (ROB)
6
Re-Order Buffer (ROB)
Separates architected vs. physical registers
Tracks program order of all in-flight instructions
Enables in-order completion or “commit”
7
Hardware Organization
[Figure: hardware organization]
Instruction Buffers
RAT
Architected Register File
ROB, with a “head” pointer; entry fields: type, dest, value, fin
Reservation Stations and ALUs: Add and Mult stations; entry fields: op, Qj, Qk, Vj, Vk
8
Circular Ring Buffer
9
Stall issue if any needed resource not available
Read inst from inst buffer
Check if resources available:
  Appropriate RS entry
  ROB entry
Read RAT, read (available) sources, update RAT
Write to RS and ROB
[Figure: Instruction Buffers, RAT, Architected Register File, ROB (type, dest, value, fin), Reservation Stations and ALUs (Add and Mult entries: op, Qj, Qk, Vj, Vk)]
10
Exec
Same as before
  Wait for all operands to arrive
  Compete to use functional unit
  Execute!
11
Write Result
Broadcast result on CDB (any dependents will grab the value)
Write result back to your ROB entry
  The ARF holds the “official” register state, which we will only update in program order
Mark ready/finished bit in ROB (note that this inst has completed execution)
12
New: Commit
When an inst is the oldest in the ROB (i.e. ROB-head points to it)
Write result (if ready/finished bit is set)
  If register producing instruction: write to architected register file
  If store: write to memory
Advance ROB-head to next instruction
This is what the outside world sees
And it’s all in-order
13
Commit Illustrated
Make instruction execution “visible” to the outside world
“Commit” the changes to the architected state
[Figure: ROB entries A, B, C, D, E, F, G, H, J, K; the WB result goes to the ARF. Outside world “sees”: A executed, B executed, C executed, D executed, E executed]
Instructions execute out of program order, but outside world still “believes” it’s in-order
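The issue, write-result, and commit steps above can be sketched as a toy model. This is only an illustration under stated assumptions: the class name `ReorderBuffer`, its method names, and the dictionary-based entries are hypothetical, a `deque` stands in for the circular ring buffer, and stores, the RAT, and reservation stations are left out.

```python
from collections import deque


class ReorderBuffer:
    """Toy ROB: entries are allocated and committed in program order."""

    def __init__(self, size):
        self.size = size
        self.entries = deque()  # left end is the ROB head (oldest instruction)
        self.arf = {}           # architected register file: name -> value

    def issue(self, dest):
        """Allocate an entry at the tail; return None if the ROB is full (stall)."""
        if len(self.entries) >= self.size:
            return None
        entry = {"dest": dest, "value": None, "fin": False}
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value):
        """Record a result; instructions may finish in any order."""
        entry["value"] = value
        entry["fin"] = True

    def commit(self):
        """Retire finished instructions from the head only; return their dests."""
        retired = []
        while self.entries and self.entries[0]["fin"]:
            entry = self.entries.popleft()
            self.arf[entry["dest"]] = entry["value"]
            retired.append(entry["dest"])
        return retired
```

If three instructions are issued and the youngest finishes first, `commit()` retires nothing until the oldest has finished, so the architected state only ever changes in program order.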
14
James E. Smith
Eckert–Mauchly Award for fundamental contributions to high performance micro-architecture, including saturating counters for branch prediction, reorder buffers for precise exceptions, …
15
Loose Ends
Up to now: techniques for handling register-related dependencies
  Register renaming for WAR, WAW
  Tomasulo’s algorithm for scheduling RAW
Still need to address: control dependencies
16
Branch Prediction/Speculative Execution
When we hit a branch, guess if it’s T or NT
Keep scheduling and executing instructions as if the branch didn’t even exist
Sometime later, if we messed up…
Just throw it all out
And fetch the correct instructions
[Figure: control-flow graph with block A, then branch B; taken path C, D, …; not-taken path Q, R, …; guess T. Instruction stream: ADD, Branch, LOAD, DIV, ADD, Branch, SUB, STORE, SUB, LOAD, XOR, STORE, ADD, MUL]
17
Branches Kill!
Branches are very frequent
  Approx. 20% of all instructions
Cannot afford waiting until we know where it goes
  Long pipelines
    Branch outcome known after B cycles
    No scheduling past the branch until outcome known
  Superscalars (e.g. 4-way)
    Branch every cycle or so!
    One cycle of work, then bubbles for ~B cycles?
18
Categorizing Branches
Source: H&P using Alpha
19
Surviving Branches: Prediction
Predict Branches
  And predict them well!
Fetch, decode, etc. on the predicted path
  Option 1: No execute until branch resolved
  Option 2: Execute anyway (speculation)
Recover from mispredictions
  Restart fetch from correct path
[Figure: block A, then branch B; taken path C, D, …; not-taken path Q, R, …]
20
Branch Misprediction
[Figure: single-issue pipeline, stages 1 to 20; stage labels: PC, Next PC, Fetch, Drive, Alloc, Rename, Queue, Schedule, Dispatch, Reg File, Exec, Flags, Br Resolve; callouts: Single Issue, Mispredict]
21
Branch Misprediction
[Figure: single-issue pipeline, stages 1 to 20; stage labels: PC, Next PC, Fetch, Drive, Alloc, Rename, Queue, Schedule, Dispatch, Reg File, Exec, Flags, Br Resolve; callouts: Single Issue (flush entailed instructions and refetch), Mispredict]
22
Branch Misprediction
[Figure: pipeline, stages 1 to 20; stage labels: PC, Next PC, Fetch, Drive, Alloc, Rename, Queue, Schedule, Dispatch, Reg File, Exec, Flags, Br Resolve; callouts: Single Issue, Mispredict, 8-issue Superscalar Processor (Worst case)]
23
Intel Quad Core
24
A9 (Apple A5)
25
Importance of Branches
Instruction Window for ILP
If misp rate equals 50%, and 1 in 5 insts is a branch, then number of useful instructions that we can fetch is:
$5 \times \left(1 + \tfrac{1}{2} + \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^3 + \dots\right) = 10$
If we halve the miss rate down to 25%:
$5 \times \left(1 + \tfrac{3}{4} + \left(\tfrac{3}{4}\right)^2 + \left(\tfrac{3}{4}\right)^3 + \dots\right) = 20$
Halving the miss rate doubles the number of useful instructions that we can try to extract ILP from
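The two sums above are geometric series, so they collapse to a closed form. A minimal sketch, assuming every branch is predicted independently with the same misprediction rate; the function name `useful_instructions` and its parameters are hypothetical.

```python
def useful_instructions(misp_rate, insts_per_branch=5):
    """Expected correct-path instructions fetched before a misprediction.

    Each block of insts_per_branch instructions ends in a branch that is
    predicted correctly with probability p = 1 - misp_rate, so the total is
    insts_per_branch * (1 + p + p**2 + ...) = insts_per_branch / misp_rate.
    """
    if not 0 < misp_rate <= 1:
        raise ValueError("misp_rate must be in (0, 1]")
    return insts_per_branch / misp_rate
```

With these assumptions, `useful_instructions(0.5)` gives 10.0 and `useful_instructions(0.25)` gives 20.0, matching the slide.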
26
Branch Prediction
Need to know two things
  Whether the branch is taken or not (direction)
  The target address if it is taken (target)
Direct jumps, function calls
  Direction known (always taken), target easy to compute
Conditional branches (typically PC-relative)
  Direction difficult to predict, target easy to compute
Indirect jumps, function returns
  Direction known (always taken), target difficult
27
Branch Prediction: Direction
Needed for conditional branches
  Most branches are of this type
Many, many kinds of predictors for this
  Static: fixed rule, or compiler annotation (e.g. “BEQL” is “branch if equal likely”)
  Dynamic: hardware prediction
Dynamic prediction usually history-based
  Example: predict direction is the same as the last time this branch was executed
28
Why Is Branch Direction Predictable?
if (aa==2) aa = 0;
if (bb==2) bb = 0;
if (aa!=bb) ….

        addi r2, r0, 2
        bne  r10, r2, L_bb
        xor  r10, r10, r10
        j    L_exit
L_bb:   bne  r11, r2, L_xx
        xor  r11, r11, r11
        j    L_exit
L_xx:   beq  r10, r11, L_exit
        …
L_exit:

for (i=0; i<100; i++) { …. }

        addi r10, r0, 100
        add  r1, r0, r0
L1:     …
        …
        addi r1, r1, 1
        bne  r1, r10, L1
29
Static Branch Prediction
Uni-directional: always predict taken (or not taken)
Backward taken, forward not taken
  Need offset information
Compiler hints with branch annotation
When will the info be available?
  Post-decode?
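The backward-taken, forward-not-taken rule only needs the sign of the branch offset. A minimal sketch; the function name `predict_btfn` and the example addresses are hypothetical.

```python
def predict_btfn(branch_pc, target_pc):
    """Backward taken, forward not taken: return True to predict taken.

    A target below the branch address means a negative offset, which is
    typical of a loop-closing branch.
    """
    return target_pc < branch_pc
```

A loop-closing branch at 0x40 jumping back to 0x20 is predicted taken, while a forward skip from 0x40 to 0x60 is predicted not taken.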
30
FSM of the Simplest Predictor
A 2-state machine
Change mind fast
[Figure: two states, 0 (predict not taken) and 1 (predict taken); go to state 1 if branch taken, go to state 0 if branch not taken]
31
Example using 1-bit branch history table
        addi r10, r0, 4
        add  r1, r0, r0
L1:     …
        …
        addi r1, r1, 1
        bne  r1, r10, L1

for (i=0; i<4; i++) { …. }

Pred    1  1  1  1  1   0  1  1  1  1   0
Actual  T  T  T  T  NT  T  T  T  T  NT  T
60% accuracy (in steady state, 2 of every 5 predictions miss: the loop exit and the first branch after it)
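The 60% figure can be checked with a small simulation of a last-outcome predictor. This is a sketch, assuming the history bit starts at 1 and using the slide's repeating outcome pattern of four taken branches followed by one not taken; the function names are hypothetical.

```python
def simulate_one_bit(outcomes, state=1):
    """Run a 1-bit predictor; outcomes and predictions use True for taken."""
    predictions = []
    for taken in outcomes:
        predictions.append(bool(state))  # predict whatever happened last time
        state = 1 if taken else 0        # then remember the actual outcome
    return predictions


def accuracy(predictions, outcomes):
    """Fraction of predictions that match the actual outcomes."""
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(outcomes)


pattern = [True, True, True, True, False] * 2
predictions = simulate_one_bit(pattern)
# Score only the second pass through the pattern (steady state).
steady_state = accuracy(predictions[5:], pattern[5:])
```

Each pass through the pattern costs two mispredictions, one at the not-taken outcome and one at the taken outcome right after it, so `steady_state` comes out to 3/5.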
32
2-bit Saturating Up/Down Counter Predictor
MSB: direction bit
LSB: hysteresis bit
States: 00/SN (Strongly Not Taken), 01/WN (Weakly Not Taken), 10/WT (Weakly Taken), 11/ST (Strongly Taken)
Taken: count up one state, saturating at 11
Not Taken: count down one state, saturating at 00
00 and 01 predict not taken; 10 and 11 predict taken
33
2-bit Counter Predictor (Another Scheme)
States: 00/SN (Strongly Not Taken), 01/WN (Weakly Not Taken), 10/WT (Weakly Taken), 11/ST (Strongly Taken)
[Figure: state diagram with Taken and Not Taken transitions among the four states]
00 and 01 predict not taken; 10 and 11 predict taken
34
Example using 2-bit up/down counter
        addi r10, r0, 4
        add  r1, r0, r0
L1:     …
        …
        addi r1, r1, 1
        bne  r1, r10, L1

for (i=0; i<4; i++) { …. }

Pred    01  10  11  11  11  10  11  11  11  11  10
Actual  T   T   T   T   NT  T   T   T   T   NT  T
States: 00/SN, 01/WN, 10/WT, 11/ST
80% accuracy
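The counter states in this example can be reproduced with a short simulation of the saturating up/down counter. A minimal sketch, assuming the counter starts at 01 (weakly not taken) as in the slide; the function names are hypothetical.

```python
def simulate_two_bit(outcomes, state=0b01):
    """Return the counter state seen at each branch (before its update)."""
    states = []
    for taken in outcomes:
        states.append(state)
        if taken:
            state = min(state + 1, 0b11)  # saturate at Strongly Taken
        else:
            state = max(state - 1, 0b00)  # saturate at Strongly Not Taken
    return states


def predict_taken(state):
    """The MSB of the counter is the direction bit."""
    return state >= 0b10


T, NT = True, False
actual = [T, T, T, T, NT, T, T, T, T, NT, T]
states = simulate_two_bit(actual)
# Count correct predictions over one steady-state pass (branches 6 to 10).
steady_hits = sum(predict_taken(s) == a for s, a in zip(states[5:10], actual[5:10]))
```

Unlike the 1-bit scheme, a single not-taken outcome only moves the counter from 11 to 10, so the next taken branch is still predicted correctly and only the loop exit is missed.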
35
Bimodal Branch Prediction
2^N entries addressed by N-bit PC
Each entry keeps a counter (2-bit or more) for prediction
Counter update: the same as 2-bit counter
[Figure: N bits of the PC address index a table of 2^N entries (each entry has a 2-bit counter); the selected counter gives the prediction, and the FSM update logic uses the actual outcome for the table update]
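The table lookup and update can be sketched as follows. This is an illustration only: the class name `BimodalPredictor`, the weakly-not-taken initial state, and dropping the 2 low-order PC bits (4-byte instructions) are assumptions, not details given on this slide.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by low-order PC bits."""

    def __init__(self, index_bits, initial_state=0b01):
        self.mask = (1 << index_bits) - 1
        # 2**index_bits counters; the initial state is an assumption.
        self.counters = [initial_state] * (1 << index_bits)

    def _index(self, pc):
        # Assumes 4-byte aligned instructions, so the 2 LSBs carry no information.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        """Return True for predict-taken (counter MSB set)."""
        return self.counters[self._index(pc)] >= 0b10

    def update(self, pc, taken):
        """Count up on taken, down on not taken, saturating at 11 and 00."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 0b11)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0b00)
```

Because only N address bits are used, two branches whose addresses differ by a multiple of the table size share one counter, which is the source of aliasing.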
36
Global vs. Local Branch History
Local Behavior
  What is the predicted direction of Branch A given the outcomes of previous instances of Branch A?
Global Behavior
  What is the predicted direction of Branch Z given the outcomes of all* previous branches A, B, …, X and Y?
* number of previous branches tracked limited by the history length
37
Branch Correlation
Code Snippet
  if (aa==2) // b1
      aa = 0;
  if (bb==2) // b2
      bb = 0;
  if (aa!=bb) { // b3
      …….
  }
Paths through b1 and b2 (1 = T, 0 = NT):
  A: 1-1  aa=0, bb=0
  B: 1-0  aa=0, bb≠2
  C: 0-1  aa≠2, bb=0
  D: 0-0  aa≠2, bb≠2
Branch direction
  Not independent
  Correlated to the path taken
  Example: Path 1-1 of b3 can be surely known beforehand
  Track path using a 2-bit register
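The claim that b3 is known beforehand on path 1-1 can be checked by running the snippet over a range of inputs. A minimal sketch, assuming that 1 (T) means the `if` condition holds; the function name `run_snippet` is hypothetical.

```python
def run_snippet(aa, bb):
    """Return the outcomes of b1, b2, b3 (True when the if-condition holds)."""
    b1 = aa == 2
    if b1:
        aa = 0
    b2 = bb == 2
    if b2:
        bb = 0
    b3 = aa != bb
    return b1, b2, b3


# Collect the b3 outcome for every input that follows path 1-1.
b3_on_path_11 = {
    run_snippet(aa, bb)[2]
    for aa in range(-3, 4)
    for bb in range(-3, 4)
    if run_snippet(aa, bb)[:2] == (True, True)
}
```

On path 1-1 both variables have just been set to 0, so `aa != bb` is always false and the set contains a single value.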
38
Global Branch History Register
An N-bit shift register
  Shift-in branch outcomes: 1 = taken, 0 = not taken
  First-in first-out
BHR can be
  Global
  Local (per-address)
Code Snippet
  if (aa==2) // b1
      aa = 0;
  if (bb==2) // b2
      bb = 0;
  if (aa!=bb) { // b3
      …….
  }
Global BHR, initially 000:
Actual  T    T    NT
BHR     001  011  110
39
Local Branch History Register
        addi r10, r0, 4
        add  r1, r0, r0
L1:     …
        …
        addi r1, r1, 1
        bne  r1, r10, L1

for (i=0; i<4; i++) { …. }

Local BHR, initially 000:
Actual  T    T    T    T    NT   T    T    T    T    NT
BHR     001  011  111  111  110  101  011  111  111  110
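The history values in this example follow from a plain shift-and-mask update. A minimal sketch, assuming a 3-bit register that starts at 000; the function name `shift_in` is hypothetical.

```python
def shift_in(bhr, taken, bits=3):
    """Shift one outcome (1 = taken, 0 = not taken) into an N-bit history register."""
    return ((bhr << 1) | int(taken)) & ((1 << bits) - 1)


T, NT = True, False
actual = [T, T, T, T, NT, T, T, T, T, NT]
bhr = 0b000
history = []
for taken in actual:
    bhr = shift_in(bhr, taken)
    history.append(format(bhr, "03b"))  # record the register after each branch
```

The same function models a global BHR if every branch in the program shifts into one shared register instead of a per-branch one.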
40
Two-Level Branch Predictor [YehPatt91,92,93]
Generalized correlated branch predictor
1st level keeps branch history in Branch History Register (BHR)
2nd level segregates pattern history in Pattern History Table (PHT)
[Figure: an N-bit BHR (shift left when update) holding outcomes Rc-k … Rc-1 indexes a PHT with 2^N entries, one per branch history pattern 00…..00 through 11…..11; the current state of the selected entry gives the prediction; the FSM update logic uses Rc, the actual branch outcome, for the PHT update]
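The two levels can be sketched with a single global BHR in front of a table of 2-bit counters. This is only one possible configuration, and the class name `TwoLevelPredictor` and the weakly-not-taken initial counters are assumptions.

```python
class TwoLevelPredictor:
    """Global BHR (level 1) indexing a PHT of 2-bit counters (level 2)."""

    def __init__(self, history_bits):
        self.history_bits = history_bits
        self.bhr = 0
        # One counter per history pattern; initial state is an assumption.
        self.pht = [0b01] * (1 << history_bits)

    def predict(self):
        """Predict taken if the counter for the current pattern has its MSB set."""
        return self.pht[self.bhr] >= 0b10

    def update(self, taken):
        """Update the selected counter, then shift the outcome into the BHR."""
        counter = self.pht[self.bhr]
        self.pht[self.bhr] = min(counter + 1, 0b11) if taken else max(counter - 1, 0b00)
        mask = (1 << self.history_bits) - 1
        self.bhr = ((self.bhr << 1) | int(taken)) & mask


# Train on a strictly alternating branch, then score the next ten outcomes.
pattern = [True, False] * 15
predictor = TwoLevelPredictor(history_bits=2)
for taken in pattern[:20]:
    predictor.update(taken)
hits = 0
for taken in pattern[20:]:
    hits += predictor.predict() == taken
    predictor.update(taken)
```

An alternating branch defeats a single 2-bit counter, but here each history pattern gets its own counter, so after warm-up every prediction in the scored window is correct.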
41
Correlated Branch Predictor [PanSoRahmeh’92]
[Figure, 2-bit Sat. Counter Scheme: a w-bit hash of the branch PC selects one of 2^w 2-bit counters, which gives the prediction]
[Figure, (2,2) Correlation Scheme: the hash of the branch PC selects an entry in each of several tables of 2-bit counters, and a 2-bit shift register (global branch history) selects which one supplies the prediction for the subsequent branch direction]
(M,N) correlation scheme
  M: shift register size (# bits)
  N: N-bit counter
42
Pattern History Table
2^N entries addressed by N-bit BHR
Each entry keeps a counter (2-bit or more) for prediction
Counter update: the same as 2-bit counter
Can be initialized in alternate patterns (01, 10, 01, 10, ..)
Alias (or interference) problem
43
Two-Level Branch Prediction
The 2 LSBs are insignificant for 32-bit instruction
[Figure: PC = 0x…C; BHR = 0110 indexes the PHT; selected counter = 10; MSB = 1, so Predict Taken]
44
PHT Indexing
Tradeoff between more history bits and address bits
Too many bits needed in Gselect → sparse table entries
[Table columns: Branch addr, Global history, Gselect 4/4; callout: Insufficient History]
45
Gshare Branch Predictor [McFarling93]
Gselect 4/4: Index PHT by concatenating the low-order 4 bits of the branch address and of the global history
Gshare 8/8: Index PHT by {Branch address ⊕ Global history}
Tradeoff between more history bits and address bits
Too many bits needed in Gselect → sparse table entries
Gshare → not to lose global history bits
Ex: AMD Athlon, MIPS R12000, Sun MAJC, Broadcom SiByte’s SB-1
[Table columns: Branch addr, Global history, Gselect 4/4, Gshare 8/8]
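The two indexing schemes can be compared directly. A minimal sketch, assuming the branch address has already had any alignment bits removed and that gselect places the address bits above the history bits; the function names and parameter names are hypothetical.

```python
def gselect_index(branch_addr, history, addr_bits=4, hist_bits=4):
    """Concatenate low-order address bits with low-order history bits."""
    addr = branch_addr & ((1 << addr_bits) - 1)
    hist = history & ((1 << hist_bits) - 1)
    return (addr << hist_bits) | hist


def gshare_index(branch_addr, history, bits=8):
    """XOR the address with the history and keep the low-order bits."""
    return (branch_addr ^ history) & ((1 << bits) - 1)


addr = 0b11111111
same_gselect = gselect_index(addr, 0b00000000) == gselect_index(addr, 0b10000000)
same_gshare = gshare_index(addr, 0b00000000) == gshare_index(addr, 0b10000000)
```

For this address, histories 00000000 and 10000000 map to the same gselect 4/4 entry because the differing history bit is discarded, while gshare 8/8 keeps them apart with the same table size.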
46
Gshare Branch Predictor
[Figure: the PC address and the global BHR together index the PHT; the selected counter has MSB = 0, so Predict Not Taken]