1
CSL718 : Pipelined Processors
Improving Branch Performance – contd. 21st Jan, 2006 Anshul Kumar, CSE IITD
2
Improving Branch Performance
Branch Elimination: replace branch with other instructions
Branch Speed Up: reduce time for computing CC and TIF
Branch Prediction: guess the outcome and proceed, undo if necessary
Branch Target Capture: make use of history
4
Branch Elimination
Use conditional/guarded instructions (predicated execution)
[Figure: flowchart in which condition C guards statement S, written "C : S"]
With a branch:
    OP1
    BC CC = Z, + 2
    ADD R3, R2, R1
    OP2
With a guarded instruction:
    OP1
    ADD R3, R2, R1, NZ
    OP2
Examples:
    HP PA (all integer arithmetic/logical instructions)
    DEC Alpha, SPARC V9 (conditional move)
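The branch sequence and the guarded sequence on this slide are meant to leave the machine in the same state. A minimal Python sketch of that equivalence follows. It assumes that `BC CC = Z` skips the ADD when the zero condition holds, that the `NZ` guard lets the ADD write its result only when the condition is non-zero, and that the first operand (R3) is the destination. These semantics and the function names are illustrative assumptions, not taken from any ISA manual.

```python
def with_branch(regs, zero_flag):
    """OP1; BC CC = Z; ADD R3, R2, R1; OP2. The branch skips the ADD when Z is set."""
    regs = dict(regs)
    if not zero_flag:                       # branch not taken: fall through to the ADD
        regs["R3"] = regs["R2"] + regs["R1"]
    return regs


def with_guard(regs, zero_flag):
    """OP1; ADD R3, R2, R1, NZ; OP2. The ADD always issues but commits only if NZ."""
    regs = dict(regs)
    result = regs["R2"] + regs["R1"]        # computed unconditionally
    if not zero_flag:                       # guarded write-back (assumed semantics)
        regs["R3"] = result
    return regs


if __name__ == "__main__":
    start = {"R1": 4, "R2": 6, "R3": 0}
    for z in (False, True):
        print(z, with_branch(start, z) == with_guard(start, z))
```

The guarded form has no control transfer, so there is nothing to predict or undo; the cost is that the ADD occupies a pipeline slot even when its result is discarded.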
5
Branch Elimination - contd.
[Figure: pipeline timing diagram, with the point where CC is set marked]
    OP1:        IF IF IF D AG DF DF DF EX EX
    BC:         IF IF IF D AG TIF TIF TIF
    ADD/OP2:    IF IF IF D' D AG
    ADD (cond): IF IF IF D AG DF DF DF EX EX
7
Branch Speed Up : early target address generation
Assume each instruction is a branch
Generate target address while decoding
If target is in the same page, omit translation
After decoding, discard target address if not a branch
[Figure: pipeline timing for BC with stages IF IF IF D TIF TIF TIF and AG]
8
Branch Speed Up : increase CC - branch gap
Increase the gap between the instruction which sets CC and the branch
    Early CC setting
    Delayed branch
9
Summary - Branch Speed Up
[Figure: chart for n = 0 to 5 covering uncond, cond (T) and cond (I) branches, shown once for delayed branch and once for early CC setting]
10
Delayed Branch with Nullification
(Also called annulment)
Delay slot is used optionally
Branch instruction specifies the option
Option may be exercised based on correctness of branch prediction
Helps in better utilization of delay slots
12
Branch Prediction
Treat conditional branches as unconditional branches / NOP
Undo if necessary
Strategies:
    Fixed (always guess inline)
    Static (guess on the basis of instruction type / displacement)
    Dynamic (guess based on recent history)
13
Static Branch Prediction
Total 68.2%
14
Threshold for Static prediction
[Figure: pipeline timing for the CC-setting instruction I-1 (IF IF D AG AG DF DF EX EX) and the branch I (IF IF D AG AG TIF TIF), with a table of delays by guess (I or T) against actual outcome (I or T)]
Guess target if 4p + 5(1 - p) < 6p + 0(1 - p), i.e. p > 0.71
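The threshold can be checked numerically. The sketch below assumes that p is the probability that the branch goes to its target, and reads the delays directly off the inequality on the slide: 4 and 5 when guessing target, 6 and 0 when guessing inline. The function names are illustrative.

```python
from fractions import Fraction


def expected_delay(p, delay_if_target, delay_if_inline):
    """Average delay when the branch goes to target with probability p."""
    return p * delay_if_target + (1 - p) * delay_if_inline


def guess_target_is_better(p):
    """True when guessing target costs less on average than guessing inline."""
    return expected_delay(p, 4, 5) < expected_delay(p, 6, 0)


# Break-even point: 4p + 5(1 - p) = 6p  gives  5 = 7p, so p = 5/7.
THRESHOLD = Fraction(5, 7)

if __name__ == "__main__":
    print(float(THRESHOLD))
    print(guess_target_is_better(0.70), guess_target_is_better(0.75))
```

Using `Fraction` for the break-even point avoids rounding, which is why the slide's 0.71 is a rounded value of 5/7.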
15
Dynamic Branch Prediction - basic idea
Predict based on the history of the previous branch
    loop: xxx
          xxx
          xxx
          xxx
          BC loop
[Figure annotation: mispredictions for every occurrence]
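A small simulation makes the cost of predicting from the previous outcome concrete. This is a sketch under assumptions not stated on the slide: the loop-closing branch is taken on every iteration except the last, the predictor simply repeats the last outcome it saw, and all names are illustrative.

```python
def loop_outcomes(iterations, executions):
    """Outcome stream of the loop-closing branch: taken on every iteration but the last."""
    return ([True] * (iterations - 1) + [False]) * executions


def mispredictions_last_outcome(outcomes, initial_prediction=True):
    """Count misses of a predictor that always repeats the previous outcome."""
    prediction = initial_prediction
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken          # remember only the most recent outcome
    return misses


if __name__ == "__main__":
    stream = loop_outcomes(iterations=10, executions=5)
    print(mispredictions_last_outcome(stream))
```

In this model every execution of the loop after the first costs two mispredictions: one at loop exit and one on re-entry, because the exit has overwritten the single remembered outcome.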
16
Dynamic Branch Prediction - 2 bit prediction scheme
[Figure: state diagram with states 0 to 3 and transitions on T and N; states 3/2 predict taken, states 0/1 predict not taken]
17
Dynamic Branch Prediction - second scheme
Predict based on the history of previous n branches
e.g., if n = 3 then
    3 branches taken: predict taken
    2 branches taken: predict taken
    1 branch taken: predict not taken
    0 branches taken: predict not taken
18
Dynamic Branch Prediction - Bimodal predictor
Maintain saturating counters
[Figure: counter state diagram with states up to 3 and transitions on T and N]
One counter per branch, or
One counter per cache line (merge results if multiple branches)
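The saturating-counter idea can be sketched as follows. The assumptions here are mine: counters are 2 bits wide (0 to 3), values 2 and 3 predict taken, a new branch starts at 2, and a Python dict stands in for the "one counter per branch" storage, where real hardware would use a fixed-size table. Class and function names are illustrative.

```python
class BimodalPredictor:
    """One 2-bit saturating counter per branch address (assumed organisation)."""

    def __init__(self, initial=2):
        self.initial = initial
        self.counters = {}                  # branch address -> counter in 0..3

    def predict(self, pc):
        return self.counters.get(pc, self.initial) >= 2   # 2, 3: predict taken

    def update(self, pc, taken):
        c = self.counters.get(pc, self.initial)
        # Count up on taken, down on not taken, saturating at 3 and 0.
        self.counters[pc] = min(3, c + 1) if taken else max(0, c - 1)


def count_misses(predictor, pc, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    loop = ([True] * 9 + [False]) * 5       # loop branch: 10 iterations, run 5 times
    print(count_misses(BimodalPredictor(), 0x40, loop))
```

On a loop-closing branch that is taken on all but the last iteration, this model mispredicts once per loop execution, at loop exit: a single not-taken outcome moves the counter from 3 to 2, which still predicts taken on re-entry.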
19
Dynamic Branch Prediction - History of last n occurrences
[Figure: a history entry before and after an update (current entry, updated entry) for actual outcome ‘taken’]
Entry holds the outcome of the last three occurrences of this branch (0: not taken, 1: taken)
Prediction using majority decision
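The entry update and the majority decision can be sketched in a few lines. The sketch assumes the entry is a list of 0/1 outcomes with the newest outcome on the right, which is a representation choice made here, not something the slide specifies.

```python
def predict_majority(history):
    """Predict taken when more than half of the recorded outcomes were taken."""
    return sum(history) * 2 > len(history)


def update_history(history, taken):
    """Drop the oldest outcome and append the newest (1 = taken, 0 = not taken)."""
    return history[1:] + [1 if taken else 0]


if __name__ == "__main__":
    entry = [0, 1, 0]                       # current entry: one of last three taken
    print(predict_majority(entry))          # majority says not taken
    entry = update_history(entry, taken=True)
    print(entry, predict_majority(entry))
```

For three outcomes this reproduces the rule "2 or 3 taken: predict taken; 0 or 1 taken: predict not taken".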
20
Dynamic Branch Prediction - storing prediction counters
Store in separate buffer, or
Store in cache directory
[Figure: cache with directory and storage; each cache line carries a counter]
21
Correct guesses vs. history length
22
Two-Level Prediction
Uses two levels of information to make a direction prediction
    Branch History Table (BHT): last n occurrences
    Pattern History Table (PHT): saturating 2-bit counters
Captures patterned behavior of branches
    Groups of branches are correlated
    Particular branches have particular behavior
23
Correlation between branches
    B1: if (x)
            ...
    B2: if (y)
    z = x && y
    B3: if (z)
B3 can be predicted with 100% accuracy based on the outcomes of B1 and B2
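The claim can be checked by enumerating all inputs. The sketch below assumes that `z = x && y` is evaluated regardless of the outcomes of B1 and B2, and that "taken" for each branch means its condition is true; the names are illustrative.

```python
from itertools import product


def branch_outcomes(x, y):
    """Return the outcomes (condition values) of B1, B2 and B3 for inputs x, y."""
    b1 = bool(x)            # B1: if (x)
    b2 = bool(y)            # B2: if (y)
    z = x and y             # assumed to run unconditionally
    b3 = bool(z)            # B3: if (z)
    return b1, b2, b3


# Map each (B1, B2) outcome pair to the B3 outcome it leads to.
b3_given_b1_b2 = {}
for x, y in product([False, True], repeat=2):
    b1, b2, b3 = branch_outcomes(x, y)
    b3_given_b1_b2[(b1, b2)] = b3

if __name__ == "__main__":
    for pair, outcome in b3_given_b1_b2.items():
        print(pair, "->", outcome)
```

Every (B1, B2) pair maps to exactly one B3 outcome, so a predictor that sees the two preceding outcomes has all the information it needs; a predictor that looks only at B3's own history does not.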
24
Some Two-level Predictors
[Figure: two predictor structures. Local Predictor: PC, BHT, PHT, T/NT. Global Predictor: GBHR, PHT, T/NT.]
Bits from PC and BHT can be combined to index PHT
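A compact model of both structures is sketched below. It is one possible reading of the figure, with assumptions chosen here: 3 bits of history, a single PHT of 2-bit counters indexed by the history alone, counters initialised to 2, and a dict in place of the BHT. The class name and the `global_history` switch are illustrative.

```python
class TwoLevelPredictor:
    """First level: branch history (per PC, or one GBHR). Second level: PHT of 2-bit counters."""

    def __init__(self, history_bits=3, global_history=False):
        self.mask = (1 << history_bits) - 1
        self.global_history = global_history
        self.histories = {}                       # BHT rows, or a single GBHR entry
        self.pht = [2] * (1 << history_bits)      # one counter per history pattern

    def _key(self, pc):
        return "GBHR" if self.global_history else pc

    def predict(self, pc):
        history = self.histories.get(self._key(pc), 0)
        return self.pht[history] >= 2

    def update(self, pc, taken):
        key = self._key(pc)
        history = self.histories.get(key, 0)
        c = self.pht[history]
        self.pht[history] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the newest outcome into the history register.
        self.histories[key] = ((history << 1) | int(taken)) & self.mask


if __name__ == "__main__":
    predictor = TwoLevelPredictor()
    misses = 0
    for taken in [True, False] * 20:              # strictly alternating branch
        misses += predictor.predict(0x40) != taken
        predictor.update(0x40, taken)
    print(misses)
```

The alternating pattern is the standard case where a single counter per branch does poorly, while the history-indexed PHT learns one counter per pattern and stops mispredicting after a short warm-up.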
25
Two-level Predictor Classification
Yeh and Patt 3-letter naming scheme
    Type of history collected: G (global), P (per branch), S (per set)
    PHT type: A (adaptive), S (static)
    PHT organization: g (global), p (per branch), s (per set)
Examples: GAs, PAp, etc.
27
Branch Target Capture
Branch Target Buffer (BTB)
Target Instruction Buffer (TIB)
Entry fields: instr addr | pred stats | target (target addr in the BTB, target instr in the TIB)
Prob of target change < 5%
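A BTB can be modelled as a small associative store keyed by instruction address. The sketch below keeps the three fields named on the slide, but the details are assumptions: a dict with no capacity limit or replacement policy, `pred_stats` stored but not interpreted, and method names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class BTBEntry:
    target_addr: int
    pred_stats: int = 0        # placeholder for prediction statistics; unused here


class BranchTargetBuffer:
    def __init__(self):
        self.entries = {}      # instr addr -> BTBEntry

    def lookup(self, instr_addr):
        """Return the captured target address on a hit, None on a miss."""
        entry = self.entries.get(instr_addr)
        return entry.target_addr if entry is not None else None

    def record_taken(self, instr_addr, target_addr):
        """Capture (or overwrite) the target once a taken branch resolves."""
        self.entries[instr_addr] = BTBEntry(target_addr)


if __name__ == "__main__":
    btb = BranchTargetBuffer()
    print(btb.lookup(0x100))           # miss: fetch continues inline
    btb.record_taken(0x100, 0x200)
    print(hex(btb.lookup(0x100)))      # hit: fetch can be redirected to the target
```

Because the probability of a target change is low, the address captured on one execution is usually still correct on the next, which is what makes this kind of reuse worthwhile.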
28
BTB Performance
    BTB        decision       prob   result   prob   delay
    BTB miss   go inline      .4     inline   .8     0
                                     target   .2     5
    BTB hit    go to target   .6     inline   .2     4
                                     target   .8     0
Expected delay = .4*.8*0 + .4*.2*5 + .6*.2*4 + .6*.8*0 = 0.88
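The expected delay is a probability-weighted sum over the four decision/result combinations. The sketch below simply restates the slide's numbers; the variable names are illustrative.

```python
# (P(BTB outcome), P(result given that outcome), delay)
cases = [
    (0.4, 0.8, 0),   # BTB miss, go inline, branch goes inline
    (0.4, 0.2, 5),   # BTB miss, go inline, branch goes to target
    (0.6, 0.2, 4),   # BTB hit, go to target, branch goes inline
    (0.6, 0.8, 0),   # BTB hit, go to target, branch goes to target
]

expected_delay = sum(p_btb * p_result * delay for p_btb, p_result, delay in cases)

if __name__ == "__main__":
    print(round(expected_delay, 2))
```

Only the two wrong guesses contribute: 0.4 * 0.2 * 5 = 0.40 and 0.6 * 0.2 * 4 = 0.48.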
29
Dynamic information about branch
Previous branch decisions
    Explicit prediction
    Stored in cache directory
    Branch History Table, BHT
Previous target address / instruction
    Implicit prediction
    Stored in separate buffer
    Branch Target Buffer, BTB
    Br Target Addr Cache, BTAC
    Target Instr Buffer, TIB
    Br Target Instr Cache, BTIC
These two can be combined
30
Storing prediction info
In cache: directory | storage, with a counter per cache line
In separate buffer: instr addr | pred stats | target
31
Combined prediction mechanism
Explicit: use history bits
Implicit: use BTB hit/miss (hit: go to target, miss: go inline)
Combined: BTB hit/miss followed by explicit prediction using history bits. One of the following is commonly used:
    hit: go to target, miss: explicit prediction
    miss: go inline, hit: explicit prediction
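The two combined variants differ only in which BTB outcome is trusted outright and which is refined by the history bits. A minimal sketch follows; the function name, the variant labels and the 'T'/'I' return values are chosen here for illustration, and the question of where the target address comes from after a BTB miss is left out.

```python
def predict_combined(btb_hit, history_says_taken, variant):
    """Return 'T' (go to target) or 'I' (go inline) for one of the two combined schemes."""
    if variant == "hit_goes_to_target":
        # hit: go to target; miss: fall back on explicit prediction
        taken = True if btb_hit else history_says_taken
    elif variant == "miss_goes_inline":
        # miss: go inline; hit: refine with explicit prediction
        taken = history_says_taken if btb_hit else False
    else:
        raise ValueError(f"unknown variant: {variant}")
    return "T" if taken else "I"


if __name__ == "__main__":
    for variant in ("hit_goes_to_target", "miss_goes_inline"):
        for hit in (False, True):
            for hist in (False, True):
                print(variant, hit, hist, predict_combined(hit, hist, variant))
```

The two variants disagree in exactly two situations: a hit whose history says not taken, and a miss whose history says taken.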
32
Combined prediction
[Figure: chart of the combined schemes with regions labelled BTB miss, BTB hit and expl predict]
Prediction: T = Target, I = Inline
Actual outcome: T = Target, I = Inline
33
Structure of Tables
Instruction fetch path with
    BHT
    BTAC
    BTIC
34
Compute/fetch scheme (no dynamic branch prediction)
[Figure: instruction fetch path with IFAR, I-cache (instructions I to I+3 at address A), compute BTA, IIFA, adder (+), next sequential address, and target instructions BTI to BTI+3]
35
BHT (Branch History Table)
[Figure: instruction fetch path with BHT]
    I-cache: 16 K, 4-way set assoc, 128 x 4 lines, 8 instr/line
    BHT: 128 x 4 entries, history bits
    Fetch: 4 instr/cycle
    Decode queue: 4 x 1 instr
    Issue queue: 4 x 1 instr
    Prediction logic: taken / not taken, BTA for a taken guess
36
BTAC scheme
[Figure: instruction fetch path with IFAR, I-cache (instructions I to I+3 at address A), BTAC (entries BA, BTA), IIFA, adder (+), next sequential address, and target instructions BTI to BTI+3]
37
BTIC scheme - 1
[Figure: instruction fetch path with IFAR, I-cache (instruction I at address A), BTIC (entries BA, BTI, BTA+), BTA, IIFA, adder (+), next sequential address, and output to decoder]
38
BTIC scheme - 2
[Figure: instruction fetch path with IFAR, I-cache (instructions I, I+1 at address A), BTIC (entries BA, BTI, BTI+1), computed BTA+, IIFA, adder (+), next sequential address, and output to decoder]
39
Successor index in I-cache
[Figure: instruction fetch path with IFAR, I-cache (instructions I to I+3 at address A, with a next address field), IIFA, and target instructions BTI to BTI+3]