CSL718 : Pipelined Processors


1 CSL718 : Pipelined Processors
Improving Branch Performance – contd.
21st Jan, 2006
Anshul Kumar, CSE IITD

2 Improving Branch Performance
Branch Elimination: replace branch with other instructions
Branch Speed Up: reduce time for computing CC and TIF
Branch Prediction: guess the outcome and proceed, undo if necessary
Branch Target Capture: make use of history

4 Branch Elimination Use conditional/guarded instructions
(predicated execution)

[Flow diagram: a test C with true/false edges around statement S, replaced by the guarded form C : S]

Without predication:          With predication:
  OP1                           OP1
  BC CC = Z, +2                 ADD R3, R2, R1, NZ
  ADD R3, R2, R1                OP2
  OP2

Examples:
- HP PA (all integer arithmetic/logical instructions)
- DEC Alpha, SPARC V9 (conditional move)
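The effect of the transformation can be sketched in Python, with a conditional expression standing in for the hardware guard (register names and values here are illustrative, not from the slides):

```python
# Branchy version: a conditional branch (BC CC = Z) skips the ADD
# when the condition code is zero.
def branchy(r1, r2, cc_nz, r3=0):
    if cc_nz:              # a taken branch would jump past the ADD
        r3 = r2 + r1       # ADD R3, R2, R1
    return r3

# Predicated version: ADD R3, R2, R1, NZ always flows down the pipe;
# its result is committed only when the NZ guard holds, so no branch
# (and no possible misprediction) remains.
def predicated(r1, r2, cc_nz, r3=0):
    result = r2 + r1                   # always computed
    return result if cc_nz else r3     # guarded commit

print(branchy(1, 2, True), predicated(1, 2, True))    # 3 3
print(branchy(1, 2, False), predicated(1, 2, False))  # 0 0
```

The control dependence becomes a data dependence, which is exactly why predication removes the branch from the pipeline.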

5 Branch Elimination - contd.
[Pipeline timing diagram: with the branch, OP1 sets the CC (IF IF IF D AG DF DF DF EX EX), BC fetches the target (IF IF IF D AG TIF TIF TIF) and ADD/OP2 waits in decode (IF IF IF D' D AG ...); with the conditional ADD, the instruction flows straight through (IF IF IF D AG DF DF DF EX EX)]

6 Improving Branch Performance
Branch Elimination: replace branch with other instructions
Branch Speed Up: reduce time for computing CC and TIF
Branch Prediction: guess the outcome and proceed, undo if necessary
Branch Target Capture: make use of history

7 Branch Speed Up : early target address generation
- Assume each instruction is a branch
- Generate the target address while decoding
- If the target is in the same page, omit translation
- After decoding, discard the target address if the instruction is not a branch
[Timing: BC proceeds IF IF IF D TIF TIF TIF, with AG overlapped]

8 Branch Speed Up : increase CC - branch gap
Increase the gap between the instruction which sets the CC and the branch:
- early CC setting
- delayed branch

9 Summary - Branch Speed Up
[Chart: branch delay for gaps n = 0..5 between CC setting and branch, for unconditional, conditional (taken) and conditional (inline) branches, comparing delayed branch with early CC setting]

10 Delayed Branch with Nullification
(Also called annulment)
- The delay slot is used optionally
- The branch instruction specifies the option
- The option may be exercised based on the correctness of the branch prediction
- Helps in better utilization of delay slots

11 Improving Branch Performance
Branch Elimination: replace branch with other instructions
Branch Speed Up: reduce time for computing CC and TIF
Branch Prediction: guess the outcome and proceed, undo if necessary
Branch Target Capture: make use of history

12 Branch Prediction
Treat conditional branches as unconditional branches / NOP; undo if necessary.
Strategies:
- Fixed (always guess inline)
- Static (guess on the basis of instruction type / displacement)
- Dynamic (guess based on recent history)

13 Static Branch Prediction
[Table of static prediction accuracy by branch class; total 68.2%]

14 Threshold for Static prediction
[Timing diagrams: guessing the target costs 4 cycles when the branch is actually taken and 5 when it is not; guessing inline costs 6 cycles when taken and 0 when not]
Guess target if 4p + 5(1 - p) < 6p + 0(1 - p), i.e. p > 5/7 ≈ .71, where p is the probability that the branch is taken.
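The threshold arithmetic can be checked directly; the per-case delays (4, 5, 6, 0 cycles) are the ones from the slide:

```python
# Expected penalty of each static guess as a function of p,
# the probability that the branch is taken.
def guess_target(p):
    return 4 * p + 5 * (1 - p)   # 4 cycles if taken, 5 if not

def guess_inline(p):
    return 6 * p + 0 * (1 - p)   # 6 cycles if taken, 0 if not

# Break-even point: 4p + 5(1 - p) = 6p  =>  5 = 7p  =>  p = 5/7
print(round(5 / 7, 2))                        # 0.71
print(guess_target(0.8) < guess_inline(0.8))  # True: guess target
print(guess_target(0.5) < guess_inline(0.5))  # False: guess inline
```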

15 Dynamic Branch Prediction - basic idea
Predict based on the history of the previous branch.

loop: xxx
      xxx
      xxx
      BC loop

Remembering only the previous outcome still gives mispredictions for every occurrence of the loop (at exit and re-entry).
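A minimal simulation of the idea, assuming a 1-bit "same as last outcome" scheme and a made-up loop that is taken 9 times before falling through:

```python
# Predict that each branch does what it did last time.
def one_bit_mispredictions(outcomes, last=True):
    miss = 0
    for taken in outcomes:
        if taken != last:   # prediction was wrong
            miss += 1
        last = taken        # remember only the most recent outcome
    return miss

visit = [True] * 9 + [False]              # 9 taken iterations, then exit
print(one_bit_mispredictions(visit))      # 1 (the exit)
print(one_bit_mispredictions(visit * 2))  # 3 (exit, re-entry, exit)
```

Each visit to the loop after the first costs two mispredictions, which motivates the 2-bit scheme on the next slide.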

16 Dynamic Branch Prediction - 2 bit prediction scheme
[State diagram of the 2-bit scheme: states 2 and 3 predict taken, states 0 and 1 predict not taken; a taken outcome (T) moves the state toward 3, a not-taken outcome (N) toward 0]
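A sketch of the 2-bit scheme with the state encoding from the diagram (0/1 predict not taken, 2/3 predict taken); the loop workload is invented for illustration:

```python
class TwoBit:
    """2-bit saturating counter: 0/1 predict not taken, 2/3 predict taken."""
    def __init__(self, state=2):
        self.state = state                  # start weakly taken
    def predict(self):
        return self.state >= 2              # True means predict taken
    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBit()
miss = 0
for taken in ([True] * 9 + [False]) * 2:    # two visits to a 10-iteration loop
    if p.predict() != taken:
        miss += 1
    p.update(taken)
print(miss)   # 2: unlike a 1-bit scheme, only the loop exits mispredict
```

Two consecutive wrong outcomes are needed to flip the prediction, so the single not-taken exit does not disturb the "taken" prediction for the next visit.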

17 Dynamic Branch Prediction - second scheme
Predict based on the history of the previous n branches.
e.g., if n = 3 then
- 3 branches taken: predict taken
- 2 branches taken: predict taken
- 1 branch taken: predict not taken
- 0 branches taken: predict not taken

18 Dynamic Branch Prediction - Bimodal predictor
Maintain saturating counters.
[State diagram: counter values 0..3; T increments toward 3, N decrements toward 0]
- One counter per branch, or
- One counter per cache line (merge results if multiple branches)
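A bimodal sketch with one counter per branch, indexed by low-order address bits; the table size and branch addresses are invented for illustration:

```python
TABLE_SIZE = 1024
counters = [2] * TABLE_SIZE      # 2-bit counters, all starting weakly taken

def predict(pc):
    return counters[pc % TABLE_SIZE] >= 2

def update(pc, taken):
    i = pc % TABLE_SIZE
    counters[i] = min(3, counters[i] + 1) if taken else max(0, counters[i] - 1)

# Two branches mapping to different entries train independently.
for _ in range(5):
    update(0x400, True)          # an always-taken branch
    update(0x404, False)         # a never-taken branch
print(predict(0x400), predict(0x404))   # True False
```

With one counter per cache line instead, several branches in the line would share (and possibly interfere in) the same entry.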

19 Dynamic Branch Prediction - History of last n occurrences
[Diagram: each entry records the outcomes of the last three occurrences of this branch (0: not taken, 1: taken); the prediction is a majority decision over those bits, and the actual outcome ('taken' in the example) is shifted in to give the updated entry]
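The majority scheme and its shift-register update can be sketched as follows; the particular history values are examples:

```python
def majority_predict(history):       # history: last 3 outcomes, newest last
    return sum(history) >= 2         # taken if at least 2 of 3 were taken

def shift_in(history, taken):
    return history[1:] + [taken]     # drop oldest, append actual outcome

entry = [True, False, True]              # last three occurrences: T N T
print(majority_predict(entry))           # True (2 of 3 taken)
entry = shift_in(entry, True)            # actual outcome 'taken'
print(entry)                             # [False, True, True]
```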

20 Dynamic Branch Prediction - storing prediction counters
Store in a separate buffer, or store in the cache directory.
[Diagram: a counter field per cache line in the CACHE directory storage]

21 Correct guesses vs. history length
[Chart: fraction of correct guesses vs. history length]

22 Two-Level Prediction
Uses two levels of information to make a direction prediction:
- Branch History Table (BHT): last n occurrences
- Pattern History Table (PHT): saturating 2-bit counters
Captures patterned behavior of branches:
- groups of branches are correlated
- particular branches have particular behavior
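A sketch of the local (per-branch) variant, with a BHT of history registers indexing a shared PHT of 2-bit counters; all sizes and the alternating-branch workload are illustrative:

```python
H = 4                                    # history bits per branch
bht = {}                                 # pc -> H-bit history register
pht = [2] * (1 << H)                     # 2-bit counters, weakly taken

def predict(pc):
    return pht[bht.get(pc, 0)] >= 2      # history pattern selects a counter

def update(pc, taken):
    hist = bht.get(pc, 0)
    pht[hist] = min(3, pht[hist] + 1) if taken else max(0, pht[hist] - 1)
    bht[pc] = ((hist << 1) | taken) & ((1 << H) - 1)   # shift outcome in

# A branch alternating T, N, T, N defeats a plain 2-bit counter but
# becomes fully predictable once its pattern is captured in the PHT.
miss = 0
for i in range(100):
    taken = (i % 2 == 0)
    if predict(0x400) != taken:
        miss += 1
    update(0x400, taken)
print(miss)   # 2: only warm-up mispredictions
```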

23 Correlation between branches
B1: if (x) ...
B2: if (y) ...
    z = x && y
B3: if (z) ...

B3 can be predicted with 100% accuracy based on the outcomes of B1 and B2.
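A global-history sketch of the slide's example: since z = x && y, B3's direction is a function of the two preceding branch outcomes, so a predictor indexed by the last two global outcomes learns it exactly (the counter table and revisit counts are illustrative):

```python
import itertools

pht = {}                                  # (b1, b2) outcomes -> 2-bit counter

def predict(ghr):
    return pht.get(ghr, 2) >= 2           # default: weakly taken

def update(ghr, taken):
    c = pht.get(ghr, 2)
    pht[ghr] = min(3, c + 1) if taken else max(0, c - 1)

miss = 0
for x, y in itertools.product([True, False], repeat=2):
    for _ in range(10):                   # revisit each (B1, B2) pattern
        b3 = x and y                      # B3's outcome follows B1 and B2
        if predict((x, y)) != b3:
            miss += 1
        update((x, y), b3)
print(miss)   # 3: one warm-up miss per not-taken pattern, then perfect
```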

24 Some Two-level Predictors
[Diagrams: local predictor: the PC indexes a BHT, whose history pattern indexes a PHT yielding T/NT; global predictor: a GBHR indexes a PHT yielding T/NT]
Bits from the PC and the BHT can be combined to index the PHT.

25 Two-level Predictor Classification
Yeh and Patt 3-letter naming scheme:
- type of history collected: G (global), P (per branch), S (per set)
- PHT type: A (adaptive), S (static)
- PHT organization: g (global), p (per branch), s (per set)
Examples: GAs, PAp, etc.

26 Improving Branch Performance
Branch Elimination: replace branch with other instructions
Branch Speed Up: reduce time for computing CC and TIF
Branch Prediction: guess the outcome and proceed, undo if necessary
Branch Target Capture: make use of history

27 Branch Target Capture
- Branch Target Buffer (BTB): entries hold instr addr, pred stats, target addr
- Target Instruction Buffer (TIB): entries hold instr addr, pred stats, target instr
Probability of target change < 5%

28 BTB Performance
Decision: BTB miss (prob .4): go inline; BTB hit (prob .6): go to target.
Outcomes and delays:
- miss, actually inline (.8): delay 0
- miss, actually target (.2): delay 5
- hit, actually inline (.2): delay 4
- hit, actually target (.8): delay 0
Expected delay = .4 × .8 × 0 + .4 × .2 × 5 + .6 × .2 × 4 + .6 × .8 × 0 = 0.88 cycles
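The expected-delay computation from the decision tree, with the slide's probabilities and penalties:

```python
# P(miss) = .4 (fetch continues inline), P(hit) = .6 (fetch goes to target);
# given a miss the branch actually goes inline with prob .8, and given a
# hit it actually goes to the target with prob .8.
p_miss, p_hit = 0.4, 0.6
expected_delay = (p_miss * 0.8 * 0 +    # miss, actually inline: correct
                  p_miss * 0.2 * 5 +    # miss, actually target: 5 cycles
                  p_hit  * 0.2 * 4 +    # hit, actually inline: 4 cycles
                  p_hit  * 0.8 * 0)     # hit, actually target: correct
print(round(expected_delay, 2))         # 0.88
```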

29 Dynamic information about branch
Previous branch decisions:
- explicit prediction
- stored in cache directory
- Branch History Table, BHT

Previous target address / instruction:
- implicit prediction
- stored in separate buffer
- Branch Target Buffer, BTB / Br Target Addr Cache, BTAC
- Target Instr Buffer, TIB / Br Target Instr Cache, BTIC

These two can be combined.

30 Storing prediction info
In cache: a counter per cache line, kept in the directory storage.
In separate buffer: entries hold instr addr, pred stats, target.

31 Combined prediction mechanism
Explicit: use history bits.
Implicit: use BTB hit/miss (hit: go to target, miss: go inline).
Combined: BTB hit/miss followed by explicit prediction using history bits. One of the following is commonly used:
- hit: go to target, miss: explicit prediction
- miss: go inline, hit: explicit prediction
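The second combined policy can be sketched as below; the BTB and counter structures, addresses, and the 4-byte instruction size are illustrative assumptions:

```python
btb = {}            # pc -> branch target address (implicit prediction)
counters = {}       # pc -> 2-bit saturating counter (explicit history bits)

def next_fetch(pc):
    if pc not in btb:                   # BTB miss: go inline
        return pc + 4
    if counters.get(pc, 2) >= 2:        # BTB hit: consult history bits
        return btb[pc]                  # predicted taken: fetch the target
    return pc + 4                       # predicted not taken: stay inline

btb[0x400] = 0x800
counters[0x400] = 3                     # strongly taken
print(hex(next_fetch(0x400)))           # 0x800 (hit, explicit says taken)
print(hex(next_fetch(0x500)))           # 0x504 (miss, go inline)
```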

32 Combined prediction
[Table: prediction outcomes for BTB hit/miss combined with the explicit prediction; prediction and actual outcome are each labelled T (Target) or I (Inline)]

33 Structure of Tables
Instruction fetch path with BHT, BTAC and BTIC.

34 Compute/fetch scheme (no dynamic branch prediction)
[Diagram: the instruction fetch address register (IFAR) addresses the I-cache, which delivers instructions I .. I+3; the branch target address (BTA) is computed from the fetched instructions, and either the computed BTA or the next sequential address is selected as the next instruction fetch address (IIFA), after which the target instructions BTI .. BTI+3 are fetched]

35 BHT (Branch History Table)
I-cache: 16 K, 4-way set associative, 128 × 4 lines, 8 instr/line
BHT: 128 × 4 entries of history bits
4 instr/cycle enter a 4 × 1 instruction decode queue; prediction logic uses the history bits to give taken / not taken and the BTA for a taken guess; issue queue holds 4 × 1 instr.

36 BTAC scheme
[Diagram: the IFAR addresses the I-cache and, in parallel, a BTAC whose entries pair a branch address (BA) with its branch target address (BTA); on a hit the BTA, otherwise the next sequential address, becomes the next IIFA, and instructions BTI .. BTI+3 are fetched]

37 BTIC scheme - 1
[Diagram: the BTIC is indexed by the branch address (BA) and stores the branch target instruction (BTI) together with BTA+, the address following the target; the BTI goes straight to the decoder while fetching resumes at BTA+ or the next sequential address]

38 BTIC scheme - 2
[Diagram: as in scheme 1, but the BTIC supplies two target instructions (BTI, BTI+1) to the decoder and the continuation address BTA+ is computed rather than stored]

39 Successor index in I-cache
Instruction Fetch address A I I I I + 3 IIFA I F A R I - cache Next address BTI BTI+1 BTI+2 BTI+3 Anshul Kumar, CSE IITD

