1
Clustered Indexing for Conditional Branch Predictors
Veerle Desmet, Ghent University, Belgium
2
Clustered Indexing for Conditional Branch Predictors
Veerle Desmet, Ghent University, Belgium
3
3 Conditional Branches
Examples:
if (i > 0) /* something */ else /* something else */
for (i = 0; i < 50; i++) { /* a loop... */ } /* next statements */
How frequently do conditional branches occur? About 1 in 8 instructions.
4
4 Program Execution
Fetch = take the next instruction
Decode = analyze the instruction type and read the operands
Execute = perform the computation
Write Back = write the result
(Diagram: the instruction R1=R2+R3 moves through Fetch, Decode, Execute, and Write Back; the operands 4 and 3 are read, the addition is computed, and R1 ends up containing 7.)
5
5 Pipelined Architectures
Parallel versus sequential execution:
A constant flow of instructions becomes possible
Applications run faster
Limitation due to conditional branches
(Diagram: successive instructions such as R1=R2+R3, R5=R2+1, R4=R3-1, R7=2*R1, and R5=R6 occupy the Fetch, Decode, Execute, and Write Back stages at the same time.)
6
6 Problem: Branches
Branches introduce bubbles
This affects pipeline throughput
(Diagram: once the branch if R1>0 is fetched, the instructions that follow it cannot be fetched until the branch is resolved, leaving bubbles in the pipeline.)
7
7 Solution: Prediction
Fetch the instructions that are likely to be executed
Correct prediction = gain
Misprediction = penalty
(Diagram: after the branch if R1>0, the predicted path is fetched speculatively and the pipeline keeps flowing.)
8
8 Today's Architecture
(Block diagram of a modern superscalar processor: instruction cache, fetch, decode, register rename, dispatch, instruction window, reorder logic, register file, and multiple functional units, with IPC as the throughput measure; the branch predictor drives the fetch stage.)
9
Clustered Indexing for Conditional Branch Predictors
Veerle Desmet, Ghent University, Belgium
10
10 Bimodal Branch Predictor
Predicts the outcome of a condition (e.g. the if or the else path) based on the unique branch address
Updates the prediction table with the actual outcome
(Diagram: k bits of the branch address index the prediction table.)
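The slide describes only the mechanism; as an illustration, here is a minimal C sketch of a bimodal predictor as it might appear in a simulator. The 12-bit index, the 2-bit saturating counters, and all names are assumptions for the example, not taken from the presentation.

#include <stdbool.h>
#include <stdint.h>

#define BIMODAL_BITS 12                          /* k index bits, an assumed size */
#define BIMODAL_ENTRIES (1u << BIMODAL_BITS)

/* One 2-bit saturating counter per entry: 0-1 predict not taken, 2-3 predict taken. */
static uint8_t bimodal_table[BIMODAL_ENTRIES];

static uint32_t bimodal_index(uint32_t branch_addr)
{
    return branch_addr & (BIMODAL_ENTRIES - 1);  /* low k bits of the branch address */
}

bool bimodal_predict(uint32_t branch_addr)
{
    return bimodal_table[bimodal_index(branch_addr)] >= 2;
}

void bimodal_update(uint32_t branch_addr, bool taken)
{
    uint8_t *ctr = &bimodal_table[bimodal_index(branch_addr)];
    if (taken && *ctr < 3)
        (*ctr)++;
    else if (!taken && *ctr > 0)
        (*ctr)--;
}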
11
11 Global History Branch Predictor
Predicts the outcome of a condition (e.g. a for loop) based on the global history, e.g. 111101111011110
Updates the prediction table and the global history
(Diagram: k bits of global history index the prediction table.)
12
12 Gshare Branch Predictor [McFarling]
(Diagram: k bits of global history are XORed with k bits of the branch address to form the original index into the prediction table.)
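Again as an illustration only, a minimal C sketch of the original gshare index; the 12-bit width is an assumed size. The prediction table holds the same kind of saturating counters as in the bimodal sketch above, and a pure global-history predictor (previous slide) would simply use the history register alone as the index.

#include <stdbool.h>
#include <stdint.h>

#define GSHARE_BITS 12                           /* k index and history bits, an assumed size */

static uint32_t gshare_history;                  /* global history: 1 bit per recent branch outcome */

/* Original gshare index: branch address XOR global history, truncated to k bits. */
uint32_t gshare_index(uint32_t branch_addr)
{
    return (branch_addr ^ gshare_history) & ((1u << GSHARE_BITS) - 1);
}

/* After each conditional branch, shift its outcome into the global history. */
void gshare_update_history(bool taken)
{
    gshare_history = ((gshare_history << 1) | (taken ? 1u : 0u))
                     & ((1u << GSHARE_BITS) - 1);
}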
13
13 Misprediction rate: gshare
(Plot of misprediction rate versus predictor size in bytes for gshare on SPEC INT 2000; lower is better.)
14
14 Aliasing
Resource limitations: e.g. 8 entries, index = 3 bits
Two different branches end up using the same prediction information
(Diagram: branches A and B both map to the 3-bit index 101 in the prediction table.)
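A tiny, self-contained illustration of the collision: the two branch addresses below are made up for the example, but both end in the bits 101, so with an 8-entry table they share one prediction entry.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t addr_a = 0x4005;        /* hypothetical branch A: low bits ...101 */
    uint32_t addr_b = 0x07FD;        /* hypothetical branch B: low bits ...101 */
    uint32_t mask   = 0x7;           /* 8 entries -> 3 index bits */

    /* Both print 5 (binary 101): A and B alias to the same prediction entry. */
    printf("index(A) = %u, index(B) = %u\n",
           (unsigned)(addr_a & mask), (unsigned)(addr_b & mask));
    return 0;
}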
15
15 Aliasing
(Figure: aliasing measured on SPEC INT 2000.)
16
Clustered Indexing for Conditional Branch Predictors
Veerle Desmet, Ghent University, Belgium
17
17 Basic Observations
Branches with similar behavior can share prediction information
Such branches can use the same table entry
(Diagram: two branches with the same outcome stream over time, 1 1 1 1 0 0 0 0 1 1 1 1 0 1 0 1, sharing a single entry.)
18
18 Time Varying Behavior
Per-phase taken rates (NE = not executed in that phase):
A: 100%   0%  100%  50%
B: 100%   0%  100%  60%
C: 100%  25%    0%   NE
D:  NE   NE   100%  33%
(Diagram: the raw outcome streams of the branches over time, split into phases.)
19
19 Branch Clustering
Each branch represents a point in N-dimensional space (its per-phase taken rates)
Clusters are formed by the k-means algorithm
(Diagram: the per-phase taken rates of branches A-D from the previous slide.)
20
20 k-Means Cluster Algorithm
1. Pick initial centers
2. Calculate the nearest center for each point
3. Redefine the centers
4. Restart with the new centers
(Diagram: the centers, marked X, moving over the iterations.)
21
21 k-Means Cluster Algorithm
Steps 2 (calculate nearest centers) and 3 (redefine centers) are repeated until nothing changes: a stable solution
(Diagram: the centers, marked X, after convergence.)
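The following C sketch runs the loop from these two slides on a handful of made-up 2-D points; the data, the initial centers, and the fixed K are assumptions for the example (in the presentation the points are per-phase taken rates of branches and k is chosen with the BIC score of the next slide).

#include <stdio.h>

#define N 8      /* number of points (branches) */
#define D 2      /* dimensions (e.g. taken rate in two phases) */
#define K 2      /* number of clusters, fixed here; chosen via BIC in the presentation */

static double dist2(const double *a, const double *b)
{
    double s = 0.0;
    for (int d = 0; d < D; d++)
        s += (a[d] - b[d]) * (a[d] - b[d]);
    return s;
}

int main(void)
{
    /* Made-up per-phase taken rates for 8 branches. */
    double pts[N][D] = { {1.0, 0.9}, {0.95, 1.0}, {0.0, 0.1}, {0.05, 0.0},
                         {0.9, 1.0}, {0.1, 0.0},  {1.0, 1.0}, {0.0, 0.05} };
    double centers[K][D] = { {0.2, 0.2}, {0.8, 0.8} };   /* 1. initial centers */
    int assign[N];

    for (int i = 0; i < N; i++)
        assign[i] = -1;                                  /* not yet assigned */

    for (int changed = 1; changed; ) {
        changed = 0;
        /* 2. assign every point to its nearest center */
        for (int i = 0; i < N; i++) {
            int best = 0;
            for (int c = 1; c < K; c++)
                if (dist2(pts[i], centers[c]) < dist2(pts[i], centers[best]))
                    best = c;
            if (assign[i] != best) { assign[i] = best; changed = 1; }
        }
        /* 3. redefine each center as the mean of its points; 4. repeat until stable */
        for (int c = 0; c < K; c++) {
            double sum[D] = { 0.0 };
            int count = 0;
            for (int i = 0; i < N; i++)
                if (assign[i] == c) {
                    count++;
                    for (int d = 0; d < D; d++) sum[d] += pts[i][d];
                }
            if (count > 0)
                for (int d = 0; d < D; d++) centers[c][d] = sum[d] / count;
        }
    }

    for (int i = 0; i < N; i++)
        printf("branch %d -> cluster %d\n", i, assign[i]);
    return 0;
}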
22
22 Determining k of k-Means
k is chosen by the BIC score (Bayesian Information Criterion)
Tradeoff between k and the goodness of a clustering
(Diagram: a stable solution with k=2 versus a stable solution with k=3; which is best?)
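The slide gives no formula; one standard form of the BIC score for comparing clusterings (the variant popularized by x-means) is, as an assumption about what is meant:

\mathrm{BIC}(M_k) = \hat{\ell}_k(D) - \frac{p_k}{2}\,\log R

where \hat{\ell}_k(D) is the log-likelihood of the data D under the model with k clusters, p_k is the number of free parameters of that model, and R is the number of points; the clustering with the highest score wins the tradeoff between fit and model complexity.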
23
23 Branch Clustering
SPEC INT 2000: from 8 to 33 clusters (mcf: 8; gcc and parser: 33)
Each branch belongs to exactly one cluster
(Diagram: the per-phase taken rates of branches A-D grouped into clusters.)
24
Clustered Indexing for Conditional Branch Predictors
Veerle Desmet, Ghent University, Belgium
25
25 Subtables Example
8 entries, index = 3 bits; 4 clusters, so 2 cluster bits
(Diagram: the 2 cluster bits replace the upper bits of the original index 101, leaving 1 bit of the original index; each cluster gets its own subtable within the prediction table.)
27
27 Subtables Example
8 entries, index = 3 bits; 4 clusters, so 2 cluster bits
3 to 6 bits are needed for the cluster [SPECint2000]
Can be used in every predictor scheme
(Diagram: the 2 cluster bits select a subtable and the remaining bit of the original index 101 selects the entry within it, giving Index = 1.)
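A minimal C sketch of the subtable indexing with the example's widths; the function name and the convention that the cluster bits occupy the upper index positions are assumptions for the illustration.

#include <stdint.h>

#define INDEX_BITS   3   /* 8-entry table in the example */
#define CLUSTER_BITS 2   /* 4 clusters in the example; 3 to 6 bits for SPECint2000 */

/* The cluster number replaces the upper CLUSTER_BITS of the original index,
   so each cluster gets its own subtable of 2^(INDEX_BITS - CLUSTER_BITS) entries. */
uint32_t subtable_index(uint32_t original_index, uint32_t cluster)
{
    uint32_t low = original_index & ((1u << (INDEX_BITS - CLUSTER_BITS)) - 1u);
    return (cluster << (INDEX_BITS - CLUSTER_BITS)) | low;
}

For the original index 101 only its lowest bit (1) survives, matching the "Index = 1" in the diagram; the other two index bits come from the cluster number.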
28
28 Subtables for Bimodal
(Diagram: the cluster bits and the branch address bits together index the prediction table.)
29
29 Subtables for Gshare
(Diagram: the cluster bits, the branch address, and the global history together index the prediction table.)
19% better for SMALL predictors
30
30 Why Clustered Indexing Works
Subtabling effectively uses smaller predictors per cluster
More aliasing is expected... but more of it is constructive aliasing
31
31 Hashing: an Alternative to Subtables
Keeps the original global history length
(Diagram: the gshare index, formed from the branch address and the global history, is hashed with the cluster bits to index the prediction table.)
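A minimal C sketch of the hashed alternative; the 12-bit width and the use of XOR as the hash are assumptions. The point is that the cluster is folded into the index instead of consuming dedicated index bits, so the full global history length is preserved.

#include <stdint.h>

#define INDEX_BITS 12    /* assumed predictor index width */

/* Fold the cluster number into the original gshare index rather than
   reserving index bits for it, preserving the full history length. */
uint32_t hashed_index(uint32_t branch_addr, uint32_t global_history, uint32_t cluster)
{
    uint32_t gshare_ix = branch_addr ^ global_history;   /* original gshare index */
    return (gshare_ix ^ cluster) & ((1u << INDEX_BITS) - 1u);
}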
32
32 Hashing for Gshare
(Two plots of misprediction rate versus predictor size in bytes, comparing the original gshare with the clustered subtable and clustered hashed variants; the second plot zooms in on the larger predictor sizes.)
5% better for LARGE predictors
33
33 Self Profile-Based Clustering
Limit study: the identified clusters are optimal for the given execution
(Diagram: the per-phase taken rates of branches A-D, clustered using the same execution.)
34
34 Cross Profile-Based Clustering
Clustering based on the SPEC-train inputs works OK
An additional cluster is used for branches not seen during profiling
(Diagram: the self-profile taken rates of branches A-D next to the taken rates measured with the SPEC-train inputs for branches A-E.)
35
35 Cross Profile-Based Clustering
(Two plots of misprediction rate versus predictor size in bytes: original, self clustered, and cross clustered variants of bimodal and of gshare.)
Cross clustering is still good; for gshare:
@ small budgets: subtables give 12.3% fewer mispredictions (19% with self clustering)
@ large budgets: hashing is 3% better (5% with self clustering)
36
36 Conclusion
Small branch predictors suffer from aliasing, which is frequently destructive
Constructive aliasing can be exploited by clustering branches
Implementation: subtables (can be used in all branch prediction schemes) or hashing (specific to gshare)
Gshare misprediction rate:
@ 1KiB: reduced by 19% (self) and 12.3% (cross)
@ 256KiB: reduced by 5% (self) and 3% (cross)
37
Questions?
38
The End