Clustered Indexing for Conditional Branch Predictors Veerle Desmet Ghent University Belgium
3 Conditional Branches. Examples: if (i > 0) /* something */ else /* something else */ and for (i=0; i<50; i++) { /* a loop... */ } /* next statements */. How frequently do conditional branches occur? 1/8
4 Program Execution. Fetch = take next instruction. Decode = analyze type and read operands. Execute. Write Back = write result. Example: R1=R2+R3 passes through Fetch, Decode (addition, operands 4 and 3), Execute (computation) and Write Back (R1 contains 7).
5 Pipelined architectures. Parallel versus sequential execution. A constant flow of instructions is possible, giving faster applications. Limitation due to conditional branches. [Figure: the instructions R1=R2+R3, R5=R2+1, R4=R3-1, R7=2*R1, R5=R6 and the branch R1>0 advancing through the stages Fetch, Decode, Execute, Write Back]
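The gain from pipelining can be made concrete with a small cycle count. This is a minimal sketch that assumes each of the four stages takes exactly one cycle and that nothing stalls; the function names are illustrative and not from the slides.

```python
STAGES = 4  # Fetch, Decode, Execute, Write Back


def sequential_cycles(n):
    """Cycles for n instructions when each one finishes before the next starts."""
    return n * STAGES


def pipelined_cycles(n):
    """Cycles for n instructions in an ideal pipeline (assumption: no stalls).

    The first instruction needs STAGES cycles; every later one
    completes one cycle after its predecessor.
    """
    return STAGES + (n - 1) if n > 0 else 0


if __name__ == "__main__":
    for n in (1, 5, 100):
        print(n, sequential_cycles(n), pipelined_cycles(n))
```

For long instruction streams the ideal pipeline approaches one instruction per cycle, which is the throughput that branches put at risk.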
6 Problem: Branches. Branches introduce bubbles, which affects pipeline throughput. [Figure: pipeline diagram in which the instructions fetched after the branch "if R1>0" are unknown (?), next to a control-flow graph with a then path and an else path following "if R1>0"]
7 Solution: Prediction. Fetch those instructions that are likely to be executed. Correct prediction = gain; misprediction = penalty. [Figure: the same pipeline and control-flow graph, with the predicted path fetched directly after "if R1>0"]
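The gain/penalty trade-off can be estimated with the usual back-of-the-envelope formula. The sketch below is an illustration only: it reads the earlier "1/8" as one conditional branch per eight instructions, and the two-cycle penalty and the 5% misprediction rate are hypothetical numbers, not results from the talk.

```python
def cycles_per_instruction(branch_fraction, misprediction_rate, penalty_cycles):
    """Average cycles per instruction for an otherwise ideal pipeline.

    Assumption: the only source of lost cycles is a fixed penalty
    paid on every mispredicted conditional branch.
    """
    return 1.0 + branch_fraction * misprediction_rate * penalty_cycles


if __name__ == "__main__":
    # Hypothetical values: 1 branch per 8 instructions, 2-cycle penalty.
    print(cycles_per_instruction(1 / 8, 0.05, 2))
```

With a perfect predictor the result is 1.0; every mispredicted branch pushes it above that.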
8 Today's Architecture. [Figure: block diagram with instruction cache, fetch, decode, register rename, dispatch, instruction window, reorder logic, functional units, register file and branch predictor; labeled IPC]
10 Bimodal Branch Predictor. Predict the outcome of a condition (e.g. if or else) based on the unique branch address: k bits of the branch address index the prediction table. Update the prediction table.
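A bimodal predictor can be sketched in a few lines. The slide does not say what a table entry holds; the sketch assumes the common choice of 2-bit saturating counters, and the class and method names are illustrative.

```python
class BimodalPredictor:
    """Prediction table indexed by the low k bits of the branch address."""

    def __init__(self, k):
        self.mask = (1 << k) - 1
        # Assumption: 2-bit saturating counters (0..3), starting at 2
        # ("weakly taken"). Values >= 2 predict taken.
        self.table = [2] * (1 << k)

    def index(self, branch_addr):
        return branch_addr & self.mask

    def predict(self, branch_addr):
        return self.table[self.index(branch_addr)] >= 2

    def update(self, branch_addr, taken):
        i = self.index(branch_addr)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


if __name__ == "__main__":
    p = BimodalPredictor(k=3)
    for outcome in (False, False, True):
        print(p.predict(0x40), outcome)
        p.update(0x40, outcome)
```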
11 Global History Branch Predictor. Predict the outcome of a condition (e.g. a for loop) based on the global history: the k most recent branch outcomes index the prediction table. Update the prediction table and the global history.
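The following sketch shows a predictor indexed only by global history. As an assumption, table entries are 2-bit saturating counters and the history is a k-bit shift register; neither detail is stated on the slide, and the names are illustrative.

```python
class GlobalHistoryPredictor:
    """Prediction table indexed by the last k branch outcomes."""

    def __init__(self, k):
        self.mask = (1 << k) - 1
        self.history = 0                 # k-bit shift register of outcomes
        self.table = [2] * (1 << k)      # assumption: 2-bit counters

    def predict(self):
        return self.table[self.history] >= 2

    def update(self, taken):
        i = self.history
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
        # Shift the newest outcome into the history.
        self.history = ((self.history << 1) | int(taken)) & self.mask


if __name__ == "__main__":
    p = GlobalHistoryPredictor(k=3)
    loop = [True, True, True, False]     # a loop branch: taken 3x, then exit
    for _ in range(20):
        for outcome in loop:
            p.update(outcome)
    print([p.predict() for _ in range(1)])
```

Once the history is long enough to cover one loop iteration pattern, the loop exit becomes predictable, which a purely address-indexed table cannot achieve.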
12 Gshare Branch Predictor [McFarling]. The k-bit global history is XORed with the branch address; the result is the original index into the prediction table.
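The gshare index computation can be sketched as below. The XOR of history and address is from the slide; the 2-bit counters and the choice of the low k address bits are assumptions made for the illustration.

```python
class GsharePredictor:
    """Prediction table indexed by (branch address XOR global history)."""

    def __init__(self, k):
        self.mask = (1 << k) - 1
        self.history = 0
        self.table = [2] * (1 << k)      # assumption: 2-bit counters

    def index(self, branch_addr):
        # Assumption: the low k bits of the XOR form the index.
        return (branch_addr ^ self.history) & self.mask

    def predict(self, branch_addr):
        return self.table[self.index(branch_addr)] >= 2

    def update(self, branch_addr, taken):
        i = self.index(branch_addr)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


if __name__ == "__main__":
    g = GsharePredictor(k=4)
    g.history = 0b1010
    print(bin(g.index(0b1100)))
```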
13 Misprediction rate: gshare. [Figure: misprediction rate versus predictor size (bytes), SPEC INT 2000; lower is better]
14 Aliasing. Resource limitations: 8 entries, index = 3 bits. Two different branches, A and B, with the same 3-bit index (101) use the same prediction information in the prediction table.
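A small simulation illustrates why sharing an entry can hurt. The two branch addresses, their always-taken and never-taken behavior, and the 2-bit counters are all hypothetical choices for this sketch; only the 8-entry table and the shared index 101 come from the slide.

```python
K = 3                      # 8 entries, 3-bit index
MASK = (1 << K) - 1

ADDR_A = 0b0101101         # hypothetical address, low bits 101
ADDR_B = 0b1110101         # hypothetical address, low bits 101


def index(addr):
    return addr & MASK


def count_mispredictions(addr_a, addr_b, rounds=100):
    """Alternate branch A (always taken) and branch B (never taken)."""
    table = [2] * (1 << K)  # assumption: 2-bit saturating counters
    wrong = 0
    for _ in range(rounds):
        for addr, taken in ((addr_a, True), (addr_b, False)):
            i = index(addr)
            if (table[i] >= 2) != taken:
                wrong += 1
            table[i] = min(3, table[i] + 1) if taken else max(0, table[i] - 1)
    return wrong


if __name__ == "__main__":
    print(count_mispredictions(ADDR_A, ADDR_B))      # shared entry
    print(count_mispredictions(ADDR_A, 0b1110100))   # separate entries
```

When the two branches behave oppositely, the shared counter never settles and one of them is mispredicted on every execution; with separate entries only the warm-up costs a misprediction.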
15 Aliasing. [Figure: aliasing results for SPEC INT 2000]
17 Basic Observations. Branches with similar behavior can share prediction information. Such branches can use the same table entry. [Figure: example of branch behavior over time]
18 Time Varying Behavior. [Figure: behavior of branches A, B, C and D in each phase of the execution, given as percentages; NE = not executed]
19 Branch Clustering. Each branch represents a point in N-dimensional space. Clusters are formed by the k-means algorithm. [Figure: branches A, B, C and D with their per-phase values, in slide order: 100%, 0%, 100%, 50%, 100%, 0%, 100%, 60%, 100%, 25%, 0%, NE, NE, NE, 100%, 33%]
20 k-Means Cluster Algorithm. 1. Choose initial centers. 2. Assign each point to its nearest center. 3. Redefine the centers. 4. Restart with the new centers.
21 k-Means Cluster Algorithm. Steps 2 and 3 are repeated from the initial centers until the centers no longer change: a stable solution.
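The k-means steps can be sketched in plain Python. This is a generic illustration, not the tooling used in the talk: the squared Euclidean distance, the handling of empty clusters and the sample points are assumptions.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centers, max_iter=100):
    """Return (assignment, centers) once the centers are stable."""
    centers = [tuple(c) for c in initial_centers]          # step 1
    assignment = []
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center.
        assignment = [
            min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            for p in points
        ]
        # Step 3: redefine each center as the mean of its points.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centers.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members))
                )
            else:
                new_centers.append(old)   # assumption: keep an empty center
        # Step 4: restart with the new centers unless nothing changed.
        stable = new_centers == centers
        centers = new_centers
        if stable:
            break
    return assignment, centers


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]   # hypothetical points
    print(kmeans(pts, [(0, 0), (10, 10)]))
```

In the branch-clustering setting each point would be one branch and each coordinate one phase of the execution; how not-executed phases are encoded is not covered by this sketch.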
22 Determining k of k-Means. k is chosen by the BIC score (Bayesian Information Criterion), a tradeoff between k and the goodness of a clustering. [Figure: a stable solution with k=2 next to a stable solution with k=3; which one is best?]
23 Branch Clustering. SPEC INT 2000: from 8 to 33 clusters (mcf: 8; gcc, parser: 33). Each branch belongs to exactly one cluster. [Figure: branches A, B, C and D with their per-phase values, each assigned to a cluster]
27 Subtables Example. 8 entries, index = 3 bits; 4 clusters, 2 bits. The index into the prediction table combines the cluster with 1 bit of the original index. 3 to 6 bits for the cluster [SPECint2000]. Can be used in every predictor scheme.
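The subtable index can be sketched as a simple bit concatenation. The slide only fixes the widths (3-bit index, 2 cluster bits, 1 remaining bit); placing the cluster in the high bits and keeping the low bits of the original index are assumptions of this sketch, and the function name is illustrative.

```python
def subtable_index(original_index, cluster, index_bits, cluster_bits):
    """Select a subtable by cluster, then an entry inside it.

    Assumption: the cluster id forms the high bits and the low bits of
    the original index select the entry within the subtable.
    """
    local_bits = index_bits - cluster_bits
    local = original_index & ((1 << local_bits) - 1)
    return (cluster << local_bits) | local


if __name__ == "__main__":
    # 8 entries (3-bit index), 4 clusters (2 bits), 1 bit left over.
    for cluster in range(4):
        print(cluster, [subtable_index(i, cluster, 3, 2) for i in range(8)])
```

Each cluster owns a private slice of the table, so branches from different clusters can never alias with each other, while branches within a cluster share fewer entries than before.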
28 Subtables for Bimodal. The cluster and the branch address together index the prediction table.
29 Subtables for Gshare. The cluster, the branch address and the global history together index the prediction table. 19% better for SMALL predictors.
30 Why Clustered Indexing Works. Subtabling uses smaller predictors, so more aliasing is expected... but more of that aliasing is constructive.
31 Hashing: Alternative to Subtables. Keeps the original global history length. The gshare index (global history combined with the branch address) is hashed with the cluster to form the index into the prediction table.
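One way to picture the hashed variant is sketched below. The slide does not name the hash function, so XORing the cluster id into the top bits of the gshare index is purely an assumption made for illustration, as are the function names.

```python
def gshare_index(branch_addr, history, k):
    """Plain gshare index: k bits of (address XOR history)."""
    return (branch_addr ^ history) & ((1 << k) - 1)


def hashed_cluster_index(branch_addr, history, cluster, k, cluster_bits):
    """Fold the cluster id into a full-width gshare index.

    Assumption: the hash is an XOR of the cluster id into the top
    cluster_bits of the k-bit index. Requires cluster < 2**cluster_bits
    and cluster_bits <= k so the result stays within k bits.
    """
    shifted_cluster = cluster << (k - cluster_bits)
    return gshare_index(branch_addr, history, k) ^ shifted_cluster


if __name__ == "__main__":
    print(bin(hashed_cluster_index(0b1100, 0b1010, 0b10, k=4, cluster_bits=2)))
```

Unlike subtables, all k history bits still take part in the index, so every cluster can in principle reach every table entry.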
32 Hashing for Gshare. [Figure: misprediction rate versus predictor size (bytes) for gshare original, gshare clustered: subtables, and gshare clustered: hashed] 5% better for LARGE predictors.
33 Self Profile-Based Clustering. Limit study: the identified clusters are optimal for the given execution. [Figure: branches A, B, C and D with their per-phase values and clusters]
34 Cross Profile-Based Clustering. Clusters are identified on the SPEC-train inputs rather than on the execution itself (SELF). An additional cluster is used for unseen branches. [Figure: per-phase values and clusters of branches A to D in the self profile, next to branches A to E in the SPEC-train profile]
35 Cross Profile-Based Clustering. [Figures: misprediction rate versus predictor size (bytes) for bimodal (original, self clustered, cross clustered) and for gshare (original, self clustered, cross clustered)] Cross clustering is still good. Small budgets: subtables give 12.3% fewer mispredictions (19% self clustered). Large budgets: hashing is 3% better (5% self clustered).
36 Conclusion. Small branch predictors suffer from aliasing, which is frequently destructive. Exploit constructive aliasing by clustering branches. Implementation: subtables (can be used in all branch prediction schemes) or hashing (specific for gshare). Gshare misprediction: at 1KiB reduced by 19% (self), 12.3% (cross); at 256KiB reduced by 5% (self), 3% (cross).
Questions?
The End