1
Clustered Indexing for Conditional Branch Predictors
Veerle Desmet, Ghent University, Belgium
2
Clustered Indexing for Conditional Branch Predictors
Veerle Desmet, Ghent University, Belgium
3
3 Conditional Branches
Examples:
if (i > 0) /* something */ else /* something else */
for (i = 0; i < 50; i++) { /* a loop... */ } /* next statements */
How frequently do conditional branches occur? About 1 in 8 instructions.
4
4 Program Execution
Fetch = take the next instruction
Decode = analyze the instruction type and read the operands
Execute = perform the computation
Write Back = write the result
(Diagram: the instruction R1=R2+R3 moves through Fetch, Decode, Execute, and Write Back; the operands 4 and 3 are read, the addition is computed, and R1 ends up containing 7.)
5
5 Pipelined Architectures
Parallel versus sequential execution:
A constant flow of instructions becomes possible
Applications run faster
Limitation due to conditional branches
(Diagram: successive instructions such as R1=R2+R3, R5=R2+1, R4=R3-1, R7=2*R1, and R5=R6 occupy the Fetch, Decode, Execute, and Write Back stages at the same time.)
6
6 Problem: Branches
Branches introduce bubbles
This affects pipeline throughput
(Diagram: once the branch if R1>0 is fetched, the instructions that follow it cannot be fetched until the branch is resolved, leaving bubbles in the pipeline.)
7
7 Solution: Prediction
Fetch the instructions that are likely to be executed
Correct prediction = gain
Misprediction = penalty
(Diagram: after the branch if R1>0, the predicted path is fetched speculatively and the pipeline keeps flowing.)
8
8 Today's Architecture
(Block diagram of a modern superscalar processor: instruction cache, fetch, decode, register rename, dispatch, instruction window, reorder logic, register file, and multiple functional units, with IPC as the throughput measure; the branch predictor drives the fetch stage.)
9
Clustered Indexing for Conditional Branch Predictors
Veerle Desmet, Ghent University, Belgium
10
10 Bimodal Branch Predictor
Predicts the outcome of a condition (e.g. the if or the else path) based on the unique branch address
Updates the prediction table with the actual outcome
(Diagram: k bits of the branch address index the prediction table.)
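The slide describes only the mechanism; as an illustration, here is a minimal C sketch of a bimodal predictor as it might appear in a simulator. The 12-bit index, the 2-bit saturating counters, and all names are assumptions for the example, not taken from the presentation.

#include <stdbool.h>
#include <stdint.h>

#define BIMODAL_BITS 12                          /* k index bits, an assumed size */
#define BIMODAL_ENTRIES (1u << BIMODAL_BITS)

/* One 2-bit saturating counter per entry: 0-1 predict not taken, 2-3 predict taken. */
static uint8_t bimodal_table[BIMODAL_ENTRIES];

static uint32_t bimodal_index(uint32_t branch_addr)
{
    return branch_addr & (BIMODAL_ENTRIES - 1);  /* low k bits of the branch address */
}

bool bimodal_predict(uint32_t branch_addr)
{
    return bimodal_table[bimodal_index(branch_addr)] >= 2;
}

void bimodal_update(uint32_t branch_addr, bool taken)
{
    uint8_t *ctr = &bimodal_table[bimodal_index(branch_addr)];
    if (taken && *ctr < 3)
        (*ctr)++;
    else if (!taken && *ctr > 0)
        (*ctr)--;
}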
11
11 Global History Branch Predictor
Predicts the outcome of a condition (e.g. a for loop) based on the global history, e.g. 111101111011110
Updates the prediction table and the global history
(Diagram: k bits of global history index the prediction table.)
12
12 Gshare Branch Predictor [McFarling]
(Diagram: k bits of global history are XORed with k bits of the branch address to form the original index into the prediction table.)
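Again as an illustration only, a minimal C sketch of the original gshare index; the 12-bit width is an assumed size. The prediction table holds the same kind of saturating counters as in the bimodal sketch above, and a pure global-history predictor (previous slide) would simply use the history register alone as the index.

#include <stdbool.h>
#include <stdint.h>

#define GSHARE_BITS 12                           /* k index and history bits, an assumed size */

static uint32_t gshare_history;                  /* global history: 1 bit per recent branch outcome */

/* Original gshare index: branch address XOR global history, truncated to k bits. */
uint32_t gshare_index(uint32_t branch_addr)
{
    return (branch_addr ^ gshare_history) & ((1u << GSHARE_BITS) - 1);
}

/* After each conditional branch, shift its outcome into the global history. */
void gshare_update_history(bool taken)
{
    gshare_history = ((gshare_history << 1) | (taken ? 1u : 0u))
                     & ((1u << GSHARE_BITS) - 1);
}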
13
13 Misprediction rate: gshare
(Plot of misprediction rate versus predictor size in bytes for gshare on SPEC INT 2000; lower is better.)
14
14 Aliasing
Resource limitations: e.g. 8 entries, index = 3 bits
Two different branches end up using the same prediction information
(Diagram: branches A and B both map to the 3-bit index 101 in the prediction table.)
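A tiny, self-contained illustration of the collision: the two branch addresses below are made up for the example, but both end in the bits 101, so with an 8-entry table they share one prediction entry.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t addr_a = 0x4005;        /* hypothetical branch A: low bits ...101 */
    uint32_t addr_b = 0x07FD;        /* hypothetical branch B: low bits ...101 */
    uint32_t mask   = 0x7;           /* 8 entries -> 3 index bits */

    /* Both print 5 (binary 101): A and B alias to the same prediction entry. */
    printf("index(A) = %u, index(B) = %u\n",
           (unsigned)(addr_a & mask), (unsigned)(addr_b & mask));
    return 0;
}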
15
15 Aliasing
(Figure: aliasing measured on SPEC INT 2000.)
16
Clustered Indexing for Conditional Branch Predictors
Veerle Desmet, Ghent University, Belgium
17
17 Basic Observations
Branches with similar behavior can share prediction information
Such branches can use the same table entry
(Diagram: two branches with the same outcome stream over time, 1 1 1 1 0 0 0 0 1 1 1 1 0 1 0 1, sharing a single entry.)
18
18 Time Varying Behavior
Per-phase taken rates (NE = not executed in that phase):
A: 100%   0%  100%  50%
B: 100%   0%  100%  60%
C: 100%  25%    0%   NE
D:  NE   NE   100%  33%
(Diagram: the raw outcome streams of the branches over time, split into phases.)
19
19 Branch Clustering
Each branch represents a point in N-dimensional space (its per-phase taken rates)
Clusters are formed by the k-means algorithm
(Diagram: the per-phase taken rates of branches A-D from the previous slide.)
20
20 k-Means Cluster Algorithm
1. Pick initial centers
2. Calculate the nearest center for each point
3. Redefine the centers
4. Restart with the new centers
(Diagram: the centers, marked X, moving over the iterations.)
21
21 k-Means Cluster Algorithm
Steps 2 (calculate nearest centers) and 3 (redefine centers) are repeated until nothing changes: a stable solution
(Diagram: the centers, marked X, after convergence.)
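The following C sketch runs the loop from these two slides on a handful of made-up 2-D points; the data, the initial centers, and the fixed K are assumptions for the example (in the presentation the points are per-phase taken rates of branches and k is chosen with the BIC score of the next slide).

#include <stdio.h>

#define N 8      /* number of points (branches) */
#define D 2      /* dimensions (e.g. taken rate in two phases) */
#define K 2      /* number of clusters, fixed here; chosen via BIC in the presentation */

static double dist2(const double *a, const double *b)
{
    double s = 0.0;
    for (int d = 0; d < D; d++)
        s += (a[d] - b[d]) * (a[d] - b[d]);
    return s;
}

int main(void)
{
    /* Made-up per-phase taken rates for 8 branches. */
    double pts[N][D] = { {1.0, 0.9}, {0.95, 1.0}, {0.0, 0.1}, {0.05, 0.0},
                         {0.9, 1.0}, {0.1, 0.0},  {1.0, 1.0}, {0.0, 0.05} };
    double centers[K][D] = { {0.2, 0.2}, {0.8, 0.8} };   /* 1. initial centers */
    int assign[N];

    for (int i = 0; i < N; i++)
        assign[i] = -1;                                  /* not yet assigned */

    for (int changed = 1; changed; ) {
        changed = 0;
        /* 2. assign every point to its nearest center */
        for (int i = 0; i < N; i++) {
            int best = 0;
            for (int c = 1; c < K; c++)
                if (dist2(pts[i], centers[c]) < dist2(pts[i], centers[best]))
                    best = c;
            if (assign[i] != best) { assign[i] = best; changed = 1; }
        }
        /* 3. redefine each center as the mean of its points; 4. repeat until stable */
        for (int c = 0; c < K; c++) {
            double sum[D] = { 0.0 };
            int count = 0;
            for (int i = 0; i < N; i++)
                if (assign[i] == c) {
                    count++;
                    for (int d = 0; d < D; d++) sum[d] += pts[i][d];
                }
            if (count > 0)
                for (int d = 0; d < D; d++) centers[c][d] = sum[d] / count;
        }
    }

    for (int i = 0; i < N; i++)
        printf("branch %d -> cluster %d\n", i, assign[i]);
    return 0;
}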
22
22 Determining k of k-Means
k is chosen by the BIC score (Bayesian Information Criterion)
Tradeoff between k and the goodness of a clustering
(Diagram: a stable solution with k=2 versus a stable solution with k=3; which is best?)
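The slide gives no formula; one standard form of the BIC score for comparing clusterings (the variant popularized by x-means) is, as an assumption about what is meant:

\mathrm{BIC}(M_k) = \hat{\ell}_k(D) - \frac{p_k}{2}\,\log R

where \hat{\ell}_k(D) is the log-likelihood of the data D under the model with k clusters, p_k is the number of free parameters of that model, and R is the number of points; the clustering with the highest score wins the tradeoff between fit and model complexity.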
23
23 Branch Clustering
SPEC INT 2000: from 8 to 33 clusters (mcf: 8; gcc and parser: 33)
Each branch belongs to exactly one cluster
(Diagram: the per-phase taken rates of branches A-D grouped into clusters.)
24
Clustered Indexing for Conditional Branch Predictors
Veerle Desmet, Ghent University, Belgium
25
25 Subtables Example
8 entries, index = 3 bits; 4 clusters, so 2 cluster bits
(Diagram: the 2 cluster bits replace the upper bits of the original index 101, leaving 1 bit of the original index; each cluster gets its own subtable within the prediction table.)
27
27 Subtables Example
8 entries, index = 3 bits; 4 clusters, so 2 cluster bits
3 to 6 bits are needed for the cluster [SPECint2000]
Can be used in every predictor scheme
(Diagram: the 2 cluster bits select a subtable and the remaining bit of the original index 101 selects the entry within it, giving Index = 1.)
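A minimal C sketch of the subtable indexing with the example's widths; the function name and the convention that the cluster bits occupy the upper index positions are assumptions for the illustration.

#include <stdint.h>

#define INDEX_BITS   3   /* 8-entry table in the example */
#define CLUSTER_BITS 2   /* 4 clusters in the example; 3 to 6 bits for SPECint2000 */

/* The cluster number replaces the upper CLUSTER_BITS of the original index,
   so each cluster gets its own subtable of 2^(INDEX_BITS - CLUSTER_BITS) entries. */
uint32_t subtable_index(uint32_t original_index, uint32_t cluster)
{
    uint32_t low = original_index & ((1u << (INDEX_BITS - CLUSTER_BITS)) - 1u);
    return (cluster << (INDEX_BITS - CLUSTER_BITS)) | low;
}

For the original index 101 only its lowest bit (1) survives, matching the "Index = 1" in the diagram; the other two index bits come from the cluster number.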
28
28 Subtables for Bimodal
(Diagram: the cluster bits and the branch address bits together index the prediction table.)
29
29 Subtables for Gshare
(Diagram: the cluster bits, the branch address, and the global history together index the prediction table.)
19% better for SMALL predictors
30
30 Why Clustered Indexing Works
Subtabling effectively uses smaller predictors per cluster
More aliasing is expected... but more of it is constructive aliasing
31
31 Hashing: an Alternative to Subtables
Keeps the original global history length
(Diagram: the gshare index, formed from the branch address and the global history, is hashed with the cluster bits to index the prediction table.)
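A minimal C sketch of the hashed alternative; the 12-bit width and the use of XOR as the hash are assumptions. The point is that the cluster is folded into the index instead of consuming dedicated index bits, so the full global history length is preserved.

#include <stdint.h>

#define INDEX_BITS 12    /* assumed predictor index width */

/* Fold the cluster number into the original gshare index rather than
   reserving index bits for it, preserving the full history length. */
uint32_t hashed_index(uint32_t branch_addr, uint32_t global_history, uint32_t cluster)
{
    uint32_t gshare_ix = branch_addr ^ global_history;   /* original gshare index */
    return (gshare_ix ^ cluster) & ((1u << INDEX_BITS) - 1u);
}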
32
32 Hashing for Gshare
(Two plots of misprediction rate versus predictor size in bytes, comparing the original gshare with the clustered subtable and clustered hashed variants; the second plot zooms in on the larger predictor sizes.)
5% better for LARGE predictors
33
33 Self Profile-Based Clustering
Limit study: the identified clusters are optimal for the given execution
(Diagram: the per-phase taken rates of branches A-D, clustered using the same execution.)
34
34 Cross Profile-Based Clustering
Clustering based on the SPEC-train inputs works OK
An additional cluster is used for branches not seen during profiling
(Diagram: the self-profile taken rates of branches A-D next to the taken rates measured with the SPEC-train inputs for branches A-E.)
35
35 Cross Profile-Based Clustering
(Two plots of misprediction rate versus predictor size in bytes: original, self clustered, and cross clustered variants of bimodal and of gshare.)
Cross clustering is still good; for gshare:
@ small budgets: subtables give 12.3% fewer mispredictions (19% with self clustering)
@ large budgets: hashing is 3% better (5% with self clustering)
36
36 Conclusion
Small branch predictors suffer from aliasing, which is frequently destructive
Constructive aliasing can be exploited by clustering branches
Implementation: subtables (can be used in all branch prediction schemes) or hashing (specific to gshare)
Gshare misprediction rate:
@ 1KiB: reduced by 19% (self) and 12.3% (cross)
@ 256KiB: reduced by 5% (self) and 3% (cross)
37
Questions?
38
The End