Predicting Conditional Branches With Fusion-Based Hybrid Predictors Gabriel H. Loh Yale University Dept. of Computer Science Dana S. Henry Yale University.


1 Predicting Conditional Branches With Fusion-Based Hybrid Predictors Gabriel H. Loh Yale University Dept. of Computer Science Dana S. Henry Yale University Depts. of Elec. Eng. & Comp. Sci. This research was funded by NSF Grant MIP-9702281

2 The Branch Prediction Problem
1 out of 5 instructions is a branch
May require many cycles to resolve
–P4 has a 20-cycle branch resolution pipeline
–Future pipeline depths likely to increase [Sprangle02]
Predict branches to keep the pipeline full
[Figure: pipeline from PC compute to branch resolution]

3 Bigger Predictors = More Accurate
Larger predictors tend to yield more accurate predictions
Faster cycle times force smaller branch predictors
Overriding predictor couples a small, fast predictor with a large, multi-cycle predictor [Jiménez2000]
–performs close to ideal large-fast predictor (but bigger predictors = slower)

4 Hybrid Predictors
Wide variety of branch prediction algorithms available
Hybrid combines more than one “stand-alone” or component predictor [McFarling93]
[Figure: component predictors P1 and P2 and a meta-predictor produce the final prediction]
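The selection scheme on this slide can be sketched in a few lines. This is a minimal illustration, assuming a table of 2-bit saturating counters as the meta-predictor, indexed by the low bits of the branch address; the class names, table size and update rule are assumptions for illustration, not details taken from the slides.

```python
class TwoBitCounter:
    """Saturating 2-bit counter; values 2 and 3 mean 'prefer P2'."""

    def __init__(self, value=1):
        self.value = value

    def high(self):
        return self.value >= 2

    def update(self, up):
        # Move toward 3 when up is True, toward 0 otherwise, and saturate.
        self.value = min(3, self.value + 1) if up else max(0, self.value - 1)


class SelectingHybrid:
    """Meta-predictor that selects one of two component predictions."""

    def __init__(self, meta_entries=1024):
        # Hypothetical table size; one chooser counter per entry.
        self.meta = [TwoBitCounter() for _ in range(meta_entries)]

    def _slot(self, pc):
        return self.meta[pc % len(self.meta)]

    def predict(self, pc, p1, p2):
        # The final prediction is always one of the component predictions.
        return p2 if self._slot(pc).high() else p1

    def update(self, pc, p1, p2, taken):
        # Train the chooser only when the components disagree.
        if p1 != p2:
            self._slot(pc).update(up=(p2 == taken))
```

Because `predict` can only return `p1` or `p2`, a selector of this kind is wrong whenever every component is wrong.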

5 Multi-Hybrids
[Figure: “Multi-Hybrid” [Evers96] with components P1, P2, …, Pn, a priority encoder and the final prediction; “Quad-Hybrid” [Evers00] with components P1 to P4 and meta-predictors M1, M2 and M3]

6 Our Idea: Prediction Fusion
[Figure: Prediction Selection, where all but one of the predictions P1, P2, P3, …, Pn are crossed out, next to Prediction Fusion, where P1, P2, P3, …, Pn are all used]

7 Early Attempt from ML
Weighted Majority algorithm [LW94]
–Better predictors get assigned larger weights
–Make final prediction with larger sum
Predictor with largest weight not always correct
[Figure: weight sums 0.487 and 0.513; P2, P6 and P7 say “not-taken”; P1, P3, P4, P5 and P8 say “taken”]
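The weighted-vote idea can be sketched as follows. This is a minimal illustration in the spirit of Weighted Majority, assuming a multiplicative penalty `beta` for predictors that were wrong and a tie broken toward taken; the function names, the value of `beta` and the example weights are assumptions, not values from the slides.

```python
def weighted_majority_predict(weights, votes):
    """Return True (taken) if the taken side has at least as much total weight."""
    taken = sum(w for w, v in zip(weights, votes) if v)
    not_taken = sum(w for w, v in zip(weights, votes) if not v)
    return taken >= not_taken


def weighted_majority_update(weights, votes, outcome, beta=0.5):
    """Scale down the weight of every predictor whose vote missed the outcome."""
    return [w if v == outcome else w * beta for w, v in zip(weights, votes)]


# Illustrative weights: the heaviest predictor says not-taken, but the
# other two together outweigh it, so the final prediction is taken.
weights = [0.4, 0.35, 0.25]
votes = [False, True, True]
final = weighted_majority_predict(weights, votes)
new_weights = weighted_majority_update(weights, votes, outcome=True)
```

The example shows the point made on the slide: the final prediction follows the larger sum, which need not agree with the single predictor that has the largest weight.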

8 Outline
COLT Predictor
Choosing parameters and components
Performance
Prediction distributions, component choice

9 COLT Organization
[Figure: branch address and branch history index the VMT to pick a mapping table; the component predictions P1, P2, P3, …, Pn (e.g. 1010…) index that mapping table, which gives the final prediction]
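The organization on this slide can be sketched as a two-level lookup. This is a minimal sketch under stated assumptions: the VMT is read here as a vector of mapping tables, the VMT index is formed by concatenating low address bits with history bits, each mapping-table entry is a saturating counter initialized just below its midpoint, and the counter is trained toward the branch outcome. The class name, method names and default widths are hypothetical.

```python
class ColtSketch:
    """Fusion lookup: component predictions index a table of counters."""

    def __init__(self, n_components, addr_bits=4, hist_bits=4, counter_bits=4):
        self.addr_bits = addr_bits
        self.hist_bits = hist_bits
        self.max_count = (1 << counter_bits) - 1
        self.threshold = 1 << (counter_bits - 1)
        n_tables = 1 << (addr_bits + hist_bits)
        # One counter per combination of component predictions, per table.
        self.vmt = [[self.threshold - 1] * (1 << n_components)
                    for _ in range(n_tables)]

    def _table(self, pc, history):
        # Assumption: concatenate address bits and history bits.
        a = pc & ((1 << self.addr_bits) - 1)
        h = history & ((1 << self.hist_bits) - 1)
        return self.vmt[(a << self.hist_bits) | h]

    @staticmethod
    def _entry(predictions):
        # Pack the component predictions into an index, P1 as the top bit.
        idx = 0
        for p in predictions:
            idx = (idx << 1) | int(p)
        return idx

    def predict(self, pc, history, predictions):
        counter = self._table(pc, history)[self._entry(predictions)]
        return counter >= self.threshold

    def update(self, pc, history, predictions, taken):
        table = self._table(pc, history)
        i = self._entry(predictions)
        if taken:
            table[i] = min(self.max_count, table[i] + 1)
        else:
            table[i] = max(0, table[i] - 1)
```

Unlike a selector, the final prediction here is a learned function of all component outputs, so an entry can map "every component says not-taken" to "taken" once that pattern has been observed.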

10 Pathological Example
[Figure: P1, P2 and P3 each predict 0]
Actual outcome = 1 (taken)

11 Example (cont’d)
Selection: P1, P2 and P3 all predict 0, so the outcome is always wrong
COLT: the VMT can recognize and remember this pattern and predict 1

12 COLT Lookup Delay
[Figure: timing of the component predictions P1, P2, …, Pn (prediction time) and the mapping table (MT) select, with the critical delay marked]

13 Design Choices
# of branch address bits
# of branch history bits
(together these determine the number of mapping tables)
# of components
(determines the size of individual MTs)
Choice of components
–gshare, PAs, gskewed, …
–History length, PHT size, …
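The two relationships on this slide can be written out as arithmetic. This is a rough sketch, assuming the address and history bits are concatenated to index the VMT (so the table count is a power of two of their sum) and that each mapping table holds one counter per combination of component predictions; the function name and the example numbers are illustrative only.

```python
def colt_geometry(addr_bits, hist_bits, n_components, counter_bits):
    """Return (mapping tables, entries per table, total counter storage in bits)."""
    n_tables = 2 ** (addr_bits + hist_bits)   # assumption: concatenated index
    entries_per_table = 2 ** n_components     # one entry per prediction pattern
    total_bits = n_tables * entries_per_table * counter_bits
    return n_tables, entries_per_table, total_bits


# Hypothetical configuration: 2 address bits, 7 history bits,
# 4 components and 4-bit counters.
tables, entries, bits = colt_geometry(2, 7, 4, 4)
```

Each extra component doubles the size of every mapping table, which is one reason the choice of components is part of the design space.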

14 Predictor Components
Global History
–gshare [McFarling93]
–Bi-Mode [Lee97]
–Enhanced gskewed [Michaud97]
–YAGS [Eden98]
Local History
–PAs [Yeh94]
–pskewed [Evers96]
Other
–2bC (bimodal) [Smith81]
–Loop [Chang95]
–alloyed Perceptron [Jiménez02]
History lengths optimized on test data sets
Total of 59 configurations
Sizes vary up to 64KB

15 Huge Search Space
2^59 ways to choose components
[…] ways to choose COLT parameters
We use a genetic search
Gene format: bit-k = 0 means don’t include P_k, bit-k = 1 means do include P_k; the include bits are followed by the VMT size and the history length
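The gene format described on this slide can be sketched as a small data structure. This is an illustrative sketch only: the slides give the gene layout (one include bit per candidate component, plus VMT size and history length) but not the search operators, so the single-point crossover and bit-flip mutation below are assumptions, as are all names. Fitness evaluation is omitted because it would require simulating each candidate predictor.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class Gene:
    include: tuple        # include[k] == 1 means component P_k is used
    vmt_size: int
    history_length: int


def selected_components(gene, names):
    """List the component names whose include bit is set."""
    return [name for bit, name in zip(gene.include, names) if bit]


def crossover(a, b, point):
    """Assumed operator: include bits split at `point`, parameters from both parents."""
    return Gene(a.include[:point] + b.include[point:], a.vmt_size, b.history_length)


def mutate(gene, rng, rate=0.02):
    """Assumed operator: flip each include bit independently with probability `rate`."""
    flipped = tuple(bit ^ 1 if rng.random() < rate else bit for bit in gene.include)
    return Gene(flipped, gene.vmt_size, gene.history_length)


# Tiny example with three candidate components instead of 59.
names = ["gshare(8)", "PAs(7)", "2bC"]
parent_a = Gene((1, 0, 1), 2048, 8)
parent_b = Gene((0, 1, 0), 8192, 7)
child = crossover(parent_a, parent_b, point=1)
```

With 59 include bits, the `include` tuple alone spans the 2^59 component subsets mentioned on the slide.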

16 Methodology
SPEC2000 integer benchmarks
–For tuning/optimization: 10M branches from test
–For evaluation: 500M branches from train
Skipped first 100M branches
–Compiled with cc -arch ev6 -O4 -fast -non_shared
SimpleScalar simulator
–sim-safe for trace collection
–MASE for ILP simulations

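17 Genetic Search COLT Results
16 KB: components alpct(34/10), gskewed(12), gshare(8); VMT size 2048; counter width 4; history length 8
32 KB: components alpct(34/10), gshare(15), gshare(9), PAs(7); VMT size 8192; counter width 4; history length 7
64 KB: components alpct(40/14), gshare(16), YAGS(11), pskewed(6); VMT size 16384; counter width 4; history length 10
128 KB: components alpct(40/14), alpct(38/14), gshare(16), gskewed(13), YAGS(12), PAs(8); VMT size 16384; counter width 4; history length 7
256 KB: components alpct(50/18), alpct(34/10), gshare(18), Bi-Mode(16), gskewed(15), PAs(8); VMT size 32768; counter width 4; history length 4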

18 Overall Predictor Performance

19 Per-Benchmark Performance

20 ILP Performance
Simulated CPU:
–6-issue
–20 cycle pipeline
–Same functional units, latencies, caches as Intel P4/NetBurst microarchitecture
[Figure: predictor configurations: 1-cycle 2bC + 4-cycle OR alpct; 1-cycle 2bC + 4-cycle OR COLT; ideal 1-cycle COLT]

21 ILP Impact

22 COLT Parameter Sensitivity
Mapping table counter widths
Number of mapping tables
Number of history bits for VMT index

23 Counter Width

24 VMT Size

25 History Length

26 Explaining Choice of Components
Parameter sensitivity results show the GA performed well for the COLT parameters
Why did it choose the component predictors that it did?

27 Classifying COLT Predictions
We examined the (32KB) COLT config.
For each mapping table lookup, we examine the neighboring entries
[Figure: component predictions P1, P2, P3, P4 = 1001; entry 0001 = NT, entry 1001 = T, entry 1101 = T]

28 Classifying Predictions (cont’d)
32KB COLT: gshare(9), gshare(14), PAs(7), alpct(34/10)
Classes:
easy: all neighboring entries agree
short: only gshare(9) distinguishes
long: only gshare(14) distinguishes
local: only PAs(7) distinguishes
perceptron: only alpct(34/10) distinguishes
multi-length: mix of gshare(9), (14) or alpct
mixed: both global and local components
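The classification can be sketched as a neighbor check on one mapping table. This is a minimal sketch under stated assumptions: a "neighboring entry" is taken to be the entry reached by flipping exactly one component's prediction bit, a component "distinguishes" when that flip changes the table's prediction, P1 to P4 are mapped to the components in the order listed on the slide, and alpct is grouped with the gshare components for the multi-length class. All names are hypothetical.

```python
COMPONENTS = ["gshare(9)", "gshare(14)", "PAs(7)", "alpct(34/10)"]
SINGLE_CLASS = {
    "gshare(9)": "short",
    "gshare(14)": "long",
    "PAs(7)": "local",
    "alpct(34/10)": "perceptron",
}


def distinguishing(table, index):
    """Return the components whose flipped bit changes the table's prediction."""
    n = len(COMPONENTS)
    base = table[index]
    result = []
    for k in range(n):
        neighbor = index ^ (1 << (n - 1 - k))   # P1 is the most significant bit
        if table[neighbor] != base:
            result.append(COMPONENTS[k])
    return result


def classify(table, index):
    """Map a lookup to one of the classes listed on the slide."""
    d = distinguishing(table, index)
    if not d:
        return "easy"
    if len(d) == 1:
        return SINGLE_CLASS[d[0]]
    if "PAs(7)" in d:
        return "mixed"
    return "multi-length"


# Example from the previous slide: lookup 1001 = T, entry 0001 = NT,
# entry 1101 = T; the remaining neighbors are assumed to be T.
table = [True] * 16
table[0b0001] = False
label = classify(table, 0b1001)
```

In this example only flipping P1 changes the prediction, so the lookup falls into the class for the first listed component.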

29 Prediction Classifications

30 Related Work/Issues
Alloyed history [Skadron00]
Variable path history length [Stark98]
Dynamic history length fitting [Juan98]
Interference reduction [lots…]
COLT handles all of these cases*
* Doesn’t support partial update policies

31 Open Research
Better individual components
Augment with SBI [Manne99], agree [Sprangle97]
Better fusion algorithms
Hybrid fusion/selection algorithms
Other domains (branch confidence prediction, value prediction, memory dependence prediction, instruction criticality prediction, …)

32 Summary
Fusion is more powerful than selection
–Combines multiple sources of information
Branch behavior is very varied
–Need long, short, global and local histories, multiple simultaneous lengths and types of history
COLT is one possible fusion-based predictor
–Combines multiple types of information
–Current “best” purely dynamic predictor*

33 Questions?

