Published by Pierce Bennett. Modified over 9 years ago.
1 A New Case for the TAGE Predictor
André Seznec, INRIA/IRISA
2 Branch prediction
Simply the easiest way to improve processor core performance: replacing the branch predictor with a more accurate one does not require changing the rest of the design.
3 The TAGE branch predictor
Introduced in 2006.
State-of-the-art global history predictor at the 2nd and 3rd Championship Branch Prediction: CBP-2 (2006), CBP-3 (2011).
4 TAGE: a multiple-table global history predictor
The history lengths form a geometric series, e.g. {0, 2, 4, 8, 16, 32, 64, 128}:
- most of the storage is devoted to short histories!
- yet correlation is still captured on very long histories.
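A minimal sketch of how such a series can be generated, assuming the usual geometric form L(i) = α^(i−1) · L(1), rounded to the nearest integer, with α chosen so that the longest tagged table reaches the maximum length. The function name and parameters are illustrative, not taken from the slides; length 0 stands for the tagless base predictor.

```python
def geometric_history_lengths(min_len, max_len, n_tagged):
    """History lengths of the tagged tables: min_len * alpha**i, rounded to nearest."""
    if n_tagged == 1:
        return [min_len]
    # alpha is chosen so that the last (longest) table uses exactly max_len bits
    alpha = (max_len / min_len) ** (1.0 / (n_tagged - 1))
    return [int(min_len * alpha ** i + 0.5) for i in range(n_tagged)]

# 0 = tagless base predictor, followed by 7 tagged tables
lengths = [0] + geometric_history_lengths(2, 128, 7)
print(lengths)  # [0, 2, 4, 8, 16, 32, 64, 128]
```

With equally sized tables, most tables (and hence most storage) serve short histories, while a single table still reaches 128 bits.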
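5 TAGE: tagged tables, prediction by the longest matching history
[Figure: a tagless base predictor indexed by pc, and tagged tables indexed by pc and global history slices h[0:L1], h[0:L2], h[0:L3]; each tagged entry holds a prediction counter (ctr), a useful counter (u) and a tag compared (=?) with the lookup tag. The hitting entry with the longest history provides the prediction.]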
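A simplified sketch of the lookup rule in the figure: tagged tables are probed and the hit with the longest history provides the prediction, otherwise the tagless base predictor does. The XOR-folding hash, table sizes, tag width and the `TageLookup`/`fold` names are hypothetical; allocation and update are omitted.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    ctr: int = 0    # signed prediction counter: taken if ctr >= 0
    tag: int = -1   # -1 marks an empty entry (computed tags are never negative)
    u: int = 0      # useful counter (used by allocation, not shown here)

def fold(history, length, bits):
    """XOR-fold the `length` most recent history bits down to `bits` bits."""
    h = history & ((1 << length) - 1)
    folded = 0
    while h:
        folded ^= h & ((1 << bits) - 1)
        h >>= bits
    return folded

class TageLookup:
    def __init__(self, history_lengths, log_entries=10, tag_bits=8, log_base=12):
        self.lengths = history_lengths      # one per tagged table, increasing
        self.log_entries = log_entries
        self.tag_bits = tag_bits
        self.base = [2] * (1 << log_base)   # tagless bimodal: 2-bit counters, taken if >= 2
        self.tables = [[Entry() for _ in range(1 << log_entries)] for _ in history_lengths]

    def index(self, i, pc, history):
        return (pc ^ fold(history, self.lengths[i], self.log_entries)) & ((1 << self.log_entries) - 1)

    def tag(self, i, pc, history):
        return (pc ^ (fold(history, self.lengths[i], self.tag_bits) << 1)) & ((1 << self.tag_bits) - 1)

    def predict(self, pc, history):
        """Return (taken, provider); provider -1 means the tagless base predictor."""
        for i in reversed(range(len(self.tables))):  # longest history first
            entry = self.tables[i][self.index(i, pc, history)]
            if entry.tag == self.tag(i, pc, history):
                return entry.ctr >= 0, i
        return self.base[pc % len(self.base)] >= 2, -1
```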
6 Why TAGE
State-of-the-art global history predictor, but also:
- a large, cost-effective design space: 32 Kbits to 512 Kbits, 4 to 12 tables, 100 to 1000 history bits;
- a confidence estimator for free [HPCA 2011].
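To make the storage range concrete, a back-of-the-envelope sketch computing the budget of one hypothetical configuration, assuming each tagged entry holds a prediction counter, a useful counter and a tag. All table sizes and field widths below are assumptions chosen for illustration.

```python
def tage_storage_bits(base_entries, base_bits, tagged):
    """tagged: one (entries, ctr_bits, u_bits, tag_bits) tuple per tagged table."""
    total = base_entries * base_bits
    for entries, ctr_bits, u_bits, tag_bits in tagged:
        total += entries * (ctr_bits + u_bits + tag_bits)
    return total

# Hypothetical: 4K-entry 2-bit base predictor + 7 tagged tables of 1K entries
bits = tage_storage_bits(4096, 2, [(1024, 3, 2, 9)] * 7)
print(bits, bits / 1024)  # 108544 106.0
```

This configuration lands at about 106 Kbits, inside the 32-512 Kbits range; scaling entry counts or the number of tables moves along the design space.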
7 And more in this presentation
- Cost-effective hardware: complexity and energy consumption
  - limited number of accesses to the predictor tables
  - use of single-ported predictor tables
- Improving TAGE accuracy with a small side predictor
  - tracking statistical correlation
  - using local history
8 The implicit 3-access scenario in academic studies
For a prediction on the correct path:
- read at prediction time;
- at retire time: re-read, then update.
That is 3 accesses to the same predictor entry!
This might lead to the use of 3-ported components.
9 Why not only 2 accesses, by propagating the values read at prediction time?
[Figure: a loop branch predicted by a bimodal counter C, followed through Fetch, Execute and Retire; labels include C=1, C=0 and C=2 together with mispredictions and a correct prediction.]
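A toy simulation of why propagating the values read at prediction time can hurt, under stated assumptions: a single hypothetical 2-bit counter (taken if >= 2), an always-taken branch instead of the loop in the figure, and `depth` instances in flight between prediction and retirement. With a re-read at retire time (3 accesses) each retirement sees the latest value; without it (2 accesses) retirements overwrite each other with stale increments.

```python
def simulate(outcomes, depth, reread_at_retire, init=0):
    """Count mispredictions of one 2-bit counter with `depth` branches in flight."""
    counter = init
    read_at_fetch = []          # counter value read at prediction time, per branch
    mispredictions = 0
    for j in range(len(outcomes) + depth):
        i = j - depth
        if 0 <= i < len(outcomes):                  # retire branch i
            base = counter if reread_at_retire else read_at_fetch[i]
            step = 1 if outcomes[i] else -1
            counter = min(3, max(0, base + step))   # write back the updated value
        if j < len(outcomes):                       # fetch and predict branch j
            read_at_fetch.append(counter)
            if (counter >= 2) != outcomes[j]:
                mispredictions += 1
    return mispredictions

print(simulate([True] * 10, 2, reread_at_retire=True))   # 3
print(simulate([True] * 10, 2, reread_at_retire=False))  # 4
```

Even in this tiny case the stale updates cost one extra misprediction while the counter warms up.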
10 Is it that important for global history predictors?
With only 2 accesses (3rd Championship Branch Prediction framework):
- gshare: 33 % increase in misprediction rate
- GEHL: 17 % increase in misprediction rate
- TAGE: 4 % increase in misprediction rate
11 Reducing the number of predictor writes
At retire time, many updates are silent: they rewrite already saturated counters [Baniasadi and Moshovos 2003].
For TAGE: about 2.1 writes per misprediction, and fewer than 10 writes per 100 branches.
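A sketch of silent-update filtering, assuming 3-bit signed counters: the write is skipped whenever the new value equals the old one, i.e. when the counter is already saturated in the resolved direction. The function names are illustrative.

```python
def update_counter(value, taken, bits=3):
    """Signed saturating update; returns (new_value, needs_write)."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    new = min(hi, value + 1) if taken else max(lo, value - 1)
    return new, new != value     # an unchanged value is a silent update

def count_writes(initial, outcomes, bits=3):
    value, writes = initial, 0
    for taken in outcomes:
        value, needs_write = update_counter(value, taken, bits)
        writes += needs_write
    return writes

print(count_writes(0, [True] * 10))  # 3: only the steps 0->1->2->3 are written
```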
12 Eliminating most of the reads at retire time
On correct predictions, do not re-read, but use the values read at prediction time:
- gshare: +4.5 % mispredictions
- GEHL: +8.8 % mispredictions
- TAGE: +1.3 % mispredictions
TAGE: 1.13 accesses per prediction on the correct path, counting the prediction read, the (occasional) read at retire time and the update.
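A sketch of this retire-time policy, assuming each in-flight branch carries the table index and the counter value read at prediction time (hypothetical fields). Only mispredicted branches re-read the table, and silent updates are not written.

```python
from dataclasses import dataclass

@dataclass
class InflightBranch:
    index: int          # table index computed at prediction time
    ctr_at_fetch: int   # counter value read at prediction time
    taken: bool         # resolved outcome
    mispredicted: bool

def retire(table, br, stats):
    """Retire-time update of one 3-bit signed counter."""
    if br.mispredicted:
        value = table[br.index]      # re-read: the entry may have changed since fetch
        stats["reads"] += 1
    else:
        value = br.ctr_at_fetch      # reuse the value read at prediction time
    new = min(3, value + 1) if br.taken else max(-4, value - 1)
    if new != value:                 # skip silent updates
        table[br.index] = new
        stats["writes"] += 1

table = [0] * 8
table[2] = -1
stats = {"reads": 0, "writes": 0}
retire(table, InflightBranch(1, 3, True, False), stats)   # correct, saturated: no access
retire(table, InflightBranch(2, -1, True, True), stats)   # mispredicted: read + write
print(stats, table[2])  # {'reads': 1, 'writes': 1} 0
```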
13 Opens the opportunity to use single-ported memory components
Cycle stealing: wait for free cycles to perform updates (mispredictions, fetch gating, front-end stalls).
Complex management... and what impact on accuracy?
14 A simple and general scheme using single-ported components
15 4-way interleaved single-ported tables
4 banks per predictor table.
Guarantee that 3 consecutive predictions are served by 3 different banks. For a prediction Z following predictions X and Y:
b(Z) = Z & 3
while ((b(Z) == b(X)) || (b(Z) == b(Y))) b(Z) = (b(Z) + 1) & 3
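The bank-selection rule above, written as a small Python sketch. Z is treated as a plain integer (which address bits feed it is an assumption); since at most 2 of the 4 banks are excluded, the loop always terminates.

```python
def select_bank(z, previous_banks):
    """b(Z) = Z & 3, then step to the next bank (mod 4) while it clashes with b(X) or b(Y)."""
    b = z & 3
    while b in previous_banks:
        b = (b + 1) & 3
    return b

def assign_banks(addresses):
    banks = []
    for z in addresses:
        banks.append(select_bank(z, banks[-2:]))  # banks of the two previous predictions
    return banks

print(assign_banks([0, 0, 0, 0]))  # [0, 1, 2, 0]
```

Any window of 3 consecutive predictions therefore touches 3 distinct banks, which is what the scheduling argument on the next slides relies on.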
16 Read at prediction time has priority
A read at retire time is delayed by at most one cycle.
A write (update) is delayed by at most two cycles.
17 [Figure: banks B0, B1, B2, B3; a prediction read (Pa), a read at retire time (Rt) and an update (Un) target the same bank at T=0.]
The prediction has priority; by construction, there is no other prediction on this bank for at least 2 cycles.
Worst case for an update: no extra read at retire time and no update for 2 cycles.
18 [Figure: banks B0, B1, B2, B3 at T=1, with Rt and Un pending.]
No prediction on this bank, by construction: the read at retire time proceeds.
19 [Figure: banks B0, B1, B2, B3 at T=2, with Un pending.]
No prediction and no read at retire time on this bank, by construction: the update proceeds.
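A sketch of the single-port arbitration for one bank, assuming one access per cycle and the fixed priority prediction read > read at retire > write. The request names are hypothetical. Replaying the worst case of the three slides above (all three requests arrive at T=0, and the bank assignment guarantees no new prediction at T=1 and T=2) reproduces the one- and two-cycle delays.

```python
PRIORITY = ("predict", "retire_read", "write")

def arbitrate(arrivals):
    """arrivals[t]: set of request kinds reaching the bank at cycle t.
    Returns the request granted at each cycle (None if the bank is idle)."""
    pending, grants = [], []
    for new_requests in arrivals:
        pending.extend(new_requests)
        if pending:
            choice = min(pending, key=PRIORITY.index)   # highest priority first
            pending.remove(choice)
            grants.append(choice)
        else:
            grants.append(None)
    return grants

# Worst case: everything hits the same bank at T=0, nothing new at T=1, T=2
print(arbitrate([{"predict", "retire_read", "write"}, set(), set()]))
# ['predict', 'retire_read', 'write']
```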
20 4-way interleaved vs 3-ported TAGE predictor
- 0.5 % increase in misprediction rate
- 3.3x smaller silicon area for the predictor tables
- 2x lower energy per table access
Also works for the other global history predictors.
21 Improving TAGE accuracy with a small side predictor
22 Two classes of branches not that well predicted by TAGE
- « Statistically » correlated branches: not really correlated with the global history, but exhibiting a bias. They are sometimes better predicted by a single wide PC-indexed counter than by TAGE.
- Branches correlated with local history: no problem if the global history is very regular, but TAGE cannot learn the pattern if it is irregular. This is not just the loops with constant iteration counts.
23 The Statistical Corrector predictor (from the 3rd Championship Branch Prediction)
Targets branches with poor correlation with global history, but some bias.
Track cases such that « in this case (PC, history, prediction), TAGE is likely (>50 %) to mispredict », AND REVERSE THE PREDICTION!
A tree of adders captures the « average behavior ».
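24 Statistical Corrector predictor
[Figure: the TAGE prediction and its counter value feed a GEHL-like statistical corrector whose tables are indexed by history (H) and address (A); an adder tree produces the final prediction.]
2.5 % misprediction rate decrease.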
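A minimal sketch of a GEHL-like statistical corrector under explicit assumptions: a few counter tables indexed by a hash of (PC, global history, TAGE prediction), an adder tree over centered counters plus a weighted vote from TAGE's own counter, and a perceptron-style update on mispredictions or low-magnitude sums. The history lengths, threshold, weight and hash are hypothetical, not the CBP-3 parameters.

```python
class StatisticalCorrector:
    def __init__(self, history_lengths=(0, 4, 8, 16), log_size=10, threshold=6, tage_weight=4):
        self.lengths = history_lengths
        self.log_size = log_size
        self.mask = (1 << log_size) - 1
        self.threshold = threshold      # hypothetical update threshold
        self.tage_weight = tage_weight  # hypothetical weight of TAGE's counter
        self.tables = [[0] * (1 << log_size) for _ in history_lengths]

    def _indices(self, pc, history, tage_pred):
        idx = []
        for length in self.lengths:
            h, folded = history & ((1 << length) - 1), 0
            while h:                     # XOR-fold the history slice
                folded ^= h & self.mask
                h >>= self.log_size
            idx.append((pc ^ folded ^ (int(tage_pred) << (self.log_size - 1))) & self.mask)
        return idx

    def _sum(self, pc, history, tage_pred, tage_ctr):
        # Adder tree: centered counters (2c + 1) plus a weighted vote from TAGE
        s = self.tage_weight * (2 * tage_ctr + 1)
        for table, i in zip(self.tables, self._indices(pc, history, tage_pred)):
            s += 2 * table[i] + 1
        return s

    def predict(self, pc, history, tage_pred, tage_ctr):
        # A sum whose sign disagrees with TAGE reverses TAGE's prediction
        return self._sum(pc, history, tage_pred, tage_ctr) >= 0

    def update(self, pc, history, tage_pred, tage_ctr, taken):
        s = self._sum(pc, history, tage_pred, tage_ctr)
        if (s >= 0) != taken or abs(s) < self.threshold:
            for table, i in zip(self.tables, self._indices(pc, history, tage_pred)):
                table[i] = min(31, table[i] + 1) if taken else max(-32, table[i] - 1)
```

Usage: when TAGE keeps predicting a branch taken with a weak counter but it is mostly not taken, a few updates make the corrector reverse that weak prediction, while a strongly saturated TAGE counter can still outvote it.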
25 Use the same principle for local-history-biased branches!
26 Local Statistical Corrector predictor
[Figure: same structure, with an LGEHL-like corrector indexed by local history (LH) and address (A), read from a local history table, alongside TAGE. Storage labels: 478 Kbits and 30 Kbits.]
27 Local Statistical Corrector Predictor
- 8-9 % misprediction rate decrease over TAGE
- captures local history correlation AND statistically biased branches
- no need for a loop predictor
- small local history tables (32-64 entries)
State-of-the-art prediction accuracy, without the unrealistic tricks used at the 3rd CBP.
28 Managing speculative local history: not that easy
[Figure: in-flight branches, each holding a S(peculative) H(istory) and its P(rogram) C(ounter); a direct-mapped local history table; the local history and the TAGE prediction feed the statistical corrector. Speculative update: SH = (SH << 1) + pred.]
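A sketch of the speculative local history mechanism in the figure, assuming a 32-entry direct-mapped local history table and a simple in-order queue of in-flight branches. A fetched branch takes its local history from the youngest in-flight instance of the same PC (the associative search) or, failing that, from the table, then appends its predicted bit. For simplicity, misprediction recovery is modeled at retirement by flushing all younger in-flight branches; the class and method names are hypothetical.

```python
from collections import deque

class SpeculativeLocalHistory:
    def __init__(self, log_entries=5, hist_bits=16):
        self.size = 1 << log_entries          # direct-mapped local history table
        self.mask = (1 << hist_bits) - 1
        self.table = [0] * self.size          # committed local histories
        self.inflight = deque()               # (pc, SH) in fetch order

    def lookup(self, pc):
        # Associative search: the youngest in-flight instance of this PC wins
        for inflight_pc, sh in reversed(self.inflight):
            if inflight_pc == pc:
                return sh
        return self.table[pc % self.size]

    def fetch(self, pc, pred):
        sh = ((self.lookup(pc) << 1) + int(pred)) & self.mask   # SH = (SH << 1) + pred
        self.inflight.append((pc, sh))
        return sh

    def retire_oldest(self, taken):
        pc, sh = self.inflight.popleft()
        mispredicted = (sh & 1) != int(taken)
        self.table[pc % self.size] = (sh & ~1) | int(taken)    # commit the correct bit
        if mispredicted:
            self.inflight.clear()     # younger branches were on the wrong path
        return mispredicted
```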
29 Major local history management cost
The associative search on the in-flight branches.
It can be leveraged for another goal!
30 The « late update » mispredictions
Issue: some mispredictions are due to updates performed late, at retirement (later than resolution time).
Immediate Update Mimicker: try to catch these cases.
31 The Immediate Update Mimicker for TAGE
[Figure: in-flight branches from Fetch onwards, each labeled with its P(rediction) or E(xecuted) T(able) A(ddress in the table), PTA or ETA; labels highlight a misprediction and a match on the same table, same entry.]
1 % misprediction rate decrease.
32 The Immediate Update Mimicker
Marginal accuracy gain, but it can be combined with speculative local history management.
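A simplified sketch of our reading of the Immediate Update Mimicker: each in-flight branch records the TAGE table and entry that provided its prediction; when a new branch is predicted from the same table and same entry as an in-flight branch that has already executed, its resolved outcome is used instead of the not-yet-updated TAGE counter. The matching rule, the class and the field names are assumptions. The search over in-flight branches is the same kind of associative search as for speculative local history, which is why the two combine well.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InflightBranch:
    table: int                      # TAGE table that provided the prediction
    index: int                      # entry index within that table
    outcome: Optional[bool] = None  # resolved direction, set at execution

class ImmediateUpdateMimicker:
    def __init__(self):
        self.inflight = []          # in fetch order, oldest first

    def predict(self, table, index, tage_pred):
        """Use the outcome of the youngest executed in-flight branch on the same entry."""
        pred = tage_pred
        for br in reversed(self.inflight):
            if br.table == table and br.index == index and br.outcome is not None:
                pred = br.outcome
                break
        entry = InflightBranch(table, index)
        self.inflight.append(entry)
        return pred, entry

    def execute(self, entry, taken):
        entry.outcome = taken       # known at resolution, before retirement

    def retire(self):
        self.inflight.pop(0)        # the real TAGE update would happen here
```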
33 [Figure: MPPKI versus storage budget.]
34 Against alternative predictors
Outperforms the (not so realistic) podium of the 3rd Championship Branch Prediction:
- ISL-TAGE
- FTL++: GEHL + LGEHL based
- OH-SNAP: piecewise linear with varying weights
Particularly on the most predictable benchmarks.
35 Putting it all together
- Complexity and energy: 4-way interleaved tables, reduced accesses at retire time
- Accuracy: Local Statistical Corrector Predictor, Immediate Update Mimicker
≈ State-of-the-art predictor, cost-effective in silicon and energy.
36 Conclusion
Made a new case for TAGE.
Already known:
- state-of-the-art global history predictor
- confidence estimation for free
Established here:
- area- and energy-effective implementation with single-ported components
- accuracy improved with the Local Statistical Corrector Predictor
38 Some « hope » on less predictable benchmarks
[Figure: MPPKI on the less predictable benchmarks.]