TAGE-SC-L Again MTAGE-SC André Seznec INRIA/IRISA
Where do these predictors come from? GEHL: CBP 2004, ISCA 2005. TAGE: JILP 2006, CBP 2006. Statistical correlation: CBP 2011. Combining more info: Micro 2011, CBP 2014, Micro 2015. Optimizing everything: CBP 2016. Unlimited: CBP 2014, CBP 2016.
Around 2002: « long » global histories. Introduction of the perceptron predictor (Jimenez01). State of the art: the EV8 predictor, lagging behind the perceptron on a few benchmarks; with an EV8-like predictor, some applications would benefit from 100+ history bits. Both able to handle « long » global histories: 30+ branches.
GEOMETRIC HISTORY LENGTH PREDICTOR CBP 2004
A multiple-length global history predictor, with a limited number of tables. [Figure: tables T0 to T4, with history lengths L(1) to L(4), and their outputs summed (Σ).]
Underlying idea: H and H' are two history vectors equal on N bits, but differing on bit N+1, e.g. L(1) ≤ N < L(2). Branches (A,H) and (A,H') are biased in opposite directions. Table T2 should allow discriminating between (A,H) and (A,H').
GEometric History Length predictor: the set of history lengths forms a geometric series {0, 2, 4, 8, 16, 32, 64, 128}. What is important: L(i)-L(i-1) is drastically increasing. Spends most of the storage on short histories!!
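The geometric series of history lengths can be sketched as follows. This is a minimal illustration, not the CBP code: the function name and its parameters (`l1`, `lmax`, `n_tables`) are hypothetical, and the rounding rule L(i) = int(alpha^(i-1) * L(1) + 0.5) is an assumption chosen so that the last table uses exactly `lmax` bits.

```python
def geometric_history_lengths(l1, lmax, n_tables):
    """Return [L(0), ..., L(n_tables-1)] with L(0) = 0 (no history)
    and L(i) growing geometrically from l1 up to lmax."""
    if n_tables < 3:
        raise ValueError("need at least three tables")
    # Ratio chosen so that the last table uses lmax history bits (assumption).
    alpha = (lmax / l1) ** (1.0 / (n_tables - 2))
    lengths = [0]
    for i in range(1, n_tables):
        lengths.append(int(alpha ** (i - 1) * l1 + 0.5))
    return lengths


lengths = geometric_history_lengths(l1=2, lmax=128, n_tables=8)
# Differences L(i) - L(i-1): they grow with i, so most tables cover short histories.
gaps = [b - a for a, b in zip(lengths, lengths[1:])]
```

With `l1=2`, `lmax=128` and 8 tables this reproduces the series on the slide; half of the tables cover histories of 8 bits or fewer.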
GEHL (CBP 2004): neural inspired; use of 200+ bits of global history; narrow counters; dynamic threshold update.
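A GEHL-style prediction (sum of narrow counters, update under a dynamically adapted threshold) might look like the toy model below. It is a sketch under assumptions: the class name, the table size, the use of Python's `hash` as an index function, the centered sum `2*c + 1`, and the threshold-adaptation constants are all illustrative choices, not the submitted predictor.

```python
class Gehl:
    """Toy GEHL-style predictor: one table of signed counters per history length."""

    def __init__(self, lengths, log_size=10, ctr_bits=4, tc_limit=64):
        self.lengths = lengths
        self.size = 1 << log_size
        self.cmax = (1 << (ctr_bits - 1)) - 1      # narrow counters: [-8, 7] for 4 bits
        self.cmin = -(1 << (ctr_bits - 1))
        self.tables = [[0] * self.size for _ in lengths]
        self.theta = len(lengths)                  # update threshold, adapted at run time
        self.tc = 0                                # counter driving the threshold adaptation
        self.tc_limit = tc_limit
        self.hist = 0                              # global history, newest outcome in bit 0

    def _index(self, pc, i):
        h = self.hist & ((1 << self.lengths[i]) - 1)
        return hash((pc, i, h)) % self.size        # stand-in for a hardware hash function

    def _sum(self, pc):
        return sum(2 * self.tables[i][self._index(pc, i)] + 1
                   for i in range(len(self.lengths)))

    def predict(self, pc):
        return self._sum(pc) >= 0                  # the sign of the sum is the prediction

    def update(self, pc, taken):
        total = self._sum(pc)
        mispredicted = (total >= 0) != taken
        # Train only on a misprediction or when the sum is below the threshold.
        if mispredicted or abs(total) <= self.theta:
            for i in range(len(self.lengths)):
                idx = self._index(pc, i)
                c = self.tables[i][idx] + (1 if taken else -1)
                self.tables[i][idx] = max(self.cmin, min(self.cmax, c))
            # Dynamic threshold: raise it on mispredictions, lower it on
            # correct low-confidence predictions.
            self.tc += 1 if mispredicted else -1
            if self.tc >= self.tc_limit:
                self.theta, self.tc = self.theta + 1, 0
            elif self.tc <= -self.tc_limit:
                self.theta, self.tc = max(1, self.theta - 1), 0
        mask = (1 << max(self.lengths)) - 1
        self.hist = ((self.hist << 1) | int(taken)) & mask


never_taken = Gehl([0, 2, 4, 8, 16, 32, 64, 128])
for _ in range(10):
    never_taken.update(0x40, False)

always_taken = Gehl([0, 2, 4, 8, 16, 32, 64, 128])
for _ in range(50):
    always_taken.update(0x40, True)
```

After training, `never_taken.predict(0x40)` is false and `always_taken.predict(0x40)` is true; a real design would replace the Python integer history and `hash` with folded history registers.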
TAGE: TAgged GEometric history length predictor (JILP 2006)
At CBP 2004: only neural predictors, apart from the PPM-like predictor (Michaud 2004), but .. its update policy was poor.
TAGE (JILP 2006) Partial tag match almost .. Geometric history length Very effective update policy
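TAGE: tagged tables, and prediction by the longest history matching entry. [Figure: a tagless base predictor indexed with pc, and tagged tables indexed with pc and h[0:L1], h[0:L2], h[0:L3]; each tagged entry holds ctr, u and tag; tag comparisons (=?) select the prediction.]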
[Figure: tag comparison (=?) with labels Hit, Miss, Pred and Altpred.]
Prediction computation. General case: the longest matching component provides the prediction. Special case: many mispredictions on newly allocated entries (weak Ctr); on many applications, Altpred is then more accurate than Pred. This property is dynamically monitored through 4-bit counters.
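The selection between Pred and Altpred can be sketched as below. This is an illustration under assumptions: the `Entry` class, the signed counter encoding (ctr >= 0 means taken), the name `use_alt_on_na` for the monitoring counter, and the use of a single such counter are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    tag: int
    ctr: int      # signed 3-bit counter in [-4, 3]; ctr >= 0 predicts taken
    u: int = 0


def is_weak(ctr):
    """A weak counter is one step away from changing direction."""
    return ctr in (-1, 0)


def tage_predict(base_taken, hits, use_alt_on_na):
    """Return (final prediction, altpred).

    hits: tag-matching entries ordered from shortest to longest history (may be empty).
    use_alt_on_na: signed monitoring counter; >= 0 means Altpred has recently
    been more accurate than weak (newly allocated) providers.
    """
    if not hits:
        return base_taken, base_taken
    provider = hits[-1]                                   # longest matching component
    pred = provider.ctr >= 0
    altpred = hits[-2].ctr >= 0 if len(hits) > 1 else base_taken
    if is_weak(provider.ctr) and use_alt_on_na >= 0:
        return altpred, altpred                           # special case: trust Altpred
    return pred, altpred                                  # general case


general = tage_predict(False, [Entry(1, -3), Entry(2, 3)], use_alt_on_na=-1)
weak_case = tage_predict(False, [Entry(1, -3), Entry(2, 0)], use_alt_on_na=2)
no_hit = tage_predict(True, [], use_alt_on_na=0)
```

In the first call the longest match is confident and wins; in the second it is weak and the monitoring counter favors Altpred, so the shorter match provides the prediction.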
A tagged table entry (Tag, Ctr, U). Ctr: 3-bit prediction counter. U: 1 or 2-bit counter (was the entry recently useful?). Tag: partial tag.
Allocate entries on mispredictions: allocate entries in longer history length tables, on tables with U unset; set Ctr to Weak and U to 0. Limited storage budget: allocate 2 entries (when 15 to 20 different history lengths).
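The allocation rule can be sketched as follows. It is a simplified illustration: the dictionary-based entries, the function name and its arguments are hypothetical, and the sketch scans tables in order, whereas real implementations typically add some randomization and react to allocation failures.

```python
def allocate(tables, provider, indices, tags, taken, max_alloc=2):
    """On a misprediction, allocate up to max_alloc entries in tables using a
    longer history than the provider, only where U is unset.

    tables: list of tables (lists of dict entries), ordered by increasing history length.
    provider: number of the table that gave the prediction, -1 for the base predictor.
    indices, tags: per-table index and partial tag computed for this branch.
    Returns the numbers of the tables where an entry was allocated.
    """
    allocated = []
    for t in range(provider + 1, len(tables)):
        entry = tables[t][indices[t]]
        if entry["u"] == 0:                       # victim must not be marked useful
            entry["tag"] = tags[t]
            entry["ctr"] = 0 if taken else -1     # weak taken / weak not-taken
            entry["u"] = 0
            allocated.append(t)
            if len(allocated) == max_alloc:
                break
    return allocated


tables = [[{"tag": 0, "ctr": 0, "u": 0}] for _ in range(4)]
tables[1][0]["u"] = 1                             # this entry is protected
result = allocate(tables, provider=0, indices=[0] * 4, tags=[7] * 4, taken=True)
```

Here the protected entry in table 1 is skipped and the two allocations land in tables 2 and 3.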
Managing the (U)seful counter. Increment when the entry avoids a misprediction, i.e. (Pred = outcome) & (Altpred ≠ outcome): the entry becomes « useful ». Global decrement when it becomes « difficult » to allocate: many possible heuristics (« difficult » ≈ 2/3 of the entries useful). CBP 2016 heuristics: ≈ 0.5 % MPKI.
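One possible way to manage the useful counters is sketched below. It is only one heuristic among the many the slide alludes to, not the CBP 2016 one: the `UsefulMonitor` class, its `limit`, and the rule "count allocation failures minus successes" are assumptions made for illustration.

```python
def update_useful(entry, pred, altpred, taken, u_max=3):
    """Increment U when the provider avoided a misprediction:
    Pred was correct while Altpred would have been wrong."""
    if pred == taken and altpred != taken:
        entry["u"] = min(entry["u"] + 1, u_max)


class UsefulMonitor:
    """Triggers a global decrement of U when allocation becomes difficult."""

    def __init__(self, limit=1024):
        self.tick = 0
        self.limit = limit

    def record(self, n_fail, n_success, tables):
        """n_fail: candidate entries skipped because U was set;
        n_success: entries actually allocated. Returns True on a global decrement."""
        self.tick = max(0, self.tick + n_fail - n_success)
        if self.tick < self.limit:
            return False
        for table in tables:
            for entry in table:
                entry["u"] = max(0, entry["u"] - 1)   # age every entry
        self.tick = 0
        return True


entry = {"u": 0}
update_useful(entry, pred=True, altpred=False, taken=True)    # avoided a misprediction
update_useful(entry, pred=True, altpred=True, taken=True)     # Altpred also right: no change

monitor = UsefulMonitor(limit=4)
tables = [[{"u": 2}, {"u": 0}]]
first = monitor.record(n_fail=3, n_success=0, tables=tables)
second = monitor.record(n_fail=2, n_success=1, tables=tables)
```

The second call pushes the failure count to the limit, so every U counter is decremented once and the monitor restarts from zero.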
TAGE vs GEHL: at equal sizes, ≈ 10 % MPKI reduction. May vary with individual benchmarks!
Optimizations for CBP2016: ≈ 2 % MPKI reduction. Sharing storage space: small hist. sharing a bank-interleaved table, small tag (8 bits); long hist. sharing a bank-interleaved table, longer tag (12 bits). Partial associativity: 2 banks for medium hist. lengths.
TAGE + (G)SC: Statistical Corrector (Global history), CBP2011
From CBP 2011, « the Statistical Corrector targets »: branches with poor correlation with history, sometimes better predicted by a single wide PC-indexed counter than by TAGE. More generally, track cases such that: « For this (PC, history, prediction), TAGE is statistically likely (>50 %) to mispredict ».
TAGE-GSC (CBP 2011; was named a posteriori in Micro 2015): ≈ 3-5 % MPKI red. [Figure: the (main) TAGE predictor, with inputs PC + global history, sends its prediction + confidence to the Stat. Cor., which also takes PC + global history.] The Stat. Cor. is just a global hist. neural predictor + tables indexed with PC, TAGE pred. and confidence.
Confidence for TAGE (HPCA 2011): the value of the counter providing the prediction. Saturated = high confidence; intermediate = medium confidence; weak = low confidence.
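The three confidence classes can be read directly from the providing counter. The sketch below assumes a signed 3-bit counter in [-4, 3] where ctr >= 0 means taken; the function name and the string labels are illustrative.

```python
def tage_confidence(ctr, ctr_bits=3):
    """Classify the providing counter: saturated, intermediate or weak."""
    magnitude = abs(2 * ctr + 1)            # centered value: 1, 3, 5 or 7 for 3 bits
    if magnitude == (1 << ctr_bits) - 1:
        return "high"                       # saturated counter
    if magnitude == 1:
        return "low"                        # weak counter
    return "medium"                         # intermediate counter


classes = {ctr: tage_confidence(ctr) for ctr in range(-4, 4)}
```

For 3-bit counters this gives high confidence for -4 and 3, low confidence for -1 and 0, and medium confidence for the four remaining values.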
Why does it work? The bias tables are indexed with PC + TAGE outputs. When TAGE is correct (most of the time): a high counter value dominates, and there are not many updates. When TAGE is wrong: the other counters can be trained, and the (statistical) correlation, if it exists, can be captured.
Optimizations for CBP 2016. Use TAGE confidence for indexing SC: ≈ 1 % MPKI red. On (very) low SC confidence, may use the TAGE prediction (if high conf, ..): ≈ 0.4 % MPKI red.
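Both optimizations can be sketched in a few lines. This is a hedged illustration: the index layout in `bias_index`, the `low_threshold` parameter and the exact fallback condition in `final_prediction` are assumptions, since the slide leaves the details open.

```python
CONF_CODE = {"low": 0, "medium": 1, "high": 2}


def bias_index(pc, tage_taken, tage_conf, log_size=10):
    """Index a bias table of the SC with PC, TAGE prediction and TAGE confidence."""
    key = (pc << 3) | (CONF_CODE[tage_conf] << 1) | int(tage_taken)
    return key & ((1 << log_size) - 1)


def final_prediction(tage_taken, tage_conf, sc_sum, low_threshold):
    """sc_sum: signed sum of the SC counters; its sign is the SC prediction
    and its magnitude the SC confidence."""
    sc_taken = sc_sum >= 0
    if sc_taken == tage_taken:
        return tage_taken                     # both agree
    if abs(sc_sum) < low_threshold and tage_conf == "high":
        return tage_taken                     # very unsure SC vs confident TAGE
    return sc_taken                           # otherwise the corrector overrides TAGE


indices = {bias_index(0x1234, t, c) for t in (False, True) for c in CONF_CODE}
kept = final_prediction(True, "high", sc_sum=-2, low_threshold=8)
reverted = final_prediction(True, "low", sc_sum=-2, low_threshold=8)
strong = final_prediction(True, "high", sc_sum=-40, low_threshold=8)
```

For one PC, the six (prediction, confidence) pairs map to six distinct bias counters, so a branch that TAGE mispredicts only at low confidence can be corrected without disturbing its high-confidence behaviour.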
TAGE-SC: the beauty of neural predictors (Micro 2011, CBP 2014, Micro 2015)
From Compaq in 1999, I learnt: use global history, avoid local history. Did manage to submit only global history at CBP 2004, 2006 and 2011. OK, I cheated with loops.
Speculative history must be managed!? Local history: the table of histories is (unspeculatively) updated, so one must maintain a speculative history per inflight branch: associative search, etc ?!? Global history: append a bit on a single history register; use of a circular buffer and just a pointer to speculatively manage the history.
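The circular buffer and pointer scheme for global history can be sketched as follows. The class and method names are hypothetical, and the sketch assumes the buffer is large enough that wrong-path pushes never overwrite bits that are still needed.

```python
class SpeculativeGlobalHistory:
    """Global history in a circular buffer; recovery only restores a pointer."""

    def __init__(self, log_size=12):
        self.size = 1 << log_size
        self.buf = [0] * self.size
        self.ptr = 0                          # position of the most recent outcome

    def push(self, taken):
        """Speculatively append the predicted (or resolved) outcome."""
        self.ptr = (self.ptr + 1) % self.size
        self.buf[self.ptr] = int(taken)

    def checkpoint(self):
        """Saved with each inflight branch: just the pointer."""
        return self.ptr

    def restore(self, ptr):
        """On a misprediction, discard all wrong-path history at once."""
        self.ptr = ptr

    def recent(self, n):
        """The n most recent outcomes, newest first."""
        return [self.buf[(self.ptr - age) % self.size] for age in range(n)]


ghist = SpeculativeGlobalHistory()
for outcome in (True, False, True):
    ghist.push(outcome)
saved = ghist.checkpoint()        # taken before predicting the next branch
ghist.push(True)                  # mispredicted branch, wrong direction
ghist.push(True)                  # a wrong-path branch
ghist.restore(saved)              # misprediction detected
ghist.push(False)                 # append the correct outcome
```

After recovery, `ghist.recent(4)` is `[0, 1, 0, 1]`: the wrong-path bits are simply overwritten. A local history table has no such single pointer, which is why it needs a speculative history per inflight branch.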
Would not have won CBP 2014 without using local history
How to use local histories with TAGE+(G)SC: add the local history tables in the neural SC, as in the perceptron [Jimenez2002]. ≈ 0.9 % MPKI reduction with 2 Kbits on the 8KB predictor; ≈ 2.5 % MPKI reduction with 28 Kbits on the 64KB predictor. I DO NOT ADVOCATE FOR LOCAL HISTORIES IN REAL HARDWARE PROCESSORS
The beauty of neural predictors TAGE-SC: Just the right framework to test information vectors Add extra tables: some benefit ! continue to explore
Can add extra components in SC. IMLI-based components (Micro2015): capture correlation in multidimensional loops; very disappointing results, essentially no benefit on CBP5 traces. Other forms of history: e.g. only backward branches.
TAGE-SC-L + a loop predictor (just in case)
Loop predictor: can predict loop exit for loops with large iteration numbers and a regular number of iterations. Limited storage budget (a few entries), but marginal benefit. I DO NOT ADVOCATE FOR LOCAL HISTORIES IN REAL HARDWARE PROCESSORS
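A loop predictor entry can be sketched as below. It is a toy model under assumptions: the loop branch is taken while iterating and not taken on exit, the entry is already associated with one branch (no tags or replacement), and the field names and the confidence threshold are hypothetical.

```python
class LoopEntry:
    """Learns the trip count of one loop branch and predicts its exit."""

    def __init__(self):
        self.trip = 0     # taken iterations seen in the last complete execution
        self.cur = 0      # taken iterations seen in the current execution
        self.conf = 0     # consecutive executions with the same trip count

    def predict(self, conf_min=3):
        """Return None when not confident, else the predicted direction."""
        if self.conf < conf_min:
            return None
        return self.cur != self.trip          # taken until the trip count is reached

    def update(self, taken, conf_max=3):
        if taken:
            self.cur += 1
            if self.trip and self.cur > self.trip:
                self.conf = 0                  # the loop ran longer than learnt
        else:
            if self.cur == self.trip:
                self.conf = min(self.conf + 1, conf_max)
            else:
                self.trip, self.conf = self.cur, 0
            self.cur = 0


loop = LoopEntry()
untrained = loop.predict()
pattern = [True] * 5 + [False]                 # 5 iterations, then the exit
for _ in range(4):
    for outcome in pattern:
        loop.update(outcome)
predictions = []
for outcome in pattern:
    predictions.append(loop.predict())
    loop.update(outcome)
```

Once the same trip count has been observed on enough consecutive executions, the entry predicts the five taken iterations and then the exit; before that it stays silent and the main predictor is used.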
TAGE-SC-L summary for CBP-5. Most of the budget on global hist. correlation: TAGE with ≈ 1200 br. for 64 KB and ≈ 400 br. for 8 KB; optimize the storage sharing; optimize the allocation. Track the statistical correlation with a neural component: use TAGE prediction AND confidence; incorporate other forms of history (even local history if you are trying to win CBP-5).
MTAGE-SC: TAGE-SC-L is still far from the predictability limits
poTAGE-SC: the previous champion poTAGE+COLT (Michaud2014) and TAGE-SC-L
poTAGE + COLT (Michaud2014): use the TAGE concept on other forms of hist. [Figure: TAGE predictors on global history, local history 1, local history 2, local history 3 and frequency; COLT selection through a (PC + 5 pred) indexed table.]
Unlimited TAGE-SC. [Figure: blocks labeled TAGE predictor (global history), Statistical Corrector (Bias, GEHL, RHSP, other GEHL and perceptrons, ...) and final chooser.]
poTAGE-SC. [Figure: blocks labeled TAGE predictors (global history, local history 1, local history 2, local history 3, frequency), COLT selection, Statistical Corrector (Bias, GEHL, RHSP, other GEHL and perceptrons, ...) and final chooser.]
MTAGE-SC. [Figure: blocks labeled TAGE predictors (global history, local history 1, local history 2, frequency, global backward history), TAGE prediction combiner, Statistical Corrector (Bias, GEHL, RHSP, other GEHL and perceptrons, ...) and final chooser.]
MTAGE-SC: ≈ 5 % MPKI reduction over poTAGE-SC. [Figure: blocks labeled TAGE predictors (global history, local history 1, local history 2, local history 3, frequency, global backward history), TAGE prediction combiner, Statistical Corrector (Bias, GEHL, RHSP, other GEHL and perceptrons, ...) and final chooser.] Leverages confidence from SC and from the TAGE pred. combiner. TAGE prediction combiner: COLT pred + neural combination of outputs (pred + confidence). Global backward history: to capture long path correlation, but eliminate intermediate branches. A few extra history forms: IMLI, ..
Seems that I am not making progress!! CBP 2006 misp. rate: 32KB L-TAGE ≈ 1.22 × GTL. CBP 2014 misp. rate: 32KB TAGE-SC-L ≈ 1.40 × poTAGE-SC. CBP 2016 misp. rate: 64KB TAGE-SC-L ≈ 1.55 × MTAGE-SC. Not the same traces, but ..
Conclusion. TAGE-SC-L fits limited storage sizes; most significant optimizations over CBP 2014: use of TAGE confidence as index for SC, sharing and partial associativity. MTAGE-SC: predictability limits are even (a little bit) further than previously expected.