1
TAGE-SC-L Again
MTAGE-SC
André Seznec INRIA/IRISA
2
Where do these predictors come from?
GEHL: CBP 2004, ISCA 2005
TAGE: JILP 2006, CBP 2006
Statistical correlation: CBP 2011
Combining more info: Micro 2011, CBP 2014, Micro 2015
Optimizing everything: CBP 2016
Unlimited: CBP 2014, CBP 2016
3
Around 2002
Introduction of the perceptron predictor (Jimenez01)
State of the art: the EV8 predictor, lagging behind the perceptron on a few benchmarks
+ with EV8-like predictors: some applications would benefit from 100+ history bits
Both able to handle « long » global histories: 30+ branches
4
GEOMETRIC HISTORY LENGTH PREDICTOR
CBP 2004
5
A multiple-length global history predictor
[Figure: tables T0 to T4, history lengths L(1) to L(4), and an adder (Σ) combining the table outputs]
With a limited number of tables
6
Underlying idea
H and H’ are two history vectors equal on N bits, but differing on bit N+1, e.g. L(1) ≤ N < L(2)
Branches (A,H) and (A,H’) are biased in opposite directions
Table T2 should allow discriminating between (A,H) and (A,H’)
7
GEometric History Length predictor
The set of history lengths forms a geometric series: {0, 2, 4, 8, 16, 32, 64, 128}
What is important: L(i)-L(i-1) increases drastically with i
Most of the storage is spent on short histories !!
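As a rough illustration of the series above, the lengths can be generated from a first length L(1) and a ratio alpha. The function name, the parameter names and the rounding convention are assumptions made for this sketch; they do not come from the slides.

```python
def geometric_history_lengths(first_length, alpha, num_tables):
    """Return [L(0), ..., L(num_tables - 1)] with L(0) = 0 (no history)."""
    lengths = [0]
    for i in range(1, num_tables):
        # L(i) = alpha^(i-1) * L(1), rounded to the nearest integer
        # (assumed rounding convention).
        lengths.append(int(alpha ** (i - 1) * first_length + 0.5))
    return lengths


lengths = geometric_history_lengths(first_length=2, alpha=2.0, num_tables=8)
# Gap between consecutive lengths, L(i) - L(i-1).
gaps = [b - a for a, b in zip(lengths, lengths[1:])]
print(lengths)
print(gaps)
```

With L(1) = 2 and alpha = 2 this reproduces the series on the slide. The gaps grow with i, so half of the eight tables cover histories of at most 8 bits.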
8
GEHL (CBP 2004)
Neural inspired
Use of 200+ bits of global history
Narrow counters
Dynamic threshold update
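A minimal sketch of a neural-inspired sum of narrow counters with a threshold-based update may help here. The 4-bit counter width, the plain signed sum, and the threshold adaptation heuristic (`adapt_threshold`, `balance`, `limit`) are all assumptions chosen for illustration; the slides do not give these details.

```python
CTR_MIN, CTR_MAX = -8, 7  # 4-bit signed counters (assumed width)


def gehl_predict(counters):
    """counters: the counter read from each table for the current branch."""
    total = sum(counters)
    return total >= 0, total  # (predicted taken?, signed sum)


def gehl_update(counters, taken, threshold):
    """Train only on a misprediction or when |sum| is below the threshold."""
    predicted, total = gehl_predict(counters)
    if predicted == taken and abs(total) >= threshold:
        return list(counters)  # confident and correct: leave the counters alone
    step = 1 if taken else -1
    return [min(CTR_MAX, max(CTR_MIN, c + step)) for c in counters]


def adapt_threshold(threshold, balance, mispredicted, limit=64):
    """Assumed heuristic, meant to be called once per counter update: keep
    updates caused by mispredictions and updates caused by correct low-sum
    predictions roughly balanced. Returns (new_threshold, new_balance)."""
    balance += 1 if mispredicted else -1
    if balance >= limit:
        return threshold + 1, 0
    if balance <= -limit:
        return max(1, threshold - 1), 0
    return threshold, balance
```

For example, `gehl_update([3, -1, -2, 1], True, 4)` trains every counter even though the prediction is correct, because the sum (1) is below the threshold.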
9
TAGE
TAgged GEometric history length predictor
JILP 2006
10
At CBP 2004, only neural predictors
apart from the PPM-like predictor (Michaud 2004)
but .. its update policy was poor
11
TAGE (JILP 2006)
Partial tag match
almost ..
Geometric history length
Very effective update policy
12
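TAGE: tagged tables, and prediction by the longest-history matching entry
[Figure: a tagless base predictor indexed with pc, and tagged tables indexed with pc and h[0:L1], h[0:L2], h[0:L3]; each tagged entry holds ctr, tag and u; tag comparisons (=?) select the prediction]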
13
[Figure: tag comparisons (=?) giving Hit or Miss on each table, and selecting Pred and Altpred]
14
Prediction computation
General case: the longest matching component provides the prediction
Special case: many mispredictions on newly allocated entries (weak Ctr)
On many applications, Altpred is more accurate than Pred
Property dynamically monitored through 4-bit counters
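The selection rule can be sketched as follows. This is an illustrative sketch under stated assumptions: `TaggedEntry`, `tage_predict` and `use_alt_on_weak` are hypothetical names, the counter is taken to be a signed 3-bit value, and a single signed monitoring counter stands in for the 4-bit counters mentioned on the slide.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TaggedEntry:
    ctr: int  # signed 3-bit prediction counter in [-4, 3]; taken when ctr >= 0


def tage_predict(hits: List[Optional[TaggedEntry]], base_taken: bool,
                 use_alt_on_weak: int) -> Tuple[bool, bool, bool]:
    """hits[i] is the tag-matching entry of table i (shortest history first),
    or None on a tag miss. Returns (final, pred, altpred)."""
    matching = [e for e in hits if e is not None]
    if not matching:
        return base_taken, base_taken, base_taken
    pred = matching[-1].ctr >= 0            # longest matching history
    altpred = matching[-2].ctr >= 0 if len(matching) > 1 else base_taken
    weak = matching[-1].ctr in (-1, 0)      # typical of a newly allocated entry
    # Assumption: a non-negative monitoring counter means Altpred has recently
    # been more accurate than Pred on weak entries.
    final = altpred if (weak and use_alt_on_weak >= 0) else pred
    return final, pred, altpred
```

For instance, with `hits = [None, TaggedEntry(3), TaggedEntry(-1)]` the longest match is weak and predicts not taken, while Altpred predicts taken; the final prediction then depends on the sign of `use_alt_on_weak`.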
15
A tagged table entry
Ctr: 3-bit prediction counter
U: 1- or 2-bit counter. Was the entry recently useful?
Tag: partial tag
[Entry layout: Tag | Ctr | U]
16
Allocate entries on mispredictions
Allocate entries in longer history length tables
On tables with U unset
Set Ctr to Weak and U to 0
Limited storage budget: allocate 2 entries (when 15 to 20 different history lengths)
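The allocation steps above can be sketched as below. The data layout (one dict per table, holding the entry indexed by the current branch), the function name and the `new_tags` argument are assumptions made for this example only.

```python
def allocate_on_misprediction(entries, provider, taken, new_tags, max_alloc=2):
    """entries[i]: dict with keys 'tag', 'ctr', 'u' read from table i
    (shortest history first). provider: index of the table that gave the
    prediction, or -1 for the base predictor. Returns the tables allocated."""
    allocated = []
    for i in range(provider + 1, len(entries)):   # longer histories only
        if len(allocated) == max_alloc:
            break
        if entries[i]["u"] == 0:                  # never steal a useful entry
            entries[i]["tag"] = new_tags[i]
            entries[i]["ctr"] = 0 if taken else -1  # weak, in the outcome direction
            entries[i]["u"] = 0
            allocated.append(i)
    return allocated
```

Capping the number of allocations at two follows the slide; it limits how much of the storage a single misprediction can claim.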
17
Managing the (U)seful counter
Increment when the entry avoids a misprediction: (Pred = taken) & (Altpred ≠ taken). The entry becomes « useful »
Global decrement when it becomes « difficult » to allocate
Many possible heuristics (« difficult » ≈ 2/3 of the entries useful)
CBP 2016 heuristics: ≈ 0.5 % MPKI
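One possible way to express this policy is sketched below. The slide notes that many heuristics exist; the `AllocationMonitor` class, its `limit` value and the choice of a 2-bit counter are assumptions for illustration, not the CBP 2016 heuristic.

```python
U_MAX = 3  # 2-bit useful counter (the slides allow 1 or 2 bits)


def update_useful(u, pred, altpred, outcome):
    """Increment u when the provider was right and Altpred would have been wrong."""
    if pred == outcome and altpred != outcome:
        return min(U_MAX, u + 1)
    return u


class AllocationMonitor:
    """Assumed heuristic: weigh allocation failures against successes and
    request a global decrement of every u when failures dominate."""

    def __init__(self, limit=8):
        self.tick = 0
        self.limit = limit

    def record(self, allocated):
        """Return True when the caller should decrement all useful counters."""
        self.tick = max(0, self.tick + (-1 if allocated else 1))
        if self.tick >= self.limit:
            self.tick = 0
            return True
        return False


def global_decrement(useful_counters):
    """Age every entry so that allocation becomes possible again."""
    return [max(0, u - 1) for u in useful_counters]
```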
18
TAGE vs GEHL
At equal sizes: ≈ 10 % MPKI reduction
May vary with individual benchmarks !
19
Optimizations for CBP 2016
Sharing storage space: ≈ 2 % MPKI reduction
- Short histories share a bank-interleaved table, with a small tag (8 bits)
- Long histories share a bank-interleaved table, with a longer tag (12 bits)
Partial associativity: ≈ 2 % MPKI reduction
- 2 banks for medium history lengths
20
TAGE + (G)SC
Statistical Corrector (Global history)
CBP 2011
21
From CBP 2011: « the Statistical Corrector targets »
Branches with poor correlation with history: sometimes better predicted by a single wide PC-indexed counter than by TAGE
More generally, track cases such that, statistically: « For this (PC, history, prediction), TAGE is likely (>50 %) to mispredict »
22
TAGE-GSC (CBP 2011) (was named a posteriori in Micro 2015)
≈ 3-5 % MPKI reduction
[Figure: the (main) TAGE predictor, indexed with PC + global history, passes its prediction + confidence to the statistical corrector, itself indexed with PC + global history]
Just a global history neural predictor + tables indexed with PC, TAGE prediction and confidence
23
Confidence for TAGE (HPCA 2011)
The value of the counter providing the prediction:
Saturated = high confidence
Intermediate = medium confidence
Weak = low confidence
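The three classes can be illustrated with a small sketch. It assumes a signed 3-bit counter where -1 and 0 are the weak states and -4 and 3 the saturated ones; the function name and the string labels are hypothetical.

```python
def tage_confidence(ctr, ctr_bits=3):
    """ctr is a signed counter in [-2**(ctr_bits-1), 2**(ctr_bits-1) - 1]."""
    magnitude = abs(2 * ctr + 1)       # 1, 3, 5 or 7 for a 3-bit counter
    if magnitude == 2 ** ctr_bits - 1:
        return "high"                  # saturated: -4 or 3
    if magnitude == 1:
        return "low"                   # weak: -1 or 0
    return "medium"                    # everything in between


print([tage_confidence(c) for c in range(-4, 4)])
```

Using |2·ctr + 1| makes the two halves of the counter range symmetric, so taken and not-taken predictions are classified the same way.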
24
Why does it work
The bias tables indexed with PC + TAGE outputs:
- Correct (most of the time): high counter value, dominates, not many updates
- Wrong: other counters can be trained; (statistical) correlation, if it exists, can be captured
25
Optimizations for CBP 2016
Use TAGE confidence for indexing SC: ≈ 1 % MPKI reduction
On (very) low SC confidence, may use the TAGE prediction (if high conf, ..): ≈ 0.4 % MPKI reduction
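A simplified sketch of how the corrector output and the TAGE prediction might be arbitrated is given below. The single threshold `low_sum_threshold`, its value, and the string confidence labels are assumptions; the actual CBP 2016 selection logic is not described on the slide.

```python
def final_prediction(tage_taken, tage_confidence, sc_counters,
                     low_sum_threshold=4):
    """sc_counters: signed counters read from the corrector tables, trained
    toward the branch outcome. The threshold value is an assumption."""
    sc_sum = sum(sc_counters)
    sc_taken = sc_sum >= 0
    if sc_taken == tage_taken:
        return tage_taken
    # The corrector disagrees with TAGE.
    if abs(sc_sum) < low_sum_threshold and tage_confidence == "high":
        return tage_taken   # (very) low SC confidence: trust a confident TAGE
    return sc_taken         # otherwise the corrector reverts the prediction
```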
26
TAGE-SC
The beauty of neural predictors
Micro 2011, CBP 2014, Micro 2015
27
From Compaq in 1999
I learnt: use global history, avoid local history
Did manage to submit only global history at CBP 2004 and 2011
OK, I cheated with loops
28
Speculative history must be managed !?
Local history:
- table of histories (unspeculatively updated)
- must maintain a speculative history per inflight branch: associative search, etc ?!?
Global history:
- append a bit on a single history register
- use of a circular buffer and just a pointer to speculatively manage the history
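The circular-buffer scheme for the global history can be sketched as follows. Class and method names are hypothetical, and the sketch assumes the buffer is larger than the longest history read plus the number of in-flight branches, so that wrong-path writes never overwrite bits still needed.

```python
class SpeculativeGlobalHistory:
    """Global history kept in a circular buffer, managed with one pointer."""

    def __init__(self, size=1024):
        self.buf = [0] * size
        self.size = size
        self.head = 0  # position of the next bit to write

    def push(self, taken):
        """Speculatively append one outcome; return a checkpoint (the pointer)."""
        checkpoint = self.head
        self.buf[self.head % self.size] = int(taken)
        self.head += 1
        return checkpoint

    def recover(self, checkpoint, actual_taken):
        """On a misprediction: restore the pointer, then append the real outcome."""
        self.head = checkpoint
        self.push(actual_taken)

    def recent(self, n):
        """The n most recent outcomes, newest first."""
        return [self.buf[(self.head - 1 - i) % self.size] for i in range(n)]


history = SpeculativeGlobalHistory()
history.push(True)
checkpoint = history.push(False)   # this branch turns out to be mispredicted
history.push(True)                 # wrong-path branches
history.push(True)
history.recover(checkpoint, True)
print(history.recent(2))
```

Recovery only rewinds a pointer and rewrites one bit, which is the contrast with local history drawn on the slide: no per-branch speculative state has to be searched.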
29
Would not have won CBP 2014 without using local history
30
How to use local histories with TAGE+(G)SC
Add the local history tables in the neural SC, as in the perceptron [Jimenez2002]
≈ 0.9 % MPKI reduction with 2 Kbits on the 8 KB predictor
≈ 2.5 % MPKI reduction with 28 Kbits on the 64 KB predictor
I DO NOT ADVOCATE FOR LOCAL HISTORIES IN REAL HARDWARE PROCESSORS
31
The beauty of neural predictors
TAGE-SC: just the right framework to test information vectors
Add extra tables: some benefit ! Continue to explore
32
Can add extra components in SC
IMLI-based components (Micro 2015): capture correlation in multidimensional loops
Very disappointing results: essentially no benefit on CBP5 traces
Other forms of history, e.g. only backward branches
33
TAGE-SC-L + a loop predictor (just in case)
34
Loop predictor
Can predict loop exit for loops with:
- large iteration numbers
- a regular number of iterations
Limited storage budget (a few entries)
But marginal benefit
I DO NOT ADVOCATE FOR LOCAL HISTORIES IN REAL HARDWARE PROCESSORS
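A minimal sketch of a single loop predictor entry is shown below, assuming a loop branch that is taken to iterate and not taken to exit. The class name, the confidence threshold `CONF_MAX` and the update rules are assumptions for illustration; tagging and entry replacement are left out.

```python
CONF_MAX = 3  # consecutive identical trip counts required (assumed value)


class LoopEntry:
    """Tracks one loop branch: taken to iterate, not taken to exit."""

    def __init__(self):
        self.trip_count = 0   # taken outcomes seen in the last complete execution
        self.current = 0      # taken outcomes seen in the current execution
        self.confidence = 0

    def predict(self):
        """Return (valid, taken); valid stays False until the count looks regular."""
        return self.confidence >= CONF_MAX, self.current < self.trip_count

    def update(self, taken):
        if taken:
            self.current += 1
            if self.current > self.trip_count:
                self.confidence = 0      # ran longer than the recorded count
        else:
            if self.current == self.trip_count:
                self.confidence = min(CONF_MAX, self.confidence + 1)
            else:
                self.trip_count, self.confidence = self.current, 0
            self.current = 0
```

Once the same trip count has been observed several times in a row, the entry predicts the exit exactly, which a history-based predictor cannot do when the iteration count exceeds its history length.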
35
TAGE-SC-L summary for CBP-5
Most of the budget on global history correlation:
- TAGE with ≈ 1200 branches for 64 KB and ≈ 400 branches for 8 KB
- optimize the storage sharing
- optimize the allocation
Track the statistical correlation with a neural component:
- use TAGE prediction AND confidence
- incorporate other forms of history (even local history if you are trying to win CBP-5)
36
MTAGE-SC
TAGE-SC-L is still far from the predictability limits
37
poTAGE-SC: the previous champion
poTAGE+COLT (Michaud2014) and TAGE-SC-L
38
poTAGE + COLT (Michaud2014)
[Figure: five TAGE predictors (global history, local history 1, local history 2, local history 3, frequency) feeding the COLT selection, a table indexed with (PC + 5 pred)]
Use the TAGE concept on other forms of history
39
Unlimited TAGE-SC
[Figure: a TAGE predictor on global history feeds the Statistical Corrector (Bias, GEHL, RHSP, other GEHL and perceptrons, ...), followed by the final chooser]
40
poTAGE-SC
[Figure: TAGE predictors (global history, local history 1, local history 2, local history 3, frequency) with COLT selection, feeding the Statistical Corrector (Bias, GEHL, RHSP, other GEHL and perceptrons, ...) and the final chooser]
41
MTAGE-SC
[Figure: TAGE predictors (global history, local history 1, local history 2, frequency, global backward history) with a TAGE prediction combiner, feeding the Statistical Corrector (Bias, GEHL, RHSP, other GEHL and perceptrons, ...) and the final chooser]
42
MTAGE-SC
≈ 5 % MPKI reduction over poTAGE-SC
[Figure: TAGE predictors (global history, local history 1, local history 2, local history 3, frequency, global backward history) with a TAGE prediction combiner, feeding the Statistical Corrector (Bias, GEHL, RHSP, other GEHL and perceptrons, ...) and the final chooser]
Leverages confidence from SC and from the TAGE prediction combiner
TAGE prediction combiner: COLT pred + neural combination of outputs (pred + confidence)
Global backward history: to capture long path correlation, but eliminate intermediate branches
A few extra history forms: IMLI, ..
43
Seems that I am not making progress !!
CBP 2006 misp. rate: 32 KB L-TAGE ≈ 1.22 × GTL
CBP 2014 misp. rate: 32 KB TAGE-SC-L ≈ 1.40 × poTAGE-SC
CBP 2016 misp. rate: 64 KB TAGE-SC-L ≈ 1.55 × MTAGE-SC
Not the same traces, but ..
44
Conclusion
TAGE-SC-L fits limited storage sizes
Most significant optimizations over CBP 2014:
- use of TAGE confidence as index for SC
- sharing and partial associativity
MTAGE-SC: predictability limits even (a little bit) further than previously expected