Published by Grant Williamson. Modified over 9 years ago.
1 Storage Free Confidence Estimator for the TAGE predictor
André Seznec, IRISA/INRIA
2 Why confidence estimation for branch predictors
- Energy/performance tradeoffs:
  - Guiding fetch gating or fetch throttling
  - Dynamic resizing of speculative structures
- Controlling SMT resource allocation through fetch policies:
  - Fetch the "most" useful instructions
- Dual-path execution
3 What is confidence estimation?
- Assign a confidence to a prediction: is the prediction likely to be correct?
- Generally, discriminate only between low and high confidence predictions:
  - High confidence: "very likely" to be correct
  - Low confidence: "not so likely" to be correct
4 Confidence estimation for branch predictors
- 1981, Jim Smith: predictions from weak counters are more likely to be mispredicted
- 1996, Jacobsen, Rotenberg, Smith (JRS): gshare-like table of 4-bit counters
  - Increment on a correct prediction, reset on a misprediction
  - Low confidence < threshold ≤ high confidence
- 1998, enhanced JRS (Grunwald et al.): use the prediction in the index
- A few other proposals, e.g. self-confidence for perceptrons
- Most studies still use enhanced JRS confidence estimators
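The JRS scheme above can be sketched in a few lines. This is a minimal illustration, not the published design: the table size, the XOR hash, the default threshold and the way the prediction bit is folded into the index for the enhanced variant are all assumptions made for the example.

```python
class JRSEstimator:
    """Table of resetting counters in the spirit of the JRS scheme (sketch)."""

    def __init__(self, log_size=12, ctr_bits=4, threshold=15, use_prediction=False):
        self.size_mask = (1 << log_size) - 1      # assumed table size
        self.ctr_max = (1 << ctr_bits) - 1        # 4-bit counters saturate at 15
        self.threshold = threshold                # assumed threshold value
        self.use_prediction = use_prediction      # True: "enhanced" indexing
        self.table = [0] * (1 << log_size)

    def _index(self, pc, history, prediction):
        index = pc ^ history                      # gshare-like hash (assumed form)
        if self.use_prediction:
            # Enhanced variant: the predicted direction takes part in the index.
            index = (index << 1) | int(prediction)
        return index & self.size_mask

    def high_confidence(self, pc, history, prediction=False):
        # low confidence < threshold <= high confidence
        return self.table[self._index(pc, history, prediction)] >= self.threshold

    def update(self, pc, history, correct, prediction=False):
        i = self._index(pc, history, prediction)
        # Increment on a correct prediction, reset on a misprediction.
        self.table[i] = min(self.table[i] + 1, self.ctr_max) if correct else 0
```

A single misprediction sends the entry back to zero, so an entry only reaches high confidence after an uninterrupted run of correct predictions.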
5 Metrics for confidence estimators (Grunwald et al. 1998)
- SENS, sensitivity: fraction of correct predictions classified as high confidence
- PVP, predictive value of a positive test: probability that a high confidence prediction is correct
- SPEC, specificity: fraction of mispredictions classified as low confidence
- PVN, predictive value of a negative test: probability that a low confidence prediction is mispredicted
- Different qualities for different usages
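The four metrics follow directly from the counts of a 2x2 table (correct or wrong prediction, high or low confidence). A small sketch, with hypothetical function and argument names:

```python
def confidence_metrics(correct_high, correct_low, wrong_high, wrong_low):
    """Compute SENS, PVP, SPEC and PVN from the four outcome counts.

    correct_high: correct predictions classified as high confidence
    correct_low:  correct predictions classified as low confidence
    wrong_high:   mispredictions classified as high confidence
    wrong_low:    mispredictions classified as low confidence
    """
    return {
        # Fraction of correct predictions classified as high confidence.
        "SENS": correct_high / (correct_high + correct_low),
        # Probability that a high confidence prediction is correct.
        "PVP": correct_high / (correct_high + wrong_high),
        # Fraction of mispredictions classified as low confidence.
        "SPEC": wrong_low / (wrong_high + wrong_low),
        # Probability that a low confidence prediction is mispredicted.
        "PVN": wrong_low / (correct_low + wrong_low),
    }
```

For example, `confidence_metrics(900, 50, 10, 40)` describes an estimator that catches 80 % of the mispredictions (SPEC) while fewer than half of its low confidence flags are actual mispredictions (PVN).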
6 The current limits of confidence prediction
- Discriminating between high and low confidence is insufficient:
  - What is the misprediction rate on high and on low confidence predictions?
  - Malik et al.: use the probability associated with each counter value on an enhanced JRS
- Enhanced JRS and state-of-the-art branch predictors?
  - Each predictor needs its own confidence estimator
7 This study
- Cost-effective confidence estimator for TAGE
- No storage overhead
- Discriminate between:
  - Low confidence predictions: ≈ 30 % misprediction rate or more
  - Medium confidence predictions: 8-15 % misprediction rate
  - High confidence predictions: < 1 % misprediction rate
8 TAGE: multiple tables, global history predictor
- The set of history lengths forms a geometric series: {0, 2, 4, 8, 16, 32, 64, 128}
- What is important: L(i) - L(i-1) increases drastically with i
  - Most of the storage is devoted to short histories!
- Captures correlation on very long histories
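The example series can be regenerated from its two end points. The sketch below assumes the rounded form L(i) = int(alpha^(i-1) * L(1) + 0.5), with alpha derived from the shortest and longest tagged history lengths; the function and parameter names are hypothetical.

```python
def geometric_history_lengths(min_length, max_length, num_tagged):
    """History lengths for a tagless base table (length 0) plus num_tagged tables."""
    # Common ratio chosen so that the last tagged table uses max_length.
    ratio = (max_length / min_length) ** (1.0 / (num_tagged - 1))
    tagged = [int(min_length * ratio ** i + 0.5) for i in range(num_tagged)]
    return [0] + tagged


lengths = geometric_history_lengths(2, 128, 7)
# Differences between consecutive lengths, i.e. L(i) - L(i-1).
gaps = [b - a for a, b in zip(lengths, lengths[1:])]
```

With these arguments the function reproduces the series on the slide, and `gaps` shows the spacing doubling from one table to the next.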
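9 TAGE: geometric history length + PPM-like + optimized update policy
[Figure: a tagless base predictor indexed by pc, and tagged tables indexed by hashes of pc and the global history h[0:L1], h[0:L2], h[0:L3]. Each tagged entry holds ctr, u and tag fields; the tag comparisons (=?) select the component that delivers the prediction.]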
11 Prediction computation
- General case: the longest matching component provides the prediction (Pred)
- Special case:
  - Many mispredictions on newly allocated entries, whose Ctr is weak
  - On many applications, Altpred is more accurate than Pred
  - This property is dynamically monitored through a single 4-bit counter
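A minimal sketch of this selection logic follows. It assumes signed 3-bit counters in [-4, 3] (taken when non-negative), takes Altpred to be the prediction of the next longest matching component (or of the base predictor when there is none), and uses the midpoint of the 4-bit monitoring counter as the switch-over threshold. The names `Hit`, `tage_predict` and `use_alt_on_weak` are hypothetical, and the real predictor restricts the special case to newly allocated entries, which this sketch approximates by "weak counter".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hit:
    table: int  # rank of the matching tagged table (higher = longer history)
    ctr: int    # signed 3-bit prediction counter in [-4, 3]


def is_weak(ctr: int) -> bool:
    # The two counter values closest to the taken/not-taken boundary.
    return ctr in (-1, 0)


def tage_predict(base_pred: bool, hits: List[Hit],
                 use_alt_on_weak: int) -> Tuple[bool, Optional[int]]:
    """Return (prediction, provider table); provider is None for the base predictor."""
    if not hits:
        return base_pred, None
    ordered = sorted(hits, key=lambda h: h.table, reverse=True)
    provider = ordered[0]                      # longest matching component
    pred = provider.ctr >= 0
    altpred = ordered[1].ctr >= 0 if len(ordered) > 1 else base_pred
    # Special case: trust Altpred when the provider counter is weak and the
    # 4-bit monitoring counter (0..15) says Altpred has recently been better.
    if is_weak(provider.ctr) and use_alt_on_weak >= 8:
        return altpred, provider.table
    return pred, provider.table
```

The provider table and its counter value are returned alongside the prediction because they are the raw material the rest of the deck uses for confidence estimation.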
12 A tagged table entry
- Ctr: 3-bit prediction counter
- U: 2-bit useful counter (was the entry recently useful?)
- Tag: partial tag
- Entry layout: Tag | Ctr | U
13 Updating the U counter
- If (Altpred ≠ Pred) then:
  - Pred = taken (the actual outcome): U = U + 1
  - Pred ≠ taken: U = U - 1
- Graceful aging:
  - Periodic shift of all U counters
  - Implemented through the reset of a single bit
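The update rule can be written down directly. The sketch below assumes a 2-bit saturating U counter and models aging as a plain right shift of every counter, which is a simplification of the single-bit reset mentioned on the slide; the function names are hypothetical.

```python
U_MAX = 3  # 2-bit useful counter


def update_useful(u, pred, altpred, taken):
    """Update U for the provider entry once the branch outcome is known."""
    if pred != altpred:
        if pred == taken:
            return min(u + 1, U_MAX)   # the entry made the difference: useful
        return max(u - 1, 0)           # Altpred would have been right
    return u                           # both agree: no information about usefulness


def age_useful(counters):
    """Graceful aging, modeled here as halving every U counter (assumption)."""
    return [u >> 1 for u in counters]
```

U only moves when the entry actually changes the outcome, so entries that merely duplicate the alternate prediction stay replaceable.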
14 Allocating a new entry on a misprediction
- Find a single "useless" entry with a longer history:
  - Privilege the smallest possible history, to minimize footprint
  - But not too much, to avoid ping-pong phenomena
- Initialize Ctr as weak and U as zero
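One way to read "privilege the smallest history, but not too much" is a biased random choice among the useless candidates. The sketch below is an illustration under that assumption only: the skip probability, the function name and the representation of candidates as a list of U counters are all hypothetical, and what happens when no useless entry exists is left out.

```python
import random


def pick_victim(useful, provider, rng, skip_prob=0.5):
    """Choose the table in which to allocate after a misprediction.

    useful[i] is the U counter of the candidate entry in tagged table i,
    tables being ordered by increasing history length. provider is the rank
    of the providing tagged table, or -1 for the base predictor.
    Returns a table rank, or None when no candidate has U == 0.
    """
    candidates = [i for i in range(provider + 1, len(useful)) if useful[i] == 0]
    if not candidates:
        return None
    # Walk from the shortest history up; each candidate except the last may be
    # skipped with probability skip_prob, which avoids always reusing the same entry.
    for i in candidates[:-1]:
        if rng.random() >= skip_prob:
            return i
    return candidates[-1]
```

A call such as `pick_victim([1, 0, 2, 0], 0, random.Random())` returns either 1 or 3, with the shorter history favored. The allocated entry would then get a weak Ctr and U = 0, as stated on the slide.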
15 Confidence by observation on TAGE
- Apart from the prediction, the predictor delivers:
  - The provider component and the value of the prediction counter
  - High correlation with the quality of the predictions
- The history of mispredictions can also be observed:
  - A burst of mispredictions might indicate predictor warming or a program phase change
16 Experimental framework
- 20 traces from CBP-1 and 20 traces from CBP-2
- 16Kbits TAGE: 5 tables, max history 80 bits
- 64Kbits TAGE: 8 tables, max history 130 bits
- 256Kbits TAGE: 9 tables, max history 300 bits
- Probability of misprediction as a metric of confidence: Mispredictions Per Kilopredictions (MKP)
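MKP is simply the misprediction probability scaled by 1000, so 10 MKP corresponds to a 1 % misprediction rate. A two-line sketch with hypothetical helper names:

```python
def mkp(mispredictions, predictions):
    """Mispredictions per kilopredictions for a class of predictions."""
    return 1000.0 * mispredictions / predictions


def mkp_to_percent(value):
    """Convert an MKP figure to a misprediction rate in percent."""
    return value / 10.0
```

Under this conversion, the "≈ 30 % or more" low confidence target corresponds to roughly 300 MKP, and the "< 1 %" high confidence target to less than 10 MKP.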
17 Bimodal as the provider component
- Provides many (often most) of the predictions:
  - Allocation of a tagged table entry only happens on a misprediction
  - Generally, bimodal prediction = the bias of the branch
- 256Kbits TAGE: bimodal = very accurate prediction
  - Often less than 1 MKP, always significantly lower than the global misprediction rate
- 16Kbits TAGE:
  - Often bimodal = very accurate prediction
  - On demanding applications: bimodal not better than average
18 Discriminating the bimodal predictions
- Weak counters:
  - Systematically more than 250 MKP (generally more than 300 MKP)
  - Can be classified as low confidence
- "Identify" conflicts due to limited predictor size:
  - Was there a misprediction provided by the bimodal recently (among the last 10 branches)?
  - ≈ 80-150 MKP for 16Kbits, ≈ 50-70 MKP for 64Kbits
  - Can be classified as medium confidence
- The remaining predictions:
  - High confidence: < 10 MKP, generally much less
19 A tagged component as the provider
- Discriminate on the value of the prediction counter, |2ctr + 1|:

  Counter state      |2ctr + 1|   TAGE 16Kbits   TAGE 256Kbits
  Weak                   1          340 MKP        325 MKP
  Nearly weak            3          313 MKP        312 MKP
  Nearly saturated       5          213 MKP        225 MKP
  Saturated              7           29 MKP         17 MKP
20 Tagged component as provider: a more thorough analysis
- Weak, Nearly Weak, Nearly Saturated:
  - For all benchmarks and for the three TAGE configurations: in the range of 200 MKP or higher
- Saturated:
  - Slightly lower than the global misprediction rate of the application
  - Very high confidence for predictable applications (< 10 MKP)
  - Not that high confidence for poorly predictable applications (> 50 MKP)
- Problem: Saturated often represents more than 50 % of the predictions
22 Intermediate summary
- High confidence class: bimodal saturated, with no recent misprediction by the bimodal
- Low confidence class: bimodal weak, and tagged not saturated
- Medium confidence class: bimodal with a recent misprediction by the bimodal
- Tagged saturated:
  - Depends on the application, the predictor size, etc.
  - Very large class
23 Tweaking the predictor to improve confidence
24 How to improve confidence on the tagged saturated-counter class
- Widening the prediction counter? Not that good:
  - Slightly decreased accuracy
  - Only marginal improvement in accuracy on the saturated class
- Modifying the counter update:
  - Transition to the saturated state with a very low probability
  - P = 1/128 in our experiments
  - Marginal accuracy loss (≈ 0.02 MPKI)
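The modified update can be sketched as an ordinary saturating counter whose last step, from nearly saturated to saturated, is only taken with probability P. The code assumes a signed 3-bit counter in [-4, 3] and takes the random source as a parameter; the function name and the symmetric treatment of both directions are assumptions.

```python
import random

CTR_MIN, CTR_MAX = -4, 3  # signed 3-bit prediction counter


def update_ctr(ctr, taken, rng, sat_prob=1 / 128):
    """Saturating update where entering a saturated state is probabilistic."""
    if taken:
        if ctr == CTR_MAX:
            return ctr
        # Nearly saturated -> saturated only with probability sat_prob.
        if ctr == CTR_MAX - 1 and rng.random() >= sat_prob:
            return ctr
        return ctr + 1
    if ctr == CTR_MIN:
        return ctr
    if ctr == CTR_MIN + 1 and rng.random() >= sat_prob:
        return ctr
    return ctr - 1
```

Usage: `ctr = update_ctr(ctr, taken, random.Random())`. Because a counter now needs on the order of 128 agreeing outcomes in the nearly saturated state before it saturates, a saturated counter becomes much stronger evidence of a stable branch, while leaving the saturated state still takes a single opposite outcome.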
25 Towards 3 confidence classes
- Tagged Saturated is high confidence:

             16Kbits   64Kbits   256Kbits
  Maximum    16 MKP    13 MKP    12 MKP
  Average    4 MKP, 2 MKP (only two values survive in the source)

- Nearly Saturated is enlarged and is medium confidence:

             16Kbits   64Kbits   256Kbits
  Maximum    169 MKP   173 MKP   174 MKP
  Average    85 MKP    71 MKP    73 MKP
26 Towards 3 confidence classes
- Low confidence: weak bimodal + weak tagged + nearly weak tagged
- Medium confidence: bimodal recently mispredicted + nearly saturated tagged
- High confidence: bimodal saturated + saturated tagged
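Put together, the three classes need only the provider, its counter value and a short record of recent bimodal mispredictions. The sketch below assumes signed counters (3 bits for tagged entries, 2 bits for the bimodal table, taken when non-negative) and a 10-branch window as on slide 18; the class and function names are hypothetical. It also presumes the low-probability saturation update from slide 24 is in place, which is what makes "saturated tagged" a high confidence state.

```python
from collections import deque


class BimodalMissWindow:
    """Remembers whether the bimodal provided a misprediction among the last N branches."""

    def __init__(self, window=10):
        self.recent = deque(maxlen=window)

    def record(self, bimodal_provided, mispredicted):
        # One entry per branch; True only for a misprediction provided by the bimodal.
        self.recent.append(bool(bimodal_provided and mispredicted))

    def recent_miss(self):
        return any(self.recent)


def classify(provider_is_bimodal, ctr, bimodal_recent_miss):
    """Return 'low', 'medium' or 'high' for one prediction."""
    if provider_is_bimodal:
        if ctr in (-1, 0):                      # weak 2-bit counter (assumed width)
            return "low"
        return "medium" if bimodal_recent_miss else "high"
    strength = abs(2 * ctr + 1)                 # 1, 3, 5 or 7 for a 3-bit counter
    if strength == 7:
        return "high"                           # saturated
    if strength == 5:
        return "medium"                         # nearly saturated
    return "low"                                # weak or nearly weak
```

None of this adds state to the predictor tables themselves; the only extra state in the sketch is the 10-entry window, which is consistent with the "storage free, very limited logic" claim.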
27 Prediction and misprediction coverage
Each cell reads: prediction coverage - misprediction coverage (misprediction rate in MKP)

             high conf          medium conf         low conf
  16Kbits    0.740-0.093 (5)    0.209-0.466 (85)    0.051-0.439 (317)
  64Kbits    0.799-0.076 (3)    0.160-0.450 (71)    0.040-0.474 (316)
  256Kbits   0.813-0.050 (2)    0.148-0.455 (73)    0.036-0.491 (325)
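The three numbers in each cell are tied together: weighting each class rate by its prediction coverage gives the overall misprediction rate, and each class's share of that sum is its misprediction coverage. The sketch below applies this to the 16Kbits row; since the per-class rates on the slide are rounded to whole MKP, the recomputed shares only approximate the reported coverages.

```python
# 16Kbits row from the slide: class -> (prediction coverage, misprediction rate in MKP)
classes_16k = {
    "high": (0.740, 5),
    "medium": (0.209, 85),
    "low": (0.051, 317),
}


def overall_mkp(classes):
    """Overall misprediction rate as the coverage-weighted sum of class rates."""
    return sum(coverage * rate for coverage, rate in classes.values())


def misprediction_shares(classes):
    """Fraction of all mispredictions that falls in each confidence class."""
    total = overall_mkp(classes)
    return {name: coverage * rate / total
            for name, (coverage, rate) in classes.items()}


total_16k = overall_mkp(classes_16k)
shares_16k = misprediction_shares(classes_16k)
```

For this row the weighted sum comes out near 38 MKP, and the recomputed shares land within about one percentage point of the reported 0.093 / 0.466 / 0.439.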
28 Behavior examples, 64Kbits
Each cell reads: prediction coverage - misprediction coverage (misprediction rate in MKP)

                          high conf          medium conf          low conf
  twolf  (15.143 MPKI)    0.465-0.053 (13)   0.385-0.460 (137)    0.150-0.487 (390)
  gcc    (4.192 MPKI)     0.780-0.093 (3)    0.195-0.450 (51)     0.025-0.457 (295)
  vortex (0.300 MPKI)     0.976-0.004 (0)    0.019-0.710 (110)    0.005-0.286 (207)
30 [Figure: predictions and mispredictions broken down by low, medium and high confidence class]
31 Summary
- Many studies on applications of confidence estimation, but very few on confidence estimators
- Each predictor requires a different confidence estimator
- A very cost-effective and efficient confidence estimator for TAGE:
  - Storage free, very limited logic
  - Discriminates between 3 confidence classes
  - Medium + low confidence: > 90 % of the mispredictions
  - High confidence: in the range of 1 % mispredictions or less
32 The End