1 Storage Free Confidence Estimator for the TAGE predictor André Seznec IRISA/INRIA.


1 1 Storage Free Confidence Estimator for the TAGE predictor André Seznec IRISA/INRIA

2 2 Why confidence estimation for branch predictors? Energy/performance tradeoffs: guiding fetch gating or fetch throttling; dynamic resizing of speculative structures. Controlling SMT resource allocation through fetch policies: fetch the “most” useful instructions. Dual-path execution.

3 3 What is confidence estimation? Assign a confidence to a prediction: is it likely that the prediction is correct? Generally, only low-confidence and high-confidence predictions are discriminated. High confidence: “very likely” to be correct. Low confidence: “not so likely” to be correct.

4 4 Confidence estimation for branch predictors. 1981, Jim Smith: predictions from weak counters are more likely to be mispredicted. 1996, Jacobsen, Rotenberg, Smith (JRS): gshare-like table of 4-bit counters; increment on a correct prediction, reset on a misprediction; low confidence if counter < threshold, high confidence if counter ≥ threshold. 1998, enhanced JRS (Grunwald et al.): use the prediction in the index. A few other proposals, e.g. self-confidence for perceptrons. Most studies still use enhanced JRS confidence estimators.
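The JRS scheme described on this slide can be sketched as follows. This is a minimal illustration rather than the published design: the table size, the hash, the default threshold and the class name `JRSEstimator` are all assumptions chosen for the example. Folding the prediction into the index corresponds to the "enhanced" variant.

```python
class JRSEstimator:
    """Resetting-counter confidence table in the style of JRS (illustrative)."""

    def __init__(self, log_size=10, ctr_bits=4, threshold=15):
        # log_size and threshold are assumed values, not taken from the slides.
        self.mask = (1 << log_size) - 1
        self.ctr_max = (1 << ctr_bits) - 1
        self.threshold = threshold
        self.table = [0] * (1 << log_size)

    def _index(self, pc, history, prediction):
        # gshare-like hash; appending the prediction bit to the history
        # is the "enhanced JRS" idea of using the prediction in the index.
        return (pc ^ ((history << 1) | int(prediction))) & self.mask

    def is_high_confidence(self, pc, history, prediction):
        # low confidence < threshold <= high confidence
        return self.table[self._index(pc, history, prediction)] >= self.threshold

    def update(self, pc, history, prediction, correct):
        i = self._index(pc, history, prediction)
        if correct:
            # Increment on a correct prediction, saturating at the maximum.
            self.table[i] = min(self.table[i] + 1, self.ctr_max)
        else:
            # Reset on a misprediction.
            self.table[i] = 0
```

Note that this estimator needs its own counter table, which is the storage cost the TAGE-based estimator in this talk avoids.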

5 5 Metrics for confidence estimators (Grunwald et al., 1998). SENS, sensitivity: fraction of correct predictions classified as high confidence. PVP, predictive value of a positive test: probability that a high-confidence prediction is correct. SPEC, specificity: fraction of mispredictions classified as low confidence. PVN, predictive value of a negative test: probability that a low-confidence prediction is mispredicted. Different qualities matter for different usages.
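The four metrics follow directly from a 2x2 count of (high or low confidence) against (correct or mispredicted). The sketch below assumes those four counts are available; the function and argument names are invented for the example.

```python
def confidence_metrics(high_correct, high_wrong, low_correct, low_wrong):
    """Compute SENS, PVP, SPEC and PVN from the four outcome counts."""
    return {
        # Fraction of correct predictions classified as high confidence.
        "SENS": high_correct / (high_correct + low_correct),
        # Probability that a high-confidence prediction is correct.
        "PVP": high_correct / (high_correct + high_wrong),
        # Fraction of mispredictions classified as low confidence.
        "SPEC": low_wrong / (low_wrong + high_wrong),
        # Probability that a low-confidence prediction is mispredicted.
        "PVN": low_wrong / (low_wrong + low_correct),
    }
```

For instance, with 900 correct and 10 wrong high-confidence predictions, and 60 correct and 30 wrong low-confidence predictions, SENS is 900/960 and SPEC is 30/40.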

6 6 The current limits of confidence prediction. Discriminating between high and low confidence is insufficient: what is the misprediction rate on high-confidence and on low-confidence predictions? Malik et al.: use a probability for each counter value on an enhanced JRS. Enhanced JRS and state-of-the-art branch predictors? Each predictor requires its own confidence estimator.

7 7 This study: a cost-effective confidence estimator for TAGE with no storage overhead. Discriminate three classes. Low-confidence predictions: ≈ 30% misprediction rate or more. Medium-confidence predictions: 8-15% misprediction rate. High-confidence predictions: < 1% misprediction rate.

8 8 TAGE: multiple tables, global history predictor. The set of history lengths forms a geometric series, e.g. {0, 2, 4, 8, 16, 32, 64, 128}. What is important: L(i)-L(i-1) increases drastically, so most of the storage is devoted to short histories, while correlation on very long histories is still captured.
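A geometric series of history lengths can be generated as below. The rounding formula L(i) = int(alpha^(i-1) * L(1) + 0.5) is the one commonly quoted for TAGE, but treat it as an assumption here, since the slide only states that the lengths form a geometric series. The function name and parameters are illustrative.

```python
def geometric_history_lengths(n_tagged, l_min, l_max):
    """History lengths for n_tagged tagged tables, from l_min up to l_max."""
    if n_tagged == 1:
        return [l_min]
    # Common ratio such that the last table uses l_max bits of history.
    alpha = (l_max / l_min) ** (1.0 / (n_tagged - 1))
    return [int(alpha ** i * l_min + 0.5) for i in range(n_tagged)]


# The base predictor uses no history, hence the leading 0 in the slide's set.
lengths = [0] + geometric_history_lengths(7, 2, 128)
gaps = [b - a for a, b in zip(lengths, lengths[1:])]
```

With these parameters `lengths` reproduces the slide's example set, and `gaps` shows L(i)-L(i-1) growing from 2 to 64 between consecutive tables.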

9 9 TAGE: geometric history length + PPM-like + optimized update policy. [Figure: a tagless base predictor indexed by pc, and tagged tables indexed by hashes of pc with h[0:L1], h[0:L2] and h[0:L3]; each tagged entry holds ctr, u and tag, and the tag comparisons (=?) select the prediction.]

10 10

11 11 Prediction computation. General case: the longest matching component provides the prediction (Pred). Special case: newly allocated entries (weak Ctr) cause many mispredictions; on many applications the alternate prediction (Altpred) is more accurate than Pred for such entries. This property is monitored dynamically through a single 4-bit counter.
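The selection rule on this slide can be sketched as follows. Several details are assumptions made for the example: counters are signed (ctr >= 0 predicts taken, so a 3-bit counter lies in [-4, 3]), "newly allocated" is approximated by a weak counter alone, and the 4-bit monitoring counter is called `use_alt_on_na` and favors Altpred when non-negative. `Hit` and `tage_predict` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    table: int  # index of the matching tagged table (higher = longer history)
    ctr: int    # signed prediction counter; ctr >= 0 means "taken"


def tage_predict(hits, base_taken, use_alt_on_na):
    """Return (prediction, provider); provider is None if the base table provides."""
    if not hits:
        return base_taken, None
    ordered = sorted(hits, key=lambda h: h.table, reverse=True)
    provider = ordered[0]              # longest matching component
    pred = provider.ctr >= 0
    # Altpred: next-longest matching component, else the base predictor.
    altpred = ordered[1].ctr >= 0 if len(ordered) > 1 else base_taken
    weak = provider.ctr in (0, -1)     # weak counter, likely newly allocated
    if weak and use_alt_on_na >= 0:
        return altpred, provider
    return pred, provider
```

The provider and its counter value returned here are exactly the signals the confidence estimator later observes.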

12 12 A tagged table entry: | Tag | Ctr | U |. Ctr: 3-bit prediction counter. U: 2-bit useful counter (was the entry recently useful?). Tag: partial tag.

13 13 Updating the U counter. If Altpred ≠ Pred: when Pred = taken (Pred matches the branch outcome), U = U + 1; when Pred ≠ taken, U = U - 1. Graceful aging: periodic shift of all U counters, implemented through the reset of a single bit.
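A minimal sketch of the U update and of the aging step, assuming a 2-bit unsigned U counter as on the previous slide. Here `taken` is the actual branch outcome. Which bit is cleared at each aging period is left as a parameter because the slide does not say; both function names are illustrative.

```python
def update_useful(u, pred, altpred, taken, u_max=3):
    """Update U only when the provider and the alternate prediction disagree."""
    if pred != altpred:
        if pred == taken:
            u = min(u + 1, u_max)   # provider was right where Altpred was wrong
        else:
            u = max(u - 1, 0)       # provider was wrong where Altpred was right
    return u


def age_useful(u_counters, bit):
    """Graceful aging: clear one bit position in every U counter."""
    return [u & ~(1 << bit) for u in u_counters]
```

Clearing a single bit column is cheap in hardware because it touches one bit per entry rather than rewriting whole counters.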

14 14 Allocating a new entry on a misprediction. Find a single “useless” entry in a table with a longer history. Favor the smallest possible history, to minimize footprint, but not too strongly, to avoid ping-pong phenomena. Initialize Ctr as weak and U as zero.

15 15 Confidence by observation on TAGE. Apart from the prediction, the predictor delivers the provider component and the value of the prediction counter, both highly correlated with the quality of the predictions. The history of mispredictions can also be observed: a burst of mispredictions might indicate predictor warm-up or a program phase change.

16 16 Experimental framework. 20 traces from CBP-1 and 20 traces from CBP-2. 16Kbits TAGE: 5 tables, max history 80 bits. 64Kbits TAGE: 8 tables, max history 130 bits. 256Kbits TAGE: 9 tables, max history 300 bits. Probability of misprediction as the metric of confidence: Mispredictions Per Kilo-Predictions (MKP).

17 17 Bimodal as the provider component. Provides many (often most) of the predictions: a tagged table entry is allocated only on a misprediction, so the bimodal prediction generally equals the bias of the branch. 256Kbits TAGE: bimodal gives very accurate predictions, often less than 1 MKP and always significantly lower than the global misprediction rate. 16Kbits TAGE: bimodal is often very accurate, but on demanding applications it is no better than average.

18 18 Discriminating the bimodal predictions. Weak counters: systematically more than 250 MKP (generally more than 300 MKP); can be classified as low confidence. “Identify” conflicts due to limited predictor size: was there a recent misprediction provided by the bimodal (within the last 10 branches)? ≈80-150 MKP for 16Kbits, ≈50-70 MKP for 64Kbits; can be classified as medium confidence. The remaining predictions are high confidence: < 10 MKP, generally much less.

19 19 A tagged component as the provider. Discriminate on the value of the prediction counter:

Counter state    |2ctr+1|    TAGE 16Kbits    TAGE 256Kbits
Weak                1          340 MKP         325 MKP
Nearly Weak         3          313 MKP         312 MKP
Nearly Sat.         5          213 MKP         225 MKP
Saturated           7           29 MKP          17 MKP

20 20 Tagged component as provider: a more thorough analysis. Weak, Nearly Weak, Nearly Saturated: for all benchmarks and for the three TAGE configurations, in the range of 200 MKP or higher. Saturated: slightly lower than the global misprediction rate of the application; very high confidence for predictable applications (< 10 MKP), not that high for poorly predictable applications (> 50 MKP). Problem: Saturated often represents more than 50% of the predictions.

21 21

22 22 Intermediate summary. High confidence class: bimodal saturated with no recent misprediction by the bimodal. Low confidence class: bimodal weak, and tagged not saturated. Medium confidence class: bimodal with a recent misprediction by the bimodal. Tagged saturated: depends on the application, predictor size, etc.; a very large class.

23 23 Tweaking the predictor to improve confidence

24 24 How to improve confidence on the tagged-counter-saturated class. Widening the prediction counter? Not that good: slightly decreased accuracy, and only a marginal improvement in accuracy on the saturated class. Modifying the counter update: transition to the saturated state with a very low probability; P = 1/128 in our experiments; marginal accuracy loss (≈ 0.02 MPKI).
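The modified counter update can be sketched as below, assuming a signed 3-bit counter in [-4, 3] whose saturated states are -4 and 3. The function name, the `rng` parameter and the use of a software random source are illustrative; hardware would more likely use a cheap pseudo-random bit source.

```python
import random


def update_ctr(ctr, taken, rng=random, p_saturate=1 / 128):
    """Update a signed 3-bit counter; entering a saturated state is probabilistic."""
    if taken:
        if ctr == 2 and rng.random() >= p_saturate:
            return ctr              # stay Nearly Saturated most of the time
        return min(ctr + 1, 3)
    if ctr == -3 and rng.random() >= p_saturate:
        return ctr                  # same rule on the not-taken side
    return max(ctr - 1, -4)
```

Because saturation is now rare, a counter that does reach it has usually seen a long run of consistent outcomes, which is what makes the Saturated class trustworthy and enlarges the Nearly Saturated class.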

25 25 Towards 3 confidence classes. Tagged Saturated is high confidence; Nearly Saturated is enlarged and is medium confidence.

Tagged Saturated (high confidence):
           16Kbits    64Kbits    256Kbits
Maximum    16 MKP     13 MKP     12 MKP
Average    4 MKP, 2 MKP (only two values appear in the transcript)

Nearly Saturated (medium confidence):
           16Kbits    64Kbits    256Kbits
Maximum    169 MKP    173 MKP    174 MKP
Average    85 MKP     71 MKP     73 MKP

26 26 Towards 3 confidence classes. Low confidence: weak bimodal + Weak tagged + Nearly Weak tagged. Medium confidence: bimodal recently mispredicted + Nearly Saturated tagged. High confidence: bimodal saturated + Saturated tagged.
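The three-class rule can be written as a small pure function of signals the predictor already produces, which is why it needs no extra storage. The sketch assumes signed counters (2-bit in [-2, 1] for the bimodal table, 3-bit in [-4, 3] for tagged tables) and assumes a flag telling whether the bimodal table mispredicted within the last few branches; the names are illustrative.

```python
def tage_confidence(provider_is_bimodal, ctr, bimodal_recently_mispredicted=False):
    """Classify a TAGE prediction as 'low', 'medium' or 'high' confidence."""
    if provider_is_bimodal:
        if ctr in (0, -1):                   # weak bimodal counter
            return "low"
        if bimodal_recently_mispredicted:    # likely conflicts in the base table
            return "medium"
        return "high"                        # saturated bimodal, no recent miss
    strength = abs(2 * ctr + 1)              # 1, 3, 5 or 7 for a 3-bit counter
    if strength == 7:                        # Saturated
        return "high"
    if strength == 5:                        # Nearly Saturated
        return "medium"
    return "low"                             # Weak or Nearly Weak
```

For example, a tagged provider with ctr = 2 gives |2ctr+1| = 5 and is classified as medium confidence.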

27 27 Prediction and misprediction coverage. Each cell reads: prediction coverage - misprediction coverage (misprediction rate in MKP).

            high conf          medium conf         low conf
16Kbits     0.740-0.093 (5)    0.209-0.466 (85)    0.051-0.439 (317)
64Kbits     0.799-0.076 (3)    0.160-0.450 (71)    0.040-0.474 (316)
256Kbits    0.813-0.050 (2)    0.148-0.455 (73)    0.036-0.491 (325)

28 28 Behavior examples, 64Kbits. Each cell reads: prediction coverage - misprediction coverage (misprediction rate in MKP).

                        high conf          medium conf          low conf
twolf (15.143 MPKI)     0.465-0.053 (13)   0.385-0.460 (137)    0.150-0.487 (390)
gcc (4.192 MPKI)        0.780-0.093 (3)    0.195-0.450 (51)     0.025-0.457 (295)
vortex (0.300 MPKI)     0.976-0.004 (0)    0.019-0.710 (110)    0.005-0.286 (207)
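The three quantities reported per class in the coverage tables can be derived from per-class counts of predictions and mispredictions. The sketch below shows the arithmetic; the function name and the input layout are assumptions, and the counts in the usage lines are invented for illustration, not measured data.

```python
def coverage_report(counts):
    """counts maps class name -> (predictions, mispredictions)."""
    total_pred = sum(p for p, _ in counts.values())
    total_misp = sum(m for _, m in counts.values())
    report = {}
    for name, (p, m) in counts.items():
        report[name] = {
            # Share of all predictions that fall in this class.
            "pred_coverage": p / total_pred if total_pred else 0.0,
            # Share of all mispredictions that fall in this class.
            "misp_coverage": m / total_misp if total_misp else 0.0,
            # Mispredictions per kilo-predictions within this class.
            "mkp": 1000.0 * m / p if p else 0.0,
        }
    return report


# Invented counts, for illustration only.
example = coverage_report({"high": (8000, 24), "medium": (1600, 112), "low": (400, 128)})
```

In this made-up example the high class holds 80% of the predictions at 3 MKP, while the medium and low classes together hold about 91% of the mispredictions.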

29 29

30 30 [Figure: predictions and mispredictions broken down into low, medium and high confidence classes.]

31 31 Summary. Many studies on applications of confidence estimation, but very few on confidence estimators. Each predictor requires a different confidence estimator. A very cost-effective and efficient confidence estimator for TAGE: storage free, very limited logic. Discriminates between 3 confidence classes: medium + low confidence cover > 90% of the mispredictions; high confidence is in the range of 1% mispredictions or less.

32 32 The End

