Published by Pamela Lane. Modified over 9 years ago.
1
1 Two research studies related to branch prediction and instruction sequencing. André Seznec, INRIA/IRISA
2
2 Storage Free Confidence Estimator for the TAGE predictor
3
3 Why confidence estimation for branch predictors? Energy/performance tradeoffs: guiding fetch gating or fetch throttling; dynamic resizing of speculative structures. Controlling SMT resource allocation through fetch policies: fetch the “most” useful instructions. Dual-path execution.
4
4 What is confidence estimation? Assign a confidence to a prediction: is it likely that the prediction is correct? Generally discriminates only between low- and high-confidence predictions. High confidence: «very likely» to be correct. Low confidence: «not so likely» to be correct.
5
5 Confidence estimation for branch predictors. 1981, Jim Smith: predictions from weak counters are more likely to be mispredicted. 1996, Jacobsen, Rotenberg, Smith (JRS): gshare-like table of 4-bit counters; increment on a correct prediction, reset on a misprediction; low confidence < threshold ≤ high confidence. 1998, enhanced JRS (Grunwald et al.): use the prediction in the index. A few other proposals: self-confidence for perceptrons, etc. Most studies still use enhanced JRS confidence estimators.
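A minimal sketch of a JRS-style estimator as summarized on this slide, assuming a table of 4-bit resetting counters indexed by the PC XORed with the global history. The table size, the threshold of 15, and all class and method names are illustrative assumptions, not values from the slides.

```python
class JRSEstimator:
    """Resetting-counter confidence table (illustrative sketch)."""

    def __init__(self, log_size=12, threshold=15):
        self.mask = (1 << log_size) - 1
        self.table = [0] * (1 << log_size)  # 4-bit counters, all start at 0
        self.threshold = threshold          # assumed value, not from the slides

    def _index(self, pc, history, prediction=None):
        idx = pc ^ history                  # gshare-like hash
        if prediction is not None:
            # "Enhanced" variant: fold the predicted direction into the index.
            idx = (idx << 1) | int(prediction)
        return idx & self.mask

    def is_high_confidence(self, pc, history, prediction=None):
        return self.table[self._index(pc, history, prediction)] >= self.threshold

    def update(self, pc, history, correct, prediction=None):
        i = self._index(pc, history, prediction)
        # Increment (saturating at 15) on a correct prediction, reset otherwise.
        self.table[i] = min(self.table[i] + 1, 15) if correct else 0
```

With these assumed parameters, an entry becomes high confidence only after 15 consecutive correct predictions, and a single misprediction sends it back to low confidence.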
6
6 Metrics for confidence estimators (Grunwald et al., 1998). SENS, Sensitivity: fraction of correct predictions classified as high confidence. PVP, Predictive Value of a Positive test: probability that a high-confidence prediction is correct. SPEC, Specificity: fraction of mispredictions classified as low confidence. PVN, Predictive Value of a Negative test: probability that a low-confidence prediction is mispredicted. Different qualities matter for different usages.
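The four metrics can be computed from a 2x2 count of (correct or mispredicted) versus (high or low confidence). This is a small sketch; the function and argument names are hypothetical.

```python
def confidence_metrics(correct_high, correct_low, wrong_high, wrong_low):
    """Return SENS, PVP, SPEC and PVN from the four outcome counts."""
    return {
        # Fraction of correct predictions classified as high confidence.
        "SENS": correct_high / (correct_high + correct_low),
        # Probability that a high-confidence prediction is correct.
        "PVP": correct_high / (correct_high + wrong_high),
        # Fraction of mispredictions classified as low confidence.
        "SPEC": wrong_low / (wrong_high + wrong_low),
        # Probability that a low-confidence prediction is mispredicted.
        "PVN": wrong_low / (correct_low + wrong_low),
    }
```

For example, with made-up counts `confidence_metrics(900, 50, 10, 40)`, SPEC is 40/50: 80% of the mispredictions were flagged as low confidence.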
7
7 The current limits of confidence prediction. Discriminating between high and low confidence is insufficient: what is the misprediction rate on high and on low confidence? Malik et al.: use a probability for each counter value on an enhanced JRS. Enhanced JRS and state-of-the-art branch predictors? Each predictor needs its own confidence estimator.
8
8 This study. Cost-effective confidence estimator for TAGE: no storage overhead. Discriminates three classes. Low-confidence predictions: ≈ 30% misprediction rate or more. Medium-confidence predictions: 8-15% misprediction rate. High-confidence predictions: < 1% misprediction rate.
9
9 TAGE: multiple tables, global history predictor. The set of history lengths forms a geometric series: {0, 2, 4, 8, 16, 32, 64, 128}. What is important: L(i)-L(i-1) increases drastically, so most of the storage goes to short histories. Captures correlation on very long histories.
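The series on this slide can be reproduced with a short sketch, assuming lengths of the form L(i) = round(L(1) * alpha^(i-1)) with alpha derived from the shortest and longest tagged history. The function and parameter names are hypothetical.

```python
def geometric_history_lengths(num_tagged, l_min, l_max):
    """History lengths for the base table (0) plus num_tagged tagged tables."""
    # Common ratio so that the last tagged table uses exactly l_max bits.
    alpha = (l_max / l_min) ** (1.0 / (num_tagged - 1))
    # int(x + 0.5) rounds to the nearest integer for positive x.
    return [0] + [int(l_min * alpha ** i + 0.5) for i in range(num_tagged)]
```

`geometric_history_lengths(7, 2, 128)` yields the set shown on the slide; the gap between consecutive lengths doubles at each step, which is why the short-history tables dominate the storage.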
10
10 TAGE: geometric history length + PPM-like + optimized update policy. [Figure: a tagless base predictor indexed by pc, and tagged tables indexed by hashes of pc with h[0:L1], h[0:L2], h[0:L3]; each tagged entry holds ctr, u and tag; the tag comparisons (=?) select the prediction]
11
11
12
12 Prediction computation. General case: the longest matching component provides the prediction (Pred). Special case: many mispredictions occur on newly allocated entries (weak Ctr); on many applications, the alternate prediction (Altpred) is more accurate than Pred for these entries. This property is dynamically monitored through a single 4-bit counter.
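The selection rule above can be sketched as follows, assuming signed 3-bit prediction counters in the range -4..3 (taken when non-negative) and a 4-bit monitoring counter that favors Altpred when it is in its upper half. The threshold of 8 and all names are assumptions made for illustration.

```python
def tage_predict(base_taken, hits, use_alt_on_weak):
    """Pick the final prediction.

    base_taken: prediction of the tagless base predictor.
    hits: counters (-4..3) of the matching tagged components,
          ordered from shortest to longest history.
    use_alt_on_weak: value of the 4-bit monitoring counter (0..15).
    Returns (taken, source).
    """
    if not hits:
        return base_taken, "bimodal"
    provider_ctr = hits[-1]                 # longest matching component
    pred = provider_ctr >= 0
    # Altpred: next longest match, or the base predictor if there is none.
    altpred = hits[-2] >= 0 if len(hits) > 1 else base_taken
    weak = provider_ctr in (-1, 0)          # likely a newly allocated entry
    if weak and use_alt_on_weak >= 8:       # assumed threshold
        return altpred, "alt"
    return pred, "provider"
```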
13
13 A tagged table entry. Ctr: 3-bit prediction counter. U: 2-bit useful counter (was the entry recently useful?). Tag: partial tag. [Figure: entry layout with fields Tag, Ctr, U]
14
14 Confidence by observation on TAGE. Apart from the prediction, the predictor delivers the provider component and the value of the prediction counter: both are highly correlated with the quality of the predictions. The history of mispredictions can also be observed: a burst of mispredictions might indicate predictor warm-up or a program phase change.
15
15 Experimental framework. 20 traces from CBP-1 and 20 traces from CBP-2. 16Kbits TAGE: 5 tables, max history 80 bits. 64Kbits TAGE: 8 tables, max history 130 bits. 256Kbits TAGE: 9 tables, max history 300 bits. Probability of misprediction as the metric of confidence: Mispredictions Per Kilopredictions (MKP).
16
16 Bimodal as the provider component. Provides many (often most) of the predictions: allocation of a tagged table entry happens on a misprediction. Generally, bimodal prediction = the bias of the branch. 256Kbits TAGE: bimodal = very accurate prediction; often less than 1 MKP, always significantly lower than the global misprediction rate. 16Kbits TAGE: often bimodal = very accurate prediction, but on demanding applications bimodal is not better than average.
17
17 Discriminating the bimodal predictions. Weak counters: systematically more than 250 MKP (generally more than 300 MKP); can be classified as low confidence. «Identify» conflicts due to limited predictor size: was there a misprediction provided by the bimodal recently (last 10 branches)? ≈80-150 MKP for 16Kbits, ≈50-70 MKP for 64Kbits; can be classified as medium confidence. The remaining predictions: high confidence, <10 MKP, generally much less.
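One way to read the three-way split for bimodal-provided predictions is sketched below, assuming a sliding window over the last 10 branches that records whether each one was a bimodal misprediction. The class and method names are hypothetical, and a hardware implementation would more likely use a small counter than a queue.

```python
from collections import deque


class BimodalConfidence:
    """Classify bimodal-provided predictions as low / medium / high."""

    def __init__(self, window=10):
        # One flag per recent branch: True if bimodal provided a misprediction.
        self.recent = deque([False] * window, maxlen=window)

    def classify(self, ctr_is_weak):
        if ctr_is_weak:
            return "low"
        if any(self.recent):        # recent bimodal misprediction: likely conflicts
            return "medium"
        return "high"

    def record(self, provided_by_bimodal, mispredicted):
        # Called once per branch, after resolution.
        self.recent.append(provided_by_bimodal and mispredicted)
```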
18
18 A tagged component as the provider. Discriminate on the value of the prediction counter, |2ctr+1|:
                   |2ctr+1|   TAGE 16Kbits   TAGE 256Kbits
Weak               1          340 MKP        325 MKP
Nearly Weak        3          313 MKP        312 MKP
Nearly Saturated   5          213 MKP        225 MKP
Saturated          7          29 MKP         17 MKP
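The |2ctr+1| value maps a signed 3-bit counter onto four symmetric strength levels. A small sketch, assuming ctr ranges over -4..3; the names are hypothetical.

```python
STRENGTH_LABELS = {1: "weak", 3: "nearly weak", 5: "nearly saturated", 7: "saturated"}


def counter_strength(ctr):
    """Map a signed 3-bit counter (-4..3) to its strength label."""
    # |2*ctr + 1| is symmetric: -4 and 3 both give 7, -1 and 0 both give 1.
    return STRENGTH_LABELS[abs(2 * ctr + 1)]
```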
19
19 Tagged component as provider: a more thorough analysis. Weak, Nearly Weak, Nearly Saturated: for all benchmarks and for the three TAGE configurations, in the range of 200 MKP or higher. Saturated: slightly lower than the global misprediction rate of the application; very high confidence for predictable applications (< 10 MKP); not that high confidence for poorly predictable applications (> 50 MKP). Problem: Saturated often represents more than 50% of the predictions.
20
20
21
21 Intermediate summary. High confidence class: bimodal saturated, no recent misprediction by bimodal. Low confidence class: bimodal weak, and tagged not saturated. Medium confidence class: bimodal with a recent misprediction by bimodal. Tagged saturated: depends on the application, predictor size, etc.; a very large class.
22
22 Tweaking the predictor to improve confidence
23
23 How to improve confidence on the tagged saturated-counter class. Widening the prediction counter? Not that good: slightly decreased accuracy, and only marginal improvement in accuracy on the saturated class. Modifying the counter update: transition to the saturated state with a very low probability (P = 1/128 in our experiments); marginal accuracy loss (≈ 0.02 MPKI).
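The modified update can be sketched as follows, assuming a signed 3-bit counter (-4..3) in which only the final step into a saturated state (2 to 3, or -3 to -4) is made probabilistic. The function name and the injectable `rand` parameter are illustrative assumptions.

```python
import random


def update_ctr(ctr, taken, rand=random.random, p_saturate=1 / 128):
    """Update a signed 3-bit counter; saturate only with probability p_saturate."""
    if taken:
        if ctr == 2:                       # next step would saturate
            return 3 if rand() < p_saturate else 2
        return min(ctr + 1, 3)
    if ctr == -3:                          # next step would saturate
        return -4 if rand() < p_saturate else -3
    return max(ctr - 1, -4)
```

Because reaching saturation now takes on the order of a hundred consecutive confirmations, a saturated counter is much stronger evidence of a stable branch than under the plain update.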
24
24 Towards 3 confidence classes.
Tagged Saturated is high confidence. Misprediction rate (16 Kbits / 64 Kbits / 256 Kbits):
Maximum: 16 MKP / 13 MKP / 12 MKP
Average: 4 MKP / 2 MKP
Nearly Saturated is enlarged and is medium confidence. Misprediction rate (16 Kbits / 64 Kbits / 256 Kbits):
Maximum: 169 MKP / 173 MKP / 174 MKP
Average: 85 MKP / 71 MKP / 73 MKP
25
25 Towards 3 confidence classes. Low confidence: Weak bimodal + Weak tagged + Nearly Weak tagged. Medium confidence: Bimodal recently mispredicted + Nearly Saturated tagged. High confidence: Bimodal saturated + Saturated tagged.
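The three classes can be combined into one storage-free classification function. This is a sketch, assuming a signed 3-bit counter (-4..3) for tagged components and a signed 2-bit counter (-2..1) for the bimodal table; the 2-bit width and all names are assumptions.

```python
def confidence_class(provider, ctr, bimodal_recent_misp=False):
    """Return 'low', 'medium' or 'high' for one prediction.

    provider: 'bimodal' or 'tagged'.
    ctr: the provider's signed prediction counter.
    bimodal_recent_misp: True if bimodal recently provided a misprediction.
    """
    strength = abs(2 * ctr + 1)
    if provider == "bimodal":
        if strength == 1:                  # weak bimodal counter
            return "low"
        return "medium" if bimodal_recent_misp else "high"
    if strength == 7:                      # saturated tagged counter
        return "high"
    if strength == 5:                      # nearly saturated tagged counter
        return "medium"
    return "low"                           # weak or nearly weak tagged counter
```

Every input is already produced by the predictor or is a single recent-misprediction flag, which matches the slide's claim of no extra storage beyond very limited logic.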
26
26 Prediction and misprediction coverage. Each cell reads: prediction coverage - misprediction coverage (misprediction rate, in MKP).
            high conf         medium conf        low conf
16Kbits     0.740-0.093 (5)   0.209-0.466 (85)   0.051-0.439 (317)
64Kbits     0.799-0.076 (3)   0.160-0.450 (71)   0.040-0.474 (316)
256Kbits    0.813-0.050 (2)   0.148-0.455 (73)   0.036-0.491 (325)
27
27 Behavior examples, 64Kbits. Each cell reads: prediction coverage - misprediction coverage (misprediction rate, in MKP).
                       high conf          medium conf         low conf
twolf   15.143 MPKI    0.465-0.053 (13)   0.385-0.460 (137)   0.150-0.487 (390)
gcc     4.192 MPKI     0.780-0.093 (3)    0.195-0.450 (51)    0.025-0.457 (295)
vortex  0.300 MPKI     0.976-0.004 (0)    0.019-0.710 (110)   0.005-0.286 (207)
28
28 [Figure: predictions and mispredictions broken down into low, medium and high confidence]
29
29 Summary on confidence estimation. Many studies on applications of confidence estimation, but very few on confidence estimators. Each predictor requires a different confidence estimator. A very cost-effective and efficient confidence estimator for TAGE: storage free, very limited logic. Discriminates between 3 confidence classes: medium + low confidence cover > 90% of the mispredictions; high confidence is in the range of 1% mispredictions or less.
30
30 SYRANT, with Nathanael Prémillieu. «Moderate cost» control independence exploitation.
31
31 Why? Branch prediction accuracy is reaching a plateau: TAGE 2006, ? Try something else.
32
32 Control flow reconvergence. [Figure labels: Instruction flow, Branch (if), taken path, not-taken path, (else), Reconvergence point]
33
33 Exploiting control flow reconvergence. Misprediction! Can we save some useful work after the reconvergence point?
34
34 [Figure: instruction classes around the reconvergence point] Control Dependent (CD): to invalidate. Control Independent Data Independent (CIDI): should be conserved. Control Independent Data Dependent (CIDD): to be detected.
35
35 Difficulties. Not the same renaming scheme on both paths: how to conserve results? Identification of the reconvergence point: check against all previously fetched instructions on the wrong path? Identification of CIDI and CIDD instructions?
36
36 SYmmetric Resource Allocation on Not-taken and Taken paths. Physical registers (LSQ entries, ROB entries): insert gaps to reuse the same physical registers. [Figure: taken path and not-taken path between the branch and the reconvergence point, each mapped onto physical registers P0-P8; a gap of unused registers on one path aligns the allocation after reconvergence]
37
37 Register validity through a tagging process, at the rename stage, at refetch. On a misprediction, increment the tag: X to Y. [Figure: predicted path and corrected path from the branch to the reconvergence point, with instructions I0-I3 and registers carrying tags X or Y]
38
38 Conserve tag and validity if 1) same instruction, 2) same operands, including tags. [Figure: instructions and registers tagged X or Y on the predicted and corrected paths]
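One possible reading of the conservation rule is sketched below: at refetch, an instruction keeps its old tag (and thus its already computed result) only if it is the same instruction reading the same (register, tag) operands. The `Renamed` record, the `retag` helper and the use of the PC to identify "the same instruction" are all hypothetical; the slides do not give this level of detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Renamed:
    pc: int          # identifies the instruction
    sources: tuple   # ((phys_reg, tag), ...) operands, including their tags
    dest: int        # destination physical register
    tag: str         # validity tag, e.g. "X" before the misprediction


def retag(old, pc, sources, dest, new_tag):
    """Rename one refetched instruction; return (entry, conserved)."""
    if old is not None and old.pc == pc and old.sources == sources:
        return old, True                    # same instruction, same operands: keep result
    return Renamed(pc, sources, dest, new_tag), False   # must be (re)executed
```

An instruction whose operand was rewritten under the new tag fails the comparison and receives the new tag itself, so invalidation propagates down dependence chains without an explicit CIDD detection pass.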
43
43 Reconvergence detection. Precise detection would require checking every PC for each instruction. Use approximate detection: detect the first branch after reconvergence.
44
44 Approximate detection of the reconvergence point. Copy the wrong path on branch misprediction detection. [Figure: Active Branch List (ABL) with columns Branch, NbR, Direction, holding B1-B7 with NbR 1, 12, 17, 22, 23, 29, 40; Shadow Branch List (SBL) receiving the wrong-path entries B3-B7 with NbR 17, 22, 23, 29, 40]
45
45 The ABL and SBL allow the resource consumption on both paths to be monitored. [Figure: ABL holding B1, B2, B'3, B'4, B'5, B6 with NbR 1, 12, 23, 27, 28, 32; SBL holding B3, B4, B5, B6, B7 with NbR 17, 22, 23, 29, 40]
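A rough sketch of approximate detection with the two lists, assuming each entry pairs a branch identifier with a running resource count (the NbR column) and that reconvergence is declared at the first corrected-path branch that also appears in the SBL. Both the interpretation of NbR and the function name are assumptions.

```python
def find_reconvergence(corrected_path, shadow_list):
    """Return (branch, resource_difference) at the first common branch, or None.

    corrected_path: (branch_id, resources_used) entries fetched after the
                    misprediction, in fetch order (from the ABL).
    shadow_list:    (branch_id, resources_used) entries saved from the
                    wrong path (the SBL).
    """
    first_seen = {}
    for branch, used in shadow_list:
        first_seen.setdefault(branch, used)     # keep the earliest occurrence
    for branch, used in corrected_path:
        if branch in first_seen:
            # Difference in resource consumption between the two paths.
            return branch, used - first_seen[branch]
    return None
```

With the values from the slide, the first common branch is B6, reached with 32 resources on the corrected path and 29 on the wrong path, so the call returns `("B6", 3)`.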
46
46 Determine the gap, then use the gap. [Figure labels: B1, B2, RP1, RP2, WP, RP, Taken, Not-Taken]
47
47 Gap size issue. The two paths may be very different: waste of resources, sometimes 100s of instructions. Different filters: only try when the gap size is limited; only try if the wrong path was the longest; only try if branch confidence is low (or medium); only try if reconvergence point/gap confidence is high.
48
48 Continue execution after branch misprediction resolution. On «normal» superscalar processors: kill every instruction after the misprediction. Control independence exploitation: let execution continue until resources are claimed back (phantom execution).
49
49 Preliminary performance evaluation. 8-way superscalar, deep 20-stage pipeline. Very large instruction window. TAGE predictor. SPEC 2006.
50
50 Reconvergence is detected in most cases
51
51 Some speed-up but relatively poor
52
52 4-way issue processor
53
53 That’s preliminary. No gap size limit on the predicted path. No discrimination on medium/low confidence. No retroaction on branch prediction: just did not use the computed path.
54
54 Preexecution of branches. Just consider the ABL/SBL mechanisms: can preexecution of branches be helpful? Without visibility on validity. With visibility on validity (in SYRANT): to be done.
55
55 Just use preexecution to guide the branch prediction
56
56 Summary on SYRANT. Control independence exists and can potentially be exploited through a SYRANT-like mechanism: still to be improved/understood; need to understand retroactions. Can exploit pre-execution of branches: reduce the misprediction rate.
57
57
58
58 Back Ups
59
59 [Figure: instruction sequences on the taken path (T1-T9) and on the not-taken path (T1, T2, N3-N9) with their physical register assignments, from the branch to the reconvergence point; instructions are marked as Control Dependent, CIDD or CIDI]
60
60 Updating the U counter. If (Altpred ≠ Pred) then: Pred = taken: U = U + 1; Pred ≠ taken: U = U - 1. Graceful aging: periodic shift of all U counters, implemented through the reset of a single bit.
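The update rule translates directly into a small function. This sketch assumes a 2-bit saturating U counter (0..3), matching the entry description earlier in the deck; the function name is hypothetical and the aging step is left out.

```python
def update_useful(u, pred, altpred, taken, u_max=3):
    """Update the useful counter of the provider entry after resolution."""
    if altpred != pred:
        if pred == taken:
            return min(u + 1, u_max)   # entry was useful: it beat Altpred
        return max(u - 1, 0)           # entry was harmful: Altpred was right
    return u                           # no information when both agree
```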
61
61 Allocating a new entry on a misprediction. Find a single “useless” entry with a longer history. Privilege the smallest possible history, to minimize footprint, but not too much, to avoid ping-pong phenomena. Initialize Ctr as weak and U as zero.
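One way to favor short histories "but not too much" is to walk the candidate tables from shortest to longest history and accept each with some probability. The sketch below assumes a probability of 1/2 per step; that value, the function name and the injectable `rand` parameter are illustrative assumptions, not taken from the slides.

```python
import random


def choose_allocation(u_values, provider, rand=random.random):
    """Pick the tagged table in which to allocate after a misprediction.

    u_values[i]: useful counter of the entry selected in tagged table i,
                 with tables ordered by increasing history length.
    provider:    index of the providing tagged table, or -1 for bimodal.
    Returns a table index, or None if no entry is "useless" (U == 0).
    """
    candidates = [i for i in range(provider + 1, len(u_values)) if u_values[i] == 0]
    if not candidates:
        return None
    for i in candidates[:-1]:
        if rand() < 0.5:        # prefer shorter histories, but not always
            return i
    return candidates[-1]
```

The chosen entry would then be initialized with a weak Ctr and U = 0, as the slide states.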
62
62 TAGE update policy General principle: Minimize the footprint of the prediction. Just update the longest history matching component and allocate at most one entry on mispredictions
63
63 [Figure labels: Instruction flow, Branch (if), (then), (else), incorrect path, Reconvergence point]
64
64 SYmmetric Resource Allocation on Not-taken and Taken paths. Physical registers. [Figure: taken path and not-taken path between the branch and the reconvergence point, each mapped onto physical registers P0-P8, with a gap of unused registers]
65
65 [Figure: ABL and SBL, each with columns Branch, CICI, Direction; entries B1-B7 with values 1, 12, 17, 22, 23, 29, 40]