1 Two research studies related to branch prediction and instruction sequencing André Seznec INRIA/IRISA.

Presentation transcript:

1 Two research studies related to branch prediction and instruction sequencing André Seznec INRIA/IRISA

2 Storage Free Confidence Estimator for the TAGE predictor

3 Why confidence estimation for branch predictors? Energy/performance tradeoffs: guiding fetch gating or fetch throttling; dynamic resizing of speculative structures. Controlling SMT resource allocation through fetch policies: fetch the “most” useful instructions. Dual-path execution.

4 What is confidence estimation? Assign a confidence to a prediction: is it likely that the prediction is correct? Generally, only low and high confidence predictions are discriminated. High confidence: « very likely » to be correct. Low confidence: « not so likely » to be correct.

5 Confidence estimation for branch predictors. 1981, Jim Smith: predictions from weak counters are more likely to be mispredicted. 1996, Jacobsen, Rotenberg, Smith (JRS): gshare-like table of 4-bit counters; increment on a correct prediction, reset on a misprediction; low confidence < threshold ≤ high confidence. 1998, enhanced JRS, Grunwald et al.: use the prediction in the index. A few other proposals, e.g. self confidence for perceptrons. Most studies still use enhanced JRS confidence estimators.
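
The JRS scheme listed above can be sketched as a table of resetting counters. This is only an illustration under stated assumptions: the table size, the exact index hash and the threshold value are hypothetical choices for the example, not values given in the slides.

```python
class JRSConfidenceEstimator:
    """Resetting-counter confidence table in the spirit of JRS (illustrative sketch)."""

    def __init__(self, log_entries=10, ctr_bits=4, threshold=15, enhanced=True):
        self.mask = (1 << log_entries) - 1
        self.ctr_max = (1 << ctr_bits) - 1   # 4-bit counters saturate at 15
        self.threshold = threshold           # assumed value, not from the slides
        self.enhanced = enhanced             # enhanced JRS: prediction is part of the index
        self.table = [0] * (1 << log_entries)

    def _index(self, pc, history, prediction):
        idx = pc ^ history                   # gshare-like hash (assumed form)
        if self.enhanced:
            idx = (idx << 1) | int(prediction)
        return idx & self.mask

    def is_high_confidence(self, pc, history, prediction):
        return self.table[self._index(pc, history, prediction)] >= self.threshold

    def update(self, pc, history, prediction, correct):
        i = self._index(pc, history, prediction)
        # Increment on a correct prediction, reset on a misprediction.
        self.table[i] = min(self.table[i] + 1, self.ctr_max) if correct else 0
```

A single misprediction sends the counter back to zero, so an entry is reported as high confidence only after a long enough run of correct predictions.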

6 Metrics for confidence estimators (Grunwald et al., 1998). SENS, sensitivity: fraction of correct predictions classified as high confidence. PVP, predictive value of a positive test: probability that a high confidence prediction is correct. SPEC, specificity: fraction of mispredictions classified as low confidence. PVN, predictive value of a negative test: probability that a low confidence prediction is mispredicted. Different qualities matter for different usages.
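
As a small worked illustration of the four metrics above, the sketch below computes them from the four outcome counts of a two-class (high/low) estimator. The function and argument names are hypothetical; only the definitions come from the slide.

```python
def confidence_metrics(high_correct, high_wrong, low_correct, low_wrong):
    """Compute SENS, PVP, SPEC and PVN from prediction counts.

    high_correct: correct predictions classified as high confidence
    high_wrong:   mispredictions classified as high confidence
    low_correct:  correct predictions classified as low confidence
    low_wrong:    mispredictions classified as low confidence
    All four class totals are assumed to be non-zero.
    """
    correct = high_correct + low_correct
    wrong = high_wrong + low_wrong
    return {
        "SENS": high_correct / correct,                    # correct -> high conf.
        "PVP": high_correct / (high_correct + high_wrong),  # high conf. -> correct
        "SPEC": low_wrong / wrong,                          # mispredicted -> low conf.
        "PVN": low_wrong / (low_correct + low_wrong),       # low conf. -> mispredicted
    }
```

For instance, with 900 correct and 10 wrong high confidence predictions, and 50 correct and 40 wrong low confidence predictions, SPEC is 40/50 = 0.8 while PVN is only 40/90.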

7 The current limits of confidence prediction. Discriminating between high and low confidence is insufficient: what is the misprediction rate on high and on low confidence? Malik et al.: use a probability for each counter value on an enhanced JRS. Enhanced JRS and state-of-the-art branch predictors? Each predictor needs its own confidence estimator.

8 This study. Cost-effective confidence estimator for TAGE, with no storage overhead. Discriminates three classes: low confidence predictions, ≈ 30 % misprediction rate or more; medium confidence predictions, 8-15 % misprediction rate; high confidence predictions, < 1 % misprediction rate.

9 TAGE: multiple tables, global history predictor. The set of history lengths forms a geometric series, e.g. {0, 2, 4, 8, 16, 32, 64, 128}. What is important: L(i)-L(i-1) increases drastically with i, so most of the storage is devoted to short histories, while correlation on very long histories is still captured.
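
A geometric series of history lengths such as the one above can be generated from the shortest and longest lengths. The sketch below is one common way to do it, with rounding to the nearest integer; the function name and parameters are illustrative assumptions, not taken from the slides.

```python
def geometric_history_lengths(num_tagged, l_min, l_max):
    """History lengths of the tagged tables, from l_min to l_max, in geometric progression."""
    if num_tagged == 1:
        return [l_min]
    ratio = (l_max / l_min) ** (1.0 / (num_tagged - 1))
    # Round each length to the nearest integer.
    return [int(l_min * ratio ** i + 0.5) for i in range(num_tagged)]


# The base (tagless) predictor uses no history, hence the leading 0.
lengths = [0] + geometric_history_lengths(7, 2, 128)
print(lengths)
```

With 7 tagged tables, a shortest length of 2 and a longest of 128, the ratio is 2 and the example series of the slide is obtained.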

10 TAGE: geometric history length + PPM-like + optimized update policy. [Figure: a tagless base predictor indexed with pc, and tagged tables indexed with hashes of pc and h[0:L1], h[0:L2], h[0:L3]; each tagged entry holds ctr, u and tag; the tag comparisons (=?) select the prediction.]

11

12 Prediction computation. General case: the longest matching component provides the prediction (Pred). Special case: newly allocated entries (weak Ctr) cause many mispredictions; on many applications, the alternate prediction (Altpred) is then more accurate than Pred. This property is dynamically monitored through a single 4-bit counter.

13 A tagged table entry holds three fields: Tag, Ctr, U. Ctr: 3-bit prediction counter. U: 2-bit useful counter (was the entry recently useful?). Tag: partial tag.
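
The prediction computation and the entry layout described above can be sketched as follows. This is a minimal illustration, not the reference TAGE code: the signed encoding of Ctr, the treatment of the 4-bit monitoring counter as an unsigned value compared against 8, and all names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TaggedEntry:
    tag: int       # partial tag
    ctr: int = 0   # 3-bit prediction counter, assumed signed: -4..3, taken if >= 0
    u: int = 0     # 2-bit useful counter, 0..3


def tage_predict(base_taken: bool,
                 hits: List[Optional[TaggedEntry]],
                 use_alt_on_weak: int) -> Tuple[bool, int]:
    """Return (prediction, provider table); provider is -1 for the base predictor.

    hits[i] is the tag-matching entry of tagged table i (shortest history
    first), or None when table i misses.
    use_alt_on_weak is the single 4-bit monitoring counter (assumed 0..15).
    """
    matching = [i for i, e in enumerate(hits) if e is not None]
    if not matching:
        return base_taken, -1
    provider = matching[-1]                 # longest matching history
    pred = hits[provider].ctr >= 0
    if len(matching) > 1:                   # alternate: next longest match
        alt_table = matching[-2]
        altpred = hits[alt_table].ctr >= 0
    else:                                   # otherwise the base predictor
        alt_table, altpred = -1, base_taken
    weak = hits[provider].ctr in (-1, 0)    # likely a newly allocated entry
    if weak and use_alt_on_weak >= 8:       # assumed midpoint threshold
        return altpred, alt_table
    return pred, provider
```

Returning the provider table along with the direction matters later in the talk, since the confidence classes are derived from exactly these two observations.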

14 Confidence by observation on TAGE. Apart from the prediction, the predictor delivers the provider component and the value of the prediction counter: both are highly correlated with the quality of the predictions. The history of mispredictions can also be observed: a burst of mispredictions might indicate predictor warm-up or a program phase change.

15 Experimental framework. 20 traces from CBP-1 and 20 traces from CBP-2. 16Kbits TAGE: 5 tables, max history 80 bits. 64Kbits TAGE: 8 tables, max history 130 bits. 256Kbits TAGE: 9 tables, max history 300 bits. Probability of misprediction as the metric of confidence: Mispredictions Per Kilo-predictions (MKP).

16 Bimodal as the provider component. It provides many (often most) of the predictions: allocation of a tagged table entry happens on a misprediction, so generally bimodal prediction = the bias of the branch. 256Kbits TAGE: bimodal = very accurate prediction, often less than 1 MKP, always significantly lower than the global misprediction rate. 16Kbits TAGE: often bimodal = very accurate prediction, but on demanding applications bimodal is not better than average.

17 Discriminating the bimodal predictions. Weak counters: systematically more than 250 MKP (generally more than 300 MKP); can be classified as low confidence. « Identify » conflicts due to limited predictor size: was there a misprediction provided by the bimodal recently (10 last branches)? ≈ MKP for 16Kbits, ≈ 50-70 MKP for 64Kbits; can be classified as medium confidence. The remaining predictions: high confidence, < 10 MKP, generally much less.

18 A tagged component as the provider. Discriminate on the value of the prediction counter:
                 |2ctr+1|   TAGE 16Kbits   TAGE 256Kbits
  Weak           1          MKP            325 MKP
  Nearly Weak    3          MKP            312 MKP
  Nearly Sat.    5          MKP            225 MKP
  Saturated      7          29 MKP         17 MKP

19 Tagged component as provider: a more thorough analysis. Weak, Nearly Weak, Nearly Saturated: for all benchmarks and for the three TAGE configurations, in the range of 200 MKP or higher. Saturated: slightly lower than the global misprediction rate of the application; very high confidence for predictable applications (< 10 MKP), not that high confidence for poorly predictable applications (> 50 MKP). Problem: Saturated often represents more than 50 % of the predictions.

20

21 Intermediate summary. High confidence class: bimodal saturated, with no recent misprediction by bimodal. Low confidence class: bimodal weak, and non-saturated tagged. Medium confidence class: bimodal with a recent misprediction by bimodal. Tagged saturated: depends on the application, the predictor size, etc.; a very large class.

22 Tweaking the predictor to improve confidence

23 How to improve confidence on the tagged saturated-counter class? Widening the prediction counter? Not that good: slightly decreased accuracy, and only a marginal improvement in accuracy on the saturated class. Modifying the counter update: transition to the saturated state with a very low probability; P = 1/128 in our experiments; marginal accuracy loss (≈ 0.02 MPKI).
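
The modified counter update can be sketched as below. It is an illustration under assumptions: the counter is taken to be a signed 3-bit value, the probabilistic filter is applied symmetrically to both saturated states, and the random draw is passed in by the caller (for example `random.random()`) so that the function stays deterministic.

```python
CTR_MAX = 3    # saturated taken (assumed signed 3-bit counter)
CTR_MIN = -4   # saturated not-taken


def update_prediction_counter(ctr, taken, draw, p_saturate=1 / 128):
    """Update a prediction counter; the last step into saturation is probabilistic.

    draw is a uniform random number in [0, 1) supplied by the caller.
    """
    if taken:
        if ctr == CTR_MAX:
            return ctr
        if ctr == CTR_MAX - 1 and draw >= p_saturate:
            return ctr            # stay nearly saturated most of the time
        return ctr + 1
    if ctr == CTR_MIN:
        return ctr
    if ctr == CTR_MIN + 1 and draw >= p_saturate:
        return ctr
    return ctr - 1
```

Because an entry now needs many consecutive correct updates on average before it saturates, the saturated state becomes a much stronger indication that the branch is reliably predicted.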

24 Towards 3 confidence classes.
Tagged Saturated is high confidence:
            16 Kbits   64 Kbits   256 Kbits
  Maximum   16 MKP     13 MKP     12 MKP
  Average   4 MKP 2 MKP
Nearly Saturated is enlarged and is medium confidence:
            16 Kbits   64 Kbits   256 Kbits
  Maximum   169 MKP    173 MKP    174 MKP
  Average   85 MKP     71 MKP     73 MKP

25 Towards 3 confidence classes. Low confidence: weak bimodal + weak tagged + nearly weak tagged. Medium confidence: bimodal recently mispredicted + nearly saturated tagged. High confidence: bimodal saturated + saturated tagged.
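
The three classes above need only the provider component, its counter value and a short record of recent bimodal mispredictions. The sketch below illustrates this under assumptions: the bimodal counter is taken to be an unsigned 2-bit value (weak states 1 and 2), the tagged counter a signed 3-bit value, and the 10-branch window is kept in a small queue; all names are hypothetical.

```python
from collections import deque


class BimodalMispredictionWindow:
    """Remembers whether the bimodal provided a misprediction in the last branches."""

    def __init__(self, window=10):
        self.recent = deque([False] * window, maxlen=window)

    def record(self, bimodal_provided, mispredicted):
        self.recent.append(bimodal_provided and mispredicted)

    def any_recent(self):
        return any(self.recent)


def confidence_class(provider_is_bimodal, ctr, bimodal_recent_misp):
    """Classify a prediction as 'low', 'medium' or 'high' confidence."""
    if provider_is_bimodal:
        if ctr in (1, 2):                  # weak 2-bit counter (assumed 0..3)
            return "low"
        return "medium" if bimodal_recent_misp else "high"
    strength = abs(2 * ctr + 1)            # 1, 3, 5 or 7 for a signed 3-bit counter
    if strength == 7:                      # saturated
        return "high"
    if strength == 5:                      # nearly saturated
        return "medium"
    return "low"                           # weak or nearly weak
```

Nothing here is stored per branch: the only state beyond the predictor itself is the short window, which is in line with the storage-free claim of the talk.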

26 Prediction and misprediction coverage.
              high conf   medium conf   low conf
  16Kbits     (5)         (85)          (317)
  64Kbits     (3)         (71)          (316)
  256 Kbits   (2)         (73)          (325)
Legend: misprediction rate (in parentheses), prediction coverage, misprediction coverage.

27 Behavior examples, 64Kbits.
                high conf   medium conf   low conf
  twolf MPKI    (13)        (137)         (390)
  gcc MPKI      (3)         (51)          (295)
  vortex MPKI   (0)         (110)         (207)
Legend: misprediction rate (in parentheses), prediction coverage, misprediction coverage.

28 [Figure: distribution of predictions and of mispredictions over the low, medium and high confidence classes.]

29 Summary on confidence estimation. Many studies on applications of confidence estimation, but very few on confidence estimators. Each predictor requires a different confidence estimator. A very cost-effective and efficient confidence estimator for TAGE: storage free, very limited logic. Discriminates between 3 confidence classes: medium + low confidence cover > 90 % of the mispredictions; high confidence is in the range of 1 % mispredictions or less.

30 SYRANT with Nathanael Prémillieu « Moderate cost » control independence exploitation

31 Why ? Branch pred. accuracy is reaching a plateau: TAGE 2006, ? Try something else..

32 Control flow reconvergence. [Figure: instruction flow through a branch (if); the taken path (else) and the not-taken path rejoin at the reconvergence point.]

33 Exploiting control flow reconvergence. Misprediction! Can we save some useful work after the reconvergence point?

34 [Figure: instruction classes around a mispredicted branch. Labels: Control Dependent (CD); Control Independent Data Independent (CIDI); reconvergence point; Control Independent Data Dependent (CIDD). Annotations: should be conserved; to be detected; to invalidate.]

35 Difficulties. Not the same renaming scheme on both paths: how to conserve results? Identification of the reconvergence point: check against all previously fetched instructions on the wrong path? Identification of CIDI and CIDD instructions?

36 SYmmetric Resource Allocation on Not-taken and Taken paths (SYRANT). Resources: physical registers (also LSQ entries, ROB entries). Insert gaps to reuse the same physical registers. [Figure: taken path and not-taken path between the branch and the reconvergence point, each mapped onto physical registers P0 to P8, with a gap of unused registers.]
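
One way to picture the gap insertion above: if both paths are made to consume the same number of physical registers between the branch and the reconvergence point, the instructions after reconvergence receive the same registers whichever path was followed. The sketch below only illustrates that arithmetic. It assumes registers are allocated in circular order, which the slides do not state, and all names are hypothetical.

```python
def gaps_for_symmetric_allocation(taken_regs, not_taken_regs):
    """Return (gap on taken path, gap on not-taken path), in registers.

    taken_regs / not_taken_regs: registers consumed by each path between
    the branch and the reconvergence point.
    """
    reserved = max(taken_regs, not_taken_regs)
    return reserved - taken_regs, reserved - not_taken_regs


def first_register_after_reconvergence(next_free_at_branch, taken_regs,
                                       not_taken_regs, num_physical):
    """Register given to the first instruction after reconvergence.

    Assumes circular allocation over num_physical registers; the result is
    the same on both paths once the gap has been inserted.
    """
    reserved = max(taken_regs, not_taken_regs)
    return (next_free_at_branch + reserved) % num_physical
```

The shorter path pays for the symmetry with unused registers, which is why the size of the gap becomes an issue later in the talk.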

37 Register validity through a tagging process at the rename stage, at refetch. On a misprediction, increment the tag: X to Y. [Figure: execution timeline with the branch and the reconvergence point, comparing the predicted path (instructions tagged X1 ... X9) with the corrected path (refetched instructions tagged X or Y), together with their register operands.]

38 Conserve tag and validity if 1) same instruction, 2) same operands, including tags. [Figure: same tagged instruction sequences as on the previous slide.]
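
The conservation rule above can be read as a comparison done when an instruction is renamed again after the misprediction. The sketch below is one interpretation of the slides, not the authors' mechanism: tags are modelled as integers (X = 0, Y = 1), and the record type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RenamedInstr:
    pc: int                  # identifies the instruction
    dest: int                # physical destination register
    srcs: Tuple[int, ...]    # physical source registers


def dest_tag_after_refetch(old: Optional[RenamedInstr], new: RenamedInstr,
                           old_dest_tag: int,
                           old_src_tags: Tuple[int, ...],
                           new_src_tags: Tuple[int, ...],
                           current_tag: int) -> int:
    """Tag of the destination register after renaming `new` on the corrected path.

    old: the instruction that used the same slot on the wrong path, if any.
    The old tag (and so the old result) is conserved only for the same
    instruction reading the same operands with the same tags.
    """
    same_instruction = (old is not None and old.pc == new.pc
                        and old.dest == new.dest)
    same_operands = (same_instruction and old.srcs == new.srcs
                     and old_src_tags == new_src_tags)
    return old_dest_tag if same_operands else current_tag
```

Under this reading, a control independent, data independent instruction keeps its tag and its result, while an instruction with a re-tagged source takes the new tag and must be executed again.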

43 Reconvergence detection Precise detection would require checking every PC for each instruction Use approximate detection Detect the first branch after reconvergence

44 Approximate detection of the reconvergence point. Copy the wrong path on branch misprediction detection. [Figure: Active Branch List (ABL) and Shadow Branch List (SBL); each entry records Branch, NbR and Direction (T/NT); the SBL holds the wrong-path branches B3, B4, B5, B6, ...]
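
The approximate detection above can be sketched with two lists of branch addresses. This is an illustrative reading of the slides under assumptions: only branch PCs are tracked, the per-entry NbR and direction fields are left out, and the class and method names are hypothetical.

```python
class ReconvergenceDetector:
    """Approximate reconvergence detection with an ABL and an SBL (sketch)."""

    def __init__(self):
        self.abl = []   # Active Branch List: branches fetched on the current path
        self.sbl = []   # Shadow Branch List: wrong-path branches kept after a misprediction

    def fetch_branch(self, pc):
        """Record a fetched branch.

        Returns the position of the branch in the SBL if it was also fetched
        on the wrong path (first branch after reconvergence), else None.
        """
        self.abl.append(pc)
        if pc in self.sbl:
            position = self.sbl.index(pc)
            self.sbl = []               # reconvergence found, stop monitoring
            return position
        return None

    def mispredict(self, branch_pc):
        """The most recent occurrence of branch_pc was mispredicted."""
        pos = len(self.abl) - 1 - self.abl[::-1].index(branch_pc)
        self.sbl = self.abl[pos + 1:]   # copy the wrong path to the SBL
        del self.abl[pos + 1:]          # squash it from the ABL
```

The returned position says how many wrong-path branches preceded the reconvergence, which is the kind of information needed to compare the resource consumption of the two paths.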

45 Allows monitoring of the resource consumption on both paths. [Figure: ABL with B1, B2 followed by the corrected-path branches B'3, B'4, B'5, ...; SBL with the wrong-path branches B3, B4, B5, B6, ...]

46 Determine the gap, then use the gap. [Figure: branches B1 and B2 with reconvergence points RP1 and RP2 on the taken, not-taken and wrong (WP) paths.]

47 Gap size issue. The two paths may be very different, sometimes by 100’s of instructions: waste of resources. Different filters: only try when the gap size is limited; only try if the wrong path was the longest; only try if branch confidence is low (or medium); only try if reconvergence point/gap confidence is high.

48 Continue execution after branch misprediction resolution On « normal » superscalar processors: Kill every instruction after the misprediction Control independence exploitation: Let execution continue until resources are claimed back Phantom execution

49 Preliminary performance evaluation 8-way superscalar, deep pipeline 20-stage Very large instruction window TAGE predictor SPEC 2006

50 Reconvergence is detected in most cases

51 Some speed-up but relatively poor

52 4-way issue processor

53 That’s preliminary.. No gap size limit on the predicted path No discrimination on medium/low confidence No retroaction on branch prediction Just did not use the computed path.

54 Preexecution of branches. Just consider the ABL/SBL mechanisms: can preexecution of branches be helpful? Without visibility on validity; with visibility on validity (in SYRANT): to be done.

55 Just use preexecution to guide the branch prediction

56 Summary on SYRANT. Control independence exists and can potentially be exploited through a SYRANT-like mechanism: still to be improved/understood; need to understand retroactions. Can exploit pre-execution of branches: reduce misprediction rate.

57

58 Back Ups

59 [Figure: execution on the taken path (instructions T1 ... T9) and on the not-taken path (T1, T2, N3 ... N9) with their register operands, from the branch to the reconvergence point; instructions are marked as control dependent, CIDD or CIDI.]

60 Updating the U counter. If Altpred ≠ Pred: when Pred is correct (Pred = taken), U = U + 1; when Pred is wrong (Pred ≠ taken), U = U - 1. Graceful aging: periodic shift of all U counters, implemented through the reset of a single bit.
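
The update rule above can be sketched as follows. The saturation bounds follow from the 2-bit U counter; the aging function is an assumption about how "the reset of a single bit" is applied, with the caller choosing which of the two bits is cleared.

```python
U_MAX = 3  # 2-bit useful counter


def update_useful(u, pred, altpred, taken):
    """Update U only when the provider and the alternate prediction differ."""
    if pred == altpred:
        return u
    if pred == taken:
        return min(u + 1, U_MAX)   # the entry was useful
    return max(u - 1, 0)           # the entry was harmful


def age_useful(u_counters, reset_msb):
    """Graceful aging: clear one bit of every U counter (assumed scheme)."""
    keep_mask = 0b01 if reset_msb else 0b10
    return [u & keep_mask for u in u_counters]
```

Clearing a single bit at a time lets entries lose their "useful" status gradually instead of all at once.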

61 Allocating a new entry on a misprediction. Find a single “useless” entry with a longer history. Privilege the smallest possible history, to minimize footprint; but not too much, to avoid ping-pong phenomena. Initialize Ctr as weak and U as zero.
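
The allocation policy above can be sketched as a biased random choice among the useless entries found in the longer-history tables. The probability of 1/2 for keeping the shortest candidate, the signed encoding of Ctr, and all names are assumptions made for the illustration.

```python
import random


def pick_table_to_allocate(provider, useful, rng, p_keep=0.5):
    """Choose the tagged table in which to allocate, or None.

    provider: index of the provider table (-1 for the base predictor).
    useful:   U value of the entry selected in each tagged table,
              shortest history first.
    rng:      object with a random() method, e.g. random.Random().
    """
    candidates = [t for t in range(provider + 1, len(useful)) if useful[t] == 0]
    if not candidates:
        return None
    # Favour the shortest history, but not always, to avoid ping-pong.
    for t in candidates[:-1]:
        if rng.random() < p_keep:
            return t
    return candidates[-1]


def new_entry_fields(taken):
    """Initial fields: weak counter in the direction of the outcome, U = 0."""
    return {"ctr": 0 if taken else -1, "u": 0}
```

For example, `pick_table_to_allocate(0, [0, 0, 0, 0], random.Random())` returns table 1 about half of the time, and table 2 or 3 otherwise.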

62 TAGE update policy General principle: Minimize the footprint of the prediction. Just update the longest history matching component and allocate at most one entry on mispredictions

63 [Figure: instruction flow through a branch (if) with its (then) and (else) paths, the incorrect path, and the reconvergence point.]

64 SYmmetric Resource Allocation on Not-taken and Taken paths. [Figure: taken path and not-taken path between the branch and the reconvergence point, each mapped onto physical registers P0 to P8, with a gap of unused registers.]

65 [Figure: ABL and SBL entries for branches B1 ... B6, each with Branch, CICI and Direction (T/NT) fields.]