TAGE-SC-L Again MTAGE-SC

TAGE-SC-L Again MTAGE-SC André Seznec INRIA/IRISA

Where do these predictors come from?
- GEHL: CBP 2004, ISCA 2005
- TAGE: JILP 2006, CBP 2006
- Statistical correlation: CBP 2011
- Combining more info: Micro 2011, CBP 2014, Micro 2015
- Optimizing everything: CBP 2016
- Unlimited: CBP 2014 → CBP 2016

« Long » global histories, around 2002
- Introduction of the perceptron predictor (Jimenez01)
- State of the art: the EV8 predictor
- Both able to handle « long » global histories: 30+ branches
- With EV8-like predictors: lagging behind the perceptron on a few benchmarks; some applications would benefit from 100+ history bits

GEOMETRIC HISTORY LENGTH PREDICTOR CBP 2004

A multiple-length global history predictor, with a limited number of tables
[Figure: tables T0 to T4, history lengths L(1) to L(4), outputs combined by an adder Σ.]

Underlying idea
- H and H’ are two history vectors equal on N bits, but differing on bit N+1, e.g. L(1) ≤ N < L(2)
- Branches (A,H) and (A,H’) are biased in opposite directions
- Table T2 should allow discriminating between (A,H) and (A,H’)

GEometric History Length predictor
- The set of history lengths forms a geometric series: {0, 2, 4, 8, 16, 32, 64, 128}
- What is important: L(i)-L(i-1) increases drastically
- Most of the storage is spent on short histories!!
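As a hedged illustration of the series on this slide, the sketch below derives geometric history lengths from a minimum length, a maximum length and a table count. The function name `geometric_history_lengths` and its parameters are assumptions made for this example, not names from the talk.

```python
def geometric_history_lengths(l_min, l_max, n_tables):
    """Return n_tables history lengths growing geometrically from l_min to l_max."""
    if n_tables < 2 or l_min <= 0 or l_max < l_min:
        raise ValueError("need at least two tables and 0 < l_min <= l_max")
    # Common ratio of the geometric series.
    alpha = (l_max / l_min) ** (1.0 / (n_tables - 1))
    # Round to the nearest integer number of history bits.
    return [int(l_min * alpha ** i + 0.5) for i in range(n_tables)]


# Table T0 uses no history (length 0); the other tables follow the series.
lengths = [0] + geometric_history_lengths(2, 128, 7)
# Gap between consecutive lengths, i.e. L(i) - L(i-1).
gaps = [b - a for a, b in zip(lengths, lengths[1:])]
```

With these inputs the lengths reproduce the set on the slide, and the gaps grow with the table index, which is the property the slide calls important.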

GEHL (CBP 2004)
- Neural inspired
- Use of 200+ bits of global history
- Narrow counters
- Dynamic threshold update
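The ingredients listed on this slide (summed narrow counters, geometric history lengths, dynamic threshold update) can be sketched as a toy predictor. This is a minimal sketch under stated assumptions, not the CBP 2004 submission: the class name `TinyGEHL`, the table sizes, the XOR-folding index function and the threshold-adaptation constant 32 are all illustrative choices.

```python
class TinyGEHL:
    """Toy GEHL-style predictor; sizes and hashing are illustrative assumptions."""

    def __init__(self, lengths=(0, 2, 4, 8, 16), log_size=10, ctr_bits=4):
        self.lengths = lengths
        self.log_size = log_size
        self.cmax = (1 << (ctr_bits - 1)) - 1      # narrow signed counters
        self.cmin = -(1 << (ctr_bits - 1))
        self.tables = [[0] * (1 << log_size) for _ in lengths]
        self.hist = 0                 # global history, newest outcome in bit 0
        self.theta = len(lengths)     # update threshold
        self.tc = 0                   # counter steering the threshold

    def _index(self, pc, length):
        # Fold the `length` most recent history bits down to log_size bits.
        mask = (1 << self.log_size) - 1
        h = self.hist & ((1 << length) - 1)
        folded = 0
        while h:
            folded ^= h & mask
            h >>= self.log_size
        return (pc ^ folded) & mask

    def _sum(self, pc):
        idx = [self._index(pc, n) for n in self.lengths]
        s = sum(2 * t[i] + 1 for t, i in zip(self.tables, idx))
        return s, idx

    def predict(self, pc):
        # The prediction is the sign of the sum of the selected counters.
        return self._sum(pc)[0] >= 0

    def update(self, pc, taken):
        s, idx = self._sum(pc)
        mispredicted = (s >= 0) != taken
        low_confidence = abs(s) <= self.theta
        if mispredicted or low_confidence:
            for t, i in zip(self.tables, idx):
                t[i] = min(self.cmax, t[i] + 1) if taken else max(self.cmin, t[i] - 1)
        # Dynamic threshold: raise it after many mispredictions, lower it
        # after many correct but low-confidence predictions.
        if mispredicted:
            self.tc += 1
        elif low_confidence:
            self.tc -= 1
        if self.tc >= 32:
            self.theta, self.tc = self.theta + 1, 0
        elif self.tc <= -32:
            self.theta, self.tc = max(1, self.theta - 1), 0
        self.hist = ((self.hist << 1) | int(taken)) & ((1 << 64) - 1)
```

A caller would invoke `predict(pc)` at fetch time and `update(pc, taken)` once the branch outcome is known; an alternating taken/not-taken branch is learnt after a short warm-up because the tables with nonzero history length separate the two contexts.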

TAGE: TAgged GEometric history length predictor (JILP 2006)

At CBP 2004, only neural predictors, apart from the PPM-like predictor (Michaud 2004), but .. its update policy was poor

TAGE (JILP 2006)
- Partial tag match almost ..
- Geometric history length
- Very effective update policy

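TAGE: tagged tables, and prediction by the longest-history matching entry
[Figure: a tagless base predictor indexed with pc, plus tagged tables indexed with pc and h[0:L1], h[0:L2], h[0:L3]. Each tagged entry holds ctr, u and tag; a tag comparison (=?) on each table reports hit or miss, and the hits select Pred and Altpred.]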

Prediction computation
- General case: the longest matching component provides the prediction
- Special case: many mispredictions on newly allocated entries (weak Ctr)
  - On many applications, Altpred is more accurate than Pred
  - Property dynamically monitored through 4-bit counters
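The selection rule on this slide can be sketched as a small function. This is a hedged sketch: the name `tage_predict`, the signed 3-bit counter encoding (taken when `ctr >= 0`, weak when `ctr` is -1 or 0) and the convention that the monitoring counter `use_alt_on_na` favours Altpred when non-negative are assumptions of this example.

```python
def tage_predict(base_taken, ctrs, use_alt_on_na):
    """Return (final prediction, altpred, provider index or None).

    base_taken    -- prediction of the tagless base predictor
    ctrs          -- one item per tagged table, shortest history first:
                     the signed counter of the matching entry, or None on a tag miss
    use_alt_on_na -- signed monitoring counter; >= 0 means "prefer Altpred
                     when the provider entry is weak (newly allocated)"
    """
    matching = [i for i, c in enumerate(ctrs) if c is not None]
    if not matching:
        return base_taken, base_taken, None
    provider = matching[-1]                      # longest matching history
    pred = ctrs[provider] >= 0
    # Altpred comes from the next longest match, else from the base predictor.
    altpred = ctrs[matching[-2]] >= 0 if len(matching) > 1 else base_taken
    weak = ctrs[provider] in (-1, 0)
    final = altpred if (weak and use_alt_on_na >= 0) else pred
    return final, altpred, provider
```

For instance, with a weak taken provider in the longest table and a strongly not-taken entry in a shorter table, the result follows the shorter table only while the monitoring counter says Altpred is the better choice.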

A tagged table entry: Tag | Ctr | U
- Ctr: 3-bit prediction counter
- U: 1- or 2-bit counter. Was the entry recently useful?
- Tag: partial tag

Allocate entries on mispredictions
- Allocate entries in longer history length tables
- On tables with U unset
- Set Ctr to Weak and U to 0
- Limited storage budget: allocate 2 entries (when 15 to 20 different history lengths)
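The allocation policy on this slide can be sketched as follows. This is a minimal sketch under assumptions: the `Entry` dataclass, the `allocate` helper, its `max_alloc` parameter and the signed counter encoding (weak taken = 0, weak not taken = -1) are hypothetical names and conventions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    tag: int = 0
    ctr: int = 0   # signed 3-bit counter: 0 = weak taken, -1 = weak not taken
    u: int = 0     # useful counter


def allocate(candidates, taken, max_alloc=2):
    """Allocate up to max_alloc entries after a misprediction.

    candidates -- (entry, new_tag) pairs, one per table with a longer history
                  than the provider, shortest history first
    Returns the number of entries allocated.
    """
    allocated = 0
    for entry, new_tag in candidates:
        if allocated == max_alloc:
            break
        if entry.u == 0:                       # only steal entries with U unset
            entry.tag = new_tag
            entry.ctr = 0 if taken else -1     # weak counter in the outcome direction
            entry.u = 0
            allocated += 1
    return allocated
```

Entries whose U counter is set are skipped, so a table full of useful entries makes allocation fail, which is the situation the useful-counter management has to resolve.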

Managing the (U)seful counter
- Increment when the entry avoids a misprediction: (Pred = taken) & (Altpred ≠ taken)
  - The entry becomes « useful »
- Global decrement when it becomes « difficult » to allocate
  - Many possible heuristics (« difficult » ≈ 2/3 of the entries useful)
  - → CBP 2016 heuristics: ≈ 0.5 % MPKI
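One way to read this slide is sketched below. The slide says many heuristics are possible, so this is only one illustrative choice and not necessarily the CBP 2016 one: the allocation-pressure counter `tick`, the constants `U_MAX` and `TICK_MAX`, and halving as the form of global decrement are assumptions of this example.

```python
U_MAX = 3          # 2-bit useful counter (assumed width)
TICK_MAX = 1024    # assumed saturation point of the allocation monitor


def update_useful(u, pred, altpred, taken):
    """Increment u when the provider was right and the alternate prediction was wrong."""
    if pred == taken and altpred != taken:
        return min(U_MAX, u + 1)
    return u


class AllocationMonitor:
    """Detects that allocation has become difficult and ages the u counters."""

    def __init__(self):
        self.tick = 0

    def record(self, failures, successes, useful_counters):
        """Account for one allocation attempt; return the (possibly aged) u counters."""
        self.tick = max(0, self.tick + failures - successes)
        if self.tick >= TICK_MAX:
            self.tick = 0
            return [u >> 1 for u in useful_counters]   # global decrement
        return useful_counters
```

The monitor rises when allocations fail because candidate entries are useful and falls when they succeed, so the global decrement only fires once failures clearly dominate.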

TAGE vs GEHL
- At equal sizes: ≈ 10 % MPKI reduction
- May vary with individual benchmarks!

Optimizations for CBP2016: ≈ 2 % MPKI reduction
- Sharing storage space
  - Short histories share a bank-interleaved table, with a small tag (8 bits)
  - Long histories share a bank-interleaved table, with a longer tag (12 bits)
- Partial associativity
  - 2 banks for medium history lengths

TAGE + (G)SC: Statistical Corrector (global history), CBP 2011

From CBP 2011, « the Statistical Corrector targets »:
- Branches with poor correlation with history: sometimes better predicted by a single wide PC-indexed counter than by TAGE
- More generally, track cases such that « for this (PC, history, prediction), TAGE is statistically likely (>50 %) to mispredict »

TAGE-GSC (CBP 2011) (was named a posteriori in Micro 2015): ≈ 3-5 % MPKI reduction
- Just a global history neural predictor
- Plus tables indexed with PC, TAGE prediction and confidence
[Figure: the (main) TAGE predictor, indexed with PC + global history, passes its prediction and confidence to the Statistical Corrector, which is also indexed with PC + global history.]
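The correction step can be sketched as a sum of signed counters that overrides TAGE only when it disagrees with enough confidence. This is a simplified sketch, not the exact logic of the CBP submissions: the function name `sc_final_prediction`, the fixed `threshold` argument and the `2*c + 1` centering of counters are assumptions, and the table indexing (PC, global history, TAGE prediction and confidence) is left to the caller.

```python
def sc_final_prediction(tage_taken, counters, threshold):
    """Return the final direction after statistical correction.

    tage_taken -- direction predicted by TAGE
    counters   -- signed counters read from the corrector tables (bias tables
                  indexed with PC and TAGE outputs, plus history-indexed tables)
    threshold  -- minimum |sum| required to revert the TAGE prediction
    """
    s = sum(2 * c + 1 for c in counters)
    sc_taken = s >= 0
    if sc_taken != tage_taken and abs(s) >= threshold:
        return sc_taken          # TAGE is statistically likely to be wrong here
    return tage_taken
```

When the sum is small in magnitude, the corrector has no strong opinion and the TAGE prediction stands.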

Confidence for TAGE (HPCA 2011)
The value of the counter providing the prediction:
- Saturated = high confidence
- Intermediate = medium confidence
- Weak = low confidence
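The three confidence classes can be read directly off the providing counter, as in the following sketch. The function name `tage_confidence` and the signed counter encoding (a 3-bit counter ranging from -4 to 3) are assumptions of this example.

```python
def tage_confidence(ctr, ctr_bits=3):
    """Classify the confidence of a signed prediction counter."""
    magnitude = abs(2 * ctr + 1)            # 1, 3, 5 or 7 for a 3-bit counter
    if magnitude == (1 << ctr_bits) - 1:
        return "high"                       # saturated
    if magnitude == 1:
        return "low"                        # weak
    return "medium"                         # intermediate
```

No extra storage is involved: the class is a pure function of a counter the predictor already holds.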

Why does it work
The bias tables are indexed with PC + TAGE outputs:
- TAGE correct (most of the time): high counter value, dominates, not many updates
- TAGE wrong: other counters can be trained; (statistical) correlation, if it exists, can be captured

Optimizations for CBP 2016
- Use TAGE confidence for indexing SC: ≈ 1 % MPKI reduction
- On (very) low SC confidence: may use TAGE prediction (if high conf, ..): ≈ 0.4 % MPKI reduction

TAGE-SC: the beauty of neural predictors (Micro 2011, CBP 2014, Micro 2015)

From Compaq in 1999, I learnt:
- Use global history
- Avoid local history
OK, I cheated with loops
Did manage to submit only global history at CBP 2004, 2006 and 2011

Speculative history must be managed!?
- Local history: a table of histories (unspeculatively updated); must maintain a speculative history per in-flight branch: associative search, etc. ?!?
- Global history: append a bit to a single history register; use a circular buffer and just a pointer to speculatively manage the history
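The global-history case can be sketched with a circular buffer and a head pointer, where recovering from a misprediction only restores the pointer. This is a hedged sketch: the class `SpeculativeGlobalHistory`, its method names and the buffer size are assumptions, and it presumes fewer in-flight branches than buffer slots so that checkpointed bits are never overwritten.

```python
class SpeculativeGlobalHistory:
    """Global history kept in a circular buffer, managed with a single pointer."""

    def __init__(self, size=1024):
        self.size = size
        self.buf = [0] * size
        self.head = 0                      # number of bits pushed so far

    def push(self, taken):
        """Speculatively append the predicted direction of a branch."""
        self.buf[self.head % self.size] = int(taken)
        self.head += 1

    def checkpoint(self):
        """Value to store with an in-flight branch, taken before its push."""
        return self.head

    def recover(self, checkpoint, actual_taken):
        """On a misprediction: rewind the pointer, then push the real outcome."""
        self.head = checkpoint
        self.push(actual_taken)

    def recent(self, n):
        """The n most recent history bits, newest first."""
        return [self.buf[(self.head - 1 - i) % self.size] for i in range(n)]
```

Each in-flight branch only carries one small integer, in contrast with the per-branch speculative local histories described on the slide.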

Would not have won CBP 2014 without using local history

How to use local histories with TAGE+(G)SC
- Add the local history tables in the neural SC, as in the perceptron [Jimenez2002]
- ≈ 0.9 % MPKI reduction with 2 Kbits on the 8KB predictor
- ≈ 2.5 % MPKI reduction with 28 Kbits on the 64KB predictor
I DO NOT ADVOCATE FOR LOCAL HISTORIES IN REAL HARDWARE PROCESSORS

The beauty of neural predictors
- TAGE-SC: just the right framework to test information vectors
- Add extra tables: some benefit! Continue to explore

Can add extra components in SC
- IMLI-based components (Micro 2015): capture correlation in multidimensional loops. Very disappointing results: essentially no benefit on CBP5 traces
- Other forms of history, e.g. only backward branches

TAGE-SC-L + a loop predictor (just in case)

Loop predictor
- Can predict loop exit for loops with large iteration counts and a regular number of iterations
- Limited storage budget (a few entries)
- But marginal benefit
I DO NOT ADVOCATE FOR LOCAL HISTORIES IN REAL HARDWARE PROCESSORS
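A single loop-predictor entry can be sketched as a trip counter with a confidence counter. This is a minimal sketch under assumptions: the class `LoopEntry`, its field names and the confidence limit of 3 are illustrative, tags and entry replacement are omitted, and the loop branch is assumed to be taken on every execution except the exit.

```python
class LoopEntry:
    """One loop-predictor entry for a branch taken N-1 times, then not taken."""

    CONF_MAX = 3

    def __init__(self):
        self.trip = 0   # executions of the branch per loop run, learnt at loop exit
        self.cur = 0    # taken outcomes seen so far in the current run
        self.conf = 0   # how many consecutive runs confirmed the trip count

    def predict(self):
        """Return True (stay in loop), False (exit), or None when not confident."""
        if self.conf < self.CONF_MAX:
            return None
        return self.cur + 1 != self.trip

    def update(self, taken):
        if taken:
            self.cur += 1
            if self.conf and self.cur >= self.trip:
                self.conf = 0            # ran past the learnt trip count
            return
        # Loop exit: confirm or relearn the trip count.
        if self.cur + 1 == self.trip:
            self.conf = min(self.CONF_MAX, self.conf + 1)
        else:
            self.trip, self.conf = self.cur + 1, 0
        self.cur = 0
```

The entry stays silent (returns `None`) until the same trip count has been observed several times in a row, so an irregular loop never overrides the main predictor.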

TAGE-SC-L summary for CBP-5
Most of the budget on global history correlation:
- TAGE with ≈ 1200 br. for 64 KB and ≈ 400 br. for 8KB
- optimize the storage sharing
- optimize the allocation
Track the statistical correlation with a neural component:
- use TAGE prediction AND confidence
- incorporate other forms of history (even local history if you are trying to win CBP-5)

MTAGE-SC: TAGE-SC-L is still far from the predictability limits

poTAGE-SC, the previous champion: poTAGE+COLT (Michaud2014) and TAGE-SC-L

poTAGE + COLT (Michaud2014): use the TAGE concept on other forms of history
[Figure: TAGE predictors using global history, local history 1, local history 2, local history 3 and frequency, followed by COLT selection, a (PC + 5 pred) indexed table.]

Unlimited TAGE-SC
[Figure: a TAGE predictor using global history, followed by a Statistical Corrector (Bias, GEHL, RHSP, other GEHL and perceptrons, ...) and a final chooser.]

poTAGE-SC
[Figure: TAGE predictors using global history, local history 1, local history 2, local history 3 and frequency, with COLT selection, followed by a Statistical Corrector (Bias, GEHL, RHSP, other GEHL and perceptrons, ...) and a final chooser.]

MTAGE-SC
[Figure: TAGE predictors using global history, local history 1, local history 2, frequency and global backward history, with a TAGE prediction combiner, followed by a Statistical Corrector (Bias, GEHL, RHSP, other GEHL and perceptrons, ...) and a final chooser.]

MTAGE-SC: ≈ 5 % MPKI reduction over poTAGE-SC
- Leverages confidence from SC and TAGE prediction combiner
- TAGE prediction combiner: COLT pred + neural combination of outputs pred + confidence
- Global backward history: to capture long path correlation, but eliminate intermediate branches
- A few extra history forms: IMLI, ..
[Figure: TAGE predictors using global history, local history 1, local history 2, local history 3, frequency and global backward history, with a TAGE prediction combiner, followed by a Statistical Corrector (Bias, GEHL, RHSP, other GEHL and perceptrons, ...) and a final chooser.]

Seems that I am not making progress!!
- CBP 2006 misp. rate: 32KB L-TAGE ≈ 1.22 GTL
- CBP 2014 misp. rate: 32KB TAGE-SC-L ≈ 1.40 poTAGE-SC
- CBP 2016 misp. rate: 64KB TAGE-SC-L ≈ 1.55 MTAGE-SC
Not the same traces, but ..

Conclusion
TAGE-SC-L fits limited storage sizes. Most significant optimizations over CBP 2014:
- Use of TAGE confidence as index for SC
- Sharing and partial associativity
MTAGE-SC: predictability limits even (a little bit) further than previously expected