André Seznec Caps Team IRISA/INRIA 1 Analysis of the O-GEHL branch predictor Optimized GEometric History Length André Seznec IRISA/INRIA/HIPEAC.

Presentation transcript:

Analysis of the O-GEHL branch predictor
Optimized GEometric History Length
André Seznec, Caps Team, IRISA/INRIA/HIPEAC

Branch prediction is still an issue
- Long pipelines
- Several instructions per cycle
- Any gain in branch prediction accuracy results in:
  - a performance gain
  - a reduction in power consumption

Conditional branch predictors: 25 years of background work
- Two main sources of information:
  - Local history: the past behavior of this particular branch
  - Global history: the (recent) past behavior of all branches

Speculative history must be managed
- Global history:
  - Append a bit to a single history register
  - Use a circular buffer and just a pointer to manage the history speculatively
- Local history:
  - Table of histories (updated non-speculatively)
  - Must maintain a speculative history per in-flight branch: associative search, etc.?
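
The circular-buffer idea for global history can be sketched as follows. This is only an illustration: the class name, the buffer size and the checkpoint interface are assumptions, not something taken from the slides. The point it shows is that a checkpoint is just the pointer value, so recovery after a misprediction is a pointer reset.

```python
class SpeculativeGlobalHistory:
    """Global branch history in a circular buffer (hypothetical sketch)."""

    def __init__(self, size=256):
        # Assumption: size exceeds the longest history plus in-flight branches,
        # so a checkpointed position is never overwritten before it resolves.
        self.size = size
        self.bits = [0] * size
        self.head = 0  # count of outcomes pushed so far, speculative ones included

    def push(self, taken):
        """Speculatively append one outcome; return the pointer as a checkpoint."""
        checkpoint = self.head
        self.bits[self.head % self.size] = 1 if taken else 0
        self.head += 1
        return checkpoint

    def recover(self, checkpoint, actual_taken):
        """On a misprediction, restore the pointer and insert the real outcome."""
        self.head = checkpoint
        self.push(actual_taken)

    def recent(self, n):
        """Return the n most recent outcomes, newest first."""
        return [self.bits[(self.head - 1 - i) % self.size] for i in range(n)]
```

Nothing is copied on recovery: younger speculative bits are simply overwritten by later pushes.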

From my experience with the EV8 team: designers hate maintaining speculative local history.

“Classical” stuff within the O-GEHL predictor
- Global history based: Yeh and Patt 91, Pan and So 91
- Multiple tables using different history lengths: McFarling 93, Evers et al. 96, EV8 predictor

Selecting between multiple predictions
- Classic solution: use a meta-predictor
  - “Wastes” storage
  - Choosing among 5 or 10 predictions?
- Neural-inspired predictors: use an adder tree instead of a meta-predictor
  - Vintan and Iridon 99, Jiménez and Lin 01
- Let's use the adder tree

Multiple history length predictor
[Figure: tables T0 to T4, indexed with history lengths L(0) to L(4), feed an adder (∑).]
- Final computation through a sum
- Prediction = sign of the sum
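
The final computation can be sketched in a few lines. The function name is hypothetical, and the centering term (half the number of tables, so that each counter value c is read as c + 1/2) is an assumption made here to avoid a zero sum; the slide itself only states that the prediction is the sign of the sum.

```python
def predict_from_counters(counters):
    """Combine one signed counter per table into a prediction (sketch).

    counters: the signed counter values read from T0..T(M-1).
    Returns (taken, s), where s is the sum fed to the sign test.
    """
    # Assumption: add M/2 so that the sum is never exactly ambiguous.
    s = len(counters) / 2 + sum(counters)
    return s >= 0, s
```

For example, `predict_from_counters([3, -1, 0, -4, 1])` sums to -1 + 2.5 = 1.5 and predicts taken.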

GEometric History Length predictor
- The set of history lengths forms a geometric series: {0, 2, 4, 8, 16, 32, 64, 128}
- What is important: L(i) - L(i-1) increases drastically with i
- Spends most of the storage on short histories
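
A geometric series of history lengths can be generated from just the shortest and the longest non-zero length. The sketch below assumes the form L(i) = round(α^(i-1) · L(1)) with L(0) = 0; the function name and the rounding rule are assumptions for illustration, and a real design may hand-tune individual lengths away from the formula.

```python
def geometric_history_lengths(l1, l_max, n_tables):
    """Return [L(0), ..., L(n_tables-1)] with L(0) = 0 (sketch).

    l1 is L(1), l_max is the longest history length.
    """
    # alpha is chosen so that L(n_tables-1) lands on l_max.
    alpha = (l_max / l1) ** (1.0 / (n_tables - 2))
    return [0] + [int(l1 * alpha ** (i - 1) + 0.5) for i in range(1, n_tables)]


lengths = geometric_history_lengths(2, 128, 8)
gaps = [b - a for a, b in zip(lengths, lengths[1:])]
```

With L(1) = 2 and a longest length of 128 over 8 tables, this gives the series shown on the slide, and `gaps` shows how quickly L(i) - L(i-1) grows.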

Dynamic update threshold fitting
- Reasonable fixed threshold = number of tables
- On an O-GEHL predictor, the best threshold depends on:
  - the application
  - the predictor size
  - the counter width
- By chance, on most applications, for the best fixed threshold: updates on mispredictions ≈ updates on correct predictions
- Monitor the difference and adapt the update threshold
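
One way to monitor that difference is a single saturating counter that moves up on updates caused by mispredictions and down on updates caused by low-confidence correct predictions. The sketch below is an assumption-laden illustration: the class name, the 7-bit counter width and the reset-to-zero policy are choices made here, not details given on the slide.

```python
class UpdateThreshold:
    """Dynamic update threshold fitting (hypothetical sketch)."""

    def __init__(self, n_tables=8, tc_bits=7):
        self.theta = n_tables  # slide: reasonable fixed threshold = number of tables
        self.tc = 0            # signed monitor counter (width is an assumption)
        self.tc_max = (1 << (tc_bits - 1)) - 1
        self.tc_min = -(1 << (tc_bits - 1))

    def should_update(self, s, mispredicted):
        """Update the tables on a misprediction or when |sum| is below the threshold."""
        return mispredicted or abs(s) <= self.theta

    def observe(self, s, mispredicted):
        """Rebalance theta so both kinds of updates occur about equally often."""
        if mispredicted:
            self.tc += 1
            if self.tc >= self.tc_max:   # too many misprediction updates
                self.theta += 1
                self.tc = 0
        elif abs(s) <= self.theta:
            self.tc -= 1
            if self.tc <= self.tc_min:   # too many low-confidence correct updates
                self.theta -= 1
                self.tc = 0
```

Raising the threshold triggers more updates on correct predictions, lowering it triggers fewer, so the counter drifts back toward balance.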

Adaptive history length fitting (inspired by Juan et al. 98)
- Half of the applications want L(7) < 50; the other half want L(7) > 150
- Let us adapt some history lengths to the behavior of each application
- 8 tables, three of which can use either of two history lengths:
  - T2: L(2) or L(8)
  - T4: L(4) or L(9)
  - T6: L(6) or L(10)

Adaptive history length fitting (2)
- Intuition: if there is a high degree of conflicts on T7, stick with short histories
- Implementation: monitor aliasing on updates of T7 through a tag bit and a counter
- Simple is sufficient: flip from short to long histories and vice versa
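
A possible shape for this monitor is sketched below. Everything specific in it is an assumption for illustration: the class name, the use of the lowest PC bit as the tag, the 9-bit counter, and the +1 / -4 steps. The slide only states that a tag bit and a counter on T7 drive a flip between short and long histories.

```python
class HistoryLengthFitter:
    """Aliasing monitor on T7 driving the short/long history choice (sketch)."""

    def __init__(self, n_tags=1024, ac_bits=9):
        self.tags = [0] * n_tags  # one tag bit per monitored entry
        self.ac = 0               # signed saturating aliasing counter
        self.ac_max = (1 << (ac_bits - 1)) - 1
        self.ac_min = -(1 << (ac_bits - 1))
        self.use_long = False

    def on_update(self, pc, t7_index):
        """Call on each predictor update; returns True when long histories are selected."""
        slot = t7_index % len(self.tags)
        tag = pc & 1              # assumption: a single PC bit serves as the tag
        if tag == self.tags[slot]:
            self.ac = min(self.ac + 1, self.ac_max)  # same tag: little aliasing seen
        else:
            self.ac = max(self.ac - 4, self.ac_min)  # tag mismatch: a conflict on T7
        if self.ac == self.ac_max:
            self.use_long = True
        elif self.ac == self.ac_min:
            self.use_long = False
        self.tags[slot] = tag
        return self.use_long
```

The asymmetric steps make the monitor quick to fall back to short histories when conflicts appear and slow to move to long ones.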

[Figure: tables T0 to T4 feed an adder (∑); they are indexed with L(0), L(1) or L(5), L(2), L(3) or L(6), and L(4), with tag bits.]

Hashing 200+ bits for indexing!
- Need to compute 11-bit indexes
- Full hashing is unrealistic
1. Just regularly pick at most 33 bits from: address + branch history + path history
2. A single stage of 3-input exclusive-OR gates
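
The two steps above can be sketched as follows. The function name, the representation of the information vector as a list of bits, and the exact sampling positions are assumptions for illustration; real hardware would fix the selected bit positions per table.

```python
def table_index(info_bits, index_bits=11, n_inputs=3):
    """Compute a table index from a long information vector (sketch).

    info_bits: list of 0/1 values (address + branch history + path history).
    """
    wanted = index_bits * n_inputs  # at most 33 bits for an 11-bit index
    if len(info_bits) <= wanted:
        picked = list(info_bits)
    else:
        # Step 1: pick bits at regular intervals across the whole vector.
        step = len(info_bits) / wanted
        picked = [info_bits[int(i * step)] for i in range(wanted)]
    picked += [0] * (wanted - len(picked))  # pad short vectors with zeros

    # Step 2: one XOR stage, each index bit combining three picked bits.
    index = 0
    for k in range(n_inputs):
        value = 0
        for bit in picked[k * index_bits:(k + 1) * index_bits]:
            value = (value << 1) | bit
        index ^= value
    return index
```

Whatever the length of the input vector, each index bit depends on only three input bits, which is what keeps the logic depth at a single XOR stage.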

Evaluation framework
- 1st Championship Branch Prediction traces: 20 traces including system activity
  - Floating-point apps: loop dominated
  - Integer apps: usual SPECINT
  - Multimedia apps
  - Server workload apps: very large footprint

Reference configuration presented for the Championship Branch Prediction
- 8 tables:
  - 2K entries each, except T1 with 1K entries
  - 5-bit counters for T0 and T1, 4-bit counters otherwise
  - 1 Kbits of one-bit tags associated with T7
  - 10K + 5K + 6×8K + 1K = 64 Kbits
- L(1) = 3 and L(10) = 200
- {0, 3, 5, 8, 12, 19, 31, 49, 75, 125, 200}
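
The storage budget of the reference configuration can be checked with a few lines of arithmetic. The function name is hypothetical; the table sizes and counter widths are the ones listed on the slide.

```python
def reference_storage_bits():
    """Total storage of the reference configuration, in bits (sketch)."""
    K = 1024
    tables = [
        (2 * K, 5),               # T0: 2K entries, 5-bit counters -> 10 Kbits
        (1 * K, 5),               # T1: 1K entries, 5-bit counters -> 5 Kbits
    ] + [(2 * K, 4)] * 6          # T2..T7: 2K entries, 4-bit counters -> 8 Kbits each
    counter_bits = sum(entries * width for entries, width in tables)
    tag_bits = 1 * K              # one-bit tags associated with T7
    return counter_bits + tag_bits


total_kbits = reference_storage_bits() // 1024
```

`total_kbits` evaluates to 64, matching the 64 Kbits budget.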

A case for the O-GEHL predictor
- 2nd at CBP: 2.82 misp/KI
- Best practice award: the predictor closest to a possible hardware implementation
- Does not use exotic features:
  - Various prime numbers, etc.
  - Strange initial state
- Chaining simulations: 2.84 misp/KI
  - Very fast warming

A case for the O-GEHL predictor (2)
- High accuracy
- 32 Kbits (3,150): 3.41 misp/KI, better than any implementable 128 Kbits predictor before CBP
  - 128 Kbits 2bcgskew (6,6,24,48): 3.55 misp/KI
  - 176 Kbits PBNP (43): 3.67 misp/KI
- 1 Mbit (5,300): 2.27 misp/KI
  - 1 Mbit 2bcgskew (9,9,36,72): 3.19 misp/KI
  - 1888 Kbits PBNP (58): 3.23 misp/KI

A case for the O-GEHL predictor (3)
- Robustness to variations in the choice of history lengths:
  - L(1) in [2,6], L(10) in [125,300]: misprediction rate below 2.96 misp/KI
- Geometric series: not a bad formula!
  - Best geometric: L(1)=3, L(10)=223: 2.80 misp/KI
  - Best overall: {0, 2, 4, 9, 12, 18, 31, 54, 114, 145, 266}: 2.78 misp/KI

Impact of the number of components
- 4 components vs. 8 components
  - 64 Kbits: misp/KI
  - 256 Kbits: misp/KI
  - 1 Mbit: misp/KI
- 6 components vs. 12 components
  - 48 Kbits: 3.02 vs. 3.03 misp/KI
  - 768 Kbits: 2.35 vs. 2.25 misp/KI
- 4 to 12 components bring high accuracy

Impact of counter width
- Robustness to counter width variations:
  - 3-bit counters, 49 Kbits: 3.09 misp/KI (dynamic update threshold fitting helps a lot)
  - 5-bit counters, 79 Kbits: 2.79 misp/KI
- 4-bit is the best tradeoff

Prediction computation time
- 3 successive steps:
  - Index computation: a 3-input XOR gate
  - Table read
  - Adder tree
- May not fit in a single cycle, but can be ahead pipelined!

Ahead pipelining a global history branch predictor (principle)
- Initiate branch prediction X+1 cycles in advance to provide the prediction in time
- Use the information available:
  - X-block ahead instruction address
  - X-block ahead history
- To ensure accuracy: use intermediate path information
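
The principle can be illustrated with a small model: the table reads and the sums start early from stale (X-block ahead) information, several candidate sums are computed in parallel, and a few bits of intermediate path information select one of them at the last moment. The function name, the layout of candidate entries as adjacent counters, and the three selection bits are all assumptions made for this sketch.

```python
def ahead_predict(tables, ahead_indices, path_bits, select_bits=3):
    """Ahead-pipelined prediction by late selection (hypothetical sketch).

    tables: one list of signed counters per table.
    ahead_indices: per-table index computed from X-block ahead address and history.
    path_bits: intermediate path information, known only at the last moment.
    """
    n_candidates = 1 << select_bits
    sums = []
    for c in range(n_candidates):
        # Each candidate reads one of the adjacent entries selected by the stale index.
        s = sum(t[(idx * n_candidates + c) % len(t)]
                for t, idx in zip(tables, ahead_indices))
        sums.append(s)
    # Late selection: the path bits choose among the precomputed sums.
    chosen = sums[path_bits & (n_candidates - 1)]
    return chosen >= 0
```

The slow steps (index computation, table read, adder tree) no longer depend on the most recent blocks; only the final multiplexer does.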

Practice
- Ahead O-GEHL: 8 parallel prediction computations
[Figure: block sequence A, B, C, D, with labels Ha, A and b, c, d.]

Ahead pipelined 64 Kbits O-GEHL
- 3-block: 2.94 misp/KI
- 4-block: 2.99 misp/KI
- 5-block: 3.04 misp/KI
- Not such a huge accuracy loss

A final case for the O-GEHL predictor
- Delivers state-of-the-art accuracy
- Uses only global information: very long history, 200+ bits!
- Can be ahead pipelined
- Many effective design points: number of tables, counter width, history lengths
- Prediction computation logic complexity is low (compared with competing predictors)

Still open questions
- Do better prediction combination functions exist?
- Indirect jump targets?

The End

BACK UP

Improved the global history
- Address + conditional branch history: path confusion on short histories
- Address + path: direct hashing leads to path confusion
1. Represent all branches in the branch history
2. Also use path history (1 bit per branch, limited to 16 bits)

Piecewise linear vs. O-GEHL (1): accuracy
- My own best implementation of piecewise linear:
  - Shares counters for the same history rank
  - but uses only power-of-two numbers of entries in tables
- 64 Kbits: H=31, 256 counters per rank, x 32 sums: 3.72 misp/KI misp/KI
- 1 Mbit: H=63, 2048 counters per rank, x 32 sums: 2.64 misp/KI misp/KI
- 48 Mbit: H=95, 64K counters per rank, x 32 sums: 2.30 misp/KI

Piecewise linear vs. O-GEHL (2): hardware complexity
- Computation logic complexity:
  - O-GEHL: only a few adders
  - PL: hundreds of adders
- Number of storage tables:
  - O-GEHL: 4 to 12
  - PL: H tables (update time)
- Prediction computation time: + a 3-input XOR gate on O-GEHL
- Information to checkpoint: kilobits for PL