The O-GEHL branch predictor

Slides:



Advertisements
Similar presentations
Bimode Cascading: Adaptive Rehashing for ITTAGE Indirect Branch Predictor Y.Ishii, K.Kuroyanagi, T.Sawada, M.Inaba, and K.Hiraki.
Advertisements

André Seznec Caps Team IRISA/INRIA 1 Looking for limits in branch prediction with the GTL predictor André Seznec IRISA/INRIA/HIPEAC.
Dynamic History-Length Fitting: A third level of adaptivity for branch prediction Toni Juan Sanji Sanjeevan Juan J. Navarro Department of Computer Architecture.
Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Advanced Computer Architecture COE 501.
Hardware-based Devirtualization (VPC Prediction) Hyesoon Kim, Jose A. Joao, Onur Mutlu ++, Chang Joo Lee, Yale N. Patt, Robert Cohn* ++ *
Computer Science Department University of Central Florida Adaptive Information Processing: An Effective Way to Improve Perceptron Predictors Hongliang.
André Seznec Caps Team IRISA/INRIA 1 The O-GEHL branch predictor Optimized GEometric History Length André Seznec IRISA/INRIA/HIPEAC.
Yue Hu David M. Koppelman Lu Peng A Penalty-Sensitive Branch Predictor Department of Electrical and Computer Engineering Louisiana State University.
TAGE-SC-L Branch Predictors
CS 7810 Lecture 7 Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching E. Rotenberg, S. Bennett, J.E. Smith Proceedings of MICRO-29.
EECS 470 Branch Prediction Lecture 6 Coverage: Chapter 3.
1 Improving Branch Prediction by Dynamic Dataflow-based Identification of Correlation Branches from a Larger Global History CSE 340 Project Presentation.
VLSI Project Neural Networks based Branch Prediction Alexander ZlotnikMarcel Apfelbaum Supervised by: Michael Behar, Spring 2005.
Combining Branch Predictors
EECC722 - Shaaban #1 Lec # 10 Fall Conventional & Block-based Trace Caches In high performance superscalar processors the instruction fetch.
Trace Caches J. Nelson Amaral. Difficulties to Instruction Fetching Where to fetch the next instruction from? – Use branch prediction Sometimes there.
Branch Target Buffers BPB: Tag + Prediction
1 Lecture 8: Branch Prediction, Dynamic ILP Topics: static speculation and branch prediction (Sections )
Prophet/Critic Hybrid Branch Prediction Falcon, Stark, Ramirez, Lai, Valero Presenter: Christian Wanamaker.
Perceptrons Branch Prediction and its’ recent developments
EECC722 - Shaaban #1 Lec # 9 Fall Conventional & Block-based Trace Caches In high performance superscalar processors the instruction fetch.
Neural Methods for Dynamic Branch Prediction Daniel A. Jiménez Department of Computer Science Rutgers University.
Evaluation of Dynamic Branch Prediction Schemes in a MIPS Pipeline Debajit Bhattacharya Ali JavadiAbhari ELE 475 Final Project 9 th May, 2012.
Optimized Hybrid Scaled Neural Analog Predictor Daniel A. Jiménez Department of Computer Science The University of Texas at San Antonio.
Improving the Performance of Object-Oriented Languages with Dynamic Predication of Indirect Jumps José A. Joao *‡ Onur Mutlu ‡* Hyesoon Kim § Rishi Agarwal.
1 Storage Free Confidence Estimator for the TAGE predictor André Seznec IRISA/INRIA.
Evaluation of the Gini-index for Studying Branch Prediction Features Veerle Desmet Lieven Eeckhout Koen De Bosschere.
1 A 64 Kbytes ITTAGE indirect branch predictor André Seznec INRIA/IRISA.
Analysis of Branch Predictors
1 Two research studies related to branch prediction and instruction sequencing André Seznec INRIA/IRISA.
André Seznec Caps Team IRISA/INRIA 1 Analysis of the O-GEHL branch predictor Optimized GEometric History Length André Seznec IRISA/INRIA/HIPEAC.
1 A New Case for the TAGE Predictor André Seznec INRIA/IRISA.
1 Revisiting the perceptron predictor André Seznec IRISA/ INRIA.
CSCI 6461: Computer Architecture Branch Prediction Instructor: M. Lancaster Corresponding to Hennessey and Patterson Fifth Edition Section 3.3 and Part.
Predicting Conditional Branches With Fusion-Based Hybrid Predictors Gabriel H. Loh Yale University Dept. of Computer Science Dana S. Henry Yale University.
André Seznec Caps Team IRISA/INRIA 1 A 256 Kbits L-TAGE branch predictor André Seznec IRISA/INRIA/HIPEAC.
CS 6290 Branch Prediction. Control Dependencies Branches are very frequent –Approx. 20% of all instructions Can not wait until we know where it goes –Long.
Effective ahead pipelining of instruction block address generation André Seznec and Antony Fraboulet IRISA/ INRIA.
1 The Inner Most Loop Iteration counter a new dimension in branch history André Seznec, Joshua San Miguel, Jorge Albericio.
Branch Prediction Perspectives Using Machine Learning Veerle Desmet Ghent University.
André Seznec Caps Team IRISA/INRIA 1 Analysis of the O-GEHL branch predictor Optimized GEometric History Length André Seznec IRISA/INRIA/HIPEAC.
Samira Khan University of Virginia April 12, 2016
CSL718 : Pipelined Processors
Data Prefetching Smruti R. Sarangi.
CS203 – Advanced Computer Architecture
Computer Structure Advanced Branch Prediction
Dynamic Branch Prediction
COSC3330 Computer Architecture Lecture 15. Branch Prediction
Dynamically Sizing the TAGE Branch Predictor
Exploring Branch Prediction
CMSC 611: Advanced Computer Architecture
Exploring Value Prediction with the EVES predictor
Looking for limits in branch prediction with the GTL predictor
Milad Hashemi, Onur Mutlu, Yale N. Patt
Module 3: Branch Prediction
Lecture: Static ILP, Branch Prediction
15-740/ Computer Architecture Lecture 24: Control Flow
Phase Capture and Prediction with Applications
Lecture: Branch Prediction
Scaled Neural Indirect Predictor
Lecture 10: Branch Prediction and Instruction Delivery
Data Prefetching Smruti R. Sarangi.
TAGE-SC-L Again MTAGE-SC
Serene Banerjee, Lizy K. John, Brian L. Evans
Pipelining: dynamic branch prediction Prof. Eric Rotenberg
Adapted from the slides of Prof
Dynamic Hardware Prediction
rePLay: A Hardware Framework for Dynamic Optimization
Gang Luo, Hongfei Guo {gangluo,
Samira Khan University of Virginia Mar 6, 2019
Presentation transcript:

The O-GEHL branch predictor Optimized GEometric History Length André Seznec IRISA/INRIA/HIPEAC

Branch prediction is still an issue Long pipelines Multiple instruction per cycle Any gain in branch prediction accuracy results in: Performance gain Power consumption gain

What must be predicted: Directions and targets: Indirect branch target Conditional branch direction This presentation: Only conditional branch direction

Conditional branch predictors Two main sources of information: Local history the past behavior of this particular branch Global history the (recent) past behavior of all branches

Speculative history must be managed !? Global history: Append a bit on a single history register Use of a circular buffer and just a pointer to speculatively manage the history Local history: table of histories (unspeculatively updated) must maintain a speculative history per inflight branch: Associative search, etc ?!?

From my experience with EV8 team Designers hate maintaining speculative local history

“Classic” stuff with the OGEHL predictor  Global history based: Yeh and Patt 91, Pan and So 91 Multiple history lengths to index multiple tables McFarling 93, Evers et al. 96, EV8 predictor

Selecting between multiple predictions Classic solution: Use of a meta predictor “wasting” storage !?! chosing among 5 or 10 predictions ?? Neural inspired predictors: Use an adder tree instead of a meta-predictor Vintan and Iridon 99, Jimènez and Lin 01 Let’s use the adder tree

Perceptron predictor One weight per history bit ∑ X Sign=prediction

What makes such predictor working ? Use of signed saturated counters: 8-bit on perceptron predictors Partial update policy: Update all the counters on misprediction Update all the counters when sum absolute value is under a threshold For perceptron predictors: Θ= 2.14 N + 20.58

Perceptron predictors: strong points and shortcomings Effective solution for selecting the prediction Hardware cost scales linearly with the history length ! Shortcomings: High number of adders Long latency Good accuracy, but no better than conventional hybrid predictors Hardware cost scales linearly with the history length ?

Multiple history length predictor ∑ L(4) L(3) L(2) L(1) L(0) TO T1 T2 T3 T4 Multiple history length predictor Prediction=Sign

GEometric History Length predictor The set of history lengths forms a geometric series {0, 2, 4, 8, 16, 32, 64, 128} What is important: L(i)-L(i-1) is drastically increasing Spends most of the storage for short history !!

Counter width on O-GEHL predictors 8 bits are just overkilling  4 bits are sufficient  Mixing 5 bits for short histories and 4 bits for long histories is slightly better  3 bits are not so bad !!

Dynamic update threshold fitting Reasonable fixed threshold= Number of tables On an O-GEHL predictor, best threshold depends on the application  the predictor size  the counter width  By chance, on most applications, for the best fixed threshold, updates on mispredictions ≈ updates on correct predictions Monitor the difference and adapt the update threshold

Adaptative history length fitting (inspired by Juan et al 98) (½ applications: L(7) < 50) ≠ (½ applications: L(7) > 150) Let us adapt some history lengths to the behavior of each application 8 tables: T2: L(2) and L(8) T4: L(4) and L(9) T6: L(6) and L(10)

Adaptative history length fitting (2) Intuition: if high degree of conflicts on T7, stick with short history Implementation: monitoring of aliasing on updates on T7 through a tag bit and a counter Simple is sufficient: Flipping from short to long histories and vice-versa

TO T1 T2 L(0) ∑ T3 L(1) or L(5) L(2) T4 L(3) or L(6) L(4) Tag bits

What is global history ? Address + conditional branch history: path confusion on short histories  Address + path: Direct hashing leads to path confusion  Represent all branches in branch history Use also path history ( 1 bit per branch, limited to 16 bits)

Hashing 200+ bits for indexing !! Need to compute 11 bits indexes : Full hashing is unrealistic Just regularly pick at most 33 bits in: address+branch history +path history A single 3-entry exclusive-OR stage

Evaluation framework 1st Championship Branch Prediction traces: 20 traces including system activity Floating point apps : loop dominated Integer apps Multimedia apps Server workload apps: very large footprint

CBP A single rule: The storage budget must fit 64K + 256 bits. Rule artefacts Don’t care about implementability Initialize the predictor with what is convenient Adapt the configuration to each benchmark, but don’t care about transition between applications

Configuration presented for Championship Branch Prediction 8 tables: 2 Kentries except T1, 1Kentries 5 bit counters for T0 and T1, 4 bit counters otherwise 1 Kbits of one bit tags associated with T7 10K + 5K + 6x8K + 1K = 64K L(1) =3 and L(10)= 200 {0,3,5,8,12,19,31,49,75,125,200}

A case for the OGEHL predictor 2nd at CBP: 2.82 misp/KI Best practice award: The predictor the closest to a possible hardware implementation Chaining simulations: 2.84 misp/KI The winner: 4.34 misp/KI (but respecting the rule is respecting the rule )

A case for the OGEHL predictor (2) High accuracy Half size 32Kbits (3,150): 3.41 misp/KI better than any “implementable” 64Kbits predictor before CBP Robustness to variations of history lengths choices: L(1) in [2,6], L(10) in [125,300] misp. rate < 2.96 misp/KI Geometric series: not a bad formula !! best geometric L(1)=3, L(10)=223, 2.80 misp/KI best overall {0, 2, 4, 9, 12, 18, 31, 54, 114, 145, 266} 2.78 misp/KI

A case for the OGEHL predictor (3) Reduce counter width by 1 bit: 49 Kbits would have been a finalist at CBP (3.09)  64 Kbits 4 components OGEHL predictor would have been a finalist at CBP (3.02)  50 Kbits 4 components OGEHL predictor (3-bit) would have been a finalist at CBP (3.21)  768 Kbits 12 components OGEHL predictor 2.25 misp/KI

Prediction computation time 3 components: Index computation Counter read Adder tree Does not fit on a single cycle: But may be ahead pipelined !

Ahead pipelining a global history branch predictor (principle) Initiate branch prediction X+1 cycles in advance to provide the prediction in time Use information available: X-block ahead instruction address X-block ahead history To ensure accuracy: Use intermediate path information

8 // prediction computations Practice A B C D bcd Ahead OGEHL: 8 // prediction computations Ha A

Ahead Pipelined 64 Kbits OGEHL 3-block: 2.94 misp/KI 4-block: 2.99 misp/KI 5-block: 3.04 misp/KI Not such a huge accuracy loss 

A final case for the O-GEHL predictor delivers state-of-the-art accuracy uses only global information can be ahead pipelined many effective design points Nb of tables, counter width, .. prediction computation logic complexity is low (compared with concurrent predictors) (The End)