André Seznec Caps Team IRISA/INRIA 1 Analysis of the O-GEHL branch predictor Optimized GEometric History Length André Seznec IRISA/INRIA/HIPEAC.

Slides:



Advertisements
Similar presentations
André Seznec Caps Team IRISA/INRIA 1 Looking for limits in branch prediction with the GTL predictor André Seznec IRISA/INRIA/HIPEAC.
Advertisements

H-Pattern: A Hybrid Pattern Based Dynamic Branch Predictor with Performance Based Adaptation Samir Otiv Second Year Undergraduate Kaushik Garikipati Second.
Dynamic History-Length Fitting: A third level of adaptivity for branch prediction Toni Juan Sanji Sanjeevan Juan J. Navarro Department of Computer Architecture.
Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Advanced Computer Architecture COE 501.
Hardware-based Devirtualization (VPC Prediction) Hyesoon Kim, Jose A. Joao, Onur Mutlu ++, Chang Joo Lee, Yale N. Patt, Robert Cohn* ++ *
1 A Hybrid Adaptive Feedback Based Prefetcher Santhosh Verma, David Koppelman and Lu Peng Louisiana State University.
Computer Science Department University of Central Florida Adaptive Information Processing: An Effective Way to Improve Perceptron Predictors Hongliang.
1 Advanced Computer Architecture Limits to ILP Lecture 3.
André Seznec Caps Team IRISA/INRIA 1 The O-GEHL branch predictor Optimized GEometric History Length André Seznec IRISA/INRIA/HIPEAC.
Exploiting Spatial Locality in Data Caches using Spatial Footprints Sanjeev Kumar, Princeton University Christopher Wilkerson, MRL, Intel.
André Seznec Caps Team IRISA/INRIA Design tradeoffs for the Alpha EV8 Conditional Branch Predictor André Seznec, IRISA/INRIA Stephen Felix, Intel Venkata.
Limits on ILP. Achieving Parallelism Techniques – Scoreboarding / Tomasulo’s Algorithm – Pipelining – Speculation – Branch Prediction But how much more.
A PPM-like, tag-based predictor Pierre Michaud. 2 Main characteristics global history based 5 tables –one 4k-entry bimodal (indexed with PC) –four 1k-entry.
TAGE-SC-L Branch Predictors
Clustered Indexing for Conditional Branch Predictors Veerle Desmet Ghent University Belgium.
CPE 731 Advanced Computer Architecture ILP: Part II – Branch Prediction Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.
EECS 470 Branch Prediction Lecture 6 Coverage: Chapter 3.
1 Improving Branch Prediction by Dynamic Dataflow-based Identification of Correlation Branches from a Larger Global History CSE 340 Project Presentation.
VLSI Project Neural Networks based Branch Prediction Alexander ZlotnikMarcel Apfelbaum Supervised by: Michael Behar, Spring 2005.
Combining Branch Predictors
Goal: Reduce the Penalty of Control Hazards
Branch Target Buffers BPB: Tag + Prediction
Dynamic Branch Prediction
Prophet/Critic Hybrid Branch Prediction Falcon, Stark, Ramirez, Lai, Valero Presenter: Christian Wanamaker.
Perceptrons Branch Prediction and its’ recent developments
Neural Methods for Dynamic Branch Prediction Daniel A. Jiménez Department of Computer Science Rutgers University.
Evaluation of Dynamic Branch Prediction Schemes in a MIPS Pipeline Debajit Bhattacharya Ali JavadiAbhari ELE 475 Final Project 9 th May, 2012.
Optimized Hybrid Scaled Neural Analog Predictor Daniel A. Jiménez Department of Computer Science The University of Texas at San Antonio.
1 Storage Free Confidence Estimator for the TAGE predictor André Seznec IRISA/INRIA.
Evaluation of the Gini-index for Studying Branch Prediction Features Veerle Desmet Lieven Eeckhout Koen De Bosschere.
1 A 64 Kbytes ITTAGE indirect branch predictor André Seznec INRIA/IRISA.
Analysis of Branch Predictors
André Seznec Caps Team IRISA/INRIA HAVEGE HArdware Volatile Entropy Gathering and Expansion Unpredictable random number generation at user level André.
1 Two research studies related to branch prediction and instruction sequencing André Seznec INRIA/IRISA.
CAPS project-team Compilation et Architectures pour Processeurs Superscalaires et Spécialisés.
1 A New Case for the TAGE Predictor André Seznec INRIA/IRISA.
1 Vulnerabilities on high-end processors André Seznec IRISA/INRIA CAPS project-team.
1 Revisiting the perceptron predictor André Seznec IRISA/ INRIA.
CSCI 6461: Computer Architecture Branch Prediction Instructor: M. Lancaster Corresponding to Hennessey and Patterson Fifth Edition Section 3.3 and Part.
Not- Taken? Taken? The Frankenpredictor Gabriel H. Loh Georgia Tech College of Computing MICRO Dec 5, 2004.
1 Register Write Specialization Register Read Specialization A path to complexity effective wide-issue superscalar processors André Seznec, Eric Toullec,
Adaptive GPU Cache Bypassing Yingying Tian *, Sooraj Puthoor†, Joseph L. Greathouse†, Bradford M. Beckmann†, Daniel A. Jiménez * Texas A&M University *,
1/25 June 28 th, 2006 BranchTap: Improving Performance With Very Few Checkpoints Through Adaptive Speculation Control BranchTap Improving Performance With.
André Seznec Caps Team IRISA/INRIA 1 A 256 Kbits L-TAGE branch predictor André Seznec IRISA/INRIA/HIPEAC.
CS 6290 Branch Prediction. Control Dependencies Branches are very frequent –Approx. 20% of all instructions Can not wait until we know where it goes –Long.
Effective ahead pipelining of instruction block address generation André Seznec and Antony Fraboulet IRISA/ INRIA.
1 The Inner Most Loop Iteration counter a new dimension in branch history André Seznec, Joshua San Miguel, Jorge Albericio.
1 Lecture: Out-of-order Processors Topics: branch predictor wrap-up, a basic out-of-order processor with issue queue, register renaming, and reorder buffer.
Branch Prediction Perspectives Using Machine Learning Veerle Desmet Ghent University.
André Seznec Caps Team IRISA/INRIA 1 Analysis of the O-GEHL branch predictor Optimized GEometric History Length André Seznec IRISA/INRIA/HIPEAC.
CSL718 : Pipelined Processors
Data Prefetching Smruti R. Sarangi.
CS203 – Advanced Computer Architecture
COSC3330 Computer Architecture Lecture 15. Branch Prediction
CMSC 611: Advanced Computer Architecture
Exploring Value Prediction with the EVES predictor
Looking for limits in branch prediction with the GTL predictor
Design tradeoffs for the Alpha EV8 Conditional Branch Predictor
Lecture: Static ILP, Branch Prediction
Phase Capture and Prediction with Applications
Lecture: Branch Prediction
Lecture 10: Branch Prediction and Instruction Delivery
Data Prefetching Smruti R. Sarangi.
TAGE-SC-L Again MTAGE-SC
Adapted from the slides of Prof
Dynamic Hardware Prediction
rePLay: A Hardware Framework for Dynamic Optimization
CAPS project-team Compilation et Architectures pour Processeurs Superscalaires et Spécialisés.
The O-GEHL branch predictor
Gang Luo, Hongfei Guo {gangluo,
Presentation transcript:

André Seznec Caps Team IRISA/INRIA 1 Analysis of the O-GEHL branch predictor Optimized GEometric History Length André Seznec IRISA/INRIA/HIPEAC

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 2 Objectives  State of the art accuracy: Any gain in branch prediction accuracy results in performance gain and power consumption gain  Keep the design implementable:  Rely on a single prediction scheme  Use only global history information: Designers hate maintaining speculative local history

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 3 L(0) ? L(4) L(3) L(2) L(1) TO T1 T2 T3 T4 The basis : A Multiple history length predictor

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 4 Selecting between multiple predictions  Classic solution:  Use of a meta predictor “wasting” storage !?! chosing among 5 or 10 predictions ??  Neural inspired predictors:  Use an adder tree instead of a meta-predictor Let’s use the adder tree

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 5 L(0) ∑ L(4) L(3) L(2) L(1) TO T1 T2 T3 T4 Final computation through a sum Prediction=Sign

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 6 From “old experience” on 2bcgskew  Some applications benefit from 100+ bits histories  Some don’t !!

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 7 GEometric History Length predictor The set of history lengths forms a geometric series {0, 2, 4, 8, 16, 32, 64, 128} What is important: L(i)-L(i-1) is drastically increasing Spends most of the storage for short history !!

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 8 Update policy  Perceptron inspired threshold based  Perceptron-like threshold did not work  Reasonable fixed threshold= Number of tables

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 9 Dynamic update threshold fitting On an O-GEHL predictor, best threshold depends on  the application   the predictor size   the counter width  By chance, on most applications, for the best fixed threshold, updates on mispredictions ≈ updates on correct predictions Monitor the difference and adapt the update threshold

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 10 Adaptative history length fitting (inspired by Juan et al 98) (½ applications: L(7) < 50) ≠ (½ applications: L(7) > 150 ) Let us adapt some history lengths to the behavior of each application  8 tables:  T2: L(2) and L(8)  T4: L(4) and L(9)  T6: L(6) and L(10)

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 11 Adaptative history length fitting (2) Intuition:  if high degree of conflicts on T7, stick with short history Implementation:  monitoring of aliasing on updates on T7 through a tag bit and a counter Simple is sufficient:  Flipping from short to long histories and vice-versa

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 12 Evaluation framework 1st Championship Branch Prediction traces: 20 traces including system activity Floating point apps : loop dominated Integer apps: usual SPECINT Multimedia apps Server workload apps: very large footprint

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 13 Reference configuration presented for Championship Branch Prediction  8 tables:  2 Kentries except T1, 1Kentries  5 bit counters for T0 and T1, 4 bit counters otherwise  1 Kbits of one bit tags associated with T7 10K + 5K + 6x8K + 1K = 64K  L(1) =3 and L(10)= 200  {0,3,5,8,12,19,31,49,75,125,200}

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 14 A case for the OGEHL predictor  2nd at CBP: 2.82 misp/KI  Best practice award:  The predictor the closest to a possible hardware implementation  Does not use exotic features: Various prime numbers, etc Strange initial state  Very short warming intervals:  Chaining all simulations: 2.84 misp/KI

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 15 A case for the OGEHL predictor (2)  High accuracy  32Kbits (3,150): 3.41 misp/KI better than any implementable 128 Kbits predictor before CBP 128 Kbits 2bcgkew (6,6,24,48): 3.55 misp/KI 176 Kbits PBNP (43) : 3.67 misp/KI  1Mbits (5,300): 2.27 misp/KI 1Mbit 2bcgskew (9,9,36,72): 3.19 misp/KI 1888 Kbits PBNP (58): 3.23 misp/KI

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 16 A case for the OGEHL predictor (3)  Robustness to variations of history lengths choices:  L(1) in [2,6], L(10) in [125,300]  misp. rate < 2.96 misp/KI  Geometric series: not a bad formula !!  best geometric L(1)=3, L(10)=223, 2.80 misp/KI  best overall {0, 2, 4, 9, 12, 18, 31, 54, 114, 145, 266} 2.78 misp/KI

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 17 Impact of the number of components  4 components — 8 components 64 Kbits: misp/KI 256Kbits: misp/KI 1Mbit: misp/KI  6 components — 12 components 48 Kbits: 3.02 – 3.03 misp/KI 768Kbits: 2.35 – 2.25 misp/KI 4 to 12 components bring high accuracy

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 18 Impact of counter width  Robustness to counter width variations:  3-bit counter, 49 Kbits: 3.09 misp/KI Dynamic update threshold fitting helps a lot  5-bit counter 79 Kbits: 2.79 misp/KI 4-bit is the best tradeoff

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 19 Prediction computation time  3 successive steps:  Index computation: a 3-entry XOR gate  Table read  Adder tree  May not fit on a single cycle:  But can be ahead pipelined !

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 20 Ahead pipelining a global history branch predictor (principle)  Initiate branch prediction X+1 cycles in advance to provide the prediction in time  Use information available: X-block ahead instruction address X-block ahead history  To ensure accuracy:  Use intermediate path information

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 21 Practice Ahead OGEHL: 8 // prediction computations bcd Ha A ABCD

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 22 Ahead Pipelined 64 Kbits OGEHL  3-block: 2.94 misp/KI  4-block: 2.99 misp/KI  5-block: 3.04 misp/KI Not such a huge accuracy loss

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 23 A final case for the O-GEHL predictor  delivers state-of-the-art accuracy  uses only global information:  Very long history: 200+ bits !!  can be ahead pipelined  many effective design points  Nb of tables, counter width, history lengths  prediction computation logic complexity is low (compared with concurrent predictors )

Analysis of the O-GEHL branch predictor André Seznec Caps Team Irisa 24 The End