Design tradeoffs for the Alpha EV8 Conditional Branch Predictor

Design tradeoffs for the Alpha EV8 Conditional Branch Predictor
André Seznec, Caps Team, IRISA/INRIA
Stephen Felix, Intel
Venkata Krishnan, Stargen Inc
Yiannakis Sazeides, University of Cyprus

The Alpha EV8 Conditional Branch Predictor, André Seznec, Caps Team, Irisa
Alpha EV8 (cancelled June 2001)
- SMT: 4 threads
- Wide-issue superscalar processor:
  - 8-way issue
- Single-process performance is the goal
- Multithreaded performance is a bonus
- 5-10% overhead for SMT

Challenges on the EV8 conditional branch predictor
- High accuracy is needed:
  - 14-cycle minimum misprediction penalty
- Up to 16 predictions per cycle:
  - from two non-contiguous fetch blocks!
- Various implementation constraints:
  - keep the number of physical memory arrays under control
  - use of single-ported memory cells
  - timing constraints

Instruction fetch blocks on EV8
[Figure: examples of 8-instruction fetch blocks, with conditional branches (br) marked taken or not taken.]

Alpha EV8 front-end pipeline
- Fetches up to two 8-instruction blocks per cycle from the I-cache:
  - a block ends either on an aligned 8-instruction boundary or on a taken control-flow instruction
  - up to 16 conditional branches fetched and predicted per cycle
- The next two block addresses must be predicted in a single cycle:
  - critical path: use of a line predictor backed by a complex PC address generator (conditional branch predictor, RAS, jump predictor, ...)

PC address generation pipeline
[Figure: pipeline diagram spanning Cycle 1, Cycle 2 and Cycle 3, with the stages "Line prediction is completed", "Prediction table read is completed" and "PC address generation is completed"; the block pairs shown are C and D, A and B, Y and Z.]

EV8 predictor: derived from 2Bc-gskew
[Figure: block diagram of the 2Bc-gskew predictor, with its e-gskew part labelled.]

2Bc-gskew: degrees of freedom
Partial update policy
- On correct predictions, only update the correct components:
  - do not destroy other predictions
  - better accuracy!
- On correct predictions:
  - the prediction bit is only read
  - the hysteresis bit is only written
USE OF DISTINCT PREDICTION AND HYSTERESIS ARRAYS!
No reason to use the same size for the hysteresis and prediction arrays.
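The partial update idea can be illustrated with a small sketch. It is a simplification under stated assumptions, not the EV8 logic: it models only a three-component majority vote (the meta predictor is left out), and the names `SplitCounter`, `majority` and `partial_update` are hypothetical. Each 2-bit counter is stored as a separate prediction bit and hysteresis bit, so a correct prediction only writes hysteresis bits.

```python
from dataclasses import dataclass


@dataclass
class SplitCounter:
    """A 2-bit counter stored as a prediction bit and a hysteresis bit."""
    pred: int = 0  # 1 = predict taken
    hyst: int = 0  # 1 = strong, 0 = weak

    def train(self, taken: int) -> None:
        if self.pred == taken:
            self.hyst = 1      # strengthen: only the hysteresis bit is written
        elif self.hyst == 1:
            self.hyst = 0      # wrong but strong: become weak
        else:
            self.pred = taken  # wrong and weak: flip the prediction bit


def majority(counters):
    """Majority vote over the prediction bits of three components."""
    return int(sum(c.pred for c in counters) >= 2)


def partial_update(counters, taken):
    """Return the prediction, then update the components.

    Correct prediction: only components that agreed with the outcome
    are strengthened. Misprediction: every component is trained.
    """
    prediction = majority(counters)
    for c in counters:
        if prediction != taken or c.pred == taken:
            c.train(taken)
    return prediction


tables = [SplitCounter(1, 0), SplitCounter(1, 0), SplitCounter(0, 1)]
print(partial_update(tables, taken=1), tables)
```

In the correct-prediction case the disagreeing component is left untouched, which is what preserves its prediction for another (address, history) pair that may map to the same entry.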

EV8 predictor: leveraging degrees of freedom
- Different history lengths
- Smaller bimodal table

Dealing with implementation constraints

Issues on global history
[Figure: pipeline with blocks Y and Z, blocks A and B, and blocks C and D in flight.]
- Branch information from C, B and A is not valid to predict D!
- On each cycle, up to 16 branches are predicted: 0 to 16 bits to be inserted in the history vector!?

Block compressed history lghist
- Incorporate at most one bit in the history per fetch block:
  - 0, 1 or 2 bits to be incorporated in the history vector per cycle
- Which bit?
  - Direction of the last conditional branch in the block: the previous ones are not taken, since a block ends on a taken control flow
  - XORed with its position (1st half / 2nd half) in the block: more uniform distribution of the history vectors
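A minimal sketch of the lghist update follows. Several details are assumptions made for illustration and are not stated on the slide: the function names, the representation of a block as a list of `(slot, taken)` pairs, the polarity of the position bit (1 for the second half, slots 4 to 7), and the 21-bit history length.

```python
def lghist_bit(branches):
    """Return the bit a fetch block contributes to lghist, or None.

    `branches` lists the conditional branches of one 8-instruction block
    in fetch order as (slot, taken) pairs, with slot in 0..7.
    """
    if not branches:
        return None  # no conditional branch: nothing is inserted
    slot, taken = branches[-1]               # last conditional branch only
    second_half = 1 if slot >= 4 else 0      # assumed polarity
    return int(taken) ^ second_half


def update_lghist(lghist, blocks, length=21):
    """Shift in at most one bit per fetch block (0, 1 or 2 bits per cycle)."""
    mask = (1 << length) - 1
    for block in blocks:
        bit = lghist_bit(block)
        if bit is not None:
            lghist = ((lghist << 1) | bit) & mask
    return lghist


# Two blocks fetched in one cycle: one taken branch in slot 3, one in slot 6.
print(bin(update_lghist(0, [[(3, True)], [(6, True)]])))
```

Because earlier conditional branches in a block are necessarily not taken, the single inserted bit loses little direction information while bounding the history update to two bits per cycle.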

Instruction fetch blocks on EV8
[Figure: example fetch blocks. One is labelled "br taken" with "1 is inserted"; another is labelled "br taken", "not taken" with "0 is inserted".]

The EV8 branch predictor information vector
- History information is not available for the three previous blocks A, B and C
  - but their addresses are available!
Information vector used to index the predictor:
1. Instruction address
2. lghist (3-blocks-old history + path)
3. Path information on the last three blocks

Using single-ported memory arrays
The challenge:
- 16 predictions to be performed per cycle, from two non-contiguous blocks!
- 8 updates per cycle, for two non-contiguous blocks!
But single-ported arrays are highly desirable :-)

Bank-interleaved or double-ported branch predictor?
- Reads of predictions for two 8-instruction blocks:
  - double-porting: memory cells twice as large; losing half of the entries?
  - bank-interleaving: need for arbitration; longer critical electrical path; losing throughput; short loops fitting in a single 8-instruction block!?

Conflict-free interleaved bank predictor
Key idea: force adjacent predictions to lie in distinct banks.
4-way interleaved: the bank for A is determined by Y and Z:
if (y6,y5) == Bz then Ba = (y6,y5+1) else Ba = (y6,y5)
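The bank selection rule can be sketched as follows. This reading makes two assumptions: `y6` and `y5` are bits 6 and 5 of the address of block Y, and `y5+1` is a one-bit addition, that is, a flip of `y5`. The function name `bank_for_a` is hypothetical.

```python
def bank_for_a(y_addr: int, bank_z: int) -> int:
    """Pick the bank (0..3) for block A from Y's address and Z's bank.

    The candidate bank is (y6, y5). If it collides with the bank already
    used by the adjacent block Z, the low bit is flipped.
    """
    y6 = (y_addr >> 6) & 1
    y5 = (y_addr >> 5) & 1
    candidate = (y6 << 1) | y5
    if candidate == bank_z:
        candidate = (y6 << 1) | (y5 ^ 1)  # "y5+1" read as a one-bit add
    return candidate


# Whatever Y's address is, A never lands in Z's bank.
for y_addr in range(0, 256, 32):
    for bank_z in range(4):
        assert bank_for_a(y_addr, bank_z) != bank_z
print("no conflict between adjacent blocks")
```

Applying the same rule to each successive block means any two blocks read in the same cycle sit in different banks, so single-ported banks suffice without arbitration.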

Conflict-free bank-interleaved predictor (2)
- Conflicts are avoided by construction
- The bank number is computed one cycle ahead:
  - not on the critical path
Single-ported bank-interleaved memory arrays!

« Logical view » vs real implementation
- Logical view: 4 tables * 4 banks * 2 (pred. + hyst.):
  - 32 memory arrays
  - indexing functions are computed, then arrays are accessed
- Real implementation: 4 banks * 2 (pred. + hyst.), with the 4 tables in a single array:
  - 8 memory arrays
- No time to lose:
  - start the access and compute part of the index in parallel

Reading the branch prediction tables
[Figure: read path through an array holding the Meta, G0, G1 and BIM tables: bank selection (1 out of 4), wordline selection (1 out of 64), column selection (8 out of 256), unshuffle (8 to 8).]

Reading the branch prediction tables (2)
- Spans 5 cycle phases:
  - Cycle -1: bank number computation, bank selection
  - Cycle 0, phase 0: wordline selection
  - Cycle 0, phase 1: column selection
  - Cycle 1, phase 0: unshuffle permutation

Constraints for indices composition
- Strong: wordline bits
  - immediate availability
  - common to the four logical tables
- Medium: column bits
  - a single 2-input XOR gate
- Weak: unshuffle bits
  - near-complete freedom: a full tree of XOR gates if needed

Designing the indexing functions (1): 6 wordline bits
- Must be available at the beginning of the cycle:
  - block address bits
  - 3-block-old lghist bits
  - path bits
- Tradeoff:
  - address bits emphasize bimodal component behavior
  - lghist bits are more uniformly distributed
Choice: 4 lghist bits + 2 address bits

Designing the indexing functions (2): column selection and unshuffle
- Favor independence of the four indexing functions:
  - if two (address, history) pairs conflict in one table, try to avoid repeating the conflict in another table
- Guarantee that, for a single address, two histories that differ by only one or two bits do not map to the same entry
- Favor usage of the whole table:
  - lghist bits are more uniformly distributed than address bits
Choice: XOR of 2 lghist bits for each column bit; an XOR tree with up to 11 bits for unshuffle
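The three levels of index computation can be sketched structurally. Only the shape follows the slides: unhashed wordline bits (4 lghist bits + 2 address bits), one 2-input XOR per column bit, and an XOR tree per unshuffle bit. Everything else is an assumption for illustration: the specific bit positions, the field widths of 5 column bits (8 out of 256) and 3 unshuffle bits (8 to 8), and the names `xor_bits` and `index_parts`. The actual EV8 functions are not given in this transcript.

```python
def xor_bits(value: int, positions) -> int:
    """XOR together the bits of `value` at the given positions."""
    out = 0
    for p in positions:
        out ^= (value >> p) & 1
    return out


def index_parts(addr: int, lghist: int):
    """Return (wordline, column, unshuffle) for one logical table."""
    # Wordline: 6 bits used as-is, so they are available immediately.
    # Hypothetical choice: lghist bits 0..3 and address bits 7..8.
    wordline = ((lghist & 0xF) << 2) | ((addr >> 7) & 0x3)

    # Column: 5 bits, each produced by a single 2-input XOR gate
    # over two lghist bits (hypothetical pairs).
    column = 0
    for i in range(5):
        bit = ((lghist >> (4 + i)) & 1) ^ ((lghist >> (9 + i)) & 1)
        column |= bit << i

    # Unshuffle: 3 bits, each an XOR tree over 7 inputs here
    # (the slides allow up to 11).
    mixed = (addr >> 5) ^ lghist
    unshuffle = 0
    for i in range(3):
        unshuffle |= xor_bits(mixed, range(i, 21, 3)) << i

    return wordline, column, unshuffle


print(index_parts(addr=0x1A40, lghist=0b1011_0110_0101))
```

The ordering mirrors the timing budget: the later a field is consumed in the read path, the more gate delay its computation can afford.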

EV8 branch predictor configuration
- 208 Kbits for prediction and 144 Kbits for hysteresis (prediction + hysteresis per table):
  - «BIM»: 16 K + 16 K, 4 lghist bits (+ 3-block path)
  - G0: 64 K + 32 K, 13 lghist bits
  - G1: 64 K + 64 K, 21 lghist bits
  - Meta: 64 K + 32 K, 17 lghist bits
- 4 prediction banks and 4 hysteresis banks
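A quick arithmetic check of the storage budget, reading each "X K + Y K" pair as prediction bits plus hysteresis bits (an interpretation that the stated totals confirm). The dictionary name `CONFIG_KBITS` is hypothetical.

```python
# (prediction Kbits, hysteresis Kbits) per logical table, from the slide.
CONFIG_KBITS = {
    "BIM": (16, 16),
    "G0": (64, 32),
    "G1": (64, 64),
    "Meta": (64, 32),
}

prediction_total = sum(pred for pred, _ in CONFIG_KBITS.values())
hysteresis_total = sum(hyst for _, hyst in CONFIG_KBITS.values())

print(prediction_total, hysteresis_total, prediction_total + hysteresis_total)

# Tables whose hysteresis array is half the prediction array share
# one hysteresis bit between two prediction entries.
shared = [name for name, (pred, hyst) in CONFIG_KBITS.items() if hyst < pred]
print(shared)
```

The totals come to 208 Kbits of prediction and 144 Kbits of hysteresis, 352 Kbits overall, with G0 and Meta being the tables that share hysteresis bits.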

Performance evaluation
Sorry, SPEC 95 :-)

Benchmark characteristics
- Highly optimized SPECint 95:
  - many more not-taken than taken branches
  - ratio lghist/ghist length: from 1.12 to 1.59
  - from 8.9 to 16.2 branches per 100 instructions

2Bc-gskew vs other global history predictors

Quality of information vector

Reducing some table sizes: no significant impact

Quality of indexing functions

Conclusion
- Designing a real branch predictor raises challenges ignored in most academic studies:
  - 3-block-old history vector
  - inability to maintain a complete history
  - simultaneous accesses to the predictor
  - minimization of the number of memory arrays
  - timing constraints on the indexing functions
We overcame these difficulties and adapted a state-of-the-art academic branch predictor to real-world constraints.

Summary of the contributions
- An efficient information vector can be built by mixing path and compressed history:
  - don't focus on the info vector, use what is convenient!
- Use of different table sizes and history lengths in the predictor
- Sharing of hysteresis bits
- Conflict-free parallel access scheme for the predictor
- Engineering of indexing functions

Acknowledgements
To the whole EV8 design team.
Special mention to: Ta-chung Chang, George Chrysos, John Edmondson, Joel Emer, Tryggve Fossum, Glenn Giacalone, Balakrishnan Iyer, Manickavelu Balasubramanian, Harish Patil, George Tien and James Vash.