Perceptron Branch Prediction and Its Recent Developments


Perceptron Branch Prediction and Its Recent Developments. Based mostly on "Dynamic Branch Prediction with Perceptrons" by Daniel A. Jiménez and Calvin Lin. Presented by Shugen Li.

Introduction: With the trend toward deeper pipelines and faster clock cycles, modern computer architectures increasingly rely on speculation to boost instruction-level parallelism. Machine learning techniques offer the possibility of further improving performance by increasing branch prediction accuracy.

Introduction (cont'd): Figure 1. A conceptual system model for branch prediction. Adapted from I. K. Chen, J. T. Coffey, and T. N. Mudge, "Analysis of branch prediction via data compression".

Introduction (cont'd): We can improve accuracy by replacing these traditional predictors with neural networks, which provide good predictive capabilities. The perceptron is one of the simplest possible neural networks: it is easy to understand, simple to implement, and has several attractive properties.

Why perceptrons? The major benefit of perceptrons is that by examining their weights, i.e., the correlations that they learn, it is easy to understand the decisions that they make. For many other neural networks it is difficult or impossible to determine exactly how the network arrives at its decision; the perceptron's decision-making process, by contrast, is the result of a simple mathematical formula.

Perceptron model: The inputs x_1 ... x_n are the bits of the global branch history shift register, encoded as +1 (taken) or -1 (not taken); w_0 ... w_n is the weights vector. The output is y = w_0 + Σ_i x_i·w_i; y ≥ 0 means the prediction is taken, otherwise not taken.
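As an illustration (not from the slides), here is a minimal software sketch of the prediction step; in hardware this sum is computed by an adder tree:

```python
def perceptron_predict(weights, history):
    """Perceptron prediction: y = w0 + sum(wi * xi).

    weights: integer weights [w0, w1, ..., wn], w0 being the bias weight.
    history: n global-history bits, each +1 (taken) or -1 (not taken).
    Returns (taken?, y); y is kept around for the training step.
    """
    y = weights[0] + sum(w * x for w, x in zip(weights[1:], history))
    return y >= 0, y
```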

Perceptron training: Let the branch outcome t be -1 if the branch was not taken, or 1 if it was taken, and let θ be the threshold, a parameter to the training algorithm used to decide when enough training has been done. On a misprediction, or whenever |y| ≤ θ, every weight is updated by w_i := w_i + t·x_i. These two slides and figures are adapted from F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms.
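A sketch of that update rule in software. The threshold value follows the θ ≈ 1.93h + 14 rule of thumb from Jiménez and Lin; in hardware the weights are small saturating signed counters, which this sketch omits:

```python
HISTORY_LENGTH = 32
THETA = int(1.93 * HISTORY_LENGTH + 14)  # threshold from Jimenez & Lin

def perceptron_train(weights, history, t, y):
    """Update weights after the branch resolves.

    t: actual outcome, +1 (taken) or -1 (not taken).
    y: the output that was computed at prediction time.
    Train only on a misprediction or while confidence |y| is low.
    """
    mispredicted = (y >= 0) != (t == 1)
    if mispredicted or abs(y) <= THETA:
        weights[0] += t  # bias weight: its input is implicitly 1
        for i, x in enumerate(history, start=1):
            weights[i] += t * x  # reinforce correlation, decay anti-correlation
```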

Perceptron limitations: A perceptron is only capable of learning linearly separable functions. For example, it can learn the logical AND of two inputs, but not their exclusive-OR; a short argument for the XOR case follows.
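The XOR impossibility argument, reconstructed here since the slide does not spell it out: with inputs in {-1, +1}, XOR should predict "taken" exactly when the inputs differ, and no weights can satisfy all four constraints:

```latex
y(x_1, x_2) = w_0 + w_1 x_1 + w_2 x_2
% "taken" when inputs differ:     y(+1,-1) > 0,\quad y(-1,+1) > 0
% "not taken" when inputs agree:  y(+1,+1) < 0,\quad y(-1,-1) < 0
% Adding each pair of constraints:
y(+1,-1) + y(-1,+1) = 2 w_0 > 0
\qquad\text{vs.}\qquad
y(+1,+1) + y(-1,-1) = 2 w_0 < 0
```

The two requirements on w_0 contradict each other, so no weight vector implements XOR.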

Predictor Block Diagram

Experimental results: The SPEC2000 integer benchmarks are used to compare the perceptron predictor with gshare and bi-mode, and also with a hybrid gshare/perceptron predictor. The perceptron predictor benefits from its ability to make use of longer history lengths, and does especially well when the branch being predicted exhibits linearly separable behavior.

The perceptron predictor can exploit much longer history lengths than traditional two-level schemes.

Performance

Implementation: Computing the perceptron output. A full dot-product computation is not needed. Instead, since every input is ±1, simply add the weight when the input bit is 1 and subtract it (add its two's complement) when the input bit is -1. This computation is similar to that performed by multiplication circuits, which must find the sum of partial products that are each a function of an integer and a single bit. Furthermore, only the sign bit of the result is needed to make a prediction, so the other bits of the output can be computed more slowly without delaying the prediction.
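A software rendering of that trick (illustrative only; the hardware implements it as a tree of conditionally negated addends):

```python
def perceptron_output_no_multiply(weights, history_bits):
    """Compute y without any multiplications.

    history_bits: n booleans, True = taken (+1), False = not taken (-1).
    Because every input is +1 or -1, each term wi*xi is just +wi or -wi,
    so the "dot product" reduces to a sum of conditionally negated weights.
    """
    y = weights[0]
    for w, taken in zip(weights[1:], history_bits):
        y += w if taken else -w  # subtract = add the two's complement
    return y  # only the sign bit is needed for the prediction itself
```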

Implementation (cont'd): Training

Limitations: Delay: high latency even with the simplified computation method. Low accuracy on branches that are not linearly separable. Aliasing and hardware cost.

Recent development (1): Low-power perceptrons (selective weights), by Kaveh Aasaraai and Amirali Baniasadi. Weights are classified into three groups (a sketch of the classification follows): Non-Effective (NE): weights whose sign is opposite to the sign of the dot product; the summation of the NEs is called NE-SUM. Semi-Effective (SE): weights having the sign of the dot product, but with an absolute value less than NE-SUM. Highly-Effective (HE): weights having the sign of the dot product and a value greater than NE-SUM.
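A minimal sketch of that three-way split, under my reading of the definitions above (the signed contributions wi·xi are classified; names are illustrative, not from the paper):

```python
def classify_contributions(weights, inputs):
    """Tag each signed contribution wi*xi as NE, SE, or HE."""
    contribs = [w * x for w, x in zip(weights, inputs)]
    total = sum(contribs)
    sign = 1 if total >= 0 else -1
    ne_sum = sum(abs(c) for c in contribs if c * sign < 0)  # NE-SUM
    tags = []
    for c in contribs:
        if c * sign < 0:
            tags.append("NE")  # opposes the overall outcome
        elif abs(c) < ne_sum:
            tags.append("SE")  # agrees, but could be outvoted by the NEs
        else:
            tags.append("HE")  # agrees and alone outweighs all NEs combined
    return tags
```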

Recent development (2): The Combined Perceptron Branch Predictor, by Matteo Monchiero and Gianluca Palermo. The predictor consists of two concurrent perceptron-like neural networks: one uses branch history information as its inputs, the other uses program counter bits.

Recent development (3): Path-based neural prediction, by Daniel A. Jiménez. In an N-branch path-based neural predictor, the prediction for a branch is initiated N branches ahead, and the predictions for the next N branches are computed in parallel. A row of N counters is read using the current instruction block address; on blocks containing a branch, one of the counters read is added to each of the N partial sums. The critical-path delay is thus a single add rather than a table read followed by a multiply-add, since the table reads happen ahead of time. The cost is a more complex recovery mechanism on a misprediction.
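A rough software analogue of those pipelined partial sums (structure and names are mine; the real design also checkpoints the sums for misprediction recovery, and training is omitted here):

```python
class PathBasedSketch:
    """Build the partial sums for the next N predictions ahead of time,
    so only one small add remains on the prediction critical path."""

    def __init__(self, n, rows):
        self.n = n
        self.table = [[0] * rows for _ in range(n)]  # N counters per row
        self.partials = [0] * n  # partials[k] backs the prediction k branches out

    def on_branch(self, address, taken):
        row = address % len(self.table[0])
        counters = [self.table[j][row] for j in range(self.n)]
        # the oldest partial sum is now complete: predict with a single add
        prediction = (self.partials[0] + counters[0]) >= 0
        x = 1 if taken else -1
        # fold counter j into the sum that will predict the branch j ahead
        self.partials = [self.partials[j + 1] + x * counters[j + 1]
                         for j in range(self.n - 1)] + [0]
        return prediction
```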

Recent development (4): Revisiting the perceptron predictor, by A. Seznec. The accuracy of perceptron predictors is further improved with the following extensions: using pseudo-tags to reduce the impact of aliasing; skewing the perceptron weight tables to improve table utilization; and introducing redundant history to handle linearly inseparable data sets. The nonlinear redundant history also leads to a more efficient representation of the perceptron weights, Multiply-Add Contributions (MAC), at the cost of increased hardware complexity.

Recent development (5): The O-GEometric History Length (O-GEHL) branch predictor, by A. Seznec. The GEHL predictor features M distinct predictor tables T(i), which store predictions as signed saturating counters. A single counter C(i) is read from each predictor table T(i), 1 ≤ i ≤ M. The prediction is computed as the sign of the sum S of the M counters C(i): taken when S is positive or zero, not taken when S is negative.

Recent development (5) (cont'd): The O-GEometric History Length branch predictor, by A. Seznec. The history lengths used in computing the indexing functions for the tables T(i) form a geometric series. The counters in the T(i) tables are easy to train, much as in the perceptron predictor, giving low hardware cost and better latency.
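The two equations these slides refer to are missing from the transcript; reconstructed from Seznec's O-GEHL paper (the second up to rounding), they are:

```latex
% Prediction: sign of the sum of the M selected counters
S = \sum_{i=1}^{M} C(i), \qquad \text{predict taken} \iff S \ge 0
% Indexing: history lengths form a geometric series
L(i) \approx \alpha^{\,i-1} \times L(1), \qquad 1 \le i \le M
```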

Conclusion: The perceptron predictor is attractive because it can use long history lengths without requiring exponential resources. Its weakness is the increased computational complexity, with the latency and hardware cost that follow. It can be combined with traditional methods to obtain better performance, and several methods are being developed to reduce the latency and handle mispredictions. This technology will become more practical as hardware costs fall quickly, and there is ample room for further development.

References
[1] D. Jiménez and C. Lin, "Dynamic branch prediction with perceptrons", Proc. of the 7th Int. Symp. on High Performance Computer Architecture (HPCA-7), 2001.
[2] D. Jiménez and C. Lin, "Neural methods for dynamic branch prediction", ACM Trans. on Computer Systems, 2002.
[3] A. Seznec, "Revisiting the perceptron predictor", Technical Report, IRISA, 2004.
[4] A. Seznec, "An optimized 2bcgskew branch predictor", Technical Report, IRISA, Sep. 2003.
[5] G. Loh, "The Frankenpredictor", 1st JILP Championship Branch Prediction Competition (CBP-1), 2004.
[6] K. Aasaraai and A. Baniasadi, "Low-power perceptrons".
[7] A. Seznec, "The O-GEometric History Length branch predictor".
[8] M. Monchiero and G. Palermo, "The Combined Perceptron Branch Predictor".
[9] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan, 1962.

Thank You! Questions?