Perceptron Branch Prediction and Its Recent Developments
Mostly based on “Dynamic Branch Prediction with Perceptrons” by Daniel A. Jiménez and Calvin Lin
By Shugen Li
Introduction
As pipelines grow deeper and clock cycles shorter, modern computer architectures increasingly rely on speculation to boost instruction-level parallelism. Machine learning techniques offer the possibility of further improving performance by increasing prediction accuracy.
Introduction (cont’d)
Figure 1. A conceptual system model for branch prediction. Adapted from I. K. Chen, J. T. Coffey, and T. N. Mudge, “Analysis of branch prediction via data compression”.
Introduction (cont’d)
We can improve accuracy by replacing traditional predictors with neural networks, which provide good predictive capabilities. The perceptron is one of the simplest possible neural networks: it is easy to understand, simple to implement, and has several attractive properties.
Why perceptrons?
The major benefit of perceptrons is that by examining their weights, i.e., the correlations that they learn, it is easy to understand the decisions that they make. With many neural networks, it is difficult or impossible to determine exactly how the network is making its decision. A perceptron’s decision-making process is easy to understand as the result of a simple mathematical formula.
Perceptron model
The inputs x_i are the bits of the global branch history shift register. w_0 to w_n form the weight vector. y is the output of the perceptron: y > 0 means the prediction is taken, otherwise not taken.
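The model above can be sketched in a few lines. This is a minimal illustration, not the hardware design: it assumes the output formula y = w_0 + sum of x_i * w_i from the Jiménez and Lin paper, with history bits encoded as +1 (taken) and -1 (not taken). The function names are hypothetical.

```python
def perceptron_output(weights, history):
    """Return y = w_0 + sum(x_i * w_i).

    weights[0] is the bias weight w_0; weights[1:] pair with the
    history bits, each encoded as +1 (taken) or -1 (not taken).
    """
    assert len(weights) == len(history) + 1
    return weights[0] + sum(w * x for w, x in zip(weights[1:], history))


def predict_taken(weights, history):
    """Predict taken when y > 0, following the convention on this slide."""
    return perceptron_output(weights, history) > 0
```

For example, with weights [1, 2, -3, 4] and history [1, -1, 1], the output is 1 + 2 + 3 + 4 = 10, so the branch is predicted taken.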
Perceptron training
Let the branch outcome t be -1 if the branch was not taken, or 1 if it was taken, and let θ be the threshold, a parameter to the training algorithm used to decide when enough training has been done. These two pages and figures are adapted from F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms.
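A minimal sketch of the training rule, assuming the update from the Jiménez and Lin paper: when the prediction was wrong, or the output magnitude does not exceed the threshold (called theta here), every weight moves by t * x_i. Saturation of the weights to a fixed bit width is left out, and the function name is hypothetical.

```python
def train(weights, history, taken, theta):
    """Update weights in place after the branch outcome is known.

    history bits are +1 (taken) or -1 (not taken); weights[0] is the bias.
    """
    t = 1 if taken else -1
    y = weights[0] + sum(w * x for w, x in zip(weights[1:], history))
    predicted_taken = y > 0
    # Train on a misprediction, or when the output is not yet confident.
    if predicted_taken != taken or abs(y) <= theta:
        weights[0] += t  # the bias input is always 1
        for i, x in enumerate(history, start=1):
            weights[i] += t * x  # strengthen agreement, weaken disagreement
    return weights
```

Starting from all-zero weights with history [1, -1], a taken branch and theta = 2, the first call yields [1, 1, -1]; a second identical call leaves the weights unchanged because the output is now 3, which is correct and above the threshold.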
Perceptron limitation
A perceptron is only capable of learning linearly separable functions. For example, a perceptron can learn the logical AND of two inputs, but not the exclusive-OR.
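The AND versus exclusive-OR contrast can be checked with a small experiment. This is a sketch under assumptions: inputs and targets are encoded as +1/-1, training uses the plain perceptron rule with a threshold of 0, and all names are illustrative.

```python
from itertools import product


def output(w, x):
    # y = w_0 + sum(x_i * w_i)
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def fit(samples, epochs=100):
    """Train a two-input perceptron on (inputs, target) pairs."""
    w = [0, 0, 0]
    for _ in range(epochs):
        for x, t in samples:
            y = output(w, x)
            # Update on a wrong prediction or a zero (undecided) output.
            if (y > 0) != (t > 0) or y == 0:
                w[0] += t
                for i, xi in enumerate(x, start=1):
                    w[i] += t * xi
    return w


def accuracy(w, samples):
    hits = sum((output(w, x) > 0) == (t > 0) for x, t in samples)
    return hits / len(samples)


inputs = list(product((-1, 1), repeat=2))
and_samples = [(x, 1 if x == (1, 1) else -1) for x in inputs]
xor_samples = [(x, 1 if x[0] != x[1] else -1) for x in inputs]
```

Training on and_samples reaches 100% accuracy, while no weight vector can classify all four xor_samples correctly, so accuracy on XOR stays below 100% no matter how many epochs are run.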
Predictor block Diagram
Experimental results
The predictor is evaluated on the SPEC 2000 integer benchmarks and compared with gshare and bi-mode, as well as with a hybrid gshare/perceptron predictor. Its main advantage is its ability to make use of longer history lengths. It does well when the branch being predicted exhibits linearly separable behavior.
The perceptron predictor can use much longer history lengths than traditional two-level schemes.
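A rough storage estimate illustrates why longer histories are affordable. This is a back-of-the-envelope sketch: it assumes a two-level table indexed by h history bits (2^h entries of 2-bit counters) and a perceptron table with h + 1 weights per perceptron. The 1024 perceptrons and 8-bit weights are illustrative values, not figures from the paper.

```python
def two_level_bits(history_length, counter_bits=2):
    """Storage for a pattern table indexed by history_length bits."""
    return (2 ** history_length) * counter_bits


def perceptron_bits(history_length, num_perceptrons=1024, weight_bits=8):
    """Storage for a perceptron table: one weight per history bit plus a bias."""
    return num_perceptrons * (history_length + 1) * weight_bits


for h in (16, 32, 64):
    print(h, two_level_bits(h), perceptron_bits(h))
```

Under these assumptions, doubling the history length squares the number of entries in the two-level table but only roughly doubles the perceptron storage.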
Performance
Implementation
Computing the perceptron output: a full dot product with multiplications is not needed. Instead, simply add the weight when the input bit is 1 and subtract it (add the two’s complement) when the input bit is -1. This is similar to the work performed by multiplication circuits, which must find the sum of partial products that are each a function of an integer and a single bit. Furthermore, only the sign bit of the result is needed to make a prediction, so the other bits of the output can be computed more slowly without having to wait for a prediction.
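The add-or-subtract shortcut can be sketched as follows. This is a software illustration of the idea, not a circuit description, and the function names are hypothetical. It checks that the multiplication-free sum equals the ordinary dot product when the inputs are +1 or -1.

```python
def output_add_sub(weights, history):
    """Compute the perceptron output without multiplications."""
    y = weights[0]  # the bias weight is always added
    for w, x in zip(weights[1:], history):
        if x == 1:
            y += w  # input bit 1: add the weight
        else:
            y -= w  # input bit -1: subtract the weight
    return y


def output_dot(weights, history):
    """Reference version using an explicit dot product."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], history))
```

Both functions return 10 for weights [1, 2, -3, 4] and history [1, -1, 1].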
Implementation (cont’) Training
Limitations
Delay: latency is high even with the simplified computation. Accuracy is low on branches that are not linearly separable. Aliasing and hardware cost are also concerns.
Recent development (1)
Low-power Perceptrons (selective weights), by Kaveh Aasaraai and Amirali Baniasadi
Non-Effective (NE): weights whose sign is opposite to the sign of the dot product value. The summation of the NEs is referred to as NE-SUM.
Semi-Effective (SE): weights having the sign of the dot product value, but with an absolute value less than NE-SUM.
Highly-Effective (HE): weights having the same sign as the dot product value and an absolute value greater than NE-SUM.
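The three classes can be illustrated with a small sketch. Several details here are assumptions, not taken from the paper: each term is the signed contribution x_i * w_i, NE-SUM is compared by magnitude, and ties and zero contributions are placed in the SE class. The function name is hypothetical.

```python
def classify_contributions(contributions):
    """Split signed contributions into NE, SE and HE classes."""
    total = sum(contributions)          # the dot product value
    sign = 1 if total > 0 else -1
    # NE: sign opposite to the dot product value.
    ne = [c for c in contributions if c * sign < 0]
    ne_sum = abs(sum(ne))               # magnitude of NE-SUM
    # HE: same sign and larger in magnitude than NE-SUM.
    he = [c for c in contributions if c * sign > 0 and abs(c) > ne_sum]
    # SE: same sign (or zero) but not larger than NE-SUM.
    se = [c for c in contributions if c * sign >= 0 and abs(c) <= ne_sum]
    return {"NE": ne, "SE": se, "HE": he, "NE_SUM": ne_sum}
```

For contributions [5, -2, 1, 4, -1] the dot product is 7, so -2 and -1 are NE with NE-SUM magnitude 3, the contribution 1 is SE, and 5 and 4 are HE.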
Recent development (2)
The Combined Perceptron Branch Predictor, by Matteo Monchiero and Gianluca Palermo
The predictor consists of two concurrent perceptron-like neural networks: one uses branch history information as its inputs, the other uses program counter bits.
Recent development (3)
Path-based neural prediction, by Daniel A. Jiménez
On an N-branch path-based neural predictor, the prediction for a branch is initiated N branches ahead. The predictions for the next N branches are computed in parallel. A row of N counters is read using the current instruction block address. On blocks featuring a branch, one of the read counters is added to each of the N partial sums. The delay is the perceptron table read delay followed by a single multiply-add delay. Remaining issues: the table read delay is not addressed, and neither is the misprediction penalty.
Recent development (4)
Revisiting the perceptron predictor, by A. Seznec
The accuracy of perceptron predictors is further improved with the following extensions:
using pseudo-tags to reduce the impact of aliasing,
skewing perceptron weight tables to improve table utilization,
introducing redundant history to handle linearly inseparable data sets.
The nonlinear redundant history also leads to a more efficient representation of perceptron weights, Multiply-Add Contributions (MAC). The drawback is increased hardware complexity.
Recent development (5)
The O-GEometric History Length (O-GEHL) branch predictor, by A. Seznec
The GEHL predictor features M distinct predictor tables Ti (0 ≤ i < M). The predictor tables store predictions as signed saturated counters. A single counter C(i) is read from each predictor table Ti. The prediction is computed as the sign of the sum S of the M counters C(i), as given by the first equation. The prediction is taken when S is positive or zero and not taken when S is negative.
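The prediction step can be sketched as follows. This is a simplified illustration: it uses the plain sum of the counters as described on this slide (any additional constant in the original equation is omitted), it skips table indexing entirely, and the 4-bit counter width and function names are assumptions.

```python
def gehl_predict(counters):
    """Return (taken, S) where S is the sum of the M counters read."""
    s = sum(counters)
    return s >= 0, s  # taken when S is positive or zero


def saturating_update(counter, taken, bits=4):
    """Move a signed saturating counter one step toward the outcome."""
    lo = -(1 << (bits - 1))       # e.g. -8 for 4 bits
    hi = (1 << (bits - 1)) - 1    # e.g. +7 for 4 bits
    return min(hi, counter + 1) if taken else max(lo, counter - 1)
```

With counters [3, -1, -2, 0] the sum is 0, so the branch is predicted taken; with [-4, 1, 2] the sum is -1 and the branch is predicted not taken.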
Recent development (5) (cont’d)
The O-GEometric History Length (O-GEHL) branch predictor, by A. Seznec
The history lengths used for computing the indexing functions for the tables Ti are given by the second equation. The counters in all the Ti tables are easy to train, in a way similar to the perceptron predictor. The result is low hardware cost and better latency.
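As the predictor's name suggests, the history lengths form a geometric series. A sketch of one way to generate them is shown here, assuming the form L(i) = int(alpha^(i-1) * L(1) + 0.5) as recalled from Seznec's paper, with table T0 using no history. The function name and the parameter values in the example are illustrative assumptions.

```python
def geometric_history_lengths(num_tables, first_length, alpha):
    """Return the history length L(i) for each table Ti.

    Assumes L(0) = 0 and L(i) = int(alpha**(i - 1) * L(1) + 0.5) for i >= 1.
    """
    lengths = [0]  # T0 is assumed to be indexed without history
    for i in range(1, num_tables):
        lengths.append(int(alpha ** (i - 1) * first_length + 0.5))
    return lengths
```

With 8 tables, L(1) = 2 and alpha = 2.0, the lengths are 0, 2, 4, 8, 16, 32, 64 and 128, so a few tables cover both very recent and very old history.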
Conclusion
Perceptrons are attractive because they can use long history lengths without requiring exponential resources. Their weakness is the increased computational complexity, with the latency and hardware cost that follow from it. They can be combined with traditional methods to obtain better performance. Several methods are being developed to reduce the latency and to handle mispredictions. Finally, this technology will become more practical as hardware costs fall, and there is room for further development.
References
[1] D. Jiménez and C. Lin, “Dynamic branch prediction with perceptrons”, Proc. of the 7th Int. Symp. on High Perf. Comp. Arch. (HPCA-7), 2001.
[2] D. Jiménez and C. Lin, “Neural methods for dynamic branch prediction”, ACM Trans. on Computer Systems, 2002.
[3] A. Seznec, “Revisiting the perceptron predictor”, Technical Report, IRISA, 2004.
[4] A. Seznec, “An optimized 2bcgskew branch predictor”, Technical Report, IRISA, Sep. 2003.
[5] G. Loh, “The frankenpredictor”, The 1st JILP Championship Branch Prediction Competition (CBP-1), 2004.
[6] K. Aasaraai and A. Baniasadi, “Low-power Perceptrons”.
[7] A. Seznec, “The O-GEometric History Length branch predictor”.
[8] M. Monchiero and G. Palermo, “The Combined Perceptron Branch Predictor”.
[9] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan, 1962.
Thank you! Questions?