Idealized Piecewise Linear Branch Prediction Daniel A. Jiménez Department of Computer Science Rutgers University.


2 Previous Neural Predictors
- The perceptron predictor uses only pattern history information
  - The same weights vector is used for every prediction of a static branch
  - The i-th history bit could come from any number of static branches
  - So the i-th correlating weight is aliased among many branches
- The newer path-based neural predictor uses path information
  - The i-th correlating weight is selected using the i-th branch address
  - This allows the predictor to be pipelined, mitigating latency
  - This strategy improves accuracy because of path information
  - But there is now even more aliasing, since the i-th weight could be used to predict many different branches

3 Piecewise Linear Branch Prediction
- A generalization of the perceptron and path-based neural predictors
- Ideally, there is a weight giving the correlation between each
  - static branch b, and
  - each pair of branch address and history position (i.e., i) in b's history
- b might have thousands of correlating weights or just a few
  - Depends on the number of static branches in b's history
- First, I'll show a "practical version"

4 The Algorithm: Parameters and Variables
- GHL – the global history length
- GHR – a global history shift register
- GA – a global array of previous branch addresses
- W – an n × m × (GHL + 1) array of small integers
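The slide's state can be sketched as C declarations. The sizes N, M, and GHL below are small illustrative values I've assumed for the sketch, not the contest parameters; the last slot of each weights vector is used as the bias weight.

```c
/* Sketch of the "practical" predictor's state, following the slide's
 * names: GHL, GHR, GA, W.  Sizes here are illustrative assumptions. */
#include <stdbool.h>
#include <stddef.h>

#define GHL 8   /* global history length (illustrative)            */
#define N   16  /* first dimension of W, branch buckets (assumed)  */
#define M   16  /* second dimension of W, path buckets (assumed)   */

static bool        GHR[GHL];          /* global history shift register  */
static unsigned    GA[GHL];           /* previous branch addresses      */
static signed char W[N][M][GHL + 1];  /* small-integer weights; index
                                         GHL holds the bias weight     */
```

With static storage duration, all of this state starts zeroed, which is the usual initial condition for perceptron-style weights.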

5 The Algorithm: Making a Prediction
Weights are selected based on the current branch and the i-th most recent branch.
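The slide's prediction pseudocode is not reproduced in the transcript, so this is only a sketch of the stated idea: the weight for history position i is chosen by the current branch address and the address of the i-th most recent branch. The modulo indexing and sizes are my assumptions, not the contest code.

```c
/* Hedged sketch of the prediction step.  Weight W[b][a][i] is selected
 * by the current branch (b) and the i-th most recent branch (a);
 * each weight is added if its history bit was taken, subtracted if not.
 * Sizes and the modulo hashing are illustrative assumptions. */
#include <stdbool.h>

#define GHL 8
#define N   16
#define M   16

static bool        GHR[GHL];
static unsigned    GA[GHL];
static signed char W[N][M][GHL + 1];

bool predict(unsigned pc, int *output)
{
    int sum = W[pc % N][pc % M][GHL];  /* bias weight */
    for (int i = 0; i < GHL; i++)
        sum += GHR[i] ?  W[pc % N][GA[i] % M][i]
                      : -W[pc % N][GA[i] % M][i];
    *output = sum;
    return sum >= 0;  /* predict taken if the output is non-negative */
}
```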

6 The Algorithm: Training
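The training slide's content is also missing from the transcript. The sketch below assumes the standard perceptron-style update used by Jiménez's neural predictors: train only on a misprediction or when the output magnitude is at most a threshold θ, pushing each weight toward agreement with the outcome, saturating at the 7-bit range given on slide 14. The structure is my reconstruction, not the contest code.

```c
/* Hedged sketch of perceptron-style training and history update.
 * THETA, MAX_WEIGHT and MIN_WEIGHT follow slide 14; the rest is an
 * illustrative reconstruction. */
#include <stdbool.h>
#include <stdlib.h>

#define GHL 8
#define N   16
#define M   16
#define THETA      70   /* initial training threshold (slide 14) */
#define MAX_WEIGHT 63
#define MIN_WEIGHT (-64)

static bool        GHR[GHL];
static unsigned    GA[GHL];
static signed char W[N][M][GHL + 1];

static void adjust(signed char *w, bool up)  /* saturating +/- 1 */
{
    if (up)  { if (*w < MAX_WEIGHT) (*w)++; }
    else     { if (*w > MIN_WEIGHT) (*w)--; }
}

void train(unsigned pc, bool taken, bool predicted, int output)
{
    if (predicted != taken || abs(output) <= THETA) {
        adjust(&W[pc % N][pc % M][GHL], taken);  /* bias weight */
        for (int i = 0; i < GHL; i++)            /* correlating weights */
            adjust(&W[pc % N][GA[i] % M][i], taken == GHR[i]);
    }
    /* shift the outcome and branch address into the histories */
    for (int i = GHL - 1; i > 0; i--) {
        GHR[i] = GHR[i - 1];
        GA[i]  = GA[i - 1];
    }
    GHR[0] = taken;
    GA[0]  = pc;
}
```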

7 Why It's Better
- Forms a piecewise linear decision surface
- Each piece is determined by the path to the predicted branch
- Can solve more problems than the perceptron
[Figure: the perceptron decision surface for XOR doesn't classify all inputs correctly; the piecewise linear decision surface for XOR classifies all inputs correctly]

8 Learning XOR
- From a program that computes XOR using if statements
[Figure: perceptron prediction vs. piecewise linear prediction]

9 A Generalization of Neural Predictors
- When m = 1, the algorithm is exactly the perceptron predictor
  - W[n, 1, h+1] holds n weights vectors
- When n = 1, the algorithm is the path-based neural predictor
  - W[1, m, h+1] holds m weights vectors
  - Can be pipelined to reduce latency
- The design space in between contains more accurate predictors
  - If n is small, the predictor can still be pipelined to reduce latency

10 Generalization Continued
The perceptron and path-based predictors are the least accurate extremes of piecewise linear branch prediction!

11 Idealized Piecewise Linear Branch Prediction
- Get rid of n and m
- Allow the 1st and 2nd dimensions of W to be unlimited
- Now branches cannot alias one another; accuracy is much better
- One small problem: an unlimited amount of storage is required
- How to squeeze this into 65,792 bits for the contest?

12 Hashing
- The 3 indices of W – i, j, and k – index arbitrary numbers of weights
- Hash them into 0..N-1 weights in an array of size N
- Collisions will cause aliasing, but it is more uniformly distributed
- The hash function uses three primes H1, H2, and H3:
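The hash formula itself did not survive in the transcript, so the following is only a plausible sketch: combine the three indices with three primes and reduce modulo the table size. The prime values H1, H2, H3 here are placeholders, not the (elided) contest primes; NUM_WEIGHTS is taken from slide 14.

```c
/* Hedged sketch of hashing the three weight indices into one array.
 * H1, H2, H3 are placeholder primes; the real constants are not
 * reproduced in the transcript. */
#include <stddef.h>

#define NUM_WEIGHTS 8590u  /* from slide 14 */
#define H1 10007u          /* placeholder primes */
#define H2 100003u
#define H3 1000003u

size_t hash_weight(unsigned i, unsigned j, unsigned k)
{
    return (i * H1 + j * H2 + k * H3) % NUM_WEIGHTS;
}
```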

13 More Tricks
- Weights are 7 bits; elements of GA are 8 bits
- Separate arrays for bias weights and correlating weights
- Using global and per-branch history
  - An array of per-branch histories is kept, alloyed with the global history
- Slightly bias the predictor toward not taken
- Dynamically adjust the history length
  - Based on an estimate of the number of static branches
- Extra weights
  - Extra bias weights for each branch
  - Extra correlating weights for more recent history bits
  - Inverted bias weights that track the opposite of the branch bias

14 Parameters to the Algorithm
#define NUM_WEIGHTS 8590
#define NUM_BIASES 599
#define INIT_GLOBAL_HISTORY_LENGTH 30
#define HIGH_GLOBAL_HISTORY_LENGTH 48
#define LOW_GLOBAL_HISTORY_LENGTH 18
#define INIT_LOCAL_HISTORY_LENGTH 4
#define HIGH_LOCAL_HISTORY_LENGTH 16
#define LOW_LOCAL_HISTORY_LENGTH 1
#define EXTRA_BIAS_LENGTH 6
#define HIGH_EXTRA_BIAS_LENGTH 2
#define LOW_EXTRA_BIAS_LENGTH 7
#define EXTRA_HISTORY_LENGTH 5
#define HIGH_EXTRA_HISTORY_LENGTH 7
#define LOW_EXTRA_HISTORY_LENGTH 4
#define INVERTED_BIAS_LENGTH 8
#define HIGH_INVERTED_BIAS_LENGTH 4
#define LOW_INVERTED_BIAS_LENGTH 9
#define NUM_HISTORIES 55
#define WEIGHT_WIDTH 7
#define MAX_WEIGHT 63
#define MIN_WEIGHT -64
#define INIT_THETA_UPPER 70
#define INIT_THETA_LOWER -70
#define HIGH_THETA_UPPER 139
#define HIGH_THETA_LOWER -136
#define LOW_THETA_UPPER 50
#define LOW_THETA_LOWER -46
#define HASH_PRIME_ U
#define HASH_PRIME_ U
#define HASH_PRIME_ U
#define TAKEN_THRESHOLD 3
All determined empirically with an ad hoc approach

15 References
- Me and Lin, HPCA 2001 (the perceptron predictor)
- Me and Lin, TOCS 2002 (global/local perceptron)
- Me, MICRO 2003 (the path-based neural predictor)
- Juan, Sanjeevan, and Navarro, SIGARCH Comp. News, 1998 (dynamic history-length fitting)
- Skadron, Martonosi, and Clark, PACT 2000 (alloyed history)

16 The End

17 Program to Compute XOR
int f () {
    int a, b, x, i, s = 0;
    for (i = 0; i < 100; i++) {
        a = rand () % 2;
        b = rand () % 2;
        if (a) {
            if (b) x = 0; else x = 1;
        } else {
            if (b) x = 1; else x = 0;
        }
        if (x) s++; /* this is the branch */
    }
    return s;
}