Perceptron-based Global Confidence Estimation for Value Prediction
Master's Thesis
Michael Black
June 26, 2003

Thesis Objectives
– To present a viable global confidence estimator using perceptrons
– To quantify predictability relationships between instructions
– To study the performance of the global confidence estimator when used with common value prediction methods

Presentation Outline
– Background: Data Value Prediction; Confidence Estimation
– Predictability Relationships
– Perceptrons
– Perceptron-based Confidence Estimator
– Experimental Results and Conclusions

Value Locality
Suppose instruction 1 has been executed several times before:
I1: 5 (A) = 3 (B) + 2 (C)
I1: 6 (A) = 4 (B) + 2 (C)
I1: 7 (A) = 5 (B) + 2 (C)
Next time, its outcome A will probably be 8.

Data Value Prediction
A data value predictor predicts A from instruction 1's past outcomes; instruction 2 speculatively executes using the prediction.
I1: ADD 7 (A) = 5 (B) + 2 (C)   (previous instance)
I1: ADD A = 6 (B) + 2 (C)   (current instance; the predictor's stride of +1 gives a predicted A of 8)
I2: ADD D = 5 (E) + 8 (A)   (executes speculatively using the predicted A)

Types of Value Predictors
Computational: performs a mathematical operation on past values
– Last-Value: 5, 5, 5, 5 → 5
– Stride: 1, 3, 5, 7 → 9
Context: learns repeating sequences of numbers
– 3, 6, 5, 3, 6, 5, 3 → 6
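
As a concrete illustration of the three predictor families, here is a minimal Python sketch; the class names and table structures are hypothetical, not taken from the thesis:

# Minimal sketches of the three predictor families (illustrative only).

class LastValuePredictor:
    """Predicts that an instruction repeats its last outcome: 5, 5, 5, 5 -> 5."""
    def __init__(self):
        self.last = {}
    def predict(self, pc):
        return self.last.get(pc)
    def update(self, pc, value):
        self.last[pc] = value

class StridePredictor:
    """Predicts last value plus the last observed stride: 1, 3, 5, 7 -> 9."""
    def __init__(self):
        self.last, self.stride = {}, {}
    def predict(self, pc):
        if pc in self.last:
            return self.last[pc] + self.stride.get(pc, 0)
    def update(self, pc, value):
        if pc in self.last:
            self.stride[pc] = value - self.last[pc]
        self.last[pc] = value

class ContextPredictor:
    """Learns which value follows a recent context: 3, 6, 5, 3, 6, 5, 3 -> 6."""
    def __init__(self, depth=2):
        self.depth, self.history, self.table = depth, {}, {}
    def predict(self, pc):
        ctx = tuple(self.history.get(pc, ()))
        return self.table.get((pc, ctx))
    def update(self, pc, value):
        ctx = tuple(self.history.get(pc, ()))
        self.table[(pc, ctx)] = value
        self.history[pc] = (list(ctx) + [value])[-self.depth:]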

Types of Value History
– Local History: predicts using data from past instances of the same instruction
– Global History: predicts using data from other instructions
Local value prediction is the more conventional approach.

Are mispredictions a problem?
If a prediction is incorrect, speculatively executed instructions must be re-executed. This can result in:
– Cycle penalties for detecting the misprediction
– Cycle penalties for restarting dependent instructions
– Incorrect resolution of dependent branch instructions
It is better not to predict at all than to mispredict.

Confidence Estimator
Decides whether to make a prediction for an instruction, basing its decisions on the accuracy of past predictions.
A common confidence estimation method: the saturating up-down counter.

Up-Down Counter
[State-machine figure: starting in the "don't predict" states, the counter moves up on each correct prediction and down on each incorrect one; predictions are made only once the counter crosses the threshold into the "predict" states.]
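
A minimal sketch of such a counter in Python; the threshold and width used here are assumptions for illustration (the thesis's baseline, mentioned later, is a 2-bit variant):

class UpDownCounter:
    """Saturating up-down confidence counter (illustrative sketch)."""
    def __init__(self, threshold=2, maximum=3):   # 2-bit counter: 0..3
        self.count, self.threshold, self.maximum = 0, threshold, maximum
    def should_predict(self):
        return self.count >= self.threshold        # in the "predict" states
    def update(self, was_correct):
        if was_correct:
            self.count = min(self.count + 1, self.maximum)  # saturate high
        else:
            self.count = max(self.count - 1, 0)             # saturate low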

Local vs. Global
The up-down counter is local: only past instances of an instruction affect its counter.
Global confidence estimation uses the prediction accuracy ("predictability") of past dynamic instructions.
Problem with global: not every past instruction affects the predictability of the current instruction.

Example
I1: A = B + C
I2: F = G – H
I3: E = A + A
Instruction 3 depends on instruction 1 but not on instruction 2, so its predictability is related to instruction 1 but not to instruction 2. If instruction 1 is predicted incorrectly, instruction 3 will also be predicted incorrectly.

Is global confidence worthwhile?
– Fewer mispredictions than local: if an instruction mispredicts, its dependent instructions know not to predict.
– Less warm-up time than local: instructions need not be executed several times before accurate confidence decisions can be made.

How common are predictability relationships?
Simulation study:
– How many instructions in a program predict correctly only when a previous instruction predicts correctly?
– Which past instructions have the most influence?

Predictability Relationships
Over 70% of instructions under Stride and Last-Value prediction, and over 90% under Context prediction, match the prediction accuracy of some past instruction at least 90% of the time.

Predictability Relationships
The most recent 10 instructions have the most influence.

Global Confidence Estimation
A global confidence estimator must:
1. Identify, for each instruction, which past instructions have similar predictability
2. Use their prediction accuracy to decide whether or not to predict

Neural Network
Used to iteratively learn unknown functions from examples. Consists of nodes and links; each link has a numeric weight. Data is fed to input nodes and propagated to output nodes by the links. The desired output is used to adjust ("train") the weights.

Perceptron
Perceptrons have only input and output nodes. They are much easier to implement and train than larger neural networks, but they can only learn linearly separable functions.

Perceptron Computation
– Each bit of input data is sourced to an input node
– The dot product is calculated between the input data and the weights
– The output is "1" if the dot product exceeds a threshold; otherwise "0"

Perceptron Training
Weights are adjusted so that the perceptron output equals the desired output for the given input:
– Error value ε = desired value – perceptron output
– ε times each input bit is added to the corresponding weight
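
Together, the computation and training slides amount to the classic perceptron update rule. A minimal sketch in Python; the zero threshold is an assumption for illustration:

def perceptron_output(weights, inputs, threshold=0):
    """Output 1 if the dot product of weights and inputs exceeds the threshold, else 0."""
    dot = sum(w * x for w, x in zip(weights, inputs))
    return 1 if dot > threshold else 0

def perceptron_train(weights, inputs, desired, threshold=0):
    """Add (error * input bit) to each weight; error is desired minus actual output."""
    error = desired - perceptron_output(weights, inputs, threshold)  # ε
    return [w + error * x for w, x in zip(weights, inputs)]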

Weights
Weights determine the effect of each input on the output:
– Positive weight: output varies directly with the input bit
– Negative weight: output varies inversely with the input bit
– Large weight: input has a strong effect on the output
– Zero weight: input bit has no effect on the output

Linear Separability
An input may have a direct influence on the output, or it may instead have an inverse influence. But an input cannot have a direct influence sometimes and an inverse influence at other times; this is why a perceptron can learn AND and OR but not XOR.

Perceptron Confidence Estimator
Each input node is a past instruction's prediction outcome (1 = correct, –1 = incorrect). The output is the decision to predict (1 = predict, 0 = don't predict). Weights determine a past instruction's predictability influence on the current instruction:
– Positive weight: the current instruction mispredicts when the past instruction mispredicts
– Negative weight: the current instruction mispredicts when the past instruction predicts correctly
– Zero weight: the past instruction does not affect the current one
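
A sketch of how such a perceptron could serve as a confidence estimator: one weight vector per static instruction, fed by a shared history of recent prediction outcomes. The table organization and the 10-entry history length are illustrative assumptions (the 10 echoes the earlier observation that the most recent 10 instructions matter most):

class PerceptronConfidenceEstimator:
    """One perceptron per static instruction, sharing a global outcome history."""
    def __init__(self, history_length=10):
        self.n = history_length
        self.history = [1] * self.n          # past outcomes: 1 = correct, -1 = incorrect
        self.weights = {}                    # pc -> [bias, w1..wn]

    def _dot(self, w):
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], self.history))

    def should_predict(self, pc):
        w = self.weights.setdefault(pc, [0] * (self.n + 1))
        return self._dot(w) > 0              # 1 = predict, 0 = don't predict

    def update(self, pc, was_correct):
        w = self.weights.setdefault(pc, [0] * (self.n + 1))
        desired = 1 if was_correct else 0
        error = desired - (1 if self._dot(w) > 0 else 0)   # ε
        w[0] += error                        # bias input is implicitly 1
        for i, xi in enumerate(self.history):
            w[i + 1] += error * xi           # ε times each input bit
        # shift the newest outcome into the global prediction history
        self.history = [1 if was_correct else -1] + self.history[:-1]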

Perceptron Confidence Estimator
Example weights (bias weight = –1):
I1: A = B × C — weight = 1
I2: D = E + F — weight = 1
I3: P = Q × R — weight = 0
I4: G = A + D (current instruction)
Instruction 4 predicts correctly only when instructions 1 and 2 predict correctly.

Confidence Estimator Organization

Perceptron Implementation

Weight Value Distribution
Simulation study:
– What are typical perceptron weight values?
– How does the type of predictor influence the weight distribution?
– What minimum range do the weights need to have?

Weight Value Distribution

Simulation Methodology
– Measurements simulated using SimpleScalar 2.0a
– SPEC2000 benchmarks: bzip2, gcc, gzip, perlbmk, twolf, vortex
– Each benchmark is run for 500 million instructions
– Value predictors: Stride, Last-Value, Context
– Baseline confidence estimator: 2-bit up-down counter

Simulation Metrics
– P_CORRECT: number of correct predictions
– P_INCORRECT: number of incorrect predictions
– N: number of cases where no prediction was made
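
The slide does not spell out the formulas, but coverage and accuracy in confidence-estimation work are conventionally defined from these counts; presumably:

Coverage = (P_CORRECT + P_INCORRECT) / (P_CORRECT + P_INCORRECT + N)
Accuracy = P_CORRECT / (P_CORRECT + P_INCORRECT)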

Stride Results
The perceptron estimator shows a coverage increase of 8.2% and an accuracy increase of 2.7% over the up-down counter.

Last-Value Results
The perceptron estimator shows a coverage increase of 10.2% and an accuracy increase of 5.9% over the up-down counter.

Context Results
The perceptron estimator shows a coverage increase of 6.1% and an accuracy decrease of 2.9% over the up-down counter.

Sensitivity to GPH size

Coverage Sensitivity to the Unavailability of Past Instructions

Accuracy Sensitivity to the Unavailability of Past Instructions

Coverage Sensitivity to Weight Range Limitations

Accuracy Sensitivity to Weight Range Limitations

Conclusions
– Mispredictions are a problem in data value prediction
– Benchmark programs exhibit strong predictability relationships between instructions
– Perceptrons enable confidence estimators to exploit these predictability relationships
– Perceptron-based confidence estimation tends to show significant improvement over up-down counter confidence estimation