Optimized Hybrid Scaled Neural Analog Predictor
Daniel A. Jiménez
Department of Computer Science
The University of Texas at San Antonio

Branch Prediction with Perceptrons

Branch Prediction with Perceptrons (cont.)

SNP/SNAP [St. Amant et al. 2008]
- A version of piecewise linear neural prediction [Jiménez 2005]
- Based on perceptron prediction
- SNAP is a mixed digital/analog version of SNP
- Uses an analog circuit for the costly dot-product operation
- Enables interesting tricks, e.g. scaling

Weight Scaling
- Scaling weights by coefficients
- Different history positions have different importance!
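
As a hedged illustration of what scaling means here, the standard perceptron output and a scaled variant can be written side by side. The symbols w_i (weights), x_i in {-1, +1} (history outcomes), and c_i (per-position coefficients) are notation chosen for this note, and scaling the bias term by c_0 is an assumption rather than something the slide states.

```latex
y_{\text{perceptron}} = w_0 + \sum_{i=1}^{h} w_i x_i
\qquad
y_{\text{scaled}} = c_0 w_0 + \sum_{i=1}^{h} c_i w_i x_i
```

With larger c_i for recent history positions, recent branches contribute more to the sum than distant ones even when their weights have the same magnitude.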

The Algorithm: Parameters and Variables
- C – array of scaling coefficients
- h – the global history length
- H – a global history shift register
- A – a global array of previous branch addresses
- W – an n × (h + 1) array of small integers
- θ – a threshold to decide when to train

The Algorithm: Making a Prediction
- Weights are selected based on the current branch and the ith most recent branch
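
The slide's prediction formula did not survive, so the following is only a minimal Python sketch of how a prediction could be computed from C, H, A, and W. The row-selection hash (XOR of the current PC with the ith previous branch address, modulo n), the +1/-1 history encoding, and the function names are assumptions for illustration, not the exact SNP definition.

```python
def select_row(pc, prev_addr, n):
    """Hypothetical hash: combine the current PC with an earlier branch address."""
    return (pc ^ prev_addr) % n


def snp_output(pc, W, C, H, A):
    """Scaled dot product of selected weights with the global history.

    W: n rows of (h + 1) small integers; column 0 holds bias weights.
    C: h + 1 scaling coefficients; C[0] scales the bias (an assumption).
    H: h outcomes encoded as +1 (taken) / -1 (not taken), most recent first.
    A: h previous branch addresses, most recent first.
    """
    n = len(W)
    total = C[0] * W[pc % n][0]  # bias weight chosen by the current branch only
    for i in range(len(H)):
        row = select_row(pc, A[i], n)  # depends on current and ith previous branch
        total += C[i + 1] * W[row][i + 1] * H[i]
    return total


def predict_taken(pc, W, C, H, A):
    """Predict taken when the output is non-negative."""
    return snp_output(pc, W, C, H, A) >= 0
```

In SNAP the summation itself would be carried out by the analog circuit; the loop above only models the arithmetic.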

The Algorithm: Training
- If the prediction is wrong or |output| ≤ θ then:
  - For the ith correlating weight used to predict this branch:
    - Increment it if the branch outcome = outcome of the ith branch in the history
    - Decrement it otherwise
  - Increment the bias weight if the branch is taken
  - Decrement it otherwise
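
The training rule above can be sketched in Python as follows. The 7-bit saturation range, the XOR row hash, the +1/-1 history encoding, and the history-shift step at the end are assumptions added to make the sketch self-contained; only the increment/decrement rule comes from the slide.

```python
def saturate(value, lo=-64, hi=63):
    """Keep a weight inside a small signed range (range is an assumption)."""
    return max(lo, min(hi, value))


def train(pc, taken, output, W, H, A, theta):
    """Update weights after the branch at `pc` resolves, then shift history.

    output: the value the predictor computed for this branch.
    H: outcomes as +1/-1, most recent first; A: previous branch addresses.
    """
    n = len(W)
    t = 1 if taken else -1
    mispredicted = (output >= 0) != taken
    if mispredicted or abs(output) <= theta:
        # Bias weight: increment if taken, decrement otherwise.
        W[pc % n][0] = saturate(W[pc % n][0] + t)
        for i in range(len(H)):
            row = (pc ^ A[i]) % n  # hypothetical hash, same as for prediction
            # Increment when the outcome agrees with the ith history bit.
            step = 1 if H[i] == t else -1
            W[row][i + 1] = saturate(W[row][i + 1] + step)
    # Record this branch as the most recent history entry.
    H.insert(0, t)
    H.pop()
    A.insert(0, pc)
    A.pop()
```

Training only on a misprediction or a low-magnitude output keeps already-confident weights from growing without bound.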

SNP/SNAP Datapath

Tricks
- Use alloyed [Skadron 2000] global and per-branch history
  - Separate table of local perceptrons
  - Output from this stage is multiplied by an empirically determined coefficient
- Training coefficients vector(s)
  - Multiple vectors initialized to f(i) = 1 / (A + B × i)
  - Minimum coefficient value determined empirically
  - Indexed by branch PC
  - Each vector trained with perceptron-like learning on-line
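
A small Python sketch of the coefficient initialization f(i) = 1 / (A + B × i) with a floor on the minimum value. The slide gives no numbers for A, B, or the minimum, so the values in the usage lines are placeholders, and starting the index at i = 0 is an assumption. The constants are named `a` and `b` here to avoid confusion with the address array A.

```python
def init_coefficients(h, a, b, c_min):
    """Return h + 1 coefficients f(i) = 1 / (a + b * i), floored at c_min."""
    return [max(c_min, 1.0 / (a + b * i)) for i in range(h + 1)]


# Placeholder constants, not values from the presentation.
coeffs = init_coefficients(h=4, a=1.0, b=1.0, c_min=0.3)
for position, value in enumerate(coeffs):
    print(position, round(value, 3))
```

The floor keeps very old history positions from being scaled to nearly zero before on-line training has a chance to adjust them.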

Tricks (2)
- Branch cache
  - Highly associative cache with entries for branch information
  - Each entry contains:
    - A partial tag for this branch PC
    - The bias weight for this branch
    - An “ever taken” bit
    - A “never taken” bit
  - The “ever/never” bits avoid needless use of weight resources
  - The bias weight is protected from destructive interference
  - LRU replacement
  - >99% hit rate
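
The following Python sketch models such a branch cache as a fully associative structure with LRU replacement. The slide does not define how the two bits are updated, so the sketch assumes they amount to tracking whether a taken outcome and a not-taken outcome have each been observed; the field names, capacity, and tag width are likewise hypothetical.

```python
from collections import OrderedDict


class BranchCache:
    """Fully associative LRU cache keyed by a partial tag of the branch PC."""

    def __init__(self, capacity=4, tag_bits=12):
        self.capacity = capacity
        self.tag_mask = (1 << tag_bits) - 1
        self.entries = OrderedDict()  # least recently used entry comes first

    def lookup(self, pc):
        """Return the entry for `pc`, or None on a miss; a hit refreshes LRU."""
        tag = pc & self.tag_mask
        entry = self.entries.get(tag)
        if entry is not None:
            self.entries.move_to_end(tag)
        return entry

    def record_outcome(self, pc, taken):
        """Allocate on a miss (evicting the LRU entry) and update the bits."""
        tag = pc & self.tag_mask
        entry = self.entries.get(tag)
        if entry is None:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
            entry = {"bias": 0, "seen_taken": False, "seen_not_taken": False}
            self.entries[tag] = entry
        self.entries.move_to_end(tag)
        if taken:
            entry["seen_taken"] = True
        else:
            entry["seen_not_taken"] = True
        return entry

    def single_direction(self, pc):
        """Return True/False if the branch has only gone one way so far, else None."""
        entry = self.lookup(pc)
        if entry is None or entry["seen_taken"] == entry["seen_not_taken"]:
            return None
        return entry["seen_taken"]
```

Under this reading, a branch whose `single_direction` is not None could be predicted from the cache alone, leaving the correlating weights to branches that actually change direction.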

Tricks (3)
- Hybrid predictor
  - When perceptron output is below some threshold:
    - If a 2-bit counter gshare predictor has high confidence, use it
    - Else use a 1-bit counter PAs predictor
- Multiple θs indexed by branch PC
  - Each trained adaptively [Seznec 2005]
- Ragged array
  - Not all rows of the matrix are the same size
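
The hybrid selection can be sketched in Python as a simple fallback chain. The sketch assumes "below some threshold" refers to the magnitude of the perceptron output and that a 2-bit gshare counter counts as "high confidence" when saturated (0 or 3); both are interpretations, and the function and parameter names are hypothetical.

```python
def hybrid_predict(perceptron_output, low_conf_threshold, gshare_counter, pas_bit):
    """Choose among the perceptron, gshare, and PAs predictions.

    gshare_counter: 2-bit saturating counter value in 0..3.
    pas_bit: 1-bit PAs prediction (1 = taken, 0 = not taken).
    """
    if abs(perceptron_output) >= low_conf_threshold:
        # Perceptron is confident enough: use the sign of its output.
        return perceptron_output >= 0
    if gshare_counter in (0, 3):
        # Saturated counter is treated as high confidence (assumption).
        return gshare_counter == 3
    # Otherwise fall back to the 1-bit PAs predictor.
    return bool(pas_bit)
```

For example, `hybrid_predict(2, 5, 3, 0)` ignores the weak perceptron output and follows the strongly-taken gshare counter.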

Benefit of Tricks
- Graph shows effect of one trick in isolation
- Training coefficients yields most benefit

References
- Jiménez & Lin, HPCA 2001 (perceptron predictor)
- Jiménez & Lin, TOCS 2002 (global/local perceptron)
- Jiménez, ISCA 2005 (piecewise linear branch predictor)
- Skadron, Martonosi & Clark, PACT 2000 (alloyed history)
- Seznec 2005 (adaptively trained threshold)
- St. Amant, Jiménez & Burger, MICRO 2008 (SNP/SNAP)
- McFarling 1993 (gshare)
- Yeh & Patt 1991 (PAs)

The End