1 Applying Perceptrons to Speculation in Computer Architecture
Michael Black
Dissertation Defense, April 2, 2007
2 Presentation Outline
Background and Objectives
Perceptron behavior
Local value prediction
Global value prediction
Criticality prediction
Conclusions
3 Motivation: Jimenez's Perceptron Branch Predictor
27% reduction in mispredictions over gshare [1]
15.8% increase in performance over gshare [1]
Why better? Can consider longer history
[1] Jimenez and Lin, "Dynamic Branch Prediction with Perceptrons," 2002.
4 Problem of Lookup Tables
Size grows exponentially with history
Result: must consider small subset of available data
5 Global vs. Local
Local history: past iterations of same instruction
Global history: all past dynamic instructions
6 Perceptron Predictions:
1. Dot product of binary inputs and integer weights
2. Apply threshold: if +, predict 1; if -, predict 0
Learning objective: weight values should reflect input's correlation
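A minimal sketch of this prediction step, assuming the common +1/-1 input encoding and a bias weight with a constant input of 1 (both assumptions; the slide does not specify the encoding):

```python
# Perceptron prediction: dot product of inputs and weights, then a
# sign threshold. Inputs are assumed encoded as +1/-1; weights[0] is
# a bias weight with an implicit always-1 input.
def predict(weights, inputs):
    y = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if y >= 0 else 0
```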
7 Training Strategies
Training by correlation:
  if actual == input_k: w_k++
  else: w_k--
Training by error:
  error = actual - predicted
  w_k = w_k + input_k * error
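The two rules differ in when and how much the weights move. A sketch of both, continuing the +1/-1 encoding assumed in the prediction sketch above:

```python
# Training by correlation: every weight moves toward agreement with
# the outcome on every update, so weights grow large quickly.
def train_by_correlation(weights, inputs, actual):
    for k, x in enumerate(inputs):
        weights[k + 1] += 1 if x == actual else -1
    weights[0] += 1 if actual == 1 else -1

# Training by error: weights move only in proportion to the prediction
# error, so they stay put while predictions are correct.
def train_by_error(weights, inputs, actual, predicted):
    error = actual - predicted   # 0 when correct under +/-1 encoding
    weights[0] += error
    for k, x in enumerate(inputs):
        weights[k + 1] += x * error
```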
8 Linear Separability
Weight can only learn one correlation:
direct (positive)
inverse (negative)
9 Dissertation Objectives
Analyze behavior of perceptrons when used to replace tables
Cope with limitations of perceptrons and their implementations
Apply perceptrons to value prediction
Apply perceptrons to criticality prediction
10 Dissertation Contributions
Perceptron Local Value Predictor: can consider longer local histories
Perceptron Global-based Local Value Predictor: can use global information to choose local values
Two Perceptron Global Value Predictors
Perceptron Global Criticality Predictor
Comparison and analysis of:
  perceptron training approaches
  multiple-bit topologies
  interference reduction strategies
11 Analyses
How perceptrons behave when replacing tables
What effect the training approach has
Design and behavior of different multiple-bit perceptrons
Dealing with history interference
12 Context-based Learning
Concatenated history pattern ("context") indexes table
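In other words, the context is used directly as a table index. A sketch of the indexing (field widths are illustrative):

```python
# Context-based learning, table form: concatenate the history entries
# into one index. With h entries of b bits each, the table needs
# 2**(h*b) slots -- the exponential growth noted on slide 4.
def context_index(history, bits_per_entry):
    mask = (1 << bits_per_entry) - 1
    index = 0
    for entry in history:
        index = (index << bits_per_entry) | (entry & mask)
    return index
```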
13 Pattern Compatibility
14 What affects perceptron learning?
Noise from uncorrelated inputs
Imbalance between pattern occurrences
False correlations
Effects:
  Perceptron takes longer to learn
  Perceptron never learns
15 Noise
Training by correlation: weights grow large rapidly, so less susceptible
Training by error: weights don't grow until misprediction, so susceptible
Solution? Exponential Weight Growth
16 Studying Noise
Perceptron modeled independently of application
p random patterns chosen for each level of correlation:
  At n bits correlated, a random correlation direction (direct/inverse) chosen for each of the n bits
  Target randomly chosen for each pattern; correlation direction determines first n bits of each pattern
  Remaining bits chosen randomly for each pattern
Perceptron is trained on each pattern set
Average training time over 1000 random pattern sets plotted
Example pattern set generation for n=4, p=2 (correlation directions "ddid"):
  1101xxxx – 1
  0010xxxx – 0
  11010101 – 1
  00101110 – 0
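A sketch of this pattern-set generator as described on the slide (pattern width and representation are illustrative):

```python
import random

# Generate one pattern set: n correlated bits, each with a random
# direct ("d") or inverse ("i") correlation to the target, and the
# remaining bits random -- as in the n=4, p=2, "ddid" example above.
def make_pattern_set(n, p, width):
    directions = [random.choice("di") for _ in range(n)]
    patterns = []
    for _ in range(p):
        target = random.randint(0, 1)
        bits = [target if d == "d" else 1 - target for d in directions]
        bits += [random.randint(0, 1) for _ in range(width - n)]
        patterns.append((bits, target))
    return directions, patterns
```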
17 How does noise affect training time?
18 How does imbalance affect training time?
19 How does imbalance affect learning?
20 Why can't training-by-correlation handle imbalance?
21 Findings
Increasing history size is bad if the percentage of correlated inputs decreases
Must use training-by-error if there is poor correlation and imbalance
22 Multibit Perceptron
Predicts values, not single bits
What is a value correlation? An input value implies a particular output value (e.g. 5 --> 4)
Approaches:
  Disjoint
  Fully Coupled
  Weight per value
23 Disjoint Perceptron
Tradeoff:
+ small size
- can only learn from respective bits
24 Fully Coupled Perceptron
Tradeoff:
+ can learn from any past bit
- more weights
25 Learning abilities compared
26 Weight-per-Value Perceptron
Tradeoff:
+ Can always learn
- Tons of weights
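A rough sketch of how the weight counts of the three topologies scale, under one plausible reading of the slides (h past values of b bits each); the formulas are illustrative, not the dissertation's exact configurations:

```python
# h = history length (number of past values), b = bits per value.

def disjoint_weights(h, b):
    # each output bit keeps weights only for its own bit position
    # across the h past values
    return b * h

def fully_coupled_weights(h, b):
    # each of the b output bits keeps a weight for every bit of
    # every past value
    return b * (b * h)

def weight_per_value_weights(h, b):
    # one weight per candidate output value (2**b of them) for each
    # of the 2**b possible values in a history slot: "tons of weights"
    return (2 ** b) * (2 ** b) * h
```

Under these assumed formulas, h = 4 and b = 8 give 32, 256, and 262,144 weights respectively, which matches the qualitative ordering of the tradeoffs above.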
27 History Interference
28 How common is interference?
29 How does interference affect perceptrons?
constructive
destructive
neutral
weight-destructive
value-destructive
30 Interference in Perceptron Branch Prediction
31 Coping: Assigned Seats
Tradeoff:
+ no additional size
- can't consider multiple iterations of an instruction
32 Weight for Each Interfering Branch ("Piecewise Linear")
Tradeoff:
+ interference is completely removed
- massive size
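A sketch of the piecewise-linear idea: give each (predicted branch, interfering branch, history position) combination its own weight, so different branches no longer share and corrupt each other's weights. The dictionary indexing is an illustration patterned after Jimenez's piecewise linear branch predictor, not necessarily the exact organization used here:

```python
from collections import defaultdict

W = defaultdict(int)  # (branch_pc, history_pc, position) -> weight

# history: list of (pc_of_past_branch, outcome), outcome in {+1, -1}
def piecewise_predict(pc, history):
    y = W[(pc, 0, 0)]  # bias term (illustrative slot)
    for i, (hist_pc, outcome) in enumerate(history):
        y += W[(pc, hist_pc, i + 1)] * outcome
    return y >= 0
```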
33 Simulator
New superscalar, cycle-accurate, execution-driven simulator
Can accurately model value prediction & criticality
34 Value Prediction
What is it? Predicting instructions' data values to overcome data dependencies
Why consider it? Requires a multiple-bit prediction, not a single-bit one
35 Table-based Predictor
Limitations:
  exponential growth in past values & value history
  can only consider local history
Storage: 70 kB for 4 values, 34 MB for 8 values, 74×10^18 B for 16 values
36 Perceptron in Pattern Table (PPT)
Tradeoff:
+ Few perceptrons needed (for 4 past values)
+ Can consider longer histories
- Exponential growth with # of past values
37 Perceptron in Value Table (PVT)
Tradeoff:
+ Linear growth in both value history and # past values
- More perceptrons needed
38 Results: PVT
2.4-5.6% accuracy increase, 0.5-1.2% performance increase
102kB-1.3MB storage needed
39 Results: PPT
1.4-2.8% accuracy decrease: not a good approach
72kB-115kB storage needed
40 Global-Local Value Prediction
Uses global correlation to predict locally available values
41 Global-Local Predictor
42 Global-Global Prediction
Tradeoff:
+ Less value storage
- More bits needed per perceptron input
43 Global Bitwise
Tradeoff:
+ No value storage
+ Not limited to past values only
- Many more bits needed per perceptron input
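A sketch of the bitwise idea: one perceptron per output bit, each predicting its bit directly from the global history bits, so no table of previously seen values is kept (structure and encoding are assumptions):

```python
# bit_perceptrons[j] holds the weight vector for output bit j;
# history_bits is the global history encoded as +1/-1.
def bitwise_predict_value(bit_perceptrons, history_bits):
    value = 0
    for j, weights in enumerate(bit_perceptrons):
        y = weights[0] + sum(w * x for w, x in zip(weights[1:], history_bits))
        if y >= 0:
            value |= 1 << j   # set bit j of the predicted value
    return value
```

This is also why slide 45's question has a sensible answer: the value is assembled bit by bit, so it need not have occurred in any history before.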
44 Global Predictors Compared
Global-Local: 3.1% accuracy increase, 1.6% performance increase, 1.2MB storage needed
Global-Global: 7.6% accuracy increase, 6.7% performance increase, 1.3MB storage needed
Bitwise: 12.7% accuracy increase, 5.3% performance increase, 4.2MB storage needed
45 Can Bitwise Predict New Values?
5.0% of all predictions are correct values never seen before
A further 9.8% are correct values not seen in local history
46 Multibit Topologies Compared
Disjoint: 3.1% accuracy increase, 1.6% performance increase, 1.2MB storage needed
Fully Coupled: 6.8% accuracy decrease, 1.5% performance decrease, 3.8MB storage needed
Weight per Value: 10.7% accuracy increase, 4.4% performance increase, 21.5MB storage needed
47 Training Approaches Compared: Global-Local
48 Training Approaches Compared: PVT Local
49 Final Weight Values: Distribution and Accuracy
50 Anti-Interference Compared
51 Criticality Prediction
What is it? Predicting whether each instruction is on the critical path
Why consider it?
  lack of good training information
  multiple input factors
52 Counter-based Criticality
Predicts four "criteria" that indicate criticality:
QOLD - oldest waiting instruction
QOLDDEP - parent of a QOLD instruction
ALOLD - oldest instruction in machine
QCONS - instruction with the most dependencies
53 Perceptron-per-Criteria (PEC)
Tradeoff:
+ One input per history entry
- Can't learn relationships between criteria
54 Single Perceptron (SP)
Tradeoff:
+ One input per history entry & one perceptron
- Can't learn effects of individual criteria
55 Single Perceptron with Input for Each Criterion (SPC)
Tradeoff:
+ Can learn relative relationships of each criterion
- Four inputs per perceptron
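A sketch of the SPC organization: the four criteria flags for every history entry all feed one perceptron, which is what lets it weigh the criteria against one another (layout and encoding are illustrative):

```python
# history: list of (qold, qolddep, alold, qcons) tuples, each flag
# encoded as +1 (criterion met) or -1 (not met).
def spc_predict(weights, history):
    inputs = [flag for entry in history for flag in entry]
    y = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return y >= 0   # True -> predict the instruction is critical
```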
56 Accuracy Compared
PEC: 2.9% accuracy increase, 4.2MB storage needed
SP: 4.1% accuracy increase, 1.0MB storage needed
SPC: 6.6% accuracy increase, 4.2MB storage needed
57 Performance with Value Prediction
58 Training Approaches Compared
59 Final SPC Weight Distribution
60 Conclusions
Perceptron Local Value Predictor: 5.6% accuracy increase with 1.3MB storage
Perceptron Global-based Local Value Predictor: 3.1% accuracy increase with 1.2MB storage; 10.7% increase for 21.5MB storage
Two Perceptron Global Value Predictors: 7.6% accuracy increase with 1.3MB storage; 12.7% increase for 4.2MB storage
Perceptron Global Criticality Predictor: 6.6% accuracy increase with 4.2MB storage
61 Conclusions (continued)
Perceptron training approaches:
  Training-by-error must be used for poorly correlated applications
Multiple-bit topologies:
  Disjoint - best approach if hardware is a concern
  Fully coupled - performs poorly with low correlation
  Weight-per-value - performs very well but requires high hardware costs
Interference reduction:
  Assigned Seats - modest improvement but no additional hardware
  Piecewise - substantially more hardware, significant improvement
62 Questions