1 Applying Perceptrons to Speculation in Computer Architecture
Michael Black
Dissertation Defense, April 2, 2007
2 Presentation Outline
Background and Objectives
Perceptron behavior
Local value prediction
Global value prediction
Criticality prediction
Conclusions
3 Motivation: Jimenez's Perceptron Branch Predictor
27% reduction in mispredictions over gshare [1]
15.8% increase in performance over gshare [1]
Why better? Can consider longer history
[1] Jimenez and Lin, "Dynamic Branch Prediction with Perceptrons," 2002.
4 Problem of Lookup Tables
Size grows exponentially with history
Result: must consider small subset of available data
5 Global vs. Local
Local history: past iterations of same instruction
Global history: all past dynamic instructions
6 Perceptron Predictions:
1. Dot product of binary inputs and integer weights
2. Apply threshold: if +, predict 1; if -, predict 0
Learning objective: weight values should reflect input's correlation
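A minimal sketch of this prediction step, assuming the common +1/-1 input encoding and a bias weight with a constant input of 1 (both assumptions; the slide does not specify the encoding):

```python
# Perceptron prediction: dot product of inputs and weights, then a
# sign threshold. Inputs are assumed encoded as +1/-1; weights[0] is
# a bias weight with an implicit always-1 input.
def predict(weights, inputs):
    y = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if y >= 0 else 0
```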
7 Training Strategies
Training by correlation:
  if actual == input_k: w_k++
  else: w_k--
Training by error:
  error = actual - predicted
  w_k = w_k + input_k * error
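The two rules differ in when and how much the weights move. A sketch of both, continuing the +1/-1 encoding assumed in the prediction sketch above:

```python
# Training by correlation: every weight moves toward agreement with
# the outcome on every update, so weights grow large quickly.
def train_by_correlation(weights, inputs, actual):
    for k, x in enumerate(inputs):
        weights[k + 1] += 1 if x == actual else -1
    weights[0] += 1 if actual == 1 else -1

# Training by error: weights move only in proportion to the prediction
# error, so they stay put while predictions are correct.
def train_by_error(weights, inputs, actual, predicted):
    error = actual - predicted   # 0 when correct under +/-1 encoding
    weights[0] += error
    for k, x in enumerate(inputs):
        weights[k + 1] += x * error
```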
8 Linear Separability
Weight can only learn one correlation:
direct (positive)
inverse (negative)
9 Dissertation Objectives
Analyze behavior of perceptrons when used to replace tables
Cope with limitations of perceptrons and their implementations
Apply perceptrons to value prediction
Apply perceptrons to criticality prediction
10 Dissertation Contributions
Perceptron Local Value Predictor: can consider longer local histories
Perceptron Global-based Local Value Predictor: can use global information to choose local values
Two Perceptron Global Value Predictors
Perceptron Global Criticality Predictor
Comparison and analysis of:
  perceptron training approaches
  multiple-bit topologies
  interference reduction strategies
11 Analyses
How perceptrons behave when replacing tables
What effect the training approach has
Design and behavior of different multiple-bit perceptrons
Dealing with history interference
12 Context-based Learning
Concatenated history pattern ("context") indexes table
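In other words, the context is used directly as a table index. A sketch of the indexing (field widths are illustrative):

```python
# Context-based learning, table form: concatenate the history entries
# into one index. With h entries of b bits each, the table needs
# 2**(h*b) slots -- the exponential growth noted on slide 4.
def context_index(history, bits_per_entry):
    mask = (1 << bits_per_entry) - 1
    index = 0
    for entry in history:
        index = (index << bits_per_entry) | (entry & mask)
    return index
```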
13 Pattern Compatibility
14 What affects perceptron learning?
Noise from uncorrelated inputs
Imbalance between pattern occurrences
False correlations
Effects:
  Perceptron takes longer to learn
  Perceptron never learns
15 Noise
Training by correlation: weights grow large rapidly, so less susceptible
Training by error: weights don't grow until misprediction, so susceptible
Solution? Exponential Weight Growth
16 Studying Noise
Perceptron modeled independently of application
p random patterns chosen for each level of correlation:
  At n bits correlated, a random correlation direction (direct/inverse) chosen for each of the n bits
  Target randomly chosen for each pattern; correlation direction determines first n bits of each pattern
  Remaining bits chosen randomly for each pattern
Perceptron is trained on each pattern set
Average training time over 1000 random pattern sets plotted
Example pattern set generation for n=4, p=2 (correlation directions "ddid"):
  1101xxxx – 1
  0010xxxx – 0
  11010101 – 1
  00101110 – 0
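A sketch of this pattern-set generator as described on the slide (pattern width and representation are illustrative):

```python
import random

# Generate one pattern set: n correlated bits, each with a random
# direct ("d") or inverse ("i") correlation to the target, and the
# remaining bits random -- as in the n=4, p=2, "ddid" example above.
def make_pattern_set(n, p, width):
    directions = [random.choice("di") for _ in range(n)]
    patterns = []
    for _ in range(p):
        target = random.randint(0, 1)
        bits = [target if d == "d" else 1 - target for d in directions]
        bits += [random.randint(0, 1) for _ in range(width - n)]
        patterns.append((bits, target))
    return directions, patterns
```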
17 How does noise affect training time?
18 How does imbalance affect training time?
19 How does imbalance affect learning?
20 Why can't training-by-correlation handle imbalance?
21 Findings
Increasing history size is bad if the percentage of correlated inputs decreases
Must use training-by-error if there is poor correlation and imbalance
22 Multibit Perceptron
Predicts values, not single bits
What is a value correlation? An input value implies a particular output value (e.g. 5 --> 4)
Approaches:
  Disjoint
  Fully Coupled
  Weight per value
23 Disjoint Perceptron
Tradeoff:
+ small size
- can only learn from respective bits
24 Fully Coupled Perceptron
Tradeoff:
+ can learn from any past bit
- more weights
25 Learning abilities compared
26 Weight-per-Value Perceptron
Tradeoff:
+ Can always learn
- Tons of weights
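A rough sketch of how the weight counts of the three topologies scale, under one plausible reading of the slides (h past values of b bits each); the formulas are illustrative, not the dissertation's exact configurations:

```python
# h = history length (number of past values), b = bits per value.

def disjoint_weights(h, b):
    # each output bit keeps weights only for its own bit position
    # across the h past values
    return b * h

def fully_coupled_weights(h, b):
    # each of the b output bits keeps a weight for every bit of
    # every past value
    return b * (b * h)

def weight_per_value_weights(h, b):
    # one weight per candidate output value (2**b of them) for each
    # of the 2**b possible values in a history slot: "tons of weights"
    return (2 ** b) * (2 ** b) * h
```

Under these assumed formulas, h = 4 and b = 8 give 32, 256, and 262,144 weights respectively, which matches the qualitative ordering of the tradeoffs above.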
27 History Interference
28 How common is interference?
29 How does interference affect perceptrons?
constructive
destructive
neutral
weight-destructive
value-destructive
30 Interference in Perceptron Branch Prediction
31 Coping: Assigned Seats
Tradeoff:
+ no additional size
- can't consider multiple iterations of an instruction
32 Weight for Each Interfering Branch ("Piecewise Linear")
Tradeoff:
+ interference is completely removed
- massive size
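A sketch of the piecewise-linear idea: give each (predicted branch, interfering branch, history position) combination its own weight, so different branches no longer share and corrupt each other's weights. The dictionary indexing is an illustration patterned after Jimenez's piecewise linear branch predictor, not necessarily the exact organization used here:

```python
from collections import defaultdict

W = defaultdict(int)  # (branch_pc, history_pc, position) -> weight

# history: list of (pc_of_past_branch, outcome), outcome in {+1, -1}
def piecewise_predict(pc, history):
    y = W[(pc, 0, 0)]  # bias term (illustrative slot)
    for i, (hist_pc, outcome) in enumerate(history):
        y += W[(pc, hist_pc, i + 1)] * outcome
    return y >= 0
```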
33 Simulator
New superscalar, cycle-accurate, execution-driven simulator
Can accurately model value prediction & criticality
34 Value Prediction
What is it? Predicting instructions' data values to overcome data dependencies
Why consider it? Requires a multiple-bit prediction, not a single-bit one
35 Table-based Predictor
Limitations:
  exponential growth in past values & value history
  can only consider local history
Storage: 70 kB for 4 values, 34 MB for 8 values, 74×10^18 B for 16 values
36 Perceptron in Pattern Table (PPT)
Tradeoff:
+ Few perceptrons needed (for 4 past values)
+ Can consider longer histories
- Exponential growth with # of past values
37 Perceptron in Value Table (PVT)
Tradeoff:
+ Linear growth in both value history and # past values
- More perceptrons needed
38 Results: PVT
2.4-5.6% accuracy increase, 0.5-1.2% performance increase
102kB-1.3MB storage needed
39 Results: PPT
1.4-2.8% accuracy decrease: not a good approach
72kB-115kB storage needed
40 Global-Local Value Prediction
Uses global correlation to predict locally available values
41 Global-Local Predictor
42 Global-Global Prediction
Tradeoff:
+ Less value storage
- More bits needed per perceptron input
43 Global Bitwise
Tradeoff:
+ No value storage
+ Not limited to past values only
- Many more bits needed per perceptron input
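A sketch of the bitwise idea: one perceptron per output bit, each predicting its bit directly from the global history bits, so no table of previously seen values is kept (structure and encoding are assumptions):

```python
# bit_perceptrons[j] holds the weight vector for output bit j;
# history_bits is the global history encoded as +1/-1.
def bitwise_predict_value(bit_perceptrons, history_bits):
    value = 0
    for j, weights in enumerate(bit_perceptrons):
        y = weights[0] + sum(w * x for w, x in zip(weights[1:], history_bits))
        if y >= 0:
            value |= 1 << j   # set bit j of the predicted value
    return value
```

This is also why slide 45's question has a sensible answer: the value is assembled bit by bit, so it need not have occurred in any history before.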
44 Global Predictors Compared
Global-Local: 3.1% accuracy increase, 1.6% performance increase, 1.2MB storage needed
Global-Global: 7.6% accuracy increase, 6.7% performance increase, 1.3MB storage needed
Bitwise: 12.7% accuracy increase, 5.3% performance increase, 4.2MB storage needed
45 Can Bitwise Predict New Values?
5.0% of all predictions are correct values never seen before
A further 9.8% are correct values not seen in local history
46 Multibit Topologies Compared
Disjoint: 3.1% accuracy increase, 1.6% performance increase, 1.2MB storage needed
Fully Coupled: 6.8% accuracy decrease, 1.5% performance decrease, 3.8MB storage needed
Weight per Value: 10.7% accuracy increase, 4.4% performance increase, 21.5MB storage needed
47 Training Approaches Compared: Global-Local
48 Training Approaches Compared: PVT Local
49 Final Weight Values: Distribution and Accuracy
50 Anti-Interference Compared
51 Criticality Prediction
What is it? Predicting whether each instruction is on the critical path
Why consider it?
  lack of good training information
  multiple input factors
52 Counter-based Criticality
Predicts four "criteria" that indicate criticality:
QOLD - oldest waiting instruction
QOLDDEP - parent of a QOLD instruction
ALOLD - oldest instruction in machine
QCONS - instruction with the most dependencies
53 Perceptron-per-Criteria (PEC)
Tradeoff:
+ One input per history entry
- Can't learn relationships between criteria
54 Single Perceptron (SP)
Tradeoff:
+ One input per history entry & one perceptron
- Can't learn effects of individual criteria
55 Single Perceptron with Input for Each Criterion (SPC)
Tradeoff:
+ Can learn relative relationships of each criterion
- Four inputs per perceptron
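A sketch of the SPC organization: the four criteria flags for every history entry all feed one perceptron, which is what lets it weigh the criteria against one another (layout and encoding are illustrative):

```python
# history: list of (qold, qolddep, alold, qcons) tuples, each flag
# encoded as +1 (criterion met) or -1 (not met).
def spc_predict(weights, history):
    inputs = [flag for entry in history for flag in entry]
    y = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return y >= 0   # True -> predict the instruction is critical
```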
56 Accuracy Compared
PEC: 2.9% accuracy increase, 4.2MB storage needed
SP: 4.1% accuracy increase, 1.0MB storage needed
SPC: 6.6% accuracy increase, 4.2MB storage needed
57 Performance with Value Prediction
58 Training Approaches Compared
59 Final SPC Weight Distribution
60 Conclusions
Perceptron Local Value Predictor: 5.6% accuracy increase with 1.3MB storage
Perceptron Global-based Local Value Predictor: 3.1% accuracy increase with 1.2MB storage; 10.7% increase for 21.5MB storage
Two Perceptron Global Value Predictors: 7.6% accuracy increase with 1.3MB storage; 12.7% increase for 4.2MB storage
Perceptron Global Criticality Predictor: 6.6% accuracy increase with 4.2MB storage
61 Conclusions (continued)
Perceptron training approaches:
  Training-by-error must be used for poorly correlated applications
Multiple-bit topologies:
  Disjoint - best approach if hardware is a concern
  Fully coupled - performs poorly with low correlation
  Weight-per-value - performs very well but requires high hardware costs
Interference reduction:
  Assigned Seats - modest improvement but no additional hardware
  Piecewise - substantially more hardware, significant improvement
62 Questions