Branch Prediction with Neural Networks: Hidden Layers and Recurrent Connections Andrew Smith CSE Dept. June 10, 2004
Outline
– What is a Perceptron? Learning?
– What is a Feed-Forward Network? Learning?
– What is a Recurrent Network? Learning?
– How to do it on hardware?
– Results: adding hidden units
– Results: modeling the latency of slow networks
– Results: varying the hardware budget
The Perceptron A linear (affine) combination of the inputs, thresholded to produce the DECISION.
Perceptron Learning Inputs x_j, outputs y_i, and targets t_i are in {-1, +1}. Cycle through the training set: if x_i = (x_1, x_2, …, x_d) is misclassified, then for every weight j: w_j ← w_j + a · t_i · x_j
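A minimal C++ sketch of the rule above (not the predictor's hardware implementation): predict() thresholds the affine combination of inputs, and train() applies w_j ← w_j + a·t_i·x_j only on a misclassification. The std::vector layout and the constant +1 bias input are illustrative assumptions.

    // Perceptron predict + train; inputs and targets are in {-1, +1}, 'a' is the learning rate.
    #include <vector>

    int predict(const std::vector<double>& w, const std::vector<double>& x) {
        double sum = w[0];                       // bias weight, paired with a constant +1 input
        for (size_t j = 0; j < x.size(); ++j)
            sum += w[j + 1] * x[j];
        return sum >= 0.0 ? +1 : -1;             // the DECISION: sign of the affine combination
    }

    void train(std::vector<double>& w, const std::vector<double>& x, int t, double a) {
        if (predict(w, x) != t) {                // only update on a misclassification
            w[0] += a * t;                       // bias update (its input is the constant +1)
            for (size_t j = 0; j < x.size(); ++j)
                w[j + 1] += a * t * x[j];        // w_j <- w_j + a * t_i * x_j
        }
    }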
Feed-Forward Network A network of perceptrons…
Feed-Forward Network Learning Use a gradient-descent algorithm. For a single output unit: network output is y = f(Σ_j w_j · x_j), with f the activation function. Error is E = ½ (t − y)². Derivative of the error is ∂E/∂w_j = −(t − y) · f′(Σ_k w_k · x_k) · x_j.
Feed-Forward Networks, BACKPROP But no error is defined for the hidden units. Solution: assign each hidden unit a share of the responsibility for the output units' error, then descend that gradient. This is called "back-propagation".
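A hedged C++ sketch of back-propagation for one hidden layer, assuming tanh units and squared error (the talk does not spell out its exact activation or error function); the Net struct and variable names are illustrative. Each hidden unit's delta is its weighted share of the output delta, which is the "responsibility" idea above.

    // One online gradient-descent step for a single-hidden-layer network.
    #include <vector>
    #include <cmath>

    struct Net {
        std::vector<std::vector<double>> wh;  // wh[i][j]: input j -> hidden i (j == 0 is the bias)
        std::vector<double> wo;               // wo[i]: hidden i -> output (index 0 is the bias)
    };

    void backprop_step(Net& net, const std::vector<double>& x, double t, double a) {
        size_t H = net.wh.size();
        std::vector<double> h(H);
        // Forward pass: hidden activations, then the output.
        for (size_t i = 0; i < H; ++i) {
            double s = net.wh[i][0];
            for (size_t j = 0; j < x.size(); ++j) s += net.wh[i][j + 1] * x[j];
            h[i] = std::tanh(s);
        }
        double y = net.wo[0];
        for (size_t i = 0; i < H; ++i) y += net.wo[i + 1] * h[i];
        y = std::tanh(y);

        // Output error term for E = 1/2 (t - y)^2 with a tanh output unit.
        double delta_out = (t - y) * (1.0 - y * y);

        // Hidden error terms: each hidden unit's share of responsibility for the output error.
        for (size_t i = 0; i < H; ++i) {
            double delta_h = delta_out * net.wo[i + 1] * (1.0 - h[i] * h[i]);
            net.wh[i][0] += a * delta_h;
            for (size_t j = 0; j < x.size(); ++j)
                net.wh[i][j + 1] += a * delta_h * x[j];
        }
        // Output weight updates.
        net.wo[0] += a * delta_out;
        for (size_t i = 0; i < H; ++i) net.wo[i + 1] += a * delta_out * h[i];
    }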
Recurrent Networks Feed unit outputs back in as inputs on the next time step: now the network has state…
Learning weights for an RNN Unroll it in time and use back-propagation? No! Too slow, and truncating the unrolled history makes the gradient wrong…
Use Real-Time Recurrent Learning Keep a list, at each time t: for each unit u, for each weight w, keep the partial derivative ∂u/∂w. Update with the recurrence relation: ∂y_k(t+1)/∂w_ij = f′(net_k(t)) · [ Σ_l w_kl · ∂y_l(t)/∂w_ij + δ_ki · z_j(t) ], where z(t) concatenates the external inputs with the previous unit outputs and δ_ki = 1 when k = i.
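A hedged C++ sketch of one RTRL step in the Williams–Zipser style; the struct, the p table of sensitivities, and the tanh units are illustrative assumptions, not the predictor's actual code. It carries the table of partial derivatives forward with the recurrence above and uses it for an online weight update, which is why the per-step cost grows quickly with network size.

    #include <vector>
    #include <cmath>

    struct RTRL {
        int n, m;                                        // n recurrent units, m external inputs
        std::vector<std::vector<double>> w;              // w[k][l], l indexes [inputs | unit outputs]
        std::vector<double> y;                           // unit outputs y(t), size n
        // p[k][i][j] = d y_k / d w_ij, carried forward in time (the "list" on the slide above).
        std::vector<std::vector<std::vector<double>>> p;
    };

    void rtrl_step(RTRL& net, const std::vector<double>& x,
                   const std::vector<double>& target, double a) {
        int n = net.n, m = net.m, z_len = m + n;
        // z(t) = [external inputs, previous unit outputs].
        std::vector<double> z(z_len);
        for (int j = 0; j < m; ++j) z[j] = x[j];
        for (int j = 0; j < n; ++j) z[m + j] = net.y[j];

        // Forward pass: y(t+1) = f(net(t)).
        std::vector<double> y_new(n);
        for (int k = 0; k < n; ++k) {
            double s = 0.0;
            for (int l = 0; l < z_len; ++l) s += net.w[k][l] * z[l];
            y_new[k] = std::tanh(s);
        }

        // Recurrence for the sensitivities (uses the old weights and old p):
        // p_k_ij(t+1) = f'(net_k(t)) * ( sum_l w[k][m+l] * p_l_ij(t) + delta(k,i) * z_j(t) )
        auto p_new = net.p;
        for (int k = 0; k < n; ++k) {
            double fprime = 1.0 - y_new[k] * y_new[k];   // derivative of tanh
            for (int i = 0; i < n; ++i)
                for (int j = 0; j < z_len; ++j) {
                    double s = (k == i) ? z[j] : 0.0;
                    for (int l = 0; l < n; ++l) s += net.w[k][m + l] * net.p[l][i][j];
                    p_new[k][i][j] = fprime * s;
                }
        }

        // Online weight update from the current error and the current sensitivities:
        // w_ij += a * sum_k (t_k - y_k(t)) * p_k_ij(t).
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < z_len; ++j) {
                double grad = 0.0;
                for (int k = 0; k < n; ++k) grad += (target[k] - net.y[k]) * net.p[k][i][j];
                net.w[i][j] += a * grad;
            }

        net.p = p_new;
        net.y = y_new;
    }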
But on hardware? Idea: represent real numbers in [-4, +4] with scaled integers (scale factor 1024, so integers in [-4096, +4096]). Adding is fine: 1024·i + 1024·j = (i + j)·1024. Multiplying requires a divide (a shift): (1024·i) · (1024·j) = (i·j)·1024², so shift right by 10 after each multiply. Compute the activation function by looking it up in a discretized table.
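A hedged C++ sketch of the fixed-point scheme above: reals in [-4, +4] stored as integers scaled by 1024, addition left alone, multiplication followed by a right shift, and the activation served from a precomputed table. The table size and the choice of tanh as the activation are assumptions for illustration, not figures from the talk.

    #include <cstdint>
    #include <cmath>
    #include <array>

    constexpr int     SCALE_SHIFT = 10;              // scale factor 1024 = 1 << 10
    constexpr int32_t SCALE       = 1 << SCALE_SHIFT;

    // Addition needs no rescaling: 1024*i + 1024*j = (i + j)*1024.
    inline int32_t fx_add(int32_t a, int32_t b) { return a + b; }

    // Multiplication picks up an extra factor of 1024, so shift it back out:
    // (1024*i) * (1024*j) = (i*j)*1024^2  ->  >> 10.
    inline int32_t fx_mul(int32_t a, int32_t b) {
        return (int32_t)(((int64_t)a * (int64_t)b) >> SCALE_SHIFT);
    }

    // Activation by table lookup: precompute tanh over [-4, +4] at the same resolution.
    std::array<int32_t, 8192> tanh_table;            // 8192 = 2 * 4 * 1024 entries (illustrative)

    void init_table() {
        for (int idx = 0; idx < 8192; ++idx) {
            double x = (idx - 4096) / (double)SCALE; // back to a real in [-4, +4)
            tanh_table[idx] = (int32_t)std::lround(std::tanh(x) * SCALE);
        }
    }

    inline int32_t fx_tanh(int32_t a) {
        int idx = a + 4096;                          // map [-4096, +4095] to [0, 8191]
        if (idx < 0) idx = 0;
        if (idx > 8191) idx = 8191;
        return tanh_table[idx];
    }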
Results, different numbers of hidden units
Results, different latencies
Results, different HW budget (crafty)
Results, different HW budgets (BZIP-PROGRAM)
Conclusions DON'T use an RNN! Maybe use a neural network with a few hidden units, but don't overdo it. Future work: explore the trade-off between the number and size of hidden units and the number of inputs.