IJCNN, July 27, 2004
Extending SpikeProp
Benjamin Schrauwen, Jan Van Campenhout
Ghent University, Belgium
Overview
● Introduction
● SpikeProp
● Improvements
● Results
● Conclusions
Introduction
● Spiking neural networks receive increasing attention:
  ● Biologically more plausible
  ● Computationally stronger (W. Maass)
  ● Compact and fast hardware implementations possible (analogue and digital)
  ● Inherently temporal
● Main problem: the lack of supervised learning algorithms
SpikeProp
● Introduced by S. Bohte et al. in 2000
● An error-backpropagation learning algorithm
● Only for SNNs using "time-to-first-spike" coding: t ~ 1/a
Architecture of SpikeProp
● Originally introduced by Natschläger and Ruf
● Every connection consists of several synaptic terminals
● All 16 synaptic terminals have enumerated delays (1–16 ms) and different weights; originally they share the same filter
SRM neuron
● Modified Spike Response Model (Gerstner)
● The neuron's reset is of no interest, because only one spike is needed!
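A sketch of the model this slide refers to, in the standard SpikeProp notation (symbol names follow Bohte et al. and are an assumption here): the membrane potential of neuron j is a weighted sum of delayed spike-response kernels, and the neuron emits its single spike when the potential first crosses the threshold.

```latex
x_j(t) = \sum_{i \in \Gamma_j} \sum_{k=1}^{m} w_{ij}^{k}\,
         \varepsilon\!\left(t - t_i - d^{k}\right),
\qquad
\varepsilon(t) =
\begin{cases}
  \dfrac{t}{\tau}\, e^{\,1 - t/\tau} & t > 0 \\[4pt]
  0 & t \le 0
\end{cases}
```

Here Γ_j is the set of presynaptic neurons, t_i their spike times, d^k the delay of the k-th synaptic terminal, and t_j is the first time at which x_j(t_j) ≥ ϑ. Because coding is time-to-first-spike, nothing after this crossing matters, which is why the reset can be ignored.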
Idea behind SpikeProp
● Minimize the SSE between the actual and the desired output spike times
● Change each weight along the negative direction of the gradient
Math of SpikeProp
● Only the output layer is given here
● Linearise around the threshold-crossing time
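A sketch of the output-layer rule, following the standard SpikeProp derivation of Bohte et al. (the notation is an assumption): the spike time t_j is defined implicitly by the threshold crossing x_j(t_j) = ϑ, and linearising around that crossing gives

```latex
E = \frac{1}{2} \sum_{j \in J} \left(t_j^{a} - t_j^{d}\right)^{2},
\qquad
\Delta w_{ij}^{k} = -\eta\, \delta_j\, y_i^{k}(t_j^{a}),
\qquad
y_i^{k}(t) = \varepsilon\!\left(t - t_i - d^{k}\right)
```

```latex
\delta_j =
\frac{t_j^{d} - t_j^{a}}
     {\displaystyle \sum_{i \in \Gamma_j} \sum_{k}
      w_{ij}^{k}\, \left.\frac{\partial y_i^{k}(t)}{\partial t}\right|_{t = t_j^{a}}}
```

The denominator is the slope of the membrane potential at the crossing; the linearisation assumes this slope is locally constant, which is also why training breaks down when a neuron stops firing (there is then no crossing, and no t_j to differentiate).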
Problems with SpikeProp
● Overdetermined architecture
● Tendency to get stuck when a neuron stops firing
● Problems with weight initialisation
Solving some of the problems
● Instead of enumerating parameters, learn them:
  ● Delays
  ● Synaptic time constants
  ● Thresholds
● A much more limited architecture can then be used
● Add a specific mechanism to keep neurons firing: decrease the threshold
Learn more parameters
● Quite similar to the weight update rule:
  ● Gradient of the error with respect to the parameter
  ● Parameter-specific learning rate
Math of the improvements – delays
● The delta is the same as for the weight rule; thus, as for the weights, the delta formula differs between the output and the inner layers.
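One consistent form of the delay rule, derived here with the same threshold-crossing linearisation as the weight rule (the exact form on the slide did not survive extraction, so signs and symbols are my reconstruction): since the delay d enters the potential only through ε(t − t_i − d),

```latex
\Delta d_{ij}^{k}
= -\eta_d\, \frac{\partial E}{\partial d_{ij}^{k}}
= \eta_d\, \delta_j\, w_{ij}^{k}\,
  \varepsilon'\!\left(t_j^{a} - t_i - d_{ij}^{k}\right),
\qquad
\varepsilon'(t) = \frac{1}{\tau}\left(1 - \frac{t}{\tau}\right) e^{\,1 - t/\tau}
```

with δ_j the same delta as in the weight rule. Note that ε′ vanishes for negative arguments, i.e. when the delayed input arrives after the output spike; this is exactly the large-delay problem discussed later.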
Math of the improvements – synaptic time constants
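A plausible reconstruction of the time-constant rule under the same linearisation (my derivation, not copied from the slide): differentiating ε(s) = (s/τ) e^{1 − s/τ} with respect to τ gives

```latex
\Delta \tau_{ij}^{k}
= -\eta_\tau\, \delta_j\, w_{ij}^{k}\,
  \frac{\partial \varepsilon(s)}{\partial \tau},
\qquad
\frac{\partial \varepsilon(s)}{\partial \tau}
= \varepsilon(s)\, \frac{s - \tau}{\tau^{2}},
\qquad
s = t_j^{a} - t_i - d_{ij}^{k}
```

with δ_j again the delta from the weight rule. Learning τ lets each synaptic terminal shape its own filter, which is what removes the need for the enumerated, identical filters of the original architecture.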
Math of the improvements – thresholds
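A sketch of the threshold rule, derived from the same implicit definition x_j(t_j) = ϑ_j of the spike time (again my reconstruction; the slide's exact form may differ): raising the threshold delays the crossing by the inverse of the membrane-potential slope, so

```latex
\frac{\partial t_j}{\partial \vartheta_j}
= \frac{1}{\left.\partial x_j(t)/\partial t\right|_{t = t_j^{a}}}
\quad\Longrightarrow\quad
\Delta \vartheta_j = -\eta_\vartheta\, \frac{\partial E}{\partial \vartheta_j}
= \eta_\vartheta\, \delta_j
```

with δ_j the delta from the weight rule, whose denominator is that same slope.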
What if training gets stuck?
● If one of the neurons in the network stops firing, the training rule stops working
● Solution: actively lower the threshold of a neuron whenever it stops firing (multiply by 0.9)
● This is the same as scaling all of its incoming weights up
● Improves convergence
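The rescue mechanism above can be sketched as follows (a minimal illustration; the function name is mine, the 0.9 factor is from the slide):

```python
def rescue_silent_neurons(thresholds, fired, factor=0.9):
    """Lower the threshold (multiply by 0.9) of every neuron that
    produced no spike this run, so the gradient rules, which all
    need a spike time t_j, can resume.  Equivalent to scaling all
    of that neuron's incoming weights up by 1/factor."""
    return [th * factor if not f else th
            for th, f in zip(thresholds, fired)]
```

This would be applied between epochs whenever the forward pass reports a silent neuron, rather than inside the gradient computation itself.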
What about weight initialisation?
● Weight initialisation is a difficult problem
● The original publication gives only a vague description of the process
● S. M. Moore contacted S. Bohte personally to clarify the subject for his master's thesis
● Weight initialisation is done by a complex procedure
● Moore concluded that "weights should be initialized in such a way that every neuron initially fires, and that its membrane potential doesn't surpass the threshold too much"
What about weight initialisation?
● In this publication we chose a very simple initialisation procedure:
  ● Initialise all weights randomly
  ● Afterwards, set one weight such that the sum of all weights equals 1.5
● Convergence rates could be increased by using a more complex initialisation procedure
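The two steps above can be sketched as follows (an illustration of the stated procedure; the function name and the uniform distribution are my assumptions, the 1.5 target is from the slide):

```python
import random

def init_weights(n, total=1.5):
    """Draw n-1 weights at random, then set the last weight so that
    the sum over all incoming weights equals `total` (1.5 here),
    so the neuron initially reaches threshold."""
    w = [random.uniform(0.0, 1.0) for _ in range(n - 1)]
    w.append(total - sum(w))
    return w
```

Note that the balancing weight may come out negative, which is consistent with the conclusion later that positive and negative weights can be mixed.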
Problem with large delays
● During testing of the algorithm, a problem arose when the trained delays got very large: delay learning stopped
● If the delayed input arrives only after the neuron's output spike, the kernel derivative is zero and learning stalls
● Solved by constraining the delays
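One way the constraint could look (a sketch, not the paper's implementation; the function name and the margin are my assumptions):

```python
def constrain_delay(d, t_pre, t_post, margin=0.1, d_min=0.0):
    """Clamp a learned delay so the delayed presynaptic spike
    (t_pre + d) still precedes the postsynaptic spike t_post.
    Otherwise epsilon'(t_post - t_pre - d) = 0 and the delay
    gradient vanishes, freezing delay learning."""
    return max(d_min, min(d, t_post - t_pre - margin))
```

Applied after each delay update, this keeps every synaptic terminal inside the region where its gradient is non-zero.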
Results
● Tested on binary XOR (MSE = 1 ms)

  Architecture        Synaptic terminals   Weights         Training cycles   Convergence
  Bohte               16                   20×16 = 320     250               –
  Improved            2                    20×2  =  40     130               90%
  Improved (reduced)  2                    12×2  =  24     320               60%
Results
● Optimal learning rates (found by experiment)
● Some rates seem very high, but that is because the values we work with are times expressed in ms
● The idea that the learning rate must be approximately 0.1 is only correct when the inputs and weights are normalised!
Conclusions
● Because the parameters can be learned, no enumeration is necessary; thus the architectures are much smaller
● For XOR:
  ● 8 times fewer weights needed
  ● Learning converges faster (50% of the original)
  ● No complex initialisation functions
  ● Positive and negative weights can be mixed
● But convergence deteriorates with further reduction of the weights
Conclusions
● The technique has only been tested on a small problem; it should be tested on real-world applications
● But we are currently preparing a journal paper on a new backprop rule that:
  ● supports a multitude of coding hypotheses (population coding, convolution coding, ...)
  ● has better convergence
  ● has simpler weight initialisation
  ● ...