University of Southern Queensland
Neural Networks and Self-Organising Maps
CSC3417 Semester 1, 2007
Perceptron (one node neural network)
$y = f\left(\sum_i w_i x_i\right)$
[Diagram: inputs x0, x1, x2, weighted by w0, w1, w2, feed a summing node (+) that produces the output y.]
R Code for a Perceptron

# Step activation: 0 for a negative weighted sum, 1 otherwise.
step <- function(sigma) {
  if (sigma<0) {
    return (0);
  } else {
    return (1);
  }
}

# One learning step: compare the estimate with the desired output
# and correct each weight in proportion to its input and the error.
neural <- function(weights,inputs,output,eta) {
  sigma <- t(inputs) %*% weights;
  yhat <- step(sigma);
  diff <- output-yhat;
  for (k in 1:length(weights)) {
    weights[[k]] <- weights[[k]] + eta*inputs[[k]]*diff;
  }
  return (weights);
}

Start with random weights and use this function iteratively.
Try it:
> w <- runif(3);
> w
[1]
> w <- neural(w,c(1,0,1),0,0.1)
> w
[1]
> w <- neural(w,c(1,1,0),1,0.1)
> w
[1]
> w <- neural(w,c(1,1,1),0,0.1)
> w
[1]
…………….
> w <- neural(w,c(1,1,1),0,0.1)
> w
[1]
> w <- neural(w,c(1,1,0),1,0.1)
> w
[1]
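The iterative use described above can be sketched as a loop over the three patterns that appear in the session. This is only an illustration: the fixed seed, the cap of 1000 passes and the stopping test are assumptions, and `step` and `neural` are written here in an equivalent vectorised form so that the block runs on its own.

```r
# Vectorised equivalents of the slide's functions.
step <- function(sigma) {
  if (sigma < 0) 0 else 1
}

neural <- function(weights, inputs, output, eta) {
  diff <- output - step(sum(inputs * weights))
  weights + eta * inputs * diff
}

# The three patterns used in the session; the first column is x0 = 1.
patterns <- rbind(c(1, 0, 1), c(1, 1, 0), c(1, 1, 1))
targets  <- c(0, 1, 0)

set.seed(1)          # assumption: fixed seed, only for repeatability
w <- runif(3)        # start with random weights
max_epochs <- 1000   # assumption: safety cap on the number of passes

for (epoch in 1:max_epochs) {
  errors <- 0
  for (i in 1:nrow(patterns)) {
    # Count the pattern as an error if the current weights misclassify it.
    if (step(sum(patterns[i, ] * w)) != targets[i]) errors <- errors + 1
    w <- neural(w, patterns[i, ], targets[i], 0.1)
  }
  if (errors == 0) break   # a full pass with no corrections: stop
}

# Outputs of the trained node for each pattern.
predictions <- apply(patterns, 1, function(x) step(sum(x * w)))
```

These three patterns are linearly separable, so the loop stops once a full pass produces no corrections, and `predictions` then agrees with `targets`.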
Scilab version

function [stp] = step (sigma)
  if sigma<0
    stp = 0;
  else
    stp = 1;
  end
endfunction

function [w] = neural(weights,inputs,output,eta)
  sigma = weights * inputs';
  yhat = step(sigma);
  diff = output-yhat;
  w = weights;
  for k=1:length(weights)
    w(k) = weights(k) + eta*inputs(k)*diff;
  end
endfunction
Try it:
-->w = rand(1,3)
 w =
-->w = neural(w,[1,0,1],0,0.1)
 w =
-->w = neural(w,[1,1,0],1,0.1)
 w =
-->w = neural(w,[1,1,1],0,0.1)
 w =
………..
Aspects of Neural Network Use:
Architecture
– Input : hidden : output
Activation rule
– Weights, add, sigmoidal function
Learning rule
– Correct weights, back-propagation
Avoiding over-learning
– Stop learning sufficiently early.
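The architecture and activation rule listed above can be sketched as a forward pass through a small input : hidden : output network. The logistic function is assumed as the sigmoidal function, and the names `sigmoid`, `forward`, `W1` and `W2` are illustrative rather than taken from the slides.

```r
# Logistic sigmoid, one common choice of sigmoidal function (an assumption).
sigmoid <- function(z) 1 / (1 + exp(-z))

# Forward pass through an input : hidden : output network.
# W1 is (hidden x input) and W2 is (output x hidden); both names are illustrative.
forward <- function(x, W1, W2) {
  hidden <- sigmoid(W1 %*% x)        # weight, add, squash at the hidden layer
  output <- sigmoid(W2 %*% hidden)   # the same rule at the output layer
  as.vector(output)
}

# With all weights zero every weighted sum is 0, and sigmoid(0) = 0.5.
W1 <- matrix(0, nrow = 2, ncol = 3)  # 3 inputs, 2 hidden nodes
W2 <- matrix(0, nrow = 1, ncol = 2)  # 2 hidden nodes, 1 output
forward(c(1, 0, 1), W1, W2)          # 0.5
```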
Multilayer networks
Learning by back-propagation.
Back Propagation
Our objective is to minimise:
Here x are the inputs and w the weights; y are the output values, and ŷ (y hat) our estimates.
Back-propagation is a way to compute the gradient of this objective function.
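The objective formula itself did not survive on this slide, so the sketch below assumes a squared error for a single example, E = 0.5 * (y - ŷ)^2, and a network with one hidden layer of logistic nodes. It computes the gradient by back-propagation and compares one entry with a finite-difference estimate. The names `loss`, `backprop`, `W1` and `W2` are illustrative.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Assumed objective for one example: E = 0.5 * (y - yhat)^2.
loss <- function(W1, W2, x, y) {
  yhat <- as.vector(sigmoid(W2 %*% sigmoid(W1 %*% x)))
  0.5 * (y - yhat)^2
}

# Back-propagation: gradient of the loss with respect to W1 and W2.
backprop <- function(W1, W2, x, y) {
  h    <- as.vector(sigmoid(W1 %*% x))   # hidden activations
  yhat <- as.vector(sigmoid(W2 %*% h))   # network estimate
  delta_out <- (yhat - y) * yhat * (1 - yhat)              # error signal at the output
  delta_hid <- as.vector(t(W2) * delta_out) * h * (1 - h)  # passed back to the hidden layer
  list(W1 = outer(delta_hid, x),                 # dE/dW1, same shape as W1
       W2 = matrix(delta_out * h, nrow = 1))     # dE/dW2, same shape as W2
}

set.seed(42)                              # assumption: fixed seed for repeatability
W1 <- matrix(runif(6, -1, 1), nrow = 2)   # 3 inputs, 2 hidden nodes
W2 <- matrix(runif(2, -1, 1), nrow = 1)   # 2 hidden nodes, 1 output
x <- c(1, 0, 1)
y <- 1

grad <- backprop(W1, W2, x, y)

# Central finite difference for the weight W1[1, 1], as a check.
eps <- 1e-6
Wp <- W1; Wp[1, 1] <- Wp[1, 1] + eps
Wm <- W1; Wm[1, 1] <- Wm[1, 1] - eps
numeric_grad <- (loss(Wp, W2, x, y) - loss(Wm, W2, x, y)) / (2 * eps)
abs(numeric_grad - grad$W1[1, 1])         # should be close to zero
```

A gradient-descent learning rule would then subtract a small multiple of `grad$W1` and `grad$W2` from the weights.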
How to avoid overfitting
Stop learning early
Reduce the learning speed
Bayesian approach:
– “The overfitting problem can be solved by using a Bayesian approach to control model complexity.” MacKay, Information Theory, Inference, and Learning Algorithms, CUP, 2003.
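The slides do not say how to decide when to stop. One common approach, assumed here, is to track the error on held-out validation data after each pass and keep the weights from the pass where that error was lowest. The helper `early_stop_epoch`, its `patience` parameter and the error values are all hypothetical.

```r
# Hypothetical helper: given the validation error after each epoch, return the
# epoch with the lowest error seen before `patience` epochs pass without improvement.
early_stop_epoch <- function(val_errors, patience = 3) {
  best <- Inf
  best_epoch <- 0
  for (epoch in seq_along(val_errors)) {
    if (val_errors[epoch] < best) {
      best <- val_errors[epoch]     # new lowest validation error
      best_epoch <- epoch
    } else if (epoch - best_epoch >= patience) {
      break                         # no improvement for `patience` epochs: stop
    }
  }
  best_epoch
}

# Illustrative numbers only: the error falls, then rises as over-learning sets in.
val_errors <- c(0.9, 0.6, 0.4, 0.35, 0.37, 0.41, 0.5, 0.3)
early_stop_epoch(val_errors)   # 4
```

Training stops at epoch 7 in this example, so the later value 0.3 is never seen. That is the trade-off a small `patience` makes.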
Advantages of Bayesian Approach (MacKay 2003)
1. No ‘test set’ required – hence more data efficient.
2. Overfitting parameters just appear as ordinary parameters.
3. Bayesian objective function is smooth, hence easier to optimize.
4. Gradient of objective function can be evaluated.
MacKay’s book is online: ac.uk/mackay/Book.html
Self-organising Maps
Each node has a vector of parameters of the same dimension as the inputs.
Training finds the node closest to an input, then moves that node and the nodes in its neighbourhood closer to the input.
Many choices of neighbourhood and adjustment are possible.
Neighbourhood and adjustment
where and
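The formulas on this slide were not recovered, so the following is only one possible choice of neighbourhood and adjustment, not necessarily the one the slide used. It assumes nodes arranged in a one-dimensional chain, a Gaussian neighbourhood around the winning node, and an adjustment that moves each node a fraction of the way towards the input. The names `som_step`, `alpha` and `radius` are illustrative.

```r
# One training step of a self-organising map on a 1-D chain of nodes.
# nodes: matrix with one row per node; columns match the input dimension.
som_step <- function(nodes, x, alpha = 0.5, radius = 1) {
  X <- matrix(x, nrow(nodes), length(x), byrow = TRUE)  # the input, one copy per node
  d2 <- rowSums((nodes - X)^2)                  # squared distance of each node to x
  winner <- which.min(d2)                       # the closest node
  grid_dist <- abs(seq_len(nrow(nodes)) - winner)   # distance along the chain
  h <- exp(-grid_dist^2 / (2 * radius^2))       # assumed Gaussian neighbourhood
  nodes + alpha * h * (X - nodes)               # move each node towards x, scaled by h
}

# Illustrative values: three nodes in two dimensions and one input.
nodes <- rbind(c(0, 0), c(1, 1), c(2, 2))
x <- c(1.2, 0.8)
updated <- som_step(nodes, x)
updated[2, ]   # the winning node moves halfway to x: 1.1 0.9
```

Nodes further along the chain from the winner move by a smaller fraction, which is what lets neighbouring nodes end up representing similar inputs.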