CSSE463: Image Recognition Day 21 Upcoming schedule: Upcoming schedule: Exam covers material through SVMs Exam covers material through SVMs
Multilayer feedforward neural nets Many perceptrons Many perceptrons Organized into layers Organized into layers Input (sensory) layer Hidden layer(s): 2 proven sufficient to model any arbitrary function Output (classification) layer Powerful! Powerful! Calculates functions of input, maps to output layers Calculates functions of input, maps to output layers Example Example x1x1 x2x2 x3x3 y1y1 Sensory (HSV) Hidden (functions) Classification (apple/orange/banana) y2y2 y3y3 Q4
XOR example 2 inputs 2 inputs 1 hidden layer of 5 neurons 1 hidden layer of 5 neurons 1 output 1 output
Backpropagation algorithm Initialize all weights randomly For each labeled example: For each labeled example: Calculate output using current network Update weights across network, from output to input, using Hebbian learning Iterate until convergence Iterate until convergence Epsilon decreases at every iteration Matlab does this for you. Matlab does this for you. matlabNeuralNetDemo.m matlabNeuralNetDemo.m x1x1 x2x2 x3x3 y1y1 y2y2 y3y3 a. Calculate output (feedforward) b. Update weights (feedback) R peat Q5
Parameters Most networks are reasonably robust with respect to learning rate and how weights are initialized Most networks are reasonably robust with respect to learning rate and how weights are initialized However, figuring out how to However, figuring out how to normalize your input normalize your input determine the architecture of your net determine the architecture of your net is a black art. You might need to experiment. One hint: is a black art. You might need to experiment. One hint: Re-run network with different initial weights and different architectures, and test performance each time on a validation set. Pick best. Re-run network with different initial weights and different architectures, and test performance each time on a validation set. Pick best.
References This is just the tip of the iceberg! See: This is just the tip of the iceberg! See: Sonka, pp Sonka, pp Laurene Fausett. Fundamentals of Neural Networks. Prentice Hall, Laurene Fausett. Fundamentals of Neural Networks. Prentice Hall, Approachable for beginner. Approachable for beginner. C.M. Bishop. Neural Networks for Pattern Classification. Oxford University Press, C.M. Bishop. Neural Networks for Pattern Classification. Oxford University Press, Technical reference focused on the art of constructing networks (learning rate, # of hidden layers, etc.) Technical reference focused on the art of constructing networks (learning rate, # of hidden layers, etc.) Matlab neural net help is very good Matlab neural net help is very good
SVMs vs. Neural Nets SVM: SVM: Training can take a long time with large data sets. Consider that you’ll want to experiment with parameters… Training can take a long time with large data sets. Consider that you’ll want to experiment with parameters… But the classification runtime and space are O(sd), where s is the number of support vectors, and d is the dimensionality of the feature vectors. But the classification runtime and space are O(sd), where s is the number of support vectors, and d is the dimensionality of the feature vectors. In the worst case, s = size of whole training set (like nearest neighbor) In the worst case, s = size of whole training set (like nearest neighbor) But no worse than implementing a neural net with s perceptrons in the hidden layer. But no worse than implementing a neural net with s perceptrons in the hidden layer. Empirically shown to have good generalizability even with relatively-small training sets and no domain knowledge. Empirically shown to have good generalizability even with relatively-small training sets and no domain knowledge. Neural networks: Neural networks: can tune architecture. Lots of parameters! can tune architecture. Lots of parameters! Q3
How does svmfwd compute y1? y1 is just the weighted sum of contributions of individual support vectors: d = data dimension, e.g., 294, = kernel width. numSupVecs, svcoeff (alpha) and bias are learned during training. Note: looking at which of your training examples are support vectors can be revealing! (Keep in mind for sunset detector and term project) Much easier computation than training Much easier computation than training Could implement on a device without MATLAB (e.g., a smartphone) easily Could implement on a device without MATLAB (e.g., a smartphone) easily Check out sample Java code Check out sample Java code
CSSE463 Roadmap