Lecture 6, CS567
Neural Networks
Introduction
–Biological neurons
–Artificial neurons
–Concepts
–Conventions
Single Layer Perceptron
–Example
–Limitation
Biological neuron
Neuron = Cell superclass in nervous system
Specs
–Total number = ~$10^{11}$ (Size of hard disk circa ’03)
   Maximum number before birth
   $10^{4}$ lost/day (More if you don’t study every day!)
–Connections/neuron = ~$10^{4}$
–Signal Rate = ~$10^{3}$ Hz (CPU = $10^{9}$ Hz circa ’03)
–Signal Propagation Velocity = $10^{-1}$ to $10^{2}$ m/sec
–Power = 40W
Biological Neuron
Connectivity important (Just like human society)
–Connected to what, and to what extent
–Basis of memory and learning (revising opinions; learning lessons in life)
–Revision important (And why reading for the first time on the eve of an exam is a flawed strategy)
–Covering eye to prevent loss of vision in squint (Why advertising industry persists, subliminally or blatantly)
Artificial Neural Networks
What
–Connected units with inputs and outputs
Why
–Can “learn” and approximate any function, including non-linear functions (XOR)
When
–Basic idea more than 60 years old
–Resurgence of interest once coverage extended to non-linear problems
Concepts
Trial
–Output = Verdict = Guilty/Not guilty
–Processing neurons = Jury members
–Output neuron = Jury Foreman
–Inputs = Witnesses/Lawyers
–Weights = Credibility of Witnesses/Lawyers
Investment
–Output decision = Buy/Sell
–Inputs = Financial advisors
–Weights = Past reliability of advice
–Iterate = Revise weights after results
Concepts
Types of learning
–Supervised
   NN learns from a series of labeled examples (human propagation of prejudice)
   Distinction between training and prediction phases
–Unsupervised
   NN discovers clusters and classifies examples
   Also called self-organizing networks (human tendency)
Typically, prediction rules cannot be derived from an NN
Conventions
[Figure: layered network diagram. Input layer: p1, p2, p3 … pN. Hidden layers: neurons 1h1, 1h2 … 1hM and 2h1, 2h2 … 2hP. Output layer: o1, o2 … oK. Connection weights are labelled w 1,1, w 1,2 … w M,N.]
Conventions
Generally, rich connectivity between, but not within, layers
Output for any neuron = Transfer/Activation function $f(x) = f(WP + b)$, where
–$W$ = Weight matrix (row vector) $[w_{1,1}\; w_{1,2}\; w_{1,3}\; \dots\; w_{1,N}]$
–$P$ = Input matrix (column vector) $[p_1\; p_2\; \dots\; p_N]^T$
–$WP$ = Matrix product $= w_{1,1}p_1 + w_{1,2}p_2 + w_{1,3}p_3 + \dots + w_{1,N}p_N$
–$b$ = Bias/Offset
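The single-neuron output f(WP + b) can be sketched in a few lines of plain Python. This is only an illustration: the function names `hardlim`, `linear`, and `neuron_output`, and the example weights, inputs, and bias, are made up here and do not come from the lecture.

```python
def hardlim(x):
    """Hard limit activation: 0 if x < 0, else 1."""
    return 0 if x < 0 else 1


def linear(x):
    """Linear activation: f(x) = x."""
    return x


def neuron_output(weights, inputs, bias, f=hardlim):
    """Return f(WP + b) for one neuron.

    weights: the row vector W, as a list of N numbers
    inputs:  the column vector P, as a list of N numbers
    bias:    the scalar b
    f:       the activation function (assumed default: hard limit)
    """
    x = sum(w * p for w, p in zip(weights, inputs)) + bias
    return f(x)


# Hypothetical example values, chosen only for illustration.
W = [0.5, -1.0, 2.0]
P = [1, 2, 3]
b = -4.0

net_input = neuron_output(W, P, b, f=linear)   # WP + b = 0.5 - 2 + 6 - 4
fired = neuron_output(W, P, b)                 # hard limit of the same sum
```

With these values the weighted sum is 0.5, so the hard limit neuron outputs 1.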
Activation Functions
–Hard limit: output in {0, 1}. $f(x) = 0$ if $x < 0$, else $1$
–Symmetric hard limit: output in {-1, 1}. $f(x) = -1$ if $x < 0$, else $1$
–Linear: $f(x) = x$
–Positive linear: output in $[0, \infty)$. $f(x) = 0$ if $x < 0$, else $x$
–Saturating linear: output in $[0, 1]$. $f(x) = 0$ if $x < 0$; $1$ if $x > 1$; else $x$
–Symmetric saturating linear: output in $[-1, 1]$. $f(x) = -1$ if $x < -1$; $1$ if $x > 1$; else $x$
–Log-sigmoid: $f(x) = 1/(1 + e^{-x})$
–Competitive (multiple neuron layer; winner takes all): $f(x_i) = 1$ if $x_i > x_j$ for all $j \neq i$; $f(x_j) = 0$ for every other neuron
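The activation functions listed above can be written out as a minimal Python sketch. The short function names (`hardlim`, `satlin`, and so on) are borrowed from common neural network toolbox naming and are an assumption here, not something the slides specify. Two further assumptions are marked in the comments: how the competitive function breaks ties, and the naive form of the log-sigmoid.

```python
import math


def hardlim(x):
    """Hard limit: 0 if x < 0, else 1."""
    return 0 if x < 0 else 1


def hardlims(x):
    """Symmetric hard limit: -1 if x < 0, else 1."""
    return -1 if x < 0 else 1


def purelin(x):
    """Linear: f(x) = x."""
    return x


def poslin(x):
    """Positive linear: 0 if x < 0, else x."""
    return 0 if x < 0 else x


def satlin(x):
    """Saturating linear: clamp x to the interval [0, 1]."""
    if x < 0:
        return 0
    if x > 1:
        return 1
    return x


def satlins(x):
    """Symmetric saturating linear: clamp x to the interval [-1, 1]."""
    if x < -1:
        return -1
    if x > 1:
        return 1
    return x


def logsig(x):
    """Log-sigmoid: 1 / (1 + e^-x).

    Naive form; very large negative x can overflow math.exp.
    """
    return 1 / (1 + math.exp(-x))


def compet(xs):
    """Competitive (winner takes all) over a layer's net inputs.

    Assumption: on a tie, the first neuron with the maximum value wins.
    """
    winner = max(range(len(xs)), key=lambda i: xs[i])
    return [1 if i == winner else 0 for i in range(len(xs))]
```

All functions except `compet` act on a single neuron's net input; `compet` needs the net inputs of the whole layer, because the winner is defined relative to the other neurons.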
Conventions
Output for any layer = column matrix
$[f(W_1 P + b_1)\; f(W_2 P + b_2)\; \dots\; f(W_M P + b_M)]^T$
where $W_i$ = Weight matrix (row vector) $[w_{i,1}\; w_{i,2}\; w_{i,3}\; \dots\; w_{i,N}]$
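As a rough sketch of the layer convention, the output of an M-neuron layer is one f(W_i P + b_i) per weight row. The name `layer_output` and the example numbers below are illustrative assumptions, and the hard limit is used only as a default activation.

```python
def hardlim(x):
    """Hard limit activation: 0 if x < 0, else 1."""
    return 0 if x < 0 else 1


def layer_output(W, P, b, f=hardlim):
    """Return the layer output [f(W_1 P + b_1), ..., f(W_M P + b_M)].

    W: M rows of N weights each (row i is W_i)
    P: N inputs (the column vector P, stored as a flat list)
    b: M biases, one per neuron
    """
    return [
        f(sum(w * p for w, p in zip(row, P)) + b_i)
        for row, b_i in zip(W, b)
    ]


# Hypothetical two-neuron layer with two inputs.
W = [[1.0, -1.0],
     [0.5, 0.5]]
P = [2, 1]
b = [-2.0, 0.0]

outputs = layer_output(W, P, b)
```

Here the first neuron sees a net input of -1 and the second sees 1.5, so `outputs` is `[0, 1]`.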
Single Layer Perceptron
Single Layer Single Neuron Perceptron
–Consider multiple inputs (column vector) with respective weights (row vector) to a neuron that serves as the output neuron
–Assume $f(x)$ is the hard limit function
–Labeled training examples are provided: $\{(P_1, t_1), (P_2, t_2), \dots, (P_Z, t_Z)\}$, where each $t_i$ is 0 or 1
–Learning rule (NOT the same as prediction rule)
   Error $e = t - f(x)$, i.e. target minus output, with $x = WP + b$
   For each input set
      $W_{current} = W_{previous} + eP^T$ (transpose, so that the update is a row vector like $W$)
      $b_{current} = b_{previous} + e$
   Iterate till $e$ is zero for all training examples
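The learning rule above can be tried on a small, linearly separable problem. The sketch below trains a single hard limit neuron on the logical AND function. The choice of AND, the zero initial weights and bias, the `max_epochs` safety cap, and all function names are assumptions made for illustration and are not specified in the slides.

```python
def hardlim(x):
    """Hard limit activation: 0 if x < 0, else 1."""
    return 0 if x < 0 else 1


def predict(weights, bias, inputs):
    """Prediction rule: f(WP + b)."""
    return hardlim(sum(w * p for w, p in zip(weights, inputs)) + bias)


def train_perceptron(examples, n_inputs, max_epochs=100):
    """Apply W += e * P^T and b += e until e is zero for every example.

    examples:   list of (inputs, target) pairs with target 0 or 1
    max_epochs: assumed safety cap, so the loop also ends on data that
                is not linearly separable
    """
    weights = [0.0] * n_inputs   # assumed starting point
    bias = 0.0
    for _ in range(max_epochs):
        all_correct = True
        for inputs, target in examples:
            e = target - predict(weights, bias, inputs)
            if e != 0:
                all_correct = False
                weights = [w + e * p for w, p in zip(weights, inputs)]
                bias += e
        if all_correct:
            break
    return weights, bias


# Training set for logical AND (illustrative choice).
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(and_examples, n_inputs=2)
predictions = [predict(weights, bias, p) for p, _ in and_examples]
```

Training and prediction are kept as separate functions to mirror the distinction between the learning rule and the prediction rule: `train_perceptron` changes the weights, while `predict` only evaluates them.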
Single Layer Perceptron
Single Layer Multiple Neuron Perceptron
–Consider multiple inputs (column vector) with respective weights (row vector) to a layer of several neurons that serve as the output
–Assume $f(x)$ is the hard limit function
–Labeled training examples are provided: $\{(P_1, t_1), (P_2, t_2), \dots, (P_Z, t_Z)\}$, where each $t_i$ is a column vector consisting of 0s and/or 1s
–Learning rule (NOT the same as prediction rule; use vectors for the error and bias)
   Error $E = t - f(x)$, i.e. target vector minus output vector, with $x = WP + B$
   For each input set
      $W_{current} = W_{previous} + EP^T$ (outer product, same shape as $W$)
      $B_{current} = B_{previous} + E$
   Iterate till $E$ is zero for all training examples
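The vector form of the rule can be sketched in the same way. The example below trains a two-neuron layer whose first output learns AND and whose second output learns OR. The task, the zero initialization, the `max_epochs` cap, and the function names are all illustrative assumptions. Row i of W is updated with E_i times P, which is the outer product E P^T written out row by row.

```python
def hardlim(x):
    """Hard limit activation: 0 if x < 0, else 1."""
    return 0 if x < 0 else 1


def layer_output(W, P, B):
    """Return [f(W_i P + B_i)] for every neuron i in the layer."""
    return [
        hardlim(sum(w * p for w, p in zip(row, P)) + b_i)
        for row, b_i in zip(W, B)
    ]


def train_layer(examples, n_inputs, n_neurons, max_epochs=100):
    """Apply W += E P^T and B += E until E is all zeros for every example."""
    W = [[0.0] * n_inputs for _ in range(n_neurons)]   # assumed start
    B = [0.0] * n_neurons
    for _ in range(max_epochs):   # assumed safety cap
        all_correct = True
        for P, T in examples:
            outputs = layer_output(W, P, B)
            E = [t - o for t, o in zip(T, outputs)]
            if any(E):
                all_correct = False
                for i, e_i in enumerate(E):
                    # Row i of the outer product E P^T is e_i * P.
                    W[i] = [w + e_i * p for w, p in zip(W[i], P)]
                    B[i] += e_i
        if all_correct:
            break
    return W, B


# Each target is a vector: (AND of the inputs, OR of the inputs).
examples = [
    ((0, 0), (0, 0)),
    ((0, 1), (0, 1)),
    ((1, 0), (0, 1)),
    ((1, 1), (1, 1)),
]

W, B = train_layer(examples, n_inputs=2, n_neurons=2)
predictions = [layer_output(W, P, B) for P, _ in examples]
```

Because row i of the update depends only on E_i, each neuron in the layer learns independently of the others; the layer is effectively several single-neuron perceptrons sharing the same inputs.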