The Perceptron
CS/CMPE 333 – Neural Networks
The Perceptron – Basics
The perceptron is the simplest and one of the earliest neural network models, proposed by Rosenblatt in 1958 and 1962. It is based on the McCulloch-Pitts model of a neuron.
Characteristics:
- Single-layer feedforward network (a layer of input nodes and one layer of computation/output nodes)
- Threshold activation function (hard limiter)
- Performs classification of linearly separable patterns
- Trained using error-correction learning
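As a minimal sketch of the hard-limiter (threshold) activation, assuming the +1/-1 output convention used later in these slides (the function name is illustrative):

    def sgn(v):
        # hard limiter (signum) activation: +1 for v >= 0, -1 otherwise
        return 1.0 if v >= 0.0 else -1.0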
The Perceptron
Consider a single-neuron perceptron: inputs x_1, …, x_p with weights w_1, …, w_p, a threshold Θ, and a hard-limiter output y.
Linear Separability (1)
The function of the single-neuron perceptron is to classify the input x into one of two classes, C_1 and C_2. In general, a q-neuron perceptron can classify the input x into 2^q classes.
For the two-class case, the decision boundary is defined by the hyperplane
Σ_{i=1}^{p} w_i x_i - Θ = 0
Linear Separability (2)
When p = 2 (i.e., two inputs), the decision boundary is a line:
w_1 x_1 + w_2 x_2 - Θ = 0
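For instance, with the illustrative choice w_1 = 1, w_2 = 1, Θ = 1.5 (assumed values, not from the slides), the boundary is the line x_1 + x_2 = 1.5. A short sketch that reports which side of this line an input falls on:

    import numpy as np

    # illustrative parameters (assumed, not from the lecture): boundary x1 + x2 - 1.5 = 0
    w = np.array([1.0, 1.0])
    theta = 1.5

    def classify(x):
        # C1 if w^T x - theta >= 0 (on or above the line), otherwise C2
        return "C1" if np.dot(w, x) - theta >= 0.0 else "C2"

    print(classify(np.array([1.0, 1.0])))   # 1 + 1 - 1.5 = 0.5 >= 0  -> C1
    print(classify(np.array([0.0, 0.0])))   # 0 + 0 - 1.5 = -1.5 < 0  -> C2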
Error-Correction Learning and Pattern Classification
Pattern Classification (1)
Let n = number of training samples (set X); X_1 = set of training samples belonging to C_1; X_2 = set of training samples belonging to C_2.
For a given sample n:
- x(n) = [-1, x_1(n), …, x_p(n)]^T = input vector
- w(n) = [Θ(n), w_1(n), …, w_p(n)]^T = weight vector
- Net activity level: v(n) = w^T(n)x(n)
- Output: y(n) = +1 if v(n) >= 0, and y(n) = -1 otherwise
The decision hyperplane separates classes C_1 and C_2.
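A minimal computational sketch of these definitions, assuming the augmented-vector convention above (first input fixed at -1, first weight equal to Θ); the function names are illustrative:

    import numpy as np

    def net_activity(w, x):
        # v(n) = w^T(n) x(n), with x(n) = [-1, x_1, ..., x_p] and w(n) = [theta, w_1, ..., w_p]
        return float(np.dot(w, x))

    def output(w, x):
        # y(n) = +1 if v(n) >= 0, else -1
        return 1.0 if net_activity(w, x) >= 0.0 else -1.0

    # illustrative values: p = 2 inputs, threshold theta = 0.5
    w = np.array([0.5, 1.0, -1.0])      # [theta, w1, w2]
    x = np.array([-1.0, 0.8, 0.2])      # [-1, x1, x2]
    print(output(w, x))                 # v = -0.5 + 0.8 - 0.2 = 0.1 -> +1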
Pattern Classification (2)
If the two classes C_1 and C_2 are linearly separable, then there exists a weight vector w such that
- w^T x >= 0 for all x belonging to class C_1
- w^T x < 0 for all x belonging to class C_2
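As an illustration (the AND function, an assumed example rather than one from the slides), the augmented weight vector w = [Θ, w_1, w_2] = [1.5, 1, 1] satisfies this condition when C_1 = {(1,1)} and C_2 = {(0,0), (0,1), (1,0)}:

    import numpy as np

    # AND function as a linearly separable example (illustrative choice of weights)
    w = np.array([1.5, 1.0, 1.0])                  # [theta, w1, w2]
    patterns = {(0, 0): "C2", (0, 1): "C2", (1, 0): "C2", (1, 1): "C1"}

    for (x1, x2), cls in patterns.items():
        x = np.array([-1.0, x1, x2])               # augmented input [-1, x1, x2]
        v = np.dot(w, x)                           # w^T x
        ok = (v >= 0) if cls == "C1" else (v < 0)  # does the separability condition hold?
        print((x1, x2), cls, float(v), ok)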
Error-Correction Learning
Update rule: w(n + 1) = w(n) + Δw(n)
Learning process:
- If x(n) is correctly classified by w(n), then w(n + 1) = w(n).
- Otherwise, the weight vector is updated as follows:
  w(n + 1) = w(n) - η(n)x(n) if w^T(n)x(n) >= 0 and x(n) belongs to C_2
  w(n + 1) = w(n) + η(n)x(n) if w^T(n)x(n) < 0 and x(n) belongs to C_1
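A minimal sketch of one error-correction step under these rules, assuming the augmented-vector convention of the previous slides and a fixed learning rate (the function name is illustrative):

    import numpy as np

    def error_correction_step(w, x, target_class, eta=0.1):
        # x is the augmented input [-1, x_1, ..., x_p]; target_class is "C1" or "C2"
        v = np.dot(w, x)
        if v >= 0.0 and target_class == "C2":
            return w - eta * x      # misclassified C2 sample: w(n+1) = w(n) - eta*x(n)
        if v < 0.0 and target_class == "C1":
            return w + eta * x      # misclassified C1 sample: w(n+1) = w(n) + eta*x(n)
        return w                    # correctly classified: w(n+1) = w(n)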
Perceptron Convergence Algorithm (1)
Variables and parameters:
- x(n) = [-1, x_1(n), …, x_p(n)]; w(n) = [Θ(n), w_1(n), …, w_p(n)]
- y(n) = actual response (output); d(n) = desired response
- η = learning rate, a positive number less than 1
Step 1: Initialization. Set w(0) = 0, then do the following for n = 1, 2, 3, …
Step 2: Activation. Activate the perceptron by applying input vector x(n) and desired output d(n).
Step 3: Computation of actual response. y(n) = sgn[w^T(n)x(n)], where sgn(·) is the signum function.
Perceptron Convergence Algorithm (2)
Step 4: Adaptation of weight vector. w(n + 1) = w(n) + η[d(n) - y(n)]x(n), where
d(n) = +1 if x(n) belongs to C_1
d(n) = -1 if x(n) belongs to C_2
Step 5: Increment n by 1 and go back to Step 2.
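Putting Steps 1-5 together, a compact sketch of the convergence algorithm for a finite training set, cycling through the samples until none is misclassified (variable and function names are illustrative):

    import numpy as np

    def train_perceptron(X, d, eta=0.1, max_epochs=100):
        # X: raw inputs of shape (n_samples, p); d: desired responses (+1 / -1)
        n_samples, p = X.shape
        Xa = np.hstack([-np.ones((n_samples, 1)), X])   # augmented inputs [-1, x_1, ..., x_p]
        w = np.zeros(p + 1)                             # Step 1: w(0) = 0 (includes theta)
        for _ in range(max_epochs):
            errors = 0
            for x, target in zip(Xa, d):                # Steps 2-3: present sample, compute y(n)
                y = 1.0 if np.dot(w, x) >= 0.0 else -1.0
                if y != target:
                    w = w + eta * (target - y) * x      # Step 4: w(n+1) = w(n) + eta[d(n) - y(n)]x(n)
                    errors += 1
            if errors == 0:                             # converged: all samples correctly classified
                break
        return w

    # example: learn the AND function (linearly separable, illustrative data)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    d = np.array([-1, -1, -1, 1], dtype=float)
    print(train_perceptron(X, d))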
Performance Measure (1)
A learning rule is designed to optimize a performance measure. However, in the development of the perceptron convergence algorithm we did not mention a performance measure. Intuitively, what would be an appropriate performance measure for a classification neural network?
Define the performance measure
J = -E[e(n)v(n)]
or, as an instantaneous estimate,
J'(n) = -e(n)v(n)
where e(n) = error at iteration n = d(n) - y(n); v(n) = linear combiner output at iteration n; E = expectation operator.
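A small sketch of how the instantaneous estimate could be evaluated for one sample, assuming the same sign conventions as above (illustrative, not part of the original algorithm):

    import numpy as np

    def instantaneous_cost(w, x, d_n):
        # J'(n) = -e(n) v(n), with v(n) = w^T(n) x(n) and e(n) = d(n) - y(n)
        v = float(np.dot(w, x))
        y = 1.0 if v >= 0.0 else -1.0
        e = d_n - y
        return -e * v               # 0 when correctly classified, positive when misclassified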
Performance Measure (2)
Can we derive our learning rule by minimizing this performance function?
Since v(n) = w^T(n)x(n), the gradient of the instantaneous estimate with respect to the weight vector is
∂J'(n)/∂w(n) = -e(n)x(n)
Moving the weights a small step against this gradient gives the learning rule
Δw(n) = -η ∂J'(n)/∂w(n) = η e(n)x(n) = η[d(n) - y(n)]x(n)
which is exactly the adaptation step used in the perceptron convergence algorithm.
Concluding Remarks
A single-layer perceptron can perform pattern classification only on linearly separable patterns, regardless of the type of nonlinearity (hard limiter, sigmoidal).
Minsky and Papert in 1969 elucidated the limitations of Rosenblatt's single-layer perceptron (e.g., the requirement of linear separability and the inability to solve the XOR problem) and cast doubt on the viability of neural networks.
However, the multilayer perceptron and the back-propagation algorithm overcome many of the shortcomings of the single-layer perceptron.
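To make the XOR limitation concrete, a small sketch (illustrative, not from the slides) that searches a grid of weights and thresholds and finds none that separates XOR with a single hard-limiter neuron:

    import numpy as np
    from itertools import product

    # XOR patterns: class +1 for (0,1) and (1,0), class -1 for (0,0) and (1,1)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    d = np.array([-1, 1, 1, -1], dtype=float)

    def separates(w1, w2, theta):
        # single neuron: y = +1 if w1*x1 + w2*x2 - theta >= 0, else -1
        y = np.where(X @ np.array([w1, w2]) - theta >= 0.0, 1.0, -1.0)
        return np.array_equal(y, d)

    grid = np.linspace(-2.0, 2.0, 21)
    found = any(separates(w1, w2, theta) for w1, w2, theta in product(grid, grid, grid))
    print(found)   # False: no (w1, w2, theta) on this grid classifies XOR correctly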