ECE 471/571 – Lecture 12 Perceptron
Different Approaches - More Detail
Pattern Classification
  Statistical Approach
    Supervised
      Basic concepts: Bayesian decision rule (MPP, LR, Discri.)
      Parametric learning (ML, BL)
      Non-parametric learning (kNN)
      NN (Perceptron, BP)
    Unsupervised
      Basic concepts: Distance
      Agglomerative method
      k-means
      Winner-take-all
      Kohonen maps
    Dimensionality Reduction: Fisher's linear discriminant, K-L transform (PCA)
    Performance Evaluation: ROC curve; TP, TN, FN, FP
    Stochastic Methods: local optimization (GD), global optimization (SA, GA)
  Syntactic Approach
ECE471/571, Hairong Qi
Perceptron
[Diagram: inputs x1, x2, ..., xd with weights w1, w2, ..., wd feed a summation node S, which produces output z; a constant input 1 carries weight w0, the bias or threshold]
Perceptron
A program that learns "concepts" based on examples and correct answers
It can only respond with "true" or "false"
A single-layer neural network
Through training, the weights and bias of the network are changed until it classifies the training set with 100% accuracy
Perceptron Criterion Function
J_p(a) = sum_{y in Y} (-a'y)
where Y is the set of samples misclassified by the weight vector a
Gradient descent learning: since the gradient is sum_{y in Y} (-y), the update is
a(k+1) = a(k) + eta(k) * sum_{y in Y} y
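The batch update above can be sketched in a few lines. This is a minimal illustration, not the course code: the 2-D sample values and learning rate are made up, and class-2 samples are sign-normalized (negated) so that correct classification means a'y > 0 for every y.

```python
# Batch perceptron on the criterion J_p(a) = sum over misclassified y of (-a'y).
# Samples are augmented as [1, x1, x2]; class-2 samples are negated (assumed data).

def dot(a, y):
    return sum(ai * yi for ai, yi in zip(a, y))

samples = [
    [ 1.0,  2.0,  2.0],   # class 1: point (2, 2)
    [ 1.0,  3.0,  1.0],   # class 1: point (3, 1)
    [-1.0,  1.0,  1.0],   # class 2: point (-1, -1), negated
    [-1.0,  0.0,  2.0],   # class 2: point (0, -2), negated
]

a = [0.0, 0.0, 0.0]       # augmented weight vector
eta = 0.5                 # learning rate (assumed constant)

for epoch in range(100):
    misclassified = [y for y in samples if dot(a, y) <= 0]
    if not misclassified:
        break             # every sample satisfies a'y > 0
    # gradient of J_p is sum of (-y), so step along +sum(y)
    for j in range(len(a)):
        a[j] += eta * sum(y[j] for y in misclassified)

print(a)
```

On separable data like this, the loop stops as soon as an epoch finds no misclassified samples.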
Perceptron Learning Rule
[Diagram: same perceptron as before, with the bias input on the constant 1 drawn as -w0]
w(k+1) = w(k) + (T - z) x
-w0(k+1) = -w0(k) + (T - z)
T is the expected output
z is the real output
Training
Step 1: Present the samples to the network
Step 2: If the output is correct, make no change; otherwise, update the weights and bias based on the perceptron learning rule
Step 3: An entire pass through the training set is called an "epoch". If no change was made during the epoch, stop; otherwise, go back to Step 1
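The three steps above can be sketched as a short loop. This uses the OR gate rather than the AND gate covered later; the zero initial weights are an assumption (the slides' MATLAB demo initializes them randomly).

```python
# Epoch-based perceptron training with the rule w(k+1) = w(k) + (T - z) x,
# demonstrated on the OR gate (any linearly separable set would work).

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = [0, 1, 1, 1]                      # OR truth table

w = [0.0, 0.0]                        # weights (assumed zero start)
w0 = 0.0                              # threshold: output z = 1 iff w.x > w0

while True:
    changed = False
    for x, t in zip(X, T):            # Step 1: present each sample
        z = 1 if w[0]*x[0] + w[1]*x[1] > w0 else 0
        if z != t:                    # Step 2: update only on error
            w[0] += (t - z) * x[0]
            w[1] += (t - z) * x[1]
            w0   -= (t - z)           # -w0 update from the learning rule
            changed = True
    if not changed:                   # Step 3: stop after an error-free epoch
        break

print(w, w0)
```

After convergence the network reproduces the OR truth table exactly.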
Exercise (AND Logic)
[Diagram: two-input perceptron with weights w1, w2 and bias input -w0]

 x1  x2 | T
  0   0 | 0
  1   0 | 0
  0   1 | 0
  1   1 | 1

(Track w1, w2, -w0 after each update.)
% demonstration on perceptron
% AND gate
%
% Hairong Qi

input = 2;                        % number of inputs
tr = 4;                           % number of training samples
w = rand(1,input+1);              % generate the initial weight vector
                                  % (w(input+1) holds the threshold w0)
x = [0 0; 1 0; 0 1; 1 1];         % the training set (or the inputs)
T = [0; 0; 0; 1];                 % the expected output

% learning process (training process)
finish = 0;
while ~finish
    disp(w)
    for i = 1:tr
        z(i) = w(1:input) * x(i,:)' > w(input+1);
        w(1:input) = w(1:input) + (T(i)-z(i)) * x(i,:);
        w(input+1) = w(input+1) - (T(i)-z(i));
    end
    if sum(abs(z - T')) == 0      % error-free epoch: training is done
        finish = 1;
        disp(z)
    end
    pause                         % press a key to run the next epoch
end
Visualizing the Decision Rule
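The decision rule of a trained two-input perceptron is the straight line w1*x1 + w2*x2 = w0: inputs on one side output 1, the other side 0. A quick check, using one valid (assumed) solution for the AND gate rather than weights produced by training:

```python
# Decision boundary x1 + x2 = 1.5 separates (1,1) from the other three inputs.
w1, w2, w0 = 1.0, 1.0, 1.5        # one valid AND solution (assumed, not trained)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    z = 1 if w1*x1 + w2*x2 > w0 else 0
    print((x1, x2), "->", z)      # only (1, 1) lands above the line
```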
Limitations
The output takes only two values (1 or 0)
It can only classify samples that are linearly separable (separable by a straight line or plane)
It cannot learn functions that are not linearly separable, such as XOR
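The XOR failure can be seen empirically: no line w1*x1 + w2*x2 = w0 puts (0,1) and (1,0) on one side and (0,0) and (1,1) on the other, so no epoch can ever be error-free. A sketch capping training at an assumed 1000 epochs:

```python
# Running the perceptron learning rule on XOR never reaches an error-free epoch.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = [0, 1, 1, 0]                      # XOR truth table

w = [0.0, 0.0]
w0 = 0.0
converged = False
for _ in range(1000):                 # cap (assumed); XOR exhausts it
    changed = False
    for x, t in zip(X, T):
        z = 1 if w[0]*x[0] + w[1]*x[1] > w0 else 0
        if z != t:
            w[0] += (t - z) * x[0]
            w[1] += (t - z) * x[1]
            w0   -= (t - z)
            changed = True
    if not changed:                   # would require linear separability
        converged = True
        break

print("converged:", converged)
```

The weights simply cycle forever; a multi-layer network (trained with backpropagation, covered later) is needed for XOR.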
Examples