Ch 2. Concept Map: Perceptron, Adaline ⊂ Single Layer Perceptron = McCulloch–Pitts Type
Learning starts in Ch 2 (architecture, learning).

Perceptron – Nonlinear Learning, Binary Output
- Objective: WᵀX > 0 for x ∈ ω₁, WᵀX < 0 for x ∈ ω₂
- No correction for a correct sign
- Error patterns push and pull the hyperplanes
- Always finds perfect hyperplanes for linearly separable cases

Adaline – Linear Learning, Real-Valued Output
- Objective: WᵀX = any desired real value (for classification, WᵀX = ±1); can be used for both classification and regression
- Correction even for a correct sign
- All patterns x move the hyperplanes by an amount proportional to x
- Finds optimum hyperplanes for linearly nonseparable cases
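A minimal sketch of the contrast above (weights and inputs are illustrative, not from the slides): the Perceptron uses only the sign of WᵀX, while the Adaline matches its actual value.

```python
import numpy as np

w = np.array([1.0, -0.5])               # illustrative weight vector
x = np.array([0.8, 0.2])                # one input pattern

s = w @ x                               # linear response W^T x
perceptron_output = 1 if s > 0 else 0   # Perceptron: only the sign of s matters
adaline_output = s                      # Adaline: the value of s itself is the target
print(perceptron_output, adaline_output)
```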
Chapter 2. Single Layer Perceptron
Ref.: B. Widrow and M. A. Lehr, "30 Years of Adaptive Neural Networks: Perceptron, Madaline, and Backpropagation," Proc. IEEE, vol. 78, no. 9, 1990.

1. Rosenblatt – Perceptron Learning

[Figure: perceptron architecture. The adjusted training-set inputs x_1(n), x_2(n), …, x_D(n) feed adjustable weights w_1(n), …, w_D(n); a constant input x_{D+1}(n) = 1 feeds the bias weight w_{D+1}(n). The adder forms s(n), which is thresholded to the output y(n); the error signal e(n) = d(n) − y(n) is formed against the desired response d(n).]
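A minimal sketch of this architecture, assuming the usual convention that the bias is the (D+1)-th weight fed by a constant input of 1 (function and variable names are mine):

```python
import numpy as np

def perceptron_forward(w, x):
    """Return (s, y): adder output s(n) = w^T [x; 1] and thresholded y(n)."""
    x_aug = np.append(x, 1.0)       # constant input x_{D+1}(n) = 1 for the bias
    s = w @ x_aug                   # adder output s(n)
    y = 1 if s > 0 else 0           # hard-limited output y(n)
    return s, y

w = np.zeros(3)                     # D = 2 adjustable weights + bias weight
s, y = perceptron_forward(w, np.array([1.0, 0.0]))
e = 1 - y                           # error signal e(n) = d(n) - y(n), here d(n) = 1
```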
Perceptron Learning – Error Correction Learning
1) Initialize w(0) to zero or to some random vector.
2) Input x(n) and adjust the weights as follows:
   w(n+1) = w(n) + α [d(n) − y(n)] x(n)
   α = learning rate that scales x. Rosenblatt set α = 1; the choice does not affect stability, and it affects convergence time only if w(0) ≠ 0.
   A. Case 1: d(n) − y(n) = 1, i.e. d(n) = 1 and y(n) = 0: w(n+1) = w(n) + α x(n)
   B. Case 2: d(n) − y(n) = −1, i.e. d(n) = 0 and y(n) = 1: w(n+1) = w(n) − α x(n)
   C. Case 3: d(n) = y(n): do nothing.
3) Repeat 1)–2) until there is no classification error, for a linearly separable case.
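A runnable sketch of steps 1)–3) (the function name and the 0/1 target convention are mine; the updates follow Cases 1–3 above):

```python
import numpy as np

def perceptron_train(X, d, alpha=1.0, max_epochs=100):
    """Rosenblatt error-correction learning. Rows of X are patterns; d in {0, 1}."""
    X_aug = np.hstack([X, np.ones((len(X), 1))])    # append bias input = 1
    w = np.zeros(X_aug.shape[1])                    # 1) initialize w(0) = 0
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(X_aug, d):             # 2) input and adjust
            y = 1 if w @ x > 0 else 0
            if target != y:                         # Cases 1 and 2
                w += alpha * (target - y) * x       # +alpha*x or -alpha*x
                errors += 1                         # (Case 3: no update)
        if errors == 0:                             # 3) no classification error:
            break                                   #    linearly separable, done
    return w
```

With α = 1 and w(0) = 0 this matches Rosenblatt's setting, where the choice of α is immaterial.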
(2) Patterns of Classification Error - push and pull the decision hyperplanes.
[Figure: the error pattern x(1) produces the correction Δw(1), giving w(2) = w(1) + Δw(1) with Δw(1) ∝ x(1). Case (2) is the same as (1) except that the sign of the correction is reversed.]
(3) Example – OR Learning
[Figure: OR learning in weight space. The initial hyperplane w(0)ᵀX = 0 gives an error at x(1); x(1) pulls the hyperplane down to w(1)ᵀX = 0. This overcorrects and gives an error at x(2); x(2) pulls the hyperplane up to w(2)ᵀX = 0. Now there is no error for all patterns, so learning terminates.]
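The OR example can be checked in a few lines, reusing the perceptron_train sketch above (the data layout is assumed):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # OR inputs
d = np.array([0, 1, 1, 1])                                    # OR targets

w = perceptron_train(X, d)          # OR is linearly separable, so this terminates
for x, target in zip(X, d):
    y = 1 if w @ np.append(x, 1.0) > 0 else 0
    assert y == target              # no error for all patterns
```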
(4) Example – Learning of the AND Function
2. Adaptive Linear Element – ADALINE = Linear Neuron
If d(n) = real, regression. If d(n) = integer (0/1 or −1/+1), classification.

[Figure: Perceptron vs. Adaline. The input x(n) feeds the adder, whose linear output is s(n); the threshold gives y(n). The Perceptron forms its error after the threshold, e(n) = d(n) − y(n); the Adaline forms its error on the linear output, e(n) = d(n) − s(n), against the desired response d(n).]
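A small sketch of the figure's point, with illustrative weights: the Adaline's error is taken on the linear output s(n), the threshold being applied only when a class decision is needed.

```python
import numpy as np

def adaline_response(w, x):
    x_aug = np.append(x, 1.0)        # bias input
    s = w @ x_aug                    # linear output s(n): used for learning
    y = 1 if s > 0 else -1           # thresholded output: used for classification
    return s, y

w = np.array([0.2, -0.1, 0.05])      # illustrative weights
s, y = adaline_response(w, np.array([1.0, 1.0]))
adaline_error = 1.0 - s              # e(n) = d(n) - s(n), d(n) real or +/-1
perceptron_error = 1 - y             # cf. e(n) = d(n) - y(n) after the threshold
```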
- α-LMS Learning Rule – normalized; α controls stability and speed of convergence:
w(n+1) = w(n) + α e(n) x(n) / ‖x(n)‖²
In general, the data set is not completely specified in advance. LMS is good for a training data stream drawn from a stationary distribution, at least for x(n). In practice, 0.1 < α < 1. cf. LMS = instantaneous gradient descent.
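A minimal sketch of one α-LMS step as stated above (names and the toy stream are mine); dividing by ‖x‖² is what makes the rule self-normalizing:

```python
import numpy as np

def alpha_lms_step(w, x, d, alpha=0.5):          # in practice 0.1 < alpha < 1
    e = d - w @ x                                # error on the linear output
    return w + alpha * e * x / (x @ x)           # w(n+1) = w(n) + a*e*x/||x||^2

w = np.zeros(3)                                  # patterns arrive as a stream
for x, d in [(np.array([1.0, 0.0, 1.0]), 1.0),
             (np.array([0.0, 1.0, 1.0]), -1.0)]:
    w = alpha_lms_step(w, x, d)
```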
(2) Minimum Disturbance Principle –
Adapt to reduce the output error for the current training pattern, with minimal disturbance to the responses already learned.

[Figure: geometry of the update, w(n+1) = w(n) + Δw(n), with Δw(n) parallel to x(n).]
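The α-LMS step can be read as a minimum-disturbance update: it removes exactly the fraction α of the current pattern's error, and because Δw is parallel to x(n) it is the smallest weight change that does so. A numeric check with illustrative values:

```python
import numpy as np

x = np.array([2.0, 1.0]); w = np.zeros(2); d = 1.0; alpha = 0.5
e = d - w @ x
dw = alpha * e * x / (x @ x)                        # Delta-w(n) parallel to x(n)
w_new = w + dw
assert np.isclose(d - w_new @ x, (1 - alpha) * e)   # residual error = (1-a)*e
# Being parallel to x(n), dw has minimal norm among all corrections achieving
# this, hence minimal disturbance to responses learned for other patterns.
```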
(3) Learning Rate Scheduling (Annealing)
α, μ = learning rate (step size). Typical schedules (sketched in code below): μ(n+1) = μ(n) − β, or μ(n) = c/n, or μ(n) = μ₀ / (1 + n/τ).
The Perceptron Learning Rule can also be derived from the LMS rule, with a suitable criterion J.
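A minimal sketch of the three annealing schedules just listed (parameter names and defaults are illustrative):

```python
def mu_linear(mu, beta=1e-3):
    return max(mu - beta, 0.0)          # mu(n+1) = mu(n) - beta

def mu_hyperbolic(n, c=1.0):
    return c / n                        # mu(n) = c / n, n >= 1

def mu_search_then_converge(n, mu0=0.1, tau=100.0):
    return mu0 / (1.0 + n / tau)        # mu(n) = mu0 / (1 + n/tau)
```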
Perceptron Learning vs. LMS Learning

Perceptron Learning                                                        | LMS Learning
---------------------------------------------------------------------------|---------------------------------------------------------------------------
Works with binary (bipolar) outputs                                        | Works with both binary and analog outputs
Always converges for linearly separable cases (Perceptron Convergence Thm) | May not converge even for linearly separable cases
May be unstable for linearly nonseparable cases – use the pocket algorithm | Also works well for linearly nonseparable cases – finds a minimum-error solution
No correction for correct classification                                   | Always corrects
Objective: WᵀX > 0 or WᵀX < 0                                              | Objective: WᵀX = +1 or −1
Nonlinear rule                                                             | Linear rule
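Since the table mentions the pocket algorithm for nonseparable cases, here is a hedged sketch of it (not from the slides): run ordinary perceptron learning, but keep ("pocket") the weight vector with the fewest training errors seen so far, so even a nonseparable run ends with a useful solution.

```python
import numpy as np

def pocket_train(X, d, alpha=1.0, max_iters=1000, seed=0):
    X_aug = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(X_aug.shape[1])
    errors = lambda w: int(np.sum((X_aug @ w > 0).astype(int) != d))
    best_w, best_err = w.copy(), errors(w)
    rng = np.random.default_rng(seed)
    for _ in range(max_iters):
        i = rng.integers(len(X_aug))                 # visit patterns at random
        y = 1 if w @ X_aug[i] > 0 else 0
        w = w + alpha * (d[i] - y) * X_aug[i]        # ordinary perceptron step
        if errors(w) < best_err:                     # pocket the best so far
            best_w, best_err = w.copy(), errors(w)
    return best_w
```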
Graphical Representation of Learning
For classification with Adaline, one could just stop learning when the signs are all correct – a way to speed up Adaline learning.

[Figure: decision hyperplanes after learning. Perceptron: a hyperplane separating ω₁ (output +1) from ω₂ (output −1). Adaline: the hyperplane placed so that the linear outputs approximate +1 on ω₁ and −1 on ω₂.]
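A one-line sketch of that stopping test, assuming bipolar labels d ∈ {−1, +1} and bias-augmented patterns in the rows of X_aug:

```python
import numpy as np

def signs_all_correct(w, X_aug, d):
    """Stop Adaline classification training once sign(w^T x) matches every
    label, even though the LMS targets w^T x = +/-1 are not yet reached."""
    return bool(np.all(np.sign(X_aug @ w) == d))
```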
(4) Application of ADALINE to Noise Cancellation, Equalization, and Echo Cancellation
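A hedged sketch of an adaptive noise canceller in the Widrow style (the signal, filter length, and step size are illustrative): an Adaline filters a noise reference to predict the noise component of the primary input; the residual e(n) is the cleaned signal, and e(n) itself drives the α-LMS adaptation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_taps, alpha = 8, 0.2
noise_ref = rng.standard_normal(2000)                    # reference noise
clean = np.sin(0.05 * np.arange(2000))                   # unknown signal
primary = clean + np.convolve(noise_ref, [0.7, -0.3], mode="same")

w = np.zeros(n_taps)
restored = np.zeros_like(primary)
for n in range(n_taps, len(primary)):
    x = noise_ref[n - n_taps:n]                          # tap-delay-line input
    e = primary[n] - w @ x                               # error = signal estimate
    w += alpha * e * x / (x @ x + 1e-12)                 # alpha-LMS update
    restored[n] = e
```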
Student Questions (2005)
- Adaline learning takes longer than Perceptron learning in general; but might it still have an edge over the Perceptron in some aspect?
- What is the difference between α-LMS and μ-LMS? (Self-normalizing vs. constant step size; the latter always converges in the mean to the minimum-MSE solution.)
- Is there any way to learn without desired outputs?
- What does it mean that Perceptron learning can be derived from LMS using J = …?
- Not clear on regression. How can we guarantee convergence of the whole system?
- Which of the Perceptron and Adaline performs better, and which is more efficient?