Neural Networks NN 2 (presentation transcript)

1 Architecture
We consider the following architecture: a feed-forward NN with one layer. It is sufficient to study single-layer perceptrons with just one neuron:

2 Single-layer perceptrons
Generalization to single-layer perceptrons with more neurons is easy because:
- the output units are independent of each other
- each weight only affects one of the outputs

3 Perceptron: Neuron Model
Uses a non-linear (McCulloch-Pitts) model of a neuron: the inputs x1, ..., xn are weighted by w1, ..., wn and combined with the bias b to give v = w1 x1 + ... + wn xn + b; the output is y = φ(v), where φ is the sign function:
φ(v) = +1 if v ≥ 0
φ(v) = -1 if v < 0
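To make the neuron model concrete, here is a minimal Python sketch of the forward pass (the helper names sign and perceptron_output are ours, not from the slides): it computes v = w1 x1 + ... + wn xn + b and applies the sign function defined above.

```python
def sign(v):
    """Sign activation from the slide: +1 if v >= 0, -1 otherwise."""
    return 1 if v >= 0 else -1

def perceptron_output(x, w, b):
    """y = sign(w . x + b) for input vector x, weights w and bias b."""
    v = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sign(v)

# Illustrative call: v = 0.5*1 + 0.5*(-2) + 0 = -0.5 < 0, so the output is -1
print(perceptron_output([1.0, -2.0], w=[0.5, 0.5], b=0.0))
```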

4 Perceptron: Applications
The perceptron is used for classification: classify correctly a set of examples into one of the two classes C1, C2:
- if the output of the perceptron is +1, then the input is assigned to class C1
- if the output is -1, then the input is assigned to C2

5 Perceptron: Classification
The equation below describes a hyperplane in the input space. This hyperplane is used to separate the two classes C1 and C2.
Decision boundary: w1 x1 + w2 x2 + b = 0
Decision region for C1: w1 x1 + w2 x2 + b ≥ 0

6 Perceptron: Limitations
The perceptron can only model linearly separable functions. It can be used to model the Boolean functions AND, OR and COMPLEMENT, but it cannot model XOR. Why? Because XOR is not linearly separable: no single line in the input plane separates its positive from its negative examples (see the sketch below).
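As a hedged illustration of this point, the sketch below checks AND, OR and COMPLEMENT with hand-chosen weights (our own choice, with Boolean inputs encoded as 0/1 and outputs as ±1). For XOR no weights (w1, w2, b) exist: the four constraints would require b < 0, w1 + b ≥ 0, w2 + b ≥ 0 and w1 + w2 + b < 0, but the two middle inequalities give w1 + w2 + b ≥ -b > 0, a contradiction.

```python
def sign(v):
    return 1 if v >= 0 else -1

def perceptron(x, w, b):
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hand-chosen weights (illustrative), inputs in {0, 1}, outputs in {-1, +1}
for x1 in (0, 1):
    for x2 in (0, 1):
        a = perceptron([x1, x2], w=[1, 1], b=-1.5)  # AND: +1 only when x1 = x2 = 1
        o = perceptron([x1, x2], w=[1, 1], b=-0.5)  # OR: +1 unless x1 = x2 = 0
        print(f"x = ({x1}, {x2})  AND = {a:+d}  OR = {o:+d}")

for x in (0, 1):
    n = perceptron([x], w=[-1], b=0.5)              # COMPLEMENT (NOT) of one input
    print(f"NOT {x} = {n:+d}")
```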

7 Perceptron: Learning Algorithm
Variables and parameters at iteration n of the learning algorithm:
x(n) = input vector = [+1, x1(n), x2(n), ..., xm(n)]^T
w(n) = weight vector = [b(n), w1(n), w2(n), ..., wm(n)]^T
b(n) = bias
y(n) = actual response
d(n) = desired response
η = learning rate parameter

8 The fixed-increment learning algorithm
Initialization: w(1) = 0 (or small random values).
Activation: activate the perceptron by applying an input example (input = x(n), desired output = d(n)).
Compute the actual output of the perceptron: y(n) = sgn[w^T(n) x(n)].
Adapt the weight vector: if d(n) ≠ y(n), then w(n+1) = w(n) + η d(n) x(n), where
d(n) = +1 if x(n) ∈ C1
d(n) = -1 if x(n) ∈ C2
Continuation: increment the time step n by 1 and go to the Activation step.
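A minimal Python sketch of this fixed-increment rule, under the assumption that each input is already augmented with a leading +1 so that the bias sits in the weight vector (the function names are ours):

```python
import numpy as np

def sgn(v):
    return 1 if v >= 0 else -1

def train_perceptron(X, d, eta=1.0, max_epochs=100):
    """Fixed-increment perceptron learning.
    X: rows are augmented inputs [+1, x1, ..., xm]; d: desired outputs in {-1, +1}."""
    w = np.zeros(X.shape[1])        # initialization: w(1) = 0
    for _ in range(max_epochs):
        misclassified = 0
        for x, target in zip(X, d):
            y = sgn(w @ x)          # actual output y(n) = sgn(w^T(n) x(n))
            if target != y:         # adapt the weights only on a mistake
                w = w + eta * target * x
                misclassified += 1
        if misclassified == 0:      # a full pass without errors: stop
            break
    return w
```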

9 Example
Consider the 2D training set C1 ∪ C2, where:
C1 = {(1,1), (1,-1), (0,-1)} are the elements of class 1
C2 = {(-1,-1), (-1,1), (0,1)} are the elements of class -1
Use the perceptron learning algorithm to classify these examples, with
w(1) = [1, 0, 0]^T and η = 1.

10 Trick
Consider the augmented training set C'1 ∪ C'2, with the first entry fixed to 1 (to deal with the bias as an extra weight):
(1, 1, 1), (1, 1, -1), (1, 0, -1), (1, -1, -1), (1, -1, 1), (1, 0, 1)
Replace x with -x for all x ∈ C'2 and use the following simpler update rule:
w(n+1) = w(n) + η x(n)  if w(n)^T x(n) ≤ 0
w(n+1) = w(n)           otherwise

11 Example
Training set after application of the trick:
(1, 1, 1), (1, 1, -1), (1, 0, -1), (-1, 1, 1), (-1, 1, -1), (-1, 0, -1)
Application of the perceptron learning algorithm to these examples: end of epoch 1.

12 Example
End of epoch 2. At epoch 3 no updates are performed (check!), so the execution of the algorithm stops.
Final weight vector: (0, 2, -1), so the decision hyperplane is 2x1 - x2 = 0.
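As a sanity check of the worked example (a sketch, assuming the examples are visited in the listed order each epoch), the loop below replays the simplified rule from slide 10 on the transformed training set of slide 11, starting from w(1) = [1, 0, 0] and η = 1; it stops when a whole epoch makes no update and should end with the weight vector (0, 2, -1) stated above.

```python
import numpy as np

# Training set after the trick (slide 11): every example now only needs w.x > 0
X = np.array([[ 1, 1,  1], [ 1, 1, -1], [ 1, 0, -1],
              [-1, 1,  1], [-1, 1, -1], [-1, 0, -1]], dtype=float)

w = np.array([1.0, 0.0, 0.0])       # w(1) from slide 9
eta = 1.0
epoch = 0
while True:
    epoch += 1
    updates = 0
    for x in X:
        if w @ x <= 0:              # misclassified under the simplified rule
            w = w + eta * x
            updates += 1
    print(f"end of epoch {epoch}: w = {w}, updates = {updates}")
    if updates == 0:                # no updates in a whole epoch: done
        break
# Expected final weight vector: (0, 2, -1), i.e. the hyperplane 2*x1 - x2 = 0
```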

13 Example
[Figure: the training points in the (x1, x2) plane, class C1 marked + and class C2 marked -, separated by the decision boundary 2x1 - x2 = 0.]

14 Convergence of the learning algorithm
Suppose the datasets C1, C2 are linearly separable. The perceptron convergence algorithm converges after n0 iterations, with n0 ≤ nmax, on the training set C1 ∪ C2.
Proof: suppose x ∈ C1 implies output = 1 and x ∈ C2 implies output = -1. For simplicity assume w(1) = 0 and η = 1.
Suppose the perceptron incorrectly classifies x(1), ..., x(k), ... ∈ C1, so that w^T(k) x(k) ≤ 0. The error-correction rule w(k+1) = w(k) + x(k) then gives:
w(2) = w(1) + x(1)
w(3) = w(2) + x(2)
...
w(k+1) = x(1) + ... + x(k)

15 Convergence theorem (proof)
Let w0 be such that w0^T x(n) > 0 for all x(n) ∈ C1. Such a w0 exists because C1 and C2 are linearly separable.
Let α = min { w0^T x(n) : x(n) ∈ C1 }.
Then w0^T w(k+1) = w0^T x(1) + ... + w0^T x(k) ≥ kα.
By the Cauchy-Schwarz inequality, ||w0||² ||w(k+1)||² ≥ [w0^T w(k+1)]², so
||w(k+1)||² ≥ k² α² / ||w0||²    (A)

16 Convergence theorem (proof)
Now we consider another route. Since w(k+1) = w(k) + x(k),
||w(k+1)||² = ||w(k)||² + ||x(k)||² + 2 w^T(k) x(k)    (Euclidean norm)
where 2 w^T(k) x(k) ≤ 0 because x(k) is misclassified, hence
||w(k+1)||² ≤ ||w(k)||² + ||x(k)||².
Starting from ||w(1)||² = 0:
||w(2)||² ≤ ||w(1)||² + ||x(1)||²
||w(3)||² ≤ ||w(2)||² + ||x(2)||²
...
||w(k+1)||² ≤ ||x(1)||² + ... + ||x(k)||²

17 Convergence theorem (proof)
Let β = max { ||x(n)||² : x(n) ∈ C1 }. Then
||w(k+1)||² ≤ kβ    (B)
For sufficiently large values of k, (B) comes into conflict with (A). Hence k cannot be greater than kmax, the value for which (A) and (B) are both satisfied with the equality sign. The perceptron convergence algorithm therefore terminates in at most
nmax = β ||w0||² / α²
iterations.
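Restating the final step in LaTeX for readability: combining the lower bound (A) with the upper bound (B) shows how many updates are possible at most.

```latex
\[
\frac{k^{2}\alpha^{2}}{\lVert w_{0}\rVert^{2}}
  \;\le\; \lVert w(k+1)\rVert^{2} \;\le\; k\beta
\quad\Longrightarrow\quad
k \;\le\; \frac{\beta\,\lVert w_{0}\rVert^{2}}{\alpha^{2}} \;=\; n_{\max}.
\]
```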

18 Adaline: Adaptive Linear Element
Adaline uses a linear neuron model and the Least-Mean-Square (LMS) learning algorithm.
The idea: try to minimize the square error, which is a function of the weights.
We can find the minimum of the error function E by means of the steepest descent method.

19 Steepest Descent Method
- start with an arbitrary point
- find a direction in which E is decreasing most rapidly
- make a small step in that direction
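A tiny sketch of these three steps in Python, on an illustrative one-dimensional error function E(w) = (w - 3)² of our own choosing (not from the slides):

```python
def E(w):                 # illustrative error function, minimum at w = 3
    return (w - 3.0) ** 2

def dE(w):                # its gradient
    return 2.0 * (w - 3.0)

w = 0.0                   # start with an arbitrary point
eta = 0.1                 # small step size
for _ in range(50):
    w = w - eta * dE(w)   # step in the direction in which E decreases most rapidly
print(w, E(w))            # w is close to 3.0, E(w) close to 0
```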

20 Least-Mean-Square algorithm (Widrow-Hoff algorithm)
Approximation of the gradient of E: with the instantaneous error e(n) = d(n) - ŵ^T(n) x(n) and E(n) = ½ e²(n), the gradient is approximated by -x(n) e(n).
The update rule for the weights becomes:
ŵ(n+1) = ŵ(n) + η x(n) e(n)

21 Summary of the LMS algorithm
Training sample: input signal vector x(n), desired response d(n).
User-selected parameter: η > 0.
Initialization: set ŵ(1) = 0.
Computation: for n = 1, 2, ... compute
e(n) = d(n) - ŵ^T(n) x(n)
ŵ(n+1) = ŵ(n) + η x(n) e(n)
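A direct Python transcription of this summary (the function name lms and the test data are ours; X holds the input vectors x(n) and d the desired responses d(n)):

```python
import numpy as np

def lms(X, d, eta=0.05, epochs=20):
    """Widrow-Hoff / LMS: w(n+1) = w(n) + eta * x(n) * e(n)."""
    w = np.zeros(X.shape[1])        # initialization: w(1) = 0
    for _ in range(epochs):
        for x, target in zip(X, d):
            e = target - w @ x      # error e(n) = d(n) - w^T(n) x(n)
            w = w + eta * x * e     # LMS update
    return w

# Usage sketch on illustrative data: d is (almost) the linear target 2*x1 - x2
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
d = 2 * X[:, 0] - X[:, 1] + 0.01 * rng.normal(size=200)
print(lms(X, d))                    # approximately [ 2, -1 ]
```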

22 Comparison of LMS and Perceptron
The perceptron and Adaline represent different implementations of a single-layer perceptron based on error-correction learning.
Model of a neuron: LMS is linear; the perceptron is non-linear, with a hard-limiter activation function (McCulloch-Pitts model).
Learning process: LMS adapts continuously; the perceptron converges in a finite number of iterations.

23 Example
[Figure: a grid of pixels representing the digit 3, with each pixel encoded as +1 or -1.]

24 Example
How to train a perceptron to recognize this 3? Assign -1 to the weights of input values that are equal to +1, and +1 to the weights of input values that are equal to -1.

