Disadvantages of Discrete Neurons

Presentation transcript:

Disadvantages of Discrete Neurons
- Only boolean-valued functions can be computed.
- A simple learning algorithm for multi-layer discrete-neuron perceptrons is lacking.
- The computational capabilities of single-layer discrete-neuron perceptrons are limited.
These disadvantages disappear when we consider multi-layer continuous-neuron perceptrons.
Rudolf Mak, TU/e Computer Science

Preliminaries
A continuous-neuron perceptron with n inputs and m outputs computes:
- a function R^n -> [0,1]^m when the sigmoid activation function is used,
- a function R^n -> R^m when a linear activation function is used.
Here [0,1] denotes the unit interval. The learning rules for continuous-neuron perceptrons are based on optimization techniques for error functions. This requires a continuous and differentiable error function. Single-layer continuous-neuron perceptrons are also limited, but two layers can approximate any continuous function.

Sigmoid transfer function
The sigmoid has the convenient property that its derivative can be expressed in the function itself. A similar property holds for tanh, whose derivative can also be expressed in the original function:
d tanh(x)/dx = 1 - tanh^2(x)
tanh(z/2) = 2 sig(z) - 1
There is a small practical advantage to using tanh.
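
The sigmoid property referred to here appears only on the slide image, not in the transcript; presumably it is the standard one, which follows by a short derivation:
sig(z) = 1 / (1 + e^(-z))
d sig(z)/dz = e^(-z) / (1 + e^(-z))^2 = sig(z) * (1 - sig(z))
So for both sig and tanh the derivative needed during learning can be computed from the neuron's output alone, without re-evaluating the exponential.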

Computational Capabilities
Let g: [0,1]^n -> R be a continuous function and let ε > 0. Then there exists a two-layer perceptron with:
- a first layer built from neurons with threshold and the standard sigmoid activation function,
- a second layer built from one neuron without threshold and a linear activation function,
such that the function G computed by this network satisfies |G(x) - g(x)| < ε for all x.
Compare this with approximation by a truncated Taylor series g(x) ≈ Σ_n g^(n)(0) x^n / n!, i.e. G(x) = Σ_n w_n g_n(x) with basis functions g_n(x) = x^n. Other basis functions are possible: sine and cosine (Fourier), orthogonal polynomials. How many neurons are needed? We start with single-layer (single-neuron) networks.
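
As an illustration of this kind of two-layer approximation, here is a minimal sketch, not the construction from the slides: the target function g, the number of hidden neurons, and the random placement of the sigmoids are all chosen arbitrarily, and only the linear output weights are fitted.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Target function to approximate on [0, 1] (illustrative choice)
def g(x):
    return np.sin(2 * np.pi * x)

rng = np.random.default_rng(0)
n_hidden = 30

# First layer: sigmoid neurons with random weights; thresholds chosen so the
# sigmoid transitions lie inside [0, 1]
W1 = rng.normal(scale=10.0, size=n_hidden)
centers = rng.uniform(0.0, 1.0, size=n_hidden)
b1 = -W1 * centers

x = np.linspace(0.0, 1.0, 200)
H = sigmoid(np.outer(x, W1) + b1)          # hidden activations, shape (200, n_hidden)

# Second layer: one linear neuron; fit its weights by least squares
w2, *_ = np.linalg.lstsq(H, g(x), rcond=None)
G = H @ w2                                  # network output

print("max |G(x) - g(x)| =", np.max(np.abs(G - g(x))))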

Single-layer networks
Single-layer networks compute a function from R^n to [0,1]^m, so it is sufficient to consider a single neuron. A single neuron computes a function f(w_0 + Σ_{1 ≤ j ≤ n} w_j x_j). If we assume x_0 = 1, it computes the function f(Σ_{0 ≤ j ≤ n} w_j x_j). Single-layer networks have limited capabilities.
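
A minimal sketch of this single-neuron computation with the bias absorbed as w_0 and x_0 = 1 (names and values are illustrative, not from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(w, x, f=sigmoid):
    # Prepend x_0 = 1 so the bias w_0 is handled like an ordinary weight
    x_ext = np.concatenate(([1.0], x))
    return f(np.dot(w, x_ext))

w = np.array([0.5, 1.0, -2.0])   # w_0 is the bias
x = np.array([0.3, 0.7])
print(neuron_output(w, x))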

Error function
Again the weights are extended with the bias w_0 and the inputs with a component x_0 = 1. We no longer use the prime notation. The factor ½ is for computational convenience.
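
The error function itself appears only on the slide image; the standard least-mean-square form that matches these remarks is:
E(w) = ½ Σ_q ( t^(q) - y^(q) )^2,
where y^(q) = f( Σ_j w_j x_j^(q) ) is the network output for training pair q and t^(q) is its target.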

Gradient Descent
Least mean square (LMS) error function.
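
The update formula on this slide is not in the transcript; the standard gradient-descent rule for the LMS error function above would be:
w := w - α ∇E(w), i.e. componentwise
w_i := w_i - α ∂E/∂w_i = w_i + α Σ_q ( t^(q) - y^(q) ) f'( net^(q) ) x_i^(q),
where net^(q) = Σ_j w_j x_j^(q) and α is the learning parameter.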

Update of Weight i by Training Pair q
Hence Δw is in the direction of x. Simple cases arise when f is the sigmoid or tanh. It is even simpler when f is the identity function f(z) = z, because then f'(z) = 1.
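
Written out (the exact notation on the slide is not in the transcript), the per-pair update these remarks and the pseudocode below refer to is:
Δw_i^(q) = α ( t^(q) - y^(q) ) f'( net^(q) ) x_i^(q),
i.e. a multiple of x_i^(q), which is why Δw points in the direction of x.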

Delta Rule Learning (incremental version, arbitrary transfer function)
In the lecture notes the vector manipulation is replaced by a loop:
for i := 0 to n do w_i := w_i + α (t - y) dy x_i
where dy denotes the derivative f'(net) of the transfer function at the current net input.
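
A runnable sketch of this incremental delta rule for a single sigmoid neuron (variable names, data, learning rate, and epoch count are illustrative, not taken from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_incremental(X, t, alpha=0.5, epochs=1000, seed=0):
    """Incremental delta rule for one sigmoid neuron."""
    rng = np.random.default_rng(seed)
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend x_0 = 1 for the bias w_0
    w = rng.normal(scale=0.1, size=X.shape[1])
    for _ in range(epochs):
        for x_q, t_q in zip(X, t):                  # one weight update per training pair
            y = sigmoid(np.dot(w, x_q))
            dy = y * (1.0 - y)                      # f'(net) for the sigmoid
            w += alpha * (t_q - y) * dy * x_q
    return w

# Example: learn the OR function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 1, 1, 1], dtype=float)
print(train_incremental(X, t))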

Stop criteria
- The mean square error becomes small enough.
- The mean square error does not decrease anymore, i.e. the gradient has become very small or even changes sign.
- The maximum number of iterations has been exceeded.
A training loop combining these criteria is sketched below.
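
A minimal sketch of how these stop criteria could be combined around an update step (the thresholds and the update/error functions are placeholders, not values from the slides):

def train_with_stop_criteria(update_step, error, eps=1e-4, delta=1e-8, max_iter=10_000):
    """update_step() performs one weight update; error() returns the current mean square error."""
    prev_err = error()
    for iteration in range(max_iter):          # criterion 3: maximum number of iterations
        update_step()
        err = error()
        if err < eps:                           # criterion 1: error small enough
            return "error small enough", iteration
        if prev_err - err < delta:              # criterion 2: error no longer decreases
            return "error no longer decreases", iteration
        prev_err = err
    return "maximum iterations exceeded", max_iter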

Remarks
- Delta rule learning is also called L(east) M(ean) S(quare) learning or Widrow-Hoff learning.
- Note that the incremental version of the delta rule is, strictly speaking, not a gradient descent algorithm, because in each step a different error function E^(q) is used.
- Convergence of the incremental version can only be guaranteed if the learning parameter α goes to 0 during learning.

Perceptron Learning Rule (batch version, arbitrary transfer function)

Perceptron Learning Delta Rule (batch version, sigmoidal transfer function)

Perceptron Learning Rule (batch version, linear transfer function)
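
The batch rules themselves appear only as formulas on the slide images. A sketch of the batch delta rule with a linear transfer function (so f'(z) = 1), under the same illustrative conventions as the incremental sketch above:

import numpy as np

def train_batch_linear(X, t, alpha=0.01, epochs=1000, seed=0):
    """Batch delta rule for one linear neuron: accumulate the update over all pairs, then apply it."""
    rng = np.random.default_rng(seed)
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend x_0 = 1 for the bias w_0
    w = rng.normal(scale=0.1, size=X.shape[1])
    for _ in range(epochs):
        y = X @ w                                   # linear transfer function, f'(z) = 1
        w += alpha * X.T @ (t - y)                  # sum of the per-pair updates
    return w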

Convergence of the batch version
For a small enough learning parameter the batch version of the delta rule always converges. The resulting weights, however, may correspond to a local minimum of the error function instead of the global minimum. For the linear neuron we will analyze this further.

Linear Neurons and Least Squares

Linear Neurons and Least Squares

C is non-singular
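
The derivation on these slides is in the images only; the standard least-squares reconstruction consistent with the surrounding remarks is:
For a linear neuron, E(w) = ½ Σ_q ( t^(q) - w · x^(q) )^2. Setting ∇E(w) = 0 gives the normal equations C w = b, with C = Σ_q x^(q) (x^(q))^T and b = Σ_q t^(q) x^(q). If C is non-singular, the unique minimizer is w* = C^{-1} b.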

Linear Least Squares Convergence

The gradient is a linear operator. Recall α' = P α. Inspect the batch version with X = <x(1), …, x(P)>.

Linear Least Squares Convergence

Find the line:

Solution:
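
The example's data points and its solution are shown only on the slide images. A sketch of how such a line fit works with linear least squares (the data below is made up for illustration):

import numpy as np

# Made-up data points; the actual points are on the slide image
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
t = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Design matrix with x_0 = 1, so w_0 is the intercept and w_1 the slope
X = np.column_stack([np.ones_like(x), x])

# Solve the normal equations C w = b (equivalently np.linalg.lstsq(X, t))
C = X.T @ X
b = X.T @ t
w = np.linalg.solve(C, b)
print("line: t =", w[0], "+", w[1], "* x")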