Disadvantages of Discrete Neurons

Presentation transcript:

Disadvantages of Discrete Neurons
- Only boolean-valued functions can be computed.
- A simple learning algorithm for multi-layer discrete-neuron perceptrons is lacking.
- The computational capabilities of single-layer discrete-neuron perceptrons are limited.
These disadvantages disappear when we consider multi-layer continuous-neuron perceptrons.
Rudolf Mak, TU/e Computer Science

Preliminaries
A continuous-neuron perceptron with n inputs and m outputs computes:
- a function R^n -> [0,1]^m when the sigmoid activation function is used,
- a function R^n -> R^m when a linear activation function is used.
Here [0,1] denotes the unit interval. The learning rules for continuous-neuron perceptrons are based on optimization techniques for error functions. This requires a continuous and differentiable error function. Single-layer continuous-neuron perceptrons are also limited, but two layers can approximate any continuous function.

Sigmoid transfer function
The sigmoid has the convenient property that its derivative can be expressed in the function itself. A similar property holds for tanh, whose derivative can also be expressed in the original function:
d tanh(x)/dx = 1 - tanh^2(x)
tanh(z/2) = 2 sig(z) - 1
There is a small practical advantage to using tanh.
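
The sigmoid property referred to here appears only on the slide image, not in the transcript; presumably it is the standard one, which follows by a short derivation:
sig(z) = 1 / (1 + e^(-z))
d sig(z)/dz = e^(-z) / (1 + e^(-z))^2 = sig(z) * (1 - sig(z))
So for both sig and tanh the derivative needed during learning can be computed from the neuron's output alone, without re-evaluating the exponential.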

Computational Capabilities
Let g: [0,1]^n -> R be a continuous function and let ε > 0. Then there exists a two-layer perceptron with:
- a first layer built from neurons with threshold and the standard sigmoid activation function,
- a second layer built from one neuron without threshold and a linear activation function,
such that the function G computed by this network satisfies |G(x) - g(x)| < ε for all x.
Compare this with approximation by a truncated Taylor series g(x) ≈ Σ_n g^(n)(0) x^n / n!, i.e. G(x) = Σ_n w_n g_n(x) with basis functions g_n(x) = x^n. Other basis functions are possible: sine and cosine (Fourier), orthogonal polynomials. How many neurons are needed? We start with single-layer (single-neuron) networks.
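
As an illustration of this kind of two-layer approximation, here is a minimal sketch, not the construction from the slides: the target function g, the number of hidden neurons, and the random placement of the sigmoids are all chosen arbitrarily, and only the linear output weights are fitted.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Target function to approximate on [0, 1] (illustrative choice)
def g(x):
    return np.sin(2 * np.pi * x)

rng = np.random.default_rng(0)
n_hidden = 30

# First layer: sigmoid neurons with random weights; thresholds chosen so the
# sigmoid transitions lie inside [0, 1]
W1 = rng.normal(scale=10.0, size=n_hidden)
centers = rng.uniform(0.0, 1.0, size=n_hidden)
b1 = -W1 * centers

x = np.linspace(0.0, 1.0, 200)
H = sigmoid(np.outer(x, W1) + b1)          # hidden activations, shape (200, n_hidden)

# Second layer: one linear neuron; fit its weights by least squares
w2, *_ = np.linalg.lstsq(H, g(x), rcond=None)
G = H @ w2                                  # network output

print("max |G(x) - g(x)| =", np.max(np.abs(G - g(x))))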

Single-layer networks
Single-layer networks compute a function from R^n to [0,1]^m, so it is sufficient to consider a single neuron. A single neuron computes a function f(w_0 + Σ_{1 ≤ j ≤ n} w_j x_j). If we assume x_0 = 1, it computes the function f(Σ_{0 ≤ j ≤ n} w_j x_j). Single-layer networks have limited capabilities.
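
A minimal sketch of this single-neuron computation with the bias absorbed as w_0 and x_0 = 1 (names and values are illustrative, not from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(w, x, f=sigmoid):
    # Prepend x_0 = 1 so the bias w_0 is handled like an ordinary weight
    x_ext = np.concatenate(([1.0], x))
    return f(np.dot(w, x_ext))

w = np.array([0.5, 1.0, -2.0])   # w_0 is the bias
x = np.array([0.3, 0.7])
print(neuron_output(w, x))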

Error function
Again the weights are extended with the bias w_0 and the inputs with a component x_0 = 1. We no longer use the prime notation. The factor ½ is for computational convenience.
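
The error function itself appears only on the slide image; the standard least-mean-square form that matches these remarks is:
E(w) = ½ Σ_q ( t^(q) - y^(q) )^2,
where y^(q) = f( Σ_j w_j x_j^(q) ) is the network output for training pair q and t^(q) is its target.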

Gradient Descent
Least mean square (LMS) error function.
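
The update formula on this slide is not in the transcript; the standard gradient-descent rule for the LMS error function above would be:
w := w - α ∇E(w), i.e. componentwise
w_i := w_i - α ∂E/∂w_i = w_i + α Σ_q ( t^(q) - y^(q) ) f'( net^(q) ) x_i^(q),
where net^(q) = Σ_j w_j x_j^(q) and α is the learning parameter.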

Update of Weight i by Training Pair q
Hence Δw is in the direction of x. Simple cases arise when f is the sigmoid or tanh. It is even simpler when f is the identity function f(z) = z, because then f'(z) = 1.
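
Written out (the exact notation on the slide is not in the transcript), the per-pair update these remarks and the pseudocode below refer to is:
Δw_i^(q) = α ( t^(q) - y^(q) ) f'( net^(q) ) x_i^(q),
i.e. a multiple of x_i^(q), which is why Δw points in the direction of x.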

Delta Rule Learning (incremental version, arbitrary transfer function)
In the lecture notes the vector manipulation is replaced by a loop:
for i := 0 to n do w_i := w_i + α (t - y) dy x_i
where dy denotes the derivative f'(net) of the transfer function at the current net input.
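
A runnable sketch of this incremental delta rule for a single sigmoid neuron (variable names, data, learning rate, and epoch count are illustrative, not taken from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_incremental(X, t, alpha=0.5, epochs=1000, seed=0):
    """Incremental delta rule for one sigmoid neuron."""
    rng = np.random.default_rng(seed)
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend x_0 = 1 for the bias w_0
    w = rng.normal(scale=0.1, size=X.shape[1])
    for _ in range(epochs):
        for x_q, t_q in zip(X, t):                  # one weight update per training pair
            y = sigmoid(np.dot(w, x_q))
            dy = y * (1.0 - y)                      # f'(net) for the sigmoid
            w += alpha * (t_q - y) * dy * x_q
    return w

# Example: learn the OR function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 1, 1, 1], dtype=float)
print(train_incremental(X, t))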

Stop criteria
- The mean square error becomes small enough.
- The mean square error does not decrease anymore, i.e. the gradient has become very small or even changes sign.
- The maximum number of iterations has been exceeded.
A training loop combining these criteria is sketched below.
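
A minimal sketch of how these stop criteria could be combined around an update step (the thresholds and the update/error functions are placeholders, not values from the slides):

def train_with_stop_criteria(update_step, error, eps=1e-4, delta=1e-8, max_iter=10_000):
    """update_step() performs one weight update; error() returns the current mean square error."""
    prev_err = error()
    for iteration in range(max_iter):          # criterion 3: maximum number of iterations
        update_step()
        err = error()
        if err < eps:                           # criterion 1: error small enough
            return "error small enough", iteration
        if prev_err - err < delta:              # criterion 2: error no longer decreases
            return "error no longer decreases", iteration
        prev_err = err
    return "maximum iterations exceeded", max_iter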

Remarks
- Delta rule learning is also called L(east) M(ean) S(quare) learning or Widrow-Hoff learning.
- Note that the incremental version of the delta rule is, strictly speaking, not a gradient descent algorithm, because in each step a different error function E^(q) is used.
- Convergence of the incremental version can only be guaranteed if the learning parameter α goes to 0 during learning.

Perceptron Learning Rule (batch version, arbitrary transfer function)

Perceptron Learning Delta Rule (batch version, sigmoidal transfer function)

Perceptron Learning Rule (batch version, linear transfer function)
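
The batch rules themselves appear only as formulas on the slide images. A sketch of the batch delta rule with a linear transfer function (so f'(z) = 1), under the same illustrative conventions as the incremental sketch above:

import numpy as np

def train_batch_linear(X, t, alpha=0.01, epochs=1000, seed=0):
    """Batch delta rule for one linear neuron: accumulate the update over all pairs, then apply it."""
    rng = np.random.default_rng(seed)
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend x_0 = 1 for the bias w_0
    w = rng.normal(scale=0.1, size=X.shape[1])
    for _ in range(epochs):
        y = X @ w                                   # linear transfer function, f'(z) = 1
        w += alpha * X.T @ (t - y)                  # sum of the per-pair updates
    return w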

Convergence of the batch version
For a small enough learning parameter the batch version of the delta rule always converges. The resulting weights, however, may correspond to a local minimum of the error function instead of the global minimum. For the linear neuron we will analyze this further.

Linear Neurons and Least Squares

Linear Neurons and Least Squares

C is non-singular
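
The derivation on these slides is in the images only; the standard least-squares reconstruction consistent with the surrounding remarks is:
For a linear neuron, E(w) = ½ Σ_q ( t^(q) - w · x^(q) )^2. Setting ∇E(w) = 0 gives the normal equations C w = b, with C = Σ_q x^(q) (x^(q))^T and b = Σ_q t^(q) x^(q). If C is non-singular, the unique minimizer is w* = C^{-1} b.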

Linear Least Squares Convergence

The gradient is a linear operator. Recall α' = P α. Inspect the batch version with X = <x(1), …, x(P)>.

Linear Least Squares Convergence

Find the line:

Solution:
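
The example's data points and its solution are shown only on the slide images. A sketch of how such a line fit works with linear least squares (the data below is made up for illustration):

import numpy as np

# Made-up data points; the actual points are on the slide image
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
t = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Design matrix with x_0 = 1, so w_0 is the intercept and w_1 the slope
X = np.column_stack([np.ones_like(x), x])

# Solve the normal equations C w = b (equivalently np.linalg.lstsq(X, t))
C = X.T @ X
b = X.T @ t
w = np.linalg.solve(C, b)
print("line: t =", w[0], "+", w[1], "* x")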