Artificial Neural Networks

Artificial Neural Networks
Dr. Lahouari Ghouti, Information & Computer Science Department

Single-Layer Perceptron (SLP)

Architecture
We consider the following architecture: a feed-forward neural network with one layer. It is sufficient to study single-layer perceptrons with just one neuron.

Perceptron: Neuron Model
The perceptron uses a nonlinear (McCulloch-Pitts) model of the neuron: the inputs x1, x2, ..., xm are multiplied by the weights w1, w2, ..., wm and summed together with a bias b to give z; the output is y = g(z), where g is the sign function:
g(z) = +1 if z >= 0
g(z) = -1 if z < 0
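A minimal sketch of this neuron model in Python/NumPy; the function name perceptron_output and the example weights are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def perceptron_output(x, w, b):
    """McCulloch-Pitts style neuron with a sign (hard-limiter) activation."""
    z = np.dot(w, x) + b          # weighted sum of the inputs plus the bias
    return 1 if z >= 0 else -1    # g(z): +1 if z >= 0, -1 otherwise

# Usage: two inputs, hypothetical weights and bias
print(perceptron_output(np.array([1.0, -1.0]), np.array([0.5, 0.5]), b=0.0))
```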

Perceptron: Applications
The perceptron is used for classification: it classifies a set of examples into one of two classes, C1 and C2. If the output of the perceptron is +1, the input is assigned to class C1; if the output is -1, the input is assigned to C2.

Perceptron: Classification
The equation w1x1 + w2x2 + b = 0 describes a hyperplane in the input space, and this hyperplane separates the two classes C1 and C2:
decision region for C1: w1x1 + w2x2 + b > 0
decision region for C2: w1x1 + w2x2 + b <= 0
decision boundary: w1x1 + w2x2 + b = 0

Perceptron: Limitations
The perceptron can only model linearly separable functions. It can be used to model the Boolean functions AND, OR, and COMPLEMENT, but it cannot model XOR. Why?

Perceptron: Limitations (Cont'd)
XOR is not a linearly separable problem: it is impossible to separate the classes C1 and C2 with only one line.

Perceptron: Learning Algorithm
Variables and parameters:
x(n) = input vector = [+1, x1(n), x2(n), ..., xm(n)]^T
w(n) = weight vector = [b(n), w1(n), w2(n), ..., wm(n)]^T
b(n) = bias
y(n) = actual response
d(n) = desired response
η = learning rate parameter (more on this later)

The Fixed-Increment Learning Algorithm
Initialization: set w(0) = 0.
Activation: activate the perceptron by applying the input example (vector x(n) and desired response d(n)).
Compute the actual response of the perceptron: y(n) = sgn[w^T(n)x(n)].
Adapt the weight vector: if d(n) and y(n) differ, then w(n + 1) = w(n) + η[d(n) - y(n)]x(n), where d(n) = +1 if x(n) ∈ C1 and d(n) = -1 if x(n) ∈ C2.
Continuation: increment the time index n by 1 and go back to the Activation step.
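A minimal sketch of this algorithm in Python/NumPy; the function name train_perceptron, the epoch cap, and the early-stopping check are illustrative additions and not part of the slide.

```python
import numpy as np

def train_perceptron(X, d, eta=1.0, epochs=100):
    """Fixed-increment perceptron learning.

    X : (N, m) array of input vectors (without the bias term)
    d : (N,) array of desired responses, +1 for C1 and -1 for C2
    Returns the augmented weight vector w = [b, w1, ..., wm].
    """
    N, m = X.shape
    Xa = np.hstack([np.ones((N, 1)), X])          # prepend +1 so w[0] acts as the bias b
    w = np.zeros(m + 1)                            # initialization: w(0) = 0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(Xa, d):
            y = 1 if np.dot(w, x) >= 0 else -1     # y(n) = sgn[w^T(n) x(n)]
            if y != target:
                w = w + eta * (target - y) * x     # w(n+1) = w(n) + eta[d(n) - y(n)] x(n)
                errors += 1
        if errors == 0:                            # all examples classified correctly
            break
    return w
```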

A Learning Example
Consider a training set C1 ∪ C2, where:
C1 = {(1,1), (1,-1), (0,-1)} are the elements of class +1
C2 = {(-1,-1), (-1,1), (0,1)} are the elements of class -1
Use the perceptron learning algorithm to classify these examples, with w(0) = [1, 0, 0]^T and η = 1.

A Learning Example (Cont'd)
Decision boundary: 2x1 - x2 = 0 (the figure shows this line separating the C1 points, marked +, from the C2 points, marked -).
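Running the train_perceptron sketch from above on this training set might look as follows. Note that the sketch starts from a zero weight vector rather than the slide's w(0) = [1, 0, 0]^T, and the weights reached depend on the presentation order, so this only illustrates the procedure rather than reproducing the slide's worked result.

```python
import numpy as np

# C1 = {(1,1), (1,-1), (0,-1)} labeled +1; C2 = {(-1,-1), (-1,1), (0,1)} labeled -1
X = np.array([[1, 1], [1, -1], [0, -1],
              [-1, -1], [-1, 1], [0, 1]], dtype=float)
d = np.array([1, 1, 1, -1, -1, -1])

w = train_perceptron(X, d, eta=1.0)
print("learned [b, w1, w2] =", w)   # some separating boundary; the slide reports 2*x1 - x2 = 0
```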

The Learning Algorithm: Convergence
Let X denote the set of training samples, X1 the subset belonging to class C1, and X2 the subset belonging to class C2. For a given sample n:
x(n) = [+1, x1(n), ..., xp(n)]^T = input vector
w(n) = [b(n), w1(n), ..., wp(n)]^T = weight vector
Net activity level: v(n) = w^T(n)x(n)
Output: y(n) = +1 if v(n) >= 0, and y(n) = -1 if v(n) < 0

The Learning Algorithm: Convergence (Cont'd)
The decision hyperplane separates classes C1 and C2. If the two classes C1 and C2 are linearly separable, then there exists a weight vector w such that
w^Tx >= 0 for all x belonging to class C1
w^Tx < 0 for all x belonging to class C2

Error-Correction Learning
Update rule: w(n + 1) = w(n) + Δw(n)
Learning process: if x(n) is correctly classified by w(n), then w(n + 1) = w(n). Otherwise, the weight vector is updated as follows:
w(n + 1) = w(n) - η(n)x(n) if w^T(n)x(n) >= 0 and x(n) belongs to C2
w(n + 1) = w(n) + η(n)x(n) if w^T(n)x(n) < 0 and x(n) belongs to C1

Perceptron Convergence Algorithm
Variables and parameters:
x(n) = [+1, x1(n), ..., xp(n)]; w(n) = [b(n), w1(n), ..., wp(n)]
y(n) = actual response (output); d(n) = desired response
η = learning rate, a positive number less than 1
Step 1: Initialization. Set w(0) = 0, then do the following for n = 1, 2, 3, ...
Step 2: Activation. Activate the perceptron by applying the input vector x(n) and the desired output d(n).

Perceptron Convergence Algorithm (Cont'd)
Step 3: Computation of actual response. y(n) = sgn[w^T(n)x(n)], where sgn(.) is the signum function.
Step 4: Adaptation of the weight vector. w(n+1) = w(n) + η[d(n) - y(n)]x(n), where d(n) = +1 if x(n) belongs to C1 and d(n) = -1 if x(n) belongs to C2.
Step 5: Increment n by 1 and go back to Step 2.

Learning: Performance Measure
A learning rule is designed to optimize a performance measure. However, in the development of the perceptron convergence algorithm we did not mention a performance measure. Intuitively, what would be an appropriate performance measure for a classification neural network? Define the performance measure: J = -E[e(n)v(n)].

Learning: Performance Measure
Or, as an instantaneous estimate: J'(n) = -e(n)v(n), where e(n) = d(n) - y(n) is the error at iteration n, v(n) is the linear combiner output at iteration n, and E[.] is the expectation operator.

Learning: Performance Measure (Cont'd)
Can we derive our learning rule by minimizing this performance function [Haykin's textbook]? Taking the gradient of the instantaneous estimate J'(n) = -e(n)v(n) with respect to w(n), and using v(n) = w^T(n)x(n), gives ∂J'(n)/∂w(n) = -e(n)x(n). Stepping in the direction of steepest descent then yields the learning rule w(n+1) = w(n) + η e(n)x(n), i.e. the error-correction update used above.

Presentation of Training Examples
Presenting all training examples once to the ANN is called an epoch. In incremental (stochastic) gradient descent, training examples can be presented in:
Fixed order (1, 2, 3, ..., M)
Randomly permuted order (5, 2, 7, ..., 3)
Completely random order (4, 1, 7, 1, 5, 4, ...)
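A small sketch of how these three presentation orders differ in code; the example count M = 8 is a hypothetical placeholder.

```python
import random

M = 8                                        # hypothetical number of training examples

# Fixed order: the same sequence every epoch
fixed_order = list(range(M))

# Randomly permuted order: every example exactly once per epoch, in shuffled order
permuted_order = random.sample(range(M), k=M)

# Completely random: indices drawn with replacement, so repeats can occur within an epoch
random_order = [random.randrange(M) for _ in range(M)]
```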

Concluding Remarks
A single-layer perceptron can perform pattern classification only on linearly separable patterns, regardless of the type of nonlinearity (hard limiter, sigmoidal). Minsky and Papert (1969) elucidated the limitations of Rosenblatt's single-layer perceptron (e.g., the requirement of linear separability and the inability to solve the XOR problem) and cast doubt on the viability of neural networks. However, the multilayer perceptron and the back-propagation algorithm overcome many of the shortcomings of the single-layer perceptron.

Adaline: Adaptive Linear Element
The output y is a linear combination of the inputs x: the neuron sums the weighted inputs, y = Σj wj xj, without a hard-limiting nonlinearity.

Adaline: Adaptive Linear Element (Cont'd)
Adaline uses a linear neuron model and the Least-Mean-Square (LMS) learning algorithm. The idea is to minimize the squared error, which is a function of the weights. We can find the minimum of the error function E by means of the steepest descent method (an optimization procedure).

Steepest Descent Method: Basics
Start with an arbitrary point, find the direction in which E is decreasing most rapidly, and make a small step in that direction.

Steepest Descent Method: Basics (Cont'd)
Each step moves the weights from (w1, w2) to (w1 + Δw1, w2 + Δw2) in the direction of steepest descent.

Steepest Descent Method: Basics (Cont'd)
Following the negative gradient may lead to a local minimum of E rather than the global minimum (the slide's figure contrasts a local minimum with the global minimum of the error surface).
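A minimal sketch of steepest descent in Python/NumPy on a simple quadratic error function; the function name, step size, and the example E are illustrative assumptions rather than anything from the slides.

```python
import numpy as np

def steepest_descent(grad, w0, eta=0.1, steps=100):
    """Repeatedly take a small step against the gradient of an error function E."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - eta * grad(w)      # small step in the direction of steepest descent
    return w

# Example: E(w) = (w1 - 3)^2 + (w2 + 1)^2 has its minimum at (3, -1)
grad_E = lambda w: 2 * (w - np.array([3.0, -1.0]))
print(steepest_descent(grad_E, w0=[0.0, 0.0]))   # approaches [3, -1]
```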

Least-Mean-Square Algorithm (Widrow-Hoff Algorithm)
Approximating the gradient of E using only the current example gives the estimate -e(n)x(n), so the update rule for the weights becomes ŵ(n+1) = ŵ(n) + η e(n)x(n).

Summary of the LMS Algorithm
Training sample: input signal vector x(n) and desired response d(n); user-selected parameter η > 0.
Initialization: set ŵ(1) = 0.
Computation: for n = 1, 2, ... compute
e(n) = d(n) - ŵ^T(n)x(n)
ŵ(n+1) = ŵ(n) + η x(n)e(n)
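A minimal sketch of this LMS (Widrow-Hoff) procedure for a linear neuron in Python/NumPy; the function name and parameters are illustrative, and the bias is omitted (it could be handled by prepending a +1 input, as in the perceptron sketch above).

```python
import numpy as np

def train_lms(X, d, eta=0.01, epochs=50):
    """Widrow-Hoff (LMS) learning for a linear neuron (Adaline)."""
    N, m = X.shape
    w = np.zeros(m)                       # initialization: w(1) = 0
    for _ in range(epochs):
        for x, target in zip(X, d):
            e = target - np.dot(w, x)     # e(n) = d(n) - w^T(n) x(n)
            w = w + eta * e * x           # w(n+1) = w(n) + eta * x(n) e(n)
    return w
```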

Neuron with Sigmoid Function
The inputs x1, ..., xm are weighted by w1, ..., wm and summed; the weighted sum is passed through a sigmoid activation to produce the output y.
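A small sketch of such a neuron in Python/NumPy. The slide does not spell out the sigmoid formula, so the standard logistic function 1/(1 + e^-z) is assumed here; the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid activation (assumed form)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_neuron(x, w):
    """Weighted sum of the inputs passed through the sigmoid."""
    return sigmoid(np.dot(w, x))

print(sigmoid_neuron(np.array([1.0, -1.0]), np.array([0.5, 0.25])))
```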

Multi-Layer Neural Networks
A multi-layer network stacks an input layer, a hidden layer, and an output layer.

Backpropagation Principle
Forward step: propagate activation from the input layer to the output layer. Backward step: propagate errors from the output layer back to the hidden layer.

Backpropagation Algorithm
Initialize each weight wi,j to some small random value.
Until the termination condition is met, do:
For each training example <(x1, ..., xn), t>, do:
Input the instance (x1, ..., xn) to the network and compute the network outputs yk.
For each output unit k: δk = yk(1 - yk)(tk - yk)
For each hidden unit h: δh = yh(1 - yh) Σk wh,k δk
For each network weight wi,j: wi,j = wi,j + Δwi,j, where Δwi,j = η δj xi,j
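A minimal sketch of one update of this algorithm for a single-hidden-layer sigmoid network in Python/NumPy. The function name backprop_step, the omission of bias weights, and the weight shapes are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W_hidden, W_output, eta=0.5):
    """One stochastic backpropagation update for a 1-hidden-layer sigmoid network.

    x        : (n_in,) input vector
    t        : (n_out,) target vector
    W_hidden : (n_hidden, n_in) weights into the hidden layer
    W_output : (n_out, n_hidden) weights into the output layer
    """
    # Forward step: propagate activation from input to output
    h = sigmoid(W_hidden @ x)                            # hidden activations y_h
    y = sigmoid(W_output @ h)                            # output activations y_k

    # Backward step: propagate errors from output to hidden layer
    delta_out = y * (1 - y) * (t - y)                    # delta_k = y_k(1 - y_k)(t_k - y_k)
    delta_hid = h * (1 - h) * (W_output.T @ delta_out)   # delta_h = y_h(1 - y_h) sum_k w_h,k delta_k

    # Weight updates: Delta w_i,j = eta * delta_j * x_i,j
    W_output += eta * np.outer(delta_out, h)
    W_hidden += eta * np.outer(delta_hid, x)
    return W_hidden, W_output

# Usage sketch: small random initial weights, one training example
rng = np.random.default_rng(0)
W_h = rng.uniform(-0.05, 0.05, size=(3, 2))
W_o = rng.uniform(-0.05, 0.05, size=(1, 3))
W_h, W_o = backprop_step(np.array([1.0, 0.0]), np.array([1.0]), W_h, W_o)
```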

Backpropagation Algorithm (Cont'd)
Gradient descent over the entire network weight vector; easily generalized to arbitrary directed graphs.
Will find a local, not necessarily global, error minimum; in practice it often works well (and can be run multiple times with different initial weights).
Often includes a weight momentum term: Δwi,j(n) = η δj xi,j + α Δwi,j(n-1).
Minimizes the error over the training examples; will it generalize well to unseen instances (over-fitting)?
Training can be slow, typically 1,000-10,000 iterations (Levenberg-Marquardt can be used instead of gradient descent); using the network after training is fast.

Convergence of Backpropagation
Gradient descent reaches some local minimum, perhaps not the global minimum. Remedies include adding a momentum term, Δwki(n) = α δk(n) xi(n) + λ Δwki(n-1) with λ ∈ [0,1], using stochastic gradient descent, and training multiple nets with different initial weights.
Nature of convergence: weights are initialized near zero, so the initial network is near-linear; increasingly non-linear functions become possible as training progresses.
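A small sketch of a momentum update in Python/NumPy, written in terms of a generic gradient rather than the δk(n)xi(n) product on the slide; the function name and parameter values are illustrative assumptions.

```python
import numpy as np

def momentum_update(w, grad_w, prev_delta, eta=0.1, lam=0.9):
    """Gradient step with momentum: Delta w(n) = -eta * grad + lam * Delta w(n-1)."""
    delta = -eta * grad_w + lam * prev_delta   # blend the current step with the previous one
    return w + delta, delta                    # return new weights and Delta w(n) for the next call
```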

Optimization Methods
There are other optimization methods with faster convergence than gradient descent:
Newton's method, which uses a quadratic approximation (second-order Taylor expansion): F(x + Δx) ≈ F(x) + ∇F(x)^T Δx + ½ Δx^T ∇²F(x) Δx + ...
Conjugate gradients
The Levenberg-Marquardt algorithm
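A minimal sketch of a single Newton step in Python/NumPy, using a hypothetical quadratic function to show that Newton's method minimizes the local quadratic model exactly; none of these names or values come from the slides.

```python
import numpy as np

def newton_step(x, grad, hess):
    """One Newton step: minimize the local quadratic model of F around x."""
    return x - np.linalg.solve(hess(x), grad(x))

# Example: F(x) = x1^2 + 10*x2^2 is exactly quadratic, so one step reaches the minimum at 0
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 20.0]])
print(newton_step(np.array([3.0, -1.0]), grad, hess))   # -> [0. 0.]
```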

Universal Approximation Property of ANN
Boolean functions: every Boolean function can be represented by a network with a single hidden layer, but this might require a number of hidden units that is exponential in the number of inputs.
Continuous functions: every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer [Cybenko 1989, Hornik 1989]. Any function can be approximated to arbitrary accuracy by a network with two hidden layers [Cybenko 1988].

Using Weight Derivatives
How often should we update: after each training case, or after a full sweep through the training data?
How much should we update: use a fixed learning rate, adapt the learning rate, add momentum, or avoid steepest descent altogether?

What Next?
Bias effect; batch vs. continuous learning; variable learning rate (update rule?); effect of neurons per layer; effect of hidden layers.