Before we start ADALINE

Before we start ADALINE: test the response of your Hebb net and Perceptron to the following noisy version of the pattern (Exercise 2.6(d), p. 98).

ADALINE (ADAptive LInear NEuron)
An ADALINE typically uses bipolar (1, -1) activations for its input signals and its target output.
The weights are adjustable, and the unit has a bias whose activation is always 1.
[Figure: architecture of an ADALINE — input units X1, ..., Xn and a bias unit feed the output unit Y through weights w1, ..., wn and bias b.]

ADALINE
In general, an ADALINE can be trained using the delta rule, also known as the least mean squares (LMS) or Widrow-Hoff rule.
The delta rule can also be used for single-layer nets with several output units; an ADALINE is the special case with only one output unit.

ADALINE
The activation of the unit is its net input, i.e. the activation function is the identity function.
The learning rule minimizes the mean squared error between the activation and the target value.
This allows the net to continue learning on all training patterns, even after the correct output value has already been generated for some of them.

ADALINE
After training, if the net is being used for pattern classification in which the desired output is either +1 or -1, a threshold function is applied to the net input to obtain the activation:
If net input ≥ 0, then activation = 1; else activation = -1.

The Algorithm
Step 0. Initialize all weights and the bias (small random values are usually used). Set the learning rate α (0 < α ≤ 1).
Step 1. While the stopping condition is false, do Steps 2-6.
Step 2. For each bipolar training pair s:t, do Steps 3-5.
Step 3. Set activations of the input units, i = 1, ..., n: xi = si.
Step 4. Compute the net input to the output unit: y_in = b + Σi xi wi.

The Algorithm
Step 5. Update the weights and bias, i = 1, ..., n:
wi(new) = wi(old) + α (t – y_in) xi
b(new) = b(old) + α (t – y_in)
(If t = y_in, the weights and bias are left unchanged.)
Step 6. Test the stopping condition: if the largest weight change that occurred in Step 2 is smaller than a specified tolerance, then stop; otherwise continue.
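As a rough illustration, here is a minimal sketch of this training loop in plain Python; the function name, tolerance, and default learning rate are my own choices, not part of the slides.

# ADALINE training with the delta (LMS) rule — a sketch, assuming bipolar data.
def train_adaline(x_data, t_data, alpha=0.1, tol=1e-4, max_epochs=1000):
    n = len(x_data[0])
    w = [0.0] * n          # Step 0: small random values are also common
    b = 0.0
    for _ in range(max_epochs):                     # Step 1: repeat until stopping condition
        largest_change = 0.0
        for x, t in zip(x_data, t_data):            # Step 2: one pass over the training pairs
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))   # Step 4: net input
            for i in range(n):                      # Step 5: delta rule update
                delta = alpha * (t - y_in) * x[i]
                w[i] += delta
                largest_change = max(largest_change, abs(delta))
            b_delta = alpha * (t - y_in)
            b += b_delta
            largest_change = max(largest_change, abs(b_delta))
        if largest_change < tol:                    # Step 6: stopping condition
            break
    return w, b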

Setting the learning rate α
It is common to take a small value, e.g. α = 0.1, initially.
If α is too large, the learning process will not converge; if α is too small, learning will be extremely slow.
For a single neuron, a practical range is 0.1 ≤ n α ≤ 1.0, where n is the number of input units.

Application
After training, an ADALINE unit can be used to classify input patterns. If the target values are bivalent (binary or bipolar), a step function can be applied as the activation function for the output unit.
Step 0. Initialize the weights with the values obtained from training.
Step 1. For each bipolar input vector x, do Steps 2-4.
Step 2. Set the activations of the input units to x.
Step 3. Compute the net input to the output unit: y_in = b + Σi xi wi.
Step 4. Apply the activation function: f(y_in) = 1 if y_in ≥ 0; -1 if y_in < 0.
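A corresponding sketch of the application phase, reusing the w and b returned by the hypothetical train_adaline above:

# Classify one input vector with a trained ADALINE (bipolar step activation).
def classify_adaline(x, w, b):
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))   # Steps 2-3: net input
    return 1 if y_in >= 0 else -1                      # Step 4: threshold activation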

Example 1
ADALINE for the AND function: binary inputs, bipolar targets.
(x1 x2 t): (1 1 1), (1 0 -1), (0 1 -1), (0 0 -1)
The delta rule in ADALINE is designed to find weights that minimize the total error
E = Σp=1..4 (x1(p) w1 + x2(p) w2 + w0 – t(p))²,
where x1(p) w1 + x2(p) w2 + w0 is the net input to the output unit for pattern p and t(p) is the associated target for pattern p.

Example 1
ADALINE for the AND function: binary inputs, bipolar targets.
The weights that minimize this error are w1 = 1, w2 = 1, w0 = -3/2, giving the separating line x1 + x2 – 3/2 = 0.
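Since E is an ordinary sum of squares, these weights can be checked numerically, for example with NumPy's least-squares solver; this snippet is a sketch added here for illustration, not part of the original slides.

import numpy as np

# Example 1: binary inputs, bipolar targets for AND; columns are x1, x2, bias.
X = np.array([[1, 1, 1],
              [1, 0, 1],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
t = np.array([1, -1, -1, -1], dtype=float)

w, *_ = np.linalg.lstsq(X, t, rcond=None)
print(w)   # approximately [ 1.   1.  -1.5]  ->  w1 = 1, w2 = 1, w0 = -3/2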

Example 2
ADALINE for the AND function: bipolar inputs, bipolar targets.
(x1 x2 t): (1 1 1), (1 -1 -1), (-1 1 -1), (-1 -1 -1)
The delta rule in ADALINE is designed to find weights that minimize the total error
E = Σp=1..4 (x1(p) w1 + x2(p) w2 + w0 – t(p))²,
where x1(p) w1 + x2(p) w2 + w0 is the net input to the output unit for pattern p and t(p) is the associated target for pattern p.

Example 2
ADALINE for the AND function: bipolar inputs, bipolar targets.
The weights that minimize this error are w1 = 1/2, w2 = 1/2, w0 = -1/2, giving the separating line (1/2) x1 + (1/2) x2 – 1/2 = 0.
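With bipolar inputs the three columns of the design matrix (x1, x2, bias) are mutually orthogonal, each with squared length 4, so the least-squares weights can be read off directly as wj = (columnj · t) / 4. A quick numerical check (a sketch, not from the slides):

import numpy as np

X = np.array([[ 1,  1, 1],
              [ 1, -1, 1],
              [-1,  1, 1],
              [-1, -1, 1]], dtype=float)
t = np.array([1, -1, -1, -1], dtype=float)
print(X.T @ t / 4)   # [ 0.5  0.5 -0.5]  ->  w1 = 1/2, w2 = 1/2, w0 = -1/2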

Examples
Example 3: ADALINE for the AND NOT function: bipolar inputs, bipolar targets.
Example 4: ADALINE for the OR function: bipolar inputs, bipolar targets.
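These two exercises can be tried with the hypothetical train_adaline sketch given earlier; the target vectors below assume the usual bipolar encoding and are my own illustration.

patterns   = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
t_and_not  = [-1, 1, -1, -1]   # x1 AND (NOT x2)
t_or       = [ 1, 1,  1, -1]   # x1 OR x2
w, b = train_adaline(patterns, t_and_not, alpha=0.1)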

Derivations: delta rule for a single output unit
The delta rule changes the weights of the connections so as to minimize the difference between the net input to the output unit, y_in, and the target value t.
It does this by reducing the error for each pattern, one pattern at a time.
The delta rule for the I-th weight (for each pattern) is ΔwI = α (t – y_in) xI.

Derivations
The squared error for a particular training pattern is E = (t – y_in)².
E is a function of all the weights wi, i = 1, ..., n.
The gradient of E is the vector consisting of the partial derivatives of E with respect to each of the weights.
The gradient gives the direction of most rapid increase in E; the opposite direction gives the most rapid decrease in the error.
The error can therefore be reduced by adjusting the weight wI in the direction of -∂E/∂wI.

Derivations
Since y_in = Σi xi wi,
∂E/∂wI = -2 (t – y_in) ∂y_in/∂wI = -2 (t – y_in) xI.
The local error will therefore be reduced most rapidly by adjusting the weights according to the delta rule ΔwI = α (t – y_in) xI.
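The gradient formula can be sanity-checked with a finite difference; the following small Python sketch uses arbitrary test values of my own choosing.

# Check that dE/dw_I = -2 (t - y_in) x_I for E = (t - y_in)^2, y_in = b + sum_i x_i w_i.
def error(w, b, x, t):
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    return (t - y_in) ** 2

x, t, b = [1.0, -1.0], 1.0, 0.2
w = [0.3, -0.1]
I, eps = 0, 1e-6

w_plus = list(w); w_plus[I] += eps
numeric = (error(w_plus, b, x, t) - error(w, b, x, t)) / eps   # forward difference
y_in = b + sum(xi * wi for xi, wi in zip(x, w))
analytic = -2 * (t - y_in) * x[I]
print(numeric, analytic)   # the two values agree to about 1e-5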

Derivations: delta rule for several output units
The delta rule for the weight from input I to output J (for each pattern) is ΔwIJ = α (tJ – y_inJ) xI.

Derivations
The squared error for a particular training pattern is E = Σj=1..m (tj – y_inj)².
E is a function of all the weights wij.
The error can be reduced by adjusting the weight wIJ in the direction of -∂E/∂wIJ.
Since only the J-th term of the sum depends on wIJ,
∂E/∂wIJ = ∂/∂wIJ Σj=1..m (tj – y_inj)² = ∂/∂wIJ (tJ – y_inJ)² = -2 (tJ – y_inJ) xI
(continued on p. 88).

Exercise
Adaline Network Simulator: http://www.neural-networks-at-your-fingertips.com/adaline.html

MADALINE (Many ADAptive LInear NEurons)
[Figure: architecture of a MADALINE with two hidden ADALINEs (Z1, Z2) and one output ADALINE (Y); the inputs and a bias feed Z1 and Z2 through weights w11, w12, w21, ... and biases b1, b2, and Z1, Z2 feed Y through weights v1, v2 and bias b3.]

MADALINE
The derivation of the delta rule for several output units shows that the training process is essentially unchanged when several ADALINEs are combined.
The outputs of the two hidden ADALINEs, z1 and z2, are determined by the signals they receive from the input units X1 and X2.
Each output signal is the result of applying a threshold function to the unit's net input.
Thus y is a non-linear function of the input vector (x1, x2).
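A minimal sketch of this forward computation for the two-hidden-unit MADALINE; the function and dictionary key names are my own.

def bipolar_step(x):
    return 1 if x >= 0 else -1

def madaline_forward(x1, x2, w, v):
    # w holds the hidden-unit weights/biases, v the output-unit weights/bias.
    z_in1 = w["b1"] + x1 * w["w11"] + x2 * w["w21"]
    z_in2 = w["b2"] + x1 * w["w12"] + x2 * w["w22"]
    z1, z2 = bipolar_step(z_in1), bipolar_step(z_in2)
    y_in = v["b3"] + z1 * v["v1"] + z2 * v["v2"]
    return bipolar_step(y_in), (z_in1, z_in2, z1, z2)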

MADALINE
Why do we need hidden units? The use of the hidden units Z1 and Z2 gives the net computational capabilities not found in single-layer nets, but it also complicates the training process.
Two algorithms:
MRI – only the weights for the hidden ADALINEs are adjusted; the weights for the output unit are fixed.
MRII – provides a method for adjusting all weights in the net.

Algorithm: MRI
Set v1 = 1/2, v2 = 1/2 and b3 = 1/2.
The weights v1 and v2 and the bias b3 that feed into the output unit Y are fixed so that the response of Y is 1 if the signal it receives from either Z1 or Z2 (or both) is 1, and is -1 if both Z1 and Z2 send a signal of -1.
In other words, the unit Y performs the logic function OR on the signals it receives from Z1 and Z2 (see Example 2.19, the OR function).
[Figure: MADALINE with inputs X1, X2, hidden units Z1, Z2 (weights w11, w21, w12, w22 and biases b1, b2) and output unit Y (weights v1, v2 and bias b3).]

Algorithm: MRI
Set v1 = 1/2, v2 = 1/2 and b3 = 1/2, and set α = 0.5.
Training pairs (the XOR function):
x1 x2 t
 1  1 -1
 1 -1  1
-1  1  1
-1 -1 -1
Initial weights:
into Z1: w11 = .05, w21 = .2, b1 = .3
into Z2: w12 = .1, w22 = .2, b2 = .15
into Y:  v1 = .5, v2 = .5, b3 = .5

Step 0. Initialize weights: v1 = v2 = b3 = 1/2 as above; the weights into the hidden units are set to small random values (as in the table above). Set the learning rate α (0 < α ≤ 1).
Step 1. While the stopping condition is false, do Steps 2-8.
Step 2. For each bipolar training pair s:t, do Steps 3-7.
Step 3. Set activations of the input units: xi = si.
Step 4. Compute the net input to each hidden ADALINE unit:
z_in1 = b1 + x1 w11 + x2 w21
z_in2 = b2 + x1 w12 + x2 w22
Step 5. Determine the output of each hidden ADALINE: z1 = f(z_in1), z2 = f(z_in2), where f(x) = 1 if x ≥ 0 and -1 if x < 0.
Step 6. Determine the output of the net: y = f(y_in), where y_in = b3 + z1 v1 + z2 v2.
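As a check of Steps 4-6 only, assuming the initial weights and the first training pair x = (1, 1), t = -1 from the table above:
z_in1 = 0.3 + 0.05 + 0.2 = 0.55 and z_in2 = 0.15 + 0.1 + 0.2 = 0.45, so z1 = z2 = 1;
then y_in = 0.5 + 0.5 + 0.5 = 1.5, so y = 1 ≠ t, and the weights would be updated in Step 7.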

The Algorithm
Step 7. Update weights and biases if an error occurred for this pattern.
If t = y, no weight updates are performed; otherwise:
If t = 1, then update the weights on ZJ, the unit whose net input is closest to 0:
wiJ(new) = wiJ(old) + α (1 – z_inJ) xi
bJ(new) = bJ(old) + α (1 – z_inJ)
If t = -1, then update the weights on all units Zk that have positive net input:
wik(new) = wik(old) + α (-1 – z_ink) xi
bk(new) = bk(old) + α (-1 – z_ink)
Step 8. Test the stopping condition: if weight changes have stopped (or reached an acceptable level), or if a specified maximum number of weight-update iterations (Step 2) has been performed, then stop; otherwise continue.
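A rough sketch of this Step 7 update in Python, reusing the hypothetical madaline_forward above; the default α follows the example, while the names and dictionary layout are mine.

def mri_update(x1, x2, t, w, alpha=0.5):
    # Forward pass with the fixed OR-like output weights v1 = v2 = b3 = 1/2.
    y, (z_in1, z_in2, z1, z2) = madaline_forward(x1, x2, w, {"v1": 0.5, "v2": 0.5, "b3": 0.5})
    if t == y:
        return                                   # no weight updates for this pattern
    if t == 1:
        # Update only the hidden unit whose net input is closest to 0.
        j, z_in = (1, z_in1) if abs(z_in1) < abs(z_in2) else (2, z_in2)
        for i, xi in ((1, x1), (2, x2)):
            w[f"w{i}{j}"] += alpha * (1 - z_in) * xi
        w[f"b{j}"] += alpha * (1 - z_in)
    else:
        # t = -1: update all hidden units with positive net input.
        for j, z_in in ((1, z_in1), (2, z_in2)):
            if z_in > 0:
                for i, xi in ((1, x1), (2, x2)):
                    w[f"w{i}{j}"] += alpha * (-1 - z_in) * xi
                w[f"b{j}"] += alpha * (-1 - z_in)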