Computational Intelligence Winter Term 2011/12 Prof. Dr. Günter Rudolph Lehrstuhl für Algorithm Engineering (LS 11) Fakultät für Informatik TU Dortmund

Plan for Today:
- Single-Layer Perceptron
  - Accelerated Learning
  - Online- vs. Batch-Learning
- Multi-Layer Perceptron
  - Model
  - Backpropagation

Single-Layer Perceptron (SLP): Acceleration of Perceptron Learning

Assumption: x ∈ {0,1}^n  ⇒  ||x|| ≥ 1 for all x ≠ (0, ..., 0).

If the classification is incorrect, then w'x < 0. Consequently, the size of the error is δ = –w'x > 0.

⇒ w_{t+1} = w_t + (δ + ε)·x for some small ε > 0 corrects the error in a single step, since

w_{t+1}'x = (w_t + (δ + ε)·x)'x
          = w_t'x + (δ + ε)·x'x
          = –δ + δ·||x||² + ε·||x||²
          = δ·(||x||² – 1) + ε·||x||²  > 0.
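
The single-step correction is easy to check numerically. Below is a minimal NumPy sketch; the weight vector, the pattern and the value of ε are made-up illustration values, and the function name is mine, not from the slides.

```python
import numpy as np

def accelerated_update(w, x, eps=0.1):
    """If w'x < 0, the error is delta = -w'x and the update
    w <- w + (delta + eps) * x repairs the pattern in one step
    (assuming ||x|| >= 1, as for binary inputs x != 0)."""
    err = float(w @ x)
    if err >= 0:
        return w                      # already classified correctly
    delta = -err                      # size of the error, delta > 0
    return w + (delta + eps) * x      # single-step correction

w = np.array([0.5, -1.0, 0.2])
x = np.array([1.0, 1.0, 0.0])         # binary pattern, ||x||^2 = 2 >= 1
print("before:", w @ x)               # negative -> misclassified
w = accelerated_update(w, x)
print("after: ", w @ x)               # now positive
```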

Single-Layer Perceptron (SLP): Generalization

Assumption: x ∈ R^n  ⇒  ||x|| > 0 for all x ≠ (0, ..., 0).

As before: w_{t+1} = w_t + (δ + ε)·x for some small ε > 0 and δ = –w_t'x > 0. Then

w_{t+1}'x = δ·(||x||² – 1) + ε·||x||²,

where the first term δ·(||x||² – 1) < 0 is possible (if ||x|| < 1), although ε·||x||² > 0.

Idea: scaling the data does not alter the classification task!

Let ℓ = min { ||x|| : x ∈ B } > 0 and set x̂ = x / ℓ  ⇒  set of scaled examples B̂.

⇒ ||x̂|| ≥ 1  ⇒  ||x̂||² – 1 ≥ 0  ⇒  w_{t+1}'x̂ > 0.
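
As a small illustration of the rescaling trick, the sketch below divides every pattern by the smallest norm occurring in B; the toy data and the helper name are mine.

```python
import numpy as np

def rescale_examples(B):
    """Divide every pattern by l = min{ ||x|| : x in B } > 0, so that every
    scaled pattern satisfies ||x_hat|| >= 1 and the single-step correction
    argument applies again. Scaling does not change which hyperplanes
    through the origin separate the data."""
    l = np.linalg.norm(B, axis=1).min()
    return B / l

B = np.array([[0.2, 0.1],
              [1.5, -0.5],
              [0.3, 0.4]])
B_hat = rescale_examples(B)
print(np.linalg.norm(B_hat, axis=1))   # every norm is now >= 1
```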

Single-Layer Perceptron (SLP)

There exist numerous variants of perceptron learning methods.

Theorem (Duda & Hart 1973):
If the rule for correcting the weights is w_{t+1} = w_t + γ_t·x (if w_t'x < 0), where the step sizes satisfy
1. ∀ t ≥ 0: γ_t ≥ 0,
2. Σ_{t=0}^∞ γ_t → ∞,
3. (Σ_{t=0}^T γ_t²) / (Σ_{t=0}^T γ_t)² → 0,
then w_t → w* for t → ∞ with ∀ x: x'w* > 0.

E.g.: γ_t = γ > 0 or γ_t = γ / (t+1) for γ > 0.
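
A plain training loop with the decaying step size γ_t = γ/(t+1) from the example might look as follows. This is a sketch with toy data; the slides do not prescribe this exact code, and the loop tests ≤ 0 instead of < 0 so that the all-zero starting vector also triggers corrections.

```python
import numpy as np

def perceptron_train(B, gamma=1.0, max_epochs=100):
    """Perceptron learning with step sizes gamma_t = gamma / (t + 1).
    B contains only 'positive' examples (negatives already multiplied
    by -1), so the goal is w'x > 0 for every x in B."""
    w = np.zeros(B.shape[1])
    t = 0
    for _ in range(max_epochs):
        errors = 0
        for x in B:
            if w @ x <= 0:                     # misclassified (or undecided)
                w = w + gamma / (t + 1) * x    # corrective step
                errors += 1
            t += 1
        if errors == 0:                        # all patterns on the right side
            break
    return w

B = np.array([[1.0, 2.0], [2.0, 1.0], [1.5, 0.5]])   # toy, linearly separable
w = perceptron_train(B)
print(w, bool((B @ w > 0).all()))
```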

Single-Layer Perceptron (SLP): Online vs. Batch Learning

As yet: Online Learning. Update of the weights after each training pattern (if necessary).

Now: Batch Learning. Update of the weights only after testing all training patterns.

Update rule (γ > 0):  w_{t+1} = w_t + γ · Σ_{x ∈ B : w_t'x < 0} x

Vague assessment in the literature:
- advantage: usually faster
- disadvantage: needs more memory (though this is just a single additional vector!)
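
The difference between the two modes is only where the correction is applied. A sketch with toy data; the helper names are mine.

```python
import numpy as np

def online_epoch(w, B, gamma=0.1):
    """Online learning: update immediately after each misclassified pattern."""
    for x in B:
        if w @ x < 0:
            w = w + gamma * x
    return w

def batch_epoch(w, B, gamma=0.1):
    """Batch learning: accumulate the corrections of all misclassified
    patterns and apply them once after the sweep; the extra memory is
    just this single accumulator vector."""
    correction = np.zeros_like(w)
    for x in B:
        if w @ x < 0:
            correction += x
    return w + gamma * correction

B = np.array([[1.0, 2.0], [2.0, 1.0], [0.5, -0.2]])
w0 = np.array([-1.0, 0.5])
print(online_epoch(w0.copy(), B))
print(batch_epoch(w0.copy(), B))
```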

Single-Layer Perceptron (SLP): Finding weights by means of optimization

Let F(w) = { x ∈ B : w'x < 0 } be the set of patterns incorrectly classified by weight vector w.

Objective function:  f(w) = – Σ_{x ∈ F(w)} w'x  →  min!

Optimum: f(w) = 0 iff F(w) is empty.

Possible approach: gradient method  w_{t+1} = w_t – γ·∇f(w_t)  (γ > 0), which converges to a local minimum (depending on w_0).

Single-Layer Perceptron (SLP): Gradient method

w_{t+1} = w_t – γ·∇f(w_t)

Gradient:  ∇f(w) = ( ∂f(w)/∂w_1, ..., ∂f(w)/∂w_n )'

The gradient points in the direction of steepest ascent of the function f(·).

Caution: the indices i of w_i here denote components of the vector w; they are not iteration counters!

Single-Layer Perceptron (SLP): Gradient method

For f(w) = – Σ_{x ∈ F(w)} w'x the gradient is  ∇f(w) = – Σ_{x ∈ F(w)} x.

Thus:  w_{t+1} = w_t – γ·∇f(w_t) = w_t + γ · Σ_{x ∈ F(w_t)} x,

i.e. the gradient method coincides with batch learning.
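
A quick numerical check that one gradient step on f coincides with one batch-learning step; the toy data and function names are mine.

```python
import numpy as np

def grad_f(w, B):
    """Gradient of f(w) = -sum_{x in F(w)} w'x, i.e. minus the sum of the
    misclassified patterns."""
    F = [x for x in B if w @ x < 0]
    return -sum(F, np.zeros_like(w))

B = np.array([[1.0, 2.0], [2.0, 1.0], [0.5, -0.2]])
w = np.array([-1.0, 0.5])
gamma = 0.1

w_gradient = w - gamma * grad_f(w, B)                # gradient step on f
w_batch = w + gamma * sum((x for x in B if w @ x < 0),
                          np.zeros_like(w))          # batch perceptron step
print(np.allclose(w_gradient, w_batch))              # True
```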

Single-Layer Perceptron (SLP)

How difficult is it
(a) to find a separating hyperplane, provided it exists?
(b) to decide that there is no separating hyperplane?

Let B = P ∪ { –x : x ∈ N } (so that B contains only positive examples), w_i ∈ R, θ ∈ R, |B| = m.

For every example x_i ∈ B the following should hold:

x_{i1}·w_1 + x_{i2}·w_2 + ... + x_{in}·w_n ≥ θ

The trivial solution w_i = θ = 0 must be excluded! Therefore, additionally with ε ∈ R:

x_{i1}·w_1 + x_{i2}·w_2 + ... + x_{in}·w_n – θ – ε ≥ 0

Idea: maximize ε. If the maximum ε* > 0, then a solution has been found.

Single-Layer Perceptron (SLP): Matrix notation

Linear programming problem:

f(z_1, z_2, ..., z_n, z_{n+1}, z_{n+2}) = z_{n+2}  →  max!

s.t.  A·z ≥ 0,

which can be solved, e.g., by Karmarkar's algorithm in polynomial time.

If z_{n+2} = ε > 0, then the weights and the threshold are given by z. Otherwise a separating hyperplane does not exist!
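
A sketch of this LP check using scipy.optimize.linprog (not part of the slides). It folds the threshold into the weight vector via a constant –1 input, caps ε at 1 to keep the LP bounded (any positive ε already certifies separability), and the data and function name are made up.

```python
import numpy as np
from scipy.optimize import linprog

def separating_hyperplane(P, N):
    """Decide linear separability by maximizing eps subject to x'w >= eps
    for all (augmented, sign-flipped) examples; returns (w, theta) or None."""
    P = np.hstack([P, -np.ones((len(P), 1))])     # bias input -1 encodes theta
    N = np.hstack([N, -np.ones((len(N), 1))])
    B = np.vstack([P, -N])                        # only 'positive' examples
    m, d = B.shape
    A_ub = np.hstack([-B, np.ones((m, 1))])       # -B w + eps <= 0
    b_ub = np.zeros(m)
    c = np.zeros(d + 1)
    c[-1] = -1.0                                  # linprog minimizes, so use -eps
    bounds = [(None, None)] * d + [(None, 1.0)]   # cap eps, otherwise unbounded
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w, theta, eps = res.x[:d - 1], res.x[d - 1], res.x[d]
    return (w, theta) if eps > 1e-9 else None     # eps* <= 0: not separable

P = np.array([[2.0, 2.0], [3.0, 1.0]])            # toy positive examples
N = np.array([[0.0, 0.0], [1.0, 0.0]])            # toy negative examples
print(separating_hyperplane(P, N))
```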

Multi-Layer Perceptron (MLP): What can be achieved by adding a layer?

- Single-layer perceptron (SLP): a hyperplane separates the space into two subspaces.
- Two-layer perceptron: arbitrary convex sets can be separated; the hyperplanes of the first layer are connected by an AND gate in the 2nd layer.
- Three-layer perceptron: arbitrary sets can be separated (depending on the number of neurons); several convex sets are representable by the 2nd layer, and these sets can be combined in the 3rd layer, where the convex sets of the 2nd layer are connected by an OR gate.

⇒ more than 3 layers are not necessary!

Multi-Layer Perceptron (MLP): XOR with 3 neurons in 2 steps

[Figure: a network of three threshold neurons computing XOR, with two neurons y_1, y_2 in the first step and one neuron z in the second step, together with the truth table for x_1, x_2, y_1, y_2, z. The region accepted after the first step is a convex set.]

Multi-Layer Perceptron (MLP): XOR with 3 neurons in 2 layers (without an AND gate in the 2nd layer)

[Figure: layered network with inputs x_1, x_2, hidden neurons y_1, y_2 and output neuron z, together with the truth table. The hidden neurons realize the half-planes x_1 – x_2 ≥ 1 and x_2 – x_1 ≥ 1, i.e. x_2 ≤ x_1 – 1 or x_2 ≥ x_1 + 1.]

Multi-Layer Perceptron (MLP): XOR can be realized with only 2 neurons!

[Figure: two threshold neurons with inputs x_1, x_2; the output y of the first neuron is fed with weight –2 into the second neuron, which also receives x_1 and x_2 directly; the table lists x_1, x_2, y, –2y + x_1 + x_2 and z.]

BUT: this is not a layered network (no MLP)!
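
The exact numbers on the slide are not fully recoverable, but one weight assignment consistent with it (an AND neuron fed back into the output neuron with weight –2) can be checked directly. The following sketch is illustrative, not a transcription of the slide.

```python
def step(s):
    """Threshold neuron: fires (1) iff its net input is >= 0."""
    return 1 if s >= 0 else 0

def xor_two_neurons(x1, x2):
    """XOR with only two threshold neurons; not a layered net, because the
    inputs also feed the second neuron directly, alongside y."""
    y = step(x1 + x2 - 2)            # neuron 1: fires only for x1 = x2 = 1 (AND)
    z = step(x1 + x2 - 2 * y - 1)    # neuron 2: inputs x1, x2 and -2*y, threshold 1
    return z

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_two_neurons(x1, x2))   # prints the XOR table
```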

Multi-Layer Perceptron (MLP)

Evidently: MLPs can be deployed for significantly more difficult problems than SLPs!

But: How can we adjust all these weights and thresholds? Is there an efficient learning algorithm for MLPs?

History: The unavailability of an efficient learning algorithm for MLPs was a major obstacle until Rumelhart, Hinton and Williams (1986) introduced backpropagation. It had actually been proposed by Werbos (1974) already, but remained unknown to ANN researchers (it was a PhD thesis).

Multi-Layer Perceptron (MLP): Quantification of the classification error of an MLP

Total Sum Squared Error (TSSE):

TSSE = Σ_{x ∈ B} Σ_k ( z_k(w; x) – z_k*(x) )²,

where z_k(w; x) is the output of the net for weights w and input x, and z_k*(x) is the target output of the net for input x.

Total Mean Squared Error (TMSE):

TMSE = TSSE / (#training patterns · #output neurons) = const. · TSSE

⇒ leads to the same solution as the TSSE.
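
Both error measures in code, with made-up toy numbers; only the constant factor differs.

```python
import numpy as np

def tsse(outputs, targets):
    """Total Sum Squared Error over all training patterns and output neurons."""
    return float(np.sum((outputs - targets) ** 2))

def tmse(outputs, targets):
    """Total Mean Squared Error = TSSE / (#patterns * #output neurons);
    a constant multiple of the TSSE, hence the same minimizers."""
    return tsse(outputs, targets) / outputs.size

# rows = training patterns, columns = output neurons (made-up values)
outputs = np.array([[0.9, 0.1], [0.2, 0.8]])
targets = np.array([[1.0, 0.0], [0.0, 1.0]])
print(tsse(outputs, targets), tmse(outputs, targets))   # 0.1  0.025
```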

Multi-Layer Perceptron (MLP): Learning algorithms for the multi-layer perceptron (here: 2 layers)

[Figure: 2-layer network with inputs x_1, x_2, ..., x_n, first-layer weights w_11, ..., w_nm and second-layer weights u_11, ...]

Idea: minimize the error!  f(w_t, u_t) = TSSE → min!

Gradient method:
u_{t+1} = u_t – γ·∇_u f(w_t, u_t)
w_{t+1} = w_t – γ·∇_w f(w_t, u_t)

BUT: f(w, u) cannot be differentiated! Why? Because of the discontinuous activation function a(·) in the neurons (a step from 0 to 1 at the threshold).

Idea: find a smooth activation function similar to the original step function!

Multi-Layer Perceptron (MLP): Learning algorithms for the multi-layer perceptron (here: 2 layers)

Good idea: sigmoid activation function (instead of the signum/step function):
- monotone increasing
- differentiable
- non-linear
- output ∈ [0, 1] instead of ∈ {0, 1}
- threshold θ integrated in the activation function

e.g. the logistic function a(t) = 1 / (1 + e^{–(t – θ)}), whose derivative a'(t) = a(t)·(1 – a(t)) is directly determinable from the function values.
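
The logistic sigmoid and its derivative-from-function-values property, as a small sketch (threshold omitted here for brevity):

```python
import numpy as np

def sigmoid(t):
    """Logistic activation a(t) = 1 / (1 + exp(-t)): monotone increasing,
    differentiable, non-linear, output in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-t))

def sigmoid_prime_from_value(a_t):
    """a'(t) = a(t) * (1 - a(t)): the derivative needs only the already
    computed function value, not a second evaluation of exp."""
    return a_t * (1.0 - a_t)

t = np.linspace(-4.0, 4.0, 5)
a = sigmoid(t)
print(a)
print(sigmoid_prime_from_value(a))
```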

Multi-Layer Perceptron (MLP): Learning algorithms for the multi-layer perceptron (here: 2 layers)

Gradient method:  f(w_t, u_t) = TSSE,
u_{t+1} = u_t – γ·∇_u f(w_t, u_t)
w_{t+1} = w_t – γ·∇_w f(w_t, u_t)

[Figure: 2-layer network with inputs x_1, ..., x_I, first-layer weights w_11, ..., hidden values y_1, ..., y_J, second-layer weights u_11, ..., and outputs z_1, ..., z_K.]

Notation:
- x_i : inputs
- y_j : values after the first layer, y_j = h(·)
- z_k : values after the second layer, z_k = a(·)

Multi-Layer Perceptron (MLP)

Output of neuron j after the 1st layer:  y_j = h( Σ_i x_i·w_ij – θ_j )

Output of neuron k after the 2nd layer:  z_k = a( Σ_j y_j·u_jk – θ_k )

Error of the net for input x: the deviation of the output z(w, u; x) of the net from the target output z*(x) for input x.
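
A forward pass in this notation might look as follows; this is a sketch, and the weight shapes, names and random toy values are mine.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def forward(x, W, theta_h, U, theta_o):
    """2-layer perceptron with sigmoid activations:
    y_j = h(sum_i x_i w_ij - theta_j),  z_k = a(sum_j y_j u_jk - theta_k)."""
    y = sigmoid(x @ W - theta_h)      # values after the first (hidden) layer
    z = sigmoid(y @ U - theta_o)      # values after the second (output) layer
    return y, z

rng = np.random.default_rng(0)
I, J, K = 3, 4, 2                     # inputs, hidden neurons, output neurons
W = rng.normal(size=(I, J))
U = rng.normal(size=(J, K))
theta_h = np.zeros(J)
theta_o = np.zeros(K)
x = np.array([1.0, 0.0, -1.0])
y, z = forward(x, W, theta_h, U, theta_o)
print(y, z)
```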

Multi-Layer Perceptron (MLP)

Error for input x and target output z*:

e(w, u; x) = Σ_k ( z_k – z_k* )²

Total error for all training patterns (x, z*) ∈ B:

E(w, u; B) = Σ_{(x, z*) ∈ B} e(w, u; x)   (= TSSE)

Multi-Layer Perceptron (MLP)

Gradient of the total error, i.e. the vector of partial derivatives w.r.t. the weights u_jk and w_ij:

∇E(w, u; B) = Σ_{(x, z*) ∈ B} ∇e(w, u; x)

thus:

∂E/∂u_jk = Σ_{(x, z*) ∈ B} ∂e/∂u_jk   and   ∂E/∂w_ij = Σ_{(x, z*) ∈ B} ∂e/∂w_ij

Multi-Layer Perceptron (MLP)

Assume:  a(t) = (1 + e^{–t})^{–1}  ⇒  a'(t) = a(t)·(1 – a(t)),

and likewise for the activation function h of the first layer:  h'(t) = h(t)·(1 – h(t)).

Chain rule of differential calculus:

( f(g(x)) )' = f'(g(x)) · g'(x)   (outer derivative · inner derivative)

Multi-Layer Perceptron (MLP)

Partial derivative w.r.t. u_jk:

∂e/∂u_jk = ∂/∂u_jk Σ_l ( z_l – z_l* )²
         = 2·(z_k – z_k*) · a'(net_k) · y_j
         = δ_k · y_j

with net_k = Σ_j y_j·u_jk – θ_k and the error signal δ_k = 2·(z_k – z_k*)·a'(net_k) of output neuron k.

Multi-Layer Perceptron (MLP)

Partial derivative w.r.t. w_ij (chain rule through the second layer):

∂e/∂w_ij = Σ_k 2·(z_k – z_k*)·a'(net_k)·u_jk · h'(net_j) · x_i
         = ( Σ_k δ_k·u_jk ) · h'(net_j) · x_i     (factors reordered)
         = δ_j · x_i

with the error signals δ_k from the layer handled previously (the output layer) and the error signal δ_j = h'(net_j)·Σ_k δ_k·u_jk of neuron j in the current layer.
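
Putting the two partial derivatives together for one training pattern gives the following gradient sketch (thresholds omitted, the factor 2 absorbed into the learning rate; the names and toy values are mine):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def backprop_gradients(x, z_star, W, U):
    """Error signals and weight gradients of a 2-layer sigmoid MLP for a
    single pattern x with target z_star, following the chain-rule derivation:
    delta_k for the output layer, delta_j propagated back to the hidden layer."""
    y = sigmoid(x @ W)                            # hidden outputs
    z = sigmoid(y @ U)                            # net outputs
    delta_out = (z - z_star) * z * (1 - z)        # error signals, output layer
    delta_hid = (U @ delta_out) * y * (1 - y)     # error signals, hidden layer
    grad_U = np.outer(y, delta_out)               # d e / d u_jk ~ delta_k * y_j
    grad_W = np.outer(x, delta_hid)               # d e / d w_ij ~ delta_j * x_i
    return grad_W, grad_U

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))
U = rng.normal(size=(4, 2))
x = np.array([1.0, 0.5, -1.0])
z_star = np.array([1.0, 0.0])
gW, gU = backprop_gradients(x, z_star, W, U)
print(gW.shape, gU.shape)                         # (3, 4) (4, 2)
```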

Multi-Layer Perceptron (MLP): Generalization (> 2 layers)

Let the neural network have L layers S_1, S_2, ..., S_L, and let the neurons of all layers be numbered from 1 to N. All weights w_ij are gathered in the weight matrix W. Let o_j be the output of neuron j, and write j ∈ S_m if neuron j is in the m-th layer.

Error signal of neuron j:
- if j is an output neuron, δ_j is obtained from the deviation of o_j from its target value (as on the previous slides);
- if j is in an inner layer, δ_j is obtained from the error signals δ_k of the subsequent layer: δ_j = (derivative of the activation of neuron j) · Σ_k δ_k·w_jk, summed over the successors k of j.

Correction:  Δw_ij = –γ·δ_j·o_i.

In case of online learning: the correction is applied after each test pattern presented.

Multi-Layer Perceptron (MLP)

The error signal of a neuron in an inner layer is determined by the error signals of all neurons of the subsequent layer and the weights of the associated connections.

⇒ First determine the error signals of the output neurons, use these to calculate the error signals of the preceding layer, and so forth, until the first inner layer is reached.

⇒ Thus, the error is propagated backwards from the output layer to the first inner layer
⇒ backpropagation (of error).

Multi-Layer Perceptron (MLP)

⇒ Other optimization algorithms are deployable as well! In addition to backpropagation (gradient descent) there are, among others:

- Backpropagation with momentum: also take into account the previous change of the weights.
- QuickProp: assumes that the error function can be approximated locally by a quadratic function; the update rule uses the weights of the last two steps t – 1 and t – 2.
- Resilient Propagation (RPROP): exploits only the sign of the partial derivatives. The same sign twice in a row (negative or positive) ⇒ increase the step size; a change of sign ⇒ retract the last step and decrease the step size. Typical values: factor for decreasing 0.5, factor for increasing 1.2.
- Evolutionary algorithms: an individual = a weight matrix.

More about this later!
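
As an example of one of these variants, a minimal RPROP step with the factors 0.5 and 1.2 from the slide might look like this. It is a sketch: this simplified variant skips the move after a sign change instead of retracting the previous one, and the surrounding toy problem and all names are mine.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
               step_max=50.0, step_min=1e-6):
    """One RPROP update: a per-weight step size grows by eta_plus while the
    sign of the partial derivative stays the same, shrinks by eta_minus when
    it flips; only the sign of the gradient determines the direction."""
    same_sign = grad * prev_grad > 0
    flipped = grad * prev_grad < 0
    step = np.where(same_sign, np.minimum(step * eta_plus, step_max), step)
    step = np.where(flipped, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(flipped, 0.0, grad)       # after a sign change: skip this move
    w = w - np.sign(grad) * step
    return w, grad, step

# toy quadratic error surface f(w) = w1^2 + w2^2, gradient = 2 * w
w = np.array([3.0, -2.0])
step = np.full_like(w, 0.1)
prev_grad = np.zeros_like(w)
for _ in range(30):
    w, prev_grad, step = rprop_step(w, 2 * w, prev_grad, step)
print(w)                                      # close to the minimum at (0, 0)
```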