Computational Intelligence


Computational Intelligence Winter Term 2012/13 Prof. Dr. Günter Rudolph Lehrstuhl für Algorithm Engineering (LS 11) Fakultät für Informatik TU Dortmund

Plan for Today
- Single-Layer Perceptron: Accelerated Learning, Online vs. Batch Learning
- Multi-Layer Perceptron: Model, Backpropagation

Single-Layer Perceptron (SLP): Acceleration of Perceptron Learning

Assumption: x ∈ {0,1}^n ⇒ ||x|| ≥ 1 for all x ≠ (0, ..., 0)'.
If the classification is incorrect, then w'x < 0. Consequently, the size of the error is δ = -w'x > 0.
⇒ w_{t+1} = w_t + (δ + ε) x for (small) ε > 0 corrects the error in a single step, since
w'_{t+1} x = (w_t + (δ + ε) x)' x = w'_t x + (δ + ε) x'x = -δ + δ ||x||² + ε ||x||² = δ (||x||² - 1) + ε ||x||² > 0,
where the first term is ≥ 0 (because ||x|| ≥ 1) and the second term is > 0.
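A minimal sketch of this corrective step, assuming the examples are already sign-normalized so that w'x > 0 is the goal; the function name, the eps value and the toy data are illustrative, not from the slides.

```python
import numpy as np

def accelerated_correction(w, x, eps=0.1):
    """One accelerated learning step for a sign-normalized example x:
    if x is misclassified (w'x < 0), add (delta + eps) * x so that
    w'x becomes positive in a single step (requires ||x|| >= 1)."""
    delta = -np.dot(w, x)            # size of the error, > 0 iff misclassified
    if delta > 0:
        w = w + (delta + eps) * x    # corrects the error at once
    return w

# toy example with x in {0,1}^n, hence ||x|| >= 1 for x != 0
w = np.array([0.5, -1.0, 0.2])
x = np.array([1.0, 1.0, 0.0])
print(np.dot(w, x))                  # -0.5: misclassified
w = accelerated_correction(w, x)
print(np.dot(w, x))                  #  0.7: corrected in one step
```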

Single-Layer Perceptron (SLP)

Generalization:
Assumption: x ∈ R^n ⇒ ||x|| > 0 for all x ≠ (0, ..., 0)'.
As before: w_{t+1} = w_t + (δ + ε) x for (small) ε > 0 and δ = -w'_t x > 0, giving
w'_{t+1} x = δ (||x||² - 1) + ε ||x||², where the second term is > 0 but the first term can now be < 0 (if ||x|| < 1), so a single correction step may fail.
Idea: scaling of the data does not alter the classification task!
Let c = min { ||x|| : x ∈ B } > 0 and set x̂ = x / c ⇒ set of scaled examples B̂ ⇒ ||x̂|| ≥ 1 ⇒ ||x̂||² - 1 ≥ 0 ⇒ w'_{t+1} x̂ > 0.

Single-Layer Perceptron (SLP)

There exist numerous variants of perceptron learning methods.
Theorem (Duda & Hart 1973): If the rule for correcting the weights is w_{t+1} = w_t + γ_t x (if w'_t x < 0), where the learning rates satisfy ∀ t ≥ 0: γ_t ≥ 0 (together with the usual divergence conditions on the series of the γ_t), then w_t → w* for t → ∞ with x'w* > 0 for all x. ■
e.g.: γ_t = γ > 0 or γ_t = γ / (t+1) for γ > 0.
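A sketch of the plain perceptron loop with the two example learning-rate schedules γ_t = γ and γ_t = γ / (t+1); the function name, the stopping criterion and the toy data are assumptions for illustration.

```python
import numpy as np

def perceptron(B, gamma=1.0, schedule="constant", max_epochs=100):
    """Perceptron learning on an array B of sign-normalized examples
    (goal: w'x > 0 for every row x).  gamma_t is either the constant
    gamma or gamma / (t+1), the two schedules mentioned above."""
    w = np.zeros(B.shape[1])
    t = 0
    for _ in range(max_epochs):
        errors = 0
        for x in B:
            if np.dot(w, x) <= 0:        # <= 0 so the zero start vector gets corrected too
                g = gamma if schedule == "constant" else gamma / (t + 1)
                w = w + g * x
                errors += 1
            t += 1
        if errors == 0:                  # all patterns on the positive side
            break
    return w

# linearly separable toy data (already multiplied by their class sign)
B = np.array([[1.0, 2.0], [2.0, 1.0], [1.5, 1.5]])
print(perceptron(B, gamma=0.5, schedule="decaying"))
```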

Single-Layer Perceptron (SLP)

as yet: Online Learning → update of the weights after each training pattern (if necessary)
now: Batch Learning → update of the weights only after all training patterns have been tested
Update rule: w_{t+1} = w_t + η Σ_{x ∈ B : w'_t x < 0} x   (η > 0)
vague assessment in the literature:
advantage: "usually faster"
disadvantage: "needs more memory", although this is just a single additional vector!
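A small sketch contrasting the two modes; the single extra vector of the batch rule is the accumulator s below. Function names and the learning rate eta are illustrative.

```python
import numpy as np

def online_epoch(w, B, eta=0.1):
    """Online learning: update immediately after each misclassified pattern."""
    for x in B:
        if np.dot(w, x) < 0:
            w = w + eta * x
    return w

def batch_epoch(w, B, eta=0.1):
    """Batch learning: accumulate the sum of all misclassified patterns
    (one extra vector of memory) and update once per epoch."""
    s = np.zeros_like(w)
    for x in B:
        if np.dot(w, x) < 0:
            s += x
    return w + eta * s
```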

Single-Layer Perceptron (SLP)

Find the weights by means of optimization:
Let F(w) = { x ∈ B : w'x < 0 } be the set of patterns incorrectly classified by weight vector w.
Objective function: f(w) = - Σ_{x ∈ F(w)} w'x → min!
Optimum: f(w) = 0 iff F(w) is empty.
Possible approach: the gradient method w_{t+1} = w_t - η ∇f(w_t) (η > 0), which converges to a local minimum (depending on w_0).
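A sketch of this objective and its gradient, under the assumption that f and ∇f are evaluated exactly as defined above; helper names are illustrative. Note how the gradient step reproduces the batch rule (see the next slides).

```python
import numpy as np

def f_and_grad(w, B):
    """Perceptron criterion f(w) = -sum_{x in F(w)} w'x over the
    misclassified set F(w) = {x in B : w'x < 0}, and its gradient
    grad f(w) = -sum_{x in F(w)} x."""
    F = [x for x in B if np.dot(w, x) < 0]
    f = -sum(np.dot(w, x) for x in F)
    grad = -np.sum(F, axis=0) if F else np.zeros_like(w)
    return f, grad

def gradient_step(w, B, eta=0.1):
    """One step w_{t+1} = w_t - eta * grad f(w_t); this equals the batch
    update w_t + eta * (sum of misclassified x)."""
    _, grad = f_and_grad(w, B)
    return w - eta * grad
```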

Single-Layer Perceptron (SLP): Gradient method

w_{t+1} = w_t - η ∇f(w_t)
Gradient: ∇f(w) = (∂f/∂w_1, ..., ∂f/∂w_n)'. The gradient points in the direction of steepest ascent of the function f(·), so the negative gradient is followed.
Caution: the indices i of w_i here denote the components of the vector w; they are not iteration counters!

Single-Layer Perceptron (SLP): Gradient method

thus: ∇f(w) = - Σ_{x ∈ F(w)} x, so that the gradient step becomes w_{t+1} = w_t - η ∇f(w_t) = w_t + η Σ_{x ∈ F(w_t)} x.
⇒ gradient method ≡ batch learning.

Single-Layer Perceptron (SLP)

How difficult is it
- to find a separating hyperplane, provided it exists?
- to decide that no separating hyperplane exists?
Let B = P ∪ { -x : x ∈ N } (only positive examples), w_i ∈ R, θ ∈ R, |B| = m.
For every example x_i ∈ B it should hold that x_{i1} w_1 + x_{i2} w_2 + ... + x_{in} w_n ≥ θ → the trivial solution w_i = θ = 0 must be excluded!
Therefore additionally: Δ ∈ R and x_{i1} w_1 + x_{i2} w_2 + ... + x_{in} w_n - θ - Δ ≥ 0.
Idea: maximize Δ → if Δ* > 0, then a solution has been found.

Single-Layer Perceptron (SLP)

Matrix notation (with z = (w_1, ..., w_n, θ, Δ)'): Linear Programming problem
f(z_1, z_2, ..., z_n, z_{n+1}, z_{n+2}) = z_{n+2} → max!
s.t. Az ≥ 0,
which can be solved, e.g., by Karmarkar's algorithm in polynomial time.
If z_{n+2} = Δ > 0, then the weights and the threshold are given by z. Otherwise a separating hyperplane does not exist!
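A sketch of this LP using scipy.optimize.linprog as a stand-in solver (not Karmarkar's algorithm). The slide leaves the matrix A implicit, so the constraint construction and the box bounds on z are my assumptions; without some normalization the LP is unbounded whenever the data is separable.

```python
import numpy as np
from scipy.optimize import linprog

def separating_hyperplane(B):
    """Try to find w, theta with x'w - theta >= Delta for all rows x of B
    and Delta as large as possible.  Variable vector z = (w_1..w_n, theta, Delta)."""
    m, n = B.shape
    # x_i'w - theta - Delta >= 0   <=>   -x_i'w + theta + Delta <= 0
    A_ub = np.hstack([-B, np.ones((m, 1)), np.ones((m, 1))])
    b_ub = np.zeros(m)
    c = np.zeros(n + 2)
    c[-1] = -1.0                                   # maximize Delta == minimize -Delta
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(-1, 1)] * (n + 2))      # normalization (assumption)
    if not res.success:
        return None
    w, theta, delta = res.x[:n], res.x[n], res.x[n + 1]
    if delta > 1e-9:
        return w, theta                            # separating hyperplane found
    return None                                    # no separating hyperplane exists
```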

Multi-Layer Perceptron (MLP)

What can be achieved by adding a layer?
- Single-layer perceptron (SLP): a hyperplane separates the space into two subspaces (the positive examples P on one side, the negative examples N on the other).
- Two-layer perceptron: arbitrary convex sets can be separated; the half-spaces of the first layer are combined by an AND gate in the 2nd layer.
- Three-layer perceptron: arbitrary sets can be separated (depending on the number of neurons); several convex sets are representable by the 2nd layer, and these convex sets are combined by an OR gate in the 3rd layer.
⇒ more than 3 layers are not necessary!

Multi-Layer Perceptron (MLP)

XOR with 3 neurons in 2 steps
[network diagram and truth table for x1, x2, y1, y2, z: two threshold neurons y1, y2 in the first step describe a convex set, which the output neuron z (threshold 2) extracts in the second step]
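The concrete weights are only partly recoverable from the transcript, so the following sketch shows one standard way to realize XOR with three threshold neurons in two steps (OR and NAND in the first step, AND with threshold 2 in the second); it matches the convex-set idea of the slide, but the exact weights are an assumption.

```python
def step(s, theta):
    """Threshold neuron: fires iff the weighted input sum reaches theta."""
    return 1 if s >= theta else 0

def xor_3_neurons(x1, x2):
    """XOR from three threshold neurons in two steps:
    y1 = OR(x1, x2), y2 = NAND(x1, x2), z = AND(y1, y2)."""
    y1 = step(x1 + x2, 1)        # x1 + x2 >= 1
    y2 = step(-x1 - x2, -1)      # x1 + x2 <= 1
    z = step(y1 + y2, 2)         # both first-step neurons must fire
    return z

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_3_neurons(x1, x2))
```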

Multi-Layer Perceptron (MLP)

XOR with 3 neurons in 2 layers (without an AND gate in the 2nd layer):
y1 = 1 iff x1 - x2 ≥ 1 (i.e. x2 ≤ x1 - 1), y2 = 1 iff x2 - x1 ≥ 1 (i.e. x2 ≥ x1 + 1), and the output neuron fires iff y1 + y2 ≥ 1, which yields z = XOR(x1, x2).
[network diagram and truth table for x1, x2, y1, y2, z]

Multi-Layer Perceptron (MLP)

XOR can be realized with only 2 neurons!
y = 1 iff x1 + x2 ≥ 2, and z = 1 iff x1 - 2y + x2 ≥ 1; the inputs feed both neurons, and y is connected to z with weight -2.
[truth table with columns x1, x2, y, -2y, x1 - 2y + x2, z]
BUT: this is not a layered network (not an MLP)!
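A small sketch of this two-neuron construction as reconstructed above; the step helper is illustrative.

```python
def step(s, theta):
    return 1 if s >= theta else 0

def xor_2_neurons(x1, x2):
    """XOR with only two threshold neurons: y = AND(x1, x2), while the
    output neuron z also receives the raw inputs x1, x2 plus y with
    weight -2 (a shortcut connection, hence not a layered network)."""
    y = step(x1 + x2, 2)             # y = 1 iff x1 + x2 >= 2
    z = step(x1 - 2 * y + x2, 1)     # z = 1 iff x1 - 2y + x2 >= 1
    return z

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_2_neurons(x1, x2))
```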

Multi-Layer Perceptron (MLP)

Evidently, MLPs can be deployed for significantly more difficult problems than SLPs!
But: How can we adjust all these weights and thresholds? Is there an efficient learning algorithm for MLPs?
History: the unavailability of an efficient learning algorithm for MLPs acted as a brake on the field ... until Rumelhart, Hinton and Williams (1986): Backpropagation. It was actually proposed by Werbos (1974), but remained unknown to ANN researchers (it was a PhD thesis).

Multi-Layer Perceptron (MLP)

Quantification of the classification error of an MLP:
Total Sum Squared Error (TSSE): the sum, over all training patterns, of the squared differences between the output of the net for weights w and input x and the target output of the net for input x.
Total Mean Squared Error (TMSE): TSSE / (# training patterns · # output neurons) ⇒ the constant factor leads to the same solution as the TSSE.
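The explicit formula is an image on the slide; the following sketch assumes the usual sum-of-squares form and represents the training set as a list of (input, target) pairs. Names are illustrative.

```python
import numpy as np

def tsse(net, B):
    """Total Sum Squared Error over all (input, target) pairs in B,
    where net(x) is the network output for input x."""
    return sum(np.sum((net(x) - z_star) ** 2) for x, z_star in B)

def tmse(net, B, num_outputs):
    """Total Mean Squared Error: TSSE divided by the constant
    (#training patterns * #output neurons); same minimizer as TSSE."""
    return tsse(net, B) / (len(B) * num_outputs)
```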

Multi-Layer Perceptron (MLP)

Learning algorithms for the Multi-Layer Perceptron (here: 2 layers)
idea: minimize the error!
[network diagram: inputs x_1, x_2, ..., x_n, first-layer weights w_11, ..., w_nm to neurons 1, 2, ..., m, second-layer weights u_11, ...]
f(w_t, u_t) = TSSE → min!
Gradient method: u_{t+1} = u_t - η ∇_u f(w_t, u_t), w_{t+1} = w_t - η ∇_w f(w_t, u_t)
BUT: f(w, u) cannot be differentiated! Why? → The discontinuous (step) activation function a(·) in the neurons!
idea: find a smooth activation function similar to the original one!

Multi-Layer Perceptron (MLP)

Learning algorithms for the Multi-Layer Perceptron (here: 2 layers)
good idea: sigmoid activation function (instead of the signum function):
- monotone increasing
- differentiable
- non-linear
- output ∈ [0, 1] instead of ∈ {0, 1}
- threshold θ integrated into the activation function
e.g. the logistic function a(x) = 1 / (1 + e^{-(x - θ)})
- the values of the derivative can be determined directly from the function values, e.g. a'(x) = a(x) (1 - a(x))
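A minimal sketch of such a sigmoid and its derivative computed from the function value; the parameter names are illustrative.

```python
import numpy as np

def sigmoid(x, theta=0.0):
    """Logistic activation with the threshold theta shifted into the function."""
    return 1.0 / (1.0 + np.exp(-(x - theta)))

def sigmoid_prime(x, theta=0.0):
    """Derivative obtained directly from the function value:
    a'(x) = a(x) * (1 - a(x))."""
    a = sigmoid(x, theta)
    return a * (1.0 - a)
```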

Multi-Layer Perceptron (MLP)

Learning algorithms for the Multi-Layer Perceptron (here: 2 layers)
[network diagram: inputs x_1, ..., x_I; first layer with weights w_ij and outputs y_1, ..., y_J; second layer with weights u_jk and outputs z_1, ..., z_K]
x_i: inputs; y_j = h(·): values after the first layer; z_k = a(·): values after the second layer
Gradient method: f(w_t, u_t) = TSSE, u_{t+1} = u_t - η ∇_u f(w_t, u_t), w_{t+1} = w_t - η ∇_w f(w_t, u_t)
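A sketch of the forward pass of this 2-layer network, assuming sigmoid activations for both h and a and storing the weights as matrices W (I x J) and U (J x K); shapes and names are illustrative.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def forward(x, W, U):
    """Forward pass of the 2-layer MLP sketched above."""
    y = sigmoid(W.T @ x)      # y_j = h(sum_i w_ij * x_i), values after the first layer
    z = sigmoid(U.T @ y)      # z_k = a(sum_j u_jk * y_j), values after the second layer
    return y, z

# toy usage with random weights (illustrative shapes only)
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # I = 3 inputs, J = 4 hidden neurons
U = rng.normal(size=(4, 2))   # K = 2 output neurons
y, z = forward(np.array([1.0, 0.0, 1.0]), W, U)
print(y.shape, z.shape)
```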

Multi-Layer Perceptron (MLP)

output of neuron j after the 1st layer: y_j(x) = h( Σ_i w_ij x_i )
output of neuron k after the 2nd layer: z_k(x) = a( Σ_j u_jk y_j )
error of input x: the deviation of z(x), the output of the net, from z*(x), the target output for input x.

Multi-Layer Perceptron (MLP)

error for input x and target output z*: e(x) = Σ_k ( z_k(x) - z*_k )²
total error for all training patterns (x, z*) ∈ B: E(w, u) = Σ_{(x, z*) ∈ B} e(x)   (TSSE)

Multi-Layer Perceptron (MLP)

gradient of the total error: the vector of the partial derivatives w.r.t. the weights u_jk and w_ij
thus: ∂E / ∂u_jk = Σ_{(x, z*) ∈ B} ∂e(x) / ∂u_jk and ∂E / ∂w_ij = Σ_{(x, z*) ∈ B} ∂e(x) / ∂w_ij

Multi-Layer Perceptron (MLP)

assume: a(·) and h(·) are sigmoid (logistic) activation functions ⇒ a'(s) = a(s) (1 - a(s)) and h'(s) = h(s) (1 - h(s))
chain rule of differential calculus: (f(g(x)))' = f'(g(x)) · g'(x), i.e. outer derivative times inner derivative.

Multi-Layer Perceptron (MLP)

partial derivative w.r.t. u_jk:
∂e(x) / ∂u_jk = 2 ( z_k - z*_k ) a'( Σ_j u_jk y_j ) y_j = δ_k y_j, with the "error signal" δ_k = 2 ( z_k - z*_k ) a'( Σ_j u_jk y_j ).

Multi-Layer Perceptron (MLP)

partial derivative w.r.t. w_ij (factors reordered):
∂e(x) / ∂w_ij = Σ_k 2 ( z_k - z*_k ) a'( Σ_j u_jk y_j ) u_jk · h'( Σ_i w_ij x_i ) x_i = ( Σ_k δ_k u_jk ) h'( Σ_i w_ij x_i ) x_i = δ_j x_i,
with the error signal δ_k from the previous (output) layer and the error signal δ_j = ( Σ_k δ_k u_jk ) h'( Σ_i w_ij x_i ) from the "current" layer.
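The derivation formulas on the slides are images, so the following sketch implements the error signals in the conventions reconstructed above (sigmoid h and a, e = Σ_k (z_k - z*_k)², factor 2 kept explicit); function and variable names are illustrative.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def backprop_single_pattern(x, z_star, W, U, eta=0.1):
    """One gradient step for a single pattern (x, z_star) in the 2-layer MLP."""
    # forward pass
    net_y = W.T @ x
    y = sigmoid(net_y)                             # first-layer outputs
    net_z = U.T @ y
    z = sigmoid(net_z)                             # second-layer outputs
    # error signals
    delta_k = 2.0 * (z - z_star) * z * (1.0 - z)   # output layer, a' = a(1-a)
    delta_j = (U @ delta_k) * y * (1.0 - y)        # hidden layer,  h' = h(1-h)
    # partial derivatives: de/du_jk = delta_k * y_j,  de/dw_ij = delta_j * x_i
    grad_U = np.outer(y, delta_k)
    grad_W = np.outer(x, delta_j)
    # gradient descent update
    return W - eta * grad_W, U - eta * grad_U

# toy usage with random weights (illustrative shapes only)
rng = np.random.default_rng(1)
W, U = rng.normal(size=(3, 4)), rng.normal(size=(4, 2))
W, U = backprop_single_pattern(np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0]), W, U)
```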

Multi-Layer Perceptron (MLP)

Generalization (> 2 layers):
Let the neural network have L layers S_1, S_2, ..., S_L; j ∈ S_m means that neuron j is in the m-th layer. Let the neurons of all layers be numbered from 1 to N, let all weights w_ij be gathered in the weight matrix W, and let o_j be the output of neuron j.
error signal (in the notation of the previous slides, with net_j = Σ_i w_ij o_i):
δ_j = 2 ( o_j - z*_j ) a'(net_j) for an output neuron j ∈ S_L,
δ_j = ( Σ_{k ∈ S_{m+1}} w_jk δ_k ) a'(net_j) for an inner neuron j ∈ S_m, m < L.
correction: Δw_ij = -η δ_j o_i
in the case of online learning: the correction is applied after each presented training pattern.
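A sketch of this generalization for an arbitrary number of layers, assuming sigmoid activations everywhere, no separate thresholds, and the weights given as a list of matrices; all names and the online-update structure are illustrative.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def backprop_layers(x, z_star, Ws, eta=0.1):
    """Online backpropagation for a network with L layers, given as a list
    Ws = [W_1, ..., W_L]; layer m maps the outputs o of layer m-1 to
    sigmoid(W_m.T @ o)."""
    # forward pass, remembering the output o of every layer (outs[0] = input x)
    outs = [x]
    for W in Ws:
        outs.append(sigmoid(W.T @ outs[-1]))
    # error signal of the output layer
    o_L = outs[-1]
    delta = 2.0 * (o_L - z_star) * o_L * (1.0 - o_L)
    # propagate the error signals backwards and update the weights
    new_Ws = list(Ws)
    for m in range(len(Ws) - 1, -1, -1):
        o_prev = outs[m]
        new_Ws[m] = Ws[m] - eta * np.outer(o_prev, delta)
        if m > 0:
            delta = (Ws[m] @ delta) * o_prev * (1.0 - o_prev)
    return new_Ws
```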

Multi-Layer Perceptron (MLP)

The error signal of a neuron in an inner layer is determined by the error signals of all neurons of the subsequent layer and the weights of the associated connections.
⇒ First determine the error signals of the output neurons, use these to calculate the error signals of the preceding layer, and so forth, until the first inner layer is reached.
⇒ Thus the error is propagated backwards from the output layer to the first inner layer ⇒ backpropagation (of error).

Multi-Layer Perceptron (MLP)

⇒ other optimization algorithms are deployable as well!
In addition to backpropagation (gradient descent):
- Backpropagation with momentum: also takes the previous change of the weights into account.
- QuickProp: assumes that the error function can be approximated locally by a quadratic function; the update rule uses the last two weight settings at steps t - 1 and t - 2.
- Resilient Propagation (RPROP): exploits the sign of the partial derivatives: twice negative or twice positive ⇒ increase the step size; change of sign ⇒ reset the last step and decrease the step size. Typical values: factor for decreasing 0.5, factor for increasing 1.2.
- Evolutionary algorithms: individual = weight matrix. More about this later!
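Two of these variants as minimal update-rule sketches; the parameter values (alpha, the step factors) follow the typical values mentioned above, and the RPROP sketch omits the "reset last step" detail of the full algorithm. All names are illustrative.

```python
import numpy as np

def momentum_step(w, grad, prev_dw, eta=0.1, alpha=0.9):
    """Backpropagation with momentum: the new weight change also contains
    a fraction alpha of the previous weight change."""
    dw = -eta * grad + alpha * prev_dw
    return w + dw, dw

def rprop_step(w, grad, prev_grad, step, eta_minus=0.5, eta_plus=1.2):
    """Simplified RPROP: only the sign of each partial derivative is used.
    Same sign twice -> increase the step; sign change -> decrease the step."""
    same = np.sign(grad) * np.sign(prev_grad)
    step = np.where(same > 0, step * eta_plus,
                    np.where(same < 0, step * eta_minus, step))
    return w - np.sign(grad) * step, step
```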