1 CS 621 Artificial Intelligence: Feedforward Nets
CS 621 Artificial Intelligence, Lecture /10/05. Topic: Feedforward Nets. Prof. Pushpak Bhattacharyya, IIT Bombay.

2 Perceptron
A single perceptron cannot compute non-linearly separable data, and real-life problems are typically non-linear.

3 Basic Computing Paradigm
The basic computing paradigm is setting up hyperplanes. To handle non-linearly separable data one can use higher-order (higher-power) surfaces, tolerate some error, or use multiple perceptrons.

4 A quadratic surface can separate the data, but is difficult to train.
A quadratic decision surface can separate data that a single hyperplane cannot, but such higher-order surfaces are difficult to train.

5 Pocket Algorithm
The pocket algorithm, developed around 1985, essentially uses the Perceptron Training Algorithm (PTA). Basic idea: always preserve the best weights obtained so far in the "pocket", and replace them only if changed weights give lower error on the training data. It thus tolerates error, and has been used in connectionist expert systems.
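A minimal sketch of the pocket idea, not the lecture's own code: the step activation, learning rate of 1, epoch count, and the XOR-with-bias dataset below are illustrative assumptions.

```python
# Sketch of the pocket algorithm: run ordinary PTA updates, but keep
# ("pocket") the best weight vector seen so far on the whole training set.
import numpy as np

def errors(w, X, y):
    """Number of misclassified examples for the threshold output step(w.x)."""
    return int(np.sum((X @ w >= 0).astype(int) != y))

def pocket(X, y, epochs=100, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    pocket_w, pocket_err = w.copy(), errors(w, X, y)
    for _ in range(epochs):
        i = rng.integers(len(X))                  # pick a training example
        pred = int(X[i] @ w >= 0)
        w = w + (y[i] - pred) * X[i]              # ordinary PTA update
        err = errors(w, X, y)
        if err < pocket_err:                      # better than the pocket?
            pocket_w, pocket_err = w.copy(), err  # then replace the pocket
    return pocket_w, pocket_err

# Example on non-separable XOR data (last column is a constant bias input):
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])
w, err = pocket(X, y)
print(w, err)   # prints the pocketed weights and their (lowest seen) error count
```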

6 Multilayer Feedforward Network
Geometrically: [figure: a multilayer feedforward network with inputs x1, x2, hidden units h1, h2, and output y, shown geometrically]

7 Algebraically: Linearization
Algebraically (linearization): X1 XOR X2 = X1.X2' + X1'.X2 = OR(AND(X1, X2'), AND(X1', X2))
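A quick check of this identity over all Boolean inputs (a sketch; the code and variable names are mine, not from the slides):

```python
# Verify: x1 XOR x2 == OR(AND(x1, NOT x2), AND(NOT x1, x2)) for all Boolean inputs.
from itertools import product

for x1, x2 in product([False, True], repeat=2):
    lhs = x1 != x2                                 # XOR
    rhs = (x1 and not x2) or ((not x1) and x2)     # linearized form
    assert lhs == rhs
print("identity holds on all four input combinations")
```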

8 Example
Example: [figure: a network with input layer neurons (x1, x2), hidden layer neurons, and an output layer neuron (y); thresholds 0.5, 1 and 1.5 appear in the figure]. The neurons labelled 1 & 3 in the figure are also called computation neurons.

9 Hidden Layer Neurons
Hidden layer neurons contribute to the power of the network. How many hidden layers? How many neurons per layer? In a pure feedforward network there is no jumping of connections: each layer feeds only the next.

10 XOR Example
XOR Example: [figure: a two-layer threshold network computing XOR; the output neuron has threshold 0.5 and weights w1 = 1, w2 = 1 from two hidden neurons computing x1x2' and x1'x2; the figure also shows hidden-layer parameters 1.5 and -1]
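One concrete choice of weights and thresholds realising this construction (my own illustrative values, which may differ from the ones drawn on the slide):

```python
# Two-layer threshold network computing XOR.
# h1 fires for x1 AND (NOT x2), h2 fires for (NOT x1) AND x2, and the output ORs them.
def step(v, theta):
    return 1 if v >= theta else 0

def xor_net(x1, x2):
    h1 = step(1 * x1 - 1 * x2, 0.5)    # x1 . x2'
    h2 = step(-1 * x1 + 1 * x2, 0.5)   # x1' . x2
    return step(1 * h1 + 1 * h2, 0.5)  # OR of the hidden units

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))   # prints 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0
```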

11 Constraints
Constraints on neurons in multilayer perceptrons: the computation neurons must be non-linear; non-linearity is the source of the network's power.

12 Explanation
Suppose the neurons are linear, with y = m1(w1.h1 + w2.h2) + c1, h1 = m2(w3.x1 + w4.x2) + c2, and h2 = m3(w5.x1 + w6.x2) + c3. Substituting h1 and h2 gives y = k1.x1 + k2.x2 + c', i.e. the whole network reduces to a single linear map of the inputs. [figure: the corresponding network with weights w3..w6 into h1, h2 and weights w1, w2 into y]
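The substitution can also be checked symbolically; a small sketch using sympy (the symbol names follow the slide, the check itself is mine):

```python
# Show that a network of linear neurons collapses to a single linear map of its inputs.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
w1, w2, w3, w4, w5, w6 = sp.symbols('w1:7')
m1, m2, m3, c1, c2, c3 = sp.symbols('m1 m2 m3 c1 c2 c3')

h1 = m2 * (w3 * x1 + w4 * x2) + c2
h2 = m3 * (w5 * x1 + w6 * x2) + c3
y = m1 * (w1 * h1 + w2 * h2) + c1

y = sp.expand(y)
print(sp.degree(y, x1), sp.degree(y, x2))   # 1 1 : still linear in x1 and x2
print(sp.collect(y, [x1, x2]))              # of the form k1*x1 + k2*x2 + c'
```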

13 Explanation (contd.)
For a linear characteristic y = mx + c, the output is read through two thresholds with yU > yL: y > yU is regarded as y = 1, and y < yL is regarded as y = 0.

14 Linear Neuron
Can a linear neuron compute XOR? We want a neuron with characteristic y = w1.x1 + w2.x2 + c. [figure: a single neuron with inputs x1, x2, weights w1, w2, and output y]

15 Linear Neuron (contd. 1)
For XOR we need: for (1,1) and (0,0), y < yL; for (0,1) and (1,0), y > yU; with yU > yL. Can such (w1, w2, c) be found?

16 Linear Neuron (contd. 2)
For (0,0): y = w1.0 + w2.0 + c = c, and we need y < yL, so c < yL ... (1). For (1,0): y = w1.1 + w2.0 + c = w1 + c, and we need y > yU, so w1 + c > yU ... (2).

17 Linear Neuron (contd. 3)
For (0,1): y = w2 + c, so w2 + c > yU ... (3). For (1,1): y = w1 + w2 + c, so w1 + w2 + c < yL ... (4). And yU > yL ... (5).

18 Linear Neuron (contd. 4)
Collecting the constraints: c < yL ... (1); w1 + c > yU ... (2); w2 + c > yU ... (3); w1 + w2 + c < yL ... (4); yU > yL ... (5). These are inconsistent: adding (2) and (3) gives w1 + w2 + 2c > 2yU, while adding (1) and (4) gives w1 + w2 + 2c < 2yL, so yU < yL, contradicting (5).
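The contradiction can also be checked mechanically; a sketch using scipy's linear-programming solver (the margin eps standing in for the strict inequalities is my own assumption):

```python
# Ask an LP solver whether constraints (1)-(5) can hold simultaneously.
import numpy as np
from scipy.optimize import linprog

eps = 1e-3
# variables: [w1, w2, c, yL, yU]
A_ub = np.array([
    [0, 0, 1, -1, 0],    # (1)  c          < yL  ->  c - yL         <= -eps
    [-1, 0, -1, 0, 1],   # (2)  w1 + c     > yU  ->  yU - w1 - c    <= -eps
    [0, -1, -1, 0, 1],   # (3)  w2 + c     > yU  ->  yU - w2 - c    <= -eps
    [1, 1, 1, -1, 0],    # (4)  w1+w2+c    < yL  ->  w1+w2+c - yL   <= -eps
    [0, 0, 0, 1, -1],    # (5)  yU         > yL  ->  yL - yU        <= -eps
])
b_ub = -eps * np.ones(5)
res = linprog(c=np.zeros(5), A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * 5)
print(res.status)   # 2 = infeasible: no (w1, w2, c, yL, yU) satisfies all five
```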

19 Observations
A linear neuron cannot compute XOR. A multilayer network with linear characteristic neurons collapses to a single linear neuron, so adding layers does not add computing power. Therefore the neurons in a feedforward network must be non-linear. Threshold elements will do iff we can linearize the non-linear function.

20 Linearization
Linearization is not in general possible: we would need to know the function in closed form, and the space of functions is very large even for Boolean data.

21 Training Algorithm
The training algorithm looks at the pre-classified (labelled) data and arrives at the weight values.

22 Why won't PTA do?
Since we do not know the desired outputs at the hidden layer neurons, PTA cannot be applied there. So we apply a training method called gradient descent.

23 Minima
[figure: the error E plotted against the parameters (w1, w2, ...); training seeks a minimum of this error surface]

24 Gradient Descent
Movement towards a minimum of the error is ensured by gradient descent: each parameter is changed against the gradient of the error, Δw_mn = −η ∂E/∂w_mn, where E is the error and w_mn is a parameter (weight).
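A minimal numerical sketch of this update rule; the quadratic error surface and the learning rate below are illustrative assumptions, not the lecture's network:

```python
# Gradient descent on a simple error surface E(w1, w2) = (w1 - 3)^2 + (w2 + 1)^2.
import numpy as np

def E(w):
    return (w[0] - 3) ** 2 + (w[1] + 1) ** 2

def grad_E(w):
    return np.array([2 * (w[0] - 3), 2 * (w[1] + 1)])

w = np.zeros(2)
eta = 0.1                        # learning rate
for _ in range(100):
    w = w - eta * grad_E(w)      # delta w = -eta * dE/dw
print(w, E(w))                   # w approaches (3, -1), the minimum of E
```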

25 Sigmoid Neurons
Gradient descent needs a derivative computation, which is not possible with the perceptron because of the discontinuous step function it uses. Hence sigmoid neurons, whose derivatives are easy to compute, are required (radial basis functions are also differentiable). The computing power comes from the non-linearity of the sigmoid function.
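The sigmoid and its derivative, which is what makes gradient descent workable (a sketch; the use of numpy is an assumption):

```python
# Sigmoid activation and its derivative: sigma'(x) = sigma(x) * (1 - sigma(x)),
# cheap to compute once sigma(x) is known (unlike the step function, whose
# derivative is zero wherever it exists and undefined at the threshold).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

xs = np.array([-2.0, 0.0, 2.0])
print(sigmoid(xs))        # approximately [0.119 0.5 0.881]
print(sigmoid_prime(xs))  # peaks at 0.25 when x = 0
```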

26 Summary
Summary: feedforward networks (pure and non-pure); XOR computed by a multilayer perceptron; non-linearity is a must; gradient descent; sigmoid neurons.

