CS 621 Artificial Intelligence
Lecture 24 - 11/10/05
Feedforward Nets
Prof. Pushpak Bhattacharyya, IIT Bombay
Perceptron
- A perceptron cannot handle non-linearly separable data.
- Real-life problems are typically non-linear.
Basic Computing Paradigm: Setting Up Hyperplanes
Ways to cope with non-linearly separable data:
- Use higher-order surfaces.
- Tolerate some error.
- Use multiple perceptrons.
A quadratic surface can separate the data, but such a surface is difficult to train.
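As a concrete illustration (added here, not from the slide), XOR, which no single line can separate, is captured exactly by a quadratic surface on boolean inputs:

\[
x_1 \oplus x_2 \;=\; x_1 + x_2 - 2\,x_1 x_2, \qquad x_1, x_2 \in \{0, 1\},
\]

which evaluates to 0 on (0,0) and (1,1) and to 1 on (0,1) and (1,0); thresholding this quadratic form at 0.5 separates the two classes.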
Pocket Algorithm
- Evolved in 1985; essentially uses the PTA.
- Basic idea: always preserve the best weights obtained so far in the "pocket".
- Replace the pocket weights only when the current weights are found to be better (i.e., they give a lower error).
- Tolerates error.
- Used in connectionist expert systems.
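A minimal sketch of the pocket idea in Python (the function name, the use of NumPy, and the 0/1 label convention are assumptions for illustration, not from the lecture):

```python
import numpy as np

def pocket_algorithm(X, y, epochs=100):
    """Perceptron training that keeps the best ('pocket') weights seen so far.

    X : (n_samples, n_features) inputs, with a bias column already appended.
    y : (n_samples,) labels in {0, 1}.
    """
    w = np.zeros(X.shape[1])      # current perceptron weights
    pocket_w = w.copy()           # best weights found so far ("the pocket")
    pocket_errors = np.inf        # training error of the pocket weights

    for _ in range(epochs):
        # Standard PTA sweep: update on every misclassified example.
        for xi, target in zip(X, y):
            pred = 1 if xi @ w > 0 else 0
            w = w + (target - pred) * xi

        # Count misclassifications of the current weights.
        errors = sum(1 for xi, t in zip(X, y)
                     if (1 if xi @ w > 0 else 0) != t)

        # Keep the better weights in the pocket (tolerating residual error).
        if errors < pocket_errors:
            pocket_errors = errors
            pocket_w = w.copy()

    return pocket_w, pocket_errors
```

Even when the data is not linearly separable, the returned pocket weights are the best hyperplane encountered during training.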
Multilayer Feedforward Network: Geometrically
[Figure: inputs x1 and x2 feed hidden neurons h1 and h2, which feed the output y.]
Algebraically: Linearization
X1 ⊕ X2 = X1 X2' + X1' X2 = OR( AND(X1, X2'), AND(X1', X2) )
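A quick truth-table check (illustrative Python, not from the slides) that the linearized form reproduces XOR:

```python
def xor_linearized(x1: int, x2: int) -> int:
    """XOR written as OR(AND(x1, NOT x2), AND(NOT x1, x2))."""
    return (x1 & (1 - x2)) | ((1 - x1) & x2)

# The linearized form agrees with XOR on every boolean input pair.
for x1 in (0, 1):
    for x2 in (0, 1):
        assert xor_linearized(x1, x2) == (x1 ^ x2)
        print(x1, x2, "->", xor_linearized(x1, x2))
```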
Example
[Figure: a feedforward network with input layer neurons x1, x2, hidden layer neurons, and an output layer neuron y; the thresholds 0.5, 1 and 1.5 appear on the computation neurons.]
- The hidden and output layer neurons are also called computation neurons.
Hidden Layer Neurons
- They contribute to the power of the network.
- How many hidden layers? How many neurons per layer?
- Pure feedforward network: no jumping of connections (no connection skips a layer).
XOR Example
[Figure: two-layer threshold network for XOR. Output neuron: threshold θ = 0.5, weights w1 = 1 and w2 = 1 on the two hidden outputs x1x2' and x1'x2; the hidden-layer weights/thresholds shown in the figure are 1.5, -1, -1, 1.5, on inputs x1 and x2.]
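A sketch of the two-layer threshold network for XOR (the weight/threshold assignment below is one standard choice, assumed for illustration; the exact values on the slide are not fully legible in this transcript):

```python
def step(net, theta):
    """Threshold (step) activation: fire iff the net input exceeds theta."""
    return 1 if net > theta else 0

def xor_network(x1, x2):
    """Two-layer threshold network for XOR.

    h1 computes x1 AND (NOT x2), h2 computes (NOT x1) AND x2,
    and the output neuron ORs them (theta = 0.5, w1 = w2 = 1).
    """
    h1 = step(1 * x1 + (-1) * x2, 0.5)
    h2 = step((-1) * x1 + 1 * x2, 0.5)
    return step(1 * h1 + 1 * h2, 0.5)

for a in (0, 1):
    for b in (0, 1):
        assert xor_network(a, b) == (a ^ b)   # matches XOR on all four inputs
```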
Constraints
Constraints on neurons in a multilayer perceptron:
- The computation neurons must be non-linear.
- Non-linearity is the source of the network's power.
Explanation
Suppose all neurons have linear characteristics:
  y  = m1 (w1 h1 + w2 h2) + c1
  h1 = m2 (w3 x1 + w4 x2) + c2
  h2 = m3 (w5 x1 + w6 x2) + c3
Substituting h1 and h2:
  y = k1 x1 + k2 x2 + c'
[Figure: inputs x1, x2 connected to hidden neurons h1, h2 by weights w3, w4, w5, w6; h1, h2 connected to the output y by weights w1, w2.]
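Carrying out the substitution explicitly (a worked step added here, not shown on the slide):

\[
y = m_1\bigl(w_1 h_1 + w_2 h_2\bigr) + c_1
  = \underbrace{m_1\,(w_1 m_2 w_3 + w_2 m_3 w_5)}_{k_1}\, x_1
  + \underbrace{m_1\,(w_1 m_2 w_4 + w_2 m_3 w_6)}_{k_2}\, x_2
  + \underbrace{m_1\,(w_1 c_2 + w_2 c_3) + c_1}_{c'}
\]

so the two-layer linear network collapses to a single linear neuron with weights k1, k2 and bias c'.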
Explanation (Contd.)
The linear characteristic y = mx + c is thresholded using two levels yU and yL, with yU > yL:
- y > yU is regarded as y = 1
- y < yL is regarded as y = 0
Linear Neuron
Can a linear neuron compute XOR?
We want the characteristic y = w1 x1 + w2 x2 + c.
[Figure: a single neuron with inputs x1, x2, weights w1, w2, and output y.]
Linear Neuron (Contd. 1)
For XOR we need:
- for (0,0) and (1,1): y < yL
- for (0,1) and (1,0): y > yU
- with yU > yL
Can such (w1, w2, c) be found?
Linear Neuron (Contd. 2)
For (0,0): y = w1·0 + w2·0 + c = c, and we need y < yL, so
  c < yL                      ... (1)
For (1,0): y = w1·1 + w2·0 + c = w1 + c, and we need y > yU, so
  w1 + c > yU                 ... (2)
Linear Neuron (Contd. 3)
For (0,1): w2 + c > yU        ... (3)
For (1,1): w1 + w2 + c < yL   ... (4)
And yU > yL                   ... (5)
Linear Neuron (Contd. 4)
Collecting the constraints:
  c < yL                      ... (1)
  w1 + c > yU                 ... (2)
  w2 + c > yU                 ... (3)
  w1 + w2 + c < yL            ... (4)
  yU > yL                     ... (5)
These constraints are inconsistent.
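The contradiction can be made explicit (a worked step added here, not on the slide). Adding (2) and (3), and adding (1) and (4):

\[
(2)+(3):\; w_1 + w_2 + 2c > 2 y_U,
\qquad
(1)+(4):\; w_1 + w_2 + 2c < 2 y_L,
\]

so \(2 y_U < w_1 + w_2 + 2c < 2 y_L\), i.e. \(y_U < y_L\), which contradicts (5). Hence no (w1, w2, c) exists, and a linear neuron cannot compute XOR.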
Observations
- A linear neuron cannot compute XOR.
- A multilayer network of neurons with linear characteristics collapses to a single linear neuron; therefore adding layers does not add computing power.
- Neurons in a feedforward network must be non-linear.
- Threshold elements will do, provided we can linearize the non-linear function (as was done for XOR above).
Linearity
- Linearization is not possible in general: we need to know the function in closed form.
- Even for boolean data the space of functions is very large (2^(2^n) functions of n boolean variables).
Training Algorithm
- Looks at the pre-classified data.
- Arrives at the weight values.
Why Won't PTA Do?
- Since we do not know the desired outputs at the hidden layer neurons, PTA cannot be applied.
- So we apply a training method called gradient descent.
Minima
[Figure: the error E plotted against the parameters (w1, w2, ...), showing the minimum of the error surface.]
Gradient Descent
Movement towards a minimum of the error surface is ensured by gradient descent, where:
- E is the error
- wmn is a network parameter (weight)
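A reconstruction of the standard gradient-descent update rule the slide refers to (the learning-rate symbol η is an assumption, since the original equation is not legible in this transcript):

\[
\Delta w_{mn} \;=\; -\,\eta\, \frac{\partial E}{\partial w_{mn}}, \qquad \eta > 0,
\]

so every weight moves in the direction that decreases the error E.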
Sigmoid Neurons
- Gradient descent needs a derivative computation, which is not possible with the perceptron because of its discontinuous step function.
- Sigmoid neurons, with easy-to-compute derivatives, are required. (Radial basis functions are also differentiable.)
- The computing power comes from the non-linearity of the sigmoid function.
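For reference, the sigmoid and its conveniently cheap derivative (standard definitions, not reproduced from the slide):

\[
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr),
\]

so the derivative needed by gradient descent is obtained almost for free once σ(x) has been computed.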
Summary
- Feedforward networks: pure and non-pure.
- XOR can be computed by a multilayer perceptron.
- Non-linearity is a must.
- Gradient descent.
- Sigmoid neurons.