CS621/CS449 Artificial Intelligence
Lecture Notes Set 4: 24/08/2004, 25/08/2004, 27/08/2004, 08/09/2004, 10/09/2004
Instructor: Prof. Pushpak Bhattacharyya, IIT Bombay
Outline
Proof of theorem: PLC and ~LS equivalence
Feedforward neural nets
Sigmoid networks
Training through gradient descent
Backpropagation algorithm
Proof of PLC <=> ~LS
We have to prove PLC => ~LS and ~LS => PLC.
One proof for PLC => ~LS: Suppose LS, i.e., there is a w with w.Xi > 0 for every pattern Xi. PLC means there exist coefficients Pi >= 0, not all zero, with P1 X1 + P2 X2 + ... + Pm Xm = 0. Taking the dot product with w gives 0 = P1 (w.X1) + ... + Pm (w.Xm) > 0. This is a contradiction. QED.
Proof of PLC <=> ~LS
Given a set of augmented and preprocessed vectors X1, X2, ..., Xm, each of dimension n, is there a vector w such that w.Xi > 0, i = 1, 2, ..., m?
Or: w.D > [0 0 ... 0] (m zeros), where D = [X1 X2 ... Xm] is an n x m matrix and each Xi is an n x 1 column vector.
Proof of PLC <=> ~LS
Alternatively, we could ask for a vector w1 such that w1.D > [1 1 ... 1] (m ones).
Consider a Linear Programming formulation:
Minimize a trivial (constant) objective subject to the constraint w1.D > [1 1 ... 1].   (F1)
The objective does not matter; F1 amounts to a feasibility question: does such a w1 exist?
Proof of PLC <=> ~LS
Any such LP formulation has a Dual LP (a consequence of Farkas' Lemma).
The Dual LP says: Maximize [1 1 ... 1].P subject to the constraint D.P = 0, where P = [P1 P2 ... Pm]^T, Pi >= 0.   (F2)
Observation: F1 has a feasible solution iff the objective function of F2 is bounded.
Proof of PLC <=> ~LS
Suppose PLC exists, i.e., there is a non-zero vector P >= 0 with D.P = 0. If P is a solution of D.P = 0, then kP is also a solution for any k (as large as required). Thus the objective function of F2 becomes unbounded.
=> F1 does not have a feasible solution
=> w1 does not exist
=> ~LS
Proof of PLC <=> ~LS
Suppose ~LS.
=> w1 does not exist
=> F1 does not have a feasible solution
=> the objective function of F2 becomes unbounded
=> a non-zero P >= 0 with D.P = 0 exists, i.e., PLC exists
Limitations of the perceptron
Non-linear separability is all-pervading.
A single perceptron does not have enough computing power.
E.g., XOR cannot be computed by a perceptron.
Solutions
Tolerate error (e.g., the pocket algorithm used by connectionist expert systems).
– Try to get the best possible hyperplane using only perceptrons.
Use higher-dimensional surfaces, e.g., degree-2 surfaces like the parabola.
Use a layered network.
Pocket Algorithm
The algorithm evolved in 1985; it essentially uses the PTA (perceptron training algorithm).
Basic idea: always preserve the best weights obtained so far in the "pocket".
Change the pocketed weights only if better weights are found (i.e., the changed weights result in reduced error).
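A minimal Python sketch of the idea, under the assumptions that the patterns are augmented vectors, desired outputs are 0/1, and error is counted once per pass over the data (all illustrative choices, not from the slide):

```python
import numpy as np

def pocket_algorithm(X, t, epochs=100, rng=None):
    """Pocket variant of the perceptron training algorithm (illustrative sketch).

    X : (m, n) array of augmented, preprocessed input vectors
    t : (m,) array of desired outputs in {0, 1}
    Returns the best ("pocketed") weight vector found so far.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = rng.normal(size=X.shape[1])             # current perceptron weights
    pocket_w, pocket_errors = w.copy(), np.inf  # best weights seen so far

    for _ in range(epochs):
        for xi, ti in zip(X, t):
            oi = 1 if w @ xi > 0 else 0         # threshold (step) unit
            w = w + (ti - oi) * xi              # standard perceptron (PTA) update
        errors = sum(int((1 if w @ xi > 0 else 0) != ti) for xi, ti in zip(X, t))
        if errors < pocket_errors:              # better weights found: update the pocket
            pocket_errors, pocket_w = errors, w.copy()
    return pocket_w
```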
Connectionist Expert Systems
Rules and facts form the Knowledge Base (KB).
The rules are encoded in a neural network.
Main problem: "knowledge engineering" – lots of training/experience required.
Challenges in KE:
– Obtaining knowledge from experts
– Encoding this knowledge into the system
A hybrid AI area – it combines symbolic (program-based) AI and neural-network AI.
XOR using 2 layers
A non-LS function expressed as a linearly separable function of individual linearly separable functions.
Example – XOR
XOR is built from the two linearly separable functions ~x1.x2 and x1.~x2 (~ denotes NOT, so ~x1.x2 means (NOT x1) AND x2).
Truth table for ~x1.x2:
x1   x2   ~x1.x2
0    0    0
0    1    1
1    0    0
1    1    0
Calculation of ~x1.x2: a threshold unit with inputs x1, x2, weights w1 = -1, w2 = 1.5 and threshold θ = 1.
Calculation of XOR: a threshold unit with inputs ~x1.x2 and x1.~x2, weights w1 = 1, w2 = 1 and threshold θ = 0.5.
Example – XOR
[Figure: the complete two-layer network. The output unit (weights w1 = 1, w2 = 1, threshold θ = 0.5) takes as inputs the two hidden units computing ~x1.x2 and x1.~x2, which are in turn fed by the inputs x1 and x2.]
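A small Python sketch of this network, assuming weights (1.5, -1) with θ = 1 for the x1.~x2 unit (mirroring the ~x1.x2 unit shown earlier) and a step unit that fires at or above its threshold – both assumptions, since the slides only give the ~x1.x2 unit explicitly:

```python
def step(net, theta):
    """Threshold (step) unit: outputs 1 iff the net input reaches the threshold."""
    return 1 if net >= theta else 0

def xor_two_layer(x1, x2):
    """Two-layer threshold network for XOR, using the weights from the slides."""
    h1 = step(-1.0 * x1 + 1.5 * x2, theta=1.0)   # hidden unit computing ~x1.x2
    h2 = step(1.5 * x1 - 1.0 * x2, theta=1.0)    # hidden unit computing x1.~x2 (mirrored weights; assumption)
    return step(1.0 * h1 + 1.0 * h2, theta=0.5)  # output unit ORs the two hidden units

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_two_layer(a, b))          # prints the XOR truth table
```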
Some Terminology
A multilayer feedforward neural network has
– an input layer
– an output layer
– a hidden layer (which does the intermediate computation)
Output units and hidden units are called computation units.
Training of the MLP
MLP: Multilayer Perceptron.
Question: how to find weights for the hidden layers when no target output is available for them?
This is the credit assignment problem – to be solved by "gradient descent".
Gradient Descent Technique
Let E be the error at the output layer, where
ti = target output; oi = observed output;
i is the index going over the n neurons in the outermost layer;
j is the index going over the p patterns (1 to p).
Example: for XOR, p = 4 and n = 1.
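The expression for E itself was lost in the slide extraction; with the indices defined above, the standard sum-of-squared-errors form (presumably what the slide showed) is:

```latex
E \;=\; \frac{1}{2}\sum_{j=1}^{p}\sum_{i=1}^{n}\bigl(t_i^{\,j} - o_i^{\,j}\bigr)^{2}
```

where ti^j and oi^j are the target and observed outputs of output neuron i on pattern j.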
Weights in a NN
wmn is the weight of the connection from the n-th neuron to the m-th neuron.
E versus the weights is a complex surface in the space defined by the weights wmn.
The negative gradient -∂E/∂wmn gives the direction in which a movement of the operating point in the wmn co-ordinate space will result in the maximum decrease in error.
[Figure: neuron n feeding neuron m through the weight wmn.]
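This leads directly to the gradient-descent weight update (η being the learning rate; the formula is the standard one, not recovered verbatim from the slide):

```latex
\Delta w_{mn} \;=\; -\,\eta\,\frac{\partial E}{\partial w_{mn}}
```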
Sigmoid neurons
Gradient descent needs a derivative computation – not possible with the perceptron, due to the discontinuous step function it uses!
Sigmoid neurons with easy-to-compute derivatives are required.
The computing power comes from the non-linearity of the sigmoid function.
Derivative of the Sigmoid Function
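The derivation on this slide was lost in extraction; the standard sigmoid and its derivative (presumably the slide's content) are:

```latex
y \;=\; \frac{1}{1 + e^{-x}},
\qquad
\frac{dy}{dx} \;=\; \frac{e^{-x}}{\bigl(1 + e^{-x}\bigr)^{2}} \;=\; y\,(1 - y)
```

The derivative is expressible in terms of the output y itself, which is what makes it easy to compute.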
Training algorithm
Initialize the weights to random values.
For an input x = (x1, x2, ..., xn), modify the weights as follows (target output = t, observed output = o); the update Δwi is derived on the next slide.
Iterate until E falls below a chosen threshold.
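A minimal Python sketch of this loop for a single sigmoid neuron (the learning rate, stopping tolerance, maximum iteration count and per-pattern update order are illustrative choices; the weight update used is the one derived on the next slide):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sigmoid_neuron(X, t, eta=0.5, tol=1e-3, max_iters=10_000, rng=None):
    """Gradient-descent training of a single sigmoid neuron (illustrative sketch).

    X : (p, n) array of input patterns; t : (p,) array of target outputs in (0, 1).
    Uses E = 1/2 * sum_j (t_j - o_j)^2 and dw_i = eta * (t - o) * o * (1 - o) * x_i.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = rng.normal(scale=0.1, size=X.shape[1])         # random initial weights
    for _ in range(max_iters):
        E = 0.0
        for xj, tj in zip(X, t):
            o = sigmoid(w @ xj)                        # observed output for this pattern
            w = w + eta * (tj - o) * o * (1 - o) * xj  # delta-rule weight update
            E += 0.5 * (tj - o) ** 2                   # accumulate squared error
        if E < tol:                                    # stop once the error is small enough
            break
    return w
```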
Calculation of Δwi
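The derivation did not survive extraction. For a single sigmoid neuron with net input net = Σ wi xi, output o = 1/(1 + e^(-net)) and error E = ½(t − o)², the standard chain-rule calculation (consistent with the observations on the next slide) gives:

```latex
\Delta w_i \;=\; -\eta\,\frac{\partial E}{\partial w_i}
           \;=\; -\eta\,\frac{\partial E}{\partial o}\,
                       \frac{\partial o}{\partial net}\,
                       \frac{\partial net}{\partial w_i}
           \;=\; \eta\,(t - o)\,o\,(1 - o)\,x_i
```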
Observations
Does the training technique support our intuition?
The larger the xi, the larger is Δwi:
– the error burden is borne by the weight values corresponding to large input values.
If t = o, Δwi = 0.
Δwi ∝ departure from target (t − o).
Saturation behaviour (discussed below).
If o < t, Δwi > 0, and if o > t, Δwi < 0 – this establishes Hebb's law.
Hebb's law
If nj and ni are both in the excitatory state (+1):
– then the change in weight might be such that it enhances the excitation;
– the change is proportional to both levels of excitation: Δwji ∝ e(nj) e(ni).
If ni and nj are in a mutual state of inhibition (one is +1 and the other is -1):
– then the change in weight is such that the inhibition is enhanced (the change in weight is negative).
[Figure: neurons nj and ni connected by the weight wji.]
Saturation behaviour
The algorithm is iterative and incremental.
If the weight values or the number of inputs is very large, the net input becomes large and the output lies in the saturation region of the sigmoid.
The weight values hardly change in the saturation region, because the factor o(1 − o) in Δwi is then close to zero.
Backpropagation algorithm
Fully connected feedforward network.
Pure feedforward network (no jumping of connections over layers).
[Figure: input layer (n input neurons), hidden layers, output layer (m output neurons); neuron i feeds neuron j through the weight wji.]
Gradient Descent Equations
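The equations on this slide were lost in extraction; the standard chain-rule form used to set up backpropagation (with netj the net input to neuron j and oi the output of the neuron feeding it through wji) is presumably:

```latex
\Delta w_{ji} \;=\; -\eta\,\frac{\partial E}{\partial w_{ji}}
              \;=\; -\eta\,\frac{\partial E}{\partial net_{j}}\,
                         \frac{\partial net_{j}}{\partial w_{ji}}
              \;=\; \eta\,\delta_{j}\,o_{i},
\qquad\text{where } \delta_{j} \;=\; -\frac{\partial E}{\partial net_{j}}
```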
Example – Character Recognition
Output layer: 26 neurons (one per capital letter).
The first output neuron has the responsibility of detecting all forms of 'A'.
This is a centralized representation of the outputs.
In distributed representations, all output neurons participate in the output.
Backpropagation – for the outermost layer
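The formula itself is missing from the extracted slide; for an output neuron j with target tj and observed output oj, and E the sum-of-squares error, the standard result (matching the single-neuron Δwi above) is:

```latex
\delta_{j} \;=\; (t_{j} - o_{j})\,o_{j}\,(1 - o_{j}),
\qquad
\Delta w_{ji} \;=\; \eta\,(t_{j} - o_{j})\,o_{j}\,(1 - o_{j})\,o_{i}
```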
Backpropagation for hidden layers
[Figure: input layer (n input neurons), hidden layers, output layer (m output neurons); hidden neuron j feeds neuron k in the next layer.]
δk is propagated backwards to find the value of δj.
Backpropagation – for hidden layers
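Again the expression is missing; the standard hidden-layer rule, obtained by propagating the δk of the next layer backwards as described on the previous slide, is:

```latex
\delta_{j} \;=\; o_{j}\,(1 - o_{j})\sum_{k\,\in\,\text{next layer}} w_{kj}\,\delta_{k},
\qquad
\Delta w_{ji} \;=\; \eta\,\delta_{j}\,o_{i}
```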
General Backpropagation Rule
General weight updating rule: Δwji = η δj oi, where δj takes one form for the outermost layer and another for the hidden layers (collected below).
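The formulas on this slide were lost in extraction; collecting the two cases above, the rule is presumably:

```latex
\Delta w_{ji} \;=\; \eta\,\delta_{j}\,o_{i},
\qquad
\delta_{j} \;=\;
\begin{cases}
(t_{j} - o_{j})\,o_{j}\,(1 - o_{j}) & \text{for the outermost layer}\\[4pt]
o_{j}\,(1 - o_{j})\displaystyle\sum_{k\,\in\,\text{next layer}} w_{kj}\,\delta_{k} & \text{for hidden layers}
\end{cases}
```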
Issues in the training algorithm
The algorithm is greedy: it always changes the weights such that E reduces.
The algorithm may get stuck in a local minimum.
If we observe that E is not getting reduced any more, the following may be the reasons:
1. Stuck in a local minimum.
2. Network paralysis (a high negative or positive input makes the neurons saturate).
3. The learning rate η is too small.
Diagnostics in action (1)
1) If stuck in a local minimum, try the following:
– Re-initialize the weight vector.
– Increase the learning rate.
– Introduce more neurons in the hidden layer.
2) If it is network paralysis, then increase the number of neurons in the hidden layer.
Problem: how to configure the hidden layer?
Known: one hidden layer seems to be sufficient [Kolmogorov (1960s)].
Diagnostics in action (2)
Kolmogorov's statement: a feedforward network with three layers (input, output and hidden), with an appropriate I/O relation that can vary from neuron to neuron, is sufficient to compute any function.
More hidden layers reduce the size of the individual layers.
Diagnostics in action (3)
3) Observe the outputs; if they are close to 0 or 1 (i.e., saturated), try the following:
1. Scale the inputs or divide them by a normalizing factor.
2. Change the shape and size of the sigmoid.
3. Introduce a momentum factor (the update is sketched below):
– it accelerates the movement out of a trough;
– it dampens oscillation inside the trough.
Choosing η: if η is large, we may jump over the minimum.
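The momentum update rule itself is not on the extracted slide; the standard form, with a momentum coefficient α (0 ≤ α < 1, a symbol introduced here for illustration), is:

```latex
\Delta w_{ji}(t) \;=\; \eta\,\delta_{j}\,o_{i} \;+\; \alpha\,\Delta w_{ji}(t-1)
```

The α Δwji(t−1) term keeps the weight change moving in its recent direction, which is what accelerates the escape from a trough and damps oscillation inside it.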