CS623: Introduction to Computing with Neural Nets (lecture-3)

CS623: Introduction to Computing with Neural Nets (lecture-3) Pushpak Bhattacharyya Computer Science and Engineering Department IIT Bombay

Computational Capacity of Perceptrons

The separating plane ∑ wi xi = θ defines a linear surface in the (W, θ) space, where W = <w1, w2, w3, …, wn> is an n-dimensional weight vector. A point in this (W, θ) space defines a perceptron. (Figure: a perceptron with inputs x1, x2, x3, …, xn, weights w1, w2, w3, …, wn, threshold θ and output y.)

The Simplest Perceptron: a single input x1 connected through weight w1 to a unit with threshold θ and output y. Depending on the values of w1 and θ, four different functions are possible.

Simplest perceptron contd. The four functions, written as truth tables over the single input x:

x | f1 | f2 | f3 | f4
0 |  0 |  0 |  1 |  1
1 |  0 |  1 |  0 |  1

f1 is the 0-function (θ ≥ 0, w ≤ θ), f2 the Identity function (θ ≥ 0, w > θ), f3 the Complement function (θ < 0, w ≤ θ), and f4 the True-function (θ < 0, w > θ).

Counting the #functions for the simplest perceptron. For the simplest perceptron, the equation is w.x = θ. Substituting x=0 and x=1, we get the lines θ = 0 and w = θ in the (θ, w) plane. These two lines intersect to form four regions R1, R2, R3, R4, which correspond to the four functions. (Figure: the (θ, w) plane divided into regions R1–R4 by the lines θ = 0 and w = θ.)

Fundamental Observation: the number of threshold functions (TFs) computable by a perceptron is equal to the number of regions produced by the 2^n hyper-planes obtained by plugging each of the 2^n input vectors <x1, x2, x3, …, xn> into the equation ∑i=1..n wi xi = θ. Intuition: when a new hyper-plane is added, how many lines do the existing planes produce on the new plane, and how many regions do these lines produce on the new plane?

The geometrical observation. Problem: given m linear surfaces called hyper-planes (each hyper-plane is (d-1)-dimensional) in d-dimensional space, what is the maximum number of regions produced by their intersection? i.e. Rm,d = ?

Concept forming examples. Max #regions formed by m lines in 2 dimensions: Rm,2 = Rm-1,2 + ? The new line intersects the existing m-1 lines at m-1 points and forms m new regions. So Rm,2 = Rm-1,2 + m, with R1,2 = 2. Max #regions formed by m planes in 3 dimensions: Rm,3 = Rm-1,3 + Rm-1,2, with R1,3 = 2.

Concept forming examples contd. Max #regions formed by m planes in 4 dimensions: Rm,4 = Rm-1,4 + Rm-1,3, with R1,4 = 2. In general, Rm,d = Rm-1,d + Rm-1,d-1, subject to R1,d = 2 and Rm,1 = 2.

General Equation: Rm,d = Rm-1,d + Rm-1,d-1, subject to R1,d = 2 and Rm,1 = 2. All the hyperplanes pass through the origin.
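As a quick check of this recurrence, here is a minimal Python sketch (ours, not part of the lecture); the function name regions is our own choice:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def regions(m: int, d: int) -> int:
    """Max #regions produced by m hyperplanes through the origin in d dimensions,
    via R(m,d) = R(m-1,d) + R(m-1,d-1)."""
    if m == 1 or d == 1:                 # boundary conditions: R(1,d) = 2, R(m,1) = 2
        return 2
    return regions(m - 1, d) + regions(m - 1, d - 1)

print([regions(m, 2) for m in range(1, 6)])  # [2, 4, 6, 8, 10]: m lines through the origin give 2m regions
print(regions(4, 3))                         # 14 regions from 4 planes through the origin in 3-D
```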

Method of Observation for lines in 2-D (general position):
Rm,2 = Rm-1,2 + m
Rm-1,2 = Rm-2,2 + (m-1)
Rm-2,2 = Rm-3,2 + (m-2)
:
R2,2 = R1,2 + 2
Adding these up and using R1,2 = 2:
Rm,2 = 2 + m + (m-1) + (m-2) + … + 2 = 1 + (1 + 2 + 3 + … + m) = 1 + m(m+1)/2

Method of generating functions: Rm,2 = Rm-1,2 + m.
f(x) = R1,2 x + R2,2 x^2 + R3,2 x^3 + … + Rm,2 x^m + …   (Eq 1)
x f(x) = R1,2 x^2 + R2,2 x^3 + R3,2 x^4 + … + Rm,2 x^(m+1) + …   (Eq 2)
Observe that Rm,2 - Rm-1,2 = m.

Method of generating functions contd. Eq 1 – Eq 2 gives:
(1-x) f(x) = R1,2 x + (R2,2 - R1,2) x^2 + (R3,2 - R2,2) x^3 + … + (Rm,2 - Rm-1,2) x^m + …
(1-x) f(x) = 2x + 2x^2 + 3x^3 + … + m x^m + …   (using R1,2 = 2 and Rm,2 - Rm-1,2 = m)
f(x) = (2x + 2x^2 + 3x^3 + … + m x^m + …)(1-x)^-1

Method of generating functions contd.
f(x) = (2x + 2x^2 + 3x^3 + … + m x^m + …)(1 + x + x^2 + x^3 + …)   (Eq 3)
The coefficient of x^m in Eq 3 is Rm,2 = 2 + 2 + 3 + 4 + … + m = 1 + m(m+1)/2.
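The closed form obtained from the generating function can be checked against the original recurrence directly; a short sketch (ours, not from the slides) for lines in general position:

```python
def regions_2d_recurrence(m: int) -> int:
    """R(m,2) for m lines in general position, via R(m,2) = R(m-1,2) + m, R(1,2) = 2."""
    r = 2
    for k in range(2, m + 1):
        r += k
    return r

def regions_2d_closed_form(m: int) -> int:
    """Closed form from the generating-function argument: 1 + m(m+1)/2."""
    return 1 + m * (m + 1) // 2

for m in range(1, 20):
    assert regions_2d_recurrence(m) == regions_2d_closed_form(m)
print(regions_2d_closed_form(4))   # 11 regions from 4 general-position lines
```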

The general problem of m hyperplanes in d-dimensional space: c(m,d) = c(m-1,d) + c(m-1,d-1), subject to c(m,1) = 2 and c(1,d) = 2.

Generating function (now in two variables):
f(x,y) = R1,1 x y + R1,2 x y^2 + R1,3 x y^3 + …
f(x,y) = ∑m≥1 ∑d≥1 Rm,d x^m y^d

# of regions formed by m hyperplanes passing through the origin in d-dimensional space:
c(m,d) = 2 · ∑i=0..d-1 C(m-1, i)
where C(m-1, i) is the binomial coefficient “m-1 choose i”.
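A small sketch (our own illustration, not from the slides) of this closed form, cross-checked against the recurrence from the General Equation slide. For n = 1 input the perceptron gives 2^1 = 2 hyperplanes in the 2-dimensional (w, θ) space, so c(2,2) = 4, matching the four functions f1–f4; for n = 2 inputs, c(4,3) = 14, matching the fact that every two-variable Boolean function except XOR and XNOR is a threshold function.

```python
from math import comb

def c(m: int, d: int) -> int:
    """#regions formed by m hyperplanes through the origin in d dimensions:
    c(m,d) = 2 * sum_{i=0}^{d-1} C(m-1, i)."""
    return 2 * sum(comb(m - 1, i) for i in range(d))

def c_recurrence(m: int, d: int) -> int:
    """Same count via c(m,d) = c(m-1,d) + c(m-1,d-1), c(1,d) = c(m,1) = 2."""
    if m == 1 or d == 1:
        return 2
    return c_recurrence(m - 1, d) + c_recurrence(m - 1, d - 1)

assert all(c(m, d) == c_recurrence(m, d) for m in range(1, 8) for d in range(1, 8))

print(c(2, 2))   # 4  -> the four functions of the simplest perceptron
print(c(4, 3))   # 14 -> upper bound on threshold functions of 2 variables (all but XOR, XNOR)
```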

Machine Learning Basics. Learning from examples: e1, e2, e3, … are +ve examples; f1, f2, f3, … are –ve examples. (Figure: a concept C containing the positive examples e1 … en, with the negative examples f1 … fn lying outside it.)

Machine Learning Basics contd. Training: arrive at a hypothesis h based on the data seen. Testing: present new data to h and test its performance, i.e. how well the hypothesis h approximates the concept c.

Feedforward Network

Limitations of the perceptron: non-linear separability is all-pervading, and a single perceptron does not have enough computing power. E.g., XOR cannot be computed by a perceptron.

Solutions:
1. Tolerate error (e.g., the pocket algorithm used by connectionist expert systems): try to get the best possible hyperplane using only perceptrons.
2. Use higher-dimensional surfaces, e.g., degree-2 surfaces like the parabola.
3. Use a layered network.

Pocket Algorithm. The algorithm evolved in 1985 and essentially uses the Perceptron Training Algorithm (PTA). Basic idea: always preserve the best weight vector obtained so far in the “pocket”; replace the pocketed weights only if better ones are found (i.e. the changed weights result in reduced error).
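A rough sketch of the pocket idea (our own illustration; the function name pocket_train and the use of NumPy are our choices, not the lecture's): run ordinary perceptron updates, but keep a copy of the best weights seen so far, judged by training-set error.

```python
import numpy as np

def pocket_train(X, t, epochs=100, seed=0):
    """Perceptron training that keeps the best ('pocketed') weights seen so far.
    X: (p, n) inputs with a bias column already appended; t: (p,) targets in {0, 1}."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])                       # current weights
    errors = lambda w: np.sum((X @ w > 0).astype(int) != t)
    pocket_w, pocket_err = w.copy(), errors(w)
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = int(x @ w > 0)
            w += (target - y) * x                          # standard PTA update
        e = errors(w)
        if e < pocket_err:                                 # pocket the weights only if better
            pocket_w, pocket_err = w.copy(), e
    return pocket_w, pocket_err
```

On a non-linearly-separable training set such as XOR the inner updates never settle, but the pocketed weights still record the lowest training-set error found (for XOR, one misclassified pattern at best).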

XOR using 2 layers: a non-linearly-separable (non-LS) function is expressed as a linearly separable function of individual linearly separable functions. For XOR: x1 XOR x2 = OR(x1 AND NOT x2, NOT x1 AND x2), where OR and both AND terms are each linearly separable.

Example - XOR. Output unit computing XOR from the two hidden outputs (x1 AND NOT x2) and (NOT x1 AND x2): θ = 0.5, w1 = 1, w2 = 1. Hidden unit computing (NOT x1 AND x2) from x1 and x2: θ = 1, w1 = -1, w2 = 1.5 (the unit for (x1 AND NOT x2) is symmetric, with the weights interchanged).

Example - XOR contd. (Figure: the full two-layer network. Output unit: θ = 0.5, weights 1 and 1 from the two hidden units. Hidden unit for (x1 AND NOT x2): θ = 1, weight 1.5 from x1 and -1 from x2. Hidden unit for (NOT x1 AND x2): θ = 1, weight -1 from x1 and 1.5 from x2.)
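To make the construction concrete, here is a short check (our own sketch, using the thresholds and weights read off the two slides above) that the two-layer network computes XOR:

```python
def step(net: float, theta: float) -> int:
    """Threshold unit: fire iff the weighted sum reaches the threshold."""
    return 1 if net >= theta else 0

def xor_net(x1: int, x2: int) -> int:
    h1 = step(1.5 * x1 - 1.0 * x2, theta=1.0)    # computes x1 AND NOT x2
    h2 = step(-1.0 * x1 + 1.5 * x2, theta=1.0)   # computes NOT x1 AND x2
    return step(1.0 * h1 + 1.0 * h2, theta=0.5)  # OR of the two hidden outputs

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))   # prints the XOR truth table
```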

Some Terminology. A multilayer feedforward neural network has an input layer, an output layer, and one or more hidden layers (which assist the computation). Output units and hidden units are called computation units.

Training of the Multilayer Perceptron (MLP). Question: how do we find weights for the hidden layers when no target output is available for them? This is the credit assignment problem, to be solved by “Gradient Descent”.

Gradient Descent Technique. Let E be the error at the output layer: E = (1/2) ∑j=1..p ∑i=1..n (ti - oi)^2, summed over the p patterns and the n output neurons, where ti = target output and oi = observed output; i is the index going over the n neurons in the outermost layer and j is the index going over the p patterns (1 to p). E.g. for XOR, p = 4 and n = 1.
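As a tiny illustration (ours), the error for the XOR training set (p = 4, n = 1) under this definition, with made-up observed outputs of an untrained network:

```python
# Targets for XOR and hypothetical observed outputs of an untrained network.
targets  = [0.0, 1.0, 1.0, 0.0]     # t for the 4 patterns
observed = [0.5, 0.5, 0.5, 0.5]     # o, e.g. a sigmoid stuck near 0.5

E = 0.5 * sum((t - o) ** 2 for t, o in zip(targets, observed))
print(E)   # 0.5 * 4 * 0.25 = 0.5
```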

Weights in a feedforward NN. wmn is the weight of the connection from the nth neuron to the mth neuron. The E-versus-weights surface is a complex surface in the space defined by the weights wij. The negative gradient -∂E/∂wmn gives the direction in which a movement of the operating point in the wmn co-ordinate space will result in the maximum decrease in error, so Δwmn ∝ -∂E/∂wmn.

Sigmoid neurons. Gradient descent needs a derivative computation, which is not possible with the perceptron because of the discontinuous step function it uses. Sigmoid neurons, with an easy-to-compute derivative, are used instead: o = 1 / (1 + e^-net). The computing power comes from the non-linearity of the sigmoid function.

Derivative of the Sigmoid function. For o = 1 / (1 + e^-net):
do/dnet = e^-net / (1 + e^-net)^2 = o (1 - o)
so the derivative can be computed from the output o itself.
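A quick numerical check (our own sketch) that the derivative equals o(1 - o), using a central finite-difference approximation:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x: float) -> float:
    o = sigmoid(x)
    return o * (1.0 - o)            # derivative expressed through the output itself

h = 1e-6
for x in (-2.0, 0.0, 1.5):
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    assert abs(numeric - sigmoid_derivative(x)) < 1e-6
    print(x, sigmoid_derivative(x))
```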

Training algorithm. Initialize weights to random values. For input x = <xn, xn-1, …, x0>, modify the weights as wi ← wi + Δwi (computed on the next slide), where t = target output, o = observed output, and E = (1/2)(t - o)^2. Iterate until E < ε (a chosen threshold).

Calculation of ∆wi. Using gradient descent with learning rate η, Δwi = -η ∂E/∂wi. With E = (1/2)(t - o)^2, o = 1 / (1 + e^-net) and net = ∑ wi xi:
∂E/∂wi = -(t - o) ∂o/∂wi = -(t - o) o (1 - o) xi
so Δwi = η (t - o) o (1 - o) xi.
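Putting the pieces together, a minimal sketch (ours, not the lecture's code) that trains a single sigmoid neuron with this Δwi rule on a linearly separable function such as AND; the learning rate and stopping threshold are arbitrary choices:

```python
import math, random

random.seed(0)

def sigmoid(net: float) -> float:
    return 1.0 / (1.0 + math.exp(-net))

# Training data for AND; the leading 1 is a bias input that absorbs the threshold.
patterns = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]

w = [random.uniform(-0.5, 0.5) for _ in range(3)]   # random initial weights
eta = 0.5                                            # learning rate

for epoch in range(10000):
    E = 0.0
    for x, t in patterns:
        o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        E += 0.5 * (t - o) ** 2
        for i, xi in enumerate(x):
            w[i] += eta * (t - o) * o * (1 - o) * xi   # Delta w_i = eta (t - o) o (1 - o) x_i
    if E < 0.01:                                       # iterate until E < epsilon
        break

# The observed outputs approach the targets 0, 0, 0, 1.
print(E, [round(sigmoid(sum(wi * xi for wi, xi in zip(w, x))), 2) for x, _ in patterns])
```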

Observations. Does the training technique support our intuition? The larger xi is, the larger ∆wi is: the error burden is borne by the weight values corresponding to large input values.