CS344 : Introduction to Artificial Intelligence

CS344 : Introduction to Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 30: Computing power of Perceptron; Perceptron Training

Perceptron recap

The Perceptron Model A perceptron is a computing element with input lines having associated weights and the cell having a threshold value. The perceptron model is motivated by the biological neuron. [Figure: inputs x1, …, xn feed in through weights w1, …, wn; the cell has threshold θ and output y]

Computation of Boolean functions AND of 2 inputs
x1   x2   y
0    0    0
0    1    0
1    0    0
1    1    1
The parameter values (weights & threshold) need to be found.
[Figure: two-input perceptron with weights w1, w2, threshold θ and output y]

Computing parameter values
w1 * 0 + w2 * 0 <= θ  ⇒  θ >= 0;        since y = 0
w1 * 0 + w2 * 1 <= θ  ⇒  w2 <= θ;       since y = 0
w1 * 1 + w2 * 0 <= θ  ⇒  w1 <= θ;       since y = 0
w1 * 1 + w2 * 1 >  θ  ⇒  w1 + w2 > θ;   since y = 1
w1 = w2 = θ = 0.5 satisfies these inequalities and gives parameter values that compute the AND function.
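To make this concrete, here is a minimal Python sketch (mine, not from the slides; the helper name perceptron is my choice) using the convention implied by the inequalities above, y = 1 iff w1x1 + w2x2 > θ, to check that the chosen parameters compute AND.

```python
# A minimal sketch (not from the slides): a two-input threshold unit with the
# convention implied by the inequalities above, y = 1 iff w1*x1 + w2*x2 > theta.

def perceptron(x, w, theta):
    """Fire (output 1) exactly when the weighted sum exceeds the threshold."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > theta else 0

# Verify that w1 = w2 = theta = 0.5 computes the AND function.
w, theta = (0.5, 0.5), 0.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, w, theta))   # prints 0, 0, 0, 1 -- the AND column
```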

Threshold functions
n    #Boolean Functions (2^(2^n))    #Threshold Functions (≤ 2^(n^2))
1    4                               4
2    16                              14
3    256                             128
4    64K                             1008
Functions computable by perceptrons = threshold functions.
#TF becomes negligibly small compared to #BF for larger values of n.
For n = 2, all functions except XOR and XNOR are computable.
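The n = 2 row can be checked by brute force. The sketch below (my own check, not from the slides; the grid of candidate weights and thresholds is an arbitrary choice that happens to cover all realizable cases) enumerates the 16 two-input Boolean functions, tests which can be written as y = 1 iff w1x1 + w2x2 > θ, reports 14, and confirms XOR and XNOR are the two missing ones.

```python
# Brute-force check (mine, not from the slides): which 2-input Boolean
# functions are threshold functions, i.e. realizable as y = 1 iff
# w1*x1 + w2*x2 > theta for some parameters?
from itertools import product

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
weights = [-2, -1, 0, 1, 2]                         # small search grid
thresholds = [t / 2 for t in range(-5, 6)]          # -2.5, -2.0, ..., 2.5

def realizable(truth):
    """Is this truth table (tuple of 4 outputs) a threshold function?"""
    for w1, w2, theta in product(weights, weights, thresholds):
        outputs = tuple(1 if w1 * x1 + w2 * x2 > theta else 0
                        for x1, x2 in inputs)
        if outputs == truth:
            return True
    return False

threshold_fns = [f for f in product((0, 1), repeat=4) if realizable(f)]
print(len(threshold_fns))                           # 14 of the 16 functions
print((0, 1, 1, 0) in threshold_fns)                # XOR  -> False
print((1, 0, 0, 1) in threshold_fns)                # XNOR -> False
```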

Concept of Hyper-planes ∑ wixi = θ defines a linear surface in the (W, θ) space, where W = <w1, w2, w3, …, wn> is an n-dimensional vector. A point in this (W, θ) space defines a perceptron. [Figure: perceptron with inputs x1, …, xn, weights w1, …, wn, threshold θ and output y]

Simple perceptron contd. The four Boolean functions of the single input x:
x    f1 (0-function)    f2 (Identity)    f3 (Complement)    f4 (True/1-function)
0    0                  0                1                  1
1    0                  1                0                  1
Conditions on the parameters:  f1: θ ≥ 0, w ≤ θ;   f2: θ ≥ 0, w > θ;   f3: θ < 0, w ≤ θ;   f4: θ < 0, w > θ.

Counting the number of functions for the simplest perceptron For the simplest perceptron (single input x, weight w), the equation is w.x = θ. Substituting x = 0 and x = 1, we get the lines θ = 0 and w = θ in the (w, θ) plane. These two lines intersect to form four regions, which correspond to the four functions. [Figure: the (w, θ) plane divided into regions R1–R4 by the lines θ = 0 and w = θ]
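A quick numerical check of this picture (mine, not from the slides): pick one (w, θ) sample point inside each of the four regions and see which one-input function the perceptron computes. The sample points are arbitrary representatives of the regions.

```python
# Pick one (w, theta) point inside each of the four regions cut out by the
# lines theta = 0 and w = theta, and print the 1-input function it computes.

def f(x, w, theta):
    return 1 if w * x > theta else 0

samples = {
    "theta >= 0, w <= theta": (0, 1),    # expect the 0-function:        (0, 0)
    "theta >= 0, w >  theta": (2, 1),    # expect the identity function: (0, 1)
    "theta <  0, w <= theta": (-2, -1),  # expect the complement:        (1, 0)
    "theta <  0, w >  theta": (0, -1),   # expect the constant-1:        (1, 1)
}
for region, (w, theta) in samples.items():
    print(region, "->", (f(0, w, theta), f(1, w, theta)))
```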

Fundamental Observation The number of TFs computable by a perceptron is equal to the number of regions produced by the 2^n hyper-planes obtained by plugging the values <x1, x2, x3, …, xn> into the equation ∑i=1..n wixi = θ.

For the 2-input perceptron Basic equation: w1x1 + w2x2 - θ = 0. Put in the 4 values of <x1, x2>: <0,0>, <0,1>, <1,0>, <1,1>. This defines 4 planes through the origin of the <w1, w2, θ> space.

No. of regions by these 4 planes The first plane P1 produces 2 regions in the space. The second plane P2 is cut by P1 in a line l12, which produces 2 regions on P2, giving 2+2 = 4 regions. P3 is cut by P1 and P2 to produce two lines l13 and l23 through the origin; these lines produce 4 regions on P3.

No. of regions (contd.) 4+4 = 8 regions when P3 is introduced. P4 is cut by P1, P2, P3 to produce 3 lines passing through the origin, producing 6 regions on P4. Total no. of regions = 8+6 = 14. This is the no. of Boolean functions that a 2-input perceptron is able to compute.
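The count of 14 can also be verified numerically. The sketch below (my own check, not from the slides) samples random points of the <w1, w2, θ> space and records the pattern of signs they produce against the four planes; each open region contributes exactly one distinct sign pattern, so the number of patterns equals the number of regions.

```python
# Count the regions cut out by the four planes w1*x1 + w2*x2 - theta = 0
# (one plane per input <x1, x2>) by counting distinct sign patterns of
# randomly sampled points in (w1, w2, theta) space.
import random

planes = [(0, 0), (0, 1), (1, 0), (1, 1)]            # the four inputs <x1, x2>
patterns = set()
random.seed(0)
for _ in range(200_000):
    w1, w2, theta = (random.uniform(-1, 1) for _ in range(3))
    patterns.add(tuple(1 if w1 * x1 + w2 * x2 - theta > 0 else -1
                       for x1, x2 in planes))
print(len(patterns))                                  # 14, matching the slide's count
```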

Perceptron Training Algorithm

Perceptron Training Algorithm (PTA) Preprocessing: 1. The computation law is modified to y = 1 if ∑wixi > θ, y = 0 if ∑wixi < θ. [Figure: the unit whose 0-output test is ∑wixi ≤ θ is replaced by one with the strict test ∑wixi < θ]

PTA – preprocessing cont… 2. Absorb θ as a weight. 3. Negate all the zero-class examples. [Figure: θ is absorbed as an extra weight w0 = θ on a constant input x0 = -1, so the unit now compares ∑i=0..n wixi with 0]

Example to demonstrate preprocessing OR perceptron: 1-class: <1,1>, <1,0>, <0,1>; 0-class: <0,0>. Augmented x vectors: 1-class: <-1,1,1>, <-1,1,0>, <-1,0,1>; 0-class: <-1,0,0>. Negate 0-class: <1,0,0>.

Example to demonstrate preprocessing cont.. Now the vectors are:
      x0   x1   x2
X1    -1    0    1
X2    -1    1    0
X3    -1    1    1
X4     1    0    0
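A short sketch of the two preprocessing steps (mine; the function name preprocess and the ordering of the 1-class vectors are my choices, picked to match the table) that reproduces X1..X4 from the raw OR data:

```python
# Preprocessing: augment every example with x0 = -1 to absorb theta, then
# negate the 0-class examples so that every vector must satisfy w.x > 0.

def preprocess(one_class, zero_class):
    """Return a single list of vectors that training must drive to w.x > 0."""
    augmented_one  = [(-1, *x) for x in one_class]                    # step 2
    augmented_zero = [(-1, *x) for x in zero_class]
    negated_zero   = [tuple(-v for v in x) for x in augmented_zero]   # step 3
    return augmented_one + negated_zero

# OR data from the slides; 1-class listed in the order that matches X1..X3.
X = preprocess(one_class=[(0, 1), (1, 0), (1, 1)], zero_class=[(0, 0)])
print(X)   # [(-1, 0, 1), (-1, 1, 0), (-1, 1, 1), (1, 0, 0)]  i.e. X1..X4
```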

Perceptron Training Algorithm
1. Start with a random value of w, e.g., <0,0,0,…>
2. Test for w.xi > 0. If the test succeeds for every i = 1, 2, …, n, then return w.
3. Otherwise, modify w: wnext = wprev + xfail, and go back to step 2.

Tracing PTA on OR-example
w=<0,0,0>     wx1 fails    w=<-1,0,1>
w=<-1,0,1>    wx4 fails    w=<0,0,1>
w=<0,0,1>     wx2 fails    w=<-1,1,1>
w=<-1,1,1>    wx4 fails    w=<0,1,1>
w=<0,1,1>     wx4 fails    w=<1,1,1>
w=<1,1,1>     wx1 fails    w=<0,1,2>
w=<0,1,2>     wx4 fails    w=<1,1,2>
w=<1,1,2>     wx2 fails    w=<0,2,2>
w=<0,2,2>     wx4 fails    w=<1,2,2>
w=<1,2,2>     success
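A minimal implementation of PTA (mine, not from the slides) that, run on X1..X4 with the testing order restarting from X1 after every update, prints exactly the weight sequence above and converges to <1,2,2>:

```python
# PTA sketch: repeatedly find the first vector that fails the test w.x > 0,
# add it to w, and stop when every vector passes.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def pta(vectors, w=(0, 0, 0)):
    while True:
        failed = next((x for x in vectors if dot(w, x) <= 0), None)
        if failed is None:
            return w                                   # all tests w.x > 0 pass
        w = tuple(wi + xi for wi, xi in zip(w, failed))
        print(w)                                       # trace each update

X = [(-1, 0, 1), (-1, 1, 0), (-1, 1, 1), (1, 0, 0)]    # preprocessed X1..X4
print(pta(X))                                          # converges to (1, 2, 2)
```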

Proof of Convergence of PTA Statement: Whatever be the initial choice of weights and whatever be the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.

Proof of Convergence of PTA Suppose wn is the weight vector at the nth step of the algorithm. At the beginning, the weight vector is w0. Go from wi to wi+1 when a vector Xj fails the test wi.Xj > 0, and update wi as wi+1 = wi + Xj. Since the Xj's form a linearly separable function, ∃ w* s.t. w*.Xj > 0 ∀ j.

Proof of Convergence of PTA
Consider the expression G(wn) = (wn . w*) / |wn|, where wn = weight at the nth iteration.
G(wn) = (|wn| . |w*| . cos θ) / |wn|, where θ = angle between wn and w*
      = |w*| . cos θ
      ≤ |w*|      (as -1 ≤ cos θ ≤ 1)
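As a numeric illustration of this bound (my own check, not part of the proof), one can evaluate G(wn) along the weight vectors of the OR trace, taking w* = <1,2,2> (the final weights, which satisfy w*.Xj > 0 for all j); every value stays below |w*| = 3.

```python
# Evaluate G(w_n) = (w_n . w*) / |w_n| along the OR trace, with w* = (1, 2, 2).
# w0 = (0, 0, 0) is skipped because |w0| = 0.
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

w_star = (1, 2, 2)
trace = [(-1, 0, 1), (0, 0, 1), (-1, 1, 1), (0, 1, 1), (1, 1, 1),
         (0, 1, 2), (1, 1, 2), (0, 2, 2), (1, 2, 2)]
for w in trace:
    g = dot(w, w_star) / math.sqrt(dot(w, w))
    print(w, round(g, 3))          # every value is <= |w*| = 3
```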

Behavior of Numerator of G
wn . w* = (wn-1 + Xn-1^fail) . w*
        = wn-1 . w* + Xn-1^fail . w*
        = (wn-2 + Xn-2^fail) . w* + Xn-1^fail . w*
        …
        = w0 . w* + (X0^fail + X1^fail + … + Xn-1^fail) . w*
w*.Xi^fail is always positive: note carefully.
Suppose |Xj| ≥ δ, where δ is the minimum magnitude.
Then, Num of G ≥ |w0 . w*| + n . δ . |w*|
So, the numerator of G grows with n.

Behavior of Denominator of G
|wn| = √(wn . wn)
     = √((wn-1 + Xn-1^fail)^2)
     = √((wn-1)^2 + 2 . wn-1 . Xn-1^fail + (Xn-1^fail)^2)
     ≤ √((wn-1)^2 + (Xn-1^fail)^2)          (as wn-1 . Xn-1^fail ≤ 0)
     ≤ √((w0)^2 + (X0^fail)^2 + (X1^fail)^2 + … + (Xn-1^fail)^2)
|Xj| ≤ δmax (maximum magnitude)
So, Denom ≤ √((w0)^2 + n . δmax^2)

Some Observations Numerator of G grows as n. Denominator of G grows as √n. ⇒ Numerator grows faster than denominator. If PTA does not terminate, G(wn) values will become unbounded.

Some Observations contd. But, since |G(wn)| ≤ |w*|, which is finite, this is impossible! Hence, PTA has to converge. The proof is due to Marvin Minsky.

Convergence of PTA Whatever be the initial choice of weights and whatever be the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.