CS344: Introduction to Artificial Intelligence (associated lab: CS386) Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 29: Perceptron training and convergence

Presentation transcript:

CS344: Introduction to Artificial Intelligence (associated lab: CS386) Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 29: Perceptron training and convergence, 22nd March, 2011

The Perceptron Model

A perceptron is a computing element with input lines having associated weights and the cell having a threshold value. The perceptron model is motivated by the biological neuron.

[Diagram: inputs x_1, …, x_n on lines with weights w_1, …, w_n feed a cell with threshold θ; output = y]

Step function / Threshold function

y = 1 for Σ w_i x_i ≥ θ
  = 0 otherwise

[Plot: y against Σ w_i x_i; a step from 0 to 1 at θ]
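This computation takes only a few lines of Python. A minimal sketch; the AND weights and threshold in the usage example are an illustrative choice, not taken from the lecture:

def perceptron_output(weights, inputs, theta):
    """Return 1 if the weighted sum reaches the threshold, else 0."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0

# Illustrative example (assumed, not from the slides): with w1 = w2 = 1
# and theta = 1.5, the unit computes AND.
print(perceptron_output([1, 1], [1, 1], 1.5))  # 1
print(perceptron_output([1, 1], [1, 0], 1.5))  # 0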

Features of Perceptron

- Input-output behaviour is discontinuous, and the derivative does not exist at Σ w_i x_i = θ.
- Σ w_i x_i − θ is the net input, denoted as net.
- Referred to as a linear threshold element: linearity because x appears with power 1.
- y = f(net): the relation between y and net is non-linear.

Perceptron Training Algorithm (PTA)

Preprocessing:

1. The computation law is modified to
   y = 1 if Σ w_i x_i > θ
   y = 0 if Σ w_i x_i < θ

[Diagram: the unit with test Σ w_i x_i ≥ θ (weights w_1 … w_n, inputs x_1 … x_n) redrawn with the strict test Σ w_i x_i > θ]

PTA – preprocessing cont.

2. Absorb θ as a weight: add a new input line with fixed input x_0 = −1 and weight w_0 = θ, so that the test Σ_{i=1..n} w_i x_i > θ becomes Σ_{i=0..n} w_i x_i > 0.
3. Negate all the zero-class examples.

[Diagram: the unit redrawn with the extra input x_0 = −1 carrying weight w_0 = θ alongside w_1 … w_n on x_1 … x_n; the threshold is now 0]
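A minimal Python sketch of these two preprocessing steps, assuming examples are given as tuples (the function name and data representation are illustrative, not from the slides):

def preprocess(one_class, zero_class):
    # Step 2: absorb theta by prepending the fixed input x0 = -1.
    augmented = [(-1,) + x for x in one_class + zero_class]
    # Step 3: negate every zero-class example (they sit at the tail).
    k = len(one_class)
    return augmented[:k] + [tuple(-v for v in x) for x in augmented[k:]]

After this, a single condition remains: find w with w · v > 0 for every returned vector v.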

Example to demonstrate preprocessing: OR perceptron

1-class: <0,1>, <1,0>, <1,1>;  0-class: <0,0>

Augmented x vectors (x_0 = −1 prepended):
1-class: <−1,0,1>, <−1,1,0>, <−1,1,1>;  0-class: <−1,0,0>

Negate 0-class: <1,0,0>

Example to demonstrate preprocessing cont.

Now the vectors are:

      x_0   x_1   x_2
X_1:  −1    0     1
X_2:  −1    1     0
X_3:  −1    1     1
X_4:   1    0     0
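As a quick sanity check that these preprocessed OR vectors are linearly separable, here is a hand-picked weight vector (w_0, w_1, w_2) = (1, 2, 2), i.e. θ = 1; the choice is an assumption for illustration, and any w with all dot products positive would do:

or_vectors = [
    (-1, 0, 1),   # X1: 1-class (0,1), augmented with x0 = -1
    (-1, 1, 0),   # X2: 1-class (1,0)
    (-1, 1, 1),   # X3: 1-class (1,1)
    ( 1, 0, 0),   # X4: 0-class (0,0), augmented and negated
]
w = (1, 2, 2)     # hand-picked (w0 = theta, w1, w2), illustrative only
for v in or_vectors:
    print(v, sum(wi * vi for wi, vi in zip(w, v)))  # prints 1, 1, 3, 1 -- all > 0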

Perceptron Training Algorithm

1. Start with a random value of w.
2. Test for w · x_i > 0. If the test succeeds for i = 1, 2, …, n then return w.
3. Modify w: w_next = w_prev + x_fail, and go back to step 2.
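A minimal sketch of this loop in Python, assuming the vectors are already preprocessed; the max_steps cap is an added guard against non-separable input (the slides' algorithm simply loops until success):

def pta(vectors, w, max_steps=100_000):
    w = list(w)
    for _ in range(max_steps):
        failed = [v for v in vectors
                  if sum(wi * vi for wi, vi in zip(w, v)) <= 0]
        if not failed:
            return w                      # step 2: all tests w . v > 0 pass
        # Step 3: w_next = w_prev + x_fail (take the first failing vector).
        w = [wi + vi for wi, vi in zip(w, failed[0])]
    raise ValueError("no convergence: vectors may not be linearly separable")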

PTA on NAND

NAND:

x_1  x_2  y
0    0    1
0    1    1
1    0    1
1    1    0

[Diagram: a unit computing y with weights w_1, w_2 and threshold θ on inputs x_1, x_2, converted to a unit with weights w_0 = θ, w_1, w_2 on inputs x_0 = −1, x_1, x_2]

Preprocessing NAND

Augmented, with the NAND 0-class example negated:

      x_0   x_1   x_2   y
V_0:  −1    0     0     1
V_1:  −1    0     1     1
V_2:  −1    1     0     1
V_3:   1   −1    −1     0

Vectors for which W = <w_0, w_1, w_2> has to be found such that W · V_i > 0.

PTA Algo steps

Algorithm: initialize W and keep adding the failed vectors until W · V_i > 0 is true for every i.

Step 0: W = W_0 (a random initial value)
W_1 = W_0 + V_0   {V_0 fails}
W_2 = W_1 + V_3   {V_3 fails}
W_3 = W_2 + V_0   {V_0 fails}
W_4 = W_3 + V_1   {V_1 fails}

Trying convergence

W_5 = W_4 + V_3   {V_3 fails}
W_6 = W_5 + V_1   {V_1 fails}
W_7 = W_6 + V_0   {V_0 fails}
W_8 = W_7 + V_3   {V_3 fails}
W_9 = W_8 + V_2   {V_2 fails}

Trying convergence

W_10 = W_9 + V_3    {V_3 fails}
W_11 = W_10 + V_1   {V_1 fails}
W_12 = W_11 + V_3   {V_3 fails}
W_13 = W_12 + V_1   {V_1 fails}
W_14 = W_13 + V_2   {V_2 fails}

Converged!

W_15 = W_14 + V_3   {V_3 fails}
W_16 = W_15 + V_2   {V_2 fails}
W_17 = W_16 + V_3   {V_3 fails}
W_18 = W_17 + V_1   {V_1 fails}

W_18 passes the test on all four vectors: w_2 = −3, w_1 = −2, w_0 = θ = −4
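The converged weights can be checked directly: with (w_0, w_1, w_2) = (−4, −2, −3), every preprocessed NAND vector gives a positive dot product (the V_1/V_2 labelling follows the table above):

nand_vectors = [
    (-1, 0, 0),    # V0: 1-class (0,0)
    (-1, 0, 1),    # V1: 1-class (0,1)
    (-1, 1, 0),    # V2: 1-class (1,0)
    ( 1, -1, -1),  # V3: 0-class (1,1), negated
]
w = (-4, -2, -3)   # (w0 = theta, w1, w2) from the converged run
for v in nand_vectors:
    print(v, sum(wi * vi for wi, vi in zip(w, v)))  # prints 4, 1, 2, 1 -- all > 0

# Equivalently, the trained perceptron computes y = 1 iff -2*x1 - 3*x2 > -4,
# which is exactly the NAND truth table.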

PTA convergence

Statement of Convergence of PTA

Statement: Whatever be the initial choice of weights and whatever be the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.

Proof of Convergence of PTA

Suppose w_n is the weight vector at the n-th step of the algorithm; at the beginning, the weight vector is w_0. We go from w_i to w_{i+1} when a vector X_j fails the test w_i · X_j > 0, and update w_i as w_{i+1} = w_i + X_j. Since the X_j come from a linearly separable function, ∃ w* s.t. w* · X_j > 0 ∀ j.

Proof of Convergence of PTA (contd.)

Consider the expression

G(w_n) = (w_n · w*) / |w_n|,   where w_n = weight at n-th iteration
       = (|w_n| · |w*| · cos β) / |w_n|,   where β = angle between w_n and w*
       = |w*| · cos β
       ≤ |w*|   (as −1 ≤ cos β ≤ 1)

Behavior of Numerator of G

w_n · w* = (w_{n-1} + X_{n-1}^fail) · w*
         = w_{n-1} · w* + X_{n-1}^fail · w*
         = (w_{n-2} + X_{n-2}^fail) · w* + X_{n-1}^fail · w*
         …
         = w_0 · w* + (X_0^fail + X_1^fail + … + X_{n-1}^fail) · w*

w* · X_i^fail is always positive: note carefully. Let δ be the minimum of w* · X_j over the (finitely many) training vectors; δ > 0. Then

numerator of G ≥ w_0 · w* + n·δ

So the numerator of G grows with n.

Behavior of Denominator of G

|w_n| = √(w_n · w_n)
      = √((w_{n-1} + X_{n-1}^fail)²)
      = √((w_{n-1})² + 2·w_{n-1} · X_{n-1}^fail + (X_{n-1}^fail)²)
      ≤ √((w_{n-1})² + (X_{n-1}^fail)²)   (as w_{n-1} · X_{n-1}^fail ≤ 0: the vector failed the test)
      ≤ √((w_0)² + (X_0^fail)² + (X_1^fail)² + … + (X_{n-1}^fail)²)

Suppose |X_j| ≤ ε, where ε is the maximum magnitude. Then

denominator of G ≤ √((w_0)² + n·ε²)

Some Observations

- The numerator of G grows as n.
- The denominator of G grows as √n.
- So the numerator grows faster than the denominator.
- If PTA does not terminate, the G(w_n) values will become unbounded.
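Combining the two bounds makes this explicit:

G(w_n) = (w_n · w*) / |w_n| ≥ (w_0 · w* + n·δ) / √((w_0)² + n·ε²),

and the right-hand side grows like (δ/ε)·√n, which is unbounded as n → ∞.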

Some Observations contd.

But G(w_n) ≤ |w*|, which is finite, so this is impossible! Hence, PTA has to converge. The proof is due to Marvin Minsky.

Convergence of PTA proved

Whatever be the initial choice of weights and whatever be the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.