CS623: Introduction to Computing with Neural Nets (lecture-2)


CS623: Introduction to Computing with Neural Nets (lecture-2)
Pushpak Bhattacharyya
Computer Science and Engineering Department, IIT Bombay

Recapitulation

The human brain
Insula: intuition and empathy
Pre-frontal cortex: self-control
Amygdala: aggression
Hippocampus: emotional memory
Anterior cingulate cortex: anxiety
Hypothalamus: control of hormones
Pituitary gland: maternal instincts

Functional Map of the Brain

The Perceptron Model
A perceptron is a computing element with input lines having associated weights and the cell having a threshold value. The perceptron model is motivated by the biological neuron.
[Figure: inputs x1 … xn with associated weights w1 … wn feeding a cell with threshold θ, producing output y]

Step function / Threshold function:
y = 1 if Σ wi xi ≥ θ
y = 0 otherwise
[Figure: step function, y jumps from 0 to 1 at Σ wi xi = θ]
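
The following is a minimal Python sketch (not part of the original slides) of the threshold computation above; the function name and the AND-gate weights are illustrative choices.

```python
def perceptron_output(weights, inputs, theta):
    """Threshold / step unit: output 1 iff the weighted sum reaches theta."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= theta else 0

# Example: an AND gate realised as a perceptron (w1 = w2 = 1, theta = 1.5).
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron_output([1, 1], [x1, x2], theta=1.5))
```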

Perceptron Training Algorithm (PTA)
Preprocessing:
1. The computation law is modified to
   y = 1 if Σ wi xi > θ
   y = 0 if Σ wi xi < θ
[Figure: perceptron diagrams with inputs x1 … xn, weights w1 … wn and threshold θ, before and after the change to strict inequalities]

PTA – preprocessing contd.
2. Absorb θ as a weight: add an extra input x0 = -1 with weight w0 = θ.
3. Negate all the zero-class examples.
[Figure: perceptron with the extra input x0 = -1 carrying weight w0 = θ alongside x1 … xn]

Example to demonstrate preprocessing: OR perceptron
1-class: <1,1>, <1,0>, <0,1>
0-class: <0,0>
Augmented x vectors:
1-class: <-1,1,1>, <-1,1,0>, <-1,0,1>
0-class: <-1,0,0>
Negate 0-class: <1,0,0>

Example to demonstrate preprocessing contd.
Now the vectors are:
      x0   x1   x2
X1    -1    0    1
X2    -1    1    0
X3    -1    1    1
X4     1    0    0
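
Below is a small Python sketch (not from the slides) of the two preprocessing steps, augmenting each input with x0 = -1 to absorb θ and negating the 0-class examples; the function and variable names are illustrative.

```python
def preprocess(one_class, zero_class):
    """Augment each vector with x0 = -1 (absorbing theta), then negate 0-class vectors."""
    augmented = [[-1] + list(x) for x in one_class + zero_class]
    k = len(one_class)                     # first k rows came from the 1-class
    return augmented[:k] + [[-v for v in x] for x in augmented[k:]]

# OR example from the slides:
X = preprocess(one_class=[(1, 1), (1, 0), (0, 1)], zero_class=[(0, 0)])
print(X)   # [[-1, 1, 1], [-1, 1, 0], [-1, 0, 1], [1, 0, 0]]
```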

Perceptron Training Algorithm
1. Start with a random value of w, e.g. <0,0,0,…>.
2. Test for w·xi > 0. If the test succeeds for every i = 1, 2, …, n, then return w.
3. Otherwise modify w: wnext = wprev + xfail, where xfail is a vector that failed the test, and go back to step 2.
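
Here is a compact Python sketch of the algorithm above (an illustrative implementation, not the course's code); it assumes the vectors have already been augmented and negated as in the preprocessing steps.

```python
def pta(X, w=None, max_iters=10000):
    """Perceptron Training Algorithm on preprocessed (augmented, negated) vectors X:
    repeatedly find a vector failing the test w.x > 0 and add it to w."""
    n = len(X[0])
    w = list(w) if w is not None else [0] * n
    for _ in range(max_iters):
        failed = next((x for x in X if sum(wi * xi for wi, xi in zip(w, x)) <= 0), None)
        if failed is None:
            return w                                   # every vector passes the test
        w = [wi + xi for wi, xi in zip(w, failed)]     # wnext = wprev + xfail
    raise RuntimeError("no convergence: data may not be linearly separable")

# OR example, starting from <0,0,0>; this scan order ends at w = [1, 2, 2].
print(pta([[-1, 0, 1], [-1, 1, 0], [-1, 1, 1], [1, 0, 0]]))
```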

Tracing PTA on OR-example
w=<0,0,0>    wx1 fails
w=<-1,0,1>   wx4 fails
w=<0,0,1>    wx2 fails
w=<-1,1,1>   wx4 fails
w=<0,1,1>    wx4 fails
w=<1,1,1>    wx1 fails
w=<0,1,2>    wx4 fails
w=<1,1,2>    wx2 fails
w=<0,2,2>    wx4 fails
w=<1,2,2>    success

Theorems on PTA
1. The process will terminate.
2. The order in which the xi are selected for testing, and hence the sequence of wnext updates, does not matter: the algorithm terminates regardless.

End of recap

Proof of Convergence of PTA
Statement: Whatever the initial choice of weights and whatever the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.

Proof of Convergence of PTA
Suppose wn is the weight vector at the nth step of the algorithm; at the beginning, the weight vector is w0.
Go from wi to wi+1 when a vector Xj fails the test wi · Xj > 0, updating wi as wi+1 = wi + Xj.
Since the Xj come from a linearly separable function, there exists a w* such that w* · Xj > 0 for all j.

Proof of Convergence of PTA
Consider the expression
G(wn) = (wn · w*) / |wn|               where wn = weight at the nth iteration
      = (|wn| · |w*| · cos δ) / |wn|   where δ = angle between wn and w*
      = |w*| · cos δ
Hence G(wn) ≤ |w*|   (since -1 ≤ cos δ ≤ 1).

Behavior of the Numerator of G
wn · w* = (wn-1 + Xfail(n-1)) · w*      where Xfail(i) is the vector that failed the test at step i
        = wn-1 · w* + Xfail(n-1) · w*
        = (wn-2 + Xfail(n-2)) · w* + Xfail(n-1) · w*
        = …
        = w0 · w* + (Xfail(0) + Xfail(1) + … + Xfail(n-1)) · w*
Note carefully: w* · Xfail(i) is always positive.
Suppose |Xj| ≥ ε, where ε is the minimum magnitude.
Then the numerator of G ≥ |w0 · w*| + n · ε · |w*|.
So the numerator of G grows with n.

Behavior of the Denominator of G
|wn| = √(wn · wn)
     = √((wn-1 + Xfail(n-1)) · (wn-1 + Xfail(n-1)))
     = √(|wn-1|² + 2 · wn-1 · Xfail(n-1) + |Xfail(n-1)|²)
     ≤ √(|wn-1|² + |Xfail(n-1)|²)        (since wn-1 · Xfail(n-1) ≤ 0)
     ≤ √(|w0|² + |Xfail(0)|² + |Xfail(1)|² + … + |Xfail(n-1)|²)
Suppose |Xj| ≤ μ, where μ is the maximum magnitude.
Then the denominator ≤ √(|w0|² + n · μ²).

Some Observations
- The numerator of G grows as n.
- The denominator of G grows as √n.
=> The numerator grows faster than the denominator.
If PTA does not terminate, the G(wn) values will become unbounded.
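
Putting the two bounds together (a worked restatement of the argument above; ε and μ stand for the minimum and maximum vector magnitudes used in the previous two slides, and δ for the angle between wn and w*):

```latex
\[
G(w_n) \;=\; \frac{w_n \cdot w^*}{|w_n|}
\;\ge\; \frac{|w_0 \cdot w^*| \,+\, n\,\varepsilon\,|w^*|}{\sqrt{|w_0|^2 + n\,\mu^2}}
\;\sim\; \sqrt{n}\;\frac{\varepsilon\,|w^*|}{\mu} \;\longrightarrow\; \infty,
\quad\text{yet}\quad
G(w_n) \;=\; |w^*|\cos\delta \;\le\; |w^*| .
\]
```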

Some Observations contd. But, as |G(wn)| ≤ |w*| which is finite, this is impossible! Hence, PTA has to converge. Proof is due to Marvin Minsky.

Convergence of PTA
Whatever the initial choice of weights and whatever the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.

Study of Linear Separability
W · Xj = 0 defines a hyperplane in the (n+1)-dimensional space.
=> The W vector is perpendicular to the Xj vectors lying on that hyperplane.
[Figure: perceptron with inputs x1 … xn, weights w1 … wn, threshold θ and output y]

Linear Separability
Positive set: w · Xj > 0 for all j ≤ k
Negative set: w · Xj < 0 for all j > k
[Figure: a separating hyperplane with the positive points X1 … Xk (+) on one side and the negative points Xk+1 … Xm (-) on the other]

Test for Linear Separability (LS)
Theorem: A function is linearly separable iff the vectors corresponding to the function do not have a Positive Linear Combination (PLC).
PLC is both a necessary and sufficient condition.
X1, X2, …, Xm : vectors of the function
Y1, Y2, …, Ym : the augmented, negated set
Prepending -1 to each vector Xi, and additionally negating the result if Xi is from the 0-class, gives Yi.

Example (1) - XNOR
The set {Yi} has a PLC if ∑ Pi Yi = 0, 1 ≤ i ≤ m, where each Pi is a non-negative scalar and at least one Pi > 0.
Example: 2-bit even parity (XNOR function)
   Xi        class   Yi
   X1 <0,0>    +     Y1 <-1,0,0>
   X2 <0,1>    -     Y2 <1,0,-1>
   X3 <1,0>    -     Y3 <1,-1,0>
   X4 <1,1>    +     Y4 <-1,1,1>

Example (1) - XNOR
P1·[-1 0 0]T + P2·[1 0 -1]T + P3·[1 -1 0]T + P4·[-1 1 1]T = 0
All Pi = 1 gives the result.
For the parity function a PLC exists => not linearly separable.
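
To make the PLC test concrete, here is a small Python sketch (my own illustration, assuming NumPy and SciPy are available) that checks for a positive linear combination by solving a feasibility linear program; normalising with sum(P) = 1 is valid because any PLC can be rescaled.

```python
import numpy as np
from scipy.optimize import linprog

def has_plc(Y):
    """True iff the rows of Y (augmented, negated vectors) admit a Positive Linear
    Combination: P >= 0, sum(P) = 1, and Y^T P = 0."""
    m, n = Y.shape
    A_eq = np.vstack([Y.T, np.ones((1, m))])      # Y^T P = 0  and  sum(P) = 1
    b_eq = np.concatenate([np.zeros(n), [1.0]])
    res = linprog(c=np.zeros(m), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * m, method="highs")
    return res.success    # feasible => PLC exists => not linearly separable

# XNOR example from the slide: a PLC exists, so XNOR is not linearly separable.
Y_xnor = np.array([[-1, 0, 0], [1, 0, -1], [1, -1, 0], [-1, 1, 1]])
print(has_plc(Y_xnor))    # True
```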

Example (2) – Majority function
3-bit majority function:
   Xi           class   Yi
   X1 <1,1,1>     +     <-1,1,1,1>
   X2 <1,1,0>     +     <-1,1,1,0>
   X3 <1,0,1>     +     <-1,1,0,1>
   X4 <1,0,0>     -     <1,-1,0,0>
   X5 <0,1,1>     +     <-1,0,1,1>
   X6 <0,1,0>     -     <1,0,-1,0>
   X7 <0,0,1>     -     <1,0,0,-1>
   X8 <0,0,0>     -     <1,0,0,0>
Suppose a PLC exists. The equations obtained are:
   P1 + P2 + P3 - P4 + P5 - P6 - P7 - P8 = 0
   -P5 + P6 + P7 + P8 = 0
   -P3 + P4 + P7 + P8 = 0
   -P2 + P4 + P6 + P8 = 0
On solving, all Pi are forced to 0.
3-bit majority function => no PLC => linearly separable.

Limitations of the perceptron
Non-linear separability is all-pervading.
A single perceptron does not have enough computing power.
E.g. XOR cannot be computed by a perceptron.

Solutions
1. Tolerate error (e.g. the pocket algorithm used by connectionist expert systems): try to get the best possible hyperplane using only perceptrons.
2. Use higher-dimensional surfaces, e.g. degree-2 surfaces like a parabola (a small sketch of this idea follows below).
3. Use a layered network.
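
A small illustrative Python sketch (not from the slides) of the higher-order-surface idea in point 2: appending the product term x1*x2 as an extra input makes XOR computable by a single threshold unit. The weights (1, 1, -2) and θ = 0.5 are one hand-picked choice, not a prescribed solution.

```python
def threshold_unit(weights, inputs, theta):
    """Single perceptron: output 1 iff the weighted sum reaches theta."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= theta else 0

# XOR with the quadratic feature x1*x2 appended to the inputs.
for x1 in (0, 1):
    for x2 in (0, 1):
        y = threshold_unit([1, 1, -2], [x1, x2, x1 * x2], theta=0.5)
        print(f"XOR({x1},{x2}) = {y}")
```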

Example - XOR
x1 XOR x2 = (x̄1·x2) OR (x1·x̄2), so XOR can be computed by a two-layer network: one layer computes the two product terms, the next ORs them.
Calculation of XOR (output unit, OR of the two terms): θ = 0.5, w1 = 1, w2 = 1.
Calculation of x̄1·x2 (hidden unit): θ = 1, w1 = -1, w2 = 1.5 (x1·x̄2 is symmetric).

Example - XOR (complete network)
[Figure: x1 and x2 feed two hidden threshold units with θ = 1, computing x̄1·x2 (weights -1 from x1, 1.5 from x2) and x1·x̄2 (weights 1.5 from x1, -1 from x2); their outputs feed the output unit with w1 = w2 = 1 and θ = 0.5.]
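
As a quick check (my own sketch, using the weights read off the network above), the two-layer arrangement indeed computes XOR:

```python
def unit(weights, inputs, theta):
    """Threshold unit: 1 iff the weighted sum reaches theta."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= theta else 0

def xor_net(x1, x2):
    h1 = unit([-1.0, 1.5], [x1, x2], theta=1.0)    # computes (not x1) and x2
    h2 = unit([1.5, -1.0], [x1, x2], theta=1.0)    # computes x1 and (not x2)
    return unit([1.0, 1.0], [h1, h2], theta=0.5)   # ORs the two hidden outputs

for x1 in (0, 1):
    for x2 in (0, 1):
        print(f"XOR({x1},{x2}) = {xor_net(x1, x2)}")
```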

Multi Layer Perceptron (MLP)
Question: how do we find weights for the hidden layers when no target output is available for them?
This is the credit assignment problem, to be solved by "Gradient Descent".
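
A tiny sketch of the gradient-descent idea that the following lectures build on (my own example, not the course's derivation): minimise f(w) = (w - w0)^2 by repeatedly stepping against the derivative.

```python
def gradient_descent(w_start, w0, lr=0.1, steps=50):
    """Minimise f(w) = (w - w0)**2 by moving opposite to the derivative 2*(w - w0)."""
    w = w_start
    for _ in range(steps):
        grad = 2 * (w - w0)
        w -= lr * grad
    return w

print(gradient_descent(w_start=5.0, w0=2.0))   # approaches the minimum at 2.0
```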

Assignment
1. Find, by solving linear inequalities, the perceptron that computes the majority Boolean function. Then feed it to your implementation of the perceptron training algorithm and study its behaviour.
2. Download the SNNS and WEKA packages and try your hand at the perceptron algorithm built into the packages.