CS344: Introduction to Artificial Intelligence (associated lab: CS386)
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lecture 29: Perceptron training and convergence
22nd March, 2011
The Perceptron Model
A perceptron is a computing element with input lines having associated weights and a cell having a threshold value. The perceptron model is motivated by the biological neuron.
[Figure: perceptron with inputs x_1, ..., x_{n-1}, x_n, weights w_1, ..., w_{n-1}, w_n, threshold θ and output y]
Step function / Threshold function
$y = 1$ if $\sum_i w_i x_i \ge \theta$; $y = 0$ otherwise.
[Figure: y plotted against $\sum_i w_i x_i$, stepping from 0 to 1 at θ]
Features of Perceptron
Input-output behavior is discontinuous, and the derivative does not exist at $\sum_i w_i x_i = \theta$.
$\sum_i w_i x_i - \theta$ is the net input, denoted as net.
The perceptron is referred to as a linear threshold element: linear because each x appears with power 1.
$y = f(\text{net})$: the relation between y and net is non-linear.
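A minimal sketch of the threshold rule described above, in Python. The function name `perceptron_output` and the OR weights and threshold are illustrative assumptions, not values from the lecture.

```python
def perceptron_output(weights, inputs, theta):
    """Step function: y = 1 if sum(w_i * x_i) >= theta, else 0."""
    # net input = weighted sum minus threshold
    net = sum(w * x for w, x in zip(weights, inputs)) - theta
    return 1 if net >= 0 else 0


# Hypothetical weights and threshold that realise OR (chosen for illustration).
or_weights, or_theta = [1.0, 1.0], 0.5
or_outputs = [perceptron_output(or_weights, [x1, x2], or_theta)
              for x1 in (0, 1) for x2 in (0, 1)]
print(or_outputs)  # -> [0, 1, 1, 1]
```

The output jumps from 0 to 1 exactly where net crosses 0, which is the discontinuity noted above.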
Perceptron Training Algorithm (PTA)
Preprocessing:
1. The computation law is modified to $y = 1$ if $\sum_i w_i x_i > \theta$, and $y = 0$ if $\sum_i w_i x_i < \theta$.
[Figure: a perceptron with inputs x_1, ..., x_n, weights w_1, ..., w_n and test (θ, ≤) is replaced by the same perceptron with test (θ, <)]
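PTA – preprocessing cont…
2. Absorb θ as a weight: add an input $x_0 = -1$ with weight $w_0 = \theta$, so that the test $\sum_{i=1}^{n} w_i x_i > \theta$ becomes $\sum_{i=0}^{n} w_i x_i > 0$.
3. Negate all the zero-class examples, so that every example has to satisfy the same test $w \cdot x > 0$.
[Figure: perceptron with threshold θ and inputs x_1, ..., x_n, redrawn with an extra input x_0 and weight w_0 = θ]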
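The two preprocessing steps can be sketched in Python as follows. This is a minimal illustration that assumes the extra input is x0 = -1 (as on the NAND slide) and is placed first in each vector. The name `preprocess` and the `(inputs, label)` data layout are hypothetical.

```python
def preprocess(examples):
    """examples: list of (inputs, label) pairs with label 0 or 1.

    Returns vectors ordered <x0, x1, ..., xn> with x0 = -1 (assumed),
    with every zero-class vector negated.
    """
    vectors = []
    for inputs, label in examples:
        augmented = [-1] + list(inputs)          # absorb theta: x0 = -1
        if label == 0:
            augmented = [-v for v in augmented]  # negate zero-class examples
        vectors.append(augmented)
    return vectors


# OR function, inputs given as (x1, x2).
or_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
or_vectors = preprocess(or_examples)
print(or_vectors)  # -> [[1, 0, 0], [-1, 0, 1], [-1, 1, 0], [-1, 1, 1]]
```

After this step a single test, w·x > 0, covers both classes.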
Example to demonstrate preprocessing
OR perceptron
1-class: <1,1>, <1,0>, <0,1>; 0-class: <0,0>
Augmented x vectors (x0 = -1 placed first): 1-class: <-1,1,1>, <-1,1,0>, <-1,0,1>; 0-class: <-1,0,0>
Negate 0-class: <1,0,0>
Example to demonstrate preprocessing cont..
Now the vectors are
      x0   x1   x2
X1    -1    0    1
X2    -1    1    0
X3    -1    1    1
X4     1    0    0
Perceptron Training Algorithm
1. Start with a random value of w, e.g. the zero vector <0, 0, ..., 0>.
2. Test for $w \cdot x_i > 0$. If the test succeeds for i = 1, 2, …, n, then return w.
3. Otherwise modify w: $w_{next} = w_{prev} + x_{fail}$, and go back to step 2.
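The three steps above can be sketched as a short Python loop. This is a minimal illustration under stated assumptions: the vectors are already preprocessed, the first failing vector in list order is chosen (the algorithm allows any choice), and the `max_updates` guard is an addition for safety on non-separable input. The names `dot`, `train_pta` and `max_updates` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_pta(vectors, w=None, max_updates=10_000):
    """Run PTA on preprocessed vectors; return (w, number_of_updates)."""
    # Step 1: initial weights (zero vector unless the caller supplies one).
    w = [0] * len(vectors[0]) if w is None else list(w)
    updates = 0
    while updates < max_updates:
        # Step 2: look for a vector that fails the test w . x > 0.
        failed = next((v for v in vectors if dot(w, v) <= 0), None)
        if failed is None:
            return w, updates          # every vector passes
        # Step 3: add the failed vector to w and test again.
        w = [wi + vi for wi, vi in zip(w, failed)]
        updates += 1
    raise RuntimeError("no separating w found within max_updates")


# Preprocessed OR vectors, ordered <x0, x1, x2> with x0 = -1.
or_vectors = [[-1, 0, 1], [-1, 1, 0], [-1, 1, 1], [1, 0, 0]]
w, updates = train_pta(or_vectors)
print(w, updates)
```

The returned w satisfies w·x > 0 for every vector. A different starting w or test order may give a different, equally valid w.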
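PTA on NAND
NAND:
X2   X1   Y
0    0    1
0    1    1
1    0    1
1    1    0
[Figure: perceptron with inputs X2, X1, weights W2, W1, threshold Θ and output Y, converted to a perceptron with inputs X2, X1, X0 = -1 and weights W2, W1, W0 = Θ]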
Preprocessing NAND
Augmented:
X2   X1   X0   Y
0    0    -1   1
0    1    -1   1
1    0    -1   1
1    1    -1   0
NAND-0 class negated:
       X2   X1   X0
V0:     0    0   -1
V1:     0    1   -1
V2:     1    0   -1
V3:    -1   -1    1
Vectors for which W = <W2, W1, W0> has to be found such that W·Vi > 0.
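PTA Algo steps
Algorithm: initialize W and keep adding the failed vectors until W·Vi > 0 is true for every i. Vectors are written as <W2, W1, W0>.
Step 0: W0 = <0, 0, 0>
W1 = <0, 0, 0> + <0, 0, -1> {V0 fails} = <0, 0, -1>
W2 = <0, 0, -1> + <-1, -1, 1> {V3 fails} = <-1, -1, 0>
W3 = <-1, -1, 0> + <0, 0, -1> {V0 fails} = <-1, -1, -1>
W4 = <-1, -1, -1> + <0, 1, -1> {V1 fails} = <-1, 0, -2>
Trying convergence
W5 = <-1, 0, -2> + <-1, -1, 1> {V3 fails} = <-2, -1, -1>
W6 = <-2, -1, -1> + <0, 1, -1> {V1 fails} = <-2, 0, -2>
W7 = <-2, 0, -2> + <1, 0, -1> {V2 fails} = <-1, 0, -3>
W8 = <-1, 0, -3> + <-1, -1, 1> {V3 fails} = <-2, -1, -2>
W9 = <-2, -1, -2> + <1, 0, -1> {V2 fails} = <-1, -1, -3>
Trying convergence
W10 = <-1, -1, -3> + <-1, -1, 1> {V3 fails} = <-2, -2, -2>
W11 = <-2, -2, -2> + <0, 1, -1> {V1 fails} = <-2, -1, -3>
W12 = <-2, -1, -3> + <-1, -1, 1> {V3 fails} = <-3, -2, -2>
W13 = <-3, -2, -2> + <0, 1, -1> {V1 fails} = <-3, -1, -3>
W14 = <-3, -1, -3> + <1, 0, -1> {V2 fails} = <-2, -1, -4>
Converged!
W15 = <-2, -1, -4> + <-1, -1, 1> {V3 fails} = <-3, -2, -3>
W16 = <-3, -2, -3> + <1, 0, -1> {V2 fails} = <-2, -2, -4>
W17 = <-2, -2, -4> + <-1, -1, 1> {V3 fails} = <-3, -3, -3>
W18 = <-3, -3, -3> + <0, 1, -1> {V1 fails} = <-3, -2, -4>
W2 = -3, W1 = -2, W0 = Θ = -4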
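The 18 updates can be replayed with a short Python sketch to check the final weights. It rests on three assumptions: the preprocessed NAND vectors are V0 = <0,0,-1>, V1 = <0,1,-1>, V2 = <1,0,-1>, V3 = <-1,-1,1> in <X2, X1, X0> order; the initial weight vector is zero; and the seventh update adds V2, which is the only vector that fails at W6 = <-2, 0, -2>. The variable names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Component order <W2, W1, W0>; inputs ordered <X2, X1, X0> with X0 = -1.
V = {0: [0, 0, -1], 1: [0, 1, -1], 2: [1, 0, -1], 3: [-1, -1, 1]}

# Index of the failing vector added at each of the 18 updates
# (assumption: update 7 uses V2, see the note above).
fail_order = [0, 3, 0, 1, 3, 1, 2, 3, 2, 3, 1, 3, 1, 2, 3, 2, 3, 1]

w = [0, 0, 0]          # assumed initial weights
trace_ok = True
for i in fail_order:
    # each vector must really fail the test W . Vi > 0 before it is added
    trace_ok = trace_ok and dot(w, V[i]) <= 0
    w = [a + b for a, b in zip(w, V[i])]

final_w = w
all_pass = all(dot(final_w, v) > 0 for v in V.values())
print(final_w, trace_ok, all_pass)  # -> [-3, -2, -4] True True
```

The result agrees with W2 = -3, W1 = -2, W0 = Θ = -4, and all four vectors pass the test at that point.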
PTA convergence
Statement of Convergence of PTA Statement: Whatever be the initial choice of weights and whatever be the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.
Proof of Convergence of PTA
Suppose $w_n$ is the weight vector at the $n$th step of the algorithm. At the beginning, the weight vector is $w_0$.
Go from $w_i$ to $w_{i+1}$ when a vector $X_j$ fails the test $w_i \cdot X_j > 0$, and update $w_i$ as $w_{i+1} = w_i + X_j$.
Since the $X_j$s form a linearly separable function, $\exists\, w^*$ s.t. $w^* \cdot X_j > 0 \;\; \forall j$.
Proof of Convergence of PTA (cntd.)
Consider the expression $G(w_n) = \dfrac{w_n \cdot w^*}{|w_n|}$, where $w_n$ = weight at the $n$th iteration.
$G(w_n) = \dfrac{|w_n| \, |w^*| \cos\phi}{|w_n|}$, where $\phi$ = angle between $w_n$ and $w^*$.
$G(w_n) = |w^*| \cos\phi$
$G(w_n) \le |w^*|$ (as $-1 \le \cos\phi \le 1$)
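Behavior of Numerator of G
$w_n \cdot w^* = (w_{n-1} + X^{\mathrm{fail}}_{n-1}) \cdot w^*$
$= w_{n-1} \cdot w^* + X^{\mathrm{fail}}_{n-1} \cdot w^*$
$= (w_{n-2} + X^{\mathrm{fail}}_{n-2}) \cdot w^* + X^{\mathrm{fail}}_{n-1} \cdot w^*$
…..
$= w_0 \cdot w^* + (X^{\mathrm{fail}}_0 + X^{\mathrm{fail}}_1 + \dots + X^{\mathrm{fail}}_{n-1}) \cdot w^*$
$w^* \cdot X^{\mathrm{fail}}_i$ is always positive: note carefully.
Suppose $w^* \cdot X_j \ge \delta$ for all $j$, where $\delta > 0$ is the minimum of $w^* \cdot X_j$ over the finitely many vectors.
Numerator of G $\ge w_0 \cdot w^* + n\delta$
So, the numerator of G grows with n.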
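Behavior of Denominator of G
$|w_n|^2 = w_n \cdot w_n$
$= (w_{n-1} + X^{\mathrm{fail}}_{n-1})^2$
$= (w_{n-1})^2 + 2\, w_{n-1} \cdot X^{\mathrm{fail}}_{n-1} + (X^{\mathrm{fail}}_{n-1})^2$
$\le (w_{n-1})^2 + (X^{\mathrm{fail}}_{n-1})^2$ (as $w_{n-1} \cdot X^{\mathrm{fail}}_{n-1} \le 0$)
$\le (w_0)^2 + (X^{\mathrm{fail}}_0)^2 + (X^{\mathrm{fail}}_1)^2 + \dots + (X^{\mathrm{fail}}_{n-1})^2$
$|X_j| \le \rho$ (max magnitude)
So, Denom$^2 \le (w_0)^2 + n\rho^2$, i.e. Denom $\le \sqrt{(w_0)^2 + n\rho^2}$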
Some Observations
Numerator of G grows at least as $n$.
Denominator of G grows at most as $\sqrt{n}$.
=> Numerator grows faster than denominator.
If PTA does not terminate, the $G(w_n)$ values will become unbounded.
Some Observations contd. But, as |G(w n )| ≤ |w*| which is finite, this is impossible! Hence, PTA has to converge. Proof is due to Marvin Minsky.
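The behaviour of G can be observed numerically with a small Python sketch on the NAND example. It makes several assumptions: the preprocessed NAND vectors are those listed in `V`, the separating vector w* is the weight vector <-3, -2, -4> found earlier, the initial weights are zero, and the first failing vector in list order is chosen at each step. The variable names are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Preprocessed NAND vectors in <X2, X1, X0> order (assumed reconstruction).
V = [[0, 0, -1], [0, 1, -1], [1, 0, -1], [-1, -1, 1]]
w_star = [-3, -2, -4]                        # one separating vector for NAND
delta = min(dot(w_star, v) for v in V)       # minimum of w* . X_j
rho_sq = max(dot(v, v) for v in V)           # maximum of |X_j|^2

w = [0, 0, 0]
numerators, g_values, sq_norms = [], [], []
while True:
    failed = next((v for v in V if dot(w, v) <= 0), None)
    if failed is None:
        break                                # PTA has converged
    w = [a + b for a, b in zip(w, failed)]
    numerators.append(dot(w, w_star))        # numerator of G after n updates
    sq_norms.append(dot(w, w))               # squared denominator of G
    g_values.append(numerators[-1] / math.sqrt(sq_norms[-1]))

bound = math.sqrt(dot(w_star, w_star))       # |w*|
print(len(g_values), max(g_values), bound)
```

In this run the numerator after n updates is at least nδ, the squared denominator is at most nρ², and every G value stays below |w*|, so the loop has to stop after finitely many updates.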
Convergence of PTA proved Whatever be the initial choice of weights and whatever be the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.