Introduction to Neural Networks John Paxton Montana State University Summer 2003
Chapter 3: Pattern Association
Aristotle observed that human memory associates:
– similar items
– contrary items
– items close in proximity
– items close in succession (e.g., a song)
Terminology and Issues
– Autoassociative Networks
– Heteroassociative Networks
– Feedforward Networks
– Recurrent Networks
– How many patterns can be stored?
Hebb Rule for Pattern Association
Architecture: [figure: input units x1 … xn connected to output units y1 … ym by weights w11 … wnm]
Algorithm
1. set wij = 0 (1 <= i <= n, 1 <= j <= m)
2. for each training pair s:t
3.   xi = si
4.   yj = tj
5.   wij(new) = wij(old) + xi*yj
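The steps above can be sketched in Python (the function name and the list-of-lists weight matrix are my own choices, not from the slides):

```python
def hebb_train(pairs, n, m):
    """Hebb rule for heteroassociation.

    pairs: list of (s, t) bipolar training pairs; s has length n, t length m.
    Returns the n x m weight matrix with w[i][j] = sum over pairs of s[i]*t[j].
    """
    w = [[0] * m for _ in range(n)]      # step 1: all weights start at 0
    for s, t in pairs:                   # step 2: one pass per training pair
        for i in range(n):
            for j in range(m):
                w[i][j] += s[i] * t[j]   # step 5: Hebbian weight update
    return w
```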
Example
s1 = (1 -1 -1), s2 = (-1 1 1)
t1 = (1 -1), t2 = (-1 1)
w11 = 1*1 + (-1)(-1) = 2
w12 = 1*(-1) + (-1)*1 = -2
w21 = (-1)*1 + 1*(-1) = -2
w22 = (-1)(-1) + 1*1 = 2
w31 = (-1)*1 + 1*(-1) = -2
w32 = (-1)(-1) + 1*1 = 2
Matrix Alternative
s1 = (1 -1 -1), s2 = (-1 1 1)
t1 = (1 -1), t2 = (-1 1)
W = s1^T t1 + s2^T t2 = [2 -2; -2 2; -2 2]
Final Network
f(y_in) = 1 if y_in > 0, 0 if y_in = 0, else -1
[figure: x1, x2, x3 connected to y1, y2 with the weights above]
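Recall with the trained net is a matrix-vector product followed by the activation f above; a minimal sketch (naming is mine):

```python
def recall(w, x):
    """Output of the trained net for bipolar input x, using
    f(y_in) = 1 if y_in > 0, 0 if y_in == 0, else -1."""
    def f(y_in):
        if y_in > 0:
            return 1
        if y_in == 0:
            return 0
        return -1
    n, m = len(w), len(w[0])
    return [f(sum(x[i] * w[i][j] for i in range(n))) for j in range(m)]
```

With the weight matrix from the example, presenting a stored input reproduces its target.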
Properties Weights exist if input vectors are linearly independent Orthogonal vectors can be learned perfectly High weights imply strong correlations
Exercises
– What happens if ( ) is tested? This vector has one mistake.
– What happens if ( ) is tested? This vector has one piece of missing data.
– Show an example of training data that is not learnable. Show the learned network.
Delta Rule for Pattern Association Works when patterns are linearly independent but not orthogonal Introduced in the 1960s for ADALINE Produces a least squares solution
Activation Functions
Delta Rule (f'(y_in.j) = 1):
wij(new) = wij(old) + (tj – yj) * xi * 1
Extended Delta Rule:
wij(new) = wij(old) + (tj – yj) * xi * f'(y_in.j)
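One delta-rule step can be sketched as follows; the explicit learning rate `alpha` is my addition (the slide's update implicitly uses 1), and the identity activation makes f'(y_in) = 1:

```python
def delta_step(w, s, t, alpha=1.0):
    """One delta-rule update: w[i][j] += alpha * (t[j] - y[j]) * s[i].

    Uses the identity activation, so the output y[j] is just the net input
    and the derivative factor f'(y_in.j) equals 1.
    """
    n, m = len(w), len(w[0])
    y = [sum(s[i] * w[i][j] for i in range(n)) for j in range(m)]  # current outputs
    for i in range(n):
        for j in range(m):
            w[i][j] += alpha * (t[j] - y[j]) * s[i]                # error-driven update
    return w
```

Repeated steps shrink the error geometrically toward the least squares solution.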
Heteroassociative Memory Net
Application: associate characters, e.g. A with a, B with b.
Autoassociative Net
Architecture: [figure: input units x1 … xn connected to output units y1 … yn by weights w11 … wnn]
Training Algorithm Assuming that the training vectors are orthogonal, we can use the Hebb rule algorithm mentioned earlier. Application: Find out whether an input vector is familiar or unfamiliar. For example, voice input as part of a security system.
Autoassociative Example
Store s = (1 1 1):
W = s^T s = [1 1 1; 1 1 1; 1 1 1]
Zeroing the diagonal gives W = [0 1 1; 1 0 1; 1 1 0]
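Autoassociative Hebb learning is the same outer-product rule with s as both input and target; a sketch, assuming the diagonal is zeroed (so no unit feeds back to itself, as the evaluation questions discuss):

```python
def auto_hebb(vectors):
    """Autoassociative Hebb learning with the diagonal zeroed.

    vectors: list of bipolar vectors, all of length n.
    Returns the n x n weight matrix with w[i][j] = sum of s[i]*s[j], i != j.
    """
    n = len(vectors[0])
    w = [[0] * n for _ in range(n)]
    for s in vectors:
        for i in range(n):
            for j in range(n):
                if i != j:               # keep the self-connection at 0
                    w[i][j] += s[i] * s[j]
    return w
```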
Evaluation
– What happens if (1 1 1) is presented?
– What happens if (0 1 1) is presented?
– What happens if (0 0 1) is presented?
– What happens if (-1 1 1) is presented?
– What happens if ( ) is presented?
– Why are the diagonals set to 0?
Storage Capacity
2 vectors: (1 1 1), ( )
Recall is perfect.
Storage Capacity
3 vectors: (1 1 1), ( ), (1 -1 1)
Recall is no longer perfect.
Theorem Up to n-1 bipolar vectors of n dimensions can be stored in an autoassociative net.
Iterative Autoassociative Net
1 vector: s = (1 1 -1)
W = s^T s (diagonal zeroed) = [0 1 -1; 1 0 -1; -1 -1 0]
(1 0 0) -> (0 1 -1)
(0 1 -1) -> (2 1 -1) -> (1 1 -1)
(1 1 -1) -> (2 2 -2) -> (1 1 -1)
Testing Procedure
1. initialize weights using Hebb learning
2. for each test vector do
3.   set xi = si
4.   calculate ti
5.   set si = ti
6.   go to step 4 if the s vector is new
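The testing loop above can be sketched as repeated recall until the vector stops changing (the `max_steps` cap is my addition, a guard against oscillation):

```python
def iterative_recall(w, x, max_steps=10):
    """Feed the signal through the net repeatedly until it is no longer new."""
    def f(y_in):
        if y_in > 0:
            return 1
        if y_in < 0:
            return -1
        return 0
    n = len(w)
    s = list(x)
    for _ in range(max_steps):
        t = [f(sum(s[i] * w[i][j] for i in range(n))) for j in range(n)]
        if t == s:          # the vector is not new: stop
            return t
        s = t               # step 5: feed the output back in
    return s
```

With the zero-diagonal matrix for s = (1 1 -1), the partial cue (1 0 0) converges to the stored vector, matching the transitions on the previous slide.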
Exercises
– 1 piece of missing data: (0 1 -1)
– 2 pieces of missing data: (0 0 -1)
– 3 pieces of missing data: (0 0 0)
– 1 mistake: ( )
– 2 mistakes: ( )
Discrete Hopfield Net
Applications:
– content addressable problems
– pattern association problems
– constrained optimization problems
wij = wji, wii = 0
Characteristics
– Only 1 unit updates its activation at a time
– Each unit continues to receive the external signal
– An energy (Lyapunov) function can be found that allows the net to converge, unlike the previous system
– Autoassociative
Architecture: [figure: fully interconnected units y1, y2, y3, each receiving an external input x1, x2, x3]
Algorithm
1. initialize weights using the Hebb rule
2. for each input vector do
3.   yi = xi
4.   do steps 5-6 in random order for each yi
5.   y_in.i = xi + Σj yj*wji
6.   yi = f(y_in.i)
7. go to step 2 if the net hasn't converged
Example
training vector: (1 -1), giving W = [0 -1; -1 0]
[figure: two units y1, y2 with external inputs x1, x2]
Example
input (0 -1):
update y1: y_in.1 = 0 + (-1)(-1) = 1 -> y1 = 1
update y2: y_in.2 = -1 + (1)(-1) = -2 -> y2 = -1
input (1 -1):
update y2: y_in.2 = -1 + (1)(-1) = -2 -> y2 = -1
update y1: y_in.1 = 1 + (-1)(-1) = 2 -> y1 = 1
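The asynchronous update can be sketched as follows; I assume a unit keeps its previous activation when the net input is exactly 0, a common Hopfield convention:

```python
import random

def hopfield_recall(w, x):
    """Asynchronous Hopfield recall.

    Units update one at a time in random order, and each unit keeps
    receiving the external signal x, until no unit changes.
    """
    n = len(w)
    y = list(x)
    changed = True
    while changed:
        changed = False
        for i in random.sample(range(n), n):          # random update order
            y_in = x[i] + sum(y[j] * w[j][i] for j in range(n))
            new = 1 if y_in > 0 else (-1 if y_in < 0 else y[i])
            if new != y[i]:
                y[i] = new
                changed = True
    return y
```

With W = [0 -1; -1 0], both noisy inputs from the slide settle on the stored pattern (1 -1) regardless of the update order.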
Hopfield Theorems Convergence is guaranteed. The number of storable patterns is approximately n / (2 * log n) where n is the dimension of a vector
Bidirectional Associative Memory (BAM)
Heteroassociative Recurrent Net (Kosko, 1988)
Architecture: [figure: x layer x1 … xn fully connected to y layer y1 … ym]
Activation Function
f(y_in) = 1, if y_in > 0
f(y_in) = 0, if y_in = 0
f(y_in) = -1, otherwise
Algorithm
1. initialize weights using the Hebb rule
2. for each test vector do
3.   present s to the x layer
4.   present t to the y layer
5.   while equilibrium is not reached:
6.     compute f(y_in.j)
7.     compute f(x_in.i)
Example s1 = (1 1), t1 = (1 -1) s2 = (-1 -1), t2 = (-1 1)
Example Architecture
[figure: units x1, x2 connected bidirectionally to y1, y2]
W = s1^T t1 + s2^T t2 = [2 -2; 2 -2]
present (1 1) to x -> y = (1 -1)
present (1 -1) to y -> x = (1 1)
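One bidirectional pass (x layer to y layer, then back through W transposed) can be sketched as follows, using the slide's activation function; the function name is mine:

```python
def bam_recall(w, x):
    """One BAM pass: x layer -> y layer -> x layer.

    w is the n x m Hebb weight matrix; signals go forward through w
    and backward through its transpose.
    """
    def f(v):
        return 1 if v > 0 else (0 if v == 0 else -1)
    n, m = len(w), len(w[0])
    y = [f(sum(x[i] * w[i][j] for i in range(n))) for j in range(m)]      # forward
    x_new = [f(sum(y[j] * w[i][j] for j in range(m))) for i in range(n)]  # backward
    return x_new, y
```

For the example pairs, one pass already reaches equilibrium: (1 1) maps to (1 -1), which maps back to (1 1).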
Hamming Distance
Definition: the number of corresponding bits that differ between two vectors.
For example, H[(1 -1), (1 1)] = 1.
The average Hamming distance per component here is 1/2.
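The definition translates directly to code:

```python
def hamming(u, v):
    """Number of positions where bipolar vectors u and v differ."""
    return sum(1 for a, b in zip(u, v) if a != b)
```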
About BAMs Observation: Encoding is better when the average Hamming distance of the inputs is similar to the average Hamming distance of the outputs. The memory capacity of a BAM is min(n-1, m-1).