Introduction to Neural Networks John Paxton Montana State University Summer 2003
Chapter 2: Simple Neural Networks for Pattern Classification
Architecture: inputs x1, ..., xn plus a bias input x0 = 1 feed a single output unit y through weights w1, ..., wn and w0 (w0 is the bias weight).
f(y_in) = 1 if y_in >= 0
f(y_in) = 0 otherwise
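A minimal Python sketch of this threshold unit (the function name and list-based inputs are illustrative, not from the slides):

    # Threshold (step) unit: output 1 when the weighted sum of the inputs,
    # including the bias, reaches 0; output 0 otherwise.
    def threshold_unit(x, w):
        # x: inputs with the bias input x[0] = 1; w: weights with w[0] the bias weight
        y_in = sum(xi * wi for xi, wi in zip(x, w))
        return 1 if y_in >= 0 else 0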
Representations
Binary: 0 = no, 1 = yes
Bipolar: -1 = no, 0 = unknown, 1 = yes
Bipolar is superior.
Interpreting the Weights
w0 = -1, w1 = 1, w2 = 1
0 = -1 + x1 + x2, or x2 = 1 - x1: the decision boundary. Inputs on one side of this line in the (x1, x2) plane are classified YES, inputs on the other side NO.
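As a quick check of this boundary, a sketch reusing the threshold_unit function above on the four binary inputs:

    # Classify each binary input with w = (-1, 1, 1); only (0, 0) falls on the NO side.
    w = [-1, 1, 1]
    for x1 in (0, 1):
        for x2 in (0, 1):
            y = threshold_unit([1, x1, x2], w)
            print((x1, x2), "YES" if y == 1 else "NO")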
Modelling a Simple Problem
Should I attend this lecture?
x1 = it's hot, x2 = it's raining
Architecture: a single unit y with bias input x0 = 1 and inputs x1, x2; the slide labels the three connection weights 2.5, -2, and 1.
Linear Separability
[Plots of the AND, OR, and XOR functions over the two inputs.] AND and OR are linearly separable: a single line can separate the 1 outputs from the 0 outputs. XOR is not.
Hebb's Rule
1949: increase the weight between two neurons that are both "on".
1988: increase the weight between two neurons that are both "off".
w_i(new) = w_i(old) + x_i * y
Algorithm
1. Set w_i = 0 for 0 <= i <= n.
2. For each training vector:
3.    Set x_i = s_i for all input units.
4.    Set y = t.
5.    w_i(new) = w_i(old) + x_i * y
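A minimal Python sketch of this Hebb training loop (the function and variable names are illustrative):

    # Hebb rule: one pass over the training pairs, adding x_i * t to each weight.
    def hebb_train(samples):
        # samples: list of (inputs s, target t), where s[0] = 1 is the bias input
        n = len(samples[0][0])
        w = [0] * n
        for s, t in samples:
            for i in range(n):
                w[i] += s[i] * t
        return w

Applied to the bipolar AND pairs on the next slide, this reproduces the weights interpreted two slides below.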
Example: 2 input AND (bipolar)
s0    s1    s2     t
 1     1     1     1
 1     1    -1    -1
 1    -1     1    -1
 1    -1    -1    -1
Training Procedure
Each row shows the current weights and the next training pattern (y = t):
w0    w1    w2    x0    x1    x2     y
 0     0     0     1     1     1     1
 1     1     1     1     1    -1    -1  (!)
 0     0     2     1    -1     1    -1  (!)
-1     1     1     1    -1    -1    -1
Final weights: w0 = -2, w1 = 2, w2 = 2.
Result Interpretation
-2 + 2x1 + 2x2 = 0, or x2 = -x1 + 1.
This training procedure is order dependent and is not guaranteed to find a correct classifier.
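As a quick check, running the hebb_train sketch above on the bipolar AND pairs reproduces these weights:

    # Bipolar AND: output 1 only when both inputs are 1.
    and_samples = [([1, 1, 1], 1), ([1, 1, -1], -1),
                   ([1, -1, 1], -1), ([1, -1, -1], -1)]
    print(hebb_train(and_samples))   # [-2, 2, 2], i.e. -2 + 2*x1 + 2*x2 = 0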
Pattern Recognition Exercise
Two 3x3 training patterns ("#" = on, "." = off):
#.#      .#.
.#.      #.#
#.#      .#.
"X"      "O"
Pattern Recognition Exercise
Architecture? Weights?
Are the original patterns classified correctly?
Are the original patterns with 1 piece of wrong data classified correctly?
Are the original patterns with 1 piece of missing data classified correctly?
Perceptrons (1958)
Very important early neural network.
Guaranteed training procedure under certain circumstances.
Architecture: inputs x1, ..., xn plus a bias input x0 = 1, connected to a single output unit y by weights w0, w1, ..., wn.
Activation Function
f(y_in) = 1 if y_in > θ
f(y_in) = 0 if -θ <= y_in <= θ
f(y_in) = -1 otherwise
Graph interpretation: a step function of y_in with an undecided band of width 2θ around 0.
Learning Rule
w_i(new) = w_i(old) + α*t*x_i if the output is in error.
α is the learning rate. Typically, 0 < α <= 1.
Algorithm
1. Set w_i = 0 for 0 <= i <= n (can be random).
2. For each training exemplar do:
3.    x_i = s_i
4.    y_in = Σ x_i*w_i
5.    y = f(y_in)
6.    w_i(new) = w_i(old) + α*t*x_i if y ≠ t
7. If the stopping condition is not reached, go to 2.
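A minimal Python sketch of this perceptron loop (the function name, the epoch cap, and stopping after an error-free pass are illustrative choices):

    # Perceptron training with the three-valued activation above.
    def perceptron_train(samples, alpha=1.0, theta=0.0, max_epochs=100):
        # samples: list of (inputs x with x[0] = 1, bipolar target t)
        n = len(samples[0][0])
        w = [0.0] * n
        for _ in range(max_epochs):
            changed = False
            for x, t in samples:
                y_in = sum(xi * wi for xi, wi in zip(x, w))
                y = 1 if y_in > theta else (-1 if y_in < -theta else 0)
                if y != t:                          # update only on error
                    for i in range(n):
                        w[i] += alpha * t * x[i]
                    changed = True
            if not changed:                         # an error-free epoch: stop
                return w
        return w

With α = 1 and θ = 0 on the bipolar AND data of the next slides, the weight sequence matches the Epoch 1 table below.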
Example: AND concept
bipolar inputs, bipolar targets
θ = 0, α = 1
Epoch 1
w0    w1    w2    x0    x1    x2     y     t
 0     0     0     1     1     1     0     1
 1     1     1     1     1    -1     1    -1
 0     0     2     1    -1     1     1    -1
-1     1     1     1    -1    -1    -1    -1
Weights after epoch 1: w0 = -1, w1 = 1, w2 = 1.
Exercise Continue the above example until the learning algorithm is finished.
Perceptron Learning Rule Convergence Theorem If a weight vector exists that correctly classifies all of the training examples, then the perceptron learning rule will converge to some weight vector that gives the correct response for all training patterns. This will happen in a finite number of steps.
Exercise
Show perceptron weights for the 2-of-3 concept (output 1 when at least two of the three inputs are 1; bipolar representation):
x1    x2    x3     y
 1     1     1     1
 1     1    -1     1
 1    -1     1     1
 1    -1    -1    -1
-1     1     1     1
-1     1    -1    -1
-1    -1     1    -1
-1    -1    -1    -1
Adaline (Widrow & Hoff, 1960)
Adaptive Linear Neuron.
Its learning rule minimizes the mean squared error.
Learns on all examples, not just the ones with errors.
Architecture
Inputs x1, ..., xn plus a bias input x0 = 1, connected to a single output unit y by weights w0, w1, ..., wn.
Training Algorithm
1. Set w_i (small random values are typical).
2. Set α (0.1 is typical).
3. For each training exemplar do:
4.    x_i = s_i
5.    y_in = Σ x_i*w_i
6.    w_i(new) = w_i(old) + α*(t - y_in)*x_i
7. Go to 3 if the largest weight change is still big enough.
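A minimal Python sketch of this Adaline loop (the tolerance and epoch cap are illustrative stand-ins for "largest weight change big enough"):

    import random

    # Adaline training: update on every example using the raw net input y_in,
    # not the thresholded output, so the squared error is what gets reduced.
    def adaline_train(samples, alpha=0.1, tol=1e-4, max_epochs=1000):
        n = len(samples[0][0])
        w = [random.uniform(-0.5, 0.5) for _ in range(n)]    # small random start
        for _ in range(max_epochs):
            largest_change = 0.0
            for x, t in samples:
                y_in = sum(xi * wi for xi, wi in zip(x, w))
                for i in range(n):
                    delta = alpha * (t - y_in) * x[i]
                    w[i] += delta
                    largest_change = max(largest_change, abs(delta))
            if largest_change < tol:      # no weight moved much: stop
                return w
        return w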
Activation Function
f(y_in) = 1 if y_in >= 0
f(y_in) = -1 otherwise
Delta Rule
Squared error: E = (t - y_in)^2
Minimizing this error by gradient descent gives ∂E/∂w_i = -2(t - y_in)*x_i, hence the weight update Δw_i = α*(t - y_in)*x_i.
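Written out as a gradient descent step (assuming y_in = Σ_i w_i x_i; the constant factor 2 is absorbed into the learning rate):

\[
E = (t - y_{in})^2, \qquad y_{in} = \sum_i w_i x_i
\]
\[
\frac{\partial E}{\partial w_i}
  = 2\,(t - y_{in})\cdot\frac{\partial (t - y_{in})}{\partial w_i}
  = -2\,(t - y_{in})\,x_i
\]
\[
\Delta w_i = -\alpha'\,\frac{\partial E}{\partial w_i}
           = 2\alpha'\,(t - y_{in})\,x_i
           \;\equiv\; \alpha\,(t - y_{in})\,x_i
\]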
Example: AND concept
bipolar inputs, bipolar targets
w0 = -0.5, w1 = 0.5, w2 = 0.5 minimizes E
x0    x1    x2    y_in     t      E
 1     1     1     0.5     1    0.25
 1     1    -1    -0.5    -1    0.25
 1    -1     1    -0.5    -1    0.25
 1    -1    -1    -1.5    -1    0.25
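A quick numerical check of this table in plain Python (names are illustrative):

    # Squared error of each bipolar AND pattern under w = (-0.5, 0.5, 0.5).
    w = [-0.5, 0.5, 0.5]
    data = [([1, 1, 1], 1), ([1, 1, -1], -1),
            ([1, -1, 1], -1), ([1, -1, -1], -1)]
    for x, t in data:
        y_in = sum(xi * wi for xi, wi in zip(x, w))
        print(x, y_in, (t - y_in) ** 2)   # each pattern contributes E = 0.25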
Exercise Demonstrate that you understand the Adaline training procedure.
Madaline
Many adaptive linear neurons.
Architecture: inputs x1, ..., xm plus a bias feed a layer of Adaline units z1, ..., zk, which together with another bias feed a single output unit y.
Madaline
MRI (1960): only learns the weights from the input layer to the hidden layer.
MRII (1987): learns all weights.
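A minimal sketch of the forward pass through such a network (the layer sizes, weight layout, and step activations are illustrative assumptions, not taken from the slides):

    # Forward pass through a Madaline-style network: a hidden layer of
    # threshold (Adaline) units feeding a single threshold output unit.
    def step(y_in):
        return 1 if y_in >= 0 else -1

    def madaline_forward(x, hidden_w, out_w):
        # x: inputs including the bias x[0] = 1
        # hidden_w: one weight vector per hidden unit z_j
        # out_w: weights from (bias + hidden outputs) to the output y
        z = [1] + [step(sum(xi * wi for xi, wi in zip(x, wj))) for wj in hidden_w]
        return step(sum(zi * wi for zi, wi in zip(z, out_w)))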