Learning with Perceptrons and Neural Networks


1 Learning with Perceptrons and Neural Networks
Artificial Intelligence CMSC 25000 February 14, 2002

2 Agenda
Neural Networks:
  Biological analogy
Perceptrons: Single layer networks
  Perceptron training: Perceptron convergence theorem
  Perceptron limitations
Neural Networks: Multilayer perceptrons
  Neural net training: Backpropagation
  Strengths & Limitations
Conclusions

3 Neurons: The Concept
[Figure: a neuron with its dendrites, axon, nucleus, and cell body labeled]
Neurons:
  Receive inputs from other neurons (via synapses)
  When input exceeds threshold, “fires”
  Sends output along axon to other neurons
Brain: 10^11 neurons, 10^16 synapses

4 Artificial Neural Nets
Simulated Neuron:
  Node connected to other nodes via links
    Links = axon+synapse+dendrite
  Links associated with weight (like synapse)
    Weight is multiplied by output of node
  Node combines inputs via activation function
    E.g. sum of weighted inputs passed through threshold
Simpler than real neuronal processes

5 Artificial Neural Net
[Figure: inputs x, each multiplied by a weight w, feed a Sum node (+) followed by a Threshold]

6 Perceptrons
Single neuron-like element
  Binary inputs
  Binary outputs
Output is 1 when weighted sum of inputs > threshold
(Possibly logic box between inputs and weights)

7 Perceptron Structure
[Figure: inputs x0 ... xn with weights w0 ... wn feed a single unit with output y; x0 = -1]
w0 compensates for threshold

8 Perceptron Convergence Procedure
Straightforward training procedure
  Learns linearly separable functions
Until perceptron yields correct output for all training samples:
  If the perceptron is correct, do nothing
  If the perceptron is wrong:
    If it incorrectly says “yes”, subtract input vector from weight vector
    Otherwise, add input vector to weight vector

9 Perceptron Convergence Example
LOGICAL-OR:
  Sample  x1  x2  x3  Desired Output
  S1      0   0   1   0
  S2      0   1   1   1
  S3      1   0   1   1
  S4      1   1   1   1
Initial: w=(0 0 0); After S2, w=w+s2=(0 1 1)
Pass2: S1: w=w-s1=(0 1 0); S3: w=w+s3=(1 1 1)
Pass3: S1: w=w-s1=(1 1 0)
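The convergence procedure and the LOGICAL-OR trace can be replayed with a short script. This is a minimal sketch, not the lecture's code: the sample values are inferred from the weight updates in the trace (x3 is assumed to be a constant bias input of 1), the unit is assumed to fire when the weighted sum is strictly greater than 0, and the names `fires` and `train_perceptron` are illustrative.

```python
def fires(w, x):
    """Threshold unit: output 1 when the weighted sum exceeds 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


def train_perceptron(samples, n_inputs, max_passes=100):
    """Repeat passes over the samples until every output is correct."""
    w = [0] * n_inputs
    trace = []  # (pass number, sample name, weights after the update)
    for p in range(1, max_passes + 1):
        changed = False
        for name, x, desired in samples:
            out = fires(w, x)
            if out == desired:
                continue  # correct: do nothing
            # Wrong "yes": subtract the input vector; wrong "no": add it.
            sign = -1 if out == 1 else 1
            w = [wi + sign * xi for wi, xi in zip(w, x)]
            trace.append((p, name, tuple(w)))
            changed = True
        if not changed:
            break  # a full pass with no mistakes
    return w, trace


# LOGICAL-OR samples; x3 = 1 is an assumed bias input.
OR_SAMPLES = [
    ("S1", (0, 0, 1), 0),
    ("S2", (0, 1, 1), 1),
    ("S3", (1, 0, 1), 1),
    ("S4", (1, 1, 1), 1),
]

weights, trace = train_perceptron(OR_SAMPLES, 3)
for step in trace:
    print(step)
print("final weights:", weights)
```

Under these assumptions the script makes the same four updates as the slide and stops on the fourth pass with w = (1 1 0).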

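10 Perceptron Convergence Theorem
If there exists a vector $v$ that classifies every example correctly, perceptron training will find a separating weight vector
Treat every example as positive (negate the negative ones), scaled so that $|v| = 1$ and $|x| \le 1$
Assume $v \cdot x > \delta$ for all +ive examples $x$
After $k$ mistakes, $w = x_1 + x_2 + \dots + x_k$, so $v \cdot w \ge k\delta$
$|w|^2$ increases by at most 1 in each iteration, because an update happens only when $w \cdot x \le 0$:
  $|w + x|^2 \le |w|^2 + 1$, so $|w|^2 \le k$ (# mislabeled examples seen)
Hence $k\delta / \sqrt{k} \le v \cdot w / |w| \le 1$
Converges in $k \le (1/\delta)^2$ steps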

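11 Perceptron Learning
Perceptrons learn linear decision boundaries
[Figure: a line in the (x1, x2) plane with the + examples on one side]
But not XOR (firing when w1x1 + w2x2 > 0):
  X1  X2  xor  Constraint
  0   0   0    w1x1 + w2x2 = 0, so the unit does not fire (correct)
  1   0   1    w1x1 + w2x2 > 0 => implies w1 > 0
  0   1   1    w1x1 + w2x2 > 0 => implies w2 > 0
  1   1   0    w1x1 + w2x2 = w1 + w2 > 0 => fires, but should be false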
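The XOR limitation can also be checked by brute force. The sketch below searches a coarse grid of weights and thresholds for a single threshold unit that reproduces a truth table. The grid, the explicit threshold `t`, and the function name `separable` are illustrative assumptions. A failed grid search is not a proof, but it agrees with the argument above.

```python
from itertools import product


def separable(truth_table, grid):
    """Return (w1, w2, t) that reproduces the table with one unit, else None."""
    for w1, w2, t in product(grid, repeat=3):
        if all((w1 * x1 + w2 * x2 > t) == bool(y)
               for (x1, x2), y in truth_table.items()):
            return (w1, w2, t)
    return None


OR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Candidate values from -2.0 to 2.0 in steps of 0.25 (an arbitrary choice).
GRID = [i / 4 for i in range(-8, 9)]

print("OR :", separable(OR_TABLE, GRID))
print("XOR:", separable(XOR_TABLE, GRID))
```

The search finds weights for OR and returns `None` for XOR.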

12 Neural Nets
Multi-layer perceptrons
  Inputs: real-valued
  Intermediate “hidden” nodes
  Output(s): one (or more) discrete-valued
[Figure: inputs X1 to X4 feed two hidden layers, which feed outputs Y1 and Y2]

13 Neural Nets
Pro: More general than perceptrons
  Not restricted to linear discriminants
  Multiple outputs: one classification each
Con: No simple, guaranteed training procedure
  Use greedy, hill-climbing procedure to train
  “Gradient descent”, “Backpropagation”

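14 Solving the XOR Problem
Network Topology: 2 hidden nodes, 1 output
[Figure: x1 and x2 feed hidden nodes o1 and o2 through w11, w21 and w12, w22; o1 and o2 feed output y through w13 and w23; each node also has a constant -1 input weighted by w01, w02, w03]
Desired behavior:
  x1  x2  o1  o2  y
  0   0   0   0   0
  1   0   0   1   1
  0   1   0   1   1
  1   1   1   1   0
Weights: w11 = w12 = 1; w21 = w22 = 1; w01 = 3/2; w02 = 1/2; w03 = 1/2; w13 = -1; w23 = 1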
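The weights listed for the XOR network can be checked directly. This sketch assumes hard-threshold units that fire when the weighted sum (including the constant -1 input) is greater than 0. The function names are illustrative.

```python
def step(total):
    """Hard threshold: fire when the weighted sum is positive."""
    return 1 if total > 0 else 0


def xor_net(x1, x2):
    # Hidden node 1: w11 = w21 = 1, threshold w01 = 3/2 (behaves like AND).
    o1 = step(1 * x1 + 1 * x2 - 1.5)
    # Hidden node 2: w12 = w22 = 1, threshold w02 = 1/2 (behaves like OR).
    o2 = step(1 * x1 + 1 * x2 - 0.5)
    # Output node: w13 = -1, w23 = 1, threshold w03 = 1/2.
    y = step(-1 * o1 + 1 * o2 - 0.5)
    return o1, o2, y


for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, *xor_net(x1, x2))
```

The output node fires when the OR-like node is on and the AND-like node is off, which is exactly XOR.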

15 Backpropagation
Greedy, hill-climbing procedure
  Weights are parameters to change
  Original hill-climb changes one parameter/step
    Slow
  If smooth function, change all parameters/step
    Gradient descent
Backpropagation: Computes current output, works backward to correct error

16 Producing a Smooth Function
Key problem:
  Pure step threshold is discontinuous
    Not differentiable
Solution:
  Sigmoid (squashed ‘s’ function): Logistic fn $s(z) = \frac{1}{1 + e^{-z}}$
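To see the difference between the two activation functions, the sketch below evaluates a step threshold and the logistic function at a few points. The sample points are arbitrary and the function names are illustrative.

```python
import math


def step(z):
    """Pure step threshold: jumps from 0 to 1 at z = 0."""
    return 1.0 if z > 0 else 0.0


def sigmoid(z):
    """Logistic function: smooth, with values strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


for z in (-4, -1, 0, 1, 4):
    print(z, step(z), round(sigmoid(z), 3))
```

The sigmoid agrees with the step far from 0 but changes gradually near it, which is what makes a derivative available everywhere.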

17 Neural Net Training
Goal:
  Determine how to change weights to get correct output
    Large change in weight to produce large reduction in error
Approach:
  Compute actual output: o
  Compare to desired output: d
  Determine effect of each weight w on error = d-o
  Adjust weights

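18 Neural Net Example
[Figure: 2-2-1 network. Inputs x1, x2 feed hidden units z1 (weights w11, w21, threshold w01) and z2 (weights w12, w22, threshold w02) with outputs y1, y2; these feed output unit z3 (weights w13, w23, threshold w03) with output y3; each threshold weight is attached to a constant -1 input]
xi : ith sample input vector
w : weight vector
yi*: desired output for ith sample
Sum of squares error over training samples: $E = \frac{1}{2} \sum_i (y_i^* - F(x_i, w))^2$
Full expression of output in terms of input and weights:
  $y_3 = F(x, w) = s(w_{13}\, s(w_{11} x_1 + w_{21} x_2 - w_{01}) + w_{23}\, s(w_{12} x_1 + w_{22} x_2 - w_{02}) - w_{03})$
  where the arguments of the outer and the two inner sigmoids are $z_3$, $z_1$, and $z_2$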

19 Gradient Descent
Error: Sum of squares error over inputs with current weights
Compute rate of change of error wrt each weight
  Which weights have greatest effect on error?
  Effectively, partial derivatives of error wrt weights
    In turn, depend on other weights => chain rule

20 Gradient of Error
[Figure: the 2-2-1 network with units z1, z2, z3, outputs y1, y2, y3, weights w11 through w23, and thresholds w01, w02, w03 on constant -1 inputs]
Note: Derivative of sigmoid: $\frac{ds(z_1)}{dz_1} = s(z_1)(1 - s(z_1))$
MIT AI lecture notes, Lozano-Perez 2000
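The chain-rule gradient can be sanity-checked against a finite-difference estimate. This is a sketch under stated assumptions: a 2-2-1 sigmoid network with constant -1 threshold inputs, a single training sample, error E = 1/2 (y* - y3)^2, and arbitrary illustrative weights that are not from the slides. Only two weights are checked, one at the output and one in the hidden layer.

```python
import math


def s(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, x1, x2):
    """Return (y1, y2, y3) for the 2-2-1 network; threshold inputs are -1."""
    y1 = s(w["w11"] * x1 + w["w21"] * x2 - w["w01"])
    y2 = s(w["w12"] * x1 + w["w22"] * x2 - w["w02"])
    y3 = s(w["w13"] * y1 + w["w23"] * y2 - w["w03"])
    return y1, y2, y3


def error(w, x1, x2, target):
    return 0.5 * (target - forward(w, x1, x2)[2]) ** 2


def analytic_grad(w, x1, x2, target):
    """Chain-rule gradients, using ds/dz = s(z)(1 - s(z))."""
    y1, y2, y3 = forward(w, x1, x2)
    d3 = -(target - y3) * y3 * (1 - y3)  # dE/dz3
    return {
        "w13": d3 * y1,                             # dz3/dw13 = y1
        "w11": d3 * w["w13"] * y1 * (1 - y1) * x1,  # path through z1
    }


def numeric_grad(w, name, x1, x2, target, eps=1e-6):
    """Central finite difference of the error with respect to one weight."""
    hi = dict(w, **{name: w[name] + eps})
    lo = dict(w, **{name: w[name] - eps})
    return (error(hi, x1, x2, target) - error(lo, x1, x2, target)) / (2 * eps)


# Arbitrary illustrative weights and sample (assumptions, not slide values).
W = {"w11": 0.3, "w21": -0.2, "w01": 0.1, "w12": 0.5, "w22": 0.4,
     "w02": -0.3, "w13": 0.7, "w23": -0.6, "w03": 0.2}
grads = analytic_grad(W, 1.0, 0.0, 1.0)
for name, g in grads.items():
    print(name, g, numeric_grad(W, name, 1.0, 0.0, 1.0))
```

The hidden-layer gradient reuses the output-layer term `d3`, which is the reuse that backpropagation exploits.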

21 From Effect to Update
Gradient computation:
  How each weight contributes to performance
To train:
  Need to determine how to CHANGE weight based on contribution to performance
  Need to determine how MUCH change to make per iteration
    Rate parameter ‘r’
      Large enough to learn quickly
      Small enough to reach but not overshoot target values
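The effect of the rate parameter shows up even on a one-weight toy problem. The sketch below runs gradient descent on E(w) = w^2, an assumed stand-in error surface rather than a neural net, with three illustrative rates.

```python
def descend(rate, steps=20, w=1.0):
    """Gradient descent on E(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w -= rate * 2 * w
    return w


for r in (0.01, 0.4, 1.1):
    print(r, descend(r))
```

With the small rate the weight is still far from the minimum at 0 after 20 steps, the middle rate gets very close, and the large rate overshoots on every step and moves away from the minimum.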

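22 Backpropagation Procedure
[Figure: a link from node j to node k]
Pick rate parameter ‘r’
Until performance is good enough:
  Do forward computation to calculate output
  Compute Beta in output node with $\beta_z = d_z - o_z$
  Compute Beta in all other nodes with $\beta_j = \sum_k w_{j \to k}\, o_k (1 - o_k)\, \beta_k$
  Compute change for all weights with $\Delta w_{i \to j} = r\, o_i\, o_j (1 - o_j)\, \beta_j$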
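The procedure can be sketched for a 2-2-1 sigmoid network trained on XOR. Several things here are assumptions rather than the lecture's own code: the update formulas are the standard ones for sigmoid units with sum-of-squares error (Beta = d - o at the output, propagated backward through o(1 - o) and the link weight), weight changes are accumulated over all samples before being applied, and the starting weights, rate, and epoch count are arbitrary.

```python
import math


def s(z):
    return 1.0 / (1.0 + math.exp(-z))


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def forward(w, x):
    # Each node also sees a constant -1 input weighted by its threshold.
    o1 = s(w["w11"] * x[0] + w["w21"] * x[1] - w["w01"])
    o2 = s(w["w12"] * x[0] + w["w22"] * x[1] - w["w02"])
    y = s(w["w13"] * o1 + w["w23"] * o2 - w["w03"])
    return o1, o2, y


def total_error(w):
    return 0.5 * sum((d - forward(w, x)[2]) ** 2 for x, d in XOR)


def backprop_epoch(w, r):
    """Accumulate the change for every weight over all samples, then apply."""
    delta = dict.fromkeys(w, 0.0)
    for x, d in XOR:
        o1, o2, y = forward(w, x)          # forward computation
        beta_y = d - y                     # Beta in the output node
        g_y = y * (1 - y) * beta_y
        beta_1 = w["w13"] * g_y            # Beta in the hidden nodes
        beta_2 = w["w23"] * g_y
        g_1 = o1 * (1 - o1) * beta_1
        g_2 = o2 * (1 - o2) * beta_2
        # change = r * (input on the link) * o_j (1 - o_j) * beta_j
        for name, inp, g in [
            ("w13", o1, g_y), ("w23", o2, g_y), ("w03", -1, g_y),
            ("w11", x[0], g_1), ("w21", x[1], g_1), ("w01", -1, g_1),
            ("w12", x[0], g_2), ("w22", x[1], g_2), ("w02", -1, g_2),
        ]:
            delta[name] += r * inp * g
    return {k: w[k] + delta[k] for k in w}


# Arbitrary small starting weights (assumed, not from the slides).
w = {"w11": 0.3, "w21": -0.2, "w01": 0.1, "w12": 0.5, "w22": 0.4,
     "w02": -0.3, "w13": 0.7, "w23": -0.6, "w03": 0.2}
before = total_error(w)
for _ in range(2000):
    w = backprop_epoch(w, r=0.5)
after = total_error(w)
print("error before:", before, "after:", after)
```

Whether this run ends near the XOR targets or on a plateau depends on the starting weights and the rate, which is the local-minimum caveat that comes with gradient descent.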

23 Backpropagation Observations
Procedure is (relatively) efficient
  All computations are local
    Use inputs and outputs of current node
What is “good enough”?
  Rarely reach target (0 or 1) outputs
  Typically, train until within 0.1 of target

24 Neural Net Summary
Training:
  Backpropagation procedure
    Gradient descent strategy (usual problems)
Prediction:
  Compute outputs based on input vector & weights
Pros: Very general, fast prediction
Cons: Training can be VERY slow (1000’s of epochs), overfitting

25 Training Strategies
Online training:
  Update weights after each sample
Offline (batch training):
  Compute error over all samples
    Then update weights
Online training is “noisy”
  Sensitive to individual instances
  However, may escape local minima
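The two schedules differ only in when the weight is changed. The sketch below contrasts them on a one-weight linear model d ≈ w·x with made-up samples; the data, the rate, and the function names are illustrative assumptions.

```python
# Made-up (x, desired) pairs, roughly d = 2x.
SAMPLES = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]


def error(w):
    return 0.5 * sum((d - w * x) ** 2 for x, d in SAMPLES)


def online_epoch(w, r):
    for x, d in SAMPLES:
        w += r * (d - w * x) * x  # update immediately after each sample
    return w


def batch_epoch(w, r):
    total = sum((d - w * x) * x for x, d in SAMPLES)  # error over all samples
    return w + r * total                              # then one update


w_online = w_batch = 0.0
for _ in range(50):
    w_online = online_epoch(w_online, r=0.05)
    w_batch = batch_epoch(w_batch, r=0.05)
print(w_online, w_batch, error(w_online), error(w_batch))
```

Both schedules end close to w = 2. The batch version settles on the least-squares weight, while the online version settles slightly away from it because each update is pulled toward the sample it just saw.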

26 Training Strategy
To avoid overfitting:
  Split data into: training, validation, & test
    Also, avoid excess weights (fewer weights than # samples)
Initialize with small random weights
  Small changes have noticeable effect
Use offline training
  Until validation set minimum
Evaluate on test set
  No more weight changes
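The strategy of training until the validation-set minimum can be sketched as a loop that remembers the best weights seen so far. Everything concrete here is an assumption for illustration: a one-weight linear model, synthetic noisy data, the 18/6/6 split, and a hypothetical `patience` parameter that decides how many non-improving epochs to tolerate before stopping.

```python
import random


def make_data(n, rng):
    """Hypothetical noisy samples of d = 2x (not from the slides)."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, 2 * x + rng.gauss(0, 0.3)))
    return data


def error(w, data):
    return sum((d - w * x) ** 2 for x, d in data) / len(data)


def train_with_early_stopping(train, valid, r=0.1, max_epochs=200, patience=5):
    w = random.Random(1).uniform(-0.1, 0.1)  # small random initial weight
    best_w, best_err, since_best = w, error(w, valid), 0
    for _ in range(max_epochs):
        grad = sum((d - w * x) * x for x, d in train) / len(train)
        w += r * grad                        # offline (batch) update
        err = error(w, valid)
        if err < best_err:
            best_w, best_err, since_best = w, err, 0
        else:
            since_best += 1
            if since_best >= patience:
                break                        # validation error stopped improving
    return best_w, best_err


rng = random.Random(0)
data = make_data(30, rng)
train, valid, test = data[:18], data[18:24], data[24:]

w, v_err = train_with_early_stopping(train, valid)
# The test set is used once, after all weight changes are finished.
print("weight:", w, "validation error:", v_err, "test error:", error(w, test))
```

The test set never influences the weights or the stopping point, so its error is an honest estimate of performance on unseen samples.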

27 Classification
Neural networks best for classification tasks
  Single output -> Binary classifier
  Multiple outputs -> Multiway classification
    Applied successfully to learning pronunciation
Sigmoid pushes to binary classification
  Not good for regression

28 Neural Net Conclusions
Simulation based on neurons in brain
Perceptrons (single neuron)
  Guaranteed to find linear discriminant
    IF one exists -> problem: XOR
Neural nets (Multi-layer perceptrons)
  Very general
  Backpropagation training procedure
    Gradient descent - local min, overfitting issues

