Neural Networks and Backpropagation Sebastian Thrun 15-781, Fall 2000
Outline: Perceptrons; Learning Hidden Layer Representations; Speeding Up Training; Bias, Overfitting and Early Stopping; (Example: Face Recognition)
ALVINN drives 70 mph on highways (Dean Pomerleau, CMU)
Human Brain
Neurons
Human Learning: number of neurons ~10^10; connections per neuron ~10^4 to 10^5; neuron switching time ~0.001 second; scene recognition time ~0.1 second. That leaves room for only about 100 sequential inference steps, which doesn't seem like much.
The “Bible” (1986)
Perceptron: inputs x1, …, xn plus a constant bias input x0 = 1, weights w0, w1, …, wn. The unit computes net = Σi wi xi and outputs o = 1 if net > 0, 0 otherwise.
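A minimal sketch of this threshold unit in Python (the function name perceptron_output and the weight layout are illustrative, not from the slides):

def perceptron_output(weights, inputs):
    # weights = [w0, w1, ..., wn]; inputs = [x1, ..., xn]; the bias input x0 = 1 is implicit
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if net > 0 else 0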
Inverter: input x1 = 0 → output 1; x1 = 1 → output 0. Weights: w1 = -1, w0 = 0.5.
Boolean OR: input x1, input x2 → output: (0,0) → 0, (0,1) → 1, (1,0) → 1, (1,1) → 1. Weights: w1 = 1, w2 = 1, w0 = -0.5.
Boolean AND: input x1, input x2 → output: (0,0) → 0, (0,1) → 0, (1,0) → 0, (1,1) → 1. Weights: w1 = 1, w2 = 1, w0 = -1.5.
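A quick check of these gate weights, reusing the perceptron_output sketch above:

NOT_w = [0.5, -1]        # inverter: w0 = 0.5, w1 = -1
OR_w  = [-0.5, 1, 1]     # Boolean OR
AND_w = [-1.5, 1, 1]     # Boolean AND

print([perceptron_output(NOT_w, [x1]) for x1 in (0, 1)])                         # [1, 0]
print([perceptron_output(OR_w,  [x1, x2]) for x1 in (0, 1) for x2 in (0, 1)])    # [0, 1, 1, 1]
print([perceptron_output(AND_w, [x1, x2]) for x1 in (0, 1) for x2 in (0, 1)])    # [0, 0, 0, 1]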
Boolean XOR: input x1, input x2 → output: (0,0) → 0, (0,1) → 1, (1,0) → 1, (1,1) → 0. Eeek! No single perceptron can compute this.
Linear Separability: OR. Plotted in the (x1, x2) plane, the positive and negative examples can be separated by a single line.
Linear Separability: AND. Again separable by a single line in the (x1, x2) plane.
Linear Separability: XOR. No single line in the (x1, x2) plane separates the positive from the negative examples.
Boolean XOR with a hidden unit: input x1, input x2 → output: (0,0) → 0, (0,1) → 1, (1,0) → 1, (1,1) → 0. Hidden unit h1 computes AND(x1, x2) (weights 1, 1, bias weight -1.5). The output unit o combines x1, x2 (weights 1, 1) with bias weight -0.5 and a strongly negative weight from h1, so it fires like OR(x1, x2) except when AND(x1, x2) is on: exactly XOR.
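A quick check of this construction, reusing the perceptron_output sketch above (the -2 weight from h1 to the output is one workable choice, not a value taken from the slide):

def xor_net(x1, x2):
    h1 = perceptron_output([-1.5, 1, 1], [x1, x2])             # hidden unit: AND(x1, x2)
    return perceptron_output([-0.5, 1, 1, -2], [x1, x2, h1])   # OR(x1, x2), inhibited when h1 fires

print([xor_net(x1, x2) for x1 in (0, 1) for x2 in (0, 1)])     # [0, 1, 1, 0]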
Perceptron Training Rule: w_i ← w_i + Δw_i with increment Δw_i = η (t − o) x_i, where η is the step size, t the target, o the perceptron output, and x_i the input (new weight = old weight + increment).
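A sketch of this rule in Python, reusing the perceptron_output helper above (the learning rate, epoch count, and the OR training set are illustrative choices):

import random

def train_perceptron(data, eta=0.1, epochs=50):
    # data: list of (inputs, target); w[0] is the bias weight w0 (bias input x0 = 1)
    n = len(data[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n + 1)]
    for _ in range(epochs):
        for x, t in data:
            o = perceptron_output(w, x)
            w[0] += eta * (t - o)                  # increment for the bias weight
            for i in range(n):
                w[i + 1] += eta * (t - o) * x[i]   # delta w_i = eta * (t - o) * x_i
    return w

or_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print(train_perceptron(or_data))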
Converges if: the training data is linearly separable, the step size is sufficiently small, and there are no "hidden" units.
How To Train Multi-Layer Perceptrons? Gradient descent (illustrated on a two-layer network like the XOR one: x1, x2 → h1 → o).
Sigmoid Squashing Function: the same unit as the perceptron (inputs x1, …, xn with x0 = 1, weights w0, …, wn), but the output is the smooth o = σ(net) = 1 / (1 + e^(-net)) with net = Σi wi xi, instead of a hard threshold.
Sigmoid Squashing Function: plot of σ(x) against x. σ approaches 0 for large negative x, 1 for large positive x, and passes through 0.5 at x = 0.
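Written out in LaTeX form (a standard identity, included here because the training rules below use it):

\sigma(x) = \frac{1}{1 + e^{-x}},
\qquad
\frac{d\sigma}{dx} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)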
Gradient Descent: learn the wi's that minimize the squared error E[w] = ½ Σ_{d∈D} (t_d − o_d)^2, where D is the training data, t_d the target, and o_d the unit's output on example d.
Gradient Descent. Gradient: ∇E[w] = [∂E/∂w0, ∂E/∂w1, …, ∂E/∂wn]. Training rule: Δw = −η ∇E[w], i.e. Δw_i = −η ∂E/∂w_i.
Gradient Descent (single layer): differentiate E[w] with respect to each weight of a single sigmoid unit; the derivation is sketched below.
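A compact version of that derivation, using the squared error and sigmoid output defined above:

\frac{\partial E}{\partial w_i}
  = \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_{d \in D}(t_d - o_d)^2
  = -\sum_{d \in D}(t_d - o_d)\,\frac{\partial o_d}{\partial w_i}
  = -\sum_{d \in D}(t_d - o_d)\,o_d(1 - o_d)\,x_{i,d}

\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i}
           = \eta \sum_{d \in D}(t_d - o_d)\,o_d(1 - o_d)\,x_{i,d}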
Batch Learning
Initialize each wi to a small random value
Repeat until termination:
  Δwi = 0
  For each training example d do
    o_d = σ(Σi wi x_{i,d})
    Δwi ← Δwi + η (t_d − o_d) o_d (1 − o_d) x_{i,d}
  wi ← wi + Δwi
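A sketch of this batch loop for a single sigmoid unit in Python (function names and the values of eta and epochs are illustrative):

import math, random

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def train_batch(data, eta=0.5, epochs=1000):
    # data: list of (inputs, target); w[0] is the bias weight, bias input x0 = 1
    n = len(data[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n + 1)]
    for _ in range(epochs):
        dw = [0.0] * (n + 1)
        for x, t in data:
            xd = [1.0] + list(x)                            # prepend the bias input
            o = sigmoid(sum(wi * xi for wi, xi in zip(w, xd)))
            for i in range(n + 1):
                dw[i] += eta * (t - o) * o * (1 - o) * xd[i]
        w = [wi + dwi for wi, dwi in zip(w, dw)]            # apply the accumulated update
    return w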
Incremental (Online) Learning
Initialize each wi to a small random value
Repeat until termination:
  For each training example d do
    Δwi = 0
    o_d = σ(Σi wi x_{i,d})
    Δwi ← Δwi + η (t_d − o_d) o_d (1 − o_d) x_{i,d}
    wi ← wi + Δwi
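The online variant in the same style, reusing sigmoid (and random) from the sketch above; only the placement of the weight update changes:

def train_incremental(data, eta=0.5, epochs=1000):
    n = len(data[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n + 1)]
    for _ in range(epochs):
        for x, t in data:
            xd = [1.0] + list(x)
            o = sigmoid(sum(wi * xi for wi, xi in zip(w, xd)))
            for i in range(n + 1):
                w[i] += eta * (t - o) * o * (1 - o) * xd[i]   # update immediately, per example
    return w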
Backpropagation Algorithm Generalization to multiple layers and multiple output units
Backpropagation Algorithm
Initialize all weights to small random numbers
For each training example do
  –For each hidden unit h: compute its activation o_h = σ(Σ_i w_hi x_i)
  –For each output unit k: compute o_k = σ(Σ_h w_kh o_h) and its error δ_k = o_k (1 − o_k) (t_k − o_k)
  –For each hidden unit h: δ_h = o_h (1 − o_h) Σ_k w_kh δ_k
  –Update each network weight w_ij: w_ij ← w_ij + Δw_ij with Δw_ij = η δ_j x_ij
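A sketch of this algorithm for one hidden layer of sigmoid units, written with numpy (the function name train_backprop and the hyperparameter values are illustrative, not from the slides):

import numpy as np

def train_backprop(X, T, n_hidden, eta=0.3, epochs=5000, seed=0):
    # X: (n_examples, n_inputs), T: (n_examples, n_outputs), targets in [0, 1]
    rng = np.random.default_rng(seed)
    sigm = lambda z: 1.0 / (1.0 + np.exp(-z))
    W1 = rng.uniform(-0.05, 0.05, (n_hidden, X.shape[1] + 1))   # hidden weights (first column = bias)
    W2 = rng.uniform(-0.05, 0.05, (T.shape[1], n_hidden + 1))   # output weights (first column = bias)
    for _ in range(epochs):
        for x, t in zip(X, T):
            xb = np.append(1.0, x)                            # bias input x0 = 1
            h = sigm(W1 @ xb)                                 # hidden activations
            hb = np.append(1.0, h)
            o = sigm(W2 @ hb)                                 # output activations
            delta_o = o * (1 - o) * (t - o)                   # errors of the output units
            delta_h = h * (1 - h) * (W2[:, 1:].T @ delta_o)   # errors of the hidden units
            W2 += eta * np.outer(delta_o, hb)                 # w_ij <- w_ij + eta * delta_j * x_ij
            W1 += eta * np.outer(delta_h, xb)
    return W1, W2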
Backpropagation Algorithm: activations are propagated forward through the network, errors (the δ's) are propagated backward.
Can This Be Learned? A network with 8 inputs, 3 hidden units, and 8 outputs is trained to reproduce its input (Input = Output): 10000000, 01000000, 00100000, 00010000, 00001000, 00000100, 00000010, 00000001.
Learned Hidden Layer Representation
Input → hidden unit activations → Output
10000000 → .89 .04 .08 → 10000000
01000000 → .01 .11 .88 → 01000000
00100000 → .01 .97 .27 → 00100000
00010000 → .99 .97 .71 → 00010000
00001000 → .03 .05 .02 → 00001000
00000100 → .22 .99 .99 → 00000100
00000010 → .80 .01 .98 → 00000010
00000001 → .60 .94 .01 → 00000001
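As a usage sketch, the 8-3-8 identity task above could be fed to the train_backprop sketch from earlier (hyperparameter choices and the rounding are illustrative; the exact hidden values will differ from the table):

import numpy as np

X = np.eye(8)                                   # the eight 1-of-8 patterns; targets = inputs
W1, W2 = train_backprop(X, X, n_hidden=3)
sigm = lambda z: 1.0 / (1.0 + np.exp(-z))
hidden = sigm(np.hstack([np.ones((8, 1)), X]) @ W1.T)   # hidden code for each input pattern
print(np.round(hidden, 2))                      # tends toward a compact, roughly binary code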
Training: Internal Representation
Training: Error
Training: Weights
ANNs in Speech Recognition [Huang/Lippmann 1988]
Speeding It Up: Momentum. Each weight change keeps a fraction of the previous change, so the weights keep moving in a consistent direction. (Plot: error E against weight w_ij, comparing plain gradient descent with gradient descent plus momentum.)
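A generic sketch of one gradient step with momentum (the names momentum_step and velocity and the values of eta and alpha are illustrative):

import numpy as np

def momentum_step(w, grad, velocity, eta=0.3, alpha=0.9):
    # velocity carries a fraction alpha of the previous weight change
    velocity = alpha * velocity - eta * grad
    return w + velocity, velocity

# usage: start with velocity = np.zeros_like(w) and call once per gradient evaluation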
Convergence: gradient descent may get stuck in local minima, and the weights may diverge, but it works well in practice.
Overfitting in ANNs
Early Stopping (Important!!!): stop training when the error on a held-out validation set starts to go up.
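A sketch of this rule as a training loop; train_one_epoch and validation_error are hypothetical stand-ins for one backprop pass and the held-out error, and the patience of 10 epochs is an illustrative choice:

import copy

def fit_with_early_stopping(model, train_set, val_set, max_epochs=1000, patience=10):
    best_err, best_model, bad_epochs = float("inf"), copy.deepcopy(model), 0
    for _ in range(max_epochs):
        train_one_epoch(model, train_set)           # hypothetical: one pass of backprop training
        err = validation_error(model, val_set)      # hypothetical: error on the validation set
        if err < best_err:
            best_err, best_model, bad_epochs = err, copy.deepcopy(model), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:              # validation error keeps going up: stop
                break
    return best_model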
Sigmoid Squashing Function: near x = 0, σ(x) is approximately linear. As long as the weights stay in this linear range, the network computes an (almost) linear function, so the number of hidden units doesn't really matter.
ANNs for Face Recognition: typical input images; outputs left / straight / right / up. Head pose (1-of-4): 90% accuracy. Face recognition (1-of-20): 90% accuracy.
Recurrent Networks