Neural Networks and Backpropagation Sebastian Thrun 15-781, Fall 2000
Outline: Perceptrons; Learning Hidden Layer Representations; Speeding Up Training; Bias, Overfitting and Early Stopping; (Example: Face Recognition)
ALVINN drives 70 mph on highways (Dean Pomerleau, CMU)
Human Brain
Neurons
Human Learning: number of neurons ~10^10; connections per neuron ~10^4 to 10^5; neuron switching time ~0.001 second; scene recognition time ~0.1 second. That leaves room for only about 100 sequential inference steps, which doesn't seem like much.
The “Bible” (1986)
Perceptron: inputs x1, …, xn plus a constant bias input x0 = 1, weights w0, w1, …, wn. The unit computes net = Σi wi xi and outputs o = 1 if net > 0, 0 otherwise.
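A minimal sketch of this threshold unit in Python (the function name perceptron_output and the weight layout are illustrative, not from the slides):

def perceptron_output(weights, inputs):
    # weights = [w0, w1, ..., wn]; inputs = [x1, ..., xn]; the bias input x0 = 1 is implicit
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if net > 0 else 0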
Inverter: input x1 = 0 → output 1; x1 = 1 → output 0. Weights: w1 = -1, w0 = 0.5.
Boolean OR: input x1, input x2 → output: (0,0) → 0, (0,1) → 1, (1,0) → 1, (1,1) → 1. Weights: w1 = 1, w2 = 1, w0 = -0.5.
Boolean AND: input x1, input x2 → output: (0,0) → 0, (0,1) → 0, (1,0) → 0, (1,1) → 1. Weights: w1 = 1, w2 = 1, w0 = -1.5.
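A quick check of these gate weights, reusing the perceptron_output sketch above:

NOT_w = [0.5, -1]        # inverter: w0 = 0.5, w1 = -1
OR_w  = [-0.5, 1, 1]     # Boolean OR
AND_w = [-1.5, 1, 1]     # Boolean AND

print([perceptron_output(NOT_w, [x1]) for x1 in (0, 1)])                         # [1, 0]
print([perceptron_output(OR_w,  [x1, x2]) for x1 in (0, 1) for x2 in (0, 1)])    # [0, 1, 1, 1]
print([perceptron_output(AND_w, [x1, x2]) for x1 in (0, 1) for x2 in (0, 1)])    # [0, 0, 0, 1]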
Boolean XOR: input x1, input x2 → output: (0,0) → 0, (0,1) → 1, (1,0) → 1, (1,1) → 0. Eeek! No single perceptron can compute this.
Linear Separability: OR. Plotted in the (x1, x2) plane, the positive and negative examples can be separated by a single line.
Linear Separability: AND. Again separable by a single line in the (x1, x2) plane.
Linear Separability: XOR. No single line in the (x1, x2) plane separates the positive from the negative examples.
Boolean XOR with a hidden unit: input x1, input x2 → output: (0,0) → 0, (0,1) → 1, (1,0) → 1, (1,1) → 0. Hidden unit h1 computes AND(x1, x2) (weights 1, 1, bias weight -1.5). The output unit o combines x1, x2 (weights 1, 1) with bias weight -0.5 and a strongly negative weight from h1, so it fires like OR(x1, x2) except when AND(x1, x2) is on: exactly XOR.
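A quick check of this construction, reusing the perceptron_output sketch above (the -2 weight from h1 to the output is one workable choice, not a value taken from the slide):

def xor_net(x1, x2):
    h1 = perceptron_output([-1.5, 1, 1], [x1, x2])             # hidden unit: AND(x1, x2)
    return perceptron_output([-0.5, 1, 1, -2], [x1, x2, h1])   # OR(x1, x2), inhibited when h1 fires

print([xor_net(x1, x2) for x1 in (0, 1) for x2 in (0, 1)])     # [0, 1, 1, 0]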
Perceptron Training Rule: w_i ← w_i + Δw_i with increment Δw_i = η (t − o) x_i, where η is the step size, t the target, o the perceptron output, and x_i the input (new weight = old weight + increment).
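A sketch of this rule in Python, reusing the perceptron_output helper above (the learning rate, epoch count, and the OR training set are illustrative choices):

import random

def train_perceptron(data, eta=0.1, epochs=50):
    # data: list of (inputs, target); w[0] is the bias weight w0 (bias input x0 = 1)
    n = len(data[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n + 1)]
    for _ in range(epochs):
        for x, t in data:
            o = perceptron_output(w, x)
            w[0] += eta * (t - o)                  # increment for the bias weight
            for i in range(n):
                w[i + 1] += eta * (t - o) * x[i]   # delta w_i = eta * (t - o) * x_i
    return w

or_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print(train_perceptron(or_data))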
Converges if: the training data is linearly separable, the step size is sufficiently small, and there are no "hidden" units.
How To Train Multi-Layer Perceptrons? Gradient descent (illustrated on a two-layer network like the XOR one: x1, x2 → h1 → o).
Sigmoid Squashing Function: the same unit as the perceptron (inputs x1, …, xn with x0 = 1, weights w0, …, wn), but the output is the smooth o = σ(net) = 1 / (1 + e^(-net)) with net = Σi wi xi, instead of a hard threshold.
Sigmoid Squashing Function: plot of σ(x) against x. σ approaches 0 for large negative x, 1 for large positive x, and passes through 0.5 at x = 0.
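Written out in LaTeX form (a standard identity, included here because the training rules below use it):

\sigma(x) = \frac{1}{1 + e^{-x}},
\qquad
\frac{d\sigma}{dx} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)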
Gradient Descent: learn the wi's that minimize the squared error E[w] = ½ Σ_{d∈D} (t_d − o_d)^2, where D is the training data, t_d the target, and o_d the unit's output on example d.
Gradient Descent. Gradient: ∇E[w] = [∂E/∂w0, ∂E/∂w1, …, ∂E/∂wn]. Training rule: Δw = −η ∇E[w], i.e. Δw_i = −η ∂E/∂w_i.
Gradient Descent (single layer): differentiate E[w] with respect to each weight of a single sigmoid unit; the derivation is sketched below.
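A compact version of that derivation, using the squared error and sigmoid output defined above:

\frac{\partial E}{\partial w_i}
  = \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_{d \in D}(t_d - o_d)^2
  = -\sum_{d \in D}(t_d - o_d)\,\frac{\partial o_d}{\partial w_i}
  = -\sum_{d \in D}(t_d - o_d)\,o_d(1 - o_d)\,x_{i,d}

\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i}
           = \eta \sum_{d \in D}(t_d - o_d)\,o_d(1 - o_d)\,x_{i,d}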
Batch Learning
Initialize each wi to a small random value
Repeat until termination:
  Δwi = 0
  For each training example d do
    o_d = σ(Σi wi x_{i,d})
    Δwi ← Δwi + η (t_d − o_d) o_d (1 − o_d) x_{i,d}
  wi ← wi + Δwi
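A sketch of this batch loop for a single sigmoid unit in Python (function names and the values of eta and epochs are illustrative):

import math, random

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def train_batch(data, eta=0.5, epochs=1000):
    # data: list of (inputs, target); w[0] is the bias weight, bias input x0 = 1
    n = len(data[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n + 1)]
    for _ in range(epochs):
        dw = [0.0] * (n + 1)
        for x, t in data:
            xd = [1.0] + list(x)                            # prepend the bias input
            o = sigmoid(sum(wi * xi for wi, xi in zip(w, xd)))
            for i in range(n + 1):
                dw[i] += eta * (t - o) * o * (1 - o) * xd[i]
        w = [wi + dwi for wi, dwi in zip(w, dw)]            # apply the accumulated update
    return w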
Incremental (Online) Learning
Initialize each wi to a small random value
Repeat until termination:
  For each training example d do
    Δwi = 0
    o_d = σ(Σi wi x_{i,d})
    Δwi ← Δwi + η (t_d − o_d) o_d (1 − o_d) x_{i,d}
    wi ← wi + Δwi
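The online variant in the same style, reusing sigmoid (and random) from the sketch above; only the placement of the weight update changes:

def train_incremental(data, eta=0.5, epochs=1000):
    n = len(data[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n + 1)]
    for _ in range(epochs):
        for x, t in data:
            xd = [1.0] + list(x)
            o = sigmoid(sum(wi * xi for wi, xi in zip(w, xd)))
            for i in range(n + 1):
                w[i] += eta * (t - o) * o * (1 - o) * xd[i]   # update immediately, per example
    return w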
Backpropagation Algorithm Generalization to multiple layers and multiple output units
Backpropagation Algorithm
Initialize all weights to small random numbers
For each training example do
  –For each hidden unit h: compute its activation o_h = σ(Σ_i w_hi x_i)
  –For each output unit k: compute o_k = σ(Σ_h w_kh o_h) and its error δ_k = o_k (1 − o_k) (t_k − o_k)
  –For each hidden unit h: δ_h = o_h (1 − o_h) Σ_k w_kh δ_k
  –Update each network weight w_ij: w_ij ← w_ij + Δw_ij with Δw_ij = η δ_j x_ij
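A sketch of this algorithm for one hidden layer of sigmoid units, written with numpy (the function name train_backprop and the hyperparameter values are illustrative, not from the slides):

import numpy as np

def train_backprop(X, T, n_hidden, eta=0.3, epochs=5000, seed=0):
    # X: (n_examples, n_inputs), T: (n_examples, n_outputs), targets in [0, 1]
    rng = np.random.default_rng(seed)
    sigm = lambda z: 1.0 / (1.0 + np.exp(-z))
    W1 = rng.uniform(-0.05, 0.05, (n_hidden, X.shape[1] + 1))   # hidden weights (first column = bias)
    W2 = rng.uniform(-0.05, 0.05, (T.shape[1], n_hidden + 1))   # output weights (first column = bias)
    for _ in range(epochs):
        for x, t in zip(X, T):
            xb = np.append(1.0, x)                            # bias input x0 = 1
            h = sigm(W1 @ xb)                                 # hidden activations
            hb = np.append(1.0, h)
            o = sigm(W2 @ hb)                                 # output activations
            delta_o = o * (1 - o) * (t - o)                   # errors of the output units
            delta_h = h * (1 - h) * (W2[:, 1:].T @ delta_o)   # errors of the hidden units
            W2 += eta * np.outer(delta_o, hb)                 # w_ij <- w_ij + eta * delta_j * x_ij
            W1 += eta * np.outer(delta_h, xb)
    return W1, W2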
Backpropagation Algorithm: activations are propagated forward through the network, errors (the δ's) are propagated backward.
Can This Be Learned? A network with 8 inputs, 3 hidden units, and 8 outputs is trained to reproduce its input (Input = Output): 10000000, 01000000, 00100000, 00010000, 00001000, 00000100, 00000010, 00000001.
Learned Hidden Layer Representation
Input → hidden unit activations → Output
10000000 → .89 .04 .08 → 10000000
01000000 → .01 .11 .88 → 01000000
00100000 → .01 .97 .27 → 00100000
00010000 → .99 .97 .71 → 00010000
00001000 → .03 .05 .02 → 00001000
00000100 → .22 .99 .99 → 00000100
00000010 → .80 .01 .98 → 00000010
00000001 → .60 .94 .01 → 00000001
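As a usage sketch, the 8-3-8 identity task above could be fed to the train_backprop sketch from earlier (hyperparameter choices and the rounding are illustrative; the exact hidden values will differ from the table):

import numpy as np

X = np.eye(8)                                   # the eight 1-of-8 patterns; targets = inputs
W1, W2 = train_backprop(X, X, n_hidden=3)
sigm = lambda z: 1.0 / (1.0 + np.exp(-z))
hidden = sigm(np.hstack([np.ones((8, 1)), X]) @ W1.T)   # hidden code for each input pattern
print(np.round(hidden, 2))                      # tends toward a compact, roughly binary code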
Training: Internal Representation
Training: Error
Training: Weights
ANNs in Speech Recognition [Huang/Lippmann 1988]
Speeding It Up: Momentum. Each weight change keeps a fraction of the previous change, so the weights keep moving in a consistent direction. (Plot: error E against weight w_ij, comparing plain gradient descent with gradient descent plus momentum.)
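A generic sketch of one gradient step with momentum (the names momentum_step and velocity and the values of eta and alpha are illustrative):

import numpy as np

def momentum_step(w, grad, velocity, eta=0.3, alpha=0.9):
    # velocity carries a fraction alpha of the previous weight change
    velocity = alpha * velocity - eta * grad
    return w + velocity, velocity

# usage: start with velocity = np.zeros_like(w) and call once per gradient evaluation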
Convergence: gradient descent may get stuck in local minima, and the weights may diverge, but it works well in practice.
Overfitting in ANNs
Early Stopping (Important!!!): stop training when the error on a held-out validation set starts to go up.
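A sketch of this rule as a training loop; train_one_epoch and validation_error are hypothetical stand-ins for one backprop pass and the held-out error, and the patience of 10 epochs is an illustrative choice:

import copy

def fit_with_early_stopping(model, train_set, val_set, max_epochs=1000, patience=10):
    best_err, best_model, bad_epochs = float("inf"), copy.deepcopy(model), 0
    for _ in range(max_epochs):
        train_one_epoch(model, train_set)           # hypothetical: one pass of backprop training
        err = validation_error(model, val_set)      # hypothetical: error on the validation set
        if err < best_err:
            best_err, best_model, bad_epochs = err, copy.deepcopy(model), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:              # validation error keeps going up: stop
                break
    return best_model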
Sigmoid Squashing Function: near x = 0, σ(x) is approximately linear. As long as the weights stay in this linear range, the network computes an (almost) linear function, so the number of hidden units doesn't really matter.
ANNs for Face Recognition: typical input images; outputs left / straight / right / up. Head pose (1-of-4): 90% accuracy. Face recognition (1-of-20): 90% accuracy.
Recurrent Networks