Neural Networks and Backpropagation Sebastian Thrun 15-781, Fall 2000.



2 Neural Networks and Backpropagation Sebastian Thrun 15-781, Fall 2000

3 Outline Perceptrons Learning Hidden Layer Representations Speeding Up Training Bias, Overfitting and Early Stopping (Example: Face Recognition)

4 ALVINN drives 70mph on highways Dean Pomerleau CMU

5 ALVINN drives 70mph on highways

6 Human Brain

7 Neurons

8 Human Learning. Number of neurons: ~$10^{10}$. Connections per neuron: ~$10^{4}$ to $10^{5}$. Neuron switching time: ~0.001 second. Scene recognition time: ~0.1 second. That leaves room for only about 100 sequential inference steps (0.1 s / 0.001 s), which doesn’t seem much.

9 The “Bible” (1986)

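10 Perceptron: inputs $x_1, \dots, x_n$ with weights $w_1, \dots, w_n$, plus a bias weight $w_0$ on the constant input $x_0 = 1$. With $net = \sum_{i=0}^{n} w_i x_i$, the output is $o = 1$ if $net > 0$, and $0$ otherwise.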

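11 Inverter
    x1  output
    0   1
    1   0
Weights: $w_1 = -1$ on input $x_1$; the value of $w_0$ (on the constant input 1) is missing here. Any $0 < w_0 \le 1$ reproduces the table.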

12 Boolean OR
    x1  x2  output
    0   0   0
    0   1   1
    1   0   1
    1   1   1
Weights: $w_1 = 1$, $w_2 = 1$, $w_0 = -0.5$ (on the constant input 1)

13 Boolean AND
    x1  x2  output
    0   0   0
    0   1   0
    1   0   0
    1   1   1
Weights: $w_1 = 1$, $w_2 = 1$, $w_0 = -1.5$ (on the constant input 1)
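
The OR and AND units above can be checked with a few lines of Python. This is a minimal sketch: the function name `perceptron` and the list layout (bias weight first) are illustrative choices, and the negative signs on the bias weights are an assumption consistent with the truth tables.

```python
def perceptron(weights, inputs):
    """Threshold unit; weights[0] is the bias weight w0 on the constant input x0 = 1."""
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if net > 0 else 0

OR_WEIGHTS = [-0.5, 1, 1]    # w0, w1, w2
AND_WEIGHTS = [-1.5, 1, 1]   # w0, w1, w2

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
or_outputs = [perceptron(OR_WEIGHTS, x) for x in INPUTS]
and_outputs = [perceptron(AND_WEIGHTS, x) for x in INPUTS]
print(or_outputs)   # [0, 1, 1, 1]
print(and_outputs)  # [0, 0, 0, 1]
```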

14 Boolean XOR
    x1  x2  output
    0   0   0
    0   1   1
    1   0   1
    1   1   0
Weights: Eeek!

15 Linear Separability: OR [figure: plot over axes $x_1$, $x_2$]

16 Linear Separability: AND [figure: plot over axes $x_1$, $x_2$]

17 Linear Separability: XOR [figure: plot over axes $x_1$, $x_2$]

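18 Boolean XOR
    x1  x2  output
    0   0   0
    0   1   1
    1   0   1
    1   1   0
[figure: network over the inputs with a hidden unit $h_1$ and output $o$; threshold units labeled AND (weights 1, 1; bias weight -1.5), OR (weights 1, 1; bias weight -0.5), and XOR (bias weight -0.5)]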
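
One way to realize XOR with hidden threshold units is sketched below. The AND and OR biases follow the slide; the output weights (+1 from the OR unit, -1 from the AND unit) are an assumption made for illustration, since the slide's exact wiring is not legible in the transcript.

```python
def step(net):
    """Perceptron threshold: 1 if net > 0, else 0."""
    return 1 if net > 0 else 0

def xor_net(x1, x2):
    h_and = step(x1 + x2 - 1.5)   # hidden AND unit
    h_or = step(x1 + x2 - 0.5)    # hidden OR unit
    # Assumed output unit: fires when OR is on and AND is off.
    return step(h_or - h_and - 0.5)

xor_outputs = [xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(xor_outputs)  # [0, 1, 1, 0]
```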

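19 Perceptron Training Rule: $w_i \leftarrow w_i + \Delta w_i$ (new weight = old weight + increment), with increment $\Delta w_i = \eta\,(t - o)\,x_i$, where $\eta$ is the step size, $t$ the target, $o$ the perceptron output, and $x_i$ the input.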
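
The training rule (new weight = old weight + step size × (target − output) × input) can be sketched as follows. The function names, the zero initialization, the step size 0.1, and the choice of Boolean OR as training data are all assumptions for illustration.

```python
def predict(w, x):
    """Threshold unit; w[0] is the bias weight on the constant input 1."""
    net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if net > 0 else 0

def train_perceptron(examples, eta=0.1, max_epochs=100):
    w = [0.0] * (len(examples[0][0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in examples:
            o = predict(w, x)
            mistakes += (o != t)
            # w_i <- w_i + eta * (t - o) * x_i, with x_0 = 1 for the bias
            for i, xi in enumerate((1,) + tuple(x)):
                w[i] += eta * (t - o) * xi
        if mistakes == 0:   # every example classified correctly
            break
    return w

OR_EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w_or = train_perceptron(OR_EXAMPLES)
learned = [predict(w_or, x) for x, _ in OR_EXAMPLES]
```

Because OR is linearly separable, the loop stops once an epoch passes without a mistake.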

20 Converges, if… the training data is linearly separable; the step size $\eta$ is sufficiently small; there are no “hidden” units

21 How To Train Multi-Layer Perceptrons? Gradient descent [figure: network with inputs, hidden unit $h_1$, and output $o$]

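22 Sigmoid Squashing Function: the same unit as the perceptron (inputs $x_1, \dots, x_n$, weights $w_1, \dots, w_n$, bias weight $w_0$ on $x_0 = 1$), but with the threshold replaced by the sigmoid, $o = \sigma(net) = \frac{1}{1 + e^{-net}}$ with $net = \sum_{i=0}^{n} w_i x_i$.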

23 Sigmoid Squashing Function [figure: $\sigma(x)$ plotted against $x$]
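
A short sketch of the squashing function, assuming the usual logistic form $\sigma(x) = 1/(1 + e^{-x})$. Its derivative is $\sigma(x)(1 - \sigma(x))$, which is where the $o(1 - o)$ factor in the later update rules comes from. The helper names are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Derivative of the logistic sigmoid: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Compare the closed form against a central finite difference at x = 0.3.
h = 1e-6
numeric = (sigmoid(0.3 + h) - sigmoid(0.3 - h)) / (2 * h)
analytic = sigmoid_prime(0.3)
```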

24 Gradient Descent: learn the $w_i$’s that minimize the squared error $E[\vec{w}] = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$, where $D$ = training data.

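25 Gradient Descent. Gradient: $\nabla E[\vec{w}] = \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_n} \right]$. Training rule: $\Delta \vec{w} = -\eta\, \nabla E[\vec{w}]$, i.e. $\Delta w_i = -\eta\, \frac{\partial E}{\partial w_i}$.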

26 Gradient Descent (single layer)
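
The derivation for a single sigmoid unit can be written out as follows. It is a sketch that assumes the squared error $E = \frac{1}{2}\sum_{d \in D}(t_d - o_d)^2$ and the output $o_d = \sigma(net_d)$ with $net_d = \sum_i w_i x_{i,d}$; it arrives at the same $(t_d - o_d)\,o_d(1 - o_d)\,x_{i,d}$ term used in the learning procedures that follow.

```latex
\begin{align*}
\frac{\partial E}{\partial w_i}
  &= \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_{d \in D} (t_d - o_d)^2
   = \sum_{d \in D} (t_d - o_d)\,\frac{\partial}{\partial w_i}(t_d - o_d) \\
  &= -\sum_{d \in D} (t_d - o_d)\,
       \frac{\partial o_d}{\partial net_d}\,
       \frac{\partial net_d}{\partial w_i}
   = -\sum_{d \in D} (t_d - o_d)\, o_d (1 - o_d)\, x_{i,d} \\[4pt]
\Delta w_i
  &= -\eta\,\frac{\partial E}{\partial w_i}
   = \eta \sum_{d \in D} (t_d - o_d)\, o_d (1 - o_d)\, x_{i,d}
\end{align*}
```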

27 Batch Learning
Initialize each $w_i$ to a small random value
Repeat until termination:
    $\Delta w_i = 0$
    For each training example $d$ do
        $o_d \leftarrow \sigma(\sum_i w_i x_{i,d})$
        $\Delta w_i \leftarrow \Delta w_i + \eta\,(t_d - o_d)\,o_d (1 - o_d)\,x_{i,d}$
    $w_i \leftarrow w_i + \Delta w_i$

28 Incremental (Online) Learning
Initialize each $w_i$ to a small random value
Repeat until termination:
    For each training example $d$ do
        $\Delta w_i = 0$
        $o_d \leftarrow \sigma(\sum_i w_i x_{i,d})$
        $\Delta w_i \leftarrow \Delta w_i + \eta\,(t_d - o_d)\,o_d (1 - o_d)\,x_{i,d}$
        $w_i \leftarrow w_i + \Delta w_i$
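
The batch and incremental procedures differ only in when the increment is applied: once per pass over the data, or after every example. The sketch below puts both in one hypothetical `train` function for a single sigmoid unit. The step size, epoch count, seed, and the Boolean OR data (with the constant input $x_0 = 1$ prepended) are assumptions for illustration, and the fixed epoch count stands in for "until termination".

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def output(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def train(examples, incremental, eta=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n = len(examples[0][0])
    w = [rng.uniform(-0.05, 0.05) for _ in range(n)]   # small random values
    for _ in range(epochs):
        delta = [0.0] * n
        for x, t in examples:
            if incremental:
                delta = [0.0] * n          # reset per example
            o = output(w, x)
            for i in range(n):
                delta[i] += eta * (t - o) * o * (1 - o) * x[i]
            if incremental:
                w = [wi + di for wi, di in zip(w, delta)]   # update per example
        if not incremental:
            w = [wi + di for wi, di in zip(w, delta)]       # update per pass
    return w

def squared_error(w, examples):
    return 0.5 * sum((t - output(w, x)) ** 2 for x, t in examples)

# Each input is (x0, x1, x2) with x0 = 1 acting as the bias input.
OR_DATA = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]
w_batch = train(OR_DATA, incremental=False)
w_online = train(OR_DATA, incremental=True)
```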

29 Backpropagation Algorithm Generalization to multiple layers and multiple output units

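30 Backpropagation Algorithm
Initialize all weights to small random numbers
For each training example do
    – For each hidden unit $h$: $o_h = \sigma(\sum_i w_{i,h}\, x_i)$
    – For each output unit $k$: $o_k = \sigma(\sum_h w_{h,k}\, o_h)$ and $\delta_k = o_k (1 - o_k)(t_k - o_k)$
    – For each hidden unit $h$: $\delta_h = o_h (1 - o_h) \sum_k w_{h,k}\, \delta_k$
    – Update each network weight $w_{i,j}$: $w_{i,j} \leftarrow w_{i,j} + \Delta w_{i,j}$ with $\Delta w_{i,j} = \eta\, \delta_j\, x_{i,j}$
Here $w_{i,j}$ is the weight on the connection from unit $i$ to unit $j$, and $x_{i,j}$ is the value unit $j$ receives over that connection.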
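
A minimal sketch of one backpropagation step for a network with a single hidden layer of sigmoid units, using the standard delta terms $\delta_k = o_k(1-o_k)(t_k-o_k)$ and $\delta_h = o_h(1-o_h)\sum_k w_{h,k}\delta_k$. The storage layout (`w_hid[h][i]` from input `i` to hidden unit `h`, `w_out[k][h]` from hidden unit `h` to output `k`), the absence of a bias on the output units, and the example weights are all assumptions made to keep the example short.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w_hid, w_out, x):
    o_h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hid]
    o_k = [sigmoid(sum(w * oh for w, oh in zip(row, o_h))) for row in w_out]
    return o_h, o_k

def backprop_step(w_hid, w_out, x, t, eta):
    """Return updated copies of both weight matrices for one example."""
    o_h, o_k = forward(w_hid, w_out, x)
    # Output-unit errors.
    delta_k = [o * (1 - o) * (tk - o) for o, tk in zip(o_k, t)]
    # Hidden-unit errors, propagated back through the output weights.
    delta_h = [oh * (1 - oh) * sum(w_out[k][h] * delta_k[k] for k in range(len(w_out)))
               for h, oh in enumerate(o_h)]
    new_out = [[w_out[k][h] + eta * delta_k[k] * o_h[h] for h in range(len(o_h))]
               for k in range(len(w_out))]
    new_hid = [[w_hid[h][i] + eta * delta_h[h] * x[i] for i in range(len(x))]
               for h in range(len(w_hid))]
    return new_hid, new_out

def error(w_hid, w_out, x, t):
    _, o_k = forward(w_hid, w_out, x)
    return 0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, o_k))

# Two hidden units, inputs (x0 = 1, x1, x2), one output unit.
w_hid = [[0.1, -0.2, 0.3], [-0.3, 0.2, 0.1]]
w_out = [[0.2, -0.1]]
x, t = [1.0, 0.0, 1.0], [1.0]
before = error(w_hid, w_out, x, t)
new_hid, new_out = backprop_step(w_hid, w_out, x, t, eta=0.5)
after = error(new_hid, new_out, x, t)
```

Since each weight change is $\eta$ times the negative error gradient, dividing a change by $\eta$ and comparing it with a finite-difference estimate of the gradient is a convenient correctness check.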

31 Backpropagation Algorithm [figure: “activations” are propagated forward through the network, “errors” are propagated backward]

32 Can This Be Learned?
    Input      Output
    10000000 → 10000000
    01000000 → 01000000
    00100000 → 00100000
    00010000 → 00010000
    00001000 → 00001000
    00000100 → 00000100
    00000010 → 00000010
    00000001 → 00000001

33 Learned Hidden Layer Representation
    Input      Hidden values    Output
    10000000 → .89 .04 .08 → 10000000
    01000000 → .01 .11 .88 → 01000000
    00100000 → .01 .97 .27 → 00100000
    00010000 → .99 .97 .71 → 00010000
    00001000 → .03 .05 .02 → 00001000
    00000100 → .22 .99 .99 → 00000100
    00000010 → .80 .01 .98 → 00000010
    00000001 → .60 .94 .01 → 00000001

34 Training: Internal Representation

35 Training: Error

36 Training: Weights

37 ANNs in Speech Recognition [Huang/Lippmann 1988]

38 Speeding It Up: Momentum [figure: error $E$ plotted against weight $w_{ij}$, comparing the new $w_{ij}$ reached by plain gradient descent with the one reached by GD with momentum]
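
Momentum is commonly written as $\Delta w(n) = -\eta\, \partial E / \partial w + \alpha\, \Delta w(n-1)$: each step adds a fraction $\alpha$ of the previous step. The slide's own formula did not survive in the transcript, so this form, the quadratic error surface $E(w) = 0.05\,w^2$, and the parameter values below are assumptions chosen to show the effect on a flat error surface.

```python
def descend(grad, w0, eta, alpha, steps):
    """Gradient descent with momentum; alpha = 0 gives plain gradient descent."""
    w, delta = w0, 0.0
    path = [w]
    for _ in range(steps):
        delta = -eta * grad(w) + alpha * delta   # reuse part of the previous step
        w += delta
        path.append(w)
    return path

def grad(w):
    # Gradient of the assumed error surface E(w) = 0.05 * w**2.
    return 0.1 * w

plain = descend(grad, w0=1.0, eta=0.1, alpha=0.0, steps=50)
with_momentum = descend(grad, w0=1.0, eta=0.1, alpha=0.9, steps=50)
```

On this shallow surface the momentum run ends much closer to the minimum at $w = 0$ than the plain run after the same number of steps.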

39 Convergence: may get stuck in local minima; weights may diverge; …but works well in practice

40 Overfitting in ANNs

41 Early Stopping (Important!!!) Stop training when error goes up on validation set
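
Early stopping can be sketched as a loop that watches the validation error and stops once it goes up. The function `train_with_early_stopping`, its callback arguments, the `patience` parameter, and the made-up validation curve are hypothetical names and data for illustration; a real run would also keep a copy of the weights from the best epoch.

```python
def train_with_early_stopping(step, validation_error, max_epochs, patience=1):
    """Run `step` once per epoch; stop after `patience` epochs without improvement."""
    best_err, best_epoch, bad_epochs = float("inf"), 0, 0
    for epoch in range(1, max_epochs + 1):
        step()                      # one pass of training
        err = validation_error()    # error on held-out validation data
        if err < best_err:
            best_err, best_epoch, bad_epochs = err, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch, best_err

# Stand-in validation curve: improves for four epochs, then rises.
curve = iter([0.9, 0.7, 0.5, 0.4, 0.45, 0.5, 0.6])
best_epoch, best_err = train_with_early_stopping(
    step=lambda: None,
    validation_error=lambda: next(curve),
    max_epochs=7,
)
print(best_epoch, best_err)  # 4 0.4
```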

42 Sigmoid Squashing Function [figure: $\sigma(x)$ plotted against $x$]. While the units operate in the linear range, the number of hidden units doesn’t really matter.

43 ANNs for Face Recognition. Typical input images; output units: left, strt (straight), right, up. Head pose (1-of-4): 90% accuracy. Face recognition (1-of-20): 90% accuracy.

44 [figure: output units left, strt, right, up]

45 Recurrent Networks
