Neural Networks and Backpropagation Sebastian Thrun 15-781, Fall 2000.


Neural Networks and Backpropagation Sebastian Thrun 15-781, Fall 2000

Outline: Perceptrons; Learning Hidden Layer Representations; Speeding Up Training; Bias, Overfitting and Early Stopping; (Example: Face Recognition)

ALVINN drives 70 mph on highways (Dean Pomerleau, CMU)

ALVINN drives 70 mph on highways

Human Brain

Neurons

Human Learning: number of neurons ~10^11; connections per neuron ~10^4 to 10^5; neuron switching time ~10^-3 second; scene recognition time ~0.1 second. 100 inference steps doesn't seem much.

The “Bible” (1986): Rumelhart & McClelland (eds.), Parallel Distributed Processing

Perceptron: inputs $x_1, \dots, x_n$ with weights $w_1, \dots, w_n$, plus a bias weight $w_0$ on a fixed input $x_0 = 1$. The unit computes $net = \sum_{i=0}^{n} w_i x_i$ and outputs $o = 1$ if $net > 0$, and $0$ otherwise.
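To make the threshold rule concrete, here is a minimal sketch of such a unit in Python (the function name and the NumPy dependency are illustrative choices, not from the slides):

```python
import numpy as np

def perceptron_output(w, x):
    """Threshold unit: output 1 if sum_i w_i * x_i > 0, else 0.

    w: weights [w_0, w_1, ..., w_n] (w_0 is the bias weight)
    x: inputs (x_1, ..., x_n); the fixed input x_0 = 1 is prepended here
    """
    xb = np.concatenate(([1.0], np.asarray(x, dtype=float)))
    net = float(np.dot(w, xb))
    return 1 if net > 0 else 0
```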

Inverter: a one-input perceptron with $w_1 = -1$ and a positive bias weight (e.g. $w_0 = 0.5$) implements Boolean NOT: input $x_1 = 0$ gives output 1, input $x_1 = 1$ gives output 0.

Boolean OR: a perceptron with $w_1 = 1$, $w_2 = 1$, and bias $w_0 = -0.5$ outputs 1 whenever $x_1 = 1$ or $x_2 = 1$, and 0 when both inputs are 0.

Boolean AND: a perceptron with $w_1 = 1$, $w_2 = 1$, and bias $w_0 = -1.5$ outputs 1 only when $x_1 = 1$ and $x_2 = 1$.
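As a quick check, reusing `perceptron_output` from the sketch above with the weights given on these slides reproduces the OR and AND truth tables (the variable names are illustrative):

```python
OR_W  = [-0.5, 1.0, 1.0]   # [w_0, w_1, w_2] from the OR slide
AND_W = [-1.5, 1.0, 1.0]   # [w_0, w_1, w_2] from the AND slide

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output(OR_W, x), perceptron_output(AND_W, x))
# (0, 0) 0 0
# (0, 1) 1 0
# (1, 0) 1 0
# (1, 1) 1 1
```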

Boolean XOR: no single perceptron can reproduce the XOR truth table. Eeek!

Linear Separability: OR. The positive and negative examples of OR can be separated by a single line in the $(x_1, x_2)$ plane.

Linear Separability: AND. AND is also linearly separable; a single line separates the positive example $(1,1)$ from the other three.

Linear Separability: XOR. XOR is not linearly separable; no single line separates its positive examples $(0,1)$ and $(1,0)$ from its negative examples $(0,0)$ and $(1,1)$.

Boolean XOR with a hidden unit: a hidden unit $h_1$ computes AND$(x_1, x_2)$ (weights 1, 1, bias $-1.5$); the output unit takes $x_1$ and $x_2$ with weight 1 each, $h_1$ with a strongly negative weight, and bias $-0.5$, so it fires like OR$(x_1, x_2)$ except when both inputs are 1, which is exactly XOR.
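A sketch of this two-layer construction, reusing `perceptron_output` and `AND_W` from the sketches above; the exact output-layer weight on $h_1$ is not recoverable from the slide, so the $-2$ used here is an assumption (any weight negative enough to cancel both inputs works):

```python
def xor(x1, x2):
    h1 = perceptron_output(AND_W, (x1, x2))        # hidden unit: AND(x1, x2)
    out_w = [-0.5, 1.0, 1.0, -2.0]                 # OR of x1, x2, suppressed when h1 fires
    return perceptron_output(out_w, (x1, x2, h1))

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor(*x))    # 0, 1, 1, 0
```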

Perceptron Training Rule: $w_i \leftarrow w_i + \Delta w_i$ with increment $\Delta w_i = \eta\,(t - o)\,x_i$, where $\eta$ is the step size, $t$ the target output, $o$ the perceptron output, and $x_i$ the input.
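A minimal sketch of this rule inside a training loop (the learning rate, epoch count, and random initialization range are illustrative assumptions; `perceptron_output` is the sketch from above):

```python
import numpy as np

def perceptron_train(examples, eta=0.1, epochs=100):
    """examples: list of (x, t) pairs, x a tuple of inputs, t in {0, 1}."""
    n = len(examples[0][0])
    w = np.random.uniform(-0.05, 0.05, n + 1)   # small random initial weights
    for _ in range(epochs):
        for x, t in examples:
            o = perceptron_output(w, x)                        # current output
            xb = np.concatenate(([1.0], np.asarray(x, float)))
            w += eta * (t - o) * xb                            # w_i += eta*(t - o)*x_i
    return w

# OR is linearly separable, so this converges:
w_or = perceptron_train([((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)])
```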

Converges, if… … training data linearly separable … step size $\eta$ sufficiently small … no “hidden” units

How To Train Multi-Layer Perceptrons? Gradient descent. (Figure: a small network with input $x_1$, hidden unit $h_1$, and output $o$.)

Sigmoid Squashing Function: the unit computes $net = \sum_{i=0}^{n} w_i x_i$ over inputs $x_1, \dots, x_n$ (with $x_0 = 1$) and outputs $o = \sigma(net)$ instead of applying a hard threshold.

Sigmoid Squashing Function: $\sigma(x) = \dfrac{1}{1 + e^{-x}}$ (plot of $\sigma(x)$ against $x$).
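A small sketch of the sigmoid and of the identity $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ that the update rules on the following slides rely on (the test point and tolerance are arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The derivative identity used by the gradient-descent updates below:
x = 0.3
numeric  = (sigmoid(x + 1e-6) - sigmoid(x - 1e-6)) / 2e-6   # central difference
analytic = sigmoid(x) * (1 - sigmoid(x))
assert abs(numeric - analytic) < 1e-8
```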

Gradient Descent: learn the $w_i$'s that minimize the squared error $E[\vec{w}] = \tfrac{1}{2} \sum_{d \in D} (t_d - o_d)^2$, where $D$ is the training data.

Gradient Descent. Gradient: $\nabla E[\vec{w}] = \left[\tfrac{\partial E}{\partial w_0}, \tfrac{\partial E}{\partial w_1}, \dots, \tfrac{\partial E}{\partial w_n}\right]$. Training rule: $\Delta \vec{w} = -\eta\,\nabla E[\vec{w}]$, i.e. $\Delta w_i = -\eta\,\tfrac{\partial E}{\partial w_i}$.

Gradient Descent (single layer): for a single sigmoid unit, $\tfrac{\partial E}{\partial w_i} = -\sum_{d \in D} (t_d - o_d)\, o_d (1 - o_d)\, x_{i,d}$, which gives the weight updates used on the next slides.

Batch Learning
Initialize each $w_i$ to a small random value.
Repeat until termination:
  $\Delta w_i = 0$
  For each training example $d$ do:
    $o_d \leftarrow \sigma\left(\sum_i w_i x_{i,d}\right)$
    $\Delta w_i \leftarrow \Delta w_i + \eta\,(t_d - o_d)\,o_d (1 - o_d)\,x_{i,d}$
  $w_i \leftarrow w_i + \Delta w_i$
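A sketch of this batch update for a single sigmoid unit, reusing `sigmoid` from above (learning rate, epoch count, and initialization range are assumptions):

```python
import numpy as np

def batch_train(X, t, eta=0.5, epochs=1000):
    """X: (m, n) array of inputs; t: (m,) array of targets in [0, 1]."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])          # prepend the fixed input x_0 = 1
    w = np.random.uniform(-0.05, 0.05, n + 1)     # small random initial weights
    for _ in range(epochs):
        o = sigmoid(Xb @ w)                       # o_d for every training example d
        dw = eta * ((t - o) * o * (1 - o)) @ Xb   # accumulate Delta w_i over all examples
        w += dw                                   # apply once per pass (batch update)
    return w
```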

Incremental (Online) Learning
Initialize each $w_i$ to a small random value.
Repeat until termination:
  For each training example $d$ do:
    $o_d \leftarrow \sigma\left(\sum_i w_i x_{i,d}\right)$
    $\Delta w_i \leftarrow \eta\,(t_d - o_d)\,o_d (1 - o_d)\,x_{i,d}$
    $w_i \leftarrow w_i + \Delta w_i$
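The online variant only moves the weight update inside the loop over examples; a sketch reusing `sigmoid` and NumPy from above:

```python
def online_train(X, t, eta=0.5, epochs=1000):
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])
    w = np.random.uniform(-0.05, 0.05, n + 1)
    for _ in range(epochs):
        for xd, td in zip(Xb, t):
            od = sigmoid(xd @ w)
            w += eta * (td - od) * od * (1 - od) * xd   # update after every example
    return w
```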

Backpropagation Algorithm Generalization to multiple layers and multiple output units

Backpropagation Algorithm
Initialize all weights to small random numbers.
For each training example do:
  – For each hidden unit $h$: $o_h \leftarrow \sigma\left(\sum_i w_{h,i}\, x_i\right)$
  – For each output unit $k$: $o_k \leftarrow \sigma\left(\sum_h w_{k,h}\, o_h\right)$, then $\delta_k \leftarrow o_k (1 - o_k)(t_k - o_k)$
  – For each hidden unit $h$: $\delta_h \leftarrow o_h (1 - o_h) \sum_k w_{k,h}\, \delta_k$
  – Update each network weight $w_{ij}$: $w_{ij} \leftarrow w_{ij} + \Delta w_{ij}$ with $\Delta w_{ij} = \eta\, \delta_j\, x_{ij}$

Backpropagation Algorithm: “activations” flow forward through the network; “errors” (the $\delta$'s) flow backward.
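A compact sketch of these two passes for a network with one hidden layer, reusing `sigmoid` from above (the all-to-all connectivity, per-example updates, and the learning rate are assumptions consistent with the pseudocode):

```python
import numpy as np

def backprop_epoch(X, T, W_h, W_o, eta=0.3):
    """One pass over the training set.

    X: (m, n_in) inputs, T: (m, n_out) targets in [0, 1].
    W_h: (n_hidden, n_in + 1) hidden-layer weights (column 0 is the bias).
    W_o: (n_out, n_hidden + 1) output-layer weights (column 0 is the bias).
    """
    for x, t in zip(X, T):
        # Forward pass: "activations"
        xb = np.concatenate(([1.0], x))
        h = sigmoid(W_h @ xb)                               # hidden outputs o_h
        hb = np.concatenate(([1.0], h))
        o = sigmoid(W_o @ hb)                               # network outputs o_k
        # Backward pass: "errors"
        delta_o = o * (1 - o) * (t - o)                     # delta_k for output units
        delta_h = h * (1 - h) * (W_o[:, 1:].T @ delta_o)    # delta_h for hidden units
        # Weight updates: w_ij <- w_ij + eta * delta_j * x_ij
        W_o += eta * np.outer(delta_o, hb)
        W_h += eta * np.outer(delta_h, xb)
    return W_h, W_o
```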

Can This Be Learned? Input → Output: each of eight one-hot input patterns (10000000, 01000000, …, 00000001) must be reproduced exactly at the output of an 8 × 3 × 8 network.

Learned Hidden Layer Representation: after training, the three hidden units take on roughly binary values that differ for each of the eight inputs; the network has invented a compact (essentially 3-bit binary) code for its eight input patterns in the hidden layer.

Training: Internal Representation

Training: Error

Training: Weights

ANNs in Speech Recognition [Huang/Lippmann 1988]

Speeding It Up: Momentum. Add a fraction of the previous weight change to the current one: $\Delta w_{ij}(n) = \eta\,\delta_j\,x_{ij} + \alpha\,\Delta w_{ij}(n-1)$. (Figure: error $E$ as a function of weight $w_{ij}$, comparing plain gradient descent with gradient descent with momentum.)
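A sketch of the momentum modification to any of the weight updates above (the value of the momentum coefficient $\alpha$ is an assumption; common choices are around 0.5 to 0.9):

```python
def momentum_update(w, grad_step, dw_prev, eta=0.3, alpha=0.9):
    """Gradient descent with momentum: keep a fraction alpha of the previous step.

    grad_step: the current gradient-based step direction (e.g. the delta_j * x_ij terms).
    dw_prev:   the weight change applied on the previous iteration.
    """
    dw = eta * grad_step + alpha * dw_prev
    return w + dw, dw        # new weights, plus the step to remember for next time
```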

Convergence: may get stuck in local minima; weights may diverge; …but works well in practice.

Overfitting in ANNs

Early Stopping (Important!!!) Stop training when error goes up on validation set
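A sketch of this early-stopping loop; `train_one_epoch` and `validation_error` are hypothetical callables standing in for whichever training procedure and held-out data are used, and the patience value is an assumption:

```python
def train_with_early_stopping(train_one_epoch, validation_error,
                              max_epochs=10000, patience=20):
    """Stop when the validation error has not improved for `patience` epochs."""
    best_err, best_epoch, best_weights = float("inf"), 0, None
    for epoch in range(max_epochs):
        weights = train_one_epoch()            # one pass of weight updates
        err = validation_error(weights)        # error on the held-out validation set
        if err < best_err:
            best_err, best_epoch, best_weights = err, epoch, weights
        elif epoch - best_epoch >= patience:
            break                              # validation error keeps going up: stop
    return best_weights
```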

Sigmoid Squashing Function: plot of $\sigma(x)$ against $x$. Within the sigmoid's (near-)linear range the network behaves like a linear model, so the number of hidden units doesn't really matter.

ANNs for Face Recognition: typical input images; head pose (1-of-4: left, straight, right, up): 90% accuracy; face recognition (1-of-20): 90% accuracy.

left, straight, right, up

Recurrent Networks