Backpropagation Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001

Administration Questions, concerns?

Classification Percept. (Diagram of a perceptron unit: inputs x_1, x_2, x_3, …, x_D and a constant bias input 1 are multiplied by weights w_1, w_2, w_3, …, w_D and w_0; a sum unit computes net, and the squashing function g maps net to out.)
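A minimal sketch of this unit in Python, assuming the logistic sigmoid as the squashing function g (the slides only require some squash); the name perceptron_out is illustrative.

```python
import math

def g(net):
    """Logistic squashing function: maps any net value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def perceptron_out(x, w, w0):
    """net = w0 + sum_k w_k x_k (w0 multiplies the constant bias input 1); out = g(net)."""
    net = w0 + sum(wk * xk for wk, xk in zip(w, x))
    return g(net)

print(perceptron_out([1, 0, 1], [2.0, -1.0, 0.5], -1.0))  # net = 1.5, out ≈ 0.82
```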

Perceptrons Recall that the squashing function makes the output look more like bits: 0 or 1 decisions. What if we give it inputs that are also bits?

A Boolean Function (Table of training examples: input bits A, B, C, D, E, F, G and an output column out.)

Think Graphically Can a perceptron learn this? (Plot of the examples in the C-D plane, each point labeled with its output: 1, 1, 1, 0.)

Ands and Ors out(x) = g(sum_k w_k x_k). How can we set the weights to represent the AND (v_1)(v_2)(~v_7)? Set w_i = 0 except w_1 = 10, w_2 = 10, w_7 = -10, w_0 = -15 (the net is at most 5, reaching 5 only when the conjunction holds). How about the OR ~v_3 + v_4 + ~v_8? Set w_i = 0 except w_3 = -10, w_4 = 10, w_8 = -10, w_0 = 15 (the net is at least -5, dropping to -5 only when the disjunction fails).
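To sanity-check these weight settings, here is a small sketch (assuming a logistic squash and a 0.5 decision threshold; the helper names unit, and_out, or_out are illustrative) that enumerates all 2^8 inputs:

```python
import itertools, math

def g(net):
    return 1.0 / (1.0 + math.exp(-net))

def unit(v, weights, w0):
    """weights maps a 1-based index i to w_i; every other w_i is 0."""
    net = w0 + sum(w * v[i - 1] for i, w in weights.items())
    return 1 if g(net) > 0.5 else 0

for v in itertools.product([0, 1], repeat=8):
    and_out = unit(v, {1: 10, 2: 10, 7: -10}, -15)   # (v1)(v2)(~v7)
    or_out = unit(v, {3: -10, 4: 10, 8: -10}, 15)    # ~v3 + v4 + ~v8
    assert and_out == (v[0] == 1 and v[1] == 1 and v[6] == 0)
    assert or_out == (v[2] == 0 or v[3] == 1 or v[7] == 0)
print("AND and OR weight settings agree with the formulas on all 256 inputs")
```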

Majority Are at least half the bits on? Set all weights to 1 and w_0 to -n/2. (Table of examples over input bits A, B, C, D, E, F, G with an output column out.) Representation size using a decision tree?
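A corresponding sketch of the majority construction, assuming a hard decision at net >= 0 (so an exact half counts as a majority); majority_unit is an illustrative name:

```python
import itertools

def majority_unit(bits):
    """All weights 1 and w0 = -n/2; fire when net = sum(bits) - n/2 is >= 0."""
    n = len(bits)
    return 1 if sum(bits) - n / 2 >= 0 else 0

for bits in itertools.product([0, 1], repeat=7):
    assert majority_unit(bits) == (2 * sum(bits) >= len(bits))
print("majority construction agrees on all inputs for n = 7")
```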

Sweet Sixteen? Two-input Boolean functions paired with their negations: ab / (~a)+(~b); a(~b) / (~a)+b; (~a)b / a+(~b); (~a)(~b) / a+b; a / ~a; b / ~b; a = b / a exclusive-or b (a xor b).

XOR Constraints From the truth table of A, B, out for XOR, the weights must satisfy g(w_0) < 1/2, g(w_B + w_0) > 1/2, g(w_A + w_0) > 1/2, and g(w_A + w_B + w_0) < 1/2. Equivalently: w_0 < 0, w_B + w_0 > 0, w_A + w_0 > 0, w_A + w_B + w_0 < 0. Adding the two middle inequalities gives w_A + w_B + 2w_0 > 0, hence w_A + w_B + w_0 > -w_0 > 0, so 0 < w_A + w_B + w_0 < 0: impossible.
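The contradiction above is the proof; purely as an illustration (not a proof), the sketch below samples random weight settings and finds none that reproduce XOR on all four inputs (fits_xor is an illustrative name):

```python
import random

def fits_xor(wA, wB, w0):
    """True if a threshold unit with these weights outputs A xor B on all four inputs."""
    for A in (0, 1):
        for B in (0, 1):
            out = 1 if wA * A + wB * B + w0 > 0 else 0
            if out != (A ^ B):
                return False
    return True

random.seed(0)
trials = 100000
hits = sum(fits_xor(random.uniform(-10, 10), random.uniform(-10, 10),
                    random.uniform(-10, 10)) for _ in range(trials))
print(hits, "of", trials, "random weight settings compute XOR")  # prints 0
```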

Linearly Separable XOR is problematic. (Plot in the C-D plane: no single line separates the 1-labeled points from the 0-labeled points.)

How Represent XOR? A xor B = (A+B)(~A+~B). (Diagram: inputs A and B feed two hidden units c_1 and c_2, each computing its own net; the hidden outputs feed a final unit whose net produces out.)
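A sketch of this two-layer construction with hand-picked weights (the specific values 10, -10, ±5, ±15 are illustrative, not from the slide): hidden unit c_1 computes A+B, hidden unit c_2 computes ~A+~B, and the output unit ANDs them:

```python
import math

def g(net):
    return 1.0 / (1.0 + math.exp(-net))

def xor_net(A, B):
    c1 = g(10 * A + 10 * B - 5)       # A + B   (OR)
    c2 = g(-10 * A - 10 * B + 15)     # ~A + ~B (NAND)
    return g(10 * c1 + 10 * c2 - 15)  # c1 AND c2

for A in (0, 1):
    for B in (0, 1):
        print(A, B, round(xor_net(A, B)))  # 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0
```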

Requiem for a Perceptron Rosenblatt proved that a perceptron will learn any linearly separable function. Minsky and Papert (1969) in Perceptrons: “there is no reason to suppose that any of the virtues carry over to the many-layered version.”

Backpropagation Bryson and Ho (1969, same year) described a training procedure for multilayer networks. Went unnoticed. Multiply rediscovered in the 1980s.

Multilayer Net (Diagram: inputs x_1, x_2, x_3, …, x_D and a constant bias input 1 feed H hidden units through weights W_11, W_12, W_13, …; hidden unit j computes net_j and hid_j = g(net_j); the hidden outputs hid_1, …, hid_H feed the output unit through weights U_1, …, U_H and bias U_0, producing net_i and out.)

Multiple Outputs Makes no difference for the perceptron. Add more outputs off the hidden layer in the multilayer case.

Output Function out_i(x) = g(sum_j U_ji g(sum_k W_kj x_k)), where H is the number of “hidden” nodes. Also possible: use more than one hidden layer; use direct input-output weights.
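A minimal sketch of this forward pass, assuming the logistic sigmoid for g and nested lists W[k][j] (D x H) and U[j][i] (H x O) as illustrative parameter containers; biases, as in the earlier diagram, can be folded in by appending a constant 1 to the inputs and to the hidden layer:

```python
import math

def g(net):
    return 1.0 / (1.0 + math.exp(-net))

def forward(x, W, U):
    """out_i(x) = g(sum_j U[j][i] * g(sum_k W[k][j] * x[k])) for each output i."""
    D, H, O = len(x), len(W[0]), len(U[0])
    hid = [g(sum(W[k][j] * x[k] for k in range(D))) for j in range(H)]
    return [g(sum(U[j][i] * hid[j] for j in range(H))) for i in range(O)]

# Three inputs, two hidden nodes, one output:
print(forward([1.0, 0.0, 1.0], [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [[2.0], [-2.0]]))
```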

How Train? Find a set of weights U, W that minimize sum_(x,y) sum_i (y_i - out_i(x))^2 using gradient descent. Incremental version (vs. batch): move the weights a small amount for each training example.
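Written out directly, the objective is a double sum over training pairs and output indices; in this sketch out stands for any forward pass (for instance the forward function above), and the name squared_error is illustrative:

```python
def squared_error(data, out):
    """data: list of (x, y) pairs with y a list of targets; out(x): list of network outputs."""
    return sum((yi - oi) ** 2
               for x, y in data
               for yi, oi in zip(y, out(x)))
```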

Updating Weights
1. Feed-forward to hidden: net_j = sum_k W_kj x_k; hid_j = g(net_j)
2. Feed-forward to output: net_i = sum_j U_ji hid_j; out_i = g(net_i)
3. Update output weights: delta_i = g'(net_i) (y_i - out_i); U_ji += alpha hid_j delta_i
4. Update hidden weights: delta_j = g'(net_j) sum_i U_ji delta_i; W_kj += alpha x_k delta_j
(Here alpha is the learning rate.)
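A sketch of one incremental update following the four steps above, assuming the logistic sigmoid (so g'(net) = out(1 - out)) and an explicit learning rate alpha that the slide leaves implicit; W[k][j] and U[j][i] are the same illustrative containers as before, and the hidden deltas use U before it is updated, which is the exact gradient of the squared error. Repeating this over the training set is the incremental gradient descent of the previous slide.

```python
import math

def g(net):
    return 1.0 / (1.0 + math.exp(-net))

def backprop_step(x, y, W, U, alpha=0.1):
    """One per-example update. W: D x H input-to-hidden; U: H x O hidden-to-output."""
    D, H, O = len(x), len(U), len(y)
    # 1. Feed-forward to hidden: net_j = sum_k W_kj x_k; hid_j = g(net_j)
    hid = [g(sum(W[k][j] * x[k] for k in range(D))) for j in range(H)]
    # 2. Feed-forward to output: net_i = sum_j U_ji hid_j; out_i = g(net_i)
    out = [g(sum(U[j][i] * hid[j] for j in range(H))) for i in range(O)]
    # 3. Output deltas: delta_i = g'(net_i) * (y_i - out_i)
    delta_out = [out[i] * (1 - out[i]) * (y[i] - out[i]) for i in range(O)]
    # 4. Hidden deltas (with the pre-update U): delta_j = g'(net_j) * sum_i U_ji delta_i
    delta_hid = [hid[j] * (1 - hid[j]) * sum(U[j][i] * delta_out[i] for i in range(O))
                 for j in range(H)]
    # Weight updates: U_ji += alpha * hid_j * delta_i;  W_kj += alpha * x_k * delta_j
    for j in range(H):
        for i in range(O):
            U[j][i] += alpha * hid[j] * delta_out[i]
    for k in range(D):
        for j in range(H):
            W[k][j] += alpha * x[k] * delta_hid[j]
    return out
```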

Multilayer Net (schema) (Diagram: x_k --W_kj--> net_j -> hid_j --U_ji--> net_i -> out_i, compared against the target y_i; delta_i is computed at the output and delta_j at the hidden layer.)

Does it Work? Sort of: Lots of practical applications, lots of people play with it. Fun. However, can fall prey to the standard problems with local search… NP-hard to train a 3-node net.

Step Size Issues Too small? Too big?

Representation Issues Any continuous function can be represented by a one hidden layer net with sufficient hidden nodes. Any function at all can be represented by a two hidden layer net with a sufficient number of hidden nodes. What’s the downside for learning?

Generalization Issues Pruning weights: “optimal brain damage” Cross validation Much, much more to this. Take a class on machine learning.

What to Learn Representing logical functions using sigmoid units; majority (net vs. decision tree); XOR is not linearly separable; adding layers adds expressibility; backprop is gradient descent.

Homework 10 (due 12/12) 1. Describe a procedure for converting a Boolean formula in CNF (n variables, m clauses) into an equivalent network. How many hidden units does it have? 2. More soon.