Learning with Neural Networks Artificial Intelligence CMSC February 19, 2002.


Agenda
Neural Networks:
–Biological analogy
Review: single-layer perceptrons
Perceptron: Pros & Cons
Neural Networks: Multilayer perceptrons
Neural net training: Backpropagation
Strengths & Limitations
Conclusions

Neurons: The Concept
[Figure: neuron with labels Axon, Cell Body, Nucleus, Dendrites]
Neurons:
–Receive inputs from other neurons (via synapses)
–When input exceeds threshold, “fires”
–Sends output along axon to other neurons
Brain: 10^11 neurons, 10^16 synapses

Perceptron Structure
[Figure: perceptron with inputs x0 = -1, x1, x2, x3, ..., xn, weights w0, w1, w2, w3, ..., wn, and output y]
x0 = -1 with weight w0 compensates for the threshold
Single neuron-like element
–Binary inputs & output
–Output is 1 when the weighted sum of inputs > threshold
Until the perceptron gives the correct output for all samples:
–If the perceptron is correct, do nothing
–If the perceptron is wrong:
  If it incorrectly says “yes”, subtract the input vector from the weight vector
  Otherwise, add the input vector to it
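
The add/subtract rule above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function names, the all-zero starting weights, the `max_passes` cap, and the choice of AND as the training target are assumptions made for the example.

```python
def predict(weights, x):
    """Return 1 if the weighted sum of inputs is positive, else 0.

    x already includes the constant input x0 = -1, so weights[0]
    plays the role of the threshold.
    """
    total = sum(w * xi for w, xi in zip(weights, x))
    return 1 if total > 0 else 0


def train_perceptron(samples, max_passes=100):
    """Repeat the add/subtract rule until every sample is classified correctly."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * (n_inputs + 1)  # one extra weight for x0 = -1
    for _ in range(max_passes):
        mistakes = 0
        for inputs, target in samples:
            x = [-1] + list(inputs)
            output = predict(weights, x)
            if output == target:
                continue  # correct: do nothing
            mistakes += 1
            if output == 1:
                # incorrectly said "yes": subtract the input vector
                weights = [w - xi for w, xi in zip(weights, x)]
            else:
                # incorrectly said "no": add the input vector
                weights = [w + xi for w, xi in zip(weights, x)]
        if mistakes == 0:
            break
    return weights  # may not be a solution if the data is not linearly separable


# AND is linearly separable, so the procedure is expected to converge on it.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
and_weights = train_perceptron(AND_SAMPLES)
```

Calling `predict(and_weights, [-1, 1, 1])` then classifies the input (1, 1) with the learned weights.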

Perceptron Learning
Perceptrons learn linear decision boundaries
–E.g. [Figure: example decision boundary, axes x1 and x2]
–But not xor [Figure: xor, axes x1 and x2]
Guaranteed to converge, if linearly separable
Many simple functions NOT learnable

Neural Nets
Multi-layer perceptrons
–Inputs: real-valued
–Intermediate “hidden” nodes
–Output(s): one (or more) discrete-valued
[Figure: network with inputs X1, X2, X3, X4, a hidden layer, and outputs Y1, Y2]

Neural Nets
Pro: More general than perceptrons
–Not restricted to linear discriminants
–Multiple outputs: one classification each
Con: No simple, guaranteed training procedure
–Use greedy, hill-climbing procedure to train
–“Gradient descent”, “Backpropagation”

Solving the XOR Problem
[Figure: network with inputs x1, x2, hidden nodes o1, o2, output y, and weights w01, w11, w21, w02, w12, w22, w03, w13, w23]
Network Topology: 2 hidden nodes, 1 output
Desired behavior:
x1  x2  o1  o2  y
0   0   0   0   0
1   0   0   1   1
0   1   0   1   1
1   1   1   1   0
Weights:
w11 = w12 = 1
w21 = w22 = 1
w01 = 3/2; w02 = 1/2; w03 = 1/2
w13 = -1; w23 = 1
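
The weights listed above can be checked directly. The sketch below assumes that w_ij is the weight from source i to node j, that each w0j is a threshold attached to a constant input of -1 (as in the perceptron slide), and that units use a hard step; the function names are hypothetical.

```python
def step(total):
    """Hard threshold: fire (1) only when the net input is positive."""
    return 1 if total > 0 else 0


def xor_net(x1, x2):
    """Evaluate the 2-hidden-node network with the weights from the slide."""
    w11 = w12 = 1
    w21 = w22 = 1
    w01, w02, w03 = 1.5, 0.5, 0.5
    w13, w23 = -1, 1
    o1 = step(w11 * x1 + w21 * x2 - w01)  # fires only for (1, 1): acts as AND
    o2 = step(w12 * x1 + w22 * x2 - w02)  # fires for any active input: acts as OR
    y = step(w13 * o1 + w23 * o2 - w03)   # OR but not AND
    return o1, o2, y


for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, *xor_net(x1, x2))
```

Under these assumptions the first hidden node behaves as AND and the second as OR, so the output node computes "OR but not AND", which is xor.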

Backpropagation
Greedy, hill-climbing procedure
–Weights are parameters to change
–Original hill-climb changes one parameter/step
  Slow
–If smooth function, change all parameters/step
  Gradient descent
–Backpropagation: Computes current output, works backward to correct error

Producing a Smooth Function
Key problem:
–Pure step threshold is discontinuous
  Not differentiable
Solution:
–Sigmoid (squashed ‘s’ function): Logistic fn
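
For reference, the logistic function is usually written as below. This is the standard textbook definition; the symbol names s and z (the weighted sum of a unit's inputs) are chosen to match the later slides.

```latex
s(z) = \frac{1}{1 + e^{-z}}, \qquad 0 < s(z) < 1
```

It approaches 0 for large negative z and 1 for large positive z, like the step threshold, but is differentiable everywhere.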

Neural Net Training
Goal:
–Determine how to change weights to get correct output
  Make a large change to a weight when it produces a large reduction in error
Approach:
–Compute actual output: o
–Compare to desired output: d
–Determine effect of each weight w on error = d-o
–Adjust weights

Neural Net Example
[Figure: network with inputs x1, x2; units z1, z2, z3 with outputs y1, y2, y3; weights w01, w11, w21, w02, w12, w22, w03, w13, w23]
xi: ith sample input vector
w: weight vector
yi*: desired output for ith sample
Sum of squares error over training samples
Full expression of output in terms of input and weights
– From notes Lozano-Perez
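
The output and error for this example network can be sketched as follows. Several details are assumptions, not taken from the slide: each unit applies the logistic function to its total input, the w0j weights are thresholds on a constant -1 input, the error carries the conventional factor of 1/2, and the weights are stored in a dict keyed by name.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, x):
    """Return y3 for input x = (x1, x2) and a weight dict w (keys like 'w11')."""
    x1, x2 = x
    z1 = w["w11"] * x1 + w["w21"] * x2 - w["w01"]
    z2 = w["w12"] * x1 + w["w22"] * x2 - w["w02"]
    y1, y2 = sigmoid(z1), sigmoid(z2)
    z3 = w["w13"] * y1 + w["w23"] * y2 - w["w03"]
    return sigmoid(z3)


def sum_squares_error(w, samples):
    """Half the sum of squared differences between actual and desired outputs."""
    return 0.5 * sum((forward(w, x) - target) ** 2 for x, target in samples)


# With every weight at zero, each unit outputs sigmoid(0) = 0.5.
ZERO_WEIGHTS = {name: 0.0 for name in
                ["w01", "w11", "w21", "w02", "w12", "w22", "w03", "w13", "w23"]}
XOR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
zero_error = sum_squares_error(ZERO_WEIGHTS, XOR_SAMPLES)
```

Writing the error as a function of the weight dict makes the next step concrete: training looks for weights that make `sum_squares_error` small.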

Gradient Descent
Error: Sum of squares error of inputs with current weights
Compute rate of change of error wrt each weight
–Which weights have greatest effect on error?
–Effectively, partial derivatives of error wrt weights
  In turn, depend on other weights => chain rule

Gradient Descent
E = G(w)
–Error as function of weights
Find rate of change of error
–Follow steepest rate of change
–Change weights s.t. error is minimized
[Figure: plot of E = G(w) against w, showing the slope dG/dw, local minima, and points w0, w1]
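
A one-dimensional sketch shows both the descent step and the local-minimum problem. The error curve G below is a made-up polynomial with two minima, chosen only for illustration; the rate, step count, and starting points are likewise assumptions.

```python
def G(w):
    """Hypothetical error curve with one local and one global minimum."""
    return w ** 4 - 3 * w ** 2 + w


def dG(w):
    """Derivative of G with respect to w."""
    return 4 * w ** 3 - 6 * w + 1


def gradient_descent(w, rate=0.01, steps=500):
    """Repeatedly move w a small step against the slope dG/dw."""
    for _ in range(steps):
        w = w - rate * dG(w)
    return w


w_right = gradient_descent(2.0)   # starts to the right of the central hump
w_left = gradient_descent(-2.0)   # starts to the left of it
print(w_right, G(w_right))
print(w_left, G(w_left))
```

Both runs stop where the slope is zero, but they end in different valleys: which minimum gradient descent finds depends on where it starts.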

Gradient of Error
MIT AI lecture notes, Lozano-Perez 2000
[Figure: network with inputs x1, x2; units z1, z2, z3 with outputs y1, y2, y3; weights w01, w11, w21, w02, w12, w22, w03, w13, w23]
Note: Derivative of sigmoid: ds(z1)/dz1 = s(z1)(1-s(z1))
– From notes Lozano-Perez
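
The sigmoid derivative identity can be checked numerically. The sketch below compares s(z)(1-s(z)) with a central finite difference; the function names and the step size h are assumptions for the example.

```python
import math


def sigmoid(z):
    """Logistic function s(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Closed form from the slide: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def numeric_derivative(f, z, h=1e-6):
    """Central finite-difference approximation of df/dz."""
    return (f(z + h) - f(z - h)) / (2 * h)


for z in [-3.0, -1.0, 0.0, 0.5, 2.0]:
    print(z, sigmoid_derivative(z), numeric_derivative(sigmoid, z))
```

The closed form matters for efficiency: once a unit's output is known from the forward pass, its derivative costs one multiplication.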

From Effect to Update
Gradient computation:
–How each weight contributes to performance
To train:
–Need to determine how to CHANGE weight based on contribution to performance
–Need to determine how MUCH change to make per iteration
  Rate parameter ‘r’
  –Large enough to learn quickly
  –Small enough to reach but not overshoot target values
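
The trade-off in choosing r shows up even on the simplest error curve. The sketch below uses G(w) = w^2 (minimum at w = 0) as an assumed stand-in for a real error surface; the three rates are arbitrary illustrative values.

```python
def descend(r, w=1.0, steps=20):
    """Run gradient descent on G(w) = w**2, whose slope is 2*w."""
    for _ in range(steps):
        w = w - r * 2 * w
    return w


too_small = descend(0.01)  # still far from 0 after 20 steps: learns slowly
moderate = descend(0.1)    # close to 0 after 20 steps
too_large = descend(1.1)   # each step overshoots 0 by more than the last
print(too_small, moderate, too_large)
```

With r = 1.1 every update jumps past the minimum and lands farther away on the other side, so the weight diverges instead of settling.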

Backpropagation Procedure
Pick rate parameter ‘r’
Until performance is good enough,
–Do forward computation to calculate output
–Compute Beta in output node
–Compute Beta in all other nodes
–Compute change for all weights
[Figure: nodes i, j, k in successive layers]
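
In the formulation from Winston's textbook, which these slides follow for notation, the three computations are usually written as below. Treat the exact form as an assumption: o is a node's output, d the desired output, w_{i→j} the weight on the link from node i to node j, and k ranges over the nodes that node j feeds.

```latex
\begin{aligned}
\beta_z &= d_z - o_z && \text{(output node } z\text{)} \\
\beta_j &= \sum_k w_{j \to k}\, o_k (1 - o_k)\, \beta_k && \text{(all other nodes)} \\
\Delta w_{i \to j} &= r\, o_i\, o_j (1 - o_j)\, \beta_j && \text{(change for each weight)}
\end{aligned}
```

The factor o(1 - o) is the sigmoid derivative, so each weight change is the rate r times the negative gradient of the squared error with respect to that weight.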

Backprop Example
[Figure: network with inputs x1, x2; units z1, z2, z3 with outputs y1, y2, y3; weights w01, w11, w21, w02, w12, w22, w03, w13, w23]
Forward prop: Compute zi and yi given xk, wl
– From notes Lozano-Perez

Backpropagation Observations
Procedure is (relatively) efficient
–All computations are local
  Use inputs and outputs of current node
What is “good enough”?
–Rarely reach target (0 or 1) outputs
  Typically, train until within 0.1 of target

Neural Net Summary
Training:
–Backpropagation procedure
  Gradient descent strategy (usual problems)
Prediction:
–Compute outputs based on input vector & weights
Pros: Very general, Fast prediction
Cons: Training can be VERY slow (1000’s of epochs), Overfitting

Training Strategies
Online training:
–Update weights after each sample
Offline (batch training):
–Compute error over all samples
  Then update weights
Online training “noisy”
–Sensitive to individual instances
–However, may escape local minima
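
The difference between the two strategies is only where the weight update happens. The sketch below uses a single linear unit o = w*x with squared error as an assumed, deliberately tiny model; the data (d = 2x), the rate, and the function names are illustrative.

```python
def online_epoch(w, samples, r):
    """One pass over the data, updating the weight after every sample."""
    for x, d in samples:
        o = w * x
        w = w + r * (d - o) * x  # immediate update, seen by the next sample
    return w


def batch_epoch(w, samples, r):
    """One pass over the data, accumulating the change and updating once."""
    total = 0.0
    for x, d in samples:
        o = w * x  # every sample is evaluated with the same weight
        total += (d - o) * x
    return w + r * total


SAMPLES = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # desired output d = 2 * x

w_online = w_batch = 0.0
for _ in range(50):
    w_online = online_epoch(w_online, SAMPLES, r=0.05)
    w_batch = batch_epoch(w_batch, SAMPLES, r=0.05)
print(w_online, w_batch)
```

On this noise-free data both runs approach w = 2, but they take different paths: after a single epoch the online weight already reflects three separate updates, while the batch weight reflects one combined update.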

Training Strategy
To avoid overfitting:
–Split data into: training, validation, & test
  Also, avoid excess weights (fewer weights than # samples)
Initialize with small random weights
–Small changes have noticeable effect
Use offline training
–Until validation set minimum
Evaluate on test set
–No more weight changes
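
Two pieces of this strategy are easy to sketch: the three-way split and the choice of stopping point. The 60/20/20 proportions, the fixed seed, and both function names are assumptions for illustration; the validation errors in the example are made-up numbers, not results.

```python
import random


def split_data(samples, seed=0, train_frac=0.6, val_frac=0.2):
    """Shuffle the samples and cut them into training, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


def best_epoch(validation_errors):
    """Index of the epoch whose validation error is lowest."""
    return min(range(len(validation_errors)), key=validation_errors.__getitem__)


train, validation, test = split_data(range(10))
# Hypothetical validation errors recorded after each training epoch.
stop_at = best_epoch([0.9, 0.5, 0.3, 0.35, 0.4])
```

The weights saved at epoch `stop_at` are the ones to evaluate on the test set; validation error rising after that point is the usual sign of overfitting.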

Classification
Neural networks best for classification task
–Single output -> Binary classifier
–Multiple outputs -> Multiway classification
  Applied successfully to learning pronunciation
–Sigmoid pushes to binary classification
  Not good for regression

Neural Net Conclusions
Simulation based on neurons in brain
Perceptrons (single neuron)
–Guaranteed to find linear discriminant
  IF one exists -> problem XOR
Neural nets (Multi-layer perceptrons)
–Very general
–Backpropagation training procedure
  Gradient descent - local min, overfitting issues

Backpropagation
An efficient method of implementing gradient descent for neural networks
1. Initialize weights to small random values
2. Choose a random sample input feature vector
3. Compute total input (zi) and output (yi) for each unit (forward prop)
4. Compute error term for output layer
5. Compute error term for preceding layer by backprop rule (repeat for all layers)
6. Compute weight change by descent rule (repeat for all weights)
[Equations: descent rule and backprop rule]
Notation in Winston’s book
yi is xi for input layer
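
The six steps can be sketched for a network with 2 inputs, 2 hidden units, and 1 output. This is one plausible reading, not the lecture's code: it assumes logistic units, thresholds on a constant -1 input, error 1/2 (y - target)^2, an output error term y(1-y)(y - target), and the descent rule "weight minus r times error term times incoming value". All names, the seed, the rate, and the step count are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_weights(rng):
    """Step 1: small random weights. Index 0 of each list is the threshold weight."""
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(3)]
    return hidden, output


def forward(hidden, output, x):
    """Step 3: compute each unit's output, layer by layer."""
    inputs = [-1.0, x[0], x[1]]
    ys = [sigmoid(sum(w * v for w, v in zip(unit, inputs))) for unit in hidden]
    hidden_out = [-1.0] + ys
    y3 = sigmoid(sum(w * v for w, v in zip(output, hidden_out)))
    return inputs, hidden_out, y3


def backprop_step(hidden, output, x, target, r):
    """Steps 3 to 6 for one sample; returns the updated weights."""
    inputs, hidden_out, y3 = forward(hidden, output, x)
    # Step 4: error term for the output unit.
    delta_out = y3 * (1 - y3) * (y3 - target)
    # Step 5: pass the error term back through the (not yet updated) output weights.
    delta_hidden = [
        hidden_out[j + 1] * (1 - hidden_out[j + 1]) * delta_out * output[j + 1]
        for j in range(2)
    ]
    # Step 6: move every weight against its share of the error gradient.
    new_output = [w - r * delta_out * v for w, v in zip(output, hidden_out)]
    new_hidden = [
        [w - r * delta_hidden[j] * v for w, v in zip(hidden[j], inputs)]
        for j in range(2)
    ]
    return new_hidden, new_output


def sample_error(hidden, output, x, target):
    return 0.5 * (forward(hidden, output, x)[2] - target) ** 2


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
rng = random.Random(0)
hidden, output = init_weights(rng)
for _ in range(2000):
    x, target = rng.choice(XOR)  # step 2: pick a random sample
    hidden, output = backprop_step(hidden, output, x, target, r=0.5)
```

Whether this run ends near a good xor solution depends on the random start and the rate; as the slides note, gradient descent can settle in a local minimum, so checking the trained error with `sample_error` is part of the procedure.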