Neural Networks Lecture 8: Two simple learning algorithms
Geoffrey Hinton

Supervised Learning
Each training case consists of an input vector x and a desired output y (there may be multiple desired outputs, but we will ignore that for now).
Regression: the desired output is a real number.
Classification: the desired output is a class label (1 or 0 in the simplest case).
We start by choosing a model-class. A model-class is a way of using some numerical parameters, W, to map each input vector, x, into a predicted output y.
Learning usually means adjusting the parameters to reduce the discrepancy between the desired output on each training case and the actual output produced by the model.

Linear neurons
The neuron has a real-valued output which is a weighted sum of its inputs: y = Σ_i w_i x_i, where w is the weight vector, x is the input vector, and y is the neuron's estimate of the desired output.
The aim of learning is to minimize the discrepancy between the desired output and the actual output.
How do we measure the discrepancies?
Do we update the weights after every training case?
Why don't we solve it analytically?
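As a minimal, purely illustrative sketch (the function name and numbers are mine, not from the lecture), a linear neuron is just a weighted sum of its inputs:

```python
def linear_neuron(weights, inputs):
    """Real-valued output: the weighted sum of the inputs, y = sum_i w_i * x_i."""
    return sum(w * x for w, x in zip(weights, inputs))

# Example: weights [150, 50, 100] and inputs [2, 5, 3] give an output of 850.
print(linear_neuron([150.0, 50.0, 100.0], [2.0, 5.0, 3.0]))
```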

A motivating example
Each day you get lunch at the cafeteria. Your diet consists of fish, chips, and beer, and you get several portions of each.
The cashier only tells you the total price of the meal.
After several days, you should be able to figure out the price of each portion.
Each meal price gives a linear constraint on the prices of the portions:
price = x_fish w_fish + x_chips w_chips + x_beer w_beer

Two ways to solve the equations
The obvious approach is just to solve a set of simultaneous linear equations, one per meal.
But we want a method that could be implemented in a neural network.
The prices of the portions are like the weights of a linear neuron.
We will start with guesses for the weights and then adjust the guesses to give a better fit to the prices given by the cashier.
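For concreteness, a sketch of the "obvious approach" on made-up meal data (the three meals and their prices below are assumptions, not from the lecture):

```python
import numpy as np

# Hypothetical data: each row is one meal (portions of fish, chips, beer);
# each entry of prices is the total the cashier charged for that meal.
portions = np.array([[2.0, 5.0, 3.0],
                     [1.0, 3.0, 2.0],
                     [4.0, 1.0, 1.0]])
prices = np.array([850.0, 500.0, 750.0])

# One linear equation per meal: solve them simultaneously for the per-portion prices.
per_portion = np.linalg.solve(portions, prices)
print(per_portion.round(6))  # [150.  50. 100.]
```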

The cashier's brain
(Diagram: a linear neuron whose inputs are the portions of fish, chips, and beer (2, 5, 3) and whose weights are the true prices per portion (150, 50, 100); price of meal = 2·150 + 5·50 + 3·100 = 850.)

A model of the cashier's brain with arbitrary initial weights
(Diagram: the same neuron with inputs of 2, 5, 3 portions of fish, chips, and beer, but with every weight set to 50.)
Price of meal = 500. Residual error = 350.
The learning rule is: Δw_i = ε x_i (t - y), where ε is the learning rate, t is the desired output, and y is the actual output.
With a learning rate of 1/35, the weight changes are +20, +50, +30.
This gives new weights of 70, 100, 80.
Notice that the weight for chips got worse!
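A minimal sketch of this single update, assuming the delta rule Δw_i = ε x_i (t - y) as written above (the function name is made up):

```python
def delta_rule_step(weights, inputs, target, lr):
    """One online delta-rule update for a linear neuron."""
    y = sum(w * x for w, x in zip(weights, inputs))   # predicted price
    residual = target - y                             # t - y
    return [w + lr * x * residual for w, x in zip(weights, inputs)]

# Reproduces the slide's numbers (up to floating-point rounding): initial guesses
# of 50 per portion, one meal of 2 fish, 5 chips, 3 beer costing 850, learning
# rate 1/35 -> new weights 70, 100, 80.
print(delta_rule_step([50.0, 50.0, 50.0], [2.0, 5.0, 3.0], 850.0, 1 / 35))
```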

Behaviour of the iterative learning procedure
Do the updates to the weights always make them get closer to their correct values? No!
Does the online version of the learning procedure eventually get the right answer? Yes, if the learning rate gradually decreases in the appropriate way.
How quickly do the weights converge to their correct values? It can be very slow if two input dimensions are highly correlated (e.g. ketchup and chips).
Can the iterative procedure be generalized to much more complicated, multi-layer, non-linear nets? YES!

Deriving the delta rule
Define the error as the squared residuals summed over all training cases.
Now differentiate to get error derivatives for the weights.
The batch delta rule changes the weights in proportion to their error derivatives summed over all training cases.
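The equations on this slide did not survive in the transcript; for a linear neuron with output y^n = Σ_i w_i x_i^n on training case n, the standard forms (my reconstruction, not copied from the slide) are:

```latex
E = \tfrac{1}{2}\sum_{n}\left(t^{n} - y^{n}\right)^{2},
\qquad
\frac{\partial E}{\partial w_i} = -\sum_{n} x_i^{n}\left(t^{n} - y^{n}\right),
\qquad
\Delta w_i = -\varepsilon\,\frac{\partial E}{\partial w_i}
           = \varepsilon\sum_{n} x_i^{n}\left(t^{n} - y^{n}\right).
```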

The error surface
The error surface lies in a space with a horizontal axis for each weight and one vertical axis for the error.
For a linear neuron, it is a quadratic bowl. Vertical cross-sections are parabolas. Horizontal cross-sections are ellipses.
(Figure: the bowl plotted over weight axes w1 and w2, with the error E on the vertical axis.)

Online versus batch learning
Batch learning does steepest descent on the error surface.
Online learning zig-zags around the direction of steepest descent.
(Figures: in the weight space (w1, w2), batch updates move perpendicular to the elliptical error contours, while online updates move perpendicular to the constraint plane of the current training case, alternating between the constraints from training case 1 and training case 2.)
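A small sketch contrasting the two schemes for a linear neuron (the data, learning rate, and epoch count are illustrative assumptions; with correlated inputs, convergence can be slow, as noted two slides back):

```python
import numpy as np

def batch_epoch(w, X, t, lr):
    """Batch delta rule: sum the error derivatives over all cases, then update once."""
    return w + lr * X.T @ (t - X @ w)

def online_epoch(w, X, t, lr):
    """Online delta rule: update after every individual training case."""
    for x_n, t_n in zip(X, t):
        w = w + lr * x_n * (t_n - x_n @ w)
    return w

X = np.array([[2.0, 5.0, 3.0], [1.0, 3.0, 2.0], [4.0, 1.0, 1.0]])  # made-up meals
t = np.array([850.0, 500.0, 750.0])                                 # made-up prices
w_batch, w_online = np.zeros(3), np.zeros(3)
for _ in range(5000):
    w_batch = batch_epoch(w_batch, X, t, lr=0.02)
    w_online = online_epoch(w_online, X, t, lr=0.02)
print(w_batch, w_online)  # both drift towards the true prices (150, 50, 100)
```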

Adding biases
A linear neuron is a more flexible model if we include a bias.
We can avoid having to figure out a separate learning rule for the bias by using a trick:
A bias is exactly equivalent to a weight on an extra input line that always has an activity of 1.
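A tiny illustration of the trick (names and numbers are mine, not the lecture's):

```python
def with_bias_input(x):
    """Append the extra input component that always has an activity of 1."""
    return list(x) + [1.0]

# The last weight now plays the role of the bias, so whatever rule learns the
# ordinary weights learns the bias too.
weights = [0.5, -0.25, 2.0]          # [w1, w2, bias]
x = [4.0, 1.0]
y = sum(w * xi for w, xi in zip(weights, with_bias_input(x)))
print(y)  # 0.5*4 - 0.25*1 + 2.0 = 3.75
```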

Binary threshold neurons
McCulloch-Pitts (1943).
First compute a weighted sum of the inputs from other neurons: z = Σ_i x_i w_i.
Then output a 1 if the weighted sum exceeds the threshold θ: y = 1 if z ≥ θ, and y = 0 otherwise.
(Figure: the step function, with the output y jumping from 0 to 1 where z crosses the threshold.)
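In code (an illustrative sketch with a made-up name):

```python
def binary_threshold_neuron(weights, inputs, threshold):
    """McCulloch-Pitts unit: a weighted sum followed by a hard threshold."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return 1 if z >= threshold else 0

print(binary_threshold_neuron([1.0, 1.0], [1, 0], threshold=1.5))  # 0
print(binary_threshold_neuron([1.0, 1.0], [1, 1], threshold=1.5))  # 1
```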

The perceptron convergence procedure: Training binary output neurons as classifiers
Add an extra component with value 1 to each input vector. The "bias" weight on this component is minus the threshold. Now we can forget the threshold.
Pick training cases using any policy that ensures that every training case will keep getting picked.
If the output unit is correct, leave its weights alone.
If the output unit incorrectly outputs a zero, add the input vector to the weight vector.
If the output unit incorrectly outputs a 1, subtract the input vector from the weight vector.
This is guaranteed to find a suitable set of weights if any such set exists.
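A compact sketch of the procedure described above (the toy data, the random picking policy, and the fixed number of passes are illustrative assumptions):

```python
import random

def train_perceptron(cases, n_passes=100, seed=0):
    """cases: list of (input_vector, target) pairs with target 0 or 1.
    Each input vector already has the extra bias component of 1 appended."""
    rng = random.Random(seed)
    w = [0.0] * len(cases[0][0])
    for _ in range(n_passes):
        x, t = rng.choice(cases)              # a policy that keeps picking every case
        y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
        if y == t:
            continue                           # correct: leave the weights alone
        if t == 1:                             # incorrectly output 0: add the input vector
            w = [wi + xi for wi, xi in zip(w, x)]
        else:                                  # incorrectly output 1: subtract the input vector
            w = [wi - xi for wi, xi in zip(w, x)]
    return w

# Linearly separable toy data: class is 1 when x1 > x2 (bias input appended).
data = [([2.0, 1.0, 1.0], 1), ([0.0, 1.0, 1.0], 0),
        ([3.0, 0.0, 1.0], 1), ([1.0, 2.0, 1.0], 0)]
print(train_perceptron(data))
```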

Weight space
Imagine a space in which each axis corresponds to a weight. A point in this space is a weight vector.
Each training case defines a plane. On one side of the plane the output is wrong.
To get all training cases right we need to find a point on the right side of all the planes.
(Figure: two planes through the origin, one for an input vector with correct answer = 1 and one for an input vector with correct answer = 0; each plane divides weight space into a "right" and a "wrong" side, good weight vectors lie on the right side of both planes, and bad weight vectors lie on the wrong side of at least one.)

Why the learning procedure works
Consider the squared distance between any satisfactory weight vector and the current weight vector.
Every time the perceptron makes a mistake, the learning algorithm moves the current weight vector towards all satisfactory weight vectors (unless it crosses the constraint plane).
So consider "generously satisfactory" weight vectors that lie within the feasible region by a margin at least as great as the largest update.
Every time the perceptron makes a mistake, the squared distance to all of these weight vectors is always decreased by at least the squared length of the smallest update vector.
(Figure: the feasible region in weight space, with its "right"/"wrong" boundary and the margin inside it.)
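One way to write out the key step (my reconstruction, for the case where a positive example x was wrongly given output 0, so the update is w' = w + x; the other case is symmetric). Here "generously satisfactory" means w* · x ≥ ‖x‖² for every training input x, and the mistake means w · x < 0, so x · (w* - w) ≥ ‖x‖²:

```latex
\lVert \mathbf{w}^{*} - \mathbf{w}' \rVert^{2}
  = \lVert \mathbf{w}^{*} - (\mathbf{w} + \mathbf{x}) \rVert^{2}
  = \lVert \mathbf{w}^{*} - \mathbf{w} \rVert^{2}
    - 2\,\mathbf{x}\cdot(\mathbf{w}^{*} - \mathbf{w})
    + \lVert \mathbf{x} \rVert^{2}
  \le \lVert \mathbf{w}^{*} - \mathbf{w} \rVert^{2} - \lVert \mathbf{x} \rVert^{2}.
```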

What binary threshold neurons cannot do
A binary threshold output unit cannot even tell if two single-bit numbers are the same!
Same: (1,1) → 1; (0,0) → 1
Different: (1,0) → 0; (0,1) → 0
The four input-output pairs give four inequalities that are impossible to satisfy.
(Figure: the four points plotted in data space, not weight space; the positive cases (0,0) and (1,1) and the negative cases (0,1) and (1,0) cannot be separated by a plane.)
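The inequalities themselves are not in the transcript; writing the unit as outputting 1 exactly when w_1 x_1 + w_2 x_2 ≥ θ, they would be:

```latex
w_1 + w_2 \ge \theta, \qquad 0 \ge \theta, \qquad w_1 < \theta, \qquad w_2 < \theta .
```

Adding the last two gives w_1 + w_2 < 2θ, which together with the first forces θ > 0, contradicting 0 ≥ θ.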