Neural Networks Lecture 6: Perceptron Learning (September 23, 2010)

Presentation transcript:

Slide 1: Refresher: Perceptron Training Algorithm

Algorithm Perceptron:
  Start with a randomly chosen weight vector w_0;
  Let k = 1;
  while there exist input vectors that are misclassified by w_{k-1}, do
    Let i_j be a misclassified input vector;
    Let x_k = class(i_j) · i_j, implying that w_{k-1} · x_k < 0;
    Update the weight vector to w_k = w_{k-1} + η x_k;
    Increment k;
  end-while;
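The pseudocode above translates almost directly into code. Below is a minimal NumPy sketch of the training loop, assuming the inputs already have the offset component 1 prepended and the classes are labeled -1 and +1; the function name, the learning-rate parameter eta, and the iteration cap are illustrative choices, not part of the slides.

```python
import numpy as np

def train_perceptron(inputs, classes, w0, eta=1.0, max_iter=1000):
    """Minimal perceptron training loop following the slide's pseudocode.

    inputs  : (n, d) array; each row is an input vector i_j with the
              offset component 1 already prepended
    classes : (n,) array of labels, -1 or +1
    w0      : initial weight vector of shape (d,)
    eta     : learning rate (the slides set eta = 1)
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        x = classes[:, None] * inputs            # x_j = class(i_j) * i_j
        misclassified = np.where(x @ w < 0)[0]   # w . x_j < 0 means i_j is misclassified
        if misclassified.size == 0:
            break                                # all training vectors classified correctly
        j = misclassified[0]                     # pick any misclassified vector
        w = w + eta * x[j]                       # w_k = w_{k-1} + eta * x_k
    return w
```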

Slide 2: Another Refresher: Linear Algebra

How can we visualize a straight line defined by an equation such as w_0 + w_1 i_1 + w_2 i_2 = 0?

One possibility is to determine the points where the line crosses the coordinate axes:

i_1 = 0 ⇒ w_0 + w_2 i_2 = 0 ⇒ w_2 i_2 = -w_0 ⇒ i_2 = -w_0 / w_2
i_2 = 0 ⇒ w_0 + w_1 i_1 = 0 ⇒ w_1 i_1 = -w_0 ⇒ i_1 = -w_0 / w_1

Thus, the line crosses the axes at (0, -w_0/w_2)^T and (-w_0/w_1, 0)^T.

If w_1 or w_2 is 0, it just means that the line is horizontal or vertical, respectively. If w_0 is 0, the line passes through the origin, and its slope i_2/i_1 is given by:

w_1 i_1 + w_2 i_2 = 0 ⇒ w_2 i_2 = -w_1 i_1 ⇒ i_2 / i_1 = -w_1 / w_2
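As a quick check of these formulas, the intercepts and slope can be computed directly from a weight vector. This small helper is purely illustrative (the function name is not from the lecture); the example weights are the initial weights used on the next slide.

```python
def line_from_weights(w0, w1, w2):
    """Return axis intercepts and slope of the line w0 + w1*i1 + w2*i2 = 0."""
    i2_intercept = -w0 / w2 if w2 != 0 else None   # crossing of the i2 axis (i1 = 0)
    i1_intercept = -w0 / w1 if w1 != 0 else None   # crossing of the i1 axis (i2 = 0)
    slope = -w1 / w2 if w2 != 0 else None          # i2 as a function of i1
    return i1_intercept, i2_intercept, slope

# Initial weights from the next slide, w = (2, 1, -2):
print(line_from_weights(2, 1, -2))   # (-2.0, 1.0, 0.5) -> crossings (-2, 0) and (0, 1)
```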

Slide 3: Perceptron Learning Example

(Figure: five 2-dimensional data points in the i_1-i_2 plane, labeled class -1 and class 1, with the current dividing line.)

We would like our perceptron to correctly classify the five 2-dimensional data points shown in the figure.

Let the random initial weight vector be w_0 = (2, 1, -2)^T. Then the dividing line crosses the axes at (0, 1)^T and (-2, 0)^T.

Let us pick the misclassified point (-2, -1)^T for learning:
i = (1, -2, -1)^T (including the offset component 1)
x_1 = (-1) · (1, -2, -1)^T (i is in class -1)
x_1 = (-1, 2, 1)^T

Slide 4: Perceptron Learning Example

(Figure: the same data points with the updated dividing line.)

w_1 = w_0 + x_1 (let us set η = 1 for simplicity)
w_1 = (2, 1, -2)^T + (-1, 2, 1)^T = (1, 3, -1)^T

The new dividing line crosses the axes at (0, 1)^T and (-1/3, 0)^T.

Let us pick the next misclassified point (0, 2)^T for learning:
i = (1, 0, 2)^T (including the offset component 1)
x_2 = (1, 0, 2)^T (i is in class 1)

Slide 5: Perceptron Learning Example

(Figure: the same data points with the final dividing line separating class -1 from class 1.)

w_2 = w_1 + x_2
w_2 = (1, 3, -1)^T + (1, 0, 2)^T = (2, 3, 1)^T

Now the line crosses the axes at (0, -2)^T and (-2/3, 0)^T.

With this weight vector, the perceptron achieves perfect classification! The learning process terminates.

In most cases, many more iterations are necessary than in this example.
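The two weight updates in this example can be checked numerically in a few lines (a small sketch; only the two data points identified on the slides are used, since the remaining points are not listed in the transcript):

```python
import numpy as np

w0 = np.array([2.0, 1.0, -2.0])

# First misclassified point: (-2, -1) in class -1  ->  x1 = -(1, -2, -1)
x1 = -1 * np.array([1.0, -2.0, -1.0])
w1 = w0 + x1                     # eta = 1
print(w1)                        # [ 1.  3. -1.]

# Next misclassified point: (0, 2) in class 1  ->  x2 = (1, 0, 2)
x2 = np.array([1.0, 0.0, 2.0])
w2 = w1 + x2
print(w2)                        # [2. 3. 1.]

# Both training points now satisfy w . x > 0, i.e. they are classified correctly:
print(w2 @ x1, w2 @ x2)          # 5.0 4.0
```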

Slide 6: Perceptron Learning Results

We proved that the perceptron learning algorithm is guaranteed to find a solution to a classification problem if that problem is linearly separable. But are those solutions optimal?

One of the reasons why we are interested in neural networks is that they are able to generalize, i.e., give plausible output for new (untrained) inputs. How well does a perceptron deal with new inputs?

Slide 7: Perceptron Learning Results

Perfect classification of the training samples, but it may not generalize well to new (untrained) samples.

Slide 8: Perceptron Learning Results

This function is likely to perform better classification on new samples.

Slide 9: Adalines

Idea behind adaptive linear elements (Adalines): Compute a continuous, differentiable error function between the net input and the desired output (before applying the threshold function).

For example, compute the mean squared error (MSE) between the net input for every training vector and its class (1 or -1). Then find those weights for which the error is minimal.

With a differentiable error function, we can use the gradient descent technique to find the absolute minimum of the error function.
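A minimal sketch of this idea, assuming training vectors with the offset component prepended and targets of -1 or +1; the function names and the learning rate eta are illustrative, and the proper Adaline (Widrow-Hoff) update is covered in the following lecture. This only shows the MSE on the linear net input and plain gradient descent on it.

```python
import numpy as np

def adaline_mse_and_gradient(w, inputs, targets):
    """MSE between the linear net input w . i and the desired class (+1 or -1),
    together with its gradient with respect to the weights."""
    net = inputs @ w                                # continuous net input, no threshold applied
    errors = net - targets
    mse = np.mean(errors ** 2)
    grad = 2.0 * inputs.T @ errors / len(targets)   # d(MSE)/dw
    return mse, grad

def adaline_train(w, inputs, targets, eta=0.01, steps=100):
    """Plain gradient descent on the MSE."""
    w = np.asarray(w, dtype=float)
    for _ in range(steps):
        _, grad = adaline_mse_and_gradient(w, inputs, targets)
        w = w - eta * grad                          # move against the gradient
    return w
```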

Slide 10: Gradient Descent

Gradient descent is a very common technique for finding the absolute minimum of a function. It is especially useful for high-dimensional functions.

We will use it to iteratively minimize the network's (or neuron's) error by computing the gradient of the error surface in weight space and adjusting the weights in the opposite direction.

Slide 11: Gradient Descent

Gradient-descent example: finding the absolute minimum of a one-dimensional error function f(x).

(Figure: f(x) plotted over x, showing the slope f'(x_0) at the starting point x_0.)

x_1 = x_0 - η f'(x_0)

Repeat this iteratively until, for some x_i, f'(x_i) is sufficiently close to 0.
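The one-dimensional update x_{k+1} = x_k - η f'(x_k) is easy to try out. Here is a short sketch; the example function f(x) = (x - 3)^2, the learning rate, and the tolerance are arbitrary illustrative choices.

```python
def gradient_descent_1d(f_prime, x0, eta=0.1, tol=1e-6, max_iter=1000):
    """Iterate x_{k+1} = x_k - eta * f'(x_k) until the slope is nearly zero."""
    x = x0
    for _ in range(max_iter):
        slope = f_prime(x)
        if abs(slope) < tol:        # f'(x) sufficiently close to 0
            break
        x = x - eta * slope
    return x

# Example: f(x) = (x - 3)^2 has f'(x) = 2*(x - 3) and its minimum at x = 3.
print(gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0))   # approximately 3.0
```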

Slide 12: Gradient Descent

Gradients of two-dimensional functions:

(Figure: a two-dimensional function shown as a surface in the left diagram and as contour lines in the right diagram, with arrows indicating the gradient at different locations.)

The two-dimensional function in the left diagram is represented by contour lines in the right diagram, where arrows indicate the gradient of the function at different locations. Obviously, the gradient always points in the direction of the steepest increase of the function.

In order to find the function's minimum, we should always move against the gradient.
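The same idea in two dimensions: estimate the gradient numerically and step against it. This is an illustrative sketch only; the example function f(x, y) = x^2 + 2y^2, the step size, and the helper name are assumptions, not taken from the lecture.

```python
import numpy as np

def numerical_gradient(f, p, h=1e-5):
    """Central-difference estimate of the gradient of f at point p."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for k in range(len(p)):
        step = np.zeros_like(p)
        step[k] = h
        grad[k] = (f(p + step) - f(p - step)) / (2 * h)
    return grad

f = lambda p: p[0] ** 2 + 2 * p[1] ** 2      # a simple bowl-shaped function
p = np.array([2.0, 1.0])
for _ in range(50):
    p = p - 0.1 * numerical_gradient(f, p)   # move against the gradient
print(p)                                     # close to the minimum at (0, 0)
```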