Computer Vision Lecture 19: Object Recognition III


Linear Separability

Input space in the two-dimensional case (n = 2):

[Figure: three dividing lines in the (x1, x2) plane, for the parameter settings w1 = 1, w2 = 2, θ = 2; w1 = -2, w2 = 1, θ = 2; and w1 = -2, w2 = 1, θ = 1.]

Linear Separability

So by varying the weights and the threshold, we can realize any linear separation of the input space into a region that yields output 1 and another region that yields output 0.

As we have seen, a two-dimensional input space can be divided by any straight line. A three-dimensional input space can be divided by any two-dimensional plane.

In general, an n-dimensional input space can be divided by an (n-1)-dimensional plane, or hyperplane. Of course, for n > 3 this is hard to visualize.
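As a minimal sketch (plain Python, names are mine), a single threshold neuron with weights w1, w2 and threshold θ classifies a 2-D input by testing which side of the line w1·x1 + w2·x2 = θ it lies on; the three parameter settings from the figure above each define a different dividing line:

```python
def threshold_neuron(x, w, theta):
    """Output 1 if the weighted input sum reaches the threshold, else 0."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s >= theta else 0

# The three (w1, w2, theta) settings from the figure above.
for w, theta in [((1, 2), 2), ((-2, 1), 2), ((-2, 1), 1)]:
    print(w, theta, threshold_neuron((1, 1), w, theta))
```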

Capabilities of Threshold Neurons

What do we do if we need a more complex function?

We can combine multiple artificial neurons to form networks with increased capabilities. For example, we can build a two-layer network with any number of neurons in the first layer giving input to a single neuron in the second layer.

The neuron in the second layer could, for example, implement an AND function.
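As a hedged sketch of that last point, a single threshold neuron can realize the AND of its n binary inputs by giving every connection weight 1 and setting the threshold to n:

```python
def and_neuron(inputs):
    """Threshold neuron computing AND: all weights 1, threshold = number of inputs."""
    return 1 if sum(inputs) >= len(inputs) else 0

print(and_neuron([1, 1, 1]))  # 1
print(and_neuron([1, 0, 1]))  # 0
```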

Capabilities of Threshold Neurons

[Figure: a two-layer network with inputs x1, x2, ..., xi feeding a layer of threshold neurons whose outputs converge on a single second-layer neuron.]

What kind of function can such a network realize?

Capabilities of Threshold Neurons

Assume that the dotted lines in the diagram represent the input-dividing lines implemented by the neurons in the first layer:

[Figure: input space with axes labeled 1st comp. and 2nd comp.; the dividing lines bound a polygon.]

Then, for example, the second-layer neuron could output 1 if the input is within a polygon, and 0 otherwise.
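A small sketch of this idea (plain Python, weights are hypothetical): each first-layer neuron tests one half-plane, and an AND neuron in the second layer fires only when the input is on the correct side of every line, i.e. inside the convex polygon:

```python
def threshold(s, theta):
    return 1 if s >= theta else 0

def two_layer_net(x, lines):
    """lines: list of (w1, w2, theta); each first-layer neuron tests one half-plane."""
    first_layer = [threshold(w1 * x[0] + w2 * x[1], theta) for (w1, w2, theta) in lines]
    # Second-layer AND neuron: all weights 1, threshold = number of first-layer neurons.
    return threshold(sum(first_layer), len(first_layer))

# Hypothetical polygon: the unit square as the intersection of four half-planes.
square = [(1, 0, 0), (-1, 0, -1), (0, 1, 0), (0, -1, -1)]
print(two_layer_net((0.5, 0.5), square))  # 1 (inside the polygon)
print(two_layer_net((2.0, 0.5), square))  # 0 (outside the polygon)
```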

Capabilities of Threshold Neurons

However, we still may want to implement functions that are more complex than that. An obvious idea is to extend our network even further.

Let us build a network that has three layers, with arbitrary numbers of neurons in the first and second layers and one neuron in the third layer.

The first and second layers are completely connected, that is, each neuron in the first layer sends its output to every neuron in the second layer.

Capabilities of Threshold Neurons

[Figure: a three-layer network with inputs x1, x2, ... and a single output oi.]

What type of function can a three-layer network realize?

Capabilities of Threshold Neurons

Assume that the polygons in the diagram indicate the input regions for which each of the second-layer neurons yields output 1:

[Figure: input space with axes labeled 1st comp. and 2nd comp., containing several polygons.]

Then, for example, the third-layer neuron could output 1 if the input is within any of the polygons, and 0 otherwise.


Capabilities of Threshold Neurons

The more neurons there are in the first layer, the more vertices the polygons can have. With a sufficient number of first-layer neurons, the polygons can approximate any given shape.

The more neurons there are in the second layer, the more of these polygons can be combined to form the output function of the network.

With a sufficient number of neurons and appropriate weight vectors wi, a three-layer network of threshold neurons can realize any (!) function Rn → {0, 1}.

Terminology

Usually, we draw neural networks in such a way that the input enters at the bottom and the output is generated at the top. Arrows indicate the direction of data flow.

The first layer, termed input layer, just contains the input vector and does not perform any computations.

The second layer, termed hidden layer, receives input from the input layer and sends its output to the output layer.

After applying their activation function, the neurons in the output layer contain the output vector.

Terminology

Example: Network function f: R3 → {0, 1}2

[Figure: layered network diagram, from bottom to top: input vector, input layer, hidden layer, output layer, output vector.]
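A minimal sketch of this structure (NumPy, weights and sizes are hypothetical; biases play the role of negative thresholds): the input layer simply holds the input vector, the hidden layer computes weighted sums followed by an activation function, and the output layer does the same to produce the output vector, here realizing some f: R3 → {0, 1}2:

```python
import numpy as np

def step(s):
    """Threshold activation applied elementwise."""
    return (s >= 0).astype(int)

# Hypothetical weights: 3 inputs -> 4 hidden neurons -> 2 output neurons.
rng = np.random.default_rng(0)
W_hidden, b_hidden = rng.normal(size=(4, 3)), rng.normal(size=4)
W_output, b_output = rng.normal(size=(2, 4)), rng.normal(size=2)

x = np.array([0.2, -1.0, 0.7])        # input vector (input layer)
h = step(W_hidden @ x + b_hidden)     # hidden-layer outputs
y = step(W_output @ h + b_output)     # output vector, an element of {0, 1}^2
print(y)
```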

General Network Structure

Feedback-Based Weight Adaptation

Feedback from the environment (possibly a teacher) is used to improve the system's performance. Synaptic weights are modified to reduce the system's error in computing a desired function. For example, if increasing a specific weight increases the error, then the weight is decreased.

Small adaptation steps are needed to find an optimal set of weights. The learning rate can vary during the learning process. This scheme is typical for supervised learning.
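A minimal sketch of this feedback rule (plain Python, error function is hypothetical): probe each weight with a tiny perturbation, and if increasing it increases the error, decrease it, taking only a small step each time:

```python
def adapt_weights(weights, error_fn, step=0.01, probe=1e-4):
    """One feedback-based adaptation pass over all weights."""
    for i in range(len(weights)):
        base = error_fn(weights)
        weights[i] += probe
        increased = error_fn(weights) > base   # did the error go up?
        weights[i] -= probe
        # Move the weight in whichever direction reduces the error.
        weights[i] += -step if increased else step
    return weights

# Hypothetical error: squared distance of the weights from some target values.
target = [0.5, -1.0]
error = lambda w: (w[0] - target[0]) ** 2 + (w[1] - target[1]) ** 2
w = [0.0, 0.0]
for _ in range(200):
    w = adapt_weights(w, error)
print(w)  # approaches [0.5, -1.0]
```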

Network Training

Basic idea: Define an error function to measure the deviation of the network output from the desired output across all training exemplars.

As the weights of the network completely determine the function computed by it, this error is a function of all weights. We need to find those weights that minimize the error.

An efficient way of doing this is based on the technique of gradient descent.
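A hedged sketch of such an error function (NumPy, names are mine): the summed squared deviation of the network outputs from the desired outputs over all training exemplars, viewed as a function of the weights through the network function:

```python
import numpy as np

def network_error(weights, network_fn, inputs, targets):
    """E(w) = 1/2 * sum over exemplars and output units of (desired - actual)^2."""
    total = 0.0
    for x, d in zip(inputs, targets):
        o = network_fn(weights, x)   # network output for this exemplar
        total += 0.5 * np.sum((np.asarray(d) - np.asarray(o)) ** 2)
    return total

# Toy check with a "network" that is just a weighted sum of the inputs.
linear_net = lambda w, x: np.array([np.dot(w, x)])
X = [np.array([0.0, 1.0]), np.array([1.0, 1.0])]
D = [np.array([1.0]), np.array([0.0])]
print(network_error(np.array([0.5, 0.5]), linear_net, X, D))
```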

Gradient Descent

Gradient descent is a very common technique to find the absolute minimum of a function. It is especially useful for high-dimensional functions.

We will use it to iteratively minimize the network's (or neuron's) error by finding the gradient of the error surface in weight-space and adjusting the weights in the opposite direction.

Gradient Descent

Gradient-descent example: Finding the absolute minimum of a one-dimensional error function f(x):

[Figure: curve of f(x) with the slope f'(x0) at the current estimate x0; the next estimate is x1 = x0 - η·f'(x0), where η is the learning rate (step size).]

Repeat this iteratively until, for some xi, f'(xi) is sufficiently close to 0.
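A minimal sketch of this iteration (plain Python, the error function is hypothetical): repeatedly step against the slope until the derivative is close enough to zero:

```python
def gradient_descent_1d(f_prime, x0, eta=0.1, tol=1e-6, max_steps=10_000):
    """Iterate x <- x - eta * f'(x) until the slope is near zero."""
    x = x0
    for _ in range(max_steps):
        slope = f_prime(x)
        if abs(slope) < tol:
            break
        x = x - eta * slope
    return x

# Hypothetical error function f(x) = (x - 3)^2, so f'(x) = 2 * (x - 3).
print(gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0))  # converges to ~3
```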

Gradient Descent

Gradients of two-dimensional functions: the two-dimensional function in the left diagram is represented by contour lines in the right diagram, where arrows indicate the gradient of the function at different locations.

Obviously, the gradient is always pointing in the direction of the steepest increase of the function. In order to find the function's minimum, we should always move against the gradient.

Multilayer Networks

The backpropagation algorithm was popularized by Rumelhart, Hinton, and Williams (1986). This algorithm solved the "credit assignment" problem, i.e., crediting or blaming individual neurons across layers for particular outputs.

The error at the output layer is propagated backwards to units at lower layers, so that the weights of all neurons can be adapted appropriately.

Backpropagation Learning

Algorithm Backpropagation;
  Start with randomly chosen weights;
  while MSE is above the desired threshold and computational bounds are not exceeded, do
    for each input pattern xp, 1 ≤ p ≤ P,
      Compute hidden node inputs;
      Compute hidden node outputs;
      Compute inputs to the output nodes;
      Compute the network outputs;
      Compute the error between output and desired output;
      Modify the weights between hidden and output nodes;
      Modify the weights between input and hidden nodes;
    end-for
  end-while.
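A hedged NumPy sketch of this loop for a single hidden layer of sigmoidal neurons (learning rate, network size, and the XOR training set are my assumptions, not part of the slides; with these settings it usually learns XOR within the epoch limit):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_backprop(X, D, n_hidden=4, eta=0.5, mse_goal=0.01, max_epochs=20_000, seed=1):
    """One-hidden-layer backpropagation, following the loop structure above."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], D.shape[1]
    W1, b1 = rng.normal(scale=0.5, size=(n_hidden, n_in)), np.zeros(n_hidden)
    W2, b2 = rng.normal(scale=0.5, size=(n_out, n_hidden)), np.zeros(n_out)
    for _ in range(max_epochs):
        sq_err = 0.0
        for x, d in zip(X, D):                        # for each input pattern
            h = sigmoid(W1 @ x + b1)                  # hidden node inputs and outputs
            o = sigmoid(W2 @ h + b2)                  # inputs to output nodes, network outputs
            e = d - o                                 # error between output and desired output
            sq_err += np.sum(e ** 2)
            delta_o = e * o * (1 - o)                 # output-layer error term
            delta_h = (W2.T @ delta_o) * h * (1 - h)  # error propagated back to the hidden layer
            W2 += eta * np.outer(delta_o, h); b2 += eta * delta_o  # hidden -> output weights
            W1 += eta * np.outer(delta_h, x); b1 += eta * delta_h  # input -> hidden weights
        if sq_err / len(X) < mse_goal:                # stop once MSE is below the desired threshold
            break
    return W1, b1, W2, b2

# Hypothetical training set: XOR of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)
W1, b1, W2, b2 = train_backprop(X, D)
for x in X:
    print(x, np.round(sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2), 2))
```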

Supervised Function Approximation

There is a tradeoff between a network's ability to precisely learn the given exemplars and its ability to generalize (i.e., inter- and extrapolate).

This problem is similar to fitting a function to a given set of data points. Let us assume that you want to find a fitting function f: R → R for a set of three data points. You try to do this with polynomials of degree one (a straight line), two, and nine.

Supervised Function Approximation

[Figure: the three data points with the fitted polynomials of degree 1, 2, and 9.]

Obviously, the polynomial of degree 2 provides the most plausible fit.
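A small NumPy illustration of this fitting experiment (the three data points are hypothetical): fit polynomials of degree 1, 2, and 9 to the same points and compare their predictions at a point that was not in the training set:

```python
import numpy as np

# Hypothetical data: three training points of an unknown function f: R -> R.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 0.8, 0.9])

# Fit polynomials of degree 1, 2 and 9 (NumPy warns that the degree-9 fit is
# poorly conditioned, since 3 points cannot determine 10 coefficients).
fits = {deg: np.polyfit(x, y, deg) for deg in (1, 2, 9)}

# Compare the fits at an untrained input.
x_new = 1.5
for deg, coeffs in fits.items():
    print(f"degree {deg}: f({x_new}) = {np.polyval(coeffs, x_new):.3f}")
```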

Supervised Function Approximation

The same principle applies to ANNs:

If an ANN has too few neurons, it may not have enough degrees of freedom to precisely approximate the desired function.

If an ANN has too many neurons, it will learn the exemplars perfectly, but its additional degrees of freedom may cause it to show implausible behavior for untrained inputs; it then generalizes poorly.

Unfortunately, there are no known equations that could tell you the optimal size of your network for a given application; there are only heuristics.

Reducing Overfitting with Dropout

During each training step, we "turn off" a randomly chosen subset of 50% of the hidden-layer neurons, i.e., we set their output to zero. During testing, we once again use all neurons but reduce their outputs by 50% to compensate for the increased number of inputs to each unit.

By doing this, we prevent each neuron from relying on the output of any particular other neuron in the network. It can be argued that in this way we train an astronomical number of decoupled sub-networks, whose expertise is combined when using all neurons again. Due to the changing composition of sub-networks, it is much more difficult to overfit any of them.
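A hedged NumPy sketch of the scheme described above (function and variable names are mine): at training time a random half of the hidden-layer outputs are set to zero, and at test time all outputs are multiplied by 0.5 to compensate:

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden_with_dropout(h, p_drop=0.5, training=True):
    """Apply dropout to a vector of hidden-layer outputs h."""
    if training:
        mask = rng.random(h.shape) >= p_drop   # keep each unit with probability 1 - p_drop
        return h * mask                        # dropped units output zero for this step
    # Testing: use all units, scaled down to compensate for the larger number of inputs.
    return h * (1.0 - p_drop)

h = np.array([0.9, 0.2, 0.7, 0.4, 0.8, 0.1])
print(hidden_with_dropout(h, training=True))   # a random half zeroed out
print(hidden_with_dropout(h, training=False))  # all outputs scaled by 0.5
```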
