EE459 Neural Networks Backpropagation


EE459 Neural Networks: Backpropagation. Kasin Prakobwaitayakit, Department of Electrical Engineering, Chiangmai University.

Background Artificial neural networks (ANNs) provide a general, practical method for learning real-valued, discrete-valued, and vector-valued functions from examples. Algorithms such as BACKPROPAGATION use gradient descent to tune network parameters to best fit a training set of input-output pairs. ANN learning is robust to errors in the training data and has been successfully applied to problems such as face recognition/detection, speech recognition, and learning robot control strategies.

Autonomous Vehicle Steering

Characteristics of ANNs Instances are represented by many attribute-value pairs. The target function output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes. The training examples may contain errors. Long training times are acceptable. Fast evaluation of the learned target function may be required. The ability of humans to understand the learned target function is not important.

Very simple example: net input = 0.4 × 0 + (−0.1) × 1 = −0.1. (The figure shows a two-input unit with weights 0.4 and −0.1 receiving the input pattern (0 1).)

Learning problem to be solved. Suppose we have an input pattern (0 1) and a single target output pattern (1). We have a net input of −0.1, which gives an output pattern of (0). How could we adjust the weights so that this situation is remedied and the spontaneous output matches our target output pattern of (1)?

Answer: Increase the weights so that the net input exceeds 0.0; e.g., add 0.2 to all weights. Observation: the weight from the input node with activation 0 does not have any effect on the net input, so we will leave it alone.
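A minimal sketch of this toy example in Python, assuming a step threshold at 0.0 (the function names and the threshold are illustrative, not taken from the slides):

```python
# Toy example from the slides: weights [0.4, -0.1], input pattern (0 1), target output 1.
def net_input(weights, inputs):
    """Weighted sum of the inputs."""
    return sum(w * x for w, x in zip(weights, inputs))

def output(weights, inputs, threshold=0.0):
    """Step activation: fire (1) when the net input exceeds the threshold."""
    return 1 if net_input(weights, inputs) > threshold else 0

weights = [0.4, -0.1]
pattern = [0, 1]

print(net_input(weights, pattern))   # -0.1 -> output 0, but the target is 1
print(output(weights, pattern))      # 0

# Remedy from the slide: add 0.2 to the weights. Only the weight on the
# active input (x = 1) changes the net input, so adjusting it is what matters.
adjusted = [w + 0.2 for w in weights]
print(net_input(adjusted, pattern))  # 0.1 -> now above 0.0
print(output(adjusted, pattern))     # 1, matching the target
```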

Perceptrons. One type of ANN system is based on a unit called a perceptron. The perceptron function can sometimes be written as o(\vec{x}) = \operatorname{sgn}(\vec{w} \cdot \vec{x}), where \operatorname{sgn}(y) = 1 if y > 0 and -1 otherwise. The space H of candidate hypotheses considered in perceptron learning is the set of all possible real-valued weight vectors, H = \{ \vec{w} \mid \vec{w} \in \mathbb{R}^{n+1} \}.

Representational Power of Perceptrons

Decision surface. (Figures: a linear decision surface and a nonlinear decision surface, followed by a programming example of a decision surface.)

The Perceptron Training Rule. One way to learn an acceptable weight vector is to begin with random weights, then iteratively apply the perceptron to each training example, modifying the perceptron weights whenever it misclassifies an example. This process is repeated, iterating through the training examples as many times as needed until the perceptron classifies all training examples correctly. Weights are modified at each step according to the perceptron training rule, which revises the weight w_i associated with input x_i according to the rule w_i \leftarrow w_i + \Delta w_i, where \Delta w_i = \eta (t - o) x_i. Here t is the target output for the current training example, o is the output generated by the perceptron, and \eta is a positive constant called the learning rate.
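A short sketch of the perceptron training rule in Python, under the usual assumptions (outputs in {+1, -1}, the bias treated as weight w0 on a constant input of 1); the OR data and all names are illustrative:

```python
import random

def predict(weights, x):
    """Perceptron output: sign of the weighted sum (bias is weights[0])."""
    s = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if s > 0 else -1

def train_perceptron(examples, n_inputs, eta=0.1, epochs=100):
    """examples: list of (input_vector, target) pairs with targets in {+1, -1}."""
    weights = [random.uniform(-0.05, 0.05) for _ in range(n_inputs + 1)]
    for _ in range(epochs):
        misclassified = 0
        for x, t in examples:
            o = predict(weights, x)
            if o != t:
                misclassified += 1
                # Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i
                weights[0] += eta * (t - o) * 1.0        # bias input is 1
                for i, xi in enumerate(x):
                    weights[i + 1] += eta * (t - o) * xi
        if misclassified == 0:   # all examples classified correctly
            break
    return weights

# Example: learn the OR function (linearly separable)
data = [([0, 0], -1), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w = train_perceptron(data, n_inputs=2)
print([predict(w, x) for x, _ in data])   # expected: [-1, 1, 1, 1]
```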

Gradient Descent and Delta Rule. The delta training rule is best understood by considering the task of training an unthresholded perceptron; that is, a linear unit for which the output o is given by o = \vec{w} \cdot \vec{x}. In order to derive a weight learning rule for linear units, let us begin by specifying a measure for the training error of a hypothesis (weight vector) relative to the training examples: E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2, where D is the set of training examples, t_d is the target output for example d, and o_d is the output of the linear unit for example d.

Visualizing the Hypothesis Space. (Figure: the error surface over weight space; gradient descent starts from a randomly chosen initial weight vector and descends to the weight vector with minimum error.)

Derivation of the Gradient Descent Rule. The vector derivative \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \ldots, \frac{\partial E}{\partial w_n} \right] is called the gradient of E with respect to \vec{w}, written \nabla E(\vec{w}). The gradient specifies the direction that produces the steepest increase in E. The negative of this vector therefore gives the direction of steepest decrease. The training rule for gradient descent is \vec{w} \leftarrow \vec{w} + \Delta \vec{w}, where \Delta \vec{w} = -\eta \nabla E(\vec{w}).

Derivation of the Gradient Descent Rule (cont.). The negative sign is present because we want to move the weight vector in the direction that decreases E. This training rule can also be written in its component form w_i \leftarrow w_i + \Delta w_i, where \Delta w_i = -\eta \frac{\partial E}{\partial w_i}, which makes it clear that steepest descent is achieved by altering each component w_i of \vec{w} in proportion to \frac{\partial E}{\partial w_i}.

Derivation of the Gradient Descent Rule (cont.). The vector of derivatives that form the gradient can be obtained by differentiating E, giving \frac{\partial E}{\partial w_i} = \sum_{d \in D} (t_d - o_d)(-x_{id}), where x_{id} denotes the ith input component of training example d. The weight update rule for standard gradient descent can therefore be summarized as \Delta w_i = \eta \sum_{d \in D} (t_d - o_d)\, x_{id}.
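A small sketch of this batch (standard) gradient-descent rule for a linear unit in Python; the training data and all names are illustrative:

```python
def train_linear_unit(examples, n_inputs, eta=0.05, epochs=500):
    """Batch gradient descent (delta rule) for a linear unit o = w . x.

    examples: list of (input_vector, target) pairs; the bias is handled as w[0]
    on a constant input of 1.
    """
    w = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        # Accumulate the update over ALL training examples before changing w.
        delta = [0.0] * (n_inputs + 1)
        for x, t in examples:
            xs = [1.0] + list(x)                        # prepend bias input
            o = sum(wi * xi for wi, xi in zip(w, xs))   # linear output
            for i, xi in enumerate(xs):
                delta[i] += eta * (t - o) * xi          # delta_w_i = eta * sum_d (t_d - o_d) x_id
        w = [wi + di for wi, di in zip(w, delta)]
    return w

# Illustrative data sampled from t = 2*x1 - 1*x2 + 0.5
data = [([1, 2], 0.5), ([0, 1], -0.5), ([2, 0], 4.5), ([1, 1], 1.5)]
w = train_linear_unit(data, n_inputs=2)
print(w)   # should approach [0.5, 2.0, -1.0]
```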

Stochastic Approximation to Gradient Descent
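Following the standard treatment (Mitchell, Chapter 4), the stochastic, or incremental, version approximates the batch rule above by updating the weights after each individual training example d, descending the gradient of the per-example error E_d(\vec{w}) = \frac{1}{2}(t_d - o_d)^2:

```latex
\Delta w_i = \eta \,(t_d - o_d)\, x_{id}
```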

Summary of Perceptron. The perceptron training rule is guaranteed to succeed if the training examples are linearly separable and the learning rate is sufficiently small. The linear unit training rule uses gradient descent and is guaranteed to converge to the hypothesis with minimum squared error, given a sufficiently small learning rate, even when the training data contains noise.

BACKPROPAGATION Algorithm

Error Function. The Backpropagation algorithm learns the weights for a multilayer network, given a network with a fixed set of units and interconnections. It employs gradient descent to attempt to minimize the squared error between the network output values and the target values for those outputs. We begin by redefining E to sum the errors over all of the network output units: E(\vec{w}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2, where outputs is the set of output units in the network, and t_{kd} and o_{kd} are the target and output values associated with the kth output unit and training example d.

Architecture of Backpropagation

Backpropagation Learning Algorithm. (The weight-update rules and their derivation were presented as equations over several slides.)
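For reference, the standard BACKPROPAGATION update rules for a feed-forward network of sigmoid units (following Mitchell's Chapter 4) are, for each training example:

```latex
\delta_k = o_k (1 - o_k)(t_k - o_k) \qquad \text{for each output unit } k
\delta_h = o_h (1 - o_h) \sum_{k \in outputs} w_{kh}\,\delta_k \qquad \text{for each hidden unit } h
\Delta w_{ji} = \eta\, \delta_j\, x_{ji} \qquad \text{where } x_{ji} \text{ is the input from unit } i \text{ to unit } j
```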

Inputs To Neurons. Inputs arise from other neurons or from outside the network. Nodes whose inputs arise outside the network are called input nodes and simply copy values. An input may excite or inhibit the response of the neuron to which it is applied, depending upon the weight of the connection.

Weights. Weights represent synaptic efficacy and may be excitatory or inhibitory. Normally, positive weights are considered excitatory while negative weights are thought of as inhibitory. Learning is the process of modifying the weights in order to produce a network that performs some function.

Output. The response function is normally nonlinear. Examples include the sigmoid and piecewise-linear functions.
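Two common response functions sketched in Python; the exact forms on the original slide are not specified here, so these are the usual textbook shapes (the piecewise-linear ramp bounds are illustrative):

```python
import math

def sigmoid(net):
    """Logistic sigmoid: smooth, nonlinear, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def piecewise_linear(net, lo=-1.0, hi=1.0):
    """Piecewise-linear response: 0 below lo, 1 above hi, linear ramp in between."""
    if net <= lo:
        return 0.0
    if net >= hi:
        return 1.0
    return (net - lo) / (hi - lo)

print(sigmoid(0.0))            # 0.5
print(piecewise_linear(0.0))   # 0.5
```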

Backpropagation Preparation. Training set: a collection of input-output patterns that are used to train the network. Testing set: a collection of input-output patterns that are used to assess network performance. Learning rate η: a scalar parameter, analogous to step size in numerical integration, used to set the rate of weight adjustments.

Network Error. Two measures are used: Total Sum-Squared Error (TSSE) and Root Mean-Squared Error (RMSE).
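A common convention for these two measures (some presentations scale TSSE by a factor of 1/2):

```latex
\text{TSSE} = \sum_{p \in patterns}\ \sum_{k \in outputs} (t_{kp} - o_{kp})^2
\qquad
\text{RMSE} = \sqrt{\frac{\text{TSSE}}{\#patterns \times \#outputs}}
```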

A Pseudo-Code Algorithm
Randomly choose the initial weights.
While the error is too large:
  For each training pattern:
    Apply the inputs to the network.
    Calculate the output for every neuron from the input layer, through the hidden layer(s), to the output layer.
    Calculate the error at the outputs.
    Use the output error to compute error signals for the pre-output layers.
    Use the error signals to compute weight adjustments.
    Apply the weight adjustments.
  Periodically evaluate the network performance.
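A compact, runnable sketch of this loop in Python for a single-hidden-layer network of sigmoid units; the layer sizes, learning rate, stopping threshold, and XOR training set are illustrative choices, not taken from the slides:

```python
import math
import random

random.seed(0)

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def make_layer(n_units, n_inputs):
    """One weight row per unit; the last entry of each row is the bias."""
    return [[random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
            for _ in range(n_units)]

def forward(layer, inputs):
    """Sigmoid outputs of one layer for the given inputs."""
    return [sigmoid(sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1])
            for row in layer]

def train(patterns, n_hidden=3, eta=0.5, epochs=10000):
    n_in, n_out = len(patterns[0][0]), len(patterns[0][1])
    hidden = make_layer(n_hidden, n_in)
    output = make_layer(n_out, n_hidden)

    for epoch in range(epochs):
        tsse = 0.0
        for x, target in patterns:
            # Forward pass: inputs -> hidden layer -> output layer
            h = forward(hidden, x)
            o = forward(output, h)
            tsse += sum((t - ok) ** 2 for t, ok in zip(target, o))

            # Error signals: output units first, then the pre-output (hidden) layer
            delta_o = [ok * (1 - ok) * (t - ok) for ok, t in zip(o, target)]
            delta_h = [hj * (1 - hj) * sum(output[k][j] * delta_o[k]
                                           for k in range(n_out))
                       for j, hj in enumerate(h)]

            # Weight adjustments: delta_w = eta * delta * input feeding that weight
            for k in range(n_out):
                for j in range(n_hidden):
                    output[k][j] += eta * delta_o[k] * h[j]
                output[k][-1] += eta * delta_o[k]            # bias
            for j in range(n_hidden):
                for i in range(n_in):
                    hidden[j][i] += eta * delta_h[j] * x[i]
                hidden[j][-1] += eta * delta_h[j]            # bias

        if epoch % 1000 == 0:
            print(f"epoch {epoch}: TSSE = {tsse:.4f}")       # periodic evaluation
        if tsse < 0.01:                                      # "error is too large" test
            break
    return hidden, output

# Illustrative training set: XOR
xor = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
hidden, output = train(xor)
for x, t in xor:
    print(x, "->", round(forward(output, forward(hidden, x))[0], 2), "target", t)
```

With enough epochs this loop typically drives the TSSE below the threshold on a small problem like XOR; the periodic print corresponds to the "periodically evaluate the network performance" step in the pseudo-code.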

Face Detection using Neural Networks. (Diagram: in the training process, images from a face database are presented with target output 1 and images from a non-face database with target output 0; in the testing process, the trained network decides face or non-face for new images.)

Backpropagation Using Gradient Descent. Advantages: relatively simple implementation; a standard method that generally works well. Disadvantages: slow and inefficient; can get stuck in local minima, resulting in sub-optimal solutions.

Local Minima. (Figure: an error surface with a local minimum and the global minimum marked.)

Alternatives To Gradient Descent: Simulated Annealing. Advantages: can guarantee an optimal solution (global minimum). Disadvantages: may be slower than gradient descent; much more complicated implementation.

Alternatives To Gradient Descent: Genetic Algorithms / Evolutionary Strategies. Advantages: faster than simulated annealing; less likely to get stuck in local minima. Disadvantages: slower than gradient descent; memory intensive for large nets.

Enhancements To Gradient Descent: Momentum. Adds a percentage of the last weight movement to the current movement.

Enhancements To Gradient Descent: Momentum (cont.). Momentum is useful for getting over small bumps in the error function and often finds a minimum in fewer steps. The update is \Delta w(t) = -\eta\, \delta\, y + \alpha\, \Delta w(t-1), where \Delta w is the change in weight, \eta is the learning rate, \delta is the error, y differs depending on which layer we are calculating, and \alpha is the momentum parameter.
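A minimal sketch of a momentum update for a single weight, assuming the error signal delta has already been computed by backpropagation; here delta follows the convention of the backpropagation sketch earlier (it includes the (t - o) factor), so the gradient step is added rather than subtracted, and all names and values are illustrative:

```python
def momentum_step(weight, prev_change, delta, y, eta=0.5, alpha=0.9):
    """One momentum update: blend the new gradient step with the previous change.

    delta: error signal of the unit this weight feeds into
    y:     the input value flowing through this weight
    alpha: momentum parameter (fraction of the previous change carried over)
    """
    change = eta * delta * y + alpha * prev_change
    return weight + change, change   # new weight, plus the change to reuse next step

# Usage: keep one prev_change per weight across training steps
w, dw = 0.1, 0.0
w, dw = momentum_step(w, dw, delta=0.2, y=1.0)
w, dw = momentum_step(w, dw, delta=0.05, y=1.0)
print(w, dw)
```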

Enhancements To Gradient Descent: Adaptive Backpropagation Algorithm. It assigns each weight its own learning rate, determined by the sign of the gradient of the error function from the last iteration. If the signs are equal, the weight is more likely to be on a shallow slope, so its learning rate is increased; the signs are more likely to differ on a steep slope, so there the learning rate is decreased. This speeds up the advancement on gradual slopes.

Enhancements To Gradient Descent: Adaptive Backpropagation (cont.). Possible problem: since we minimize the error for each weight separately, the overall error may increase. Solution: calculate the total output error after each adaptation, and if it is greater than the previous error, reject that adaptation and calculate new learning rates.
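One way the per-weight adaptation could look in code, as a sketch only; the increase/decrease factors and the rollback guard are illustrative, not values from the slides:

```python
def adapt_learning_rates(etas, grads, prev_grads, up=1.2, down=0.5):
    """Per-weight learning rates: grow on consistent gradient signs, shrink on sign flips."""
    new_etas = []
    for eta, g, pg in zip(etas, grads, prev_grads):
        if g * pg > 0:          # same sign as last iteration: likely a shallow slope
            new_etas.append(eta * up)
        elif g * pg < 0:        # sign flipped: likely a steep slope or an overshoot
            new_etas.append(eta * down)
        else:
            new_etas.append(eta)
    return new_etas

def accept_adaptation(prev_error, new_error):
    """Rollback guard from the slide: keep the adaptation only if total error did not grow."""
    return new_error <= prev_error

print(adapt_learning_rates([0.1, 0.1], [0.3, -0.2], [0.5, 0.4]))  # approximately [0.12, 0.05]
```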

Enhancements To Gradient Descent: SuperSAB (Super Self-Adapting Backpropagation). SuperSAB combines the momentum and adaptive methods: it uses the adaptive method together with momentum as long as the sign of the gradient does not change. This is an additive effect of both methods, resulting in faster traversal of gradual slopes. When the sign of the gradient does change, the momentum cancels the drastic drop in learning rate, which allows the search to roll up the other side of the minimum, possibly escaping local minima.

Enhancements To Gradient Descent: SuperSAB (cont.). Experiments show that SuperSAB converges faster than gradient descent. Overall, this algorithm is less sensitive (and so is less likely to get caught in local minima).

Other Ways To Minimize Error: Varying the training data. Cycle through the input classes, or randomly select from the input classes. Add noise to the training data by randomly changing the value of an input node (with low probability). Retrain with the expected inputs after initial training, e.g., for speech recognition.

Other Ways To Minimize Error: Adding and removing neurons from layers. Adding neurons speeds up learning but may cause a loss in generalization; removing neurons has the opposite effect.