Performance Optimization

Performance Optimization Steepest Descent

Objective: to learn algorithms that optimize a performance index F(x), i.e., that find the value of x that minimizes F(x).

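Basic Optimization Algorithm: $x_{k+1} = x_k + \alpha_k p_k$, or equivalently $\Delta x_k = (x_{k+1} - x_k) = \alpha_k p_k$, where $p_k$ is the search direction and $\alpha_k$ is the learning rate.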

Steepest Descent: Choose the next step so that the function decreases: $F(x_{k+1}) < F(x_k)$.

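Steepest Descent: For small changes in x we can approximate F(x) by a first-order Taylor expansion: $F(x_{k+1}) = F(x_k + \Delta x_k) \approx F(x_k) + g_k^T \Delta x_k$, where $g_k \equiv \nabla F(x)|_{x = x_k}$ is the gradient evaluated at $x_k$.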

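Steepest Descent: If we want the function to decrease, the first-order term must be negative: $g_k^T \Delta x_k = \alpha_k g_k^T p_k < 0$, where $g_k$ is the gradient at $x_k$ and $p_k$ is the search direction. We can maximize the decrease by choosing $p_k = -g_k$, which gives $x_{k+1} = x_k - \alpha_k g_k$.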

Steepest Descent: Two general methods to select the learning rate $\alpha_k$: (1) minimize F(x) with respect to $\alpha_k$ at each step, i.e., along the search direction; (2) use a predetermined value (e.g. $\alpha_k = 0.2$ or $\alpha_k = 1/k$).
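
The update with a predetermined learning rate can be sketched in a few lines of Python. This is a minimal illustration, not the example from the slides: the test function F(x) = x1^2 + 25 x2^2, the starting point, the learning rate and the helper names are all assumptions chosen for the sketch.

```python
def steepest_descent(grad, x0, alpha, steps):
    """Iterate x_{k+1} = x_k - alpha * g_k with a fixed learning rate alpha."""
    x = list(x0)
    path = [tuple(x)]
    for _ in range(steps):
        g = grad(x)                                   # gradient g_k at x_k
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
        path.append(tuple(x))
    return path


# Hypothetical performance index (not from the slides).
def F(x):
    return x[0] ** 2 + 25.0 * x[1] ** 2


def grad_F(x):
    return [2.0 * x[0], 50.0 * x[1]]


path = steepest_descent(grad_F, x0=[0.5, 0.5], alpha=0.01, steps=200)
values = [F(p) for p in path]
print("final point:", path[-1], "final F:", values[-1])
```

With this small learning rate every step lowers F, which is the property the derivation asks for; larger values of alpha can overshoot.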

Example

Plot

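Stable Learning Rates: Suppose that the performance index is a quadratic function: $F(x) = \frac{1}{2} x^T A x + d^T x + c$, so that $\nabla F(x) = A x + d$. The steepest descent algorithm with constant learning rate $\alpha$ is then $x_{k+1} = x_k - \alpha g_k = x_k - \alpha (A x_k + d) = [I - \alpha A] x_k - \alpha d$. This is a linear dynamic system, which will be stable if the eigenvalues of the matrix $[I - \alpha A]$ are less than one in magnitude.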

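Stable Learning Rates (Quadratic): Stability is determined by the eigenvalues of the matrix $[I - \alpha A]$. If $\lambda_i$ is an eigenvalue of A with eigenvector $z_i$, then $[I - \alpha A] z_i = z_i - \alpha A z_i = z_i - \alpha \lambda_i z_i = (1 - \alpha \lambda_i) z_i$, so the eigenvalues of $[I - \alpha A]$ are $(1 - \alpha \lambda_i)$.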

Stable Learning Rates: Let $\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ and $\{z_1, z_2, \ldots, z_n\}$ be the eigenvalues and eigenvectors of the Hessian matrix A. Then $[I - \alpha A] z_i = (1 - \alpha \lambda_i) z_i$. The condition for the stability of the steepest descent algorithm is then $|1 - \alpha \lambda_i| < 1$. Assume that the quadratic function has a strong minimum point; then its eigenvalues must be positive numbers. Hence $\alpha < 2 / \lambda_i$. This must be true for all eigenvalues: $\alpha < 2 / \lambda_{max}$.
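
The bound can be checked numerically. The sketch below assumes a hypothetical 2x2 Hessian (not taken from the slides), sets d = 0 so the minimum is at the origin, and runs steepest descent just below and just above 2 / lambda_max; the helper names are illustrative.

```python
import math


def eig_sym_2x2(A):
    """Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]], smallest first."""
    (a, b), (_, c) = A
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius


def distance_after(A, x0, alpha, steps):
    """Run steepest descent on F(x) = 0.5 x^T A x and return the final |x|."""
    x = list(x0)
    for _ in range(steps):
        g = [A[0][0] * x[0] + A[0][1] * x[1],        # gradient A x
             A[1][0] * x[0] + A[1][1] * x[1]]
        x = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
    return math.hypot(x[0], x[1])


A = [[2.0, 2.0], [2.0, 4.0]]                         # hypothetical Hessian
lam_min, lam_max = eig_sym_2x2(A)
alpha_max = 2.0 / lam_max                            # stability bound

stable = distance_after(A, [0.5, 0.5], 0.9 * alpha_max, 100)
unstable = distance_after(A, [0.5, 0.5], 1.1 * alpha_max, 100)
print("alpha_max:", alpha_max, "stable:", stable, "unstable:", unstable)
```

Just below the bound the iterate shrinks toward the minimum; just above it the component along the eigenvector of lambda_max grows at every step.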

Example

CHAPTER 10 Widrow-Hoff Learning

Objectives: Widrow-Hoff learning is an approximate steepest descent algorithm in which the performance index is mean square error. It is widely used today in many signal processing applications, and it is the precursor to the backpropagation algorithm for multilayer networks.

ADALINE Network: The ADALINE (Adaptive Linear Neuron) network and its learning rule, the LMS (Least Mean Square) algorithm, were proposed by Bernard Widrow and Marcian Hoff in 1960. Both the ADALINE network and the perceptron suffer from the same inherent limitation: they can only solve linearly separable problems. The LMS algorithm minimizes mean square error (MSE), and therefore tries to move the decision boundaries as far from the training patterns as possible.

ADALINE Network: [Figure: ADALINE network diagram, shown alongside the single-layer perceptron. Input p (R x 1), weight matrix W (S x R), bias b (S x 1), net input n = Wp + b (S x 1), output a = purelin(Wp + b) (S x 1).]

Single ADALINE: Setting n = 0 gives Wp + b = 0, which specifies a decision boundary. The ADALINE can be used to classify objects into two categories if they are linearly separable.
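
As a small illustration of how the boundary Wp + b = 0 separates two categories, the sketch below evaluates a single two-input ADALINE and labels a point by the sign of its output. The weights, bias and test points are hypothetical values chosen for the example, and `adaline_output` and `classify` are illustrative names.

```python
def adaline_output(w, b, p):
    """Linear (purelin) output a = w^T p + b of a single ADALINE."""
    return sum(wi * pi for wi, pi in zip(w, p)) + b


def classify(w, b, p):
    """Return +1 on the positive side of the boundary w^T p + b = 0, else -1."""
    return 1 if adaline_output(w, b, p) > 0 else -1


# Hypothetical weights and bias: the boundary is the line p1 + p2 = 1.
w, b = [1.0, 1.0], -1.0

labels = [classify(w, b, p) for p in ([1.0, 1.0], [0.0, 0.0], [2.0, -0.5])]
on_boundary = adaline_output(w, b, [0.5, 0.5])   # a point lying on the boundary
print(labels, on_boundary)
```

Points exactly on the boundary give an output of zero; this sketch arbitrarily assigns them to the -1 category.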

Mean Square Error: The LMS algorithm is an example of supervised training. The LMS algorithm will adjust the weights and biases of the ADALINE in order to minimize the mean square error, where the error is the difference between the target output ($t_q$) and the network output ($a_q$) for input $p_q$. MSE: $F(x) = E[e^2] = E[(t - a)^2]$, where $E[\cdot]$ denotes expected value.

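Mean Square Error: Collecting the weights and bias into $x = [w; b]$ and the input into $z = [p; 1]$, so that $a = x^T z$, the MSE expands to $F(x) = E[(t - x^T z)^2] = E[t^2] - 2 x^T E[t z] + x^T E[z z^T] x = c - 2 x^T h + x^T R x$, where $c = E[t^2]$, $h = E[t z]$ and $R = E[z z^T]$ is the input correlation matrix.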
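
The quantities in the mean square error can be estimated directly from a training set. The sketch below assumes the usual quadratic expansion F(x) = c - 2 x^T h + x^T R x with c = E[t^2], h = E[t z] and R = E[z z^T], equally likely patterns, and a two-input ADALINE without bias so that R is 2x2. The two patterns are hypothetical and were picked so that R is the identity, which gives circular contours.

```python
def mse_terms(patterns):
    """Return c = E[t^2], h = E[t z], R = E[z z^T] for equally likely (z, t) pairs."""
    q = len(patterns)
    n = len(patterns[0][0])
    c = sum(t * t for _, t in patterns) / q
    h = [sum(t * z[i] for z, t in patterns) / q for i in range(n)]
    R = [[sum(z[i] * z[j] for z, _ in patterns) / q for j in range(n)]
         for i in range(n)]
    return c, h, R


def mse(x, c, h, R):
    """Evaluate F(x) = c - 2 x^T h + x^T R x."""
    n = len(x)
    xRx = sum(x[i] * R[i][j] * x[j] for i in range(n) for j in range(n))
    return c - 2.0 * sum(xi * hi for xi, hi in zip(x, h)) + xRx


def solve_2x2(R, h):
    """Minimum point x* = R^{-1} h for an invertible 2x2 matrix R."""
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    return [(R[1][1] * h[0] - R[0][1] * h[1]) / det,
            (R[0][0] * h[1] - R[1][0] * h[0]) / det]


# Hypothetical input/target pairs (no bias term).
patterns = [([1.0, 1.0], 1.0), ([1.0, -1.0], -1.0)]

c, h, R = mse_terms(patterns)
x_star = solve_2x2(R, h)
print("c =", c, "h =", h, "R =", R, "x* =", x_star, "F(x*) =", mse(x_star, c, h, R))
```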

Example 1

Solved Problem P10.3: So the contours of the performance surface will be circular, centered on the minimum point.

Approximate Steepest Descent

Approximate Gradient

Approximate Gradient (cont.)

Approximate Gradient (cont.)

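LMS Algorithm: The approximate steepest descent algorithm with constant learning rate $\alpha$ is $w(k+1) = w(k) + 2 \alpha e(k) p(k)$ and $b(k+1) = b(k) + 2 \alpha e(k)$, where $e(k) = t(k) - a(k)$ is the error at iteration k. Matrix notation of the LMS algorithm, for a layer of several neurons with error vector $e(k)$: $W(k+1) = W(k) + 2 \alpha e(k) p^T(k)$ and $b(k+1) = b(k) + 2 \alpha e(k)$. The LMS algorithm is also referred to as the delta rule or the Widrow-Hoff learning algorithm.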
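
A minimal sketch of the LMS update for a single ADALINE follows, assuming the standard form w(k+1) = w(k) + 2 alpha e(k) p(k), b(k+1) = b(k) + 2 alpha e(k). The two training patterns, the learning rate and the function name `lms_train` are illustrative assumptions, not values recovered from the slides.

```python
def lms_train(patterns, alpha, epochs, n_inputs):
    """Train a single ADALINE with the LMS (Widrow-Hoff) rule from zero weights."""
    w = [0.0] * n_inputs
    b = 0.0
    for _ in range(epochs):
        for p, t in patterns:                        # present patterns in order
            a = sum(wi * pi for wi, pi in zip(w, p)) + b   # a = purelin(w^T p + b)
            e = t - a                                # error e(k) = t(k) - a(k)
            w = [wi + 2.0 * alpha * e * pi for wi, pi in zip(w, p)]
            b = b + 2.0 * alpha * e
    return w, b


# Hypothetical two-pattern training set with targets -1 and +1.
patterns = [([1.0, -1.0, -1.0], -1.0),
            ([1.0, 1.0, -1.0], 1.0)]

w, b = lms_train(patterns, alpha=0.1, epochs=100, n_inputs=3)
outputs = [sum(wi * pi for wi, pi in zip(w, p)) + b for p, _ in patterns]
print("w =", w, "b =", b, "outputs =", outputs)
```

Starting from zero and updating after every presentation, the weights settle where both patterns are reproduced with essentially zero error. Only the second input differs between the two patterns, so the learned boundary depends on that input alone.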

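Quadratic Functions: General form of a quadratic function: $F(x) = c + d^T x + \frac{1}{2} x^T A x$ (A: Hessian matrix). If the eigenvalues of the Hessian matrix are all positive, then the quadratic function will have one unique global minimum. ADALINE network mean square error: $F(x) = c - 2 x^T h + x^T R x$, with h the cross-correlation between target and input and R the input correlation matrix, so that $d = -2h$ and $A = 2R$.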
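
As a short worked step (a sketch, assuming A is symmetric with all eigenvalues positive and therefore invertible), the location of the unique minimum follows from setting the gradient of the general quadratic form to zero. The second line assumes the usual identification d = -2h, A = 2R for the ADALINE mean square error F(x) = c - 2 x^T h + x^T R x:

```latex
\begin{align*}
\nabla F(x) = A x + d = 0
  \quad &\Longrightarrow \quad x^{*} = -A^{-1} d, \\
\text{ADALINE } (d = -2h,\ A = 2R):
  \quad & x^{*} = -(2R)^{-1}(-2h) = R^{-1} h .
\end{align*}
```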

Orange/Apple Example: In practical applications it might NOT be practical to calculate R, and hence the maximum stable learning rate; $\alpha$ can then be selected by trial and error.

Orange/Apple Example: Start, arbitrarily, with all the weights set to zero, and then apply inputs p1, p2, p1, p2, etc., in that order, calculating the new weights after each input is presented.

Orange/Apple Example: This decision boundary falls halfway between the two reference patterns. The perceptron rule did NOT produce such a boundary: it stops as soon as the patterns are correctly classified, even though some patterns may be close to the boundaries. The LMS algorithm minimizes the mean square error.

Perceptron rule vs. LMS algorithm

Perceptron rule vs. LMS algorithm (cont.)

Perceptron rule vs. LMS algorithm (cont.)

Perceptron rule vs. LMS algorithm (cont.)

Solved Problem P10.4: Train the network using the LMS algorithm, with the initial guess set to zero and a learning rate $\alpha = 0.25$.

Solved Problem P10.8: Train the network using the LMS algorithm, with the initial guess set to zero and a learning rate $\alpha = 0.04$.

Tapped Delay Line: At the output of the tapped delay line we have an R-dimensional vector, consisting of the input signal at the current time and at delays of 1 to R - 1 time steps.
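
A tapped delay line can be sketched with a fixed-length buffer. The class name, the choice of R = 3 taps and the sample values below are illustrative assumptions; the buffer starts filled with zeros.

```python
from collections import deque


class TappedDelayLine:
    """Hold the current input sample and the R - 1 previous samples."""

    def __init__(self, taps):
        # Oldest samples fall off the right end once the buffer is full.
        self.buffer = deque([0.0] * taps, maxlen=taps)

    def step(self, y):
        """Shift in a new sample; return [y(k), y(k-1), ..., y(k-R+1)]."""
        self.buffer.appendleft(y)
        return list(self.buffer)


tdl = TappedDelayLine(taps=3)
outputs = [tdl.step(y) for y in [5.0, -4.0, 0.0, 0.0]]
for k, vector in enumerate(outputs):
    print(k, vector)
```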

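Adaptive Filter: an ADALINE whose input comes from a tapped delay line. Its output is $a(k) = purelin(Wp + b) = \sum_{i=1}^{R} w_{1,i}\, y(k - i + 1) + b$.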

Solved Problem P10.1: Just prior to k = 0 (k < 0): three zeros have entered the filter, i.e., y(-3) = y(-2) = y(-1) = 0, so the output just prior to k = 0 is zero. k = 0:

Solved Problem P10.1: k = 1: k = 2: k = 3: k = 4:

Solved Problem P10.1: The effect of y(0) lasts from k = 0 through k = 2, so it will have an influence for three time intervals. This corresponds to the length of the impulse response of this filter.
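
The relationship between the number of taps and the length of the impulse response can be checked with a small finite impulse response filter. The three weights and the input sequence below are illustrative values, not recovered from the transcript, and the filter is assumed to have no bias.

```python
def fir_filter(weights, signal):
    """Output a(k) = sum_i w_i * y(k - i + 1) of a tapped-delay-line filter."""
    history = [0.0] * len(weights)           # delay line starts filled with zeros
    out = []
    for y in signal:
        history = [y] + history[:-1]         # shift the new sample in
        out.append(sum(w * h for w, h in zip(weights, history)))
    return out


weights = [2, -1, 3]                         # hypothetical three-tap filter
response = fir_filter(weights, [5, -4, 0, 0, 0])
impulse = fir_filter(weights, [1, 0, 0, 0, 0])
print("response:", response)
print("impulse response:", impulse)
```

A single nonzero sample influences the output for exactly as many time steps as there are taps, three in this sketch, after which it has left the delay line.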

Solved Problem P10.6: Application of ADALINE: adaptive predictor. The purpose of this filter is to predict the next value of the input signal from the two previous values. Suppose that the input signal is a stationary random process with a known autocorrelation function $C_y(n) = E[y(k)\, y(k+n)]$.

Solved Problem P10.6: i. Sketch the contour plot of the performance index (MSE).

Solved Problem P10.6: Performance index (MSE): $F(x) = c - 2 x^T h + x^T R x$. The optimal weights are $x^* = R^{-1} h$. The Hessian matrix is $A = 2R$. Eigenvalues: $\lambda_1 = 4$, $\lambda_2 = 8$. The contours of F(x) will be elliptical, with the long axis of each ellipse along the first eigenvector, since the first eigenvalue has the smallest magnitude. The ellipses will be centered at $x^*$.

Solved Problem P10.6: ii. The maximum stable value of the learning rate for the LMS algorithm: $\alpha < 2 / \lambda_{max} = 2 / 8 = 0.25$, where $\lambda_{max} = 8$ is the largest eigenvalue of the Hessian. iii. The LMS algorithm is approximate steepest descent, so the trajectory for small learning rates will move perpendicular to the contour lines.
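
The adaptive predictor itself can be sketched as a two-tap LMS filter whose target is the current sample and whose inputs are the two previous samples. The problem assumes a stationary random input; as a stand-in that is easy to check (an assumption of this sketch), the code uses a noise-free sinusoid, for which an exact two-tap predictor exists because sin(wk) = 2 cos(w) sin(w(k-1)) - sin(w(k-2)). The function name, alpha = 0.1 and the frequency pi/3 are illustrative choices, and the bias is omitted.

```python
import math


def lms_predictor(signal, alpha):
    """Predict signal[k] from signal[k-1] and signal[k-2] with LMS updates."""
    w = [0.0, 0.0]
    errors = []
    for k in range(2, len(signal)):
        z = [signal[k - 1], signal[k - 2]]          # two previous values
        a = w[0] * z[0] + w[1] * z[1]               # prediction of signal[k]
        e = signal[k] - a                           # prediction error
        w = [w[0] + 2.0 * alpha * e * z[0],
             w[1] + 2.0 * alpha * e * z[1]]
        errors.append(e)
    return w, errors


omega = math.pi / 3                                  # hypothetical frequency
signal = [math.sin(omega * k) for k in range(1000)]
w, errors = lms_predictor(signal, alpha=0.1)
print("weights:", w, "expected about:", [2.0 * math.cos(omega), -1.0])
print("last error:", errors[-1])
```

For this input the weights approach [2 cos(pi/3), -1] = [1, -1] and the prediction error decays toward zero. With a random input the error would instead settle around the minimum of the mean square error.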

Applications: noise cancellation system to remove 60-Hz noise from an EEG signal (Fig. 10.6); echo cancellation system in long-distance telephone lines (Fig. 10.10); filtering engine noise from a pilot's voice signal (Fig. P10.8).