METHOD OF STEEPEST DESCENT


METHOD OF STEEPEST DESCENT (Week 5, ELE 774 - Adaptive Signal Processing)

Mean Square Error (Revisited)
For a transversal filter of length M, the output is written as
    y(n) = w^H u(n) = Σ_{k=0}^{M-1} w_k^* u(n−k),
and the error term w.r.t. a certain desired response d(n) is
    e(n) = d(n) − y(n).
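A minimal sketch of these two definitions, with assumed variable names (w: tap-weight vector, u: current tap-input vector u(n), d: desired response d(n)):

    import numpy as np

    def filter_output_and_error(w, u, d):
        """w, u: length-M arrays; d: scalar desired response d(n)."""
        y = np.vdot(w, u)   # y(n) = w^H u(n); vdot conjugates its first argument
        e = d - y           # e(n) = d(n) - y(n)
        return y, e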

Mean Square Error (Revisited)
Following these terms, the MSE criterion is defined as
    J(w) = E[|e(n)|^2] = E[e(n) e^*(n)].
Substituting e(n) and manipulating the expression, we get
    J(w) = σ_d^2 − Σ_k w_k^* p_k − Σ_k w_k p_k^* + Σ_k Σ_i w_k^* w_i r(i−k),
where σ_d^2 = E[|d(n)|^2], p_k = E[u(n−k) d^*(n)] and r(i−k) = E[u(n−k) u^*(n−i)]. Quadratic in w!

Mean Square Error (Revisited)
For notational simplicity, express the MSE in terms of vectors/matrices:
    J(w) = σ_d^2 − w^H p − p^H w + w^H R w,
where
    p = E[u(n) d^*(n)]   (cross-correlation vector),
    R = E[u(n) u^H(n)]   (autocorrelation matrix).

Mean Square Error (Revisited)
We found that the solution (the optimum filter coefficients w_o) is given by the Wiener-Hopf equations
    R w_o = p,   i.e.   w_o = R^{-1} p.
Inversion of R can be very costly. J(w) is quadratic in w → convex in w → at w_o the gradient vanishes, ∇J(w_o) = 0. The surface has a single minimum and it is global, with
    J_min = J(w_o) = σ_d^2 − p^H w_o.
Can we reach w_o, i.e. drive J(w(n)) down to J_min, with a less demanding algorithm?
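For reference, a sketch of the direct (non-iterative) Wiener solution, assuming R and p are known exactly; solving the linear system avoids forming R^{-1} explicitly but still costs on the order of M^3 operations:

    import numpy as np

    def wiener_solution(R, p):
        # Solve R w_o = p for the optimum tap-weight vector w_o.
        return np.linalg.solve(R, p)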

Basic Idea of the Method of Steepest Descent
Can we find w_o in an iterative manner?

Basic Idea of the Method of Steepest Descent
Starting from w(0), generate a sequence {w(n)} with the property
    J(w(n+1)) < J(w(n)),  n = 0, 1, 2, ...
Many sequences can be found following different rules. The method of steepest descent generates points using the gradient. The gradient of J at a point w, i.e. ∇J(w), gives the direction in which the function increases most; then −∇J(w) gives the direction in which the function decreases most. Release a tiny ball on the surface of J → it follows the negative gradient of the surface.

Basic Idea of the Method of Steepest Descent
For notational simplicity, let g(n) = ∇J(w(n)); then, going in the direction given by the negative gradient,
    w(n+1) = w(n) − (μ/2) g(n).
How far we should go in −g is defined by the step-size parameter μ. The optimum step size can be obtained by a line search, which is difficult; generally a constant step size is taken for simplicity. Then, at each step, the improvement in J is (from a first-order Taylor series expansion)
    J(w(n+1)) − J(w(n)) ≈ −(μ/2) ||g(n)||^2 ≤ 0.

Application of SD to Wiener Filter
For w(n), the gradient of the MSE surface is
    g(n) = ∇J(w(n)) = −2p + 2R w(n).
From the theory of the Wiener filter we know that
    R = E[u(n) u^H(n)],   p = E[u(n) d^*(n)].
Then the update equation becomes
    w(n+1) = w(n) + μ [p − R w(n)],  n = 0, 1, 2, ...,
which defines a feedback connection.
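A minimal sketch of this recursion, assuming R and p are given exactly (names are assumed, not from the slides):

    import numpy as np

    def steepest_descent(R, p, mu, n_iter=200, w0=None):
        """Iterate w(n+1) = w(n) + mu * (p - R w(n)) and return the trajectory."""
        M = len(p)
        w = np.zeros(M, dtype=complex) if w0 is None else np.asarray(w0, dtype=complex)
        trajectory = [w.copy()]
        for _ in range(n_iter):
            w = w + mu * (p - R @ w)   # feedback update driven by the negative gradient
            trajectory.append(w.copy())
        return w, np.array(trajectory)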

Convergence Analysis
Feedback → may cause stability problems under certain conditions, depending on
 - the step size, μ
 - the autocorrelation matrix, R.
Does SD converge? Under which conditions? What is the rate of convergence? We may use the canonical representation. Let the weight-error vector be
    c(n) = w(n) − w_o;
then the update equation becomes
    c(n+1) = (I − μR) c(n).

Convergence Analysis
Let R = Q Λ Q^H be the eigendecomposition of R. Then
    c(n+1) = (I − μ Q Λ Q^H) c(n).
Using Q Q^H = I,
    Q^H c(n+1) = (I − μΛ) Q^H c(n).
Apply the change of coordinates
    v(n) = Q^H c(n) = Q^H (w(n) − w_o).
Then the update equation becomes
    v(n+1) = (I − μΛ) v(n).

Convergence Analysis
We know that Λ is diagonal, so the k-th natural mode is
    v_k(n+1) = (1 − μλ_k) v_k(n),  k = 1, ..., M,
or, with the initial values v_k(0), we have
    v_k(n) = (1 − μλ_k)^n v_k(0).
Note the geometric series.
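As a sketch (assumed names, reusing the notation above), the closed-form natural modes reproduce the whole steepest-descent trajectory without running the recursion:

    import numpy as np

    def sd_trajectory_from_modes(R, p, mu, w0, n_iter):
        lam, Q = np.linalg.eigh(R)            # R = Q diag(lam) Q^H (R Hermitian)
        w_o = np.linalg.solve(R, p)
        v0 = Q.conj().T @ (w0 - w_o)          # v(0) = Q^H (w(0) - w_o)
        n = np.arange(n_iter + 1)[:, None]
        v = (1.0 - mu * lam) ** n * v0        # v_k(n) = (1 - mu*lam_k)^n v_k(0)
        return w_o + v @ Q.T                  # w(n) = w_o + Q v(n), one row per n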

Convergence Analysis
Obviously, for stability,
    |1 − μλ_k| < 1 for all k,
or, simply,
    0 < μ < 2 / λ_max.
The geometric series results in an exponentially decaying curve with time constant τ_k where, letting (1 − μλ_k)^n = e^{−n/τ_k},
    τ_k = −1 / ln(1 − μλ_k),
or, for small μ,
    τ_k ≈ 1 / (μλ_k).
Why? Because ln(1 − μλ_k) ≈ −μλ_k when μλ_k ≪ 1.
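A short sketch (assumed names) of the stability bound and the per-mode time constants for a given R and μ:

    import numpy as np

    def sd_step_size_limits(R, mu):
        lam = np.linalg.eigvalsh(R)                  # eigenvalues of the Hermitian R
        mu_max = 2.0 / lam.max()                     # stability requires 0 < mu < mu_max
        tau = -1.0 / np.log(np.abs(1.0 - mu * lam))  # exact time constant of each mode
        tau_approx = 1.0 / (mu * lam)                # small-step approximation
        return mu_max, tau, tau_approx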

Convergence Analysis
We have v(n) = Q^H (w(n) − w_o), but then
    w(n) = w_o + Q v(n).
We know that Q is composed of the eigenvectors q_k of R, so
    w(n) = w_o + Σ_{k=1}^{M} q_k v_k(n) = w_o + Σ_{k=1}^{M} q_k (1 − μλ_k)^n v_k(0).
Each filter coefficient decays exponentially. The overall rate of convergence is limited by the slowest and fastest modes; the overall time constant τ_a then satisfies
    −1 / ln(1 − μλ_max) ≤ τ_a ≤ −1 / ln(1 − μλ_min).

Convergence Analysis
For small step size,
    τ_k ≈ 1 / (μλ_k).
What is v(0)? The initial value is
    v(0) = Q^H (w(0) − w_o).
For simplicity, assume that w(0) = 0; then
    v(0) = −Q^H w_o.

Convergence Analysis
Transient behaviour: from the canonical form we know that
    J(n) = J_min + Σ_{k=1}^{M} λ_k |v_k(n)|^2,
and then
    J(n) = J_min + Σ_{k=1}^{M} λ_k (1 − μλ_k)^{2n} |v_k(0)|^2.
As long as the upper limit on the step-size parameter μ is satisfied,
    lim_{n→∞} J(n) = J_min,
regardless of the initial point.

Convergence Analysis
The progress of J(n) for n = 0, 1, ... is called the learning curve. The learning curve of the steepest-descent algorithm consists of a sum of exponentials, each of which corresponds to a natural mode of the problem:
    # natural modes = # filter taps (M).
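A sketch of the learning curve built from the formula above (assumed names; var_d stands for σ_d^2):

    import numpy as np

    def learning_curve(R, p, var_d, mu, w0, n_iter):
        lam, Q = np.linalg.eigh(R)
        w_o = np.linalg.solve(R, p)
        J_min = var_d - np.real(np.vdot(p, w_o))   # J_min = sigma_d^2 - p^H w_o
        v0 = Q.conj().T @ (w0 - w_o)               # v(0) = Q^H (w(0) - w_o)
        n = np.arange(n_iter + 1)[:, None]
        # J(n) = J_min + sum_k lam_k (1 - mu*lam_k)^(2n) |v_k(0)|^2
        return J_min + ((1.0 - mu * lam) ** (2 * n) * lam * np.abs(v0) ** 2).sum(axis=1)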

Example
A predictor with 2 taps (w_1(n) and w_2(n)) is used to find the parameters of the AR process
    u(n) + a_1 u(n−1) + a_2 u(n−2) = v(n),
where v(n) is white noise of variance σ_v^2. Examine the transient behaviour for
 - fixed step size, varying eigenvalue spread;
 - fixed eigenvalue spread, varying step size.
σ_v^2 is adjusted so that σ_u^2 = 1.

Example
The AR process has correlation values r(0) = σ_u^2 and r(1) = −a_1 σ_u^2 / (1 + a_2), so
    R = [ r(0)  r(1) ;  r(1)  r(0) ].
Two eigenmodes:
    λ_1 = r(0) + r(1),   λ_2 = r(0) − r(1).
Condition number (eigenvalue spread):
    χ(R) = λ_max / λ_min.
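A sketch (assumed names) that builds R, p and the eigenvalue spread for this two-tap prediction problem with σ_u^2 normalised to 1. For the one-step predictor the desired response is u(n) and the tap inputs are u(n−1), u(n−2), so p = [r(1), r(2)]^T and the Wiener solution should come out as w_o = [−a_1, −a_2]^T:

    import numpy as np

    def ar2_example(a1, a2, var_u=1.0):
        r0 = var_u
        r1 = -a1 / (1.0 + a2) * var_u            # Yule-Walker: r(1) = -a1 r(0) / (1 + a2)
        r2 = -a1 * r1 - a2 * r0                  # Yule-Walker: r(2) = -a1 r(1) - a2 r(0)
        R = np.array([[r0, r1], [r1, r0]])
        p = np.array([r1, r2])
        lam = np.array([r0 + r1, r0 - r1])       # eigenvalues of the 2x2 Toeplitz R
        chi = lam.max() / lam.min()              # eigenvalue spread (condition number)
        return R, p, lam, chi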

Example (Experiment 1)
Experiment 1: keep the step size μ fixed and change the eigenvalue spread χ(R).
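A usage sketch of Experiment 1, reusing ar2_example and learning_curve from the earlier sketches. The step size and the AR coefficients here are hypothetical illustrative values (with a_2 = 0.95, the listed a_1 values give spreads of roughly 1.2, 3, 10 and 100); the slides do not specify them:

    import numpy as np

    mu = 0.3                                            # hypothetical fixed step size
    w0 = np.zeros(2)
    for a1 in (-0.195, -0.975, -1.5955, -1.9114):       # hypothetical a1 values, a2 fixed
        R, p, lam, chi = ar2_example(a1, a2=0.95)
        J = learning_curve(R, p, var_d=1.0, mu=mu, w0=w0, n_iter=100)
        print(f"chi(R) = {chi:7.2f}   J(100) = {J[-1]:.4f}")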

Example (Experiment 1): [figure slides: learning curves for the different eigenvalue spreads]

Example (Experiment 2)
Experiment 2: keep the eigenvalue spread χ(R) fixed and change the step size μ (here μ_max = 2/λ_max = 1.1).
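A usage sketch of Experiment 2, again reusing the earlier sketches. The AR coefficients are hypothetical values chosen so that χ(R) ≈ 10 and λ_max ≈ 1.82, which is consistent with the stated μ_max = 2/λ_max = 1.1; the swept step sizes are also hypothetical:

    import numpy as np

    R, p, lam, chi = ar2_example(a1=-1.5955, a2=0.95)   # chi(R) ~ 10, lambda_max ~ 1.82
    for mu in (0.01, 0.1, 0.5, 1.0):                    # hypothetical step sizes < mu_max
        J = learning_curve(R, p, var_d=1.0, mu=mu, w0=np.zeros(2), n_iter=100)
        print(f"mu = {mu:4.2f}   J(100) = {J[-1]:.4f}")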

[Figure slide: learning curves of Experiment 2 for the different step sizes]

Example (Experiment 2)
Depending on the value of μ, the learning curve can be
 - overdamped: moves smoothly to the minimum ((very) small μ);
 - underdamped: oscillates towards the minimum (large μ < μ_max);
 - critically damped.
Generally the rate of convergence is slow in the first two cases.

Observations
SD is a 'deterministic' algorithm, i.e. we assume that R and p are known exactly. In practice they can only be estimated (e.g. by a sample average), which can have high computational complexity. SD is a local search algorithm, but for Wiener filtering the cost surface is convex (quadratic), so convergence is guaranteed as long as μ < μ_max is satisfied.
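A sketch of such sample-average (time-average) estimates, assuming N snapshots are stacked as rows of a matrix U (row n holding the tap-input vector u(n)) with the corresponding desired responses in d; these names are assumptions for illustration:

    import numpy as np

    def estimate_R_p(U, d):
        """U: (N, M) array of tap-input vectors u(n); d: (N,) desired responses d(n)."""
        N = U.shape[0]
        R_hat = (U.T @ U.conj()) / N   # R_hat = (1/N) sum_n u(n) u(n)^H
        p_hat = (U.T @ d.conj()) / N   # p_hat = (1/N) sum_n u(n) d*(n)
        return R_hat, p_hat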

Observations
The origin of SD comes from the first-order Taylor series expansion, as with many other local-search optimization algorithms. Convergence can be very slow. To speed up the process, the second-order term can also be included, as in Newton's method:
    w(n+1) = w(n) − H^{-1} ∇J(w(n)),
where H is the Hessian of J (for the quadratic MSE surface, H = 2R). The price is high computational complexity (matrix inversion) and numerical stability problems.
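A sketch of a Newton-type update for this quadratic surface (assumed names). Because the Hessian is 2R, a full Newton step jumps directly to w_o = R^{-1} p, which is exactly why it is fast but as costly as solving the Wiener-Hopf equations:

    import numpy as np

    def newton_step(R, p, w):
        grad = 2.0 * (R @ w - p)                   # gradient of J at w
        return w - np.linalg.solve(2.0 * R, grad)  # w - H^{-1} grad, with H = 2R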