ELE 774 - Adaptive Signal Processing, Chapter 5: Least Mean-Square Adaptive Filtering


Slide 1: Least Mean-Square Adaptive Filtering

Slide 2: Steepest Descent
The update rule for SD is
$\mathbf{w}(n+1) = \mathbf{w}(n) + \tfrac{1}{2}\mu\,[-\nabla J(n)]$,
where $\nabla J(n) = -2\mathbf{p} + 2\mathbf{R}\mathbf{w}(n)$, or
$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,[\mathbf{p} - \mathbf{R}\mathbf{w}(n)]$.
SD is a deterministic algorithm, in the sense that $\mathbf{p}$ and $\mathbf{R}$ are assumed to be exactly known. In practice we can only estimate these quantities.
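As an illustration (my own sketch, not from the slides), a minimal steepest-descent iteration in Python, assuming $\mathbf{R}$ and $\mathbf{p}$ are known exactly; the numeric values are hypothetical:

```python
import numpy as np

# Minimal steepest-descent sketch: R and p are assumed known exactly.
# Example values are hypothetical, for illustration only.
R = np.array([[1.1, 0.5],
              [0.5, 1.1]])       # input correlation matrix
p = np.array([0.5272, -0.4458])  # cross-correlation vector
mu = 0.1                         # step size, must satisfy 0 < mu < 2/lambda_max

w = np.zeros(2)                  # initial tap-weight vector
for n in range(200):
    w = w + mu * (p - R @ w)     # deterministic SD update

print(w)                         # converges towards the Wiener solution R^{-1} p
print(np.linalg.solve(R, p))
```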

Slide 3: Basic Idea
The simplest estimate of the expectations is to remove the expectation terms and replace them with the instantaneous values, i.e.
$\hat{\mathbf{R}}(n) = \mathbf{u}(n)\mathbf{u}^H(n)$, $\hat{\mathbf{p}}(n) = \mathbf{u}(n)d^*(n)$.
Then the gradient estimate becomes
$\hat{\nabla} J(n) = -2\mathbf{u}(n)d^*(n) + 2\mathbf{u}(n)\mathbf{u}^H(n)\hat{\mathbf{w}}(n)$.
Eventually, the new update rule is
$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \mu\,\mathbf{u}(n)\,[d^*(n) - \mathbf{u}^H(n)\hat{\mathbf{w}}(n)]$.
No expectations, instantaneous samples!

Slide 4: Basic Idea
However, the term in the brackets is the (conjugate of the) error, i.e.
$e(n) = d(n) - \hat{\mathbf{w}}^H(n)\mathbf{u}(n)$,
so the update becomes
$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \mu\,\mathbf{u}(n)\,e^*(n)$.
Then $\hat{\nabla} J(n) = -2\mathbf{u}(n)e^*(n)$ is the gradient of the instantaneous squared error $|e(n)|^2$ instead of the mean-square error $J(n) = E[|e(n)|^2]$ as in SD.
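A small sanity check (my addition, arbitrary test data): for real-valued signals the instantaneous gradient $-2\mathbf{u}(n)e(n)$ can be verified against a finite-difference gradient of $|e(n)|^2$:

```python
import numpy as np

# Verify that -2*u*e is the gradient of the instantaneous squared error
# (real-valued case). All values below are arbitrary illustration data.
rng = np.random.default_rng(0)
u = rng.standard_normal(4)   # input vector u(n)
d = 0.7                      # desired sample d(n)
w = rng.standard_normal(4)   # current tap-weight vector

def inst_cost(w):
    e = d - w @ u            # instantaneous error e(n)
    return e**2              # |e(n)|^2

e = d - w @ u
analytic = -2.0 * u * e      # instantaneous gradient estimate used by LMS

eps = 1e-6
numeric = np.array([(inst_cost(w + eps*np.eye(4)[k]) - inst_cost(w - eps*np.eye(4)[k])) / (2*eps)
                    for k in range(4)])

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```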

Slide 5: Basic Idea
Filter weights are updated using instantaneous values.

Slide 6: Update Equations
Update equation for the method of steepest descent:
$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,[\mathbf{p} - \mathbf{R}\mathbf{w}(n)]$
Update equation for least mean-square:
$\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \mu\,\mathbf{u}(n)\,e^*(n)$

Slide 7: LMS Algorithm
Since the expectations are omitted, the estimates have a high variance. Therefore, the recursive computation of each tap weight in the LMS algorithm suffers from gradient noise.
In contrast to SD, which is a deterministic algorithm, LMS is a member of the family of stochastic gradient descent algorithms.
LMS has a higher MSE ($J(\infty)$) compared to SD ($J_{min}$, the Wiener solution) as $n \to \infty$:
- i.e., $J(n) \to J(\infty)$ as $n \to \infty$.
- The difference is called the excess mean-square error $J_{ex}(\infty)$.
- The ratio $J_{ex}(\infty)/J_{min}$ is called the misadjustment.
- If $J(\infty)$ is a finite value, LMS is said to be stable in the mean-square sense.
- LMS performs a random (but on average unbiased) motion around the Wiener solution.

Slide 8: LMS Algorithm
LMS involves a feedback connection. Although LMS might seem very difficult to work with due to the randomness, the feedback acts as a low-pass filter, or performs averaging, so that the randomness can be filtered out.
The time constant of this averaging is inversely proportional to μ. In fact, if μ is chosen small enough, the adaptive process progresses slowly and the effects of gradient noise on the tap weights are largely filtered out.
The computational complexity of LMS is very low, which makes it very attractive:
- Only 2M+1 complex multiplications and 2M complex additions per iteration.

Slide 9: LMS Algorithm
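As an illustration of the algorithm summarised on this slide, a minimal complex-valued LMS filter in Python (my own sketch; the system-identification setup and parameter values are hypothetical):

```python
import numpy as np

def lms(u, d, M, mu):
    """Minimal complex LMS sketch: w(n+1) = w(n) + mu * u(n) * conj(e(n))."""
    w = np.zeros(M, dtype=complex)           # tap-weight vector w_hat(n)
    e = np.zeros(len(u), dtype=complex)      # error sequence e(n)
    for n in range(M - 1, len(u)):
        un = u[n - M + 1:n + 1][::-1]        # tap inputs [u(n), u(n-1), ..., u(n-M+1)]
        y = np.vdot(w, un)                   # filter output y(n) = w^H u(n)
        e[n] = d[n] - y                      # estimation error
        w = w + mu * un * np.conj(e[n])      # LMS update
    return w, e

# Hypothetical system-identification example: estimate an unknown 4-tap FIR filter.
rng = np.random.default_rng(1)
w_o = np.array([0.8, -0.4, 0.2, 0.1])        # "unknown" system (illustration only)
u = rng.standard_normal(5000)
d = np.convolve(u, w_o)[:len(u)] + 0.01 * rng.standard_normal(len(u))
w_hat, e = lms(u.astype(complex), d.astype(complex), M=4, mu=0.01)
print(np.round(w_hat.real, 3))               # approaches w_o
```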

Slide 10: Canonical Model
The LMS algorithm for complex signals / with complex coefficients can be represented in terms of four separate LMS algorithms for real signals, with cross-coupling between them.
Write the input, desired signal, tap weights, output and error in complex notation:
$\mathbf{u}(n) = \mathbf{u}_I(n) + j\,\mathbf{u}_Q(n)$, $d(n) = d_I(n) + j\,d_Q(n)$,
$\hat{\mathbf{w}}(n) = \hat{\mathbf{w}}_I(n) + j\,\hat{\mathbf{w}}_Q(n)$, $y(n) = y_I(n) + j\,y_Q(n)$, $e(n) = e_I(n) + j\,e_Q(n)$.

Slide 11: Canonical Model
Then the relations between these expressions are
$y_I(n) = \hat{\mathbf{w}}_I^T(n)\mathbf{u}_I(n) + \hat{\mathbf{w}}_Q^T(n)\mathbf{u}_Q(n)$, $y_Q(n) = \hat{\mathbf{w}}_I^T(n)\mathbf{u}_Q(n) - \hat{\mathbf{w}}_Q^T(n)\mathbf{u}_I(n)$,
$e_I(n) = d_I(n) - y_I(n)$, $e_Q(n) = d_Q(n) - y_Q(n)$,
and the tap-weight updates become
$\hat{\mathbf{w}}_I(n+1) = \hat{\mathbf{w}}_I(n) + \mu\,[\mathbf{u}_I(n)e_I(n) + \mathbf{u}_Q(n)e_Q(n)]$,
$\hat{\mathbf{w}}_Q(n+1) = \hat{\mathbf{w}}_Q(n) + \mu\,[\mathbf{u}_Q(n)e_I(n) - \mathbf{u}_I(n)e_Q(n)]$.

Slide 12: Canonical Model

Slide 13: Canonical Model
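To illustrate the canonical model (my own sketch, arbitrary test data), one step of the cross-coupled real-valued updates can be checked against the direct complex LMS update:

```python
import numpy as np

# One LMS step done two ways: directly in complex arithmetic, and as the
# cross-coupled real-valued (canonical model) form. Test data is arbitrary.
rng = np.random.default_rng(2)
M, mu = 4, 0.05
u = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # tap-input vector u(n)
d = 0.3 - 0.2j                                              # desired response d(n)
w = rng.standard_normal(M) + 1j * rng.standard_normal(M)    # tap weights w_hat(n)

# Direct complex update
e = d - np.vdot(w, u)                  # e(n) = d(n) - w^H u(n)
w_complex = w + mu * u * np.conj(e)

# Canonical model: four real LMS filters with cross-coupling
uI, uQ, wI, wQ = u.real, u.imag, w.real, w.imag
yI = wI @ uI + wQ @ uQ
yQ = wI @ uQ - wQ @ uI
eI, eQ = d.real - yI, d.imag - yQ
wI_new = wI + mu * (uI * eI + uQ * eQ)
wQ_new = wQ + mu * (uQ * eI - uI * eQ)

print(np.allclose(w_complex, wI_new + 1j * wQ_new))  # True
```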

Slide 14: Analysis of the LMS Algorithm
Although the filter is a linear combiner, the algorithm is highly non-linear and violates superposition and homogeneity.
Assume the initial condition $\hat{\mathbf{w}}(0) = \mathbf{0}$.
The analysis will continue using the weight-error vector
$\boldsymbol{\varepsilon}(n) = \mathbf{w}_o - \hat{\mathbf{w}}(n)$
and its autocorrelation
$\mathbf{K}(n) = E[\boldsymbol{\varepsilon}(n)\boldsymbol{\varepsilon}^H(n)]$.
(Here we use the expectation operator; strictly, it is the ensemble average.)

Slide 15: Analysis of the LMS Algorithm
We have $e(n) = d(n) - \hat{\mathbf{w}}^H(n)\mathbf{u}(n)$. Let $\boldsymbol{\varepsilon}(n) = \mathbf{w}_o - \hat{\mathbf{w}}(n)$. Then the update equation can be written as
$\boldsymbol{\varepsilon}(n+1) = \boldsymbol{\varepsilon}(n) - \mu\,\mathbf{u}(n)\,e^*(n)$.
Convergence is analysed in an average sense:
- The algorithm is run many times, and we study the ensemble-average behaviour.

Slide 16: Analysis of the LMS Algorithm
Using the update equation for the weight-error vector and the small step-size assumption, it can be shown that
$E[\boldsymbol{\varepsilon}(n+1)] \approx (\mathbf{I} - \mu\,\mathbf{R})\,E[\boldsymbol{\varepsilon}(n)]$,
i.e. the weight-error vector converges in the mean provided $0 < \mu < 2/\lambda_{max}$.
(Here we use the expectation operator; strictly, it is the ensemble average.)

Slide 17: Small Step Size Analysis
Assumption I: the step size μ is small (how small?) → the LMS filter acts like a low-pass filter with a very low cut-off frequency.
Assumption II: the desired response is described by a linear multiple regression model that is matched exactly by the optimum Wiener filter,
$d(n) = \mathbf{w}_o^H\mathbf{u}(n) + e_o(n)$,
where $e_o(n)$ is the irreducible estimation error and $J_{min} = E[|e_o(n)|^2]$.
Assumption III: the input and the desired response are jointly Gaussian.

Slide 18: Small Step Size Analysis
Applying the similarity transformation resulting from the eigendecomposition of $\mathbf{R}$, i.e. $\mathbf{R} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^H$ with $\mathbf{v}(n) = \mathbf{Q}^H\boldsymbol{\varepsilon}(n)$, we have
$\mathbf{v}(n+1) = (\mathbf{I} - \mu\boldsymbol{\Lambda})\,\mathbf{v}(n) + \boldsymbol{\phi}(n)$,
where $\boldsymbol{\phi}(n) = -\mu\,\mathbf{Q}^H\mathbf{u}(n)\,e_o^*(n)$ is a stochastic force. (We do not have this term in Wiener filtering!)
The components of $\mathbf{v}(n)$ are uncorrelated.
HW: Prove these relations.

Slide 19: Small Step Size Analysis
The components of $\mathbf{v}(n)$ are uncorrelated, so each satisfies a first-order difference equation (cf. Brownian motion in thermodynamics):
$v_k(n+1) = (1 - \mu\lambda_k)\,v_k(n) + \phi_k(n)$, where $\phi_k(n)$ is the stochastic force.
Solution (iterating from n = 0):
$v_k(n) = (1 - \mu\lambda_k)^n v_k(0) + \sum_{i=0}^{n-1}(1 - \mu\lambda_k)^{n-1-i}\,\phi_k(i)$,
where the first term is the natural component of $v_k(n)$ and the second is the forced component.
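A short simulation of one such mode (my own sketch, arbitrary parameter values) shows the decaying natural component followed by Brownian-like fluctuation driven by the stochastic force:

```python
import numpy as np

# Simulate one mode v_k(n+1) = (1 - mu*lam)*v_k(n) + phi_k(n) with a white
# stochastic force. Parameter values are arbitrary, for illustration only.
rng = np.random.default_rng(3)
mu, lam, N = 0.05, 1.0, 400
phi = 0.02 * rng.standard_normal(N)      # stochastic force phi_k(n)

v = np.empty(N + 1)
v[0] = 1.0                               # initial weight-error component
for n in range(N):
    v[n + 1] = (1 - mu * lam) * v[n] + phi[n]

natural = (1 - mu * lam) ** np.arange(N + 1)   # decaying natural component
print(v[:5], v[-5:])   # early samples dominated by the natural component,
                       # late samples fluctuate around zero (forced component)
```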

Slide 20: Learning Curves
There are two kinds of learning curves:
- Mean-square error (MSE) learning curve: $J(n) = E[|e(n)|^2]$
- Mean-square deviation (MSD) learning curve: $D(n) = E[\|\boldsymbol{\varepsilon}(n)\|^2]$
Ensemble averaging → the results of many (→∞) independent realizations are averaged.
What is the relation between MSE and MSD? For μ small,
$J_{ex}(n) = J(n) - J_{min} \approx \sum_{k}\lambda_k\,E[|v_k(n)|^2]$, while $D(n) = \sum_{k}E[|v_k(n)|^2]$.
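A sketch (my own, with hypothetical parameters) of how an ensemble-averaged MSE learning curve is obtained in practice: run many independent LMS realizations and average the squared error at each n.

```python
import numpy as np

# Ensemble-averaged MSE learning curve for LMS system identification.
# The system, noise level and parameters are hypothetical illustration values.
rng = np.random.default_rng(4)
w_o = np.array([0.5, -0.3, 0.1])         # "unknown" system
M, mu, N, runs = 3, 0.02, 1000, 200

J = np.zeros(N)                          # accumulated squared error per time step
for _ in range(runs):
    u = rng.standard_normal(N)
    d = np.convolve(u, w_o)[:N] + 0.05 * rng.standard_normal(N)
    w = np.zeros(M)
    for n in range(M - 1, N):
        un = u[n - M + 1:n + 1][::-1]    # tap-input vector
        e = d[n] - w @ un
        J[n] += e ** 2
        w = w + mu * un * e
J /= runs                                # ensemble average approximates J(n)

print(J[M], J[-1])   # decays from a large initial value towards J(inf) near J_min
```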

Slide 21: Learning Curves
Under the assumptions of slide 17, the excess MSE is
$J_{ex}(n) = J(n) - J_{min} = \sum_{k=1}^{M}\lambda_k\,E[|v_k(n)|^2]$.
LMS performs worse than SD; there is always an excess MSE. For μ small (using the solution of the first-order difference equation),
$J_{ex}(\infty) \approx \frac{\mu}{2}\,J_{min}\sum_{k=1}^{M}\lambda_k = \frac{\mu}{2}\,J_{min}\,\mathrm{tr}(\mathbf{R})$.

Slide 22: Learning Curves
The mean-square deviation D(n) is lower- and upper-bounded by the excess MSE:
$\frac{J_{ex}(n)}{\lambda_{max}} \le D(n) \le \frac{J_{ex}(n)}{\lambda_{min}}$,
or equivalently $\lambda_{min}\,D(n) \le J_{ex}(n) \le \lambda_{max}\,D(n)$.
They have a similar response: both decay as n grows.

Slide 23: Convergence
For μ small, convergence of the MSE requires
$0 < \mu < \frac{2}{\lambda_{max}}$.
Hence, for convergence, the step size must satisfy this bound. The ensemble-average learning curve of an LMS filter does not exhibit oscillations; rather, it decays exponentially to the constant value
$J(\infty) = J_{min} + J_{ex}(\infty)$.

Slide 24: Misadjustment
Define the misadjustment
$\mathcal{M} = \frac{J_{ex}(\infty)}{J_{min}}$.
For small μ, from the previous slide,
$\mathcal{M} \approx \frac{\mu}{2}\sum_{k=1}^{M}\lambda_k = \frac{\mu}{2}\,\mathrm{tr}(\mathbf{R})$,
or equivalently, since $\mathrm{tr}(\mathbf{R}) = M\,r(0) = M\,E[|u(n)|^2]$,
$\mathcal{M} \approx \frac{\mu}{2}\,M\,E[|u(n)|^2]$.
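A quick numeric illustration (hypothetical values, my addition): for a filter with M = 10 taps, input power $E[|u(n)|^2] = 1$ and step size μ = 0.01, the misadjustment is roughly $\mathcal{M} \approx (0.01/2)\cdot 10 \cdot 1 = 0.05$, i.e. the steady-state MSE sits about 5% above $J_{min}$.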

Slide 25: Average Time Constant
From SD we know that the k-th natural mode of the MSE decays with time constant
$\tau_{mse,k} \approx \frac{1}{2\mu\lambda_k}$,
but then the average time constant is
$\tau_{mse,av} \approx \frac{1}{2\mu\lambda_{av}}$, where $\lambda_{av} = \frac{1}{M}\sum_{k=1}^{M}\lambda_k$.
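Combining this with the previous slide (a standard rearrangement, stated here for completeness): since $\mathcal{M} \approx \frac{\mu}{2} M \lambda_{av}$ and $\tau_{mse,av} \approx \frac{1}{2\mu\lambda_{av}}$, we get $\mathcal{M} \approx \frac{M}{4\,\tau_{mse,av}}$, which is exactly the trade-off summarised in the observations on the next slide: for a fixed filter length, slower convergence (a larger time constant) gives a lower misadjustment.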

Slide 26: Observations
The misadjustment is:
- directly proportional to the filter length M, for a fixed τ_mse,av;
- inversely proportional to the time constant τ_mse,av: slower convergence results in lower misadjustment;
- directly proportional to the step size μ: a smaller step size results in lower misadjustment.
The time constant is inversely proportional to the step size μ: a smaller step size results in slower convergence.
A large μ requires the inclusion of higher-order terms in μ (k ≥ 1) in the analysis: difficult to analyse, the small-step-size analysis is no longer valid, and the learning curve becomes noisier.

Slide 27: LMS vs. SD
The main goal is to minimise the mean-square error (MSE). The optimum solution is found by the Wiener-Hopf equations, which require the auto-/cross-correlations and achieve the minimum value of the MSE, $J_{min}$.
LMS and SD are iterative algorithms designed to find $\mathbf{w}_o$:
- SD has direct access to the auto-/cross-correlations (exact measurements); it can approach the Wiener solution $\mathbf{w}_o$ and go down to $J_{min}$.
- LMS uses instantaneous estimates instead (noisy measurements); it fluctuates around $\mathbf{w}_o$ in a Brownian-motion manner, and its MSE settles at $J(\infty) \ge J_{min}$.

Slide 28: LMS vs. SD
Learning curves:
- SD has a well-defined curve composed of decaying exponentials.
- For LMS, the curve is composed of noisy decaying exponentials.

Slide 29: Statistical Wave Theory
As the filter length increases, M→∞:
- The propagation of electromagnetic disturbances along a transmission line towards infinity is similar to signals on an infinitely long LMS filter.
For a finite-length LMS filter (transmission line):
- Corrections have to be made at the edges to handle reflections.
- As the length increases, the reflection region decreases compared to the total filter.
This imposes a limit on the step size to avoid instability as M→∞:
$0 < \mu < \frac{2}{M\,S_{max}}$,
where $S_{max}$ is the maximum value of the power spectral density $S(\omega)$ of the tap inputs u(n). If the upper bound is exceeded, instability is observed.
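As a hedged illustration (my own sketch, with a hypothetical AR(1) input), the bound $2/(M\,S_{max})$ can be evaluated from a periodogram estimate of the input PSD:

```python
import numpy as np

# Estimate S_max from data and evaluate the step-size bound 2 / (M * S_max).
# The AR(1) input model and parameters are hypothetical illustration values.
rng = np.random.default_rng(5)
N, M = 65536, 32
u = np.empty(N)
u[0] = rng.standard_normal()
for n in range(1, N):                    # AR(1) input: u(n) = 0.9 u(n-1) + white noise
    u[n] = 0.9 * u[n - 1] + rng.standard_normal()

# Averaged periodogram (crude Welch-style PSD estimate, normalized so that
# the mean of S(w) over frequency equals the input power r(0)).
seg = u[: N // 64 * 64].reshape(64, -1)
psd = np.mean(np.abs(np.fft.fft(seg, axis=1)) ** 2, axis=0) / seg.shape[1]
S_max = psd.max()

print("S_max estimate:", round(S_max, 2))
print("step-size bound 2/(M*S_max):", 2.0 / (M * S_max))
```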

Slide 30: H∞ Optimality of LMS
A single realisation of LMS is not optimum in the MSE sense; the ensemble average is. The previous derivation is heuristic (replacing the auto-/cross-correlations with their instantaneous estimates).
In what sense is LMS optimum?
- It can be shown that LMS minimises the maximum energy gain of the filter (from the disturbances to the estimation errors), under a constraint on the step size.
- Minimising the maximum of something → a minimax problem, i.e. optimisation of an H∞ criterion.

Slide 31: H∞ Optimality of LMS
Provided that the step-size parameter μ satisfies the limits on the previous slide, then no matter how different the initial weight vector is from the unknown parameter vector $\mathbf{w}_o$ of the multiple regression model, and irrespective of the value of the additive disturbance $e_o(n)$, the error energy produced at the output of the LMS filter will never exceed a certain level.

Slide 32: Limits on the Step Size
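For context (my addition, not taken from the slides): the step-size condition commonly cited for the H∞ optimality of LMS (Hassibi, Sayed and Kailath) is $0 < \mu < \frac{1}{\|\mathbf{u}(n)\|^2}$ for all n, i.e. μ smaller than the reciprocal of the tap-input energy, to be read alongside the mean-square stability bounds $0 < \mu < 2/\lambda_{max}$ and $0 < \mu < 2/(M\,S_{max})$ from the earlier slides.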