METHOD OF STEEPEST DESCENT

Week 5 ELE Adaptive Signal Processing

2 Mean Square Error (Revisited)
For a transversal filter (of length M), the output is written as
$y(n) = \sum_{k=0}^{M-1} w_k^* u(n-k)$
and the error term with respect to a desired response $d(n)$ is
$e(n) = d(n) - y(n)$.

3 Mean Square Error (Revisited)
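Following these terms, the MSE criterion is defined as
$J = E[e(n) e^*(n)] = E[|e(n)|^2]$.
Substituting $e(n)$ and manipulating the expression, we get
$J = \sigma_d^2 - \sum_{k=0}^{M-1} w_k^* p(-k) - \sum_{k=0}^{M-1} w_k p^*(-k) + \sum_{k=0}^{M-1} \sum_{i=0}^{M-1} w_k^* w_i r(i-k)$
where $\sigma_d^2 = E[|d(n)|^2]$, $p(-k) = E[u(n-k) d^*(n)]$ and $r(i-k) = E[u(n-k) u^*(n-i)]$.
J is quadratic in w!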

4 Mean Square Error (Revisited)
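For notational simplicity, express the MSE in terms of vectors and matrices:
$J(\mathbf{w}) = \sigma_d^2 - \mathbf{w}^H \mathbf{p} - \mathbf{p}^H \mathbf{w} + \mathbf{w}^H \mathbf{R} \mathbf{w}$
where $\mathbf{w} = [w_0, w_1, \ldots, w_{M-1}]^T$, $\mathbf{u}(n) = [u(n), u(n-1), \ldots, u(n-M+1)]^T$, $\mathbf{p} = E[\mathbf{u}(n) d^*(n)]$ and $\mathbf{R} = E[\mathbf{u}(n) \mathbf{u}^H(n)]$.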

5 Mean Square Error (Revisited)
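We found that the solution (the optimum filter coefficients $\mathbf{w}_o$) is given by the Wiener-Hopf equations
$\mathbf{R} \mathbf{w}_o = \mathbf{p}$, i.e. $\mathbf{w}_o = \mathbf{R}^{-1} \mathbf{p}$.
Inversion of $\mathbf{R}$ can be very costly.
$J(\mathbf{w})$ is quadratic in $\mathbf{w}$ → convex in $\mathbf{w}$ → for $\mathbf{w}_o$, $\nabla J(\mathbf{w}_o) = \mathbf{0}$.
The surface has a single minimum and it is global, so $J(\mathbf{w}_o) \le J(\mathbf{w})$ for all $\mathbf{w}$.
Can we reach $\mathbf{w}_o$, i.e. the minimizer of $J(\mathbf{w})$, with a less demanding algorithm?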
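As a baseline for the iterative method, the direct Wiener-Hopf solution can be sketched for a two-tap, real-valued case. The values of `R`, `p` and `sigma_d2` are illustrative assumptions (they do not come from the slides), and `solve_2x2` and `mse` are hypothetical helper names.

```python
def solve_2x2(R, p):
    """Solve R w = p for a 2x2 matrix R using Cramer's rule."""
    (a, b), (c, d) = R
    det = a * d - b * c
    if det == 0:
        raise ValueError("R is singular")
    return [(d * p[0] - b * p[1]) / det, (a * p[1] - c * p[0]) / det]


def mse(w, R, p, sigma_d2):
    """J(w) = sigma_d^2 - 2 p^T w + w^T R w (real-valued signals assumed)."""
    Rw = [R[0][0] * w[0] + R[0][1] * w[1], R[1][0] * w[0] + R[1][1] * w[1]]
    return sigma_d2 - 2 * (p[0] * w[0] + p[1] * w[1]) + (w[0] * Rw[0] + w[1] * Rw[1])


# Assumed example statistics, chosen only for illustration.
R = [[1.0, 0.5], [0.5, 1.0]]
p = [0.5, 0.25]
sigma_d2 = 1.0

w_o = solve_2x2(R, p)            # optimum (Wiener) weights
J_min = mse(w_o, R, p, sigma_d2)  # minimum mean square error
print(w_o, J_min)
```

For an M-tap filter the same step requires solving an M-by-M linear system, which is the cost that motivates the iterative approach.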

6 Basic Idea of the Method of Steepest Descent
Can we find $\mathbf{w}_o$ in an iterative manner?

7 Basic Idea of the Method of Steepest Descent
Starting from $\mathbf{w}(0)$, generate a sequence $\{\mathbf{w}(n)\}$ with the property
$J(\mathbf{w}(n+1)) < J(\mathbf{w}(n))$.
Many sequences can be found following different rules.
The method of steepest descent generates points using the gradient.
The gradient of J at point $\mathbf{w}$, i.e. $\nabla J(\mathbf{w})$, gives the direction in which the function increases most.
Then $-\nabla J(\mathbf{w})$ gives the direction in which the function decreases most.
Release a tiny ball on the surface of J → it follows the negative gradient of the surface.

8 Basic Idea of the Method of Steepest Descent
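For notational simplicity, let $\mathbf{g} = \nabla J(\mathbf{w})$; then, going in the direction given by the negative gradient,
$\mathbf{w}(n+1) = \mathbf{w}(n) - \tfrac{1}{2} \mu \mathbf{g}(n)$.
How far should we go in the direction $-\mathbf{g}$? → defined by the step size parameter $\mu$.
The optimum step size can be obtained by line search, which is difficult.
Generally a constant step size is taken for simplicity.
Then, at each step, the improvement in J is (from the Taylor series expansion)
$J(\mathbf{w}(n+1)) \approx J(\mathbf{w}(n)) - \tfrac{1}{2} \mu \|\mathbf{g}(n)\|^2$.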
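The improvement at each step can be checked with a first-order expansion. The sketch below assumes real-valued weights (so that $\mathbf{g}(n)$ is the ordinary gradient), the update $\mathbf{w}(n+1) = \mathbf{w}(n) - \tfrac{1}{2}\mu\,\mathbf{g}(n)$, and a step size small enough for the first-order term to dominate.

```latex
\begin{align*}
\mathbf{w}(n+1) &= \mathbf{w}(n) + \delta\mathbf{w}(n),
  \qquad \delta\mathbf{w}(n) = -\tfrac{1}{2}\mu\,\mathbf{g}(n) \\
J(\mathbf{w}(n+1)) &\approx J(\mathbf{w}(n)) + \mathbf{g}^{T}(n)\,\delta\mathbf{w}(n) \\
  &= J(\mathbf{w}(n)) - \tfrac{1}{2}\mu\,\|\mathbf{g}(n)\|^{2}
\end{align*}
```

Since $\mu > 0$, the approximate cost decreases at every step unless $\mathbf{g}(n) = \mathbf{0}$.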

9 Application of SD to Wiener Filter
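For $\mathbf{w}(n)$, the update is
$\mathbf{w}(n+1) = \mathbf{w}(n) - \tfrac{1}{2} \mu \nabla J(n)$.
From the theory of the Wiener filter we know that
$\nabla J(n) = -2\mathbf{p} + 2\mathbf{R}\mathbf{w}(n)$.
Then the update equation becomes
$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu [\mathbf{p} - \mathbf{R}\mathbf{w}(n)]$, $n = 0, 1, 2, \ldots$
which defines a feedback connection.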
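A minimal sketch of the steepest-descent recursion, assuming real-valued signals and the update w(n+1) = w(n) + mu [p - R w(n)]. The function name `steepest_descent` and the values of `R`, `p` and `mu` are illustrative assumptions.

```python
def steepest_descent(R, p, mu, n_iter, w0=None):
    """Iterate w(n+1) = w(n) + mu * (p - R w(n)) and return the weight history."""
    M = len(p)
    w = list(w0) if w0 is not None else [0.0] * M
    history = [list(w)]
    for _ in range(n_iter):
        # Matrix-vector product R w(n)
        Rw = [sum(R[i][j] * w[j] for j in range(M)) for i in range(M)]
        # Move along the negative gradient direction (p - R w)
        w = [w[i] + mu * (p[i] - Rw[i]) for i in range(M)]
        history.append(list(w))
    return w, history


# Assumed example statistics; for these values the Wiener solution is [0.5, 0.0].
R = [[1.0, 0.5], [0.5, 1.0]]
p = [0.5, 0.25]

w_final, history = steepest_descent(R, p, mu=0.5, n_iter=200)
print(w_final)
```

Each iteration costs one matrix-vector product, and no inversion of R is needed.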

10 ELE 774 - Adaptive Signal Processing
Convergence Analysis
Feedback → may cause stability problems under certain conditions. Stability depends on the step size $\mu$ and on the autocorrelation matrix $\mathbf{R}$.
Does SD converge? Under which conditions? What is the rate of convergence?
We may use the canonical representation. Let the weight-error vector be
$\mathbf{c}(n) = \mathbf{w}(n) - \mathbf{w}_o$
then the update equation becomes
$\mathbf{c}(n+1) = (\mathbf{I} - \mu \mathbf{R}) \mathbf{c}(n)$.

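11 Convergence Analysis
Let $\mathbf{R} = \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}^H$ be the eigendecomposition of $\mathbf{R}$. Then, for the weight-error vector $\mathbf{c}(n) = \mathbf{w}(n) - \mathbf{w}_o$,
$\mathbf{c}(n+1) = (\mathbf{I} - \mu \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}^H) \mathbf{c}(n)$.
Using $\mathbf{Q}\mathbf{Q}^H = \mathbf{I}$,
$\mathbf{Q}^H \mathbf{c}(n+1) = (\mathbf{I} - \mu \boldsymbol{\Lambda}) \mathbf{Q}^H \mathbf{c}(n)$.
Apply the change of coordinates
$\mathbf{v}(n) = \mathbf{Q}^H \mathbf{c}(n) = \mathbf{Q}^H [\mathbf{w}(n) - \mathbf{w}_o]$.
Then the update equation becomes
$\mathbf{v}(n+1) = (\mathbf{I} - \mu \boldsymbol{\Lambda}) \mathbf{v}(n)$.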

12 Convergence Analysis
We know that $\boldsymbol{\Lambda}$ is diagonal, so the k-th natural mode is
$v_k(n+1) = (1 - \mu \lambda_k) v_k(n)$, $k = 1, 2, \ldots, M$
or, with the initial values $v_k(0)$, we have
$v_k(n) = (1 - \mu \lambda_k)^n v_k(0)$.
Note the geometric series.
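The decoupling into natural modes can be checked numerically. This sketch assumes a real 2x2 symmetric Toeplitz R (illustrative values), for which the eigenvalues are r0 + r1 and r0 - r1 with eigenvectors [1, 1]/sqrt(2) and [1, -1]/sqrt(2). It runs the recursion w(n+1) = w(n) + mu [p - R w(n)] and compares each transformed coordinate with (1 - mu lambda_k)^n v_k(0).

```python
import math

# Assumed example statistics, chosen only for illustration.
r0, r1 = 1.0, 0.5
R = [[r0, r1], [r1, r0]]
p = [0.5, 0.25]
w_o = [0.5, 0.0]              # satisfies R w_o = p for the values above

lam = [r0 + r1, r0 - r1]      # eigenvalues of R
s = 1.0 / math.sqrt(2.0)
q = [[s, s], [s, -s]]         # q[k] is the k-th orthonormal eigenvector


def to_modes(w):
    """v = Q^T (w - w_o): the weight error expressed along the eigenvectors."""
    c = [w[0] - w_o[0], w[1] - w_o[1]]
    return [q[k][0] * c[0] + q[k][1] * c[1] for k in range(2)]


mu = 0.4
w = [0.0, 0.0]
v0 = to_modes(w)
max_dev = 0.0
for n in range(1, 31):
    Rw = [R[0][0] * w[0] + R[0][1] * w[1], R[1][0] * w[0] + R[1][1] * w[1]]
    w = [w[0] + mu * (p[0] - Rw[0]), w[1] + mu * (p[1] - Rw[1])]
    v = to_modes(w)
    for k in range(2):
        predicted = (1.0 - mu * lam[k]) ** n * v0[k]
        max_dev = max(max_dev, abs(v[k] - predicted))

print(max_dev)  # largest gap between simulated and predicted modes
```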

13 Convergence Analysis
Obviously, for stability
$-1 < 1 - \mu \lambda_k < 1$ for all k
or, simply,
$0 < \mu < \frac{2}{\lambda_{max}}$.
The geometric series results in an exponentially decaying curve with time constant $\tau_k$, where, letting
$1 - \mu \lambda_k = \exp(-1/\tau_k)$
or
$\tau_k = \frac{-1}{\ln(1 - \mu \lambda_k)}$.
Why?
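The stability bound and the time constants can be evaluated directly. This is a sketch under stated assumptions: the eigenvalues and step size are illustrative, `mu_max` and `time_constant` are hypothetical helper names, and the time-constant formula is applied only where 0 < mu * lambda_k < 1 so that the logarithm is real.

```python
import math


def mu_max(eigenvalues):
    """Upper bound on the step size: 0 < mu < 2 / lambda_max."""
    return 2.0 / max(eigenvalues)


def time_constant(mu, lam):
    """tau_k = -1 / ln(1 - mu * lam), evaluated for 0 < mu * lam < 1."""
    x = mu * lam
    if not 0.0 < x < 1.0:
        raise ValueError("this sketch requires 0 < mu * lam < 1")
    return -1.0 / math.log(1.0 - x)


eigenvalues = [1.5, 0.5]   # assumed eigenvalues of R
mu = 0.01                  # assumed small step size

exact = [time_constant(mu, lam) for lam in eigenvalues]
approx = [1.0 / (mu * lam) for lam in eigenvalues]   # small-step approximation
print(mu_max(eigenvalues), exact, approx)
```

The smallest eigenvalue gives the largest time constant, so it sets the slowest mode.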

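14 Convergence Analysis
We have
$\mathbf{v}(n) = \mathbf{Q}^H [\mathbf{w}(n) - \mathbf{w}_o]$
but we know that $\mathbf{Q}$ is composed of the eigenvectors $\mathbf{q}_k$ of $\mathbf{R}$, then
$\mathbf{w}(n) = \mathbf{w}_o + \mathbf{Q}\mathbf{v}(n) = \mathbf{w}_o + \sum_{k=1}^{M} \mathbf{q}_k v_k(n)$
or
$w_i(n) = w_{oi} + \sum_{k=1}^{M} q_{ki} v_k(0) (1 - \mu \lambda_k)^n$
where $q_{ki}$ is the i-th element of $\mathbf{q}_k$.
Each filter coefficient decays exponentially. The overall rate of convergence is limited by the slowest and fastest modes; then the overall time constant $\tau_a$ satisfies
$\frac{-1}{\ln(1 - \mu \lambda_{max})} \le \tau_a \le \frac{-1}{\ln(1 - \mu \lambda_{min})}$.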

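15 Convergence Analysis
For small step size ($\mu \lambda_k \ll 1$), $\tau_k \approx \frac{1}{\mu \lambda_k}$, so the overall time constant lies between $\frac{1}{\mu \lambda_{max}}$ and $\frac{1}{\mu \lambda_{min}}$.
What is $\mathbf{v}(0)$? The initial value $\mathbf{v}(0)$ is
$\mathbf{v}(0) = \mathbf{Q}^H [\mathbf{w}(0) - \mathbf{w}_o]$.
For simplicity assume that $\mathbf{w}(0) = \mathbf{0}$, then
$\mathbf{v}(0) = -\mathbf{Q}^H \mathbf{w}_o$.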

16 Convergence Analysis
Transient behaviour: from the canonical form we know that
$J(n) = J_{min} + \sum_{k=1}^{M} \lambda_k |v_k(n)|^2$
then
$J(n) = J_{min} + \sum_{k=1}^{M} \lambda_k (1 - \mu \lambda_k)^{2n} |v_k(0)|^2$.
As long as the upper limit on the step size parameter $\mu$ is satisfied,
$\lim_{n \to \infty} J(n) = J_{min}$
regardless of the initial point.

17 Convergence Analysis
The progress of J(n) for n = 0, 1, ... is called the learning curve. The learning curve of the steepest-descent algorithm consists of a sum of exponentials, each of which corresponds to a natural mode of the problem. The number of natural modes equals the number of filter taps.
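The learning curve can be sketched from the closed form J(n) = J_min + sum_k lambda_k (1 - mu lambda_k)^(2n) v_k(0)^2, assuming real-valued signals and w(0) = 0. All numerical values are illustrative assumptions, and `learning_curve` is a hypothetical helper name.

```python
import math

# Assumed example statistics for a two-tap filter (illustrative only).
lam = [1.5, 0.5]               # eigenvalues of R = [[1.0, 0.5], [0.5, 1.0]]
p = [0.5, 0.25]
w_o = [0.5, 0.0]               # Wiener solution for this R and p
sigma_d2 = 1.0
J_min = sigma_d2 - (p[0] * w_o[0] + p[1] * w_o[1])

# v(0) = Q^T (w(0) - w_o) with w(0) = 0 and eigenvectors [1, 1]/sqrt(2), [1, -1]/sqrt(2)
s = 1.0 / math.sqrt(2.0)
v0 = [s * (-w_o[0]) + s * (-w_o[1]), s * (-w_o[0]) - s * (-w_o[1])]


def learning_curve(mu, n_iter):
    """Return [J(0), ..., J(n_iter)] as a sum of decaying exponentials."""
    return [
        J_min + sum(lam[k] * (1.0 - mu * lam[k]) ** (2 * n) * v0[k] ** 2 for k in range(2))
        for n in range(n_iter + 1)
    ]


curve = learning_curve(mu=0.4, n_iter=50)
print(curve[0], curve[-1])
```

With w(0) = 0 the curve starts at sigma_d2, since the filter output is zero and the error equals the desired response.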

18 Example
A predictor with 2 taps ($w_1(n)$ and $w_2(n)$) is used to find the parameters of the AR process
$u(n) + a_1 u(n-1) + a_2 u(n-2) = v(n)$
where $v(n)$ is white noise with variance $\sigma_v^2$.
Examine the transient behaviour for
fixed step size, varying eigenvalue spread;
fixed eigenvalue spread, varying step size.
$\sigma_v^2$ is adjusted so that $\sigma_u^2 = 1$.

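19 Example
The AR process: $u(n) + a_1 u(n-1) + a_2 u(n-2) = v(n)$
Two eigenmodes:
$\lambda_1 = \left(1 - \frac{a_1}{1 + a_2}\right) \sigma_u^2$, $\lambda_2 = \left(1 + \frac{a_1}{1 + a_2}\right) \sigma_u^2$
Condition number:
$\chi(\mathbf{R}) = \frac{\lambda_{max}}{\lambda_{min}}$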
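The eigenvalues and condition number of the two-tap predictor can be computed from the AR coefficients. This sketch assumes a real, stationary AR(2) process u(n) + a1 u(n-1) + a2 u(n-2) = v(n), uses the Yule-Walker relation r(1)/r(0) = -a1/(1 + a2), and takes the coefficient values below purely as an illustration. `ar2_statistics` is a hypothetical helper name.

```python
def ar2_statistics(a1, a2, sigma_u2=1.0):
    """Eigenvalues of the 2x2 correlation matrix, condition number and noise variance."""
    rho = -a1 / (1.0 + a2)               # normalized correlation r(1) / r(0)
    lam_a = (1.0 + rho) * sigma_u2       # eigenvalue r(0) + r(1)
    lam_b = (1.0 - rho) * sigma_u2       # eigenvalue r(0) - r(1)
    lam_max, lam_min = max(lam_a, lam_b), min(lam_a, lam_b)
    chi = lam_max / lam_min              # eigenvalue spread (condition number)
    # Noise variance that yields the requested process variance sigma_u2
    sigma_v2 = sigma_u2 * (1.0 - a2) * ((1.0 + a2) ** 2 - a1 ** 2) / (1.0 + a2)
    return lam_max, lam_min, chi, sigma_v2


# Assumed coefficients, chosen only for illustration.
lam_max, lam_min, chi, sigma_v2 = ar2_statistics(a1=-0.195, a2=0.95)
print(lam_max, lam_min, chi, sigma_v2)
```

Because the eigenvalues sum to 2 * sigma_u2, fixing sigma_u2 = 1 means the spread is controlled entirely by a1 / (1 + a2).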

20 Example (Experiment 1)
Experiment 1: keep the step size $\mu$ fixed and change the eigenvalue spread.

23 Example (Experiment 2)
Experiment 2: keep the eigenvalue spread fixed and change the step size $\mu$ ($\mu_{max} = 1.1$).

25 Example (Experiment 2)
Depending on the value of $\mu$, the learning curve can be
overdamped: moves smoothly to the minimum ((very) small $\mu$);
underdamped: oscillates towards the minimum (large $\mu < \mu_{max}$);
critically damped.
Generally the rate of convergence is slow for the first two.
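One way to see the three regimes is through the sign of the per-mode factor 1 - mu * lambda_k. The sketch below is a simplification that classifies a single mode rather than the whole learning curve. The function name `classify_mode`, the tolerance and the example eigenvalue are assumptions made for illustration.

```python
def classify_mode(mu, lam, tol=1e-12):
    """Classify one natural mode from its geometric factor 1 - mu * lam."""
    factor = 1.0 - mu * lam
    if abs(factor) >= 1.0:
        return "not convergent"          # outside 0 < mu < 2 / lam
    if abs(factor) <= tol:
        return "critically damped"       # mode vanishes after one step
    if factor > 0.0:
        return "overdamped"              # v_k(n) keeps its sign and decays smoothly
    return "underdamped"                 # v_k(n) alternates in sign while decaying


lam = 2.0   # assumed eigenvalue, so the bound is mu < 1.0
results = {mu: classify_mode(mu, lam) for mu in (0.1, 0.5, 0.9, 1.0)}
print(results)
```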

26 Observations
SD is a 'deterministic' algorithm, i.e. we assume that $\mathbf{R}$ and $\mathbf{p}$ are known exactly. In practice they can only be estimated. Sample average? This can have high computational complexity.
SD is a local search algorithm, but for Wiener filtering the cost surface is convex (quadratic), so convergence is guaranteed as long as $\mu < \mu_{max}$ is satisfied.

27 Observations
The origin of SD is the Taylor series expansion (as for many other local search optimization algorithms). Convergence can be very slow. To speed up the process, the second-order term, which involves the Hessian, can also be included, as in Newton's method. This brings high computational complexity (inversion) and numerical stability problems.
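For comparison, a Newton step on the quadratic Wiener cost can be sketched for a real-valued two-tap case, where the gradient is 2 (R w - p) and the Hessian is 2 R. The values of `R`, `p` and the starting point are illustrative assumptions, and `inverse_2x2` and `newton_step` are hypothetical helper names.

```python
def inverse_2x2(A):
    """Inverse of a 2x2 matrix (the costly step for large filters)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def newton_step(w, R, p):
    """One Newton update w - H^{-1} g for J(w) = sigma_d^2 - 2 p^T w + w^T R w."""
    Rw = [R[0][0] * w[0] + R[0][1] * w[1], R[1][0] * w[0] + R[1][1] * w[1]]
    g = [2.0 * (Rw[0] - p[0]), 2.0 * (Rw[1] - p[1])]     # gradient
    H = [[2.0 * R[i][j] for j in range(2)] for i in range(2)]  # Hessian
    H_inv = inverse_2x2(H)
    return [w[i] - (H_inv[i][0] * g[0] + H_inv[i][1] * g[1]) for i in range(2)]


# Assumed example statistics; the Wiener solution for these values is [0.5, 0.0].
R = [[1.0, 0.5], [0.5, 1.0]]
p = [0.5, 0.25]

w_newton = newton_step([3.0, -2.0], R, p)
print(w_newton)
```

On an exactly quadratic cost a single Newton step reaches the minimum from any starting point, at the price of inverting the Hessian.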

