1 Least-Mean-Square Algorithm CS/CMPE 537 – Neural Networks

2 Linear Adaptive Filter
A linear adaptive filter (LAF) performs a linear transformation of a signal according to a performance measure that is minimized or maximized.
The development of LAFs followed the work of Rosenblatt (perceptron) and other early neural network researchers.
LAFs can be considered linear single-layer feedforward neural networks.
The least-mean-square (LMS) algorithm is a popular learning algorithm for LAFs (and for linear single-layer networks).
Wide applicability: signal processing, control.
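As a minimal illustration of the linear single-layer computation described above, the Python sketch below applies a fixed weight vector to one input vector; the particular weights and input values are arbitrary assumptions, not from the slides.

import numpy as np

w = np.array([0.5, -1.0, 2.0])       # filter weights (assumed values)
x = np.array([1.0, 0.2, -0.3])       # one input signal sample (assumed values)
y = w @ x                            # linear transformation: y = sum_k w_k x_k
print(y)                             # 0.5*1.0 - 1.0*0.2 + 2.0*(-0.3) = -0.3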

3 Historical Note
Linear associative memory (early 1970s)
  Function: memory by association
  Type: linear single-layer feedforward network
Perceptron (late 1950s, early 1960s)
  Function: pattern classification
  Type: nonlinear single-layer feedforward network
Linear adaptive filter, or Adaline (1960s)
  Function: adaptive signal processing
  Type: linear single-layer feedforward network

4 Spatial Filter

5 Wiener-Hopf Equations (1)
The goal is to find the optimum weights that minimize the difference between the system output y and some desired response d in the mean-square sense.
System equations:
$y = \sum_{k=1}^{p} w_k x_k$
$e = d - y$
Performance measure, or cost function:
$J = \frac{1}{2} E[e^2]$, where $E$ is the expectation operator.
Find the optimum weights for which $J$ is a minimum.
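To make the filter equations and the cost concrete, the short Python sketch below estimates $J = \frac{1}{2} E[e^2]$ by averaging over sample vectors; the synthetic data, the generating weights, and the use of NumPy are illustrative assumptions, not part of the original slides.

import numpy as np

rng = np.random.default_rng(0)

p = 3                                               # number of taps/weights
X = rng.normal(size=(1000, p))                      # sample input vectors x(n), one per row
w_gen = np.array([0.5, -1.0, 2.0])                  # hypothetical weights generating d
d = X @ w_gen + 0.1 * rng.normal(size=1000)         # desired response with a little noise

w = np.zeros(p)                                     # candidate filter weights
y = X @ w                                           # filter output y(n) = sum_k w_k x_k(n)
e = d - y                                           # error e(n) = d(n) - y(n)
J = 0.5 * np.mean(e ** 2)                           # sample estimate of J = 0.5 E[e^2]
print(J)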

6 Wiener-Hopf Equations (2)
Substituting and simplifying:
$J = \frac{1}{2} E[d^2] - E\left[\sum_{k=1}^{p} w_k x_k d\right] + \frac{1}{2} E\left[\sum_{j=1}^{p} \sum_{k=1}^{p} w_j w_k x_j x_k\right]$
Noting that expectation is a linear operator and the weights are constants:
$J = \frac{1}{2} E[d^2] - \sum_{k=1}^{p} w_k E[x_k d] + \frac{1}{2} \sum_{j=1}^{p} \sum_{k=1}^{p} w_j w_k E[x_j x_k]$
Let $r_d = E[d^2]$, $r_{dx}(k) = E[d x_k]$, and $r_x(j,k) = E[x_j x_k]$. Then
$J = \frac{1}{2} r_d - \sum_{k=1}^{p} w_k r_{dx}(k) + \frac{1}{2} \sum_{j=1}^{p} \sum_{k=1}^{p} w_j w_k r_x(j,k)$
To find the optimum weights, set the gradient to zero:
$\nabla_{w_k} J = \frac{\partial J}{\partial w_k} = -r_{dx}(k) + \sum_{j=1}^{p} w_j r_x(j,k) = 0, \quad k = 1, 2, \dots, p$
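To illustrate the correlation form of $J$, the sketch below estimates $r_d$, $r_{dx}(k)$, and $r_x(j,k)$ from samples and checks that the quadratic expression gives the same value as averaging $\frac{1}{2} e^2$ directly; the synthetic data and the arbitrary weight vector are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(1)
p = 3
X = rng.normal(size=(5000, p))                      # assumed input samples
d = X @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=5000)

r_d = np.mean(d ** 2)                               # r_d = E[d^2]
r_dx = X.T @ d / len(d)                             # r_dx(k) = E[d x_k]
R_x = X.T @ X / len(d)                              # r_x(j,k) = E[x_j x_k]

w = rng.normal(size=p)                              # arbitrary weight vector
J_corr = 0.5 * r_d - w @ r_dx + 0.5 * w @ R_x @ w   # correlation form of J
J_direct = 0.5 * np.mean((d - X @ w) ** 2)          # direct estimate of 0.5 E[e^2]
print(J_corr, J_direct)                             # the two agree on the same samples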

7 Wiener-Hopf Equations (3)
Let $w_{ok}$ be the optimum weights. Then
$\sum_{j=1}^{p} w_{oj} r_x(j,k) = r_{dx}(k), \quad k = 1, 2, \dots, p$
This system of equations is known as the Wiener-Hopf equations. Its solution yields the optimum weights for the Wiener filter (spatial filter).
Solving the Wiener-Hopf equations requires the inverse of the autocorrelation matrix $r_x(j,k)$, which can be computationally expensive.
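A minimal numerical sketch of the Wiener-Hopf solution, reusing correlation estimates like those above; in practice the linear system is solved directly rather than forming the matrix inverse explicitly. The synthetic data and the "unknown" generating weights are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(2)
p = 3
X = rng.normal(size=(5000, p))
w_true = np.array([0.5, -1.0, 2.0])                 # assumed weights generating d
d = X @ w_true + 0.1 * rng.normal(size=5000)

R_x = X.T @ X / len(d)                              # autocorrelation matrix r_x(j,k)
r_dx = X.T @ d / len(d)                             # cross-correlation vector r_dx(k)

w_opt = np.linalg.solve(R_x, r_dx)                  # solve R_x w_o = r_dx (no explicit inverse)
print(w_opt)                                        # close to w_true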

8 Method of Steepest Descent (1)

9 Method of Steepest Descent (2)
Iteratively move in the direction of steepest descent (opposite to the gradient direction) until the minimum is reached, approximately.
Let $w_k(n)$ be the weight at iteration n. The gradient at iteration n is
$\nabla_{w_k} J(n) = -r_{dx}(k) + \sum_{j=1}^{p} w_j(n)\, r_x(j,k)$
The adjustment applied to $w_k(n)$ at iteration n is
$\Delta w_k(n) = -\eta\, \nabla_{w_k} J(n) = \eta \left[ r_{dx}(k) - \sum_{j=1}^{p} w_j(n)\, r_x(j,k) \right]$
where $\eta$ is a positive learning-rate parameter.
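The sketch below runs this steepest-descent recursion on correlation estimates computed from samples; the learning rate, iteration count, and synthetic data are assumed values for illustration.

import numpy as np

rng = np.random.default_rng(3)
p = 3
X = rng.normal(size=(5000, p))
d = X @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=5000)

R_x = X.T @ X / len(d)              # r_x(j,k), assumed known to the method
r_dx = X.T @ d / len(d)             # r_dx(k), assumed known to the method

eta = 0.1                           # positive learning-rate parameter (assumed value)
w = np.zeros(p)
for n in range(200):                # iterate until approximately at the minimum
    grad = -r_dx + R_x @ w          # gradient of J at iteration n
    w = w - eta * grad              # move opposite to the gradient
print(w)                            # approaches the Wiener (optimum) weights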

10 Method of Steepest Descent (3)
The cost function $J(n) = \frac{1}{2} E[e^2(n)]$ is the ensemble average of the squared errors at instant n, drawn from a population of identical filters.
An identical update rule can be derived when the cost function is $J = \frac{1}{2} \sum_{i=1}^{n} e^2(i)$.
The method of steepest descent requires knowledge of the environment: specifically, the terms $r_{dx}(k)$ and $r_x(j,k)$ must be known.
What happens in an unknown environment? Use estimates -> the least-mean-square algorithm.

11 Least-Mean-Square Algorithm (1)
The LMS algorithm is based on instantaneous estimates of $r_x(j,k)$ and $r_{dx}(k)$:
$r'_x(j,k;n) = x_j(n)\, x_k(n)$
$r'_{dx}(k;n) = x_k(n)\, d(n)$
Substituting these estimates, the update rule becomes
$w'_k(n+1) = w'_k(n) + \eta \left[ x_k(n) d(n) - \sum_{j=1}^{p} w'_j(n) x_j(n) x_k(n) \right]$
$w'_k(n+1) = w'_k(n) + \eta \left[ d(n) - \sum_{j=1}^{p} w'_j(n) x_j(n) \right] x_k(n)$
$w'_k(n+1) = w'_k(n) + \eta \left[ d(n) - y(n) \right] x_k(n), \quad k = 1, 2, \dots, p$
This is also known as the delta rule or the Widrow-Hoff rule.
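A minimal LMS sketch in the spirit of the delta rule above: the weights are updated from one sample at a time, with no knowledge of $r_x$ or $r_{dx}$. The learning rate and the synthetic signal are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(4)
p = 3
N = 5000
X = rng.normal(size=(N, p))                         # input samples x(n)
d = X @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=N)

eta = 0.01                                          # learning rate (assumed value)
w = np.zeros(p)                                     # weight estimates w'(n)
for n in range(N):
    y = w @ X[n]                                    # filter output y(n)
    e = d[n] - y                                    # error e(n) = d(n) - y(n)
    w = w + eta * e * X[n]                          # delta / Widrow-Hoff update
print(w)                                            # approaches the Wiener solution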

12 LMS Algorithm (2)

13 LMS vs. Method of Steepest Descent
LMS: can operate in an unknown environment. Steepest descent: cannot operate in an unknown environment ($r_x$ and $r_{dx}$ must be known).
LMS: can operate in stationary and non-stationary environments (optimum seeking and tracking). Steepest descent: can operate in a stationary environment only (no adaptation or tracking).
LMS: minimizes the instantaneous squared error. Steepest descent: minimizes the mean-square error (or the sum of squared errors).
LMS: stochastic. Steepest descent: deterministic.
LMS: approximate. Steepest descent: exact.

14 Adaline (1)

15 Adaline (2)
The Adaline (adaptive linear element) is an adaptive signal-processing / pattern-classification machine that uses the LMS algorithm. It was developed by Widrow and Hoff.
Inputs x are either -1 or +1, the threshold is between 0 and 1, and the output is either -1 or +1.
The LMS algorithm is used to determine the weights. Instead of the output y, the net input u is used in the error computation, i.e., e = d - u (because y is quantized in the Adaline).
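A small Adaline-style sketch following the description above: the error is computed on the net input u rather than on the quantized output. The bipolar toy data (an AND-like mapping), the learning rate, and the explicit bias term are assumptions for illustration.

import numpy as np

# Bipolar training set (assumed toy example): inputs and targets in {-1, +1}
X = np.array([[-1, -1], [-1, +1], [+1, -1], [+1, +1]], dtype=float)
d = np.array([-1, -1, -1, +1], dtype=float)         # AND-like desired response

eta = 0.1                                           # learning rate (assumed value)
w = np.zeros(2)                                     # weights
b = 0.0                                             # bias / threshold term

for epoch in range(50):
    for x, target in zip(X, d):
        u = w @ x + b                               # net input u
        e = target - u                              # error uses u, not the quantized output
        w = w + eta * e * x                         # LMS (Widrow-Hoff) update
        b = b + eta * e

y = np.sign(X @ w + b)                              # quantized output y in {-1, +1}
print(y)                                            # matches d after training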

