Least-Mean-Square Algorithm
CS/CMPE 537 – Neural Networks (Sp 2006-2007) - Asim Karim @ LUMS
Linear Adaptive Filter
- A linear adaptive filter performs a linear transformation of a signal according to a performance measure that is minimized or maximized
- The development of LAFs followed the work of Rosenblatt (perceptron) and other early neural network researchers
- LAFs can be considered linear single-layer feedforward neural networks
- The least-mean-square (LMS) algorithm is a popular learning algorithm for LAFs (and linear single-layer networks)
- Wide applicability: signal processing, control
Historical Note
- Linear associative memory (early 1970s)
  - Function: memory by association
  - Type: linear single-layer feedforward network
- Perceptron (late 1950s, early 1960s)
  - Function: pattern classification
  - Type: nonlinear single-layer feedforward network
- Linear adaptive filter or Adaline (1960s)
  - Function: adaptive signal processing
  - Type: linear single-layer feedforward network
Spatial Filter
Wiener-Hopf Equations (1)
- The goal is to find the optimum weights that minimize the difference between the system output y and some desired response d in the mean-square sense
- System equations:
  y = Σ_{k=1}^{p} w_k x_k
  e = d - y
- Performance measure or cost function:
  J = 0.5 E[e^2], where E is the expectation operator
- Find the optimum weights for which J is a minimum (a numerical estimate of this cost is sketched below)
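The cost on this slide can be estimated directly from data. Below is a minimal Python sketch, assuming sample arrays X (inputs) and d (desired responses) whose names are illustrative rather than taken from the slides; it computes the filter output, the error signal, and a sample estimate of J = 0.5 E[e^2]:

```python
import numpy as np

def mse_cost(w, X, d):
    """Sample estimate of J = 0.5*E[e^2] for a linear filter with weights w.
    X: (N, p) array of input samples, d: (N,) desired responses."""
    y = X @ w                     # filter output y = sum_k w_k x_k for each sample
    e = d - y                     # error signal e = d - y
    return 0.5 * np.mean(e ** 2)  # sample average approximates the expectation
```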
Wiener-Hopf Equations (2)
- Substituting and simplifying:
  J = 0.5 E[d^2] - E[Σ_{k=1}^{p} w_k x_k d] + 0.5 E[Σ_{j=1}^{p} Σ_{k=1}^{p} w_j w_k x_j x_k]
- Noting that expectation is a linear operator and w is a constant:
  J = 0.5 E[d^2] - Σ_{k=1}^{p} w_k E[x_k d] + 0.5 Σ_{j=1}^{p} Σ_{k=1}^{p} w_j w_k E[x_j x_k]
- Let r_d = E[d^2], r_dx(k) = E[d x_k], and r_x(j,k) = E[x_j x_k]. Then
  J = 0.5 r_d - Σ_{k=1}^{p} w_k r_dx(k) + 0.5 Σ_{j=1}^{p} Σ_{k=1}^{p} w_j w_k r_x(j,k)
- To find the optimum weights, set the gradient to zero:
  ∇_{w_k} J = ∂J/∂w_k = -r_dx(k) + Σ_{j=1}^{p} w_j r_x(j,k) = 0, for k = 1, 2, …, p
  (these quantities are computed from sample data in the sketch below)
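The correlations r_d, r_dx, and r_x can likewise be estimated from sample data and plugged into the quadratic form for J and its gradient. A minimal sketch, again with illustrative array names:

```python
import numpy as np

def correlations(X, d):
    """Sample estimates of r_d = E[d^2], r_dx(k) = E[d x_k], r_x(j,k) = E[x_j x_k]."""
    n = len(d)
    r_d = np.mean(d ** 2)
    r_dx = X.T @ d / n
    r_x = X.T @ X / n
    return r_d, r_dx, r_x

def cost_and_gradient(w, r_d, r_dx, r_x):
    """Evaluate J and its gradient from the correlation quantities."""
    J = 0.5 * r_d - w @ r_dx + 0.5 * w @ r_x @ w  # quadratic form for J
    grad = -r_dx + r_x @ w                        # dJ/dw_k for k = 1..p
    return J, grad
```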
Wiener-Hopf Equations (3)
- Let w_{ok} denote the optimum weights. Then
  Σ_{j=1}^{p} w_{oj} r_x(j,k) = r_dx(k), for k = 1, 2, …, p
- This system of equations is known as the Wiener-Hopf equations. Its solution yields the optimum weights of the Wiener filter (spatial filter)
- Solving the Wiener-Hopf equations requires inverting the autocorrelation matrix r_x(j,k), which can be computationally expensive (a direct solve is sketched below)
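As a sketch of what the solution looks like in practice, the snippet below solves the Wiener-Hopf system r_x w_o = r_dx with NumPy's linear solver rather than forming the explicit inverse of r_x, which is cheaper and numerically safer (the function name is illustrative):

```python
import numpy as np

def wiener_solution(r_x, r_dx):
    """Optimum (Wiener) weights w_o satisfying sum_j w_oj * r_x(j,k) = r_dx(k)."""
    return np.linalg.solve(r_x, r_dx)  # solves r_x @ w_o = r_dx without inverting r_x
```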
Method of Steepest Descent (1)
Method of Steepest Descent (2)
- Iteratively move in the direction of steepest descent (opposite the gradient direction) until the minimum is reached approximately
- Let w_k(n) be the weight at iteration n. Then the gradient at iteration n is
  ∇_{w_k} J(n) = -r_dx(k) + Σ_{j=1}^{p} w_j(n) r_x(j,k)
- The adjustment applied to w_k(n) at iteration n is
  Δw_k(n) = -η ∇_{w_k} J(n), i.e., w_k(n+1) = w_k(n) + η[r_dx(k) - Σ_{j=1}^{p} w_j(n) r_x(j,k)]
  where η is a positive learning-rate parameter (the full iteration is sketched below)
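A minimal sketch of the steepest-descent iteration, assuming the environment's correlations r_dx and r_x are known in advance; the step size eta and the iteration count are illustrative choices, not values from the slides:

```python
import numpy as np

def steepest_descent(r_dx, r_x, eta=0.05, n_iter=1000):
    """Iterate w(n+1) = w(n) - eta * grad J(n) using known correlations."""
    w = np.zeros(len(r_dx))
    for _ in range(n_iter):
        grad = -r_dx + r_x @ w  # gradient of J at the current weights
        w = w - eta * grad      # move opposite the gradient direction
    return w
```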
Method of Steepest Descent (3)
- The cost function J(n) = 0.5 E[e^2(n)] is the ensemble average of all squared errors at instant n, drawn from a population of identical filters
- An identical update rule can be derived when the cost function is J = 0.5 Σ_{i=1}^{n} e^2(i)
- The method of steepest descent requires knowledge of the environment; specifically, the terms r_dx(k) and r_x(j,k) must be known
- What happens in an unknown environment? Use estimates -> the least-mean-square algorithm
Least-Mean-Square Algorithm (1)
- The LMS algorithm is based on instantaneous estimates of r_x(j,k) and r_dx(k):
  r'_x(j,k;n) = x_j(n) x_k(n)
  r'_dx(k;n) = x_k(n) d(n)
- Substituting these estimates, the update rule becomes
  w'_k(n+1) = w'_k(n) + η[x_k(n) d(n) - Σ_{j=1}^{p} w'_j(n) x_j(n) x_k(n)]
  w'_k(n+1) = w'_k(n) + η[d(n) - Σ_{j=1}^{p} w'_j(n) x_j(n)] x_k(n)
  w'_k(n+1) = w'_k(n) + η[d(n) - y(n)] x_k(n), for k = 1, 2, …, p
- This is also known as the delta rule or the Widrow-Hoff rule (see the sketch below)
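A minimal sketch of the resulting LMS (delta / Widrow-Hoff) rule: the weights are updated once per presented sample using only instantaneous quantities, so no correlations need to be precomputed. The array names, eta, and the epoch count are illustrative:

```python
import numpy as np

def lms(X, d, eta=0.01, n_epochs=10):
    """LMS / delta rule: w_k <- w_k + eta * (d(n) - y(n)) * x_k(n), sample by sample."""
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for x_n, d_n in zip(X, d):
            y_n = w @ x_n                     # current filter output
            w = w + eta * (d_n - y_n) * x_n   # instantaneous-gradient update
    return w
```

Unlike the steepest-descent sketch above, the weight trajectory here is stochastic: it depends on the order in which the samples arrive.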
LMS Algorithm (2)
LMS vs. Method of Steepest Descent

| LMS | Steepest Descent |
| --- | --- |
| Can operate in an unknown environment | Cannot operate in an unknown environment (r_x and r_dx must be known) |
| Can operate in stationary and non-stationary environments (optimum seeking and tracking) | Can operate in a stationary environment only (no adaptation or tracking) |
| Minimizes the instantaneous squared error | Minimizes the mean-square error (or sum of squared errors) |
| Stochastic | Deterministic |
| Approximate | Exact |
Adaline (1)
Adaline (2)
- The Adaline (adaptive linear element) is an adaptive signal-processing / pattern-classification machine that uses the LMS algorithm. It was developed by Widrow and Hoff
- The inputs x are either -1 or +1, the threshold is between 0 and 1, and the output is either -1 or +1
- The LMS algorithm is used to determine the weights. Instead of the output y, the net input u is used in the error computation, i.e., e = d - u (because y is quantized in the Adaline); one training step is sketched below
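A hedged sketch of one Adaline training step under the conventions above: learning uses the linear net input u, while the quantized output y is produced only by thresholding u. The function name and the way the threshold is handled are illustrative assumptions:

```python
import numpy as np

def adaline_step(w, theta, x, d, eta=0.1):
    """One Adaline update: the error is e = d - u, not d - y, because y is quantized."""
    u = w @ x                      # linear net input
    y = 1 if u >= theta else -1    # quantized +/-1 output (not used for learning)
    w = w + eta * (d - u) * x      # LMS update driven by e = d - u
    return w, y
```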