Least-Mean-Square Algorithm
CS/CMPE 537 – Neural Networks
Linear Adaptive Filter
A linear adaptive filter performs a linear transformation of a signal, with its parameters adjusted according to a performance measure that is minimized or maximized.
The development of LAFs followed the work of Rosenblatt (perceptron) and other early neural network researchers.
LAFs can be viewed as linear single-layer feedforward neural networks.
The least-mean-square (LMS) algorithm is a popular learning algorithm for LAFs (and linear single-layer networks).
Wide applicability: signal processing, control.
Historical Note
Linear associative memory (early 1970s): memory by association; a linear single-layer feedforward network.
Perceptron (late 1950s, early 1960s): pattern classification; a nonlinear single-layer feedforward network.
Linear adaptive filter, or Adaline (1960s): adaptive signal processing; a linear single-layer feedforward network.
Spatial Filter
Wiener-Hopf Equations (1)
The goal is to find the optimum weights that minimize the difference between the system output y and some desired response d in the mean-square sense.
System equations:
  y = Σ_{k=1}^p w_k x_k
  e = d - y
Performance measure (cost function):
  J = 0.5 E[e^2], where E is the expectation operator.
Find the optimum weights for which J is a minimum.
Wiener-Hopf Equations (2)
Substituting e = d - y and expanding:
  J = 0.5 E[d^2] - E[Σ_{k=1}^p w_k x_k d] + 0.5 E[Σ_{j=1}^p Σ_{k=1}^p w_j w_k x_j x_k]
Noting that expectation is a linear operator and the weights are constants:
  J = 0.5 E[d^2] - Σ_{k=1}^p w_k E[x_k d] + 0.5 Σ_{j=1}^p Σ_{k=1}^p w_j w_k E[x_j x_k]
Let r_d = E[d^2], r_dx(k) = E[d x_k], and r_x(j,k) = E[x_j x_k]. Then
  J = 0.5 r_d - Σ_{k=1}^p w_k r_dx(k) + 0.5 Σ_{j=1}^p Σ_{k=1}^p w_j w_k r_x(j,k)
To find the optimum weights, set the gradient to zero:
  ∇_{w_k} J = ∂J/∂w_k = -r_dx(k) + Σ_{j=1}^p w_j r_x(j,k) = 0, k = 1, 2, …, p
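For completeness, here is a worked version of the differentiation step that the slide compresses, using the same symbols defined above (this step is filled in here, not spelled out on the slide):

```latex
\frac{\partial J}{\partial w_k}
  = \frac{\partial}{\partial w_k}\!\left[\tfrac{1}{2} r_d
      - \sum_{k'=1}^{p} w_{k'}\, r_{dx}(k')
      + \tfrac{1}{2}\sum_{j=1}^{p}\sum_{k'=1}^{p} w_j w_{k'}\, r_x(j,k')\right]
  = -\,r_{dx}(k) + \sum_{j=1}^{p} w_j\, r_x(j,k)
```

The factor 0.5 on the quadratic term cancels because differentiating with respect to w_k picks up both the j = k and k' = k contributions, and r_x(j,k) = r_x(k,j) by symmetry.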
Wiener-Hopf Equations (3)
Let w_ok denote the optimum weights. Then
  Σ_{j=1}^p w_oj r_x(j,k) = r_dx(k), k = 1, 2, …, p
This system of equations is known as the Wiener-Hopf equations. Its solution yields the optimum weights of the Wiener filter (spatial filter).
Solving the Wiener-Hopf equations requires inverting the autocorrelation matrix r_x(j,k), which can be computationally expensive.
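As a rough numerical illustration (not from the slides), the Wiener-Hopf system can be solved from sample estimates of the correlations; the function and variable names below are illustrative assumptions:

```python
import numpy as np

def wiener_solution(X, d):
    """Estimate the optimum (Wiener) weights from samples.

    X : (n_samples, p) array of input vectors x(n)
    d : (n_samples,) array of desired responses d(n)
    """
    n = X.shape[0]
    R_x = (X.T @ X) / n      # sample estimate of r_x(j, k) = E[x_j x_k]
    r_dx = (X.T @ d) / n     # sample estimate of r_dx(k) = E[d x_k]
    # Solve R_x w_o = r_dx rather than explicitly inverting R_x,
    # which is cheaper and numerically safer.
    return np.linalg.solve(R_x, r_dx)

# Example: recover the weights of a noisy linear system.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))
w_true = np.array([0.5, -1.0, 2.0])
d = X @ w_true + 0.01 * rng.standard_normal(1000)
print(wiener_solution(X, d))   # approximately [0.5, -1.0, 2.0]
```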
Method of Steepest Descent (1)
Method of Steepest Descent (2)
Iteratively move in the direction of steepest descent (opposite the gradient direction) until the minimum is reached approximately.
Let w_k(n) be the weight at iteration n. The gradient at iteration n is
  ∇_{w_k} J(n) = -r_dx(k) + Σ_{j=1}^p w_j(n) r_x(j,k)
The adjustment applied to w_k(n) at iteration n is
  Δw_k(n) = -η ∇_{w_k} J(n) = η[r_dx(k) - Σ_{j=1}^p w_j(n) r_x(j,k)]
where η is a positive learning-rate parameter.
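A minimal sketch of the steepest-descent iteration, assuming the statistics R_x and r_dx are known exactly; the function name, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

def steepest_descent(R_x, r_dx, eta=0.05, n_iters=500):
    """Iterate w(n+1) = w(n) - eta * grad J(n) toward the Wiener solution."""
    p = len(r_dx)
    w = np.zeros(p)
    for _ in range(n_iters):
        grad = -r_dx + R_x @ w   # gradient of J with respect to w
        w = w - eta * grad       # move opposite the gradient direction
    return w
```

For a sufficiently small η (smaller than 2 divided by the largest eigenvalue of R_x), the iteration converges to the same weights that solve the Wiener-Hopf equations.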
Method of Steepest Descent (3)
The cost function J(n) = 0.5 E[e^2(n)] is the ensemble average of the squared errors at instant n, drawn from a population of identical filters.
An identical update rule can be derived when the cost function is J = 0.5 Σ_{i=1}^n e^2(i).
The method of steepest descent requires knowledge of the environment: the terms r_dx(k) and r_x(j,k) must be known.
What happens in an unknown environment? Use instantaneous estimates, which leads to the least-mean-square algorithm.
Least-Mean-Square Algorithm (1)
The LMS algorithm is based on instantaneous estimates of r_x(j,k) and r_dx(k):
  r'_x(j,k; n) = x_j(n) x_k(n)
  r'_dx(k; n) = x_k(n) d(n)
Substituting these estimates, the update rule becomes
  w'_k(n+1) = w'_k(n) + η[x_k(n) d(n) - Σ_{j=1}^p w'_j(n) x_j(n) x_k(n)]
            = w'_k(n) + η[d(n) - Σ_{j=1}^p w'_j(n) x_j(n)] x_k(n)
            = w'_k(n) + η[d(n) - y(n)] x_k(n), k = 1, 2, …, p
This is also known as the delta rule or the Widrow-Hoff rule.
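A minimal sketch of the LMS (delta rule) update above, processing one sample per iteration; the learning-rate value and names are illustrative assumptions:

```python
import numpy as np

def lms(X, d, eta=0.01):
    """Least-mean-square filtering: one weight update per sample.

    X : (n_samples, p) input vectors x(n)
    d : (n_samples,) desired responses d(n)
    Returns the final weight estimates and the per-sample errors.
    """
    n, p = X.shape
    w = np.zeros(p)
    errors = np.empty(n)
    for i in range(n):
        y = w @ X[i]            # filter output y(n)
        e = d[i] - y            # instantaneous error e(n)
        w = w + eta * e * X[i]  # delta / Widrow-Hoff update
        errors[i] = e
    return w, errors
```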
LMS Algorithm (2)
LMS vs. Method of Steepest Descent

| LMS | Steepest Descent |
| --- | --- |
| Can operate in an unknown environment | Cannot operate in an unknown environment (r_x and r_dx must be known) |
| Can operate in stationary and non-stationary environments (optimum seeking and tracking) | Can operate in a stationary environment only (no adaptation or tracking) |
| Minimizes the instantaneous squared error | Minimizes the mean-square error (or sum of squared errors) |
| Stochastic | Deterministic |
| Approximate | Exact |
Adaline (1)
Adaline (2)
The Adaline (adaptive linear element) is an adaptive signal-processing / pattern-classification machine that uses the LMS algorithm. It was developed by Widrow and Hoff.
Inputs x are either -1 or +1, the threshold lies between 0 and 1, and the output is either -1 or +1.
The LMS algorithm is used to determine the weights. Instead of the output y, the net input u is used in the error computation, i.e., e = d - u (because y is quantized in the Adaline).
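A sketch of Adaline training under the description above, with bipolar inputs/targets and the error computed from the net input u; folding the threshold into a bias weight is an implementation assumption for illustration:

```python
import numpy as np

def train_adaline(X, d, eta=0.01, epochs=20):
    """Train an Adaline: inputs and targets are -1/+1, the error uses the net input u."""
    n, p = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])  # append 1 so the last weight acts as the threshold (bias)
    w = np.zeros(p + 1)
    for _ in range(epochs):
        for i in range(n):
            u = w @ Xb[i]             # net input (before quantization)
            e = d[i] - u              # error uses u, not the quantized output y
            w = w + eta * e * Xb[i]   # LMS / Widrow-Hoff update
    return w

def adaline_output(w, x):
    """Quantized Adaline output: +1 if the net input is positive, else -1."""
    u = w @ np.append(x, 1.0)
    return 1 if u > 0 else -1
```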