
1 Chapter 12: Data Analysis by Linear Least Squares. Overview: Formulate the problem as an over-determined linear system of equations; solve the linear system by optimization.

2 Example: fit y = ax + b to m data points. Find the unknown a and b that minimize the sum of squared residuals.

3 If the uncertainty in the data is known, make the fit best where the uncertainty is least.

4 Data fitting formulated as an over-determined linear system: Given a dataset {(t_k, y_k), k = 1,...,m} and functions {f_j(t), j = 1,...,n}, find the linear combination of functions that best represents the data. Let a_kj = f_j(t_k) (the j-th function evaluated at the k-th data point). Let b = [y_1, y_2,...,y_m]^T (column vector of the measured values). Let x = [x_1, x_2,...,x_n]^T (column vector of unknown parameters). If n = m, then Ax = b may have a solution that gives a combination of functions passing through all of the data points. Even if this solution exists, it is probably not the result that we want.
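As a concrete illustration of this setup, here is a minimal MATLAB sketch that builds A and b for a parabola model (basis functions 1, t, t^2, as in the example on the next slide); the data values are placeholders, not the slides' dataset:

% Build the over-determined system Ax = b for a parabola model
% fitted to m data points by least squares.
t = linspace(0, 10, 21)';                        % sample points (placeholder)
y = 2 + 3*t - 0.25*t.^2 + 0.5*randn(size(t));    % synthetic measurements (placeholder)

A = [ones(size(t)), t, t.^2];    % a_kj = f_j(t_k): one column per basis function
b = y;                           % measured values

x = A \ b;                       % backslash returns the least-squares solution when m > n
r = b - A*x;                     % residual vector
fprintf('parameters: %g %g %g   ||r||^2 = %g\n', x(1), x(2), x(3), r'*r);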

5 Usually we are seeking a model of the data as "noise" superimposed on a smooth variation described by a linear combination of the f(t) functions with n << m. Under the condition n << m, Ax = b does not have a solution because the system is "over-determined." To get the "dimensional reduction" we want, find an approximate solution that minimizes a norm of the residual vector r = b – Ax. Choosing the Euclidean norm for minimization gives linear least squares. In this example, the model is a parabola.

6 Let y = Ax (like b, y is a vector with m components); y ∈ span(A) (y is a linear combination of the columns of A). For a given data set, b is a constant vector and ||b – y||_2 = f(y). f(y) is continuous and strictly convex, so f(y) is guaranteed to have a minimum. Find y such that f(y) is a minimum. The vector y ∈ span(A) that is closest to b is unique; however, this unique vector may not correspond to a unique set of unknown parameters x. If Ax_1 = Ax_2 = y, then A(x_2 – x_1) = 0 with z = x_2 – x_1 ≠ 0, so the columns of A must be linearly dependent. This condition is called rank deficiency. Unless otherwise stated, assume A has full rank.

7 Normal equations: Let r = b – Ax and define f(x) = (||r||_2)^2 = r^T r. f(x) = (b – Ax)^T (b – Ax) = b^T b – 2x^T A^T b + x^T A^T A x. A necessary condition for x_0 to be a minimum of f(x) is ∇f(x_0) = 0, where ∇f is an n-vector whose components are the partial derivatives of f(x) with respect to the unknown parameters. ∇f(x) = 2A^T A x – 2A^T b = 0, so the optimal set of parameters is a solution of the n×n symmetric system A^T A x = A^T b, called the "normal" equations of the linear least squares problem.
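A short MATLAB sketch of the normal-equations route for the same parabola model; the Cholesky step anticipates the solution method used for the surveyor's problem on slide 17, and the data are again placeholders:

% Solve the normal equations A'*A*x = A'*y for the parabola model.
t = linspace(0, 10, 21)';
y = 2 + 3*t - 0.25*t.^2 + 0.5*randn(size(t));    % placeholder data
A = [ones(size(t)), t, t.^2];

N = A' * A;                  % n-by-n, symmetric positive definite when A has full rank
R = chol(N);                 % Cholesky factor, N = R'*R
x = R \ (R' \ (A' * y));     % forward substitution, then back substitution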

8

9 Same set of equations as obtained by the objective-function method.

10 Fit a polynomial of degree p–1 to m data points (x_k, y_k).
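A sketch of the general polynomial case in MATLAB; ordering the coefficients from the constant term up is my choice, not something fixed by the slide, and the data are placeholders:

% Least-squares fit of a polynomial of degree p-1 to data (x_k, y_k).
p = 3;                                     % number of coefficients (degree p-1)
x = linspace(0, 10, 21)';                  % placeholder data
y = 2 + 3*x - 0.25*x.^2 + 0.5*randn(size(x));

V = zeros(numel(x), p);                    % Vandermonde-type coefficient matrix
for j = 1:p
    V(:, j) = x.^(j - 1);                  % columns 1, x, x.^2, ..., x.^(p-1)
end
c = (V'*V) \ (V'*y);                       % coefficients, constant term first
% polyfit(x, y, p-1) fits the same polynomial but returns the coefficients
% in the opposite order (highest degree first).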

11 (Figure credit: Lecture Notes for E. Alpaydın, Introduction to Machine Learning, 2e, © 2010 The MIT Press, V1.0.) Given the parameters c_0, c_1, …, c_k that minimize the sum of squared deviations, evaluate the fit at all the data points x_t. r = Y_fit – y are the residuals at the data points, and ||r||_2^2 = r^T r is the sum of squared residuals.

12 Matrix formulation of weighted linear least squares: Let σ_y be a column vector of uncertainties in the measured values. w = 1./σ_y is a column vector of the square roots of the weights. W = diag(w) is a diagonal matrix. V = coefficient matrix of the un-weighted least-squares problem (a Vandermonde matrix in the case of polynomial fitting). y = column vector of observations. A = W V = weighted coefficient matrix. b = W y = weighted column vector of measured values. Ax = b is the over-determined linear system for weighted linear least squares. The normal equations are A^T A x = (WV)^T WV x = A^T b = (WV)^T W y (note that w = 1./σ_y gets squared at this point).
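A MATLAB sketch of this formulation, using the dataset and 10% uncertainties from Assignment 18 below; it follows the slide's W, V, A, b notation but is my own arrangement rather than the slides' code:

% Weighted parabola fit: the fit is pulled hardest where the uncertainty is smallest.
t  = linspace(0, 10, 21)';
y  = [2.9 2.7 4.8 5.3 7.1 7.6 7.7 7.6 9.4 9 9.6 10 10.2 9.7 8.3 8.4 9 8.3 6.6 6.7 4.1]';
dy = 0.10 * y;                      % assumed 10% uncertainties (as in Assignment 18)

w = 1 ./ dy;                        % square roots of the weights
W = diag(w);
V = [ones(size(t)), t, t.^2];       % un-weighted (Vandermonde) coefficient matrix
A = W * V;                          % weighted coefficient matrix
b = W * y;                          % weighted observations

x    = (A'*A) \ (A'*b);             % normal equations for the weighted problem
yfit = V * x;                       % evaluate the fit at the data points
errorbar(t, y, dy, '*'); hold on; plot(t, yfit, '-'); hold off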

13 Weighted parabola fit of data from the text. See the next slide for the graph.

14 Weighted quadratic fit to the data on text p. 495. The uncertainties were chosen to show how a small uncertainty draws the fit toward the data point at 95.

15 Assignment 18, due 4/19/16: Use the normal equations to fit a parabola to the data set t = linspace(0,10,21), y = [2.9, 2.7, 4.8, 5.3, 7.1, 7.6, 7.7, 7.6, 9.4, 9, 9.6, 10, 10.2, 9.7, 8.3, 8.4, 9, 8.3, 6.6, 6.7, 4.1], with weights that are the reciprocal of the square of the uncertainty in y, which is 10% of the value of y. Plot the data with error bars and the fit on the same set of axes. Use MATLAB's errorbar(t,y,dy,'*') function. Show the optimum values of the parameters and calculate the sum of squared deviations between fit and data.

16 Linear models with different types of data: Estimate the heights of 3 hills by combining 2 types of measurements: heights relative to a fixed reference, and heights relative to each other. Heights of the 3 hills relative to a fixed reference: hill 1 = 1237 m, hill 2 = 1941 m, hill 3 = 2417 m. Relative heights of the hills: hill 2 relative to 1 = 711 m, hill 3 relative to 1 = 1177 m, hill 3 relative to 2 = 475 m. Construct a linear model of the data with the heights of the hills relative to the fixed reference as the parameters.

17 Ax = b, with design matrix A = [1 0 0; 0 1 0; 0 0 1; -1 1 0; -1 0 1; 0 -1 1] and b = [1237, 1941, 2417, 711, 1177, 475]^T (rows ordered as the measurements were listed on the previous slide). Construct the normal equations A^T A x = A^T b and solve by Cholesky factorization; the resulting x matches the values obtained later by QR factorization (slide 32). Note that the measurements of relative hill heights affect the estimates of the heights relative to the fixed reference. Design matrix: how the model parameters relate to the data taken.
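A MATLAB sketch of the surveyor's problem under the row ordering assumed above; the Cholesky step mirrors the slide, and the result can be checked against the QR solution reported on slide 32:

% Surveyor's problem: 6 measurements, 3 unknown hill heights above a fixed reference.
A = [ 1  0  0;       % hill 1 above reference
      0  1  0;       % hill 2 above reference
      0  0  1;       % hill 3 above reference
     -1  1  0;       % hill 2 relative to hill 1
     -1  0  1;       % hill 3 relative to hill 1
      0 -1  1];      % hill 3 relative to hill 2
b = [1237; 1941; 2417; 711; 1177; 475];

N = A' * A;                  % 3-by-3 symmetric positive definite
R = chol(N);                 % Cholesky factor, N = R'*R
x = R \ (R' \ (A' * b));     % forward substitution, then back substitution
% x comes out as [1236; 1943; 2416], consistent with the QR result on slide 32.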

18 Orthogonal transformations: an alternative to normal equations. Using the normal equations A^T A x = A^T b to solve an over-determined linear system Ax = b can have the following numerical problems. (1) Loss of information in the construction of A^T A and A^T b: for a matrix A containing entries ε with 0 < ε < ε_mach^(1/2), the terms 1 + ε² appearing in A^T A round to 1 in floating-point arithmetic, so the computed A^T A loses the information carried by ε and can become exactly singular.

19 (2) The "condition-squaring effect": cond(A^T A) ≈ [cond(A)]², so if cond(A) is large, the normal equations will be very ill-conditioned. To avoid these problems, we need a transformation that converts the m×n matrix A into an n×n upper triangular matrix so that the linear least squares problem can be solved by backward substitution. Gauss elimination cannot be used to develop this transformation because it does not preserve the Euclidean norm. Orthogonal transformations are the alternative.

20 A real square matrix Q is "orthogonal" if Q^T Q = I. Qv is called an orthogonal transformation of v. ||Qv||_2^2 = (Qv)^T(Qv) = v^T Q^T Q v = v^T v = ||v||_2^2, so orthogonal transformations preserve the Euclidean norm.

21 QR factorization in MATLAB: if A is an m×n matrix with m > n, then [Q,R] = qr(A) returns an m×m orthogonal matrix Q and an m×n factor consisting of an n×n upper triangular matrix R with zeros below it, so that A = Q [R; 0].

22 Solution of the linear least squares problem by QR factorization: Given a QR factorization of A, find an approximate solution of the over-determined system Ax = b. Norm of residuals: ||r||_2^2 = ||b – Ax||_2^2 = ||b – Q[R; 0]x||_2^2. Since an orthogonal transformation preserves the Euclidean norm, ||r||_2^2 = ||Q^T r||_2^2 = ||Q^T b – [R; 0]x||_2^2. Write Q^T b as [c_1; c_2], where c_1 is an n-vector and c_2 is an (m–n)-vector. Then ||r||_2^2 = ||[c_1 – Rx; c_2]||_2^2 = ||c_1 – Rx||_2^2 + ||c_2||_2^2. The minimum of ||r||_2^2 is obtained by solving Rx = c_1, an n×n upper triangular system for the parameters of the best fit. The minimum sum of squared residuals is ||c_2||_2^2 = ||Q^T b||_2^2 – ||c_1||_2^2.
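A MATLAB sketch of this procedure applied to the parabola fit of Assignment 19 (data from slide 24); the variable names are mine:

t = linspace(0, 10, 21)';
y = [2.9 2.7 4.8 5.3 7.1 7.6 7.7 7.6 9.4 9 9.6 10 10.2 9.7 8.3 8.4 9 8.3 6.6 6.7 4.1]';
A = [ones(size(t)), t, t.^2];       % parabola design matrix
n = size(A, 2);

[Q, R] = qr(A);                     % full QR: Q is m-by-m, R is m-by-n
c  = Q' * y;                        % c = [c1; c2]
c1 = c(1:n);
c2 = c(n+1:end);

x   = R(1:n, 1:n) \ c1;             % back substitution on the n-by-n triangular block
ssr = norm(c2)^2;                   % minimum sum of squared residuals, ||c2||^2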

23 Reduced QR factorization: Q = [Q_1 Q_2] is an m×m matrix; Q_1 contains the first n columns of Q and Q_2 contains the remaining m – n columns. A = Q[R; 0] = [Q_1 Q_2][R; 0] = Q_1 R. A = Q_1 R is the reduced QR factorization of A, obtained in MATLAB by [Q,R] = qr(A,0). Given A = Q_1 R, we can solve Ax = b by back substitution because Q_1^T A x = Q_1^T Q_1 R x = R x = Q_1^T b = c_1. The reduced QR factorization does not provide the c_2 needed to calculate the minimum sum of squared residuals.
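The same fit via the reduced factorization (a continuation of the sketch above; t, y, and A are defined there). Since c_2 is unavailable, the residual norm is computed directly from the fit:

[Q1, R1] = qr(A, 0);         % economy-size QR: Q1 is m-by-n, R1 is n-by-n
x   = R1 \ (Q1' * y);        % back substitution on R1
ssr = norm(y - A*x)^2;       % residuals computed directly, since c2 is not returned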

24 Assignment 19, due 4/21/16: Use QR factorization to fit a parabola to the data set t = linspace(0,10,21), y = [2.9, 2.7, 4.8, 5.3, 7.1, 7.6, 7.7, 7.6, 9.4, 9, 9.6, 10, 10.2, 9.7, 8.3, 8.4, 9, 8.3, 6.6, 6.7, 4.1]. Note: same data as Assignment 18 but without the dy's. Plot the data and the fit on the same set of axes. Show the optimum values of the parameters. Calculate the sum of squared deviations between fit and data directly from the QR factorization.

25 Orthogonalization methods: Like Gauss elimination, QR factorization introduces zeros into A to produce an upper triangular form, but it uses orthogonal transformations rather than elementary elimination matrices so that the Euclidean norm is preserved. Three commonly used procedures are (1) Householder transformations (elementary reflectors), (2) Givens transformations (plane rotations), and (3) Gram–Schmidt orthogonalization.

26 Householder transformations: H = I – 2vv^T/(v^T v), where v is a non-zero vector chosen so that HA introduces zeros below the diagonal of some column of the matrix A. H is both orthogonal and symmetric (i.e. H = H^T = H^{-1}). Given an m-vector a, how do we find v such that Ha = [α, 0, …, 0]^T = α e_1? From α e_1 = Ha = (I – 2vv^T/(v^T v))a = a – 2v(v^T a)/(v^T v) we get v = (v^T v)/(2(v^T a)) (a – α e_1) = c(||v||)(a – α e_1). Since any non-zero scaling of v gives the same H, take c(||v||) = 1, so v = a – α e_1. Ha must have the same Euclidean norm as a, so α = ±||a||_2. Since v_1 = a_1 – α, to avoid loss of significance choose α = –sign(a_1)||a||_2 (i.e. if a_1 is positive then α is negative).

27 Example: a is a 3-vector with ||a||_2 = 3 and a_1 positive, so α = –3 and v = a – α e_1 = a + 3e_1. Here v^T a = 15 and v^T v = 30, so 2(v^T a)/(v^T v) = 1 and Ha = a – 2v(v^T a)/(v^T v) = a – v = –3 e_1. This is the expected result for ||a||_2 = 3 with a_1 positive. Note: all the work was done with the vectors a and v; there is no need to construct H. Given ||a||_2 and sign(a_1) we can construct v and Ha.
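A MATLAB sketch of this example; the specific vector a = [2; 1; 2] is an assumption chosen to be consistent with the quantities on the slide (||a||_2 = 3, v^T a = 15, v^T v = 30), not necessarily the vector used there:

a = [2; 1; 2];                          % assumed example vector with ||a|| = 3, a(1) > 0
alpha = -sign(a(1)) * norm(a);          % alpha = -3, sign chosen to avoid cancellation
v  = a - alpha * [1; 0; 0];             % Householder vector, v = a + 3*e1 = [5; 1; 2]
Ha = a - 2 * v * (v' * a) / (v' * v);   % apply H = I - 2*v*v'/(v'*v) without forming H
% Ha evaluates to [-3; 0; 0] = alpha*e1, with the same Euclidean norm as a.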

28 A similar approach works for annihilating other components of an m-vector a. Partition a = [a_1; a_2], where a_1 is a (k–1)-vector with 1 < k < m and a_2 holds the remaining components. Take v = [0; a_2] – α e_k, where α = –sign(a_k)||a_2||_2. Then Ha = a – 2v(v^T a)/(v^T v) is an m-vector with a_1 unchanged, zeros below the k-th component, and the same Euclidean norm as a. This method can be applied successively to the columns of an m×n matrix A. Again, given ||a_2||_2 and sign(a_k) we can construct v and Ha.

29 QR factorization applied to the over-determined surveyor's problem Ax = b (the same A and b as on slide 17). For the first column, ||a_1||_2 = sqrt(3) = 1.7321 and its first component is positive.

30 We know H_1 a_1 = α e_1, but we must also apply H_1 to the 2nd and 3rd columns of A and to b.

31 Transform column 2 of H_1 A.

32 Transform column 3 of H_2 H_1 A. Solve the upper triangular system Rx = c_1 by back substitution: x = [1236, 1943, 2416]^T. Sum of squared residuals = ||c_2||_2^2 = 35.

33 Rank deficiency: Let Ax = b be an m×n least-squares problem with m > n. In general b ∉ span(A) because the system is over-determined. Consider all y ∈ span(A), i.e. y = Ax. r(y) = b – y are the residuals associated with approximate solutions of Ax = b. A unique y_min exists for which ||r(y)||_2 is smallest; however, the solution of Ax = y_min may not be unique. If x_1 and x_2 exist such that Ax_1 = Ax_2 = y_min, then z = x_2 – x_1 ≠ 0 and Az = 0, so the columns of A must be linearly dependent. This condition is called rank deficiency.

34 Consequences of rank deficiency for the linear least squares problem: A^T A will be singular and the normal-equation method will fail. A QR factorization is possible, but R will be singular; a zero on the diagonal prevents solution by back substitution. The linear least squares problem as formulated does not have a unique solution (the design matrix is flawed).

35 Clever surveyors get better statistics. Experimental design: Estimate the heights of 3 hills by combining 2 types of measurements: heights relative to a fixed reference, and heights relative to each other. Heights of the 3 hills relative to a fixed reference: hill 1 = 1237 m, hill 2 = 1941 m, hill 3 = 2417 m. Relative heights of the hills: hill 2 relative to 1 = 711 m, hill 3 relative to 1 = 1177 m, hill 3 relative to 2 = 475 m. Construct a linear model of the data with the heights of the hills relative to the fixed reference as the parameters. All 3 parameters can be determined.

36 Example of a rank-deficient linear least squares problem: An incompetent assistant loses the data on the heights of the hills above the fixed reference but continues to model the data in terms of heights above the fixed reference. Ax = b, with A = [-1 1 0; -1 0 1; 0 -1 1] and b = [711, 1177, 475]^T. The flawed design matrix A is singular (all row sums are zero), so a unique solution does not exist; the chosen parameters are not all "identifiable" from the acquired data. Obviously, we cannot estimate the heights of the hills relative to a fixed reference from their relative heights only.

37 Az = b, with the new design matrix A = [1 0; 0 1; -1 1] and b = [711, 1177, 475]^T, where z_1 = x_2 – x_1 = 711 is the height of hill 2 relative to hill 1, z_2 = x_3 – x_1 = 1177 is the height of hill 3 relative to hill 1, and z_2 – z_1 = x_3 – x_2 = 475 is the height of hill 3 relative to hill 2. This A has full rank, so the normal equations can be used to find the optimal z_1 and z_2. Solution: z_1 = 708, z_2 = 1180. Note that the data on hill 3 relative to hill 2 did influence the model parameters. The problem has been reformulated with the height of hill 1 as the reference point.

38 Alternative solution by QR factorization of the original design matrix: The problem is reduced to one where the number of parameters equals rank(A) = 2. Solve R_1 x = c_1 by back substitution. Solution: x_1 = –1180 (height of hill 1 relative to hill 3), x_2 = –472 (height of hill 2 relative to hill 3). The zero on the diagonal of R determined which hill would be treated as the reference.

39 Singular Value Decomposition: A = USV^T is the singular value decomposition of the m×n matrix A. U is an m×m orthogonal matrix. V is an n×n orthogonal matrix. S is an m×n diagonal matrix with diagonal elements σ_i ≥ 0 that are called the "singular values" of A. The columns of U are called the "left singular vectors" of A. The columns of V are called the "right singular vectors" of A. SVD in MATLAB: [U, S, V] = svd(A).

40 The singular values of A are σ_1 = 25.5, σ_2 = 1.29, and σ_3 = 0. Rank of A = number of non-zero singular values = 2 in this case. In floating-point arithmetic, small computed singular values may correspond to zeros in exact calculations. The usual procedure is to sort the singular values in decreasing order and regard singular values below some threshold as zero. This defines the "numerical" rank of a matrix.
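A sketch of numerical-rank determination in MATLAB; the matrix here is a placeholder (the slide's example matrix is shown only as an image), and the threshold follows the convention used by MATLAB's rank(), which is an assumption rather than something stated on the slide:

A = [1 2 3; 4 5 6; 7 8 9; 10 11 12];      % placeholder rank-2 matrix
s = svd(A);                                % singular values, sorted in decreasing order
tol = max(size(A)) * eps(max(s));          % threshold below which values are treated as zero
numRank = sum(s > tol);                    % numerical rank = count of values above threshold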

41 Solution of linear systems by SVD: Ax = b is a linear system with A rectangular or square and possibly rank deficient. Σ is the diagonal matrix whose diagonal elements are the singular values of A. Σ^+ is a diagonal matrix with diagonal elements 1/σ if σ ≠ 0 and 0 if σ = 0; note that Σ Σ^+ ≠ I in general. Ax = U Σ V^T x = b gives x = V Σ^+ U^T b. The rows of U^T and columns of V in the product V Σ^+ U^T b are restricted by the zeros in Σ^+ to those associated with non-zero singular values, so x is the sum of v_i (u_i^T b / σ_i) over the non-zero σ_i, where u_i and v_i are the left and right singular vectors associated with σ_i.
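A MATLAB sketch of this pseudo-inverse solution, with a placeholder rank-deficient matrix and the b = [1 2 3 4]^T from the next slide; the thresholding mirrors the numerical-rank idea above, and pinv is MATLAB's built-in Moore–Penrose pseudo-inverse:

A = [1 2 3; 4 5 6; 7 8 9; 10 11 12];      % placeholder rank-deficient matrix
b = [1; 2; 3; 4];

[U, S, V] = svd(A);
s = diag(S);
tol = max(size(A)) * eps(max(s));          % singular values below tol treated as zero

Splus = zeros(size(A'));                   % n-by-m pseudo-inverse of S
for i = 1:numel(s)
    if s(i) > tol
        Splus(i, i) = 1 / s(i);
    end
end

x = V * Splus * U' * b;                    % minimum-norm least-squares solution
% pinv(A, tol) * b gives the same result with the built-in pseudo-inverse.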

42 Solution of Ax = b with b = [1 2 3 4]^T using exact singular values. Note that the system has 3 unknowns but only 2 non-zero singular values. u_1^T b/σ_1 = 0.2148 and u_2^T b/σ_2 = 0.2155, so x = 0.2148 v_1 + 0.2155 v_2.

43 Solution of linear systems when the singular values are not exact: Same linear system as before. Display MATLAB's SVD. Specify a threshold for non-zero singular values. Solve the linear system with this threshold. Solve the linear system using the Moore–Penrose pseudo-inverse, x = V Σ^+ U^T b.

44 Solution of linear systems when the singular values are not exact (the slide shows the numerical U, Σ, and V from MATLAB; σ_3 is zero in exact calculations, but in floating point all 3 computed singular values exceed the chosen threshold). With that threshold the solution differs from the exact result because the threshold is too small. The solution using the Moore–Penrose pseudo-inverse is the same as the exact result.

45 Assignment 20, due 4/26/16: Use singular value decomposition to fit a parabola to the data set on page 495 of Cheney & Kincaid, 6th edition: surface tension as a function of temperature. T = 0, 10, 20, 30, 40, 80, 90, 95; S = 68.0, 67.1, 66.4, 65.6, 64.6, 61.8, 61.0, 60.0. Show the optimum values of the parameters and the minimum sum of squared deviations of the fit from the data points. Plot the fit and data (no error bars) on the same set of axes. Use the Moore–Penrose pseudo-inverse to solve for the unknown parameters.

