Lecture XXVII
Orthonormal Bases and Projections
Suppose that a set of vectors $\{x_1, \dots, x_r\}$ forms a basis for some space $S$ in $R^m$, where $r \le m$. For mathematical simplicity, we may want to form an orthonormal basis for this space. One way to form such a basis is Gram-Schmidt orthonormalization. In this procedure, we generate a new set of mutually orthogonal vectors $\{y_1, \dots, y_r\}$ that span the same space and can then be scaled to unit length.
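The Gram-Schmidt process is:
$$y_1 = x_1$$
$$y_2 = x_2 - \frac{x_2' y_1}{y_1' y_1} y_1$$
$$\vdots$$
$$y_r = x_r - \frac{x_r' y_1}{y_1' y_1} y_1 - \cdots - \frac{x_r' y_{r-1}}{y_{r-1}' y_{r-1}} y_{r-1}$$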
Example
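The vectors can then be normalized to unit length: $z_i = y_i / (y_i' y_i)^{1/2}$. Before normalizing, however, orthogonality can be tested by checking that $y_i' y_j = 0$ for every $i \ne j$.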
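The procedure and the orthogonality test can be tried numerically. The following is a minimal sketch in plain Python, assuming the input vectors are linearly independent; the function names (`dot`, `gram_schmidt`, `normalize`) and the two input vectors are illustrative choices, not taken from the lecture's example.

```python
import math

def dot(a, b):
    """Inner product a'b of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def gram_schmidt(xs):
    """Return orthogonal vectors y_1..y_r spanning the same space as xs.

    Assumes the input vectors are linearly independent.
    """
    ys = []
    for x in xs:
        y = list(x)
        for prev in ys:
            # Subtract the component of x that lies along each earlier y.
            coef = dot(x, prev) / dot(prev, prev)
            y = [yi - coef * pi for yi, pi in zip(y, prev)]
        ys.append(y)
    return ys

def normalize(y):
    """Scale y to unit length."""
    norm = math.sqrt(dot(y, y))
    return [yi / norm for yi in y]

# Hypothetical basis for a two-dimensional subspace of R^3.
xs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
ys = gram_schmidt(xs)
zs = [normalize(y) for y in ys]

print(dot(ys[0], ys[1]))  # orthogonality test: should be (numerically) zero
print(dot(zs[1], zs[1]))  # unit length: should be (numerically) one
```

The sketch uses the classical form of the recursion, which mirrors the formulas directly; for ill-conditioned inputs a modified Gram-Schmidt or a QR routine is usually preferred.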
Theorem 2.13 Every r-dimensional vector space, except the zero-dimensional space {0}, has an orthonormal basis.
Theorem 2.14 Let $\{z_1, \dots, z_r\}$ be an orthonormal basis for some vector space $S$, a subspace of $R^m$. Then each $x \in R^m$ can be expressed uniquely as $x = u + v$, where $u \in S$ and $v$ is a vector that is orthogonal to every vector in $S$.
Definition 2.10 Let $S$ be a vector subspace of $R^m$. The orthogonal complement of $S$, denoted $S^\perp$, is the collection of all vectors in $R^m$ that are orthogonal to every vector in $S$; that is, $S^\perp = \{x : x \in R^m \text{ and } x'y = 0 \text{ for all } y \in S\}$.
Theorem If $S$ is a vector subspace of $R^m$, then its orthogonal complement $S^\perp$ is also a vector subspace of $R^m$.
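Projection Matrices
The orthogonal projection of an $m \times 1$ vector $x$ onto a vector space $S$ can be expressed in matrix form. Let $\{z_1, \dots, z_r\}$ be any orthonormal basis for $S$, while $\{z_1, \dots, z_m\}$ is an orthonormal basis for $R^m$. Any vector $x$ can be written as:
$$x = (\alpha_1 z_1 + \cdots + \alpha_r z_r) + (\alpha_{r+1} z_{r+1} + \cdots + \alpha_m z_m) = u + v$$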
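Aggregating $\alpha = (\boldsymbol{\alpha}_1', \boldsymbol{\alpha}_2')'$, where $\boldsymbol{\alpha}_1 = (\alpha_1 \dots \alpha_r)'$ and $\boldsymbol{\alpha}_2 = (\alpha_{r+1} \dots \alpha_m)'$, and assuming a similar decomposition $Z = [Z_1 \; Z_2]$, the vector $x$ can be written as:
$$x = Z\alpha = Z_1 \boldsymbol{\alpha}_1 + Z_2 \boldsymbol{\alpha}_2$$
Given orthogonality, we know that $Z_1'Z_1 = I_r$ and $Z_1'Z_2 = (0)$, and so
$$Z_1 Z_1' x = Z_1 Z_1' (Z_1 \boldsymbol{\alpha}_1 + Z_2 \boldsymbol{\alpha}_2) = Z_1 \boldsymbol{\alpha}_1 = u$$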
Theorem 2.17 Suppose the columns of the $m \times r$ matrix $Z_1$ form an orthonormal basis for the vector space $S$, which is a subspace of $R^m$. If $x \in R^m$, the orthogonal projection of $x$ onto $S$ is given by $Z_1 Z_1' x$.
Projection matrices allow the division of the space into a spanned space and a set of orthogonal deviations from the spanning set. One such separation involves the Gram-Schmidt system.
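As an illustration of Theorem 2.17, the sketch below computes $u = Z_1 Z_1' x$ and the orthogonal deviation $v = x - u$ in plain Python. The orthonormal basis, the vector `x`, and the helper names `dot` and `project` are assumptions made for this example only.

```python
import math

def dot(a, b):
    """Inner product a'b of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def project(zs, x):
    """Orthogonal projection of x onto span(zs): u = Z_1 Z_1' x.

    zs is a list of orthonormal vectors (the columns of Z_1).
    """
    u = [0.0] * len(x)
    for z in zs:
        coef = dot(z, x)  # one element of Z_1' x
        u = [ui + coef * zi for ui, zi in zip(u, z)]
    return u

# Hypothetical orthonormal basis for a plane in R^3.
s = 1.0 / math.sqrt(2.0)
zs = [[s, s, 0.0], [0.0, 0.0, 1.0]]
x = [3.0, 1.0, 2.0]

u = project(zs, x)                     # component in S
v = [xi - ui for xi, ui in zip(x, u)]  # component orthogonal to S

print(u)
print([dot(z, v) for z in zs])  # each entry should be (numerically) zero
```

Here `u` lies in the spanned space and `v` is the deviation from it, which is the decomposition $x = u + v$ of Theorem 2.14.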
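In general, if we define the $m \times r$ matrix $X_1 = (x_1, \dots, x_r)$ and define the linear transformation of this matrix that produces an orthonormal basis as $A$, so that:
$$Z_1 = X_1 A$$
we are left with the result that:
$$Z_1' Z_1 = A' X_1' X_1 A = I_r$$
Given that the matrix $A$ is nonsingular, $X_1' X_1 = (A')^{-1} A^{-1}$, so $A A' = (X_1' X_1)^{-1}$, and the projection matrix that maps any vector $x$ onto the spanning set then becomes:
$$P_S = Z_1 Z_1' = X_1 A A' X_1' = X_1 (X_1' X_1)^{-1} X_1'$$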
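The claim that $Z_1 Z_1'$ and $X_1 (X_1' X_1)^{-1} X_1'$ are the same projection matrix can be checked on a small case. This is a sketch under stated assumptions: the 3 x 2 matrix `X1` is hypothetical, its orthonormal counterpart `Z1` was worked out by hand with Gram-Schmidt, and the helpers (`transpose`, `matmul`, `inv2`) are ad hoc and only handle the 2 x 2 inverse needed here.

```python
import math

def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Matrix product of two conformable matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inv2(m):
    """Inverse of a 2 x 2 matrix (assumed nonsingular)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical 3 x 2 matrix X_1 whose columns span S.
X1 = [[1.0, 1.0],
      [1.0, 0.0],
      [0.0, 1.0]]

# Projection matrix built directly from X_1.
X1t = transpose(X1)
P_x = matmul(matmul(X1, inv2(matmul(X1t, X1))), X1t)

# Orthonormal basis for the same space, obtained by hand via Gram-Schmidt:
# y_1 = (1, 1, 0)', y_2 = (1/2, -1/2, 1)', each scaled to unit length.
n1, n2 = math.sqrt(2.0), math.sqrt(1.5)
Z1 = [[1.0 / n1, 0.5 / n2],
      [1.0 / n1, -0.5 / n2],
      [0.0, 1.0 / n2]]
P_z = matmul(Z1, transpose(Z1))

# The two projection matrices should agree up to rounding error.
max_diff = max(abs(p - q) for rp, rq in zip(P_x, P_z) for p, q in zip(rp, rq))
print(max_diff)
```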
Ordinary least squares is also a spanning decomposition. In the traditional linear model:
$$y = X\beta + \epsilon$$
$\beta$ is chosen to minimize the squared error between $y$ and the estimated $y$:
$$\min_{\beta} \; (y - X\beta)'(y - X\beta)$$
This problem implies minimizing the distance between the observed $y$ and the predicted plane $X\beta$, which implies orthogonality. If $X$ has full column rank, the projection matrix becomes $X(X'X)^{-1}X'$ and the projection then becomes:
$$X\hat{\beta} = X(X'X)^{-1}X'y$$
Premultiplying each side by $X'$ yields:
$$X'X\hat{\beta} = X'X(X'X)^{-1}X'y = X'y$$
$$\hat{\beta} = (X'X)^{-1}X'y$$
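The least squares solution $\hat{\beta} = (X'X)^{-1}X'y$ and the orthogonality it implies ($X'e = 0$ for the residual vector $e$) can be illustrated with a small data set. The data below are made up for the example, and the helper functions are ad hoc; `inv2` only covers the two-regressor case used here.

```python
def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Matrix product of two conformable matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inv2(m):
    """Inverse of a 2 x 2 matrix (assumed nonsingular)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical data: an intercept column plus one regressor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [[1.0], [3.0], [4.0], [8.0]]

Xt = transpose(X)
# Normal equations: (X'X) beta = X'y, so beta = (X'X)^{-1} X'y.
beta = matmul(inv2(matmul(Xt, X)), matmul(Xt, y))
fitted = matmul(X, beta)
resid = [[yi[0] - fi[0]] for yi, fi in zip(y, fitted)]

print(beta)
print(matmul(Xt, resid))  # X'e should be (numerically) zero
```

Forming the explicit inverse keeps the sketch close to the formula; numerical work would normally solve the normal equations or use a QR decomposition instead.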
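Idempotent matrices can be defined as any matrix $A$ such that $AA = A$. Note that the sum of squared errors under the projection can be expressed as:
$$SSE = (y - X(X'X)^{-1}X'y)'(y - X(X'X)^{-1}X'y) = y'(I_n - X(X'X)^{-1}X')'(I_n - X(X'X)^{-1}X')y$$
The matrix $I_n - X(X'X)^{-1}X'$ is symmetric and idempotent:
$$(I_n - X(X'X)^{-1}X')(I_n - X(X'X)^{-1}X') = I_n - 2X(X'X)^{-1}X' + X(X'X)^{-1}X'X(X'X)^{-1}X' = I_n - X(X'X)^{-1}X'$$
Thus, the SSE can be expressed as:
$$SSE = y'(I_n - X(X'X)^{-1}X')y$$
which is the sum of the squared orthogonal errors from the regression.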
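Both properties can be verified numerically: that $M = I_n - X(X'X)^{-1}X'$ satisfies $MM = M$, and that $y'My$ equals the sum of squared residuals. The sketch below assumes a small made-up data set and ad hoc helper functions; the name `M` for the residual-forming matrix is introduced here for convenience.

```python
def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Matrix product of two conformable matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inv2(m):
    """Inverse of a 2 x 2 matrix (assumed nonsingular)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical data: an intercept column plus one regressor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [[1.0], [3.0], [4.0], [8.0]]
n = len(X)

Xt = transpose(X)
P = matmul(matmul(X, inv2(matmul(Xt, X))), Xt)  # X (X'X)^{-1} X'
M = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(n)] for i in range(n)]

# Idempotency: the largest element of MM - M should be (numerically) zero.
MM = matmul(M, M)
idem_gap = max(abs(MM[i][j] - M[i][j]) for i in range(n) for j in range(n))

# SSE computed two ways: from the residuals e = My and as y'My.
e = matmul(M, y)
sse_resid = sum(ei[0] ** 2 for ei in e)
sse_quad = matmul(transpose(y), matmul(M, y))[0][0]

print(idem_gap, sse_resid, sse_quad)
```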
Eigenvalues and Eigenvectors
Eigenvalues and eigenvectors (or, more appropriately, latent roots and characteristic vectors) are defined by the solution
$$Ax = \lambda x$$
for a nonzero $x$. Mathematically, we can solve for the eigenvalue by rearranging the terms:
$$Ax - \lambda x = 0 \quad\Rightarrow\quad (A - \lambda I)x = 0$$
Solving for $\lambda$ then involves solving the characteristic equation that is implied by:
$$|A - \lambda I| = 0$$
Again using the matrix in the previous example:
In general, there are m roots to the characteristic equation. Some of these roots may be the same. In the above case, the roots are complex. Turning to another example:
The eigenvectors are then determined by the linear dependence in the $A - \lambda I$ matrix. Taking the last example: Obviously, the first and second rows are linearly dependent. The reduced system then implies that as long as $x_1 = x_2$ and $x_3 = 0$, the product $(A - \lambda I)x$ is zero.
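For a 2 x 2 matrix the characteristic equation reduces to the quadratic $\lambda^2 - \mathrm{tr}(A)\lambda + |A| = 0$, so the roots can be computed directly. The sketch below uses two hypothetical matrices (not the ones from the lecture's examples): one with complex roots and one symmetric matrix with real roots. The function name `eigenvalues_2x2` is an illustrative choice.

```python
import cmath

def eigenvalues_2x2(a):
    """Roots of |A - lambda I| = 0 for a 2 x 2 matrix.

    The characteristic equation is lambda^2 - tr(A) lambda + det(A) = 0.
    """
    (a11, a12), (a21, a22) = a
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0

# Hypothetical matrices, not the ones from the lecture examples.
rotation = [[0.0, -1.0], [1.0, 0.0]]   # complex roots
symmetric = [[2.0, 1.0], [1.0, 2.0]]   # real roots

print(eigenvalues_2x2(rotation))
print(eigenvalues_2x2(symmetric))

# Eigenvector check for the symmetric case: (A - 3I)x = 0 when x_1 = x_2.
lam, x = 3.0, [1.0, 1.0]
residual = [sum(symmetric[i][j] * x[j] for j in range(2)) - lam * x[i]
            for i in range(2)]
print(residual)
```

Because `cmath.sqrt` is used, the roots are always returned as complex numbers, with a zero imaginary part when the roots are real.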
Theorem For any symmetric matrix $A$, there exists an orthogonal matrix $H$ (that is, a square matrix satisfying $H'H = I$) such that:
$$H'AH = \Lambda$$
where $\Lambda$ is a diagonal matrix. The diagonal elements of $\Lambda$ are called the characteristic roots (or eigenvalues) of $A$. The $i$th column of $H$ is called the characteristic vector (or eigenvector) of $A$ corresponding to the $i$th characteristic root of $A$.
This proof follows directly from the definition of eigenvalues. Letting $H$ be a matrix with the eigenvectors in its columns, it is obvious that
$$AH = H\Lambda$$
so that, given $H'H = I$, $H'AH = H'H\Lambda = \Lambda$.
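The diagonalization $H'AH = \Lambda$ can be checked on a small symmetric matrix. In this sketch the matrix `A` is hypothetical, and its unit-length eigenvectors (for the roots 3 and 1) were worked out by hand and placed in the columns of `H`; the helper functions are ad hoc.

```python
import math

def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Matrix product of two conformable matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

# Hypothetical symmetric matrix with characteristic roots 3 and 1.
A = [[2.0, 1.0], [1.0, 2.0]]

# Columns of H are unit-length eigenvectors: (1, 1)'/sqrt(2) and (1, -1)'/sqrt(2).
s = 1.0 / math.sqrt(2.0)
H = [[s, s], [s, -s]]
Ht = transpose(H)

HtH = matmul(Ht, H)              # should be close to the identity
Lam = matmul(matmul(Ht, A), H)   # should be close to diag(3, 1)

print(HtH)
print(Lam)
```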
Kronecker Products
Two special matrix operations that you will encounter are the Kronecker product and the vec() operator. The Kronecker product multiplies each element of the first matrix by the entire second matrix:
$$A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix}$$
The vec(.) operator then involves stacking the columns of a matrix on top of one another.
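Both operators are short to write out. The following is a minimal sketch in plain Python, with matrices stored as lists of rows; the function names `kron` and `vec` and the two example matrices are illustrative choices.

```python
def kron(a, b):
    """Kronecker product: each a_ij is replaced by the block a_ij * B."""
    return [
        [aij * bkl for aij in a_row for bkl in b_row]
        for a_row in a
        for b_row in b
    ]

def vec(a):
    """Stack the columns of a on top of one another."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]

# Hypothetical 2 x 2 matrices.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

print(kron(A, B))
print(vec(A))
```

For these inputs the Kronecker product is a 4 x 4 matrix made of the blocks 1B, 2B, 3B, and 4B, and `vec(A)` lists the first column of `A` followed by the second.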