Principal Components: What matters most?
Basic Statistics

Assume x1, x2, …, xn represent a distribution x of samples of a random variable. The expected value or mean of the distribution x, written E{x} or m, is given by

m = E{x} = (x1 + x2 + … + xn)/n

The population variance σ² = E{(x − m)²}, the mean of the squared deviations from the mean, is given by

σ² = [(x1 − m)² + (x2 − m)² + … + (xn − m)²]/n
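As a quick check, here is a minimal Python sketch of both formulas; the sample values are illustrative, not taken from the slides:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # illustrative samples

m = x.sum() / len(x)                  # E{x} = (x1 + x2 + ... + xn)/n
var = ((x - m) ** 2).sum() / len(x)   # population variance: divide by n, not n - 1

print(m, var)                  # 5.0 4.0
print(np.mean(x), np.var(x))   # np.var defaults to the population (1/n) form
```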
Generalization to Vectors

Assume d-dimensional vectors xi = (xi1, …, xid)ᵀ whose components sample random distributions. The mean vector and covariance matrix are

m = E{x} = (x1 + x2 + … + xn)/n

C = E{(x − m)(x − m)ᵀ} = [(x1 − m)(x1 − m)ᵀ + … + (xn − m)(xn − m)ᵀ]/n
Interpretation

The mean vector is d-dimensional. The covariance matrix is d × d, symmetric, and real valued; its main diagonal entries are the variances of the d dimensions, and the off-diagonal entry in row i, column j is the covariance of dimensions i and j.
Example: Finding a Covariance Matrix
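A small illustrative computation of a covariance matrix, assuming made-up data (n = 4 samples in d = 3 dimensions) rather than the slides' original numbers:

```python
import numpy as np

# Illustrative data: n = 4 samples of d = 3 dimensional vectors (columns are x_i).
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 2.0]])

n = X.shape[1]
m = X.mean(axis=1, keepdims=True)   # d-dimensional mean vector
D = X - m                           # deviations from the mean
C = (D @ D.T) / n                   # C = (1/n) * sum (x_i - m)(x_i - m)^T

print(m.ravel())
print(C)                            # d x d, symmetric, variances on the diagonal
# np.cov(X, bias=True) computes the same population covariance matrix.
```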
Eigenvectors and Eigenvalues

Suppose A is a d × d matrix, λ is a scalar, and e ≠ 0 is a d-dimensional vector. If Ae = λe, then e is an eigenvector of A corresponding to the eigenvalue λ. If A is real and symmetric, one can always find d unit-length, mutually orthogonal (orthonormal) eigenvectors for A.
Characteristic Polynomial

Since eigenvalues and their corresponding eigenvectors satisfy Ae = λe, one can find an eigenvector corresponding to λ by solving the homogeneous linear system (λI − A)e = 0. To find λ, construct the characteristic polynomial p(λ) = det(λI − A), where det(·) is the determinant operator, and solve p(λ) = 0.
Example: Finding Eigenvalues
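One way to carry out such an example numerically, using an illustrative symmetric matrix A (an assumption, not necessarily the slides' example): np.poly builds the coefficients of det(λI − A), and np.roots solves p(λ) = 0.

```python
import numpy as np

# Illustrative real symmetric matrix.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])

coeffs = np.poly(A)            # coefficients of p(lambda) = det(lambda*I - A)
lams = np.roots(coeffs)        # roots of the characteristic polynomial
print(np.sort(lams.real))      # approximately [1. 3. 3.]

print(np.linalg.eigvalsh(A))   # same eigenvalues, computed directly
```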
Finding Eigenvectors for the Eigenvalues: λ1 and e1
Finding Eigenvectors for the Eigenvalues: λ2 and e2
Finding Eigenvectors for the Eigenvalues: λ3 and e3
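A sketch of the same step in code, reusing the illustrative matrix from above: for a real symmetric matrix, np.linalg.eigh returns unit-length, mutually orthogonal eigenvectors, which we can verify against Ae = λe.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])   # same illustrative matrix as above

lams, V = np.linalg.eigh(A)       # for symmetric A: orthonormal eigenvectors
for lam, e in zip(lams, V.T):     # columns of V are the eigenvectors
    print(lam, np.allclose(A @ e, lam * e))   # A e = lambda e holds for each pair

print(np.allclose(V.T @ V, np.eye(3)))        # orthonormal: V^T V = I
```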
Assumptions and Goals of PCA

Given a set of vectors xi, i = 1, …, n, assume the xi have zero mean. Goal: find a unit vector u such that the variance of the data after projecting onto the line through u is maximal.
Pictorially: Principal Axis (raw data; data with principal axis)
Pictorially: Second Axis (raw data; data with second principal axis)
Pictorially: Third Axis (raw data; data with third principal axis)
All Three Axes (raw data; data with three new axes)
Recall from Vector Projection

The projection of a vector x onto the line through a unit vector u has signed length a = uᵀx.
Setting Things Up

For each sample, define the projection scalar ai = uᵀxi = xiᵀu. There is some abuse of notation in the last two expressions, which represent matrix operations having 1×1 results.
The Scalars Have Zero Mean

Notice: E{a} = E{uᵀx} = uᵀE{x} = uᵀ0 = 0, since the xi were assumed to have zero mean.
What is the Variance of the Projection Scalars?

σ² = E{a²} = E{(uᵀx)(xᵀu)} = uᵀE{x xᵀ}u = uᵀRu, where R is the covariance matrix of the xi.
Variance as a Function of u

Variance is commonly denoted σ². The variance of the scalars depends on u and is given by σ²(u) = uᵀRu. Using a Lagrange multiplier, one may maximize the variance subject to the unit-length constraint uᵀu = 1 by seeking extrema of J = uᵀRu + λ(1 − uᵀu). Differentiating with respect to u, we obtain ∇uJ = 2(Ru − λu).
Finding Extreme Variance

Setting ∇uJ = 0 gives Ru − λu = 0, or Ru = λu. So what? The variance (the potential capacity for discrimination) is maximized when u is an eigenvector of the covariance matrix.
Which Eigenvector?

Since σ²(u) = uᵀRu = uᵀ(λu) = λ(uᵀu) = λ, the maximum variance (capacity for discrimination) equals the largest eigenvalue and occurs when u is the corresponding eigenvector.
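A numerical sanity check of this result, using synthetic zero-mean data (the covariance values below are assumptions for illustration): the variance of the projection scalars onto the top eigenvector matches the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0, 0],
                            [[4, 1, 0], [1, 2, 0], [0, 0, 1]],
                            size=10_000).T   # columns are the samples x_i

X = X - X.mean(axis=1, keepdims=True)   # enforce the zero-mean assumption
R = (X @ X.T) / X.shape[1]              # covariance matrix of the x_i

lams, U = np.linalg.eigh(R)             # eigh returns eigenvalues in ascending order
u = U[:, -1]                            # eigenvector for the largest eigenvalue

a = u @ X                               # projection scalars a_i = u^T x_i
print(np.var(a), lams[-1])              # variance of the scalars = largest eigenvalue
```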
Notes

The covariance matrix is symmetric, so its eigenvectors are orthogonal, and the unit eigenvectors form an orthonormal basis. One may order the eigenvectors ui according to the magnitude of the corresponding eigenvalues: λ1 ≥ λ2 ≥ … ≥ λi ≥ … ≥ λd.
Reducing Dimensionality

Since the uj, j = 1, …, d, form a basis, any x can be written as

x = a1u1 + a2u2 + … + adud, where aj = ujᵀx

or estimated, for some m < d, with error e, as

x̂ = a1u1 + a2u2 + … + amum, where e = x − x̂ = am+1um+1 + … + adud
Accounting for Variability

Since the u's are orthonormal, x̂ and e are likewise orthogonal. Hence the expected squared error is

E{‖e‖²} = λm+1 + λm+2 + … + λd

and the fraction of the total variance accounted for by the first m components is (λ1 + … + λm)/(λ1 + … + λd).
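A sketch of the reduction and the error bookkeeping, reusing the synthetic data from the previous sketch: keeping m = 2 of d = 3 components, the mean squared reconstruction error should equal the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0, 0],
                            [[4, 1, 0], [1, 2, 0], [0, 0, 1]],
                            size=10_000).T
X = X - X.mean(axis=1, keepdims=True)
R = (X @ X.T) / X.shape[1]

lams, U = np.linalg.eigh(R)
U = U[:, ::-1]; lams = lams[::-1]   # reorder so lambda_1 >= lambda_2 >= ...

m = 2                               # keep the first m < d components
A = U[:, :m].T @ X                  # coefficients a_j = u_j^T x
Xhat = U[:, :m] @ A                 # estimate x_hat = a_1 u_1 + ... + a_m u_m

err = ((X - Xhat) ** 2).sum(axis=0).mean()
print(err, lams[m:].sum())          # E{||e||^2} = sum of discarded eigenvalues
```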
Using PCA

Find the mean of the distribution. Find the covariance matrix. Find its eigenvalues and eigenvectors. Reduce data dimensionality by projecting onto the subspace of the original space spanned by the eigenvectors with the largest eigenvalues.
A Similarity Transform

If the rows of a matrix A are the orthonormal eigenvectors of Cx, then A Cx Aᵀ = Λ, the diagonal matrix of eigenvalues: the similarity transform diagonalizes the covariance matrix.
Hotelling Transform

Form the matrix A using the eigenvectors of Cx: order the eigenvalues from largest to smallest, λ1 ≥ λ2 ≥ … ≥ λd, with corresponding eigenvectors e1, e2, …, ed, and enter the eigenvectors as the rows of A. Then

y = A(x − mx), my = E{y} = 0, and Cy = A Cx Aᵀ, which is diagonal.
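A minimal sketch of the transform, again on illustrative synthetic data (the mean and covariance used to generate it are assumptions): the rows of A are the eigenvectors of Cx ordered by decreasing eigenvalue, and the transformed covariance A Cx Aᵀ comes out numerically diagonal.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal([5, -2], [[3, 1], [1, 2]], size=5_000).T  # samples x_i

mx = X.mean(axis=1, keepdims=True)
Cx = (X - mx) @ (X - mx).T / X.shape[1]

lams, E = np.linalg.eigh(Cx)
A = E[:, ::-1].T                  # rows of A: eigenvectors, largest lambda first

Y = A @ (X - mx)                  # y = A(x - m_x)
print(np.allclose(Y.mean(axis=1), 0))   # m_y = E{y} = 0
Cy = Y @ Y.T / Y.shape[1]
print(np.round(Cy, 3))            # Cy = A Cx A^T: diagonal matrix of eigenvalues
```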