
1 Principal Components Analysis
BMTRY 726 7/17/2018

2 Uses
Goal: Explain the variability of a set of variables using a "small" set of linear combinations of those variables.
Why: There are several reasons we may want to do this:
(1) Dimension reduction (use k of p components)
-Note: total variability still requires all p components
(2) Identify "hidden" underlying relationships (i.e. patterns in the data)
-Use these relationships in further analyses (e.g. regression with multicollinearity)
(3) Select subsets of variables

3 Defining “Genetic” Race by PCA
Zakharia F, et al. (2009). Characterizing the admixed African Ancestry of African Americans. Genome Biology, 10(12): R141

4 “Exact” Principal Components
We can represent the data X (n × p), consisting of p random measurements on each of j = 1, 2, …, n subjects, as a set of linear combinations of those p measurements.

5 “Exact” Principal Components
We can also find the moments for these p linear combinations of our data X
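As a sketch in the usual notation (the coefficient vectors a_i, mean vector μ, and covariance matrix Σ are the standard symbols assumed here, not reproduced from the slides), the linear combinations and their moments are:

    Y_i = a_i'X = a_{i1}X_1 + a_{i2}X_2 + \cdots + a_{ip}X_p, \quad i = 1,\dots,p
    E(Y_i) = a_i'\mu, \qquad \mathrm{Var}(Y_i) = a_i'\Sigma a_i, \qquad \mathrm{Cov}(Y_i, Y_k) = a_i'\Sigma a_k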

6 “Exact” Principal Components
Principal components are those linear combinations Y1, Y2, …, Yp that are:
(1) Uncorrelated with one another
(2) Of variance as large as possible
(3) Subject to the normalization constraint that each coefficient vector has unit length

7 Finding PC’s Under Constraints
So how do we find PCs that meet the constraints we just discussed? We want to maximize the variance of the linear combination subject to the constraint that its coefficient vector has unit length. This constrained maximization problem can be done using the method of Lagrange multipliers. Thus, for the first PC we want to maximize the corresponding Lagrangian (see the sketch below).

8 Finding 1st PC Under Constraints
Differentiate with respect to a1 and set the derivative equal to zero:
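A sketch of this step, written out under the standard setup (maximize the variance a_1'Σa_1 subject to a_1'a_1 = 1, with Lagrange multiplier λ):

    \max_{a_1}\; a_1'\Sigma a_1 - \lambda(a_1'a_1 - 1)
    \frac{\partial}{\partial a_1}\left[a_1'\Sigma a_1 - \lambda(a_1'a_1 - 1)\right] = 2\Sigma a_1 - 2\lambda a_1 = 0
    \Longrightarrow (\Sigma - \lambda I)\,a_1 = 0

So a_1 must be an eigenvector of Σ, and λ is its associated eigenvalue.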

9 Finding 2nd PC Under Constraints
We can think about this in the same way, but we now have an additional constraint: the second PC must be uncorrelated with the first. We want to maximize the variance of the second linear combination, subject both to the unit-length constraint and to this orthogonality constraint. So now we need to maximize a Lagrangian with two multipliers (see the sketch below).

10 Finding 2nd PC Under Constraints
Differentiate with respect to a2 and set the derivative equal to zero:
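A sketch of the analogous step for the second PC, with a second multiplier φ for the constraint a_2'a_1 = 0:

    \max_{a_2}\; a_2'\Sigma a_2 - \lambda(a_2'a_2 - 1) - \phi\, a_2'a_1
    \frac{\partial}{\partial a_2} = 2\Sigma a_2 - 2\lambda a_2 - \phi a_1 = 0

Premultiplying by a_1' and using a_1'a_1 = 1, a_1'a_2 = 0, and Σa_1 = λ_1 a_1 gives φ = 0, so again (Σ − λI)a_2 = 0 and a_2 is another eigenvector of Σ.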

11 Finding PC’s Under Constraints
But how do we choose our eigenvector (i.e. which eigenvector corresponds to which PC)? We can see that what we want to maximize is the variance of the component, which equals the eigenvalue λi. So we choose λi to be as large as possible. If λ1 is our largest eigenvalue, with corresponding eigenvector e1, then the solution for our maximum is a1 = e1.
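Stated as the standard result:

    \max_{a'a = 1} a'\Sigma a = \lambda_1, \text{ attained at } a_1 = e_1
    \mathrm{Var}(Y_i) = e_i'\Sigma e_i = \lambda_i, \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0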

12 “Exact” Principal Components
So we can compute the PCs from the variance matrix of X, Σ:
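A minimal numerical sketch of this computation with NumPy (synthetic data; not the course's own code or dataset):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.multivariate_normal(mean=[0, 0, 0],
                                cov=[[4, 2, 1], [2, 3, 1], [1, 1, 2]],
                                size=200)              # n x p data matrix

    S = np.cov(X, rowvar=False)                        # p x p sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)               # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                  # sort largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    scores = (X - X.mean(axis=0)) @ eigvecs            # PC scores: e_i'(x_j - xbar)
    print(eigvals)                                     # Var(Y_i) = lambda_i
    print(eigvals / eigvals.sum())                     # proportion of variance per PC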

13 Properties We can also find the moments of our PC’s

14 Properties We can also find the moments of our PC’s

15 Properties The normality assumption is not required to find PCs. If Xj ~ Np(μ, Σ), then the PCs have additional distributional properties. Total variance:
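The total-variance identity being referenced is the standard one:

    \sigma_{11} + \sigma_{22} + \cdots + \sigma_{pp} = \mathrm{tr}(\Sigma) = \lambda_1 + \lambda_2 + \cdots + \lambda_p = \sum_{i=1}^{p}\mathrm{Var}(Y_i)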

16 Principal Components Consider data with p random measures on j = 1, 2, …, n subjects. For the jth subject we then have the random vector Xj. (Figure: scatter of X1 versus X2, centered at the means μ1 and μ2.)

17 Graphic Representation

18 Graphic Representation
Now suppose X1, X2 ~ N2(μ, Σ). The Y1 axis is selected to maximize the variation in the scores; the Y2 axis must be orthogonal to Y1 and maximize the variation in the scores. (Figure: original axes X1, X2 with rotated component axes Y1, Y2.)

19 Dimension Reduction The proportion of total variance accounted for by the first k components is shown in the sketch below. If this proportion is large, we might want to restrict our attention to only these first k components. Keep in mind, components are simply linear combinations of the original p measurements. Ideally we look for meaningful interpretations of our chosen k components.
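Written out, the proportion referred to above is:

    \frac{\lambda_1 + \lambda_2 + \cdots + \lambda_k}{\lambda_1 + \lambda_2 + \cdots + \lambda_p}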

20 PC’s from Standardized Variables
We may want to standardize our variables before finding PCs

21 PC’s from Standardized Variables
So the covariance matrix of the standardized variables equals the correlation matrix of X. We can define our PCs for the standardized variables the same way as before….
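A sketch of the standard standardization, writing V for the diagonal matrix of the variances of X and Z for the standardized vector (notation assumed here):

    Z = V^{-1/2}(X - \mu), \qquad \mathrm{Cov}(Z) = V^{-1/2}\,\Sigma\,V^{-1/2} = \rho, \qquad \sum_{i=1}^{p}\mathrm{Var}(Z_i) = p

The PCs of Z come from the eigenvalues and eigenvectors of ρ and in general are not the same as the PCs of X.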

22 Compare Standardized/Non-standardized PCs

23 Estimation In general we do not know what Σ is; we must estimate it from the sample. So what are our estimated principal components?

24 Sample Properties In general we do not know what Σ is; we estimate it from the sample. So what are our estimated principal components?

25 Centering We often center our observations before defining our PCs. The centered PCs are found according to the expression sketched below:
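A sketch of the usual centered form, with x̄ the sample mean vector and ê_i the estimated eigenvectors:

    \hat{y}_{ij} = \hat{e}_i'(x_j - \bar{x}), \qquad i = 1,\dots,p, \quad j = 1,\dots,n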

26 Example Jolicoeur and Mosimann (1960) conducted a study looking at the relationship between size and shape of painted turtle carapaces. We can develop PC’s for natural log of length, width, and height of female turtles’ carapaces

27 Example The first PC might be interpreted as an overall size component: small values of y1 correspond to small shell dimensions, and large values of y1 correspond to large shell dimensions.

28 Example The second PC separates turtles with small values of y2 from those with large values of y2.

29 Example The third PC separates turtles with small values of y3 from those with large values of y3.

30 Example Consider the proportion of variability accounted for by each PC

31 Example How are the PCs correlated with each of the x's?

Trait    y1       y2       y3
x1       0.99     0.09    -0.08
x2       …        0.06     0.15
x3       …       -0.14    -0.01

32 Interpretation of PCs
Consider data x1, x2, …, xp: PCs are projections onto the estimated eigenvectors
-The 1st PC is the one with the largest projection
-For data reduction, only use PCA if the eigenvalues vary
-If the x's are uncorrelated, we can't really do data reduction

33 Choosing Number of PCs
Often the goal of PCA is dimension reduction of the data: select a limited number of PCs that capture the majority of the variability in the data. How do we decide how many PCs to include?
1. Scree plot: plot of the estimated eigenvalue λ̂i versus i (see the sketch below)
2. Select all PCs with eigenvalue λ̂i > 1 (for standardized observations)
3. Choose some proportion of the variance you want to account for

34 Scree Plots
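A minimal plotting sketch for a scree plot and the cumulative proportion of variance (synthetic eigenvalues; matplotlib assumed available, not part of the original slides):

    import numpy as np
    import matplotlib.pyplot as plt

    eigvals = np.array([3.1, 1.4, 0.9, 0.3, 0.2, 0.1])    # example eigenvalues, largest first

    fig, ax = plt.subplots(1, 2, figsize=(8, 3))
    ax[0].plot(range(1, len(eigvals) + 1), eigvals, "o-")  # scree plot: lambda_i versus i
    ax[0].axhline(1, linestyle="--")                       # eigenvalue > 1 rule (standardized data)
    ax[0].set_xlabel("Component i")
    ax[0].set_ylabel("Eigenvalue")

    cum = np.cumsum(eigvals) / eigvals.sum()               # cumulative proportion of variance
    ax[1].plot(range(1, len(eigvals) + 1), cum, "o-")
    ax[1].set_xlabel("Number of components")
    ax[1].set_ylabel("Cumulative proportion")
    plt.tight_layout()
    plt.show()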

35 Choosing Number of PCs Should principal components that only account for a small proportion of variance always be ignored? Not necessarily; they may indicate near-perfect collinearities among traits. In the turtle example this is true: very little of the variation in shell measurements can be attributed to the 2nd and 3rd components.

36 Large Sample Properties
If n is large, there are nice properties we can use

37 Large Sample Properties
There are analogous results for our estimated eigenvectors. These results assume that X1, X2, …, Xn are Np(μ, Σ).
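A sketch of the standard large-sample results being referenced, assuming the eigenvalues of Σ are distinct:

    \sqrt{n}(\hat{\lambda}_i - \lambda_i) \xrightarrow{d} N(0,\, 2\lambda_i^2)
    \text{approximate } 100(1-\alpha)\% \text{ CI: } \quad \frac{\hat{\lambda}_i}{1 + z_{\alpha/2}\sqrt{2/n}} \;\le\; \lambda_i \;\le\; \frac{\hat{\lambda}_i}{1 - z_{\alpha/2}\sqrt{2/n}}

The estimated eigenvectors are also asymptotically normal and asymptotically independent of the estimated eigenvalues.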

38 Uses Other Than Dimension Reduction
Principal component analysis is most useful for dimensionality reduction. It can also be used in a regression setting as one way to handle multicollinearity. A caveat: principal components can be difficult to interpret and should therefore be used with caution.

39 Regression and Collinearity
If a predictor matrix X is not of full rank, some linear combination Xa of the columns of X equals 0. In that case the columns are collinear and the inverse of X'X does not exist. It is rare that Xa is exactly 0, but if a combination exists that is nearly zero, (X'X)-1 is numerically unstable. For regression, this results in very large estimated variances of the model parameters, making it difficult to identify significant regression coefficients.
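A small numerical sketch of this instability (synthetic data, NumPy only):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.01, size=n)        # nearly collinear with x1
    X = np.column_stack([np.ones(n), x1, x2])

    # The condition number of X'X blows up as the columns approach collinearity
    xtx = X.T @ X
    print(np.linalg.cond(xtx))

    # Var(beta-hat) is proportional to the diagonal of (X'X)^-1, so
    # near-collinearity inflates the standard errors of the x1 and x2 coefficients
    print(np.diag(np.linalg.inv(xtx)))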

40 Principal Component Regression (PCR)
Define a set of p principal components: linear combinations of the original predictors in X.
Calculate the value of the selected components for each observation in the data.
Fit a linear regression model using these calculated component scores as your new predictors.

41 Principal Component Regression (PCR)
Use of PCR can serve two objectives:
(1) Eliminate the issue of collinearity by developing a set of p orthogonal derived components, z1, z2, …, zp, as an alternative to the original correlated p features, x1, x2, …, xp, in X.
(2) Reduce the number of predictors used to fit the regression model for y by including only m < p of the derived components. Choose the m components that capture the majority of the variability in X.
NOTE: the m components are often still dependent on all p original features.
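A minimal PCR sketch using scikit-learn; the synthetic X and y below are placeholders with the same dimensions as the phthalate example, not the study data:

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    X = rng.normal(size=(297, 8))                      # stand-in for the 8 log-phthalate predictors
    y = rng.normal(loc=3300, scale=500, size=297)      # stand-in for birth weight

    pcr = Pipeline([
        ("scale", StandardScaler()),                   # center and scale the predictors
        ("pca", PCA(n_components=3)),                  # keep the first 3 components
        ("ols", LinearRegression()),                   # regress y on the 3 component scores
    ])
    pcr.fit(X, y)

    # Coefficients on the component scores (gamma), and the implied coefficients
    # on the original (scaled) predictors: beta = loadings' * gamma
    gamma = pcr.named_steps["ols"].coef_
    loadings = pcr.named_steps["pca"].components_      # shape (3, 8)
    beta_scaled = loadings.T @ gamma
    print(gamma, beta_scaled)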

42 Phthalate and Birth Outcomes
Due to their use in plastics and personal care products, phthalate exposure is prevalent in the general population, and infants/fetuses are vulnerable to exposure through maternal exposure. An investigator is interested in examining the impact of phthalate exposure on infant birth outcomes. She collects information on 297 mothers during pregnancy and then evaluates infant outcomes at birth. Outcomes: birth weight, head circumference, urogenital measures. Phthalates: MBP, MBZP, MEHHP, MEHP, MEOHP, MIBP, MEP, MMP

43 Correlations
Birth weight row: -0.063  -0.103  0.016  -0.053  0.019  -0.130  -0.064

         MBZP   MEHHP  MEHP   MEOHP  MIBP   MEP    MMP
MBP      0.802  0.713  0.581  0.679  0.848  0.584  0.638
MBZP            0.623  0.531  0.591  0.720  0.520  0.561
MEHHP                  0.821  0.984  0.661  0.444  0.537
MEHP                          0.834  0.523  0.369  0.446
MEOHP                                0.635  0.420  0.510
MIBP                                        0.574  0.600
MEP                                                0.419

44 Regression Model
Model global F-test p = 0.002

Variable     Beta     SE      P        VIF
Intercept    3210.9   120.3   <0.001
ln(MBP)      131.0    60.7    0.032    5.17
ln(MBZP)     -35.5    32.8    0.280    2.97
ln(MEHHP)    111.6    179.9   0.536    35.5
ln(MEHP)     -122.4   48.2    0.012    3.16
ln(MEOHP)    00.4     172.8   0.562    32.9
ln(MIBP)     -132.7   51.8    0.011    3.52
ln(MEP)      -33.6    24.9    0.178    1.59
ln(MMP)      -27.3    30.8    0.377    1.81

45 PCs for the Phthalate Data
Loadings for Prin1 through Prin8 on lMBP, lMBZP, lMEHHP, lMEHP, lMEOHP, lMIBP, lMEP, lMMP (values not reproduced here).
Cumulative % variance: 62.6, 75.5, 85.0, 92.4, 96.3, 98.3, 99.9, 100

46 PCR Model with 3 Components

For centered and scaled data:
Variable     Beta
Intercept
ln(MBP)      -0.041
ln(MBZP)     -0.043
ln(MEHHP)     0.048
ln(MEHP)      0.059
ln(MEOHP)     0.054
ln(MIBP)     -0.047
ln(MEP)      -0.055
ln(MMP)      -0.028

For the original data:
Variable     Beta
Intercept    3370.4
ln(MBP)      -19.5
ln(MBZP)     -14.4
ln(MEHHP)     26.2
ln(MEHP)      28.0
ln(MEOHP)     28.7
ln(MIBP)     -23.1
ln(MEP)      -19.3
ln(MMP)      -11.3

47 PCR Coefficients… We regressed on the values calculated for the 3 PCs, so why are our coefficients in terms of the original predictors?
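A sketch of the algebra, using assumed notation: let A_m be the p × m matrix of component loadings defining the m retained components, Z = X A_m the matrix of component scores (with X centered and scaled), and γ the PCR coefficients on those scores. Then

    \hat{y} = \hat{\gamma}_0 + Z\hat{\gamma} = \hat{\gamma}_0 + X A_m \hat{\gamma} = \hat{\gamma}_0 + X\hat{\beta}, \qquad \hat{\beta} = A_m \hat{\gamma}

so the fitted model can always be re-expressed as coefficients on the original (centered and scaled) predictors, and those can in turn be back-transformed to the original measurement scale.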

48

49 Summary
PCR is designed predominantly to address collinearity. There are alternative approaches:
-Partial least squares regression
-Shrinkage approaches
PCR can also be used to reduce the number of features included in the fitted model:
-Use m < p derived features in the final model
-The m derived features are still based on all p original predictors in X
-Using all p derived features yields the original regression coefficients
PCR (and PLSR) is prone to over-fitting, so some form of model validation is generally recommended to select the number of components derived from X.

