Factor Analysis BMTRY 726 7/19/2018.

Uses
Goal: Similar to PCA… describe the covariance of a large set of measured traits using a few linear combinations of underlying latent traits
Why: again, similar reasons to PCA
(1) Dimension reduction (use k of p components)
(2) Remove redundancy/duplication from a set of correlated variables
(3) Represent correlated variables with a smaller set of “derived” variables
(4) Create “new” factor variables that are independent

For Example
Say we want to define “frailty” in a population of cancer patients.
We have a concept of what “frailty” is but no direct way to measure it.
We believe an individual’s frailty has to do with their weight, strength, speed, agility, balance, etc.
We therefore want to be able to define frailty as some composite measure of all of these factors…

Key Concepts
-Fj is a latent underlying variable (j = 1, 2, …, m)
-The X’s are observed variables related to what we think Fj might be
-εi is the measurement error for Xi, i = 1, 2, …, p
-lij are the factor “loadings” for Xi

Orthogonal Factor Model
Consider data with p observed variables:
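The equation for this slide did not survive extraction. A standard way to write the orthogonal factor model, consistent with the X = LF + ε form used later in these notes (which treats X as already centered), is sketched below. The mean vector μ is notation assumed here.

```latex
\begin{aligned}
X_1 - \mu_1 &= l_{11}F_1 + l_{12}F_2 + \dots + l_{1m}F_m + \varepsilon_1 \\
X_2 - \mu_2 &= l_{21}F_1 + l_{22}F_2 + \dots + l_{2m}F_m + \varepsilon_2 \\
            &\;\;\vdots \\
X_p - \mu_p &= l_{p1}F_1 + l_{p2}F_2 + \dots + l_{pm}F_m + \varepsilon_p
\end{aligned}
\qquad\Longleftrightarrow\qquad
\underset{(p \times 1)}{X - \mu}
  = \underset{(p \times m)}{L}\,\underset{(m \times 1)}{F}
  + \underset{(p \times 1)}{\varepsilon}
```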

Model Assumptions
We must make some serious assumptions…
Note, these are very strong assumptions, which implies the model has only narrow application
These models also work best when p >> m

Model Assumptions Our assumptions can be related back to the variability of our original X’s
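The assumption equations are missing from the transcript. The usual assumptions of the orthogonal factor model, and the covariance structure they imply, are sketched here in textbook form; this is an assumption about what the slides showed, written for centered X.

```latex
\begin{aligned}
E(F) &= 0, & \operatorname{Cov}(F) &= E(FF') = I_m, \\
E(\varepsilon) &= 0, & \operatorname{Cov}(\varepsilon) &= E(\varepsilon\varepsilon') = \Psi
    = \operatorname{diag}(\psi_1, \dots, \psi_p), \\
\operatorname{Cov}(\varepsilon, F) &= E(\varepsilon F') = 0 .
\end{aligned}

% Consequences for the observed variables X = LF + \varepsilon:
\begin{aligned}
\Sigma = \operatorname{Cov}(X)
  &= L\,E(FF')\,L' + E(\varepsilon F')\,L' + L\,E(F\varepsilon') + E(\varepsilon\varepsilon')
   = LL' + \Psi, \\
\operatorname{Cov}(X, F) &= E\big[(LF + \varepsilon)F'\big] = L .
\end{aligned}
```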

Model Terms
Decomposition of the variance of Xi
The proportion of the variance of the ith measurement Xi contributed by the m factors F1, F2, …, Fm is called the ith communality

Model Terms
Decomposition of the variance of Xi
The remaining proportion of the variance of the ith measurement, associated with εi, is called the uniqueness or specific variance
Note, we are assuming that the variances and covariances of X can be reconstructed from our pm factor loadings lij and the p specific variances
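The decomposition itself was an image on the slide. Under Σ = LL’ + Ψ it can be written element by element as follows; the symbols h_i² (communality) and ψ_i (specific variance) follow the later tables, and the layout is a sketch rather than a copy of the slide.

```latex
\underbrace{\sigma_{ii}}_{\operatorname{Var}(X_i)}
  = \underbrace{l_{i1}^2 + l_{i2}^2 + \dots + l_{im}^2}_{h_i^2\ \text{(communality)}}
  + \underbrace{\psi_i}_{\text{specific variance}},
\qquad
\operatorname{Cov}(X_i, X_k) = l_{i1}l_{k1} + l_{i2}l_{k2} + \dots + l_{im}l_{km}
\quad (i \neq k)
```

So the p(p + 1)/2 distinct variances and covariances of X are being reproduced from only pm + p parameters.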

Limitations
Linearity
-Assuming our X’s = linear combinations of the factors
-Factors are unobserved so we cannot verify this assumption
-If the relationship is non-linear, linear combinations may provide a good approximation for only a small range of values
The elements of Σ are described by the mp factor loadings in L and the p specific variances {ψi}
-Model most useful for small m, but often the mp + p parameters are not sufficient and Σ is not close to LL’ + Ψ

Limitations
Even when m < p and we can find L such that Σ = LL’ + Ψ… L is not unique

Limitations Non-Unique L…
-So what happens to our moments if we use these new factors and factor loadings?
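The algebra for this slide is missing. The standard argument is that for any orthogonal matrix T, the rotated loadings L* = LT and rotated factors F* = T’F give the same model: X = L*F* + ε, Cov(F*) = T’T = I, and Σ = L*L*’ + Ψ. The sketch below checks this numerically; the loadings and the 30 degree rotation are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical loadings for p = 4 variables on m = 2 factors (illustrative only).
L = np.array([[0.9, 0.1],
              [0.8, 0.3],
              [0.2, 0.7],
              [0.1, 0.8]])
psi = 1.0 - np.sum(L**2, axis=1)      # specific variances so that diag(Sigma) = 1

# Any orthogonal T works; here a rotation by 30 degrees, so T T' = I.
theta = np.deg2rad(30.0)
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

L_star = L @ T                        # rotated loadings L* = L T

sigma = L @ L.T + np.diag(psi)
sigma_star = L_star @ L_star.T + np.diag(psi)

same_covariance = np.allclose(sigma, sigma_star)
same_communalities = np.allclose(np.sum(L**2, axis=1), np.sum(L_star**2, axis=1))
print(same_covariance, same_communalities)
```

The individual loadings change under rotation, but the implied covariance matrix and the communalities do not, which is why the data alone cannot distinguish L from LT.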

Potential Pitfall
The problem is, most covariance matrices cannot be factored in the manner we have defined for a factor model:
X = LF + ε
Cov(X) = Σ = LL’ + Ψ
For example… Consider data with 3 variables which we are trying to describe with 1 factor

Potential Pitfall
We can write our factor model as follows:
Using our factor model representation of variance, LL’ + Ψ, we can define the following six equations

Potential Pitfall Use these equations to find the factor loadings and specific variances:

Potential Pitfall However, this results in the following problems:
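The numbers on these slides were not preserved. The sketch below works the same kind of example with a hypothetical 3 x 3 covariance matrix (the values are an assumption, not the slide's). With one factor, the three covariance equations are σik = li lk and the three variance equations are σii = li² + ψi, so the loadings and specific variances can be solved for directly.

```python
# Hypothetical 3 x 3 covariance (here correlation) matrix; these values are an
# illustrative assumption, not recovered from the slides.
s11, s22, s33 = 1.0, 1.0, 1.0
s12, s13, s23 = 0.9, 0.7, 0.4

# One-factor model: sigma_ik = l_i * l_k for i != k, sigma_ii = l_i^2 + psi_i.
# Solving the three covariance equations for the squared loadings:
l1_sq = s12 * s13 / s23
l2_sq = s12 * s23 / s13
l3_sq = s13 * s23 / s12

# The three variance equations then give the specific variances.
psi1 = s11 - l1_sq
psi2 = s22 - l2_sq
psi3 = s33 - l3_sq

for name, l_sq, psi in [("X1", l1_sq, psi1), ("X2", l2_sq, psi2), ("X3", l3_sq, psi3)]:
    print(f"{name}: loading^2 = {l_sq:.3f}, specific variance = {psi:.3f}")
```

For X1 the squared loading exceeds 1 and the specific variance is negative. Since Var(X1) = 1 and Var(F1) = 1, the loading is a correlation and cannot exceed 1 in absolute value, and a variance cannot be negative, so the algebraic solution is not a proper one.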

Methods of Estimation
We need to estimate: the loadings L on the latent factors and the specific variances Ψ
We have a random sample of n subjects from a population
We measure p attributes for each of the n subjects

Methods of Estimation
We could also standardize our variables:

Methods of Estimation
1. Principal Components method
2. Principal Factor method
3. Maximum likelihood method

Principal Component Method
Given Σ (or S if we have a sample from the population)
Consider the spectral (eigenvalue/eigenvector) decomposition of Σ

The problem here is that m = p, so we want to drop some factors
We drop the λ’s that are small (i.e. stop at λm)

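Estimate L and Ψ by substituting the estimated eigenvalues/eigenvectors of S or R: $\tilde{L} = [\sqrt{\hat{\lambda}_1}\,\hat{e}_1 \mid \sqrt{\hat{\lambda}_2}\,\hat{e}_2 \mid \dots \mid \sqrt{\hat{\lambda}_m}\,\hat{e}_m]$
To make the diagonal elements of $\tilde{L}\tilde{L}' + \tilde{\Psi}$ equal to those of S, we let $\tilde{\psi}_i = s_{ii} - \sum_{j=1}^{m} \tilde{l}_{ij}^2$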

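The optimality of using $\tilde{L}\tilde{L}' + \tilde{\Psi}$ to approximate S is due to:
sum of squared entries of $S - (\tilde{L}\tilde{L}' + \tilde{\Psi}) \le \hat{\lambda}_{m+1}^2 + \dots + \hat{\lambda}_p^2$
Note, the sum of squared elements is an approximation of the sum of squared error
We can also estimate the proportion of the total sample variance due to the jth factor: $\hat{\lambda}_j / (s_{11} + \dots + s_{pp})$ for S, or $\hat{\lambda}_j / p$ for R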
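The principal component method described above can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; the function name `pc_factor` and the 4 x 4 correlation matrix are hypothetical and not part of the course material.

```python
import numpy as np

def pc_factor(S, m):
    """Principal component solution: loadings, specific variances, proportions."""
    eigvals, eigvecs = np.linalg.eigh(S)          # eigenpairs of symmetric S
    order = np.argsort(eigvals)[::-1]             # sort, largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    L = eigvecs[:, :m] * np.sqrt(eigvals[:m])     # column j is sqrt(lambda_j) * e_j
    psi = np.diag(S) - np.sum(L**2, axis=1)       # psi_i = s_ii - sum_j l_ij^2
    prop = eigvals[:m] / np.trace(S)              # share of total variance per factor
    return L, psi, prop, eigvals

# Hypothetical 4 x 4 correlation matrix (illustrative values only).
R = np.array([[1.0, 0.6, 0.5, 0.4],
              [0.6, 1.0, 0.5, 0.4],
              [0.5, 0.5, 1.0, 0.3],
              [0.4, 0.4, 0.3, 1.0]])
m = 1
L, psi, prop, eigvals = pc_factor(R, m)

residual = R - (L @ L.T + np.diag(psi))
sse = np.sum(residual**2)                         # sum of squared residual entries
bound = np.sum(eigvals[m:]**2)                    # sum of squared dropped eigenvalues
print(np.round(L, 3), np.round(psi, 3), np.round(prop, 3), sse <= bound)
```

The sign of each loading column is arbitrary because eigenvectors are only defined up to sign. The final comparison checks that the squared residuals are no larger than the sum of the squared eigenvalues that were dropped.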

Phthalate and Birth Outcomes
Recall the phthalate study. The PI is interested in examining the impact of phthalate exposure on infant birth outcomes and, ideally, in identifying underlying factors (i.e. patterns in the phthalate exposure). Factor analysis could be used to identify these underlying factors, which can then be tested for association with the birth outcomes.
Phthalates (urine metabolites): MBP, MBZP, MEHHP, MEHP, MEOHP, MIBP, MEP, MMP

Urine Phthalate Metabolites
Phthalate data includes n = 297 mothers with urine levels of p = 8 variables/metabolites
Data standardized and factor analysis performed on the sample correlation matrix R

        MBP    MBZP   MEHHP  MEHP   MEOHP  MIBP   MEP    MMP
MBP     1      0.802  0.713  0.581  0.679  0.848  0.584  0.638
MBZP           1      0.623  0.531  0.591  0.720  0.520  0.561
MEHHP                 1      0.821  0.984  0.661  0.444  0.537
MEHP                         1      0.834  0.523  0.369  0.446
MEOHP                               1      0.635  0.420  0.510
MIBP                                       1      0.574  0.600
MEP                                               1      0.419
MMP                                                      1

Given the eigenvalues/vectors of R, what is our factor model for two factors?

Given a 2-factor solution, we can find the communalities and specific variances based on our loadings and R.

What is the cumulative proportion of variance accounted for by factor 1? What about both factors?

How well does our model check out…?

Two factor (m = 2) solution:
Data standardized and FA performed using correlation matrix R
How might we interpret these factors?

Variables            Factor 1  Factor 2  Specific variances  h2
lMBP                 0.883      0.26     0.151               0.849
lMBzP                0.803      0.29     0.268               0.732
lMEHHP               0.880     -0.42     0.047               0.953
lMEHP                0.779     -0.49
lMEOHP               0.858     -0.46     0.048               0.952
lMiBP                0.826      0.32     0.216               0.784
lMEP                 0.617      0.42     0.436               0.564
lMMP                 0.729      0.21     0.422               0.578
SS Loading           5.14       1.12
Proportion Variance  0.642      0.141
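The communalities, specific variances, and proportions of variance in the table above follow directly from the loadings. The sketch below recomputes them from the printed two-factor loadings; because the printed loadings are rounded (Factor 2 to two decimals), the results agree with the table only to about two decimals.

```python
# Two-factor principal component loadings as printed in the table above
# (Factor 1, Factor 2); the values are rounded, so results are approximate.
loadings = {
    "lMBP":   (0.883,  0.26),
    "lMBzP":  (0.803,  0.29),
    "lMEHHP": (0.880, -0.42),
    "lMEHP":  (0.779, -0.49),
    "lMEOHP": (0.858, -0.46),
    "lMiBP":  (0.826,  0.32),
    "lMEP":   (0.617,  0.42),
    "lMMP":   (0.729,  0.21),
}
p = len(loadings)

# Communality h_i^2 = sum of squared loadings in row i; psi_i = 1 - h_i^2
# because the analysis is on the correlation matrix (unit variances).
communality = {v: l1**2 + l2**2 for v, (l1, l2) in loadings.items()}
specific = {v: 1.0 - h2 for v, h2 in communality.items()}

# Sum of squared loadings per factor and the share of total variance (= p).
ss_loadings = [sum(l[j]**2 for l in loadings.values()) for j in range(2)]
prop_var = [ss / p for ss in ss_loadings]
cum_var = [sum(prop_var[:j + 1]) for j in range(2)]

for v in loadings:
    print(f"{v:8s} h2 = {communality[v]:.3f}  psi = {specific[v]:.3f}")
print("SS loadings:", [round(s, 2) for s in ss_loadings])
print("cumulative proportion:", [round(c, 3) for c in cum_var])
```

For lMEHP, whose last two entries are blank in the table, the rounded loadings give a communality of roughly 0.85 and a specific variance of roughly 0.15.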

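-Estimated loadings on factors do not change as the number of factors increases
-Diagonal elements of S (or R) exactly equal the diagonal elements of $\tilde{L}\tilde{L}' + \tilde{\Psi}$, but sample covariances may not be exactly reproduced
-Select the number of factors m to make the off-diagonal elements of the residual matrix $S - (\tilde{L}\tilde{L}' + \tilde{\Psi})$ small
-Contribution of the kth factor to total variance is: $\tilde{l}_{1k}^2 + \tilde{l}_{2k}^2 + \dots + \tilde{l}_{pk}^2 = \hat{\lambda}_k$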

Principal Factor Method
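Consider the model: R = LL’ + Ψ
Suppose initial estimates are available for the communalities or specific variances

Then the reduced correlation matrix Rr = R − Ψ (R with its diagonal replaced by the communalities) is approximated by LL’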

Apply procedure iteratively
1. Start with initial estimates of the specific variances (or communalities)
2. Compute factor loadings from eigenvalues/vectors of Rr
3. Compute new values of the specific variances
4. Repeat steps 2 and 3 until algorithm converges
Problems:
-some eigenvalues of Rr can be negative
-choice of m (m too large, some communalities > 1 and iteration terminates)
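The iteration above can be sketched as follows. This is a minimal version under stated assumptions: NumPy is available, the starting communalities are squared multiple correlations (one common choice; the slide's own starting values were not preserved), and the function name `principal_factor` and the test matrix are hypothetical.

```python
import numpy as np

def principal_factor(R, m, tol=1e-8, max_iter=1000):
    """Iterated principal factor sketch for a correlation matrix R."""
    # Assumed starting values: squared multiple correlations, 1 - 1/r^{ii}.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)                  # reduced correlation matrix
        eigvals, eigvecs = np.linalg.eigh(Rr)
        top = np.argsort(eigvals)[::-1][:m]       # indices of the m largest eigenvalues
        if np.any(eigvals[top] <= 0):
            raise ValueError("non-positive eigenvalue among the first m; reduce m")
        L = eigvecs[:, top] * np.sqrt(eigvals[top])
        h2_new = np.sum(L**2, axis=1)             # updated communalities
        if np.any(h2_new > 1):
            raise ValueError("communality above 1; reduce m")
        converged = np.max(np.abs(h2_new - h2)) < tol
        h2 = h2_new
        if converged:
            break
    return L, 1.0 - h2                            # loadings and specific variances

# Hypothetical correlation matrix generated by a single factor with known loadings.
true_l = np.array([0.9, 0.8, 0.7, 0.6])
R = np.outer(true_l, true_l)
np.fill_diagonal(R, 1.0)

L, psi = principal_factor(R, m=1)
print(np.round(np.abs(L[:, 0]), 3), np.round(psi, 3))
```

The two error branches correspond to the two problems listed above: negative eigenvalues of Rr and communalities that exceed 1 when m is too large.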

Principal factor method m = 2 factor solution:

Variables       Factor 1  Factor 2  ψ      h2
lMBP            0.880      0.334    0.115  0.885
lMBzP           0.766      0.295    0.326  0.673
lMEHHP          0.902     -0.406    0.021  0.979
lMEHP           0.754     -0.360    0.302  0.698
lMEOHP          0.882     -0.459    0.012  0.988
lMiBP           0.801               0.247  0.753
lMEP            0.550      0.251    0.634  0.366
lMMP            0.668      0.176    0.522  0.478
SS Loading      4.91       0.91
Proportion Var  0.614      0.114

Check how closely the model estimated R

Maximum Likelihood Method
Likelihood function needed and additional assumptions made:
Additional restriction specifying unique solution
MLE’s are:

Maximum Likelihood Method
For m factors:
-estimated communalities
-proportion of the total sample variance due to kth factor
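The formulas for the maximum likelihood slides were images and are missing. The usual textbook versions are sketched below, assuming F and ε are jointly normal so that X is multivariate normal with Σ = LL’ + Ψ. The notation (script L for the likelihood, Δ for the diagonal matrix in the uniqueness condition) is assumed here and may differ from the slides.

```latex
% Normal-theory likelihood with the factor structure imposed on \Sigma
\mathcal{L}(\mu, \Sigma) = (2\pi)^{-np/2}\, |\Sigma|^{-n/2}
  \exp\!\Big[-\tfrac{1}{2} \sum_{j=1}^{n} (x_j - \mu)' \Sigma^{-1} (x_j - \mu)\Big],
  \qquad \Sigma = LL' + \Psi

% Restriction that removes the rotational indeterminacy of L
L' \Psi^{-1} L = \Delta \quad \text{(a diagonal matrix)}

% Estimated communalities and proportion of total sample variance due to factor k
\hat{h}_i^2 = \hat{l}_{i1}^2 + \hat{l}_{i2}^2 + \dots + \hat{l}_{im}^2,
\qquad
\frac{\hat{l}_{1k}^2 + \hat{l}_{2k}^2 + \dots + \hat{l}_{pk}^2}{s_{11} + s_{22} + \dots + s_{pp}}
```

The estimates of L and Ψ have no closed form and are obtained by numerical maximization; the estimate of μ is the sample mean.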

Maximum likelihood method m = 2 factor solution:

Variables       Factor 1  Factor 2  ψ
lMBP            0.675      0.669    0.097
lMBzP           0.579      0.592    0.315
lMEHHP          0.989     -0.010    0.021
lMEHP           0.834     -0.034    0.303
lMEOHP          0.993     -0.071    0.010
lMiBP           0.609      0.607    0.261
lMEP            0.399      0.439    0.648
lMMP            0.542      0.417    0.532
SS Loading      4.27       1.54
Proportion Var  0.534      0.192

Check how closely the model estimated R
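The slide's own check was not preserved. One way to carry it out is to form the residual matrix R − (LL’ + Ψ) and look at its off-diagonal entries. The sketch below does this with the correlation matrix and the m = 2 maximum likelihood loadings as printed in these notes, assuming NumPy is available. The printed loadings are rounded, so small discrepancies are expected.

```python
import numpy as np

names = ["MBP", "MBZP", "MEHHP", "MEHP", "MEOHP", "MIBP", "MEP", "MMP"]

# Upper triangle of the sample correlation matrix R, row by row, as printed
# on the correlation-matrix slide.
upper = [
    [0.802, 0.713, 0.581, 0.679, 0.848, 0.584, 0.638],
    [0.623, 0.531, 0.591, 0.720, 0.520, 0.561],
    [0.821, 0.984, 0.661, 0.444, 0.537],
    [0.834, 0.523, 0.369, 0.446],
    [0.635, 0.420, 0.510],
    [0.574, 0.600],
    [0.419],
]
p = len(names)
R = np.eye(p)
for i, row in enumerate(upper):
    R[i, i + 1:] = row                 # fill row i to the right of the diagonal
    R[i + 1:, i] = row                 # mirror into column i below the diagonal

# Maximum likelihood loadings (Factor 1, Factor 2) and specific variances
# from the m = 2 table above.
L = np.array([[0.675,  0.669], [0.579,  0.592], [0.989, -0.010], [0.834, -0.034],
              [0.993, -0.071], [0.609,  0.607], [0.399,  0.439], [0.542,  0.417]])
psi = np.array([0.097, 0.315, 0.021, 0.303, 0.010, 0.261, 0.648, 0.532])

fitted = L @ L.T + np.diag(psi)        # model-implied correlation matrix
residual = R - fitted

off = residual - np.diag(np.diag(residual))     # zero out the diagonal
i, j = np.unravel_index(np.argmax(np.abs(off)), off.shape)
max_abs = np.abs(off[i, j])
print(f"largest off-diagonal residual: {max_abs:.3f} ({names[i]}, {names[j]})")
```

Small off-diagonal residuals indicate that two factors reproduce the observed correlations reasonably well; large ones point to pairs of variables the model does not capture.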

Large Sample Test for number of factors
We want to be able to decide if the number of common factors m we’ve chosen is sufficient
So if n is large, we do hypothesis testing:
We can use estimates in our hypothesis statement…

Large Sample Test for number of factors
From this we develop a likelihood ratio test:
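The test statistic on this slide is missing. A common large-sample form, assumed here, tests H0: Σ = LL’ + Ψ with m factors using Bartlett's correction: reject when [n − 1 − (2p + 4m + 5)/6] · ln(|LL’ + Ψ| / |Sn|) exceeds the upper chi-square quantile with [(p − m)² − p − m]/2 degrees of freedom, where L and Ψ are the maximum likelihood estimates. The helper names below are hypothetical.

```python
import math

def lrt_degrees_of_freedom(p, m):
    """Degrees of freedom [(p - m)^2 - p - m] / 2 for testing m common factors."""
    return ((p - m) ** 2 - p - m) / 2

def bartlett_factor(n, p, m):
    """Bartlett-corrected multiplier n - 1 - (2p + 4m + 5) / 6."""
    return n - 1 - (2 * p + 4 * m + 5) / 6

def lrt_statistic(n, p, m, det_fitted, det_sample):
    """Test statistic from |L L' + Psi| (fitted) and |S_n| (sample)."""
    return bartlett_factor(n, p, m) * math.log(det_fitted / det_sample)

n, p = 297, 8   # sample size and number of metabolites in the phthalate data
for m in (1, 2, 3):
    print(m, lrt_degrees_of_freedom(p, m), round(bartlett_factor(n, p, m), 3))
```

The degrees of freedom must be positive for the test to apply, which limits how large m can be for a given p. Converting the statistic to a p-value needs a chi-square tail probability, which is not included in this sketch.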

Test Results
What does it mean if we reject the null hypothesis?
-Not an adequate number of factors
Test results for our phthalate metabolite data for each approach (from R)

Number of Factors  PC Approach    PF Approach  ML Approach
1                  5.8 x 10^-53
2                  1.4 x 10^-9    0.22         0.36
3                  4.6 x 10^-9    0.84         0.88

Test Results
Problem with the test
-If n is large and m is small compared to p, this test will very often reject the null
-The result is that we tend to want to keep more factors
-This can defeat the purpose of factor analysis
-Exercise caution when using this test
