
1 About Factor Analysis February 5, 2007

2 Terminology
Factor Analysis (generic term):
  o Principal Component Analysis (PCA)
  o Factor Analysis: Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA)
Factor analysis, as a generic term, refers to a family of statistical techniques concerned with reducing a set of observable variables to a small number of factors/components.

3 Principal Component Analysis
A variable reduction technique
Used when variables are highly correlated
Reduces the number of observed variables to a smaller number of principal components which account for most of the variance of the observed variables
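The reduction described on this slide can be made concrete with the smallest possible case. The following is a minimal Python sketch (not SAS, and not part of the original slides) for two standardized variables with an assumed correlation of 0.8; the function name `pca_two_variables` is hypothetical. It relies on the standard result that the 2x2 correlation matrix has eigenvalues 1 + |r| and 1 - |r|.

```python
import math

def pca_two_variables(r):
    """PCA of two standardized variables whose correlation is r."""
    # The correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + |r| and 1 - |r|.
    eigenvalues = [1 + abs(r), 1 - abs(r)]
    # Unit-length weights (eigenvector) of the first principal component.
    w = 1 / math.sqrt(2)
    weights = (w, math.copysign(w, r))
    # Total variance equals the number of variables (2), so divide by 2.
    proportions = [ev / 2 for ev in eigenvalues]
    return eigenvalues, weights, proportions

# Assumed correlation of 0.8 between the two observed variables.
eigenvalues, weights, proportions = pca_two_variables(0.8)
print(eigenvalues)   # the first component carries about 1.8 of the 2 units of variance
print(proportions)   # roughly 0.9 and 0.1
```

With highly correlated variables, a single component accounts for most of the variance, which is the situation in which PCA is useful.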

4 Exploratory Factor Analysis
A variable reduction technique
Identifies the number of underlying latent factors and the factor structure of a set of variables
Provides an explanation of the relationships among observed variables in terms of factors

5 PCA and EFA
Variable reduction methods
Identify groups of observed variables that tend to hang together empirically
Performed by PROC FACTOR
Sometimes even provide very similar results

6 PCA vs. EFA
PCA:
  o Identifies a smaller number of composite variables (principal components, artificial variables) to account for most of the variance present in the observed variables
  o PCs retained account for a maximal amount of total variance of observed variables
EFA:
  o Identifies the number of underlying latent factors (cannot be directly measured) and factor structure of a set of variables
  o Factors account for common variance of the observed variables

7 PCA vs. EFA
PCA:
  o Diagonals of the correlation matrix are equal to one
  o Component scores are linear combinations of the observed variables weighted by eigenvectors
EFA:
  o Diagonals of the correlation matrix are adjusted for unique factors (communalities)
  o Observed variables are linear combinations of the underlying factors.
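The contrast on this slide can be written out as two equations. The notation below is introduced here for illustration and does not come from the slides: x_j are the p standardized observed variables, w_jk the eigenvector weights, F_k the m common factors (assumed uncorrelated with unit variance), lambda_jk the factor loadings, and u_j the unique factor of variable j (assumed uncorrelated with the common factors).

```latex
\begin{align*}
  % PCA: component k is a weighted sum of the observed variables
  C_k &= \sum_{j=1}^{p} w_{jk}\, x_j \\
  % EFA: observed variable j is a weighted sum of common factors plus a unique factor
  x_j &= \sum_{k=1}^{m} \lambda_{jk}\, F_k + u_j \\
  % Communality of x_j under the assumptions stated above
  h_j^2 &= \sum_{k=1}^{m} \lambda_{jk}^2 = 1 - \operatorname{Var}(u_j)
\end{align*}
```

Under these assumptions the communality h_j^2 is the quantity that replaces the 1 on the diagonal of the correlation matrix in EFA.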

8 PCA vs. EFA
PCA:
  proc factor method=prin priors=one;
  or
  proc princomp;
EFA:
  proc factor method=ml priors=smc;

9 PRINCOMP vs. FACTOR
PROC PRINCOMP has the following advantages over PROC FACTOR:
  o Slightly faster if a small number of components is requested.
  o Can analyze somewhat larger problems in a fixed amount of memory.
  o Can output scores from an analysis of a partial correlation or covariance matrix.
  o Simpler to use.
PROC FACTOR has the following advantages over PROC PRINCOMP for PCA:
  o Produces more output, including the scree (eigenvalue) plot, pattern matrix, and residual correlations.
  o Does rotations.

10 Communalities
Communality measures the proportion of the variance of an observed variable that is shared with the other variables;
EFA begins by replacing the diagonal of the correlation matrix with what are called prior communality estimates, set by the PRIORS= option of the PROC FACTOR statement:
  o PRIORS=MAX uses the largest absolute correlation of a variable with any other variable as the communality estimate for that variable;
  o PRIORS=SMC uses the squared multiple correlation between the variable and all other variables;
  o SAS by default sets all prior communalities to 1.0, which is the same as requesting a PCA.
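The two prior estimates named above can be computed by hand for a small matrix. This is a Python sketch of the arithmetic only, not of what PROC FACTOR does internally; the helper names and the 3x3 correlation matrix are hypothetical. It uses the standard identity that the squared multiple correlation of variable j equals 1 - 1/(j-th diagonal element of the inverse correlation matrix).

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest pivot, then scale it to 1.
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]

def priors_max(R):
    """Largest absolute correlation of each variable with any other variable."""
    n = len(R)
    return [max(abs(R[i][j]) for j in range(n) if j != i) for i in range(n)]

def priors_smc(R):
    """Squared multiple correlation of each variable with all the others."""
    inv = invert(R)
    return [1 - 1 / inv[i][i] for i in range(len(R))]

# Hypothetical correlation matrix: three variables, all correlated 0.5.
R = [[1.0, 0.5, 0.5],
     [0.5, 1.0, 0.5],
     [0.5, 0.5, 1.0]]
max_priors = priors_max(R)
smc_priors = priors_smc(R)
print(max_priors)
print([round(s, 4) for s in smc_priors])
```

For this matrix the MAX estimate is 0.5 for every variable, while the SMC estimate is one third, so the choice of prior changes the diagonal that the extraction starts from.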

11 Confirmatory Factor Analysis
Confirmatory factor analysis allows you to test very specific hypotheses regarding the number of factors, factor loadings, and factor intercorrelations;
Performed by PROC CALIS;
Specify the structure of three matrices a priori, before the data analysis:
  1) the factor loading matrix
  2) the factor intercorrelation matrix
  3) the unique variance matrix
It is more complex to run than EFA;
The assumptions underlying confirmatory factor analysis, as well as the interpretation of the output, can be exceedingly complex.

12 Proc Calis
PROC CALIS METHOD = LSML ALL NOMOD ;
  VAR Item1-Item6;
  FACTOR HEYWOOD N = 2; /* N = 2 specifies a two factor solution ; */
  MATRIX _F_
    {1,1} = Item1F1 (.80), {1,2} = Item1F2 (.20),
    {2,1} = Item2F1 (.80), {2,2} = Item2F2 (.20),
    {3,1} = Item3F1 (.80), {3,2} = Item3F2 (.20),
    {4,1} = Item4F1 (.20), {4,2} = Item4F2 (.80),
    {5,1} = Item5F1 (.20), {5,2} = Item5F2 (.80),
    {6,1} = Item6F1 (.20), {6,2} = Item6F2 (.80) ;
  /* The matrix _F_, the factor loading matrix;*/
  Matrix _P_
    {1, 1} = 1.0, {2, 2} = 1.0, {2, 1} =.60 ;
  /* The _P_, the factor intercorrelation matrix; defaults to an identity matrix;*/
  Matrix _U_
    {1, 1} = Theta1-Theta6 6*.10 ;
  /* Matrix _U_ is the unique variance matrix; */
RUN ;

13 Combining EFA and CFA
Use EFA if you do not have strong theory about the underlying constructs (theory-generating); CFA otherwise (theory-testing);
On separate data sets: use an EFA to generate a theory about the underlying constructs, then test the structure of the extracted factors with a CFA;
Example: The Coach-Athlete Relationship Questionnaire (CART-Q): development and initial validation.
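The slide asks for separate data sets. When only one sample is available, one common way to obtain two is to split it at random; that choice is an assumption made here, not something the slides prescribe. A minimal Python sketch, with a hypothetical function name and an arbitrary seed:

```python
import random

def split_sample(subject_ids, seed=2007):
    """Randomly split subjects into an EFA half and a CFA half."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Ten hypothetical subject IDs.
efa_ids, cfa_ids = split_sample(range(1, 11))
print(sorted(efa_ids))
print(sorted(cfa_ids))
```

The EFA is then run on the first half only, and the factor structure it suggests is tested with a CFA on the second half.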

14 Steps of PCA/EFA
Initial extraction of the components/factors;
Determining the number of “meaningful” components/factors to retain;
Rotation to a final solution;
Interpreting the rotated solution;
Creating factor scores or factor-based scores for further analysis;
Summarizing the results.

15 Determining the number of factors/components
Kaiser-Guttman Rule/Eigenvalue-greater-than-one rule;
  o An eigenvalue represents the amount of variance that is accounted for by a given factor/component.
  o More appropriate for PCA than EFA. For EFA, it is adjusted by communalities.
  o PROC FACTOR has the option MINEIGEN= to specify the cutoff.
Scree test;
  o Plot the eigenvalues against the corresponding factor numbers
  o Look for the “elbow”
Cumulative proportion of variance explained, e.g., at least 70%;
Interpretability of the factors/components extracted;
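Two of these criteria are simple arithmetic on the eigenvalues. The Python sketch below applies the eigenvalue-greater-than-one rule and the 70% cumulative-variance criterion to a hypothetical set of six eigenvalues (they sum to 6, as the eigenvalues of a 6-variable correlation matrix would in PCA). The function names and the `cutoff` and `target` parameters are illustrative and are not PROC FACTOR options.

```python
def kaiser_count(eigenvalues, cutoff=1.0):
    """Number of components whose eigenvalue exceeds the cutoff."""
    return sum(1 for ev in eigenvalues if ev > cutoff)

def cumulative_proportions(eigenvalues):
    """Running share of total variance explained by the first k components."""
    total = sum(eigenvalues)
    running, shares = 0.0, []
    for ev in eigenvalues:
        running += ev
        shares.append(running / total)
    return shares

def components_for_target(eigenvalues, target=0.70):
    """Smallest number of components reaching the target cumulative proportion."""
    for k, share in enumerate(cumulative_proportions(eigenvalues), start=1):
        if share >= target:
            return k
    return len(eigenvalues)

# Hypothetical eigenvalues, sorted from largest to smallest.
eigenvalues = [2.5, 1.5, 0.75, 0.5, 0.5, 0.25]
n_kaiser = kaiser_count(eigenvalues)
n_target = components_for_target(eigenvalues)
print(n_kaiser, n_target)
print([round(s, 3) for s in cumulative_proportions(eigenvalues)])
```

In this example the two rules disagree (two components by the eigenvalue rule, three to reach 70%), which is why the scree plot and interpretability are consulted as well.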

16 Interpretability
Factor Pattern Matrix/factor loadings:
  In an orthogonal analysis, factor loadings are equivalent to bivariate correlations between the observed variables and the retained factors/components.
A rule of thumb: factor loadings greater than .40 in absolute value are considered to be significant.
Simple structure: each variable has relatively high factor loadings on only one component, and near zero loadings on the others.
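Because loadings in an orthogonal solution are correlations, their squares can be read as shares of variance. The Python sketch below uses a hypothetical pattern matrix (4 items, 2 factors) to compute each item's communality, the variance accounted for by each factor, and a check of the .40 rule of thumb and of simple structure. None of the numbers come from the slides.

```python
# Hypothetical orthogonal factor pattern: rows are items, columns are factors.
loadings = [
    [0.80, 0.10],
    [0.70, 0.20],
    [0.10, 0.60],
    [0.30, 0.50],
]

# Communality of an item: sum of its squared loadings across factors.
communalities = [sum(l * l for l in row) for row in loadings]

# Variance accounted for by a factor: sum of squared loadings down its column.
n_factors = len(loadings[0])
variance_by_factor = [sum(row[k] ** 2 for row in loadings) for k in range(n_factors)]

# Rule of thumb: a loading above .40 in absolute value counts as significant.
salient = [[abs(l) > 0.40 for l in row] for row in loadings]

# Simple structure (in the loose sense used here): exactly one salient loading per item.
simple_structure = all(sum(flags) == 1 for flags in salient)

print([round(h, 2) for h in communalities])
print([round(v, 2) for v in variance_by_factor])
print(simple_structure)
```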


18 Rotation to a Final Solution
The factor pattern matrix is not unique.
Rotating the reference axes of the factor solution simplifies the factor structure and makes the final solution easier to interpret.
Orthogonal rotation: VARIMAX, EQUAMAX, ORTHOMAX…
Oblique rotation: PROCRUSTES, PROMAX
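The non-uniqueness can be seen by rotating a two-factor pattern through a fixed angle. This Python sketch is an illustration only: the loadings are hypothetical and the 45-degree angle is chosen by hand, whereas methods such as VARIMAX pick the rotation by optimizing a simplicity criterion. An orthogonal rotation changes the individual loadings but leaves every communality unchanged.

```python
import math

def rotate(loadings, degrees):
    """Rotate a two-factor loading matrix by a fixed angle (orthogonal rotation)."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]

def communalities(loadings):
    """Sum of squared loadings for each item."""
    return [sum(l * l for l in row) for row in loadings]

# Hypothetical unrotated pattern: a general first factor and a bipolar second factor.
unrotated = [[0.5, 0.5], [0.6, 0.6], [0.5, -0.5], [0.4, -0.4]]
rotated = rotate(unrotated, 45)

for before, after in zip(unrotated, rotated):
    print(before, [round(l, 3) for l in after])
print([round(h, 3) for h in communalities(unrotated)])
print([round(h, 3) for h in communalities(rotated)])
```

After the rotation each item loads on essentially one factor, which is the simpler structure that rotation aims for.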

19 Interpretation of Factors
Identifying significant loadings, that is, determining what construct seems to be measured by factor 1, what construct seems to be measured by factor 2 …
Unique loadings of 0.40 and above, with cross-loading differences of at least 0.10
Naming factors: what do these variables have in common?
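The loading criteria on this slide can be applied mechanically before the factors are named. In this Python sketch the item labels and loadings are hypothetical, and the function name `assign_items` is introduced here. An item is assigned to a factor only if its largest absolute loading is at least 0.40 and exceeds its next-largest loading by at least 0.10.

```python
def assign_items(loadings, min_loading=0.40, min_gap=0.10):
    """Map each item to its factor number (1-based), or None if it fails the criteria."""
    assignment = {}
    for item, row in loadings.items():
        # Rank factor indices by absolute loading, largest first.
        ranked = sorted(range(len(row)), key=lambda k: abs(row[k]), reverse=True)
        primary = abs(row[ranked[0]])
        runner_up = abs(row[ranked[1]]) if len(row) > 1 else 0.0
        if primary >= min_loading and primary - runner_up >= min_gap:
            assignment[item] = ranked[0] + 1
        else:
            assignment[item] = None
    return assignment

# Hypothetical rotated loadings for four items on two factors.
loadings = {
    "Q1": [0.72, 0.12],   # clean loading on factor 1
    "Q2": [0.45, 0.41],   # cross-loads: gap below 0.10
    "Q3": [0.15, 0.66],   # clean loading on factor 2
    "Q4": [0.30, 0.25],   # no loading reaches 0.40
}
assignment = assign_items(loadings)
print(assignment)
```

Items left unassigned are candidates for removal or for a different number of factors; naming a factor then comes down to asking what its assigned items have in common.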

20 Factor Scores
Assign scores to each subject to indicate where that subject stands on each retained factor/component for further analysis.
A factor score is a linear composite of the optimally weighted observed variables. If requested, PROC FACTOR will compute each subject’s factor scores.
A factor-based score is a linear composite of the variables that demonstrated meaningful loadings for the factor/component in question, e.g., their sum.
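The difference between the two kinds of score is whether every variable enters with its own weight or only the salient variables enter with equal weight. The Python sketch below shows both for one hypothetical subject. The responses, standardized values, and scoring coefficients are made up for illustration and are not output from PROC FACTOR.

```python
def factor_based_score(responses, items):
    """Unweighted sum of the items that load meaningfully on the factor."""
    return sum(responses[item] for item in items)

def weighted_composite(z_scores, coefficients):
    """Factor score: every standardized variable times its scoring coefficient."""
    return sum(coefficients[item] * z_scores[item] for item in coefficients)

# One hypothetical subject: raw responses and their standardized values.
responses = {"Q1": 4, "Q2": 5, "Q3": 2, "Q4": 1}
z_scores = {"Q1": 0.5, "Q2": 1.0, "Q3": -0.5, "Q4": -1.0}

# Assume Q1 and Q2 showed meaningful loadings on factor 1.
based = factor_based_score(responses, ["Q1", "Q2"])

# Hypothetical scoring coefficients for factor 1 (all variables contribute).
coefficients = {"Q1": 0.4, "Q2": 0.4, "Q3": 0.1, "Q4": 0.0}
score = weighted_composite(z_scores, coefficients)

print(based)
print(round(score, 2))
```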

21 Example: Childhood Trauma Questionnaire
ARYS, N=438, 28 questions/measured variables;
The scale ranged from 1 to 5:
  1 - never true
  2 - rarely true
  3 - sometimes true
  4 - often true
  5 - very often true
Excluded Q 10, 16, 22;
Reversed scores for Q 2, 5, 7, 13, 19, 26, 28;
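The preparation step on this slide can be sketched in Python as follows. The excluded and reversed question numbers are taken from the slide. The reversal formula (6 minus the answer, so that 1 becomes 5 and 5 becomes 1) is the usual convention for a 1-to-5 scale and is an assumption here, since the slides do not state how the reversal was done. The function name `prepare` is hypothetical.

```python
EXCLUDED = {10, 16, 22}
REVERSED = {2, 5, 7, 13, 19, 26, 28}

def prepare(responses):
    """Drop excluded questions and reverse-score the positively worded ones.

    responses maps a question number to a raw answer on the 1..5 scale.
    """
    prepared = {}
    for question, answer in responses.items():
        if question in EXCLUDED:
            continue
        if answer not in range(1, 6):
            raise ValueError(f"answer out of range for Q{question}: {answer}")
        # Assumed reversal: 1 becomes 5, 2 becomes 4, and so on.
        prepared[question] = 6 - answer if question in REVERSED else answer
    return prepared

# Hypothetical raw answers for a few questions.
raw = {1: 1, 2: 5, 7: 2, 10: 3}
print(prepare(raw))
```

After this step a higher score points in the same direction on every retained item, which keeps the signs of the loadings easier to interpret.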

22 When I was growing up …
1. I didn’t have enough to eat.
2. I knew that there was someone to take care of me and protect me.
3. People in my family called me things like “stupid”, “lazy”, or “ugly”.
4. My parents were too drunk or high to take care of the family.
5. There was someone in my family who helped me feel that I was important or special.
6. I had to wear dirty clothes.
7. I felt loved.
8. I thought that my parents wished I had never been born.
9. I got hit so hard by someone in my family that I had to see a doctor or go to the hospital.
10. There was nothing I wanted to change about my family.
11. People in my family hit me so hard that it left me with bruises or marks.
12. …
26. There was someone to take me to the doctor if I needed it.
27. …

23 Correlation Matrix

24 Method 1: EFA without Rotation
PROC FACTOR DATA=trauma METHOD=ml SCREE PRIORS=smc;
  VAR nq:;
RUN;
Method is maximum likelihood
Scree plot of eigenvalues
Diagonals of the correlation matrix are equal to squared multiple correlations

25 Table 1: Eigenvalues of the Weighted Reduced Correlation Matrix.

26 Scree Plot

27 Table 2: Unrotated Factor Pattern Matrix.

28 Method 2: EFA with Orthogonal Rotation
PROC FACTOR DATA=trauma METHOD=ml ROTATE=varimax N=4 OUT=factscore PRIORS=smc;
  VAR nq:;
RUN;
Orthogonal rotation method VARIMAX
4 factors retained
Original data and factor scores output to dataset factscore

29 Table 3: Rotated Factor Pattern matrix by orthogonal method VARIMAX (number of factors=4).

30 Table 4: Rotated Factor Pattern matrix by orthogonal method VARIMAX (number of factors=5).

31 Method 3: EFA with Oblique Rotation
PROC FACTOR DATA=trauma METHOD=ml ROTATE=promax N=5 OUT=factscore PRIORS=smc;
  VAR nq:;
RUN;
Oblique rotation method PROMAX
5 factors retained

32 Table 5: Rotated Factor Pattern matrix by oblique method PROMAX.

33 Plots of Factor Pattern for Factor1 and Factor2

34 Table 6: Rotated Factor Pattern matrix by oblique method PROMAX after removing Q8.

35 Table 7: Inter-Factor Correlations by oblique method PROMAX after removing Q8.

36 Table 8: Scoring Direction for Childhood Trauma Questionnaire.


38 PCA or EFA
PCA deals with correlated variables, with the purpose of reducing the number of variables and explaining a large amount of the variance with few variables
EFA estimates factors, underlying constructs that cannot be measured directly
Do not run both. Select the appropriate analysis first.

39 Assumptions & Limitations
No outliers;
Variables have to be correlated, interval-scaled;
Linearity;
Normal distribution;
Sufficient number of observed variables;
Sufficient number of observations to provide reliable estimations of the correlations;
Sometimes arbitrary and subjective decisions have to be made.

40 Related Topics
Nonlinear factor analysis;
Factor analysis of ordinal/categorical variables;
Principal components of qualitative data (PROC PRINQUAL);
Assess reliability by computing coefficient alpha: an index of internal consistency reliability.
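Coefficient alpha, mentioned in the last item, is computed from the item variances and the variance of the total score. The Python sketch below uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of totals), with sample variances from the standard library. The scores are hypothetical (three items answered by five respondents) and the function name is introduced here.

```python
from statistics import variance

def cronbach_alpha(items):
    """Coefficient alpha for a list of items, each a list of scores per respondent."""
    k = len(items)
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical scores: three items, five respondents.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

In the workflow of these slides, alpha would typically be computed on the items that make up each factor-based score.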

41 References
Factor Analysis Using SAS PROC FACTOR, by the University of Texas at Austin Statistical Services
http://www.ats.ucla.edu/stat/sas/library/factor_ut.htm
Principal Component Analysis vs. Exploratory Factor Analysis, by Diana D. Suhr, University of Northern Colorado
http://support.sas.com/publishing/pubcat/chaps/55129.pdf
A Tutorial on Principal Components Analysis, by Lindsay I Smith
Basic Concepts and Procedures of Confirmatory Factor Analysis, by Connie D. Stapleton
SAS/STAT OnlineDoc 9.1.3
Sophia Jowett, Nikos Ntoumanis (2004), The Coach-Athlete Relationship Questionnaire (CART-Q): development and initial validation
http://www.utexas.edu/its/rc/answers/sas/sas26.html

42 Thank you …


46
Most skiing accidents happen on sunny days on easy slopes.
The percentage of head injuries in skiing has gone up.
The current injury rate in Scotland is 2.24 injuries per 1000 skier/boarder days; 1.74 injuries per 1000 skier days; 3.55 injuries per 1000 boarder days.
Alpine skiers are three times more likely to be involved in a collision with other people than snowboarders.
Both drivers and passengers in SUVs are more likely to die in accidents than those in compact cars.
Traffic accidents account for about 10,000 deaths a year in Japan compared to 30,000+ deaths due to suicide.

