Factor Rotation & Factor Scores: Interpreting & Using Factors
- Well- & Ill-defined Factors
- Simple Structure
- Simple Structure & Factor Rotation
- Major Kinds of Factor Rotation
- Factor Interpretation
- Proper & Improper Factor Scores
- Uses of Factor Scores (interpretation & representation)
- Designing a factor analysis study (& the next one)
How the process really works…
- Here's the series of steps we talked about earlier: the # factors decision, then interpreting the factors, then factor scores
- These decisions aren't made independently in this order!
  - Considering the interpretations of the factors can aid the # factors decision!
  - Considering how the factor scores (representing the factors) relate to each other and to variables external to the factoring can aid both the # factors decision and the interpretation of the factors
- Remember that this is Exploratory Factor Analysis! Exploring means trying alternatives (# factor rules, rotations, cutoffs)
  - If those alternatives agree, we're pretty confident in the agreed-upon solution
  - If they do not agree, we must select the “best” exploratory solution for these data and then replicate and converge to see if it continues to look like the “best solution”
Reminder of Goals & Process of Factoring Research
- Remember that multiple programmatic studies and convergent findings are essential in factoring research (like all other kinds)
- Many studies generate more questions than answers
- When factoring we often learn about …
  - The factors or composite variables that can be formed: their interpretations, meaning and utility
  - The variables you started with, which often are different or more complex than is implied by their names
    - multi-vocal items, especially unexpected ones, are often an indication that you have something to learn about that variable; it may be more interesting or complex than you thought
Kinds of well-defined factors
- There is a trade-off between “parsimony” and “specificity” whenever we are factoring
- This trade-off influences both the #-of-factors and cutoff decisions, both of which, in turn, influence factor interpretation
  - general and “larger” group factors include more variables and account for more variance, so they are more parsimonious
  - unique and “smaller” group factors include fewer variables and may be more focused, so they are often more specific
- Preferences really depend upon...
  - what you are expecting
  - what you are trying to accomplish with the factoring
Kinds of ill-defined factors
- Unique factors
  - hard to know what construct is represented by a 1-variable factor
  - especially if that variable is multi-vocal
  - then the factor is defined by “part” of that single variable, but maybe not the part defined by its name
- Group factors can be ill-defined
  - “odd combinations” can be hard to interpret, especially later factors comprised of multi-vocal variables (knowledge of variables & population is very important!)
Reasons for ill-defined factors
- Ill-defined factors are particularly common when factoring a “closed set” of variables
  - especially when that set was chosen to be “efficient” and so the variables have low intercorrelations
- When there is a general or large group factor, be careful about subsequent smaller group factors
  - they may be “left-over” parts of multi-vocal variables
  - factors may not represent the “named” parts of the vars
- Keeping & rotating “too many” factors will increase the chances of finding ill-defined factors
Simple Structure
- The idea of simple structure is very appealing...
  - Each factor of any solution should have an unambiguous interpretation, because the variable loadings for each factor should be simple and clear
- There have been several different characterizations of this idea, and varying degrees of success with translating those characterizations into mathematical operations and objective procedures
- Here are some of the most common
Components of Simple Structure
- Each factor should have several variables with strong loadings
  - admonition for well-defined factors
  - remember that “strong” loadings can be “+” or “-”
- Each variable should have a strong loading for only one factor
  - admonition against multi-vocal items
  - admonition for conceptually separable factors
  - admonition that each variable should “belong” to some factor
- Each variable should have a large communality
  - implying that its membership “accounts” for its variance
The benefit of “simple structure”?
- Remember that we're usually factoring to find “groups of variables”
- But the extraction process is trying to “reproduce variance”
  - the factor plot often looks simpler than the structure matrix
[Figure: structure matrix (rows V1-V4, columns PC1 and PC2) beside a factor plot of V1-V4 on the PC1 and PC2 axes]
- True, this gets more complicated with more variables and factors, but “simple structure” is basically about “seeing” in the structure matrix what is apparent in the plot
How rotation relates to “Simple Structure”
- Factor rotations, which change the “viewing angle” of the factor space, have been the major approach to providing simple structure
  - structure is “simplified” if the factor vectors “spear” the variable clusters
[Figure: unrotated and rotated structure matrices (rows V1-V4, columns PC1 and PC2) beside a factor plot of V1-V4 showing the extracted axes PC1, PC2 and the rotated axes PC1', PC2']
Major Types of Rotation
- Remember: extracted factors are orthogonal (uncorrelated)
- Orthogonal Rotation: resulting factors are uncorrelated
  - more parsimonious & efficient, but less “natural”
- Oblique Rotation: resulting factors are correlated
  - more “natural” & better “spearing”, but more complicated
[Figure: two factor plots of V1-V4 with extracted axes PC1, PC2 and rotated axes PC1', PC2'. Orthogonal Rotation: the angle between the rotated axes is 90°. Oblique Rotation: the angle is less than 90°]
Major Types of Orthogonal Rotation & their “tendencies”
- Varimax: most commonly used and a common default
  - “simplifies factors” by maximizing the variance of the loadings of the variables on a factor (minimizes the # of vars with high loadings)
  - tends to produce group factors
- Quartimax
  - “simplifies variables” by maximizing the variance of the loadings of a variable across factors (minimizes the # of factors a var loads on)
  - tends to “move” vars from their extraction positions less than varimax
  - tends to produce a general factor & small group factors
- Equimax
  - designed to “balance” the varimax and quartimax tendencies
  - didn't work very well: the two can't be done simultaneously, and whichever is done first dominates the final structure
Major Types of Oblique Rotation & their “tendencies”
- Promax
  - computes the best orthogonal solution and then “relaxes” the orthogonality constraint to better “spear” variable clusters with factor vectors (giving simpler structure)
- Direct Oblimin
  - spears variable clusters as well as possible to produce the lowest occurrence of multi-vocality
- All oblique rotations have a parameter (δ, γ, κ) that sets the maximum correlation allowed between rotated factors
  - changing this parameter can “importantly” change the resulting rotation and interpretation
  - try at least a couple of values & look for consistency
Some things that are different (or not) when you use an Oblique Rotation
- Different things:
  - There will be a Φ (phi) matrix that holds the factor intercorrelations
  - The λ-values and variances accounted for by the rotated factors will be different from those of the extracted factors
    - compute λ for each factor by summing the squared structure loadings for that factor
    - compute the variance accounted for as the newly computed λ / k (k = the number of variables)
- Same things:
  - the communality of each variable will be the same, but it can't be computed by summing the squared structure loadings for each variable (since the factors are correlated)
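The bookkeeping for an oblique solution can be sketched as follows. The pattern loadings and the factor correlation are hypothetical numbers chosen for illustration. The sketch assumes the usual relations that structure loadings equal pattern loadings post-multiplied by Φ, and that a communality under correlated factors is the sum of pattern × structure products for that variable.

```python
# Hypothetical oblique solution: 4 variables, 2 correlated factors.
pattern = [(0.80, 0.05), (0.75, 0.10), (0.05, 0.85), (0.10, 0.70)]
phi = 0.40  # correlation between the two rotated factors (off-diagonal of Phi)

# Structure loadings = pattern loadings post-multiplied by Phi = [[1, phi], [phi, 1]].
structure = [(p1 + p2 * phi, p1 * phi + p2) for p1, p2 in pattern]

k = len(pattern)  # number of variables

# Lambda for each rotated factor: sum of squared structure loadings in that column.
lambdas = [sum(row[j] ** 2 for row in structure) for j in range(2)]
prop_variance = [lam / k for lam in lambdas]

# Communality with correlated factors: sum of pattern x structure products per variable.
communality = [p[0] * s[0] + p[1] * s[1] for p, s in zip(pattern, structure)]

# Summing squared structure loadings across a row double-counts the factor overlap.
naive_row_sums = [s[0] ** 2 + s[1] ** 2 for s in structure]

for i in range(k):
    print(f"V{i + 1}: communality={communality[i]:.3f}  naive row sum={naive_row_sums[i]:.3f}")
print("lambda per factor:", [round(lam, 3) for lam in lambdas])
print("proportion of variance:", [round(pv, 3) for pv in prop_variance])
```

With a positive factor correlation, the naive row sums come out larger than the communalities, which is why the row-sum shortcut only works for orthogonal solutions.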
Interpretation & Cut-offs
- Interpretation is the process of naming factors based on the variables that “load on” them
- Which variables “load” is decided based on a “cutoff”
  - cutoffs usually range from .3 to .4 (+ or -)
- Higher cutoffs limit the # of loading variables
  - factors may be ill-defined, and some variables may not load
- Lower cutoffs increase the # of loading variables
  - variables are more likely to be multi-vocal
- Worry & make a careful decision when your interpretation depends upon the cutoff that is chosen!!
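A small sketch of how the cutoff choice changes which variables “load”. The loadings and the `classify` helper are hypothetical, written only to illustrate the trade-off; the labels are this sketch's own.

```python
def classify(loadings, cutoff):
    """For each variable, list the factors it loads on (|loading| >= cutoff)."""
    result = {}
    for name, row in loadings.items():
        factors = [j + 1 for j, value in enumerate(row) if abs(value) >= cutoff]
        if not factors:
            label = "does not load"
        elif len(factors) == 1:
            label = "univocal"
        else:
            label = "multi-vocal"
        result[name] = (factors, label)
    return result

# Hypothetical rotated loadings for 4 variables on 2 factors.
loadings = {
    "V1": (0.72, 0.10),
    "V2": (0.65, 0.35),
    "V3": (-0.12, 0.81),
    "V4": (0.31, 0.28),
}

for cutoff in (0.30, 0.40):
    print(f"cutoff = {cutoff:.2f}")
    for name, (factors, label) in classify(loadings, cutoff).items():
        print(f"  {name}: factors {factors} ({label})")
```

In this example V2 is multi-vocal at .30 but univocal at .40, while V4 loads at .30 and drops out at .40. This is the kind of cutoff dependence that calls for a careful decision.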
Combining #-factors & Rotation to Select “the best Factor Solution”
- To specify “the solution” you must pick the #-factors, type of rotation & cutoff!
- Apply the different rules to arrive at an initial “best guess” of the #-factors
- Obtain orthogonal and oblique rotations for that many factors, for one fewer and for one more
- Compare the solutions to find “your favorite”; remember this is exploratory factoring, so explore!
  - parsimony vs. specificity
  - different cutoffs (.3 - .4)
  - rotational survival
  - simple structure
  - conceptual sense
  - interesting surprises (about factors and/or variables)
Component Scores
- A principal component is a composite variable formed as a linear combination of the measured variables
- A component SCORE is a person's score on that composite variable, obtained when their variable values are applied to the component score formula
  - usually computed from Z-scores of the measured variables
  - the resulting PC scores are also Z-scores (M = 0, S = 1)

  $PC_1 = \beta_{11} Z_1 + \beta_{21} Z_2 + \dots + \beta_{k1} Z_k$
  $PC_2 = \beta_{12} Z_1 + \beta_{22} Z_2 + \dots + \beta_{k2} Z_k$
  (etc.)

- Component scores have the same properties as the components they represent (e.g., orthogonal or oblique)
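The score formula can be sketched in a few lines. The raw data and the weights below are hypothetical (they are not derived from a real factoring), so the resulting scores illustrate only the mechanics: standardize each variable, then take the weighted sum across all of the variables.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize a list of raw scores (sample SD)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical raw data: 5 people measured on 3 variables.
data = {
    "V1": [2, 4, 6, 8, 10],
    "V2": [1, 3, 2, 5, 4],
    "V3": [9, 7, 8, 4, 2],
}

# Hypothetical component score weights for PC1 (one weight per variable).
weights = {"V1": 0.40, "V2": 0.35, "V3": -0.38}

z = {name: z_scores(values) for name, values in data.items()}
n_people = len(data["V1"])

# PC1 score for each person: weighted sum of that person's Z-scores on ALL variables.
pc1 = [sum(weights[name] * z[name][i] for name in data) for i in range(n_people)]

for i, score in enumerate(pc1, start=1):
    print(f"person {i}: PC1 = {score:+.3f}")
```

Because every Z-score has a mean of 0, the component scores also have a mean of 0. They would have a standard deviation of 1 only with weights actually derived from the analysis, not with made-up weights like these.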
Proper & Improper Component Scores
- A proper component score is a linear combination of all the variables in the analysis
  - the appropriate βs applied to the variable Z-scores
- An improper component score is a linear combination of the variables which “define” that component
  - usually an additive combination of the Z-scores of the variables with structure weights beyond the chosen cut-off value
- (Note: improper doesn't mean “wrong”; it means “not derived from optimal OLS weightings”)
Proper Component Scores
- Proper component scores are the “instantiation” of the components as they were mathematically derived from R (a linear combination of all the variables)
- Proper component scores have the same properties as the components
  - they are correlated with each other the same as are the PCs
    - PC scores from orthogonal components are orthogonal
    - PC scores from oblique components have r = φ
  - they can be used to produce the structure matrix (corr of component scores and variable scores), communalities, variance accounted for, etc.
Improper Component Scores
- Improper component scores are the “instantiation” of the components as they were interpreted by the researcher (a linear combination of the variables which define that component)
- Improper component scores usually don't have exactly the same properties as the components
  - they are usually correlated with each other, whether based on orthogonal or oblique solutions
  - they cannot be used to produce the structure matrix (corr of component scores and variable scores), communalities, variance accounted for, etc.
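One common way to build an improper score is sketched below: add up the Z-scores of only those variables whose structure loading passes the cutoff. The data, loadings and function names are hypothetical. Reversing the sign of negatively loading variables before summing is an assumption of this sketch, not something the notes above specify.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize a list of raw scores (sample SD)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def defining_variables(structure, factor, cutoff):
    """Map each variable that passes the cutoff on `factor` to the sign of its loading."""
    return {
        name: (1 if row[factor] > 0 else -1)
        for name, row in structure.items()
        if abs(row[factor]) >= cutoff
    }

def improper_scores(data, structure, factor, cutoff=0.30):
    """Unit-weighted sum of Z-scores over the variables that define the factor."""
    signs = defining_variables(structure, factor, cutoff)
    z = {name: z_scores(data[name]) for name in signs}
    n_people = len(next(iter(data.values())))
    return [sum(sign * z[name][i] for name, sign in signs.items()) for i in range(n_people)]

# Hypothetical raw data (5 people, 4 variables) and structure loadings (2 factors).
data = {
    "V1": [2, 4, 6, 8, 10],
    "V2": [1, 3, 2, 5, 4],
    "V3": [9, 7, 8, 4, 2],
    "V4": [5, 5, 6, 4, 5],
}
structure = {
    "V1": (0.80, 0.10),
    "V2": (0.70, 0.20),
    "V3": (-0.75, 0.15),
    "V4": (0.05, 0.90),
}

scores_f1 = improper_scores(data, structure, factor=0)
print("variables defining factor 1:", defining_variables(structure, 0, 0.30))
print("improper factor 1 scores:", [round(s, 3) for s in scores_f1])
```

Only V1, V2 and V3 need to be measured to score factor 1 this way, which is the “simpler application” appeal of improper scores.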
Why many folks like Improper Component Scores
- When we talk about a component we seldom conceptualize it as a linear combination of all the variables. Shouldn't we generate a score for the PC the same way we define it??
- The weights of the “noncontributing” variables for a given PC are primarily chosen to produce a desired φ. If orthogonality is unlikely, can't we just treat these as “0”??
- Aren't we fooling ourselves using 5-decimal βs when the variables themselves probably have a single significant digit?
- Much simpler application: if you only want a score for a given PC, you need only measure the variables used to define it
- Improper PC scores replicate and generalize across populations better than proper PC scores (fewer parameters to “drift”)
Uses of Component Scores
- PC scores can be used as predictors or criteria in subsequent dependent model analyses
  - truncated components analysis: using as predictors the PCs selected & derived from a large set of collinear predictors (a common reason for factoring “closed sets” of variables)
  - watch the “parsimony/stability vs. specificity” trade-off
  - answers using proper and improper PC scores often differ (more collinearity & specificity w/ improper)
  - similar approaches can be taken in ldf, ancova, etc.
Uses of Component Scores, cont.
- PC scores can be used as “variables” in additional factor analyses
  - Higher order factoring: factoring the factors
  - looking for “more basic” or “more aggregated” variables
  - watch the parsimony/stability vs. specificity trade-off
- PC scores can be used as the basis for cluster analysis & MDScaling interpretation (more later)
  - parsimony/stability vs. specificity (again)
Using Component Scores to Help with Factor Interpretation
- We can use component scores (especially improper scores) two ways to compare different factor solutions
  1. Compare the inter-correlations of component scores from different “solutions” (#-factors, type of rotation & cutoff)
     - Important differences may help identify the best solution
  2. Compare the correlations of component scores from different “solutions” (#-factors, type of rotation & cutoff) with additional variables (that were not part of the factor analysis)
     - The information in these correlations can be especially helpful in determining what to do about multi-vocal variables and in naming factors: what are the differences in the patterns of correlations for different versions of the factor solution?
Factoring items vs. factoring scales
- Items are often factored as part of the process of scale development
  - check if the items “go together” as the scale's author intended
- Scales (composites of items) are factored to …
  - examine the construct validity of “new” scales
  - test “theory” about what constructs are interrelated
- Remember, the reason we have scales is that individual items are typically unreliable and have limited validity
Factoring items vs. factoring scales, cont.
- The limited reliability and validity of items means that they will be measured with less precision, and so their intercorrelations from any one sample will be “fraught with error”
- Since factoring starts with R, factoring of items is likely to yield spurious solutions; replication of item-level factoring is very important!!
- Consider for a moment… Is the issue really “items vs. scales”??
  - No, it is really the reliability and validity of the “things being factored”; scales simply tend to have these properties more than individual items do
Selecting Variables for a Factor Analysis
- The variables in the analysis determine the analysis results
  - this has been true in every model we've looked at (remember how the inclusion of covariate and/or interaction terms has radically changed some results we've seen)
  - this is very true of factor analysis, because the goal is to find “sets of variables”
- Variable sets for factoring come in two “kinds”
  - when the researcher has “hand-selected” each variable
  - when the researcher selects a “closed set” of variables (e.g., the sub-scales of a standard inventory, the items of an interview, or the elements of data in a “medical chart”)
Selecting Variables for a Factor Analysis, cont.
- Sometimes a researcher has access to a data set that someone else has collected: an “opportunistic data set”
  - while this can be a real money/time saver, be sure to recognize the possible limitations
  - be sure the sample represents a population you care about
  - carefully consider the variables that “aren't included” and the possible effects their absence has on the resulting factors
    - this is especially true if the data set was chosen to be “efficient”, with variables chosen to cover several domains
  - you should plan to replicate any results obtained from opportunistic data
Selecting the Sample for a Factor Analysis
- How many?
  - Keep in mind that R (the correlation matrix), and so the factor solution, is the same no matter how many cases are used, so the point is the representativeness and stability of the correlations
  - Advice about the subject/variable ratio varies pretty dramatically
    - 5-10 cases per variable
    - 300 cases minimum (maybe + # per item)
  - Consider that the standard error of r is approximately $1 / \sqrt{N - 3}$

      n = 50      r +/- .146
      n = 100     r +/- .101
      n = 200     r +/- .07
      n = 300     r +/- .058
      n = 500     r +/- .045
      n = 1000    r +/- .031
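The sample-size table can be reproduced with a short sketch. It assumes the approximation 1 / sqrt(N - 3), which is strictly the standard error of Fisher's z-transformed r and is close to the standard error of r itself only when r is near zero. The function name is made up for this illustration.

```python
import math

def approx_se_r(n):
    """Approximate standard error of a correlation: 1 / sqrt(n - 3)."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    return 1.0 / math.sqrt(n - 3)

for n in (50, 100, 200, 300, 500, 1000):
    print(f"n = {n:>4}   r +/- {approx_se_r(n):.3f}")
```

Halving the standard error requires roughly four times as many cases, which is why the returns from adding cases diminish as the sample grows.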
Selecting the Sample for a Factor Analysis, cont.
- Who?
  - Sometimes the need to increase our sample size leads us to “acts of desperation”, i.e., taking anybody
  - Be sure your sample represents a single “homogeneous” population
  - Consider that one interesting research question is whether different populations or sub-populations have different factor structures
Designing the next study
- Usually the next study involves “additional variables”
  - Additional variables might be selected to “join” a factor from this study OR to “create” a new factor
  - One way to convince folks you know how to interpret this factor solution is to be able to anticipate the result of including selected additional variables in the next study
  - One way to solve “difficulties” of multi-vocality and ill-defined factors is to include selected additional variables in the next study
- “Additional variables” might be more indices of the same construct(s), or indices of new “constructs”
Designing the next study, cont.
- Consider different ways of measuring constructs within or across studies
  - different “tests” measuring the variable
    - especially helpful if you know about the similarities and differences among the measures (e.g., “anxiety”)
  - self-report, other-report, observation, “lab tasks”, etc.
- Consider how the factor structure might differ across populations (e.g., intelligence)
- Consider how the factor structure might differ across “social time” (e.g., racism)