© LOUIS COHEN, LAWRENCE MANION & KEITH MORRISON MULTI-DIMENSIONAL MEASUREMENT AND FACTOR ANALYSIS
STRUCTURE OF THE CHAPTER
Elementary linkage analysis
Factor analysis
What to look for in factor analysis output
Cluster analysis
Examples of studies using multidimensional scaling and cluster analysis
Multi-dimensional data: some words on notation
A note on structural equation modelling
A note on multilevel modelling
ELEMENTARY LINKAGE ANALYSIS A way of exploring the relationship between personal constructs, of assessing the dimensionality of the judgements that are made. It seeks to identify and define the clusterings of certain variables within a set of variables. Like factor analysis, elementary linkage analysis searches for interrelated groups of correlation coefficients. The objective of the search is to identify ‘types’.
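The search for 'types' can be illustrated with a minimal sketch. It assumes one common reading of elementary linkage analysis: each variable is linked to the variable with which it correlates most highly, and every set of variables joined by such links forms one type. The correlation matrix, the labels and the function names are hypothetical and serve only as illustration.

```python
# Hypothetical correlation matrix for five constructs (illustrative values only).
labels = ["A", "B", "C", "D", "E"]
corr = [
    [1.00, 0.80, 0.30, 0.20, 0.10],
    [0.80, 1.00, 0.40, 0.10, 0.20],
    [0.30, 0.40, 1.00, 0.25, 0.15],
    [0.20, 0.10, 0.25, 1.00, 0.70],
    [0.10, 0.20, 0.15, 0.70, 1.00],
]

def highest_link(corr):
    """For each variable, return the index of the variable it correlates with most."""
    links = []
    for i, row in enumerate(corr):
        others = [j for j in range(len(row)) if j != i]
        links.append(max(others, key=lambda j: row[j]))
    return links

def linkage_types(corr):
    """Group variables into types: sets joined by highest-correlation links."""
    links = highest_link(corr)
    group = list(range(len(corr)))      # each variable starts in its own group

    def find(i):
        # Follow parent pointers to the representative of i's group.
        while group[i] != i:
            i = group[i]
        return i

    for i, j in enumerate(links):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            group[root_i] = root_j      # merge the two groups
    types = {}
    for i in range(len(corr)):
        types.setdefault(find(i), []).append(i)
    return sorted(types.values())

types = [[labels[i] for i in members] for members in linkage_types(corr)]
print(types)
```

In this made-up matrix A and B are each other's highest correlate and C attaches to B, while D and E form a second reciprocal pair, so two types emerge.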
WHAT IS FACTOR ANALYSIS? A method of grouping together variables which have something in common. It enables the researcher to take a set of variables and reduce them to a smaller number of underlying factors (latent variables) which account for as many variables as possible. It detects structures and commonalities in the relationships between variables. Researchers can identify where different variables in fact are addressing the same underlying concept. It detects latent (unobservable) factors.
WHAT IS FACTOR ANALYSIS? Factor analysis can take two main forms: Exploratory factor analysis: the use of factor analysis (principal components analysis in particular) to explore previously unknown groupings of variables, to seek underlying patterns, clusterings and groups. Confirmatory factor analysis is more stringent, testing a found set of factors against a hypothesized model of groupings and relationships.
STAGE ONE IN FACTOR ANALYSIS Check that the data are suitable for factor analysis: (a) Sample size (recommendations vary in the literature, from a minimum of 30 to a minimum of 300; if the sample size is small then the factor loadings should be high for a variable to be included); (b) Number of variables; (c) Ratio of sample size to number of variables (different ratios are given in the literature, from 5:1 to 30:1); (d) Strength of intercorrelations should be no less than .3; (e) Bartlett’s test of sphericity should be statistically significant (<.05); (f) Kaiser-Meyer-Olkin measure of sampling adequacy should be .6 or higher (maximum is 1).
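The checks above can be collected into a small checklist. This is a sketch only: the function names, the default thresholds (the lower bound of 5:1 for the ratio) and the sample size of 400 are assumptions made for illustration. The KMO value of .845 and the nine variables come from the worked example later in the chapter. SPSS prints the Bartlett significance as .000, so .0005 is used here as a stand-in upper bound.

```python
def share_of_correlations_at_least(corr, threshold=0.3):
    """Proportion of off-diagonal correlations whose absolute value meets the threshold."""
    p = len(corr)
    pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]
    strong = sum(1 for i, j in pairs if abs(corr[i][j]) >= threshold)
    return strong / len(pairs)

def suitability_report(n_cases, n_variables, kmo, bartlett_sig,
                       min_ratio=5.0, min_kmo=0.6, alpha=0.05):
    """Return the cases-per-variable ratio and pass/fail flags for three checks."""
    ratio = n_cases / n_variables
    return {
        "cases_per_variable": ratio,
        "ratio_ok": ratio >= min_ratio,
        "kmo_ok": kmo >= min_kmo,
        "bartlett_ok": bartlett_sig < alpha,
    }

# Hypothetical 3 x 3 correlation matrix, used only to show the .3 check.
toy_corr = [
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
share = share_of_correlations_at_least(toy_corr)

# n_cases=400 is an assumed sample size; the source does not report it.
report = suitability_report(n_cases=400, n_variables=9, kmo=0.845, bartlett_sig=0.0005)
print(round(share, 2), report)
```

The flags only restate the rules of thumb. Whether a borderline data set is usable remains a matter of judgement.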
STAGE TWO IN FACTOR ANALYSIS Decide which form of extraction method to use: (a) Principal components analysis is widely used; (b) Set the Kaiser criterion (the Eigenvalues to be set at greater than 1); the Eigenvalue of a factor indicates the amount of the total variance explained by that factor – if it is less than 1.00 then it does not have any additional explanatory value and should be ignored (SPSS does this automatically); (c) Unrotated factor solution to be set; (d) Scree plot to be set.
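The Kaiser criterion can be sketched with the eigenvalues reported in the chapter's 'Total Variance Explained' table. The function names are hypothetical, and the list of successive drops is only a rough numerical stand-in for reading a scree plot by eye.

```python
# Eigenvalues as reported in the chapter's "Total Variance Explained" table.
eigenvalues = [4.139, 1.697, 0.661, 0.542, 0.531, 0.451, 0.395, 0.323, 0.262]

def kaiser_retained(eigenvalues, cutoff=1.0):
    """Return 1-based component numbers whose eigenvalue exceeds the cutoff."""
    return [k for k, value in enumerate(eigenvalues, start=1) if value > cutoff]

def scree_drops(eigenvalues):
    """Successive drops between eigenvalues; the scree 'elbow' is where drops level off."""
    return [round(a - b, 3) for a, b in zip(eigenvalues, eigenvalues[1:])]

retained = kaiser_retained(eigenvalues)
drops = scree_drops(eigenvalues)
print("Components retained:", retained)
print("Successive drops:", drops)
```

With these values only the first two components exceed 1, and the drops become small after the second component.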
SCREE PLOT IN SPSS
STAGE THREE IN FACTOR ANALYSIS Conduct the factor rotation: (a) Decide which of the two main approaches to use: Oblique (factors assumed to be correlated): Direct Oblimin; Orthogonal (factors assumed to be uncorrelated): Varimax; (b) People often use the varimax solution when it should not be used, as it is sometimes easier to use than other kinds; (c) Check that the rotated solution is set.
ROTATION Rotation keeps together those items that are closely related and separates them clearly from other items, i.e. it includes and excludes (keeps together a group of homogeneous items and keeps them apart from other groups).
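What an orthogonal rotation does can be shown with two factors, where it reduces to turning the axes by a single angle. The sketch below uses hypothetical loadings and a plain grid search over the angle to maximise the raw varimax criterion (the variance of the squared loadings within each factor). It is not the algorithm SPSS uses and it omits Kaiser normalization.

```python
import math

# Hypothetical unrotated loadings for four items on two factors.
loadings = [
    [0.70, 0.50],
    [0.65, 0.55],
    [0.60, -0.50],
    [0.55, -0.60],
]

def rotate(loadings, theta):
    """Orthogonally rotate a two-column loading matrix by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]

def varimax_criterion(loadings):
    """Sum over the two factors of the variance of the squared loadings."""
    n = len(loadings)
    total = 0.0
    for col in range(2):
        squares = [row[col] ** 2 for row in loadings]
        mean = sum(squares) / n
        total += sum((x - mean) ** 2 for x in squares) / n
    return total

def best_rotation(loadings, steps=900):
    """Grid-search the angle in [0, 90) degrees that maximises the criterion."""
    angles = [math.radians(90 * k / steps) for k in range(steps)]
    return max(angles, key=lambda t: varimax_criterion(rotate(loadings, t)))

theta = best_rotation(loadings)
rotated = rotate(loadings, theta)

# An orthogonal rotation leaves each item's communality unchanged.
communalities_before = [a * a + b * b for a, b in loadings]
communalities_after = [a * a + b * b for a, b in rotated]

for row in rotated:
    print([round(value, 3) for value in row])
```

After rotation each item loads mainly on one factor. This is the 'keeping together and keeping apart' described above, and the amount of variance explained for each item stays the same.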
EXAMPLE OF FACTOR ANALYSIS USING SPSS Factor analysis for an oblique rotation. Direct Oblimin rotation.
Analyze → Dimension Reduction → Factor. Move the variables to be included to the ‘Variables’ box.
Click on ‘Descriptives’; click on ‘KMO and Bartlett’s test of sphericity’; click on ‘Coefficients’; click ‘Continue’.
Click on ‘Extraction’; click on ‘Principal components’; click on ‘Correlation matrix’; click on ‘Unrotated factor solution’; click on ‘Scree plot’; click on ‘Based on Eigenvalue’; click ‘Continue’.
Click on ‘Rotation’; click on ‘Direct Oblimin’ or ‘Varimax’ (depending on whether the rotation is oblique or orthogonal); click ‘Continue’; return to the main screen and click ‘OK’.
ANALYSIS OF THE EXAMPLE FROM SPSS SPSS produces many tables for factor analysis. Be selective but fair to the data.
Check correlation coefficients (most should be over .3) (selection only reproduced here, not the full table)
How much do you feel that working with colleagues all day is really a strain for you?
How much do you feel emotionally drained by your work?
How much do you worry that your job is hardening you emotionally?
How much frustration do you feel in your job?
Correlation: 1.000 .554 .507 .461 .580 .518 .646
SUITABILITY FOR FACTOR ANALYSIS
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .845
Bartlett's Test of Sphericity: Approx. Chi-Square 5460.475; df 36; Sig. .000
KMO >.6; Bartlett’s test Sig.: <.05. The data are suitable for factor analysis.
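For reference, the two statistics in this table are usually defined as below. This is a sketch of the standard textbook formulas, not a transcript of SPSS internals. The symbols are introduced here and are not in the original slides: p is the number of variables, n the sample size (not reported in this example), R the correlation matrix, r_ij the correlations and a_ij the partial correlations.

```latex
\[
\chi^2 = -\left[(n - 1) - \frac{2p + 5}{6}\right]\ln\lvert R\rvert ,
\qquad
df = \frac{p(p-1)}{2} = \frac{9 \times 8}{2} = 36
\]
\[
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} a_{ij}^{2}}
\]
```

With the nine items of this example the degrees of freedom come to 36, which matches the table. KMO approaches 1 when the partial correlations are small relative to the ordinary correlations.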
How much of the variance is explained by each item (lower than .3 and the item is a poor fit)
Communalities
How hard do you feel you are working in your job? Initial 1.000; Extraction .779
How much do you feel exhausted by the end of the workday? Extraction .818
How much do you feel that you cannot cope with your job any longer? Extraction .578
How much do you feel that you treat colleagues as impersonal objects?
How much do you feel that working with colleagues all day is really a strain for you? Extraction .602
How much do you feel emotionally drained by your work? Extraction .629
How tired do you feel in the morning, having to face another school day? Extraction .595
How much do you worry that your job is hardening you emotionally? Extraction .661
How much frustration do you feel in your job?
Extraction Method: Principal Component Analysis.
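Total Variance Explained
Two factors found: factor one explains 45.985 per cent of total variance; factor two explains 18.851 per cent of total variance.
Table headings: Component; Initial Eigenvalues; Extraction Sums of Squared Loadings; Rotation Sums of Squared Loadings (a); Total; % of Variance; Cumulative %
Component 1: Total 4.139; % of Variance 45.985; Rotation Sums of Squared Loadings Total 4.028
Component 2: Total 1.697; % of Variance 18.851; Cumulative % 64.836; Rotation Sums of Squared Loadings Total 1.991
Component 3: Total .661; % of Variance 7.342; Cumulative % 72.178
Component 4: Total .542; % of Variance 6.023; Cumulative % 78.202
Component 5: Total .531; % of Variance 5.900; Cumulative % 84.102
Component 6: Total .451; % of Variance 5.006; Cumulative % 89.107
Component 7: Total .395; % of Variance 4.390; Cumulative % 93.497
Component 8: Total .323; % of Variance 3.593; Cumulative % 97.090
Component 9: Total .262; % of Variance 2.910; Cumulative % 100.000
Extraction Method: Principal Component Analysis.
a. When components are correlated, sums of squared loadings cannot be added to obtain a total variance.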
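The percentages in this table follow from the eigenvalues. When principal components are extracted from a correlation matrix, each standardised variable contributes a variance of 1, so the total variance equals the number of variables (9 here). The sketch below recomputes the percentages from the rounded eigenvalues. The function name is hypothetical, and the results differ from the printed table in the second or third decimal place because the eigenvalues are rounded.

```python
# Eigenvalues from the "Total Variance Explained" table (rounded to three decimals).
eigenvalues = [4.139, 1.697, 0.661, 0.542, 0.531, 0.451, 0.395, 0.323, 0.262]

def variance_explained(eigenvalues):
    """Percentage of total variance per component, plus the running cumulative total."""
    total = len(eigenvalues)    # each standardised variable contributes variance 1
    percents = [100 * value / total for value in eigenvalues]
    cumulative = []
    running = 0.0
    for pct in percents:
        running += pct
        cumulative.append(running)
    return percents, cumulative

percents, cumulative = variance_explained(eigenvalues)
for k, (pct, cum) in enumerate(zip(percents, cumulative), start=1):
    print(f"Component {k}: {pct:6.2f}%  cumulative {cum:6.2f}%")
```

The first two components together account for roughly 65 per cent of the total variance, in line with the cumulative figure in the table.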
Pattern Matrix (a) (loadings on Component 1 and Component 2)
How hard do you feel you are working in your job? .005; .882
How much do you feel exhausted by the end of the workday? .252; .834
How much do you feel that you cannot cope with your job any longer? .691; .234
How much do you feel that you treat colleagues as impersonal objects? .674; -.459
How much do you feel that working with colleagues all day is really a strain for you? .782; -.158
How much do you feel emotionally drained by your work? .774; .096
How tired do you feel in the morning, having to face another school day? .697; .247
How much do you worry that your job is hardening you emotionally? .814; -.008
How much frustration do you feel in your job? .752; .097
Extraction Method: Principal Component Analysis. Rotation Method: Oblimin with Kaiser Normalization.
a. Rotation converged in 6 iterations.
Decide the cut-off points and which variables to include.
WHICH VARIABLES TO INCLUDE IN A FACTOR For each factor: Include the highest scoring variables; Omit the low scoring variables; Look for where there is a clear scoring distance between those included and those excluded; Review your selection to check that no lower scoring variables have been excluded which are conceptually close to those included; Review your selection to check whether some higher scoring variables should be excluded if they are not sufficiently conceptually close to the others that have been included; Review your final selection to see that the variables are conceptually similar. N.B. Inclusion and exclusion are an art, not a science; there is no simple formula, so you have to use your judgement.
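The mechanical part of this selection can be sketched with the pattern-matrix loadings from the example above. The cut-off of .6, the cross-loading threshold of .4, the shortened item labels and the function name are all assumptions chosen for illustration. The conceptual review described above still has to be done by the researcher.

```python
# Pattern-matrix loadings from the example above; item labels are shortened here.
pattern = {
    "working hard":        (0.005, 0.882),
    "exhausted":           (0.252, 0.834),
    "cannot cope":         (0.691, 0.234),
    "impersonal objects":  (0.674, -0.459),
    "colleagues a strain": (0.782, -0.158),
    "emotionally drained": (0.774, 0.096),
    "tired in morning":    (0.697, 0.247),
    "hardening":           (0.814, -0.008),
    "frustration":         (0.752, 0.097),
}

def assign_items(pattern, cutoff=0.6, cross_cutoff=0.4):
    """Assign each item to the factor on which it loads most strongly, if above the cutoff.

    Items loading at or above cross_cutoff on more than one factor are flagged
    for closer conceptual review.
    """
    n_factors = len(next(iter(pattern.values())))
    factors = {k + 1: [] for k in range(n_factors)}
    cross_loading = []
    for item, loads in pattern.items():
        strongest = max(range(n_factors), key=lambda k: abs(loads[k]))
        if abs(loads[strongest]) >= cutoff:
            factors[strongest + 1].append(item)
        if sum(abs(value) >= cross_cutoff for value in loads) > 1:
            cross_loading.append(item)
    return factors, cross_loading

factors, cross_loading = assign_items(pattern)
for number, items in factors.items():
    print(f"Factor {number}: {items}")
print("Review for cross-loading:", cross_loading)
```

With these thresholds seven items fall on the first factor and two on the second. The 'impersonal objects' item is flagged because it also loads -.459 on the second factor.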
WHAT TO REPORT Method of factor analysis used (Principal components; Direct Oblimin; KMO and Bartlett test of sphericity; Eigenvalues greater than 1; scree test; rotated solution). How many factors were extracted with Eigenvalues greater than 1. How many factors were included as a result of the scree test. Give a name/title to each of the factors. Indicate how much of the total variance was explained by each factor. Report the cut-off point for the variables included in each factor. Indicate the factor loadings of each variable in the factor. What the results tell us.
CLUSTER ANALYSIS Factor analysis and elementary linkage analysis enable the researcher to group together factors and variables, but cluster analysis enables the researcher to group together similar and homogeneous sub-samples of people. SPSS creates a dendrogram of clusters of people into groups.
DENDROGRAM IN CLUSTER ANALYSIS
INTERPRETING THE DENDROGRAM
There are two main clusters:
Cluster One: Persons 19, 20, 2, 13, 15, 9, 11, 18, 14, 16, 1, 10, 12, 5, 17
Cluster Two: Persons 7, 8, 4, 3, 6
If one wishes to have smaller clusters then three clusters can be found:
Cluster One: Persons 19, 20, 2, 13, 15, 9, 11, 18
Cluster Two: Persons 14, 16, 1, 10, 12, 5, 17
Cluster Three: Persons 7, 8, 4, 3, 6
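Cutting a dendrogram into two or three clusters can be sketched as an agglomerative procedure: start with every person as a separate cluster and repeatedly merge the two closest clusters until the desired number remains. The data, the Euclidean distance and the average-linkage rule below are assumptions for illustration. They are not the data or the settings behind the dendrogram above.

```python
# Hypothetical scores for six people on two variables (illustrative only).
people = {
    "P1": (1.0, 1.2), "P2": (1.1, 0.9), "P3": (0.8, 1.0),
    "P4": (5.0, 5.1), "P5": (5.2, 4.8), "P6": (6.0, 6.2),
}

def distance(a, b):
    """Euclidean distance between two score profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def average_linkage(c1, c2, data):
    """Mean distance between every member of c1 and every member of c2."""
    dists = [distance(data[i], data[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)

def agglomerate(data, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[name] for name in data]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], data)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(cluster) for cluster in clusters]

three = agglomerate(people, 3)
two = agglomerate(people, 2)
print("Three clusters:", three)
print("Two clusters:", two)
```

As in the dendrogram, asking for fewer clusters merges groups that were separate at a finer level. Here P6 stands alone in the three-cluster solution and joins P4 and P5 in the two-cluster solution.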
STRUCTURAL EQUATION MODELLING The name given to a group of techniques that enable researchers to construct models of putative causal relations, and to test those models against data. It is designed to enable researchers to confirm, modify and test their models of causal relations between variables. It is based on multiple regression and factor analysis.
STRUCTURAL EQUATION MODELLING It works with both observed and unobserved (latent) variables, rather than only with the latent factors of factor analysis. It is a particular kind of multiple regression analysis that enables the researcher to see the relative weightings of observed independent variables on each other and on a dependent variable, to establish pathways of causation, and to determine the direct and indirect effects of independent variables on a dependent variable.
A CAUSAL MODEL (USING AMOS WITH SPSS)
THE CAUSAL MODEL WITH CALCULATIONS ADDED
INTERPRETING THE CAUSAL MODEL ‘Socio-economic status’ exerts a powerful direct influence on ‘class of degree’ (.18), which is higher than the direct influence of either ‘part-time work’ (–.01) or ‘level of motivation for academic study’ (.04); ‘Socio-economic status’ exerts a powerful direct influence on ‘level of motivation for academic study’ (.52), which is higher than the influence of ‘socio-economic status’ on ‘class of degree’ (.18); ‘Socio-economic status’ exerts a powerful direct and negative influence on ‘part-time work’ (–.21), i.e. the higher the socio-economic status, the lesser is the amount of part-time work undertaken;
INTERPRETING THE CAUSAL MODEL ‘Part-time work’ exerts a powerful direct influence on ‘level of motivation for academic study’ (1.37), and this is higher than the influence of ‘socio-economic status’ on ‘level of motivation for academic study’ (.52); ‘Level of motivation for academic study’ exerts a powerful negative direct influence on ‘part-time work’ (–1.45), i.e. the higher is the level of motivation for academic study, the lesser is the amount of part-time work undertaken; ‘Level of motivation for academic study’ exerts a slightly more powerful influence on ‘class of degree’ (.04) than does ‘part-time work’ (–.01); ‘Part-time work’ exerts a negative influence on the class of degree (–.01), i.e. the more one works part-time, the lower is the class of degree obtained.
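The idea of direct and indirect effects can be sketched with the coefficients quoted in the interpretation above. In simple path tracing, an indirect effect is the product of the coefficients along a chain of arrows. This sketch deliberately leaves out the reciprocal paths between ‘part-time work’ and ‘level of motivation for academic study’ (1.37 and –1.45), so the totals are a simplified illustration and not the total effects AMOS would report. The variable keys and the function name are hypothetical.

```python
# Direct path coefficients quoted in the interpretation above.
direct = {
    ("ses", "degree"): 0.18,
    ("ses", "motivation"): 0.52,
    ("ses", "part_time"): -0.21,
    ("motivation", "degree"): 0.04,
    ("part_time", "degree"): -0.01,
}

def indirect_effect(path, coefficients):
    """Product of the coefficients along a chain of variables."""
    effect = 1.0
    for source, target in zip(path, path[1:]):
        effect *= coefficients[(source, target)]
    return effect

via_motivation = indirect_effect(["ses", "motivation", "degree"], direct)
via_part_time = indirect_effect(["ses", "part_time", "degree"], direct)

# Simplified total: direct effect plus the two two-step indirect paths only.
simple_total = direct[("ses", "degree")] + via_motivation + via_part_time

print(f"Direct: {direct[('ses', 'degree')]:.4f}")
print(f"Indirect via motivation: {via_motivation:.4f}")
print(f"Indirect via part-time work: {via_part_time:.4f}")
print(f"Simplified total: {simple_total:.4f}")
```

On these figures the two-step indirect routes from socio-economic status to class of degree are small compared with the direct path of .18.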
A STRUCTURAL EQUATION MODEL (USING AMOS IN SPSS) Ovals = factors; Rectangles = variables for each factor; E = Error factor
A NOTE ON MULTILEVEL MODELLING Data and variables exist at individual and group levels, e.g.: between students over all groups; between groups; between students within groups. Levels: individual; group; class; school; local; regional; national; international.
A NOTE ON MULTILEVEL MODELLING Data are ‘nested’, i.e. individual-level data are nested within group, class, school, regional etc. levels. A dependent variable is affected by independent variables at different levels, i.e. data are hierarchical. Multilevel modelling uses regression analysis and multilevel regression. Multilevel modelling enables the researcher to calculate the relative impact on a dependent variable of one or more independent variables at each level of the hierarchy, and thereby to identify factors at each level of the hierarchy that are associated with the impact of that level.
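A first step that often precedes a multilevel model is to ask how much of the variation in the dependent variable lies between groups rather than within them. The sketch below partitions the variance of hypothetical student scores nested in three schools and computes an intraclass correlation from one-way ANOVA mean squares. The data, the function names and the use of this ANOVA estimator (which assumes equal group sizes) are assumptions for illustration. A full multilevel regression would be fitted with specialist software.

```python
# Hypothetical test scores for students nested within three schools.
scores = {
    "School A": [50, 52, 54, 56],
    "School B": [60, 62, 64, 66],
    "School C": [70, 72, 74, 76],
}

def partition_variance(groups):
    """Split the total sum of squares into between-group and within-group parts."""
    values = [v for group in groups.values() for v in group]
    grand_mean = sum(values) / len(values)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups.values()
    )
    ss_within = sum(
        (v - sum(group) / len(group)) ** 2
        for group in groups.values()
        for v in group
    )
    return ss_between, ss_within

def intraclass_correlation(groups):
    """ICC(1) from one-way ANOVA mean squares; assumes equal group sizes."""
    k = len(next(iter(groups.values())))     # students per school
    n_groups = len(groups)
    ss_between, ss_within = partition_variance(groups)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_groups * k - n_groups)
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

ss_between, ss_within = partition_variance(scores)
icc = intraclass_correlation(scores)
print(f"Between-school SS: {ss_between:.1f}, within-school SS: {ss_within:.1f}")
print(f"Intraclass correlation: {icc:.3f}")
```

In this contrived data set almost all of the variation lies between schools, which is the situation in which ignoring the nesting would be most misleading.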