Published by Henry Williams. Modified over 9 years ago.
1
SW388R7 Data Analysis & Computers II Slide 1 Principal component analysis Strategy for solving problems Sample problem Steps in principal component analysis Homework problems
2
SW388R7 Data Analysis & Computers II Slide 2 Principal components factor analysis Obtaining a factor solution through principal components analysis is an iterative process that usually requires repeating the SPSS factor analysis procedure a number of times to reach a satisfactory solution. We begin by identifying a group of variables whose variance we believe can be represented more parsimoniously by a smaller set of factors, or components. The end result of the principal components analysis will tell us which variables can be represented by which components, and which variables should be retained as individual variables because the factor solution does not adequately represent their information.
3
SW388R7 Data Analysis & Computers II Slide 3 Strategy for solving problems - 1 A principal component factor analysis requires that: (1) the variables included are metric level or dichotomous (dummy-coded) nominal level; (2) the sample size is greater than 50 (preferably 100); (3) the ratio of cases to variables is 5 to 1 or larger; (4) the correlation matrix for the variables contains 2 or more correlations of 0.30 or greater; (5) variables with measures of sampling adequacy less than 0.50 are removed; (6) the overall measure of sampling adequacy is 0.50 or higher; and (7) the Bartlett test of sphericity is statistically significant. The first phase of a principal component analysis is devoted to verifying that we meet these requirements. If we do not meet these requirements, factor analysis is not appropriate.
4
SW388R7 Data Analysis & Computers II Slide 4 Strategy for solving problems - 2 The second phase of a principal component factor analysis focuses on deriving a factor model, or pattern of relationships between variables and components, that satisfies the following requirements: (1) the derived components explain 50% or more of the variance in each of the variables, i.e. each variable has a communality greater than 0.50; (2) none of the variables has loadings, or correlations, of 0.40 or higher for more than one component, i.e. none has complex structure; and (3) none of the components has only one variable in it. To meet these requirements, we remove problematic variables from the analysis and repeat the principal component analysis procedure in SPSS.
5
SW388R7 Data Analysis & Computers II Slide 5 Strategy for solving problems - 3 At the conclusion of this process, we can substitute the components for the variables in further analyses if: (1) the components have more than one variable loading on them; (2) the components explain at least 50% of the variance in each of the included variables; and (3) the components collectively explain more than 60% of the variance in the set of included variables. Variables that were removed in the analysis should be included individually in further analyses.
6
SW388R7 Data Analysis & Computers II Slide 6 Substituting components for variables Substitution of components for individual variables is accomplished either by using only the highest loading variable in place of the other variables loading on the component, or by combining the variables loading on each component to create a new variable.
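The second option, combining the variables that load on a component, can be sketched as a simple averaged scale. This is only an illustration under stated assumptions: the function name `summated_scale`, the item names, and the data are hypothetical, and averaging is just one way of combining the variables.

```python
from statistics import mean


def summated_scale(rows, variables):
    """Average the named variables for each case to form one composite score."""
    return [mean(row[v] for v in variables) for row in rows]


# Hypothetical cases; the variable names are made up for illustration.
cases = [
    {"item_a": 4, "item_b": 5, "item_c": 3},
    {"item_a": 2, "item_b": 1, "item_c": 3},
]

# One new variable standing in for the three items loading on a component.
component_1 = summated_scale(cases, ["item_a", "item_b", "item_c"])
print(component_1)
```

Variables with a negative loading on the component would normally be reverse-scored before being combined; this sketch does not do that.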
7
SW388R7 Data Analysis & Computers II Slide 7 Notes - 1 When evaluating measures of sampling adequacy, communalities, or factor loadings, we ignore the sign of the numeric value and base our decision on the size or magnitude of the value. The sign of the number indicates the direction of the relationship (direct or inverse). A loading of -0.732 is just as strong as a loading of 0.732. The minus sign indicates an inverse or negative relationship; the absence of a sign is meant to imply a plus sign indicating a direct or positive relationship.
8
SW388R7 Data Analysis & Computers II Slide 8 Notes - 2 If there are two or more components in the component matrix, the pattern of loadings is based on the SPSS Rotated Component Matrix. If there is only one component in the solution, the Rotated Component Matrix is not computed, and the pattern of loadings is based on the Component Matrix. It is possible that the analysis will break down and we will have too few variables in the analysis to support the use of principal component analysis.
9
SW388R7 Data Analysis & Computers II Slide 9 Question 1
10
SW388R7 Data Analysis & Computers II Slide 10 Answer 1 To answer the first question, we examine the level of measurement for each variable listed in the problem to make certain each is metric or dichotomous. In this example, all variables satisfied the level of measurement requirement. We added a caution because we are treating ordinal variables as metric.
11
SW388R7 Data Analysis & Computers II Slide 11 Question 2 To answer this question, we will compute the principal components analysis.
12
SW388R7 Data Analysis & Computers II Slide 12 Computing a principal component analysis To compute a principal component analysis in SPSS, select the Data Reduction | Factor… command from the Analyze menu.
13
SW388R7 Data Analysis & Computers II Slide 13 Add the variables to the analysis First, move the variables listed in the problem to the Variables list box. Second, click on the Descriptives… button to specify statistics to include in the output.
14
SW388R7 Data Analysis & Computers II Slide 14 Complete the descriptives dialog box First, mark the Univariate descriptives checkbox to get a tally of valid cases. Second, keep the Initial solution checkbox to get the statistics needed to determine the number of factors to extract. Third, mark the Coefficients checkbox to get a correlation matrix, one of the outputs needed to assess the appropriateness of factor analysis for the variables. Fourth, mark the KMO and Bartlett’s test of sphericity checkbox to get more outputs used to assess the appropriateness of factor analysis for the variables. Fifth, mark the Anti-image checkbox to get more outputs used to assess the appropriateness of factor analysis for the variables. Sixth, click on the Continue button.
15
SW388R7 Data Analysis & Computers II Slide 15 Select the extraction method First, click on the Extraction… button to specify statistics to include in the output. The extraction method refers to the mathematical method that SPSS uses to compute the factors or components.
16
SW388R7 Data Analysis & Computers II Slide 16 Complete the extraction dialog box First, retain the default method Principal components. Second, click on the Continue button.
17
SW388R7 Data Analysis & Computers II Slide 17 Select the rotation method First, click on the Rotation… button to specify statistics to include in the output. The rotation method refers to the mathematical method that SPSS uses to rotate the axes in geometric space. This makes it easier to determine which variables are loaded on which components.
18
SW388R7 Data Analysis & Computers II Slide 18 Complete the rotation dialog box First, mark the Varimax method as the type of rotation to be used in the analysis. Second, click on the Continue button.
19
SW388R7 Data Analysis & Computers II Slide 19 Complete the request for the analysis First, click on the OK button to request the output.
20
SW388R7 Data Analysis & Computers II Slide 20 Sample size requirement: minimum number of cases The number of valid cases for this set of variables is 620. The preferred minimum sample size requirement of 100 valid cases is satisfied. While principal component analysis can be conducted on a sample that has fewer than 100 cases, but more than 50 cases, we should be cautious about its interpretation.
21
SW388R7 Data Analysis & Computers II Slide 21 Sample size requirement: ratio of cases to variables The ratio of cases to variables in a principal component analysis should be at least 5 to 1. With 620 valid cases and 12 variables, the ratio of cases to variables is 51.67 to 1, which exceeds the requirement for the ratio of cases to variables.
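The two sample size checks can be written out as a short calculation. This is a minimal sketch: the function name `sample_size_check` is hypothetical, and the cutoffs follow the decision steps in this presentation (50 or more valid cases, a ratio of at least 5 to 1, and 100 or more cases to avoid a caution).

```python
def sample_size_check(n_cases, n_variables):
    """Return a verdict and the cases-to-variables ratio."""
    ratio = n_cases / n_variables
    if n_cases < 50 or ratio < 5:
        return "incorrect application of a statistic", ratio
    if n_cases < 100:
        # Between 50 and 99 cases: usable, but interpret with caution.
        return "true with caution", ratio
    return "true", ratio


# Values from the sample problem: 620 valid cases, 12 variables.
verdict, ratio = sample_size_check(620, 12)
print(verdict, round(ratio, 2))
```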
22
SW388R7 Data Analysis & Computers II Slide 22 Answer 2
23
SW388R7 Data Analysis & Computers II Slide 23 Question 3
24
SW388R7 Data Analysis & Computers II Slide 24 Appropriateness of factor analysis: Presence of substantial correlations Principal components analysis requires that there be two or more correlations greater than 0.30 between the variables included in the analysis. For this set of variables, there are 7 correlations in the matrix greater than 0.30, satisfying this requirement. The correlations greater than 0.30 are highlighted in yellow.
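Counting the substantial correlations by hand is error-prone for larger matrices, so a small helper can do it. This is a sketch with assumptions: the function name and the 4 x 4 correlation matrix are hypothetical, the sign is ignored, and a correlation of exactly 0.30 is counted as meeting the requirement.

```python
def count_substantial_correlations(corr, threshold=0.30):
    """Count off-diagonal correlations whose magnitude reaches the threshold.

    Only the upper triangle is scanned, so each pair is counted once.
    """
    n = len(corr)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if abs(corr[i][j]) >= threshold
    )


# Hypothetical correlation matrix, for illustration only.
corr = [
    [1.00, 0.45, 0.10, 0.32],
    [0.45, 1.00, -0.35, 0.05],
    [0.10, -0.35, 1.00, 0.20],
    [0.32, 0.05, 0.20, 1.00],
]

count = count_substantial_correlations(corr)
print(count, count >= 2)
```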
25
SW388R7 Data Analysis & Computers II Slide 25 Appropriateness of factor analysis: Sampling adequacy of individual variables Principal component analysis requires that the Kaiser-Meyer-Olkin Measure of Sampling Adequacy be greater than 0.50 for each individual variable as well as the set of variables. The Measure of Sampling Adequacy (MSA) is described as marvelous if it is 0.90 or greater, meritorious if it is in the 0.80's, middling if in the 0.70's, mediocre if in the 0.60's, miserable if in the 0.50's, and unacceptable if below 0.50. There are two anti-image matrices: the anti-image covariance matrix and the anti-image correlation matrix. We are interested in the anti-image correlation matrix.
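The verbal labels for the MSA amount to a lookup on the value. A minimal sketch follows; the function name `describe_msa` is hypothetical, and the cut points are the ones listed on this slide.

```python
def describe_msa(msa):
    """Map a Measure of Sampling Adequacy to its verbal label."""
    if msa >= 0.90:
        return "marvelous"
    if msa >= 0.80:
        return "meritorious"
    if msa >= 0.70:
        return "middling"
    if msa >= 0.60:
        return "mediocre"
    if msa >= 0.50:
        return "miserable"
    return "unacceptable"


# The two MSA values reported in the sample problem.
print(describe_msa(0.762))
print(describe_msa(0.467))
```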
26
SW388R7 Data Analysis & Computers II Slide 26 Appropriateness of factor analysis: Sampling adequacy of individual variables In our initial analysis, the MSA for the variable "importance of ethnic identity" [ethimp] was 0.467. Since this is less than 0.50, the variable should be removed from the principal component analysis. SPSS places the Measures of Sampling Adequacy on the diagonal of the anti-image correlation matrix, highlighted in yellow.
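For readers who want to see where the diagonal values come from, the standard Kaiser-Meyer-Olkin formula compares squared correlations with squared partial correlations, which are obtained from the inverse of the correlation matrix. The sketch below is not SPSS's code; the function names and the small equicorrelated matrix are hypothetical, and only the standard library is used.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def msa_per_variable(corr):
    """MSA for each variable: sum r^2 / (sum r^2 + sum partial r^2), off-diagonal."""
    n = len(corr)
    inv = invert(corr)
    result = []
    for j in range(n):
        others = [k for k in range(n) if k != j]
        r2 = sum(corr[j][k] ** 2 for k in others)
        # Squared partial correlation between j and k, from the inverse matrix.
        p2 = sum(inv[j][k] ** 2 / (inv[j][j] * inv[k][k]) for k in others)
        result.append(r2 / (r2 + p2))
    return result


# Hypothetical matrix: three variables, every pair correlated 0.5.
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
msa = msa_per_variable(corr)
print([round(value, 3) for value in msa])
```

A variable whose value falls below 0.50 would be the candidate for removal, as with ethimp above.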
27
SW388R7 Data Analysis & Computers II Slide 27 Re-running the principal components analysis To re-run the analysis, click on the Dialog Recall button and select Factor Analysis from the pop-up menu. The dialog box from the last factor analysis run will be displayed.
28
SW388R7 Data Analysis & Computers II Slide 28 Removing the variable First, highlight the variable to be removed, ethimp. Second, click on the button with the arrow pointing left to move the highlighted variable back to the list of variables.
29
SW388R7 Data Analysis & Computers II Slide 29 Producing the revised output First, click on the OK button to request the revised output.
30
SW388R7 Data Analysis & Computers II Slide 30 Appropriateness of factor analysis: Sampling adequacy of individual variables In the revised analysis, the MSA for all of the variables is now greater than 0.50, so we satisfy that requirement.
31
SW388R7 Data Analysis & Computers II Slide 31 Appropriateness of factor analysis: Sample adequacy for set of variables In addition, the overall MSA for the set of variables included in the analysis was 0.762, which exceeds the minimum requirement of 0.50 for overall MSA. The eleven variables in the analysis satisfy this criterion for appropriateness of factor analysis.
32
SW388R7 Data Analysis & Computers II Slide 32 Appropriateness of factor analysis: Bartlett test of sphericity Principal component analysis requires that the probability associated with Bartlett's Test of Sphericity be less than the level of significance. The probability associated with the Bartlett test is p<0.001, which satisfies this requirement. The variables now included in the analysis satisfy the screening criteria for the appropriateness of factor analysis. The next step is to determine the number of factors that should be included in the factor solution.
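The Bartlett statistic itself can be computed from the determinant of the correlation matrix with the standard formula, chi-square = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom. The sketch below makes assumptions: the function names and the example matrix are hypothetical, and it stops at the statistic and degrees of freedom rather than the probability.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return det


def bartlett_sphericity(corr, n_cases):
    """Return the chi-square statistic and degrees of freedom."""
    p = len(corr)
    statistic = -(n_cases - 1 - (2 * p + 5) / 6) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return statistic, df


# Hypothetical example: three variables, every pair correlated 0.5, 100 cases.
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
statistic, df = bartlett_sphericity(corr, 100)
print(round(statistic, 2), df)
```

The probability is the upper tail of the chi-square distribution at this statistic; if SciPy is available, `scipy.stats.chi2.sf(statistic, df)` returns it.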
33
SW388R7 Data Analysis & Computers II Slide 33 Answer 3
34
SW388R7 Data Analysis & Computers II Slide 34 Question 4
35
SW388R7 Data Analysis & Computers II Slide 35 Number of factors to extract The latent root criterion for number of factors to extract would indicate that there were 3 components to be extracted for these variables, since there were 3 eigenvalues greater than 1.0 (3.032, 1.647, and 1.272). In contrast, the cumulative proportion of variance criterion would require 4 components to satisfy the criterion of explaining 60% or more of the total variance in the original set of variables. A 4 component solution would explain 63.131% of the total variance. Since the SPSS default is to extract the number of components indicated by the latent root criterion, our initial factor solution was based on the extraction of 3 components.
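Both criteria reduce to simple arithmetic on the eigenvalues. In the sketch below the function names are hypothetical; the three eigenvalues are the ones reported on this slide, and the total variance is taken to be 11 because eleven standardized variables remain in the analysis, each contributing one unit of variance.

```python
def latent_root_count(eigenvalues):
    """Number of components with an eigenvalue greater than 1.0."""
    return sum(1 for value in eigenvalues if value > 1.0)


def cumulative_percent(eigenvalues, n_variables):
    """Percent of total variance explained by the listed components."""
    return 100 * sum(eigenvalues) / n_variables


leading = [3.032, 1.647, 1.272]  # eigenvalues greater than 1.0 from the output

n_components = latent_root_count(leading)
explained = cumulative_percent(leading, 11)
print(n_components, round(explained, 1))
```

The three components together explain about 54.1% of the variance, which is why the 60% criterion would call for a fourth component.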
36
SW388R7 Data Analysis & Computers II Slide 36 Answer 4 The question indicated that there were 2 components, but our output indicated 3, so the question is false.
37
SW388R7 Data Analysis & Computers II Slide 37 Question 5
38
SW388R7 Data Analysis & Computers II Slide 38 Evaluating communalities The first adjustment that we make to the factor solution is to examine the communalities. The communalities represent the proportion of the variance for each of the variables included in the analysis that is explained or accounted for by the components in the factor solution. The derived components should explain at least half of each original variable's variance, so the communality value for each variable should be 0.50 or higher. If one or more variables have a value for communality that is less than 0.50, the variable with the lowest communality should be excluded and the principal component analysis should be computed again.
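For an orthogonal solution such as principal components with varimax rotation, a variable's communality is the sum of its squared loadings across the extracted components. The sketch below illustrates the rule of removing the lowest communality under 0.50; the function names, variable names, and loadings are hypothetical.

```python
def communalities(loadings):
    """Sum of squared loadings across components, per variable."""
    return {var: sum(l ** 2 for l in row) for var, row in loadings.items()}


def lowest_below(comm, minimum=0.50):
    """Variable with the lowest communality under the minimum, or None."""
    low = {var: value for var, value in comm.items() if value < minimum}
    return min(low, key=low.get) if low else None


# Hypothetical loadings on three components, for illustration only.
loadings = {
    "var_a": [0.80, 0.10, 0.05],
    "var_b": [0.30, 0.40, 0.10],
    "var_c": [0.20, 0.60, 0.20],
}

comm = communalities(loadings)
to_remove = lowest_below(comm)
print({var: round(value, 3) for var, value in comm.items()}, to_remove)
```

Here two variables fall below 0.50, but only the lowest one is removed before the analysis is computed again.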
39
SW388R7 Data Analysis & Computers II Slide 39 Communality requiring variable removal Examination of the first principal components model extracted by SPSS resulted in the removal of the variable "agreement that harmony in US best achieved by ignoring ethnic differences" [ethignor] from the analysis. The communality for "agreement that harmony in US best achieved by ignoring ethnic differences" [ethignor] was 0.260. The communality for this variable was less than the minimum requirement that the factor solution should explain at least 50% of the variance in the original variable, so this variable was removed from the analysis. While other variables in the analysis also had communalities lower than 0.50, this variable was selected for removal because it had the lowest communality.
40
SW388R7 Data Analysis & Computers II Slide 40 Answer 5 To remove ethignor from the analysis, we follow the same sequence of steps that we used to eliminate ethimp.
41
SW388R7 Data Analysis & Computers II Slide 41 Question 6
42
SW388R7 Data Analysis & Computers II Slide 42 Communality requiring variable removal Examination of the second principal components model extracted by SPSS resulted in the removal of the variable "agreement that ethnic minorities must better adapt to mainstream American culture" [ethadapt] from the analysis. The communality for "agreement that ethnic minorities must better adapt to mainstream American culture" [ethadapt] was 0.338. The communality for this variable was less than the minimum requirement that the factor solution should explain at least 50% of the variance in the original variable, so this variable was removed from the analysis.
43
SW388R7 Data Analysis & Computers II Slide 43 Answer 6 To remove ethadapt from the analysis, we follow the same sequence of steps that we used to eliminate ethimp.
44
SW388R7 Data Analysis & Computers II Slide 44 Question 7
45
SW388R7 Data Analysis & Computers II Slide 45 Communality requiring variable removal Examination of the third principal components model extracted by SPSS resulted in the removal of the variable "agreement that ethnic group members are similar to one another" [ethsame] from the analysis, not "feelings toward African Americans" [feelblks]. The communality for "agreement that ethnic group members are similar to one another" [ethsame] was 0.368. The communality for this variable was less than the minimum requirement that the factor solution should explain at least 50% of the variance in the original variable, so this variable was removed from the analysis.
46
SW388R7 Data Analysis & Computers II Slide 46 Answer 7 To remove ethsame from the analysis, we follow the same sequence of steps that we used to eliminate ethimp.
47
SW388R7 Data Analysis & Computers II Slide 47 Satisfactory communalities for all variables Running the principal components analysis after removing the last variable produces a table of communalities where all are above 0.50.
48
SW388R7 Data Analysis & Computers II Slide 48 Variable loadings on components Once variables have been removed for low communalities, we examine the pattern of factor loadings (loadings greater than 0.40) to make certain that each variable loads on one and only one component. This pattern is called simple structure and is an accurate description of this table. If a variable does not have simple structure, it is removed from the analysis. If we remove a variable for complex structure, we start back with examining communalities after we run the factor analysis again.
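Checking for complex structure means counting, for each variable, how many components it loads on at 0.40 or higher in magnitude. A minimal sketch follows; the function name, variable names, and rotated loadings are hypothetical, and a loading of exactly 0.40 is treated as meeting the cutoff.

```python
def complex_variables(loadings, cutoff=0.40):
    """Variables that load at or above the cutoff on more than one component."""
    return [
        var
        for var, row in loadings.items()
        if sum(1 for l in row if abs(l) >= cutoff) > 1
    ]


# Hypothetical rotated loadings on two components, for illustration only.
loadings = {
    "var_a": [0.75, 0.12],
    "var_b": [0.55, -0.48],  # loads on both components; the sign is ignored
    "var_c": [0.05, 0.81],
}

flagged = complex_variables(loadings)
print(flagged)
```

An empty list means every variable loads on at most one component at the cutoff.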
49
SW388R7 Data Analysis & Computers II Slide 49 Single variable components If we end up with a component that contains only a single variable, the variable should be removed from the analysis. There is no advantage to using a single component to represent one variable. In this table of loadings, all three components have two or more variables loaded on them. If we remove a variable because it is the only one loading on a component, we start back with examining communalities after we run the factor analysis again.
50
SW388R7 Data Analysis & Computers II Slide 50 Question 8
51
SW388R7 Data Analysis & Computers II Slide 51 Answer 8 The variables which we removed in either the screening for suitability for factor analysis or in the extraction of factors should be used as individual variables in future analyses.
52
SW388R7 Data Analysis & Computers II Slide 52 Question 9
53
SW388R7 Data Analysis & Computers II Slide 53 Pattern of factor loadings The components and variables which they contain must match the table of component loadings in order for this question to be true. In this example, the pattern of loadings is correctly described.
54
SW388R7 Data Analysis & Computers II Slide 54 Answer 9 The components and variables which they contain must match the table of component loadings in order for this question to be true. In this example, the pattern of loadings is correctly described.
55
SW388R7 Data Analysis & Computers II Slide 55 Question 10
56
SW388R7 Data Analysis & Computers II Slide 56 Cumulative percent of variance explained The components explain 69.718% of the total variance in the variables which are included on the components. This percentage of variance explained satisfies the goal of explaining 60% or more of the total original variance in the variables. If the percentage of variance explained is less than 60%, we should attach a note of caution to our solution, since using the components as substitutes for the variables may not be all that useful.
57
SW388R7 Data Analysis & Computers II Slide 57 Answer 10
58
SW388R7 Data Analysis & Computers II Slide 58 Steps in answering questions about principal components analysis - 1 Question: Variables included satisfy level of measurement requirements? Are the variables included in the analysis metric or dichotomous? If no: incorrect application of a statistic. If yes: True.
59
SW388R7 Data Analysis & Computers II Slide 59 Steps in answering questions about principal components analysis - 2 Question: Number of variables and cases satisfy sample size requirements? Is the number of valid cases 50 or more? If no: incorrect application of a statistic. If yes: is the ratio of cases to variables at least 5 to 1? If no: incorrect application of a statistic. If yes: is the number of valid cases 100 or more? If yes: True. If no: True with caution.
60
SW388R7 Data Analysis & Computers II Slide 60 Steps in answering questions about principal components analysis – 3a Question: Available data satisfies suitability criteria for principal components analysis? Are there two or more correlations that are 0.30 or greater? If no: False. If yes: is the probability for the Bartlett test of sphericity less than the level of significance? If no: False. If yes: continue with the checks on the measures of sampling adequacy.
61
SW388R7 Data Analysis & Computers II Slide 61 Steps in answering questions about principal components analysis – 3b Question: Available data satisfies suitability criteria for principal components analysis? Is the measure of sampling adequacy larger than 0.50 for each variable? If no: remove the variable with the lowest MSA and repeat the analysis. If yes: is the overall measure of sampling adequacy greater than 0.50? If no: False. If yes: True.
62
SW388R7 Data Analysis & Computers II Slide 62 Steps in answering questions about principal components analysis - 4 Question: Number of components to be extracted initially? Is the number of eigenvalues greater than 1.0 correct? If yes: True. If no: False.
63
SW388R7 Data Analysis & Computers II Slide 63 Steps in answering questions about principal components analysis - 5 Question: Examination of outputs indicates a variable should be excluded from principal components analysis? Is the communality for a variable less than 0.50? If yes: remove the variable with the lowest communality and repeat the analysis. If no: do any variables show complex structure (2+ loadings > 0.40)? If yes: remove the variable with complex structure and lowest communality and repeat the analysis. If no: does any of the components have only one variable loading on it? If yes: remove the single variable loading on the component and repeat the analysis. If no: no further variable needs to be excluded. The answer is True when the outputs indicate that the variable should be excluded, and False otherwise.
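The order of the three checks can be summarized as a single decision function. This is a sketch under assumptions, not SPSS output: the function name `next_removal`, the variable names, and the loadings are hypothetical, communalities are computed as sums of squared loadings, and loadings of exactly 0.40 count as meeting the cutoff.

```python
def next_removal(loadings, min_communality=0.50, cutoff=0.40):
    """Return (variable, reason) for the next removal, or (None, reason)."""
    comm = {var: sum(l ** 2 for l in row) for var, row in loadings.items()}

    # Check 1: lowest communality below the minimum.
    low = [var for var in comm if comm[var] < min_communality]
    if low:
        return min(low, key=comm.get), "low communality"

    # Check 2: complex structure; pick the one with the lowest communality.
    complex_vars = [
        var
        for var, row in loadings.items()
        if sum(abs(l) >= cutoff for l in row) > 1
    ]
    if complex_vars:
        return min(complex_vars, key=comm.get), "complex structure"

    # Check 3: a component with only one variable loading on it.
    n_components = len(next(iter(loadings.values())))
    for c in range(n_components):
        members = [var for var, row in loadings.items() if abs(row[c]) >= cutoff]
        if len(members) == 1:
            return members[0], "only variable on its component"

    return None, "no removal needed"


# Hypothetical rotated loadings on two components, for illustration only.
loadings = {
    "var_a": [0.80, 0.10],
    "var_b": [0.75, 0.20],
    "var_c": [0.60, 0.55],
    "var_d": [0.15, 0.85],
}
first = next_removal(loadings)
print(first)
```

In practice the analysis is run again after each removal, so the loadings passed to the next call would come from the new output rather than from the old table.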
64
SW388R7 Data Analysis & Computers II Slide 64 Steps in answering questions about principal components analysis - 6 Question: Omitted variables to be included as individual variables in further analyses? Is the list of variables omitted from the analysis correct? If no: False. If yes: True.
65
SW388R7 Data Analysis & Computers II Slide 65 Steps in answering questions about principal components analysis - 7 Question: Components to be substituted for individual variables? Are the number of components and pattern of loadings correct? If no: False. If yes: True.
66
SW388R7 Data Analysis & Computers II Slide 66 Steps in answering questions about principal components analysis - 8 Question: Principal components solution explains satisfactory percentage of variance in included variables? Is the cumulative proportion of variance for variables 60% or higher? If no: True with caution. If yes: True.
67
SW388R7 Data Analysis & Computers II Slide 67 Homework problems - 1 Homework problems ask a single question for each problem – whether or not a component can substitute for a subset of the variables included in the analysis.
68
SW388R7 Data Analysis & Computers II Slide 68 Homework problems - 2 Homework problems require you to complete all of the steps needed in the principal components practice problems.
69
SW388R7 Data Analysis & Computers II Slide 69 Homework problems - 3 The question will be true if, at the end of the analysis, the variables included on the first component are identical to the variables for which one component could be substituted. If the first component contains ordinal variables, or if the proportion of total variance explained is less than 60%, a caution is added to the true answer.
70
SW388R7 Data Analysis & Computers II Slide 70 Homework problems - 4 The question will be false if, at the end of the analysis, the variables included on the first component are different from the variables for which one component could be substituted. If level of measurement or sample size violations are found, the problem is an incorrect application of a statistic.