Presentation is loading. Please wait.

Presentation is loading. Please wait.

Principal Components Analysis Complete Problems

Similar presentations


Presentation on theme: "Principal Components Analysis Complete Problems"— Presentation transcript:

1 Principal Components Analysis Complete Problems

2 Complete Principal Components Analysis
We add three steps to the end of the principal components analysis testing basic relationships: Analysis of the impact of outliers Split-sample validation analysis Computation of Chronbach’s alpha to measure feasibility of using components as summated scales

3 Outliers Outliers can change the factor structure found for a principal components analysis, creating the dilemma of determining which factor structure should be reported SPSS calculates factor scores as standard scores. SPSS suggests that one way to identify outliers is to compute the factors scores and identify those have a value greater than ±3.0 as outliers. If we find outliers in our analysis, we redo the analysis, omitting the cases that were outliers. If there is the analysis excluding outliers still satisfies the requirement for communalities and the factor structure is the same as the analysis with all cases, it implies that there outliers do not have an impact. If our factor solution changes, we will have to study the outlier cases to determine whether or not we should exclude them. After testing outliers, restore full data set before any further calculations

4 Split Sample Validation
To test the generalizability of findings from a principal component analysis, we could conduct a second research study to see if our findings are verified. A less costly alternative is to split the sample randomly into two halves, do the principal component analysis on each half and compare the results. If the communalities and the factor loadings are the same on the analysis on each half and the full data set, we have evidence that the findings are generalizable and valid because, in effect, the two analyses represent a study and a replication.

5 Misleading Results to Watch Out For
When we examine the communalities and factor loadings, we are matching up overall patterns, not exact results: the communalities should all be greater than 0.50 and the pattern of the factor loadings should be the same. Sometimes the variables will switch their components (variables loading on the first component now load on the second and vice versa), but this does not invalidate our findings. Sometimes, all of the signs of the factor loadings will reverse themselves (the plus's become minus's and the minus's become plus's), but this does not invalidate our findings because we interpret the size, not the sign of the loadings.

6 When validation fails If the validation fails, we are warned that the solution found in the analysis of the full data set is not generalizable and should not be reported as valid findings. We do have some options when validation fails: If the problem is limited to one or two variables, we can remove those variables and redo the analysis. Randomly selected samples are not always representative. We might try some different random number seeds and see if our negative finding was a fluke. If we choose this option, we should do a large number of validations to establish a clear pattern, at least 5 to 10. Getting one or two validations to negate the failed validation and support our findings is not sufficient.

7 Reliability of Summated Scales
One of the common uses of factor analysis is the formation of summated scales, where we sum or average the scores on all the variables loading on a component to create the score for the component. To verify that the variables for a component are measuring similar entities that are legitimate to add together, we compute Chronbach's alpha. If Chronbach's alpha is 0.70 or greater (0.60 or greater for exploratory research), we have support on the interval consistency of the items justifying their use in a summated scale. Chronbach’s alpha requires that all variables be coded in the same direction. If there are negative loadings on a component, the variable must be reverse coded to get the correct value for alpha.

8 The Problem in BlackBoard
The problem statement tells us: the data set and variables included in the analysis the alpha for the statistical tests The seed number to use for the validation analysis

9 Statement about Level of Measurement
The first statement in the problem asks about level of measurement. Principal components analysis requires that all of the variables included in the analysis are metric.

10 Marking the Statement about Level of Measurement
All of the variables included in the analysis are ordinal level. We will employ the common convention of treating ordinal variables as metric variables, but we should consider mentioning this as a limitation to the analysis. Since we treated all variables as metric, we mark the check box.

11 Statement about Sample Size
We will use the minimum sample size requirement of 150 valid cases recommended by Tabachnick and Fidell (1996).

12 Run the Principal Components Analysis - 1
To answer the question about the sample size, we run the first principal components analysis. Select the Factor command from the Analyze > Data Reduction menu.

13 Run the Principal Components Analysis - 2
First, move the variables listed in the problem to the Variables list box. Next, click on the Descriptives button to request the statistics needed to evaluate the suitability of the data for factor analysis.

14 Run the Principal Components Analysis - 3
First, mark the check box for Univariate Statistics to get the number of valid cases for the analysis. Third, click on the Continue button to close the Factor Analysis: Descriptives dialog box. Second, mark the check boxes for the statistics for the suitability of factor analysis: Coefficients of the correlation matrix, KMO and Bartlett’s test of sphericity, and Anti-image correlation matrix.

15 Run the Principal Components Analysis - 4
Click on the Extraction button to tell SPSS what method it should use to extract the factors.

16 Run the Principal Components Analysis - 5
We will use the default method of Principal Components. The drop down list contains numerous other methods. Click on the Continue button to close the dialog box. We accept the other defaults for displaying the unrotated factor solution and extracting eigenvalues over 1.

17 Run the Principal Components Analysis - 6
Click on the Rotation button to tell SPSS what method it should use to rotate the factors to clarify the interpretation.

18 Run the Principal Components Analysis - 7
Click on the Continue button to close the dialog box. We mark the option button for the Varimax rotation which will make the factors independent of each other.

19 Run the Principal Components Analysis - 8
Having specified the analysis, click on the OK button to produce the output.

20 Output for Sample Size Requirement
The 509 cases available for this principal components analysis satisfy the minimum sample size requirement of 150 valid cases recommended by Tabachnick and Fidell (1996).

21 Marking the Statement about Sample Size
Since we satisfied the minimum sample size requirement, we mark the statement. If we did not satisfy the sample size requirement, we should consider mentioning this fact as a limitation to the analysis. Factor analysis can be numerically unstable when the sample size is small.

22 The Statement about Suitability for Factor Analysis: Sufficient Correlations
Principal components analysis requires that there be some correlations greater than 0.30 (more than 1) between the variables included in the analysis.

23 Sufficient Correlations in Correlation Matrix
For this set of variables, there are 9 correlations in the matrix greater than 0.30.

24 Marking the Statement about Sufficient Correlations
Since there are 9 correlations greater than 0.30, we mark the statement.

25 The Statement about Suitability for Factor Analysis: Test of Sphericity
Bartlett’s test of sphericity tests the null hypothesis that the correlation matrix is an identity matrix with 1’s, or perfect correlations, on the main diagonal, and 0’s for all of the remaining elements. If this is true, the variables are not correlated and the factor analysis will not work. Our goal in this test is to reject the null hypothesis, supporting the contention that there are sufficient correlations, or similarity of values, among the variables that several can be combined into a factor or component.

26 Bartlett’s Test of Sphericity
Principal component analysis requires that the probability associated with Bartlett's Test of Sphericity (χ²(df=15, N = 509) = , p < .001) be less than or equal to the level of significance (0.05). The probability associated with the Bartlett Test satisfies this requirement.

27 Marking the Statement about Bartlett’s Test of Sphericity
Since the probability associated with the Bartlett Test is sufficient to reject the null hypothesis, we mark the check box.

28 The Statement about Suitability for Factor Analysis: Sampling Adequacy
Sampling adequacy predicts if data are likely to factor well, based on correlation and partial correlation. The Kaiser-Meyer-Olkin Measure of Sampling Adequacy (MSA) must be greater than 0.50 for each individual variable as well as the set of variables. Variables that do not have an MSA of .50 or greater are removed from the analysis one at a time, until all variables and the overall measure are above .50.

29 Measures of Sampling Adequacy for Individual Variables
In the initial iteration for suitability of principal components analsyis , the MSA for all of the individual variables was greater than 0.50 ("information and knowledge are shared openly within this organization" [q76] - .70; "an effort is made to get the opinions of people throughout the organization" [q77] - .69; "our web site is easy to use and contains helpful information" [q83] - .76; "I have a good understanding of our mission, vision, and strategic plan" [q84] - .73; "I believe we communicate our mission effectively to the public" [q85] - .81; and "my organization encourages me to be involved in my community" [q86] - .84). Note: Not all MSA’s are shown on this slide.

30 Kaiser-Meyer-Olkin Measure of Sampling Adequacy
In addition, the overall MSA for the set of variables included in the analysis was 0.75, which exceeds the minimum requirement of 0.50 for overall MSA.

31 Marking the Statement about Measures of Sampling Adequacy
Since the sampling adequacy measures met the criteria for both individual variables and overall, the check box is marked.

32 Statement about Initial Number of Factors
Various tests are used to estimate the number of factors to be extracted. This was very important when factor analysis was calculated by hand. Two of the criteria were the latent root criterion which was based on the number of eigenvalues greater than 1.0 and the cumulative proportion of variance criteria which calculated the number of components needed to explain 60% or more of the total variance in the original set of variables. The problem offers two possible responses.

33 Initial Number of Factors: Eigenvalues Greater than One
The latent root criterion for number of factors to extract would indicate that there were 2 components to be extracted for these variables, since there were 2 eigenvalues greater than 1.0 (2.84, and 1.05).

34 Initial Number of Factors: Percentage of Variance Explained
In addition, the cumulative proportion of variance criteria can be met with 2 components to satisfy the criterion of explaining 60% or more of the total variance in the original set of variables. A 2 component solution would explain an estimated 64.86% of the total variance.

35 Marking the Statement about Initial Number of Factors
Since the SPSS default is to extract the number of components indicated by the latent root criterion, our initial factor solution will be based on the extraction of 2 components. We mark the second statement in the pair. Note: the question is worded to indicate that both criteria suggest the same number of factors. Should they suggest a different number of factors, neither statement would be marked, but we would still continue with the factor analysis using the number of factors suggested by the latent root criteria.

36 Statement about First Iteration of Factor Extraction
The problem suggests that the first iteration of the factor solution included a variable (my organization encourages me to be involved in my community [q86] ) that should be excluded, with because it did not satisfy the requirement for communalities, or because it violated simple structure.

37 Output for Communalities on First Iteration
Examination of the first principal components model extracted by SPSS resulted in the removal of the variable "my organization encourages me to be involved in my community" [q86] from the analysis. "My organization encourages me to be involved in my community" [q86]was removed because it communality (.467) meant that the factor solution explained less than half of the variable's variance. The communality for this variable was less than the minimum requirement that the factor solution should explain at least 50% of the variance in the original variable.

38 Marking the Statement about First Iteration of Factor Extraction
My organization encourages me to be involved in my community [q86] was removed because it did not satisfy the requirement for communalities, i.e. the factors should explain at least 50% of the variance in the variable. Since we have already determined that the variable is to be removed, it was not necessary to check the factor loadings for simple structure. The first statement in the pair is marked.

39 Removing a Variable from the Factor Analysis - 1
To remove the variable, my organization encourages me to be involved in my community [q86], we select Factor Analysis from the Dialog Recall drop down menu.

40 Removing a Variable from the Factor Analysis - 2
To remove the variable, highlight the target variable in the Variables list box, and click on the arrow button pointing to the left.

41 Removing a Variable from the Factor Analysis - 3
Since all of the other specifications for the analysis remain the same, click on the OK button to produce the output for the second iteration.

42 Statement about Second Iteration of Factor Extraction
The problem suggests that the second iteration of the factor solution included a variable (I believe we communicate our mission effectively to the public [q85] ) that should be excluded, with because it did not satisfy the requirement for communalities, or because it violated simple structure.

43 Output for Communalities on Second Iteration
Examination of the second principal components model extracted by SPSS produced a table of Communalities in which all variables have the required minimum of .50.

44 Output for Factor Structure on Second Iteration
Examination of the second principal components model extracted by SPSS resulted in the removal of the variable "I believe we communicate our mission effectively to the public" [q85] from the analysis. The variable "I believe we communicate our mission effectively to the public" [q85] had loadings of 0.40 or higher on component 1 (.526) and component 2 (.536). Multiple high loadings violates the requirement for simple structure, so this variable was removed from the analysis.

45 Marking the Statement about Second Iteration of Factor Extraction
I believe we communicate our mission effectively to the public [q85] was removed because it did not satisfy the requirement for simple structure, so the first statement in the pair is marked.

46 Removing a Variable from the Factor Analysis - 1
To remove the variable, I believe we communicate our mission effectively to the public [q85], we select Factor Analysis from the Dialog Recall drop down menu.

47 Removing a Variable from the Factor Analysis - 2
To remove the variable, highlight the target variable in the Variables list box, and click on the arrow button pointing to the left.

48 Removing a Variable from the Factor Analysis - 3
Since all of the other specifications for the analysis remain the same, click on the OK button to produce the output for the second iteration.

49 Statement about Third Iteration of Factor Extraction
The problem does not indicate that any variables were removed on the third iteration of the factor extraction, and that the solution met all of the requirements for a factor analysis solution: all the variables remaining in the analysis had communalities above 0.50, demonstrated simple structure, and each component had more than one variable loading on it

50 Output for Communalities on Third Iteration - 1
Examination of the third principal components model extracted by SPSS produced a table of Communalities in which all four variables have the required minimum of .50.

51 Output for Factor Structure on Third Iteration - 2
Examination of the third principal components model extracted by SPSS did not show any variables having a loading of .40 on both of the components.

52 Output for Factor Structure on Third Iteration - 3
Each of the components has two variables loading on it. If a component had only one variable loading on it, it would make more sense to use the original variable in subsequent analyses rather than the component.

53 Marking the Statement about Third Iteration of Factor Extraction
On the third iteration, all of the requirements for a factor solution were satisfied. For the 4 variables not excluded from the analysis, two components can be substituted for the 4 variables. Since the final solution found two components, so we mark the statement.

54 Statement about Variables Loading on the First Component
Two options are given which suggest different combinations of variables loading on the first component.

55 Output for Component One
Since more than one component was extracted, the factor structure is based on the "Rotated Component Matrix" Component 1 included the variables: "information and knowledge are shared openly within this organization" [q76] (loading = .901); and "an effort is made to get the opinions of people throughout the organization" [q77] (loading = .912). We can substitute one component variable for this combination of variables in further analyses.

56 Marking the Statement about Variables Loading on the First Component
Component 1 included the variables: "information and knowledge are shared openly within this organization" [q76] and "an effort is made to get the opinions of people throughout the organization" [q77]. We mark the fist statement in the pair.

57 Statement about Variables Loading on the Second Component
Two options are given which suggest different combinations of variables loading on the second component.

58 Output for Component Two
Since more than one component was extracted, the factor structure is based on the "Rotated Component Matrix" Component 2 included the variables: "our web site is easy to use and contains helpful information" [q83] (loading = .821); and "I have a good understanding of our mission, vision, and strategic plan" [q84] (loading = .833). We can substitute one component variable for this combination of variables in further analyses.

59 Marking the Statement about Variables Loading on the Second Component
Component 2 included the variables: "our web site is easy to use and contains helpful information" [q83] and "I have a good understanding of our mission, vision, and strategic plan" [q84]. We mark the fist statement in the pair.

60 Statement about Percentage of Variance Explained by Factors
The final statement questions whether or not the factor solution met the standard of explaining 60% of the variance in the variables that were replaced.

61 Output for Percentage of Variance Explained by Factors
The components explain 77.25% of the total variance in the variables which are included on the components. This percentage of variance explained satisfies the goal of explaining 60% or more of the total original variance in the variables.

62 Marking the Statement about Percentage of Variance Explained by Factors
Since the percentage of variance explained by the factors satisfies the goal of explaining 60% or more of the total original variance in the variables the components will replace, we mark the final statement.

63 Statement about Outliers
The next statement requires us to determine whether or not there are any outliers in the results of the principal components analysis. If outliers are found, they are removed from the analysis and the results computed again. If the factor solution is the same as that based on all cases, we conclude that outliers do not have any impact and we report the results based on all cases. If the solution without outliers is different, we face the difficult decision of which factor structure should be reported. In our problems, we will halt the analysis.

64 Detecting Outliers - 1 To detect outliers, we compute the factor scores in SPSS. Select the Factor Analysis command from the Dialog Recall tool button

65 Detecting Outliers - 2 The only command we need to change is to request SPSS to compute the factor scores. Click on the Scores… button to access the factor scores dialog box.

66 Detecting Outliers - 3 First, click on the Save as variables checkbox to create factor variables. Third, click on the Continue button to complete the specifications. Second, accept the default method using a Regression equation to calculate the scores.

67 Detecting Outliers - 4 Click on the Continue button to compute the factor scores.

68 Outliers in the Data Editor
SPSS creates the factor score variables in the data editor window. It names the first factor score “FAC1_1,” and the second factor score “FAC2_1.” We need to check to see if we have any values for either factor score that are larger than ±3.0. One way to check for the presence of large values indicating outliers is to sort the factor variables and see if any fall outside the acceptable range. Should you forget to delete the factor scores from the previous analysis, SPSS will alter the final digit in the factor name, i.e. instead of naming it FAC1_1, it will name it FAC1_2.

69 Sort the data to locate outliers for factor one
First, select the FAC1_1 column by clicking on its header. Second, right click on the column header and select the Sort Ascending command from the drop down menu.

70 Negative outliers for factor one
Scroll down past the cases for whom factor scores could not be computed because of missing data. We see that none of the scores for factor one are less than or equal to -3.0, so there are no outliers detected yet.

71 Positive outliers for factor one
Scrolling down to the bottom of the sorted data set, we see that none of the scores for factor one are greater than or equal to +3.0. There are no outliers on factor one.

72 Sort the data to locate outliers on factor two
First, select the fac2_1 column by clicking on its header. Second, right click on the column header and select the Sort Ascending command from the drop down menu.

73 Negative outliers for factor two
Scrolling down past the cases for whom factor scores could not be computed, we see that there are five cases that have a score factor less than or equal to -3.0 on factor 2.

74 Positive outliers for factor two
Scrolling down to the bottom of the sorted data set, we see that none of the scores for factor two are greater than or equal to +3.0. We will run the analysis excluding the five negative outliers, and see if it changes our interpretation of the analysis.

75 Removing the outliers To see whether or not outliers are having an impact on the factor solution, we will compute the factor analysis without the outliers and compare the results. To remove the outliers, we will include the cases that are not outliers. Choose the Select Cases… command from the Data menu.

76 Setting the If condition
First, mark the option button for the If condition is satisfied. Click on the If… button to enter the formula for selecting cases to include in the analysis.

77 Formula to select cases that are not outliers
First, type the formula as shown. The formula says: include cases if the absolute value of the first and second factor scores are less than 3.0. Second, click on the Continue button to complete the specification.

78 Complete the select cases command
SPSS writes the formula we entered next to the IF button. Having entered the formula for including cases, click on the OK button to complete the selection.

79 The outliers selected out of the analysis
When SPSS selects a case out of the data analysis, it draws a slash through the case number. The cases that we identified as outliers will be excluded. The cases with missing data are also excluded because they do not satisfy the criteria in the formula.

80 Repeating the factor analysis
To repeat the factor analysis without the outliers, select the Factor Analysis command from the Dialog Recall tool button

81 Stopping SPSS from computing factor scores again
On the last factor analysis, we included the specification to compute factor scores. Since we do not need to do this again, we will remove the specification. Click on the Scores… button to access the factor scores dialog.

82 Clearing the command to save factor scores
First, clear the Save as variables checkbox. This will deactivate the Method options. Second, click on the Continue button to complete the specification

83 Computing the factor analysis
To produce the output for the factor analysis excluding outliers, click on the OK button.

84 Comparing communalities
All of the communalities for the factor analysis including all cases satisfy the minimum requirement of being larger than 0.50. All of the communalities for the factor analysis excluding outliers satisfy the minimum requirement of being larger than 0.50. Though the communalities for each variable are slightly smaller when we excluded outliers, we would not alter our interpretation of the role of these four variables in the solution.

85 Comparing factor loadings
The factor loadings for the factor analysis including all cases is shown on the left. The factor loadings for the factor analysis excluding outliers is shown on the right. The pattern of variable loadings on components did not change when the outliers were removed. Component 1 included the variables: "information and knowledge are shared openly within this organization" [q76] and "an effort is made to get the opinions of people throughout the organization“. Component 2 included the variables: "our web site is easy to use and contains helpful information" [q83] and "I have a good understanding of our mission, vision, and strategic plan" [q84].

86 Marking the Statement about Outliers
The presence of outliers did not alter the factor solution. The factor solution based on all cases should be used in further analyses. We mark the check box for no impact due to outliers. Had the factor solution changed, we would have halted the analysis until we could understand the problem further.

87 Statement about Generalizability
Since factor analysis tends to over-fit the data used to develop the model at the expense of generalizability, we will test generalizability with a split sample validation strategy. In this strategy, we divide the sample in half, conduct the factor analysis on each half, and compare the results to the analysis on the full data set.

88 Deleting the Factor Scores
Before we do the split-sample validation, we will delete the factors scores that we used to detect outliers. First, highlight the columns containing the factors scores. Second, select the Clear command from the Edit menu.

89 Restoring All Cases to the Analysis - 1
We removed cases that were detected as outliers. Before doing our validation, we need to restore these cases to subsequent analyses. Select the Select Cases command from the Data menu.

90 Restoring All Cases to the Analysis - 2
First, click on the All cases option button. Click on the OK button to restore the cases.

91 All Cases Restored to the Data Set
The slash lines are removed from the case numbers, indicating that all cases are available to the analysis.

92 Split-sample validation
We validate our analysis by conducting an analysis on each half of the sample. We compare the results of these two split sample analyses with the analysis of the full data set. To split the sample into two half, we generate a random variable that indicates which half of the sample each case should be placed in. To compute a random selection of cases, we need to specify the starting value, or random number seed. Otherwise, the random sequence of numbers that you generate will not match mine, and we will get different results. Before we do the random selection, you must make certain that your data set is sorted in the original sort order, or the cases in your two half samples will not match mine. To make certain your data set is in the same order as mine, sort your data set in ascending order by case id.

93 Sorting the data set in ascending order
To make certain the data set is sorted in the original order, highlight the case id column, right click on the column header, and select the Sort Ascending command from the popup menu.

94 Setting the random number seed - 1
To set the random number seed, select the Random Number Generators… command from the Transform menu. NOTE: you must use the random number seed that is stated in the problem in order to produce the same results that I found. Any other seed will generate a different random sequence that can produce results that are very different from mine.

95 Setting the random number seed - 2
First, mark the check for Set Starting Point. Fourth, click on the OK button to complete the action. Second, select the option button for a Fixed Value. Third, type the seed number provided in the problem directions: NOTE: SPSS does not provide any feedback that the seed has been set or changed. If you are in doubt, you can reopen the dialog box and see what it indicates.

96 Computing the split variable - 1
To enter the formula for the variable that will split the sample in two parts, click on the Compute… command.

97 Computing the split variable - 2
First, type the name for the new variable, split, into the Target Variable text box. Second, the formula for the value of split is shown in the text box. The uniform(1) function generates a random decimal number between 0 and 1. The random number is compared to the value 0.50. If the random number is less than or equal to 0.50, the value of the formula will be 1, the SPSS numeric equivalent to true. If the random number is larger than 0.50, the formula will return a 0, the SPSS numeric equivalent to false. Third, click on the OK button to complete the dialog box.

98 The split variable in the data editor
In the data editor, the split variable shows a random pattern of zero’s and one’s. To select half of the sample for each validation analysis, we will first select the cases where split = 0, then select the cases where split = 1.

99 Repeating the analysis with the first validation sample
To repeat the principal component analysis for the first validation sample, select Factor Analysis from the Dialog Recall tool button.

100 Using "split" as the selection variable
First, scroll down the list of variables and highlight the variable split. Second, click on the right arrow button to move the split variable to the Selection Variable text box.

101 Setting the value of split to select cases
When the variable named split is moved to the Selection Variable text box, SPSS adds "=?" after the name to prompt up to enter a specific value for split. Click on the Value… button to enter a value for split.

102 Completing the value selection
Second, click on the Continue button to complete the value entry. First, type the value for the first half of the sample, 0, into the Value for Selection Variable text box.

103 Requesting output for the first validation sample
Click on the OK button to request the output. When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 0 for the split variable. Since the validation analysis requires us to compare the results of the analysis using the two split sample, we will request the output for the second sample before doing any comparison.

104 Repeating the analysis with the second validation sample
To repeat the principal component analysis for the second validation sample, select Factor Analysis from the Dialog Recall tool button.

105 Setting the value of split to select cases
Since the split variable is already in the Selection Variable text box, we only need to change its value. Click on the Value… button to enter a different value for split.

106 Completing the value selection
Second, click on the Continue button to complete the value entry. First, type the value for the second half of the sample, 1, into the Value for Selection Variable text box.

107 Requesting output for the second validation sample
Click on the OK button to request the output. When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 1 for the split variable.

108 Comparing communalities
All of the communalities for the first split sample satisfy the minimum requirement of being larger than 0.50. All of the communalities for the second split sample satisfy the minimum requirement of being larger than 0.50. Note how SPSS identifies for us which cases we selected for the analysis.

109 Comparing factor loadings
The pattern of factor loading for both split samples shows the variables: "information and knowledge are shared openly within this organization" [q76] and "an effort is made to get the opinions of people throughout the organization" [q77] loading on component one, and "our web site is easy to use and contains helpful information" [q83] and "I have a good understanding of our mission, vision, and strategic plan" [q84]loading on the second component.

110 Marking the Statement about Generalizability
All of the communalities in both validation samples met the criteria. The pattern of loadings for both validation samples is the same, and the same as the pattern for the analysis using the full sample. In effect, we have done the same analysis on two separate sub-samples of cases and obtained the same results. This validation analysis supports a finding that the results of this principal component analysis are generalizable to the population represented by this data set. We mark the check box.

111 Statement about Summated Scales
The next statement indicates that we can form a summative scale from the variables loading on a component, i.e. summing or averaging the scores for the variables. The utility of summated scales is measured by Chronbach’s alpha, which should minimally be greater than 0.60, and preferably be greater than 0.70.

112 Computing Chronbach's Alpha
To compute Chronbach's alpha for each component in our analysis, we select Scale > Reliability Analysis… from the Analyze menu.

113 Selecting the variables for the first component
First, move the two variables that loaded on the first component (q76 and q77) to the Items list box. Second, click on the Statistics… button to select the statistics we will need.

114 Selecting the statistics for the output
Second, click on the Continue button. First, mark the checkboxes for Item, Scale, and Scale if item deleted.

115 Completing the specifications
Second, click on the OK button to produce the output. First, If Alpha is not selected as the Model in the drop down menu, select it now.

116 Chronbach's Alpha The reliability for component 1 as measured by Chronbach's alpha is 0.814, which is greater than the generally agreed upon lower limit of The variables included on this component ("information and knowledge are shared openly within this organization" and "an effort is made to get the opinions of people throughout the organization") can be used in a summated scale.

117 Computing Chronbach's Alpha
To compute Chronbach's alpha for the second scale we select Reliability Analysis from the Dialog Recall menu.

118 Selecting the variables for the second component
First, remove the variables that loaded on the first component and move the two variables that loaded on the second component to the Items list box. Second, since we want the same output we had for the first component, we only need to click on the OK button to produce the output.

119 Chronbach's Alpha The reliability for component 2 as measured by Chronbach's alpha is 0.561, which is less than the generally agreed upon lower limit of 0.70, and even less than the 0.60 lower limit for exploratory research. A summated scale based on these variables ("our web site is easy to use and contains helpful information" and "I have a good understanding of our mission, vision, and strategic plan") should not be used.

120 Chronbach's Alpha if Item Deleted - 1
If alpha is too small, the Chronbach’s Alpha if Item Deleted column may suggest which variable should be removed to improve the internal consistency of the scale variables. It tells us what alpha we would get if the variable listed were removed from the scale. In this example, it does not produce a result because there are only two items and the removal of one would result in a one-item scale, which is not useful.

121 Chronbach's Alpha if Item Deleted - 2
Though not part of this problem, this output demonstrates the output for deleting an item to increase alpha. If the last item in this table were deleted, alpha would increase to .820, instead of the .686 for alpha with this item included.

122 Marking the Statement about Summated Scales
Since the variables loading on the second component did not satisfy the reliability scale, we leave the check box un-marked.

123 Principal Components Analysis: Level of Measurement
Level of measurement ok (all variables metric)? Do not mark check box for level of measurement No Mark: Inappropriate application of the statistic Yes Stop Mark check box for level of measurement Ordinal level variable treated as metric? Consider limitation in discussion of findings Yes No

124 Principal Components Analysis: Sample Size
Run Principal Components Analysis Adequate Sample Size (at least 150 valid cases) Do not mark check box for sample size No Consider limitation in discussion of findings Yes Mark check box for sample size

125 Principal Components Analysis: Suitability for Factor Analysis - 1
Two or more correlations ≥ 0.30? Do not mark check box for correlations No Stop, variables not good candidate for factor analysis Yes Mark check box for correlations Probability for Bartlett test of sphericity ≤ alpha? Do not mark check box for sphericity test No Stop, variables not good candidate for factor analysis Yes Mark check box for sphericity test

126 Principal Components Analysis: Suitability for Factor Analysis - 2
Remove variable with lowest MSA. Run PCA again. No Sampling adequacy ≥ 0.50 for each variable? One variable remaining in analysis? No Yes Yes Do not mark check box for MSA Stop, variables not good candidate for factor analysis KMO measure of sampling adequacy ≥ 0.50? Do not mark check box for MSA No Stop, variables not good candidate for factor analysis Yes Mark check box for MSA

127 Principal Components Analysis: Anticipated Number of Factors
Today, this step provides information to the analyst about the potential solution. When factor analysis was calculated by hand, this step determined how one would do the calculations. Correct umber factors supported by eigenvalues > 1.0 and the number of components needed to explain 60% of the variance? Don’t mark check box for number of factors No Yes Mark correct check box for number of factors

128 Principal Components Analysis: Excluding Variables for Low Communality
Communality for all variables ≥ 0.50? Do not mark check box for communality removal Yes No Mark check box for communality removal Remove variable load that is only one loading on component. One variable remaining in analysis? No Yes Run PCA again. Stop, no viable factor solution

129 Principal Components Analysis: Excluding Variables for Complex Structure
Simple structure (all variables load on single component)? Do not mark check box for complex structure removal Yes No Mark check box for complex structure removal Remove variable load that is only one loading on component. One variable remaining in analysis? No Yes Run PCA again. Stop, no viable factor solution

130 Principal Components Analysis: Excluding Variables for One-variable Components
All components have more than one variable loading? Do not mark check box for one-variable component Yes No Mark check box for one-variable component Remove variable load that is only one loading on component. One variable remaining in analysis? No Yes Run PCA again. Stop, no viable factor solution

131 Principal Components Analysis: Factor Structure
Repeat this step for each component Correct number of components extracted? Do not mark check box for number of component No Yes Mark check box for number of components Correct list of variables loaded on component? Do not mark check box for loadings on component No Yes Mark check box for loadings on component

132 Principal Components Analysis: Percent of Variance Explained
Components explain 60% or more of variance of included variables? Do not mark check box for percent of variance No Include as limitation in discussion of findings Yes Mark check box for percent of variance

133 Principal Components Analysis: Impact of Outliers - 1
Starting here, we include only the variables in the factor solution. Re-run factor analysis, requesting regression factor scores Yes Are any of the factor scores outliers (larger than ±3.0)? No No outliers, mark check box for no impact Go to validation analysis Yes Re-run factor analysis, excluding outliers

134 Principal Components Analysis: Impact of Outliers - 2
Are all of the communalities excluding outliers greater than 0.50? Do not mark check box for no impact No Stop, clarify which analysis should be reported Yes Pattern of factor loadings excluding outliers match pattern for full data set? Do not mark check box for no impact No Stop, clarify which analysis should be reported Yes Mark check box for no impact Re-run factor analysis, including all cases Since outliers had no effect, there is no reason to exclude them from the analysis

135 Principal Components Analysis: Validation Analysis - 1
Compute split variable using specified random number seed Run factor analysis, selecting cases where split = 0 Run factor analysis, selecting cases where split = 1 Are all of the communalities for both split samples greater than 0.50? Do not mark check box for validation analysis No Stop, generalizability of findings is questionable Yes

136 Principal Components Analysis: Impact of Outliers - 2
Pattern of factor loadings for split samples matches factor loadings for full data set? Do not mark check box for validation analysis No Stop, generalizability of findings is questionable Yes Mark check box for generalizability

137 Principal Components Analysis: Reliability Analysis
Compute Chronbach’s alpha for all components Chronbach’s alpha greater than .60 for all components? Do not mark check box for summated scales No Yes Chronbach’s alpha greater than .70 for all components? Mark check box for summated scales No Add note of caution to interpretation Yes Mark check box for summated scales


Download ppt "Principal Components Analysis Complete Problems"

Similar presentations


Ads by Google