Principal Components Analysis Complete Problems


Complete Principal Components Analysis We add three steps to the end of the principal components analysis that tested basic relationships: (1) analysis of the impact of outliers, (2) split-sample validation analysis, and (3) computation of Cronbach's alpha to assess the feasibility of using components as summated scales.

Outliers Outliers can change the factor structure found by a principal components analysis, creating the dilemma of determining which factor structure should be reported. SPSS calculates factor scores as standard scores. SPSS suggests that one way to identify outliers is to compute the factor scores and treat cases with a value greater than ±3.0 as outliers. If we find outliers, we redo the analysis, omitting the cases that were outliers. If the analysis excluding outliers still satisfies the requirement for communalities and the factor structure is the same as in the analysis with all cases, the outliers do not have an impact. If our factor solution changes, we will have to study the outlier cases to determine whether or not we should exclude them. After testing outliers, restore the full data set before any further calculations.

Split Sample Validation To test the generalizability of findings from a principal component analysis, we could conduct a second research study to see if our findings are verified. A less costly alternative is to split the sample randomly into two halves, do the principal component analysis on each half, and compare the results. If the communalities and the factor loadings are the same in the analyses of each half and of the full data set, we have evidence that the findings are generalizable and valid because, in effect, the two analyses represent a study and a replication.
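A minimal Python sketch of a seeded random split into two halves follows. It assumes cases are indexed 0 to n-1 and that equal-sized halves are wanted. The seed shown is hypothetical, because the problem supplies its own. NumPy's generator will not reproduce the halves that SPSS draws from the same seed number.

```python
import numpy as np


def split_half(n_cases, seed):
    """Randomly assign case indices 0..n_cases-1 to two halves of (nearly) equal size."""
    rng = np.random.default_rng(seed)      # seeded so the split can be repeated
    order = rng.permutation(n_cases)       # shuffled case indices
    half = n_cases // 2
    return np.sort(order[:half]), np.sort(order[half:])


# Hypothetical seed; 509 is the number of valid cases reported in this problem.
first_half, second_half = split_half(509, seed=12345)
print(len(first_half), len(second_half))   # 254 255
```

Each half would then be analyzed separately, and its communalities and loadings compared with those from the full data set.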

Misleading Results to Watch Out For When we examine the communalities and factor loadings, we are matching up overall patterns, not exact results: the communalities should all be greater than 0.50, and the pattern of the factor loadings should be the same. Sometimes the variables will switch components (variables loading on the first component now load on the second, and vice versa), but this does not invalidate our findings. Sometimes all of the signs of the factor loadings will reverse (pluses become minuses and minuses become pluses), but this does not invalidate our findings either, because we interpret the size, not the sign, of the loadings.
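Matching patterns rather than exact values can be made more systematic. The Python sketch below does this with a technique the slides do not use: Tucker's congruence coefficient. It computes the coefficient for every pairing of components, so neither component order nor sign affects the comparison. The loading values are made up for illustration.

```python
import itertools

import numpy as np


def congruence(a, b):
    """Tucker's congruence coefficient between two columns of loadings."""
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def match_components(loadings_a, loadings_b):
    """Pair the components of two loading matrices, ignoring order and sign.

    Returns (perm, signs, phis): component j of loadings_a matches component
    perm[j] of loadings_b, signs[j] is -1 if that component flipped sign, and
    phis[j] is the absolute congruence of the matched pair.
    """
    k = loadings_a.shape[1]
    best = None
    for perm in itertools.permutations(range(k)):
        phis = [congruence(loadings_a[:, j], loadings_b[:, perm[j]]) for j in range(k)]
        total = sum(abs(p) for p in phis)
        if best is None or total > best[0]:
            best = (total, perm, phis)
    _, perm, phis = best
    signs = [1 if p >= 0 else -1 for p in phis]
    return perm, signs, [abs(p) for p in phis]


# Illustrative loadings: the "half" solution has swapped components and reversed signs.
full = np.array([[0.90, 0.10], [0.91, 0.05], [0.12, 0.82], [0.08, 0.83]])
half = -full[:, ::-1]
perm, signs, phis = match_components(full, half)
print(perm, signs, np.round(phis, 3))
```

Congruence values close to 1 for every matched pair indicate the same pattern, even though the components switched places and all signs reversed.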

When validation fails If the validation fails, we are warned that the solution found in the analysis of the full data set is not generalizable and should not be reported as valid findings. We do have some options when validation fails: If the problem is limited to one or two variables, we can remove those variables and redo the analysis. Randomly selected samples are not always representative. We might try some different random number seeds and see if our negative finding was a fluke. If we choose this option, we should do a large number of validations to establish a clear pattern, at least 5 to 10. Getting one or two validations to negate the failed validation and support our findings is not sufficient.

Reliability of Summated Scales One of the common uses of factor analysis is the formation of summated scales, where we sum or average the scores on all the variables loading on a component to create the score for the component. To verify that the variables for a component are measuring similar entities that are legitimate to add together, we compute Cronbach's alpha. If Cronbach's alpha is 0.70 or greater (0.60 or greater for exploratory research), we have support for the internal consistency of the items, justifying their use in a summated scale. Cronbach's alpha requires that all variables be coded in the same direction. If there are negative loadings on a component, those variables must be reverse coded to get the correct value for alpha.
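A small Python sketch of the standard formula, α = k/(k − 1) · (1 − Σσᵢ²/σ²_total), follows, together with a helper for reverse coding. It assumes complete cases and a known scale range; the 1 to 5 range and the item values are hypothetical. The function names are illustrative and are not SPSS procedures.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a cases-by-items array (complete cases only)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the summated scale
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)


def reverse_code(values, low, high):
    """Reverse an item scored from low to high (e.g. 1..5 becomes 5..1)."""
    return (low + high) - np.asarray(values, dtype=float)


# Hypothetical 1-5 items; the second one is worded in the opposite direction.
item_a = np.array([1, 2, 3, 4, 5, 4])
item_b = np.array([5, 4, 4, 2, 1, 2])
alpha = cronbach_alpha(np.column_stack([item_a, reverse_code(item_b, 1, 5)]))
raw_alpha = cronbach_alpha(np.column_stack([item_a, item_b]))
print(round(alpha, 3), round(raw_alpha, 3))
```

For these made-up data, leaving item_b un-reversed produces a negative alpha. That is the distortion the slide warns about.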

The Problem in BlackBoard The problem statement tells us: the data set and variables included in the analysis; the alpha for the statistical tests; and the seed number to use for the validation analysis.

Statement about Level of Measurement The first statement in the problem asks about level of measurement. Principal components analysis requires that all of the variables included in the analysis are metric.

Marking the Statement about Level of Measurement All of the variables included in the analysis are ordinal level. We will employ the common convention of treating ordinal variables as metric variables, but we should consider mentioning this as a limitation to the analysis. Since we treated all variables as metric, we mark the check box.

Statement about Sample Size We will use the minimum sample size requirement of 150 valid cases recommended by Tabachnick and Fidell (1996).

Run the Principal Components Analysis - 1 To answer the question about the sample size, we run the first principal components analysis. Select the Factor command from the Analyze > Data Reduction menu.

Run the Principal Components Analysis - 2 First, move the variables listed in the problem to the Variables list box. Next, click on the Descriptives button to request the statistics needed to evaluate the suitability of the data for factor analysis.

Run the Principal Components Analysis - 3 First, mark the check box for Univariate Statistics to get the number of valid cases for the analysis. Second, mark the check boxes for the statistics for the suitability of factor analysis: Coefficients of the correlation matrix, KMO and Bartlett’s test of sphericity, and Anti-image correlation matrix. Third, click on the Continue button to close the Factor Analysis: Descriptives dialog box.

Run the Principal Components Analysis - 4 Click on the Extraction button to tell SPSS what method it should use to extract the factors.

Run the Principal Components Analysis - 5 We will use the default method of Principal Components. The drop down list contains numerous other methods. We accept the other defaults for displaying the unrotated factor solution and extracting eigenvalues over 1. Click on the Continue button to close the dialog box.

Run the Principal Components Analysis - 6 Click on the Rotation button to tell SPSS what method it should use to rotate the factors to clarify the interpretation.

Run the Principal Components Analysis - 7 We mark the option button for the Varimax rotation, an orthogonal rotation that keeps the components uncorrelated with each other. Click on the Continue button to close the dialog box.
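The Python sketch below shows what a varimax rotation does. It implements the widely used SVD-based varimax iteration, with optional Kaiser row normalization; SPSS's default Varimax applies Kaiser normalization. This is not SPSS's own code, so convergence details may differ slightly. The unrotated loadings are made up.

```python
import numpy as np


def varimax(loadings, normalize=True, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation; returns (rotated_loadings, rotation_matrix)."""
    a = np.asarray(loadings, dtype=float)
    p, k = a.shape
    # Kaiser normalization: scale each row to unit length before rotating.
    h = np.sqrt((a ** 2).sum(axis=1)) if normalize else np.ones(p)
    a_n = a / h[:, None]
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        lam = a_n @ rotation
        # Matrix derived from the varimax criterion; its SVD gives the next orthogonal rotation.
        grad = a_n.T @ (lam ** 3 - lam @ np.diag((lam ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt
        previous, criterion = criterion, s.sum()
        if previous != 0 and criterion / previous < 1 + tol:
            break
    # Undo the row normalization so the loadings are on their original scale.
    return (a_n @ rotation) * h[:, None], rotation


unrotated = np.array([[0.75, -0.50], [0.76, -0.52], [0.60, 0.55], [0.62, 0.57]])
rotated, rot = varimax(unrotated)
print(np.round(rotated, 3))
```

The rotation matrix is orthogonal. As a result, each variable's communality (the row sum of squared loadings) is unchanged by the rotation; only its distribution across the components changes.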

Run the Principal Components Analysis - 8 Having specified the analysis, click on the OK button to produce the output.

Output for Sample Size Requirement The 509 cases available for this principal components analysis satisfy the minimum sample size requirement of 150 valid cases recommended by Tabachnick and Fidell (1996).

Marking the Statement about Sample Size Since we satisfied the minimum sample size requirement, we mark the statement. If we did not satisfy the sample size requirement, we should consider mentioning this fact as a limitation to the analysis. Factor analysis can be numerically unstable when the sample size is small.

The Statement about Suitability for Factor Analysis: Sufficient Correlations Principal components analysis requires that more than one of the correlations between the variables included in the analysis be greater than 0.30.

Sufficient Correlations in Correlation Matrix For this set of variables, there are 9 correlations in the matrix greater than 0.30.
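Counting the qualifying correlations is easy to automate, as in the Python sketch below. It assumes the correlation matrix is already available (for example from `np.corrcoef(data, rowvar=False)`) and that correlations are judged by absolute value. Each pair is counted once, from the upper triangle. The matrix values are illustrative.

```python
import numpy as np


def count_correlations_above(r, threshold=0.30):
    """Count distinct variable pairs whose |correlation| exceeds the threshold."""
    r = np.asarray(r, dtype=float)
    upper = np.triu_indices_from(r, k=1)   # each pair once, diagonal excluded
    return int(np.sum(np.abs(r[upper]) > threshold))


r = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, -0.4],
              [0.1, -0.4, 1.0]])
print(count_correlations_above(r))   # 2
```

With six variables there are 6 · 5 / 2 = 15 distinct pairs. The 9 correlations reported here therefore represent 9 of those 15 pairs.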

Marking the Statement about Sufficient Correlations Since there are 9 correlations greater than 0.30, we mark the statement.

The Statement about Suitability for Factor Analysis: Test of Sphericity Bartlett’s test of sphericity tests the null hypothesis that the correlation matrix is an identity matrix with 1’s, or perfect correlations, on the main diagonal, and 0’s for all of the remaining elements. If this is true, the variables are not correlated and the factor analysis will not work. Our goal in this test is to reject the null hypothesis, supporting the contention that there are sufficient correlations, or similarity of values, among the variables that several can be combined into a factor or component.

Bartlett’s Test of Sphericity Principal component analysis requires that the probability associated with Bartlett's Test of Sphericity (χ²(df=15, N = 509) = 854.15, p < .001) be less than or equal to the level of significance (0.05). The probability associated with the Bartlett Test satisfies this requirement.
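The statistic can be sketched from the usual formula χ² = −(n − 1 − (2p + 5)/6) ln|R|, with df = p(p − 1)/2. Here R is the correlation matrix, p the number of variables and n the number of cases. With p = 6 this gives df = 15, which matches the output above. The sketch uses NumPy only; a p-value could then be obtained with `scipy.stats.chi2.sf(stat, df)`.

```python
import numpy as np


def bartlett_sphericity(r, n):
    """Bartlett's test statistic and degrees of freedom for a correlation matrix r."""
    r = np.asarray(r, dtype=float)
    p = r.shape[0]
    _, log_det = np.linalg.slogdet(r)        # ln|R|, computed stably
    stat = -(n - 1 - (2 * p + 5) / 6) * log_det
    df = p * (p - 1) // 2
    return stat, df


# An identity matrix (no correlations) gives a statistic of zero.
stat, df = bartlett_sphericity(np.eye(6), n=509)
print(stat, df)
```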

Marking the Statement about Bartlett’s Test of Sphericity Since the probability associated with the Bartlett Test is sufficient to reject the null hypothesis, we mark the check box.

The Statement about Suitability for Factor Analysis: Sampling Adequacy Sampling adequacy predicts if data are likely to factor well, based on correlation and partial correlation. The Kaiser-Meyer-Olkin Measure of Sampling Adequacy (MSA) must be greater than 0.50 for each individual variable as well as the set of variables. Variables that do not have an MSA of .50 or greater are removed from the analysis one at a time, until all variables and the overall measure are above .50.

Measures of Sampling Adequacy for Individual Variables In the initial iteration for suitability of principal components analysis, the MSAs for all of the individual variables were greater than 0.50 ("information and knowledge are shared openly within this organization" [q76] - .70; "an effort is made to get the opinions of people throughout the organization" [q77] - .69; "our web site is easy to use and contains helpful information" [q83] - .76; "I have a good understanding of our mission, vision, and strategic plan" [q84] - .73; "I believe we communicate our mission effectively to the public" [q85] - .81; and "my organization encourages me to be involved in my community" [q86] - .84). Note: Not all MSAs are shown on this slide.

Kaiser-Meyer-Olkin Measure of Sampling Adequacy In addition, the overall MSA for the set of variables included in the analysis was 0.75, which exceeds the minimum requirement of 0.50 for overall MSA.
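Both the individual MSA values and the overall KMO compare ordinary correlations with partial correlations, which are derived from the inverse of the correlation matrix. A NumPy sketch follows. It assumes a nonsingular correlation matrix, and the test matrix is illustrative.

```python
import numpy as np


def kmo(r):
    """Per-variable MSA and overall Kaiser-Meyer-Olkin measure from a correlation matrix."""
    r = np.asarray(r, dtype=float)
    inv = np.linalg.inv(r)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)     # partial correlations (squared below, so sign is irrelevant)
    off = ~np.eye(r.shape[0], dtype=bool)       # ignore the diagonal
    r2 = np.where(off, r ** 2, 0.0)
    q2 = np.where(off, partial ** 2, 0.0)
    msa = r2.sum(axis=0) / (r2.sum(axis=0) + q2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + q2.sum())
    return msa, overall


# Three variables that all correlate 0.5: each partial correlation is 1/3.
r = np.full((3, 3), 0.5) + 0.5 * np.eye(3)
msa, overall = kmo(r)
print(np.round(msa, 3), round(overall, 3))
```

In this small example, every MSA equals 0.25 · 2 / (0.25 · 2 + 2/9) = 9/13 ≈ 0.692. Each value clears the 0.50 minimum.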

Marking the Statement about Measures of Sampling Adequacy Since the sampling adequacy measures met the criteria for both individual variables and overall, the check box is marked.

Statement about Initial Number of Factors Various tests are used to estimate the number of factors to be extracted; this was very important when factor analysis was calculated by hand. Two of the criteria are the latent root criterion, which is based on the number of eigenvalues greater than 1.0, and the cumulative proportion of variance criterion, which calculates the number of components needed to explain 60% or more of the total variance in the original set of variables. The problem offers two possible responses.

Initial Number of Factors: Eigenvalues Greater than One The latent root criterion for number of factors to extract would indicate that there were 2 components to be extracted for these variables, since there were 2 eigenvalues greater than 1.0 (2.84 and 1.05).

Initial Number of Factors: Percentage of Variance Explained In addition, the cumulative proportion of variance criterion can be met with 2 components, which satisfy the criterion of explaining 60% or more of the total variance in the original set of variables. A 2 component solution would explain an estimated 64.86% of the total variance.
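Both criteria can be read from the eigenvalues of the correlation matrix. The eigenvalues sum to the number of variables, so each eigenvalue divided by that number is the proportion of total variance its component explains. For the six variables here, (2.84 + 1.05)/6 ≈ 0.648, which agrees with the reported 64.86% up to rounding of the eigenvalues. A NumPy sketch with an illustrative matrix:

```python
import numpy as np


def initial_factor_count(r, variance_target=0.60):
    """Latent root and cumulative-variance criteria from a correlation matrix."""
    eigenvalues = np.sort(np.linalg.eigvalsh(r))[::-1]         # largest first
    latent_root = int(np.sum(eigenvalues > 1.0))
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()     # sum equals the number of variables
    by_variance = int(np.argmax(cumulative >= variance_target)) + 1
    return eigenvalues, latent_root, by_variance, cumulative


r = np.full((3, 3), 0.5) + 0.5 * np.eye(3)   # eigenvalues 2.0, 0.5, 0.5
eigenvalues, latent_root, by_variance, cumulative = initial_factor_count(r)
print(latent_root, by_variance)   # 1 1
```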

Marking the Statement about Initial Number of Factors Since the SPSS default is to extract the number of components indicated by the latent root criterion, our initial factor solution will be based on the extraction of 2 components. We mark the second statement in the pair. Note: the question is worded to indicate that both criteria suggest the same number of factors. Should they suggest a different number of factors, neither statement would be marked, but we would still continue with the factor analysis using the number of factors suggested by the latent root criterion.

Statement about First Iteration of Factor Extraction The problem suggests that the first iteration of the factor solution included a variable (my organization encourages me to be involved in my community [q86]) that should be excluded, either because it did not satisfy the requirement for communalities or because it violated simple structure.

Output for Communalities on First Iteration Examination of the first principal components model extracted by SPSS resulted in the removal of the variable "my organization encourages me to be involved in my community" [q86] from the analysis. "My organization encourages me to be involved in my community" [q86] was removed because its communality (.467) meant that the factor solution explained less than half of the variable's variance. This is below the minimum requirement that the factor solution explain at least 50% of the variance in the original variable.
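A communality is the sum of a variable's squared loadings across the retained components. The NumPy sketch below lists any variable whose communality falls below the .50 minimum. The loading values and the names v1 to v3 are made up for illustration; only the .50 rule comes from the slides.

```python
import numpy as np


def low_communalities(loadings, names, minimum=0.50):
    """Return {name: communality} for variables whose communality is below the minimum."""
    communalities = (np.asarray(loadings, dtype=float) ** 2).sum(axis=1)
    return {name: float(h2) for name, h2 in zip(names, communalities) if h2 < minimum}


loadings = np.array([[0.60, 0.30],    # 0.36 + 0.09 = 0.45 -> below 0.50
                     [0.80, 0.20],    # 0.64 + 0.04 = 0.68
                     [0.10, 0.85]])   # 0.01 + 0.7225 = 0.7325
print(low_communalities(loadings, ["v1", "v2", "v3"]))
```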

Marking the Statement about First Iteration of Factor Extraction My organization encourages me to be involved in my community [q86] was removed because it did not satisfy the requirement for communalities, i.e. the factors should explain at least 50% of the variance in the variable. Since we have already determined that the variable is to be removed, it was not necessary to check the factor loadings for simple structure. The first statement in the pair is marked.

Removing a Variable from the Factor Analysis - 1 To remove the variable, my organization encourages me to be involved in my community [q86], we select Factor Analysis from the Dialog Recall drop down menu.

Removing a Variable from the Factor Analysis - 2 To remove the variable, highlight the target variable in the Variables list box, and click on the arrow button pointing to the left.

Removing a Variable from the Factor Analysis - 3 Since all of the other specifications for the analysis remain the same, click on the OK button to produce the output for the second iteration.

Statement about Second Iteration of Factor Extraction The problem suggests that the second iteration of the factor solution included a variable (I believe we communicate our mission effectively to the public [q85]) that should be excluded, either because it did not satisfy the requirement for communalities or because it violated simple structure.

Output for Communalities on Second Iteration Examination of the second principal components model extracted by SPSS produced a table of Communalities in which all variables have the required minimum of .50.

Output for Factor Structure on Second Iteration Examination of the second principal components model extracted by SPSS resulted in the removal of the variable "I believe we communicate our mission effectively to the public" [q85] from the analysis. The variable "I believe we communicate our mission effectively to the public" [q85] had loadings of 0.40 or higher on component 1 (.526) and component 2 (.536). Multiple high loadings violate the requirement for simple structure, so this variable was removed from the analysis.
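The simple-structure check amounts to this: flag any variable whose absolute loading reaches 0.40 on more than one component. In the NumPy sketch below, the q85 loadings (.526 and .536) are those reported above. The other row is made up for contrast.

```python
import numpy as np


def cross_loading_variables(loadings, names, cutoff=0.40):
    """Names of variables that load at or above the cutoff on more than one component."""
    high = np.abs(np.asarray(loadings, dtype=float)) >= cutoff
    return [name for name, row in zip(names, high) if row.sum() > 1]


loadings = np.array([[0.90, 0.10],     # made-up row: loads on one component only
                     [0.526, 0.536]])  # q85 on the second iteration
print(cross_loading_variables(loadings, ["other", "q85"]))   # ['q85']
```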

Marking the Statement about Second Iteration of Factor Extraction I believe we communicate our mission effectively to the public [q85] was removed because it did not satisfy the requirement for simple structure, so the first statement in the pair is marked.

Removing a Variable from the Factor Analysis - 1 To remove the variable, I believe we communicate our mission effectively to the public [q85], we select Factor Analysis from the Dialog Recall drop down menu.

Removing a Variable from the Factor Analysis - 2 To remove the variable, highlight the target variable in the Variables list box, and click on the arrow button pointing to the left.

Removing a Variable from the Factor Analysis - 3 Since all of the other specifications for the analysis remain the same, click on the OK button to produce the output for the third iteration.

Statement about Third Iteration of Factor Extraction The problem indicates that no variables were removed on the third iteration of the factor extraction and that the solution met all of the requirements for a factor analysis solution: all the variables remaining in the analysis had communalities above 0.50, demonstrated simple structure, and each component had more than one variable loading on it.

Output for Communalities on Third Iteration - 1 Examination of the third principal components model extracted by SPSS produced a table of Communalities in which all four variables have the required minimum of .50.

Output for Factor Structure on Third Iteration - 2 Examination of the third principal components model extracted by SPSS did not show any variables having loadings of .40 or higher on both of the components.

Output for Factor Structure on Third Iteration - 3 Each of the components has two variables loading on it. If a component had only one variable loading on it, it would make more sense to use the original variable in subsequent analyses rather than the component.

Marking the Statement about Third Iteration of Factor Extraction On the third iteration, all of the requirements for a factor solution were satisfied. For the 4 variables not excluded from the analysis, two components can be substituted for the 4 variables. Since the final solution found two components, we mark the statement.

Statement about Variables Loading on the First Component Two options are given which suggest different combinations of variables loading on the first component.

Output for Component One Since more than one component was extracted, the factor structure is based on the "Rotated Component Matrix". Component 1 included the variables: "information and knowledge are shared openly within this organization" [q76] (loading = .901); and "an effort is made to get the opinions of people throughout the organization" [q77] (loading = .912). We can substitute one component variable for this combination of variables in further analyses.

Marking the Statement about Variables Loading on the First Component Component 1 included the variables: "information and knowledge are shared openly within this organization" [q76] and "an effort is made to get the opinions of people throughout the organization" [q77]. We mark the first statement in the pair.

Statement about Variables Loading on the Second Component Two options are given which suggest different combinations of variables loading on the second component.

Output for Component Two Since more than one component was extracted, the factor structure is based on the "Rotated Component Matrix". Component 2 included the variables: "our web site is easy to use and contains helpful information" [q83] (loading = .821); and "I have a good understanding of our mission, vision, and strategic plan" [q84] (loading = .833). We can substitute one component variable for this combination of variables in further analyses.

Marking the Statement about Variables Loading on the Second Component Component 2 included the variables: "our web site is easy to use and contains helpful information" [q83] and "I have a good understanding of our mission, vision, and strategic plan" [q84]. We mark the first statement in the pair.

Statement about Percentage of Variance Explained by Factors The final statement questions whether or not the factor solution met the standard of explaining 60% of the variance in the variables that were replaced.

Output for Percentage of Variance Explained by Factors The components explain 77.25% of the total variance in the variables which are included on the components. This percentage of variance explained satisfies the goal of explaining 60% or more of the total original variance in the variables.

Marking the Statement about Percentage of Variance Explained by Factors Since the percentage of variance explained by the factors satisfies the goal of explaining 60% or more of the total original variance in the variables the components will replace, we mark the final statement.

Statement about Outliers The next statement requires us to determine whether or not there are any outliers in the results of the principal components analysis. If outliers are found, they are removed from the analysis and the results computed again. If the factor solution is the same as that based on all cases, we conclude that outliers do not have any impact and we report the results based on all cases. If the solution without outliers is different, we face the difficult decision of which factor structure should be reported. In our problems, we will halt the analysis.

Detecting Outliers - 1 To detect outliers, we compute the factor scores in SPSS. Select the Factor Analysis command from the Dialog Recall tool button.

Detecting Outliers - 2 The only command we need to change is to request SPSS to compute the factor scores. Click on the Scores… button to access the factor scores dialog box.

Detecting Outliers - 3 First, click on the Save as variables checkbox to create factor variables. Second, accept the default method using a Regression equation to calculate the scores. Third, click on the Continue button to complete the specifications.

Detecting Outliers - 4 Click on the OK button to compute the factor scores.

Outliers in the Data Editor SPSS creates the factor score variables in the data editor window. It names the first factor score “FAC1_1” and the second factor score “FAC2_1.” We need to check whether either factor score has any values larger than 3.0 in absolute value (below -3.0 or above +3.0). One way to check for large values indicating outliers is to sort each factor score variable and see if any fall outside the acceptable range. Should you forget to delete the factor scores from a previous analysis, SPSS will alter the final digit in the factor name, i.e. instead of naming it FAC1_1, it will name it FAC1_2.
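The same screen can be sketched in Python instead of sorting each column by eye. The sketch assumes the saved factor scores have been exported as arrays with missing values stored as NaN; the export step and the sample values are hypothetical. It flags cases whose score is at or beyond ±3.0, the same cutoffs used when scanning the sorted columns.

```python
import numpy as np


def outlier_indices(scores, cutoff=3.0):
    """Indices of cases whose factor score is <= -cutoff or >= +cutoff; NaN is never flagged."""
    scores = np.asarray(scores, dtype=float)
    magnitude = np.abs(np.nan_to_num(scores, nan=0.0))   # missing scores treated as non-outliers
    return np.flatnonzero(magnitude >= cutoff)


fac2_1 = np.array([0.5, np.nan, -3.2, 1.1, 3.0, -2.99])   # hypothetical FAC2_1 values
print(outlier_indices(fac2_1))   # [2 4]
```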

Sort the data to locate outliers for factor one First, select the FAC1_1 column by clicking on its header. Second, right click on the column header and select the Sort Ascending command from the drop down menu.

Negative outliers for factor one Scroll down past the cases for whom factor scores could not be computed because of missing data. We see that none of the scores for factor one are less than or equal to -3.0, so there are no outliers detected yet.

Positive outliers for factor one Scrolling down to the bottom of the sorted data set, we see that none of the scores for factor one are greater than or equal to +3.0. There are no outliers on factor one.

Sort the data to locate outliers on factor two First, select the FAC2_1 column by clicking on its header. Second, right click on the column header and select the Sort Ascending command from the drop down menu.

Negative outliers for factor two Scrolling down past the cases for whom factor scores could not be computed, we see that there are five cases that have a factor score less than or equal to -3.0 on factor 2.

Positive outliers for factor two Scrolling down to the bottom of the sorted data set, we see that none of the scores for factor two are greater than or equal to +3.0. We will run the analysis excluding the five negative outliers, and see if it changes our interpretation of the analysis.

Removing the outliers To see whether or not outliers are having an impact on the factor solution, we will compute the factor analysis without the outliers and compare the results. To remove the outliers, we will include the cases that are not outliers. Choose the Select Cases… command from the Data menu.

Setting the If condition First, mark the option button for If condition is satisfied. Second, click on the If… button to enter the formula for selecting cases to include in the analysis.

Formula to select cases that are not outliers First, type the formula as shown. The formula says: include a case if the absolute values of both the first and the second factor scores are less than 3.0 (for example, ABS(FAC1_1) < 3 & ABS(FAC2_1) < 3). Second, click on the Continue button to complete the specification.

Complete the select cases command SPSS writes the formula we entered next to the IF button. Having entered the formula for including cases, click on the OK button to complete the selection.

The outliers selected out of the analysis When SPSS selects a case out of the data analysis, it draws a slash through the case number. The cases that we identified as outliers will be excluded. The cases with missing data are also excluded because they do not satisfy the criteria in the formula.
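The Select Cases condition has a direct NumPy equivalent, sketched below. It assumes the two factor-score columns are arrays with NaN for missing values. Any comparison with NaN is false, so cases without factor scores drop out, just as they do in SPSS. The sample values are hypothetical.

```python
import numpy as np


def keep_non_outliers(fac1, fac2, cutoff=3.0):
    """Boolean mask: True where both factor scores are present and |score| < cutoff."""
    fac1 = np.asarray(fac1, dtype=float)
    fac2 = np.asarray(fac2, dtype=float)
    return (np.abs(fac1) < cutoff) & (np.abs(fac2) < cutoff)


fac1_1 = np.array([0.2, np.nan, 1.0, -0.5])
fac2_1 = np.array([-3.1, np.nan, 0.4, 2.9])
print(keep_non_outliers(fac1_1, fac2_1))   # [False False  True  True]
```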

Repeating the factor analysis To repeat the factor analysis without the outliers, select the Factor Analysis command from the Dialog Recall tool button.

Stopping SPSS from computing factor scores again On the last factor analysis, we included the specification to compute factor scores. Since we do not need to do this again, we will remove the specification. Click on the Scores… button to access the factor scores dialog.

Clearing the command to save factor scores First, clear the Save as variables checkbox. This will deactivate the Method options. Second, click on the Continue button to complete the specification.

Computing the factor analysis To produce the output for the factor analysis excluding outliers, click on the OK button.

Comparing communalities All of the communalities for the factor analysis including all cases satisfy the minimum requirement of being larger than 0.50, and so do all of the communalities for the factor analysis excluding outliers. Though the communality for each variable is slightly smaller when we exclude outliers, we would not alter our interpretation of the role of these four variables in the solution.
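
A communality is the share of a variable's variance reproduced by the retained components. In a principal component solution, it equals the sum of the variable's squared loadings on those components. Here is a small Python sketch of the check against the 0.50 minimum. The loading values are hypothetical placeholders, not the SPSS output compared on this slide.

```python
from typing import Dict, List

MIN_COMMUNALITY = 0.50

def communalities(loadings: Dict[str, List[float]]) -> Dict[str, float]:
    """Communality of each variable = sum of its squared loadings on the retained components."""
    return {var: sum(l * l for l in row) for var, row in loadings.items()}

def all_adequate(loadings: Dict[str, List[float]]) -> bool:
    """True when every communality is larger than the 0.50 minimum."""
    return all(h2 > MIN_COMMUNALITY for h2 in communalities(loadings).values())

# Hypothetical two-component loadings (illustrative only)
example = {"q76": [0.90, 0.10], "q77": [0.85, 0.20], "q83": [0.15, 0.80], "q84": [0.30, 0.70]}
print(communalities(example)["q76"])  # approximately 0.82
print(all_adequate(example))          # True
```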

Comparing factor loadings The factor loadings for the factor analysis including all cases are shown on the left. The factor loadings for the factor analysis excluding outliers are shown on the right. The pattern of variable loadings on components did not change when the outliers were removed. Component 1 included the variables "information and knowledge are shared openly within this organization" [q76] and "an effort is made to get the opinions of people throughout the organization" [q77]. Component 2 included the variables "our web site is easy to use and contains helpful information" [q83] and "I have a good understanding of our mission, vision, and strategic plan" [q84].
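
Comparing the pattern of loadings amounts to asking whether each variable lands on the same component in both solutions. The sketch below shows one way to do this in Python. It relies on a simplifying assumption: a variable is treated as loading on the component where its absolute loading is largest. The slides instead judge the pattern by reading the component matrix. All loadings are hypothetical.

```python
from typing import Dict, List

Loadings = Dict[str, List[float]]

def assignment(loadings: Loadings) -> Dict[str, int]:
    """Map each variable to the (1-based) component with its largest absolute loading."""
    return {
        var: max(range(len(row)), key=lambda k: abs(row[k])) + 1
        for var, row in loadings.items()
    }

def same_pattern(a: Loadings, b: Loadings) -> bool:
    """True when both solutions put every variable on the same component."""
    return assignment(a) == assignment(b)

all_cases = {"q76": [0.90, 0.10], "q77": [0.85, 0.20], "q83": [0.15, 0.80], "q84": [0.30, 0.70]}
no_outliers = {"q76": [0.88, 0.12], "q77": [0.83, 0.25], "q83": [0.10, 0.78], "q84": [0.35, 0.66]}
print(assignment(no_outliers))               # {'q76': 1, 'q77': 1, 'q83': 2, 'q84': 2}
print(same_pattern(all_cases, no_outliers))  # True
```

One caveat: two analyses can extract the same components in a different order. A literal comparison like this one would then report a mismatch, so the components may need to be matched up before comparing.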

Marking the Statement about Outliers The presence of outliers did not alter the factor solution. The factor solution based on all cases should be used in further analyses. We mark the check box for no impact due to outliers. Had the factor solution changed, we would have halted the analysis until we could understand the problem further.

Statement about Generalizability Since factor analysis tends to over-fit the data used to develop the model at the expense of generalizability, we will test generalizability with a split sample validation strategy. In this strategy, we divide the sample in half, conduct the factor analysis on each half, and compare the results to the analysis on the full data set.

Deleting the Factor Scores Before we do the split-sample validation, we will delete the factor scores that we used to detect outliers. First, highlight the columns containing the factor scores. Second, select the Clear command from the Edit menu.

Restoring All Cases to the Analysis - 1 We removed cases that were detected as outliers. Before doing our validation, we need to restore these cases to subsequent analyses. Select the Select Cases command from the Data menu.

Restoring All Cases to the Analysis - 2 First, click on the All cases option button. Second, click on the OK button to restore the cases.

All Cases Restored to the Data Set The slash lines are removed from the case numbers, indicating that all cases are available to the analysis.

Split-sample validation We validate our analysis by conducting an analysis on each half of the sample and comparing the results of these two split-sample analyses with the analysis of the full data set. To split the sample into two halves, we generate a random variable that indicates which half of the sample each case should be placed in. To compute a random selection of cases, we need to specify the starting value, or random number seed; otherwise, the random sequence of numbers that you generate will not match mine, and we will get different results. Before doing the random selection, you must also make certain that your data set is sorted in its original order, or the cases in your two half samples will not match mine. To ensure this, sort your data set in ascending order by case id.

Sorting the data set in ascending order To make certain the data set is sorted in the original order, highlight the case id column, right click on the column header, and select the Sort Ascending command from the popup menu.

Setting the random number seed - 1 To set the random number seed, select the Random Number Generators… command from the Transform menu. NOTE: you must use the random number seed that is stated in the problem in order to produce the same results that I found. Any other seed will generate a different random sequence that can produce results that are very different from mine.

Setting the random number seed - 2 First, mark the check box for Set Starting Point. Second, select the option button for a Fixed Value. Third, type the seed number provided in the problem directions: 291769. Fourth, click on the OK button to complete the action. NOTE: SPSS does not provide any feedback that the seed has been set or changed. If you are in doubt, you can reopen the dialog box and see what it indicates.

Computing the split variable - 1 To enter the formula for the variable that will split the sample in two parts, select the Compute… command from the Transform menu.

Computing the split variable - 2 First, type the name for the new variable, split, into the Target Variable text box. Second, the formula for the value of split is shown in the text box. The uniform(1) function generates a random decimal number between 0 and 1. The random number is compared to the value 0.50. If the random number is less than or equal to 0.50, the value of the formula will be 1, the SPSS numeric equivalent to true. If the random number is larger than 0.50, the formula will return a 0, the SPSS numeric equivalent to false. Third, click on the OK button to complete the dialog box.
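
The Compute formula gives each case a coin flip: 1 if a uniform draw is at most 0.50, otherwise 0. A Python sketch of the same logic follows. Note that Python's `random` module does not generate or seed numbers the way SPSS does. Seeding it with 291769 will therefore NOT reproduce the SPSS split for this problem. It only illustrates the technique.

```python
import random
from typing import List

def make_split(n_cases: int, seed: int = 291769) -> List[int]:
    """One value per case: 1 (true) when a uniform [0, 1) draw is <= 0.50, else 0 (false)."""
    rng = random.Random(seed)  # fixed starting point, so reruns give the same split
    return [1 if rng.random() <= 0.50 else 0 for _ in range(n_cases)]

split = make_split(10)
print(split)                    # a reproducible pattern of 0s and 1s
print(split == make_split(10))  # True: same seed, same sequence
```

The i-th draw is assigned to the i-th case, so the split is only reproducible if the cases are in the same order each time. That is why the data set is sorted by case id before the variable is computed.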

The split variable in the data editor In the data editor, the split variable shows a random pattern of zeros and ones. To select half of the sample for each validation analysis, we will first select the cases where split = 0, then select the cases where split = 1.

Repeating the analysis with the first validation sample To repeat the principal component analysis for the first validation sample, select Factor Analysis from the Dialog Recall tool button.

Using "split" as the selection variable First, scroll down the list of variables and highlight the variable split. Second, click on the right arrow button to move the split variable to the Selection Variable text box.

Setting the value of split to select cases When the variable named split is moved to the Selection Variable text box, SPSS adds "=?" after the name to prompt us to enter a specific value for split. Click on the Value… button to enter a value for split.

Completing the value selection First, type the value for the first half of the sample, 0, into the Value for Selection Variable text box. Second, click on the Continue button to complete the value entry.

Requesting output for the first validation sample When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 0 for the split variable. Click on the OK button to request the output. Since the validation analysis requires us to compare the results of the analyses using the two split samples, we will request the output for the second sample before doing any comparison.

Repeating the analysis with the second validation sample To repeat the principal component analysis for the second validation sample, select Factor Analysis from the Dialog Recall tool button.

Setting the value of split to select cases Since the split variable is already in the Selection Variable text box, we only need to change its value. Click on the Value… button to enter a different value for split.

Completing the value selection First, type the value for the second half of the sample, 1, into the Value for Selection Variable text box. Second, click on the Continue button to complete the value entry.

Requesting output for the second validation sample When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 1 for the split variable. Click on the OK button to request the output.

Comparing communalities All of the communalities for the first split sample satisfy the minimum requirement of being larger than 0.50. All of the communalities for the second split sample satisfy the minimum requirement of being larger than 0.50. Note how SPSS identifies for us which cases we selected for the analysis.

Comparing factor loadings The pattern of factor loadings for both split samples shows the variables "information and knowledge are shared openly within this organization" [q76] and "an effort is made to get the opinions of people throughout the organization" [q77] loading on component one, and "our web site is easy to use and contains helpful information" [q83] and "I have a good understanding of our mission, vision, and strategic plan" [q84] loading on the second component.

Marking the Statement about Generalizability All of the communalities in both validation samples met the criterion. The pattern of loadings for both validation samples is the same, and the same as the pattern for the analysis using the full sample. In effect, we have done the same analysis on two separate sub-samples of cases and obtained the same results. This validation analysis supports a finding that the results of this principal component analysis are generalizable to the population represented by this data set. We mark the check box.
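
This decision combines two checks, applied to each half-sample. A compact Python sketch of the logic appears below. It assumes each solution is summarized as a mapping from variable to its loadings on the retained components. It also simplifies by treating a variable as loading on the component with its largest absolute loading. All numbers are hypothetical.

```python
from typing import Dict, List

Loadings = Dict[str, List[float]]

def pattern(solution: Loadings) -> Dict[str, int]:
    """Variable -> index of the component with its largest absolute loading."""
    return {v: max(range(len(row)), key=lambda k: abs(row[k])) for v, row in solution.items()}

def supports_generalizability(full: Loadings, halves: List[Loadings], min_h2: float = 0.50) -> bool:
    """True if every half-sample has all communalities > min_h2 and the full-sample pattern."""
    for half in halves:
        if any(sum(l * l for l in row) <= min_h2 for row in half.values()):
            return False  # a communality failed: generalizability is questionable
        if pattern(half) != pattern(full):
            return False  # some variable moved to a different component
    return True

full = {"q76": [0.90, 0.10], "q77": [0.85, 0.20], "q83": [0.15, 0.80], "q84": [0.30, 0.70]}
split0 = {"q76": [0.91, 0.05], "q77": [0.84, 0.22], "q83": [0.12, 0.82], "q84": [0.28, 0.72]}
split1 = {"q76": [0.89, 0.15], "q77": [0.86, 0.18], "q83": [0.20, 0.77], "q84": [0.33, 0.68]}
print(supports_generalizability(full, [split0, split1]))  # True
```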

Statement about Summated Scales The next statement indicates that we can form a summated scale from the variables loading on a component, i.e. by summing or averaging the scores for the variables. The utility of summated scales is measured by Cronbach's alpha, which should minimally be greater than 0.60, and preferably greater than 0.70.
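
For k items, Cronbach's alpha is k / (k - 1) * (1 - sum of the item variances / variance of the summed scale). A minimal Python sketch of that formula follows. It assumes complete data with no missing values and uses sample variances. The item scores are invented for illustration.

```python
from statistics import variance
from typing import List

def cronbach_alpha(items: List[List[float]]) -> float:
    """items[i][j] is the score of case j on item i; complete data assumed."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(case) for case in zip(*items)]      # summated scale score per case
    item_var = sum(variance(item) for item in items)  # sum of the item variances
    return k / (k - 1) * (1 - item_var / variance(totals))

# Two hypothetical items scored by five cases
print(round(cronbach_alpha([[2, 4, 3, 5, 4], [3, 5, 3, 4, 5]]), 3))  # 0.789
```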

Computing Cronbach's Alpha To compute Cronbach's alpha for each component in our analysis, we select Scale > Reliability Analysis… from the Analyze menu.

Selecting the variables for the first component First, move the two variables that loaded on the first component (q76 and q77) to the Items list box. Second, click on the Statistics… button to select the statistics we will need.

Selecting the statistics for the output First, mark the checkboxes for Item, Scale, and Scale if item deleted. Second, click on the Continue button.

Completing the specifications First, if Alpha is not selected as the Model in the drop down menu, select it now. Second, click on the OK button to produce the output.

Cronbach's Alpha The reliability for component 1 as measured by Cronbach's alpha is 0.814, which is greater than the generally agreed upon lower limit of 0.70. The variables included on this component ("information and knowledge are shared openly within this organization" and "an effort is made to get the opinions of people throughout the organization") can be used in a summated scale.

Computing Cronbach's Alpha To compute Cronbach's alpha for the second scale, we select Reliability Analysis from the Dialog Recall tool button.

Selecting the variables for the second component First, remove the variables that loaded on the first component and move the two variables that loaded on the second component (q83 and q84) to the Items list box. Second, since we want the same output we had for the first component, we only need to click on the OK button to produce the output.

Cronbach's Alpha The reliability for component 2 as measured by Cronbach's alpha is 0.561, which is less than the generally agreed upon lower limit of 0.70, and even less than the 0.60 lower limit for exploratory research. A summated scale based on these variables ("our web site is easy to use and contains helpful information" and "I have a good understanding of our mission, vision, and strategic plan") should not be used.

Cronbach's Alpha if Item Deleted - 1 If alpha is too small, the Cronbach's Alpha if Item Deleted column may suggest which variable should be removed to improve the internal consistency of the scale. It tells us what alpha would be if the listed variable were removed from the scale. In this example, the column does not produce a result because there are only two items, and removing one would leave a one-item scale, which is not useful.

Cronbach's Alpha if Item Deleted - 2 Though not part of this problem, this output demonstrates how deleting an item can increase alpha. If the last item in this table were deleted, alpha would increase to .820, compared with .686 with this item included.
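
The Alpha if Item Deleted column is alpha recomputed with each item left out in turn. The Python sketch below reproduces that idea on invented data, where the third hypothetical item does not belong with the other two. The guard mirrors the point above: with only two items, dropping one leaves a one-item scale, which has no alpha.

```python
from statistics import variance
from typing import List

def cronbach_alpha(items: List[List[float]]) -> float:
    """Cronbach's alpha for items[i][j] = score of case j on item i (complete data)."""
    k = len(items)
    totals = [sum(case) for case in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))

def alpha_if_item_deleted(items: List[List[float]]) -> List[float]:
    """Alpha with each item removed in turn; needs at least three items."""
    if len(items) < 3:
        raise ValueError("need at least three items so two remain after a deletion")
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

items = [[2, 4, 3, 5, 4], [3, 5, 3, 4, 5], [5, 1, 4, 2, 3]]  # hypothetical scores
deleted = alpha_if_item_deleted(items)
print(round(deleted[2], 3))  # 0.789: alpha of the scale with the third item removed
```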

Marking the Statement about Summated Scales Since the variables loading on the second component did not satisfy the reliability criterion, we leave the check box un-marked.

Principal Components Analysis: Level of Measurement Is the level of measurement ok (all variables metric)? If no, do not mark the check box for level of measurement, mark "Inappropriate application of the statistic," and stop. If yes, mark the check box for level of measurement. Is an ordinal level variable treated as metric? If yes, consider this a limitation in the discussion of findings.

Principal Components Analysis: Sample Size Run the principal components analysis. Is the sample size adequate (at least 150 valid cases)? If no, do not mark the check box for sample size, and consider this a limitation in the discussion of findings. If yes, mark the check box for sample size.

Principal Components Analysis: Suitability for Factor Analysis - 1 Are two or more correlations ≥ 0.30? If no, do not mark the check box for correlations and stop: the variables are not good candidates for factor analysis. If yes, mark the check box for correlations. Is the probability for Bartlett's test of sphericity ≤ alpha? If no, do not mark the check box for the sphericity test and stop: the variables are not good candidates for factor analysis. If yes, mark the check box for the sphericity test.
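
Both suitability checks can be computed from the correlation matrix. Bartlett's statistic is -((n - 1) - (2p + 5) / 6) * ln|R|, with p(p - 1) / 2 degrees of freedom, for n cases and p variables. Here is a NumPy sketch, assuming a complete cases-by-variables array with invented data. The probability that SPSS compares with alpha is the upper tail of the chi-square distribution, which could be obtained with a function such as `scipy.stats.chi2.sf(chi2, df)`. That call is left out of this sketch.

```python
import numpy as np
from typing import Tuple

def count_correlations(r: np.ndarray, threshold: float = 0.30) -> int:
    """Number of distinct variable pairs with |r| >= threshold (upper triangle only)."""
    upper = r[np.triu_indices_from(r, k=1)]
    return int(np.sum(np.abs(upper) >= threshold))

def bartlett_sphericity(data: np.ndarray) -> Tuple[float, int]:
    """Bartlett's chi-square statistic and degrees of freedom for an n x p data array."""
    n, p = data.shape
    r = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(r)  # ln|R|, computed stably
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    return float(chi2), p * (p - 1) // 2

rng = np.random.default_rng(0)
latent = rng.normal(size=200)
data = np.column_stack([latent + rng.normal(scale=0.5, size=200) for _ in range(4)])
print(count_correlations(np.corrcoef(data, rowvar=False)))  # pairs with |r| >= 0.30
print(bartlett_sphericity(data))                            # (statistic, 6)
```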

Principal Components Analysis: Suitability for Factor Analysis - 2 Is the sampling adequacy (MSA) ≥ 0.50 for each variable? If no, remove the variable with the lowest MSA. If only one variable then remains in the analysis, do not mark the check box for MSA and stop, because the variables are not good candidates for factor analysis; otherwise, run the PCA again and repeat this check. If yes, is the KMO measure of sampling adequacy ≥ 0.50? If no, do not mark the check box for MSA and stop: the variables are not good candidates for factor analysis. If yes, mark the check box for MSA.
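
The per-variable MSA and the overall KMO compare squared ordinary correlations with squared partial correlations. Each partial correlation is taken from the inverse of the correlation matrix. The NumPy sketch below assumes complete data, and the four items are generated for illustration. Its last line identifies the variable the flowchart would remove first.

```python
import numpy as np
from typing import Tuple

def kmo(data: np.ndarray) -> Tuple[float, np.ndarray]:
    """Overall KMO and per-variable MSA from an n x p array of complete cases."""
    r = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(r)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)            # partial correlations, other variables held fixed
    off = ~np.eye(r.shape[0], dtype=bool)      # use off-diagonal cells only
    r2 = np.where(off, r ** 2, 0.0)
    p2 = np.where(off, partial ** 2, 0.0)
    msa = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + p2.sum())
    return float(overall), msa

rng = np.random.default_rng(0)
latent = rng.normal(size=200)
# Four hypothetical items driven by one common factor plus noise
data = np.column_stack([latent + rng.normal(scale=0.5, size=200) for _ in range(4)])
overall, msa = kmo(data)
print(round(overall, 3), np.round(msa, 3))
print("variable with lowest MSA:", int(np.argmin(msa)))
```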

Principal Components Analysis: Anticipated Number of Factors Today, this step provides information to the analyst about the potential solution; when factor analysis was calculated by hand, this step determined how one would do the calculations. Is the correct number of factors supported by eigenvalues > 1.0 and by the number of components needed to explain 60% of the variance? If no, don't mark the check box for number of factors. If yes, mark the correct check box for number of factors.
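
Both criteria come from the eigenvalues of the correlation matrix. The first counts eigenvalues above 1.0. The second finds the fewest components whose cumulative share of the variance reaches 60%, where the total variance is the sum of the eigenvalues, which equals the number of variables. Here is a NumPy sketch on a hypothetical correlation matrix with two clearly separated pairs of variables.

```python
import numpy as np
from typing import Tuple

def anticipated_factors(r: np.ndarray, target: float = 0.60) -> Tuple[int, int]:
    """Return (# eigenvalues > 1.0, # components needed to reach `target` cumulative variance)."""
    eigenvalues = np.sort(np.linalg.eigvalsh(r))[::-1]       # largest first
    kaiser = int(np.sum(eigenvalues > 1.0))
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()  # cumulative proportion of variance
    needed = int(np.argmax(cumulative >= target)) + 1        # first component reaching the target
    return kaiser, needed

r = np.array([
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
])
print(anticipated_factors(r))  # (2, 2): eigenvalues 1.6, 1.6, 0.4, 0.4
```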

Principal Components Analysis: Excluding Variables for Low Communality Is the communality for all variables ≥ 0.50? If yes, do not mark the check box for communality removal. If no, mark the check box for communality removal and remove the variable with the lowest communality. Is only one variable remaining in the analysis? If yes, stop: there is no viable factor solution. If no, run the PCA again.
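
The removal loop can be sketched on a correlation matrix with NumPy. The sketch assumes that each run retains the components with eigenvalues above 1.0. Communalities do not change under an orthogonal rotation, so unrotated loadings are enough. The three-variable matrix and the names `a`, `b`, `c` are hypothetical.

```python
import numpy as np
from typing import List, Optional, Tuple

def pca_communalities(r: np.ndarray) -> np.ndarray:
    """Communalities from a principal component solution keeping eigenvalues > 1.0."""
    values, vectors = np.linalg.eigh(r)                  # eigen-decomposition of R
    keep = values > 1.0                                  # retained components
    loadings = vectors[:, keep] * np.sqrt(values[keep])  # component loadings
    return (loadings ** 2).sum(axis=1)                   # row sums of squared loadings

def drop_low_communality(
    r: np.ndarray, names: List[str], minimum: float = 0.50
) -> Optional[Tuple[List[str], np.ndarray]]:
    """Remove the lowest-communality variable and re-run until all communalities >= minimum."""
    names = list(names)
    while True:
        h2 = pca_communalities(r)
        if h2.min() >= minimum:
            return names, h2                  # every communality meets the minimum
        worst = int(np.argmin(h2))            # variable with the lowest communality
        names.pop(worst)
        r = np.delete(np.delete(r, worst, axis=0), worst, axis=1)
        if len(names) == 1:
            return None                       # stop: no viable factor solution

r = np.array([[1.0, 0.7, 0.1], [0.7, 1.0, 0.1], [0.1, 0.1, 1.0]])
print(drop_low_communality(r, ["a", "b", "c"]))  # 'c' is removed; 'a' and 'b' remain
```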

Principal Components Analysis: Excluding Variables for Complex Structure Is there simple structure (every variable loads on a single component)? If yes, do not mark the check box for complex structure removal. If no, mark the check box for complex structure removal and remove the variable with the complex structure. Is only one variable remaining in the analysis? If yes, stop: there is no viable factor solution. If no, run the PCA again.

Principal Components Analysis: Excluding Variables for One-variable Components Do all components have more than one variable loading on them? If yes, do not mark the check box for one-variable component. If no, mark the check box for one-variable component and remove the variable that is the only one loading on its component. Is only one variable remaining in the analysis? If yes, stop: there is no viable factor solution. If no, run the PCA again.

Principal Components Analysis: Factor Structure Repeat this step for each component. Was the correct number of components extracted? If no, do not mark the check box for number of components. If yes, mark the check box for number of components. Is the correct list of variables loaded on the component? If no, do not mark the check box for loadings on the component. If yes, mark the check box for loadings on the component.

Principal Components Analysis: Percent of Variance Explained Do the components explain 60% or more of the variance of the included variables? If no, do not mark the check box for percent of variance, and include this as a limitation in the discussion of findings. If yes, mark the check box for percent of variance.

Principal Components Analysis: Impact of Outliers - 1 Starting here, we include only the variables in the factor solution. Re-run the factor analysis, requesting regression factor scores. Are any of the factor scores outliers (3.0 or more in absolute value)? If no, there are no outliers: mark the check box for no impact and go to the validation analysis. If yes, re-run the factor analysis excluding the outliers.

Principal Components Analysis: Impact of Outliers - 2 Are all of the communalities excluding outliers greater than 0.50? If no, do not mark the check box for no impact; stop and clarify which analysis should be reported. If yes, does the pattern of factor loadings excluding outliers match the pattern for the full data set? If no, do not mark the check box for no impact; stop and clarify which analysis should be reported. If yes, mark the check box for no impact and re-run the factor analysis including all cases: since the outliers had no effect, there is no reason to exclude them from the analysis.

Principal Components Analysis: Validation Analysis - 1 Compute the split variable using the specified random number seed. Run the factor analysis selecting cases where split = 0, then run it selecting cases where split = 1. Are all of the communalities for both split samples greater than 0.50? If no, do not mark the check box for validation analysis; stop, because the generalizability of the findings is questionable. If yes, continue to the comparison of factor loadings.

Principal Components Analysis: Validation Analysis - 2 Does the pattern of factor loadings for the split samples match the factor loadings for the full data set? If no, do not mark the check box for validation analysis; stop, because the generalizability of the findings is questionable. If yes, mark the check box for generalizability.

Principal Components Analysis: Reliability Analysis Compute Cronbach's alpha for all components. Is Cronbach's alpha greater than .60 for all components? If no, do not mark the check box for summated scales. If yes, is Cronbach's alpha greater than .70 for all components? If yes, mark the check box for summated scales. If no, mark the check box for summated scales and add a note of caution to the interpretation.
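
The thresholds in this last flowchart can be written as a small decision function. The Python sketch below does that; its return strings are hypothetical labels for the three outcomes. It is applied to the two alphas reported earlier in this problem (0.814 and 0.561).

```python
from typing import List

def summated_scale_decision(alphas: List[float]) -> str:
    """Apply the flowchart's .60 and .70 thresholds to Cronbach's alpha for every component."""
    if not all(a > 0.60 for a in alphas):
        return "do not mark"
    if all(a > 0.70 for a in alphas):
        return "mark"
    return "mark, with a note of caution"

print(summated_scale_decision([0.814, 0.561]))  # do not mark
```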