Published by Ashlyn Kennedy. Modified over 9 years ago.
1
Principal Components Principal components is a method of dimension reduction. Suppose that you have a dozen variables that are correlated. You might use principal components analysis to reduce your 12 measures to a few principal components. Unlike factor analysis, principal components analysis is not usually used to identify underlying latent variables. Mike Cox, Newcastle University, me fecit 22/11/2014
2
Principal Components Principal components is a technique that requires a large sample size. Principal components is based on the correlation matrix of the variables involved, and correlations usually need a large sample size before they stabilize.
3
Principal Components As a rule of thumb, a bare minimum of 10 observations per variable is necessary to avoid computational difficulties. Comrey and Lee (1992) A First Course In Factor Analysis
4
Principal Components In this example we have included many options. You may not wish to use all of them, but they are included here to aid the explanation of the analysis.
5
Principal Components In this example we examine students' assessments of academic courses. We restrict attention to 12 variables, each scored on a five-point Likert scale where a higher score is better.
7
Principal Components Analyze > Dimension Reduction > Factor
8
Principal Components Select the variables from “instructor well prepared” to “compared to other courses this course was” by using the arrow button. Use the buttons at the side of the screen to set additional options.
9
Principal Components Use the Descriptives button at the side of the screen to set the descriptive options, then use the Continue button to return to the main Factor Analysis screen.
10
Principal Components Use the Extraction button at the side of the screen to set the extraction options: select the appropriate method and set the eigenvalue criterion to 1. It is essential to obtain a scree plot. Then use the Continue button to return to the main Factor Analysis screen.
11
Principal Components Select the OK button to proceed with the analysis, or Paste to preserve the syntax.
Syntax
factor
  /variables item13 item14 item15 item16 item17 item18 item19 item20 item21 item22 item23 item24
  /print initial correlation det kmo repr extraction univariate
  /format blank(.30)
  /plot eigen
  /extraction pc
  /method = correlate.
After “/extraction” you can introduce a promax rotation:
  /rotation promax(4)
12
Principal Components The descriptive statistics table is output because we used the univariate option.
13
Principal Components Mean - These are the means of the variables used in the factor analysis. Are these appropriate for a Likert scale?
14
Principal Components Std. Deviation - These are the standard deviations of the variables used in the factor analysis. Are these appropriate for a Likert scale?
15
Principal Components Analysis N - This is the number of cases used in the factor analysis.
16
Principal Components The correlation matrix table was included in the output because we included the correlation option. This table gives the correlations between the original variables (which were specified). Before conducting a principal components analysis, you want to check the correlations between the variables. If any of the correlations are too high (say above 0.9), you may need to remove one of the variables from the analysis, as the two variables seem to be measuring the same thing. Another alternative would be to combine the variables in some way (perhaps by taking the average).
17
Principal Components If the correlations are too low, say below 0.1, then one or more of the variables might load only onto one principal component (in other words, make its own principal component). This is not helpful, as the whole point of the analysis is to reduce the number of items (variables).
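The two screening checks on the correlation matrix (correlations above 0.9 and correlations below 0.1) can be sketched in a few lines of Python. This is only an illustration: the function name `screen_correlations`, the variable names and the small matrix are hypothetical, and the data are not those of the course evaluation example.

```python
from itertools import combinations

def screen_correlations(names, corr, high=0.9, low=0.1):
    """Flag nearly redundant pairs and variables unrelated to all the others."""
    size = len(names)
    # Pairs whose absolute correlation exceeds the upper threshold.
    redundant = [
        (names[i], names[j], corr[i][j])
        for i, j in combinations(range(size), 2)
        if abs(corr[i][j]) > high
    ]
    # Variables whose correlation with every other variable is below the lower threshold.
    isolated = [
        names[i]
        for i in range(size)
        if all(abs(corr[i][j]) < low for j in range(size) if j != i)
    ]
    return redundant, isolated

# Hypothetical 4-variable correlation matrix, for illustration only.
names = ["a", "b", "c", "d"]
corr = [
    [1.00, 0.95, 0.40, 0.05],
    [0.95, 1.00, 0.35, 0.02],
    [0.40, 0.35, 1.00, 0.08],
    [0.05, 0.02, 0.08, 1.00],
]
redundant, isolated = screen_correlations(names, corr)
print(redundant)  # candidate pairs to drop or combine
print(isolated)   # candidates likely to form their own component
```

The thresholds are arguments rather than constants because 0.9 and 0.1 are rules of thumb, not fixed cut-offs.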
18
Principal Components The correlation matrix is extremely large.
20
Principal Components Kaiser-Meyer-Olkin Measure of Sampling Adequacy This measure varies between 0 and 1, and values closer to 1 are better. A value of 0.6 is a suggested minimum.
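For reference, the overall measure is usually written as below. The symbols are not from the slides: r_ij is assumed to denote the correlation between variables i and j, and a_ij their partial correlation after controlling for all the other variables.

```latex
\mathrm{KMO} =
  \frac{\sum_{i \neq j} r_{ij}^{2}}
       {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} a_{ij}^{2}}
```

When the partial correlations are small relative to the ordinary correlations, the ratio approaches 1, which is why values closer to 1 are better.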
21
Principal Components Bartlett's Test of Sphericity - This tests the null hypothesis that the correlation matrix is an identity matrix. An identity matrix is a matrix in which all of the diagonal elements are 1 and all off-diagonal elements are 0. You want to reject this null hypothesis.
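The usual form of the test statistic is sketched below. The notation is an assumption, not taken from the slides: n is the number of cases, p the number of variables and |R| the determinant of the correlation matrix (printed by the det keyword in the syntax).

```latex
\chi^{2} = -\left( n - 1 - \frac{2p + 5}{6} \right) \ln \lvert R \rvert ,
\qquad
\text{df} = \frac{p(p - 1)}{2}
```

With p = 12 variables this gives 66 degrees of freedom. If R were an identity matrix its determinant would be 1 and the statistic 0, so a large value is evidence against the null hypothesis.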
22
Principal Components Taken together, these tests provide a minimum standard, which should be passed before a principal components analysis (or a factor analysis) is conducted.
23
Principal Components Communalities - This is the proportion of each variable's variance that can be explained by the principal components (e.g. the underlying latent continua).
24
Principal Components Initial - By definition, the initial value of the communality in a principal components analysis is 1.
25
Principal Components Extraction - The values in this column indicate the proportion of each variable's variance that can be explained by the principal components. Variables with high values are well represented in the common factor space, while variables with low values are not well represented. (In this example, we don't have any particularly low values.)
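As a rough sketch of where the Extraction column comes from: each communality is the sum of a variable's squared loadings on the retained components. The loadings below are hypothetical and the function name `communalities` is illustrative only.

```python
def communalities(loadings):
    """Sum the squared loadings across retained components for each variable."""
    return [sum(value * value for value in row) for row in loadings]

# Hypothetical loadings of three variables on two retained components.
loadings = [
    [0.80, 0.30],
    [0.70, -0.40],
    [0.20, 0.10],
]
h2 = communalities(loadings)
for row, value in zip(loadings, h2):
    print(row, round(value, 2))
```

The third variable in this made-up example has a low communality, so it would be poorly represented by the two components.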
26
Principal Components Component - There are as many components extracted during a principal components analysis as there are variables put into it. In our example, we used 12 variables (item13 through item24), so we have 12 components.
27
Principal Components Initial eigenvalues - Eigenvalues are the variances of the principal components. Because we conducted our principal components analysis on the correlation matrix, the variables are standardized, which means that each variable has a variance of 1, and the total variance is equal to the number of variables used in the analysis, in this case, 12.
28
Principal Components Initial eigenvalues - Total - This column contains the eigenvalues. The first component will always account for the most variance (and hence have the highest eigenvalue), and the next component will account for as much of the left over variance as it can, and so on. Hence, each successive component will account for less and less variance.
29
Principal Components Initial eigenvalues - % of Variance - This column contains the percent of variance accounted for by each principal component (for the first component, 6.249/12 = 0.52, that is 52%).
30
Principal Components Initial eigenvalues - Cumulative % - This column contains the cumulative percentage of variance accounted for by the current and all preceding principal components. For example, the value in the second row is the percentage of the total variance accounted for by the first two components together.
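The % of Variance and Cumulative % columns can be reproduced from the eigenvalues alone. In the sketch below only the first eigenvalue (6.249) comes from the slides; the remaining eleven are hypothetical values chosen to sum to 12, so the printed table is an illustration, not the SPSS output.

```python
from itertools import accumulate

# First value is from the slides; the other eleven are hypothetical.
eigenvalues = [6.249, 1.229, 0.719, 0.613, 0.561, 0.503,
               0.471, 0.389, 0.368, 0.328, 0.317, 0.253]

# With standardized variables the total variance equals the number of variables.
total = len(eigenvalues)

percent = [100 * value / total for value in eigenvalues]
cumulative = list(accumulate(percent))

for number, (value, pct, cum) in enumerate(zip(eigenvalues, percent, cumulative), start=1):
    print(f"{number:2d}  {value:6.3f}  {pct:6.2f}  {cum:7.2f}")
```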
31
Principal Components Extraction Sums of Squared Loadings - The three columns in this half of the table exactly reproduce the values given on the same row on the left side of the table. The number of rows reproduced on the right side of the table is determined by the number of principal components whose eigenvalues are greater than 1.
32
Principal Components The scree plot graphs the eigenvalue against the component number.
33
Principal Components In general, we are interested in keeping only those principal components whose eigenvalues are greater than 1 (we set this value).
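The retention rule is simple enough to state as code. This is a minimal sketch with a hypothetical function name, `kaiser_retain`, and made-up eigenvalues; the threshold argument corresponds to the value set in the Extraction options.

```python
def kaiser_retain(eigenvalues, threshold=1.0):
    """Return the 1-based numbers of components whose eigenvalue exceeds the threshold."""
    return [number for number, value in enumerate(eigenvalues, start=1) if value > threshold]

# Hypothetical eigenvalues for a four-variable analysis.
retained = kaiser_retain([2.1, 1.2, 0.5, 0.2])
print(retained)
```

A component with an eigenvalue below 1 explains less variance than a single standardized variable, which is the usual argument for the cut-off.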
34
Principal Components Component Matrix - This table contains component loadings, which are the correlations between the variable and the component. Because these are correlations, possible values range from -1 to +1. It is usual not to report any correlations whose absolute value is less than 0.3, as shown.
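The effect of suppressing small loadings, as requested by blank(.30) in the syntax, can be imitated as follows. The function name `format_loadings` and the loadings themselves are hypothetical.

```python
def format_loadings(names, loadings, cutoff=0.30):
    """Format a loading table, leaving cells blank when |loading| is below the cutoff."""
    lines = []
    for name, row in zip(names, loadings):
        cells = [f"{value:6.3f}" if abs(value) >= cutoff else " " * 6 for value in row]
        lines.append(f"{name:<8}" + " ".join(cells))
    return lines

# Hypothetical loadings of two items on two components.
names = ["item13", "item14"]
loadings = [
    [0.72, 0.12],
    [0.65, -0.41],
]
lines = format_loadings(names, loadings)
for line in lines:
    print(line)
```

Note that the comparison uses the absolute value, so a loading of -0.41 is kept.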
35
Principal Components Component - The columns under this heading are the principal components that have been extracted. As you can see by the footnote provided by SPSS, two components were extracted (the two components that had an eigenvalue greater than 1).
36
Principal Components You usually do not try to interpret the components in the way that you would factors that have been extracted from a factor analysis. Rather, most people are interested in the component scores, which are used for dimension reduction (as opposed to factor analysis where you are looking for underlying latent continua).
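To make the idea of component scores concrete, here is a small sketch for the two-variable case, where the eigenvectors of the correlation matrix are known in closed form. The ratings are hypothetical, and the scores are left unscaled; dividing each column by the square root of its eigenvalue would give unit-variance scores.

```python
from math import sqrt
from statistics import mean, stdev, variance

def standardize(values):
    """Convert raw values to z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical ratings of two items by six students (not the course data).
item_a = [4, 5, 3, 4, 2, 5]
item_b = [3, 5, 3, 4, 1, 4]

za, zb = standardize(item_a), standardize(item_b)
n = len(item_a)
r = sum(a * b for a, b in zip(za, zb)) / (n - 1)  # Pearson correlation

# For two variables the correlation matrix is [[1, r], [r, 1]]. Its eigenvalues
# are 1 + r and 1 - r, with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
# The first is the larger one here because r is positive for these data.
eigenvalues = [1 + r, 1 - r]
scores_1 = [(a + b) / sqrt(2) for a, b in zip(za, zb)]
scores_2 = [(a - b) / sqrt(2) for a, b in zip(za, zb)]

# Each score column has variance equal to its eigenvalue.
print(round(variance(scores_1), 6), round(eigenvalues[0], 6))
print(round(variance(scores_2), 6), round(eigenvalues[1], 6))
```

Keeping only scores_1 replaces two correlated items with a single variable, which is the dimension reduction referred to above.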
37
Principal Components For a component plot employ the Rotation option
38
Principal Components It's always wise to plot your results. Note the clusters.
39
Principal Components The advantages of adopting Factor Analysis as opposed to Principal Components Analysis for component evaluation and/or instrumental variable estimation purposes are reported by Travaglini (2011). Under Factor Analysis, the scores are in fact shown to produce more efficient slope estimators when utilized as regressors and/or instruments. Together with the factors they also exhibit a higher degree of consistency even for large sample dimensions. Finally, under Factor Analysis, dimension reduction is definitely more stringent, greatly facilitating the search and identification of the common components of the available dataset (Travaglini 2011).
40
Principal Components Principal Components Analysis and Factor Analysis share the search for a common structure characterized by few common components, usually known as “scores”, that determine the observed variables contained in matrix X. However, the two methods differ on the characterization of the scores as well as on the technique adopted for selecting their true number. In Principal Components Analysis the scores are the orthogonalised principal components obtained through rotation, while in Factor Analysis the scores are latent variables determined by unobserved factors and loadings which involve idiosyncratic error terms. The dimension reduction of X implemented by each method produces a set of fewer homogeneous variables – the true scores – which contain most of the model’s information.
41
Principal Components For a detailed discussion and a brief numerical derivation see Velicer and Jackson (1990), who also give an extensive bibliography. “Should one do a component analysis? The choice is not obvious, because the two broad classes of procedures serve a similar purpose, and share many important mathematical characteristics. Despite many textbooks describing common factor analysis as the preferred procedure, principal component analysis has been the most widely applied.” Velicer, W.F. and Jackson, D.N. (1990) “Component Analysis Versus Common Factor Analysis: Some Issues In Selecting An Appropriate Procedure”, Multivariate Behavioral Research 25(1), 1-28.
42
Principal Components After some mathematics!
“An examination of the algebraic representations of the two methods of analysis has served to highlight the differences between them. However, when the same number of components or factors are extracted, the results from different types of component or factor analysis procedures typically yield highly similar results. Discrepancies are rarely, if ever, of any practical importance in subsequent interpretations.” Velicer, W.F. and Jackson, D.N. (1990) “Component Analysis Versus Common Factor Analysis: Some Issues In Selecting An Appropriate Procedure”, Multivariate Behavioral Research 25(1), 1-28.
43
Principal Components Summary
Principal Components is used to help understand the covariance structure in the original variables and/or to create a smaller number of variables using this structure. Factor Analysis, like principal components, is used to summarise the data covariance structure in a smaller number of dimensions. Its emphasis is the identification of underlying “factors” that might explain the dimensions associated with large data variability.
44
Similarities Principal Components Analysis and Factor Analysis have these assumptions in common: Measurement scale is interval or ratio level. Random sample, with at least 5 observations per observed variable and at least 100 observations. Larger samples, with more observations per observed variable, are recommended for more stable estimates.
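The minimum sample size rule on this slide (at least 5 observations per variable and at least 100 in total) can be checked with a one-line function. The name `sample_size_ok` and its parameters are hypothetical, and the defaults simply encode the rule of thumb stated here.

```python
def sample_size_ok(n_obs, n_vars, per_variable=5, minimum=100):
    """Check both conditions: enough observations per variable and enough overall."""
    return n_obs >= max(per_variable * n_vars, minimum)

# With 12 variables, 5 per variable gives 60, so the overall minimum of 100 is binding.
print(sample_size_ok(100, 12))
print(sample_size_ok(99, 12))
# With 30 variables the per-variable condition (150) becomes the binding one.
print(sample_size_ok(120, 30))
```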
45
Similarities Principal Components Analysis and Factor Analysis have these assumptions in common: a linear relationship between observed variables; a normal distribution for each observed variable; a bivariate normal distribution for each pair of observed variables. Over-sample to compensate for missing values. Both are variable reduction techniques. If communalities are large, close to 1.00, results could be similar.
46
Similarities Principal Components Analysis assumes the absence of outliers in the data. Factor Analysis assumes a multivariate normal distribution when using Maximum Likelihood extraction method.
47
Differences
48
SPSS Tips Now you should go and try for yourself.
Each week our cluster (5.05) is booked for 2 hours after this session. This will enable you to come and go as you please. Obviously other timetabled sessions for this module take precedence.