Advanced Statistics: Factor Analysis, II
Last lecture
1. What causes what: ξ → Xs, or Xs → ξ?
2. Do we explore the relation of the Xs to the ξs, or do we test (try to confirm) our a priori assumption about this relation?
Ad 1. The difference between PCA (principal component analysis) and FA (factor analysis).
Ad 2. The difference between EFA (exploratory factor analysis) and CFA (confirmatory factor analysis).
PCA and FA extraction
Factor loadings (components for PCA) are correlations between factors and variables. For PCA and FA they are extracted on the basis of the eigenvectors and the eigenvalues associated with these vectors. Eigenvectors V are linear combinations of the variables that account for the variance measured by the corresponding eigenvalues L (the variance refers to the factors). The basic equation of extraction is R = V L V′ (equivalently, L = V′ R V), where R is the correlation matrix; the loadings are A = V L^(1/2).
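To make the extraction concrete, here is a minimal NumPy sketch; the correlation matrix R is an illustrative stand-in, not data from the lecture:

```python
import numpy as np

# Illustrative correlation matrix; in practice R comes from the data.
R = np.array([[1.00, 0.60, 0.30],
              [0.60, 1.00, 0.40],
              [0.30, 0.40, 1.00]])

# eigh returns eigenvalues in ascending order for a symmetric matrix,
# so sort them (and the eigenvectors) in descending order.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
L = eigenvalues[order]
V = eigenvectors[:, order]

# Loadings: correlations between variables and components, A = V * sqrt(L).
A = V * np.sqrt(L)

# Check the basic extraction equation R = V L V'.
assert np.allclose(R, V @ np.diag(L) @ V.T)
print("Eigenvalues:", L)
print("Loadings:\n", A)
```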
Example: unrotated and rotated factor loadings with communalities (SSL = sum of squared loadings).

Variable         Unrotated            Rotated            Communalities
                 Factor 1  Factor 2   Factor 1  Factor 2
COST               -.400      .900      .086      .981        .970
LIFT                .251     -.947     -.071     -.977        .960
DEPTH               .932      .348      .994      .026        .989
POWDER              .956      .286      .997     -.040        .996
SSL                1.994     1.919     1.994     1.919       3.915
% of variance         50        48        50        48          98
Number of factors
Extraction methods
- Principal components
- Principal factors
- Image factoring (rescales; unique variances eliminated)
- MLF, maximum likelihood factoring (has a significance test for factors)
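As a quick illustration of two of these methods, the sketch below runs principal components and factor analysis in scikit-learn on random stand-in data (sklearn's FactorAnalysis is maximum-likelihood based, so it corresponds roughly to the MLF entry above):

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

# Random data stands in for real measurements: 300 cases, 6 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))

pca = PCA(n_components=2).fit(X)             # principal components
fa = FactorAnalysis(n_components=2).fit(X)   # ML-based factor analysis

print("PCA explained variance ratio:", pca.explained_variance_ratio_)
print("FA loadings (columns = factors):\n", fa.components_.T)
```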
T-F: Practical issues
1. Sample size and missing data. N > 300. Missing data: consider regression imputation.
2. Normality. A normal distribution is (1) unimodal, (2) symmetric (zero skewness), and (3) mesokurtic (not too tall, not too flat). Check the distribution of each x_i as for regression and other analyses.
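A minimal normality screen with SciPy; x stands in for one observed variable, and near-zero values suggest a roughly normal, mesokurtic distribution:

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Simulated stand-in for one observed variable.
rng = np.random.default_rng(1)
x = rng.normal(loc=50, scale=10, size=300)

print("skewness:", skew(x))              # ~0 for a symmetric distribution
print("excess kurtosis:", kurtosis(x))   # ~0 for a mesokurtic distribution
```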
T-F: Practical issues
3. Linearity. Scatterplots for pairs of x_i.
4. Absence of outliers among cases.
5. Absence of multicollinearity. Compute the SMCs. SMC = the squared multiple correlation of a variable when it serves as the DV and the rest of the variables in the analysis are the IVs. SMC > .9 indicates multicollinearity; SMC = 1 is called singularity.
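The SMCs can be read off the inverse of the correlation matrix, using the standard identity SMC_i = 1 - 1/(R⁻¹)_ii; a sketch with an illustrative R:

```python
import numpy as np

# Illustrative correlation matrix; in practice R comes from the data.
R = np.array([[1.00, 0.60, 0.30],
              [0.60, 1.00, 0.40],
              [0.30, 0.40, 1.00]])

R_inv = np.linalg.inv(R)
smc = 1 - 1 / np.diag(R_inv)    # SMC of each variable on all the others
print("SMCs:", smc)
print("Multicollinearity suspected:", bool(np.any(smc > 0.9)))
```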
T-F: Practical issues
6. Factorability of R. Some r > .3. Recommendation: Kaiser's ratio, the sum of squared correlations divided by the sum of squared correlations plus the sum of squared partial correlations. Partial correlations should be small; if they are all 0, the ratio equals 1. A ratio > .6 is usually required for FA. (A sketch follows this list.)
7. Outliers among variables. Omit variables that have a low squared multiple correlation with all the other variables initially considered for FA.
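A sketch of Kaiser's ratio (the KMO measure of sampling adequacy), with partial correlations obtained from the inverse correlation matrix; R is again an illustrative stand-in:

```python
import numpy as np

R = np.array([[1.00, 0.60, 0.30],
              [0.60, 1.00, 0.40],
              [0.30, 0.40, 1.00]])

R_inv = np.linalg.inv(R)
d = np.sqrt(np.diag(R_inv))
partial = -R_inv / np.outer(d, d)       # partial correlations (anti-image)

off = ~np.eye(R.shape[0], dtype=bool)   # off-diagonal mask
r2 = (R[off] ** 2).sum()                # sum of squared correlations
p2 = (partial[off] ** 2).sum()          # sum of squared partial correlations

kmo = r2 / (r2 + p2)
print("Kaiser's ratio (KMO):", kmo)     # > .6 is usually required for FA
```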
Credits
This lecture is partially based on:
Kohn, Melvin, and Kazimierz M. Slomczynski. 1993. Social Structure and Self-Direction. Blackwell; IFiS Publishers, 2007.
Albright, Jeremy J., and Hun Myoung Park. 2009. "Confirmatory Factor Analysis Using Amos, LISREL, Mplus, and SAS/STAT CALIS." Working Paper. University Information Technology Services (UITS) Center for Statistical and Mathematical Computing, Indiana University, Bloomington, IN.
Notation for Confirmatory Factor Analysis
[Path diagram: two latent variables ξ1 and ξ2 shown as circles, six observed variables x1 through x6 shown as squares, with loadings λ_ij, factor covariance ϕ, and errors δ_i.]
It is common to display confirmatory factor models as path diagrams in which squares represent observed variables and circles represent latent variables. E.g.: consider two latent variables, ξ1 and ξ2, and six observed variables, x1 through x6. Factor loadings are represented by λ_ij. The covariance between ξ1 and ξ2 is ϕ. The δ_i incorporate all the variance in x_i that is not captured by the common factors.
Equation for X
The variables are mean centered, i.e., measured as deviations from their means. Under this assumption the confirmatory factor model is summarized by the equation:
X = Λξ + δ
where X is the vector of observed variables; Λ (lambda) is the matrix of factor loadings connecting the ξ_i to the x_i; ξ is the vector of common factors; and δ is the vector of errors. The error terms have a mean of zero, E(δ) = 0, and the common factors and errors are uncorrelated, E(ξδ′) = 0.
Specific equations for x1 to x6
x1 = λ11·ξ1 + δ1
x2 = λ21·ξ1 + δ2
x3 = λ31·ξ1 + δ3
x4 = λ42·ξ2 + δ4
x5 = λ52·ξ2 + δ5
x6 = λ62·ξ2 + δ6
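These six equations can be simulated directly. The sketch below generates data from the model X = Λξ + δ with the zero pattern in Λ matching the two-factor structure; all numeric values (the λs, ϕ21, the error scale) are illustrative assumptions, not estimates:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

Lambda = np.array([[0.8, 0.0],    # lambda_11
                   [0.7, 0.0],    # lambda_21
                   [0.6, 0.0],    # lambda_31
                   [0.0, 0.9],    # lambda_42
                   [0.0, 0.8],    # lambda_52
                   [0.0, 0.7]])   # lambda_62

Phi = np.array([[1.0, 0.4],       # factor covariance phi_21 = 0.4
                [0.4, 1.0]])

xi = rng.multivariate_normal(mean=[0, 0], cov=Phi, size=n)   # common factors
delta = rng.normal(scale=0.5, size=(n, 6))                   # unique errors

X = xi @ Lambda.T + delta         # X = Lambda xi + delta, case by case
print("Sample covariance of X:\n", np.cov(X, rowvar=False).round(2))
```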
Similarities with regression
The equation for each x_i is a linear function of one or more common factors plus an error term. There is no intercept, since the variables are mean centered. The primary difference between these factor equations and regression analysis is that the ξ_i are unobserved in CFA. Consequently, estimation proceeds in a manner distinct from the conventional approach of regressing each x on the ξ_i.
Identification
One essential step in CFA is determining whether the specified model is identified. If the number of unknown parameters to be estimated exceeds the number of pieces of information provided, the model is underidentified. E.g.: 10 = 2x + 3y is not identified (two unknowns but only one piece of information, a single equation); infinitely many pairs of x and y make the equation true: x = -10, y = 10; x = -25, y = 20; x = -40, y = 30; etc. To make the system just-identified, another independent equation must be provided; for example, adding 3 = x + y yields x = -1 and y = 4.
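The just-identified version of this toy example, solved as a linear system:

```python
import numpy as np

A = np.array([[2, 3],    # 10 = 2x + 3y
              [1, 1]])   #  3 =  x +  y
b = np.array([10, 3])

x, y = np.linalg.solve(A, b)
print(x, y)              # -1.0 4.0
```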
Identification: Input information
In CFA, a model is identified if all of the unknown parameters can be rewritten in terms of the variances and covariances of the x variables. In our case, the variance/covariance matrix of x1, ..., x6 has the lower triangle:

σ11
σ21  σ22
σ31  σ32  σ33
σ41  σ42  σ43  σ44
σ51  σ52  σ53  σ54  σ55
σ61  σ62  σ63  σ64  σ65  σ66

The number of pieces of input information is therefore 6(6+1)/2 = 21.
Degrees of freedom
In general, the amount of input information is p(p+1)/2, where p is the number of observed variables.
Unknowns (14): ϕ21, the six λ_ij, the six δ_i, and δ63.
Degrees of freedom: 21 (knowns) - 14 (unknowns) = 7, so the CFA model is over-identified.
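The bookkeeping as a few lines of Python:

```python
p = 6
knowns = p * (p + 1) // 2    # 21 variances and covariances
unknowns = 1 + 6 + 6 + 1     # phi_21, six lambdas, six deltas, delta_63
df = knowns - unknowns
print(knowns, unknowns, df)  # 21 14 7
```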
Scale of latent variables
Without introducing some constraints, no confirmatory factor model is identified. The problem lies in the fact that the latent variables are unobserved and hence their scales are unknown. To identify the model, it therefore becomes necessary to set the metric of the latent variables in some manner. The two most common constraints are to set either the variance of a latent variable or one of its factor loadings to one.
Basic estimation equation
When the x variables are measured as deviations from their means, it is easy to show that the covariance matrix implied by the model, Σ (sigma), decomposes as follows:
Σ = ΛΦΛ′ + Θ
where Φ (phi) represents the covariance matrix of the ξ factors and Θ (theta) represents the covariance matrix of the unique factors δ. Estimation proceeds by finding parameters Λ, Φ, and Θ such that the implied covariance matrix Σ is as close as possible to the sample covariance matrix S.
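A sketch of the implied covariance matrix for hypothetical parameter values (the same illustrative Λ and Φ as in the simulation sketch above; Θ is diagonal because the unique factors are assumed uncorrelated):

```python
import numpy as np

Lambda = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
Phi = np.array([[1.0, 0.4],
                [0.4, 1.0]])
Theta = np.diag([0.25] * 6)   # diagonal: uncorrelated unique factors

# Sigma = Lambda Phi Lambda' + Theta
Sigma = Lambda @ Phi @ Lambda.T + Theta
print(Sigma.round(2))
```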
Estimation Several different fitting functions exist for determining the closeness of the implied covariance matrix to the sample covariance matrix, of which maximum likelihood is the most common. A full discussion of the topic in the context of CFA is available in Bollen (1989, chapter 7), including some necessary and sufficient conditions for identification.
ML estimation
The method of maximum likelihood (the term was first used by Fisher, 1922) is a general method of estimating the parameters of a population by the values that maximize the likelihood (L) of the sample.
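In CFA, the ML estimates minimize a fitting function that compares Σ with S; a common form is F_ML = log|Σ| + tr(SΣ⁻¹) - log|S| - p. A minimal sketch:

```python
import numpy as np

def f_ml(S, Sigma):
    """ML fitting function: 0 when Sigma reproduces S exactly."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p

S = np.array([[1.0, 0.5],
              [0.5, 1.0]])
print(f_ml(S, S))   # 0.0: perfect fit
```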
Fit statistics
Goodness-of-fit tests evaluate the model in terms of the fixed parameters used to specify it, and acceptance or rejection of the model in terms of its overidentifying restrictions.
Basic assessment:
- Chi-square/degrees-of-freedom ratio: tests the hypothesis that the model is consistent with the pattern of covariation among the observed variables; smaller rather than larger values indicate a good fit.
- Goodness-of-fit index (GFI): a measure of the relative amount of variances and covariances jointly accounted for by the model; the closer the GFI is to 1.00, the better the fit of the model to the data.
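A sketch of the GFI under ML estimation, using the LISREL-style formula GFI = 1 - tr[(Σ⁻¹S - I)²] / tr[(Σ⁻¹S)²]; the chi-square statistic itself is (N - 1) times the F_ML value from the previous sketch. S and Σ here are illustrative stand-ins:

```python
import numpy as np

def gfi(S, Sigma):
    """ML-based goodness-of-fit index; 1.0 when Sigma reproduces S exactly."""
    p = S.shape[0]
    W = np.linalg.inv(Sigma) @ S
    return 1 - np.trace((W - np.eye(p)) @ (W - np.eye(p))) / np.trace(W @ W)

S = np.array([[1.0, 0.5],
              [0.5, 1.0]])
print(gfi(S, S))   # 1.0
```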
Comparison of unstandardized and standardized solutions