Unit 7a: Factor Analysis


Unit 7a: Factor Analysis http://xkcd.com/419/ © Andrew Ho, Harvard Graduate School of Education

Course Roadmap: Unit 7a

·  From principal components to factor analysis
·  From factor analysis to structural equation modeling
·  Interpreting factor analytic results

[Course roadmap diagram] Multiple Regression Analysis (MRA):
·  Do your residuals meet the required assumptions? Test for residual normality; use influence statistics to detect atypical data points.
·  If your residuals are not independent, replace OLS by GLS regression analysis, use individual growth modeling, or specify a multilevel model. If time is a predictor, you need discrete-time survival analysis.
·  If your outcome is categorical, you need to use binomial logistic regression analysis (dichotomous outcome) or multinomial logistic regression analysis (polytomous outcome).
·  If your outcome vs. predictor relationship is non-linear, use non-linear regression analysis or transform the outcome or predictor.
·  If you have more predictors than you can deal with, create taxonomies of fitted models and compare them, or form composites of the indicators of any common construct: conduct a Principal Components Analysis, use Cluster Analysis, or (Today's Topic Area) use Factor Analysis: EFA or CFA?

© Andrew Ho, Harvard Graduate School of Education

Teacher "Professional Defensiveness" Dataset: FOWLER.txt

Overview: In this study, which was her qualifying paper, Amy Fowler reports on the piloting and construct validation of a self-designed survey instrument to measure teachers' professional defensiveness within the context of a professional relationship with a peer. The dataset contains responses from teachers on this Teacher Professional Defensiveness Scale (TPDS), and on two other published scales: the Specific Interpersonal Trust Scale (SITS) and the Fear of Negative Evaluation Scale (FNES) (Robinson et al., 1991).

Source: Fowler, A. M. (2009). Measuring Teacher Defensiveness: An Instrument Validation Study. Unpublished Qualifying Paper, Harvard Graduate School of Education. Robinson, J. P., Shaver, P. R., Wrightsman, L. S., & Andrews, F. M. (1991). Measures of Personality and Social Psychological Attitudes. San Diego: Academic Press.

Additional Info: The three instruments are described more fully in Fowler (2009), but briefly they are:
·  Specific Interpersonal Trust Scale (SITS): Assesses the level of trust one individual places in another. Participants choose a response that reflects their feelings toward a referent peer, on a five-point scale (see Metric below). There are 8 items, 3 of which must be reverse coded when totaling the responses; larger total scores represent a higher level of respondents' trust in their selected peers.
·  Teacher Professional Defensiveness Scale (TPDS): Designed by the investigator to assess teacher sensitivity to peer feedback. Respondents are directed to imagine their referent peer making a series of comments regarding the respondent's teaching practice and are then asked to respond to these comments on a seven-point scale (see Metric below). The scale contains 21 items presented in random order; statement stems are alternately complimentary, neutral, or critical.
·  Fear of Negative Evaluation Scale (FNES): Assesses the degree of fear respondents have regarding negative evaluation from others. Respondents are asked to assess how characteristic each statement is of their own feelings, on a five-point scale (see Metric below). There are 12 items, including 4 for which responses must be reverse coded when totaling the responses; larger total scores indicate higher levels of fear of negative evaluation.

Sample size: 411 teachers.

Construct Validation: A New Instrument to Measure Teacher Professional Defensiveness. Three sub-scales, each with 7 items: "Criticisms," "Compliments," and "Neutral."

Research Objectives:
·  Check that the three sub-scales of the TPDS are each separately unidimensional.
·  Check that the constructs measured by each sub-scale are distinct from each other (but perhaps correlated).
·  Check the relationship between the TPD sub-scales and other accepted measures (SITS, FNES).

© Andrew Ho, Harvard Graduate School of Education
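As a concrete illustration of the reverse coding and totaling described above, a minimal Stata sketch. The item names (sits1-sits8) and the choice of reverse-keyed items are hypothetical placeholders, not the actual variable names in FOWLER.txt:

* Hypothetical reverse coding on a five-point scale: 1 becomes 5, ..., 5 becomes 1.
foreach v of varlist sits3 sits5 sits7 {        // placeholder reverse-keyed items
    generate `v'_r = 6 - `v'
}
* Hypothetical total score: positively keyed items plus the reverse-coded items.
egen sits_total = rowtotal(sits1 sits2 sits4 sits6 sits8 sits3_r sits5_r sits7_r)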

Read Each Item. Take the Test. Know Your Scale.

Indicators of Teacher Professional Defensiveness: "Criticisms" Sub-Scale (we'll look at Amy's final 5 items) and "Compliments" Sub-Scale (final 4 items).

Col #  Var Name  Variable Description
 1     TID       Teacher identification code? (Integer)
 2     SID       School identification code?

Teacher Professional Defensiveness Scale (TPDS) items, each on a seven-point scale
(1 = High Criticism, 2 = Criticism, 3 = Mild Criticism, 4 = Neither, 5 = Mild Compliment, 6 = Compliment, 7 = High Compliment):
 16    D1        I like the tone of your room; it is friendly but serious about school at the same time.
 17    D2        I think you underestimated student's prior knowledge of today's topic.
 20    D5        I'm not clear how today's lesson related to the curriculum standards.
 21    D6        The examples you used to explain the main concept helped students to understand the big ideas.
 25    D10       I'm not sure the task you had kids do required kids to achieve the objective you had on the board.
 26    D11       It seems like you worry about whether or not the students like you.
 28    D13       I can tell by the way you talk to the students that you believe they can learn this material.
 29    D14       I wonder if the students learned the concept you wanted them to learn through that hands-on lesson.
 31    D16       I wonder if students in the class understand your sarcasm in the same way you mean it.
 33    D18       Students seem to follow the classroom rules.
 36    D21       The way you started the class with students' interests got them involved and attentive to the lesson.

The assumption here is that some teachers consistently score higher (or lower) than others on these items, a possible measure of defensiveness or sensitivity to criticism from peers.

© Andrew Ho, Harvard Graduate School of Education

Exploratory Data Analysis – Univariate

Let's focus first on the "Criticisms" subscale of Teacher Professional Defensiveness.

Variable  Label
D2        You underestimated students' knowledge of today's topic.
D5        Not clear how lesson related to curriculum standards.
D10       Task didn't help kids achieve your objective.
D11       You like to worry about whether students like you.
D16       Wonder if students understand your sarcasm.

Metric: 1 = High Compliment, 2 = Compliment, 3 = Mild Compliment, 4 = Neither, 5 = Mild Criticism, 6 = Criticism, 7 = High Criticism.

On average, teachers were more sensitive/defensive to certain items than others. This is common and usually a good thing (some items are more "difficult" than others). Teachers' responses to the items on the Criticisms sub-scale also have different levels of variability across items. This is common and raises the question of whether this variability is comparable.

© Andrew Ho, Harvard Graduate School of Education
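A minimal Stata sketch of the univariate summaries behind this slide, assuming the items are named D2, D5, D10, D11, and D16 as in the codebook:

* Univariate exploratory analysis of the "Criticisms" items: means, SDs, ranges,
* and a closer look at each distribution.
tabstat D2 D5 D10 D11 D16, statistics(n mean sd min max) columns(statistics)
summarize D2 D5 D10 D11 D16, detail
histogram D2, discrete percent name(hist_D2, replace)   // repeat for each item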

Exploratory Data Analysis – Bivariate

Let's focus first on the "Criticisms" subscale of Teacher Professional Defensiveness.

Variable  Label
D2        You underestimated students' knowledge of today's topic.
D5        Not clear how lesson related to curriculum standards.
D10       Task didn't help kids achieve your objective.
D11       You like to worry about whether students like you.
D16       Wonder if students understand your sarcasm.

Pearson Correlation Coefficients, N = 399 (estimated under listwise deletion):

         D2       D5       D10      D11      D16
D2    1.0000
D5    0.3452   1.0000
D10   0.4411   0.5932   1.0000
D11   0.3212   0.7997   0.6338   1.0000
D16   0.4677   0.5575   0.7501   0.5721   1.0000

The sample bivariate correlations among the indicators on the "Criticisms" sub-scale are all positive, and the relationships do not appear to be obviously nonlinear. Could the subscale be a useful indicator of a single common construct? Could the subscale be unidimensional?

© Andrew Ho, Harvard Graduate School of Education
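The correlation matrix above can be reproduced with a single command; a minimal sketch (correlate uses listwise deletion, matching the N = 399 reported here):

* Bivariate exploratory analysis: correlations among the "Criticisms" items.
correlate D2 D5 D10 D11 D16            // listwise deletion (complete cases only)
pwcorr D2 D5 D10 D11 D16, obs sig      // pairwise deletion, with Ns and p-values
graph matrix D2 D5 D10 D11 D16, half   // scatterplot matrix to check linearity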

Principal Components Analysis (PCA) vs. Exploratory Factor Analysis, Graphically

Research question? Rather than asking, "Can we forge these several indicators together into a smaller number of composites with defined statistical properties?" (for which we would need Principal Components Analysis), we could ask, "Are there a number of unseen (latent) factors (constructs) acting 'beneath' these indicators to forge their observed values?" (for which we would need Factor Analysis: CFA or EFA?).

[Path model of Principal Components Analysis: the standardized indicators D*_2i, D*_5i, D*_10i, D*_11i, and D*_16i point toward the components C_1i, C_2i, ...; the remaining variation is carried by the later components.]

[Path model of Factor Analysis: latent factors (η_1i and, possibly, η_2i) point toward the indicators D_2i through D_16i; the remaining variation in each indicator is carried by its own error term (ε_2i, ε_5i, ..., ε_16i).]

© Andrew Ho, Harvard Graduate School of Education

PCA vs. EFA, Formulaically

Statistical model for Principal Components Analysis: given the X's (the D's), pick the a's and compute the PC's; this is the "eigenvalue" problem, solved! The result is an unambiguous set of orthogonal composites; the answer is determined completely by the data.

Statistical model for Factor Analysis: given the X's (the D's), estimate the λ's (loadings), guess the η's (factors), and compute the ε's (errors). Too many parameters! As written, there is no unique solution. But by setting a scale for the latent variables η and constraining their interrelationships, we can arrive at useful and informative solutions.

© Andrew Ho, Harvard Graduate School of Education
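Written out for the five "Criticisms" indicators, a sketch in LaTeX consistent with the path diagrams and notation used elsewhere in this unit (D* denotes a standardized indicator):

% Principal components: each component is a weighted sum of the indicators.
C_{1i} = a_{1,2} D^{*}_{2i} + a_{1,5} D^{*}_{5i} + a_{1,10} D^{*}_{10i} + a_{1,11} D^{*}_{11i} + a_{1,16} D^{*}_{16i}

% Common factor model: each indicator is predicted by the latent factor(s),
% plus its own error term.
D^{*}_{mi} = \lambda_{m} \eta_{1i} + \epsilon_{mi}, \qquad m \in \{2, 5, 10, 11, 16\}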

"Exploratory" Factor Analysis

Because the problem is ill-specified, there is an oversaturated literature of methods to operationalize the number of dimensions and arrive at "ideal" solutions. This is like arguing over the best way to visualize a univariate distribution: stem-and-leaf? histogram? dot plot?

Ways of obtaining an "initial" factor solution (of either the covariance or the correlation matrix): Alpha factor analysis, Harris component analysis, Image component analysis, ML factor analysis, Principal axis factoring, Pattern specified by user, Prinit factor analysis, Unweighted least-squares factor analysis.

Ways of obtaining initial estimates of the measurement error variances: Absolute SMC, Input from external file, Maximum absolute correlation, Set to One, Set to Random, SMC.

Ways of rotating to a final factor solution: None, Biquartimax, Equamax, Orthogonal Crawford-Ferguson, Generalized Crawford-Ferguson, Orthomax, Parsimax, Quartimax, Varimax, Biquartimin, Covarimin, Harris-Kaiser Ortho-Oblique, Oblique Biquartimax, Oblique Equamax, Oblique Crawford-Ferguson, Oblique Generalized Crawford-Ferguson, Oblimin, Oblique Quartimax, Oblique Varimax, Procrustes, Promax, Quartimin, etc.

Number of methods of EFA: at least 7 × 2 × 6 × 22 = 1,848 combinations of these choices.

*-----------------------------------------------------------------------
* Exploratory factor analysis of TPD "Criticisms" sub-scale, on its own
factor D2 D5 D10 D11 D16, pf
factor D2 D5 D10 D11 D16, pcf
factor D2 D5 D10 D11 D16, ipf
factor D2 D5 D10 D11 D16, ml

© Andrew Ho, Harvard Graduate School of Education
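As a concrete example of one branch of these choices, a minimal Stata sketch: extract two principal factors and compare an orthogonal and an oblique rotation (the two-factor choice here is purely illustrative):

* Principal-factor extraction, keeping two factors (illustrative only)
factor D2 D5 D10 D11 D16, pf factor(2)
rotate, varimax          // orthogonal rotation: factors kept uncorrelated
rotate, promax           // oblique rotation: factors allowed to correlate
estat common             // factor correlation matrix after the oblique rotation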

Towards Structural Equation Modeling

EFA is a specific and unstructured special case of a broader modeling framework known as "structural equation modeling" (SEM). SEM is often more "confirmatory" than "exploratory," incorporating both "measurement" components (multiple indicators of a latent construct) and "structural" components (prediction of and by latent variables).

[Path diagram: hypothesized unidimensional factor structure for the "Criticisms" sub-scale, with a single factor η_1 predicting D_2i, D_5i, D_10i, D_11i, and D_16i, each with its own error term ε.]

Hypothesis: Indicators of the "Criticisms" subscale have a "unidimensional" factor structure in the population.

We can contrast this with the equation for the first principal component: C_1i = a_{1,2} D_2i + a_{1,5} D_5i + ... + a_{1,16} D_16i. In unidimensional FA, each indicator is predicted by a single factor. In single-component PCA, the first principal component is a weighted sum of all indicators.

*--------------------------------------------------------------------------
* Unidimensional factor analysis of TPD "Criticisms" sub-scale, with
* standardized variables. Compares the "factor" and "sem" commands.
* The principal factor method (a classic exploratory technique)
factor D2 D5 D10 D11 D16, factor(1)
* Maximum likelihood
factor D2 D5 D10 D11 D16, ml factor(1)
* Single-factor EFA via sem
sem (ETA1 -> D2 D5 D10 D11 D16), nocapslatent latent(ETA1) standardized

See Unit7a.do

© Andrew Ho, Harvard Graduate School of Education

Structural Equation Modeling: The Data

More than conventional analyses, SEM techniques operationalize "the data" not as values for individual observations but as the elements of the covariance matrix.

[Tables: the sample correlation matrix and the sample covariance matrix of the five indicators.]

In structural equation modeling, the "goodness of fit" of the model is most often represented by how well the model's "implied covariance matrix" reproduces the sample covariance matrix. Recall that a loose operational definition of "degrees of freedom" is the number of observations minus the number of parameters estimated in the model. In the SEM world, the number of observations is 15, the number of unique elements in the variance-covariance matrix. For K variables, the number of "observations" is K(K+1)/2; here, 5(5+1)/2 = 15.

© Andrew Ho, Harvard Graduate School of Education
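A minimal Stata sketch of "the data" in this sense, the matrices the SEM is actually fit to:

* The sample correlation and covariance matrices of the five indicators.
correlate D2 D5 D10 D11 D16                 // correlation matrix
correlate D2 D5 D10 D11 D16, covariance     // covariance matrix
display "Unique elements (SEM 'observations'): " 5*(5+1)/2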

Unidimensional EFA: The "Principal Factor" Extraction Method

The "principal factor" method for "extracting factors" simply replaces the 1's in the correlation matrix (the correlation of a variable with itself) with the R-squared value from a regression of that variable on all the other variables (an extremely ad hoc estimate of the reliability of that variable). Then we run a PCA as usual!

[Path diagram: hypothesized unidimensional factor structure for the "Criticisms" sub-scale, with a single factor η_1 predicting D_2i through D_16i, each with its own error term ε.]

We can model error variance after running PCA on the adjusted correlation matrix, where the diagonal entries are the R-squared values from a regression of each variable on all the remaining variables. For D2, this R-squared value is .2423. We substitute that for the 1 in the correlation matrix, do the same for each variable, and run a PCA on the resulting matrix. The "factor" command does this automatically.

The results look like familiar PCA output. Negative eigenvalues appear because the adjusted matrix is not well formed, a consequence of its ad hoc construction, but we only keep one factor anyway.

© Andrew Ho, Harvard Graduate School of Education
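A minimal Stata sketch of the principal-factor idea "by hand," purely to make the algorithm concrete (the factor, pf command in Unit7a.do does all of this, more carefully, for you):

* 1. Start from the sample correlation matrix of the five indicators.
correlate D2 D5 D10 D11 D16
matrix R = r(C)

* 2. Replace each diagonal 1 with the R-squared from regressing that
*    indicator on the other four (the squared multiple correlation).
local vars "D2 D5 D10 D11 D16"
local i = 1
foreach v of local vars {
    local others : list vars - v
    quietly regress `v' `others'
    matrix R[`i', `i'] = e(r2)      // e.g., .2423 for D2
    local ++i
}

* 3. Run the eigen-decomposition (the PCA step) on the adjusted matrix.
matrix symeigen V L = R
scalar s1 = sqrt(L[1, 1])
matrix loadings1 = V[1..., 1] * s1   // first-factor loadings
matrix list loadings1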

Unidimensional EFA: The "Principal Factor" Extraction Method

The "factor loadings" are the estimated correlations between each variable and the respective factor. The "uniqueness" is the estimated error variance left unaccounted for by the factor structure.

[Path diagram: hypothesized unidimensional factor structure for the "Criticisms" sub-scale, with a single factor η_1 predicting D_2i through D_16i, each with its own error term ε.]

Estimated loadings: D*_2 = .49 η_1, D*_5 = .80 η_1, D*_10 = .82 η_1, D*_11 = .82 η_1, D*_16 = .79 η_1.

Again, we can contrast this with the equation for the first principal component: C_1i = .33 D_2i + .47 D_5i + ... + .47 D_16i. In unidimensional FA, each indicator is predicted by a single factor, and the unexplained variation is formally modeled as an error variance. In single-component PCA, the first principal component is a weighted sum of all indicators, and the unexplained variation is informally represented by the remaining principal components.

© Andrew Ho, Harvard Graduate School of Education
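A minimal sketch connecting this output back to the data (predict after factor produces estimated factor scores; the score variable name f1 is a placeholder):

* One-factor principal-factor solution for the "Criticisms" items
factor D2 D5 D10 D11 D16, pf factor(1)
* Uniqueness = 1 - loading^2 in a one-factor solution, e.g., 1 - .49^2 for D2
display 1 - .49^2
* Estimated factor scores (regression scoring), for use in later analyses
predict f1
summarize f1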

Unidimensional EFA: The Maximum Likelihood Extraction Method

By assuming multivariate normality (not always a realistic assumption), we gain the ability to conduct likelihood ratio tests. The reference point is the "perfectly fitting" saturated model, which reproduces the covariance matrix exactly. This is the best possible fit, with 0 degrees of freedom. This is the OPPOSITE of our conventional approach, which usually starts from bad fit!

The independent model estimates the five indicator variances but no covariances (15 observations minus 5 parameters = 10 df). Compared to the saturated model, the independent model effectively fixes all pairwise relationships (10 degrees of freedom). The null hypothesis is that this fits just as well as the saturated model. It does not; it is worse. The badness of fit of the independent model is rarely surprising, but it can serve as a reference.

The unidimensional model estimates 10 parameters (5 error variances and 5 loadings), for 15 observations - 10 parameters = 5 degrees of freedom. The reported test shows that its relative badness of fit is not due to chance and is worse in the population (not good news, but not uncommon).

                     Saturated Model               Fitted Model                      Independent Model
Fit                  Perfect                       Hypothesized                      Usually bad
Degrees of freedom   15 obs - 15 parameters = 0    15 obs - 5 err var - 5 load = 5   15 obs - 5 var = 10
Chi-square           0 (reference)                 158.49                            1079.76

These chi-square tests are "badness of fit" tests, where we want p > .05. Chi-square tests are seen as ultra-conservative in this literature. There is a ridiculous number of alternative fit statistics that will make you look better: estat gof, stats(all), and google your hearts out. Results differ slightly from the principal factor extraction method.

© Andrew Ho, Harvard Graduate School of Education
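Following the slide's own suggestion, a minimal sketch of how to request the broader menu of fit statistics after fitting the one-factor model with sem:

* One-factor model via sem (maximum likelihood), then the full set of
* fit statistics: likelihood-ratio chi-squares, RMSEA, CFI/TLI, SRMR, AIC/BIC.
sem (ETA1 -> D2 D5 D10 D11 D16), nocapslatent latent(ETA1) standardized
estat gof, stats(all)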

The sem Command in Stata 12+ (Graphical User Interface)

EFA is a special case of a universe of SEM models that can be specified either graphically or on the command line in Stata:

sem (ETA1 -> D2 D5 D10 D11 D16), nocapslatent latent(ETA1) standardized

The "constants" in the latent regression equations are often ignored. Because of the "standardized" option, the variances of the latent variable and of all indicators have been set to 1. The scale of the latent variable MUST be set somehow, and this is one way to do it (but not the only way). It is sometimes called the "unit variance constraint" or "unit variance identification" (UVI). We also often see "unit loading identification" (ULI), where the variance of the latent variable is estimated but a single loading is set to 1. The choice between UVI and ULI does not change model fit.

The estimated error variance for D2 is .78. Due to standardization, this can be interpreted as the proportion of variance in D2 left unexplained by the factor structure; it is called the "uniqueness" in the factor analytic literature and output.

The "factor loading," in this case standardized, is the estimated correlation between the factor and the observed variable. Two individuals who differ by one unit on the unobserved factor differ by 0.47 standard deviation units on D2: D*_2 = .47 η_1.

© Andrew Ho, Harvard Graduate School of Education
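A minimal sketch of the two identification strategies described above, using standard sem path syntax in which @1 fixes a coefficient or variance at 1 (treat this as a sketch, not a recipe; model fit is identical either way):

* Unit loading identification (ULI), made explicit: fix the D2 loading at 1
* and estimate the variance of the latent factor (this mirrors sem's default).
sem (ETA1 -> D2@1 D5 D10 D11 D16), nocapslatent latent(ETA1)

* Unit variance identification (UVI): fix the latent variance at 1 instead.
* (If your Stata version still anchors the first loading automatically,
* that loading may need to be freed explicitly.)
sem (ETA1 -> D2 D5 D10 D11 D16), nocapslatent latent(ETA1) var(ETA1@1)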

Interpreting Output Directly from the sem Command

EFA is a special case of a universe of SEM models that can be specified either graphically or on the command line in Stata:

sem (ETA1 -> D2 D5 D10 D11 D16), nocapslatent latent(ETA1) standardized

Because we capitalize our variable names (not conventional), we have to use the nocapslatent option; otherwise, Stata assumes capitalized variables are latent variables. The standardized option reports standardized output and is often used for reporting when scales are not meaningful. Note, however, that the model is fit to the covariance matrix, and the values are standardized afterward.

Endogenous variables are on the receiving end of the arrows: the outcome variables being "impacted." Exogenous variables are at the start of the arrows: the predictor variables "having an impact" (latent, in this case). The output lists all the loadings, error variances, and constants seen in the previous path diagram.

The output also reports the same chi-square "badness of fit" test from the previous slide. The degrees of freedom equal the 15 observations in the covariance matrix, K(K+1)/2, minus the 10 parameters estimated (5 loadings, 5 error variances), making a χ²(5) test with five df. Rejection means a significantly worse fit than the saturated (perfectly fitting) model: the unidimensional model implies a covariance (in this case, correlation) matrix whose fit is significantly worse than the saturated model's.

© Andrew Ho, Harvard Graduate School of Education

Building Models with SEM

Model 1: Maybe there are two CORRELATED factors underlying item responses on the teacher defensiveness scale.

Model 2: Maybe there are two correlated factors that have a "simple structure," where one factor explains responses on items where teachers take criticism, and one factor explains responses on items where teachers take compliments.

Model 3: If the simple structure model holds, how well can we predict your latent "ability to hear compliments" from your latent "ability to hear criticism"?

© Andrew Ho, Harvard Graduate School of Education

An SEM Progression: Step 1, Correlated Factors, Fully Crossed Loadings

sem (ETA1 ETA2 -> D2 D5 D10 D11 D16 D6 D13 D18 D21), ///
    nocapslatent latent(ETA1 ETA2) standardized

With nine indicators, there are 9(9+1)/2 = 45 observations; 45 observations - (18 loadings + 9 error variances + 1 factor correlation) = 17 degrees of freedom. The model does not fit as well as the saturated model, in the population.

© Andrew Ho, Harvard Graduate School of Education

An SEM Progression: Step 2, "Simple Structure" Confirmatory FA

sem (ETA1 -> D2 D5 D10 D11 D16) (ETA2 -> D6 D13 D18 D21), ///
    nocapslatent latent(ETA1 ETA2) standardized

Again, 9(9+1)/2 = 45 observations; 45 observations - (9 loadings + 9 error variances + 1 factor correlation) = 26 degrees of freedom. The model does not fit as well as the saturated model, in the population. The fit is also significantly worse than the fully crossed model: χ²(8) = 413.95. But we will continue as an illustration, or assuming other fit statistics are agreeable: estat gof, stats(all). (They aren't agreeable, but let's continue.)

© Andrew Ho, Harvard Graduate School of Education
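A minimal sketch of the nested-model comparison behind that chi-square difference (the stored-estimate names m_crossed and m_simple are illustrative):

* Fit and store the fully crossed model
sem (ETA1 ETA2 -> D2 D5 D10 D11 D16 D6 D13 D18 D21), ///
    nocapslatent latent(ETA1 ETA2) standardized
estimates store m_crossed

* Fit and store the simple-structure model
sem (ETA1 -> D2 D5 D10 D11 D16) (ETA2 -> D6 D13 D18 D21), ///
    nocapslatent latent(ETA1 ETA2) standardized
estimates store m_simple

* Likelihood-ratio (chi-square difference) test of the nested models
lrtest m_crossed m_simple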

An SEM Progression: Step 3, Latent Variable Regression

sem (ETA1 -> D2 D5 D10 D11 D16) (ETA2 -> D6 D13 D18 D21) ///
    (ETA1 -> ETA2), nocapslatent latent(ETA1 ETA2) standardized

Again, 9(9+1)/2 = 45 observations; 45 observations - (9 loadings + 9 error variances + 1 regression coefficient) = 26 degrees of freedom. The fit is equivalent to that of the last model, and the model still does not fit as well as the saturated model, in the population.

I encourage you to see fit as comparative, not absolute, and not to play the "SEM fit game," where we add and remove loadings, latent factors, and error covariances until we find a model that fits.

© Andrew Ho, Harvard Graduate School of Education

Comparing Correlations Between Observed and Latent Variables

[Path diagram comparing the observed and latent correlations; the correlation between the observed composites X*_1 and X*_2 is -.5891.]

Correlations between observed variables will always be "attenuated": smaller in magnitude and closer to 0. The sem command estimates the correlation between the latent variables and takes the lack of reliability of the indicators into account. This "disattenuates" the correlation, attempting to reinflate the observed correlation to account for the measurement error, as informed by the intercorrelations among the indicators.

© Andrew Ho, Harvard Graduate School of Education
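For reference, the classical correction-for-attenuation formula captures the same idea (a textbook approximation, not literally the computation sem performs; rel denotes score reliability):

r_{\text{latent}} \;\approx\; \frac{r_{\text{observed}}}{\sqrt{\mathrm{rel}(X_1)\,\mathrm{rel}(X_2)}}

So, for example, an observed correlation of about -.59 between two composites with hypothetical reliabilities of .80 each would correspond to a latent correlation of roughly -.59/.80 ≈ -.74.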

Comparing Correlations Between Observed and Latent Variables

[Path diagram: the same comparison as the previous slide, now for the regression of one composite on the other (observed correlation -.5891).]

Simple regression coefficients between observed variables will always be "attenuated": smaller in magnitude and closer to 0. In multiple regression, the bias due to measurement error will be unpredictable! The sem command estimates the regression coefficients between the latent variables and takes the lack of reliability of the indicators into account. This corrects the regression coefficients, attempting to account for the measurement error, as informed by the intercorrelations among the indicators. This seems important, no?

© Andrew Ho, Harvard Graduate School of Education