Presentation transcript:

Troubleshooting problems with SEM models that have "Heywood" cases, such as negative variance parameters and non-positive definite covariance matrices
Jeremy Yorgason, Brigham Young University

Introduction
In SEM it is fairly common to encounter improper solutions:
- Non-positive definite covariance matrices
- Models with negative variance terms
- A negative PSI matrix
- Correlations or other standardized values greater than 1
- A model that is not identified ("you need x additional constraints for it to be identified")
Why is this important?
- Results from models with these problems cannot be trusted and should not be reported in journal articles.
- Standard errors of the estimates may be affected (Chen et al., 2001).
Error messages are diagnostic tools:
- It is a good idea to confirm the diagnosis that the software is giving you.
- The point is to understand what may be going on with your model and data, which often requires looking at all of the output for your model.
- I have probably encountered 15 to 20 such situations in the past year.

Goals for this Segment of the Workshop
- How do I recognize the problem?
- How do I fix the problem?
- Examples

Causes of Improper Solutions in SEM

Causes of Improper Solutions in SEM
1. Specification error in the model
   - Missing a "1" on one of the factor loadings of a latent variable, or on an error term
   - Correlations of variables or errors running from the IV side to the DV side of the model
   - Excessive error correlations among indicators of a single latent variable
   - Very low factor loadings on a latent variable
   - Omitted paths that should be in the model
2. Model under-identified (negative degrees of freedom)
   - df = V(V+1)/2 minus the number of free parameters; if estimating means/intercepts, use V(V+3)/2 (see the sketch after this list)
3. Non-convergence
4. Outliers in the data
5. A sample that is too small for the model being estimated
(Kline, 2011; Kolenikov & Bollen, 2012; Chen et al., 2001; Newsom, 2012)
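As a quick arithmetic check for item 2, here is a minimal Python sketch of the degrees-of-freedom count described above. The 11-variable example is hypothetical; it is chosen only so the numbers match the Amos output quoted later in the workshop (77 moments, 44 parameters, 33 df).

```python
def model_df(n_observed, n_free_params, with_means=False):
    """Degrees of freedom for a covariance (or mean-and-covariance) structure model.

    n_observed    -- number of observed variables, V
    n_free_params -- number of freely estimated parameters
    with_means    -- True if means/intercepts are estimated
    """
    v = n_observed
    moments = v * (v + 3) // 2 if with_means else v * (v + 1) // 2
    return moments - n_free_params

# Hypothetical example: 11 observed variables with means estimated gives
# 11 * 14 / 2 = 77 distinct sample moments; with 44 free parameters,
# df = 77 - 44 = 33.  A negative value means the model is under-identified.
print(model_df(n_observed=11, n_free_params=44, with_means=True))  # 33
```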

Causes of Improper Solutions in SEM (continued)
6. Missing data
7. "Sampling fluctuations"
8. Latent variables with only two indicators (this includes 2nd-order latent variables)
9. Non-normally distributed outcome or indicator variables in the model (categorical; count, zero-inflated, etc.)
10. Empirical under-identification: "Positive degrees of freedom, but there is insufficient covariance information in a portion of the model for the computer to generate valid estimates" (Newsom, 2012). This may be caused by some of the issues above.
(Kline, 2011; Kolenikov & Bollen, 2012; Chen et al., 2001; Newsom, 2012)

Signs that there is a problem
Amos:
- "XX: Default Model"
- "The following variances are negative."
- "This solution is not admissible."
- "The model is probably unidentified. In order to achieve identifiability, it will probably be necessary to impose 1 additional constraint."
- In place of estimates in the Amos output you see "unidentified"

Signs that there is a problem
Mplus:
THE MODEL ESTIMATION TERMINATED NORMALLY
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS -0.762D-17. PROBLEM INVOLVING PARAMETER 59.
MODIFICATION INDICES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED.

More signs that there is a problem
Mplus or other programs:
- A negative variance estimate (remember, variance = standard deviation squared); find it in your output
- Correlations above 1 (remember, a correlation can't be larger than 1)
- An error variance that is really BIG (999 usually indicates a problem in Mplus, although this is fine if something is "constrained" to be that number)

How can I fix these problems?
1. Look at a diagram of your model and see if you have mis-specified it.
   - Check your syntax (e.g., look for missing semicolons in Mplus)
   - Missing "1" for a factor loading on a latent variable
   - Missing "1" on the regression path of an error term
   - Sometimes Amos creates "ghost" variables. You can't see them, but they are there! They are sometimes off the screen, sometimes really, really, really small.
   - Sometimes Amos will double-correlate variables.
   - Any correlations across IV/DV lines? Careful: this is something your modification indices will suggest to improve model fit. However, don't ever add parameters that go against theory.
   - Make sure you have appropriate regression paths in the model (not too few, in this case)
   - Make sure your measurement model is appropriate: factor loadings > .40; error correlations: start with none, since correlation between items is captured by the latent variables. Typically you'll use modification indices here.

How can I fix these problems?
2. Be attentive to model problems when there are latent variables with only two indicators (these can be unstable). Newsom (2012) suggests constraining the two factor loadings to be equal.
3. Caution is also warranted when estimating "higher order" latent variables with only two factors, and with certain complex models (e.g., common fate models) that require specific constraints in order for the model to be identified.

How can I fix these problems?
4. Either use a large sample, OR check the sample size and compare it with the number of parameters being estimated (see the sketch below for both checks).
   - N/q rule (N = sample size, q = parameters in the model; Kline, 2011). Count the variances, covariances, and means, OR note that most programs tell you how many parameters are in your model. In Amos, for example:
     Number of distinct sample moments: 77
     Number of distinct parameters to be estimated: 44
     Degrees of freedom (77 - 44): 33
   - Quick check: 10 people in the sample for every observed (rectangle) variable in the model
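A minimal Python sketch of the two sample-size checks mentioned above; the numbers in the usage line are hypothetical and only illustrate the arithmetic.

```python
def sample_size_checks(n, q, n_observed):
    """Two rough diagnostics: the N/q ratio (Kline, 2011) and the quick check
    of 10 cases per observed (rectangle) variable in the model."""
    return {
        "N/q ratio": round(n / q, 1),
        "10 per observed variable": n >= 10 * n_observed,
    }

# Hypothetical example: 56 cases, 44 free parameters, 11 observed variables.
print(sample_size_checks(n=56, q=44, n_observed=11))
# {'N/q ratio': 1.3, '10 per observed variable': False}
```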

How can I fix these problems?
5. If your model looks to be specified correctly but you still have a problem, it's time to start looking at your data.
   - Run a frequency on all variables in the model to see if there is a data entry error or outliers that could be inflating the variance of one or more variables.
   - Side note: sometimes SEM models have trouble with variables whose variances are very different (larger or smaller) from the rest of the variables in your model (e.g., income in dollars). If this is the case, rescale or transform these variables to ensure similar variances (see the sketch below).
   - Also, in the transfer of data from one program to another, sometimes columns of data are shifted or otherwise corrupted.
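A small data-screening sketch along the lines of point 5, assuming pandas and a hypothetical file model_data.csv with hypothetical column names (de1 ... income).

```python
import pandas as pd

df = pd.read_csv("model_data.csv")          # hypothetical file and columns
model_vars = ["de1", "de2", "de3", "income"]

# Frequencies and summary statistics catch data-entry errors and outliers
# that can inflate the variance of a variable.
print(df[model_vars].describe())

# Compare variances: a variable on a very different scale (e.g., income in
# dollars) can destabilize estimation relative to Likert-type indicators.
print(df[model_vars].var())

# Rescale rather than drop: income in thousands of dollars keeps the same
# information but brings its variance closer to the other variables.
df["income_k"] = df["income"] / 1000
```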

How can I fix these problems?
6. Do you have any categorical or non-normally distributed dependent variables that are specified as continuous?
   - Amos doesn't handle dichotomous, count, or zero-inflated outcomes.
   - Mplus does handle them well, but you have to specify in the syntax that you are working with such distributions.
   - You may have specified non-normal variable distributions correctly but have small cell sizes (e.g., an ordered categorical variable with only 1 or 2 cases on one end of the distribution); see the sketch below.
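A short sketch of the cell-size check for point 6, again with a hypothetical pandas data frame and a hypothetical ordinal variable; collapsing sparse end categories is one common workaround, not a universal rule.

```python
import pandas as pd

df = pd.read_csv("model_data.csv")              # hypothetical file/variable

# Frequency table for an ordered categorical outcome: categories with only
# one or two cases at either end can cause estimation problems even when
# the variable is correctly declared categorical.
counts = df["life_satisfaction"].value_counts().sort_index()
print(counts)
print("sparse categories:", counts[counts < 5].index.tolist())

# One possible remedy (assuming numeric 1-5 coding): collapse the sparse
# top category into the one below it before estimation.
df["life_sat_collapsed"] = df["life_satisfaction"].clip(upper=4)
```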

How can I fix these problems?
7. If your model does not "converge," it means the program went through X iterations but could not find a suitable solution. You can increase the number of iterations from the default to try to estimate your model. If this doesn't work, you probably need to change your model, or you have a data problem.

Atypical Solutions: Start Values and Iterations
8. A start value is a number assigned to each estimated parameter when "iterations" begin for a model. Amos and Mplus automatically create start values for each parameter to be estimated, yet it is possible to assign start values yourself if the program-assigned ones don't work. Researchers can provide start values from essentially any known parameter estimate (e.g., a regression weight or coefficient). You can get these by running a simple linear regression with the variables in your model and then plugging the coefficient from the simpler model into the SEM (see the sketch below).
   - How in the world would I know if I have bad start values?
   - How would I know which variable to look at that might be non-normally distributed, or be categorical with small cell sizes?
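One way to obtain user-supplied start values, as suggested in point 8: fit the simplest possible model for a given path and reuse its coefficient. A minimal sketch with hypothetical file and variable names (stress, depress):

```python
import numpy as np
import pandas as pd

df = pd.read_csv("model_data.csv")          # hypothetical file and columns

# A bivariate regression of the outcome on the predictor gives a reasonable
# start value for the corresponding structural path in the SEM.
slope, intercept = np.polyfit(df["stress"], df["depress"], deg=1)
print(f"start value for the stress -> depress path: {slope:.3f}")

# In Mplus syntax the value would typically be supplied with an asterisk,
# e.g.  depress ON stress*0.45;  where 0.45 is the coefficient found above.
```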

Greek Alphabet and Mplus Output
- Nu (Ν/ν) = intercepts or means of observed variables
- Lambda (Λ/λ) = factor loadings
- Theta (Θ/θ) = error variances and covariances
- Alpha (Α/α) = means and intercepts of latent variables
- Beta (Β/β) and Gamma (Γ/γ) = regression coefficients
- Psi (Ψ/ψ) = residual variances and covariances of continuous latent variables
- Tau (Τ/τ) = thresholds of categorical observed variables
- Delta (Δ/δ) = scaling information for observed dependent variables
- Etc.: see the Mplus manual
Ask for TECH1 in the output. Then, when Mplus says there is a problem with, for example, parameter #16, find that parameter, see which matrix it is in, identify the variable, and go look at the model/data to see where the problem is (see the lookup sketch below). If no variable is identified, you need to go back to model specification. CAUTION: specific parameter warnings are usually a DECOY! They generally are simply letting you know the model is not correctly specified; no matter what you do to the identified variable, it will not make your model work.
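A small lookup helper summarizing the matrix names above, useful when working backward from a "problem involving parameter #16" message via TECH1; the one-line descriptions of Beta versus Gamma are my gloss on "regression coefficients."

```python
# The parameter matrices Mplus reports in TECH1, keyed by the names used
# in the output (see the Mplus manual for the full list).
MPLUS_MATRICES = {
    "NU":     "intercepts/means of observed variables",
    "LAMBDA": "factor loadings",
    "THETA":  "error variances and covariances of observed variables",
    "ALPHA":  "means and intercepts of latent variables",
    "BETA":   "regression coefficients (latent on latent)",
    "GAMMA":  "regression coefficients (latent on observed covariates)",
    "PSI":    "residual variances and covariances of latent variables",
    "TAU":    "thresholds of categorical observed variables",
    "DELTA":  "scaling information for observed dependent variables",
}

def describe_matrix(name: str) -> str:
    """After TECH1 shows which matrix a flagged parameter number sits in,
    look up what that matrix contains."""
    return MPLUS_MATRICES.get(name.upper(), "unknown matrix; see the Mplus manual")

print(describe_matrix("psi"))   # residual variances and covariances of latent variables
```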

Examples: "Message of Death!"
From a class assignment with a model involving 56 cases. Mplus error:
THE MODEL ESTIMATION TERMINATED NORMALLY
THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES MAY NOT BE TRUSTWORTHY FOR SOME PARAMETERS DUE TO A NON-POSITIVE DEFINITE FIRST-ORDER DERIVATIVE PRODUCT MATRIX. THIS MAY BE DUE TO THE STARTING VALUES BUT MAY ALSO BE AN INDICATION OF MODEL NONIDENTIFICATION. THE CONDITION NUMBER IS 0.383D-13. PROBLEM INVOLVING PARAMETER 31. THIS IS MOST LIKELY DUE TO HAVING MORE PARAMETERS THAN THE SAMPLE SIZE IN ONE OF THE GROUPS.
WARNING: THE RESIDUAL COVARIANCE MATRIX (THETA) IN GROUP GRAD IS NOT POSITIVE DEFINITE. THIS COULD INDICATE A NEGATIVE VARIANCE/RESIDUAL VARIANCE FOR AN OBSERVED VARIABLE, A CORRELATION GREATER OR EQUAL TO ONE BETWEEN TWO OBSERVED VARIABLES, OR A LINEAR DEPENDENCY AMONG MORE THAN TWO OBSERVED VARIABLES. CHECK THE RESULTS SECTION FOR MORE INFORMATION. PROBLEM INVOLVING VARIABLE DE4.

Atypical Solutions: Sampling Fluctuations
Sampling fluctuation is a plausible explanation when:
- The model is specified correctly
- You don't have outliers in your data, and you have a large enough sample to estimate the model at hand
- The model appears not to be identified, even though you have positive degrees of freedom
9. Possible tests to confirm that you have sampling fluctuations and not some other problem (see the sketch below for the first three):
- A confidence interval built from the standard error includes zero
- Calculate a z by taking the ratio Estimate / Standard Error, and compare it to a z distribution
- Wald test: take the ratio (Estimate / Standard Error)^2 and compare it to a chi-square distribution with 1 df
- Likelihood ratio test statistic
- Lagrange multiplier test (modification indices when the variance is constrained to 0)
- Bootstrap resampling methods (especially with non-normal data)
- Scaled chi-square difference test
- Signed root tests
- Empirical sandwich estimators
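A sketch of the first three checks in the list above (confidence interval, z ratio, Wald test), assuming SciPy is available; the estimate and standard error in the usage line are made up for illustration.

```python
from scipy import stats

def sampling_fluctuation_tests(estimate, se, alpha=0.05):
    """CI, z ratio, and Wald test for a single parameter (e.g., a negative
    residual variance).  Non-significant results are consistent with the
    value differing from zero only by sampling fluctuation."""
    z = estimate / se
    wald = z ** 2
    crit = stats.norm.ppf(1 - alpha / 2)
    return {
        "z": round(z, 3),
        "two-sided p (z)": round(2 * (1 - stats.norm.cdf(abs(z))), 4),
        "Wald": round(wald, 3),
        "p (chi-square, 1 df)": round(1 - stats.chi2.cdf(wald, df=1), 4),
        "95% CI includes zero": abs(z) < crit,
    }

# Hypothetical example: a residual variance of -0.08 with a standard error of 0.11.
print(sampling_fluctuation_tests(estimate=-0.08, se=0.11))
```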

Atypical Solutions: Sampling Fluctuations
If the model is specified correctly, you don't have outliers in your data, you have a large enough sample to estimate the model at hand, and the model still appears not to be identified even though you have positive degrees of freedom:
- Fix the negative variance to 0 or to a small positive number
- Note that this can affect other model parameters

Handout
Chen et al. (2001) suggested decision tree (see the sketch below):
1. Is your model identified?
2. If so, do you have any negative error variances?
3. If so, do you have any outliers that are a problem?
4. If not, is the model empirically under-identified?
5. If not, do you have sampling fluctuations?
6. If so, constrain the negative variance to be 0, a small positive number, or the population variance.
Newsom (2012) prevention tips:
- Careful specification
- Use larger samples
- Model factors with 3 or more indicators
- Use reliable measures (high loadings)
- Well-conditioned data
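Purely as a summary device, here is the handout's decision tree written as a plain Python function; it is a sketch of the steps listed above, not code from Chen et al. (2001), and the final fall-through message is my own gloss.

```python
def chen_decision_tree(identified, negative_error_variance, problem_outliers,
                       empirically_underidentified, sampling_fluctuation):
    """Walk through the six decision steps summarized on the handout."""
    if not identified:
        return "1. Model is not identified: respecify or add constraints first."
    if not negative_error_variance:
        return "2. No negative error variances: no Heywood case to fix."
    if problem_outliers:
        return "3. Deal with the problem outliers, then re-estimate."
    if empirically_underidentified:
        return "4. Empirical under-identification: revise the model or measures."
    if sampling_fluctuation:
        return ("5-6. Sampling fluctuation: constrain the negative variance to 0, "
                "a small positive number, or the population variance.")
    return "None of the above: revisit model specification."

print(chen_decision_tree(True, True, False, False, True))
```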

Working Example
See the Amos program. Depending on time, we will manipulate an example to show what errors commonly occur, what the program tells you, and how to fix the problems.

Conclusion
Either work with perfect data and perfect models, OR learn to interpret SEM error messages and how to fix common problems.

References
Chen, F., Bollen, K. A., Paxton, P., Curran, P., & Kirby, J. (2001). Improper solutions in structural equation models: Causes, consequences, and strategies. Sociological Methods and Research, 29, 468-508.
Kline, R. B. (2011). Principles and practice of structural equation modeling (3rd ed.). New York, NY: Guilford Press.
Kolenikov, S., & Bollen, K. A. (2012). Testing negative error variances: Is a Heywood case a symptom of misspecification? Sociological Methods and Research, 41, 124-167.