General Structural Equations (LISREL)


General Structural Equations (LISREL) Week 3 #5 Missing Data

Missing Data
Old approaches: LISTWISE DELETION
- The default in PRELIS when forming a covariance matrix
- Not the default in AMOS: you must "manually" listwise delete cases if this is desired

Missing Data
LISTWISE DELETION
Not the default in AMOS: you must "manually" listwise delete cases if this is desired. In SPSS:

  * Count the missing values on V1-V99 for each case, then keep complete cases only.
  COMPUTE NMISS=0.
  RECODE V1 TO V99 (MISSING, SYSMIS = -999).
  DO REPEAT XX = V1 TO V99.
    IF (XX EQ -999) NMISS = NMISS + 1.
  END REPEAT.
  SELECT IF (NMISS EQ 0).
  SAVE OUTFILE = .... [specifications]

Missing Data
What are the major issues with listwise deletion?
- Inefficient (loses cases)
- Biased if the pattern of "missingness" is not MAR (MAR: prob(Y missing) unrelated to Y after controls for X) [under certain assumptions, it can still be consistent]
Why would we want to use LISTWISE DELETION with AMOS, when it offers a "high tech" solution (FIML estimation) as its default?
- Better model diagnostics are available when FIML is not invoked

Missing Data
Pairwise deletion
- Some debate about whether it is appropriate at all
- Works better when correlations are low; worse when correlations are high
- Problem of determining the appropriate N ("minimum pairwise"?)
- Can produce non-positive-definite matrices (see the sketch below)
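
A constructed illustration of that last point (not from the slides; numpy assumed): each pairwise correlation below could arise in its own subsample, yet together they form a matrix that no complete dataset could produce.

  import numpy as np

  # Each correlation is legitimate on its own, but jointly impossible:
  # if X1 correlates +.9 with both X2 and X3, then X2 and X3 cannot
  # correlate -.9. Pairwise deletion can assemble exactly this matrix.
  R = np.array([[ 1.0,  0.9,  0.9],
                [ 0.9,  1.0, -0.9],
                [ 0.9, -0.9,  1.0]])
  print(np.linalg.eigvalsh(R))   # a negative eigenvalue: not positive definite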

Missing Data
Pairwise deletion
- An option in PRELIS (check box)
- Most stats packages will produce pairwise-deleted covariances:
    SPSS: /MISSING=PAIRWISE
    SAS: the default in PROC CORR [PROC CORR COV; VAR ....{var list}]
- For AMOS, pass the "matrix file" from SPSS instead of a regular SPSS data file

Missing Data
Two terrible approaches (1): Mean substitution
(an option in some SPSS procedures; easy to implement manually in all stats packages)
- Deflates variances
- Deflates covariances: the imputed value is a constant a, and cov(a, X) = 0
- Converts a normal distribution into a distribution with a "spike" at the mean
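
A minimal sketch of the variance deflation (simulated data, numpy assumed): with 30% of values mean-imputed, the sample variance drops by roughly 30%.

  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.normal(loc=50, scale=10, size=1000)
  x_miss = x.copy()
  x_miss[rng.random(1000) < 0.3] = np.nan        # 30% missing completely at random
  x_imp = np.where(np.isnan(x_miss), np.nanmean(x_miss), x_miss)
  print(np.var(x, ddof=1))       # ~100
  print(np.var(x_imp, ddof=1))   # ~70: deflated by about the missing fraction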

Missing Data
Two terrible approaches (2): Regression prediction

  X1   X2   X3
   .    3    5   <- X1-hat = b0 + b1*X2 + b2*X3
   2    8    9
   6   12   89
  etc.

Problem: the R-square of X1 with X2, X3 is perfect for the imputed cases (there is no error term), which inflates covariances. In the population, Var(X1) = b1^2 Var(X2) + Var(e), but the Var(e) term is omitted for the imputed cases.
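
A minimal sketch of that inflation (simulated data, numpy assumed): imputing the regression prediction with no error term added pushes the X1-X2 correlation above its true value.

  import numpy as np

  rng = np.random.default_rng(1)
  n = 2000
  x2 = rng.normal(size=n)
  x1 = 0.5 * x2 + rng.normal(size=n)             # true corr(X1, X2) ~ 0.45
  miss = rng.random(n) < 0.4                     # 40% of X1 missing
  b1, b0 = np.polyfit(x2[~miss], x1[~miss], 1)   # fit on complete cases
  x1_imp = x1.copy()
  x1_imp[miss] = b0 + b1 * x2[miss]              # predicted values, Var(e) omitted
  print(np.corrcoef(x1, x2)[0, 1])               # full-data correlation
  print(np.corrcoef(x1_imp, x2)[0, 1])           # inflated correlation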

Missing Data
Regression prediction
- Not quite so bad if the prediction comes from "outside the model" (but then one must argue that the "predictors" are irrelevant to the model)
ANOTHER APPROACH: REWEIGHTING
- Provide a series of weights so that the distribution more closely represents the full sample

EM Algorithm
Expectation/Maximization

  X1  X2  X3  X4
   .  18  22  15
   .  23  19   8
   .  12  12   .
  12   .  16   4
  23   1   2   4
  38  16  12   5
   .  22   .   5

We want a variance-covariance matrix Σ and mean vector z that are based on the "complete" dataset.

E-STEP: get start values for Σ and z (listwise or pairwise deletion is OK here). Compute regression coefficients for all needed subsets:
- Cases 1 & 2: X1 = b0 + b1*X2 + b2*X3 + b3*X4, basing the calculation of these coefficients on all cases for which data on X1, X2, X3 and X4 are available
- Case 3: X1 = b0 + b1*X2 + b2*X3, based on all cases for which data on X1, X2 and X3 are available; also X4 = b0 + b1*X2 + b2*X3
- Case 7: X1 = b0 + b1*X2 + b2*X4

Imputed cases:

  X1   X2  X3  X4
  x1*  18  22  15
  x1*  23  19   8
  (* = hat, i.e., predicted)

M-STEP: recalculate the means and covariances
- Means: usual formula
- Variances: add the residual back in, e.g. Var(X1) = b1^2 Var(X2) + Var(e1), where the Var(e1) term is added in rather than omitted
- Use the new z, Σ to recalculate the imputations
- Continue E/M steps until convergence
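
A compact sketch of these E/M steps for a multivariate normal sample (numpy assumed; a teaching illustration of the algorithm just described, not the PRELIS implementation):

  import numpy as np

  def em_mvnorm(X, n_iter=200, tol=1e-6):
      """ML estimates of the mean vector and covariance matrix of a
      multivariate normal sample with missing values coded as NaN."""
      X = np.asarray(X, dtype=float)
      n, p = X.shape
      miss = np.isnan(X)
      mu = np.nanmean(X, axis=0)                 # start values from available cases
      sigma = np.diag(np.nanvar(X, axis=0))
      for _ in range(n_iter):
          T1, T2 = np.zeros(p), np.zeros((p, p))
          for i in range(n):
              m, o = miss[i], ~miss[i]
              x, C = X[i].copy(), np.zeros((p, p))
              if m.any():
                  # E-step: regression-style conditional mean of the missing
                  # entries given the observed ones, plus the conditional
                  # (residual) covariance of the missing entries
                  B = sigma[np.ix_(m, o)] @ np.linalg.inv(sigma[np.ix_(o, o)])
                  x[m] = mu[m] + B @ (X[i, o] - mu[o])
                  C[np.ix_(m, m)] = sigma[np.ix_(m, m)] - B @ sigma[np.ix_(o, m)]
              T1 += x
              T2 += np.outer(x, x) + C           # residual variance added back in
          # M-step: recompute means and covariances from the sufficient statistics
          mu_new, sigma_new = T1 / n, T2 / n - np.outer(T1 / n, T1 / n)
          done = np.max(np.abs(sigma_new - sigma)) < tol
          mu, sigma = mu_new, sigma_new
          if done:
              break
      return mu, sigma

Iterating the imputation and re-estimation in this way converges to the maximum-likelihood z and Σ for the observed data.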

EM Algorithm
Advantages:
- Full-information estimation
- Also imputes cases
- Can estimate the asymptotic covariance matrix for ADF estimation
Disadvantages:
- Assumes a normal distribution (one could transform the data first)
- Assumes continuous distributions
- When the results are input into other programs, standard errors are biased, usually downward

EM Algorithm: Implementation
PRELIS will use the EM algorithm to construct a "corrected" covariance matrix (and/or mean vector). Syntax: EM IT=200 (200 iterations).

PRELIS syntax:

  (title)
  SY='E:\Classes\ICPSR2004\Week3Examples\RelSexData\File1.PSF'
  EM CC = 0.00001 IT = 200 TC = 2    <- new line
  OU MA=CM XT XM

EM Algorithm: Implementation
Interactive PRELIS: Statistics → Multiple Imputation
(for the check box at the bottom for cases with all values missing, it is probably better to select "delete cases")

EM Algorithm: Implementation
Issue: if you want to select variables or cases (as opposed to constructing a covariance matrix from the entire file), you cannot exit the dialogue box with the "select" commands intact (you must either run or cancel). [Information on whether this is still an issue in version 8.7 is not presently available.] Worse: case selection commands placed in PRELIS syntax are ignored if imputation is performed.

Solution: select the variables and cases you want with the stat package first. SPSS:

  SELECT IF (V2 EQ 11).
  SAVE OUTFILE='e:\classes\ICPSR2004\Week3Examples\MissingData\RelSexUSA.sav'
    /KEEP=V9 V147 V175 V176 V304 TO V310 V355 V356 SEX.

EM Algorithm: Implementation
Steps in LISREL/PRELIS:
1. File → Import external data in other formats
2. Define variables; remember to define variables as continuous unless another option is required (Variable type)
3. Statistics → Multiple Imputation
4. Select output options, then specify the location for the matrices
5. Return to the Multiple Imputation menu, then RUN

EM Algorithm: Implementation

  EM Algorithm for missing data:
  Number of different missing-value patterns = 67
  Convergence of EM algorithm in 4 iterations
  -2 Ln(L) = 92782.24999
  Percentage missing values = 2.25
  Estimated Means: [...]

  BEFORE CORRECTION: Total Sample Size = 1839

  Number of Missing Values    0     1    2    3    4    5    6    7    8
  Number of Cases          1484   264   55   12    9    4    7    3    1

  Listwise Deletion: Total Effective Sample Size = 1484

EM Algorithm: Implementation

Estimated Covariances, EM estimation:

           V9      V147     V175     V176     V304
  V9     0.8074
  V147   1.3026   6.5725
  V175   0.3063   0.7525   0.4854
  V176  -1.6002  -3.4273  -1.0734   6.8036
  V304   0.3912   1.0714   0.2624  -1.3650   2.9791

Covariance Matrix, no correction:

           V9      V147     V175     V176     V304     V305
  V9     0.813
  V147   1.338    6.494
  V175   0.310    0.755    0.476
  V176  -1.626   -3.474   -1.088    6.721
  V304   0.400    1.056    0.286   -1.496    2.895
  V305   0.461    0.980    0.270   -1.436    1.336    3.509

Comparison
Factor loadings, estimate / (s.e.) / t:

First panel:
            ETA 1                     ETA 2
  v9        1.000                     - -
  v147      2.212 (0.080)  27.649     - -
  v175      0.657 (0.024)  27.581     - -
  v176     -3.262 (0.098) -33.348     - -
  v304      - -                       1.000
  v305      - -                       1.062 (0.060) 17.763
  v307      - -                       2.335 (0.122) 19.136
  v308      - -                       1.736 (0.094) 18.564

Second panel:
            ETA 1                     ETA 2
  v9        1.000                     - -
  v147      2.195 (0.086)  25.504     - -
  v175      0.657 (0.026)  25.353     - -
  v176     -3.305 (0.107) -31.011     - -
  v304      - -                       1.000
  v305      - -                       1.046 (0.063) 16.601
  v307      - -                       2.185 (0.119) 18.309
  v308      - -                       1.646 (0.092) 17.865

GAMMA Regular:

             v355      v356      sex
  ETA 1    -0.006     0.035     0.177
           (0.001)   (0.009)   (0.038)
           -4.962     3.959     4.594
  ETA 2    -0.008     0.096     0.011
           (0.002)   (0.013)   (0.051)
           -5.376     7.602     0.209

GAMMA EM:

             v355      v356      sex
  ETA 1    -0.006     0.034     0.175
           (0.001)   (0.009)   (0.035)
           -5.582     3.966     5.042
  ETA 2    -0.007     0.100    -0.021
           (0.001)   (0.012)   (0.043)
           -5.555     8.610    -0.477

EM algorithm: SAS implementation
PROC MI:

  PROC MI DATA=file OUT=file2;
    EM OUTEM=file3;        /* EM estimates of the means and covariances */
  RUN;

  PROC CALIS DATA=file3 COV MOD;
    LINEQS
      [regular SAS CALIS specification];
  RUN;

Hot deck / nearest neighbor (PRELIS only)

  X1  X2  X3  X4
   .   2   8  16
   .   3   1   9
   2   8  29  32
   1   5   6  13
   2   9   2   3
   6   .   1   4

For the 1st case, look for the closest case that does not have missing values for X2, X3, X4: here that is the 4th case (1 5 6 13), so impute from this case; hence X1 for case 1 will be 1. (A sketch of this logic follows below.)
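
A rough sketch of the matching logic (numpy assumed; simplified relative to PRELIS: plain Euclidean distance on the observed variables, complete cases only as donors, no tie handling):

  import numpy as np

  def hot_deck(X):
      """Fill each incomplete row from its nearest fully observed 'donor' row,
      matching on the variables the incomplete row does observe."""
      X = np.asarray(X, dtype=float)
      out = X.copy()
      donors = X[~np.isnan(X).any(axis=1)]       # complete cases only
      for i in np.where(np.isnan(X).any(axis=1))[0]:
          obs = ~np.isnan(X[i])
          d = ((donors[:, obs] - X[i, obs]) ** 2).sum(axis=1)
          out[i, ~obs] = donors[np.argmin(d), ~obs]
      return out

  X = np.array([[np.nan, 2, 8, 16],
                [np.nan, 3, 1, 9],
                [2, 8, 29, 32],
                [1, 5, 6, 13],
                [2, 9, 2, 3],
                [6, np.nan, 1, 4]])
  print(hot_deck(X)[0, 0])                       # 1.0, donated by the 4th case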

Nearest neighbor
- Matching variables: more accurate if all cases are non-missing on them
- Worst case: no non-missing match exists
- A special problem arises if variables take a small # of discrete values (next slide)

Variables with a small # of discrete values

  X1  X2  X3  X4
   .   1   2   4
   2   2   1   5
   3   1   2   5
   1   2   2   4   <- ties!
   5   2   2   4   <-
   0   1   3   4   <-

Impute with the average of the X1 values across the tied donor cases. BUT: what if the standard deviation of those donor values is not much less than the overall standard deviation of X1? Then the imputation almost reduces to imputing the mean of X1. PRELIS computes the "variance ratio" (the variance of the donor values relative to the overall variance of X1), and the imputation "fails" if this ratio is too large (usually .5 or .7; can be adjusted). (See the sketch below.)
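
A sketch of the tie-averaging rule with the variance-ratio check, as I read this slide (numpy assumed; the 0.5 bound is the usual default mentioned above):

  import numpy as np

  def impute_from_ties(donor_x1, x1_observed, ratio_bound=0.5):
      """Average the tied donors' X1 values, but let the imputation 'fail'
      when the donors vary almost as much as X1 itself."""
      ratio = np.var(donor_x1) / np.var(x1_observed)
      if ratio > ratio_bound:
          return np.nan                          # imputation fails: leave missing
      return np.mean(donor_x1)

  # Donor X1 values from the three tied rows above, vs. all observed X1 values:
  print(impute_from_ties(np.array([1, 5, 0]), np.array([2, 3, 1, 5, 0])))  # nan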

Nearest neighbor
Advantages:
- Get a "full" data set to work with
- May be superior for non-normal data
Disadvantages:
- Deflated standard errors (imputed values are treated as "real" by the estimating software)
- "Failure to impute" a large proportion of the missing values is a common outcome with survey data

Multiple Group Approach
Allison, Sociological Methods & Research, 1987; Bollen, p. 374 (uses old LISREL matrix notation)

Multiple Group Approach
Note: 13 elements of the matrix have "pseudo" values, hence 13 df.

Multiple group approach
Disadvantage: works only with a relatively small number of missing-data patterns.

FIML (also referred to as "direct ML")
- Available in AMOS and in LISREL
- The AMOS implementation is fairly easy to use (check off "means and intercepts," input data with missing cases, and... voila!)
- The LISREL implementation is a bit more difficult: you must input raw data from PRELIS into LISREL
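
For reference, the casewise ("direct") log-likelihood that FIML maximizes: each case contributes a normal density over only the variables it actually observes. This is the standard textbook expression, not taken from the slides:

  \log L(\mu,\Sigma) = \sum_{i=1}^{N} \Big[ -\tfrac{1}{2}\log|\Sigma_i|
      - \tfrac{1}{2}(x_i-\mu_i)'\,\Sigma_i^{-1}(x_i-\mu_i) \Big] + \text{const}

where x_i contains case i's observed variables, and \mu_i and \Sigma_i are the corresponding subvector and submatrix of the model-implied mean vector and covariance matrix. No cases are deleted and no values are imputed.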


(end)