General Structural Equations (LISREL)


1 General Structural Equations (LISREL)
Week 3 #5 Missing Data

2 Missing Data Old approaches: LISTWISE DELETION
The default in PRELIS when forming a covariance matrix.
Not the default in AMOS: you must “manually” listwise-delete cases if this is desired.

3 Missing Data LISTWISE DELETION
Not the default in AMOS: must “manually” listwise-delete cases if this is desired.
SPSS:
  COMPUTE NMISS=0.
  RECODE V1 TO V99 (MISSING=-999).
  DO REPEAT XX = V1 TO V99.
    IF (XX EQ -999) NMISS=NMISS+1.
  END REPEAT.
  SELECT IF (NMISS EQ 0).
  SAVE OUTFILE = …. [specifications]
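The same listwise screen can be sketched in Python; a case is kept only if none of its analysis variables is missing. A minimal sketch with hypothetical variable names and data:

```python
# Listwise deletion: drop any case with a missing value on any
# analysis variable (None stands in for SPSS sysmis).
rows = [
    {"V1": 3, "V2": 5,    "V3": 2},
    {"V1": 4, "V2": None, "V3": 1},    # missing V2 -> dropped
    {"V1": 2, "V2": 6,    "V3": None}, # missing V3 -> dropped
]

complete = [r for r in rows if all(v is not None for v in r.values())]
print(len(complete))  # -> 1: only the fully observed case survives
```

This is exactly what the SPSS NMISS counter above accomplishes, and it illustrates the cost: two of three cases are discarded for a single missing value each.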

4 Missing Data What are the major issues with listwise deletion?
Why would we want to use LISTWISE DELETION with AMOS when it offers a “high-tech” solution (FIML estimation) as its default?
Better model diagnostics are available when FIML is not invoked.
Major issues with listwise deletion:
Inefficient (loses cases)
Biased if the pattern of “missingness” is not MAR (MAR: prob(Y missing) is unrelated to Y after controlling for X)
[under certain assumptions, could still be consistent]

5 Missing Data Pairwise deletion
Some debate about whether it is appropriate at all.
Better when correlations are low; worse when correlations are high.
Problem of determining the appropriate N (“minimum pairwise”?).
Can produce non-positive-definite matrices.

6 Missing Data Pairwise deletion An option in PRELIS (check box)
Most stats packages will produce pairwise-deleted covariances:
  SPSS: /MISSING=PAIRWISE
  SAS: the default in PROC CORR [PROC CORR COV; VAR ….{var list}]
For AMOS, pass a “matrix file” from SPSS instead of a regular SPSS data file.
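Pairwise deletion can be sketched as follows: each covariance is computed from whatever cases happen to have both variables present, so every cell of the matrix can rest on a different N, which is exactly why a single “appropriate N” is hard to define. Data are hypothetical:

```python
# Pairwise-deleted covariance: use only cases where BOTH variables
# are observed (None marks a missing value).
def pairwise_cov(x, y):
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)  # note: n differs from cell to cell of the matrix
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    return sum((a - mx) * (b - my) for a, b in pairs) / (n - 1)

x = [1.0, 2.0, None, 4.0, 5.0]
y = [2.0, None, 3.0, 5.0, 6.0]
print(pairwise_cov(x, y))  # -> 4.333... based on only 3 complete pairs
```

Because each cell mixes different subsamples, the assembled matrix need not be positive definite, which is the failure mode noted on the previous slide.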

7 Missing Data Two terrible approaches:
Mean substitution (an option in some SPSS procedures; easy to implement manually in all stats packages)
Deflates variances
Deflates covariances (cov(a, X) = 0 for any constant a)
Converts a normal distribution into a distribution with a “spike” at the mean
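The variance deflation is easy to demonstrate: imputed values sit exactly at the mean and so contribute zero squared deviation while still inflating the denominator. A minimal sketch with hypothetical data:

```python
import statistics

x = [2.0, 4.0, 6.0, 8.0, None, None]   # two missing cases
observed = [v for v in x if v is not None]
m = statistics.mean(observed)          # 5.0
filled = [v if v is not None else m for v in x]

print(statistics.variance(observed))   # 6.666... on the observed cases
print(statistics.variance(filled))     # 4.0 after mean substitution
```

The two imputed values form the “spike” at 5.0; the sample variance drops from 20/3 to 4.0 even though no real information was added.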

8 Missing Data Two terrible approaches: Regression prediction (predictors X1, X2, X3)
X-hat = b0 + b1X1 + b2X2 + b3X3, etc.
Problem: the R-square of X with X1, X2, X3 is perfect for the imputed cases (no error term), which inflates covariances.
VAR(X) = b1² VAR(X1) + … + var(e)   [the var(e) term is omitted by the imputation]
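The “no error term” problem can be shown directly: imputed values fall exactly on the fitted regression line, so their residuals are identically zero. A minimal one-predictor sketch with hypothetical numbers:

```python
# OLS by hand, then regression imputation of missing X1 values from X2.
def ols(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    return my - b1 * mx, b1            # (intercept, slope)

x2_obs = [1.0, 2.0, 3.0, 4.0]          # complete cases
x1_obs = [2.1, 3.9, 6.2, 7.8]
b0, b1 = ols(x2_obs, x1_obs)

x2_miss = [2.5, 3.5]                   # cases with X1 missing
x1_hat = [b0 + b1 * v for v in x2_miss]

# The imputed values are an exact linear function of X2: residual = 0,
# so var(e) is dropped, VAR(X1) is understated, and cov(X1, X2) inflated.
residuals = [xh - (b0 + b1 * v) for xh, v in zip(x1_hat, x2_miss)]
print(residuals)  # -> [0.0, 0.0]
```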

9 Missing Data Regression prediction (continued)
Not quite so bad if the prediction comes from “outside the model” (but then one must argue that the “predictors” are irrelevant to the model).
ANOTHER APPROACH: REWEIGHTING
Provide a series of weights so that the distribution more closely represents the full sample.

10 EM Algorithm Expectation/maximization
Case  X1  X2  X3  X4
  1    .  18  22  15
  2    .  23  19   8

11 X1 X2 X3 X4
We want: a variance-covariance matrix based on the “complete” dataset: Σ (covariances), z̄ (means)
E-STEP: get start values for Σ, z̄ (listwise or pairwise OK)
Compute regression coefficients for all needed subsets:
Cases 1 & 2: X1 = b0 + b1X2 + b2X3 + b3X4
  Base the calculation of these coefficients on all cases for which data on X1, X2, X3 and X4 are available.
Case 3: X1 = b0 + b1X2 + b2X3
  Base the calculation on all cases for which data on X1, X2 and X3 are available.
  Also: X4 = b0 + b1X2 + b2X3
Case 7: X1 = b0 + b1X2 + b2X4

12 Imputed cases
Case  X1  X2  X3  X4
  1   x*  18  22  15
  2   x*  23  19   8      (* = hat, i.e., predicted)
M-STEP: re-calculate means and covariances
  Means: usual formula
  Variances: add in the residual variance
    VAR(X1) = b² VAR(X2) + VAR(e1)
                           ^^^^^^^ add in
Use the new z̄, Σ to re-calculate the imputations.
Continue E/M steps until convergence.
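The E/M loop on the last two slides can be sketched for the simplest pattern: X2 always observed, X1 sometimes missing. Data are hypothetical, and a real implementation handles arbitrary missing patterns; the key detail is the M-step adding VAR(e1) back in so the variance of X1 is not deflated:

```python
x1 = [2.0, 4.0, None, 5.0, None, 3.0]   # sometimes missing
x2 = [1.0, 3.0, 2.0, 4.0, 1.5, 2.5]     # always observed
n = len(x1)
n_miss = sum(1 for a in x1 if a is None)

# Start values (listwise, as the E-step slide allows)
comp = [(a, b) for a, b in zip(x1, x2) if a is not None]
nc = len(comp)
mu1 = sum(a for a, _ in comp) / nc
mu2 = sum(b for _, b in comp) / nc
var1 = sum((a - mu1) ** 2 for a, _ in comp) / nc
var2 = sum((b - mu2) ** 2 for _, b in comp) / nc
cov12 = sum((a - mu1) * (b - mu2) for a, b in comp) / nc

for _ in range(100):
    # E-step: regress X1 on X2, impute the conditional means
    b1 = cov12 / var2
    b0 = mu1 - b1 * mu2
    res_var = var1 - b1 * cov12          # VAR(e1), to be added back
    filled = [a if a is not None else b0 + b1 * b for a, b in zip(x1, x2)]

    # M-step: recompute moments; add VAR(e1) for each imputed case
    mu1 = sum(filled) / n
    mu2 = sum(x2) / n
    var2 = sum((b - mu2) ** 2 for b in x2) / n
    var1 = sum((a - mu1) ** 2 for a in filled) / n + res_var * n_miss / n
    cov12 = sum((a - mu1) * (b - mu2) for a, b in zip(filled, x2)) / n

print(round(mu1, 4), round(var1, 4), round(cov12, 4))
```

At convergence the implied regression slope cov12/var2 matches the complete-case OLS slope, which is the maximum-likelihood answer for this simple monotone pattern.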

13 EM Algorithm Advantages: Full information estimation
Also imputes cases
Can estimate the asymptotic covariance matrix for ADF estimation
Disadvantages:
Assumes a normal distribution (could transform the data first)
Assumes continuous distributions
When the resulting matrix is input into other programs, standard errors are biased, usually downward

14 EM Algorithm: Implementation
PRELIS will use the EM algorithm to construct a “corrected” covariance matrix (and/or mean vector). Syntax: EM IT=200 (200 iterations)
PRELIS SYNTAX:
  (title)
  SY='E:\Classes\ICPSR2004\Week3Examples\RelSexData\File1.PSF'
  EM CC = IT = 200 TC = 2    ← new line
  OU MA=CM XT XM

15 EM Algorithm: Implementation
Interactive PRELIS: Statistics → Multiple Imputation
(check box at the bottom: “All values missing”; probably better to select “delete cases”)

16 EM Algorithm: Implementation
Small issue: if you want to select variables or cases (as opposed to constructing a covariance matrix from the entire file), you cannot exit the dialog box with the “select” commands intact (you must either run or cancel).
Worse: if you put case-selection commands in PRELIS syntax, they are ignored when there is imputation!

17 EM Algorithm: Implementation
Issue: if you want to select variables or cases (as opposed to constructing a covariance matrix from the entire file), you cannot exit the dialog box with the “select” commands intact (you must either run or cancel). [information on whether this is an issue with version 8.7 not presently available]
Moreover, case-selection specifications will not work if imputation is performed.
Solution: select out the variables and cases you want with the stat package first.
SPSS:
  select if (v2 eq 11).
  save outfile='e:\classes\ICPSR2004\Week3Examples\MissingData\RelSexUSA.sav'
    /keep=v9 v147 v175 v176 v304 to v310 v355 v356 sex.

18 EM Algorithm: Implementation
Steps in LISREL/PRELIS:
File → Import external data in other formats
Statistics → Multiple imputation

19 EM Algorithm: Implementation
Steps in LISREL/PRELIS:
File → Import external data in other formats
Remember to define variables as continuous unless another option is required: Variable type

21 EM Algorithm: Implementation
Steps in LISREL/PRELIS:
File → Import external data in other formats
Define variables
Statistics → Multiple imputation

22 EM Algorithm: Implementation
Steps in LISREL/PRELIS:
File → Import external data in other formats
Statistics → Multiple imputation
Select output options, then specify the location for the matrices
Return to the multiple imputation menu, then RUN

23 EM Algorithm: Implementation
Output listing [numeric values not preserved in the transcript]:
  EM Algorithm for missing Data:
  Number of different missing-value patterns =
  Convergence of EM-algorithm in … iterations
  -2 Ln(L) =
  Percentage missing values =
  Estimated Means BEFORE CORRECTION:
  Total Sample Size =
  Number of Missing Values / Number of Cases (listwise deletion)
  Total Effective Sample Size =

24 EM Algorithm: Implementation
Estimated covariances under EM estimation vs. the covariance matrix with no correction (variables V304, V305, …). [matrix values not preserved in the transcript]

25 Comparison
Factor loadings on ETA 1 and ETA 2 (v9 fixed at 1.000; e.g., v147 = 2.212, s.e. 0.080), with standard errors and t-values, under the EM-based matrix vs. the uncorrected matrix. [full table not preserved in the transcript]

26 GAMMA EM vs. GAMMA Regular (standard errors; coefficient values not preserved in the transcript)
                 v355     v356     sex
GAMMA Regular:
  ETA 1        (0.001)  (0.009)  (0.038)
  ETA 2        (0.002)  (0.013)  (0.051)
GAMMA EM:
  ETA 1        (0.001)  (0.009)  (0.035)
  ETA 2        (0.001)  (0.012)  (0.043)

27 EM algorithm: SAS implementation
PROC MI
Form:
  PROC MI DATA=file OUT=file2;
    EM OUTEM=file3;
  PROC CALIS DATA=file3 COV MOD;
    LINEQS [regular SAS CALIS specification]

28 Hot deck / nearest neighbor
PRELIS ONLY
For the 1st case (X1 missing), look for the closest case that does not have missing values for X2, X3, X4, and impute from this case (the 4th) to the missing value; hence X1 for case 1 will be 1. [example data matrix not preserved in the transcript]

29 Nearest neighbor Matching variables: more accurate if all cases are non-missing
Worst case: no non-missing match
A special problem arises if variables take a small # of discrete values (next slide)

30 Variables have small # of discrete values
Ties! Several donor cases match equally well.
Impute with the average of the values across the ties. BUT….

31 Variables have small # of discrete values
Ties! Impute with the average of the values across the ties. BUT….
WHAT if the std. deviation of the imputed values is not much less than the overall standard deviation of X1? Then the imputation almost reduces to imputed = mean of X1.
The “variance ratio”: in PRELIS, imputation “fails” if the variance ratio is too large (usually .5 or .7; can be adjusted).

32 Nearest neighbor Advantages: Get a “full” data set to work with
May be superior for non-normal data
Disadvantages:
Deflated standard errors (imputed values are treated as “real” by the estimating software)
“Failure to impute” a large proportion of missing values is a common outcome with survey data
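The nearest-neighbor procedure with tie-averaging and a variance-ratio check can be sketched as follows. This is a simplified reading of the behavior described on the last few slides, not the PRELIS algorithm itself; the data, function name, and 0.5 threshold (mirroring the default mentioned above) are illustrative:

```python
import statistics

def hot_deck(target, donors, threshold=0.5):
    """target: matching-variable values for the case to impute.
    donors: list of (y, matching_values) complete cases.
    Returns the imputed y, or None when the variance-ratio check fails."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    d = [dist(target, m) for _, m in donors]
    dmin = min(d)
    ties = [y for (y, _), di in zip(donors, d) if di == dmin]
    if len(ties) > 1:
        y_all = [y for y, _ in donors]
        # "Fail" if the tied donors are nearly as spread out as X1 overall
        if statistics.variance(ties) / statistics.variance(y_all) > threshold:
            return None
    return sum(ties) / len(ties)

donors = [(1.0, (2.0, 3.0)), (1.2, (2.0, 3.0)), (5.0, (9.0, 9.0))]
print(hot_deck((2.0, 3.0), donors))  # -> 1.1, the average of two tied donors
```

With a donor pool whose ties span nearly the whole range of X1, the function returns None instead of an imputation, the “failure to impute” outcome the slide warns is common with survey data.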

33 Multiple Group Approach
Allison, Sociological Methods & Research, 1987
Bollen, p. 374 (uses old LISREL matrix notation)

34 Multiple Group Approach
Note: 13 elements of the matrix have “pseudo” values (this affects the df count)

35 Multiple group approach
Disadvantage: works only with a relatively small number of missing patterns

36 FIML (also referred to as “direct ML”)
Available in AMOS and in LISREL
AMOS implementation fairly easy to use (check off “means and intercepts,” input data with missing cases and… voilà!)
LISREL implementation a bit more difficult: must input raw data from PRELIS into LISREL
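The idea behind direct ML can be sketched in a few lines: each case contributes the log-density of just its observed variables, using the matching sub-vector of μ and sub-matrix of Σ, so nothing is deleted or imputed. A minimal two-variable sketch; μ and Σ here are hypothetical fixed values, whereas in a real run they are the model-implied moments being optimized:

```python
import math

def casewise_loglik(case, mu, sigma):
    """Log-likelihood contribution of one case with possible missing
    entries (None). Handles 1 or 2 observed variables only."""
    idx = [i for i, v in enumerate(case) if v is not None]
    x = [case[i] for i in idx]
    m = [mu[i] for i in idx]
    if len(idx) == 1:                      # univariate normal density
        var = sigma[idx[0]][idx[0]]
        z = (x[0] - m[0]) ** 2 / var
        return -0.5 * (math.log(2 * math.pi) + math.log(var) + z)
    # bivariate: use the 2x2 observed block of sigma
    a, b = sigma[idx[0]][idx[0]], sigma[idx[0]][idx[1]]
    c = sigma[idx[1]][idx[1]]
    det = a * c - b * b
    d0, d1 = x[0] - m[0], x[1] - m[1]
    quad = (c * d0 * d0 - 2 * b * d0 * d1 + a * d1 * d1) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)

mu = [0.0, 0.0]
sigma = [[1.0, 0.5], [0.5, 1.0]]
data = [(0.2, -0.1), (1.0, None), (None, 0.3)]  # partial cases kept
print(sum(casewise_loglik(c, mu, sigma) for c in data))
```

Maximizing this sum over the model parameters is what AMOS does when means and intercepts are checked off; no case is thrown away and no value is filled in.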

37–39 FIML [screenshot slides; content not preserved in the transcript]

40 (end)

