Multiple Imputation Multiple Regression
Input From SPSS
    *** Mult-Imput_M-Reg.sas ***;
    PROC IMPORT OUT=WORK.IntroQuest
        DATAFILE="C:\Users\Vati\Documents\StatData\IntroQ\IntroQ.sav"
        DBMS=SPSS REPLACE;
    RUN;
Use the Import Wizard to bring the data into SAS.
Create Missingness Variable
    Data Priapus;
      set IntroQuest;
      SATM_Miss = 0;
      If SATM = . then SATM_Miss = 1;
    run;
Check For Missing Data
    proc means n nmiss;
    run;
Check Correlates of Missingness on SATM
    proc corr nosimple;
      var SATM_Miss;
      with statoph gender ideal nucoph year;
    run;
Predictor    r
Ideal       -.017
Statoph      .084*
Nucoph       .007
Year         .082*
Gender      -.057
* p < .05
Oh Crap!
We have a lot of missing data on SATM.
Missingness on SATM is associated with statophobia and year.
It is not missing completely at random.
Need to employ multiple imputation.
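The missingness check can be illustrated outside SAS. The following is a minimal Python sketch, not the author's code: it builds a 0/1 missingness indicator and correlates it with a predictor using an ordinary Pearson correlation. The function name `pearson_r` and all data values are made up for illustration and are not taken from the IntroQ data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up scores for illustration only; None marks a missing SATM value.
satm = [520, None, 610, None, 480, 700]
statoph = [5, 9, 4, 8, 6, 2]

# Same idea as SATM_Miss in the data step: 1 if SATM is missing, else 0.
satm_miss = [1 if value is None else 0 for value in satm]

r = pearson_r(satm_miss, statoph)
print(f"r(SATM_Miss, Statoph) = {r:.3f}")
```

A correlation reliably different from zero between the indicator and an observed variable is evidence against missing completely at random, which is the reasoning used on the slide above.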
Create Five Imputations
    Proc MI data=Priapus nimpute=5 seed=69301 out=MIdata;
      var statoph gender ideal nucoph SATM year;
    run;
Patterns of Missingness
Most frequent pattern of missing data is missing on SATM only.
Group  Statoph  Gender  Ideal  Nucoph  SATM  Year  Freq
1      X        X       X      X       X     X     540
2      X        X       X      X       .     X     139
3      X        X       X      .       X     X     1
4      X        X       .      X       X     X     2
5      X        X       .      X       .     X     3
6      .        X       X      X       X     X     3
7      .        X       X      X       .     X     5
8      .        X       X      .       X     X     1
Means By Pattern of Missingness
Group  Group Means: Statoph  Gender  Ideal  Nucoph  SATM  Year
Estimated Means & Covariances
EM (Posterior Mode) Estimates
_TYPE_  _NAME_   Statoph  Gender  Ideal  Nucoph  SATM  Year
MEAN
COV     Statoph
COV     Gender
COV     Ideal
COV     Nucoph
COV     SATM
COV     Year
Analyze the Imputed Data
    Proc Reg data=MIdata outest=MRbyImput covout;
      Model Statoph = gender ideal nucoph SATM year / stb;
      By _Imputation_;
    run;
    Proc MIAnalyze data=MRbyImput;
      modeleffects intercept gender ideal nucoph SATM year;
    run;
See the complete output here.
In every imputation, Gender, SATM, and Year have significant effects.
Proc MIAnalyze Output
Pools the results from the five imputations.
The variance of each parameter estimate is partitioned into variance among (between) imputations and variance within imputations.
Ideally, little of the variance is due to differences among imputations.
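The partition described here follows Rubin's rules. As a minimal sketch in Python (not SAS, and not the Proc MIAnalyze implementation itself), the pooling for a single parameter could look like the following. The function name `pool_estimates` and the input values are hypothetical and are not taken from the IntroQ output.

```python
from statistics import mean, variance

def pool_estimates(estimates, std_errors):
    """Pool one parameter across m imputations using Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)                       # pooled point estimate
    within = mean(se ** 2 for se in std_errors)   # average squared standard error
    between = variance(estimates)                 # sample variance of the m estimates
    total = within + (1 + 1 / m) * between        # total variance of the pooled estimate
    return {"estimate": q_bar, "within": within, "between": between, "total": total}

# Made-up slope estimates and standard errors from five imputations.
pooled = pool_estimates([1.0, 1.2, 0.9, 1.1, 0.8], [0.5, 0.5, 0.5, 0.5, 0.5])
print(pooled)
```

In this made-up case the between-imputation variance is small relative to the within-imputation variance, which is the desirable situation the slide describes.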
Variance Among/Within Imputations
Parameter   Variance: Between  Within  Total
intercept
gender
ideal
nucoph
SATM
year
“Relative Increase in Variance” is the increase in the variance of a parameter estimate due to having missing data imputed (relative to the condition where no data are missing). Low is good.
“Fraction of Missing Information” is an index of how much more precise the parameter estimate would have been if there had been no missing data. Low is good.
“Relative Efficiency” tells you how efficient (precise) your estimate is with the number of imputations you have employed, relative to what you would have if you used an infinitely large number of imputations. High is good.
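These three indices can be computed from the between- and within-imputation variances. The sketch below is in Python rather than SAS and uses the formulas as I understand them from the SAS documentation for combining inferences; treat it as an illustration, not as the Proc MIAnalyze source. The function name `missing_info_stats` and the input values are hypothetical.

```python
def missing_info_stats(between, within, m):
    """Return RIV, FMI, and RE for one parameter pooled over m imputations.

    Assumes between > 0; with between == 0 the degrees of freedom are undefined here.
    """
    riv = (1 + 1 / m) * between / within      # relative increase in variance
    df = (m - 1) * (1 + 1 / riv) ** 2         # degrees of freedom for the pooled t test
    fmi = (riv + 2 / (df + 3)) / (riv + 1)    # fraction of missing information
    re = 1 / (1 + fmi / m)                    # relative efficiency versus infinite imputations
    return riv, fmi, re

# Made-up variances for illustration only, with five imputations.
riv, fmi, re = missing_info_stats(between=0.025, within=0.25, m=5)
print(f"RIV = {riv:.3f}, FMI = {fmi:.3f}, RE = {re:.3f}")
```

Because relative efficiency depends on FMI divided by the number of imputations, a small number of imputations already gives high efficiency when the fraction of missing information is low.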
RIV, FMI, & RE
Parameter   Relative Increase in Variance   Fraction Missing Information   Relative Efficiency
intercept
gender
ideal
nucoph
SATM
year
Parameter   Estimate   95% Confidence Limits   DF   Min   Max   t   Pr > |t|
intercept
gender
ideal
nucoph
SATM                                                                <.0001
year
Conclusions
Women report greater fear of the stats course than do men.
Reported Math Aptitude is inversely correlated with fear of stats.