Multiple Imputation using SAS
Don Miller
812 Oswald Tower

Introduction
- Missing values occur often in research: refused/don’t know responses, attrition, skip patterns, and so on.
- Dropping cases with missing values may bias results (e.g. women and/or overweight respondents tend to disclose their weight less often than others).
- Imputation is an attempt to “fill in” the missing values.
- Single imputation (e.g. with the mean) is biased and gives no measure of the uncertainty added by the imputation.

Multiple Imputation: Simple Procedure
1. For categorical variables, construct binary dummy variables, leaving out the reference category (e.g. Race: 1=“white”, 2=“black”, 3=“other” becomes the two variables Black and Other).
2. Impute using PROC MI.
3. Round off the imputed dummies if you want plausible values (this will bias your results).
4. Do the analysis (PROC REG, LOGISTIC, etc.) with by _imputation_; in the procedure.
5. Combine the results using PROC MIANALYZE.

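Step 1 can be sketched with a short DATA step. This is only an illustration: the input data set name survey and the variable name race are assumptions, not names from the slides. The point to notice is that a missing race must produce missing dummies, otherwise PROC MI has nothing to impute.

```sas
/* Sketch only: "survey" and "race" are hypothetical names. */
data rawdat;
    set survey;
    if missing(race) then do;
        /* keep the dummies missing so PROC MI can impute them */
        black = .;
        other = .;
    end;
    else do;
        /* white (race=1) is the reference category: both dummies are 0 */
        black = (race = 2);
        other = (race = 3);
    end;
run;
```
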
PROC MI
Typical syntax:
    proc mi data=rawdat seed= out=impdat;
        var sex black other age drivesfast;
    run;
- data=  one copy of the data, with missing values
- out=   five copies of the data with imputed values (the imputed values differ across copies), stacked in one data set and indexed by _Imputation_
- seed=  random seed (an integer you supply); keep it the same to reconstruct your results
- var    variables with missing values that you need imputed, variables in your model, and any others that may help with the imputation

PROC MI Sample Output
PROC MI Options
- nimpute=5            number of imputations (default 5); nimpute=0 gives the missing-data patterns only
- minimum=, maximum=   set the min and max for imputed values; sometimes doesn’t converge as well
- round=               round-off option
- alpha=0.05           significance level for the confidence limits
- mu0=                 null value for the t test of H0: μ = μ0

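As a rough sketch of how these options fit together, the first step below only lists the missing-data patterns, and the second imputes with bounds and rounding on age. The seed value and the age bounds (16 to 99) are made-up values for illustration. The lists given to minimum=, maximum= and round= follow the order of the var statement, and a missing value (.) means no restriction for that variable.

```sas
/* Step 1: inspect missing-data patterns only, no imputation */
proc mi data=rawdat nimpute=0;
    var sex black other age drivesfast;
run;

/* Step 2: impute; seed and age bounds are illustrative assumptions */
proc mi data=rawdat seed=54321 nimpute=5 out=impdat
        minimum=. . . 16 .     /* lower bound on age only        */
        maximum=. . . 99 .     /* upper bound on age only        */
        round=. . . 1 .;       /* round imputed age to integers  */
    var sex black other age drivesfast;
run;
```
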
PROC MI Statements
- em maxiter=200 out=emdata;   EM algorithm: maximum likelihood estimates from the incomplete data
- freq fweight;                weights observations by the frequency variable fweight
- mcmc (options);              modify the imputation method
- class sex race;              specify categorical variables (don’t need dummies) (new / experimental)

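A possible combination of the em and mcmc statements is sketched below. The seed and the iteration counts are arbitrary illustrative choices, not recommendations from the slides; nbiter= sets the burn-in iterations and niter= the iterations between imputations.

```sas
/* Sketch only: seed and iteration counts are illustrative assumptions */
proc mi data=rawdat seed=54321 out=impdat;
    em maxiter=200 out=emdata;             /* EM estimates, saved in emdata */
    mcmc chain=single nbiter=500 niter=200; /* longer burn-in and spacing    */
    var sex black other age drivesfast;
run;
```
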
Regression
- Fit your model as if the data had no missing values, adding by _imputation_;
    proc reg data=impdat outest=parmcov covout;
        model drivesfast=sex black other age;
        by _imputation_;
    run;
- You’ll get nimpute (usually 5) sets of output.
- Estimates, covariances, and standard errors will be combined in MIANALYZE (R² is just the mean across imputations).
- You need to generate a data set of parameter estimates and covariances; how you do this varies by procedure.

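Since R² is simply averaged, one way to get it is sketched here. The data set name fitdat is an assumption; the rsquare option asks PROC REG to add the _RSQ_ column to the outest= data set, and PROC MEANS then averages it over the imputations.

```sas
/* Sketch only: "fitdat" is a hypothetical data set name */
proc reg data=impdat outest=fitdat rsquare noprint;
    model drivesfast=sex black other age;
    by _imputation_;
run;

/* one row per imputation; average the R-square values */
proc means data=fitdat mean;
    var _rsq_;
run;
```
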
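Parameter Est. & Covariance Matrix
    proc logistic data=impdat descending;
        model drivesfast=sex black other age / covb;
        by _imputation_;
        ods output ParameterEstimates=parmsdat CovB=covbdat;
    run;

    proc mixed data=impdat;
        model drivesfast=sex black other age / solution covb;
        by _imputation_;
        ods output SolutionF=parmsdat CovB=covbdat;
    run;
- For PROC MIXED the fixed-effect estimates come from the SolutionF table (CovParms holds only the variance components), and MIANALYZE reads the MIXED covariance table with covb(effectvar=rowcol)=covbdat.
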
Parameter Est. & Covariance Matrix
    proc genmod data=impdat;
        model drivesfast=sex black other age / covb;
        by _imputation_;
        ods output ParameterEstimates=parmsdat CovB=covbdat;
    run;

    proc glm data=impdat;
        model drivesfast=sex black other age / inverse;
        by _imputation_;
        ods output ParameterEstimates=parmsdat InvXPX=xpxidat;
    run;

PROC MIANALYZE
Syntax depends on which procedure you used in the previous step. Start with one of:
    proc mianalyze data=parmcov;                   (after PROC REG with outest= and covout)
    proc mianalyze parms=parmsdat covb=covbdat;    (after LOGISTIC or GENMOD with covb)
    proc mianalyze parms=parmsdat xpxi=xpxidat;    (after GLM with inverse)
then finish with:
        modeleffects intercept sex black other age;
    run;
- Note that the “var” statement is now “modeleffects”.
- Note that the dependent variable is omitted.

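For reference, the combination that PROC MIANALYZE performs follows Rubin's rules. The notation below is introduced here for illustration and does not appear in the slides: m is the number of imputations, \hat{Q}_i is the estimate of one parameter from imputed data set i, and U_i is its estimated variance (squared standard error).

```latex
\begin{align*}
\bar{Q} &= \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i
  && \text{(combined point estimate)} \\
\bar{W} &= \frac{1}{m}\sum_{i=1}^{m} U_i
  && \text{(within-imputation variance)} \\
B &= \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^2
  && \text{(between-imputation variance)} \\
T &= \bar{W} + \Bigl(1+\frac{1}{m}\Bigr)B
  && \text{(total variance)} \\
\nu &= (m-1)\left[1+\frac{\bar{W}}{(1+1/m)\,B}\right]^{2}
  && \text{(degrees of freedom for the $t$ reference distribution)}
\end{align*}
```

The combined standard error is \sqrt{T}, and the between-imputation term B is what single imputation leaves out.
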
PROC MIANALYZE Output