Multiple Imputation using SAS
Don Miller
812 Oswald Tower
Introduction
- Missing values are common in research data: refused/don’t know responses, attrition, skip patterns…
- Dropping cases with missing values may bias results (e.g. women and/or overweight people tend to disclose their weight less often than others).
- One alternative is to impute the data (“fill in” the missing values).
- Single imputation (e.g. with the mean) can bias results (mean imputation, for instance, shrinks the variance) and gives no measure of the uncertainty introduced by imputing.
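To see the variance problem with single imputation, you can compare a variable's standard deviation before and after replacing its missing values with the mean. This is a minimal sketch, assuming the bmx data set used in the later slides. PROC STDIZE with REPONLY is one way to do mean imputation in SAS and is not part of the original slides; the output name meanimp is illustrative:

```sas
/* Single (mean) imputation of bmxwt: REPONLY replaces only the missing
   values, using the location measure chosen by METHOD=MEAN */
proc stdize data=bmx out=meanimp reponly method=mean;
   var bmxwt;
run;

/* The mean is unchanged, but if any values were missing the standard
   deviation is smaller, because every imputed value sits at the mean */
proc means data=bmx n nmiss mean std;
   var bmxwt;
run;

proc means data=meanimp n nmiss mean std;
   var bmxwt;
run;
```

The second PROC MEANS treats the filled-in values as if they had been observed, which is exactly the missing measure of uncertainty that multiple imputation is meant to restore.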
Paris datasets
- Open Windows Explorer (or My Computer)
- Tools > Map Network Drive
- Drive: P:
- Folder: \\paris\sas_data
- For help: Stat help
Data Setup
Multiple Imputation: Simple Procedure
1. Impute using PROC MI.
2. Round off the imputed values if you want plausible values (caution: rounding can bias your results).
3. Run the analysis (PROC REG, LOGISTIC, etc.) with a by _imputation_; statement in the procedure.
4. Combine the results using PROC MIANALYZE.
For categorical variables: construct binary dummy variables, leaving out the reference category (e.g. race coded 1=“white”, 2=“black”, 3=“other” becomes two dummy variables, black and other, with white as the reference).
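The dummy-variable construction described above can be sketched in a DATA step. This assumes a hypothetical input data set mydata with a numeric variable race coded 1, 2, 3 as in the example; the names mydata and mydata2 are illustrative. A missing race stays missing in both dummies so that PROC MI can impute them:

```sas
/* Hypothetical data: mydata has race coded 1=white, 2=black, 3=other.
   White is the reference category, so it gets no dummy variable. */
data mydata2;
   set mydata;
   if missing(race) then do;
      black = .;   /* leave missing so PROC MI imputes it */
      other = .;
   end;
   else do;
      black = (race = 2);   /* logical expression evaluates to 1 or 0 */
      other = (race = 3);
   end;
run;
```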
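PROC MI
Typical syntax:
    proc mi data=bmx out=impdat seed=33155 nimpute=5;
       var bmxbmi bmxht bmxwt bmxarmc bmxarml;
    run;
- data= input data set: one copy of the data, with missing values
- out= output data set: nimpute (here 5) copies of the data with imputed values, stacked and indexed by the variable _Imputation_; the imputed values differ across copies
- seed= random seed; reuse the same seed to reproduce your results
- var: the variables used in the imputation: those with missing values you need imputed, all variables in your analysis model, and any others that may help predict the missing values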
PROC MI Sample Output
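PROC MI Options
- nimpute=5: number of imputations (default 5); nimpute=0 only displays the missing-data patterns
- minimum= / maximum=: set minimum and maximum values for the imputed values; PROC MI redraws values that fall outside the bounds, which sometimes does not converge as well
- round=: round off the imputed values to the given units
- alpha=0.05: level for the confidence limits (here 95%)
- mu0=: null-hypothesis mean for the t test, H0: μ = μ0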
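A sketch combining several of these options, assuming the bmx data set and variables from the PROC MI slide. The bounds and rounding units are illustrative assumptions, not recommendations for these body measures. The first step only lists the missing-data patterns:

```sas
/* Step 1: nimpute=0 shows the missing-data patterns; nothing is imputed */
proc mi data=bmx nimpute=0;
   var bmxbmi bmxht bmxwt bmxarmc bmxarml;
run;

/* Step 2: impute with bounds and rounding. Each list has one entry per
   VAR variable, in the same order (illustrative values only).
   Rounding can bias results. */
proc mi data=bmx out=impdat seed=33155 nimpute=5
        minimum=0 0 0 0 0
        round=0.1 0.1 0.1 0.1 0.1;
   var bmxbmi bmxht bmxwt bmxarmc bmxarml;
run;
```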
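PROC MI Statements
- em maxiter=200 out=emdata; runs the EM algorithm (up to 200 iterations) to get maximum likelihood estimates of the means and covariances from the incomplete data
- freq fweight; weights each observation by the frequency variable fweight
- mcmc (options); modifies the MCMC imputation method, the default for arbitrary missing-data patterns
- class sex race; specifies categorical variables, so dummy variables are not needed; used with the MONOTONE or FCS statement (new / experimental)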
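A sketch of the CLASS statement with fully conditional specification (FCS), assuming a hypothetical data set mydata that contains sex (2 levels), race (3 levels) and the body measures. Using LINK=GLOGIT to treat race as nominal is an assumption about a suitable model; check the FCS statement options in your SAS release:

```sas
/* Hypothetical data set mydata with missing values in sex and race.
   No dummy variables are needed: CLASS marks them as categorical. */
proc mi data=mydata out=impcat seed=33155 nimpute=5;
   class sex race;
   /* Logistic models impute the CLASS variables; GLOGIT (generalized
      logit) treats race as nominal. Continuous variables use the
      default FCS regression method. */
   fcs logistic(sex) logistic(race / link=glogit);
   var sex race bmxbmi bmxht bmxwt;
run;
```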
Output dataset
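The output data set stacks the imputed copies one after another, with _Imputation_ identifying each copy. A quick way to inspect it, assuming impdat from the PROC MI slide:

```sas
/* One block of rows per imputation, each the size of the input data */
proc freq data=impdat;
   tables _imputation_;
run;

/* Summary statistics differ slightly across imputations because the
   imputed values differ; PROC MI writes impdat sorted by _Imputation_ */
proc means data=impdat n mean std;
   by _imputation_;
   var bmxbmi bmxht bmxwt bmxarmc bmxarml;
run;
```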
Regression
Fit your model as if the data had no missing values, adding a by _imputation_; statement:
    proc reg data=impdat outest=parmcov covout;
       model bmxbmi=bmxht bmxwt bmxarmc bmxarml;
       by _imputation_;
    run;
- You’ll get nimpute (usually 5) sets of output.
- PROC MIANALYZE combines the estimates and their covariances (standard errors); R², by contrast, is simply averaged across imputations.
- You need to save the parameter estimates and their covariance matrix in a data set; how to do this varies by procedure (here, OUTEST= with COVOUT).
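Since PROC MIANALYZE does not pool R², one way to average it is to save it per imputation and take the mean. A sketch, assuming impdat from the PROC MI slide; the data set name rsqdat is illustrative:

```sas
/* EDF adds the model R-square (_RSQ_) to the OUTEST= data set;
   the BY statement gives one row per imputation */
proc reg data=impdat outest=rsqdat edf noprint;
   model bmxbmi=bmxht bmxwt bmxarmc bmxarml;
   by _imputation_;
run;

/* Average R-square across the imputations */
proc means data=rsqdat mean;
   var _rsq_;
run;
```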
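Parameter Estimates & Covariance Matrix
    /* y: hypothetical binary (0/1) outcome assumed to be in impdat;
       PROC LOGISTIC needs a categorical response, not continuous bmxbmi */
    proc logistic data=impdat descending;
       model y=bmxht bmxwt bmxarmc bmxarml /covb;
       by _imputation_;
       ods output ParameterEstimates=parmsdat CovB=covbdat;
    run;

    proc mixed data=impdat;
       model bmxbmi=bmxht bmxwt bmxarmc bmxarml /solution covb;
       by _imputation_;
       /* SolutionF holds the fixed-effect estimates (CovParms holds only
          variance components); MIANALYZE reads this CovB table with
          covb(effectvar=rowcol)=covbdat */
       ods output SolutionF=parmsdat CovB=covbdat;
    run;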
    proc genmod data=impdat;
       model bmxbmi=bmxht bmxwt bmxarmc bmxarml /covb;
       by _imputation_;
       /* GENMOD labels the CovB rows/columns Prm1, Prm2, ...; ParmInfo maps
          them to effects (pass it to MIANALYZE with parminfo=pinfodat) */
       ods output ParameterEstimates=parmsdat ParmInfo=pinfodat CovB=covbdat;
    run;

    proc glm data=impdat;
       model bmxbmi=bmxht bmxwt bmxarmc bmxarml /inverse;
       by _imputation_;
       ods output ParameterEstimates=parmsdat InvXPX=xpxidat;
    run;
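PROC MIANALYZE
The syntax depends on the procedure you used in the previous step:
    proc mianalyze data=parmcov;                  /* PROC REG: OUTEST= with COVOUT */
    (or)
    proc mianalyze parms=parmsdat covb=covbdat;   /* ParameterEstimates and CovB tables */
    (or)
    proc mianalyze parms=parmsdat xpxi=xpxidat;   /* PROC GLM: ParameterEstimates and InvXPX */
then, in every case:
       modeleffects intercept bmxht bmxwt bmxarmc bmxarml;
    run;
- List the effects in a MODELEFFECTS statement, not a VAR statement.
- Omit the dependent variable: list only the intercept and the predictors.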
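For reference, the pooling PROC MIANALYZE performs for a single parameter follows Rubin's rules. In standard notation (not from the slides), with m imputations, estimates \hat{Q}_i, and their squared standard errors U_i:

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m} U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\left(\hat{Q}_i - \bar{Q}\right)^2

T = \bar{U} + \left(1 + \frac{1}{m}\right) B, \qquad
\nu = (m-1)\left[1 + \frac{\bar{U}}{\left(1 + m^{-1}\right) B}\right]^2
```

The pooled estimate is \bar{Q} with standard error \sqrt{T}: \bar{U} is the average within-imputation variance and B the between-imputation variance. Inference uses a t distribution with \nu degrees of freedom; the EDF= option of PROC MIANALYZE switches to a small-sample adjustment of \nu.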
PROC MIANALYZE Output