Working with Missing Data
Missing data
Three general steps for analyzing missing data:
- Identify patterns and reasons for missing data.
- Understand the distribution of the missing data.
- Decide on the best method for analysis.
Identify patterns/reasons for missing data
Understand your data:
- Are certain groups more likely to have missing values?
- Are certain responses more likely to be missing?
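One way to check whether certain groups are more likely to have missing values is to tabulate the share of missing responses per group. The sketch below is only an illustration: the `records` list, its group labels, and the helper `missing_rate_by_group` are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical records: (group, response); None marks a missing response.
records = [
    ("A", 45), ("A", None), ("A", 22),
    ("B", None), ("B", None), ("B", 34),
]

def missing_rate_by_group(rows):
    """Return {group: fraction of rows whose response is missing}."""
    total = defaultdict(int)
    missing = defaultdict(int)
    for group, value in rows:
        total[group] += 1
        if value is None:
            missing[group] += 1
    return {g: missing[g] / total[g] for g in total}

rates = missing_rate_by_group(records)
print(rates)
```

A large gap between groups suggests that missingness depends on group membership, which should inform the choice of analysis method.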
Methods for analysis
- Deletion methods: listwise deletion
- Single imputation methods: mean/mode substitution, dummy variable method, single regression
- Model-based methods: maximum likelihood, multiple imputation, others
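The two simplest options above, listwise deletion and mean substitution, can be sketched in a few lines. This is a minimal illustration under assumed data: the `rows` values and the helper names `listwise_delete` and `mean_substitute` are hypothetical.

```python
from statistics import mean

# Hypothetical rows: (treatment, response); None marks a missing value.
rows = [(1, 45), (1, None), (2, 22), (2, 18), (3, None), (3, 34)]

def listwise_delete(data):
    """Drop every row that contains at least one missing value."""
    return [r for r in data if all(v is not None for v in r)]

def mean_substitute(data, col):
    """Replace missing values in column `col` with the mean of the observed values."""
    observed = [r[col] for r in data if r[col] is not None]
    fill = mean(observed)
    return [
        tuple(fill if (i == col and v is None) else v for i, v in enumerate(r))
        for r in data
    ]

complete = listwise_delete(rows)       # keeps only fully observed rows
filled = mean_substitute(rows, col=1)  # keeps all rows, fills the gaps
print(len(complete), filled)
```

Listwise deletion discards data, while mean substitution keeps every row but shrinks the variance of the filled column, because every imputed value sits exactly at the mean.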
Multiple imputation
Impute:
- Missing values are “filled in” with imputed values from a specified regression model.
- This step is repeated m times, producing a separate completed dataset each time.
Analyze:
- The analysis is performed within each of the m datasets.
Pool:
- The m results are pooled into one estimate.
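The impute, analyze, and pool steps above can be sketched as follows. This is a simplified illustration, not the procedure any particular package uses: the data in `x` and `y` are hypothetical, the imputation model is a simple linear regression of `y` on `x` plus random noise, and the regression coefficients are treated as fixed (a full implementation would also draw them at random for each imputation). Pooling follows Rubin's rules: the pooled estimate is the mean of the m estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
import random
from statistics import mean, variance

# Hypothetical data: x is fully observed, y has missing values (None).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, None, 6.2, 7.9, None, 12.1, 13.8, None]

def fit_line(xs, ys):
    """Least-squares intercept, slope, and residual standard deviation."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return intercept, slope, sd

def impute_once(xs, ys, rng):
    """Impute: fill each missing y with a regression prediction plus noise."""
    obs = [(a, b) for a, b in zip(xs, ys) if b is not None]
    intercept, slope, sd = fit_line([a for a, _ in obs], [b for _, b in obs])
    return [
        b if b is not None else intercept + slope * a + rng.gauss(0, sd)
        for a, b in zip(xs, ys)
    ]

def pool(estimates, variances):
    """Pool: combine m estimates and their variances with Rubin's rules."""
    m = len(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # variance across the m estimates
    return mean(estimates), within + (1 + 1 / m) * between

rng = random.Random(0)  # fixed seed so the sketch is reproducible
m = 5
datasets = [impute_once(x, y, rng) for _ in range(m)]
# Analyze: here the analysis is simply the mean of y and its sampling variance.
estimates = [mean(d) for d in datasets]
variances = [variance(d) / len(d) for d in datasets]
pooled_mean, pooled_var = pool(estimates, variances)
print(pooled_mean, pooled_var)
```

The between-imputation term is what separates multiple imputation from single imputation: it carries the uncertainty about the missing values into the final variance.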
Multiple imputation example
Columns: Plot, Rep, Treatment, Response
1 45 2 3 NA 22 4 18 5 6 34 7 40 8 14 9 10 11 16 12 20
R package (Amelia), used in RStudio