Missing values: The bane of data analysis
Missing values in computations
- Totals using arithmetic
- Totals using statistical function “sum”
- These methods treat missing values differently
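The difference between the two kinds of totals can be sketched outside JMP. The following is a minimal Python illustration, not the JMP formula itself: the item names `q1` to `q3` and the scores are made up, and NaN stands in for a missing value. Keeping an all-missing row missing (rather than returning 0) is a choice made for this sketch.

```python
import math

MISSING = float("nan")  # stand-in for a missing value

# Hypothetical item scores for three respondents.
rows = [
    {"q1": 4.0, "q2": 5.0, "q3": 3.0},
    {"q1": 2.0, "q2": MISSING, "q3": 4.0},
    {"q1": MISSING, "q2": MISSING, "q3": MISSING},
]

def total_arithmetic(row):
    """Plain addition: any missing operand makes the whole total missing."""
    return row["q1"] + row["q2"] + row["q3"]

def total_skip_missing(row):
    """Add up only the observed values, as a missing-aware sum would."""
    observed = [v for v in row.values() if not math.isnan(v)]
    # Sketch assumption: a row with nothing observed stays missing.
    return sum(observed) if observed else MISSING

arithmetic_totals = [total_arithmetic(r) for r in rows]
skipping_totals = [total_skip_missing(r) for r in rows]

for a, s in zip(arithmetic_totals, skipping_totals):
    print(f"arithmetic={a}  skip-missing={s}")
```

For the second respondent the arithmetic total is missing, while the missing-aware total is based on two items instead of three, so it is not directly comparable to totals from complete rows.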
Statistical functions for formulas
Beware: Missing data can introduce bias
- Depends on why data are missing
  - MCAR – Missing Completely at Random
  - MAR – Missing At Random
  - MNAR – Missing Not At Random
- … and on how many values are missing
When does missing data introduce bias?
- When the missing data would have a different distribution than the available data
- Examples:
  - In your clinical data, patients who are younger and healthier are less likely to have their blood pressures taken (MAR), but you think the BPs you have are representative of the ones you don’t, and you have other variables that can identify the relevant subgroups.
  - Patients who don’t have good insurance are less likely to be in your clinical data because they don’t seek medical care (MNAR), and you can’t assume the data you have are similar to the missing data.
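A small simulation can make the distribution argument concrete. This is a sketch under invented assumptions: the blood pressure distribution (mean 125, SD 15) and the missingness rates are hypothetical, and the MNAR mechanism here (low values go missing more often) is only one simple way missingness can depend on the unobserved value.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical systolic blood pressures for 10,000 patients.
bp = [random.gauss(125, 15) for _ in range(10_000)]

# MCAR: every value has the same 30% chance of being missing.
mcar_observed = [x for x in bp if random.random() > 0.30]

# MNAR (assumed mechanism): values below 125 are missing 60% of the
# time, values at or above 125 only 10% of the time.
mnar_observed = [
    x for x in bp
    if random.random() > (0.60 if x < 125 else 0.10)
]

true_mean = statistics.fmean(bp)
mcar_mean = statistics.fmean(mcar_observed)
mnar_mean = statistics.fmean(mnar_observed)

print(f"true mean      {true_mean:.1f}")
print(f"MCAR observed  {mcar_mean:.1f}")
print(f"MNAR observed  {mnar_mean:.1f}")
```

Under MCAR the observed mean stays close to the true mean; under this MNAR mechanism the observed mean is pulled upward because the low readings are under-represented.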
When does missing data NOT introduce bias?
- When there’s no pattern to missingness (MCAR)
- But if too many values are missing, the variable is not informative
- And in a multivariate analysis, the combination of missing values may kick out too many observations
What happens in data analysis?
- Computations with missing values result in missing values (with some exceptions)
- Thus, in most multivariate procedures, every observation with a missing value for any of the variables in use is thrown out
- The result is a “complete case analysis”, which is sometimes OK for MCAR
- Beware: different procedures using the same data may be using different observations because of the missing value pattern
- You may want to delete the observations with missing values up front
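The warning about different procedures using different observations can be illustrated with a short sketch. The records, the variable names (`age`, `bmi`, `bp`), and the helper `complete_cases` are all hypothetical; the point is only that the set of complete cases depends on which variables an analysis uses.

```python
import math

MISSING = float("nan")  # stand-in for a missing value

# Hypothetical patient records.
records = [
    {"age": 34, "bmi": 22.1, "bp": 118},
    {"age": 51, "bmi": MISSING, "bp": 131},
    {"age": MISSING, "bmi": 27.4, "bp": 140},
    {"age": 45, "bmi": 30.2, "bp": MISSING},
    {"age": 62, "bmi": 25.0, "bp": 135},
    {"age": 29, "bmi": MISSING, "bp": 110},
]

def complete_cases(rows, variables):
    """Keep only rows with no missing value among the named variables."""
    return [
        r for r in rows
        if not any(math.isnan(r[v]) for v in variables)
    ]

# Two analyses on the same data end up with different observations.
n_age_bp = len(complete_cases(records, ["age", "bp"]))
n_bmi_bp = len(complete_cases(records, ["bmi", "bp"]))

# Deleting incomplete rows up front gives every analysis the same set.
analysis_set = complete_cases(records, ["age", "bmi", "bp"])

print(n_age_bp, n_bmi_bp, len(analysis_set))
```

Here the age/BP analysis keeps four records, the BMI/BP analysis keeps three, and only two records are complete on all three variables, which is the price of fixing the analysis set up front.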
Possible Solutions
- MCAR – complete case analysis or imputation
- MAR – imputation based on other variables that are related to the missing data pattern, chosen using expert judgment
- MNAR – be very careful trying to draw an inference
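As one simple illustration of imputing from a related variable under MAR, the sketch below fills a missing blood pressure with the mean of the observed values in the same age group. The data, the `age_group` variable, and the `bp_imputed` flag are hypothetical, and subgroup-mean imputation is a stand-in technique chosen for brevity, not a recommendation from the slides. A single fill-in like this understates variability.

```python
import math
import statistics
from collections import defaultdict

MISSING = float("nan")  # stand-in for a missing value

# Hypothetical patients; BP is missing for some of them.
patients = [
    {"age_group": "young", "bp": 112.0},
    {"age_group": "young", "bp": MISSING},
    {"age_group": "young", "bp": 118.0},
    {"age_group": "older", "bp": 138.0},
    {"age_group": "older", "bp": 142.0},
    {"age_group": "older", "bp": MISSING},
]

# Mean of the observed values within each subgroup.
observed = defaultdict(list)
for p in patients:
    if not math.isnan(p["bp"]):
        observed[p["age_group"]].append(p["bp"])
group_means = {g: statistics.fmean(v) for g, v in observed.items()}

# Fill each missing value with its subgroup mean and flag it as imputed.
imputed = []
for p in patients:
    was_missing = math.isnan(p["bp"])
    bp = group_means[p["age_group"]] if was_missing else p["bp"]
    imputed.append({**p, "bp": bp, "bp_imputed": was_missing})

for row in imputed:
    print(row)
```

Keeping the `bp_imputed` flag lets later steps distinguish measured values from filled-in ones.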
Other considerations
- Informative missing – the fact of missingness can be an interesting category in itself
- The loss of observations in a model due to missing values will decrease your power. Are you better off leaving a problematic variable out of your model?
- Always check your statistical reports for the actual sample size included
- Check whether the observations with missing values are comparable to the rest in terms of your outcome or other important variables
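Two of these checks are easy to sketch: recording missingness as its own category, and comparing the outcome between observations with and without a missing value. The survey rows and the names `income`, `outcome`, and `income_missing` below are hypothetical.

```python
import math
import statistics

MISSING = float("nan")  # stand-in for a missing value

# Hypothetical survey rows: income is sometimes missing, outcome never is.
rows = [
    {"income": 52.0, "outcome": 7.0},
    {"income": MISSING, "outcome": 4.0},
    {"income": 61.0, "outcome": 8.0},
    {"income": MISSING, "outcome": 5.0},
    {"income": 47.0, "outcome": 6.0},
]

# Informative missing: keep the fact of missingness as its own variable.
for r in rows:
    r["income_missing"] = math.isnan(r["income"])

outcome_when_observed = [r["outcome"] for r in rows if not r["income_missing"]]
outcome_when_missing = [r["outcome"] for r in rows if r["income_missing"]]

# Sample size a model using income would actually see.
n_complete = len(outcome_when_observed)

# A large gap suggests the missing observations are not comparable.
mean_gap = (
    statistics.fmean(outcome_when_observed)
    - statistics.fmean(outcome_when_missing)
)

print(f"n with income = {n_complete}, outcome gap = {mean_gap:.1f}")
```

In this toy data the respondents who skipped the income question score 2.5 points lower on the outcome, which would be a warning sign against treating the missingness as ignorable.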
An ounce of prevention
- Consider the “Required response” option for crucial variables
- But this can frustrate the respondent, so test your questions to make sure they’re not confusing
- Incentivize the respondents to complete the survey
(JMP) Tables > Missing Data Pattern
- For the 20 PCA items in lei_krupdat_data_for_fellows.jmp

(JMP) Tables > Missing Data Pattern
- For the 3 Cluster items based on the 20 PCA items in lei_krupdat_data_for_fellows.jmp
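For readers without JMP, the idea behind a missing data pattern table can be approximated in a few lines of Python. This sketch does not read lei_krupdat_data_for_fellows.jmp and does not reproduce JMP's output; the three item columns and the responses are hypothetical. Each row is reduced to a string of 0s and 1s (1 = missing), and identical patterns are counted.

```python
import math
from collections import Counter

MISSING = float("nan")  # stand-in for a missing value

# Hypothetical item responses.
columns = ["item1", "item2", "item3"]
rows = [
    {"item1": 3, "item2": 4, "item3": 5},
    {"item1": 2, "item2": MISSING, "item3": 4},
    {"item1": 5, "item2": 5, "item3": 5},
    {"item1": MISSING, "item2": MISSING, "item3": 1},
    {"item1": 4, "item2": MISSING, "item3": 2},
]

def pattern(row):
    """Encode one row as a 0/1 string, one character per column (1 = missing)."""
    return "".join("1" if math.isnan(row[c]) else "0" for c in columns)

pattern_counts = Counter(pattern(r) for r in rows)

# One line per distinct pattern, most frequent first.
for pat, count in pattern_counts.most_common():
    print(f"pattern={pat}  columns_missing={pat.count('1')}  count={count}")
```

A table like this shows at a glance whether one item accounts for most of the lost observations or whether the missingness is spread across many combinations.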