G Lect 13M
G Multiple Regression, Week 13 (Monday)
Why might data be missing in psychological studies?
Missing data patterns
Overview of statistical approaches
Example

What leads to missing data?
Experimental studies
»Equipment failure
»Experimenter error
»Subject noncompliance
»Subject dropout
»Data entry error
»SOLUTION: Collect more data

What leads to missing data?
Observational (nonexperimental) studies
»Equipment failure
»Observer/coder error
»Subject refusal
»Subject loss to follow-up
»Design/measure changes
»Nested interview questions
»SOLUTION: Collect more data

Missing data patterns
Terms suggested by Rubin
»Rubin (1976), Little & Rubin (1987)
In some cases, the data are MISSING COMPLETELY AT RANDOM (MCAR)
»Which data point is missing cannot be predicted by any variable, measured or unmeasured.
»Prob(M|Y) = Prob(M), where M indicates which values are missing and Y is the full data.
»The missing data pattern is ignorable. Analyzing the available complete data is just fine.

Missing data patterns
In other cases, the data are MISSING AT RANDOM (MAR)
»Which data point is missing is systematically related to subject characteristics, but these are all measured.
»Conditional on the observed variables, missingness is random: Prob(M|Y) = Prob(M|Y_observed)
»E.g., respondents with less education might not answer a certain question.
»Missingness can be treated as ignorable.

Missing data patterns
In still other cases, the data are NOT MISSING AT RANDOM (NMAR)
»Data are missing because of a process related to the value that is unavailable.
»E.g., someone was too depressed to come in and report about depression.
»E.g., an abused woman is not allowed to meet the interviewer.
»The missing data pattern is not ignorable.

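The three mechanisms can be compared with a small simulation. This is only a sketch under invented assumptions: the variables, coefficients, and missingness probabilities below are hypothetical and chosen for illustration. Here x stands for a fully observed covariate and y for an outcome that may go missing. Because MAR missingness in this sketch depends only on the sign of x, averaging y within the two x groups is enough to remove the bias; the same adjustment does not rescue the NMAR case.

```python
import random

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def simulate(n=20000, seed=13):
    """Draw (x, y) pairs: x is always observed, y may go missing."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0, 1)               # hypothetical standardized covariate
        y = 0.8 * x + rng.gauss(0, 0.6)   # hypothetical outcome related to x
        data.append((x, y))
    return data, rng

def delete(data, rng, p_missing):
    """Copy data, setting y to None with probability p_missing(x, y)."""
    return [(x, None if rng.random() < p_missing(x, y) else y) for x, y in data]

def complete_case_mean(rows):
    """Mean of y over the rows where y was observed."""
    return mean(y for _, y in rows if y is not None)

def stratified_mean(rows):
    """Mean of observed y within x < 0 and x >= 0, weighted by stratum size."""
    total = 0.0
    for in_stratum in (lambda x: x < 0, lambda x: x >= 0):
        stratum = [(x, y) for x, y in rows if in_stratum(x)]
        observed = [y for _, y in stratum if y is not None]
        total += mean(observed) * len(stratum) / len(rows)
    return total

data, rng = simulate()
true_mean = mean(y for _, y in data)

mcar = delete(data, rng, lambda x, y: 0.3)                    # ignores x and y
mar = delete(data, rng, lambda x, y: 0.6 if x < 0 else 0.1)   # depends on observed x
nmar = delete(data, rng, lambda x, y: 0.6 if y > 0 else 0.1)  # depends on y itself

for name, rows in (("MCAR", mcar), ("MAR", mar), ("NMAR", nmar)):
    print(name,
          "complete-case bias:", round(complete_case_mean(rows) - true_mean, 3),
          "stratified bias:", round(stratified_mean(rows) - true_mean, 3))
```

In runs of this sketch the complete-case mean should be close to the full-data mean only under MCAR, the stratified mean should also be close under MAR, and both should be off under NMAR.
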
Statistical approaches
Listwise deletion
»If a person is missing on any analysis variable, that person is dropped from the analysis.
Pairwise deletion
»Correlations/covariances are computed using all available pairs of data.
Imputation of missing data values
Model-based use of complete data
»E-M (expectation-maximization) approach
»Illustration in Excel

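To make the E-M idea concrete, here is a minimal sketch for the simplest case: two variables assumed to be bivariate normal, with x complete and y partly missing. It is not the Excel illustration from the lecture; the function name `em_bivariate` and the six data points are hypothetical. Each pass fills in the expected value of the missing y terms given x (E-step), then re-estimates the means, variances, and covariance from the filled-in sums (M-step).

```python
def em_bivariate(x, y, iterations=200):
    """EM estimates for a bivariate normal model of (x, y).

    x is complete; y holds None where the value is missing.
    Variances and covariances are maximum-likelihood (divide by n).
    """
    n = len(x)
    obs = [v for v in y if v is not None]
    # Starting values from the available data.
    mx = sum(x) / n
    sxx = sum((v - mx) ** 2 for v in x) / n
    my = sum(obs) / len(obs)
    syy = sum((v - my) ** 2 for v in obs) / len(obs)
    sxy = 0.0
    for _ in range(iterations):
        # E-step: expected y, y^2 and x*y for each case under current estimates.
        slope = sxy / sxx
        resid_var = syy - sxy ** 2 / sxx
        sum_y = sum_yy = sum_xy = 0.0
        for xi, yi in zip(x, y):
            if yi is None:
                ey = my + slope * (xi - mx)   # E[y | x]
                eyy = ey ** 2 + resid_var     # E[y^2 | x]
            else:
                ey, eyy = yi, yi ** 2
            sum_y += ey
            sum_yy += eyy
            sum_xy += xi * ey
        # M-step: update the parameters from the expected sums.
        my = sum_y / n
        syy = sum_yy / n - my ** 2
        sxy = sum_xy / n - mx * my
    return {"mean_x": mx, "mean_y": my, "var_x": sxx, "cov_xy": sxy, "var_y": syy}

# Hypothetical data: y is missing for the third and fifth cases.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.5, 2.0, None, 4.5, None, 5.0]
estimates = em_bivariate(x, y)
print(estimates)
```

Note that the E-step adds the residual variance to the expected y squared terms. Plugging in the predicted value alone would understate the variance of y.
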
Classic Cohen & Cohen advice
Create a dummy code for who has missing data (M)
»Find out what variables are related to missingness
Insert the mean or some other value for the missing values in the IV, and run the multiple regression with the full data plus variable M.
»Procedure has been criticized for underestimating variance
The current text reflects a compromise

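The data-preparation step of this advice can be sketched as follows. The helper names and the six IV values are hypothetical, and the regression itself is left out. The last line illustrates the variance criticism: after mean substitution, the IV has a smaller sample variance than the observed values alone.

```python
def add_missing_indicator(values):
    """Return (filled, m): a mean-filled copy of values and a 0/1 dummy M."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    filled = [fill if v is None else v for v in values]
    m = [1 if v is None else 0 for v in values]
    return filled, m

def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

# Hypothetical IV with two missing values.
iv = [2.0, 4.0, None, 8.0, None, 6.0]
filled, m = add_missing_indicator(iv)

print("filled IV:", filled)
print("M dummy:  ", m)
print("variance of observed values:", sample_variance([v for v in iv if v is not None]))
print("variance after mean filling:", sample_variance(filled))
```

Both `filled` and `m` would then enter the regression as predictors, so the coefficient on M captures any difference in the outcome between cases with and without the missing value.
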
Case Study: Depression Following Miscarriage
»Neugebauer et al. (1992) American Journal of Public Health, 82,
»Neugebauer et al. (1992) American Journal of Obstetrics and Gynecology, 166,
Neugebauer and his colleagues recruited women who sought treatment for miscarriages and measured their levels of depression at 2 weeks, 6 weeks, and 26 weeks post-miscarriage.
The study built on a case-control study of the causes of miscarriage that successfully recruited nearly 80% of eligible women.
Neugebauer and his colleagues enrolled approximately 85% of the women in the initial study. A total of 382 women were initially enrolled.

Neugebauer Missing Data
Some women were not available in the first two weeks following miscarriage, and others were not available in the subsequent two-week windows for follow-up measurement.
Only 166 women were measured at all three time points.
Missing observations were not related to:
»SES, ethnicity, parity (# of pregnancies), # of previous miscarriages
Those with missing observations were:
»Somewhat younger, with fewer living children.

Pattern of missing data
[Table not recovered. Columns: 2 Wk, 6 Wk, 26 Wk, N]

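A table of this kind can be built by counting how many cases share each observed/missing pattern across the three waves. The sketch below uses made-up records, not the Neugebauer data, and the "X"/"." coding is an assumption for display. The same counts show how many cases listwise deletion would keep and how many pairs are available for each pairwise correlation.

```python
from collections import Counter

waves = ("2 Wk", "6 Wk", "26 Wk")

# Hypothetical depression scores, NOT the Neugebauer data.
# None marks a missed interview.
records = [
    (12, 10, 8),
    (15, None, 9),
    (None, 11, 7),
    (20, 18, None),
    (9, 8, 6),
    (None, None, 5),
    (14, None, 10),
]

def pattern(record):
    """Encode a record as a tuple of 'X' (observed) or '.' (missing)."""
    return tuple("X" if v is not None else "." for v in record)

counts = Counter(pattern(r) for r in records)

# One row per pattern, with the number of cases showing that pattern.
print("  ".join(waves), "  N")
for pat, n in sorted(counts.items(), reverse=True):
    print("     ".join(pat), "   ", n)

# Listwise deletion keeps only the fully observed pattern.
listwise_n = counts[("X", "X", "X")]

# Pairwise deletion uses every case observed on both waves of a pair.
pairwise_n = {
    (waves[i], waves[j]): sum(r[i] is not None and r[j] is not None for r in records)
    for i in range(3)
    for j in range(i + 1, 3)
}
print("listwise N:", listwise_n)
print("pairwise N:", pairwise_n)
```
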
Means for different groups