Analysis of missing responses to the sexual experience question in evaluation of an adolescent HIV risk reduction intervention Yu-li Hsieh, Barbara L. Dancy and Kathleen S. Crittenden University of Illinois at Chicago
Background What are missing data? Non-responses in data collection for quantitative research, where respondents (A) refuse to answer, (B) answer that they “don’t know”, (C) have a valid skip, or (D) have their responses lost through error. Missing discussion of Missing Data Analysis: how to reconcile methodological and substantive or theoretical concerns when handling missing data empirically.
Background Building Blocks of Missing Data Analysis 1. Missing Data Patterns (columns: variables y1 to y5; rows: cases) A. Univariate/multivariate pattern B. Monotone pattern C. General pattern
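The three pattern types can be told apart mechanically from a cases-by-variables table. The sketch below is only an illustration, not the authors' procedure: the function names are hypothetical, missing values are assumed to be coded as `None`, and monotonicity is checked only for the column order as given (real data may need the columns reordered first).

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]


def missing_mask(rows: List[Row]) -> List[List[bool]]:
    """True where a value is missing (coded as None, an assumption)."""
    return [[value is None for value in row] for row in rows]


def is_monotone(rows: List[Row]) -> bool:
    """Monotone in the given column order: once a case has a missing
    value, every later variable is missing for that case as well."""
    for row in missing_mask(rows):
        seen_missing = False
        for is_missing in row:
            if seen_missing and not is_missing:
                return False
            seen_missing = seen_missing or is_missing
    return True


def classify_pattern(rows: List[Row]) -> str:
    mask = missing_mask(rows)
    n_cols = len(mask[0])
    incomplete_cols = [j for j in range(n_cols) if any(r[j] for r in mask)]
    if not incomplete_cols:
        return "complete"
    # Univariate/multivariate: every incomplete case is missing exactly
    # the same variable or block of variables.
    incomplete_cases = {tuple(r) for r in mask if any(r)}
    if len(incomplete_cases) == 1:
        return "univariate" if len(incomplete_cols) == 1 else "multivariate"
    return "monotone" if is_monotone(rows) else "general"


# Hypothetical toy tables with three variables each.
univariate = [[1, 2, 3], [1, 2, None], [4, 5, None]]
monotone = [[1, 2, 3], [1, 2, None], [1, None, None]]
general = [[1, None, 3], [None, 2, 3], [1, 2, None]]

labels = [classify_pattern(t) for t in (univariate, monotone, general)]
print(labels)
```

Longitudinal dropout typically produces the monotone case, while item non-response scattered across waves produces the general case.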
Background Building Blocks of Missing Data Analysis 2. Missing Data Mechanisms Suppose there are two possibly related variables X and Y, with X fully observed and some values of Y missing. Missingness can be represented as a dummy variable R indicating whether Y is missing. (A) Missing Completely At Random (MCAR): R is unrelated to both X and Y. (B) Missing At Random (MAR): R depends on X but, given X, not on Y. (C) Missing Not At Random (MNAR): R depends on Y itself, even after conditioning on X.
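One informal way to probe the mechanism is to compare the fully observed X between cases with and without a missing Y. The sketch below assumes R is coded 1 when Y is missing; the function name and the toy data are hypothetical. A gap near zero is consistent with MCAR with respect to X but does not prove it, and a large gap speaks against MCAR. MAR and MNAR cannot be separated this way, because Y is unobserved exactly where R = 1.

```python
from statistics import mean, stdev
from typing import List


def standardized_gap(x: List[float], r: List[int]) -> float:
    """Difference in mean X between cases with Y missing (r == 1) and
    cases with Y observed (r == 0), in units of the overall SD of X."""
    x_missing = [xi for xi, ri in zip(x, r) if ri == 1]
    x_observed = [xi for xi, ri in zip(x, r) if ri == 0]
    return (mean(x_missing) - mean(x_observed)) / stdev(x)


# Hypothetical toy data: X is fully observed for eight cases.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
r_unrelated = [1, 0, 0, 1, 0, 1, 1, 0]  # missingness spread evenly over X
r_related = [0, 0, 0, 0, 0, 1, 1, 1]    # missingness concentrated at high X

gap_unrelated = standardized_gap(x, r_unrelated)
gap_related = standardized_gap(x, r_related)
print(gap_unrelated, gap_related)
```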
Background Building Blocks of Missing Data Analysis 3. Implications of Missing Data Mechanisms (A) Missing Completely At Random (MCAR): estimates are not biased, although the smaller sample reduces precision. (B) Missing At Random (MAR): possible biases can, and have to, be adjusted and controlled for. (C) Missing Not At Random (MNAR): real effects are unknown and potentially problematic.
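A small simulation can make these implications concrete. Everything in the sketch below is an assumption chosen for illustration: the data-generating model (true mean of Y is 0), the logistic missingness probabilities, and the use of a simple regression adjustment on X as the stand-in for "adjusting and controlling". It is not the analysis reported in this presentation.

```python
import math
import random


def simulate(n, mechanism, seed=1):
    """Return (x, y, r) triples; r == 1 marks Y as missing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 0.6 * x + rng.gauss(0.0, 0.8)
        if mechanism == "MCAR":
            p = 0.3                                    # ignores X and Y
        elif mechanism == "MAR":
            p = 1.0 / (1.0 + math.exp(-(x - 0.85)))    # depends on X only
        elif mechanism == "MNAR":
            p = 1.0 / (1.0 + math.exp(-(y - 0.85)))    # depends on Y itself
        else:
            raise ValueError(mechanism)
        rows.append((x, y, 1 if rng.random() < p else 0))
    return rows


def complete_case_mean(rows):
    """Mean of Y using only the cases where Y is observed."""
    ys = [y for _, y, r in rows if r == 0]
    return sum(ys) / len(ys)


def regression_adjusted_mean(rows):
    """Fit Y on X by least squares in the observed cases, then predict
    at the mean of X over all cases."""
    obs = [(x, y) for x, y, r in rows if r == 0]
    mx = sum(x for x, _ in obs) / len(obs)
    my = sum(y for _, y in obs) / len(obs)
    sxx = sum((x - mx) ** 2 for x, _ in obs)
    sxy = sum((x - mx) * (y - my) for x, y in obs)
    slope = sxy / sxx
    intercept = my - slope * mx
    x_all = sum(x for x, _, _ in rows) / len(rows)
    return intercept + slope * x_all


results = {}
for mechanism in ("MCAR", "MAR", "MNAR"):
    data = simulate(20000, mechanism)
    results[mechanism] = (complete_case_mean(data),
                          regression_adjusted_mean(data))
    print(mechanism, results[mechanism])
```

In this setup the complete-case mean stays close to the truth under MCAR, drifts under MAR until X is taken into account, and remains off under MNAR even after the adjustment.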
Background Procedures for Missing Data Analysis 1. Traditional Approaches (A) Complete-case Analysis (listwise deletion) (B) Available-case Analysis (pairwise deletion) (C) Weight Adjustment for Complete Cases (D) Including Indicator Variables (E) Single Imputation Methods Most of them either generate biased estimates unless the mechanism is MCAR and the pattern is univariate or monotone, or are very difficult to run and interpret.
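Three of the traditional approaches can be compared on a toy table. The sketch below uses hypothetical data and function names, with missing values coded as `None`. It shows that listwise and available-case analysis can give different means from the same data, and that single mean imputation shrinks the variance of the imputed variable.

```python
from statistics import mean, variance

# Hypothetical toy data: two variables, five cases.
rows = [(1.0, 2.0), (2.0, None), (3.0, 6.0), (None, 8.0), (5.0, 10.0)]


def listwise(rows):
    """Complete-case analysis: keep only cases with no missing value."""
    return [row for row in rows if None not in row]


def column_means(rows):
    """Mean of each variable over whatever values are present."""
    n_cols = len(rows[0])
    return tuple(
        mean(row[j] for row in rows if row[j] is not None)
        for j in range(n_cols)
    )


def mean_impute(values):
    """Single imputation: replace each missing value by the observed mean."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


listwise_means = column_means(listwise(rows))   # complete cases only
available_means = column_means(rows)            # all available values

y = [row[1] for row in rows]
var_observed = variance([v for v in y if v is not None])
var_imputed = variance(mean_impute(y))
print(listwise_means, available_means, var_observed, var_imputed)
```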
Background Procedures for Missing Data Analysis 2. Modern Approaches (A) Maximum Likelihood estimation (ML) (B) Multiple Imputation (MI) Both require large samples and work best under MAR. Advantages of Multiple Imputation: Efficiency: 5 to 10 imputations are enough. Not very sensitive to missing data mechanisms.
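After the imputed datasets are analysed separately, the results are usually combined with Rubin's rules, and the same framework explains why a handful of imputations is often enough. The sketch below is a minimal illustration: the function names and the five estimates are hypothetical, and `gamma` stands for the fraction of missing information.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m point estimates and their squared standard errors."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, sqrt(total)


def relative_efficiency(m, gamma):
    """Efficiency of m imputations relative to infinitely many."""
    return 1.0 / (1.0 + gamma / m)


# Hypothetical coefficient estimates from m = 5 imputed datasets.
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
variances = [0.04] * 5

pooled_estimate, pooled_se = pool_rubin(estimates, variances)
eff_5 = relative_efficiency(5, 0.3)
eff_10 = relative_efficiency(10, 0.3)
print(pooled_estimate, pooled_se, eff_5, eff_10)
```

With 30% missing information, five imputations already recover most of the attainable efficiency, which is the reasoning behind the 5 to 10 rule of thumb.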
Empirical Application Research Design and Data (A) HIV/AIDS risk reduction intervention evaluation (B) Community-based, quasi-experimental, and longitudinal (C) Low-income African American female adolescents (D) Intervention groups: Mother/Daughter Risk Reduction (MDRR), Health Experts Risk Reduction (HERR), Mother/Daughter Health Promotion (MDHP) (E) Outcome variables: Sexual behaviors etc.
Empirical Application Substantive arguments related to missing data (A) Inconsistency in self-reported risk behaviors of adolescents (B) Conservatively treating missing data in the key outcome as a failure Missing Data Problem (A) Missing responses occur disproportionately in MDHP (B) Girls who declined to answer the key sexual behavior questions have a higher-risk profile than those who say “Yes”
Empirical Application 1. Missing Data Pattern-1 Mixed (monotone and general) pattern with five sub-patterns a. The frequency of the first pattern refers to complete cases without any missing value. b. The frequency of the last pattern refers to cases that dropped out prior to T2 or T3, respectively.
Empirical Application 1. Missing Data Pattern-2 Mixed (monotone and general) pattern with five sub-patterns a. The frequency of the first pattern refers to complete cases without any missing value. b. The frequency of the last pattern refers to cases that dropped out prior to T2 or T3, respectively.
Empirical Application 1. Missing Data Pattern-3 Mixed (monotone and general) pattern with five sub-patterns a. The frequency of the first pattern refers to complete cases without any missing value. b. The frequency of the last pattern refers to cases that dropped out prior to T2 or T3, respectively.
Empirical Application 2. Missing Data Mechanism (A) Dropped-out cases: MAR but very close to MCAR. General causes of dropout may not apply. (B) Missingness in sexual behavior variables: MNAR. Inconsistent reporting, social desirability, etc. (C) Missingness in self-efficacy variables: MAR. Only a very small number of cases; more information from later waves is required to clarify.
Empirical Application 3. Our Missing Data Treatment (A) Conservatively recode the missing values in the recent sexual activity variable as affirmative responses. (B) Create a categorical variable (response group) to indicate the missingness of the sexual experience variable: theoretical grounding; empirical and technical rationales. (C) Create another dummy variable for “have at least one missing value across three time points”. (D) Exclude the cases that provided missing values in self-efficacy variables.
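Steps (A) through (D) can be sketched as a small recoding routine. All field names, the 0/1 coding, the "yes"/"no"/"missing" category labels, and the toy records below are assumptions made for illustration; the actual variable definitions in the study are not given on the slide.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def apply_treatment(records: List[Record]) -> List[Record]:
    """Apply steps (A)-(D) to records holding three waves (T1, T2, T3).
    Missing values are assumed to be coded as None."""
    treated = []
    for rec in records:
        # (D) Exclude cases with any missing self-efficacy value.
        if any(v is None for v in rec["self_efficacy"]):
            continue
        new = dict(rec)
        # (A) Conservatively recode missing recent activity as affirmative.
        new["recent_activity"] = [
            1 if v is None else v for v in rec["recent_activity"]
        ]
        # (B) Response group for the sexual experience question per wave.
        new["response_group"] = [
            "missing" if v is None else ("yes" if v == 1 else "no")
            for v in rec["sexual_experience"]
        ]
        # (C) Dummy: at least one missing value across the three waves.
        new["any_missing"] = int(
            any(v is None for v in rec["sexual_experience"])
        )
        treated.append(new)
    return treated


# Hypothetical toy records.
records = [
    {"id": 1, "sexual_experience": [0, None, 1],
     "recent_activity": [0, None, 1], "self_efficacy": [3.0, 3.5, 4.0]},
    {"id": 2, "sexual_experience": [0, 0, 0],
     "recent_activity": [0, 0, 0], "self_efficacy": [2.0, 2.5, 3.0]},
    {"id": 3, "sexual_experience": [1, 1, 1],
     "recent_activity": [1, 0, 1], "self_efficacy": [None, 2.0, 2.0]},
]

treated = apply_treatment(records)
print(treated)
```

Keeping the response group and the any-missing dummy as explicit variables lets missingness enter the outcome models as a predictor instead of being imputed away.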
Empirical Application 4. Comparison between MI and our treatment a. All p values are two-tailed: + p<.1, *p<.05, **p<.01. b. RR refers to the Risk Reduction intervention group (1=RR); the reference group is MDHP. c. Missing Response is “have at least one missing value on sexual experience question at T1, T2 or T3” (1=Missing Response); the reference group is those who provided valid responses at all three time points.
Concluding Remarks 1. Investigating the patterns and interpreting the mechanisms can still be tricky. 2. There is no “best” missing data procedure. 3. Make decisions informed by substantive insights. 4. If possible, researchers should apply both traditional and modern procedures to acquire more information and maximize the quality of results.