Does cognitive ability in childhood predict fertility?


Does cognitive ability in childhood predict fertility? An example of the use and validity of the Missing Data Strategy
Brian Dodgeon, Tarek Mostafa & George B. Ploubidis

Outline
- The 'Missing at Random' (MAR) assumption underlies many methods for dealing with missing data.
- But can we rule out the possibility that patterns of missingness in the NCDS (National Child Development Study) are Missing Not at Random (MNAR)?
- Major causes of missingness are perhaps not included in the models.
- We use an example to test how multiple imputation (MI) performs when the model is intentionally mis-specified.
- Sensitivity analysis to compare MAR and MNAR scenarios.
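The difference between MCAR, MAR and MNAR can be made concrete with a small simulation. This is a hypothetical sketch on synthetic data, not NCDS: `x` stands in for a fully observed auxiliary variable, `y` for a partially observed variable with true mean 0, and the missingness probabilities are arbitrary choices. Under MAR, an estimate that conditions on `x` (here a simple regression estimate, standing in for MI) removes the complete-case bias; under MNAR it does not.

```python
import random
import statistics


def ols(xs, ys):
    """Ordinary least squares fit of ys on xs; returns (intercept, slope)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate(mechanism, n=20000, seed=1):
    """Simulate y (true mean 0) with missingness generated by `mechanism`.

    Returns (complete-case mean of y, MAR-based estimate that uses x).
    """
    rng = random.Random(seed)
    xs, obs_x, obs_y = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)              # fully observed auxiliary variable
        y = 0.6 * x + rng.gauss(0, 0.8)  # variable of interest, correlated with x
        if mechanism == "MCAR":
            p_miss = 0.3                     # unrelated to the data
        elif mechanism == "MAR":
            p_miss = 0.5 if x < 0 else 0.1   # depends only on the observed x
        elif mechanism == "MNAR":
            p_miss = 0.5 if y < 0 else 0.1   # depends on the unobserved y itself
        else:
            raise ValueError(mechanism)
        xs.append(x)
        if rng.random() >= p_miss:
            obs_x.append(x)
            obs_y.append(y)
    cc_mean = statistics.fmean(obs_y)
    # Fit y ~ x on respondents, then average the predictions over everyone
    a, b = ols(obs_x, obs_y)
    mar_mean = a + b * statistics.fmean(xs)
    return cc_mean, mar_mean


for mech in ("MCAR", "MAR", "MNAR"):
    cc, adj = simulate(mech)
    print(f"{mech}: complete-case mean {cc:+.3f}, MAR-adjusted mean {adj:+.3f}")
```

Under MNAR both estimates stay biased, which is why sensitivity analyses that depart from MAR are needed.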

Example research project
Does cognitive function at age 11 predict childlessness at age 42?
- At age 11, N=14095 cohort members took the cognitive tests.
- By age 42, we know whether or not they are childless for only N=11419 cohort members (not all of whom were present at age 11).
- We control for important childhood predictors:
  - Birthweight
  - Breastfed or not
  - Parental social class and education
  - Mother smoking prior to pregnancy
  - Mother working before child aged 5

Example research project – Preliminary analysis
- First, using Complete Case Analysis (CCA, N=6947), we see a U-shaped pattern of childlessness at age 42: those with the lowest and highest scores are most likely to be childless.
- So we divide the age-11 cognitive test results into 3 categories:
  - Low: 1 SD or more below the mean
  - Middle: within 1 SD of the mean (mean ± 1 SD)
  - High: 1 SD or more above the mean
- Both the low and high categories have a positive association with childlessness at age 42, but is this result biased? (Perhaps those with low cognitive scores are more likely to drop out by age 42?)
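The three-way split can be sketched as follows. This assumes the cut-offs use the mean and sample standard deviation of the observed age-11 scores; the slides do not say whether scores were standardised first, and `cognition_band` is a hypothetical helper.

```python
import statistics


def cognition_band(scores):
    """Classify each score as 'Low', 'Middle' or 'High' relative to the sample.

    Low: at least 1 SD below the mean; High: at least 1 SD above the mean;
    Middle: everything in between. Missing scores (None) stay None.
    """
    observed = [s for s in scores if s is not None]
    mean = statistics.fmean(observed)
    sd = statistics.stdev(observed)  # sample standard deviation (assumption)
    bands = []
    for s in scores:
        if s is None:
            bands.append(None)
        elif s <= mean - sd:
            bands.append("Low")
        elif s >= mean + sd:
            bands.append("High")
        else:
            bands.append("Middle")
    return bands


print(cognition_band([10, 20, 30, 40, 50, None]))
# -> ['Low', 'Middle', 'Middle', 'Middle', 'High', None]
```

Using categories rather than a linear score lets the probit model pick up the U-shape without assuming a functional form.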

Example research project – exploring missingness
We compare four specifications for dealing with missing values, using a set of auxiliary variables which predict missingness:
- Mean replacement (N=11419)
- Multiple imputation (MI) with the outcome at 42 not imputed (N=10280)
- MI with the outcome imputed (N=15800)
- MI with the outcome imputed, using variables up to age 16 only (N=15800) (predictors of missingness from sweeps after age 16 not used)

Objectives
Objective 1: We give examples of how the identified predictors of wave-specific non-response should be used.
Objective 2: We intentionally mis-specify those same approaches as a sensitivity analysis:
- Abandon the best predictors of non-response (i.e. the data are now MNAR?)
- See how the results hold up compared with Objective 1

Cognition at age 11 by childlessness at age 42: probit coefficients/95% CIs using distinct missing data strategies – Objective 1
CCA assumes that cases not present are Missing Completely at Random (MCAR). But this assumption ignores known patterns of attrition bias. We find a significant positive association with childlessness at 42 for both high and low cognition, compared with the 'middle' range.
[Chart: probit coefficients (95% CI) for low and high cognitive ability, by strategy: Complete Case Analysis, MCAR (N=6947)]
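The probit coefficients on these slides can be read against a model of the following form. This is a sketch under the assumption that 'middle' cognition is the reference category and that the childhood controls listed earlier enter linearly; $\mathbf{z}_i$ is our notation for those controls, and the interval shown is the usual Wald interval.

```latex
\Pr(\text{childless}_i = 1 \mid \text{Low}_i, \text{High}_i, \mathbf{z}_i)
  = \Phi\!\left(\beta_0 + \beta_{\text{low}}\,\text{Low}_i
  + \beta_{\text{high}}\,\text{High}_i + \boldsymbol{\gamma}^{\top}\mathbf{z}_i\right),
\qquad
\text{95\% CI: } \hat\beta \pm 1.96\,\widehat{\mathrm{SE}}(\hat\beta)
```

Here $\Phi$ is the standard normal CDF and $\text{Low}_i$, $\text{High}_i$ are 0/1 indicators. A positive $\beta_{\text{low}}$ (or $\beta_{\text{high}}$) means a higher probability of childlessness than in the middle band, holding the controls fixed. Under MI, $\widehat{\mathrm{SE}}$ would come from combining the imputed datasets rather than from a single fit.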

Cognition at age 11 by childlessness at age 42: probit coefficients/95% CIs using distinct missing data strategies – Objective 1
Using the simple imputation method of mean replacement (or a 'missing' category for categorical variables), we gain statistical power (the CIs are narrower). But mean replacement ignores more subtle patterns of variation.
[Chart: probit coefficients (95% CI) for low and high cognitive ability, by strategy: Complete Case Analysis, MCAR (N=6947); mean replacement/missing category (N=11419)]
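Mean replacement and the 'missing' category can be sketched in a few lines. The helper names and the `"Missing"` label are hypothetical, and the birthweights are made-up values in grams.

```python
import statistics

MISSING = "Missing"  # label for the extra category (illustrative)


def mean_replace(values):
    """Replace None in a continuous variable with the mean of observed values."""
    observed = [v for v in values if v is not None]
    m = statistics.fmean(observed)
    return [m if v is None else v for v in values]


def missing_category(values):
    """Replace None in a categorical variable with an explicit 'Missing' level."""
    return [MISSING if v is None else v for v in values]


birthweight = [3200, None, 3600, 2800]
print(mean_replace(birthweight))  # -> [3200, 3200.0, 3600, 2800]
print(missing_category(["manual", None, "non-manual"]))
```

Every filled-in value sits exactly at the centre of the distribution, so the spread shrinks: in this toy example the standard deviation drops from 400 to about 327. That is one concrete way mean replacement "ignores more subtle patterns of variation".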

Cognition at age 11 by childlessness at age 42: probit coefficients/95% CIs using distinct missing data strategies – Objective 1
Multiple Imputation by Chained Equations (MICE) models more complex patterns of variation. By not imputing the outcome (childlessness), we have N=10280, the number of cases for whom we have an outcome.
[Chart: probit coefficients (95% CI) for low and high cognitive ability, by strategy: Complete Case Analysis, MCAR (N=6947); mean replacement/missing category (N=11419); MI, outcome not imputed (N=10280)]
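The chained-equations idea can be illustrated with two continuous variables. This is a deliberately simplified sketch, not the authors' imputation model: each step regresses one variable on the other among cases where it was observed, then refills its gaps with a prediction plus normal noise. Full MICE implementations also draw the regression parameters themselves (so-called proper imputation) and use logit or probit models for binary variables such as childlessness. All function names and data here are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """OLS of ys on xs; returns (intercept, slope, residual SD)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (ss_res / (len(xs) - 2)) ** 0.5


def impute_step(target, predictor, missing, rng):
    """Regress target on predictor where target was observed; redraw its missing values."""
    rows = [i for i, m in enumerate(missing) if not m]
    c, s, sd = fit_line([predictor[i] for i in rows], [target[i] for i in rows])
    return [c + s * predictor[i] + rng.gauss(0, sd) if missing[i] else target[i]
            for i in range(len(target))]


def chained_impute(a, b, n_iter=10, seed=0):
    """Return one completed copy of two paired variables with None gaps filled."""
    rng = random.Random(seed)
    miss_a = [v is None for v in a]
    miss_b = [v is None for v in b]
    # Initialise each variable's gaps with its observed mean
    mean_a = statistics.fmean([v for v in a if v is not None])
    mean_b = statistics.fmean([v for v in b if v is not None])
    a = [mean_a if m else v for v, m in zip(a, miss_a)]
    b = [mean_b if m else v for v, m in zip(b, miss_b)]
    for _ in range(n_iter):
        a = impute_step(a, b, miss_a, rng)  # a given b
        b = impute_step(b, a, miss_b, rng)  # b given the updated a
    return a, b


# Multiple imputation: repeat with different seeds to get m completed datasets
a_obs = [1, 2, 3, 4, 5, None, 7, 8]
b_obs = [2, 4, None, 8, 10, 12, 14, 16]
completed = [chained_impute(a_obs, b_obs, seed=k) for k in range(5)]
```

The analysis model (here, the probit) is then fitted to each completed dataset and the results are combined.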

Cognition at age 11 by childlessness at age 42: probit coefficients/95% CIs using distinct missing data strategies – Objective 1
By imputing the outcome (childlessness at 42), we gain more statistical power (N=15800).
[Chart: probit coefficients (95% CI) for low and high cognitive ability, by strategy: Complete Case Analysis, MCAR (N=6947); mean replacement/missing category (N=11419); MI, outcome not imputed (N=10280); MI, outcome imputed (N=15800)]
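Each MI strategy yields one probit coefficient and standard error per imputed dataset; these are conventionally combined with Rubin's rules. The sketch below assumes that convention and uses a normal approximation (1.96) for the interval, whereas software usually uses a t reference distribution. `pool_rubin` is a hypothetical helper and the numbers are made up, not results from this study.

```python
import math
import statistics


def pool_rubin(estimates, std_errors, z=1.96):
    """Combine m per-imputation estimates with Rubin's rules.

    Returns (pooled estimate, total SE, (lower, upper)).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)                   # pooled point estimate
    w = statistics.fmean([se ** 2 for se in std_errors])  # within-imputation variance
    b = statistics.variance(estimates)                    # between-imputation variance
    total = w + (1 + 1 / m) * b                           # total variance
    se = math.sqrt(total)
    return q_bar, se, (q_bar - z * se, q_bar + z * se)


# Made-up coefficients for 'Low cog ability' from m = 5 imputed datasets
est, se, ci = pool_rubin([0.20, 0.22, 0.18, 0.21, 0.19], [0.05] * 5)
print(round(est, 3), round(se, 4), [round(x, 3) for x in ci])
```

The between-imputation term $b$ is what single imputation such as mean replacement leaves out, which is one reason its intervals look artificially narrow.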

Cognition at age 11 by childlessness at age 42: probit coefficients/95% CIs using distinct missing data strategies – Objective 1
Using MI with the outcome imputed, but restricting the predictors of missingness to variables from sweeps up to age 16, we maximise statistical power (N=16013) and obtain narrower confidence intervals than with the other methods.
[Chart: probit coefficients (95% CI) for low and high cognitive ability, by strategy: Complete Case Analysis, MCAR (N=6947); mean replacement/missing category (N=11419); MI, outcome not imputed (N=10280); MI, outcome imputed (N=15800); MI, outcome imputed, variables up to age 16 (N=16013)]
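Restricting the imputation model to auxiliary variables from early sweeps amounts to filtering the variable list by the age at which each variable was collected. The catalogue below is hypothetical: the variable names are invented for illustration, although the ages follow NCDS sweeps.

```python
# Hypothetical catalogue: auxiliary variable name -> age (sweep) at which it was measured
AUXILIARY_SWEEP_AGE = {
    "birthweight": 0,
    "reading_score_7": 7,
    "teacher_rating_11": 11,
    "housing_tenure_16": 16,
    "employment_23": 23,
    "malaise_33": 33,
}


def auxiliaries_up_to(age_limit, catalogue=AUXILIARY_SWEEP_AGE):
    """Return auxiliary variables measured at or before `age_limit`, in sweep order."""
    eligible = [(age, name) for name, age in catalogue.items() if age <= age_limit]
    return [name for age, name in sorted(eligible)]


print(auxiliaries_up_to(16))
```

The rationale on the next slide is that response up to age 16 is still high, so these early variables are themselves largely observed.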

Discussion of results (Objective 1)
- The imputation process gives us more statistical power and produces results similar to those of Complete Case Analysis.
- We prefer the 'MI with outcome imputed' approach.
- Our 'benchmark' for Objective 2 will be MI with the outcome imputed, using predictors of non-response from sweeps up to age 16.
- This approach embodies fewer assumptions, since the responding population up to age 16 is still large (about a 10% reduction from the birth population, taking deaths into account).

Cognition at age 11 by childlessness at age 42: probit coefficients/95% CIs – Objective 2
[Chart: probit coefficients (95% CI) for low and high cognitive ability, by strategy: MI, outcome imputed, variables up to age 16 (N=16013)]

Cognition at age 11 by childlessness at age 42: probit coefficients/95% CIs – Objective 2
[Chart: probit coefficients (95% CI) for low and high cognitive ability, by strategy: MI, outcome imputed, variables up to age 16 (N=16013); MI removing the most important predictor of response at age 11 (N=16086)]

Cognition at age 11 by childlessness at age 42: probit coefficients/95% CIs – Objective 2
[Chart: probit coefficients (95% CI) for low and high cognitive ability, by strategy: MI, outcome imputed, variables up to age 16 (N=16013); MI removing the most important predictor of response at age 11 (N=16086); MI removing the 2 most important predictors of response at age 42, from sweeps up to age 16 (N=16082)]
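The mis-specified scenarios drop the strongest predictors of response from the imputation model. The slides do not say how 'most important' was measured; the sketch below assumes a simple ranking by absolute correlation with a 0/1 response indicator, and `pearson`, `drop_top_predictors` and the toy data are all hypothetical.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def drop_top_predictors(auxiliaries, responded, k):
    """Rank auxiliary variables by |correlation| with response and drop the top k.

    auxiliaries: dict name -> list of values (fully observed here for simplicity)
    responded: list of 0/1 indicators of response at the target sweep
    Returns (dropped names, retained names).
    """
    ranked = sorted(auxiliaries,
                    key=lambda name: abs(pearson(auxiliaries[name], responded)),
                    reverse=True)
    return ranked[:k], ranked[k:]


aux = {"weak": [1, 2, 3, 1, 2, 3], "strong": [5, 6, 5, 1, 2, 1], "medium": [3, 2, 3, 2, 3, 1]}
dropped, kept = drop_top_predictors(aux, [1, 1, 1, 0, 0, 0], 2)
print("dropped:", dropped, "kept:", kept)
```

Re-running MI with only the retained variables approximates an MNAR situation from the imputation model's point of view, since the dropped variables still drive response.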

Cognition at age 11 by childlessness at age 42: probit coefficients/95% CIs – Objective 2
The Heckman probit is a selection model with an MNAR assumption.
[Chart: probit coefficients (95% CI) for low and high cognitive ability, by strategy: MI, outcome imputed, variables up to age 16 (N=16013); MI removing the most important predictor of response at age 11 (N=16086); MI removing the 2 most important predictors of response at age 42, from sweeps up to age 16 (N=16082); Heckman probit (N=12038)]
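The probit model with sample selection can be written as a pair of latent-variable equations. This is the standard textbook form, sketched here with our own notation: $R_i$ is response at age 42, $Y_i$ is childlessness, and $\mathbf{w}_i$, $\mathbf{x}_i$ are the selection and outcome covariates.

```latex
\begin{aligned}
R_i^* &= \mathbf{w}_i^{\top}\boldsymbol{\gamma} + u_i, &\quad R_i &= \mathbf{1}[R_i^* > 0]\\
Y_i^* &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i, &\quad Y_i &= \mathbf{1}[Y_i^* > 0] \text{ (observed only if } R_i = 1)\\
\begin{pmatrix} u_i \\ \varepsilon_i \end{pmatrix} &\sim
\mathcal{N}\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)
\end{aligned}
```

If $\rho = 0$, response is unrelated to the unobserved part of the outcome, and an ordinary probit on respondents suffices. If $\rho \neq 0$, response depends on unobserved determinants of childlessness, which is an MNAR mechanism. The estimates lean on bivariate normality and, in practice, on an exclusion restriction (a variable in $\mathbf{w}_i$ but not in $\mathbf{x}_i$); this is why the model can be sensitive to subtle departures from its assumptions.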

Conclusion
- In all scenarios the non-linear trend is confirmed (the lowest and highest cognition at age 11 predict childlessness).
- Mis-specified MI models return results with a similar substantive interpretation.
- Heckman selection returns somewhat different results.
- But the Heckman model is sensitive even to subtle departures from its assumptions.
- If the Heckman model is correct, some form of selection is not captured by MI, but most scenarios (even MNAR) do not conform with the Heckman results.

Future work
- Formal sensitivity analysis with pattern mixture models to further probe the MAR assumption (Carpenter & Kenward, 2012)
- Directed Acyclic Graphs (M-graphs) to identify "colliders" among the predictors of response
- Test the performance of different sets of identified predictors of response

Carpenter, J.R. & Kenward, M.G. (2012) Multiple Imputation and its Application. Wiley, Chichester.
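One common form of pattern-mixture sensitivity analysis is delta adjustment: non-respondents' outcomes are imputed as under MAR, then shifted by a chosen amount $\delta$, and the analysis is repeated over a range of $\delta$. The sketch below is a simplified, hypothetical version for a binary outcome: it shifts MAR probit-scale linear predictors for non-respondents and computes the expected overall prevalence directly, instead of drawing multiple imputations and refitting the analysis model. All inputs are made up.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF (probit link)


def delta_adjusted_prevalence(observed_outcomes, missing_linear_predictors, delta):
    """Expected prevalence of a binary outcome under a delta-adjusted MNAR scenario.

    observed_outcomes: 0/1 outcomes for respondents
    missing_linear_predictors: probit-scale linear predictors for non-respondents,
        as estimated under MAR by the imputation model (hypothetical inputs)
    delta: shift on the probit scale for non-respondents (delta = 0 recovers MAR)
    """
    expected_missing = [PHI(eta + delta) for eta in missing_linear_predictors]
    n = len(observed_outcomes) + len(expected_missing)
    return (sum(observed_outcomes) + sum(expected_missing)) / n


observed = [1, 0, 0, 1, 0]   # made-up respondent outcomes
etas = [-0.3, 0.1, -0.8]     # made-up MAR linear predictors for non-respondents
for delta in (-0.5, 0.0, 0.5):
    print(delta, round(delta_adjusted_prevalence(observed, etas, delta), 3))
```

If substantive conclusions survive plausible values of $\delta$, the MAR-based results are less likely to be an artefact of the MAR assumption.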