Presentation is loading. Please wait.

Presentation is loading. Please wait.

Understanding Non Response in the 1958 Birth Cohort

Similar presentations


Presentation on theme: "Understanding Non Response in the 1958 Birth Cohort"— Presentation transcript:

1 Understanding Non Response in the 1958 Birth Cohort
Tarek Mostafa, Brian Dodgeon & George B. Ploubidis

2 Outline Introduction to the National Child Development Study (NCDS)
Non-response in NCDS The three stages strategy Findings

3 The National Child Development Study (NCDS)
The NCDS follows the lives of all people born in England, Scotland and Wales in one week of March 1958 The target sample was augmented to include immigrants born in the same week, in the first three sweeps (ages 7, 11 and 16) NCDS experienced attrition over time. Attrition is the discontinued participation of some individuals in a longitudinal survey

4 Response in NCDS

5 Non response in NCDS Types of non-response Wave 0 Wave 1 Wave 2 Wave 3
Birth Age 7 Age 11 Age 16 Age 23 Age 33 Age 42 Age 46 Age 50 Age 55 Non-contact 223 1,042 410 786 1,867 1,529 1,832 612 835 664 Not issued 920 542 271 1,415 4,248 3,553 4,698 Refusal 80 797 1,151 1,160 1,776 1,148 1,448 1,214 582 Other unproductive 173 202 295 838 1,399 263 109 332 491 Not issued - emigrant 475 701 799 1,196 1,335 1,268 1,272 1,293 1,287 Not issued - dead 821 840 873 960 1,050 1,200 1,324 1,460 1,503 Ineligible 13 11 81 Total 1,143 3,133 3,221 3,904 6,021 7,089 7,139 9,024 8,768 9,225

6 Monotone vs. Non-monotone response
Response patterns Freq. Percent Monotone 5688 30.64 Non-monotone 8329 44.89 All waves 4,541 24.47 Total 18558 100

7 Non-response bias - implications
Missing data: 1- Smaller samples, fewer transitions, incomplete histories. Loss of statistical power due to smaller samples 2- Biases in results: Disproportionate to some groups (mobile, disadvantaged, men, long working hours)

8 What can we do about attrition?
All principled methods rely on the MAR assumption and on the correct identification of predictors of response => maximise the plausibility of MAR So far selection is arbitrary Mainly birth characteristics, reflecting economic and social circumstances: SES, housing tenure, accommodation type, London vs. rest of the country… Selection of predictors matters more for studies with more waves (more missing data)

9 Missing data strategy Identify predictors of response at each wave of NCDS: Variables from each wave can only predict response on subsequent waves Thousands of variables are available => selection is done in three stages Pre – selection: We exclude routed variables. We exclude dead respondents from each wave – natural evolution of the sample Analysis: Stage 1: univariable regressions within wave Stage 2: multivariable regressions within wave Stage 3: multivariable regressions across waves

10 Stage 1: Univariable regressions
Predictors: waves 0 to 8 (including biomedical) Response outcomes: waves 1 to 9 (including biomedical) Univariable logistic regressions (one predictor at a time) to predict response in subsequent waves Analytical sample=non-missing W0 predictors. N=15,150 N of regressions for: (wave0= 19*10); (wave5=100*5); … The process is automated Retained predictors with a significant impact on response at p<0.01 N of predictors W0 W1 W2 W3 W4 W5 W6 Biomed W7 W8 Birth Age 7 Age 11 Age 16 Age 23 Age 33 Age 42 Age 45 Age 46 Age 50 Stage 1 (input) 19 50 48 58 73 100 276 81 155 194

11 Stage 2: Multivariable regressions within wave.
N of predictors W0 W1 W2 W3 W4 W5 W6 Biomed W7 W8 Birth Age 7 Age 11 Age 16 Age 23 Age 33 Age 42 Age 45 Age 46 Age 50 Stage 1 (input) 19 50 48 58 73 100 276 81 155 194 Stage 2 (input) 11 36 34 47 68 114 39 53 120 N of predictors is reduced after stage 1 Stage 2 consists of multivariable regressions using all predictors from same wave (i.e. 11 for W0, 36 for W1, 120 for W8) that were retained after Stage 1 Variables compete against each other within wave Using log binomial models (since outcome not always rare) Retain variables significant at p<0.05 This results in different subsets of predictors from each wave depending on the response outcome

12 Stage 3: Multivariable regressions across waves.
N of predictors W0 W1 W2 W3 W4 W5 W6 Biomed W7 W8 Birth Age 7 Age 11 Age 16 Age 23 Age 33 Age 42 Age 45 Age 46 Age 50 Stage 1 (input) 19 50 48 58 73 100 276 81 155 194 Stage 2 (input) 11 36 34 47 68 114 39 53 120 Variables entering stage 3: At this stage we let predictors from different waves compete against each other (e.g. subset of predictors from W0 and W1 predict response in W2; subset of predictors from W0, 1, 2, 3, 4 predict response in W5, …) Challenge: predictors from different waves => different levels of missingness (due to non-response over time). Predictors from earlier waves are more complete N of predictors Resp 1 Resp 2 Resp 3 Resp 4 Resp 5 Resp 6 Biomed Resp 7 Resp 8 Resp 9 Age 7 Age 11 Age 16 Age 23 Age 33 Age 42 Age 45 Age 46 Age 50 Age 55 Stage 3 (input) 4 13 23 36 52 54 81 86 88 116

13 Stage 3: Multivariable regressions across waves (continued).
Solution: progressive imputation and alternation of MI and response modelling We impute missing predictors then estimate response models and so on for each wave Example W2 MI with chained equations and response modelling with log binomial models. => Set of predictors of response in each wave (sig at p<0.05). For each response outcome, the predictors can include variables from any previous wave

14 Results from stage 3 The number of predictors of response at each wave is reduced Predictors include all types of variables: Social, economic, health, and survey related (cooperation with sub-studies) N of predictors Resp 1 Resp 2 Resp 3 Resp 4 Resp 5 Resp 6 Biomed Resp 7 Resp 8 Resp 9 Age 7 Age 11 Age 16 Age 23 Age 33 Age 42 Age 45 Age 46 Age 50 Age 55 Stage 3: input 4 13 23 36 52 54 81 86 88 116 Stage 3: output 9 11 17 24 18 40 34 35 39

15 Examples of predictors for Wave 3 (age 16)
Smoking prior to pregnancy (wave 0, birth) Physical exercise problems (balance etc) (wave 2, age 11) Social class of mother's husband (wave 0, birth) Outpatient Clinic attendances (wave 1, age 7) Number of  kids aged under 21 in household, including those living away from home (wave 1, age 7)

16 Examples of predictors for Wave 5 (age 33)
Age at birth of first child (wave 4, age 23) Recoded education apprenticeship and training (wave 4, age 23) Moved between NCDS4 (age 23) and NCDS5 (age 33) (wave 4, age 23) Couples joint social class (wave 4, age 23)

17 Examples of predictors for Wave 6 (age 42)
Type of accommodation (wave 4, age 23) General motor handicap (wave 3, age 16) Couples joint social class (wave 4, age 23) Region at NCDS3 (wave 3, age 16) Current main economic activity (wave 5, age 33)

18 Examples of predictors for Wave 9 (age 55)
Benefits currently received by CM or partner (wave 6, age 42) Housing tenure (wave 6, age 42) If CM participated in NCDS5 (wave 5, age 33) Consent to ESRC archive (wave biomed) Region at NCDS3 (wave 3, age 16)

19 Conclusions It is possible to systematically identify predictors of response (maximise the MAR assumption) Identified predictors of response can be used in any substantive analysis with methods that operate under the MAR assumption The work will be expanded to the other CLS cohort studies (next will be MCS) We will provide a list of predictors for each wave and advise researchers on their use But! We are not suggesting that all these predictors of response should be used Test different sets/blocks of predictors of response Watch this space, but probably ok to use early life predictors only Work in progress- We are planning to augment our “traditional” statistical approach on variable selection with machine learning (SuperLearner) and other procedures Least Absolute Shrinkage and Selection Operator (LASSO)

20 Thank you for your attention!


Download ppt "Understanding Non Response in the 1958 Birth Cohort"

Similar presentations


Ads by Google