Trends in Inequality of Educational Opportunity in the Netherlands: The Effect of Missing Data
Maarten L. Buis & Harry B.G. Ganzeboom
Department of Social Research Methodology, Vrije Universiteit Amsterdam
RC28, Oslo, May
Buis & Ganzeboom, Oslo
Conclusions (1)
A steady trend towards less IEO in the Netherlands remains visible throughout the 20th century.
On closer scrutiny, however, there is evidence of a slower trend, or even stability, for the earliest and the most recent cohorts.
Spline analyses of the trends confirm this.
Conclusions (2)
Missing data on father’s occupation vary by the respondent’s education: missing values are about 3 times more prevalent among the lowest educated than among the highest educated.
One would hypothesize that this attenuates measured IEO and the historical trend in it.
With multiply imputed data for FISEI:
–The level of IEO increases
–The (linear) trend in IEO becomes steeper
Previous research
IEO = Inequality of Educational Opportunity = association between father’s occupation and respondent’s education.
Previous research found a long-term linear trend towards less IEO:
–Cohorts : Ganzeboom & De Graaf, 1989; De Graaf & Ganzeboom, 1990a, 1990b.
–Cohorts : Ganzeboom & Luijkx,
This holds for both linear regression models and sequential logits (first two transitions).
ISMF
Now 51 studies on the Netherlands, collected between 1958 and 2004, N > men and women 25+.
Recent additions (since 2002 and Breen 2004): 16 studies, approx. 30% of the N.
Father: FISEI (International Socio-economic Index of Occupational Status).
Education: level of education scaled relative to benchmarks: primary = 6, highest secondary = 12, university complete = 17.
Buis & Ganzeboom, Oslo 2005
Research questions
How do trend and level estimates of IEO depend upon data qualities:
–Measures used
–Quality and nature of the sample
–Non-response
–Missing values
Missing values
MCAR = Missing Completely at Random.
MAR = Missing at Random: missingness is random given the values of the control (X) variables.
NMAR = Not Missing at Random: missingness depends upon values of the Y-variable.
Rubin 1987, Little & Rubin 2002, Allison
Multiple hot deck imputation in Stata.
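The three mechanisms can be stated compactly. This is a minimal sketch in notation that is an assumption, not taken from the slides: R = 1 if FISEI is observed, X stands for the control variables, and Y for the outcome (here the respondent's education). It follows the wording of the slide; Little & Rubin state the definitions in terms of observed and unobserved data.

```latex
\begin{align*}
\text{MCAR:}\quad & P(R = 1 \mid X, Y) = P(R = 1) \\
\text{MAR:}\quad  & P(R = 1 \mid X, Y) = P(R = 1 \mid X) \\
\text{NMAR:}\quad & P(R = 1 \mid X, Y) \text{ still depends on } Y \text{ after conditioning on } X
\end{align*}
```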
Complete case analysis (listwise deletion)
OK if MCAR.
Biased if MAR.
Inefficient (standard errors are too large); this can be quite dramatic.
Linear trend:
–EDU = *FIS – 5.3*FIS*COH etc. (Men) (.17) (.08) (.45)
–EDU = *FIS – 3.3*FIS*COH etc. (Women) (.17) (.08) (.44)
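The risk of bias in a complete case analysis can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' analysis: the data are simulated, the true slope of 0.5 is arbitrary, and the missingness rates (60% versus 20%, a 3:1 ratio loosely inspired by Conclusions (2)) are hypothetical. Father's status is made missing more often for respondents with low education, and the slope is then estimated on the remaining cases.

```python
import random

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate(n=50000, true_slope=0.5, seed=1):
    """Return (full-data slope, complete-case slope) for simulated data."""
    rng = random.Random(seed)
    fis = [rng.gauss(0, 1) for _ in range(n)]              # standardized father's status
    edu = [true_slope * f + rng.gauss(0, 1) for f in fis]  # respondent's education
    # Assumed missingness: father's status is missing for 60% of respondents
    # with below-average education and for 20% of the rest.
    kept = [(f, e) for f, e in zip(fis, edu)
            if rng.random() >= (0.6 if e < 0 else 0.2)]
    cc = ols_slope([f for f, _ in kept], [e for _, e in kept])
    return ols_slope(fis, edu), cc

full_slope, cc_slope = simulate()
print(f"full data: {full_slope:.3f}  complete cases: {cc_slope:.3f}")
```

In this setup the complete-case slope comes out below the full-data slope, which is the direction of attenuation hypothesized in the conclusions.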
Hot deck imputation
Classify all cases by combinations of the predictor variables (COH, FED, MED, ISEI).
Stratify the cases by these combinations.
Substitute each missing FISEI with the valid FISEI of a random (nearest) neighbor.
Key idea: borrow not only the systematic (predicted) part, but also the error term.
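The steps above can be sketched in a few lines. This is an illustrative simplification, not the Stata procedure used for the slides: donors are drawn only from cases in exactly the same stratum (no nearest-neighbor fallback), and the function name `hot_deck`, the two stratifying variables, and the example values are all hypothetical.

```python
import random
from collections import defaultdict

def hot_deck(rows, keys, target, rng):
    """Fill missing `target` values with a random donor value from the same stratum."""
    donors = defaultdict(list)
    for row in rows:
        if row[target] is not None:
            donors[tuple(row[k] for k in keys)].append(row[target])
    imputed = []
    for row in rows:
        row = dict(row)  # copy, so the input rows stay untouched
        if row[target] is None:
            pool = donors.get(tuple(row[k] for k in keys))
            if pool:  # a stratum without donors stays missing
                row[target] = rng.choice(pool)
        imputed.append(row)
    return imputed

# Hypothetical cases: cohort (COH), father's education (FED), father's status (FISEI).
cases = [
    {"COH": 1, "FED": 2, "FISEI": 40},
    {"COH": 1, "FED": 2, "FISEI": 55},
    {"COH": 1, "FED": 2, "FISEI": None},
    {"COH": 2, "FED": 5, "FISEI": 70},
    {"COH": 3, "FED": 1, "FISEI": None},
]
imputed = hot_deck(cases, keys=("COH", "FED"), target="FISEI", rng=random.Random(0))
```

Because the imputed value is an observed value from a real donor, it carries that donor's deviation from the stratum mean, which is the "error term" the slide refers to.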
Multiple hot deck imputation
Do the hot deck imputation several times (10-20).
Bootstrap from each stratum a sample (with replacement) of stratum size.
The random selection of a neighbor varies by imputation cycle.
Key idea (Rubin 1987, pp): get the variance-covariance estimation right.
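Results from the imputed data sets are usually pooled with Rubin's combining rules; the slide cites Rubin (1987), so this is presumably what is meant by getting the variance-covariance estimation right. A minimal sketch for a single coefficient follows. The function name `rubin_combine` and the input numbers are hypothetical.

```python
from statistics import mean, variance

def rubin_combine(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Hypothetical coefficients and squared standard errors from m = 3 imputed data sets.
pooled, total_var = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term adds the uncertainty caused by the missing values themselves, so a pooled standard error can exceed the one from any single imputed data set.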
Key results
FISEI predicted by COH (4), FED (7), MED (7), ISEI (8).
10 imputations.
Linear trend result:
–EDU = *FIS – 6.1*FIS*COH etc. (Men) (.33) (.12) (.64)
–EDU = *FIS – 3.7*FIS*COH etc. (Women) (.46) (.12) (.64)
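The linear trend specification can be written out to show why the interaction coefficient measures the trend in IEO. The symbol names below are assumptions, and the terms behind "etc." are not specified in the source, so they are left as an ellipsis.

```latex
\begin{align*}
\mathrm{EDU} &= \dots + \beta_1\,\mathrm{FIS} + \beta_3\,(\mathrm{FIS} \times \mathrm{COH}) + \dots + \varepsilon \\
\frac{\partial\,\mathrm{EDU}}{\partial\,\mathrm{FIS}} &= \beta_1 + \beta_3\,\mathrm{COH}
\end{align*}
```

The effect of father's status on education thus changes by \beta_3 per unit of COH. A negative \beta_3, such as the –6.1 (men) and –3.7 (women) reported here, means that IEO declines across cohorts.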
Non-linearities
Linear splines.
Estimates with 1, 2, 3, 4 etc. knots (and a uniform distribution).
We were happy with the result with 3 knots.
Test of equality of slopes:
–Between trajectories
–Between men and women
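A linear spline replaces the single cohort variable with one term per segment, so that the trend may have a different slope between each pair of knots. A minimal sketch of one common parameterization follows. The function name and the knot positions (1920, 1940, 1960) are hypothetical; the slides do not state where the three knots were placed.

```python
def linear_spline_basis(x, knots):
    """Split x into piecewise-linear terms, one per segment between sorted knots.

    The terms sum to x, so each regression coefficient is the slope
    within its own segment.
    """
    edges = sorted(knots)
    terms = [min(x, edges[0])]                    # segment below the first knot
    for lo, hi in zip(edges, edges[1:]):
        terms.append(max(min(x, hi), lo) - lo)    # segments between knots
    terms.append(max(x, edges[-1]) - edges[-1])   # segment above the last knot
    return terms

# Hypothetical knots at cohorts 1920, 1940 and 1960.
print(linear_spline_basis(1950, [1920, 1940, 1960]))  # prints [1920, 20, 10, 0]
```

With this coding, testing equality of slopes between trajectories amounts to testing whether the coefficients of two segment terms are equal.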
Results
Complete case analysis finds:
–The decline in IEO occurs between cohorts 1920 and 1960. Before 1920 and after 1960, the trend can be assumed to be flat.
–There is a constant difference in IEO between men and women: women’s educational attainment is approx. 10% less dependent on FIS than men’s.
Multiple hot deck imputed data
Finds a pattern very similar to the complete case analysis.
But the decline of IEO between 1920 and 1960 is steeper!
However, the standard errors of the effects have increased (despite the inclusion of more information).