NCRM Annual Meeting January 2009
People
– Lorraine Dearden (Dir), John ‘Mac’ McDonald, Sophia Rabe-Hesketh, Anna Vignoles, Kirstine Hansen, Nikos Tzavidis, James Brown, Marcello Sartarelli, Francesca Foliano, Alfonso Miranda, Sarah Patel
Fellows
– Flavio Cunha, Christian Dustmann, Stephen Machin, Barbara Sianesi and Anders Skrondal
ADMIN
– Remit of ADMIN is to develop and disseminate methodologies for making best use of administrative data by exploiting survey data (and vice versa)
– Training and capacity building (Mac McDonald)
ADMIN
– Strength of administrative data is that they have information on almost everyone.
– Weakness is that they are not rich in covariates: for example, NPD has detailed information on educational outcomes but no information on parental education.
– We can link richer survey data to the administrative data to enhance the administrative data.
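The linkage idea on this slide can be sketched in a few lines. This is a minimal illustration only, not the project's linkage procedure: the identifier `pupil_id`, the field names, and all values are hypothetical, and real linkage involves matching rules and disclosure controls that are not shown here.

```python
# Toy records: every identifier, field name, and value below is hypothetical.
admin_records = [
    {"pupil_id": "A1", "ks3_score": 33.0, "fsm": 1},
    {"pupil_id": "A2", "ks3_score": 39.0, "fsm": 0},
    {"pupil_id": "A3", "ks3_score": 36.0, "fsm": 0},
]
survey_records = [
    {"pupil_id": "A1", "parent_degree": 0},
    {"pupil_id": "A3", "parent_degree": 1},
]


def link_admin_to_survey(admin, survey, key="pupil_id"):
    """Left-join survey covariates onto admin records using a shared key.

    Every admin record is kept; survey fields are added only where a
    survey record with the same key exists.
    """
    survey_by_key = {row[key]: row for row in survey}
    linked = []
    for row in admin:
        match = survey_by_key.get(row[key])
        merged = dict(row)
        merged["in_survey"] = match is not None
        if match is not None:
            merged.update({k: v for k, v in match.items() if k != key})
        linked.append(merged)
    return linked


linked = link_admin_to_survey(admin_records, survey_records)
for row in linked:
    print(row)
```

Keeping every administrative record and flagging `in_survey` preserves the near-complete coverage of the administrative data while making clear which pupils carry the richer survey covariates.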
ADMIN
– Weakness of survey data is non-response and attrition.
– Administrative data are virtually (but not fully) complete.
– Scope for using linked survey-administrative data to enhance survey data by telling us about those who are missing from survey data.
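One standard way to use administrative information about non-respondents is a weighting-class adjustment. The sketch below is an illustration under assumptions, not necessarily the method ADMIN develops: it assumes an administrative flag (here a hypothetical `fsm` field) is known for everyone in the issued sample, and the survey variable `homework_hours` and all values are invented.

```python
from collections import defaultdict

# Toy issued sample: the admin flag is known for everyone, the survey
# variable only for respondents. All names and values are hypothetical.
sample = [
    {"fsm": 1, "responded": True, "homework_hours": 2.0},
    {"fsm": 1, "responded": True, "homework_hours": 4.0},
    {"fsm": 1, "responded": False, "homework_hours": None},
    {"fsm": 1, "responded": False, "homework_hours": None},
    {"fsm": 0, "responded": True, "homework_hours": 6.0},
    {"fsm": 0, "responded": True, "homework_hours": 6.0},
    {"fsm": 0, "responded": True, "homework_hours": 6.0},
    {"fsm": 0, "responded": True, "homework_hours": 6.0},
]


def response_weights(rows, class_key="fsm"):
    """Return {class: 1 / response rate} for each admin-defined class."""
    totals = defaultdict(int)
    responders = defaultdict(int)
    for row in rows:
        totals[row[class_key]] += 1
        if row["responded"]:
            responders[row[class_key]] += 1
    return {c: totals[c] / responders[c] for c in totals if responders[c] > 0}


def weighted_mean(rows, value_key, weights, class_key="fsm"):
    """Mean of a survey variable over respondents, weighted by class."""
    num = den = 0.0
    for row in rows:
        if row["responded"]:
            w = weights[row[class_key]]
            num += w * row[value_key]
            den += w
    return num / den


weights = response_weights(sample)
respondents = [r["homework_hours"] for r in sample if r["responded"]]
unweighted = sum(respondents) / len(respondents)
adjusted = weighted_mean(sample, "homework_hours", weights)
print(unweighted, adjusted)
```

In this toy sample FSM pupils respond at half the rate of other pupils, so their responses are weighted up by a factor of two and the adjusted mean moves towards their values.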
Aim is to develop methods to:
– make inferences when covariates or responses are missing in administrative data
– use administrative data to overcome measurement error in survey variables (e.g. recalled event histories) and vice versa (e.g. ethnicity)
– tackle bias due to attrition in longitudinal surveys
– use administrative data to improve small-area estimates of the means and quantiles of survey variables
Programme 1 (Vignoles)
Using survey data to enhance methods for the analysis of administrative data
– Measuring the effect of family background and ethnicity on pupil attainment
– To what extent does school attendance reflect real school preferences?
Programme 2 (Brown)
Using administrative data to enhance methods for the analysis of survey data
– Attrition, non-response and the determinants of school outcomes at 16
– Enhancing event history analysis of social surveys with administrative data
Examples of linked data
– Linked administrative data: schools data (NPD/PLASC), FE data (ILR) and higher education data (HESA)
  – Complete administrative data on entire cohort all the way through the education system
– NPD/PLASC linked to survey data
  – LSYPE
  – MCS
Contribution to work on segregation
– Measuring segregation
  – Socio-economic segregation
  – Ethnic segregation
– Modelling causes of segregation
  – Parental school choice
– Examples drawn from schools but could be applied more broadly
Measuring socio-economic segregation
– Currently measured by FSM binary status
– Problematic measure (Hobbs and Vignoles, 2008)
  – Only picks up bottom 16% of distribution at best
  – Measurement error in FSM status, e.g. children who do not eat at school are not recorded as FSM
  – Changing FSM status in recession
Measuring socio-economic segregation
– Linked data have already provided an assessment of the extent to which FSM really proxies socio-economic disadvantage
– Can provide alternative measures of socio-economic background from surveys
  – Parental income (high/low income)
  – Parental education (high/low education)
– Can assess use of alternative proxies from administrative data, e.g. geographic data
Measuring socio-economic segregation
– Linked data can test robustness of segregation work that uses FSM
– Need to be aware of issues raised by Becky Allen on using samples to measure segregation
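A robustness check of this kind can be sketched by computing the same segregation index twice, once with the FSM flag and once with a survey-based flag. The slides do not name an index; the Duncan dissimilarity index is assumed here as a common choice, and the schools, the `low_income` flag, and all values are hypothetical.

```python
def dissimilarity_index(schools, flag):
    """Duncan dissimilarity index for a binary pupil flag across schools.

    schools maps a school name to a list of pupil dicts; flag names a 0/1
    field. D = 0.5 * sum over schools of |a_i / A - b_i / B|, where a_i and
    b_i count flagged and unflagged pupils in school i, and A and B are the
    corresponding totals over all schools.
    """
    flagged = {s: sum(p[flag] for p in pupils) for s, pupils in schools.items()}
    others = {s: sum(1 - p[flag] for p in pupils) for s, pupils in schools.items()}
    total_flagged = sum(flagged.values())
    total_others = sum(others.values())
    if total_flagged == 0 or total_others == 0:
        raise ValueError("index is undefined when one group is empty")
    return 0.5 * sum(
        abs(flagged[s] / total_flagged - others[s] / total_others)
        for s in schools
    )


# Toy linked data: fsm comes from admin records, low_income from a survey.
schools = {
    "School X": [
        {"fsm": 1, "low_income": 1},
        {"fsm": 1, "low_income": 1},
        {"fsm": 0, "low_income": 1},
        {"fsm": 0, "low_income": 0},
    ],
    "School Y": [
        {"fsm": 0, "low_income": 1},
        {"fsm": 0, "low_income": 0},
        {"fsm": 0, "low_income": 0},
        {"fsm": 0, "low_income": 0},
    ],
}

d_fsm = dissimilarity_index(schools, "fsm")
d_income = dissimilarity_index(schools, "low_income")
print(round(d_fsm, 3), round(d_income, 3))
```

With only a handful of pupils per school the index is noisy, which is one reason to treat sample-based segregation measures with caution.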
Ethnic Minority Project
– Originally conceived of as a study of ethnic differences in outcomes
  – i.e. focusing on missing covariates in model of ethnic achievement
  – See work by Wilson et al., 2005
– Data: PLASC/NPD data linked to LSYPE (cohort born 1990/91)
Ethnic Minority Project
– Do we get estimates of ethnic differences in outcomes wrong if we just rely on administrative data?
– Do ethnic classifications capture what we are interested in? E.g. recent migrants versus long-standing populations
– What are differences by ethnicity once we take account of language (EAL)?
KS3 Results for Pakistani Males
                No Controls   NPD Controls   LSYPE Controls   NPD + LSYPE Controls
NPD sample      (0.019)       (0.018)
LSYPE sample    (0.068)       (0.070)        (0.068)          (0.069)
NB: Results show differences in standardized score outcomes
NPD controls include gender, ethnicity, age, EAL, SEN, FSM, KS2 score
Measuring ethnicity and EAL
– Is there measurement error in the ethnicity or EAL variables in PLASC?
– If so, are there implications for measuring ethnic segregation and ethnic differences in outcomes?
  – See Aspinall and Jacobson, 2007; Battistin and Sianesi, 2006
Measurement error in ethnicity and EAL
– Multiple measures from LSYPE
  – Ethnicity and ethnic origin (self-report)
  – Ethnicity and ethnic origin (parents)
  – Language spoken at home
  – Frequency of English spoken at home
– Measures from PLASC
  – Ethnicity
  – EAL binary indicator
Measurement Error
– Misclassification of ethnicity not huge
  – Subsample for whom we have full data and who live with both natural parents: 7,814
  – 136 individuals recorded as white British in PLASC but not according to LSYPE
  – 57 individuals recorded as white British in LSYPE but not in PLASC
– Evidence of misclassified EAL
  – 7.2% of young people are labelled EAL in PLASC but appear not to be EAL according to LSYPE
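The ethnicity comparison amounts to counting discordant pairs between the two sources. The sketch below shows that count on hypothetical toy pairs, then works through the arithmetic for the figures reported on the slide (7814, 136 and 57), assuming the 7814 subsample is the denominator for both counts. The function and variable names are illustrative.

```python
def discordant_counts(pairs):
    """Count disagreements between two binary classifications.

    pairs is a list of (admin_flag, survey_flag) booleans, e.g. whether a
    pupil is recorded as white British in each source.
    """
    admin_only = sum(1 for admin, survey in pairs if admin and not survey)
    survey_only = sum(1 for admin, survey in pairs if survey and not admin)
    return admin_only, survey_only


# Hypothetical toy pairs, for illustration only.
toy_pairs = [(True, True), (True, False), (False, False), (False, True), (True, True)]
toy_admin_only, toy_survey_only = discordant_counts(toy_pairs)

# Figures taken from the slide.
n_subsample = 7814
plasc_only = 136  # white British in PLASC but not in LSYPE
lsype_only = 57   # white British in LSYPE but not in PLASC

pct_plasc_only = 100 * plasc_only / n_subsample
pct_lsype_only = 100 * lsype_only / n_subsample
pct_discordant = 100 * (plasc_only + lsype_only) / n_subsample
print(round(pct_plasc_only, 2), round(pct_lsype_only, 2), round(pct_discordant, 2))
```

Under that assumption, roughly 2.5% of the subsample is classified differently on white British status in the two sources, consistent with the slide's reading that misclassification of ethnicity is not large.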
Ethnicity of those “wrongly” coded EAL in PLASC
Correlations with error EAL/non white sample
Modelling causes of segregation
– Linked MCS data with NPD/PLASC
– Details of current school, ranked school choices, reasons for school choice
– Can provide missing covariates in models of causes of segregation, e.g. attitudes to school choice
– Current project investigating school choice in MCS (Burgess, Greaves, Vignoles and Wilson)
Short Courses
– Introduction to Data Linkage:
  – The Value of Data Linkage for Research
  – Data Linkage – Methodological and Statistical Issues
– Enhancing Longitudinal Surveys by Linking to Administrative Data:
  – Longitudinal Data Analysis
  – Event History Analysis
– Using Longitudinal Data Linkage to Evaluate Area-Based Interventions
– Data Linkage with the NPD