Towards the 2011 UK Census Editing Strategy
Heather Wagstaff and Steven Rogers
Methodology Directorate, Office for National Statistics, U.K.
Overview: The presentation is structured as follows: overview of edit & imputation in the UK Census; CANCEIS: fitness for the 2011 UK Census; development of the 2011 Census edit strategy; summary.
Overview of UK Census Edit & Imputation: EDIS was hard-coded, which led to flexibility problems: the UK countries all had slightly differing requirements; 1999 Rehearsal data arrived too late for system testing; problems arose during live running: late changes to the question set were not tested, and complex filter questions were not followed by a number of respondents.
Statistical Modernisation Programme: Main focus: to deliver standard statistical infrastructure, methodologies and tools; Main aim: to apply recognised standards and practices in a highly efficient way. CANCEIS endorsed as the corporate edit and imputation tool: to be implemented where data are mainly nominal; implemented on household surveys and Civil Registration; now endorsed for the 2011 Census.
CANCEIS (CANadian Census Edit and Imputation System): a generalised edit and imputation system; edits specified in decision logic tables (DLTs); nearest-neighbour imputation methodology; computationally efficient; simultaneous imputation of numeric and categorical variables.
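To make the two ideas on this slide concrete (edits expressed as table-like conflict rules, and nearest-neighbour donor imputation), here is a minimal Python sketch. It is not CANCEIS's algorithm or interface: the edit rules, field names, distance function and the crude "blank every field in a failed edit" rule are all illustrative assumptions standing in for CANCEIS's own minimum-change donor search.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]
Condition = Tuple[str, Callable[[object], bool]]

# Hypothetical edits written as conflict rows, loosely in the spirit of a
# decision logic table: a record FAILS an edit when every condition holds.
EDITS: List[List[Condition]] = [
    [("age", lambda v: v < 16), ("marital", lambda v: v == "married")],
    [("age", lambda v: v < 16), ("employed", lambda v: v == "yes")],
]


def failed_edits(rec: Record) -> List[List[Condition]]:
    """Edits the record fails; an edit cannot fail while any of its fields is missing."""
    return [
        edit for edit in EDITS
        if all(rec.get(f) is not None and test(rec[f]) for f, test in edit)
    ]


def distance(a: Record, b: Record, fields: List[str]) -> float:
    """Assumed mixed distance: capped scaled gap for numbers, 0/1 mismatch otherwise."""
    d = 0.0
    for f in fields:
        x, y = a[f], b[f]
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            d += min(abs(x - y) / 10.0, 1.0)
        else:
            d += 0.0 if x == y else 1.0
    return d


def impute(rec: Record, donors: List[Record]) -> Optional[Record]:
    """Nearest-neighbour donor imputation constrained to yield a passing record."""
    # Fields to impute: missing values plus every field in a failed edit
    # (a crude stand-in for a proper minimum-change rule).
    to_impute = {f for f, v in rec.items() if v is None}
    for edit in failed_edits(rec):
        to_impute.update(f for f, _ in edit)
    if not to_impute:
        return dict(rec)
    keep = [f for f in rec if f not in to_impute]
    # Try donors nearest first; accept the first that gives a record passing all edits.
    for donor in sorted(donors, key=lambda d: distance(rec, d, keep)):
        candidate = {f: (donor[f] if f in to_impute else rec[f]) for f in rec}
        if not failed_edits(candidate):
            return candidate
    return None  # no suitable donor found


donors = [  # hypothetical clean, complete records
    {"age": 34, "marital": "married", "employed": "yes"},
    {"age": 12, "marital": "single", "employed": "no"},
    {"age": 70, "marital": "widowed", "employed": "no"},
]
recipient = {"age": 13, "marital": "single", "employed": None}
print(impute(recipient, donors))  # {'age': 13, 'marital': 'single', 'employed': 'no'}
```

The 13-year-old recipient borrows the missing employment value from the closest donor (the 12-year-old), and the result passes both edits. Checking the candidate record against the edits before accepting a donor is what keeps the imputed output consistent as well as complete.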
CANCEIS - ensuring fitness for 2011 Census: The 2001 UK Census processed about 27 million forms covering around 60 million people. Evidence of robustness for the 2011 Census is demonstrated in three stages: 1. provide proof of concept; 2. replicate the 2001 Census Editing Strategy; 3. recover the statistical properties of the data.
CANCEIS - ensuring fitness for 2011 Census: Stage 1: Provide proof of concept. Purpose: to assess whether CANCEIS would produce complete and consistent census data. UK Census data processed by Administration Areas; edit rules converted to DLTs; CANCEIS applied to the census data; edit process replicated in SAS to QA the results. Outcome: CANCEIS produced a complete and consistent dataset in under 2 hours.
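Stage 1 checked the CANCEIS output independently in SAS. Purely as an illustration of what such a check tests, the sketch below does the same kind of thing in Python: every output record should be complete (no missing values) and consistent (failing no edit). The edit names, rules and field names are hypothetical, not the rules used in the study.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
# Hypothetical edits as predicates that return True when a record FAILS.
Edit = Callable[[Record], bool]

EDITS: Dict[str, Edit] = {
    "child_married": lambda r: r["age"] < 16 and r["marital"] == "married",
    "child_employed": lambda r: r["age"] < 16 and r["employed"] == "yes",
}


def qa_report(records: List[Record], required: List[str]) -> Dict[str, int]:
    """Count incomplete records and failures per edit in a post-imputation dataset."""
    report = {"records": len(records), "incomplete": 0}
    report.update({name: 0 for name in EDITS})
    for rec in records:
        if any(rec.get(f) is None for f in required):
            report["incomplete"] += 1
            continue  # edits are only evaluated on complete records
        for name, fails in EDITS.items():
            if fails(rec):
                report[name] += 1
    return report


def is_complete_and_consistent(report: Dict[str, int]) -> bool:
    """True when no record is incomplete and no edit fails."""
    return all(v == 0 for k, v in report.items() if k != "records")


output = [
    {"age": 34, "marital": "married", "employed": "yes"},
    {"age": 12, "marital": "single", "employed": "no"},
]
report = qa_report(output, ["age", "marital", "employed"])
print(report)  # {'records': 2, 'incomplete': 0, 'child_married': 0, 'child_employed': 0}
print(is_complete_and_consistent(report))  # True
```

Reporting failures per edit, rather than a single pass/fail flag, makes it easier to trace any disagreement between the two implementations back to a specific rule.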
CANCEIS - ensuring fitness for 2011 Census: Stage 2: Replicate the 2001 Census Editing Strategy. Purpose: to assess the range of functionality.
CANCEIS - ensuring fitness for 2011 Census: Stage 3: Recover the statistical properties of the data. Purpose: to assess whether CANCEIS provides an imputation process of acceptable quality; micro-simulation environment of 170K households and 400K individuals; imputation is a stochastic process, so CANCEIS is applied in multiple runs; distributional and predictive accuracy are measured.
CANCEIS - ensuring fitness for 2011 Census: Stage 3: Recover the statistical properties of the data
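The Stage 3 design (blank known values, impute them several times, and score the results against the truth) can be sketched as follows. The slides do not give the exact measures used, so both are assumptions: predictive accuracy is taken as the share of imputed values matching the true value, and distributional accuracy as 1 minus the total variation distance between the true and imputed category distributions. The random-donor imputation is a deliberately naive stand-in, not CANCEIS's nearest-neighbour method.

```python
import random
from collections import Counter
from statistics import mean
from typing import List, Sequence, Tuple


def predictive_accuracy(true_vals: Sequence[str], imputed: Sequence[str]) -> float:
    """Share of imputed values that exactly reproduce the true value."""
    return sum(t == i for t, i in zip(true_vals, imputed)) / len(true_vals)


def distributional_accuracy(true_vals: Sequence[str], imputed: Sequence[str]) -> float:
    """1 minus the total variation distance between true and imputed category shares."""
    n = len(true_vals)
    pt, pi = Counter(true_vals), Counter(imputed)
    tvd = 0.5 * sum(abs(pt[c] - pi[c]) / n for c in set(pt) | set(pi))
    return 1.0 - tvd


def simulate(values: List[str], missing_rate: float, runs: int, seed: int = 0) -> Tuple[float, float]:
    """Blank values at random, impute each run stochastically, average both measures."""
    rng = random.Random(seed)
    pred, dist = [], []
    for _ in range(runs):
        missing = [i for i in range(len(values)) if rng.random() < missing_rate]
        if not missing or len(missing) == len(values):
            continue  # nothing to impute, or no donors left
        blanked = set(missing)
        observed = [v for i, v in enumerate(values) if i not in blanked]
        # Naive stochastic imputation: a random donor value for each blank.
        imputed = [rng.choice(observed) for _ in missing]
        true_vals = [values[i] for i in missing]
        pred.append(predictive_accuracy(true_vals, imputed))
        dist.append(distributional_accuracy(true_vals, imputed))
    if not pred:
        raise ValueError("no run produced any values to impute")
    return mean(pred), mean(dist)


values = ["employed"] * 600 + ["unemployed"] * 100 + ["inactive"] * 300
pred, dist = simulate(values, missing_rate=0.1, runs=20)
print(f"predictive accuracy {pred:.3f}, distributional accuracy {dist:.3f}")
```

Random donor draws tend to preserve the overall distribution while scoring poorly on predicting individual values. Measuring both, and averaging over repeated runs, separates those two aspects of imputation quality.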
Towards the 2011 Census Edit Strategy: Develop in two parts, integral to the Census Quality Strategy: Part 1: specification of comprehensive and cohesive edit rules; Part 2: research into imputation methodology, including partitioning person variables, large household sizes, Communal Establishments (collectives) and differing area types.
Towards the 2011 Census Edit Strategy: Example of the importance of a single cohesive set of edit rules: filter rules depend on date of birth being captured with 100% accuracy.
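The following sketch shows why filter rules depend so heavily on date of birth. Age on census day (27 March 2011) is derived from date of birth, and that derived age decides whether a person should have answered a block of questions. The age-16 threshold for a labour market block and the function names are assumptions for illustration only.

```python
from datetime import date

CENSUS_DAY = date(2011, 3, 27)  # 2011 Census reference day
MIN_AGE_LABOUR_MARKET = 16      # assumed filter threshold, for illustration only


def age_on(dob: date, ref: date = CENSUS_DAY) -> int:
    """Age in completed years on the reference date."""
    return ref.year - dob.year - ((ref.month, ref.day) < (dob.month, dob.day))


def labour_market_applies(dob: date) -> bool:
    """Filter rule: route the person to labour market questions by derived age."""
    return age_on(dob) >= MIN_AGE_LABOUR_MARKET


def routing_conflict(dob: date, answered_labour_market: bool) -> bool:
    """Flag answers to questions the DOB-driven filter says should be skipped."""
    return answered_labour_market and not labour_market_applies(dob)


# A 16-year-old answered the labour market questions; DOB keyed correctly.
print(routing_conflict(date(1994, 6, 15), True))  # False
# Same person with the birth year mis-captured as 1997: now looks 13.
print(routing_conflict(date(1997, 6, 15), True))  # True
```

A single mis-keyed digit in the year of birth turns valid answers into an apparent edit failure. Any rule that relies on this filter then inherits the data capture error, which is why the edit rules need to be specified as one cohesive set.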
Towards the 2011 Census Edit Strategy: Example of methodology, partitioning person variables: the 2001 Census question set was partitioned into 6 main topics, of which Labour Market contained 4 subsets; the 2011 question set is not yet endorsed.
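Partitioning can be illustrated with a minimal sketch. Each topic's variables are imputed in a separate pass, matching on a few shared core variables plus that topic's observed variables, so each pass handles a small set of variables and may use a different donor. The topic names, variables, core matching set and 0/1 mismatch distance are all hypothetical; they are not the 2001 partition or the approach adopted for 2011.

```python
from typing import Dict, List

Record = Dict[str, object]

# Hypothetical partition of person variables into topics; names are
# illustrative only, not the actual 2001 or 2011 question set.
PARTITIONS: Dict[str, List[str]] = {
    "demography": ["age", "sex", "marital"],
    "labour_market_status": ["employed", "hours_band"],
    "labour_market_occupation": ["occupation"],
}
CORE = ["age", "sex"]  # assumed matching variables shared by every partition


def mismatches(a: Record, b: Record, fields: List[str]) -> int:
    """Count fields on which two records disagree."""
    return sum(a[f] != b[f] for f in fields)


def impute_partitioned(rec: Record, donors: List[Record]) -> Record:
    """Impute each partition separately, each with its own nearest donor."""
    out = dict(rec)
    for fields in PARTITIONS.values():
        missing = [f for f in fields if out[f] is None]
        if not missing:
            continue
        # Match on core variables plus this partition's observed variables;
        # dict.fromkeys drops duplicates while keeping order.
        match_on = [f for f in dict.fromkeys(CORE + fields) if out[f] is not None]
        donor = min(donors, key=lambda d: mismatches(out, d, match_on))
        for f in missing:
            out[f] = donor[f]
    return out


donors = [  # hypothetical clean, complete records
    {"age": "30-44", "sex": "F", "marital": "married", "employed": "yes",
     "hours_band": "full", "occupation": "teacher"},
    {"age": "16-29", "sex": "M", "marital": "single", "employed": "yes",
     "hours_band": "part", "occupation": "retail"},
]
rec = {"age": "16-29", "sex": "M", "marital": None, "employed": "yes",
       "hours_band": None, "occupation": None}
print(impute_partitioned(rec, donors))
```

The passes run in a fixed order, so core variables imputed in the demography pass are available for matching in the later labour market passes. Splitting the labour market topic into subsets keeps each pass small, much as the 2001 partition did.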
Concluding Remarks: Benefits of applying a generalised system (CANCEIS) in the 2011 UK Census include: significant cost savings and efficiency gains (including the time needed to process large datasets); flexibility and transparency; freed time and resources to address difficult methodological issues.