2011 CENSUS Coverage Assessment – What’s new? OWEN ABBOTT
AGENDA 1.Background 2.Coverage in the 2001 Census Methodology overview 4.Key changes 5.Summary
WHAT IS THE PROBLEM? Despite best efforts, census won’t count every household or person It will also count some people twice Users need robust census estimates - counts not enough In 2001: –One Number Census (ONC) methodology was developed to measure undercount –estimated 1.5 million households missed –3 million persons missed (most from the missing households but some from counted households) –Subsequent studies estimated a further 0.3 million missed In 2011 we want to build on the ONC, as broadly it was successful
2001 CENSUS UNDERCOUNT BY AGE-SEX
RESPONSE RATES BY LOCAL AUTHORITY
COVERAGE ASSESSMENT PROCESS OVERVIEW Estimation Matching Adjustment 2011 Census Quality Assurance Census Coverage Survey
AREAS OF IMPROVEMENT Elements of CCS Design Estimation methodology Measuring overcount Adjustments for bias in DSE Imputation Motivated by: –lessons learnt from 2001 –2011 Census design e.g. use of internet
THE CCS DESIGN Similar to 2001 CCS: –300,000 Households –Sample of small areas (postcodes) –6 weeks after Census Day –Fieldwork almost identical Improvements: –Designed at LA level, not for LA groups –Refined Hard to Count index (5 levels) using up to date data sources –Use Output Areas as PSUs –Select 3 postcodes per OA –Revised allocation of sample (using 2001 patterns)
THE CCS DESIGN (2) What does this mean? –Each LA will have its own sample – at least 1 OA for each hard to count level –Sample is more skewed to LAs with ‘hardest to count’ populations (with an upper limit of 60 OAs) More LAs will have estimates based on their own data Especially in London and for big cities –HtC index will be ‘up to date’ –Most LAs will have 3 HtC levels Most London areas only had one in 2001 Looking at a 40%, 40%, 10%, 8%, 2% distribution
ESTIMATION Obtained lots of data from 2001 to be able to explore whether improvements can be made One key issue was whether we should group LAs by geography or by ‘type’ Improvements: Confirmed that using DSE at OA level is sensible Confirmed that we should group LAs by geography Use simple Ratio estimator Confirmed that LA estimation method is still best
ESTIMATION (2) What does this mean? –The estimation methodology is much the same as it was –Should be slightly easier to explain –We will group LAs that don’t have enough sample with their neighbours until that group has enough sample –More LAs will have enough sample to produce direct estimates
OVERCOUNT In 2001, estimated around 0.4% overcount (duplication) –No adjustments made –Not integrated into methodology For 2011, expecting overcount to be higher –More complex population –Use of internet in 2011 Census Strategy is to: –A) identify and remove obvious cases (multiple response resolution) –B) measure and make net adjustments on the remainder –i.e. for the latter we are NOT removing duplicates
OVERCOUNT (2) Methodology: –Select targeted samples of census records Second residences Students Children –Very large sample (~600,000k records) –Automatic matching algorithm to identify duplicates –Clerical checking of matches expect to see ~13,000 duplicates Also use the LS to QA the estimates –Estimation of duplication rates by GOR and characteristics estimating which is the correct record –Why not do whole database and remove them? High risk of making false positives and thus removing too many!
OVERCOUNT (3) What does this mean? –Population estimates will be reduced where there is overcount –We will be able to say how much adjustment was made due to overcount –The duplicates will still be in the data, we just won’t impute as much for undercount
DSE BIAS ADJUSTMENTS Assumptions underpinning DSE: –Homogeneity –Independence –Accurate Matching –Closure DSEs usually have some bias, mostly due to failure of homogeneity assumption In 2001 Census we made a ‘dependence’ adjustment This showed that we need to have a strategy for measuring this
DSE BIAS ADJUSTMENTS (2) Mitigate as much as possible: i.Post-stratify DSE so heterogeneity is minimised ii.Independence in CCS field processes iii.Design Matching to get accuracy iv.Collect CCS on same basis as Census Measure remaining bias –Specific adjustments – e.g. Movers, Overcount –Residual biases global adjustment Improved adjustment using Census address register Looking at improving age-sex distribution
DSE BIAS ADJUSTMENTS (3) What does this mean? –We will be making adjustments to the estimates based on plausible external data Household counts Sex ratios –This will be part of the methodology –Also can be used if QA determines estimates are implausible
COVERAGE ADJUSTMENT Imputation methodology had problems converging –Sometimes resulted in poor quality results Improvements: –Model characteristics at higher geographies –Allows more details to be modelled –Some additional topics in the CCS included in models: Migration variable (internal, international) Country of birth (UK and non-UK) –Non-controlled variables imputed by CANCEIS What does this mean? –Better Imputation quality –Characteristics of imputed improved
SUMMARY Coverage assessment is an integral part of the 2011 Census It will again define the key census outputs (estimates at LA level by age and sex) and adjust the database We learnt a lot of lessons in 2001 and have been working to address them
Questions?