Working Group on Population and Housing Censuses

1 Item 2 of the Agenda: Quality review of the data and metadata
Luxembourg, September 2014

2 Summary
Legal obligations for Eurostat to conduct a quality check of the 2011 census data; overview of Eurostat's strategy for the quality check; overview of the results of the census quality check; future steps.

3 Legal obligations on data quality under Regulation (EC) 763/2008
Quality monitoring by Eurostat of the data transmitted for the 2011 census is explicitly foreseen by the EU census legislation. Article 6(3) states that: "In applying the quality assessment […], the Commission (Eurostat) shall assess the quality of the data transmitted."

4 Legal obligations on data quality under Regulation (EC) 763/2008
Strictly speaking, this quality check is not a "data validation" step, because validation of census data is the sole responsibility of the Member States. Article 5(2) states that: "Member States shall provide the Commission (Eurostat) with final, validated and aggregated data and with metadata […]"

5 Strategy for quality check
Check of census data: plausibility of the reported population count; coherence among the counts of the different statistical units covered by the census; analysis of the joint distribution of selected pairs of census variables. Check of census metadata: completeness of the information provided; compliance with the requirements of the EU census legislation.

6 Plausibility of the population count
Population estimates for the intercensal period are affected by the known difficulty of measuring migration flows, especially emigration. The size (and the age profile) of the post-census revision of population estimates should therefore be coherent with the discrepancies observed in the so-called "mirror statistics" of migration flows, i.e. the same flow between two countries as reported by the country of origin and by the country of destination.
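A minimal sketch of this coherence check follows. It assumes that yearly emigration reported by the country and the corresponding immigration from that country reported by its partner countries (the "mirror" flows) are both available. If emigration is under-recorded, the pre-census estimate overstates the population, and the post-census downward revision should be roughly the cumulative mirror gap. The function names, the 25% tolerance and all numbers are hypothetical and only illustrate the reasoning. The age profile is not checked here.

```python
def mirror_gap(reported_emigration, mirror_immigration):
    """Cumulative under-recording of emigration over the intercensal period.

    Both arguments map year -> persons. mirror_immigration holds immigration
    from this country as recorded by the partner (destination) countries.
    Only years present in both series are compared.
    """
    years = reported_emigration.keys() & mirror_immigration.keys()
    return sum(mirror_immigration[y] - reported_emigration[y] for y in years)


def revision_is_plausible(pre_census, post_census, gap, tolerance=0.25):
    """Hypothetical rule: the downward revision (pre - post) should lie within
    a relative `tolerance` of the cumulative mirror gap."""
    revision = pre_census - post_census
    if gap == 0:
        return revision == 0
    return abs(revision - gap) / abs(gap) <= tolerance


# Illustrative (invented) figures, not census results.
reported = {2002: 10_000, 2003: 12_000}
mirror = {2002: 60_000, 2003: 70_000}
gap = mirror_gap(reported, mirror)
print(gap)  # 108000
print(revision_is_plausible(1_000_000, 890_000, gap))  # True
```

A symmetric relative tolerance is the simplest choice. In practice the threshold would have to reflect how reliable the partner countries' immigration statistics are.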

7 Plausibility of the population count
Example: population of Romania at 1 January 2011 [figure: light colour = pre-census estimates; solid line = post-census estimates]

8 Coherence among counts of different statistical units
Some hypercubes allow an evaluation of the coherence among the statistical units enumerated in the census: HC49: private households by size (NUTS3); HC51: family nuclei by size (NUTS3); HC54: occupied conventional dwellings by number of occupants (NUTS2).

9 Coherence among counts of different statistical units
Country/region | Population living in a conventional dwelling (reported figure) | Figure implied by the distribution of dwellings by number of occupants | Difference (%)
Italy | 58,921,309 | 58,930,087 | 0.0
Nord-Ovest | 15,616,405 | 15,617,763 |
Piemonte | 4,315,119 | 4,315,380 |
Valle D'Aosta | 125,499 | |
Liguria | 1,554,507 | 1,554,596 |
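The comparison in the table can be reproduced with simple arithmetic. The sketch below makes three assumptions. First, the implied figure is the sum over size classes of k times the number of occupied conventional dwellings with k occupants. That holds only if every class is an exact number of occupants; an open-ended top class would need an assumed average size. Second, the difference is computed as implied minus reported, relative to the reported figure. Third, the function names are hypothetical.

```python
# Figures copied from the table: (reported population in conventional
# dwellings, population implied by the dwellings-by-occupants distribution).
# Valle D'Aosta is left out because only one figure is available.
TABLE = {
    "Italy": (58_921_309, 58_930_087),
    "Nord-Ovest": (15_616_405, 15_617_763),
    "Piemonte": (4_315_119, 4_315_380),
    "Liguria": (1_554_507, 1_554_596),
}


def implied_population(dwellings_by_occupants):
    """Sum of k * D_k, assuming each key k is an exact number of occupants."""
    return sum(k * d for k, d in dwellings_by_occupants.items())


def difference_pct(reported, implied):
    """Relative difference in percent, implied vs reported (assumed convention)."""
    return 100.0 * (implied - reported) / reported


for region, (reported, implied) in TABLE.items():
    print(f"{region:<12}{difference_pct(reported, implied):6.1f}")
```

For the four rows with both figures the relative difference is below 0.02%, so it rounds to 0.0 in every case, consistent with the value shown for Italy.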

10 Checks on the joint distribution of census variables
Main tool: interactive data visualization. Major obstacle: the high dimensionality of census hypercubes, so the joint distribution is checked for two variables at a time. Census data are frequency distributions (a.k.a. contingency tables), which makes mosaic plots a natural choice. Checks could be carried out on limited data samples drawn from the census hypercubes.
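Reducing a hypercube to the joint distribution of two variables amounts to summing its counts over all the other dimensions. Here is a minimal sketch. It assumes a hypercube is held as a dictionary from a tuple of category codes (one per dimension) to a count. The dimension names and category codes in the example are hypothetical.

```python
from collections import defaultdict


def two_way_table(hypercube, dims, var_a, var_b):
    """Collapse a hypercube to the joint distribution of var_a and var_b.

    hypercube: dict mapping a tuple of category codes to a count.
    dims: dimension names, in the same order as the tuple keys.
    """
    ia, ib = dims.index(var_a), dims.index(var_b)
    table = defaultdict(int)
    for cell, count in hypercube.items():
        # Every dimension other than var_a and var_b is summed out.
        table[(cell[ia], cell[ib])] += count
    return dict(table)


# Hypothetical three-dimensional sample.
cube = {
    ("M", "Y15-29", "ED1"): 100,
    ("F", "Y15-29", "ED1"): 120,
    ("M", "Y30-49", "ED2"): 80,
    ("F", "Y30-49", "ED2"): 90,
    ("M", "Y15-29", "ED2"): 30,
}
print(two_way_table(cube, ("SEX", "AGE", "EDU"), "AGE", "EDU"))
```

Taking a "limited data sample" would simply mean filtering the hypercube cells (for example to one region) before collapsing them.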

11 Mosaic plots
The area of each rectangle is proportional to the value (i.e. the number of units) in each cell. Mosaic plots are best viewed in two dimensions (three-dimensional mosaic plots exist but are very difficult to interpret). Mosaic plots clearly reveal the typical pattern behind each bivariate distribution.
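The proportional-area property comes from how the rectangles are laid out. Each row category gets a vertical strip whose width is its share of the grand total. Inside a strip, each cell's height is its share of the row total. Width times height is therefore the cell count divided by the grand total. The sketch below computes that layout on a unit square, without the small gaps usually drawn between tiles. For actual plotting, statsmodels provides `statsmodels.graphics.mosaicplot.mosaic`.

```python
def mosaic_rectangles(table):
    """Unit-square mosaic layout for a two-way table {(row, col): count}.

    Returns {(row, col): (x, y, width, height)}. Assumes a positive total.
    """
    total = sum(table.values())
    rows = list(dict.fromkeys(r for r, _ in table))   # keep first-seen order
    cols = list(dict.fromkeys(c for _, c in table))
    rects = {}
    x = 0.0
    for r in rows:
        row_total = sum(table.get((r, c), 0) for c in cols)
        width = row_total / total                     # row marginal share
        y = 0.0
        for c in cols:
            # Conditional share of the column category within this row.
            height = table.get((r, c), 0) / row_total if row_total else 0.0
            rects[(r, c)] = (x, y, width, height)
            y += height
        x += width
    return rects


example = {("a", "x"): 30, ("a", "y"): 10, ("b", "x"): 20, ("b", "y"): 40}
for cell, (x, y, w, h) in mosaic_rectangles(example).items():
    print(cell, round(w * h, 3))  # area equals count / 100
```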

12 Data samples from hypercubes
Example: hypercube HC02. Variable pairs examined: EDU vs HST, AGE vs EDU, EDU vs CAS, EDU vs POB, EDU vs COC.

14 Quality checks on census metadata
The requirements concerning census metadata are contained in Reg. (EU) 1151/2010. Eurostat collected census metadata via three routes: (1) textual metadata following the ESMS standard, collected via the ESS Metadata Handler; (2) a supplementary online questionnaire for metadata not fitting the ESMS structure; (3) quantitative metadata or quality indicators, collected in "quality hypercubes" via the Census Hub.

15 Quality checks on textual metadata
Checking the completeness of the information provided. Verifying that each metadata item is informative enough to be published online. Checking that the census operations described in the metadata comply with the requirements of the EU census legislation regarding the definitions used for the different variables, data processing, and the generation of census hypercubes.
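The first two checks can be partly automated. The sketch below flags required items that are missing and items whose text looks too thin to publish. The item names, the placeholder list and the 40-character threshold are hypothetical; the real list of items follows the ESMS structure and Reg. (EU) 1151/2010. Whether the described operations comply with the legislation still requires a human reviewer.

```python
# Hypothetical subset of required metadata items.
REQUIRED_ITEMS = ("contact", "statistical_presentation", "data_processing", "accuracy")

# Hypothetical placeholder texts treated as "not informative".
PLACEHOLDERS = {"-", "n/a", "na", "tbd", "see above"}


def check_metadata(report, min_chars=40):
    """Return {item: problem} for missing or uninformative metadata items.

    report: dict mapping item name -> free text (or None).
    """
    problems = {}
    for item in REQUIRED_ITEMS:
        text = (report.get(item) or "").strip()
        if not text:
            problems[item] = "missing"
        elif text.lower() in PLACEHOLDERS or len(text) < min_chars:
            problems[item] = "not informative"
    return problems


report = {
    "contact": "Census unit, national statistical institute, census@example.org",
    "data_processing": "n/a",
    "accuracy": "short",
}
print(check_metadata(report))
```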

16 Quality checks on quality HC
Checking that the information provided was in the correct format; additional checks may be put in place at a later stage. The quantitative metadata will be used to assess the census methodologies of the Member States and as input for the foreseen revision of the EU census legislation for the 2021 census round.
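A format check of this kind can be expressed as a few structural rules applied record by record. The sketch below rests on assumptions throughout. It treats each quality hypercube record as a dictionary with exactly one entry per expected dimension plus a "value" entry holding a non-negative number. The record layout, dimension names and rules are hypothetical and are not the Census Hub transmission format.

```python
def check_format(records, expected_dims):
    """Return a list of (record_index, problem) for badly formatted records."""
    expected = set(expected_dims) | {"value"}
    errors = []
    for i, rec in enumerate(records):
        if set(rec) != expected:
            errors.append((i, "unexpected or missing fields"))
            continue
        v = rec["value"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(v, bool) or not isinstance(v, (int, float)) or v < 0:
            errors.append((i, "value is not a non-negative number"))
    return errors


records = [
    {"GEO": "IT", "VAR": "X", "value": 5},
    {"GEO": "IT", "value": 3},
    {"GEO": "IT", "VAR": "Y", "value": -1},
    {"GEO": "IT", "VAR": "Z", "value": "7"},
]
print(check_format(records, ("GEO", "VAR")))
```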

17 Results of the quality checks
The overall evaluation of data and metadata problems made it possible to divide countries into three groups: (1) countries with problems that could result in non-compliance with the EU census legislation; (2) countries with inconsistencies or missing information that do not conflict with the EU legislation; (3) countries without significant problems. A detailed report on the results of the quality checks was sent to the NSI census teams.
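The three-way grouping can be read as a precedence rule: one compliance-relevant problem outweighs any number of minor ones. The sketch below encodes that reading. The severity labels and the `classify` function are hypothetical, and the actual assessment combined many judgements that are not captured here.

```python
from enum import Enum


class Group(Enum):
    NON_COMPLIANCE_RISK = 1       # problems that could breach EU census legislation
    MINOR_ISSUES = 2              # inconsistencies or missing information only
    NO_SIGNIFICANT_PROBLEMS = 3


def classify(severities):
    """severities: iterable of hypothetical labels per detected problem,
    each one of "compliance", "inconsistency" or "negligible"."""
    severities = set(severities)
    if "compliance" in severities:
        return Group.NON_COMPLIANCE_RISK
    if "inconsistency" in severities:
        return Group.MINOR_ISSUES
    return Group.NO_SIGNIFICANT_PROBLEMS


print(classify(["negligible", "inconsistency"]))  # Group.MINOR_ISSUES
```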

18 Results of the quality checks
Countries were asked to provide, by 20 August 2014, a work plan for implementing the corrections to their data/metadata. The vast majority of countries presented significant problems in either their data or their metadata. As a consequence, Eurostat decided to refrain from widely publicising the Census Hub for the moment. Some Member States requested that the detailed results of the quality assessments be shared among NSIs.

19 Future steps
The quality assessment implemented so far is not exhaustive, and new issues might arise, for instance following user feedback. Eurostat will implement additional quality checks for a small subset of census data that will be disseminated via the Eurostat online database (using the EDIT validation software). Decisions about when to publicise the Census Hub will depend on the outcome of the ongoing quality assessment process. There will be follow-up with selected Member States. Problems encountered during the 2011 census might have an impact on decisions about future census rounds.

