1
Accuracy evaluation of NUTS level 2 hypercubes with the adoption of a sampling strategy in the 2011 Italian Population Census
Joint UNECE/Eurostat Meeting on Population and Housing Censuses (28-30 October 2009)
Giancarlo Carbonetti, Mariangela Verrascina
Istat, Italian National Institute of Statistics, Division for General Censuses
Geneva, 29 October 2009
2
Why do we adopt sampling techniques in the Italian Census?
Sampling is crucial for the new census strategy. The main solutions proposed are:
- use of population registers;
- mail-out of census forms;
- mixed-mode data collection.
A high response rate is needed.
The 2011 population census has been planned in order to:
- improve the efficiency of survey operations;
- reduce the workload of the municipalities;
- minimize the statistical burden on respondents.
3
What are the effects of adopting a sampling technique?
Advantages:
- Keeping a high level of quality by reducing sources of non-sampling error: this is an opportunity.
- Timeliness, thanks to a smaller amount of data to process (hypercubes must be delivered to Eurostat by 1 April 2014): this is a constraint.
Disadvantages:
- Introduction of sampling error: an evaluation of the accuracy of the sampling estimates is required.
4
The framework
POPULATION: private households.
LISTS: population registers managed by the municipalities.
VARIABLES: non-demographic variables.
DOMAINS: Census Areas.
DIFFERENT STRATEGIES: according to municipality demographic thresholds.
A simulation study was conducted to determine which methodology yields the most accurate estimates.
5
Some results of the simulation study
- Simple random sampling of households (SRSHOU) from population registers.
- Area frame sampling where reliable population registers are not available.
- Calibrated estimators.
- Census Areas of about 15,000 inhabitants.
- Sampling ratio of 33%.
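A minimal sketch of the estimation setup listed above, under stated assumptions: households are drawn by simple random sampling without replacement from a register at a 33% ratio, each sampled household receives the design weight N/n, and the weights are then adjusted by linear (GREG-type) calibration so that the weighted numbers of households and persons match the register totals. The toy register, the auxiliary variables (a constant and household size), the choice of linear calibration, and the helper names `srs_households` and `linear_calibration` are all hypothetical; the slides do not specify the calibration function or the auxiliaries used.

```python
import random

def srs_households(register, ratio, seed=0):
    """Simple random sampling without replacement of households (SRSHOU)."""
    rng = random.Random(seed)
    n = round(ratio * len(register))
    return rng.sample(register, n)

def linear_calibration(sample, d, totals):
    """Linear (GREG-type) calibration of a common design weight d.

    Auxiliary vector x_i = (1, household size); totals = (households, persons)
    assumed known from the register. Returns w_i = d * (1 + lam0 + lam1 * size_i),
    where lam solves (sum_i d x_i x_i') lam = totals - sum_i d x_i.
    """
    s = [hh["size"] for hh in sample]
    a = d * len(s)                    # sum of d * 1 * 1
    b = d * sum(s)                    # sum of d * 1 * size
    c = d * sum(v * v for v in s)     # sum of d * size * size
    r0, r1 = totals[0] - a, totals[1] - b  # gaps between known and design-weighted totals
    det = a * c - b * b
    lam0 = (c * r0 - b * r1) / det
    lam1 = (a * r1 - b * r0) / det
    return [d * (1 + lam0 + lam1 * v) for v in s]

# Hypothetical register: 3,000 households of size 1-5 with a number of employed members
gen = random.Random(42)
register = []
for i in range(3000):
    size = gen.randint(1, 5)
    register.append({"id": i, "size": size, "employed": gen.randint(0, size)})

N = len(register)                               # households in the register
persons = sum(hh["size"] for hh in register)    # persons in the register
sample = srs_households(register, ratio=0.33)
d = N / len(sample)                             # design weight N/n under SRSHOU
w = linear_calibration(sample, d, (N, persons))
estimate = sum(wi * hh["employed"] for wi, hh in zip(w, sample))
print(f"calibrated estimate of employed persons: {estimate:.0f}")
```

By construction the calibrated weights reproduce the register totals exactly, so any estimate correlated with household size inherits part of that precision; other calibration functions (e.g. raking) would be equally valid choices.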
6
Distribution of average and maximum cv% for classes of cell counts for the three tested sampling ratios (SRSHOU design)
- for cells of 1,000 units, cv is about 4%;
- for cells of 100 units, cv is about 13%;
- for cells of 10 units, cv is about 40%.
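As a rough back-of-envelope check on why the cv grows so quickly for small cells, one can assume simple random sampling of individual persons (the actual design samples whole households, which typically inflates variance, while calibration typically reduces it). Under that assumption the variance of an estimated count Y with sampling ratio f is approximately (1 - f)/f * Y * (1 - Y/N), so the cv falls roughly as 1/sqrt(Y). At f = 0.33 this gives about 4.5%, 14.2% and 45% for counts of 1,000, 100 and 10, of the same order as the simulated values above. The function name `approx_cv` is hypothetical.

```python
import math

def approx_cv(count, f, population=None):
    """Approximate cv of an estimated count under SRS of persons (an assumption).

    count: true cell count Y; f: sampling ratio n/N.
    Var(Y_hat) ~= (1 - f) / f * Y * (1 - Y/N); the (1 - Y/N) factor is
    treated as 1 when the population size N is not given.
    Household clustering and calibration effects are ignored.
    """
    finite = 1.0 if population is None else 1.0 - count / population
    return math.sqrt((1.0 - f) / f * finite / count)

for y in (1000, 100, 10):
    print(y, f"{100 * approx_cv(y, 0.33):.1f}%")
```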
7
[Figure: sampling error curves derived from the simulation results]
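The slides do not state the functional form of these curves. A common choice, assumed here, is a power law cv(Y) = a * Y^(-b), fitted by ordinary least squares on log cv against log Y. The sketch below fits it to the three approximate (count, cv) pairs quoted on the previous slide; with real simulation output one would use every class of cell counts. The helper name `fit_cv_curve` is hypothetical.

```python
import math

def fit_cv_curve(counts, cvs):
    """Fit cv = a * Y**(-b) by least squares on (log Y, log cv); returns (a, b)."""
    xs = [math.log(y) for y in counts]
    ys = [math.log(c) for c in cvs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), -slope

# Approximate values quoted on the previous slide: cell count -> cv
counts = [10, 100, 1000]
cvs = [0.40, 0.13, 0.04]
a, b = fit_cv_curve(counts, cvs)
print(f"cv(Y) ~= {a:.3f} * Y^(-{b:.3f})")
```

An exponent close to 0.5 is what the simple-random-sampling approximation predicts; departures from it would reflect clustering and calibration.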
8
Relevant issue
What is the impact of sampling error on the dissemination hypercubes? Answering this question is the core of this presentation, which explains in detail the impact of the sampling strategy on the final results.
9
Impact of sampling errors on dissemination hypercubes
Having chosen the sampling strategy (SRSHOU; calibrated estimator), for a given area and dissemination hypercube: “When can the quality of the statistical table be considered acceptable?”
For a fixed critical level of cv (for instance, 12.5%), the global quality of a dissemination hypercube can be considered acceptable:
- if the percentage of cell counts estimated with a cv above the critical value is low;
- if the percentage of persons classified in those cells is low.
Example 1: if fewer than 1/3 of cell counts have cv > 12.5%.
Example 2: if fewer than 10% of persons are classified in cells with cv > 12.5%.
10
Evaluation of the sets of estimates with critical accuracy by means of a sampling error curve
[Figure: sampling error (cv) plotted against absolute frequency. The critical level cv_max = 12.5% determines the threshold T_S: estimates of absolute frequencies below T_S have cv > 12.5% (high sampling error, critical quality); estimates of absolute frequencies above T_S have cv < 12.5%.]
The smaller the amount of information estimated with high cv (that is, persons classified in cells with absolute frequencies below the threshold T_S), the higher the quality of the related dissemination hypercube.
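If the sampling error curve is assumed to have the power-law form cv(Y) = a * Y^(-b) with b > 0 (an assumption; the slides show the curve only graphically), the threshold T_S is where the curve crosses cv_max, i.e. T_S = (a / cv_max)^(1/b). Because the curve is decreasing, every non-zero cell count below T_S has cv above cv_max. The helper names and the curve parameters in the usage lines are hypothetical.

```python
def critical_threshold(a, b, cv_max=0.125):
    """Invert cv(Y) = a * Y**(-b) (b > 0): the count T_S at which cv equals cv_max."""
    return (a / cv_max) ** (1.0 / b)

def split_by_threshold(cell_counts, t_s):
    """Split non-zero cell counts into critical (< T_S) and non-critical (>= T_S)."""
    critical = [y for y in cell_counts if 0 < y < t_s]
    non_critical = [y for y in cell_counts if y >= t_s]
    return critical, non_critical

# Hypothetical curve parameters and cell counts
t_s = critical_threshold(a=1.28, b=0.5)
critical, non_critical = split_by_threshold([0, 12, 80, 150, 900], t_s)
print(f"T_S = {t_s:.1f}; critical cells: {critical}")
```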
11
Quality evaluations for hypercubes at NUTS level 2
The evaluations concern 8 Eurostat hypercubes, referred to NUTS level 2, that cross demographic variables with one or more long form variables. The hypercubes considered contain topics with the breakdowns used in the 2001 Italian Census dissemination, which are similar (in number and information content) to the breakdowns to be provided for the next census round. The number of cells ranges from 1,000 to more than 20,000, depending on the complexity of the statistical table.
12
Hypercubes at NUTS level 2 considered in the study (draft version, April 2009)
[Table: long form variables crossed in each hypercube]
- In some hypercubes, each non-demographic variable has been crossed individually with sex and age (single years of age).
- In others, more than one non-demographic variable has been crossed with sex and age (age classes).
Hypercube computations are simulated with 2001 Census data.
13
Number of potential cells and acceptable cells for each hypercube considered in the study
Number of potential cells = the product of the numbers of categories of the crossed variables.
Number of acceptable cells = the number of potential cells excluding “structural zeros” (combinations of categories that cannot occur).
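The two definitions above can be sketched as follows. The potential cells are simply the product of the category counts; here it is applied to the category counts listed for hypercube H.B1.E1.R3 later in the presentation. The acceptable cells are counted by enumerating every combination and dropping those flagged by a structural-zero rule. The rule and the toy hypercube in the usage lines (persons under 15 cannot be employed) are hypothetical illustrations, not the actual rules used for the Eurostat hypercubes.

```python
from itertools import product
from math import prod

def potential_cells(categories):
    """Number of potential cells: product of the numbers of categories."""
    return prod(categories.values())

def acceptable_cells(categories, is_structural_zero):
    """Potential cells minus those flagged as structural zeros (impossible combinations)."""
    ranges = [range(k) for k in categories.values()]
    return sum(1 for cell in product(*ranges) if not is_structural_zero(cell))

# Category counts of hypercube H.B1.E1.R3 as listed in the presentation
H_B1_E1_R3 = {"sex": 2, "age": 21, "activity": 6, "industry": 17, "education": 7}
print("H.B1.E1.R3 potential cells:", potential_cells(H_B1_E1_R3))

# Hypothetical toy hypercube: sex (2) by age class (0 = under 15, 1 = 15-64, 2 = 65+)
# by status (0 = employed, 1 = not employed); children cannot be employed.
toy = {"sex": 2, "age_class": 3, "status": 2}
rule = lambda cell: cell[1] == 0 and cell[2] == 0
print("toy acceptable cells:", acceptable_cells(toy, rule))
```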
14
Indicators of global accuracy
Two indicators are proposed to measure the global accuracy of census data produced by adopting a sampling strategy and referred to a dissemination hypercube:
1) Percentage of critical cells = number of non-zero cell counts below the critical threshold T_S / number of acceptable cells
2) Percentage of persons in critical cells = persons classified in critical cells / total persons
In particular, the second indicator quantifies the percentage of people classified in cells that will be estimated with low accuracy (10% could be considered a tolerable limit).
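The two indicators can be computed directly from the estimated cell counts of one hypercube in one area, once T_S is known. A minimal sketch, assuming the counts list covers exactly the acceptable cells (zeros allowed); the function name, the cell counts and the threshold in the usage lines are hypothetical, while the 10% limit is the tolerance suggested above.

```python
def quality_indicators(cell_counts, t_s, n_acceptable):
    """Return (% of critical cells, % of persons in critical cells).

    cell_counts: estimated counts of the acceptable cells (zeros allowed);
    t_s: critical threshold T_S; n_acceptable: number of acceptable cells.
    A cell is critical if its count is non-zero and below T_S.
    """
    critical = [y for y in cell_counts if 0 < y < t_s]
    total_persons = sum(cell_counts)
    pct_cells = 100.0 * len(critical) / n_acceptable
    pct_persons = 100.0 * sum(critical) / total_persons
    return pct_cells, pct_persons

counts = [0, 5, 20, 40, 150, 300, 485]  # hypothetical cell counts
pct_cells, pct_persons = quality_indicators(counts, t_s=50, n_acceptable=len(counts))
acceptable = pct_persons < 10.0          # tolerable limit for the second indicator
print(f"{pct_cells:.1f}% critical cells, {pct_persons:.1f}% persons in them")
```

As in the tables that follow, a hypercube can have a large share of critical cells while only a small share of persons falls in them, which is why the second indicator is the more informative one.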
15
Example 1: Hypercube H.B1.E1.R3. Quality indicators for three NUTS2 areas of Italy: Molise, Marche and Sicilia
Region | Sampling ratio | Threshold T_S | % of critical cells | % of persons in critical cells
Molise | 10% | 100 | 79.2 | 10.7
Molise | 20% | 50 | 71.0 | 5.8
Molise | 33% | 30 | 63.6 | 3.4
Marche | 10% | 250 | 78.8 | 6.9
Marche | 20% | 100 | 68.4 | 3.0
Marche | 33% | 50 | 59.8 | 1.5
Sicilia | 10% | 500 | 75.9 | 4.2
Sicilia | 20% | 250 | 68.7 | 2.1
Sicilia | 33% | 100 | 59.4 | 1.0
Hypercube H.B1.E1.R3: sex (2) by age (21) by current activity status (6) by industry (17) by educational attainment (7). Number of acceptable cells = 5,574 (structural zeros excluded). A cell is critical if its absolute frequency is lower than the threshold T_S corresponding to cv_max = 12.5%.
16
Example 2: Hypercube H.B1.E1.R4. Quality indicators for three NUTS2 areas of Italy: Molise, Marche and Sicilia
Region | Sampling ratio | Threshold T_S | % of critical cells | % of persons in critical cells
Molise | 10% | 100 | 91.9 | 14.9
Molise | 20% | 50 | 86.5 | 9.3
Molise | 33% | 30 | 81.3 | 6.5
Marche | 10% | 250 | 91.1 | 11.2
Marche | 20% | 100 | 84.4 | 6.0
Marche | 33% | 50 | 77.1 | 3.4
Sicilia | 10% | 500 | 91.8 | 7.3
Sicilia | 20% | 250 | 87.6 | 4.5
Sicilia | 33% | 100 | 79.4 | 2.2
Hypercube H.B1.E1.R4: sex (2) by age (13) by occupation (10) by industry (17) by educational attainment (7). Number of acceptable cells = 26,350 (structural zeros excluded). A cell is critical if its absolute frequency is lower than the threshold T_S corresponding to cv_max = 12.5%.
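Applying the 10% tolerable limit for the percentage of persons in critical cells to the two example tables is a simple screening step. The sketch below transcribes that column from both tables and lists the (hypercube, region, sampling ratio) combinations above the limit: only the 10% sampling ratio exceeds it, for Molise in both hypercubes and for Marche in H.B1.E1.R4. The data structure and function name are hypothetical conveniences.

```python
# % of persons in critical cells, transcribed from the two example tables
# (values ordered by sampling ratio: 10%, 20%, 33%)
RATIOS = ("10%", "20%", "33%")
PCT_PERSONS = {
    "H.B1.E1.R3": {"Molise": (10.7, 5.8, 3.4), "Marche": (6.9, 3.0, 1.5), "Sicilia": (4.2, 2.1, 1.0)},
    "H.B1.E1.R4": {"Molise": (14.9, 9.3, 6.5), "Marche": (11.2, 6.0, 3.4), "Sicilia": (7.3, 4.5, 2.2)},
}
TOLERANCE = 10.0  # tolerable limit for the % of persons in critical cells

def above_tolerance(table, tolerance=TOLERANCE):
    """List (hypercube, region, sampling ratio) where the indicator exceeds the tolerance."""
    flagged = []
    for cube, regions in table.items():
        for region, values in regions.items():
            for ratio, value in zip(RATIOS, values):
                if value > tolerance:
                    flagged.append((cube, region, ratio))
    return flagged

for row in above_tolerance(PCT_PERSONS):
    print(*row)
```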
17
Expected quality for hypercubes at NUTS level 2
[Figure: distribution of all 20 Italian NUTS2 areas by percentage of persons classified in critical cells, for the Eurostat hypercubes considered in the study and the three tested sampling ratios]
18
Concluding remarks
- The adoption of a sampling strategy does not appear to reduce overall accuracy.
- Sampling error could have a considerable impact only on estimates of very small frequencies.
- NUTS2 hypercubes of differing complexity could be estimated with good accuracy even at lower sampling ratios.
- The revised version of the hypercubes considered in this work appears to be less detailed, which should hopefully yield greater accuracy.
19
Some solutions to enhance accuracy
- Adopting small area estimators.
- Increasing the set of variables observed on the whole population and reducing the set of variables surveyed on samples of households: adoption of a medium/long form.
- Improving estimates of rare events and small domains, in order to increase their efficiency and reduce the number of critical cells.
20
Thank you for your attention.
carbonet@istat.it, verrasci@istat.it