Published by Deborah Eaton. Modified over 6 years ago.
1
Editing and Imputing Income Data in the Integrated Census
Prepared by Yael Klejman, Israel Central Bureau of Statistics
United Nations Economic Commission for Europe, Conference of European Statisticians
Work Session on Statistical Data Editing, The Hague, April 2017
2
The 2008 Population Census
Integrated Census:
- Use of administrative files for demographic data, mainly the Central Population Registry (CPR)
- Large sample survey (17% of households)
  - Statistical correction of the addresses provided by the CPR
  - Collecting socio-economic information
3
Income Data
Administrative sources:
- Income Tax Authority for work income
- National Insurance Institute for allowances
Advantages:
- Reduced burden on the population
- Improved accuracy of data
4
Census scheme (diagram with two groups: Group A and Group B)
5
Group A: reported in the census as being in the annual workforce but not found in the Tax File
- Missing employers
- Missing occupations (caretakers, military personnel)
- Rest of population
6
Imputation for Missing Employer
Condition:
- Employer is completely missing from the 2008 Tax File
- Employee was employed by the same employer in 2007
Process:
- Find the employee in the 2007 Tax File
- Adjust the salary from 2007 to 2008 by industry
Result:
- Insert into the Socio-Economic File (SEF)
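The condition/process/result flow above can be sketched as follows. This is a minimal illustration, not the Bureau's implementation: it assumes the industry adjustment is a multiplicative wage-growth factor, and the factor values, the `TaxRecord` fields, and the function name are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical 2007 -> 2008 wage growth factors by industry (illustrative values only).
INDUSTRY_GROWTH = {"manufacturing": 1.04, "education": 1.02}


@dataclass
class TaxRecord:
    person_id: str
    employer_id: str
    industry: str
    salary: float


def impute_from_previous_year(record_2007, census_employer_id, employers_in_2008,
                              growth=INDUSTRY_GROWTH):
    """Return an adjusted 2008 salary, or None if the conditions are not met."""
    # Condition 1: the employer is completely missing from the 2008 Tax File.
    if census_employer_id in employers_in_2008:
        return None
    # Condition 2: the employee worked for the same employer in 2007.
    if record_2007.employer_id != census_employer_id:
        return None
    # Process: adjust the 2007 salary by the (assumed) industry growth factor.
    factor = growth.get(record_2007.industry)
    if factor is None:
        return None
    return round(record_2007.salary * factor, 2)


record = TaxRecord("p1", "e9", "manufacturing", 5000.0)
print(impute_from_previous_year(record, "e9", {"e1", "e2"}))
```

Returning `None` when a condition fails leaves the record for one of the other imputation routes rather than forcing a value.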
7
Missing occupations: caretakers and military personnel
Alternatives dismissed for caretakers:
- Cold-deck from the Income Survey: absent
- Nearest neighbor from the Tax File: socio-economic characteristics not suitable
Alternative chosen for both: statistical imputation based on the Income Survey
8
Statistical imputation for caretakers and military personnel
Imputation classes:
- Job extent, occupation
- Job extent, occupation, (industry)
- Job extent, occupation, (industry), age group
9
Rest of Population
- "Nearest neighbor" method using the CANCEIS program developed in Canada
- At individual level
- Socio-economic variables: job extent, occupation, industry, highest education degree, gender, age group, residence locality, number of children in household, marital status
- Separate imputation for institution residents
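To make the idea of nearest-neighbor donor imputation concrete, here is a minimal sketch. It is not the CANCEIS algorithm: it assumes a simple distance that counts mismatches on categorical matching variables, and the field names are hypothetical.

```python
# Hypothetical field names for a subset of the matching variables listed above.
MATCH_VARS = ["job_extent", "occupation", "industry", "education", "gender", "age_group"]


def distance(recipient, donor, variables=MATCH_VARS):
    """Count the matching variables on which recipient and donor disagree."""
    return sum(recipient[v] != donor[v] for v in variables)


def impute_income(recipient, donors, variables=MATCH_VARS):
    """Copy the income of the closest donor (first one wins on ties)."""
    best = min(donors, key=lambda d: distance(recipient, d, variables))
    return best["income"]


donors = [
    {"job_extent": "full", "occupation": "21", "industry": "C", "education": "BA",
     "gender": "F", "age_group": "30-39", "income": 9000},
    {"job_extent": "part", "occupation": "52", "industry": "G", "education": "HS",
     "gender": "M", "age_group": "20-29", "income": 4800},
]
recipient = {"job_extent": "part", "occupation": "52", "industry": "G", "education": "HS",
             "gender": "F", "age_group": "20-29"}
print(impute_income(recipient, donors))
```

A production system would typically weight the variables and choose randomly among near-equal donors; the unweighted count here is only the simplest possible stand-in.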
10
Donor population
Donor population: individuals in the yearly workforce.
NOT included:
- Kibbutz members, caretakers, military personnel
- Individuals earning in the highest income percentile of each occupation (2 digits)
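The donor exclusions above could be applied along these lines. This is a sketch under stated assumptions: the record fields are hypothetical, and "highest income percentile" is interpreted as dropping the top 1% of records (by count, rounded down) within each 2-digit occupation code.

```python
from collections import defaultdict

# Hypothetical labels for the population groups that may not serve as donors.
EXCLUDED_GROUPS = {"kibbutz_member", "caretaker", "military"}


def select_donors(records):
    """Keep workforce members outside the excluded groups, minus top earners."""
    eligible = [r for r in records
                if r["in_workforce"] and r["group"] not in EXCLUDED_GROUPS]
    # Group by the first two digits of the occupation code.
    by_occupation = defaultdict(list)
    for r in eligible:
        by_occupation[r["occupation"][:2]].append(r)
    donors = []
    for group in by_occupation.values():
        group.sort(key=lambda r: r["income"])
        n_drop = len(group) // 100  # assumed reading of "highest percentile"
        donors.extend(group[: len(group) - n_drop])
    return donors
```

Under this reading, occupations with fewer than 100 eligible records lose no donors; a cutoff based on an income quantile would behave differently for small groups.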
11
Group B: reported as not in the annual workforce but found in the Tax File
- 73% income from salary
- 23% non-work related income
- Reason: irregular employment pattern or response by proxy
- Decision: include work income
- Status added: "has income from work but reported as not in annual workforce"
12
Topcoding procedure
Define threshold:
- Calculate the interquartile range at locality level
- Multiply by a factor (urban = 4, rural = 3)
Identify records:
- Identify records above the threshold
- Minimum 3 records per locality
Edit:
- Calculate the average of all top earners
- Replace the income for those records
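The three topcoding steps can be sketched for a single locality as below. Two points are assumptions, since the slide does not spell them out: the threshold is taken literally as factor × IQR (a common variant adds this to the third quartile), and "minimum 3 records" is read as "edit only when at least three records exceed the threshold". The function name and quartile method are also illustrative choices.

```python
import statistics

FACTOR = {"urban": 4, "rural": 3}  # multipliers from the slide
MIN_RECORDS = 3                    # minimum number of records per locality


def topcode(incomes, locality_type):
    """Replace incomes above the threshold with the mean of those top earners."""
    q1, _, q3 = statistics.quantiles(incomes, n=4, method="inclusive")
    # Assumed reading: threshold = factor * interquartile range.
    threshold = FACTOR[locality_type] * (q3 - q1)
    top = [x for x in incomes if x > threshold]
    # Assumed reading: leave the locality unchanged with fewer than 3 top earners.
    if len(top) < MIN_RECORDS:
        return list(incomes)
    top_mean = sum(top) / len(top)
    return [top_mean if x > threshold else x for x in incomes]


incomes = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 50000, 60000, 70000]
print(topcode(incomes, "rural"))
```

Replacing the top values with their own mean keeps the locality's total income unchanged while masking individual extreme values.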
13
Allowances
- Based on the Personal Identification Number, allowances were received from the National Insurance Institute
- Eight types of allowances
- Number of months received
- Side file used to calculate variables in the SEF at individual and household level
14
Results
Imputation type                                               | Percent
Records in workforce with income in income file              | 84.9%
Records with income in 2007 Income File                       | 5.3%
Imputation based on Income Survey for military personnel      | 1.1%
Imputation based on Income Survey for caretakers              | 1.6%
Nearest neighbor imputation for institution residents (CANCEIS) | 1.0%
Nearest neighbor imputation (CANCEIS)                         | 6.2%
Records in workforce                                          | 100.0%
15
Evaluation
Age group | Average income of records from Tax Authority (NIS) | Average income of imputed records (NIS) | Difference
Under 20  | 2167  | 2785  | 29%
20-29     | 4870  | 5400  | 11%
30-39     | 9185  | 8962  | -2%
40-49     | 10493 | 10312 | -2%
50-59     | 11164 | 10623 | -5%
60+       | 10529 | 9636  | -8%
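The Difference column is not defined on the slide, but the figures are consistent with the relative difference of the imputed average from the Tax Authority average, rounded to a whole percent. A short check under that assumption (the function name is illustrative):

```python
def percent_difference(tax_avg, imputed_avg):
    """Relative difference of the imputed average from the Tax Authority average, in %."""
    return round(100 * (imputed_avg - tax_avg) / tax_avg)


# (age group, Tax Authority average, imputed average) as given in the table, in NIS.
rows = [
    ("Under 20", 2167, 2785),
    ("20-29", 4870, 5400),
    ("30-39", 9185, 8962),
    ("40-49", 10493, 10312),
    ("50-59", 11164, 10623),
    ("60+", 10529, 9636),
]
for age_group, tax_avg, imputed_avg in rows:
    print(age_group, percent_difference(tax_avg, imputed_avg))
```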
16
Evaluation (cont.)
Occupation | Average income of records from Tax Authority (NIS) | Average income of imputed records (NIS) | Difference
Academic professionals | 14416 | 13457 | -7%
Associate professionals and technicians | 8280 | 8077 | -2%
Managers | 18927 | 19051 | 1%
Clerical workers | 7255 | 7755 | 7%
Agents, sales workers and service workers | 5478 | 6531 | 19%
Skilled agricultural workers | 7541 | 7945 | 5%
Mechanics, electricians | 7913 | 7561 | -4%
Painters, tailors, printing workers, workers in food processing | 6404 | 6257 | (not shown)
Drivers, ship deck crews, packaging machine operators, potters and glass makers | 6864 | 6171 | -10%
Unskilled workers | 4703 | 4745 | (not shown)
Maintain distribution. Maintain statistics.
17
Evaluation (cont.)
Statistic          | Records from Tax Authority | Imputed records
Mean               | 8696 | 8845
Median             | 6031 | 6185
Mode               | 4000 (only one value appears in the source)
CV                 | 99   | 97
Standard deviation | 8626 | 8618
Skewness           | 3.3  | 3.2
Maintain distribution. Maintain statistics.
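For readers who want to reproduce this kind of comparison, the sketch below computes the same set of statistics for one group of incomes. It assumes population (not sample) formulas and expresses the CV in percent, which matches the table (8626 / 8696 is about 99%); the function name and sample data are illustrative.

```python
import statistics


def summarize(incomes):
    """Return the summary statistics used to compare two income distributions."""
    mean = statistics.fmean(incomes)
    sd = statistics.pstdev(incomes)  # population standard deviation (assumed)
    # Moment-based skewness: third central moment divided by sd cubed.
    skewness = sum((x - mean) ** 3 for x in incomes) / (len(incomes) * sd ** 3)
    return {
        "mean": mean,
        "median": statistics.median(incomes),
        "mode": statistics.mode(incomes),
        "cv": 100 * sd / mean,  # coefficient of variation, in percent
        "std": sd,
        "skewness": skewness,
    }


observed = summarize([2, 4, 4, 4, 5, 5, 7, 9])
imputed = summarize([2, 4, 4, 5, 5, 5, 7, 9])
for name in observed:
    print(name, observed[name], imputed[name])
```

Running the same function on the administrative records and on the imputed records gives a side-by-side view like the table above.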
18
Future Plans
- Examination of the Multiple Imputation (MI) method: simultaneous imputation of several variables. The method maintains the distribution.
- Foreigners living in Israel: investigating administrative sources and developing models for income data.
19
Thank you!