Editing and Imputing Income Data in the 2008 Integrated Census
Prepared by Yael Klejman, Israel Central Bureau of Statistics
United Nations Economic Commission for Europe, Conference of European Statisticians
Work Session on Statistical Data Editing
The Hague, April 2017
The 2008 Population Census
Integrated Census:
  - Use of administrative files for demographic data, mainly the Central Population Registry (CPR)
  - Large sample survey (17% of households):
      - Statistical correction of the addresses provided by the CPR
      - Collecting socio-economic information
Income Data
Administrative sources:
  - Income Tax Authority for work income
  - National Insurance Institute for allowances
Advantages:
  - Reduced burden on population
  - Improved accuracy of data
Census scheme (diagram): Group A and Group B
Group A: reported in census as being in annual workforce but not found in Tax File
  - Missing employers
  - Missing occupations (caretakers, military personnel)
  - Rest of population
Imputation for Missing Employer
Condition:
  - Employer is completely missing from the 2008 Tax File
  - Employee was employed by the same employer in 2007
Process:
  - Find the employee in the 2007 Tax File
  - Adjust salary from 2007 to 2008 by industry
Result:
  - Insert into the Socio-Economic File (SEF)
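The condition, process, and result above can be sketched roughly as follows. The growth factors, the record layout, and the function name are hypothetical; the slides say only that the 2007 salary is adjusted to 2008 by industry, not how the adjustment is computed.

```python
# Hypothetical industry-level wage growth factors from 2007 to 2008.
# The real adjustment factors are not given in the slides.
GROWTH_2007_TO_2008 = {"manufacturing": 1.03, "education": 1.02}


def impute_from_2007(employee_id, employer_id, tax_2007, employers_2008):
    """Return an adjusted 2008 salary, or None if the conditions fail.

    tax_2007 maps (employee_id, employer_id) -> (salary_2007, industry).
    employers_2008 is the set of employer ids present in the 2008 Tax File.
    """
    # Condition 1: the employer must be entirely absent from the 2008 file.
    if employer_id in employers_2008:
        return None
    # Condition 2: the employee worked for that same employer in 2007.
    record = tax_2007.get((employee_id, employer_id))
    if record is None:
        return None
    salary_2007, industry = record
    factor = GROWTH_2007_TO_2008.get(industry)
    if factor is None:
        return None  # no adjustment factor known for this industry
    # Process: carry the 2007 salary forward with the industry factor.
    return round(salary_2007 * factor, 2)
```

A record for which the function returns None would stay in the pool for the other Group A imputation methods.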
Missing occupations: caretakers and military personnel
Alternatives dismissed for caretakers:
  - Cold-deck from Income Survey: absent
  - Nearest neighbor from Tax File: socio-economic characteristics not suitable
Alternative chosen for both: statistical imputation based on the Income Survey
Statistical imputation for caretakers and military personnel
Variable sets, from coarsest to finest:
  1. Job extent, occupation
  2. Job extent, occupation, (industry)
  3. Job extent, occupation, (industry), age group
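One common way to use nested variable sets like these is to impute from the finest cell that contains enough Income Survey records and to fall back to a coarser cell otherwise. The slides do not state the rule actually applied, so the fallback order, the use of the cell mean, the minimum cell size, and all field names below are assumptions made for illustration.

```python
from statistics import mean

# Variable sets from the slide, ordered here from finest to coarsest.
CELL_LEVELS = [
    ("job_extent", "occupation", "industry", "age_group"),
    ("job_extent", "occupation", "industry"),
    ("job_extent", "occupation"),
]
MIN_CELL_SIZE = 2  # hypothetical threshold, not taken from the slides


def impute_from_survey(person, survey_records):
    """Return the mean survey income of the finest cell with enough records."""
    for keys in CELL_LEVELS:
        # Collect incomes of survey records that match the person on all keys.
        cell = [
            r["income"]
            for r in survey_records
            if all(r[k] == person[k] for k in keys)
        ]
        if len(cell) >= MIN_CELL_SIZE:
            return mean(cell)
    return None  # no cell had enough survey records
```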
Rest of Population
  - “Nearest neighbor” method using the CANCEIS program developed in Canada
  - At individual level
  - Socio-economic variables: job extent, occupation, industry, highest education degree, gender, age group, residence locality, number of children in household, marital status
  - Separate imputation for institution residents
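The idea behind nearest-neighbor donor imputation can be illustrated with a very small sketch. This is not CANCEIS and does not reproduce its distance function or donor selection; it simply counts mismatches on the matching variables listed above and copies the income of the closest donor. Field names are hypothetical.

```python
# Matching variables from the slide, under hypothetical field names.
MATCH_VARS = (
    "job_extent", "occupation", "industry", "education", "gender",
    "age_group", "locality", "n_children", "marital_status",
)


def distance(recipient, donor):
    """Count the matching variables on which the two records differ."""
    return sum(recipient[v] != donor[v] for v in MATCH_VARS)


def nearest_donor_income(recipient, donors):
    """Return the income of the closest donor (the first one wins on ties)."""
    best = min(donors, key=lambda d: distance(recipient, d))
    return best["income"]
```

In practice the variables would usually carry different weights, and ordered variables such as age group would use a graded rather than a 0/1 distance.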
Donor population
Donor population: individuals in yearly workforce.
NOT included:
  - Kibbutz members, caretakers, military personnel
  - Individuals in the highest income percentile of each occupation (2-digit level)
Group B: reported as not in annual workforce but found in Tax File
  - 73% income from salary
  - 23% non-work related income
  - Reason: irregular employment pattern or response by proxy
  - Decision: include work income
  - Status added: “has income from work but reported as not in annual workforce”
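The two groups are defined by comparing the census response with the Tax File. A minimal sketch of that comparison, with hypothetical argument names and the Group B status text taken from the slide:

```python
# Status text quoted from the Group B slide.
WORK_INCOME_STATUS = "has income from work but reported as not in annual workforce"


def classify(reported_in_workforce, found_in_tax_file):
    """Return (group, status) for one person; (None, None) if sources agree."""
    if reported_in_workforce and not found_in_tax_file:
        return "A", None  # income has to be imputed
    if not reported_in_workforce and found_in_tax_file:
        return "B", WORK_INCOME_STATUS  # work income is kept and flagged
    return None, None
```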
Topcoding procedure
1. Define threshold
  - Calculate interquartile range at locality level
  - Multiply by factor (urban = 4, rural = 3)
2. Identify records
  - Identify records above threshold
  - Minimum 3 records per locality
3. Edit
  - Calculate average of all top earners
  - Replace income for those records
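The three steps can be sketched for a single locality as below. Two points are assumptions, because the slide does not spell them out: the threshold is taken as the third quartile plus factor times the interquartile range, and the minimum of three records is read as "edit at least the three highest earners whenever any record exceeds the threshold".

```python
from statistics import mean, quantiles

FACTOR = {"urban": 4, "rural": 3}  # multipliers given on the slide
MIN_RECORDS = 3                    # minimum number of records edited per locality


def topcode(incomes, locality_type):
    """Return a topcoded copy of the incomes of one locality."""
    q1, _, q3 = quantiles(incomes, n=4)
    # Assumption: threshold = Q3 + factor * IQR (the slide gives only the factor).
    threshold = q3 + FACTOR[locality_type] * (q3 - q1)
    n_above = sum(x > threshold for x in incomes)
    if n_above == 0:
        return list(incomes)  # assumption: nothing to edit in this locality
    # Assumption: edit at least MIN_RECORDS top earners so that the shared
    # average never reflects fewer than three people.
    k = min(max(n_above, MIN_RECORDS), len(incomes))
    top = sorted(range(len(incomes)), key=lambda i: incomes[i], reverse=True)[:k]
    top_average = mean(incomes[i] for i in top)
    result = list(incomes)
    for i in top:
        result[i] = top_average  # every top earner receives the same value
    return result
```

Replacing the top incomes by their own average leaves the locality total unchanged while hiding individual extreme values.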
Allowances
  - Allowance data were received from the National Insurance Institute, linked by Personal Identification Number
  - Eight types of allowances
  - Number of months received
  - Side file used to calculate variables in the Socio-Economic File (SEF) at individual and household level
Results
Percent  Imputation type
  84.9%  Records in workforce with income in income file
   5.3%  Records with income in 2007 Income File
   1.1%  Imputation based on Income Survey for military personnel
   1.6%  Imputation based on Income Survey for caretakers
   1.0%  Nearest neighbor imputation for institution residents (CANCEIS)
   6.2%  Nearest neighbor imputation (CANCEIS)
 100.0%  Records in workforce
Evaluation
Average income (NIS) by age group: records from Tax Authority and imputed records
Tax Authority  Imputed  Difference  Age group
         2167     2785         29%  Under 20
         4870     5400         11%  20-29
         9185     8962         -2%  30-39
        10493    10312         -2%  40-49
        11164    10623         -5%  50-59
        10529     9636         -8%  60+
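The Difference column appears to be the relative gap between the imputed average and the Tax Authority average, rounded to a whole percent. The function name is hypothetical; the formula is inferred from the figures in the table, which it reproduces.

```python
def relative_difference(imputed_mean, tax_mean):
    """Percent by which the imputed average differs from the tax average."""
    return round(100 * (imputed_mean - tax_mean) / tax_mean)


print(relative_difference(2785, 2167))   # Under 20 -> 29
print(relative_difference(9636, 10529))  # 60+      -> -8
```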
Maintain distribution; maintain statistics
Average income (NIS) by occupation: records from Tax Authority and imputed records
Tax Authority  Imputed  Difference  Occupation
        14416    13457         -7%  Academic professionals
         8280     8077         -2%  Associate professionals and technicians
        18927    19051          1%  Managers
         7255     7755          7%  Clerical workers
         5478     6531         19%  Agents, sales workers and service workers
         7541     7945          5%  Skilled agricultural workers
         7913     7561         -4%  Mechanics, electricians
         6404     6257              Painters, tailors, printing workers, workers in food processing
         6864     6171        -10%  Drivers, ship deck crews, packaging machine operators, potters and glass makers
         4703     4745              Unskilled workers
Evaluation (cont.)
Maintain distribution; maintain statistics
Tax Authority  Imputed  Statistic
         8696     8845  Mean
         6031     6185  Median
         4000           Mode (only one value given)
           99       97  CV (%)
         8626     8618  Standard deviation
          3.3      3.2  Skewness
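The CV row is consistent with the coefficient of variation expressed in percent, that is, the standard deviation divided by the mean. A short check against the figures above (the function name is hypothetical):

```python
def cv_percent(std_dev, mean_value):
    """Coefficient of variation in percent, rounded to a whole number."""
    return round(100 * std_dev / mean_value)


print(cv_percent(8626, 8696))  # records from Tax Authority -> 99
print(cv_percent(8618, 8845))  # imputed records            -> 97
```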
Future Plans
  - Examination of the Multiple Imputation (MI) method: simultaneous imputation of several variables. The method maintains the distribution.
  - Foreigners living in Israel: exploring administrative sources and developing models for income data.
Thank you! yaelkl@cbs.gov.il www.cbs.gov.il