Administrative Data and statistical matching


Administrative Data and statistical matching
A. Wegscheider-Pichler, Unit Analysis
Brussels, 13th March 2019
Study: Environmental conditions and environmental behaviour with respect to household income

2018 Project by Statistics Austria
- Income is a crucial factor for environmental conditions and behaviour, but the underlying database, the micro-census, has no income variable (except unemployment income).
- 2014: a pilot study with data for 2011 generated total household income by statistical matching with data from EU-SILC.
- 2018/19: a follow-up study with data for 2015 validates the 2014 results.
- BUT: now around 87% of the income components were collected from administrative sources (register data).
- The missing income parts were again filled in by statistical matching with data from EU-SILC.

Register Data at Statistics Austria
- Each register that has to provide data to Statistics Austria delivers them with a so-called "branch-specific personal identification number for official statistics" (bPIN OS).
- The Austrian Ministry of Digitalization, not Statistics Austria itself, is responsible for the bPIN system.
- The bPIN OS can only be decrypted by Statistics Austria and does not allow any conclusions to be drawn about the persons themselves.
- Statistics Austria receives completely anonymized data sets that can be linked with each other with the help of the bPIN OS.

Administrative Data (87%)
- Around 87% of household income data stem from register sources.
- Income components for wages, pensions, unemployment payments etc. are generated that way.
- To ensure compliance with the confidentiality guidelines, the register data are assigned an encrypted 172-digit personal key (28 digits when decrypted), the so-called "branch-specific personal identification number for official statistics" (bPIN OS).
- The procedure for linking administrative data follows that of EU-SILC: income tax, payroll tax, social security, housing subsidies, gross income (unemployment, pension...), other income components.
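
The linking step can be sketched as follows. This is a minimal illustration only: the keys, register names, amounts and the helper `household_register_income` are hypothetical, and the short strings stand in for the decrypted bPIN OS. It shows how income components from several registers could be summed per household through a shared pseudonymous key.

```python
from collections import defaultdict

# Hypothetical register extracts: (pseudonymous key, annual amount in EUR).
wages = [("k01", 28000.0), ("k02", 15500.0)]
pensions = [("k03", 14200.0)]
unemployment = [("k02", 2100.0)]

# Hypothetical survey frame: pseudonymous key -> household id.
survey_persons = {"k01": "hh1", "k02": "hh1", "k03": "hh2", "k04": "hh3"}


def household_register_income(persons, *registers):
    """Sum all register income components per household via the shared key."""
    totals = defaultdict(float)
    for household in persons.values():
        totals[household] += 0.0  # keep households without any register income
    for register in registers:
        for key, amount in register:
            household = persons.get(key)
            if household is not None:  # skip register rows outside the sample
                totals[household] += amount
    return dict(totals)


income = household_register_income(survey_persons, wages, pensions, unemployment)
print(income)
```

Households that end up with no register income, like `hh3` here, are the ones whose income would depend entirely on the matching step.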

Matching Process (13%)
- Still missing: income from self-employment, housing assistance…
- Identifying relevant socio-demographic matching variables and harmonizing them across the data sets.
- Administrative income variables were used for generating income in the micro-census data set AND were used for statistical matching.
- Random forest model: a machine-learning algorithm in which several (in this case 1,500) decision trees are created randomly.
- Each of these decision trees provided one estimate of income; these estimates were then averaged.
- The decision trees used random samples of data from EU-SILC.
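
The averaging idea behind the random forest can be illustrated with a deliberately simplified sketch. The assumptions are significant: each "tree" here is a one-split stump on one randomly chosen feature, whereas a real random forest grows deeper trees and samples features at every split. The donor records and feature choice are invented, and the slides do not say which software was used. What the sketch keeps is the structure described above: many trees, each fitted on a random sample of donor (EU-SILC-like) records, with their estimates averaged.

```python
import random
from statistics import mean


def fit_stump(rows, feature):
    """Fit a one-split regression tree on a single feature.

    rows: list of (features, target) pairs.
    Returns (feature, threshold, left_mean, right_mean).
    """
    values = sorted({x[feature] for x, _ in rows})
    overall = mean(y for _, y in rows)
    best = (feature, values[0], overall, overall)  # fallback: no split possible
    best_sse = sum((y - overall) ** 2 for _, y in rows)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in rows if x[feature] <= threshold]
        right = [y for x, y in rows if x[feature] > threshold]
        l_mean, r_mean = mean(left), mean(right)
        sse = sum((y - l_mean) ** 2 for y in left) + sum((y - r_mean) ** 2 for y in right)
        if sse < best_sse:
            best, best_sse = (feature, threshold, l_mean, r_mean), sse
    return best


def fit_forest(rows, n_trees, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # random sample of donor records
        feature = rng.randrange(n_features)        # random feature for this tree
        forest.append(fit_stump(sample, feature))
    return forest


def predict(forest, x):
    """Average the individual tree estimates."""
    return mean(left if x[f] <= t else right for f, t, left, right in forest)


# Hypothetical donors: ((register income in 1,000 EUR, household size), missing income in 1,000 EUR).
donors = [((12.0, 1), 1.0), ((18.0, 2), 1.5), ((25.0, 2), 2.0),
          ((33.0, 3), 3.5), ((41.0, 4), 4.0), ((58.0, 4), 6.5)]

forest = fit_forest(donors, n_trees=1500, seed=1)
low = predict(forest, (14.0, 1))   # recipient household with low register income
high = predict(forest, (55.0, 4))  # recipient household with high register income
print(low, high)
```

Because every prediction is an average of group means, estimates are pulled towards the centre of the donor distribution, which may be one reason for weaker fits at the extremes.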

Matching Process (13%) - Problem
- For very low and very high incomes, the generated data deviated notably from the actual income!
- But: one focus of the study is on households at risk of poverty, i.e. households with low income.
- Problem: how to get a better fit for these incomes?

Matching Process (13%) - Solution
- In a second run, at-risk-of-poverty status (a dichotomous variable) was first estimated for the households in the micro-census data.
- This variable was then used as an additional regressor for the regression trees.
- This led to a better fit at the case level.
- Substantial improvements were achieved for poor and rich households (equivalised household income).
- Part of the deviations can be explained by the different structure of the two samples.
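
How a dichotomous at-risk-of-poverty variable might be derived and added as a regressor is sketched below. The slides do not state the definitions used, so this assumes the usual EU-SILC conventions: the modified OECD equivalence scale and a threshold of 60% of the median equivalised income. The household data and function names are hypothetical. The flag is computed here on donor data, where income is observed; in the study it had to be estimated for the micro-census households.

```python
from statistics import median


def equivalence_weight(n_adults, n_children_under_14):
    """Modified OECD scale: 1.0 for the first adult, 0.5 for each further
    person aged 14 or over, 0.3 for each child under 14."""
    return 1.0 + 0.5 * (n_adults - 1) + 0.3 * n_children_under_14


def equivalised_income(total_income, n_adults, n_children_under_14):
    """Household income divided by the household's equivalence weight."""
    return total_income / equivalence_weight(n_adults, n_children_under_14)


# Hypothetical donor households: (total income in EUR, adults, children under 14).
donors = [(14000.0, 1, 0), (30000.0, 2, 0), (42000.0, 2, 2),
          (21000.0, 2, 1), (60000.0, 2, 0)]

equivalised = [equivalised_income(*household) for household in donors]
threshold = 0.6 * median(equivalised)  # assumed: 60% of the median
at_risk = [int(value < threshold) for value in equivalised]

# Second-run feature rows: original regressors plus the dichotomous flag.
features = [(adults, children, flag)
            for (_, adults, children), flag in zip(donors, at_risk)]
print(at_risk)
print(features)
```

With the flag among the regressors, the trees can split low-income households off early, which fits the better case-level fit reported above.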

Income Variable – Data Check

Results - verifying the matching (Example)
Quality of life by equivalised household income
Source: Micro Census Environment 2015

Further descriptive outcomes
- Households with low income report higher disturbance by noise than households with high income.
- Households with high income more often report buying organic food.
- Public transport is less attractive for households with high income.
- Households with low income use private transport (by car) less often than households with medium or high income.

Findings
- Using administrative data to generate income components improves data quality, BUT: not all relevant data are available.
- Statistical matching is fast and reduces the response burden and therefore the costs.
- Mixed use of both methods is the best way to generate new variables such as household income:
  - get as many variables as possible from register sources
  - close the gap for the missing income components
- What always matters: the structure of the samples.