IAB homepage: www.iab.de
Institut für Arbeitsmarkt- und Berufsforschung/Institute for Employment Research
A New Approach for Disclosure Control in the IAB Establishment Panel – Multiple Imputation for a Better Data Access

Presentation transcript:

Slide 1
A New Approach for Disclosure Control in the IAB Establishment Panel – Multiple Imputation for a Better Data Access
Jörg Drechsler, Competence Center for Empirical Methods, Institute for Employment Research of the Federal Employment Agency, Germany
UNECE Work Session on Statistical Data Editing, Bonn

Slide 2 Institut für Arbeitsmarkt- und Berufsforschung/Institute for Employment Research, Jörg Drechsler, 26 September 2006
Overview
- The IAB Establishment Panel
- Three approaches for disclosure control via multiple imputation
- Application of the full MI approach to the IAB Establishment Panel
- First results
- Next steps/open questions

Slide 3 The IAB Establishment Panel
- Establishment survey conducted annually (generally face-to-face interviews)
- Since 1993 in western Germany, since 1996 in eastern Germany
- Population: all establishments with at least one employee covered by social security
- Source: Official Employment Statistics
- Response rate of repeatedly interviewed establishments: more than 80%

Slide 4 The IAB Establishment Panel: Sample/Weighting
- Sample of more than … establishments in the last wave
- Stratified sample: 20 economic branches × 10 size classes
- Oversampling of large establishments
- Yearly additional samples: newly founded firms and replacements for panel attrition
- Weighting:
  - inverse sampling probabilities
  - adjustment to exogenous values
  - probabilities of staying in the sample
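The three weighting components can be illustrated with a minimal sketch. It assumes a simple stratified design in which a unit's base weight is N_h / n_h (the inverse of its sampling probability), an inflation by the inverse of an estimated probability of staying in the panel, and a single ratio adjustment to one known exogenous total. The function name, its arguments and the ratio-style adjustment are hypothetical; the slide does not say how the IAB combines these components.

```python
from collections import Counter


def panel_weights(strata, population_counts, retention_prob, exogenous_total):
    """Hypothetical three-step weight for a stratified establishment panel.

    strata:            stratum label of each sampled establishment
    population_counts: dict mapping stratum -> establishments in the frame (N_h)
    retention_prob:    estimated probability that each unit stays in the panel
    exogenous_total:   known population total that the weights must add up to
    """
    sample_counts = Counter(strata)  # n_h: sampled units per stratum
    # Inverse sampling probability: 1 / (n_h / N_h) = N_h / n_h
    base = [population_counts[h] / sample_counts[h] for h in strata]
    # Inflate by the inverse probability of staying in the sample
    adjusted = [w / p for w, p in zip(base, retention_prob)]
    # Ratio adjustment so that the weights sum to the exogenous total
    factor = exogenous_total / sum(adjusted)
    return [w * factor for w in adjusted]


# Toy input: two strata; the second sampled unit had a 50% retention probability
weights = panel_weights(
    strata=["A", "A", "B"],
    population_counts={"A": 100, "B": 30},
    retention_prob=[1.0, 0.5, 1.0],
    exogenous_total=130,
)
print([round(w, 2) for w in weights])  # [36.11, 72.22, 21.67]
```

Real calibration usually adjusts to several exogenous margins at once (for example raking); the single ratio factor here only keeps the sketch short.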

Slide 5 The IAB Establishment Panel: Contents
- Annual: employment structure, changes in employment, business policies, investment, training, remuneration, working hours, collective wage agreements, works councils
- Bi- or triennial: innovations, government aid, further training, flexibility of working hours, business activities, contact with employment offices
- Focus topics: 2001 innovation and modern technologies; 2002 older employees and contact with the labour offices
- Kölling, A. (2000): The IAB-Establishment Panel, Journal of Applied Social Science Studies, 120: 2,

Slide 7 (1) Fully Synthetic Data
- Proposed by Rubin (1993)
- Idea:
  - Treat all units of the population that are not included in the sample as missing data and multiply impute them
  - Draw random samples from the imputed populations and release these samples to the public
Figure legend: X = variables available for all units in the population; Y = variables available only for units in the survey; Y_inc = units included in the survey; Y_exc = units not included in the survey
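A minimal sketch of the fully synthetic idea, assuming one covariate x known for the whole population (X), one survey variable y observed only for the sampled units (Y_inc), and a normal linear model with a noninformative prior. For each synthetic data set the regression parameters are drawn from their posterior, y is imputed for all non-surveyed units (Y_exc), and a simple random sample is released. The function and argument names are hypothetical. Drawing the release only from imputed units, so that no observed y value is published, is a choice made for this sketch; the slide itself just says random samples are taken from the imputed population.

```python
import numpy as np


def fully_synthetic(x_pop, sample_idx, y_sample, n_release, m, rng):
    """Hypothetical sketch: m fully synthetic samples under y = a + b*x + e.

    x_pop:      covariate X, known for every unit in the population (frame)
    sample_idx: positions of the surveyed units (Y_inc) in x_pop
    y_sample:   survey variable Y observed for those units
    """
    x_pop = np.asarray(x_pop, dtype=float)
    sample_idx = np.asarray(sample_idx)
    y_sample = np.asarray(y_sample, dtype=float)
    n, k = len(y_sample), 2
    X = np.column_stack([np.ones(n), x_pop[sample_idx]])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y_sample
    resid = y_sample - X @ beta_hat
    s2 = resid @ resid / (n - k)

    # Units not in the survey (Y_exc) are treated as missing data
    excluded = np.setdiff1d(np.arange(len(x_pop)), sample_idx)
    X_exc = np.column_stack([np.ones(len(excluded)), x_pop[excluded]])
    releases = []
    for _ in range(m):
        # Posterior draws of sigma^2 and beta under a noninformative prior
        sigma2 = (n - k) * s2 / rng.chisquare(n - k)
        beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
        # Impute Y for every non-surveyed unit of the population
        y_exc = X_exc @ beta + rng.normal(0.0, np.sqrt(sigma2), len(excluded))
        # Release a simple random sample of imputed units only
        pick = rng.choice(len(excluded), size=n_release, replace=False)
        releases.append((x_pop[excluded[pick]], y_exc[pick]))
    return releases


rng = np.random.default_rng(1)
x_pop = np.arange(1000, dtype=float)
sample_idx = rng.choice(1000, size=100, replace=False)
y_sample = 5 + 0.5 * x_pop[sample_idx] + rng.normal(0, 2, size=100)
synthetic = fully_synthetic(x_pop, sample_idx, y_sample, n_release=100, m=5, rng=rng)
```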

Slide 8 (2) Imputation of Selected Variables
- Observed values are replaced by imputed values only for variables that bear a high risk of disclosure (key variables)
- Proposal: in every imputation round, replace only part of each key variable, then combine the imputed parts to obtain fully imputed variables
- Example: 3 variables and 3 imputation rounds
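One way to read the proposal is as a rotation schedule. The units are split into as many blocks as there are imputation rounds, and in round r key variable j has block (r + j) mod k replaced. Combined over all rounds, every value of every key variable is then replaced exactly once, while within a round each unit keeps some observed key values to condition on. This rotation rule and the helper names are hypothetical illustrations of the three-variable, three-round example; the layout shown on the original slide was not preserved.

```python
def replacement_schedule(n_units, n_vars, n_rounds):
    """Hypothetical rotation: schedule[r][j] lists the units whose value of
    key variable j is replaced by an imputed value in round r."""
    # Assign unit i to block i mod n_rounds
    blocks = [[i for i in range(n_units) if i % n_rounds == b]
              for b in range(n_rounds)]
    # Variable j uses block (r + j) mod n_rounds in round r
    return [[blocks[(r + j) % n_rounds] for j in range(n_vars)]
            for r in range(n_rounds)]


def is_fully_imputed(schedule, n_units):
    """Check that, combined over all rounds, every value of every key
    variable was replaced exactly once."""
    n_vars = len(schedule[0])
    for j in range(n_vars):
        replaced = sorted(i for round_ in schedule for i in round_[j])
        if replaced != list(range(n_units)):
            return False
    return True


schedule = replacement_schedule(n_units=6, n_vars=3, n_rounds=3)
print(schedule[0])                    # [[0, 3], [1, 4], [2, 5]]
print(is_fully_imputed(schedule, 6))  # True
```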

Slide 9 (3) Selective Multiple Imputation of Key Variables (SMIKe)
- Suggested by Liu and Little (2002)
- Only selected units of the key variables are multiply imputed
- Assume the data set can be divided into a set of categorical key variables X and a set of continuous variables Y
- Cross-tabulating X yields the vector x of cell counts for all combinations of the categories of X
- Cells with counts below a previously defined sensitivity threshold may allow re-identification
- These cells, together with some non-sensitive cells that are closely related to the sensitive cells with regard to Y, are replaced by imputed values
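The first SMIKe step, finding sensitive cells in the cross-tabulation of the categorical key variables, can be sketched as below. The record layout, variable names and helper names are hypothetical. The sketch only flags cells whose count falls below the threshold; choosing the additional non-sensitive cells that are close to them with regard to Y, and the imputation itself, are not shown.

```python
from collections import Counter


def sensitive_cells(records, key_vars, threshold):
    """Cross-tabulate the categorical key variables X and return the cells
    whose count is below the sensitivity threshold."""
    counts = Counter(tuple(r[v] for v in key_vars) for r in records)
    return {cell: c for cell, c in counts.items() if c < threshold}


def units_in_sensitive_cells(records, key_vars, threshold):
    """Positions of the records that fall into a sensitive cell."""
    risky = sensitive_cells(records, key_vars, threshold)
    return [i for i, r in enumerate(records)
            if tuple(r[v] for v in key_vars) in risky]


# Toy records: one establishment is unique in its branch x size cell
records = [
    {"branch": "retail", "size": "small", "turnover": 1.2},
    {"branch": "retail", "size": "small", "turnover": 0.8},
    {"branch": "retail", "size": "small", "turnover": 1.5},
    {"branch": "mining", "size": "large", "turnover": 950.0},
]
print(sensitive_cells(records, ["branch", "size"], threshold=3))
# {('mining', 'large'): 1}
print(units_in_sensitive_cells(records, ["branch", "size"], threshold=3))  # [3]
```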

Slide 11 Generating a synthetic data set
- Create a synthetic data set for selected variables from the 1997 wave of the Establishment Panel
- Imputation for the whole population is not feasible
- Instead, draw a new sample from the Official Employment Statistics using the same sampling design as for the Establishment Panel (stratification by economic branch, size, and region)
- Each stratum cell contains the same number of observations as in the 1997 wave of the Establishment Panel
- Additional information from the German Social Security Data (GSSD) is used for the imputation
Figure legend: missing data; data from the new sample; data from the IAB Establishment Panel (blocks X, Y_inc, Y_exc)
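Drawing a new sample with the same number of observations per stratum cell as the panel wave can be sketched as stratified simple random sampling without replacement. The register layout (pairs of establishment ID and stratum) and the function name are hypothetical; strata are represented here as (economic branch, size class, region) tuples.

```python
import random
from collections import Counter, defaultdict


def draw_matching_sample(frame, panel_strata, rng):
    """Draw a stratified sample from the register with the same number of
    units per stratum cell as the panel wave (hypothetical helper).

    frame:        list of (establishment_id, stratum) pairs from the register
    panel_strata: stratum of each establishment in the panel wave
    """
    by_stratum = defaultdict(list)
    for est_id, stratum in frame:
        by_stratum[stratum].append(est_id)
    sample = []
    for stratum, n_h in Counter(panel_strata).items():
        # Simple random sampling without replacement within the cell
        sample.extend(rng.sample(by_stratum[stratum], n_h))
    return sample


# Strata as (economic branch, size class, region); toy register with 15 units
frame = ([(i, ("trade", 1, "west")) for i in range(10)]
         + [(i, ("industry", 2, "east")) for i in range(10, 15)])
panel_strata = [("trade", 1, "west")] * 3 + [("industry", 2, "east")] * 2
new_sample = draw_matching_sample(frame, panel_strata, random.Random(0))
```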

Slide 12 The German Social Security Data (GSSD)
- Contains information on all employees covered by social security
- Since 1973, all employers have been required to notify the social security agencies about all employees covered by social security
- The GSSD covers about 80% of the German workforce
- Information from the GSSD is aggregated at the establishment level and matched to the IAB Establishment Panel via the establishment identification number
- Information on: number of employees by gender and schooling, mean age of the employees, mean wage of the employees, …
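The aggregation and matching step can be sketched with plain dictionaries. The field names (`est_id`, `female`, `age`, `wage`) are hypothetical and cover only a subset of the listed characteristics; schooling, for instance, is omitted. Panel establishments without a GSSD match are left unchanged.

```python
from collections import defaultdict


def aggregate_employees(employees):
    """Aggregate employee-level records to one row per establishment.

    employees: iterable of dicts with hypothetical keys
               'est_id', 'female' (bool), 'age', 'wage'
    """
    groups = defaultdict(list)
    for e in employees:
        groups[e["est_id"]].append(e)
    result = {}
    for est_id, rows in groups.items():
        n = len(rows)
        result[est_id] = {
            "n_employees": n,
            "n_female": sum(1 for r in rows if r["female"]),
            "mean_age": sum(r["age"] for r in rows) / n,
            "mean_wage": sum(r["wage"] for r in rows) / n,
        }
    return result


def match_to_panel(panel, aggregates):
    """Attach the aggregates to panel records via the establishment ID."""
    return [{**p, **aggregates.get(p["est_id"], {})} for p in panel]


employees = [
    {"est_id": 1, "female": True, "age": 30, "wage": 3000},
    {"est_id": 1, "female": False, "age": 50, "wage": 4000},
    {"est_id": 2, "female": False, "age": 40, "wage": 2500},
]
agg = aggregate_employees(employees)
merged = match_to_panel([{"est_id": 1, "training": 1}, {"est_id": 3, "training": 0}], agg)
```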

Slide 13 Imputation procedure
- For simplicity, newly founded establishments are excluded from the sampling frame and from the panel
- 8 new samples are drawn
- The number of observations in each sample equals the number of observations in the panel: n_s = n_p = 7,332
- Every sample is imputed five times using chained equations
- Number of variables in X: 24
- Number of variables in Y: 48
- Imputations are generated using IVEware by Raghunathan, Solenberger and Van Hoewyk (2001)
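The chained-equations idea can be sketched for continuous variables: start from column means, then repeatedly regress each incomplete variable on all others and redraw its missing entries from the fitted model plus residual noise. This is a simplified stand-in written for illustration, not IVEware, which also handles other variable types and draws the model parameters from their posterior. Calling the function five times with the same generator gives five completed data sets, mirroring the five imputations per sample.

```python
import numpy as np


def chained_equations(data, n_iter=10, rng=None):
    """Simplified chained-equations imputation for continuous columns.

    data: 2-D float array with np.nan marking missing values.
    Each column with missing values is regressed (OLS) on all other columns,
    and its missing entries are replaced by predictions plus normal residual
    noise. Proper multiple imputation would also draw the regression
    coefficients from their posterior; this sketch does not.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.array(data, dtype=float)  # work on a copy
    missing = np.isnan(data)
    # Start from column means so every regression has complete predictors
    col_means = np.nanmean(data, axis=0)
    data[missing] = np.take(col_means, np.nonzero(missing)[1])
    for _ in range(n_iter):
        for j in range(data.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            others = np.delete(data, j, axis=1)
            X = np.column_stack([np.ones(len(data)), others])
            obs = ~miss_j
            beta, *_ = np.linalg.lstsq(X[obs], data[obs, j], rcond=None)
            resid = data[obs, j] - X[obs] @ beta
            sigma = resid.std(ddof=X.shape[1])
            data[miss_j, j] = X[miss_j] @ beta + rng.normal(0.0, sigma, miss_j.sum())
    return data


rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.1, size=200)
data = np.column_stack([x, y])
data[:20, 1] = np.nan  # 20 missing values of y
completed = [chained_equations(data, n_iter=5, rng=rng) for _ in range(5)]
```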

Slide 15 A regression by T. Zwick (2005) as a means of evaluation
- Zwick analyses the productivity effects of different forms of continuing vocational training in Germany
- Result: vocational training is one of the most important measures for gaining and maintaining productivity
- Probit regression to explain why firms offer vocational training
- 13 explanatory variables, including: share of qualified employees, establishment size, region, collective wage agreement, high qualification needs expected, …
- 2 variables based on the 1998 wave of the panel are dropped for the evaluation

Slide 16 Binary variables in the original and in the synthetic data set
Columns: Variable | Survey mean | Synthetic data mean | Deviation (%)
Variables:
- Training (yes/no)
- Redundancies expected
- Many employees expected to be on maternity leave
- High qualification needs expected
- Establishment size (four size classes)
- Collective wage agreement
- Apprenticeship training as a reaction to skill shortages
- Training as a reaction to skill shortages
- State-of-the-art technical equipment
- Apprenticeship training

Slide 17 Continuous variables in the original and in the synthetic data set
Columns: Variable | Survey mean | Synthetic data mean | Deviation (%)
Variables:
- Share of qualified employees
- Number of employees
- Number of employees who participated in training measures
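The comparison in the two tables can be sketched as averaging the estimate over the synthetic data sets and expressing its deviation from the survey estimate in percent. The slides do not define "Deviation"; the relative deviation (synthetic − survey) / survey × 100 used here is an assumption, and the input numbers are made up for illustration only.

```python
def compare_means(survey_mean, synthetic_means):
    """Average an estimate over the synthetic data sets and express its
    deviation from the survey estimate in percent (assumed definition:
    relative deviation)."""
    synthetic_mean = sum(synthetic_means) / len(synthetic_means)
    deviation_pct = 100.0 * (synthetic_mean - survey_mean) / survey_mean
    return synthetic_mean, deviation_pct


# Made-up illustrative numbers, not results from the slides
mean_syn, dev = compare_means(0.40, [0.41, 0.43, 0.42])
print(round(mean_syn, 4), round(dev, 2))  # 0.42 5.0
```

For binary variables, an absolute difference in percentage points would be an equally plausible reading of "Deviation".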

Slide 18 Results from the regression
Two probit regressions, each reporting coefficients and z-values for the same 13 exogenous variables:
- Regression as performed by T. Zwick (n = 6,258)
- Regression with all missing data imputed (n = 7,332)
Exogenous variables: redundancies expected; many employees expected to be on maternity leave; high qualification needs expected; apprenticeship training as a reaction to skill shortages; training as a reaction to skill shortages; establishment size (four size classes); share of qualified employees; state-of-the-art technical equipment; collective wage agreement; apprenticeship training

Slide 19 Complete data set and synthetic data set
Coefficients and z-values for the same exogenous variables as in the Zwick regression (slide 18):
- Regression with all missing data imputed (n = 7,332)
- Regression on the synthetic data (n = 7,332)

Slide 21 Next steps/open questions
- Use nonparametric approaches
- Replace only selected variables
- Measure the disclosure risk after imputation
- Generate weights for the synthetic sample?

Slide 22 Thank you for your attention!

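Slide 23 Rubin's adjusted combining rules
- Imputation yields m different data sets
- The information from the data sets has to be combined to obtain valid estimates
- Point estimate: the average of the point estimates q^(l) from the m data sets,
$$\bar{q}_m = \frac{1}{m}\sum_{l=1}^{m} q^{(l)}$$
- Variance estimate: a combination of the variance within the data sets (W) and the variance between the data sets (B),
$$T = \left(1 + \frac{1}{m}\right)B - W \qquad \left(\text{not } T = W + \left(1 + \frac{1}{m}\right)B\right)$$
with
$$W = \frac{1}{m}\sum_{l=1}^{m} u^{(l)}, \qquad B = \frac{1}{m-1}\sum_{l=1}^{m}\left(q^{(l)} - \bar{q}_m\right)^2,$$
where u^(l) is the estimated variance of q^(l) in data set l
- Creating synthetic data sets requires an additional sampling step (a sample is drawn from each imputed population), so B already reflects the variance within each data set; W is therefore subtracted rather than added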
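The adjusted combining rules translate directly into a few lines of code. This sketch assumes m ≥ 2 synthetic data sets, each supplying a point estimate q^(l) and its variance estimate u^(l); the function name is hypothetical. `statistics.variance` uses the m − 1 divisor required for B. The estimate T can turn out negative when m is small; the sketch returns it unchanged and applies no correction.

```python
from statistics import mean, variance


def combine_fully_synthetic(estimates, variances):
    """Combine m >= 2 estimates from fully synthetic data sets.

    estimates: point estimates q^(l), one per synthetic data set
    variances: their estimated variances u^(l)
    Returns (q_bar, T) with T = (1 + 1/m) * B - W.
    """
    m = len(estimates)
    q_bar = mean(estimates)     # average of the point estimates
    B = variance(estimates)     # between-data-set variance, divisor m - 1
    W = mean(variances)         # average within-data-set variance
    T = (1 + 1 / m) * B - W     # W is subtracted, not added
    return q_bar, T


q_bar, T = combine_fully_synthetic([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```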

Slide 24 Information from the two data sets