A New Approach for Disclosure Control in the IAB Establishment Panel – Multiple Imputation for a Better Data Access

Jörg Drechsler
Competence Center for Empirical Methods
Institut für Arbeitsmarkt- und Berufsforschung/Institute for Employment Research of the Federal Employment Agency, Germany
IAB homepage: www.iab.de

UNECE Work Session on Statistical Data Editing, Bonn, 25.09.2006-27.09.2006

Institut für Arbeitsmarkt- und Berufsforschung/Institute for Employment Research, Jörg Drechsler, 26 September 2006

Overview

- The IAB Establishment Panel
- Three approaches for disclosure control via multiple imputation
- Application of the full MI approach to the IAB Establishment Panel
- First results
- Proceedings/open questions

The IAB Establishment Panel

- Annually conducted establishment survey (generally face-to-face interviews)
- Since 1993 in Western Germany, since 1996 in Eastern Germany
- Population: all establishments with at least one employee covered by social security
- Source: Official Employment Statistics
- Response rate of repeatedly interviewed establishments: more than 80%

The IAB Establishment Panel: Sample/Weighting

- Sample of more than 16,000 establishments in the last wave
- Stratified sample: 20 economic branches x 10 size classes
- Oversampling of large establishments
- Yearly additional samples: newly founded firms and replacements for panel attrition
- Weighting:
  - inverse sampling probabilities
  - adjustment to exogenous values
  - probabilities to stay in the sample
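As an illustration of stratified sampling with oversampling and inverse-probability weights, here is a minimal sketch. The two strata, the frame sizes and the sampling fractions are hypothetical, and the panel's further weighting steps (adjustment to exogenous values, probabilities to stay in the sample) are not modelled.

```python
import random
from collections import defaultdict

def stratified_sample(frame, fractions, seed=0):
    """Draw a stratified sample; each drawn unit gets the weight N_h / n_h."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit in frame:
        by_stratum[unit["stratum"]].append(unit)
    sample = []
    for stratum, units in by_stratum.items():
        n_h = max(1, round(fractions[stratum] * len(units)))
        for unit in rng.sample(units, n_h):
            # Inverse of the sampling probability n_h / N_h.
            sample.append({**unit, "weight": len(units) / n_h})
    return sample

# Hypothetical frame: 1000 small and 50 large establishments.
frame = [{"id": i, "stratum": "small"} for i in range(1000)]
frame += [{"id": 1000 + i, "stratum": "large"} for i in range(50)]
fractions = {"small": 0.02, "large": 0.5}  # large establishments are oversampled
sample = stratified_sample(frame, fractions)
```

The weights sum to the frame size, so weighted totals estimate population totals even though large establishments are overrepresented in the sample.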

The IAB Establishment Panel: Contents

- Annual: employment structure, changes in employment, business policies, investment, training, remuneration, working hours, collective wage agreements, works councils
- Bi- or triennial: innovations, government aid, further training, flexibility of working hours, business activities, contact with employment offices
- Focus:
  - 2001: innovation and modern technologies
  - 2002: elderly employees and contact with the labour offices
- Kölling, A. (2000): The IAB-Establishment Panel, Journal of Appl. Social Science Studies, 120: 2, 291-300.

(1) Fully Synthetic Data

- Proposed by Rubin (1993)
- Idea:
  - Treat all units of the population that are not included in the sample as missing data and impute them multiply
  - Take random samples from the imputed population and release these samples to the public

Notation:
- X: variables available for all units in the population
- Y: variables available only for units in the survey
- Y_inc: units included in the survey
- Y_exc: units not included in the survey
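The idea can be sketched on toy data with one X and one Y variable. This is only an illustration under simplifying assumptions: the imputation model is a plain linear regression with fixed estimated parameters (a proper imputation would also draw the parameters), the released samples are drawn only from units whose Y was imputed, and all names and numbers are hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd

def synthesize(pop_x, sample_idx, sample_y, m, n_release, seed=0):
    """Return m synthetic samples of (x, imputed y) pairs."""
    rng = random.Random(seed)
    a, b, sd = fit_line([pop_x[i] for i in sample_idx], sample_y)
    surveyed = set(sample_idx)
    not_surveyed = [i for i in range(len(pop_x)) if i not in surveyed]
    releases = []
    for _ in range(m):
        # Impute Y_exc for every unit outside the survey.
        y_exc = {i: a + b * pop_x[i] + rng.gauss(0, sd) for i in not_surveyed}
        # Release a simple random sample of the imputed units.
        drawn = rng.sample(not_surveyed, n_release)
        releases.append([(pop_x[i], y_exc[i]) for i in drawn])
    return releases

# Hypothetical population of 500 units, of which 50 were surveyed.
gen = random.Random(1)
pop_x = [gen.uniform(0, 10) for _ in range(500)]
sample_idx = gen.sample(range(500), 50)
sample_y = [2.0 + 3.0 * pop_x[i] + gen.gauss(0, 1) for i in sample_idx]
releases = synthesize(pop_x, sample_idx, sample_y, m=5, n_release=50)
```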

(2) Imputation of Selected Variables

- Observed values are replaced by imputed values only for variables that bear a high risk of disclosure (key variables)
- Proposal: replace only part of each key variable in every imputation round and combine the imputed parts to obtain fully imputed variables
- Example: 3 variables and 3 imputation rounds

(3) Selective Multiple Imputation of Key Variables (SMIKe)

- Suggested by Liu and Little (2002)
- Only selected units of key variables are multiply imputed
- Assume the data set can be divided into a set of categorical key variables X and a set of continuous variables Y
- Cross tabulation of X yields the vector x containing the cell counts for all combinations of the values of X
- Cell counts lower than a previously defined sensitivity threshold possibly allow re-identification
- These cells, combined with some non-sensitive cells that are closely related to the sensitive cells with regard to Y, are replaced by imputed values
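The first step, finding the sensitive cells of the cross tabulation, might look like the sketch below. The records, the key variables and the threshold are hypothetical. The later SMIKe steps (adding related non-sensitive cells and imputing the values) are not shown.

```python
from collections import Counter

def sensitive_cells(records, key_vars, threshold):
    """Return the key-variable combinations whose cell count is below threshold."""
    counts = Counter(tuple(r[k] for k in key_vars) for r in records)
    return {cell: n for cell, n in counts.items() if n < threshold}

# Hypothetical records with two categorical key variables.
records = (
    [{"branch": "retail", "size": "small"}] * 40
    + [{"branch": "retail", "size": "large"}] * 2
    + [{"branch": "mining", "size": "large"}] * 1
)
flagged = sensitive_cells(records, key_vars=("branch", "size"), threshold=3)
```

With a threshold of 3, the two sparsely populated cells are flagged and the well-populated retail/small cell is not.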

Generating a synthetic data set

- Create a synthetic data set for selected variables from the 1997 wave of the Establishment Panel
- Imputation for the whole population is not feasible
- Draw a new sample from the Official Employment Statistics using the same sampling design as for the Establishment Panel (stratification by economic branch, size, and region)
- Each stratum cell contains the same number of observations as in the 1997 wave of the Establishment Panel
- Additional information from the German Social Security Data (GSSD) is used for the imputation

[Diagram: X, Y_inc and Y_exc, with regions labelled "missing data", "data from the new sample" and "data from the IAB Establishment Panel"]

The German Social Security Data (GSSD)

- Contains information on all employees covered by social security
- Since 1973, all employers have been required to notify the social security agencies about all employees covered by social security
- The GSSD represents about 80% of the German workforce
- Information from the GSSD is aggregated at the establishment level and matched to the IAB Establishment Panel via the establishment identification number
- Information on: number of employees by gender, schooling, mean age of the employees, mean wage of the employees…
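The aggregation and matching step could be sketched as follows. The field names (`est_id`, `gender`, `age`, `wage`, `training`) and the records are hypothetical and do not reflect the actual GSSD layout.

```python
from collections import defaultdict
from statistics import fmean

def aggregate_by_establishment(employees):
    """Collapse employee-level records to one summary per establishment."""
    groups = defaultdict(list)
    for e in employees:
        groups[e["est_id"]].append(e)
    return {
        est_id: {
            "n_employees": len(g),
            "n_female": sum(1 for e in g if e["gender"] == "f"),
            "mean_age": fmean([e["age"] for e in g]),
            "mean_wage": fmean([e["wage"] for e in g]),
        }
        for est_id, g in groups.items()
    }

def match_to_panel(panel_rows, aggregates):
    """Attach the aggregates to panel rows that share an establishment id."""
    return [{**row, **aggregates[row["est_id"]]}
            for row in panel_rows if row["est_id"] in aggregates]

employees = [
    {"est_id": 1, "gender": "f", "age": 30, "wage": 2000.0},
    {"est_id": 1, "gender": "m", "age": 50, "wage": 3000.0},
    {"est_id": 2, "gender": "f", "age": 40, "wage": 2500.0},
]
panel_rows = [{"est_id": 1, "training": 1}, {"est_id": 3, "training": 0}]
matched = match_to_panel(panel_rows, aggregate_by_establishment(employees))
```

Panel rows without a counterpart in the aggregated data (establishment 3 here) are dropped in this sketch; a real match would need an explicit rule for such cases.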

Imputation procedure

- For simplicity, newly founded establishments are excluded from the sampling frame and from the panel
- 8 new samples are drawn
- The number of observations in each sample equals the number of observations in the panel: n_s = n_p = 7332
- Every sample is imputed five times using chained equations
- Number of variables in X = 24
- Number of variables in Y = 48
- Imputations are generated using IVEware by Raghunathan, Solenberger and Hoewyk (2001)
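To illustrate what imputation by chained equations means, here is a toy sketch with two continuous variables. It is not IVEware's implementation. It assumes simple linear regressions with fixed estimated parameters, whereas a proper procedure would also draw the regression parameters, and the data are simulated.

```python
import random
import statistics

def ols(xs, ys):
    """Least squares fit of y = a + b*x; returns (a, b, residual sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    sd = (sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (len(xs) - 2)) ** 0.5
    return a, b, sd

def chained_imputation(rows, n_iter=10, seed=0):
    """rows: list of [y1, y2] with None marking missing values."""
    rng = random.Random(seed)
    data = [list(r) for r in rows]
    missing = [[v is None for v in r] for r in rows]
    # Starting values: fill each gap with the observed column mean.
    for j in (0, 1):
        mean_j = statistics.fmean([r[j] for r in rows if r[j] is not None])
        for r in data:
            if r[j] is None:
                r[j] = mean_j
    for _ in range(n_iter):
        for j in (0, 1):
            k = 1 - j  # the other variable serves as predictor
            # Fit on rows where variable j was actually observed.
            fit_rows = [r for r, mis in zip(data, missing) if not mis[j]]
            a, b, sd = ols([r[k] for r in fit_rows], [r[j] for r in fit_rows])
            # Redraw the missing entries of variable j from the fitted model.
            for r, mis in zip(data, missing):
                if mis[j]:
                    r[j] = a + b * r[k] + rng.gauss(0, sd)
    return data

# Simulated data with roughly 20% of each variable set to missing.
gen = random.Random(1)
rows = []
for _ in range(200):
    y1 = gen.gauss(0, 1)
    y2 = 1.0 + 2.0 * y1 + gen.gauss(0, 0.5)
    rows.append([y1 if gen.random() > 0.2 else None,
                 y2 if gen.random() > 0.2 else None])
completed = [chained_imputation(rows, seed=s) for s in range(5)]  # five imputations
```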

A regression by T. Zwick (2005) as a means of evaluation

- Zwick analyses the productivity effects of different forms of continuing vocational training in Germany
- Result: vocational training is one of the most important measures to gain and keep productivity
- Probit regression to explain why firms offer vocational training
- 13 explanatory variables, including: share of qualified employees, establishment size, region, collective wage agreement, high qualification needs expected…
- 2 variables, based on the 1998 wave of the panel, are dropped for the evaluation

Binary variables in the original and in the synthetic data set

| Variable | Survey mean | Synthetic data mean | Deviation |
|---|---|---|---|
| Training Yes/No | 0.7069 | 0.7229 | 2.25% |
| Redundancies expected | 0.2239 | 0.1880 | -16.01% |
| Many employees are expected to be on maternity leave | 0.0644 | 0.0811 | 25.84% |
| High qualification needs expected | 0.1551 | 0.1752 | 12.95% |
| Establishment size 20-199 | 0.3973 | 0.4092 | 3.00% |
| Establishment size 200-499 | 0.1348 | 0.1450 | 7.57% |
| Establishment size 500-999 | 0.0745 | 0.0777 | 4.29% |
| Establishment size 1000+ | 0.0942 | 0.0991 | 5.17% |
| Collective wage agreement | 0.7643 | 0.7562 | -1.06% |
| Apprenticeship training reaction on skill shortages | 0.3632 | 0.3725 | 2.58% |
| Training reaction on skill shortages | 0.4490 | 0.4693 | 4.52% |
| State-of-the-art technical equipment | 0.6513 | 0.7095 | 8.94% |
| Apprenticeship training | 0.6141 | 0.6398 | 4.17% |

Continuous variables in the original and in the synthetic data set

| Variable | Survey mean | Synthetic data mean | Deviation |
|---|---|---|---|
| Share of qualified employees | 0.6741 | 0.6236 | -7.49% |
| Number of employees | 365.6238 | 356.1432 | -2.59% |
| Number of employees that participated in training measures | 110.2944 | 88.2385 | -20.00% |
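The deviation column appears to be the relative difference between the synthetic and the survey mean. That formula is an inference from the reported numbers, not something stated on the slides. A short check against the three continuous variables:

```python
def relative_deviation(survey_mean, synthetic_mean):
    """Deviation of the synthetic mean from the survey mean, in percent."""
    return 100.0 * (synthetic_mean - survey_mean) / survey_mean

# Means taken from the continuous-variable slide.
share_qualified = relative_deviation(0.6741, 0.6236)
n_employees = relative_deviation(365.6238, 356.1432)
n_trained = relative_deviation(110.2944, 88.2385)
print(f"{share_qualified:.2f}% {n_employees:.2f}% {n_trained:.2f}%")
```

Rounded to two decimals, these agree with the reported deviations of -7.49%, -2.59% and -20.00%.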

Results from the regression

| Exogenous variable | Regression as performed by T. Zwick (n=6,258): coefficient | z-value | Regression with all missing data imputed (n=7,332): coefficient | z-value |
|---|---|---|---|---|
| Redundancies expected | 0.2610 | 4.58 | 0.2491 | 4.62 |
| Many employees expected on maternity leave | 0.2516 | 2.49 | 0.2657 | 2.82 |
| High qualification needs expected | 0.6407 | 8.1 | 0.6483 | 8.76 |
| Apprenticeship training reaction on skill shortages | 0.1763 | 3.4 | 0.1142 | 2.05 |
| Training reaction on skill shortages | 0.5974 | 11.91 | 0.5270 | 9.92 |
| Establishment size 20-199 | 0.6827 | 15.19 | 0.6866 | 16.01 |
| Establishment size 200-499 | 1.3514 | 15.71 | 1.3555 | 17.22 |
| Establishment size 500-999 | 1.3984 | 11.75 | 1.3475 | 12.78 |
| Establishment size 1000+ | 1.9725 | 9.15 | 1.9622 | 10.13 |
| Share of qualified employees | 0.7663 | 10.28 | 0.7793 | 11.21 |
| State-of-the-art technical equipment | 0.1755 | 4.16 | 0.1694 | 4.3 |
| Collective wage agreement | 0.2450 | 5.46 | 0.2535 | 5.82 |
| Apprenticeship training | 0.4199 | 9.31 | 0.4841 | 11.24 |

Complete data set and synthetic data set

| Exogenous variable | Regression with all missing data imputed (n=7,332): coefficient | z-value | Regression on the synthetic data (n=7,332): coefficient | z-value |
|---|---|---|---|---|
| Redundancies expected | 0.2491 | 4.62 | 0.2764 | 4.71 |
| Many employees expected on maternity leave | 0.2657 | 2.82 | 0.2373 | 2.78 |
| High qualification needs expected | 0.6483 | 8.76 | 0.6308 | 9.15 |
| Apprenticeship training reaction on skill shortages | 0.1142 | 2.05 | 0.1442 | 2.66 |
| Training reaction on skill shortages | 0.5270 | 9.92 | 0.5566 | 10.69 |
| Establishment size 20-199 | 0.6866 | 16.01 | 0.5466 | 12.65 |
| Establishment size 200-499 | 1.3555 | 17.22 | 1.0313 | 14.37 |
| Establishment size 500-999 | 1.3475 | 12.78 | 1.1425 | 10.40 |
| Establishment size 1000+ | 1.9622 | 10.13 | 1.2331 | 9.89 |
| Share of qualified employees | 0.7793 | 11.21 | 0.8692 | 9.98 |
| State-of-the-art technical equipment | 0.1694 | 4.3 | 0.2041 | 5.00 |
| Collective wage agreement | 0.2535 | 5.82 | 0.3117 | 7.10 |
| Apprenticeship training | 0.4841 | 11.24 | 0.4655 | 10.81 |

Proceedings/Open Questions

- Use non-parametric approaches
- Replace only selected variables
- Measure the disclosure risk after imputation
- Generate weights for the synthetic sample?

Thank you for your attention!

Rubin’s adjusted combining rules

- Imputation yields m different data sets
- Information from the data sets has to be combined to get valid estimates
- Point estimate: average of the point estimates from the different data sets
- Variance estimate: a combination of the variance within the data sets (W) and the variance between the data sets (B) that differs from the rule used for ordinary missing-data imputation
- An additional sampling step is necessary when creating synthetic data sets, so the variance B already reflects the variance within each population
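A sketch of these combining rules follows. It assumes the slide refers to the variance estimator for fully synthetic data given by Raghunathan, Reiter and Rubin (2003), T = (1 + 1/m) B - W, as opposed to T = W + (1 + 1/m) B for ordinary missing-data imputation. The estimates and variances fed in are hypothetical.

```python
from statistics import fmean, variance

def combine_fully_synthetic(estimates, variances):
    """Combine m point estimates and their variance estimates."""
    m = len(estimates)
    q_bar = fmean(estimates)   # point estimate: average over the data sets
    b = variance(estimates)    # between-data-set variance B (divisor m - 1)
    w = fmean(variances)       # within-data-set variance W
    t_synthetic = (1 + 1 / m) * b - w   # assumed rule for fully synthetic data
    t_missing = w + (1 + 1 / m) * b     # rule for ordinary missing data, for contrast
    return q_bar, b, w, t_synthetic, t_missing

# Hypothetical results from m = 5 synthetic data sets.
estimates = [0.70, 0.74, 0.72, 0.75, 0.69]
variances = [0.0003] * 5
q_bar, b, w, t_synthetic, t_missing = combine_fully_synthetic(estimates, variances)
```

Because W is subtracted, the synthetic-data estimator can turn out negative when m is small, so an implementation would need to handle that case.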

Information from the two data sets
