ICES III Montreal, June 18-21, 2007 A new Approach for Disclosure Control in the IAB Establishment Panel Multiple Imputation for Better Data Access Jörg.

Presentation transcript:

ICES III Montreal, June 18-21, 2007 A new Approach for Disclosure Control in the IAB Establishment Panel Multiple Imputation for Better Data Access Jörg Drechsler Institute for Employment Research (IAB)

2 Overview
- Background
- Statistical disclosure control with fully synthetic data sets
- Application to the IAB Establishment Panel
- First results
- Proceedings/open questions

3 The IAB Establishment Panel
- Annual establishment survey
- Conducted since 1993 in Western Germany and since 1996 in Eastern Germany
- Population: all establishments with at least one employee covered by social security
- Source: Official Employment Statistics
- Response rate among repeatedly interviewed establishments: more than 80%
- Sample of more than […] establishments in the last wave
- Contents: employment structure, changes in employment, business policies, investment, training, remuneration, working hours, collective wage agreements, works councils

5 Generating Synthetic Data Sets (Rubin 1993)
Advantages:
- Data are fully synthetic
- No re-identification of single units is possible
- All variables are still fully available
[Figure labels: X, Y observed, Y not observed, Y synthetic]
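
The idea on this slide can be illustrated with a minimal sketch: a model for Y given X is fitted on the units where Y is observed, and synthetic Y values are then drawn from that model for newly sampled units where only X is known. Everything below is an assumption made for illustration (one predictor, a linear model with normal errors, simulated data, the function names). A proper multiple-imputation procedure would also draw the model parameters from their posterior distribution instead of plugging in point estimates.

```python
import random


def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, (rss / (n - 2)) ** 0.5


def synthesize(x_new, a, b, sd, rng):
    """Draw one synthetic Y per unit from the fitted model plus normal noise."""
    return [a + b * xi + rng.gauss(0.0, sd) for xi in x_new]


rng = random.Random(0)

# Simulated "survey" units: X and Y are both observed (hypothetical data).
x_obs = [rng.uniform(1, 100) for _ in range(200)]
y_obs = [5.0 + 2.0 * xi + rng.gauss(0.0, 3.0) for xi in x_obs]

# Newly sampled units: only X is known, Y is not observed.
x_new = [rng.uniform(1, 100) for _ in range(50)]

a, b, sd = fit_simple_ols(x_obs, y_obs)
m = 10  # number of synthetic data sets
synthetic_sets = [synthesize(x_new, a, b, sd, rng) for _ in range(m)]
```

Only the synthetic Y values would be released, so no released record carries a value reported by a real respondent.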

7 Generating synthetic data sets for the IAB Establishment Panel
- Create a synthetic data set for selected variables from the 1997 wave of the Establishment Panel
- Imputation for the whole population is not feasible
- Draw a new sample from the Official Employment Statistics using the same sampling design as the Establishment Panel (stratification by economic branch, size, and region)
- Each stratum cell contains the same number of observations as in the 1997 wave of the Establishment Panel
- Additional information from the German Social Security Data (GSSD) is used for the imputation
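
The sampling step above can be sketched as follows: count the panel units in each stratum cell, then draw the same number of units per cell from the sampling frame. This is a simplified illustration with simple random sampling inside each cell. The data layout, the stratum labels, and the function name are assumptions, not the procedure actually used for the panel.

```python
import random
from collections import Counter, defaultdict


def draw_matching_sample(frame, panel_strata, rng):
    """Draw a sample whose stratum-cell counts equal those of the panel.

    frame:        list of (unit_id, stratum) for every unit in the sampling frame
    panel_strata: list with the stratum of each panel unit
    """
    target = Counter(panel_strata)  # cell -> number of panel observations
    by_stratum = defaultdict(list)
    for unit_id, stratum in frame:
        by_stratum[stratum].append(unit_id)

    sample = []
    for stratum, n_cell in sorted(target.items()):
        # Sampling without replacement within the cell.
        drawn = rng.sample(by_stratum[stratum], n_cell)
        sample.extend((unit_id, stratum) for unit_id in drawn)
    return sample


# Hypothetical frame: a stratum is a (branch, size, region) tuple.
frame = [
    (
        i,
        (
            "manufacturing" if i % 2 else "services",
            "small" if i % 3 else "large",
            "west" if i % 5 else "east",
        ),
    )
    for i in range(1000)
]
panel_strata = [frame[i][1] for i in range(0, 1000, 10)]  # a 100-unit "panel"

new_sample = draw_matching_sample(frame, panel_strata, random.Random(1))
```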

8 The German Social Security Data (GSSD)
- Contains information on all employees covered by social security
- Since 1973, all employers have been required to notify the social security agencies about all employees covered by social security
- The GSSD represents about 80% of the German workforce
- Information from the GSSD is aggregated at the establishment level and matched to the IAB Establishment Panel via the establishment identification number
- Information on: number of employees by gender and schooling, mean age of the employees, mean wage of the employees, …
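
The aggregation and matching described above might look like the following sketch: employee-level records are grouped by establishment, summarized, and joined to the panel rows on the establishment identifier. Field names such as `establishment_id`, `gender`, `age`, and `wage` are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_establishment(employee_records):
    """Collapse employee-level records to one summary row per establishment."""
    grouped = defaultdict(list)
    for rec in employee_records:
        grouped[rec["establishment_id"]].append(rec)

    return {
        est_id: {
            "n_employees": len(recs),
            "n_female": sum(1 for r in recs if r["gender"] == "f"),
            "mean_age": mean(r["age"] for r in recs),
            "mean_wage": mean(r["wage"] for r in recs),
        }
        for est_id, recs in grouped.items()
    }


def match_to_panel(panel_rows, aggregates):
    """Attach the aggregates to panel rows with a matching establishment id."""
    return [
        {**row, **aggregates[row["establishment_id"]]}
        for row in panel_rows
        if row["establishment_id"] in aggregates
    ]


# Hypothetical records for illustration only.
employees = [
    {"establishment_id": "A", "gender": "f", "age": 30, "wage": 2000.0},
    {"establishment_id": "A", "gender": "m", "age": 40, "wage": 3000.0},
    {"establishment_id": "B", "gender": "f", "age": 50, "wage": 2500.0},
]
panel = [
    {"establishment_id": "A", "training": 1},
    {"establishment_id": "C", "training": 0},
]

matched = match_to_panel(panel, aggregate_by_establishment(employees))
```

In this toy example only establishment A appears in both sources, so `matched` holds a single row.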

9 Synthetic Establishment Panels
[Figure labels: The IAB Establishment Panel, GSSD, EP synthetic, Y synthetic]

10 Imputation Procedure
- For simplicity, newly founded establishments are excluded from the sampling frame and from the panel
- 10 new samples are drawn
- The number of observations in each sample equals the number of observations in the panel: n_s = n_p = 7,332
- Every sample is imputed ten times using chained equations
- Number of variables from the GSSD: 24
- Number of variables from the Establishment Panel: 48
- Imputations are generated using IVEware by Raghunathan, Solenberger and Van Hoewyk (2001)

12 First Results
- Compare regression results from the original data with results from the synthetic data
- Zwick (2005) analyses the productivity effects of different forms of continuing vocational training in Germany
- Result: vocational training is one of the most important measures to gain and keep productivity
- Probit regression to explain why firms offer vocational training
- 13 explanatory variables, including: share of qualified employees, establishment size, region, collective wage agreement, high qualification needs expected, …
- 2 variables, based on the 1998 wave of the panel, are dropped for the evaluation

13 Descriptive comparison of the original and the synthetic data set

Variable                                               Survey mean   Synthetic data mean   Deviation
Training Yes/No                                        …             …                     …%
Redundancies expected                                  …             …                     …%
Many employees are expected to be on maternity leave   …             …                     …%
High qualification needs expected                      …             …                     …%
Establishment size …                                   …             …                     …%
Establishment size …                                   …             …                     …%
Establishment size …                                   …             …                     …%
Establishment size …                                   …             …                     …%
Collective wage agreement                              …             …                     …%
Apprenticeship training reaction on skill shortages    …             …                     …%
Training reaction on skill shortages                   …             …                     …%
State-of-the-art technical equipment                   …             …                     …%
Apprenticeship training                                …             …                     …%
Share of qualified employees                           …             …                     …%
Number of employees                                    …             …                     …%

14 Results from the regression

                                        Original data set     Synthetic data set
Exogenous variables                     coeff.    p-value     coeff.    p-value
Redundancies expected                   0.2503    ***         …         ***
Emp. exp. on maternity leave            0.2657    **          …         *
High qual. needs expected               0.6480    ***         …         ***
Appr. tr. react. on skill shortages     0.1130    *           …         *
Tr. reaction on skill shortages         0.5273    ***         …         ***
Establishment size …                    …         ***         …         ***
Establishment size …                    …         ***         …         ***
Establishment size …                    …         ***         …         ***
Establishment size …                    …         ***         …         ***
Share of qualified employees            0.7776    ***         …         ***
State-of-the-art tech. equipment        0.1690    ***         …         ***
Collective wage agreement               0.2541    ***         …         ***
Apprenticeship training                 0.4838    ***         …         ***

*** significant at the 0.1% level, ** significant at the 1% level, * significant at the 5% level
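
The slide does not say how the estimates from the individual synthetic data sets were combined into the single column shown for the synthetic data. A common choice for fully synthetic data is the set of combining rules of Raghunathan, Reiter and Rubin (2003): the point estimate is the average over the m data sets, and the variance estimate is T_f = (1 + 1/m) b - u_bar, where b is the between-data-set variance of the estimates and u_bar is the average within-data-set variance. The sketch below assumes these rules. The input numbers are hypothetical and are not results from the talk.

```python
from statistics import mean, variance


def combine_fully_synthetic(estimates, variances):
    """Combine per-data-set estimates q_i and their variances u_i.

    Returns (q_bar, t_f) with t_f = (1 + 1/m) * b - u_bar.
    Note that t_f can come out negative when m is small.
    """
    m = len(estimates)
    q_bar = mean(estimates)
    b = variance(estimates)  # between-data-set variance, divisor m - 1
    u_bar = mean(variances)  # average within-data-set variance
    t_f = (1 + 1 / m) * b - u_bar
    return q_bar, t_f


# Hypothetical coefficient estimates from m = 5 synthetic data sets.
q_i = [0.24, 0.26, 0.25, 0.27, 0.23]
u_i = [0.0001] * 5

q_bar, t_f = combine_fully_synthetic(q_i, u_i)
```

Because u_bar is subtracted, the variance estimate relies on the between-data-set variance being large enough, which is one reason to generate many synthetic data sets.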

16 Proceedings/open questions
- More detailed evaluation
- Replace only selected variables
- Generate weights for the synthetic sample
- Imputation of more than one wave while maintaining the panel structure

References
- Drechsler, J., Dundler, A., Bender, S., Rässler, S., Zwick, T. (2007). A New Approach for Disclosure Control in the IAB Establishment Panel - Multiple Imputation for a Better Data Access. IAB Discussion Paper No. 11/2007.
- Reiter, J. and Drechsler, J. (2007). Releasing Multiply-Imputed, Synthetic Data Generated in Two Stages To Protect Confidentiality. Submitted.

17 Thank you for your attention

18 Information from the two data sets

19 Disclosure is possible if…
- An establishment is included in the original data set and in at least one of the newly drawn samples
- The original values and the imputed values for this establishment are nearly the same

21 How often are establishments included in the IAB Establishment Panel drawn in the new samples?

Occurrence in … sample(s)   Number   Percentage
0                           4,…      …%
1                           1,…      …%
…                           …        …%
Total                       7,332    100%
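
A table of this kind can be computed by counting, for every panel establishment, in how many of the newly drawn samples it appears. The sketch below uses hypothetical identifiers and three samples instead of ten. The function name and data layout are assumptions.

```python
from collections import Counter


def occurrence_table(panel_ids, samples):
    """Return rows (k, number, percentage): panel units found in exactly k samples."""
    hits = Counter()
    for sample in samples:
        for unit_id in set(sample):  # count a unit at most once per sample
            hits[unit_id] += 1

    # A Counter returns 0 for panel units that never appear in a sample.
    distribution = Counter(hits[unit_id] for unit_id in panel_ids)
    total = len(panel_ids)
    return [
        (k, distribution[k], 100.0 * distribution[k] / total)
        for k in range(len(samples) + 1)
    ]


# Hypothetical identifiers for illustration only.
panel_ids = ["A", "B", "C", "D"]
samples = [["A", "B", "X"], ["A", "Y"], ["A", "B", "Z"]]

table = occurrence_table(panel_ids, samples)
```

Here `table` has one row per possible count from 0 to the number of samples, matching the layout of the slide.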

23 Comparing original and imputed values
- Binary variables: probability of identical values: 60-90%
- Multiple response questions:
  - with four categories: 57%
  - with 13 categories: 6%
- Numerical variables:
  - average relative difference: 21%
  - outliers
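
The two measures on this slide can be sketched as follows for establishments that occur both in the panel and in a new sample. The slide does not define the measures exactly, so the definitions below are assumptions: the share of identical values for categorical answers, and the mean of |imputed - original| / |original| for numerical variables, skipping original values of zero. The example values are hypothetical.

```python
def share_identical(original, imputed):
    """Share of units whose imputed value equals the original value."""
    return sum(o == i for o, i in zip(original, imputed)) / len(original)


def average_relative_difference(original, imputed):
    """Mean of |imputed - original| / |original| over units with original != 0."""
    pairs = [(o, i) for o, i in zip(original, imputed) if o != 0]
    return sum(abs(i - o) / abs(o) for o, i in pairs) / len(pairs)


# Hypothetical values for four establishments (binary) and three (numerical).
binary_share = share_identical([1, 0, 1, 1], [1, 1, 1, 0])
numeric_diff = average_relative_difference([100, 50, 200], [120, 40, 210])
```

With these inputs the share of identical binary values is 0.5 and the average relative difference is 0.15.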