Disclosure scenario and risk assessment: Structure of Earnings Survey

Daniela Ichim, Luisa Franconi
Istat – DCMT – Methodology
ichim@istat.it, franconi@istat.it

Outline
1. Objectives of the anonymisation
2. Disclosure scenarios
3. Risk assessment
4. Confidentiality protection
5. Information content analysis

Objectives
Requirements come from two sides:
Member States: dissemination policy (NACE, Citizenship, Number of Employees, etc.); coherence.
Users: high-priority variables (NACE, NUTS, ISCO); minimum level of detail (NACE 2 digits, NUTS1, ISCO 2 digits …).
Kinds of analysis: estimating the difference in annual earnings between two categories of the regional detail (estimating differences between regional policies); weighted totals variation.
MICRODATA FILE FOR RESEARCH

Disclosure scenarios
Mimic the intruder's knowledge and interest.
Possible intruder = researcher.
No external register scenario.
No nosy colleague scenario.
Microdata file for research: only spontaneous identification.

Enterprise spontaneous identification
Key variables: the structural variables NACE, NUTS, SIZE.
A sampled enterprise is considered at risk when the population frequency and the sample frequency of its combination of key variables are both below the given threshold.
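
The risk rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout (dictionaries with `nace`, `nuts`, `size` fields), the function name and the threshold value are all assumptions, and a single threshold is applied to both frequencies.

```python
from collections import Counter


def cell(enterprise):
    """Combination of structural key variables (hypothetical field names)."""
    return (enterprise["nace"], enterprise["nuts"], enterprise["size"])


def enterprises_at_risk(population, sample, threshold=3):
    """Return the sampled enterprises whose key-variable combination is rare
    in BOTH the population and the sample (strictly below the threshold).
    The default threshold of 3 is illustrative only."""
    pop_freq = Counter(cell(e) for e in population)
    samp_freq = Counter(cell(e) for e in sample)
    return [
        e for e in sample
        if pop_freq[cell(e)] < threshold and samp_freq[cell(e)] < threshold
    ]
```

An enterprise that is alone in its sample cell is not flagged if the corresponding population cell is well populated, which is what the "simultaneously" condition expresses.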

Enterprise protection
Structural key variables are all categorical.
Protection is achieved by recoding classes of the categorical key variable with the lowest priority:
1. NACE 2-digits
2. NUTS1
3. SIZE
a) Recoding with respect to the population frequencies generates a lower information loss.
b) If needed, recode another variable.
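
One possible way to recode a variable with respect to population frequencies is sketched below. The slides do not specify the recoding procedure, so the greedy merging of adjacent classes, the function name, the class labels and the threshold are assumptions made for illustration. The sketch also assumes the list above is in descending priority, so that SIZE is the first variable to be recoded.

```python
def recode_size(pop_counts, size_order, threshold):
    """Greedily merge adjacent size classes within one NACE x NUTS stratum
    until each merged class reaches the population threshold.

    pop_counts: {size_class: population frequency} (hypothetical layout)
    size_order: size classes from smallest to largest
    Returns {original size_class: recoded label}.
    """
    groups, current, total = [], [], 0
    for size_class in size_order:
        current.append(size_class)
        total += pop_counts.get(size_class, 0)
        if total >= threshold:
            groups.append(current)
            current, total = [], 0
    if current:
        # A sparse tail cannot stand alone: attach it to the previous group.
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return {s: "|".join(g) for g in groups for s in g}
```

If a stratum stays sparse even after all size classes are merged, step b) applies and another key variable has to be recoded.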

Employees spontaneous identification
Information on the enterprise (NACE x NUTS x Size).
Social variables (Gender x Age).
Extremely high earnings related to large enterprises.

Employees at risk (use the scenario!)
High AnnualEarnings: greater than the 99% quantile (T).
For each combination of Nace, Nuts, Size, Gender, Age and AnnEarn, the number of sampled employees with earnings greater than T was counted.
If there was a single employee with such characteristics, that employee was considered at risk of identification.
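
The counting rule above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the field names are hypothetical, T is assumed to be computed once over all sampled earnings (the slide does not say whether it is computed overall or per stratum), and the quantile definition is the default of Python's `statistics.quantiles`.

```python
import statistics
from collections import Counter


def earnings_threshold(earnings):
    """99% quantile T of the sampled annual earnings.
    Computing T over the whole sample is an assumption."""
    return statistics.quantiles(earnings, n=100)[98]


def employees_at_risk(employees, t):
    """Return high earners (earnings > t) who are unique within their
    Nace x Nuts x Size x Gender x Age combination."""
    def cell(e):
        return (e["nace"], e["nuts"], e["size"], e["gender"], e["age"])

    high = [e for e in employees if e["earnings"] > t]
    counts = Counter(cell(e) for e in high)
    return [e for e in high if counts[cell(e)] == 1]
```

Two high earners sharing the same combination are not flagged, since neither is the single employee with those characteristics.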

Employees: selective protection
Only records of employees at risk of identification ought to be perturbed.
Only numerical key variables are perturbed.

Constrained regression
Controlled perturbation.
Weighted total variation below 0.5%.
Can be easily adapted to any stratification.
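
The 0.5% constraint can be checked after any perturbation with a few lines of code. The sketch below only verifies the constraint and does not implement the constrained regression itself, which the slides do not detail. The function names are hypothetical, and measuring the variation as a relative change of the weighted total is an assumption.

```python
def weighted_total_variation(original, perturbed, weights):
    """Relative change of the weighted total caused by the perturbation."""
    orig_total = sum(w * y for w, y in zip(weights, original))
    pert_total = sum(w * y for w, y in zip(weights, perturbed))
    return abs(pert_total - orig_total) / abs(orig_total)


def within_tolerance(original, perturbed, weights, tolerance=0.005):
    """True when the weighted total moves by less than 0.5% (the default)."""
    return weighted_total_variation(original, perturbed, weights) < tolerance
```

To adapt the check to a stratification, the same two functions can be applied separately to the records of each stratum.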

Information content
User requirements:
Information preservation: weighted totals; sampling weights; only key and confidential variables are modified.
Information loss: statistical indicators (correlations, summary statistics); order relationships.

Code | Variable | Status
A.1.1 | Geographical location | not changed
A.1.2 | Size of enterprise | changed
A.1.3 | Principal economic activity |
A.1.4 | Form of economic and financial control |
A.1.5 | Existence of collective pay agreements |
A.1.6 | Total number of employees | removed
A.4.1 | Enterprise sample weights |
B.2.1 | Gender |
B.2.2 | Employee’s age |
B.2.3 | Occupation |
B.2.4 | Management position or supervisory position |
B.2.5 | Education |
B.2.6 | Length of service in the enterprise |
B.2.7 | Full-time or part-time |
B.2.7.1 | Share of a full-time |
B.2.8 | Type of contract of employment |
B.3.0 | Average gross hourly earnings in the representative month |
B.3.1 | Total gross earnings for a representative month |
B.3.1.1 | Earnings related to overtime |
B.3.1.2 | Special payment for shift work |
B.3.2 | Total gross annual earnings in the reference year |
B.3.2.1 | Number of weeks to which the gross annual earnings relate |
B.3.2.2 | Total annual bonuses |
B.3.2.2.2 | Annual bonuses based on productivity |
B.3.4 | Number of paid hours during the representative month |
B.3.4.1 | Number of overtime hours paid in the reference month |
B.3.5 | Annual days of absence |
B.3.5.1 | Annual days of holiday leave |
B.3.5.1.1 | Holiday entitlement or number of holidays |
B.4.2 | Employee sample weights |

CONCLUSIONS
Consider the dissemination features.
Consider the data features.
With confidentiality ensured, minimize the information loss.