Disclosure scenario and risk assessment: Structure of Earnings Survey


1 Disclosure scenario and risk assessment: Structure of Earnings Survey
Daniela Ichim, Luisa Franconi
Istat – DCMT – Methodology

2 Outline
1. Objectives of the anonymisation
2. Disclosure scenarios
3. Risk assessment
4. Confidentiality protection
5. Information content analysis

3 Objectives
Requirements: Member States, Users
Member States:
- Dissemination policy (Nace, Citizenship, Number of Employees, etc.)
- Coherence
Users:
- High-priority variables: NACE, NUTS, ISCO
- Minimum level of detail (NACE 2 digits, NUTS1, ISCO 2 digits, …)
- Kinds of analysis:
  - Estimating the difference in Annual Earnings between two categories of the regional detail (estimating differences between regional policies)
  - Weighted totals variation
MICRODATA FILE FOR RESEARCH

4 Disclosure scenarios
Mimic the intruder's knowledge and interest.
POSSIBLE INTRUDER = RESEARCHER.
- No external register scenario
- No nosy colleague scenario
MICRODATA FILE FOR RESEARCH
ONLY SPONTANEOUS IDENTIFICATION

5 Enterprise spontaneous identification
Key variables: structural variables (NACE, NUTS, SIZE).
A sampled enterprise is considered at risk when both population and sample frequencies are simultaneously below the given threshold.
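The risk rule for enterprises can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout (dictionaries with `nace`, `nuts`, `size` fields), the function names and the threshold value of 3 are hypothetical and do not come from the slides.

```python
from collections import Counter


def key_cell(enterprise):
    """Combination of the structural key variables (assumed field names)."""
    return (enterprise["nace"], enterprise["nuts"], enterprise["size"])


def enterprises_at_risk(population, sample, threshold):
    """Return sampled enterprises whose key cell has both a population
    frequency and a sample frequency below the threshold."""
    pop_freq = Counter(key_cell(e) for e in population)
    sample_freq = Counter(key_cell(e) for e in sample)
    return [
        e for e in sample
        if pop_freq[key_cell(e)] < threshold and sample_freq[key_cell(e)] < threshold
    ]


# Toy data: ten enterprises in a common cell, two in a rare cell.
population = (
    [{"id": i, "nace": "25", "nuts": "ITC", "size": "50-249"} for i in range(10)]
    + [{"id": 100, "nace": "64", "nuts": "ITF", "size": "1000+"},
       {"id": 101, "nace": "64", "nuts": "ITF", "size": "1000+"}]
)
sample = [population[0], population[1], population[10]]

at_risk = enterprises_at_risk(population, sample, threshold=3)
print([e["id"] for e in at_risk])
```

In the toy data only the sampled enterprise from the rare cell is flagged, because the common cell has a population frequency of 10.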

6 Enterprise protection
Structural key variables are all categorical.
Protection is achieved by recoding classes of the categorical key variable with the lowest priority:
1. Nace 2-digits
2. NUTS1
3. SIZE
a) Recoding with respect to the population frequencies generates a lower information loss.
b) If needed, recode another variable.

7 Employees spontaneous identification
- information on the enterprise (Nace x Nuts x Size)
- social variables (Gender x Age)
- extremely high earnings related to large enterprises
MICRODATA FILE FOR RESEARCH

8 Employees at risk (use the scenario!)
High AnnualEarnings: greater than the 99% quantile (T).
For each combination of Nace, Nuts, Size, Gender and Age, the number of sampled employees with annual earnings greater than T was counted.
If there was a single employee with such characteristics, that employee was considered at risk of identification.
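A minimal sketch of this counting rule follows. Several details are assumptions, not taken from the slides: T is computed over the whole sample with a nearest-rank quantile, and the field names (`ann_earn`, `gender`, `age`, and so on) and function names are hypothetical.

```python
import math
from collections import Counter


def quantile_nearest_rank(values, percent):
    """Nearest-rank quantile (an assumed definition of the 99% quantile)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * percent / 100))
    return ordered[rank - 1]


def cell(emp):
    """Enterprise and social key variables of one employee record."""
    return (emp["nace"], emp["nuts"], emp["size"], emp["gender"], emp["age"])


def employees_at_risk(employees, percent=99):
    """Return (T, employees who are the only high earner in their cell)."""
    t = quantile_nearest_rank([e["ann_earn"] for e in employees], percent)
    high = [e for e in employees if e["ann_earn"] > t]
    counts = Counter(cell(e) for e in high)
    return t, [e for e in high if counts[cell(e)] == 1]


# Toy sample: 200 employees with annual earnings 1000, 2000, ..., 200000.
employees = [
    {"id": i, "nace": "64", "nuts": "ITC", "size": "1000+",
     "gender": "F", "age": "30-39", "ann_earn": 1000 * i}
    for i in range(1, 201)
]
employees[-1]["gender"] = "M"  # the top earner falls in a different cell

threshold, at_risk = employees_at_risk(employees)
print(threshold, [e["id"] for e in at_risk])
```

Here two employees earn more than T, and each is alone in its cell, so both are flagged. If they shared the same cell, neither would be flagged under this rule.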

9 Employees: selective protection
Only records of employees at risk of identification ought to be perturbed.
Only numerical key variables are perturbed.
MICRODATA FILE FOR RESEARCH

10 Constrained regression
Controlled perturbation.
Weighted total variation below 0.5%.
Can be easily adapted to any stratification.
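The constraint on the weighted total can be checked as sketched below. This is not the constrained regression itself, only an acceptance check for a perturbed variable; the function names and the toy figures are hypothetical, and the 0.5% tolerance is the value stated on the slide.

```python
def weighted_total(values, weights):
    """Sum of values multiplied by their sampling weights."""
    return sum(v * w for v, w in zip(values, weights))


def relative_total_variation(original, perturbed, weights):
    """Relative change of the weighted total caused by the perturbation."""
    t0 = weighted_total(original, weights)
    t1 = weighted_total(perturbed, weights)
    return abs(t1 - t0) / abs(t0)


def within_tolerance(original, perturbed, weights, tolerance=0.005):
    """True when the weighted total moves by less than the tolerance."""
    return relative_total_variation(original, perturbed, weights) < tolerance


# Toy data: only the last (high-earning) record is perturbed.
weights = [10, 12, 3]
original = [20000, 35000, 250000]
mild = [20000, 35000, 248000]    # weighted total changes by about 0.44%
strong = [20000, 35000, 245000]  # weighted total changes by about 1.09%

print(within_tolerance(original, mild, weights))
print(within_tolerance(original, strong, weights))
```

The same check could be run separately within each stratum by passing only the records of that stratum.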

11 Information content
User requirements: information preservation
- Weighted totals
- Sampling weights
- Only key and confidential variables are modified.
Information loss
- Statistical indicators (correlations, summary statistics)
- Order relationships
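The indicators listed above could be compared before and after perturbation along the following lines. This is an illustrative sketch only: the choice of indicators (relative change of the mean, change in a Pearson correlation, preservation of the ranking) and all names and figures are assumptions, not the measures actually used by the authors.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_order(values):
    """Record indices sorted by increasing value."""
    return sorted(range(len(values)), key=lambda i: values[i])


def information_loss(original, perturbed, other):
    """Compare a perturbed variable with its original version.

    `other` is an unmodified variable used for the correlation check.
    """
    return {
        "mean_change": abs(mean(perturbed) - mean(original)) / abs(mean(original)),
        "correlation_change": abs(pearson(perturbed, other) - pearson(original, other)),
        "order_preserved": rank_order(original) == rank_order(perturbed),
    }


# Toy data: annual earnings before and after perturbing the top record.
earnings = [20000, 35000, 50000, 250000]
perturbed = [20000, 35000, 50000, 240000]
paid_hours = [1500, 1700, 1800, 2000]

report = information_loss(earnings, perturbed, paid_hours)
print(report["order_preserved"], round(report["mean_change"], 4))
```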

12 Code | Variable | Status
A.1.1 | Geographical location | not changed
A.1.2 | Size of enterprise | changed
A.1.3 | Principal economic activity |
A.1.4 | Form of economic and financial control |
A.1.5 | Existence of collective pay agreements |
A.1.6 | Total number of employees | removed
A.4.1 | Enterprise sample weights |
B.2.1 | Gender |
B.2.2 | Employee's age |
B.2.3 | Occupation |
B.2.4 | Management position or supervisory position |
B.2.5 | Education |
B.2.6 | Length of service in the enterprise |
B.2.7 | Full-time or part-time |
B.2.7.1 | Share of a full-time |
B.2.8 | Type of contract of employment |
B.3.0 | Average gross hourly earnings in the representative month |
B.3.1 | Total gross earnings for a representative month |
B.3.1.1 | Earnings related to overtime |
B.3.1.2 | Special payment for shift work |
B.3.2 | Total gross annual earnings in the reference year |
B.3.2.1 | Number of weeks to which the gross annual earnings relate |
B.3.2.2 | Total annual bonuses |
B | Annual bonuses based on productivity |
B.3.4 | Number of paid hours during the representative month |
B.3.4.1 | Number of overtime hours paid in the reference month |
B.3.5 | Annual days of absence |
B.3.5.1 | Annual days of holiday leave |
B | Holiday entitlement or number of holidays |
B.4.2 | Employee sample weights |

13 CONCLUSIONS
Consider the dissemination features.
Consider the data features.
Ensure confidentiality while minimizing the information loss.

