Item 2.1 Report on statistical confidentiality

1 Item 2.1 Report on statistical confidentiality
5 April 2017
Working Group on Methodology
Aleksandra Bujnowska

2 Outline
Access to ESS microdata
Expert Group on Statistical Disclosure Control
European Business Statistics Manual
Results of the Centre of Excellence on statistical disclosure control
Development of public use files for EU-SILC and EU-LFS for all countries - follow up of the Centre of Excellence project

3 1. Access to ESS microdata
Since July 2013
12 microdata collections
2 modes of access
Every year more than 300 research proposals
More than 1000 research proposals on-going
Around 660 research entities recognized

4 ESS microdata for scientific purposes
European Community Household Panel
European Union Labour Force Survey
Community Innovation Survey
European Union Statistics on Income and Living Conditions
Structure of Earnings Survey
Adult Education Survey
European Road Freight Transport Survey
European Health Interview Survey
Continuing Vocational Training Survey
Community Statistics on Information Society
Micro-Moments Dataset
Household Budget Survey 2010 (since September 2016)

5 Use of microdata sets

6 Modes of access

7 Data anonymisation
Statistical use files
Secure use files (CIS, SES, MMD): de-identification
Scientific use files (all data collections excluding MMD): partial anonymisation
Public use files (EU-SILC, EU-LFS; 5 countries): full anonymisation

8 Microdata for scientific purposes
Data anonymisation
Statistical use files
Microdata for scientific purposes:
  Secure use files (CIS, SES, MMD): de-identification
  Scientific use files (all data collections excluding MMD): partial anonymisation
Public use files (EU-SILC, EU-LFS; 5 countries): full anonymisation

9 Scientific use files (all except MMD)
Types of microdata: Secure use files (CIS, SES, MMD) | Scientific use files (all except MMD) | Public use files (EU-SILC, EU-LFS)
Criteria: Approved research proposal | Approved research proposal | -
Access: In Eurostat safe centre | At researchers' workplace | Public
Data preparation: Only direct identifiers removed | Partial anonymisation (variables grouped together, rounded, swapped or suppressed) | Data fully anonymised
Identification: Possible | Possible but difficult | Impossible

10 Research entities

11 Eligible institutions are:
Doing research
Making their results public
Independent
Secure
660 entities recognized on 3/04/2017

12 Recognised research entities, by entity type

13 Research proposals

14 Eligible research proposal
Scientific purpose well described
Need for microdata justified
Research results made public
Data security measures in place

15 Assessment of research project proposal (RPP)
Eurostat:
  Microdata access team: initial check
  Technical units: is the microdata appropriate for the planned research?
Member States (MS): 4 weeks

16 Number of research proposals received (07.2013-03.2017)

17 Recent achievements and plans for 2017

18 Achievements 2016 (1): new data
Public use files for EU-SILC ( ) and EU-LFS (2013) for DE, FI, HU, NL, SI (Topics, A-Z, Access to microdata, Public use files for Eurostat microdata)
Household Budget Survey 2010 available for scientific purposes

19 Achievements 2016 (2): process improvements
New contract templates for non-EU entities
Self-study material for microdata users
On CROS portal:
  Newsletters
  Database with publications written using ESS microdata

20 Achievements 2016 (3): IT
Workflow tool and webforms replacing Word-based microdata applications (since January 2017)
Pilot on SFTP (secure file transfer protocol) transmission of LFS 2015 data

21 Plans for 2017
PUFs: all countries EU-SILC and EU-LFS
First meeting of the microdata access network group – 20 June 2017 (Luxembourg). Please let us know if you are interested in participating!
Collaboration with Council of European Social Science Data Archives - CESSDA (entry point for national microdata access systems)
Analysis of the impact of the new General Data Protection Regulation (applicable from May 2018)

22 Medium term objectives:
Microdata transmission based on eDAMIS 4.0
Table builder (confidentiality on the fly)
Network of safe centres in NSIs (DARA: Decentralized And Remote Access to European microdata)
New microdata collections

23 2. Expert Group on Statistical Disclosure Control (EGSDC)
Recommendations for confidentiality management in business statistics in the ESS
Discussion on waivers implementation and confidentiality on the fly
SDC tools:
  Governance structure
  Inventory of functionalities
  User support and user group organisation
European Business Statistics manuals

24 3. European Business Statistics Manual
Chapters on: Statistical Disclosure Control and on Microdata access
Your comments welcome!

25 4. Centre of Excellence on statistical disclosure control (SDC)
Title | Period | Status
Harmonized methodology for public use files (for EU-SILC and EU-LFS data) | Jan 2015 – Dec 2015 | Finished
User support for and maintenance of SDC tools | April ... – April 2018 | On-going
Harmonized protection of census data in the ESS | Sept 2016 – Aug 2017 | On-going

26 5. Public use files EU-SILC and LFS
Centre of Excellence (7 countries) produced public use files
EU-SILC: synthetic files
LFS: traditional methods
Published on the CROS portal

27 More years, other countries
All Member States (not including files from other countries)
Where feasible 2004 to 2013

28 One modification necessary for LFS
The link between quarterly and annual files was broken:
  Countries with subsampling for the annual variables: annual file treated separately
  Countries without subsampling: annual file created from quarterly files and annual variables missing
Countries can be in different situations for different years

29 Next step (soon after WG meeting)
Eurostat will send to each member of the WG:
  The public use files of their country
  Documentation
  A form to grant Eurostat permission to publish
The difficulty for you will be to identify the person responsible for granting permission. Please be pro-active.

30 Harmonised protection of census data in the ESS
Presentation at the Working Group Methodology in Luxembourg, 5-4-17

31 Outline
Census 2011 and confidentiality
Details of the project
Content of the project ‘Harmonised protection of census data in the ESS’
Operational phase
Definition of test scenarios
Cell key method
Questions

32 Census 2011 and confidentiality (1)
European Census 2011 data represent an essential source of vital statistical information ranging from the lowest small-area geographical divisions to national and international levels
Harmonised census tables of 32 European countries are available via the Census Hub
Census data are detailed and confidential; protecting the census data is the responsibility of the member states

33 Census 2011 and confidentiality (2)
In spite of the output harmonisation, international comparisons of census data are hampered by different statistical disclosure control approaches
In this Specific Grant Agreement (SGA), best practices for the Census 2021 are being defined and tested

34 Details of the project (1)
Start: 1 September 2016
End: 31 August 2017
Four WPs:
  WP 1 Management (7 deliverables)
  WP 2 Questionnaire (2 deliverables)
  WP 3 Development and testing of the recommendations; identification of best practices (4 deliverables)
  WP 4 Dissemination (5 deliverables)

35 Details of the project (2)
Six countries involved:
  CBS (Eric Schulte Nordholt, Peter-Paul de Wolf)
  INSEE (Maël-Luc Buron)
  Destatis (Sarah Gießing, Tobias Enderle)
  HCSO (László Antal, Beata Nagy, Peter Kristof)
  Statistics Finland (Annu Cabrera)
  SURS (Andreja Smukavec)

36 Content of the project ‘Harmonised protection of census data in the ESS’
Review the country-specific data protection regulations and methods
Provide a harmonised approach to the protection of the Census 2021 (taking the national constraints into account)
Recommend to Member States appropriate statistical disclosure control methods for hypercubes
Recommend how to handle efficiently confidential cells in grid squares and regional breakdowns (risk of disclosure due to differencing)

37 Operational phase
Development of the recommendations for the treatment of statistical confidentiality (by means of best practices)
Testing of the recommended approach(es)
Support to other NSIs that are willing to test
Adaptation of the recommended approach taking into account the feedback (after testing)
Reports to the relevant ESS bodies

With the delivery of the data to Eurostat and the appearance of the CBS publication, the work for the Census is not over. In 2015, work will be done on protected microdata files, so that (international) researchers can work with our census data. The IPUMS project (Integrated Public Use Microdata Series) makes protected census data at micro level from countries all over the world available to researchers. The preparations for the 2021 Census have also already started. First of all, the experiences with the 2011 Census are being used for international recommendations for the 2021 Census. On the basis of those recommendations, new, supplementary European regulations will be drawn up. In addition, preparatory projects will be carried out within CBS, for example on estimation methods and the associated software, and on the use of a combination of register and survey data on educational attainment.

38 Definition of test scenarios (1)
Restrictions:
  No global recodes (lay-out of hypercubes fixed in implementing census regulation)
  No cell suppressions (very difficult for linked hypercubes and otherwise no European total can be calculated)
Complications:
  1 km² grid cells lead to many small cell values
  1 km² grid cells combined with administrative regions (risk of disclosure due to differencing)
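The differencing risk named on this slide can be illustrated with a small arithmetic sketch. All counts and the function name `differencing_residual` are hypothetical, chosen only to show how a region total and grid-cell counts published side by side could expose a small group.

```python
def differencing_residual(region_total, contained_cell_counts):
    """Count for the part of a region NOT covered by grid cells lying fully inside it."""
    return region_total - sum(contained_cell_counts)


# Hypothetical published total for one administrative region.
region_total = 1250
# Hypothetical published counts for the 1 km2 grid cells entirely inside that region.
contained_cells = [400, 380, 468]

# Subtracting one publication from the other isolates the border strip of the region.
residual = differencing_residual(region_total, contained_cells)
print(residual)  # 2
```

Neither publication shows a count of 2 on its own, yet an intruder holding both can derive it, which is why the grid and the regional breakdowns need to be protected together.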

39 Definition of test scenarios (2)
The Statistical Disclosure Control solution should not alter the spatial distribution of the grid data too much:
  Zero-frequency grid cells should not too often be changed to positive frequencies
  Rare non-zero frequencies in an area should not be changed much
Usual disclosure risks:
  Small counts (may lead to direct identification)
  Attribute disclosure (a positive frequency may lead to disclosing information from a hypercube)

40 Definition of test scenarios (3)
Flexible method that can be adapted to national needs by the member states:
  Pre-tabular method: record swapping
  Post-tabular method: cell key method
Record swapping and cell key method:
  Enhanced variant of cell key method developed by the Australian Bureau of Statistics (ABS)
  Provided by the Office for National Statistics (ONS) and adapted in this SGA

41 Cell Key Method (1)
1 Assign each record a random number (Rkey)
  Record → Rkey: r1 → 54, r2 → 4, r3 → 93, ..., rN → 26
2 For each cell, sum rkey and apply a function to get a cell key
  Table "Age by sex" (columns Male, Female; rows 0-15, 16-24, 25-34), cell in row 16-24 with value 4
  Records in the cell: r2 → 4, r4 → 61, r56 → 7, r72 → 90; Sum = 162
  e.g. take last two digits → Ckey = 62
3 Use a look up table to get perturbation value
  Look up table indexed by cell key (1, 2, 3, ..., 61, 62, 63, ..., 99) with perturbation values such as +1 and -1
4 Apply perturbation value to cell
  Cell value 4 becomes 5
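The four steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the ABS or ONS implementation: the record keys and the cell membership are taken from the slide, while the function names, the contents of `lookup` beyond "62 gives +1", and the rule that a published count never goes below zero are assumptions made for the example.

```python
def cell_key(record_keys, modulus=100):
    """Step 2: sum the record keys in a cell and keep the last two digits."""
    return sum(record_keys) % modulus


def perturb_count(record_keys, lookup, modulus=100):
    """Steps 3 and 4: look up the perturbation for the cell key and apply it."""
    count = len(record_keys)
    ckey = cell_key(record_keys, modulus)
    # Cell keys missing from the (hypothetical) table mean "no change".
    return max(count + lookup.get(ckey, 0), 0)


# Step 1: every record carries a fixed random number (values from the slide).
rkeys = {"r1": 54, "r2": 4, "r3": 93, "r4": 61, "r56": 7, "r72": 90}

# Hypothetical look up table; only the entry for 62 is implied by the slide.
lookup = {61: 0, 62: +1, 63: -1}

# Records falling into the example cell of the "Age by sex" table.
cell_records = ["r2", "r4", "r56", "r72"]
keys_in_cell = [rkeys[r] for r in cell_records]

ckey = cell_key(keys_in_cell)                 # 4 + 61 + 7 + 90 = 162 -> 62
published = perturb_count(keys_in_cell, lookup)
print(ckey, published)  # 62 5
```

Because the record keys are fixed once, the same set of records always yields the same cell key and therefore the same perturbation, wherever that cell appears.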

42 Cell key method (2)
Cell key method is primarily used for protecting against differencing
Additional to record swapping (that is considered the primary approach)
Considering the need to retain 1s and 2s in outputs
Structural zeroes can be taken into account
Introduces another layer of uncertainty for intruder
Consistency in same cell across tables
After restoring additivity some small inconsistencies in breakdowns of different hypercubes may appear
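Two of the properties listed on this slide, consistency of a cell across tables and the loss of additivity, can be seen in a small sketch. It assumes the simple "last two digits of the summed record keys" rule; the look up table, the record keys of the second cell, and the function names are hypothetical.

```python
def cell_key(record_keys, modulus=100):
    """Sum the record keys in a cell and keep the last two digits."""
    return sum(record_keys) % modulus


def perturb_count(record_keys, lookup, modulus=100):
    """Apply the perturbation that the look up table assigns to the cell key."""
    count = len(record_keys)
    return max(count + lookup.get(cell_key(record_keys, modulus), 0), 0)


lookup = {62: +1, 63: -1}    # hypothetical table; other keys mean "no change"

male = [4, 61, 7, 90]        # record keys, sum 162 -> cell key 62 -> +1
female = [10, 25]            # hypothetical record keys, sum 35 -> no change
total = male + female        # margin cell, sum 197 -> cell key 97 -> no change

published_male = perturb_count(male, lookup)      # 4 + 1 = 5
published_female = perturb_count(female, lookup)  # 2
published_total = perturb_count(total, lookup)    # 6

# Consistency: the same records give the same result in any table or order.
same_cell_elsewhere = perturb_count(sorted(male), lookup)

# Additivity is lost: the perturbed parts no longer sum to the perturbed margin.
sum_of_parts = published_male + published_female
print(same_cell_elsewhere == published_male, sum_of_parts, published_total)  # True 7 6
```

The gap between 7 and 6 is what a separate step for restoring additivity would have to close within each table.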

43 Questions
Do you have any questions or remarks?

44 Your feedback welcome!
Participation in the meeting of Microdata Access Network Group – 20 June 2017 (Luxembourg)
EBS Manual
Centre of Excellence projects, in particular census project
Public use files

45 Thank you for your attention!

