Federal Statistical Office Germany, Research Data Centre
New Techniques and Technologies for Statistics 2009, Brussels, 18 - 20 February 2009
Special Session on Access to Microdata
"An informational infrastructure for the E-Science Age - On the way to remote data access for business data"
Maurice Brandt, Federal Statistical Office Germany, Research Data Centre
Overview
1. Introduction
2. Current situation at the research data centres
3. Content of the project “InfinitE”
4. Production of data structure files
5. Result-based confidentiality
6. Summary
1. Introduction
- Development in (business) microdata: requests increasingly target microdata that have not been treated with data-perturbing methods, ideally the original microdata
- More and more researchers ask for remote data execution or access in a safe centre
- This leads to a huge number of tables, which have to be checked for confidentiality
- The development at national level will probably also occur at EU level
- Researchers require more data, preferably non-anonymised microdata
2. Current situation in the RDCs
- Output checking: at present, each researcher's output is checked by two persons (four-eyes principle), one in the RDC and one in the statistical division
- Only absolutely anonymous tables may be published
- Construction of combined and integrated datasets for business microdata
- Combined datasets are difficult to anonymise: the data are first enriched with other data, and for anonymisation reasons this information then has to be deleted or strongly anonymised
2. Current situation in the RDCs
Why this project:
- The scientific community still has reservations about data-perturbing methods for economic microdata
- The workload caused by manual output checking
- Increasing demand for original microdata
3. Content of the project “InfinitE”
- “An informational infrastructure for the E-Science Age - On the way to remote data access for business data” deals with the improvement of remote access at the Federal Statistical Office Germany
- The project aims to find solutions for better remote access in Germany through so-called data structure files and (automatic) output checking procedures
- Data structure files:
  - goal: semantically and syntactically correct data structure files
  - application to the original data without any adaptations
4. Production of data structure files
- Methods to produce data structure files:
  - stochastic noise
  - multidimensional microaggregation
  - synthetic data (multiple imputation)
- Test of confidentiality and measurement of re-identification risk
  - development of new procedures to measure the re-identification risk of synthetic data
  - see Joerg Drechsler: “Disclosure Control in Business Data” at this conference
- Assessment of the utility and applicability of data structure files
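As an illustration of one of the methods listed above, here is a minimal sketch of microaggregation in Python. It is deliberately simplified to a single variable (the slide refers to the multidimensional variant), and the function name `microaggregate`, the group size `k=3`, and the turnover figures are all hypothetical choices for this example, not part of the project.

```python
from statistics import mean


def microaggregate(values, k=3):
    """Replace each value by the mean of its group of at least k similar values.

    Simplified univariate sketch: values are ranked, cut into consecutive
    groups of k, and a short tail is merged into the preceding group.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Indices of the records ordered by value
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    # Avoid a final group smaller than k by merging it into the previous one
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    masked = [0.0] * len(values)
    for group in groups:
        group_mean = mean(values[i] for i in group)
        for i in group:
            masked[i] = group_mean
    return masked


# Hypothetical turnover values for seven enterprises
turnover = [120, 15, 980, 22, 130, 18, 1050]
masked = microaggregate(turnover, k=3)
```

Each masked value is shared by at least `k` records, and the overall total is preserved, which is one reason the technique is considered for files that should stay structurally close to the original data.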
5. Result-based confidentiality
Output checking procedures:
- Classification of outputs into “safe” and “unsafe” output
- Identification of output for which anonymisation procedures are necessary
- Evaluation and development of practicable anonymisation methods for “unsafe” output
- The project also evaluates the analytical validity of the anonymised output
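The classification into “safe” and “unsafe” output could be sketched as follows for a single table cell. The slides do not state which rules are used, so this example assumes two commonly used checks, a minimum number of contributing units and a dominance limit on the largest contributor. The function name `classify_cell` and the thresholds (3 units, 85 % share) are illustrative assumptions only.

```python
def classify_cell(contributions, min_units=3, max_share=0.85):
    """Return "safe" or "unsafe" for one table cell.

    contributions: non-negative values of the units that make up the cell.
    min_units and max_share are assumed thresholds, not official rules.
    """
    # Minimum frequency check: too few units in the cell
    if len(contributions) < min_units:
        return "unsafe"
    # Dominance check: the largest unit accounts for too much of the total
    total = sum(contributions)
    if total > 0 and max(contributions) / total > max_share:
        return "unsafe"
    return "safe"


# Hypothetical table: cell label -> contributions of the individual units
table = {
    "sector A": [40, 35, 30, 25],
    "sector B": [900, 20, 10],
    "sector C": [50, 45],
}
flags = {cell: classify_cell(values) for cell, values in table.items()}
```

In this sketch only "sector A" is flagged as safe; "sector B" is dominated by one unit and "sector C" has too few units, so both would be passed on to an anonymisation step or to manual checking.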
5. Result-based confidentiality
- Confidentiality methods for tables and (regression) output
  - rounding, controlled tabular adjustment, stochastic noise
  - evaluation of automatic output checking procedures
- Feasibility study on changing the legal framework so that researchers can publish tables
  - more responsibility for the researcher
  - this leads to less anonymisation and suppression in the output
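Of the methods named above, rounding is the simplest to illustrate. The sketch below rounds cell counts to the nearest multiple of a base; the function name `round_to_base`, the base of 5, and the counts are assumptions made for this example.

```python
def round_to_base(count, base=5):
    """Round a non-negative integer count to the nearest multiple of base.

    Exact halves are rounded up. Integer arithmetic is used so the result
    does not depend on floating-point rounding.
    """
    if count < 0 or base < 1:
        raise ValueError("count must be >= 0 and base must be >= 1")
    return base * ((count + base // 2) // base)


# Hypothetical row of a frequency table and its total
cells = [3, 3, 3]
rounded_cells = [round_to_base(c) for c in cells]
rounded_total = round_to_base(sum(cells))
```

Note that cells rounded independently need not add up to the rounded total (here the rounded cells sum to 15 while the rounded total is 10), so plain rounding can break the additivity of a table.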
6. Summary
- A change is observable in user needs and requirements regarding microdata access
- With this national project, the data infrastructure in Germany is being improved to take these developments into account
- It is time for a change in the remote data execution procedure; otherwise the amount of output will no longer be manageable
- National and ESSnet projects can benefit from each other
Thank you for your attention
Maurice Brandt
Research Data Centre, Federal Statistical Office Germany
Tel. +49 611/75 4349
maurice.brandt@destatis.de
http://www.forschungsdatenzentrum.de
http://www.destatis.de