User-focused Threat Identification For Anonymised Microdata Hans-Peter Hafner HTW Saar – Saarland University of Applied Sciences

Presentation transcript:

User-focused Threat Identification For Anonymised Microdata
Hans-Peter Hafner, HTW Saar – Saarland University of Applied Sciences
Felix Ritchie, University of the West of England
Rainer Lenz, Technical University of Dortmund
Conference of European Statistics Stakeholders, Rome, 24 November

Motivation
Producing anonymised data sources for the scientific community is a key task of National Statistics Institutes (NSIs).
- Conservative, risk-averse approach (data protection): release data only if they can be shown to be safe (defensive)
vs.
- Alternative, user-oriented approach: release data unless they present a disclosure risk (cooperative)

Overview
- Common approaches to anonymisation
- Critique of common perspective
  - Focus on data protection
  - Worst-case scenarios
- Evidence-based risk assignment (case study: CIS 2010)
- Impact of new strategy
- Conclusion

Common approach to anonymisation
ESSNET Handbook on SDC (Statistical Disclosure Control)
Microdata protection should be based on
- Knowledge of the use of the data
- Access requirements
- Potential to match external datasets
- Structure of the data itself
Risk scenarios are based on
- Spontaneous recognition
- Actively searching (record linkage)

Critique 1: Focus on data protection
Assumption: Intruders exist who want to identify companies or persons in the data.
But: There are no known cases of malicious misuse of data. The only known incidents are mistakes and attempts to circumvent procedures to make life easier.
The problem is not anonymisation but accreditation procedures!

Critique 2: Worst-case scenarios
Common scenario: anonymised data vs. original data (record matching)
Not realistic: there are large differences between official statistics and commercial databases.
Total protection is not required by law:
De facto anonymity (Germany): a residual re-identification risk is acceptable as long as the effort / cost of re-identification is greater than the benefit.

Evidence-Based Risk Assignment: Case Study CIS 2010
CIS (Community Innovation Survey)
- Survey about the innovation activities of enterprises in countries of the European Union
- Conducted every 2 years
- A census in some countries, only a sample survey in others; large companies are always included
- Many categorical variables, only 9 continuous attributes

Case Study CIS 2010 (continued)
Risk Scenario
Step 1: Identify user needs
- Analysis of research papers + Google Scholar search
  -> Linear and nonlinear regression are the most frequently used methods
Step 2: Identify user risks
- Spontaneous recognition of outliers
  -> No risk, since there is no disclosure to an unauthorised person
- Group disclosure from categorical variables
  -> No risk, since the focus is not on descriptive statistics

Case Study CIS 2010 (continued)
Risk Evaluation
- Spontaneous recognition
  -> Very unlikely because of large differences between data sources
- Matching on categorical variables
  -> Uncertain, since the statistical business register and the classification of economic activity in commercial databases differ (main activity vs. main turnover)
  -> Moreover: matching is prohibited by licence agreements
- Remaining risks
  -> Magnitude tables with 1 or 2 observations in a cell
  -> Dominance of one unit in a cell / dataset
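The two remaining risks can be checked mechanically. The following is a minimal sketch, not the authors' procedure: the function name `cells_at_risk`, the field names, the sample records and the 75% dominance share are all assumptions made for illustration. Only the idea of flagging cells with 1 or 2 observations and cells dominated by one unit comes from the slides.

```python
from collections import defaultdict


def cells_at_risk(records, cell_keys, value_key, min_count=3, dominance_share=0.75):
    """Return {cell: reason} for cells that are small or dominated by one unit.

    min_count=3 flags cells with 1 or 2 observations.
    dominance_share is a hypothetical threshold; the slides do not state one.
    """
    cells = defaultdict(list)
    for rec in records:
        cell = tuple(rec[k] for k in cell_keys)
        cells[cell].append(rec[value_key])

    flagged = {}
    for cell, values in cells.items():
        total = sum(values)
        if len(values) < min_count:
            flagged[cell] = "small count"
        elif total > 0 and max(values) / total > dominance_share:
            flagged[cell] = "dominance"
    return flagged


# Invented example data: country x economic activity cells with a turnover value.
records = [
    {"country": "A", "nace": "C10", "turnover": 120.0},
    {"country": "A", "nace": "C10", "turnover": 80.0},
    {"country": "A", "nace": "C20", "turnover": 900.0},
    {"country": "A", "nace": "C20", "turnover": 40.0},
    {"country": "A", "nace": "C20", "turnover": 60.0},
    {"country": "B", "nace": "C10", "turnover": 50.0},
    {"country": "B", "nace": "C10", "turnover": 55.0},
    {"country": "B", "nace": "C10", "turnover": 45.0},
]
flagged = cells_at_risk(records, ("country", "nace"), "turnover")
print(flagged)
```

In this invented example the cell (A, C10) is flagged for its small count and (A, C20) because one unit contributes 90% of the cell total; (B, C10) is left alone.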

Impact of new strategy
Consequence of risk evaluation
- Small cell count (< 3) or dominance problem in a cell:
  -> Determination of records at risk in these cells
  -> Only records at risk are perturbed (individual microaggregation of metric variables)
Consequence for the quality of the anonymised datasets
- Microaggregation was performed for less than 1% of all records
- Small impact on regression coefficients
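To illustrate perturbing only the records at risk, here is a simplified microaggregation-style sketch. It is an assumption, not the authors' implementation: each at-risk value of one metric variable is replaced by the mean of a window of `k` values that are adjacent in rank order, and all other records stay untouched. The function name `perturb_at_risk`, the group size `k=3` and the sample values are hypothetical.

```python
def perturb_at_risk(values, at_risk, k=3):
    """Replace each at-risk value by the mean of k rank-adjacent values.

    values  : list of numbers for one metric variable
    at_risk : set of indices of records flagged as at risk
    k       : window size (hypothetical choice, not taken from the slides)
    """
    n = len(values)
    if n < k:
        raise ValueError("need at least k values")

    # Indices sorted by value, and the rank position of every index.
    order = sorted(range(n), key=lambda i: values[i])
    rank = {idx: pos for pos, idx in enumerate(order)}

    result = list(values)
    for i in at_risk:
        pos = rank[i]
        # Centre the window on the record, shifting it at the edges.
        start = min(max(pos - k // 2, 0), n - k)
        window = order[start:start + k]
        result[i] = sum(values[j] for j in window) / k
    return result


# Invented example: one outlier (index 3) is the only record at risk.
turnover = [10.0, 12.0, 11.0, 500.0, 13.0, 9.0]
protected = perturb_at_risk(turnover, at_risk={3}, k=3)
print(protected)  # [10.0, 12.0, 11.0, 175.0, 13.0, 9.0]
```

Because only the flagged record changes, the remaining values enter any regression unchanged, which is consistent with the small impact on coefficients reported on the slide.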

Conclusion
A change of perspective from total data protection to a realistic, user-oriented approach that takes into account user needs, the quality of external databases, accreditation procedures and statistical legislation leads to datasets with higher analytical potential for the scientific community!

Thank you for your attention