Published by Felix Ross. Modified over 9 years ago.
1
User-focused Threat Identification for Anonymised Microdata
Hans-Peter Hafner, HTW Saar – Saarland University of Applied Sciences, Hans-Peter.Hafner@htwsaar.de
Felix Ritchie, University of the West of England
Rainer Lenz, Technical University of Dortmund
Conference of European Statistics Stakeholders, Rome, 24 November 2014
2
Motivation
Producing anonymised data sources for the scientific community is a key task of National Statistics Institutes (NSIs).
- Conservative, risk-averse approach (data protection): release data only if they can be shown to be safe (defensive)
vs.
- Alternative, user-oriented approach: release data unless they present a disclosure risk (cooperative)
3
Overview
- Common approaches to anonymisation
- Critique of common perspective
  - Focus on data protection
  - Worst-case scenarios
- Evidence-based risk assignment (case study: CIS 2010)
- Impact of new strategy
- Conclusion
4
Common approach to anonymisation
ESSNET Handbook on SDC (Statistical Disclosure Control)
Microdata protection should be based on:
- Knowledge of the use of the data
- Access requirements
- Potential to match external datasets
- Structure of the data itself
Risk scenarios are based on:
- Spontaneous recognition
- Active searching (record linkage)
5
Critique 1: Focus on data protection
Assumption: intruders exist who want to identify companies or persons in the data.
But: there are no known cases of malicious misuse of data. The only known incidents are mistakes and attempts to circumvent procedures to make life easier.
The problem is not anonymisation but accreditation procedures!
6
Critique 2: Worst-case scenarios
- The scenario often assumed: anonymised data are matched against the original data (record matching)
- Not realistic: there are large differences between official statistics and commercial databases
- Total protection is not required by law. De facto anonymity (Germany): a re-identification risk is tolerated as long as the effort / cost of re-identification is greater than the benefit
7
Evidence-based risk assignment: case study CIS 2010
CIS (Community Innovation Survey)
- Survey of the innovation activities of enterprises in European Union countries
- Conducted every 2 years
- A census in some countries, only a sample survey in others; large companies are always included
- Many categorical variables, only 9 continuous attributes
8
Case study CIS 2010 (continued)
Risk scenario
Step 1: Identify user needs
- Analysis of research papers plus a Google Scholar search
- Linear and nonlinear regression are the most frequently used methods
Step 2: Identify user risks
- Spontaneous recognition of outliers: no risk, since there is no disclosure to an unauthorised person
- Group disclosure from categorical variables: no risk, since the focus is not on descriptive statistics
9
Case study CIS 2010 (continued)
Risk evaluation
- Spontaneous recognition: very unlikely because of large differences between data sources
- Matching on categorical variables: uncertain, since the classification of economic activity differs between the statistical business register and commercial databases (main activity vs. main turnover)
- Moreover, matching is prohibited by licence agreements
Remaining risks
- Magnitude tables with 1 or 2 observations in a cell
- Dominance of one unit in a cell or in the dataset
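The two remaining risks can be checked mechanically. The following is a minimal sketch, not the authors' procedure: it assumes a single-unit dominance rule with a hypothetical 85% share threshold (the slides name no threshold), and the field names `nace` and `turnover` are illustrative.

```python
from collections import defaultdict

MIN_CELL_COUNT = 3       # cells with 1 or 2 observations are flagged (from the slides)
DOMINANCE_SHARE = 0.85   # assumed threshold; the slides do not state one


def risky_cells(records, cell_key, value_key):
    """Return {cell: reason} for cells that are too small or dominated by one unit."""
    cells = defaultdict(list)
    for rec in records:
        cells[rec[cell_key]].append(rec[value_key])

    flagged = {}
    for cell, values in cells.items():
        total = sum(values)
        if len(values) < MIN_CELL_COUNT:
            flagged[cell] = "small count"
        elif total > 0 and max(values) / total > DOMINANCE_SHARE:
            flagged[cell] = "dominance"
    return flagged


# Illustrative data only; not taken from CIS 2010.
records = [
    {"nace": "C10", "turnover": 120.0},
    {"nace": "C10", "turnover": 95.0},
    {"nace": "C10", "turnover": 110.0},
    {"nace": "C20", "turnover": 40.0},
    {"nace": "C20", "turnover": 55.0},
    {"nace": "C30", "turnover": 900.0},
    {"nace": "C30", "turnover": 20.0},
    {"nace": "C30", "turnover": 30.0},
]
flags = risky_cells(records, "nace", "turnover")
```

In this toy data, cell C20 is flagged for having only two observations and cell C30 because one unit accounts for most of the cell total.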
10
Impact of new strategy
Consequence of risk evaluation
- Small cell count (< 3) or dominance problem in a cell: the records at risk in these cells are determined
- Only records at risk are perturbed (individual microaggregation of metric variables)
Consequence for the quality of the anonymised datasets
- Microaggregation was performed for less than 1% of all records
- Small impact on regression coefficients
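Perturbing only the records at risk might look like the sketch below. It assumes that "individual microaggregation" means individual ranking (each metric variable is sorted on its own and cut into groups of k consecutive records) and that k = 3; neither detail is stated on the slides, and the function name is hypothetical.

```python
def microaggregate_at_risk(values, at_risk, k=3):
    """Replace each at-risk value by the mean of its rank group of size k.

    Individual ranking: records are sorted by this one variable and cut into
    consecutive groups of k; a short final group is merged into the previous one.
    Values of records not at risk are returned unchanged.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())

    result = list(values)
    for group in groups:
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            if i in at_risk:
                result[i] = mean
    return result


# Illustrative data only; index 5 is the dominant unit.
turnover = [120.0, 95.0, 110.0, 40.0, 55.0, 900.0, 20.0, 30.0]
at_risk = {5}
protected = microaggregate_at_risk(turnover, at_risk)
```

Only the flagged record changes; every other value stays exactly as collected, which is consistent with the small impact on regression coefficients reported above.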
11
Conclusion
A change of perspective from total data protection to a realistic, user-oriented approach that takes into account user needs, the quality of external databases, accreditation procedures and statistical legislation leads to datasets with higher analytical potential for the scientific community!
12
Thank you for your attention