Anonymisation: Theory and Practice Mark Elliot University of Manchester
Outline UKAN – who we are What anonymisation is The Anonymisation decision making framework Risk assessment techniques A look to the future
The UK Anonymisation Network
Motivation Transparent Government, Not Transparent Citizens (O’Hara 2011) Anonymisation is better understood theoretically than empirically Need a centre of excellence to study it Lack of capacity within the UK Lack of connectedness between different perspectives
Aims 1. To establish a mutual understanding of differences in perspective on anonymisation across sectors, disciplines and components 2. To synthesise key concepts into a common framework 3. To agree best practice principles 4. To create a strategy for network sustainability
4 Tier Structure Hub: Operations Partners: Management Group Core Network: Strategy Extended Network: The Community
The Partners
The Core Network 30 representatives from Academia: Computer Science, Law, Statistics Government Commercial sector Third sector NHS
From UKAN to UKAS Delivery of full service Website: www.ukanon.net Clinics Consultancy Engagement Accreditation Dissemination of best practice via case studies Anonymisation Decision Making Framework Secured new funding from Access to Data Fund Move to sustainable full-cost recovery
What is Anonymisation?
Starting with the legal Anonymisation is the process by which personal data are rendered non-personal. Avoid using success terms: “anonymised” or worse “truly anonymised” And “really truly anonymised” is right out
Anonymisation and de-identification Deal with different parts of the DPD's definition of personal data De-identification tackles: “Directly from those data” Anonymisation tackles: “Indirectly from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller…”
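The distinction above can be illustrated with a minimal sketch of de-identification. The field names, the choice of which fields count as direct identifiers, and the salted-hash pseudonym are all assumptions made for illustration, not a method taken from the slides. The point is that the fields left behind (sex, age, income) can still identify someone indirectly, which is the part anonymisation has to tackle.

```python
import hashlib

# Hypothetical field names: which fields are direct identifiers is an assumption.
DIRECT_IDENTIFIERS = ("name", "address")


def de_identify(record, salt):
    """Drop direct identifiers and replace them with a salted pseudonym."""
    identity = "|".join(str(record[k]) for k in DIRECT_IDENTIFIERS)
    pseudonym = hashlib.sha256((salt + identity).encode("utf-8")).hexdigest()[:12]
    kept = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    return {"pid": pseudonym, **kept}


record = {
    "name": "A. Person",
    "address": "1 Example Street",
    "sex": "F",
    "age": 34,
    "income": 28000,
}
deidentified = de_identify(record, salt="illustrative-salt")

# The direct identifiers are gone, but sex, age and income remain and may
# still allow indirect identification when combined with other information.
print(deidentified)
```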
Some tenets of our approach Anonymisation is not about the data but about data situations Data situations arise from data interacting with data environments
Some tenets Data environments are: the set of formal and informal structures, processes, mechanisms and agents that: act on data; provide interpretable context for those data and/or define, control and/or interact with those data. Elliot and Mackey (2014)
Anonymisation types Absolute Anonymisation: zero possibility of re-identification under any circumstances Formal Anonymisation: de-identification (including pseudonymisation) Statistical Anonymisation: statistical disclosure control Functional Anonymisation
Unintended Disclosure Consists of two processes: Identification: the (correct) association of a population unit and a data unit Attribution: the (correct) association or disassociation of an item of data with a population unit Can occur independently Without the latter there is no disclosure
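That attribution can occur without identification can be shown with a small sketch: if everyone sharing a combination of key values has the same value on a sensitive variable, that value can be attributed to any of them without knowing which record is theirs. The variable names and records below are made up for illustration.

```python
from collections import defaultdict


def attributable_groups(records, key_vars, target_var):
    """Return {key combination: value} for groups whose members all share
    one value of target_var (attribution without identification)."""
    values = defaultdict(set)
    for r in records:
        values[tuple(r[k] for k in key_vars)].add(r[target_var])
    return {key: next(iter(vals)) for key, vals in values.items() if len(vals) == 1}


# Illustrative data only.
records = [
    {"sex": "F", "age_band": "30-39", "benefit_claimant": "yes"},
    {"sex": "F", "age_band": "30-39", "benefit_claimant": "yes"},
    {"sex": "M", "age_band": "30-39", "benefit_claimant": "yes"},
    {"sex": "M", "age_band": "30-39", "benefit_claimant": "no"},
]

result = attributable_groups(records, ["sex", "age_band"], "benefit_claimant")
print(result)
```

In this toy data, anyone known to be a woman aged 30-39 in the file can be attributed the value "yes" even though neither of the two matching records can be singled out as hers.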
The Anonymisation Decision Making Framework
What is the ADMF? A system for developing anonymisation policy A practical tool for understanding your data situation Not a checklist
The data controller’s responsibility Understand how a privacy breach might occur Understand the possible consequences of the breach Reduce the risk of a breach occurring to a negligible level
10 step process for functional anonymisation 1. Know your data 2. Understand the use case 3. Understand the legal issues and governance 4. Understand the issue of consent and your ethical obligations 5. Know the processes you will need to go through to assess the risk of re-identification/disclosure 6. Know the processes you will need to go through to anonymise your data 7. Understand the environment into which you share or release the data 8. Know your audience and how you will communicate 9. Know what to do if things go wrong 10. What happens next once you have shared and/or released the data
Know the processes you will need to go through to assess the risk of re-identification Data environment analysis Scenario analysis Statistical disclosure risk assessment Penetration tests Comparative data situation analysis
Elliot and Dale Scenario Framework INPUTS OUTPUTS Motivation Means Opportunity Target Variables Attack Type Effect of Data Divergence Likelihood of Success Goals achievable by other means? Consequences of attempt Likelihood of attempt Key/Matching Variables
Understand the data environment Global Always important but especially for open data.
Understand the data environment Global Local Data Agents Governance Security Infrastructure
The Disclosure Risk Problem: Identification The identification file Name Address Sex DOB .. Sex Age .. Income .. .. The target file ID variables Key variables Target variables
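The identification problem described above amounts to linking the two files on the key variables they share. The following is a minimal sketch under stated assumptions: the records are invented, and age is assumed to have already been derived from DOB in the identification file so that both files carry the same key variables (sex and age).

```python
from collections import defaultdict


def link_on_keys(identification_file, target_file, key_vars):
    """For each identification record, list the target records that agree
    with it on every key variable."""
    index = defaultdict(list)
    for j, row in enumerate(target_file):
        index[tuple(row[k] for k in key_vars)].append(j)
    return {
        i: index.get(tuple(row[k] for k in key_vars), [])
        for i, row in enumerate(identification_file)
    }


# Illustrative data only: ID variables plus key variables.
identification_file = [
    {"name": "Person A", "sex": "F", "age": 34},
    {"name": "Person B", "sex": "M", "age": 51},
]
# Key variables plus a target variable (income).
target_file = [
    {"sex": "F", "age": 34, "income": 28000},
    {"sex": "M", "age": 51, "income": 40000},
    {"sex": "M", "age": 51, "income": 19000},
]

matches = link_on_keys(identification_file, target_file, ["sex", "age"])
# Keep only one-to-one links; ambiguous matches identify nobody on their own.
unique_links = {i: js[0] for i, js in matches.items() if len(js) == 1}
print(matches, unique_links)
```

A one-to-one link is only a candidate identification: unless the target record is also unique in the population, or the intruder knows the person is in the file, the match may be wrong.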
Risk is present At the variable level: Power, Differentiation, Skew, Quality (susceptibility to divergence), Availability At the case level: Outliers
Risk is present At the case level Outliers Vulnerable persons
Risk is present At the attribute value level Unusual characteristics Sensitive attribute values
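One simple way to look for case-level risk of this kind is to count how many records share each combination of key-variable values and flag the records that are unique. This is a minimal sketch with invented records and variable names. It is a common first check in statistical disclosure risk assessment, not a procedure specified in the slides.

```python
from collections import Counter


def class_sizes(records, key_vars):
    """For each record, the number of records sharing its key-variable values."""
    counts = Counter(tuple(r[k] for k in key_vars) for r in records)
    return [counts[tuple(r[k] for k in key_vars)] for r in records]


# Illustrative data only.
records = [
    {"sex": "F", "age_band": "30-39", "occupation": "nurse"},
    {"sex": "F", "age_band": "30-39", "occupation": "nurse"},
    {"sex": "M", "age_band": "80-89", "occupation": "pilot"},
    {"sex": "M", "age_band": "40-49", "occupation": "teacher"},
    {"sex": "M", "age_band": "40-49", "occupation": "teacher"},
]

sizes = class_sizes(records, ["sex", "age_band", "occupation"])
sample_uniques = [i for i, size in enumerate(sizes) if size == 1]
print(sizes, sample_uniques)
```

Here the third record is unique on the chosen keys and also carries an unusual combination of characteristics, so it is the one that would deserve closer attention.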
The Disclosure Risk Problem II: Attribution
The Disclosure Risk Problem III: Subtraction
The Disclosure Risk Problem III: After Subtraction
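The slides give only titles here, so the following is one reading of the subtraction problem, offered as a sketch: an intruder who already knows the values for some units (for example their own record) subtracts those units from a published table of counts, and the residual table may contain zeros that allow attribution for everyone else in a row. The table, categories and function names are invented for illustration.

```python
def subtract_known(table, known):
    """Subtract units the intruder already knows about from a table of counts.

    table: {row category: {column category: count}}
    known: list of (row category, column category) pairs, one per known unit
    """
    residual = {row: dict(cols) for row, cols in table.items()}
    for row, col in known:
        if residual[row][col] == 0:
            raise ValueError("known unit is not consistent with the table")
        residual[row][col] -= 1
    return residual


def attributable_rows(table):
    """Rows in which only one column has a non-zero count."""
    result = {}
    for row, cols in table.items():
        nonzero = [col for col, n in cols.items() if n > 0]
        if len(nonzero) == 1:
            result[row] = nonzero[0]
    return result


# Illustrative published table: occupation by income band.
table = {
    "teacher": {"low income": 1, "high income": 3},
    "nurse": {"low income": 2, "high income": 2},
}
# The intruder knows one unit: a teacher with low income.
known = [("teacher", "low income")]

before = attributable_rows(table)
after = attributable_rows(subtract_known(table, known))
print(before, after)
```

Before subtraction no row permits attribution. After it, every remaining teacher can be attributed "high income".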
SAP: Example output
Many other attack forms Table linkage Stream linkage Mash attacks Cross dataset linkage enhancement Match and search Repetitive queries Data hiding Data manipulation Response knowledge Etc etc.
How much risk is negligible? Policy decision based on risk appetite Mature Risk management triangulates across the data situation: Disclosiveness Sensitivity Environment Agents Data Governance Security
Some Futurology Given the current trends and likely future technological change Anonymisation will become a mostly meaningless concept within 15-20 years If we care about privacy (and democracy) then we need a radically different solution. Personal data stores (and no central databases) Economic models
Summary Anonymisation done correctly is a functional process which turns personal data into non-personal data. There are a variety of techniques and tools which can be used to assess the likelihood of an attack occurring and the likelihood of it succeeding once it has occurred. Functional anonymisation requires an evaluation of the totality of a data situation, not just the data in question. Anonymisation will have a lifespan.