Anonymisation: Theory and Practice

Anonymisation: Theory and Practice. Mark Elliot, University of Manchester

Outline: UKAN – who we are; What anonymisation is; The Anonymisation Decision Making Framework; Risk assessment techniques; A look to the future

The UK Anonymisation Network

Motivation: Transparent Government, Not Transparent Citizens (O’Hara 2011). Anonymisation is better understood theoretically than empirically. Need a centre of excellence to study it. Lack of capacity within the UK. Lack of connectedness between different perspectives.

Aims
1. To establish a mutual understanding of differences in perspective on anonymisation across sectors, disciplines and components
2. To synthesise key concepts into a common framework
3. To agree best practice principles
4. To create a strategy for network sustainability

4 Tier Structure: Hub (Operations); Partners (Management Group); Core Network (Strategy); Extended Network (The Community)

The Partners

The Core Network: 30 representatives from Academia (Computer Science, Law, Statistics); Government; Commercial sector; Third sector; NHS

From UKAN to UKAS: Delivery of full service. Website: www.ukanon.net; Clinics; Consultancy; Engagement; Accreditation; Dissemination of best practice via case studies; Anonymisation Decision Making Framework. Secured new funding from Access to Data Fund. Move to sustainable full-cost recovery.

What is Anonymisation?

Starting with the legal: Anonymisation is the process by which personal data are rendered non-personal. Avoid using success terms: “anonymised”, or worse, “truly anonymised”. And “really truly anonymised” is right out.

Anonymisation and de-identification: These deal with different parts of the DPD's definition of personal data. De-identification tackles: “Directly from those data”. Anonymisation tackles: “Indirectly from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller…”
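The distinction above can be illustrated with a minimal sketch of de-identification only. The field names, the choice of which fields count as direct identifiers, and the keyed-hash pseudonym are all assumptions made for illustration; none of them come from the slides.

```python
import hashlib
import hmac

# Hypothetical field names: which fields are direct identifiers is an assumption.
DIRECT_IDENTIFIERS = {"name", "address"}


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Return a keyed hash of a direct identifier, truncated for readability."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def de_identify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and attach a pseudonym derived from the name."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["pseudonym"] = pseudonymise(record["name"], secret_key)
    return cleaned


record = {"name": "A. Example", "address": "1 Example Street",
          "sex": "F", "age": 93, "income": 21000}
released = de_identify(record, secret_key=b"illustrative-key-only")
print(released)
```

The released record no longer identifies anyone directly, but the remaining values (here sex and an unusually high age) may still identify someone indirectly when combined with other information. That indirect route is the part de-identification does not address.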

Some tenets of our approach: Anonymisation is not about the data but about data situations. Data situations arise from data interacting with data environments.

Some tenets: Data environments are the set of formal and informal structures, processes, mechanisms and agents that: act on data; provide interpretable context for those data; and/or define, control and/or interact with those data. Elliot and Mackey (2014)

Anonymisation types
Absolute Anonymisation: Zero possibility of re-identification under any circumstances
Formal Anonymisation: De-identification (including pseudonymisation)
Statistical Anonymisation: Statistical Disclosure Control
Functional Anonymisation

Unintended Disclosure: Consists of two processes. Identification: the (correct) association of a population unit and a data unit. Attribution: the (correct) association or disassociation of an item of data with a population unit. The two can occur independently. Without the latter there is no disclosure.
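A small sketch may help separate the two processes. It assumes an intruder who knows some attribute values of a person and is certain that person is in the data. The record layout, the function names and the rule "one matching record means identification" are illustrative assumptions, not definitions from the slides.

```python
def matching_records(data, known):
    """Records consistent with everything the intruder knows about a person."""
    return [r for r in data if all(r[k] == v for k, v in known.items())]


def assess(data, known, target):
    """Report whether the known values single out one record (identification)
    and whether all matching records agree on the target value (attribution)."""
    matches = matching_records(data, known)
    target_values = {r[target] for r in matches}
    return {
        "identified": len(matches) == 1,
        "attributed": len(matches) > 0 and len(target_values) == 1,
    }


# Hypothetical microdata, invented for illustration.
data = [
    {"sex": "F", "age_band": "30-39", "diagnosis": "asthma"},
    {"sex": "F", "age_band": "30-39", "diagnosis": "asthma"},
    {"sex": "M", "age_band": "30-39", "diagnosis": "diabetes"},
    {"sex": "M", "age_band": "40-49", "diagnosis": "asthma"},
    {"sex": "M", "age_band": "40-49", "diagnosis": "none"},
]

group_case = assess(data, {"sex": "F", "age_band": "30-39"}, "diagnosis")
unique_case = assess(data, {"sex": "M", "age_band": "30-39"}, "diagnosis")
safe_case = assess(data, {"sex": "M", "age_band": "40-49"}, "diagnosis")
print(group_case, unique_case, safe_case)
```

In the first case no single record is picked out, yet the diagnosis is still learned because both matching records share it: attribution without identification.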

The Anonymisation Decision Making Framework

What is the ADMF? A system for developing anonymisation policy. A practical tool for understanding your data situation. Not a checklist.

The data controller’s responsibility: Understand how a privacy breach might occur. Understand the possible consequences of the breach. Reduce the risk of a breach occurring to a negligible level.

10 step process for functional anonymisation
1. Know your data
2. Understand the use case
3. Understand the legal issues and governance
4. Understand the issue of consent and your ethical obligations
5. Know the processes you will need to go through to assess the risk of re-identification/disclosure
6. Know the processes you will need to go through to anonymise your data
7. Understand the environment into which you share or release the data
8. Know your audience and how you will communicate
9. Know what to do if things go wrong
10. What happens next once you have shared and/or released data

Know the processes you will need to go through to assess the risk of re-identification: Data environment analysis; Scenario analysis; Statistical disclosure risk assessment; Penetration tests; Comparative data situation analysis
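As one illustration of statistical disclosure risk assessment, the sketch below counts records that are unique in a file on a chosen set of key variables. This is a common starting point rather than the method prescribed in the slides; the variable names and data are hypothetical.

```python
from collections import Counter


def class_sizes(records, key_variables):
    """Count how many records share each combination of key-variable values."""
    return Counter(tuple(r[k] for k in key_variables) for r in records)


def sample_uniques(records, key_variables):
    """Return the records whose key-variable combination occurs exactly once."""
    sizes = class_sizes(records, key_variables)
    return [r for r in records
            if sizes[tuple(r[k] for k in key_variables)] == 1]


# Hypothetical microdata, invented for illustration.
records = [
    {"sex": "F", "age_band": "20-29", "region": "North"},
    {"sex": "F", "age_band": "20-29", "region": "North"},
    {"sex": "M", "age_band": "20-29", "region": "North"},
    {"sex": "M", "age_band": "60-69", "region": "South"},
    {"sex": "M", "age_band": "60-69", "region": "South"},
]

uniques = sample_uniques(records, ["sex", "age_band", "region"])
proportion_unique = len(uniques) / len(records)
print(uniques, proportion_unique)
```

A record that is unique in the file is not necessarily unique in the population, so a count like this is an indicator to be read alongside the other analyses listed above, not a risk figure on its own.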

Elliot and Dale Scenario Framework
INPUTS: Motivation; Means; Opportunity
OUTPUTS: Target Variables; Attack Type; Effect of Data Divergence; Likelihood of Success; Goals achievable by other means?; Consequences of attempt; Likelihood of attempt; Key/Matching Variables

Understand the data environment: Global (always important, but especially for open data); Local; Data; Agents; Governance; Security; Infrastructure

The Disclosure Risk Problem: Identification
The identification file: Name, Address (ID variables); Sex, DOB, .. (key variables)
The target file: Sex, Age, .. (key variables); Income, .., .. (target variables)
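The identification problem can be sketched as a linkage between the two files on their shared key variables. The sketch assumes the key variables are sex and age (with age derived from date of birth at a reference date), and that a match counts only when exactly one target record fits. The data, the reference date and the function names are hypothetical.

```python
from datetime import date


def age_on(dob: date, reference: date) -> int:
    """Age in whole years on the reference date."""
    years = reference.year - dob.year
    if (reference.month, reference.day) < (dob.month, dob.day):
        years -= 1
    return years


def link(identification_file, target_file, reference):
    """Pair each named person with a target value when exactly one target
    record shares their key-variable values (sex and age)."""
    matches = []
    for person in identification_file:
        key = (person["sex"], age_on(person["dob"], reference))
        candidates = [t for t in target_file if (t["sex"], t["age"]) == key]
        if len(candidates) == 1:
            matches.append((person["name"], candidates[0]["income"]))
    return matches


# Hypothetical files, invented for illustration.
identification_file = [
    {"name": "Person A", "address": "(withheld)", "sex": "F", "dob": date(1950, 3, 14)},
    {"name": "Person B", "address": "(withheld)", "sex": "M", "dob": date(1985, 11, 2)},
]
target_file = [
    {"sex": "F", "age": 66, "income": 18000},
    {"sex": "M", "age": 30, "income": 32000},
    {"sex": "M", "age": 30, "income": 54000},
]

matches = link(identification_file, target_file, reference=date(2016, 6, 30))
print(matches)
```

A unique match is a claim, not a proof: the named person may not be in the target file at all, and differences in coding or timing between the files can produce false matches.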

Risk is present
At the variable level: Power; Differentiation; Skew; Quality (susceptibility to divergence); Availability
At the case level: Outliers; Vulnerable persons
At the attribute value level: Unusual characteristics; Sensitive attribute values

The Disclosure Risk Problem II: Attribution

The Disclosure Risk Problem III: Subtraction

The Disclosure Risk Problem III: After Subtraction
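The figures for the two subtraction slides are not in the transcript, so the following is only a sketch of one common reading of a subtraction attack on a table of counts. An intruder removes the individuals they already know about, and any row left with a single non-zero cell reveals the value for everyone else in that row. The table, the categories and the function names are hypothetical.

```python
def subtract_known(table, known_individuals):
    """Return a copy of the table with each known individual removed."""
    remaining = {row: dict(cells) for row, cells in table.items()}
    for row, column in known_individuals:
        if remaining[row][column] == 0:
            raise ValueError(f"no count left to subtract in {row}/{column}")
        remaining[row][column] -= 1
    return remaining


def attributable_rows(table):
    """Rows where every remaining person falls in a single column."""
    result = {}
    for row, cells in table.items():
        nonzero = [column for column, count in cells.items() if count > 0]
        if len(nonzero) == 1:
            result[row] = nonzero[0]
    return result


# Hypothetical table of counts: occupation by health status.
table = {
    "teacher": {"good": 3, "poor": 1},
    "nurse": {"good": 2, "poor": 2},
}

before = attributable_rows(table)
# Assumption: the intruder knows one teacher whose health status is "poor".
after = attributable_rows(subtract_known(table, [("teacher", "poor")]))
print(before, after)
```

Before subtraction no row allows attribution. After it, every other teacher in the table is known to be in the "good" column.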

SAP: Example output

Many other attack forms: Table linkage; Stream linkage; Mash attacks; Cross dataset linkage enhancement; Match and search; Repetitive queries; Data hiding; Data manipulation; Response knowledge; etc.

How much risk is negligible? A policy decision based on risk appetite. Mature risk management triangulates across the data situation: Disclosiveness; Sensitivity; Environment (Agents, Data, Governance, Security)

Some Futurology: Given current trends and likely future technological change, anonymisation will become a mostly meaningless concept within 15-20 years. If we care about privacy (and democracy) then we need a radically different solution: personal data stores (and no central databases); economic models.

Summary: Anonymisation done correctly is a functional process which turns personal data into non-personal data. There are a variety of techniques and tools which can be used to assess the likelihood of an attack occurring and the likelihood of it succeeding once it has occurred. Functional anonymisation requires an evaluation of the totality of a data situation, not just the data in question. Anonymisation will have a lifespan.