Anonymisation – a promise now broken?


Anonymisation – a promise now broken?
Ross Anderson, Cambridge
Cathie Marsh Lecture, RSS, 13/11/2014

Outline of talk
- How the statistician’s view of inference control differs from the security engineer’s
- Trackers
- Context and threat evolution
- Anonymity sets and privacy sets
- Active attacks
- Regulatory failure
- Industry views and ways forward

Common ground
Over the past 30 years we’ve both used stuff like:
- Query set size controls
- Perturbation
- Trimming outliers (e.g. the one guy with HIV in Chichester in 1996)
- Using larger scales for rarer attributes (strokes by medical practice, liver transplants nationally)
The main difference: security engineers specialise in system-level adversarial thinking.
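The first two controls above can be sketched in a few lines. This is a minimal illustration with entirely hypothetical records and a made-up threshold, not any real system: a counting query is refused if too few records match, and an answer that is given is perturbed with small random noise.

```python
import random

# Hypothetical patient records: (postcode_sector, year_of_birth, has_condition)
RECORDS = [
    ("CB1 2", 1970, True), ("CB1 2", 1970, False), ("CB1 2", 1971, True),
    ("CB2 1", 1985, False), ("CB2 1", 1985, True), ("CB2 1", 1986, True),
    ("CB3 9", 1990, False), ("CB3 9", 1990, True), ("CB3 9", 1991, False),
]

MIN_QUERY_SET = 3  # query set size control: refuse if fewer records match


def count_with_condition(predicate, noise=1):
    """Answer a counting query, refusing small query sets and
    perturbing the answer with small random noise."""
    matches = [r for r in RECORDS if predicate(r)]
    if len(matches) < MIN_QUERY_SET:
        raise ValueError("query set too small")
    true_count = sum(1 for r in matches if r[2])
    return true_count + random.randint(-noise, noise)


# Allowed: three records match this postcode sector
print(count_with_condition(lambda r: r[0] == "CB1 2"))

# Refused: only two records have year of birth 1970
try:
    count_with_condition(lambda r: r[1] == 1970)
except ValueError as e:
    print(e)
```

As the next slides show, neither control survives an adversary who can combine the answers to several permitted queries.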

The security researcher’s view
- The 1980 US census moved from static totals and samples to online queries
- Dorothy Denning wondered whether she could work out her boss’s salary
- Basic idea: to find the salary of the female professor of computer science, ask “average salary, professors?” then “average salary, male professors?” (or even non-professors)
- Such ‘tracker attacks’ are absolutely pervasive
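The tracker attack above can be shown with a toy example. The salary table here is hypothetical; the point is that two ‘safe’ aggregate queries pin down the one record they differ by.

```python
# Hypothetical salary table for a small department
STAFF = [
    {"name": "A", "sex": "M", "professor": True,  "salary": 90_000},
    {"name": "B", "sex": "M", "professor": True,  "salary": 95_000},
    {"name": "C", "sex": "F", "professor": True,  "salary": 100_000},
    {"name": "D", "sex": "M", "professor": False, "salary": 50_000},
]


def avg_salary(predicate):
    """A query interface that only ever releases averages."""
    rows = [s["salary"] for s in STAFF if predicate(s)]
    return sum(rows) / len(rows)


# Counts of professors are assumed learnable (e.g. from the department website)
n_prof = 3
n_male_prof = 2

all_profs = avg_salary(lambda s: s["professor"])
male_profs = avg_salary(lambda s: s["professor"] and s["sex"] == "M")

# The sole female professor's salary falls out of two aggregate answers
target = n_prof * all_profs - n_male_prof * male_profs
print(target)  # 100000.0
```

Neither query on its own reveals an individual value; their combination does, which is why per-query controls are not enough.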

The security researcher’s view (2)
Contextual knowledge is really hard to deal with! For example, in the key UK law case, Source Informatics (sanitised prescribing data):

           Week 1   Week 2   Week 3   Week 4
Doctor 1     17       21       15       19
Doctor 2     20       14        3       25
Doctor 3     18       26

The security researcher’s view (3)
- Context enables deductive re-identification
- Postcode sector plus year of birth plus almost anything else (e.g. date of discharge)
- ‘Show me all patients who’ve had an umbilical hernia repair and a fracture of the left tibia only’
- ‘Show me all 42-year-old women with 9-year-old daughters where both have psoriasis’
- If you link episodes into longitudinal records, most patients can be re-identified
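The intersection queries above shrink the anonymity set fast. A minimal sketch, with hypothetical linked longitudinal records and made-up event codes:

```python
# Hypothetical linked longitudinal records: patient id -> coded clinical events
PATIENTS = {
    "p1": {"umbilical_hernia_repair", "left_tibia_fracture", "asthma"},
    "p2": {"umbilical_hernia_repair", "hip_replacement"},
    "p3": {"left_tibia_fracture", "asthma"},
}


def matching(*events):
    """All patients whose record contains every listed event."""
    return [p for p, ev in PATIENTS.items() if all(e in ev for e in events)]


print(matching("umbilical_hernia_repair"))
# ['p1', 'p2'] — anonymity set of two

print(matching("umbilical_hernia_repair", "left_tibia_fracture"))
# ['p1'] — two ordinary events already identify one patient
```

Each extra attribute, however innocuous on its own, roughly multiplies down the size of the matching set; longitudinal linkage supplies many such attributes per patient.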

The security researcher’s view (4)
- The dynamic aspects are really important!
- While a census is a snapshot every ten years, a medical database is constantly updated
- If you see the anonymised version but have access to some real-world context, fine-grained timing provides a fat covert channel
- E.g. the DeCode database in Iceland in 1998 had near-real-time updates

The security researcher’s view (5)
- The context is also dynamic!
- The scale of the data available for cross-referencing is growing by orders of magnitude
- Many websites are going ‘social’
- Medical records are acquiring genomic data, as are ‘hobby’ websites like 23andMe
- The move from PCs/laptops to phones/tablets makes movement and location data pervasive

Some terminology
- My anonymity set is the set of data subjects from whom I cannot be distinguished
- Depending on the application I might want an anonymity set size of 2, 3, 6, 50, 10,000…
- My privacy set is the set of people whose opinion I care about
- This might be the Dunbar number, my Facebook friends, or the whole population if I’m a celeb
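The anonymity set is easy to compute once you fix which attributes are released. A minimal sketch with hypothetical subjects and released attributes (postcode area, year of birth, sex):

```python
# Hypothetical released attributes per data subject
RELEASED = {
    "alice": ("CB1", 1972, "F"),
    "bob":   ("CB1", 1972, "M"),
    "carol": ("CB1", 1972, "F"),
    "dave":  ("CB2", 1980, "M"),
}


def anonymity_set(subject):
    """Everyone who shares my released attributes, myself included."""
    mine = RELEASED[subject]
    return {s for s, attrs in RELEASED.items() if attrs == mine}


print(sorted(anonymity_set("alice")))  # ['alice', 'carol'] — size 2
print(sorted(anonymity_set("dave")))   # ['dave'] — size 1: fully identifiable
```

Whether a set of size 2 is acceptable depends, as the slide says, on the application and on who is in my privacy set.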

Privacy for mathematicians
- All I need is that no one in my privacy set has an anonymity set for me that’s below the threshold for the sensitive attribute
- Problem 1: the privacy set is dynamic. If my kid gets kidnapped, I’m an instant celeb…
- Problem 2: with every successive query, the anonymity set shrinks until all the headroom – the privacy budget – is spent (the idea behind ‘differential privacy’, from Microsoft)
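The privacy-budget idea can be sketched concretely. This is an illustrative toy, not a production mechanism: counting queries get Laplace noise scaled to 1/ε (a count has sensitivity 1), each answer spends part of a fixed ε budget, and once the budget is gone no further queries are answered.

```python
import math
import random


def laplace_noise(scale):
    """Sample Laplace(0, scale) by inverse-CDF transform of a uniform draw."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))


class PrivateCounter:
    """Counting queries under an epsilon budget: each answer spends some
    epsilon; when the budget is exhausted, queries are refused."""

    def __init__(self, data, total_epsilon):
        self.data = data
        self.budget = total_epsilon

    def count(self, predicate, epsilon):
        if epsilon > self.budget:
            raise RuntimeError("privacy budget spent")
        self.budget -= epsilon
        true_count = sum(1 for x in self.data if predicate(x))
        # a count changes by at most 1 per person, so noise scale = 1/epsilon
        return true_count + laplace_noise(1.0 / epsilon)


db = PrivateCounter(range(100), total_epsilon=1.0)
print(db.count(lambda x: x < 50, epsilon=0.5))  # roughly 50, plus noise
```

The budget makes the query-by-query shrinkage of the anonymity set explicit: the analyst can trade accuracy now against the ability to ask anything later.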

Privacy for lawyers
- For 35 years, we statisticians and security engineers have known that anonymity is hard (albeit with different terminologies and different research literatures)
- Lawyers and policymakers don’t read maths or CS, so happy ignorance prevailed
- Then came Paul Ohm, ‘Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization’, UCLA Law Review, 2010

Privacy for CEOs
- PCAST (the President’s Council of Advisors on Science and Technology) report on Big Data
- The spread of audio and gesture interfaces means microphones and cameras in all inhabited spaces, reporting to ‘the cloud’
- Anonymisation? ‘PCAST does not see it as a useful basis for policy’
- It wanted usage controls only; the ECJ disagreed! The firms processing the data carry liability

Privacy for politicians
- First DP Act: ‘Maggie doesn’t approve of data privacy. It’s a German idea, like ID cards. But we have to do some to keep the Euros happy’
- Second DP Act: failure to transpose Recital 26
- ICO Anonymisation Code of Practice
- Ever harder pushback from Europe:
  - Article 29 Working Party Opinion 05/2014
  - The DP Regulation, currently in trilogue
  - Article 8 ECHR, precedents such as I v Finland

What goes wrong (1)
- Cameron policy announced January 2011: make ‘anonymised’ data available to researchers, both academic and commercial, but with an opt-out
- We’d already had a laptop stolen in London with 8.63m people’s ‘anonymised’ records on it
- In September 2012, CPRD went live – a gateway for making anonymised data available from (mostly) secondary care (now online in the USA!)
- GPES was to hoover up GP stuff into care.data

Advocating anonymisation

And ‘transparency’

CPRD
- The Clinical Practice Research Datalink, run by the MHRA, makes some data available to researchers (even to guys like me :-)
- I made a freedom of information request for the anonymisation mechanisms
- Answer: sorry, that would undermine security
- Never heard of Kerckhoffs’ principle?
- Search for ‘me, cprd’ on whatdotheyknow.com

The row over HES
- Hospital Episode Statistics (HES) has a record of every finished consultant episode going back 15 years (about a billion in total)
- Mar 13: formal complaint to the ICO that PA put HES data in the Google cloud, despite many rules on moving identifiable NHS data offshore
- Apr 3: HSCIC reveals that HES data had been sold to 1,200 universities, firms and others since 2013

The row over HES (continued)
- Some HES records have name, address, …
- Some have only postcode, dob, …
- Some have these removed but still carry a ‘HESID’, which usually contains postcode and dob
- Even if the HESID were encrypted, what about ‘cardioversion, Hammersmith, 19 Mar 2003’?
- Yet the DoH describes pseudonymised HES data as ‘non-sensitive’!

ICO response to NGOs (29 Oct)
We asked: ‘Precisely which patient data were stored outside the UK? Did they relate to single episodes or linked records? Did they contain postcode, date of birth, NHS number, or a pseudonym such as an encrypted NHS number?’
ICO: ‘Having met with and considered material provided by the HSCIC we reached the conclusion that the information provided by them to PA Consulting did not constitute personal data because a living individual could not be identified from the data.’

ICO response to NGOs (continued)
We pointed out: ‘Neither [HSCIC nor PA Consulting] has denied postcode, or year of birth, or the use of a pseudonym that would enable episode records to be linked.’
ICO: ‘We also investigated whether this information had been linked to other data to produce “personal data” which was subject to the provisions of the Act. We asked PA Consulting whether they had done this. They confirmed that they had not.’

HES data bought by …
- Report items 40–42, 46–47, 62–66, 95–98, 159–162, …: selling data outside the NHS
- 191–2, 321, 329, 331, 362, …: drug marketing
- 329, 336, …: medical device marketing
- 408: Imperial College with HESID, date of birth, home address, GP practice – still marked ‘non-sensitive’
- Many other uses for the data: market analysis, benchmarking, efficiency…

What works and what doesn’t
- DoH policy – selling gigabytes of lightly anonymised personal records – doesn’t work
- Stable and well-designed applications like IMS Health can work, given adversarial testing
- Keeping big data in one place and being really careful, with big thresholds, can be less bad
- Privacy economics must be more sophisticated
- Obscurity doesn’t work, though. We need to know what’s done with our data

To sum up …
- People used to assume that anonymisation and aggregation could protect privacy
- That assumption has been undermined since about 1980 by many technological changes
- First, engineers realised it didn’t work. Then lawyers. Now the politicians have screwed up so royally it’s just not credible
- Minimal fix: read the Art 29 WP guidance, human rights law, and Kerckhoffs’ principle

An ethical way forward
Rather than trying to patch things up with ‘risk management’, go back to first principles:
- Respect for persons
- Respect for the law, including the ECHR
- Proper consultation with data subjects and others with morally relevant interests
- Proper reporting of uses, including breaches
See the Nuffield Council on Bioethics report on biodata, due out in February, for details.