Anonymising quantitative data


Anonymising quantitative data
Dr Sharon Bolton
UK Data Service, UK Data Archive, University of Essex
Anonymising Research Data workshop
Dublin, 22 June 2016

The UK Data Service
- Single point of access to a wide range of social science data: ukdataservice.ac.uk
- Funded by the ESRC to serve the academic community: training and guidance
- UK Data Archive established 1967
- Used by academic researchers and students; government analysts; charities; business; research centres; think tanks
- Survey microdata; cohort studies; international macrodata; census data; qualitative/mixed methods data
- Support and guide data creators, including disclosure review (anonymisation) and preparation for archiving

Protecting confidentiality: the ‘5 Safes’
Five guiding principles:
- Safe people: educate researchers to use data safely
- Safe projects: research projects for ‘public good’
- Safe settings: SecureLab system for sensitive data
- Safe outputs: SecureLab project outputs are screened
- Safe data: treat the data to protect respondent confidentiality
For this session, we will concentrate (mostly) on Safe data

Data collection: planning
- Explain to respondents what archiving entails and gain agreement for data sharing (informed consent)
- Think about disclosure risks before starting: what kind of information do you need to collect?
- Direct identifiers include: names; addresses; telephone numbers; email addresses; photos; (perhaps) IP addresses. Do you really need them?
- Unless explicit consent has been obtained for sharing, direct identifiers should always be removed from data
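As an illustration of removing direct identifiers before sharing, here is a minimal Python sketch. The field names (`name`, `email`, `ip_address` and so on) and the sample records are hypothetical; they are not taken from any real dataset and would need to match the variables in your own data.

```python
# Hypothetical field names for direct identifiers; adjust to your own dataset.
DIRECT_IDENTIFIERS = {"name", "address", "telephone", "email", "photo", "ip_address"}


def drop_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records without any direct-identifier fields."""
    return [
        {field: value for field, value in record.items() if field not in identifiers}
        for record in records
    ]


# Made-up example records, for illustration only.
respondents = [
    {"name": "A. Example", "email": "a@example.org", "age": 34, "region": "East"},
    {"name": "B. Example", "email": "b@example.org", "age": 51, "region": "North"},
]

shareable = drop_direct_identifiers(respondents)
print(shareable)
```

The function builds new records rather than editing in place, so the original file (kept securely, if at all) is left untouched and the shared copy never contains the identifier fields.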

Anonymising data: indirect identifiers
Indirect identifiers include:
- Sensitive information: health information/medical conditions; crime victimisation/offending; drug/alcohol use etc.
- ‘Less sensitive’ information: age/birth date; educational characteristics; employment details; religious affiliation; household size; geographic area
Also check:
- Demographics in combination (e.g. demographics + geographies)
- Text/string variables: too detailed?
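One way to look at indirect identifiers in combination is to count how many records share each combination of values and flag the rare ones. The sketch below assumes hypothetical variable names (`age_band`, `area`, `religion`) and a hypothetical threshold `k`; the slides do not prescribe a threshold, so choose one that suits your own disclosure policy.

```python
from collections import Counter


def risky_combinations(records, keys, k=3):
    """Return the combinations of values (for the given keys) shared by fewer than k records."""
    counts = Counter(tuple(record[key] for key in keys) for record in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Made-up example records, for illustration only.
sample = [
    {"age_band": "21-30", "area": "Colchester", "religion": "Catholic"},
    {"age_band": "21-30", "area": "Colchester", "religion": "Catholic"},
    {"age_band": "21-30", "area": "Colchester", "religion": "Sikh"},
    {"age_band": "61-64", "area": "Colchester", "religion": "Catholic"},
]

# Only one respondent is aged 61-64 in this area, so that combination is flagged.
flagged = risky_combinations(sample, keys=("age_band", "area"), k=2)
print(flagged)
```

Adding more variables to `keys` usually increases the number of rare combinations, which is why variables that look harmless on their own can still be disclosive together.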

Anonymising indirect identifiers
- Aggregate categories to reduce precision
- Band ages, incomes, expenditure, etc. to disguise outliers
- Use standard coding frames, e.g. SOC2010
- Generalise meaning of detailed text
- Document the changes you make
- Talk to other researchers, archives, data services
Published guides:
- UCD Research Data Management Guide: http://libguides.ucd.ie/data/ethics
- ONS Disclosure control guidance for microdata produced from social surveys: http://www.ons.gov.uk/methodology/methodologytopicsandstatisticalconcepts/disclosurecontrol/policyforsocialsurveymicrodata

Anonymising data: new developments and tools
Statistical Disclosure Control (SDC) software is available:
- mu-Argus: standalone software package recommended by Eurostat for government statisticians
  - Software and manual: http://neon.vb.cbs.nl/casc/mu.htm
- sdcMicro: R tool (with GUI)
  - Software and manual: http://www.inside-r.org/packages/cran/sdcMicro/docs/sdcMicro
- New documentation being developed by UK Data Service, working with R developers

Quiz 1: disclosive text in job title

Job title                                             Frequency  Valid Percent
nurse                                                 73         73.0
carer for elderly man                                 1          1.0
hospital ward cleaner
social science researcher
head of dental practice                               2          2.0
cleaner in electronics factory
Financial Director, Sunnyview Care Home, Colchester
general manager
GP
Manager, Cotterill Village Stores
works in electronics factory
on benefits, not working
police officer
consultant, geriatric psychiatry
Reetired
retired
Retired
retirement
geography teacher
Teacher, music
Seondary school teeacher
unemployed
web designer
Total                                                 100        100.0

Quiz 1: jobs coded with SOC2010

Job title: SOC2010          Frequency  Valid Percent
1131: Director, financial   1          1.0
1171: Manager, general
1190: Manager, retail
2231: Nurse                 73         73.0
2426: Researcher
2215: Dentist               2          2.0
2211: Doctor, medical
3312: Officer, police
2314: Teacher, secondary    3          3.0
2137: Designer, web
6145: Carer
9139: Worker, factory
9233: Cleaner
Retired                     4          4.0
Unemployed
Total                       100        100.0
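Coding free-text job titles to a standard frame can be sketched as a simple lookup. The SOC2010 codes and labels below are those shown on the slide; which raw title maps to which code is an assumption made for this sketch, and the lookup is deliberately partial. Real coding is normally done with the full SOC2010 index and manual review.

```python
# Codes and labels as shown on the slide; the title-to-code pairing is assumed.
SOC2010_LOOKUP = {
    "nurse": "2231: Nurse",
    "gp": "2211: Doctor, medical",
    "consultant, geriatric psychiatry": "2211: Doctor, medical",
    "financial director, sunnyview care home, colchester": "1131: Director, financial",
    "manager, cotterill village stores": "1190: Manager, retail",
    "hospital ward cleaner": "9233: Cleaner",
    "geography teacher": "2314: Teacher, secondary",
    "teacher, music": "2314: Teacher, secondary",
}


def code_job_title(raw_title, lookup=SOC2010_LOOKUP):
    """Map a free-text job title to a coded category, or None if it needs manual review."""
    # Normalise case and whitespace so 'Nurse ' and 'nurse' are treated alike.
    key = " ".join(raw_title.lower().split())
    return lookup.get(key)


titles = ["Nurse", "GP", "Manager, Cotterill Village Stores", "web designer"]
coded = [code_job_title(title) for title in titles]
print(coded)
```

Titles that are not in the lookup come back as `None` rather than being guessed, so they can be set aside for manual coding. Note how the coded value no longer carries the employer name or town that made the original text disclosive.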

Quiz 2: detailed religion categories

Religious affiliation     Frequency  Valid Percent
1 Protestant              41         41.4
2 Anglican                4          4.0
3 Catholic                26         26.3
4 Muslim                  8          8.1
5 Sikh                    5          5.1
6 Jehovah's Witness       6          6.1
7 Methodist               1          1.0
8 Mormon                  1          1.0
9 Baptist                 1          1.0
10 Buddhist               3          3.0
11 None                   1          1.0
12 No religion            1          1.0
13 Moravian               1          1.0
Total                     99         100.0

Quiz 2: religion categories aggregated

Religious affiliation     Frequency  Valid Percent
1 Protestant              49         49.0
3 Catholic                26         26.0
4 Muslim                  8          8.0
5 Sikh                    5          5.0
6 Other religion          10         10.0
7 No religion             2          2.0
Total                     100        100.0
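Aggregating detailed categories can be sketched as a recode table. The grouping below (for example, Anglican, Methodist, Baptist and Moravian into Protestant) is an assumption about how the detailed categories were combined; the slides show only the resulting categories. The responses are made up for illustration.

```python
from collections import Counter

# Assumed grouping of detailed categories into the aggregated ones.
AGGREGATED = {
    "Protestant": "Protestant",
    "Anglican": "Protestant",
    "Methodist": "Protestant",
    "Baptist": "Protestant",
    "Moravian": "Protestant",
    "Catholic": "Catholic",
    "Muslim": "Muslim",
    "Sikh": "Sikh",
    "Jehovah's Witness": "Other religion",
    "Mormon": "Other religion",
    "Buddhist": "Other religion",
    "None": "No religion",
    "No religion": "No religion",
}


def aggregate_religion(value, mapping=AGGREGATED):
    """Recode a detailed category; unlisted values fall into 'Other religion'."""
    return mapping.get(value, "Other religion")


# Made-up responses, for illustration only.
responses = ["Anglican", "Moravian", "Catholic", "Mormon", "None", "Protestant"]
table = Counter(aggregate_religion(response) for response in responses)
print(table)
```

Keeping the recode table alongside the data is also a convenient way to document the changes made.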

Quiz 3: age in years

Age in years  Frequency  Valid Percent
16            3          3.0
17
18            9          9.0
19
20            16         16.0
21            4          4.0
22            2          2.0
23
24
25
26
27
28
29
30
31            1          1.0
32
40            11         11.0
41
42
43
49
50            13         13.0
51
60
61
62
63
64
Total         100        100.0

Quiz 3: banded age

Age (banded)  Frequency  Valid Percent
1 16-20       40         40.0
2 21-30       22         22.0
4 41-50       13         13.0
5 51-60       19         19.0
6 61-64       6          6.0
Total         100        100.0
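Banding a continuous variable such as age can be sketched as follows. The band boundaries are an assumption based on the quiz table, with the last band taken as 61-64 so that bands do not overlap; the sample ages are made up.

```python
# Assumed bands as (lower, upper, label); both bounds are inclusive.
AGE_BANDS = [
    (16, 20, "16-20"),
    (21, 30, "21-30"),
    (31, 40, "31-40"),
    (41, 50, "41-50"),
    (51, 60, "51-60"),
    (61, 64, "61-64"),
]


def band_age(age, bands=AGE_BANDS):
    """Return the label of the band containing age, or raise if no band covers it."""
    for lower, upper, label in bands:
        if lower <= age <= upper:
            return label
    raise ValueError(f"age {age} is outside the defined bands")


# Made-up ages, for illustration only.
ages = [16, 20, 21, 31, 60, 64]
banded = [band_age(age) for age in ages]
print(banded)
```

Raising an error for out-of-range values, rather than silently assigning them to the nearest band, makes unexpected outliers visible so they can be handled deliberately (for example with a top-coded band).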

Access control
- Don’t over-anonymise: find a balance between protecting respondents’ confidentiality and maintaining research usability of data
- Can’t fully anonymise data without removing all the useful detail?
- Go back to the 5 Safes and think about access control: Safe people, Safe settings, Safe outputs

Access control
At UK Data Service, data available under 3 access levels:
- OPEN: open public access
- SAFEGUARDED: downloadable, but use is traceable
  - Registered users only (agree not to try to identify any individual respondents)
  - Special agreements/licence: permission-only access; approved projects, with usage agreed in advance
- CONTROLLED: accredited users take a further training course
  - Access via on-site safe setting or virtual secure environment (SecureLab)
  - Outputs disclosure-checked before publication

Anonymising quantitative data: summary
- Informed consent
- Think about level of detail needed before data collection
- Remove direct identifiers
- Check and treat indirect identifiers to reduce disclosure risk
- Document your changes
- Balance anonymisation with access control to preserve data usability

Questions?
Guidance on anonymisation:
- UCD: http://libguides.ucd.ie/data/ethics
- UKDS: www.data-archive.ac.uk/create-manage/consent-ethics/anonymisation
- Managing and Sharing Research Data book: https://uk.sagepub.com/en-gb/eur/managing-and-sharing-research-data/book240297