WP. 46: Providing access to data and making microdata safe, experiences of the ONS
Jane Longhurst and Paul Jackson, ONS

The Statistical Disclosure Control Problem
[Figure: disclosure risk plotted against data utility. Labels: Original Data, Accessed Data, No data, Maximum Tolerable Risk.]

Protecting and Providing Access to Microdata
Legal Issues
Policy Issues
The Data
Risk Assessment
Risk Management
Output
Test and Evaluate

Legal Issues: Legal Context
No general statistics act
No comprehensive business register
No population register
Registrations of births, marriages and deaths are public, including cause of death
A system of common law
An Information Commissioner: a privacy and access-to-information champion with court powers
Data Protection / Human Rights
Freedom of Information

Legal Issues: Legal Context (continued)
Business surveys have statutory protection
  – But ONS has the lawful authority to disclose identified business survey data to any central government department for any purpose, and to any local authority for its planning purposes.

Legal Issues: Legal Context (continued)
Census records have statutory protection
  – But ONS has lawful authority to disclose personal census information to any person for statistical purposes.

Legal Issues: Legal Context (continued)
Household survey records are protected by the civil "common law duty of confidence".
  – But ONS has lawful authority to disclose identifying household survey data to any person where there is informed consent.
  – And ONS survey pledges obtain consent for disclosures of 'detailed but anonymised data' to any genuine researcher.

Legal Issues: Legal Context
This extraordinary authority to disclose identifying microdata to certain persons, departments and authorities only delays the real issue:
  – The access needs management: MRP
  – When it is not ONS applying the SDC standards for outputs, then someone else has to.
  – Therefore usable standards and guidance are essential

Legal Issues: Legal Context
So when ONS has so many options, how does it decide:
  – i) who should have controlled access under what conditions, and
  – ii) what ONS or other users' outputs should look like?
So we need Policy

Policy Issues
So we need Policy
National Statistics Code of Practice for the GSS
Protocol for data access and confidentiality
  – A Confidentiality Guarantee
  – National Statistics are guaranteed not likely to identify an individual, assuming an intruder is prepared to use a proportionate amount of time, effort and expertise.
Departmental policy
  – Variations according to considerations of:
      data source type
      risk analysis and management methodology
      access / release options

Risk Assessment
An element of disclosure risk comes from records that are unique in the sample and in a known population
Several approaches to assessing the disclosure risk in microdata:
  – Disclosure risk scenarios
  – Variable checklist
  – Quantitative risk measures
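
As a minimal sketch of the "unique in the sample" idea, the Python below counts how often each combination of key variables occurs and returns the records whose combination occurs once. The function name, the variable names and the toy data are assumptions made for illustration; they are not taken from the slides or from any ONS tool.

```python
from collections import Counter


def sample_uniques(records, key_vars):
    """Return the indices of records whose key-variable combination occurs exactly once."""
    keys = [tuple(rec[v] for v in key_vars) for rec in records]
    counts = Counter(keys)
    return [i for i, k in enumerate(keys) if counts[k] == 1]


# Hypothetical toy sample; the variables are illustrative only.
sample = [
    {"age_band": "30-39", "sex": "F", "region": "North"},
    {"age_band": "30-39", "sex": "F", "region": "North"},
    {"age_band": "70-79", "sex": "M", "region": "North"},
    {"age_band": "30-39", "sex": "M", "region": "South"},
]

unique_idx = sample_uniques(sample, ["age_band", "sex", "region"])
print(unique_idx)  # [2, 3]
```

A sample unique is only a candidate for disclosure: whether it is also unique in the population is exactly what cannot be read off a sample.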

Disclosure Risk Scenarios
Identify possible situations where disclosure could occur
Assumptions concerning the intruder's prior knowledge and the information available to them, e.g. private database, journalist, nosy neighbour
Identify key variables (indirectly identifying variables)
Use this process to decide what needs to be protected against
  – can be complex
  – requires discussion and judgement

SDC Checklist for Microdata Release
Level of geography
Ethnic classification
Detail of occupation
Visible variables
Traceable variables
Survey design
Dissemination

Quantitative Risk Assessment
Recognised need for quantitative risk measures
Research project initiated
Need for individual and global risk measures
Problem for sample microdata is that population is an unknown parameter
Different methods for estimating the disclosure risk measures
  – Heuristics
  – Probabilistic models

Probabilistic Modelling
Estimate the disclosure risk based on natural assumptions about the distribution of the population
Provides linked estimates of individual and global risk measures
Research focused on
  – Model selection techniques
  – Robustness of estimates
  – Goodness of fit criteria
Tested on ONS social surveys

Heuristics
DIS/SUDA method consists of two elements
  – DIS: file-level assessment of risk
  – SUDA: grades and orders records within a file according to level of risk
Provide variable and variable value contribution to the risk
Implemented by ONS for 2001 Census SAR
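
The slides do not describe how SUDA grades records. It is generally described as looking for records that are unique on small subsets of the key variables, so the sketch below is a brute-force illustration of that one idea: for a given record, list the smallest variable subsets on which it is unique. This is not the SUDA algorithm or its scoring, and every name and value is hypothetical.

```python
from collections import Counter
from itertools import combinations


def minimal_sample_uniques(records, key_vars, idx):
    """List variable subsets on which record idx is unique and no proper subset is."""
    found = []
    for size in range(1, len(key_vars) + 1):
        for subset in combinations(key_vars, size):
            # Skip supersets of a subset that is already known to be unique.
            if any(set(m) <= set(subset) for m in found):
                continue
            counts = Counter(tuple(r[v] for v in subset) for r in records)
            if counts[tuple(records[idx][v] for v in subset)] == 1:
                found.append(subset)
    return found


# Hypothetical toy file.
records = [
    {"age_band": "30-39", "sex": "F", "occupation": "teacher"},
    {"age_band": "30-39", "sex": "F", "occupation": "nurse"},
    {"age_band": "30-39", "sex": "M", "occupation": "teacher"},
    {"age_band": "40-49", "sex": "F", "occupation": "nurse"},
]
key_vars = ["age_band", "sex", "occupation"]

print(minimal_sample_uniques(records, key_vars, 0))  # [('sex', 'occupation')]
print(minimal_sample_uniques(records, key_vars, 3))  # [('age_band',)]
```

In this toy file the last record is unique on a single variable, while the first needs two. A grading scheme could reasonably treat the last record as the riskier of the two, which is the kind of ordering the slide refers to.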

Evaluation of Quantitative Risk Measures
Simulate sample surveys from Census data
Compare risk measures with true risk
Practical considerations
How to set thresholds
Incorporate risk measures into MRP decision process
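
The evaluation design can be sketched as follows, assuming a fully known population stands in for the Census. Samples are drawn repeatedly and, because the population counts are known, the "true" global risk can be computed for each draw. The two measures used here (the number of sample uniques that are also population uniques, and the sum of 1/F over sample uniques, where F is the population count of the key combination) are common choices in the disclosure-risk literature; the slides do not say which measures ONS used. The synthetic population and all names are assumptions.

```python
import random
from collections import Counter


def true_risk(population_keys, sample_idx):
    """Global risk measures for one sample when the population is known.

    tau1: number of sample uniques that are also population uniques.
    tau2: sum of 1/F over sample uniques, F being the population count.
    """
    pop_counts = Counter(population_keys)
    samp_counts = Counter(population_keys[i] for i in sample_idx)
    tau1 = sum(1 for k, c in samp_counts.items() if c == 1 and pop_counts[k] == 1)
    tau2 = sum(1.0 / pop_counts[k] for k, c in samp_counts.items() if c == 1)
    return tau1, tau2


def simulate(population_keys, fraction, n_draws, seed=0):
    """Draw simple random samples without replacement and record the true risk of each."""
    rng = random.Random(seed)
    n = round(len(population_keys) * fraction)
    results = []
    for _ in range(n_draws):
        idx = rng.sample(range(len(population_keys)), n)
        results.append(true_risk(population_keys, idx))
    return results


# Synthetic stand-in for census records: three coded key variables per person.
rng = random.Random(1)
population = [(rng.randrange(20), rng.randrange(2), rng.randrange(10)) for _ in range(2000)]

draws = simulate(population, fraction=0.05, n_draws=20)
mean_tau1 = sum(t1 for t1, _ in draws) / len(draws)
mean_tau2 = sum(t2 for _, t2 in draws) / len(draws)
print(mean_tau1, mean_tau2)
```

An estimator that sees only the sample would then be run on the same draws and its output compared with these benchmark values. That estimator is deliberately left out here.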

SDC for Microdata
Perturbative methods
  – Record swapping
  – Adding noise
Non-perturbative methods
  – Recoding
  – Suppression
  – Sub-sampling
Mixed strategies
ONS mainly implements recoding
PRAM implemented for 2001 Census SAR
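
Recoding, the method the slide says ONS mainly uses, can be illustrated with a small sketch: exact age is replaced by a ten-year band and the number of unique key combinations is counted before and after. The band width, variable names and data are assumptions chosen for the example, not ONS practice.

```python
from collections import Counter


def count_uniques(records, key_vars):
    """Count key-variable combinations that occur exactly once in the file."""
    counts = Counter(tuple(r[v] for v in key_vars) for r in records)
    return sum(1 for c in counts.values() if c == 1)


def recode_age(records, width=10):
    """Global recoding: replace exact age with a band of the given width."""
    recoded = []
    for r in records:
        low = (r["age"] // width) * width
        recoded.append({**r, "age": f"{low}-{low + width - 1}"})
    return recoded


# Hypothetical toy file.
records = [
    {"age": 31, "sex": "F"},
    {"age": 34, "sex": "F"},
    {"age": 38, "sex": "F"},
    {"age": 52, "sex": "M"},
    {"age": 57, "sex": "M"},
    {"age": 83, "sex": "M"},
]

before = count_uniques(records, ["age", "sex"])
after = count_uniques(recode_age(records), ["age", "sex"])
print(before, after)  # 6 1
```

The one record that stays unique after banding (the 83-year-old) is the kind of case where a further step, such as top-coding or suppression, would be considered.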

Access Options - SPECIALISTS
Data Laboratory
Only government can use identifying business micro-data
Identified census data is high risk
Hence the on-site lab and the employment contracts
Only safe data can leave the laboratory.
Approx. 150 users per year

Access Options - GOVERNMENT
Access Agreements in central and local government.
UK is a devolved statistical system
ONS discloses identifying survey micro-data to other government departments for statistics and research purposes
  – Users are professionals like us, subject to the same Code of Practice, and the same laws.
  – We don't screen for research validity
  – We don't check outputs
Approximately 300 disclosures of confidential micro-data every year
No known breaches of confidentiality.

Access Options - RESEARCH
For the academic researchers, the UK Data Archive
If it didn't exist, we'd have to invent it.
All ONS household survey datasets are deposited with UKDA
  – Year of birth, regional geography, all other variables (limited coding)
  – Some large households removed
Academic researchers and government departments can download the dataset upon signing a user licence. Takes about an hour.
  – This year, 16,600 downloads have taken place. Each can have up to 10 users in the institution…
  – ONS does not screen the licence applications
  – ONS does not vet the research proposals
  – ONS does not check outputs
  – In place for 30 years now
  – No known instance of wrongful identification.

Access Options: The UK Data Archive (continued)
But this is not enough. So ONS has now created the 'Special Licence'
  – Month of birth
  – Local authority geography
  – All households
  – Still access by downloading the data.
ONS does check each Special Licence application
  – But not for valid research, only data needs
  – And we still don't check any outputs

Access Options - PUBLIC
For the Public, Freedom of Information
  – ONS can only withhold microdata where its disclosure to an applicant would be likely to result in:
      A breach of any law it was collected under
      An actionable breach of confidence
      A breach of a data protection principle
  – The Scottish Information Commissioner has instructed the Scottish Health Service to disclose to an applicant the counts of leukaemia in under-14-year-olds by ward (average ward population approx. 4,000). The table was all 1s and zeros: effectively microdata, and 'safe'.

Access Options
Are ONS access options and practices reasonable?
They follow the constructs used by the Courts and Information Commissioners, in that policies are written in plain English
Licensed academic users are, in 30 years of experience, not intruders. They are trusted colleagues, and like us they can make mistakes sometimes.
Other civil service professionals are not intruders: they are as reliable and trustworthy as we are. They too have professional codes of conduct, ethics, and moral principles
All statisticians and researchers need clear rules, and should be trusted to follow them.

OUTPUTS
Whatever access privileges
Whatever research topic
Whoever you are
Outputs must be protected to the same standards
The best research is carried out when the richest microdata is made available to those who can be trusted to apply these standards for outputs