WP 19 Assessment of Statistical Disclosure Control Methods for the 2001 UK Census Natalie Shlomo University of Southampton Office for National Statistics.



Topics of Discussion
1. Introduction
2. SDC methods used in the 2001 UK Census
3. Data used
4. Disclosure risk assessment
5. Data utility assessment
6. R-U confidentiality maps
7. Conclusions

Introduction
1. SDC methods for the 2001 UK Census:
- Original methods: random record swapping for all of the UK, and higher thresholds in England and Wales (E and W)
- Re-assessment of disclosure risk for the 2001 Census: 100% of the questionnaire was coded; advancing technology, small area statistics and other external files on the web; perception of risk
- Additional method of small cell rounding for E and W (which led to differential SDC methods across the UK Statistical Offices)
2. Need to assess the SDC methods within a disclosure risk-data utility framework in order to develop strategies for the 2011 Census

SDC Methods
Pre-tabular method of random record swapping:
- A random sample of households is selected at a fixed swapping rate within control strata defined by local authority, household size, sex and broad age distribution, and hard-to-count index
- For each selected household, a paired household is selected within the same control stratum (if none is found, the selection goes beyond the local authority)
- All geographical variables are swapped:
  - fewer edit failures (assumes conditional independence of census target variables and geographies given the control variables)
  - perturbs a highly matchable variable
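
The swapping procedure above can be sketched roughly as follows. This is a minimal illustration, not the ONS implementation: the record layout (keys `id`, `stratum`, `geography`) and the function name are hypothetical, partners are drawn only from unselected households in the same stratum, and the fallback search beyond the local authority is left out.

```python
import random
from collections import defaultdict


def random_record_swap(households, rate, seed=0):
    """Swap geography between randomly paired households within a stratum.

    households: list of dicts with hypothetical keys 'id', 'stratum'
                and 'geography' (assumed layout, for illustration only).
    rate:       fraction of households per stratum selected for swapping.
    Returns a new list; the input records are not modified.
    """
    rng = random.Random(seed)
    swapped = [dict(h) for h in households]

    # Group record positions by control stratum.
    by_stratum = defaultdict(list)
    for pos, h in enumerate(swapped):
        by_stratum[h["stratum"]].append(pos)

    for positions in by_stratum.values():
        n_select = min(int(round(rate * len(positions))), len(positions))
        selected = rng.sample(positions, n_select)
        selected_set = set(selected)

        # Candidate partners: unselected households in the same stratum.
        partners = [p for p in positions if p not in selected_set]
        rng.shuffle(partners)

        # Selected households without an available partner stay unswapped
        # (the real method would widen the search instead).
        for pos, partner in zip(selected, partners):
            a, b = swapped[pos], swapped[partner]
            a["geography"], b["geography"] = b["geography"], a["geography"]
    return swapped
```

Because only the geography is exchanged, the set of geography values and every household's control stratum are unchanged, which is why totals stay consistent at the level of the control variables. Note that each selected household also perturbs its partner, so the share of records with changed geography can be up to twice the swapping rate in this sketch.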

SDC Methods
Pre-tabular method of random record swapping (cont.):
For this analysis we also examined:
- Targeted record swapping, where households are selected and paired with other households that are in small cells (ones and twos) of selected tables
- Random record swapping not including imputed records, since imputation gives a priori protection and there is no need to perturb them or take them into account in the risk assessment
- Three swapping rates: 1%, 10%, 20%

SDC Methods
Pre-tabular method of record swapping (cont.):
Advantages:
- Consistent totals for all tables
- Preserves marginal distributions at higher aggregated levels
- Some protection against disclosure by differencing two non-coterminous tables
- Fewer edit failures when swapping geographies
- Targeted swapping lowers disclosure risk
Disadvantages:
- Leaves a high proportion of risky (unique) records unperturbed
- Errors (bias) in data, in particular distorted joint distributions
- Effects of perturbation are hidden and can't be measured or accounted for in statistical analysis, i.e. a number in a table is not the true value
- Method is not transparent to users and appears as if no SDC method was used
- Targeted swapping causes more distortion in the distributions of the table

SDC Methods
Post-tabular method of small cell rounding:
- Small cell rounding (SCA): random rounding to base 3 of small cells
- The perturbation has a mean of zero and a variance of 2
- Marginal totals are obtained by adding perturbed and non-perturbed cells
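
A small sketch may help show where the mean of zero and variance of 2 come from. It assumes the usual unbiased scheme, in which a cell with residual r (modulo 3) is rounded up with probability r/3 and down otherwise, and it treats "small cells" as counts of 1 and 2. The function names are illustrative only.

```python
import random
from fractions import Fraction


def random_round(value, base=3, rng=random):
    """Unbiased random rounding of a non-negative integer to a multiple of base."""
    residual = value % base
    if residual == 0:
        return value
    # Round up with probability residual/base, otherwise round down.
    if rng.random() < residual / base:
        return value + (base - residual)
    return value - residual


def small_cell_round(table, base=3, rng=random):
    """Apply random rounding only to small cells (counts 1 .. base-1)."""
    return [random_round(v, base, rng) if 0 < v < base else v for v in table]


def perturbation_moments(residual, base=3):
    """Exact mean and variance of the perturbation for a given residual."""
    p_up = Fraction(residual, base)
    up, down = base - residual, -residual
    mean = p_up * up + (1 - p_up) * down
    var = p_up * up ** 2 + (1 - p_up) * down ** 2 - mean ** 2
    return mean, var
```

For a count of 1 the perturbation is -1 with probability 2/3 and +2 with probability 1/3; for a count of 2 it is -2 with probability 1/3 and +1 with probability 2/3. In both cases `perturbation_moments` gives a mean of 0 and a variance of 2, matching the figures on the slide.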

SDC Methods
Post-tabular method of small cell rounding (cont.):
For this analysis we also examined:
- Full random rounding (CRND): random rounding to base 3 for all entries. First convert all entries to their residuals of base 3 and apply the same method as SCA. The overall total of the table is preserved by controlling the stochastic process
- Semi-controlled random rounding (CSCA): the overall total of the table is preserved by controlling the stochastic process
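
The slides do not say how the stochastic process is controlled. One possible approach, shown here purely as an assumption, is systematic selection: the cells are put in random order, a random start in {0, 1, 2} is drawn, and a cell is rounded up when the start "hits" its residual. Each cell is still rounded up with probability r/3, but the number of round-ups is fixed to within one, so the table total moves by less than the base.

```python
import random


def controlled_random_round(table, base=3, rng=random):
    """Random rounding to base with the overall total kept nearly fixed.

    Illustrative sketch (systematic selection), not the method used by ONS.
    """
    # Only cells with a non-zero residual need rounding.
    order = [i for i, v in enumerate(table) if v % base]
    rng.shuffle(order)
    start = rng.randrange(base)

    rounded = list(table)
    cum = 0  # running sum of residuals over the shuffled cells
    for i in order:
        r = table[i] % base
        # Cell i "owns" the integers cum .. cum + r - 1; it is rounded up
        # if one of them is congruent to the random start modulo base.
        hit = any((cum + j) % base == start for j in range(r))
        rounded[i] = table[i] - r + (base if hit else 0)
        cum += r
    return rounded
```

When the original total is itself a multiple of 3 the rounded total equals it exactly; otherwise the rounded total is one of the two neighbouring multiples of 3. Restricting `order` to the small cells would give a semi-controlled variant of small cell rounding along the same lines.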

SDC Methods
Post-tabular method of small cell rounding (cont.):
Advantages:
- Full protection for the high-risk (unique) cells
- Full rounding protects against disclosure by differencing two non-coterminous tables
- Small cell rounding has less information loss
- Methods are clear and transparent to users
- Stochastic methods can be accounted for in statistical analysis
Disadvantages:
- Inconsistent totals between tables when margins are aggregated from perturbed cells
- Small cell rounding gives little protection against disclosure by differencing, so only one set of geographies and other variables is disseminated
- Full rounding has margins rounded separately and tables aren't additive
- Stochastic methods of rounding are easier to unpick and tables may need to be audited prior to release

Data Used
Estimation Area: SJ (Southwest England)
437,744 persons, 182,337 households, output areas
5 standard census tables (the number of categories is in parentheses):
- Religion (9) * Age-sex (6) * OA
- Travel to work (12) * Age-sex (12) * OA
- Country of birth (17) * Sex (2) * OA
- Economic activity (9) * Sex (2) * Long-term illness (2) * OA
- Health status (5) * Age-sex (14) * OA

Disclosure Risk
- Assume disclosure risk arises only when small cells are in the table, i.e. record swapping has disclosure risk since small cells are not eliminated, but rounding has no disclosure risk
- Assume there is no risk of disclosure by differencing, since only one set of variables and geographies is disseminated
- Take into account that imputed records have no disclosure risk
- Disclosure risk measure: number of records that were perturbed or imputed in the small cells of the tables, out of all the records in the small cells
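
The ratio described above could be computed along the following lines. The record layout (keys `cell`, `perturbed`, `imputed`) is hypothetical, and small cells are assumed to be those with counts of 1 or 2. The function returns the ratio exactly as worded on the slide, i.e. the share of small-cell records that were perturbed or imputed.

```python
from collections import Counter


def small_cell_protection(records, threshold=2):
    """Share of records in small cells that were perturbed or imputed.

    records:   list of dicts with hypothetical keys 'cell' (the table cell
               the record falls in), 'perturbed' and 'imputed' (booleans).
    threshold: largest count still treated as a small cell (assumed 2).
    Returns None when the table has no small cells.
    """
    counts = Counter(r["cell"] for r in records)
    in_small = [r for r in records if counts[r["cell"]] <= threshold]
    if not in_small:
        return None
    protected = sum(1 for r in in_small if r["perturbed"] or r["imputed"])
    return protected / len(in_small)
```

One minus this value is the share of small-cell records left untouched by both swapping and imputation.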

Disclosure Risk
- 16% a priori protection due to imputation
- No impact on disclosure risk at the 1% swapping rate
- Targeted record swapping lowers disclosure risk

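Data Utility
- Distance metrics between original and perturbed cells in each OA, averaged across all OAs
- Let $D^k$ be a table for OA $k$, $n_k$ the number of cells in OA $k$, $n_{OA}$ the number of OAs in the area, and $D^k(c)$ the cell frequency for cell $c$
- Average Absolute Distance per Cell (AAD):
  $$AAD(D_{orig}, D_{pert}) = \frac{1}{n_{OA}} \sum_{k=1}^{n_{OA}} \frac{1}{n_k} \sum_{c \in k} \left| D^k_{pert}(c) - D^k_{orig}(c) \right|$$
- Aggregation of perturbed cells and effects on sub-totals: users aggregate perturbed lower-level geographies to obtain non-standard geographies
- Calculate the sub-totals of the aggregated cells [formula missing in the transcript]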
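
As a rough illustration of the average absolute distance per cell: for each OA, take the mean absolute difference between perturbed and original cell counts, then average those means over all OAs. The data layout (a dict from OA code to a list of cell counts) and the function name are assumptions made for this sketch.

```python
def average_absolute_distance(original, perturbed):
    """Average absolute distance per cell, averaged over output areas.

    original, perturbed: dicts mapping an OA code to a list of cell
    counts, with cells in the same order in both tables (assumed layout).
    """
    per_oa = []
    for oa, orig_cells in original.items():
        pert_cells = perturbed[oa]
        if len(orig_cells) != len(pert_cells):
            raise ValueError(f"cell count mismatch in OA {oa}")
        # Mean absolute difference over the cells of this OA.
        total = sum(abs(p - o) for o, p in zip(orig_cells, pert_cells))
        per_oa.append(total / len(orig_cells))
    # Average of the per-OA distances across all OAs.
    return sum(per_oa) / len(per_oa)
```

A value of 0 means the perturbed tables are identical to the originals; larger values indicate more information loss per cell.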

Data Utility

R-U Confidentiality Map
- 1% swapping rates have high utility but very high disclosure risk
- 10% targeted record swapping has the same disclosure risk as 20% random record swapping but much more utility
- Higher utility for random record swapping not including imputed records

Conclusions
- The SDC methods of record swapping and rounding used for the 2001 UK Census managed the disclosure risk
- Random record swapping alone gives little protection against disclosure risk. Targeted record swapping lowers risk but has higher information loss because of hidden biases
- Small cell adjustments give protection against disclosure risk but produce different totals for tables with the same population base. Utility can be raised by controlled rounding (if possible) or semi-controlled rounding
- To avoid disclosure by differencing, only one set of standard geographies and other variables is disseminated. This also lowers the utility of the census tables

Developing Strategies for Census 2011
- Consistent SDC methods across all UK Statistical Offices that disseminate Census data
- Methods need to ensure that sufficient statistics (totals, averages and variances) are not compromised
- Flexible table-generating software should be developed, where the SDC method would be applied only once on the final output table and the table would not be aggregated from lower-level geographies
- Improved GIS systems may allow more flexible dissemination of non-nested geographies
- SDC methods should be tailored to the type of output: standard tables, microdata, origin-destination tables, etc.