Census 2011 – A Question of Confidentiality
Statistical Disclosure Control for the 2011 Census
Carole Abrahams, ONS Methodology
BSPS – York, September 2011


Overview
– Brief introduction to SDC
– Census outputs & confidentiality
– Record swapping
– Data utility
– 2001 vs 2011
– Communal Establishments
– Further work

Introduction to SDC (1) - What is disclosure risk?
There is a disclosure risk when information is published that could allow an intruder to indicate the identity or particulars of:
– an individual
– a household or family
– a business or another statistical unit

Introduction to SDC (2) - Examples of disclosure risk
– Identification disclosure
– Attribute disclosure (AD)
– Group disclosure
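
As an illustration of attribute disclosure in a frequency table, the sketch below flags rows in which every person falls into a single category, so that anyone known to be in that row is revealed to have that attribute. This "one non-zero cell per row" reading is a common tabular interpretation and is an assumption here, not something stated on the slide. The function name and the counts are made up for the example.

```python
def attribute_disclosure_rows(table):
    """Return the labels of rows that have exactly one non-zero cell.

    `table` maps a row label to a list of category counts.
    """
    flagged = []
    for label, counts in table.items():
        nonzero = [c for c in counts if c > 0]
        if len(nonzero) == 1:
            flagged.append(label)
    return flagged


# Hypothetical counts for one small area: age group by (no illness, illness).
table = {
    "16-24": [12, 3],
    "25-64": [40, 9],
    "65+": [0, 5],  # everyone aged 65+ in this area has the attribute
}

print(attribute_disclosure_rows(table))
```

When the single non-zero cell holds a large count, the same pattern is usually described as group disclosure rather than disclosure about one individual.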

Introduction to SDC (3) - Statistical Disclosure Control
Statistical Disclosure Control (SDC) involves either:
– introducing sufficient ambiguity/damage into, or reducing the level of detail of, published statistics, so that the risk of disclosing confidential information is reduced to an acceptable level
and/or:
– controlling access to data

Census outputs and confidentiality
– Disclosure control of Census outputs required by law
– Pledge on Census forms
– Visible variables
  – use to identify individual/family/household
  – find out something new about them
  – Data Environment Analysis Service (DEAS)
– Sensitive variables
  – defined by DPA

Risk – Utility balance
[Figure: disclosure risk (information about confidential units) plotted against data utility (information about legitimate items), each on a low-to-high scale. Points marked: Original Data, Released Data, No data. A line marks the Maximum Tolerable Risk.]

SDC for Census 2001
– Random record swapping
– Lack of harmonisation and late changes to agreed methodology
– Small cell adjustment (SCA) applied in England, Wales and Northern Ireland, not in Scotland
– SCA protected individual tables, but some remaining risk through differencing
– Effect on utility at low geographies and in creating bespoke geographies
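
The differencing risk mentioned above can be sketched as follows: two tables that are each safe on their own are published for nearly identical areas, and subtracting one from the other exposes counts for the small sliver between them. The area names and counts are invented for illustration.

```python
def difference_tables(larger, smaller):
    """Subtract one frequency table from another, category by category."""
    return {cat: larger[cat] - smaller.get(cat, 0) for cat in larger}


# Hypothetical tenure counts for a bespoke area and a slightly smaller
# standard area nested inside it.
bespoke_area = {"owned": 120, "rented": 45, "other": 7}
standard_area = {"owned": 119, "rented": 45, "other": 6}

sliver = difference_tables(bespoke_area, standard_area)
print(sliver)  # counts for the few households in one area but not the other
```

Neither published table contains a small cell, yet the difference isolates two households.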

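Census Geography
– 104 Delivery Groups (DGs) in England & Wales
– ≈ 4 LADs (local authority districts) in a DG
– ≈ 20 MSOAs (middle layer super output areas) in an LAD
– ≈ 20 OAs (output areas) in an MSOA
– Hierarchy, largest to smallest: DG, LAD, MSOA, OA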
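
Taking the approximate ratios on this slide at face value, the size of each layer can be multiplied out. This is only an order-of-magnitude sketch from the rounded figures above, not an official count of areas.

```python
# Rounded figures from the slide.
delivery_groups = 104
lads_per_dg = 4
msoas_per_lad = 20
oas_per_msoa = 20

approx_lads = delivery_groups * lads_per_dg
approx_msoas = approx_lads * msoas_per_lad
approx_oas = approx_msoas * oas_per_msoa

print(approx_lads, approx_msoas, approx_oas)
```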

SDC for Census 2011
– RsG agreement November 2006
  – Small cell counts permitted as long as there is ‘sufficient uncertainty’
  – Main risk is attribute disclosure
– Targeted record swapping
  – Targeted to ‘risky’ records
  – Risk looks at particular variables, takes account of geography
  – Risk scores for individuals combined to household score
  – Households swapped
  – Households swapped only as far as their risk is considered ‘high’
  – Imputation considered as part protection

Targeted swapping (1)
– Households
  – Risk score on uniqueness/rarity of small number of key variables at different geographies
– Probability
  – inversely related to area imputation rate
  – positively related to household risk score
– Matching
  – look for matches only as far as is necessary
  – match on household size, and other variables if possible
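
A minimal sketch of how a risk score and a swap probability along these lines might be computed. Everything specific here is an assumption for illustration: the key variables, scoring by counting the geography levels at which a person is unique, taking the maximum person score as the household score, and the functional form of the probability. The slides only state the direction of each relationship.

```python
from collections import Counter

KEY_VARS = ("age_band", "ethnic_group")  # hypothetical key variables
LEVELS = ("oa", "msoa", "lad")           # geography fields on each record


def person_risk_scores(people):
    """Score each person by the number of levels at which they are unique."""
    counts = {lvl: Counter() for lvl in LEVELS}
    for p in people:
        key = tuple(p[v] for v in KEY_VARS)
        for lvl in LEVELS:
            counts[lvl][(p[lvl], key)] += 1
    scores = []
    for p in people:
        key = tuple(p[v] for v in KEY_VARS)
        scores.append(
            sum(1 for lvl in LEVELS if counts[lvl][(p[lvl], key)] == 1)
        )
    return scores


def household_risk_scores(people):
    """Combine person scores into a household score (assumed: the maximum)."""
    households = {}
    for p, score in zip(people, person_risk_scores(people)):
        households[p["hh_id"]] = max(households.get(p["hh_id"], 0), score)
    return households


def swap_probability(risk_score, imputation_rate, base_rate=0.05):
    """Assumed form: rises with risk score, falls with area imputation rate."""
    p = base_rate * (1 + risk_score) * (1 - imputation_rate)
    return min(1.0, max(0.0, p))


people = [
    {"hh_id": 1, "oa": "A1", "msoa": "M1", "lad": "L1",
     "age_band": "65+", "ethnic_group": "X"},
    {"hh_id": 2, "oa": "A1", "msoa": "M1", "lad": "L1",
     "age_band": "25-64", "ethnic_group": "Y"},
    {"hh_id": 3, "oa": "A2", "msoa": "M1", "lad": "L1",
     "age_band": "25-64", "ethnic_group": "Y"},
]
print(household_risk_scores(people))
```

In this toy data, household 1 is unique at every level and so scores highest, while households 2 and 3 are unique only within their own OA.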

Targeted swapping – an example of how it works (1)
– Risky within OA: swap with household in another OA in MSOA
– Risky within MSOA: swap with household in another MSOA in LA
– Risky within LA: swap with household in another LA within delivery group
– Household is in an area that has a high response rate, therefore low imputation, so the area has a higher than average swapping rate
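
The three cases above amount to a lookup from the level at which a household is risky to how far it is swapped. The sketch below encodes that lookup. Letting the widest risky level win when more than one flag is set is an assumption, as are the function name and the return convention.

```python
def swap_scope(risky_in_oa, risky_in_msoa, risky_in_la):
    """Return (unit to swap across, area to stay within), or None.

    Assumes the widest level at which the household is risky decides
    how far it is moved.
    """
    if risky_in_la:
        return ("la", "delivery_group")
    if risky_in_msoa:
        return ("msoa", "la")
    if risky_in_oa:
        return ("oa", "msoa")
    return None


print(swap_scope(True, False, False))  # swap with another OA in the same MSOA
print(swap_scope(True, True, False))   # swap with another MSOA in the same LA
```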

Targeted swapping – an example of how it works (2)
– Household found to be risky within OA and is selected for swapping.
– Only swapped between OAs in the same MSOA.
– Households are matched on:
  – Adults = 2
  – Children = 1
  – Pets = 2
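
The matching step in this example could look roughly like the sketch below: search for a household in a different OA of the same MSOA with the same values on the matching variables, then exchange the two households' OA codes. The field names, the default matching variables and the "first match wins" rule are assumptions. A real procedure would presumably choose among candidates at random and match on further variables where possible.

```python
def find_swap_partner(target, households, match_keys=("adults", "children")):
    """Return the first household in another OA of the same MSOA that
    agrees with `target` on every matching variable, or None."""
    for h in households:
        if (
            h["hh_id"] != target["hh_id"]
            and h["msoa"] == target["msoa"]
            and h["oa"] != target["oa"]
            and all(h[k] == target[k] for k in match_keys)
        ):
            return h
    return None


def swap_geography(a, b):
    """Exchange the OA codes of two households in place."""
    a["oa"], b["oa"] = b["oa"], a["oa"]


households = [
    {"hh_id": 1, "oa": "A1", "msoa": "M1", "adults": 2, "children": 1},
    {"hh_id": 2, "oa": "A1", "msoa": "M1", "adults": 2, "children": 1},
    {"hh_id": 3, "oa": "A2", "msoa": "M1", "adults": 2, "children": 0},
    {"hh_id": 4, "oa": "A3", "msoa": "M1", "adults": 2, "children": 1},
    {"hh_id": 5, "oa": "B1", "msoa": "M2", "adults": 2, "children": 1},
]

target = households[0]
partner = find_swap_partner(target, households)
if partner is not None:
    swap_geography(target, partner)
print(target["oa"], partner["hh_id"] if partner else None)
```

Household 2 is skipped because it is in the same OA, household 3 because it differs on children, and household 5 because it is in another MSOA.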

Swapping & Sufficient uncertainty
– Level of swapping in an area determined by level of non-response / imputation
– Swapping lower where more imputed records
– Sufficient uncertainty has been assessed by two factors:
  – Percentage of real attribute disclosures (ADs) protected by imputation & swapping
  – Percentage of apparent ADs created
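
One possible way to compute the two measures is to compare a table before and after protection, as sketched below. The definitions used are assumptions: an AD is taken to be a row with exactly one non-zero cell, a real AD counts as protected if that pattern no longer appears in the published table, and created ADs are expressed as a share of all ADs visible in the published table. The slides do not give the exact definitions, and a record-level assessment of which records were imputed or swapped would differ from this table-level comparison.

```python
def ad_cells(table):
    """Return {(row label, column index)} for rows with one non-zero cell."""
    cells = set()
    for label, counts in table.items():
        nonzero = [i for i, c in enumerate(counts) if c > 0]
        if len(nonzero) == 1:
            cells.add((label, nonzero[0]))
    return cells


def uncertainty_measures(original, published):
    """Return (% of real ADs protected, % of published ADs that are not real)."""
    real = ad_cells(original)
    apparent = ad_cells(published)
    protected = real - apparent  # real ADs no longer visible as ADs
    created = apparent - real    # ADs that appear only after protection
    pct_protected = 100 * len(protected) / len(real) if real else 0.0
    pct_created = 100 * len(created) / len(apparent) if apparent else 0.0
    return pct_protected, pct_created


# Hypothetical tables before and after swapping and imputation.
original = {"r1": [0, 5], "r2": [3, 0], "r3": [4, 2], "r4": [6, 1]}
published = {"r1": [1, 4], "r2": [3, 0], "r3": [6, 0], "r4": [6, 1]}

print(uncertainty_measures(original, published))
```

Here one of the two real ADs is hidden, and one of the two ADs an intruder would see in the published table is not real, which is the kind of doubt the approach relies on.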

Effect of targeted swapping on data utility
[Figure: two panels, "LLTI by OA" and "LLTI by MSOA", showing the typical effect of swapping on numbers of people with a limiting long-term illness (LLTI). Based on 2001 data.]
– Utility higher at MSOA than at OA

Summary of SDC methodology
– Main effect on utility will be for small cells at low level geographies
– Tables will be consistent and additive
– Will use minimum average cell size
– All univariate residence-based tables at OA publishable
– There will be no small cell adjustment
– Tables will contain apparent small cells and apparent ADs, but an intruder can’t find out something about an individual case with a “high degree of confidence”
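
A minimum average cell size rule can be checked as sketched below: the table total divided by the number of cells must reach a threshold before the table is released. The threshold value and function names are placeholders, since the slides do not state the figure used.

```python
def average_cell_size(cells):
    """Mean count per cell of a frequency table given as a flat list."""
    return sum(cells) / len(cells)


def meets_minimum_average(cells, min_average):
    """True if the table's average cell size reaches the threshold."""
    return average_cell_size(cells) >= min_average


# Hypothetical table for one area, flattened to a list of cell counts.
cells = [4, 0, 6, 2]
HYPOTHETICAL_THRESHOLD = 2.0  # placeholder, not a figure from the slides

print(average_cell_size(cells), meets_minimum_average(cells, HYPOTHETICAL_THRESHOLD))
```

A rule of this kind limits how sparse a table can be without altering any individual cell, which is consistent with the statement that tables stay additive.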

Communal establishments
– For client residents:
– For staff residents:

Further work
– Minority population outputs
– Flow data
– Microdata
– Workplace tables
– Commissioned tables
Contact: SDC