Imputation as a Practical Alternative to Data Swapping


Imputation as a Practical Alternative to Data Swapping
Saki Kinney, David Wilson, Alan Karr (RTI); Kelly Kang (NSF)
29th July 2019

Statistical Disclosure Control

Agencies frequently publish microdata files that have been altered to protect the confidentiality of the individuals represented in the dataset. The altered data still retain many important statistical properties of the original data. Statistical disclosure control (SDC) methods strive to balance data quality and confidentiality protection.

Examples of SDC methods:
  Data reduction: top-coding, coarsening, rounding, suppression
  Data perturbation: data swapping, synthetic data (imputation)
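Two of the data reduction methods named above, top-coding and rounding, are simple enough to sketch in a few lines. The function names, the cap, the rounding base, and the income values below are all illustrative assumptions and do not come from the presentation.

```python
def top_code(values, cap):
    """Top-coding: report every value above `cap` as `cap` itself."""
    return [min(v, cap) for v in values]


def round_to_base(values, base):
    """Rounding: report each value as the nearest multiple of `base`."""
    return [base * round(v / base) for v in values]


# Made-up incomes, used only to show the effect of each method.
incomes = [18_200, 43_750, 61_300, 250_000, 1_400_000]

top_coded = top_code(incomes, 150_000)   # extreme values are hidden
rounded = round_to_base(incomes, 1_000)  # fine detail is removed

print(top_coded)
print(rounded)
```

Both methods reduce detail rather than alter it: top-coding hides the unusual high values that make a record easy to recognise, and rounding removes exact amounts that could be matched against an external file.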

Data Swapping and Imputation

In data swapping, selected records have their values of selected variables exchanged with those of other, similar records. This adds uncertainty to any attempted record linkage.

In synthetic data applications, larger portions of a dataset, sometimes the entire dataset, typically have their values replaced with multiple imputations. This can provide high disclosure protection while allowing users to make valid inferences that account for the uncertainty introduced by the disclosure protection.
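As a rough illustration of the swapping idea, the sketch below exchanges one variable between randomly paired records that share a grouping variable. The presentation does not say how "similar" records are defined or how swap partners are chosen, so matching on a single key, the function name `swap_within_groups`, the `rate` parameter, and the toy records are all assumptions.

```python
import random


def swap_within_groups(records, swap_var, group_var, rate, seed=0):
    """Swap `swap_var` between randomly paired records sharing `group_var`.

    Each record is selected with probability `rate`; selected records in the
    same group are paired off at random and exchange their `swap_var` values.
    An unpaired (odd) record in a group is left unchanged.
    """
    rng = random.Random(seed)
    out = [dict(r) for r in records]  # copy so the input is not modified

    # Index records by group, so partners are "similar" on `group_var`.
    groups = {}
    for i, rec in enumerate(out):
        groups.setdefault(rec[group_var], []).append(i)

    for indices in groups.values():
        chosen = [i for i in indices if rng.random() < rate]
        rng.shuffle(chosen)
        for a, b in zip(chosen[0::2], chosen[1::2]):
            out[a][swap_var], out[b][swap_var] = out[b][swap_var], out[a][swap_var]
    return out


# Toy microdata, invented for illustration.
records = [
    {"id": 1, "region": "N", "age": 34, "income": 41_000},
    {"id": 2, "region": "N", "age": 58, "income": 67_500},
    {"id": 3, "region": "N", "age": 45, "income": 52_300},
    {"id": 4, "region": "S", "age": 29, "income": 38_900},
    {"id": 5, "region": "S", "age": 63, "income": 71_200},
    {"id": 6, "region": "S", "age": 51, "income": 59_800},
]

swapped = swap_within_groups(records, "income", "region", rate=1.0)
for rec in swapped:
    print(rec)
```

Because values are only moved between records, the set of incomes within each region is unchanged. The link between income and variables that were not swapped, such as age here, can be weakened.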

Our paper

We propose to apply imputation in the paradigm of swapping. That is, a select portion of records has its values replaced with (single) imputations.

Compared to swapping:
  Imputation is simpler to implement with open source software
  The model-based approach is more flexible, intuitive, and transparent
  Imputation preserves marginal distributions approximately, whereas swapping preserves them exactly
  Imputation better preserves relationships between perturbed and unperturbed variables
  Higher perturbation levels can be used, enhancing disclosure protection

We conducted experiments to demonstrate these benefits. See the poster for results and discussion.
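The proposal can be sketched as follows: select a portion of records, then replace one variable in those records with a single draw from a model fitted to the data. The presentation does not specify the imputation model, so the simple linear regression with normal noise used here is an assumption, as are the function name `impute_selected`, its parameters, and the simulated data. The sketch also ignores uncertainty in the fitted coefficients.

```python
import random
import statistics


def impute_selected(records, target, predictor, rate, seed=0):
    """Replace `target` in a random portion of records with one imputed value.

    Fits target = intercept + slope * predictor by least squares on all
    records, then, for each record selected with probability `rate`, draws
    a replacement from the fitted line plus normal residual noise.
    Returns the perturbed copy and the indices of the perturbed records.
    """
    rng = random.Random(seed)
    xs = [r[predictor] for r in records]
    ys = [r[target] for r in records]

    # Ordinary least squares for one predictor.
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x

    # Residual standard deviation, used as the noise scale for the draws.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = (sum(e * e for e in residuals) / (len(xs) - 2)) ** 0.5

    out, perturbed = [], []
    for i, rec in enumerate(records):
        new = dict(rec)
        if rng.random() < rate:
            new[target] = intercept + slope * rec[predictor] + rng.gauss(0, sigma)
            perturbed.append(i)
        out.append(new)
    return out, perturbed


# Simulated microdata: income depends roughly linearly on age.
data_rng = random.Random(1)
records = []
for i in range(200):
    age = data_rng.randint(20, 65)
    income = 15_000 + 900 * age + data_rng.gauss(0, 5_000)
    records.append({"id": i, "age": age, "income": income})

protected, perturbed = impute_selected(records, "income", "age", rate=0.3)
print(len(perturbed), "of", len(records), "records perturbed")
print("mean income before:", round(statistics.fmean(r["income"] for r in records)))
print("mean income after: ", round(statistics.fmean(r["income"] for r in protected)))
```

Because replacements are drawn from a model conditional on age, the income and age relationship is carried into the perturbed records, while the marginal distribution of income is only approximately preserved. This mirrors the trade-off listed in the comparison with swapping.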