Using Targeted Perturbation of Microdata to Protect Against Intelligent Linkage Mark Elliot, University of Manchester Cathie.


Using Targeted Perturbation of Microdata to Protect Against Intelligent Linkage Mark Elliot, University of Manchester Cathie Marsh Centre for Census and Survey Research, University of Manchester

Overview
Linkage experiments using:
– Individual microdata from the UK census
– Microdata from the UK Labour Force Survey (LFS)
Objective:
– To assess the disclosure risk impact of the statistical disclosure control methods used on the census microdata in 2001

Data
Three datasets were used in this study:
– The spring 2001 quarter of the standard Labour Force Survey (LFS).
– The standard release version of the 2001 individual level SAR (post-SDC SAR).
– The pre-SDC version of the 2001 individual level SAR (pre-SDC SAR).

Background
The 2001 SARs were subject to extensive statistical disclosure risk assessment and targeted control methods. The risk assessment was carried out through collaboration between Manchester University and ONS, using a variety of approaches. The disclosure control was a mixture of global recoding and local suppression and reimputation based on a variant of the PRAM (post-randomisation) method.
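As a rough illustration of the idea behind PRAM, the sketch below applies plain post-randomisation to one categorical variable: each value is replaced by a draw from a row of a transition probability matrix. This is generic PRAM, not the targeted variant used on the 2001 SARs, which the slides do not specify. The function name, the tenure codes and the probabilities are all hypothetical.

```python
import random

def pram(values, transition, rng):
    """Apply plain PRAM: redraw each value from its row of the transition matrix.

    transition maps each category to a dict {new_category: probability}.
    """
    perturbed = []
    for v in values:
        categories = list(transition[v].keys())
        weights = list(transition[v].values())
        perturbed.append(rng.choices(categories, weights=weights, k=1)[0])
    return perturbed

# Hypothetical transition matrix for a 3-category variable (illustrative only).
TRANSITION = {
    "own":  {"own": 0.90, "rent": 0.05, "other": 0.05},
    "rent": {"own": 0.05, "rent": 0.90, "other": 0.05},
    "other": {"own": 0.05, "rent": 0.05, "other": 0.90},
}

# An identity matrix leaves every value unchanged (no perturbation).
IDENTITY = {c: {d: (1.0 if c == d else 0.0) for d in TRANSITION} for c in TRANSITION}

original = ["own", "rent", "rent", "other", "own"]
perturbed = pram(original, TRANSITION, random.Random(2001))
unchanged = pram(original, IDENTITY, random.Random(2001))
```

A targeted variant would presumably apply the perturbation only to records flagged as risky rather than to the whole file; that selection step is not shown here.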

Procedure
1) The variables were selected and then the codings of these variables on the different datasets were harmonised.
2) The matching was conducted.
3) The SUDA software was run over the SARs to obtain DIS-SUDA scores for the matches.
4) All the unique matches (one-to-one) were sent to ONS.
5) The matches were verified by ONS Census and LFS divisions.
6) The matches were returned with an indicator placed on the match file indicating whether the match was true or false.
7) The proportion of correct matches was generated under several different assumptions.

Variables
– Age (95 categories for the pre-SDC SAR-LFS match, 44 categories for the post-SDC SAR-LFS match)
– Sex (2 categories)
– Marital status (5 categories)
– Region of residence (11 categories)
– Number of residents (7 categories)
– Primary economic status (9 categories)
– Country of birth (14 categories)
– Ethnic group (15 categories)
– Tenure (5 categories)
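To give a sense of scale, the category counts listed above can be multiplied to get the number of possible key combinations. This is a small worked calculation, assuming all nine variables are used jointly as the match key; the variable names in the code are illustrative labels, not the dataset's own field names.

```python
from math import prod

# Category counts as listed on the Variables slide.
COMMON = {
    "sex": 2,
    "marital_status": 5,
    "region": 11,
    "number_of_residents": 7,
    "primary_economic_status": 9,
    "country_of_birth": 14,
    "ethnic_group": 15,
    "tenure": 5,
}
AGE_CATEGORIES = {"pre_sdc": 95, "post_sdc": 44}

# Size of the theoretical key space for each match.
key_space = {
    name: age * prod(COMMON.values()) for name, age in AGE_CATEGORIES.items()
}

print(key_space["pre_sdc"])   # 691267500
print(key_space["post_sdc"])  # 320166000
```

Most of these combinations will be empty in practice, so the figures are upper bounds on the number of distinct keys, not estimates of uniqueness. The coarser age coding in the released file cuts the key space by a factor of 95/44, roughly 2.2.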

Matching
In principle, we could have used fuzzy matching methods to allow for data divergence. However, the number of direct one-to-one matches was very large on both files, so fuzzy matching was deemed likely to cause an unnecessary administrative burden at the match verification stage. A simple combine-and-sort algorithm was therefore used for the matching.
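The slides do not give details of the combine-and-sort algorithm. One plausible reading is sketched below: records from both files are tagged with their source, combined, sorted on the key variables, and a key is kept as a match only when it occurs exactly once in each file. The function name, field names and toy records are hypothetical.

```python
from itertools import groupby

def one_to_one_matches(file_a, file_b, key_vars):
    """Return (index_in_a, index_in_b) pairs whose key is unique in both files."""
    def key_of(record):
        return tuple(record[v] for v in key_vars)

    # Combine: tag every record with its source file and original position.
    combined = [(key_of(r), "A", i) for i, r in enumerate(file_a)]
    combined += [(key_of(r), "B", i) for i, r in enumerate(file_b)]

    # Sort: identical keys from both files become adjacent.
    combined.sort()

    matches = []
    for _, group in groupby(combined, key=lambda row: row[0]):
        rows = list(group)
        a_rows = [i for _, source, i in rows if source == "A"]
        b_rows = [i for _, source, i in rows if source == "B"]
        # Keep the key only if it is unique on both sides.
        if len(a_rows) == 1 and len(b_rows) == 1:
            matches.append((a_rows[0], b_rows[0]))
    return matches

# Toy records (illustrative values only).
sar = [
    {"age": 34, "sex": "F", "region": 3},
    {"age": 34, "sex": "F", "region": 3},  # duplicate key: not unique in SAR
    {"age": 61, "sex": "M", "region": 7},
    {"age": 20, "sex": "M", "region": 1},  # no counterpart in LFS
]
lfs = [
    {"age": 61, "sex": "M", "region": 7},
    {"age": 34, "sex": "F", "region": 3},
    {"age": 45, "sex": "F", "region": 2},
]

matches = one_to_one_matches(sar, lfs, ["age", "sex", "region"])
print(matches)  # [(2, 0)]
```

Exact matching of this kind finds no link when a single key variable differs between the two sources, which is the data divergence that fuzzy matching would have tolerated.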

Matching
– 6085 one-to-one matches between the pre-SDC SAR and the LFS
– 3130 one-to-one matches between the released (post-SDC) SAR and the LFS
– The matches were sent to ONS for verification

Verification problem
For a significant number of matches, there was no address linkable to the LFS identifying variables.
– This affected 1602 matches (26.32%) against the pre-SDC SARs file and 895 matches (28.95%) against the post-SDC SARs file.
– However, there were no strong relationships with the match key variables.

Results
– 2.74% correct match rate with the pre-SDC SAR
– 1.63% correct match rate with the post-SDC SAR
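The procedure mentions that the proportion of correct matches was generated under several different assumptions, but the slides do not say which. The sketch below shows three plausible treatments of matches that could not be verified: excluding them, counting them all as false, or counting them all as correct. The function, the option names and the counts are hypothetical and are not the study's figures.

```python
def correct_match_rate(correct, false, unverified, treat_unverified="exclude"):
    """Proportion of correct matches under a chosen treatment of unverified matches."""
    if treat_unverified == "exclude":
        # Rate among verified matches only.
        return correct / (correct + false)
    if treat_unverified == "false":
        # Lower bound: every unverified match is assumed to be wrong.
        return correct / (correct + false + unverified)
    if treat_unverified == "correct":
        # Upper bound: every unverified match is assumed to be right.
        return (correct + unverified) / (correct + false + unverified)
    raise ValueError(f"unknown treatment: {treat_unverified}")

# Toy counts (illustrative only).
rates = {
    option: correct_match_rate(10, 70, 20, option)
    for option in ("exclude", "false", "correct")
}
```

With these toy counts the rate ranges from 0.1 to 0.3 depending on the treatment, which shows why the share of unverifiable matches matters when reading a single headline rate.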

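[Table, pre-SDC SAR: matches by DIS-SUDA band. Columns: DIS-SUDA band, false matches, correct matches, % correct, total. Cell values not preserved in the transcript.]

[Table, pre-SDC SAR: matches above each DIS-SUDA threshold. Columns: DIS-SUDA threshold, false matches, correct matches, % correct. Cell values not preserved in the transcript.]

[Table, post-SDC SAR: matches by DIS-SUDA band. Columns: DIS-SUDA band, false matches, correct matches, % correct, total. Cell values not preserved in the transcript.]

[Table, post-SDC SAR: matches above each DIS-SUDA threshold. Columns: DIS-SUDA threshold, false matches, correct matches, % correct. Cell values not preserved in the transcript.]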
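The threshold tables appear to be cumulative versions of the band tables: for each DIS-SUDA threshold, they count the false and correct matches among records scoring above it. A minimal sketch of that tabulation follows, assuming each verified match carries a score and a true/false indicator. The scores, thresholds and function name are hypothetical, since the original cell values are not available.

```python
def threshold_table(matches, thresholds):
    """For each threshold, count false/correct matches with score above it.

    matches is a list of (dis_suda_score, is_correct) pairs.
    Returns rows of (threshold, false_matches, correct_matches, pct_correct).
    """
    rows = []
    for t in thresholds:
        above = [is_correct for score, is_correct in matches if score > t]
        correct = sum(above)
        false = len(above) - correct
        pct = 100.0 * correct / len(above) if above else None
        rows.append((t, false, correct, pct))
    return rows

# Toy verified matches (illustrative scores only).
verified = [
    (0.05, False),
    (0.15, False),
    (0.25, True),
    (0.35, False),
    (0.45, True),
]

rows = threshold_table(verified, [0.0, 0.2, 0.4])
for threshold, false, correct, pct in rows:
    print(threshold, false, correct, pct)
```

If the risk score is informative, the percentage correct should rise with the threshold, as it does in this toy data.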

Concluding Remarks
The study provides evidence that the disclosure control methods used in the 2001 SARs provided protection against targeted intrusion.
Caveats:
– Data divergence, coverage, secondary attacks
– Alternative method for identifying risky records