

Statistical confidentiality and privacy: 1. General considerations * * * Robert McCaa, Minnesota Population Center * * * “Inadequate use of microdata has high costs” --Len Cook (2003, Registrar General, ONS)

UNSD Principles and Recommendations (Rev. 1, 1997) endorse dissemination of census microdata » §1.218: “There are a range of methods…that can be used to make such microdata available while still protecting individuals’ rights to privacy.” (Rev. 2 has a stronger statement.) » In four decades of distributing microdata there is not a single allegation of a breach of confidentiality or privacy (includes 100% microdata stored at CELADE in Santiago, Chile).

Why disseminate microdata? Julia Lane, European Statisticians Conference (2003) » 1. Analyze more realistic questions » 2. Develop reality-based policy » 3. Acquire new constituencies and stakeholders » 4. Build trust; reduce suspicions of data cooking » 5. Replicate findings » a. use standards of UNSD, Eurostat, ISCO, ISCED, etc. » b. facilitate comparative research in time and space » 6. Calculate marginal effects » 7. Assess data quality » …and much, much more….

Confidentializing an integrated microdata base with: » 200+ samples of households (70+ countries) » Containing ½ billion person records with thousands of variables » Available to tens of thousands of licensed users regardless of country of birth, citizenship, residence or place of work » Without a single allegation of violation of privacy or statistical confidentiality -- What’s the problem?

Usage: Off-site vs. on-site use (secure microdata laboratory)? » Germany RDC, Jan-Sept: ten-to-one » RDCs are expensive and attract few users.

“Statistical disclosure control methods may modify the data or the design of the statistic, or a combination of both. They will be judged sufficient when the guarantee of confidentiality can be maintained, taking account of information likely to be available to third parties, either from other sources or as previously released National Statistics outputs, against the following standard: ‘It would take a disproportionate amount of time, effort and expertise for an intruder to identify a statistical unit to others, or to reveal information about that unit not already in the public domain.’” --Protocols on Data Access and Confidentiality, pp. 7-8, ONS-UK (2004)

Risk assessment of household samples of UK 1991 census: attempts at matching are “fruitless” » Few matches; many false positives » Holds after taking into account errors in the data, coding variability and changes in personal characteristics over time » Dale and Elliott, JRSS-A (2003): “For a user of an outside database, attempting this sort of match with no opportunity for verification would prove fruitless. In the first place, the small degree of expected overlap would be a considerable deterrent to an intruder. However, if a match between the two files was attempted the large number of apparent matches would be highly confusing as an intruder would have no way of checking correct identification.”
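The two effects named in the quotation (small expected overlap, many apparent matches) can be illustrated with simple arithmetic. The sketch below is not the method used in the cited risk assessment; it assumes simple random sampling, and the function names and all figures in the usage note are hypothetical.

```python
def expected_true_matches(n_targets, sampling_fraction):
    """Expected number of the intruder's targets that are in the sample at all.

    Assumes every person has the same chance (sampling_fraction) of selection.
    """
    return n_targets * sampling_fraction


def expected_false_matches(n_targets, sampling_fraction, others_sharing_key):
    """Expected sample records that match a target's key but are someone else.

    others_sharing_key is the (hypothetical) number of other people in the
    population with the same combination of matching variables as a target.
    """
    return n_targets * sampling_fraction * others_sharing_key


def share_of_matches_correct(others_sharing_key):
    """Expected true matches as a share of all expected apparent matches."""
    return 1.0 / (1.0 + others_sharing_key)
```

With hypothetical inputs of 1,000 targets, a 1% sample and 9 other people sharing each target's key, the expected overlap is about 10 records, against about 90 apparent matches that belong to someone else, and the intruder cannot tell the two groups apart.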

Level of Anonymization (FSO-Germany) » [Figure: degree of confidentiality plotted against degree of analysis potential. Stage labels: complete microdata, confidential microdata, de-facto anonymised microdata, fully anonymised microdata. Step labels: delete direct identifier, anonymisation method, stronger anonymisation method.] » Trade-off between confidentiality and analysis potential: is it monotonic (as portrayed)?

Level of Anonymization: not monotonic » [Figure: same diagram as the previous slide, annotated with “Construct sample” and the values 95%, 50%, 25%, 45%, 99%, 99.9%; the placement of these values is not recoverable from the transcript.] » Trade-off is not monotonic

Resources » UN-ECE (2007), Managing Statistical Confidentiality & Microdata Access » IHSN Tools & Guidelines, anonymization: » Eurostat (1999)

UN-ECE (2007)

IHSN www.surveynetwork.org

IHSN www.surveynetwork.org
1. Remove variables
» Identifiers: name, address, low-level administrative geography
» Sensitive: tribe, disability
2. Global recoding
» Aggregate classes: age (5-year groups), country of birth (continent), administrative geography, occupation (4 digits to 3), etc.
» Top and bottom coding (continuous variables: income, size of residence, number of rooms, etc.)
3. Local suppression: sparse categories (population n < 250…2,500)
4. Data swapping (household geography)
5. Complex perturbations
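Steps 2 and 3 of the list above can be sketched in a few lines. This is an illustration only, not IHSN tooling: the function names are hypothetical, and the suppression threshold is applied here to record counts in the file, whereas the slide states its thresholds in terms of population counts.

```python
from collections import Counter


def age_group(age, width=5):
    """Global recoding: map a single year of age to a band of `width` years."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def truncate_code(code, digits=3):
    """Global recoding: keep only the leading digits of a classification code."""
    return code[:digits]


def top_code(value, cap):
    """Top coding: report any value above the cap as the cap itself."""
    return min(value, cap)


def bottom_code(value, floor):
    """Bottom coding: report any value below the floor as the floor itself."""
    return max(value, floor)


def suppress_sparse(values, min_count, marker="suppressed"):
    """Local suppression: replace categories held by fewer than min_count records."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else marker for v in values]
```

For example, `age_group(37)` gives `"35-39"`, and `truncate_code("2411")` gives `"241"`. Data swapping and complex perturbations (steps 4 and 5) are not sketched because the slide does not specify how they are carried out.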

EUROSTAT statistical confidentiality standards (Thorogood, 1999) --all endorsed by IPUMS-International » 1. Restrict access to samples » 2. Limit geographical detail » 3. Re-code unique categories--top and bottom » 4. Sign non-disclosure agreement » 5. Prohibit redistribution to third parties » 6. Prohibit attempts to identify individuals or the making of any claim to that effect » 7. Require users to provide copies of publications

EUROSTAT statistical confidentiality standards (Thorogood, 1999) --all endorsed by IPUMS-International » 8. Construct age from birthdate, if necessary » 9. Do not identify date of birth » 10. Do not identify precise place of birth » 11. Migration: timing/place not identified in detail » 12. Identify place of residence by major civil division (pop>20k, 60k, 100k, 1 million, i.e., national convention) » 13. Do sensitivity analysis » 14. Do confidentiality assessment (not yet)
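Items 8, 9 and 12 describe simple transformations, sketched below under stated assumptions. The function names, the "other" code and the default threshold of 20,000 (the lowest value quoted on the slide) are illustrative choices, not part of the Eurostat standards or of IPUMS-International practice.

```python
from datetime import date


def age_at_census(birth, census):
    """Items 8-9: release completed years of age instead of the date of birth."""
    had_birthday = (census.month, census.day) >= (birth.month, birth.day)
    return census.year - birth.year - (0 if had_birthday else 1)


def residence_code(unit_code, unit_population, threshold=20_000, other="other"):
    """Item 12: identify a civil division only if its population exceeds the threshold.

    Divisions at or below the threshold are grouped under a single residual code.
    """
    return unit_code if unit_population > threshold else other
```

A person born on 15 June 1980 and enumerated on 27 March 2011 is released with age 30, and the birthdate itself is dropped from the file. The threshold would be set per country, since the slide notes that it follows national convention.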

Countering Fear, Hysteria and Paranoia…with reason » “There has been no known attempt at identification with the 1991 SARs [microdata samples of the UK] - nor in any other countries that disseminate samples of microdata” --Elliott and Dale, Journal of the Royal Statistical Society, 1999

Why Not? » Companies want linkable data with names, addresses, ID #s, etc. » Probabilistic linking with 90% of the population missing is not good enough » [Figure: ChoicePoint Data Sources and Clients. Source: Washington Post]

To play “pizza” video:


Please allow me to invite you to think about producing (or permitting IPUMS to produce) anonymized, integrated samples for all the censuses of your country for which microdata survive… Thank you * * * Contact: this ppt is available at: See “Port of Spain workshop”