Confidentiality Issues with “Small Cell” Data Michael C. Samuel, DrPH STD Control Branch California Department of Public Health 2008 National STD Prevention.

Presentation transcript:

Confidentiality Issues with “Small Cell” Data Michael C. Samuel, DrPH STD Control Branch California Department of Public Health 2008 National STD Prevention Conference Confronting Challenges, Applying Solutions Chicago, Illinois, March 10-13, 2008

* NOT the issue of “small cells” with expected value < 5 and the impact on chi-square tests
Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
Total

Confidentiality – Types of Disclosure
Identity Disclosure
–Identity of an individual can be determined based on the released data
–Or …can reasonably be determined…
Attribute Disclosure
–Confidential information about an individual is revealed based on the released data
–Or “sensitive” information; or “embarrassing” information

Extensive Literature
Key Resource: Federal Committee on Statistical Methodology, Office of Management and Budget

Key Concepts
Release of public health data
–Balance obligations to protect the public’s health with obligations to respect individual privacy & confidentiality
If “significant” risks
–“Statistical Disclosure Limitation”
True Risk versus Perception of Risk

Disclosure Limitation with Tabular Data
If cells are deemed sensitive based on a specified threshold rule
–Alter the underlying “line-listed” data or “microdata” before the tables are constructed (may be a particularly relevant technique for on-line query systems)
–Change the table: aggregate rows or columns
–Suppress cells

Threshold Rules
Numerator rule
–e.g. cell size <3, <5 (many)
Population denominator rule
–e.g. population < 20,000 (HIPAA-based), <50
Numerator and population denominator rule
–numerator > 10 AND denominator > 50 (Oregon cancer registry)
Population denominator minus numerator rule
–e.g. population minus cell count < 10 (Missouri)
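The four rule types above can be sketched as simple predicates. This is only an illustration: the function names and default thresholds are assumptions chosen from the examples on the slide, and the Oregon rule is read here as a condition for release, which the slide does not state explicitly.

```python
def sensitive_by_numerator(count: int, threshold: int = 5) -> bool:
    """Numerator rule: flag the cell if its count is below the threshold."""
    return count < threshold


def sensitive_by_denominator(population: int, threshold: int = 20_000) -> bool:
    """Population denominator rule: flag the cell if the population is too small."""
    return population < threshold


def releasable_by_both(count: int, population: int,
                       min_count: int = 10, min_population: int = 50) -> bool:
    """Numerator and denominator rule, read as a release condition (assumption)."""
    return count > min_count and population > min_population


def sensitive_by_difference(count: int, population: int, threshold: int = 10) -> bool:
    """Denominator minus numerator rule: flag if too few people are NOT in the cell."""
    return (population - count) < threshold


# Hypothetical cell: 95 cases in a population of 100.
print(sensitive_by_numerator(95))        # False: the count itself is large
print(sensitive_by_difference(95, 100))  # True: only 5 people are not cases
```

The last two lines show why a numerator-only rule can miss a risky cell: a large count in a small population says something about almost everyone in that population.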

Cell Suppression
Simple Cell Suppression
Random Rounding
Controlled Rounding
Controlled Tabular Adjustment
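As one illustration of the methods listed, random rounding can be sketched as below. This is a common unbiased form of the technique, not necessarily the variant the presenter had in mind; the function name and the base of 5 are assumptions.

```python
import random
from typing import Optional


def random_round(count: int, base: int = 5,
                 rng: Optional[random.Random] = None) -> int:
    """Round a count to a neighbouring multiple of `base`, chosen at random.

    The count is rounded up with probability remainder/base and down
    otherwise, so the expected value of the result equals the true count.
    Counts that are already multiples of `base` are left unchanged.
    """
    if rng is None:
        rng = random.Random()
    remainder = count % base
    if remainder == 0:
        return count
    lower = count - remainder
    if rng.random() < remainder / base:
        return lower + base
    return lower


# Hypothetical counts; each result is a multiple of 5.
rng = random.Random(2008)
print([random_round(c, rng=rng) for c in [1, 3, 7, 10, 14]])
```

Because each cell is rounded independently, rounded cells need not add up to rounded totals. Controlled rounding addresses that by constraining the rounding so the table stays additive.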

No Suppression (“With Disclosure”)
Numbers from Working Paper 22, from Cox 1986
Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
Total

Simple Suppression
Numbers from Working Paper 22, from Cox 1986
Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
sss s10 s s35
Total
s – data withheld to limit disclosure

Simple & Complementary Row and/or Column Suppression
Numbers from Working Paper 22, from Cox 1986
Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
sss ss s10 s25 35+s147s35
Total
s – data withheld to limit disclosure

Simple & Complementary Suppression
Numbers from Working Paper 22, from Cox 1986
Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
sss ss s10 s25 35+s147s35
Total
s – data withheld to limit disclosure
= 1 based on linear combinations
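The point that a suppressed value can still be derived from published totals can be illustrated with a small audit sketch. The table values below are hypothetical, since the slide's numbers did not survive in this transcript, and the function name is an assumption. The sketch only catches the simplest case, where a row or column has exactly one suppressed cell. Disclosure through combinations of several rows and columns, as on the slide, would need a fuller linear-programming audit.

```python
from typing import Dict, List, Optional, Tuple


def audit_suppression(cells: List[List[Optional[int]]],
                      row_totals: List[int],
                      col_totals: List[int]) -> Dict[Tuple[int, int], int]:
    """Return suppressed cells (None) that can be recovered by subtraction.

    Repeatedly looks for a row or column with exactly one suppressed cell
    and fills it in from the published total, until nothing more is found.
    """
    grid = [row[:] for row in cells]  # work on a copy
    n_rows, n_cols = len(grid), len(grid[0])
    recovered: Dict[Tuple[int, int], int] = {}
    progress = True
    while progress:
        progress = False
        for r in range(n_rows):
            missing = [c for c in range(n_cols) if grid[r][c] is None]
            if len(missing) == 1:
                c = missing[0]
                known = sum(v for v in grid[r] if v is not None)
                grid[r][c] = row_totals[r] - known
                recovered[(r, c)] = grid[r][c]
                progress = True
        for c in range(n_cols):
            missing = [r for r in range(n_rows) if grid[r][c] is None]
            if len(missing) == 1:
                r = missing[0]
                known = sum(grid[i][c] for i in range(n_rows)
                            if grid[i][c] is not None)
                grid[r][c] = col_totals[c] - known
                recovered[(r, c)] = grid[r][c]
                progress = True
    return recovered


# Hypothetical 3x3 table with published margins; None marks a suppressed cell.
row_totals = [16, 12, 21]
col_totals = [14, 17, 18]
weak = [[None, 8, 6], [3, None, None], [9, 5, 7]]
print(audit_suppression(weak, row_totals, col_totals))
```

In this hypothetical pattern every suppressed cell is recovered, because each sits alone in its row or column at some step. Suppressing a 2x2 block instead leaves two unknowns in each affected row and column, so simple subtraction recovers nothing.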

Simple & Complementary – “Protected by Suppression”
Numbers from Working Paper 22, from Cox 1986
Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
sss ss10s25 35+s14ss35
Total
s – data withheld to limit disclosure
Methods are available to select appropriate cells for suppression and to audit a proposed suppression pattern

Los Angeles County

CA STD Control Suppression Rule
Suppress any cell if
–numerator ≠ 0 AND
–0 < (cell denominator – cell numerator) < 100
AND, if so
–Suppress any complementary cells necessary to avoid re-calculation of the suppressed cell
–OR
–Suppress all cells in a table if any cell meets the criteria above
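A minimal sketch of this rule, using the "suppress all cells in a table" option because it needs no search for complementary cells. The function names and the example counts are hypothetical.

```python
from typing import List, Optional, Tuple


def meets_ca_criteria(numerator: int, denominator: int) -> bool:
    """True if numerator != 0 and 0 < (denominator - numerator) < 100."""
    return numerator != 0 and 0 < (denominator - numerator) < 100


def apply_whole_table_rule(cells: List[Tuple[int, int]]) -> List[Optional[int]]:
    """Return the numerators to publish, or all None if any cell is sensitive.

    `cells` is a list of (numerator, denominator) pairs for one table.
    """
    if any(meets_ca_criteria(n, d) for n, d in cells):
        return [None] * len(cells)
    return [n for n, _ in cells]


# Hypothetical small-county table: the first cell has 5 cases among 80 people.
print(apply_whole_table_rule([(5, 80), (200, 10_000)]))  # [None, None]
# A zero cell in the same small population does not trigger suppression.
print(apply_whole_table_rule([(0, 80), (200, 10_000)]))  # [0, 200]
```

Whole-table suppression is simple and safe from re-calculation, but it withholds more data than targeted complementary suppression would.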

Fresno County

Modoc County

Alpine County

Sierra County

Attribute Disclosure

Solano County

Recommendations
Confidentiality Concerns
–Assess real versus perceived risk
–If real, determine best rule(s)
–Proposition: suppress if (Denominator – Numerator) < 100 AND Numerator ≠ 0
–If the denominator is unknown, estimate it reasonably or use a reasonable “numerator only” rule
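The proposed rule, including the fallback for an unknown denominator, might be sketched as follows. The function name is hypothetical, and the numerator-only threshold of 5 is an assumption taken from the example thresholds earlier in the talk, not a value the recommendation specifies.

```python
from typing import Optional


def should_suppress(numerator: int,
                    denominator: Optional[int] = None,
                    numerator_only_threshold: int = 5) -> bool:
    """Apply the proposed rule, falling back to a numerator-only rule.

    With a known denominator: suppress if (denominator - numerator) < 100
    and the numerator is not 0. With an unknown denominator (None):
    suppress nonzero counts below the assumed numerator-only threshold.
    """
    if numerator == 0:
        return False
    if denominator is None:
        return numerator < numerator_only_threshold
    return (denominator - numerator) < 100


print(should_suppress(3, 50))      # True: only 47 people are not cases
print(should_suppress(3, 10_000))  # False: large population
print(should_suppress(3))          # True: denominator unknown, count below 5
```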

? Michael C. Samuel, Dr.P.H