Published by Janis Evans. Modified over 9 years ago.
1
Confidentiality Issues with “Small Cell” Data
Michael C. Samuel, DrPH
STD Control Branch, California Department of Public Health
2008 National STD Prevention Conference: Confronting Challenges, Applying Solutions
Chicago, Illinois, March 10-13, 2008
2
* NOT the issue of “small cells” with expected value < 5 and the impact on chi-square tests

                      Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
15-19          15       1          3          1      20
20-24          20      10         10         15      55
25-34           3      10         10          2      25
35+            12      14          7          2      35
Total          50      35         30         20     135
3
Confidentiality – Types of Disclosure
Identity Disclosure
–Identity of an individual can be determined based on the released data
–Or …can reasonably be determined…
Attribute Disclosure
–Confidential information about an individual is revealed based on the released data
–Or “sensitive” information; or “embarrassing” information
4
Extensive Literature
Key Resource: Federal Committee on Statistical Methodology, Office of Management and Budget
http://www.fcsm.gov/working-papers/spwp22.html
5
Key Concepts
Release of public health data
–Balance obligations to protect the public’s health with obligations to respect individual privacy & confidentiality
If “significant” risks
–“Statistical Disclosure Limitation”
True Risk versus Perception of Risk
6
Disclosure Limitation with Tabular Data
If cells are deemed sensitive based on a specified threshold rule
–Alter the underlying “line-listed” data or “microdata” before the tables are constructed (may be a particularly relevant technique for on-line query systems)
–Change table: aggregate rows or columns
–Suppress cells
7
Threshold Rules
Numerator rule
–e.g. cell size < 3, < 5 (many)
Population denominator rule
–e.g. population < 20,000 (HIPAA-based), < 50
Numerator and population denominator rule
–numerator > 10 AND denominator > 50 (Oregon cancer registry)
Population denominator minus numerator rule
–e.g. (population minus cell count) < 10 (Missouri)
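The four rule families on this slide can be written as simple predicates. The sketch below is illustrative only: the function names and default thresholds are assumptions picked from the examples on the slide, and the Oregon-style rule is read here as a condition for release (a cell is flagged when it fails either part), which the slide does not state explicitly.

```python
def numerator_rule(count, min_count=5):
    """Flag a cell whose count is below the threshold (slide examples: <3, <5).
    Read literally, this also flags zero counts."""
    return count < min_count


def denominator_rule(population, min_population=20_000):
    """Flag a cell whose underlying population is below the threshold."""
    return population < min_population


def numerator_and_denominator_rule(count, population, min_count=10, min_population=50):
    """Assumed reading: release only if count > 10 AND population > 50,
    so flag the cell whenever that release condition fails."""
    return not (count > min_count and population > min_population)


def denominator_minus_numerator_rule(count, population, min_gap=10):
    """Flag a cell when too few people in the population are NOT in the cell."""
    return (population - count) < min_gap


# Hypothetical cell: 45 cases in a population of 50.
print(numerator_rule(45))                        # count is large enough
print(denominator_minus_numerator_rule(45, 50))  # only 5 non-cases remain
```

The last rule flags a cell that a numerator-only rule would release: when nearly everyone in a small group is a case, knowing group membership nearly reveals case status.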
8
Cell Suppression
–Simple Cell Suppression
–Random Rounding
–Controlled Rounding
–Controlled Tabular Adjustment
9
No Suppression (“With Disclosure”)
Numbers from Working Paper 22, from Cox 1986

                      Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
15-19          15       1          3          1      20
20-24          20      10         10         15      55
25-34           3      10         10          2      25
35+            12      14          7          2      35
Total          50      35         30         20     135
11
Simple Suppression
Numbers from Working Paper 22, from Cox 1986

                      Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
15-19          15       s          s          s      20
20-24          20      10         10         15      55
25-34           s      10         10          s      25
35+            12      14          7          s      35
Total          50      35         30         20     135

s – data withheld to limit disclosure
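Simple suppression on this table can be sketched as a one-pass filter over the interior cells. The threshold of 5 is an assumption: the slide does not name the rule used, but "count < 5" reproduces exactly the cells marked s. The function name `simple_suppress` is hypothetical.

```python
# Interior counts from the slide; columns are Black, White, Hispanic, Asian/PI.
COUNTS = {
    "15-19": [15, 1, 3, 1],
    "20-24": [20, 10, 10, 15],
    "25-34": [3, 10, 10, 2],
    "35+": [12, 14, 7, 2],
}


def simple_suppress(counts, min_count=5):
    """Replace every interior cell below min_count with 's'.
    Row and column totals are left as published."""
    return {
        age: ["s" if n < min_count else n for n in row]
        for age, row in counts.items()
    }


suppressed = simple_suppress(COUNTS)
for age, row in suppressed.items():
    # Print each row followed by its (unsuppressed) row total.
    print(age, *row, sum(COUNTS[age]))
```

Because totals stay published, a row or column with a single withheld cell gives that cell away by subtraction: the 35+ row yields 35 - 12 - 14 - 7 = 2 for Asian/PI.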
12
Simple & Complementary Row and/or Column Suppression
Numbers from Working Paper 22, from Cox 1986

                      Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
15-19          15       s          s          s      20
20-24          20       s          s         15      55
25-34           s      10         10          s      25
35+             s      14          7          s      35
Total          50      35         30         20     135

s – data withheld to limit disclosure
13
Simple & Complementary Suppression
Numbers from Working Paper 22, from Cox 1986

                      Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
15-19          15       s          s          s      20
20-24          20       s          s         15      55
25-34           s      10         10          s      25
35+             s      14          7          s      35
Total          50      35         30         20     135

s – data withheld to limit disclosure
A suppressed cell can still be recovered (= 1) based on linear combinations of the published values
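The "= 1 based on linear combinations" note can be checked with arithmetic on the published values alone. In this sketch (variable and function names are hypothetical), adding the hidden sums of the 15-19 and 20-24 rows and subtracting the hidden sums of the White and Hispanic columns cancels every suppressed cell except 15-19 Asian/PI.

```python
# Published totals and unsuppressed cells from the slide's table.
row_15_19 = {"total": 20, "published": [15]}        # Black
row_20_24 = {"total": 55, "published": [20, 15]}    # Black, Asian/PI
col_white = {"total": 35, "published": [10, 14]}    # 25-34, 35+
col_hispanic = {"total": 30, "published": [10, 7]}  # 25-34, 35+


def hidden_sum(line):
    """Sum of the suppressed cells in a row or column: total minus published cells."""
    return line["total"] - sum(line["published"])


# (W1 + H1 + A1) + (W2 + H2) - (W1 + W2) - (H1 + H2) = A1
asian_pi_15_19 = (
    hidden_sum(row_15_19) + hidden_sum(row_20_24)
    - hidden_sum(col_white) - hidden_sum(col_hispanic)
)
print(asian_pi_15_19)  # 5 + 20 - 11 - 13 = 1
```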
14
Simple & Complementary – “Protected by Suppression”
Numbers from Working Paper 22, from Cox 1986

                      Race/Ethnicity
Age Group   Black   White   Hispanic   Asian/PI   Total
15-19          15       s          s          s      20
20-24          20      10         10         15      55
25-34           s       s         10          s      25
35+             s      14          s          s      35
Total          50      35         30         20     135

s – data withheld to limit disclosure
Methods are available to select appropriate cells for suppression and to audit a proposed suppression pattern
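One way to picture an audit is a brute-force search. The sketch below enumerates every non-negative integer fill of the suppressed cells that reproduces the published totals, then reports the range each cell can take. It is an illustration under stated assumptions, not the method used in Working Paper 22 (production audits typically rely on linear programming); the function name `audit` and the (row, column) index convention are hypothetical.

```python
def audit(table, suppressed):
    """Return {cell: (min, max)} over all fills consistent with the margins.
    `table` has totals in its last row and column; values at suppressed
    positions are never read."""
    n_rows, n_cols = len(table) - 1, len(table[0]) - 1
    cells = sorted(suppressed)
    # Amount the suppressed cells of each row / column must add up to.
    row_left = [table[r][n_cols] - sum(table[r][c] for c in range(n_cols)
                                       if (r, c) not in suppressed)
                for r in range(n_rows)]
    col_left = [table[n_rows][c] - sum(table[r][c] for r in range(n_rows)
                                       if (r, c) not in suppressed)
                for c in range(n_cols)]
    last_in_row = {r: c for r, c in cells}  # sorted order: final column wins
    seen = {cell: set() for cell in cells}
    values = {}

    def fill(i):
        if i == len(cells):
            # Rows are exact by construction; accept only if columns are too.
            if all(left == 0 for left in col_left):
                for cell in cells:
                    seen[cell].add(values[cell])
            return
        r, c = cells[i]
        hi = min(row_left[r], col_left[c])
        # The last suppressed cell in a row must absorb the row's remainder.
        candidates = [row_left[r]] if last_in_row[r] == c else range(hi + 1)
        for v in candidates:
            if v > hi:
                continue
            values[(r, c)] = v
            row_left[r] -= v
            col_left[c] -= v
            fill(i + 1)
            row_left[r] += v
            col_left[c] += v

    fill(0)
    return {cell: (min(vals), max(vals)) for cell, vals in seen.items()}


TABLE = [
    [15, 1, 3, 1, 20],
    [20, 10, 10, 15, 55],
    [3, 10, 10, 2, 25],
    [12, 14, 7, 2, 35],
    [50, 35, 30, 20, 135],
]
# Suppression patterns as (row, column) pairs, zero-based.
SLIDE_13 = {(0, 1), (0, 2), (0, 3), (1, 1), (1, 2), (2, 0), (2, 3), (3, 0), (3, 3)}
SLIDE_14 = {(0, 1), (0, 2), (0, 3), (2, 0), (2, 1), (2, 3), (3, 0), (3, 2), (3, 3)}

for name, pattern in [("slide 13", SLIDE_13), ("slide 14", SLIDE_14)]:
    ranges = audit(TABLE, pattern)
    exposed = [cell for cell, (lo, hi) in ranges.items() if lo == hi]
    print(name, "exactly recoverable cells:", exposed)
```

For the slide 13 pattern the search pins cell (0, 3), the 15-19 Asian/PI count, at exactly 1, while under the slide 14 pattern every suppressed cell has more than one feasible value. Having more than one feasible value is a weaker test than a formal protection-range requirement, so a real audit would also check how wide each range is.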
18
Los Angeles County - 2006
19
CA STD Control Suppression Rule
Suppress any cell if
–numerator ≠ 0 AND
–0 < (cell denominator – cell numerator) < 100
AND, if so
–Suppress any complementary cells necessary to avoid re-calculation of the suppressed cell
–OR
–Suppress all cells in a table if any cell meets the criteria above
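The rule on this slide translates directly into a predicate, shown here with the simpler "suppress the whole table" option. This is a minimal sketch: the function names and the example counts are hypothetical, and the complementary-suppression option is not implemented.

```python
def cell_meets_criteria(numerator, denominator):
    """Slide's rule: numerator != 0 AND 0 < (denominator - numerator) < 100."""
    return numerator != 0 and 0 < (denominator - numerator) < 100


def suppress_table(cells):
    """cells maps a label to (numerator, denominator).
    Whole-table option: if any cell meets the criteria, withhold every cell."""
    if any(cell_meets_criteria(n, d) for n, d in cells.values()):
        return {label: "s" for label in cells}
    return {label: n for label, (n, _) in cells.items()}


# Hypothetical small-county table: (cases, population) per age group.
small_county = {"15-19": (2, 60), "20-24": (0, 45), "25-34": (1, 150)}
# Hypothetical large-county table.
large_county = {"15-19": (120, 90_000), "20-24": (310, 110_000)}

print(suppress_table(small_county))
print(suppress_table(large_county))
```

Note that, read literally, the rule does not flag a cell whose numerator equals its denominator, because the difference is 0 rather than strictly positive.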
20
Fresno County - 2006
21
Modoc County - 2006
22
Alpine County - 2006
23
Sierra County - 2006
24
Attribute Disclosure
26
Solano County - 2004
27
Recommendations
Confidentiality Concerns
–Assess real versus perceived risk
–If real, determine best rule(s)
–Proposition: suppress if (Denominator – Numerator) < 100 AND Numerator ≠ 0
–If the denominator is unknown, estimate it reasonably or use a reasonable “numerator only” rule
28
?
Michael C. Samuel, Dr.P.H.
Michael.Samuel@cdph.ca.gov
510.620.3198