Published by Pascual Cano. Modified over 5 years ago.
1
Open Data Sharing and its Statistical Limitations
Pooja Iyer, Barbara Do
RTI International
2
Outline
I. Data utility vs. data risk
II. Introduction of the standard disclosure techniques
III. Reason for cell suppression
IV. Additional limitation techniques
   a. Limitation of detail
   b. Top/bottom coding
   c. Additive noise
3
R vs. U³
1. High data utility U: the released data stay faithful in critical ways to the original data (analytically valid data)
2. Low disclosure risk R: confidentiality is protected (safe data)
4
Standard Disclosure Techniques
De-Identification of PHI in accordance with HIPAA¹
Medical record numbers
‘Geographic subdivisions smaller than state’
Site ID
Other items to consider removing/randomizing:
Remove ALL PROPER NOUNS, i.e. names, initials, specific geographic locations
Mask specific dates to a new category that shows ‘days from randomization’
Telephone numbers, addresses, social security numbers, all biometric identifiers
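The removal and date-masking steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a HIPAA-compliant procedure: the field names in `IDENTIFIER_FIELDS`, the sample record, and the helper `deidentify` are all hypothetical and chosen only for the example.

```python
from datetime import date

# Hypothetical names of direct-identifier fields (assumed for illustration).
IDENTIFIER_FIELDS = {
    "name", "initials", "medical_record_number", "site_id",
    "telephone", "address", "ssn",
}


def deidentify(record, randomization_date):
    """Return a copy of `record` without direct identifiers.

    Calendar dates are replaced by an integer number of days
    from the participant's randomization date.
    """
    out = {}
    for key, value in record.items():
        if key in IDENTIFIER_FIELDS:
            continue  # drop the identifier entirely
        if isinstance(value, date):
            # Mask the date as 'days from randomization'.
            out[key + "_days_from_randomization"] = (value - randomization_date).days
        else:
            out[key] = value
    return out


# Made-up example record.
record = {
    "name": "Jane Doe",
    "medical_record_number": "MRN-0001",
    "site_id": "S12",
    "visit_date": date(2018, 3, 15),
    "systolic_bp": 118,
}
masked = deidentify(record, randomization_date=date(2018, 3, 1))
print(masked)
```

Storing a day offset instead of the calendar date keeps the timing information analysts usually need (intervals between events) while removing the date itself.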
5
Cell Suppression “In a contingency table, cells with too few observations cannot be released to the public, as it may be easy to infer the identity of these individuals.” ²
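As a rough sketch of the idea, the Python below builds a two-way contingency table and blanks out any cell whose count falls under a threshold. The threshold of 5 (`MIN_CELL_COUNT`), the function name `suppressed_table`, and the sample rows are assumptions made for illustration; the slide does not state a specific cutoff.

```python
from collections import Counter

MIN_CELL_COUNT = 5  # assumed threshold; the appropriate value is a policy decision


def suppressed_table(rows, row_key, col_key, min_count=MIN_CELL_COUNT):
    """Count rows per (row_key, col_key) cell; small cells become None."""
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    return {cell: (n if n >= min_count else None) for cell, n in counts.items()}


# Made-up data: 6 + 2 + 7 individuals.
rows = (
    [{"sex": "F", "region": "North"}] * 6
    + [{"sex": "M", "region": "North"}] * 2
    + [{"sex": "M", "region": "South"}] * 7
)
table = suppressed_table(rows, "sex", "region")
for cell, n in sorted(table.items()):
    print(cell, "suppressed" if n is None else n)
```

This covers primary suppression only. If row or column totals are also published, a suppressed cell can sometimes be recovered by subtraction, so additional (complementary) cells may need to be suppressed as well.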
6
Additional techniques²
Limitation of detail: collapsing categories
Top/bottom coding: adding categories
Additive noise: Z = X + ε
Z = transformed point
X = original data point
ε = random variable with distribution ε ~ N(0, σ²)
Be sure to explain what MUAC reading is, and why it is potentially identifiable
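The three techniques can be illustrated with a short Python sketch. Everything here is an assumption for the sake of the example: the 10-year age bands, the coding limits of 18 and 85, the value of σ, and the function names are not taken from the slides.

```python
import random


def collapse_category(age):
    """Limitation of detail: report a 10-year band instead of the exact age."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def top_bottom_code(value, bottom, top):
    """Top/bottom coding: values beyond the limits are reported as the limit."""
    return max(bottom, min(value, top))


def add_noise(x, sigma, rng):
    """Additive noise: Z = X + eps, with eps drawn from N(0, sigma^2)."""
    return x + rng.gauss(0.0, sigma)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

print(collapse_category(47))        # exact age 47 becomes a band
print(top_bottom_code(97, 18, 85))  # extreme high value is capped at the top code
print(top_bottom_code(12, 18, 85))  # extreme low value is raised to the bottom code
print(add_noise(12.5, 0.5, rng))    # measurement perturbed by Gaussian noise

# Because the noise has mean zero, averages are approximately preserved.
noisy = [add_noise(12.5, 1.0, rng) for _ in range(10_000)]
noisy_mean = sum(noisy) / len(noisy)
```

The choice of σ trades utility against risk: a larger σ hides individual values better but distorts variances and any analysis that depends on them.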
7
References
¹HHS Office of the Secretary, Office for Civil Rights, & OCR. (2015, November 06). Methods for De-identification of PHI. Retrieved April 5, 2018.
²Matthews, G. J., & Harel, O. (2011). Data confidentiality: A review of methods for statistical disclosure limitation and methods for assessing privacy. Statistics Surveys, 5. doi: /11-SS074
³Duncan, G. T., Keller-McNulty, S. A., & Stokes, S. L. (2001). Disclosure Risk vs. Data Utility: The R-U Confidentiality Map. National Institute of Statistical Sciences, 5-7. Retrieved April 5, 2018.
8
Thank you
Pooja Iyer
piyer@rti.org
RTI International