Open Data Sharing and Its Statistical Limitations
Pooja Iyer, Barbara Do
RTI International
Outline
I. Data utility vs. disclosure risk
II. Introduction to standard disclosure limitation techniques
III. Rationale for cell suppression
IV. Additional limitation techniques
a. Limitation of detail
b. Top/bottom coding
c. Additive noise
R vs. U³
1. High data utility U: the released data remain faithful in critical ways to the original data (analytically valid data)
2. Low disclosure risk R: confidentiality is protected (safe data)
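Duncan et al.³ visualize this trade-off as an R-U confidentiality map: each setting of a masking method yields one (R, U) point. The sketch below builds a toy map for additive noise at several noise levels. The measures are assumptions chosen for illustration, not the ones from the cited report: U is the Pearson correlation between original and masked values, and R is the share of records an intruder who knows a person's true value would re-identify by picking the nearest masked value.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation, used here as a simple utility proxy U (assumption)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def linkage_risk(original, masked):
    """Risk proxy R (assumption): fraction of records whose nearest masked
    value is their own, i.e. re-identified by nearest-value matching."""
    hits = 0
    for i, x in enumerate(original):
        nearest = min(range(len(masked)), key=lambda j: abs(masked[j] - x))
        if nearest == i:
            hits += 1
    return hits / len(original)

def ru_points(original, sigmas, seed=0):
    """Return (sigma, R, U) for each noise level: the points of a toy R-U map."""
    rng = random.Random(seed)
    points = []
    for sigma in sigmas:
        masked = [x + rng.gauss(0.0, sigma) for x in original]
        points.append((sigma, linkage_risk(original, masked), pearson(original, masked)))
    return points

if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(25.0, 3.0) for _ in range(200)]  # hypothetical measurements
    for sigma, r, u in ru_points(data, [0.0, 0.5, 1.0, 2.0, 4.0]):
        print(f"sigma={sigma:.1f}  R={r:.2f}  U={u:.3f}")
```

With no noise, R and U are both at their maximum; as the noise level grows, both fall, which is the tension the two goals above describe.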
Standard Disclosure Techniques
De-identification of PHI in accordance with HIPAA¹; remove:
a. Medical record numbers
b. 'Geographic subdivisions smaller than a state'
c. Site ID
Other items to consider removing or randomizing:
a. ALL proper nouns, i.e., names, initials, specific geographic locations
b. Specific dates, masked as a new variable showing 'days from randomization'
c. Telephone numbers, email addresses, Social Security numbers, all biometric identifiers
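A minimal sketch of the two operations on this slide, dropping identifier fields and converting calendar dates to days from randomization, applied to one record stored as a dictionary. All field names, the record, and the `_date` to `_day` renaming convention are hypothetical; a real identifier list should come from the HIPAA guidance¹ and the study's own data dictionary.

```python
from datetime import date

# Hypothetical field names for illustration only.
DIRECT_IDENTIFIERS = {"name", "initials", "mrn", "site_id", "zip_code",
                      "phone", "email", "ssn"}
DATE_FIELDS = {"visit_date", "lab_date"}

def deidentify(record, anchor="randomization_date"):
    """Drop identifier fields and replace calendar dates with days from randomization."""
    rand_date = record[anchor]
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS or key == anchor:
            continue  # removed entirely, including the anchor date itself
        if key in DATE_FIELDS:
            # e.g. "visit_date" becomes "visit_day" (days since randomization)
            out[key.replace("_date", "_day")] = (value - rand_date).days
        else:
            out[key] = value
    return out

record = {
    "name": "Jane Doe", "mrn": "A12345", "site_id": "S07", "zip_code": "27709",
    "randomization_date": date(2017, 3, 1), "visit_date": date(2017, 3, 15),
    "age": 34,
}
print(deidentify(record))  # {'visit_day': 14, 'age': 34}
```

Dropping the randomization date itself matters: keeping it would let anyone recover the original visit dates from the relative days.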
Cell Suppression
“In a contingency table, cells with too few observations cannot be released to the public, as it may be easy to infer the identity of these individuals.”²
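A minimal sketch of primary cell suppression: build a two-way contingency table from records and withhold (here, set to `None`) any cell whose count falls below a threshold. The threshold of 5, the variables, and the records are hypothetical; agencies set their own minimum cell sizes.

```python
from collections import Counter

def contingency_table(records, row_var, col_var):
    """Count records for each (row, column) combination."""
    return Counter((r[row_var], r[col_var]) for r in records)

def suppress_small_cells(table, threshold=5):
    """Primary suppression: replace any count below `threshold` with None."""
    return {cell: (n if n >= threshold else None) for cell, n in table.items()}

records = ([{"sex": "F", "site": "A"}] * 12 + [{"sex": "F", "site": "B"}] * 3
           + [{"sex": "M", "site": "A"}] * 7 + [{"sex": "M", "site": "B"}] * 9)
released = suppress_small_cells(contingency_table(records, "sex", "site"))
print(released[("F", "B")])  # None
```

Primary suppression alone is not enough when margins are published: if the row total for F (15) is released, the hidden cell is recoverable as 15 - 12 = 3, so complementary cells must also be suppressed.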
Additional Techniques²
a. Limitation of detail: collapsing categories
b. Top/bottom coding: adding open-ended categories at the extremes of the distribution
c. Additive noise
Additive Noise: $Z = X + \varepsilon$, where $Z$ is the transformed data point, $X$ is the original data point, and $\varepsilon$ is a random variable with distribution $\varepsilon \sim N(0, \sigma^2)$
Summary: limitation of detail, top/bottom coding, noise addition
Speaker note: Be sure to explain what a MUAC (mid-upper arm circumference) reading is and why it is potentially identifiable.
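The three techniques can be sketched as small functions on lists of values. The category mapping, the MUAC readings, and the 11 cm and 17 cm cut-offs are hypothetical illustration values, not thresholds from the presentation. Top/bottom coding is shown as numeric clamping; in a released file the clamped values would typically be labeled as open-ended categories such as "≤ 11" and "≥ 17".

```python
import random

def collapse_categories(values, mapping):
    """Limitation of detail: map each fine-grained category to a coarser one.
    Raises KeyError for an unmapped category rather than releasing it as-is."""
    return [mapping[v] for v in values]

def top_bottom_code(values, lower, upper):
    """Top/bottom coding: values below `lower` become `lower` and values
    above `upper` become `upper`."""
    return [min(max(v, lower), upper) for v in values]

def add_noise(values, sigma, seed=None):
    """Additive noise: Z = X + eps, with eps ~ N(0, sigma^2) drawn
    independently for each value."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in values]

# Hypothetical inputs for illustration only.
districts = ["A1", "A2", "B1", "B2"]
region_of = {"A1": "A", "A2": "A", "B1": "B", "B2": "B"}
muac_cm = [9.8, 12.4, 13.1, 14.0, 18.9]

print(collapse_categories(districts, region_of))  # ['A', 'A', 'B', 'B']
print(top_bottom_code(muac_cm, 11.0, 17.0))       # [11.0, 12.4, 13.1, 14.0, 17.0]
print(add_noise(muac_cm, sigma=0.5, seed=1))      # perturbed values; depend on sigma and seed
```

A larger `sigma` lowers disclosure risk but also lowers utility, which is the R-U trade-off from earlier in the talk.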
References
¹HHS Office of the Secretary, Office for Civil Rights (OCR). (2015, November 6). Methods for De-identification of PHI. Retrieved April 5, 2018.
²Matthews, G. J., & Harel, O. (2011). Data confidentiality: A review of methods for statistical disclosure limitation and methods for assessing privacy. Statistics Surveys, 5, 1-29. doi:10.1214/11-SS074
³Duncan, G. T., Keller-McNulty, S. A., & Stokes, S. L. (2001). Disclosure Risk vs. Data Utility: The R-U Confidentiality Map. National Institute of Statistical Sciences, 5-7. Retrieved April 5, 2018.
Thank you
Pooja Iyer
piyer@rti.org
RTI International