Open Data Sharing and its Statistical Limitations


1 Open Data Sharing and its Statistical Limitations
Pooja Iyer, Barbara Do
RTI International

2 Outline
I. Data utility vs. data risk
II. Introduction of the standard disclosure techniques
III. Reason for cell suppression
IV. Additional limitation techniques
a. Limitation of detail
b. Top/bottom coding
c. Additive noise

3 R vs. U³
1. High data utility U, so the released data are faithful in critical ways to the original (analytically valid data)
2. Low disclosure risk R, so confidentiality is protected (safe data)

4 Standard Disclosure Techniques
De-identification of PHI in accordance with HIPAA¹
Medical record numbers
‘Geographic subdivisions smaller than state’
Site ID
Other items to consider removing/randomizing:
Remove ALL PROPER NOUNS, i.e. names, initials, specific geographic locations
Mask specific dates to a new category that shows ‘days from randomization’
Telephone numbers, addresses, social security numbers, all biometric identifiers
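The removal and date-masking steps on this slide can be sketched in a few lines of Python. This is only an illustration, not the presenters' procedure: the field names (`name`, `mrn`, `site_id`, `visit_date`, `muac_cm`) and the `deidentify` helper are hypothetical, and a real study would take its identifier list from its own data dictionary and the HIPAA guidance.

```python
from datetime import date

# Hypothetical field names; a real study would use its own data dictionary.
DIRECT_IDENTIFIERS = {"name", "initials", "mrn", "site_id", "phone", "address", "ssn"}


def deidentify(record, randomization_date):
    """Drop direct identifiers and replace dates with days from randomization."""
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # remove the identifier entirely
        if isinstance(value, date):
            # Keep only the offset from randomization, never the calendar date.
            out[field + "_day"] = (value - randomization_date).days
        else:
            out[field] = value
    return out


record = {
    "name": "Jane Doe",
    "mrn": "123456",
    "site_id": "S07",
    "visit_date": date(2018, 3, 15),
    "muac_cm": 12.4,
}
clean = deidentify(record, randomization_date=date(2018, 3, 1))
print(clean)
```

The released record keeps the analytic variable and the relative timing of the visit, while the calendar date and the direct identifiers are gone.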

5 Cell Suppression “In a contingency table, cells with too few observations cannot be released to the public, as it may be easy to infer the identity of these individuals.” ²
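A minimal sketch of primary cell suppression on a small contingency table follows. The threshold of 5, the `suppress_small_cells` helper, and the site/outcome labels are assumptions made for illustration; the slide does not state a threshold, and agencies set their own. Zero counts are left visible here, which is also an assumption.

```python
def suppress_small_cells(table, threshold=5):
    """Return a copy of the table with small non-zero counts replaced by None.

    `table` maps a cell label (here a (row, column) tuple) to its count.
    The threshold is an illustrative assumption, not a rule from the slides.
    """
    released = {}
    for cell, count in table.items():
        if 0 < count < threshold:
            released[cell] = None  # suppressed: too few observations to release
        else:
            released[cell] = count
    return released


table = {
    ("Site A", "Case"): 12,
    ("Site A", "Control"): 30,
    ("Site B", "Case"): 2,
    ("Site B", "Control"): 25,
}
released = suppress_small_cells(table, threshold=5)
for cell, count in released.items():
    print(cell, "suppressed" if count is None else count)
```

This covers primary suppression only. If row or column totals are published alongside the table, a suppressed cell can be recovered by subtraction, so additional (complementary) cells usually have to be suppressed as well.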

6 Additional techniques²
Limitation of detail: collapsing categories
Top/bottom coding: adding categories
Additive noise: Z = X + ε
Z = transformed point
X = original data point
ε = random variable with distribution ε ~ N(0, σ²)
Summary: Limitation of detail, Top/Bottom Coding, Noise Addition
Be sure to explain what a MUAC reading is, and why it is potentially identifiable
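The three techniques on this slide can be sketched as small Python functions. The function names, the age bands, the bounds 18 and 80, and the value of σ are all illustrative assumptions, not values from the presentation. Top/bottom coding is shown as clamping to the bounds, which is one common form; additive noise releases Z = X + ε with ε drawn from N(0, σ²).

```python
import random


def collapse_categories(value, mapping):
    """Limitation of detail: map fine categories onto coarser ones."""
    return mapping.get(value, value)


def top_bottom_code(x, lower, upper):
    """Top/bottom coding: values beyond the bounds are reported as the bound."""
    return max(lower, min(x, upper))


def add_noise(x, sigma, rng):
    """Additive noise: release Z = X + eps, with eps ~ N(0, sigma^2)."""
    return x + rng.gauss(0.0, sigma)


# Hypothetical mapping from detailed age bands to coarser ones.
age_bands = {"0-4": "0-17", "5-17": "0-17", "18-39": "18-64", "40-64": "18-64"}

rng = random.Random(2018)  # fixed seed so the sketch is reproducible
ages = [3, 25, 47, 92]
coded = [top_bottom_code(a, 18, 80) for a in ages]
noisy = [add_noise(a, sigma=2.0, rng=rng) for a in ages]

print(collapse_categories("5-17", age_bands))
print(coded)
print(noisy)
```

A larger σ lowers disclosure risk but also lowers utility, which is the same R vs. U trade-off described earlier in the deck.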

7 References
¹HHS Office of the Secretary, Office for Civil Rights, & OCR. (2015, November 06). Methods for De-identification of PHI. Retrieved April 05, 2018.
²Matthews, G. J., & Harel, O. (2011). Data confidentiality: A review of methods for statistical disclosure limitation and methods for assessing privacy. Statistics Surveys, 5, doi: /11-SS074
³Duncan, G. T., Keller-McNulty, S. A., & Stokes, S. L. (2001). Disclosure Risk vs. Data Utility: The R-U Confidentiality Map. National Institute of Statistical Sciences, 5-7. Retrieved April 5, 2018.

8 Thank you
Pooja Iyer piyer@rti.org
RTI International

