Statistical disclosure limitation: Balancing data confidentiality and data access.


1 Statistical disclosure limitation: Balancing data confidentiality and data access

2 Access to high quality data is vital
- Enables evidence-based policy-making
- Informs the general public on local and national concerns
- Advances scientific research
- Trains students in data analysis and decision making

3 Protecting confidentiality of data is essential
Breach of confidentiality:
- May violate laws (e.g., CIPSEA, HIPAA)
- May undermine broadly held and highly valued ethical principles
- May lead data providers to withhold important information or refuse to participate in research

4 Example of tension between access and confidentiality
HIPAA Privacy Rule
- Safe harbor
- Statistical standard
- Limited data sets
2008: Delaware Cancer Registry vs. press
- Public's desire to learn cancer sites
- State requirement to protect privacy
- New legislation

5 Protecting confidentiality while providing access
De-identification
- Strip unique identifiers like names, addresses, and tax IDs from shared files
Reducing potential for re-identification
- Seemingly innocuous information may reveal individual identities and information

6 Example: Re-identification by matching
De-identification
- Original data: name, a b c d e f g h i j k l
- Name deleted: a b c d e f g h i j k l
Re-identification
- Shared data: a b c d e f g h i j k l
- Other data: a b c d e f m n o p, name
Where:
- a = Day, month, year of birth
- b = Gender
- c = State of residence
- d = County
- e = Occupation
- f = Race
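
The matching on slide 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the `record` helper, and every record value are hypothetical, and exact matching on the six shared variables (a to f) stands in for whatever linkage method an intruder might really use.

```python
# Variables a-f from the slide: shared by both files and used for matching.
QUASI_IDENTIFIERS = ("birth_date", "gender", "state", "county", "occupation", "race")


def record(birth_date, gender, county, occupation, race, **extra):
    """Build one hypothetical record; all values are invented."""
    return {"birth_date": birth_date, "gender": gender, "state": "DE",
            "county": county, "occupation": occupation, "race": race, **extra}


# Original confidential data: name plus a sensitive value.
original = [
    record("1950-03-14", "F", "Kent", "teacher", "r1", name="Person A", diagnosis="condition 1"),
    record("1962-07-01", "M", "Sussex", "farmer", "r2", name="Person B", diagnosis="condition 2"),
    record("1975-11-30", "F", "New Castle", "nurse", "r1", name="Person C", diagnosis="condition 3"),
]

# "Other data": an outside file that carries names and the same six variables.
other = [
    record("1950-03-14", "F", "Kent", "teacher", "r1", name="Person A"),
    record("1975-11-30", "F", "New Castle", "nurse", "r1", name="Person C"),
    record("1975-11-30", "F", "New Castle", "nurse", "r1", name="Person E"),
    record("1981-01-09", "M", "Kent", "clerk", "r2", name="Person D"),
]


def deidentify(records, direct_identifiers=("name",)):
    """Drop direct identifiers, leaving every other variable untouched."""
    return [{k: v for k, v in r.items() if k not in direct_identifiers} for r in records]


def reidentify(shared, outside, keys=QUASI_IDENTIFIERS):
    """Return (name, shared_record) pairs where exactly one outside record matches."""
    index = {}
    for r in outside:
        index.setdefault(tuple(r[k] for k in keys), []).append(r)
    matches = []
    for r in shared:
        candidates = index.get(tuple(r[k] for k in keys), [])
        if len(candidates) == 1:  # an ambiguous match does not single anyone out
            matches.append((candidates[0]["name"], r))
    return matches


shared = deidentify(original)
matches = reidentify(shared, other)
```

In this toy data the first shared record matches exactly one outside record, so its sensitive value is linked back to a name even though the name was deleted. The third record matches two outside records and stays ambiguous, and the second has no match at all.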

7 Better data – opportunities and problems
- Advances in statistical analysis and the collection of more detailed data enable researchers and policy makers to ask refined questions
- Enormous amounts of individual-level data are collected, processed, widely distributed … and linkable
- Better matching technologies enable linkages

8 Problems – a closer look
- Personal information available on the Internet, from private sources, and government surveys
- Individuals with the right skills and resources could link this personal information to publicly available data:
  – MIT student re-identifies Massachusetts governor
  – NIH scientists express caution in making genetic information available

9 Statisticians can help find a satisfactory balance
Statisticians:
- Develop ways to identify risk of confidentiality breaches
- Develop methods for providing safe access to confidential data
- Conduct research on providing safe access to emerging, complex data types

10 Useful data can be shared and protected
General strategies for data protection:
- Modify data content: Remove or alter sensitive or identifying values, and provide unrestricted access to modified data (e.g., public use files)
- Control data access: Use technology and training to reduce chances of breaches, limit who can access the confidential data, the conditions under which the data can be accessed, and the purposes for which the data can be used

11 Modified data: General techniques
- Eliminate variables (geography)
- Aggregate sensitive data (age, income)
- Add random variation to numerical data values
- Exchange some values between selected records
- Replace sensitive data with values simulated from statistical models estimated with the original data
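
A minimal Python sketch of the first four techniques on slide 11 follows. The function names, parameters, and sample records are all hypothetical, and Gaussian noise and random pairwise swapping are just one simple choice for each technique. The fifth technique (simulating values from a fitted model) is not shown.

```python
import random


def drop_variables(records, names):
    """Eliminate variables, e.g. detailed geography."""
    return [{k: v for k, v in r.items() if k not in names} for r in records]


def age_band(age, width=10):
    """Aggregate an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def add_noise(values, scale, rng):
    """Add zero-mean Gaussian random variation to numerical values."""
    return [v + rng.gauss(0, scale) for v in values]


def swap_values(records, field, fraction, rng):
    """Exchange `field` between randomly chosen pairs of records.

    Roughly `fraction` of the records take part; the input is not modified.
    """
    out = [dict(r) for r in records]
    n_pairs = int(len(out) * fraction / 2)
    chosen = rng.sample(range(len(out)), 2 * n_pairs)
    for i, j in zip(chosen[::2], chosen[1::2]):
        out[i][field], out[j][field] = out[j][field], out[i][field]
    return out


# Invented example records.
records = [
    {"county": "Kent", "age": 37, "income": 41000},
    {"county": "Sussex", "age": 52, "income": 58000},
    {"county": "Kent", "age": 29, "income": 36000},
    {"county": "New Castle", "age": 64, "income": 72000},
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
no_geo = drop_variables(records, ["county"])
banded = [{**r, "age": age_band(r["age"])} for r in no_geo]
noisy_income = add_noise([r["income"] for r in records], scale=1000, rng=rng)
swapped = swap_values(records, "income", fraction=0.5, rng=rng)
```

Swapping keeps the overall distribution of the swapped variable intact while breaking its link to individual records. Noise and banding each give up some detail in exchange for protection.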

12 Key features of modified data
- Methods can be applied to all or some cases, to varying degrees
- Wider application of methods improves confidentiality protection, but degrades usefulness of data
- Statisticians measure the tradeoffs between disclosure risk and analytic/policy priorities
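
One way to make the tradeoff on slide 12 concrete is to track a risk measure and a detail measure as protection gets stronger. The sketch below is an assumption-laden illustration, not a method taken from the presentation: it uses the share of records that are unique on their identifying variables as a crude risk proxy, the number of distinct age values as a crude proxy for remaining detail, and invented records.

```python
from collections import Counter


def share_unique(records, keys):
    """Fraction of records whose combination of `keys` appears exactly once."""
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    return sum(1 for c in combos if counts[c] == 1) / len(records)


def coarsen_age(records, width):
    """Replace each age by the lower bound of its `width`-year band."""
    return [{**r, "age": (r["age"] // width) * width} for r in records]


# Invented example records.
records = [
    {"age": 21, "gender": "F"}, {"age": 23, "gender": "F"},
    {"age": 25, "gender": "M"}, {"age": 27, "gender": "M"},
    {"age": 34, "gender": "F"}, {"age": 36, "gender": "F"},
    {"age": 41, "gender": "M"}, {"age": 48, "gender": "M"},
]

risk = {}    # band width -> share of unique records (lower is safer)
detail = {}  # band width -> number of distinct age values left (higher is more useful)
for width in (1, 5, 10):
    coarse = coarsen_age(records, width)
    risk[width] = share_unique(coarse, ("age", "gender"))
    detail[width] = len({r["age"] for r in coarse})
```

With these records, widening the age bands from 1 to 5 to 10 years lowers the share of unique records from 1.0 to 0.5 to 0.0, while the number of distinct age values falls from 8 to 6 to 3.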

13 Restricted access increasingly provided - examples
- Restricted data enclaves (Census, NCHS)
- Remote access systems (NCHS, NORC)
- Licensing (NCES, BLS)
- Online tabulations/analysis (Census, NCHS, NCES)

14 Key features of restricted access
- Safe projects: Authorized projects, typically with data use agreement
- Safe people: Approved analysts from authorized institutions; trained in confidentiality issues
- Safe sites: Use actively monitored by data custodians
- Safe outputs: Data products subject to statistical and confidentiality review
=> Analysts have use of detailed data, which permits manipulations not possible with publicly available data, but do not own them

15 Summary
- Data access and data confidentiality are intimately connected
- Statisticians play a central role in improving data usefulness while protecting data confidentiality
- Statisticians in government, academia, and industry can provide guidance to policy-makers on key issues related to privacy and confidentiality

16 Further information
- ASA Statement on Data Access and Personal Privacy: http://www.amstat.org/news/statementondataaccess.cfm
- ASA's Privacy and Confidentiality Committee: http://www.amstat.org/committees/commdetails/cfm?txtComm=CCNP RO02
- ASA's Privacy, Data Security and Confidentiality Website: http://www.amstat.org/committee/pc/index.html
- OMB/FCSM Report on Statistical Disclosure Limitation Methodology: http://www.fcsm.gov/working-papers/spwp22.html
- Expanding Access to Research Data: Reconciling Risks and Opportunities: http://books.nap.edu/catalog.php?record_id=11434

17 American Statistical Association 732 N. Washington Street Alexandria, Virginia 22314 703.684.1221 http://www.amstat.org

