Published by Elvin Jacobs. Modified over 9 years ago.
1
Security Methods for Statistical Databases
2
Introduction
Statistical databases containing medical information are often used for research.
Some of this data is protected by law to safeguard patient privacy.
Proper security precautions must be implemented to comply with these laws and to respect the sensitivity of the data.
3
Accuracy vs. Confidentiality
Accuracy: researchers want to extract accurate and meaningful data.
Confidentiality: patients, laws, and database administrators want to maintain the privacy of patients and the confidentiality of their information.
4
Laws
Health Insurance Portability and Accountability Act (HIPAA), Privacy Rule
Covered organizations must comply by April 14, 2003.
Designed to improve the efficiency of the healthcare system through the electronic exchange of data while maintaining its security.
Covered entities (health plans, healthcare clearinghouses, and healthcare providers) may not use or disclose protected health information except as the Privacy Rule permits or requires.
The Privacy Rule establishes a "minimum necessary" standard, which requires covered entities to evaluate their current practices and security precautions and to limit the use and disclosure of protected health information to the minimum necessary.
5
HIPAA Compliance
Companies offer third-party certification of covered entities.
Such companies check your organization and its associated companies for compliance with HIPAA.
They can help with rapid implementation of, and compliance with, HIPAA regulations.
6
Types of Statistical Databases
Static: a static database is created once and never changes. Example: U.S. Census
Dynamic: a dynamic database changes continuously to reflect real-time data. Example: most online research databases
7
Security Methods
Access Restriction
Query Set Restriction
Microaggregation
Data Perturbation
Output Perturbation
Auditing
Random Sampling
8
Access Restriction
Databases normally have different access levels for different types of users.
User IDs and passwords are the most common methods of restricting access.
In a medical database:
Doctors/healthcare representatives: full access to information
Researchers: access to partial information only (e.g., aggregate information)
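A minimal sketch of the role-based restriction described above, assuming users have already authenticated with a user ID and password. The `Role` names, the `ACCESS_POLICY` table, and `is_allowed` are hypothetical; a real system would normally enforce this in the database's own permission layer rather than in application code.

```python
from enum import Enum


class Role(Enum):
    DOCTOR = "doctor"          # doctors/healthcare representatives: full access
    RESEARCHER = "researcher"  # researchers: aggregate information only


# Hypothetical policy table mapping each role to the request kinds it may make.
ACCESS_POLICY = {
    Role.DOCTOR: {"record", "aggregate"},
    Role.RESEARCHER: {"aggregate"},
}


def is_allowed(role: Role, request_kind: str) -> bool:
    """Return True if a user holding `role` may make a request of `request_kind`."""
    return request_kind in ACCESS_POLICY.get(role, set())


if __name__ == "__main__":
    print(is_allowed(Role.DOCTOR, "record"))         # True
    print(is_allowed(Role.RESEARCHER, "record"))     # False
    print(is_allowed(Role.RESEARCHER, "aggregate"))  # True
```

Keeping the policy in one table makes it easy to audit which role can see what.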
9
Query Set Restriction
A query-set size control places a bound on the number of records that must be in the query set (the set of records a query selects).
Query results are displayed only if the size of the query set satisfies this condition.
Setting a minimum query-set size helps protect against the disclosure of individual data.
10
Query Set Restriction
Let K represent the minimum number of records that must be present in the query set.
Let R represent the size of the query set.
The query results can be displayed only if K ≤ R.
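A minimal sketch of the K ≤ R rule applied to an AVERAGE query, assuming records are plain dictionaries. The helper name `restricted_mean`, the sample `patients` data, and the choice K = 3 are hypothetical.

```python
from typing import Callable, Optional, Sequence

Record = dict


def restricted_mean(
    records: Sequence[Record],
    predicate: Callable[[Record], bool],
    field: str,
    k: int,
) -> Optional[float]:
    """Average `field` over the query set, or return None if the query is refused.

    The query set is every record satisfying `predicate`; its size is R.
    The answer is released only if K <= R (and R > 0, to avoid dividing by zero).
    """
    query_set = [rec[field] for rec in records if predicate(rec)]
    r = len(query_set)
    if r == 0 or r < k:  # K <= R fails: too few records, so refuse to answer
        return None
    return sum(query_set) / r


patients = [
    {"diagnosis": "flu", "charge": 120.0},
    {"diagnosis": "flu", "charge": 180.0},
    {"diagnosis": "asthma", "charge": 300.0},
    {"diagnosis": "flu", "charge": 150.0},
]

print(restricted_mean(patients, lambda p: p["diagnosis"] == "flu", "charge", k=3))     # 150.0
print(restricted_mean(patients, lambda p: p["diagnosis"] == "asthma", "charge", k=3))  # None
```

The sketch checks only the lower bound from the slide. Query-set-size control is often also stated with an upper bound (K ≤ R ≤ N − K, where N is the number of records), because otherwise the complement of a very large query set can isolate a few individuals.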
11
Query Set Restriction
12
Microaggregation
Raw (individual) data is grouped into small aggregates before publication.
Each individual value is replaced by the average value of its group.
The most similar records are grouped together to preserve data accuracy.
This helps prevent the disclosure of individual data.
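A minimal sketch of fixed-size, single-attribute microaggregation. It assumes "most similar" means "closest in value", so sorting places similar records next to each other. The helper `microaggregate` and the group size `k` are hypothetical; multi-attribute data needs a distance measure and a more careful grouping heuristic.

```python
from typing import List


def microaggregate(values: List[float], k: int) -> List[float]:
    """Replace each value with the mean of its group of (at least) k similar values.

    Values are sorted so similar values are adjacent, then split into groups of k.
    The last group absorbs any remainder, so no group has fewer than k members.
    The output keeps the original order of the input.
    """
    if k <= 0 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    result = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        # The last group takes the leftover records (between k and 2k - 1 of them).
        end = len(values) if g == n_groups - 1 else start + k
        members = order[start:end]
        group_mean = sum(values[i] for i in members) / len(members)
        for i in members:
            result[i] = group_mean
    return result


print(microaggregate([30, 50, 46, 28, 60, 32], k=3))
# [30.0, 52.0, 52.0, 30.0, 52.0, 30.0]
```

Because each group is replaced by its own mean, the overall total (and therefore the overall average) is unchanged, while no published value belongs to a single individual.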
13
Microaggregation
The National Agricultural Statistics Service (NASS) publishes data about farms.
To protect against disclosure, data is released only at the county level.
Farms in each county are averaged together to preserve as much accuracy as possible while still protecting against disclosure.
14
Microaggregation
16
Data Perturbation
Perturbed data is raw data with noise added.
Pro: in a perturbed database, even if unauthorized data is accessed, the true values are not disclosed.
Con: data perturbation runs the risk of presenting biased data.
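A minimal sketch of data perturbation, assuming additive zero-mean Gaussian noise applied once to a numeric attribute before release. The helper `perturb_values`, the noise level, the seed, and the generated `raw` values are hypothetical; other noise distributions are also used in practice.

```python
import random
from statistics import mean, pvariance
from typing import List


def perturb_values(values: List[float], noise_sd: float, seed: int) -> List[float]:
    """Return a perturbed copy of `values`: each value plus zero-mean Gaussian noise.

    Perturbation happens once, before release; queries then run against this
    copy, so no query ever sees a true individual value.
    """
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_sd) for v in values]


raw = [100.0 + (i % 21) for i in range(10_000)]  # hypothetical raw values
noisy = perturb_values(raw, noise_sd=10.0, seed=42)

print(mean(raw), mean(noisy))            # the means stay close
print(pvariance(raw), pvariance(noisy))  # the variance grows by roughly noise_sd ** 2
```

Zero-mean noise leaves a simple average roughly unbiased, but the variance of the perturbed data is inflated by about the noise variance. That is one concrete form of the bias risk noted above.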
17
Data Perturbation
18
Output Perturbation
Instead of the raw data being transformed, as in data perturbation, only the output (the query results) is perturbed.
The bias problem is less severe than with data perturbation.
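A minimal sketch of output perturbation for a SUM query, assuming Gaussian noise is added to each released answer. The helper `perturbed_sum`, the noise level, and the sample records are hypothetical. Unlike the data perturbation case, the stored records stay exact.

```python
import random
from typing import Callable, Sequence

Record = dict


def perturbed_sum(
    records: Sequence[Record],
    predicate: Callable[[Record], bool],
    field: str,
    noise_sd: float,
    rng: random.Random,
) -> float:
    """Compute the exact SUM over the query set, then add noise to the answer only.

    The stored records are never modified; only the released result is perturbed.
    """
    true_sum = sum(rec[field] for rec in records if predicate(rec))
    return true_sum + rng.gauss(0.0, noise_sd)


patients = [
    {"diagnosis": "flu", "charge": 120.0},
    {"diagnosis": "flu", "charge": 180.0},
    {"diagnosis": "asthma", "charge": 300.0},
]
rng = random.Random(7)
answer = perturbed_sum(patients, lambda p: p["diagnosis"] == "flu", "charge", 5.0, rng)
print(answer)  # close to, but generally not exactly, the true sum of 300.0
```

If fresh noise is drawn every time, a user who repeats the same query and averages the answers can cancel much of the noise, so some designs make the noise for a given query reproducible.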
19
Output Perturbation
(Figure: Query Results)
20
Auditing
Auditing is the process of keeping track of all queries made by each user.
This is usually done with up-to-date logs.
Each time a user issues a query, the log is checked to determine whether the new query, combined with the user's earlier queries, could disclose confidential data.
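A minimal sketch of query auditing, assuming each query set is logged as a set of record IDs. The `QueryAuditor` class and its single compromise check are hypothetical and deliberately simple: it refuses a query whose query set differs from one the same user has already received by exactly one record, because subtracting the two SUM answers would reveal that record's value. A real auditor would check far more combinations of past queries.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List


class QueryAuditor:
    """Keeps an up-to-date log of the query sets each user has been answered on."""

    def __init__(self) -> None:
        self.log: Dict[str, List[FrozenSet[int]]] = defaultdict(list)

    def check_and_log(self, user: str, query_set: FrozenSet[int]) -> bool:
        """Return True (and log the query) if it is safe to answer, else False."""
        for previous in self.log[user]:
            # Two query sets differing by one record isolate that record.
            if len(previous ^ query_set) == 1:
                return False
        self.log[user].append(query_set)
        return True


auditor = QueryAuditor()
print(auditor.check_and_log("alice", frozenset({1, 2, 3, 4})))  # True
print(auditor.check_and_log("alice", frozenset({1, 2, 3})))     # False
print(auditor.check_and_log("bob", frozenset({1, 2, 3})))       # True
```

Because the log is kept per user, users who collude and combine their answers are not caught by this sketch.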
21
Random Sampling
Only a random sample of the records that satisfy the query is used to answer it.
Consistency must be maintained by giving exactly the same result to the same query.
Weakness: logically equivalent queries can select a different sample of records.
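A minimal sketch of consistent random sampling, assuming the sample is made reproducible by seeding the random generator from a hash of the query text. The helper `sample_query_set`, the sampling `rate`, and the record IDs are hypothetical. Because the seed depends on the query's wording, the sketch also reproduces the weakness listed above.

```python
import hashlib
import random
from typing import List, Sequence


def sample_query_set(query_text: str, matching_ids: Sequence[int], rate: float) -> List[int]:
    """Return a reproducible random sample of the record IDs that satisfy a query.

    The generator is seeded from a SHA-256 hash of the query text, so issuing the
    exact same query again selects exactly the same sample (and the same result).
    """
    digest = hashlib.sha256(query_text.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rid for rid in sorted(matching_ids) if rng.random() < rate]


ids_over_40 = list(range(100))  # hypothetical IDs of records with age > 40

a = sample_query_set("age > 40", ids_over_40, rate=0.5)
b = sample_query_set("age > 40", ids_over_40, rate=0.5)
c = sample_query_set("NOT (age <= 40)", ids_over_40, rate=0.5)  # logically equivalent
print(a == b)  # True: same query text, same sample
print(a == c)  # almost certainly False: equivalent query, different sample
```

An attacker who issues several equivalent phrasings of one query receives several different samples and can combine them, which is why this weakness matters.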
22
Comparison Methods
The following criteria are used to determine the most effective methods of statistical database security:
Security: possibility of exact disclosure, possibility of partial disclosure, robustness
Richness of Information: amount of non-confidential information eliminated, bias, precision, consistency
Costs: initial implementation cost, processing overhead per query, user education
23
A Comparison of Methods

| Method | Security | Richness of Information | Costs |
|---|---|---|---|
| Query-set Restriction | Low | Low¹ | Low |
| Microaggregation | Moderate | Moderate | Moderate |
| Data Perturbation | High | High-Moderate | Low |
| Output Perturbation | Moderate | Moderate-Low | Low |
| Auditing | Moderate-Low | Moderate | High |
| Sampling | Moderate | Moderate-Low | Moderate |

¹ Richness of information is low because a lot of information can be eliminated when a query does not meet the query-set size requirement.
24
Sources
This presentation is posted on http://www.cs.jmu.edu/users/aboutams
Adam, Nabil R.; Wortmann, John C. "Security-Control Methods for Statistical Databases: A Comparative Study." ACM Computing Surveys, Vol. 21, No. 4, December 1989. (http://delivery.acm.org/10.1145/80000/76895/p515-adam.pdf?key1=76895&key2=1947043301&coll=portal&dl=ACM&CFID=4702747&CFTOKEN=83773110)
Official HIPAA site. (http://cms.hhs.gov/hipaa/)
Bernstein, Stephen W. "Impact of HIPAA on BioTech/Pharma Research: Rules of the Road." (http://www.privacyassociation.org/docs/3-02bernstein.pdf)
Service Bureau. "3rd Party Testing." (http://hipaatesting.com/service_bureau.html)