

Statistical database security
Special purpose: used only for statistical computations.
General purpose: used for normal queries (and updates) as well as statistical ones.
Main problem: striking a compromise between the privacy needs of individuals and the right of organizations to know and process information. In practice this means preventing statistical inference.

Statistical database security
Issues:
– Characteristics of the statistical database (SDB) to be protected: is the database on-line (queries executed immediately) or off-line (queries executed later)? Is the SDB static (no updates) or dynamic?
– Additional knowledge of users: the more a user already knows, the easier it is for that user to perform inference.
– Types of attacks: the developer needs to know which types of inference attack potential snoopers will use.

Inference protection techniques
Conceptual techniques: definition of populations, partitioning.
Restriction-based techniques: restrict the type of queries that may be asked or the kind of result that may be obtained.
Perturbation-based techniques: distort the data so that statistical results remain approximately correct while any inferred individual data are incorrect.

Inference protection techniques
Conceptual techniques:
– The lattice model: a lattice can be built for combinations of conditions on attributes. The n-respondent k%-dominance criterion says that a statistic is sensitive if n or fewer records contribute more than k% of its total.
– Conceptual partitioning: populations are defined at a semantic level (e.g. male employees in a department).
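The dominance criterion can be illustrated with a minimal sketch. It assumes the criterion is read as "the n largest contributions make up more than k% of the statistic's total" and that all values are non-negative; the function name `is_sensitive` and the salary figures are hypothetical.

```python
def is_sensitive(values, n, k_percent):
    """Return True if the n largest values exceed k% of the total (assumed reading)."""
    total = sum(values)
    if total == 0:
        return False
    top_n = sum(sorted(values, reverse=True)[:n])
    return top_n > total * k_percent / 100


# Hypothetical query sets: in the first, one salary dominates the sum.
dominated = [250_000, 30_000, 28_000, 27_000]
balanced = [30_000, 28_000, 27_000, 29_000]

print(is_sensitive(dominated, n=1, k_percent=70))  # True
print(is_sensitive(balanced, n=1, k_percent=70))   # False
```

With n = 1 and k = 70, releasing the sum of the first set would let a snooper estimate the largest salary closely, which is why such a statistic is flagged.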

Inference protection techniques
Restriction-based techniques:
– Query-set size control: a statistical query q(C) is permitted only if its query set X(C) satisfies k ≤ |X(C)| ≤ N - k (N is the number of SDB records and k ≥ 0 is a fixed parameter). This prevents simple attacks based on very small or very large query sets. It does not prevent more sophisticated attacks using trackers, general trackers, double trackers and union trackers.
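A minimal sketch of query-set size control, together with a simple tracker-style inference that it fails to stop. The record layout (department, sex), the helper `restricted_count` and the value k = 2 are assumptions made for illustration only.

```python
def restricted_count(records, predicate, k):
    """Answer COUNT only if k <= |X(C)| <= N - k; return None when refused."""
    size = sum(1 for r in records if predicate(r))
    if k <= size <= len(records) - k:
        return size
    return None


# Hypothetical 10-record SDB of (department, sex) pairs.
records = (
    [("R&D", "F")]
    + [("R&D", "M")] * 3
    + [("Sales", "F")] * 3
    + [("Sales", "M")] * 3
)
k = 2

# The direct query isolates one person, so it is refused.
direct = restricted_count(records, lambda r: r == ("R&D", "F"), k)

# Two permitted queries whose difference reveals the refused answer.
in_rd = restricted_count(records, lambda r: r[0] == "R&D", k)
rd_not_f = restricted_count(records, lambda r: r[0] == "R&D" and r[1] != "F", k)
inferred = in_rd - rd_not_f

print(direct, in_rd, rd_not_f, inferred)
```

Both permitted queries have query sets inside [k, N - k], yet subtracting them yields the count for the one-record set that the control refused.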

Inference protection techniques
Restriction-based techniques:
– Expanded query-set size control: given a query q(A=a and B=b and … and C=c), there are 2^m implied query sets, where m is the number of conditions in the query: q(X_1 A=a and X_2 B=b and … and X_m C=c), where each X_i is either empty or "not". The query q is allowed only if the sizes of all 2^m implied query sets fall in the allowable range [k, N - k]. This technique becomes very expensive for large values of m.
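The 2^m implied query sets can be enumerated by negating every subset of the m conditions. The following is a sketch under the assumption that each condition is a Boolean predicate on a record; the function names and the sample data are hypothetical.

```python
from itertools import product


def implied_set_sizes(records, conditions):
    """Sizes of the 2**m query sets obtained by negating any subset of conditions."""
    sizes = []
    for negations in product([False, True], repeat=len(conditions)):
        size = sum(
            1
            for r in records
            # A condition must hold when not negated and fail when negated.
            if all(cond(r) != neg for cond, neg in zip(conditions, negations))
        )
        sizes.append(size)
    return sizes


def expanded_control_allows(records, conditions, k):
    """Permit the query only if every implied query set size lies in [k, N - k]."""
    n = len(records)
    return all(k <= s <= n - k for s in implied_set_sizes(records, conditions))


conditions = [lambda r: r[0] == "R&D", lambda r: r[1] == "F"]

# Hypothetical data: one (R&D, F) record makes an implied set too small.
skewed = [("R&D", "F")] + [("R&D", "M")] * 3 + [("Sales", "F")] * 3 + [("Sales", "M")] * 3
balanced = [("R&D", "F")] * 3 + [("R&D", "M")] * 3 + [("Sales", "F")] * 3 + [("Sales", "M")] * 3

print(implied_set_sizes(skewed, conditions))           # [1, 3, 3, 3]
print(expanded_control_allows(skewed, conditions, 2))    # False
print(expanded_control_allows(balanced, conditions, 2))  # True
```

The loop runs 2^m times and scans all records on each pass, which makes the cost for large m visible directly.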

Inference protection techniques
Restriction-based techniques:
– Query-set overlap control: limit the number of records that the query sets of successive queries have in common. Query q(C) is permitted only if |X(C) ∩ X(D)| ≤ r, with r > 0, for every earlier released query q(D). That is, the query set of q(C) shares at most r records with the query set of each earlier released query.
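A minimal sketch of overlap control, assuming that query sets are given as collections of record identifiers and that the system remembers every query set it has released. The class name `OverlapController` and the parameter `max_overlap` (the overlap bound) are hypothetical.

```python
class OverlapController:
    """Refuse a query whose query set overlaps an earlier one by too many records."""

    def __init__(self, max_overlap):
        self.max_overlap = max_overlap
        self.released = []  # query sets (frozensets of record ids) already answered

    def request(self, query_set_ids):
        candidate = frozenset(query_set_ids)
        for earlier in self.released:
            if len(candidate & earlier) > self.max_overlap:
                return False  # refused; nothing is recorded
        self.released.append(candidate)
        return True


controller = OverlapController(max_overlap=1)
print(controller.request({1, 2, 3}))  # True: nothing released yet
print(controller.request({3, 4, 5}))  # True: shares one record with the first set
print(controller.request({1, 2, 6}))  # False: shares two records with the first set
print(controller.request({6, 7}))     # True: the refused set was not stored
```

The history of released query sets grows with every answered query, so each new request is compared against all earlier ones.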

Inference protection techniques
Restriction-based techniques:
– Audit-based controls: while query-set overlap control may not be very effective at preventing inference, it is possible to detect attempts at such inference by observing audit trails of successive queries (by the same user or by a group).
– Techniques based on number of attributes: the DBA determines that statistical queries involving more than d attributes are not permissible.

Inference protection techniques
Restriction-based techniques:
– Partitioning: the population is divided into small disjoint subgroups (a subgroup with a population of 1 is not allowed). Queries are allowed only on such groups, which forbids queries on arbitrary sets.
– Cell suppression: as with partitioning, but all "cells" that satisfy the n-respondent k%-dominance rule are considered sensitive and cannot be examined.
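Cell suppression can be sketched as a table of per-cell totals in which sensitive cells are blanked out. This sketch assumes the dominance rule is read as "the n largest values exceed k% of the cell total", assumes non-negative values, and makes no attempt to suppress further cells from which a blanked value could be recomputed. The function name and data are hypothetical.

```python
from collections import defaultdict


def cell_totals(records, n, k_percent):
    """Return {cell: total}, with None for cells that the dominance rule marks sensitive."""
    cells = defaultdict(list)
    for cell, value in records:
        cells[cell].append(value)

    table = {}
    for cell, values in cells.items():
        total = sum(values)
        top_n = sum(sorted(values, reverse=True)[:n])
        dominated = total > 0 and top_n > total * k_percent / 100
        table[cell] = None if dominated else total
    return table


# Hypothetical (department, salary) records.
records = [
    ("R&D", 250_000), ("R&D", 30_000), ("R&D", 28_000),
    ("Sales", 30_000), ("Sales", 28_000), ("Sales", 27_000), ("Sales", 29_000),
]
print(cell_totals(records, n=1, k_percent=70))
# {'R&D': None, 'Sales': 114000}
```

Under this reading a cell holding a single non-zero record is always suppressed, which matches the partitioning rule that a population of 1 is not allowed.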

Inference protection techniques
Perturbation-based techniques:
– Record-based perturbation: the records in the database are distorted before the statistics are computed.
– Result-based perturbation: the correct result is distorted before it is released.
– The difference between the true value and the released value of a statistic is called bias.
– Perturbed statistics must be consistent, i.e. free of paradoxes. Whatever the bias, the results should be "possible".
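The two styles of perturbation, and the bias each one introduces, can be compared in a short sketch. Uniform noise, the `spread` parameter and the salary values are assumptions chosen for illustration; the slides do not prescribe a particular noise distribution.

```python
import random


def mean(values):
    return sum(values) / len(values)


def record_based_mean(values, spread, rng):
    """Distort every record first, then compute the statistic on the distorted data."""
    distorted = [v + rng.uniform(-spread, spread) for v in values]
    return mean(distorted)


def result_based_mean(values, spread, rng):
    """Compute the correct statistic first, then distort the result before release."""
    return mean(values) + rng.uniform(-spread, spread)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
salaries = [30_000, 28_000, 27_000, 29_000]
true_value = mean(salaries)

released_record = record_based_mean(salaries, 500, rng)
released_result = result_based_mean(salaries, 500, rng)

# Bias: difference between the released value and the true value.
print("bias (record-based):", released_record - true_value)
print("bias (result-based):", released_result - true_value)
```

In both cases the magnitude of the bias is bounded by `spread`, because the noise added to a mean is either a single draw or an average of draws from the same interval.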

Inference protection techniques
Perturbation-based techniques:
– Data swapping: attribute values are exchanged between the records of the original SDB in such a way that the resulting modified SDB has no records in common with the original SDB.
– Random-sample queries: the actual query set is replaced by a randomly sampled query set. This only works if the query sets are large enough; otherwise attacks based on small query sets become possible.
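Both techniques can be sketched in a few lines. The swap shown here is a simple rotation of one attribute, which is only one possible way to exchange values, and the sampling probability `p` and threshold `min_size` are hypothetical parameters; neither is taken from the slides.

```python
import random


def swap_column(records, column):
    """Rotate one attribute by one position so each record receives a neighbour's value."""
    values = [r[column] for r in records]
    rotated = values[1:] + values[:1]
    return [{**r, column: v} for r, v in zip(records, rotated)]


def random_sample_mean(values, p, rng, min_size):
    """Mean over a random sample of the query set; refuse (None) if the set is too small."""
    if len(values) < min_size:
        return None
    sample = [v for v in values if rng.random() < p]
    if not sample:
        return None
    return sum(sample) / len(sample)


# Hypothetical records with distinct salaries.
original = [
    {"sex": "F", "salary": 30_000},
    {"sex": "M", "salary": 28_000},
    {"sex": "F", "salary": 27_000},
    {"sex": "M", "salary": 29_000},
]
swapped = swap_column(original, "salary")

# No record survives unchanged, yet the salary total (and mean) is preserved.
print(all(s != o for s, o in zip(swapped, original)))
print(sum(r["salary"] for r in swapped) == sum(r["salary"] for r in original))

rng = random.Random(0)
print(random_sample_mean(list(range(100)), p=0.8, rng=rng, min_size=20))
print(random_sample_mean([1, 2, 3], p=0.8, rng=rng, min_size=20))  # None: set too small
```

The rotation leaves no record unchanged only when neighbouring values differ, as they do in this sample. The swap preserves statistics on the swapped attribute alone, while statistics that combine it with other attributes (for example mean salary by sex) may change.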

Inference protection techniques
Perturbation-based techniques:
– Fixed perturbation: the values of the attributes used in the computation of statistics are modified in a fixed way that does not vary from query to query. Because the perturbation is fixed, repeating a query cannot be used to improve the estimates.
– Query-based perturbation: the perturbation is different for different queries.
– Rounding: the result of a statistical query is rounded before being released. Rounding may be systematic, random or controlled.
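Systematic and random rounding can be sketched as follows, assuming the usual reading: systematic rounding moves the result to the nearest multiple of a base, and random rounding moves it up or down to an adjacent multiple with probabilities chosen so that the expected released value equals the true value. Controlled rounding, which additionally keeps rounded values consistent with rounded totals, is not shown. The function names and the base of 5 are hypothetical.

```python
import math
import random


def systematic_round(x, base):
    """Round x to the nearest multiple of base (halves round up)."""
    return base * math.floor(x / base + 0.5)


def random_round(x, base, rng):
    """Round x up or down to a multiple of base, unbiased in expectation."""
    lower = base * math.floor(x / base)
    remainder = x - lower
    if remainder == 0:
        return lower
    # Round up with probability remainder / base, otherwise round down.
    return lower + base if rng.random() < remainder / base else lower


rng = random.Random(1)  # fixed seed so the sketch is repeatable
print(systematic_round(37, 5))  # 35
print(systematic_round(38, 5))  # 40
print(random_round(37, 5, rng))  # either 35 or 40
```

For a true count of 37 and base 5, random rounding releases 40 with probability 2/5 and 35 with probability 3/5, so the expected released value is 37.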