Week 9 - Friday.  What did we talk about last time?  Database security requirements  Database reliability and integrity  Sensitive data.



Ted Bushong

 In a direct attack on sensitive information, a user will try to determine the values of a sensitive field by finding the right query
 Consider the following table of students:

 Name     Sex  Race  Aid   Fines  Drugs  Dorm
 Adams    M    C                         Holmes
 Bailey   M    B     0     0      0      Grey
 Chin     F    A                         West
 Dewitt   M    B                         Grey
 Earhart  F    C                         Holmes
 Fein     F    C                         West
 Groff    M    C     4000  0      3      West
 Hill     F    B                         Holmes
 Koch     F    C     0     0      1      West
 Liu      F    A     0     10     2      Grey
 Majors   M    C     2000  0      2      Grey

 SELECT NAME FROM STUDENTS WHERE SEX='M' AND DRUGS=1
 This query might be rejected because it asks for a specific value of the sensitive field Drugs
 SELECT NAME FROM STUDENTS WHERE (SEX='M' AND DRUGS=1) OR (SEX<>'M' AND SEX<>'F') OR (DORM='AYRES')
 This query might be accepted by some systems because it appears to mask the sensitive information by including other information
 However, the additional OR clauses add no records: no student has a sex other than M or F, and no student lives in Ayres, so the result is the same as for the first query
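
The point about the padded query can be checked with a minimal sketch. It assumes an in-memory SQLite table with the lecture's column names; the rows and names are made up for illustration and are not the table from the slide.

```python
import sqlite3

# Hypothetical rows (name, sex, drugs, dorm); values are invented for this sketch.
ROWS = [
    ("Avery", "M", 1, "Holmes"),
    ("Blake", "M", 0, "Grey"),
    ("Casey", "F", 1, "West"),
    ("Drew", "M", 1, "West"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, sex TEXT, drugs INTEGER, dorm TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?)", ROWS)

# The direct query that a DBMS might reject.
direct = conn.execute(
    "SELECT name FROM students WHERE sex = 'M' AND drugs = 1 ORDER BY name"
).fetchall()

# The padded query: the two extra OR clauses can never match a row here.
padded = conn.execute(
    "SELECT name FROM students "
    "WHERE (sex = 'M' AND drugs = 1) "
    "OR (sex <> 'M' AND sex <> 'F') "
    "OR (dorm = 'Ayres') "
    "ORDER BY name"
).fetchall()

print(direct == padded)  # both queries return the same names
```

A filter that only looks at the shape of the WHERE clause would treat these as different queries even though they return the same records.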

 To avoid leaking sensitive data, some DBMSs allow only statistics to be reported  Each of the following statistics can be attacked in different ways:  Sum  Count  Mean  Median

 A single carefully chosen query can leak information  The sum of financial aid broken down by gender and dorm reveals that no female student in Grey receives financial aid  If we know that Liu is a female student in Grey, we know she gets no aid

 Count statistics are often released in addition to sum  Together these two allow averages to be computed  Alternatively, if count and average are released, sums could be computed  A count of students by sex and dorm reveals that there is 1 male in Holmes and 1 male in West  We can get their names because that is unprotected data  Using the previous sums of financial aid, we can determine exactly how much they get
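
The combination of sum and count can be sketched as follows. The student list is hypothetical (invented names and amounts), and the grouping by sex and dorm mirrors the breakdown described in the slides.

```python
from collections import defaultdict

# Hypothetical (name, sex, dorm, aid) rows; not the lecture's data.
STUDENTS = [
    ("Avery", "M", "Holmes", 5000),
    ("Blake", "M", "Grey", 0),
    ("Casey", "F", "West", 3000),
    ("Drew", "M", "Grey", 1000),
    ("Ellis", "F", "Grey", 0),
    ("Finley", "F", "West", 1000),
]

# The two released statistics: sum of aid and count of students per (sex, dorm).
sums = defaultdict(int)
counts = defaultdict(int)
for name, sex, dorm, aid in STUDENTS:
    sums[(sex, dorm)] += aid
    counts[(sex, dorm)] += 1

# Any cell with a count of 1 exposes one person's exact aid.
exposed = {cell: sums[cell] for cell in counts if counts[cell] == 1}

# Names are unprotected, so each exposed cell can be tied to a person.
names = {
    cell: [n for n, s, d, _ in STUDENTS if (s, d) == cell]
    for cell in exposed
}

for cell, aid in exposed.items():
    print(names[cell][0], "receives", aid)
```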

 If you are able to get means for slightly different sets, you can determine the values for the difference of those sets  Example (with made-up numbers)  Average salary for an E-town employee: $47,600  Average salary for all E-town employees except for the president: $47,000  Given that there are 500 employees at E-town, how much does Dr. Strikwerda make?
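
The E-town example works out as a difference of totals: the sum over all n employees minus the sum over the other n - 1. A short sketch using the made-up numbers from the slide (the function name is illustrative):

```python
def value_from_means(mean_all, mean_rest, n):
    """Recover one person's value from the mean of all n and the mean of the other n - 1."""
    return n * mean_all - (n - 1) * mean_rest

president_salary = value_from_means(47_600, 47_000, 500)
print(president_salary)  # 347000
```

With 500 employees, the total is 500 x 47,600 = 23,800,000 and the total without the president is 499 x 47,000 = 23,453,000, so the difference is $347,000.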

 We can use the median values of lists to identify the individual who has that median value  Some extra information might be needed  If someone knows that Majors is the only male whose drug-use score is 2, they can find the median of the financial aid value for males and the median of the financial aid value for people whose drug-use score is 2  If the two values match, that number is Majors’s aid, with high probability
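
A minimal sketch of the median attack, assuming a hypothetical list of students in which exactly one person ("Morgan") is both male and has a drug-use score of 2. All names and values are invented for illustration.

```python
from statistics import median

# Hypothetical (name, sex, drugs, aid) rows.
STUDENTS = [
    ("Avery", "M", 1, 5000),
    ("Blake", "M", 0, 0),
    ("Drew", "M", 3, 1000),
    ("Gray", "M", 3, 4000),
    ("Morgan", "M", 2, 2000),
    ("Harper", "F", 2, 5000),
    ("Lee", "F", 2, 0),
]

# Two statistical queries that each look harmless on their own.
median_males = median(aid for _, sex, _, aid in STUDENTS if sex == "M")
median_drugs2 = median(aid for _, _, drugs, aid in STUDENTS if drugs == 2)

# Morgan is the only record in both sets, so matching medians point at Morgan's aid.
if median_males == median_drugs2:
    print("Likely aid for the person in both sets:", median_males)
```

The inference is probabilistic: two different people could happen to hold the same median value, which is why the slide says "with high probability".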

 A DBMS may refuse to give back a result that has only a single record, such as:
 SELECT COUNT(*) FROM STUDENTS WHERE SEX='F' AND RACE='C' AND DORM='HOLMES'
 However, we can use the laws of set theory to defeat the DBMS
 $|A \cap B \cap C| = |A| - |A \cap (B \cap C)^c|$, where A, B, and C are the sets of records with SEX='F', RACE='C', and DORM='HOLMES'
 Thus, we find:
 SELECT COUNT(*) FROM STUDENTS WHERE SEX='F'
 SELECT COUNT(*) FROM STUDENTS WHERE SEX='F' AND (RACE<>'C' OR DORM<>'HOLMES')
 Then, we subtract the second count from the first
 This attack can be extended to a system of linear equations to solve for a particular set of values
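
The subtraction trick can be sketched against a toy guard. The `guarded_count` helper and the rows are hypothetical; the guard simply refuses any count equal to 1, which is one possible reading of the refusal rule.

```python
import sqlite3

# Hypothetical (name, sex, race, dorm) rows.
ROWS = [
    ("Avery", "F", "C", "Holmes"),
    ("Blake", "F", "A", "West"),
    ("Casey", "F", "C", "West"),
    ("Drew", "F", "B", "Holmes"),
    ("Ellis", "M", "C", "Holmes"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, sex TEXT, race TEXT, dorm TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?)", ROWS)

def guarded_count(where):
    """Return the count, or None if the result covers exactly one record.

    The WHERE clause is interpolated directly; acceptable only in a demo.
    """
    n = conn.execute(f"SELECT COUNT(*) FROM students WHERE {where}").fetchone()[0]
    return None if n == 1 else n

# The direct query is refused.
refused = guarded_count("sex = 'F' AND race = 'C' AND dorm = 'Holmes'")

# Two queries that are each allowed: |A| and |A intersect (B intersect C)^c|.
all_f = guarded_count("sex = 'F'")
rest_f = guarded_count("sex = 'F' AND (race <> 'C' OR dorm <> 'Holmes')")

inferred = all_f - rest_f
print(refused, all_f, rest_f, inferred)
```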

 Suppression means that sensitive values are not reported  Concealing means that values close to the sensitive values are returned, but not the values themselves  This represents the tension between security and precision

 The n items over k percent rule means that a statistic should not be reported if n items account for more than k percent of it  This is a standard rule for suppression  However, if counts are supplied, additional values may need to be hidden so that the hidden values can’t be computed  It’s like Sudoku!
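
One way to read the rule as a check: withhold a reported sum when its n largest contributors make up more than k percent of it. The function below is a sketch under that assumption; the name `dominates` and the sample values are illustrative.

```python
def dominates(values, n, k):
    """Return True if the n largest values exceed k percent of the total."""
    total = sum(values)
    if total == 0:
        return False  # nothing to dominate in an all-zero cell
    top_n = sum(sorted(values, reverse=True)[:n])
    return top_n * 100 > k * total

# One large value dominates the cell, so the sum should be suppressed.
print(dominates([4000, 100, 100], n=1, k=80))

# Four equal values: no single item dominates, so the sum can be reported.
print(dominates([1000, 1000, 1000, 1000], n=1, k=80))
```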

 Instead of failing to report values, we can combine ranges  Though drug use values of 0, 1, 2, and 3 would allow us to recover individual names, we can report the category of 0 – 1 and 2 – 3  Numerical values such as money can be reported in ranges as well  Finally, values could be rounded (note that this is essentially the same as using ranges)
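
Both ideas reduce to mapping an exact value onto a coarser one. A small sketch, with hypothetical helper names and bucket widths chosen for illustration:

```python
def to_range(value, width):
    """Map an integer value to the inclusive range of the given width that contains it."""
    low = (value // width) * width
    return (low, low + width - 1)

def round_to(value, step):
    """Round a value to the nearest multiple of step."""
    return round(value / step) * step

# Drug-use scores 0..3 collapse into the two categories 0-1 and 2-3.
print([to_range(score, 2) for score in (0, 1, 2, 3)])

# Money reported to the nearest thousand.
print(round_to(2350, 1000), round_to(4600, 1000))
```

Rounding to the nearest thousand tells the reader the value lies in a band of width 1000, which is why it behaves like a range.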

 In random sample control, results are given from a random sample of the database, not the whole thing  Useless for count, but not bad for mean  To prevent users from getting more information from repeated queries, the random sample for a given query should always be the same  Random data perturbation adds small amounts of error to each value  Because the errors are positive or negative, means and sums will be only slightly affected
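
A sketch of both controls. It assumes the sample is chosen by seeding a random generator from a hash of the query text, which is one simple way to make repeated queries see the same sample; the function names and parameters are illustrative.

```python
import hashlib
import random
from statistics import mean

def sample_for_query(query, values, fraction=0.5):
    """Return a random sample that is always the same for the same query text."""
    digest = hashlib.sha256(query.encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    k = max(1, int(len(values) * fraction))
    return rng.sample(values, k)

def perturbed(values, spread, seed=0):
    """Add a small positive or negative error to each value."""
    rng = random.Random(seed)
    return [v + rng.uniform(-spread, spread) for v in values]

aid = [5000, 0, 3000, 1000, 2000, 1000, 4000, 5000, 0, 0]

# Repeating the query gives the same sample, so averaging over repeats gains nothing.
first = sample_for_query("MEAN(aid)", aid)
second = sample_for_query("MEAN(aid)", aid)
print(first == second)

# Individual values change, but the mean moves by at most the spread.
noisy = perturbed(aid, spread=50)
print(mean(aid), mean(noisy))
```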

 Suppress obviously sensitive information  Easy, but incomplete  Track what the user knows  Expensive in terms of computation and storage requirements  Analysis may be difficult  Multiple users can conspire together  Disguise the data  Data is hidden  Users who are not trying to get sensitive data get slightly wrong answers

 A single element may have different security from other values in the record or other elements in the same attribute  Sensitive and non-sensitive might not be enough categories to capture all possible confidentiality relationships  The security of an aggregate (sums, counts, etc.) may be different from the security of its parts  We want to add different levels of security, similar to Bell-LaPadula

 Integrity is difficult, but we can assign levels of trust  It will necessarily be less rigorous than Biba  Confidentiality  Difficult and causes redundancies since top secret information cannot be visible in any way to low clearance users  Worse, we don’t want to leak any information by preventing a record from being added with a particular primary key (because there is a hidden record that already has that primary key)  Polyinstantiation means that records with similar or identical primary keys (but different data) can exist at different security levels
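
Polyinstantiation can be sketched by making the security level part of the storage key. Everything here is an assumption for illustration: the three level names, the in-memory dictionary, and the choice that a reader sees the highest-level version they are cleared for.

```python
LEVELS = {"unclassified": 0, "secret": 1, "top secret": 2}

# Storage keyed by (primary key, level), so the same primary key can exist per level.
table = {}

def insert(key, level, data):
    """Always succeeds, so a low-level user never learns a hidden record exists."""
    table[(key, level)] = data

def read(key, clearance):
    """Return the highest-level version of the record visible at this clearance."""
    visible = [
        (LEVELS[level], data)
        for (k, level), data in table.items()
        if k == key and LEVELS[level] <= LEVELS[clearance]
    ]
    if not visible:
        return None
    return max(visible, key=lambda pair: pair[0])[1]

insert(7, "top secret", "field agent")
insert(7, "unclassified", "records clerk")  # same primary key, no error

print(read(7, "unclassified"))
print(read(7, "top secret"))
```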

 Partitioning  Each security level is a separate database  Simple, but introduces redundancy and makes access harder  Encryption  A single key for a security level isn’t good enough: a break in the key reveals everything  A key for every record (or field) gives good security, but the time and space overhead is huge

 An integrity lock database has a security label for each item, which is:  Unforgeable  Unique  Concealed (even the sensitivity level is unknown)  If these labels are preserved on the data storage side, an untrusted front-end can be used as long as it has good authentication  Storage requirements are heavy
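
A rough sketch of the unforgeable and unique properties, assuming the label is protected by a keyed checksum (HMAC) over the value, its sensitivity level, and its location in the database. The key, field names, and message layout are hypothetical, and concealing the level (for example by encrypting it) is not shown.

```python
import hashlib
import hmac

# Hypothetical key held only by the trusted component that creates and checks locks.
SECRET_KEY = b"demo-key-for-illustration-only"

def make_lock(record_id, field, value, level):
    """Compute a checksum binding a value to its sensitivity level and location."""
    message = f"{record_id}|{field}|{value}|{level}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def verify(record_id, field, value, level, lock):
    """Check that the value, level, and location still match the stored lock."""
    return hmac.compare_digest(make_lock(record_id, field, value, level), lock)

lock = make_lock(42, "aid", 4000, "secret")

print(verify(42, "aid", 4000, "secret", lock))        # untouched item
print(verify(42, "aid", 4000, "unclassified", lock))  # level was downgraded
print(verify(43, "aid", 4000, "secret", lock))        # lock copied to another record
```

Including the record and field in the checksum is what makes each lock unique: a valid lock cannot be moved to a different item. Storing a checksum with every item is also where the heavy storage cost comes from.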

 A trusted front-end is also known as a guard  The idea isn’t that different from an integrity lock database with an untrusted front end  It is trying to leverage DBMS tools that are familiar to most DBAs  The front-end can be configured instead of interacting with database internals  The system can be inefficient because a lot of data can be retrieved and then discarded by the front end if it isn’t at the right level
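
The filtering role of the guard, and the inefficiency it can cause, can be sketched in a few lines. The level names and rows are hypothetical; the back-end is assumed to return every matching row regardless of level.

```python
LEVELS = {"unclassified": 0, "secret": 1, "top secret": 2}

def guard(rows, clearance):
    """Return the rows at or below the user's clearance and the number discarded."""
    allowed = [r for r in rows if LEVELS[r["level"]] <= LEVELS[clearance]]
    return allowed, len(rows) - len(allowed)

# Rows as they might come back from the untrusted DBMS.
ROWS = [
    {"id": 1, "level": "unclassified", "data": "cafeteria menu"},
    {"id": 2, "level": "secret", "data": "patrol schedule"},
    {"id": 3, "level": "top secret", "data": "agent list"},
]

visible, discarded = guard(ROWS, "secret")
print([r["id"] for r in visible], discarded)
```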

 What do Walmart, hurricanes, and Pop-Tarts have to do with one another?  A 2004 NY Times article says that Walmart's analysis shows the demand for strawberry Pop-Tarts goes up by a factor of 7 before a hurricane makes landfall  But the top selling item is beer

 Data mining means looking for patterns in massive amounts of data  These days, governments and companies collect huge amounts of data  No human being could sift through it all  We have to write computer programs to analyze it  It is sort of a buzzword, and people argue about whether some of these activities should simply be called data analysis or analytics

 We have huge databases (terabytes or petabytes)  Who is going to look through all that?  Machines of course  Data mining is a broad term covering all kinds of statistical, machine learning, and pattern matching techniques  Relationships discovered by data mining are probabilistic  No cause-effect relationship is implied

 It is a form of machine learning or artificial intelligence  In general, the main techniques are:  Cluster analysis: Find a group of records that are probably related ▪ Like using cell phone records to find a group of drug dealers  Anomaly detection: Find an unusual record ▪ Maybe someone who fits the profile of a serial killer  Association rule mining: Find dependencies ▪ If people buy gin, they are also likely to buy tonic
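
The gin and tonic example can be made concrete with the two usual measures for an association rule, support and confidence. The baskets below are invented for illustration.

```python
# Hypothetical shopping baskets.
BASKETS = [
    {"gin", "tonic", "limes"},
    {"gin", "tonic"},
    {"gin", "bread"},
    {"tonic", "chips"},
    {"bread", "milk"},
]

def support(items):
    """Fraction of baskets that contain every item in the set."""
    return sum(items <= basket for basket in BASKETS) / len(BASKETS)

def confidence(lhs, rhs):
    """Of the baskets containing lhs, the fraction that also contain rhs."""
    return support(lhs | rhs) / support(lhs)

gin_to_tonic = confidence({"gin"}, {"tonic"})
print(support({"gin", "tonic"}), gin_to_tonic)
```

Here two of the three gin baskets also contain tonic, so the rule "gin implies tonic" has a confidence of about 0.67. As with the other techniques, this is a likelihood, not a cause-effect claim.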

 Social media providers have access to lots of data  Facebook alone has details about over a billion people  Can they find hidden patterns about your life?  Should they inform the police if they think they can reliably predict crime?  What about data the government has?  For research purposes, some sets of "anonymized" data are made public  But researchers often find that the people involved can be identified anyway

 Privacy issues are complex  Sharing data can allow relationships to become evident  These relationships might be sensitive  Integrity  Because data mining can pull data from many sources, mistakes can propagate  Even if the results are fixed, there is no easy way to correct the source databases  Data mining can have false positives and false negatives

 Network basics  Graham Welsh presents

 Read Sections 7.1 and 7.2  Finish Project 2  Due tonight by midnight  Assignment 4 has been assigned