21-1 Last time Database Security  Data Inference  Statistical Inference  Controls against Inference Multilevel Security Databases  Separation  Integrity.

Slides:



Advertisements
Similar presentations
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 1.1 Chapter Five Data Collection and Sampling.
Advertisements

Agenda Problem Existing Approaches The e-Lab Is DRM the solution?
Ragib Hasan Johns Hopkins University en Spring 2011 Lecture 8 04/04/2011 Security and Privacy in Cloud Computing.
1 Privacy Prof. Ravi Sandhu Executive Director and Endowed Chair March 8, © Ravi Sandhu World-Leading Research.
Access Control Methodologies
Chap 1: Overview Concepts of CIA: confidentiality, integrity, and availability Confidentiality: concealment of information –The need arises from sensitive.
Greg Lamb. Introduction It is clear that we as consumers and entrepreneurs cannot expect complete privacy when discussing business matters. However… There.
UTEPComputer Science Dept.1 University of Texas at El Paso Privacy in Statistical Databases Dr. Luc Longpré Computer Science Department Spring 2006.
Information Security Principles & Applications
 Guarantee that EK is safe  Yes because it is stored in and used by hw only  No because it can be obtained if someone has physical access but this can.
Do You Trust Your Recommender? An Exploration of Privacy and Trust in Recommender Systems Dan Frankowski, Dan Cosley, Shilad Sen, Tony Lam, Loren Terveen,
Data Security against Knowledge Loss *) by Zbigniew W. Ras University of North Carolina, Charlotte, USA.
C MU U sable P rivacy and S ecurity Laboratory 1 Privacy Policy, Law and Technology Data Privacy October 30, 2008.
Security in Databases. 2 Srini & Nandita (CSE2500)DB Security Outline review of databases reliability & integrity protection of sensitive data protection.
Privacy Policy, Law and Technology Carnegie Mellon University Fall 2007 Lorrie Cranor 1 Data Privacy.
Freenet A Distributed Anonymous Information Storage and Retrieval System I Clarke O Sandberg I Clarke O Sandberg B WileyT W Hong.
Security in Databases. 2 Outline review of databases reliability & integrity protection of sensitive data protection against inference multi-level security.
1 Information and Data Privacy: An Indian Perspective  Why is this important? Public concern about privacy.  Considerable concern in developed countries.
Hippocratic Databases Paper by Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, Yirong Xu CS 681 Presented by Xi Hua March 1st,Spring05.
Last time Finish OTR Database Security Introduction to Databases
Basic Concept of Data Coding Codes, Variables, and File Structures.
SE571 Security in Computing
1st MODINIS workshop Identity management in eGovernment Frank Robben General manager Crossroads Bank for Social Security Strategic advisor Federal Public.
Signatures As Threats to Privacy Brian Neil Levine Assistant Professor Dept. of Computer Science UMass Amherst.
Chapter 6 – Database Security  Integrity for databases: record integrity, data correctness, update integrity  Security for databases: access control,
Relational Database Concepts. Let’s start with a simple example of a database application Assume that you want to keep track of your clients’ names, addresses,
Ragib Hasan University of Alabama at Birmingham CS 491/691/791 Fall 2011 Lecture 16 10/11/2011 Security and Privacy in Cloud Computing.
Sensitive Data  Data that should not be made public  What if some but not all of the elements of a DB are sensitive Inherently sensitiveInherently sensitive.
Artificial, Composite and Secondary UIDs
Introduction to: 1.  Goal[DEN83]:  Provide frequency, average, other statistics of persons  Challenge:  Preserving privacy[DEN83]  Interaction between.
Ragib Hasan University of Alabama at Birmingham CS 491/691/791 Fall 2011 Lecture 11 09/27/2011 Security and Privacy in Cloud Computing.
Privacy in computing Material/text on the slides from Chapter 10 Textbook: Pfleeger.
Copyright: ©2005 by Elsevier Inc. All rights reserved. 1 Chapter - 2 Basics of Sound Structure Author: Graeme C. Simsion and Graham C. Witt.
Secure Sensor Data/Information Management and Mining Bhavani Thuraisingham The University of Texas at Dallas October 2005.
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Chapter No 4 Query optimization and Data Integrity & Security.
Database Security Outline.. Introduction Security requirement Reliability and Integrity Sensitive data Inference Multilevel databases Multilevel security.
Databases Unit 3_6. Flat File Databases One table containing data Data must be entered as a whole each time e.g. customer name and address each time (data.
Prepared By Prepared By : VINAY ALEXANDER ( विनय अलेक्सजेंड़र ) PGT(CS),KV JHAGRAKHAND.
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 1.1 Chapter Five Data Collection and Sampling.
Chapter Five Data Collection and Sampling Sir Naseer Shahzada.
M1G Introduction to Database Development 4. Improving the database design.
ITGS Databases.
The world’s libraries. Connected. Theoretical Research about Privacy Georgia State University Reporter: Zaobo He Seminar in 10 / 07 / 2015.
Programming Logic and Design Fourth Edition, Comprehensive Chapter 16 Using Relational Databases.
Creating Open Data whilst maintaining confidentiality Philip Lowthian, Caroline Tudor Office for National Statistics 1.
Finding a PersonBOS Finding a Person! Building an algorithm to search for existing people in a system Rahn Lieberman Manager Emdeon Corp (Emdeon.com)
Copyright © 2015 by Saunders, an imprint of Elsevier Inc. All rights reserved. Chapter 3 Privacy, Confidentiality, and Security.
Anonymity and Privacy Issues --- re-identification
Bloom Cookies: Web Search Personalization without User Tracking Authors: Nitesh Mor, Oriana Riva, Suman Nath, and John Kubiatowicz Presented by Ben Summers.
Organization of statistical investigation. Medical Statistics Commonly the word statistics means the arranging of data into charts, tables, and graphs.
CSCI 347, Data Mining Data Anonymization.
1 Privacy Preserving Data Mining Introduction August 2 nd, 2013 Shaibal Chakrabarty.
Inference Problem Privacy Preserving Data Mining.
Security Methods for Statistical Databases. Introduction  Statistical Databases containing medical information are often used for research  Some of.
CS570: Data Mining Spring 2010, TT 1 – 2:15pm Li Xiong.
Unlinking Private Data
HCI problems in computer security Mark Ryan. Electronic voting.
Expanding the Role of Synthetic Data at the U.S. Census Bureau 59 th ISI World Statistics Congress August 28 th, 2013 By Ron S. Jarmin U.S. Census Bureau.
Database and Cloud Security
University of Texas at El Paso
ACHIEVING k-ANONYMITY PRIVACY PROTECTION USING GENERALIZATION AND SUPPRESSION International Journal on Uncertainty, Fuzziness and Knowledge-based Systems,
Side-Channel Attack on Encrypted Traffic
Executive Director and Endowed Chair
Executive Director and Endowed Chair
By (Group 17) Mahesha Yelluru Rao Surabhee Sinha Deep Vakharia
Lecture 27: Privacy CS /7/2018.
Presented by : SaiVenkatanikhil Nimmagadda
Mapping an ERD to a Relational Database
Security in Computing, Fifth Edition
Presentation transcript:

21-1 Last time Database Security  Data Inference  Statistical Inference  Controls against Inference Multilevel Security Databases  Separation  Integrity Locks  Designs of MLS Databases

21-2 This time Data Mining  Integrity and Availability  Privacy and Data Mining  Privacy-Preserving Data Mining

21-3 Data Mining Multilevel databases weren’t a commercial success  Mainly military clients, finding all possible inferences is NP-complete However, the combination of (sensitive) information, stored in multiple (maybe huge) databases, as done for data mining, raises similar concerns and has gotten lots of attention recently So far, a single entity has been in control of some data  Knows what kind of data is available  Who has accessed it (ignoring side channels)‏ No longer the case in data mining, data miners actively gather additional data from third parties

21-4 Data Mining (cont.)‏ Data mining tries to automatically find interesting patterns in data using a plethora of technologies  Statistics, machine learning, pattern recognition,…  Still need human to judge whether pattern makes sense (causality vs. coincidence)‏ Data mining can be useful for security purposes  Learning information about an intrusion from logs

21-5 Security Problems of Data Mining Confidentiality  Derivation of sensitive information Integrity  Mistakes in data Availability  (In)compatibility of different databases

21-6 Confidentiality Data mining can reveal sensitive information about humans (see later) and companies In 2000, the U.S. National Highway Traffic Safety Administration combined data about Ford vehicles with data about Firestone tires and become aware of a problem with the Ford Explorer and its Firestone tires  Problem started to occur in 1995, and each company individually had some evidence of the problem  However, data about product quality is sensitive, which makes sharing it with other companies difficult Supermarket can use loyalty cards to learn who buys what kind of products and sell this data, maybe to manufacturers’ competitors

21-7 Data Correctness and Integrity Data in a database might be wrong  E.g., input or translations errors Mistakes in data can lead to wrong conclusions by data miners, which can negatively impact individuals  From receiving irrelevant mail to being denied to fly Privacy calls for the right of individuals to correct mistakes in stored data about them  However, this is difficult if data is shared widely or if there is no formal procedure for making corrections In addition to false positives, there can also be false negatives: don’t blindly trust data mining applications

21-8 Availability Mined databases are often created by different organizations  Different primary keys, different attribute semantics,… Is attribute “name” last name, first name, or both? US or Canadian dollars? Makes combination of databases difficult Must distinguish between inability to combine data and inability to find correlation

21-9 Privacy and Data Mining Data mining might reveal sensitive information about individuals, based on the aggregation and inference techniques discussed earlier Avoiding these privacy violations is active research Data collection and mining is done by private companies  Privacy laws (e.g., Canada’s PIPEDA or U.S.’ HIPAA) control collection, use, and disclosure of this data  Together with PETs But also by governments  Programs tend to be secretive, no clear procedures  Phone tapping in U.S., no-fly lists in U.S. and Canada

21-10 Privacy-Preserving Data Mining Anonymize data records before making them available  E.g., strip names, addresses, phone numbers  Unfortunately, such simple anonymization might not be sufficient August 6, 2006: AOL released 20 million search queries from 658,000 users To protect users’ anonymity, AOL assigned a random number to each user  “numb fingers”  “landscapers in Lilburn, Ga”  “how to kill your wife” August 9: New York Times article re-identified user  Thelma Arnold, 62-year old widow from Lilburn, GA

21-11 Another Example (by L. Sweeney)‏ 87% of U.S. population can be uniquely identified based on person’s ZIP code, gender, and date of birth Massachusetts’ Group Insurance Commission released anonymized health records Records left away individuals’ name, but gave their ZIP code, gender, and date of birth (and health information, of course)‏ Massachusetts's voter registration lists contain these three items plus individuals’ names and are publicly available Enables re-identification by linking

21-12 k-Anonymity Ensure that for each released record, there are at least k-1 other released records from which record cannot be distinguished For health-records example, release a record only if there are k-1 other records that have same ZIP code, gender, and date of birth  Assumption: there is only one record for each individual Because of the 87% number, this won’t return many records, need some pre-processing of records  Strip one of { ZIP code, gender, date of birth } from all records  Reduce granularity of ZIP code or date of birth

21-13 Discussion In health-records example, the attributes ZIP code, gender, and date of birth form a “quasi-identifier” Determining which attributes are part of the quasi- identifier can be difficult  Should health information be part of it?  Some diseases are rare and could be used for re- identification  However, including them is bad for precision Quasi-identifier should be chosen such that released records do not allow any re-identification based on any additional data that attacker might have  Clearly we don’t know all this data

21-14 Value Swapping Data perturbation based on swapping values of some (not all!) data fields for a subset of the records  E.g., swap addresses in subset of records Any linking done on the released records can no longer considered to be necessarily true Trade off between privacy and accuracy Statistically speaking, value swapping will make strong correlations less strong and weak correlations might go away entirely

21-15 Adding Noise Data perturbation based on adding small positive or negative error to each value Given distribution of data after perturbation and the distribution of added errors, distribution of underlying data can be determined  But not its actual values Protects privacy without sacrificing accuracy

21-16 Recap Data Mining  Integrity and Availability  Privacy and Data Mining  Privacy-Preserving Data Mining

21-17 Next time Administering Security  Security planning  Risk Analysis