Privacy in Statistical Databases
Dr. Luc Longpré
Computer Science Department, University of Texas at El Paso (UTEP)
Spring 2006

Presentation transcript:

Slide 2: Database with Confidential Information
Examples:
- census data
- medical information
Privacy: protect the confidentiality of individuals
Usefulness: want to derive meaningful statistics

Slide 3: The Need for Privacy Safeguards
Available disk space per person:
- 1983: 0.02 MB
- 1996: 28 MB
- 2000: 472 MB
Equivalent to one page for every 3 minutes of life

Slide 4: The Need for Privacy Safeguards
Misuse of personal health information:
- banker cross-referencing cancer patients with outstanding loans
- using medical records to make decisions about employees
- snooping in a hospital computer network
- 40% of insurers disclose personal health information to lenders, employers, and marketers without customer permission

Slide 5: Approaches
Access control, encryption:
- Only control who has access to what
- Do not protect against disclosures based on inference
Problem:
- Sometimes it is possible to derive confidential information from released information

Slide 6: Examples
Salary database
Query: what is the average salary of white male professors with 2 children who have lived in El Paso, Texas since 1994 and lived in Boston from 1987 to 1994?

Slide 7: Examples
87% of the US population is uniquely identified by the combination of:
- 5-digit ZIP code
- gender
- date of birth

Slide 8: Linking to Re-Identify Data
Medical database:
- Ethnicity, visit date, diagnosis, procedure, medication, ZIP, birth date, sex
Voter list:
- Name, address, date registered, ZIP, birth date, sex
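
The two lists on this slide share ZIP, birth date, and sex, so the tables can be joined on those fields. The following is a minimal sketch of such a linking step, not the method used in any particular study; every record, field name, and function name in it is made up for illustration.

```python
# Hypothetical toy records; all names and values are invented for illustration.
medical = [
    {"zip": "79912", "birth_date": "1950-03-14", "sex": "F", "diagnosis": "asthma"},
    {"zip": "79902", "birth_date": "1971-11-02", "sex": "M", "diagnosis": "diabetes"},
]
voters = [
    {"name": "Alice Example", "zip": "79912", "birth_date": "1950-03-14", "sex": "F"},
    {"name": "Bob Example", "zip": "79902", "birth_date": "1980-01-01", "sex": "M"},
]

# Fields present in both tables (assumed field names).
SHARED_FIELDS = ("zip", "birth_date", "sex")


def link_key(record):
    """Return the tuple of shared field values used to join the two tables."""
    return tuple(record[field] for field in SHARED_FIELDS)


def link(medical_rows, voter_rows):
    """Pair each medical record with a voter when exactly one voter matches."""
    voters_by_key = {}
    for voter in voter_rows:
        voters_by_key.setdefault(link_key(voter), []).append(voter)

    matches = []
    for row in medical_rows:
        candidates = voters_by_key.get(link_key(row), [])
        if len(candidates) == 1:  # a unique match attaches a name to the record
            matches.append((candidates[0]["name"], row["diagnosis"]))
    return matches


reidentified = link(medical, voters)
print(reidentified)  # [('Alice Example', 'asthma')]
```

Only the first medical record is re-identified here, because the second one has no voter with the same three values.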

Slide 9: Statistical Database
Data collected with the purpose of releasing statistical information.
Important for research and policy.
Facing tremendous demand for person-specific data:
- data mining, fraud detection, homeland security

Slide 10: Sample Size
Possible solution: do not release any statistics on any set of fewer than, say, 10 records

Slide 11: Problem Remains
Query 1: What is the average salary of males age 89 in ZIP code 79912?
Query 2: What is the average salary of people age 89 in ZIP code 79912?
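
A minimal sketch of why the two queries defeat a minimum sample size. It assumes a made-up population of ten men and one woman, all age 89 in ZIP code 79912, and it assumes the attacker also learns (or already knows) the size of each query set; the slide itself only mentions averages. All names and salaries are hypothetical.

```python
MIN_QUERY_SET_SIZE = 10  # the "say, 10 records" rule from the previous slide

# Hypothetical population: ten men and one woman, all age 89 in ZIP 79912.
people = [
    {"age": 89, "zip": "79912", "sex": "M", "salary": 30000 + 1000 * i}
    for i in range(10)
]
people.append({"age": 89, "zip": "79912", "sex": "F", "salary": 52000})


def average_salary(rows, predicate):
    """Answer an average query, refusing query sets below the size threshold."""
    selected = [r for r in rows if predicate(r)]
    if len(selected) < MIN_QUERY_SET_SIZE:
        raise ValueError("query set too small")
    return len(selected), sum(r["salary"] for r in selected) / len(selected)


def is_male_89(r):
    return r["age"] == 89 and r["zip"] == "79912" and r["sex"] == "M"


def is_89(r):
    return r["age"] == 89 and r["zip"] == "79912"


# Both queries are allowed: they cover 10 and 11 records respectively.
n_men, avg_men = average_salary(people, is_male_89)
n_all, avg_all = average_salary(people, is_89)

# Subtracting the two salary totals isolates the one remaining record.
inferred_salary = round(n_all * avg_all - n_men * avg_men)
print(inferred_salary)
```

Asking directly about the one woman is refused by the size rule, yet the difference of the two permitted answers gives the same information.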

Slide 12: K-anonymity
Release only information where at least k records are identical on the identifying attributes (work by Sweeney)
Attacks are still possible:
- Unsorted matching: use the order of records
  solution: randomize the order

Slide 13: K-anonymity (continued)
- Complementary release: combining k-anonymous releases may not be k-anonymous
  solution: consider all releases together
- Temporal attack: data is dynamic, and adding and removing data affects k-anonymous properties
  solution: analyze k-anonymous properties of dynamic data
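
A minimal sketch of checking the k-anonymity condition on a single release: group the records by their identifying attributes and take the size of the smallest group. The table, the generalized values such as "799**", and the function names are assumptions made for illustration; the sketch does not address the attacks listed above.

```python
from collections import Counter


def anonymity_level(rows, identifying_attributes):
    """Return the size of the smallest group of records that share
    the same values on the identifying attributes (0 for an empty table)."""
    groups = Counter(
        tuple(row[attr] for attr in identifying_attributes) for row in rows
    )
    return min(groups.values()) if groups else 0


def is_k_anonymous(rows, identifying_attributes, k):
    """True when every record is identical to at least k - 1 others
    on the identifying attributes."""
    return anonymity_level(rows, identifying_attributes) >= k


# Hypothetical release with generalized ZIP codes and age ranges.
release = [
    {"zip": "799**", "age": "80-89", "diagnosis": "asthma"},
    {"zip": "799**", "age": "80-89", "diagnosis": "flu"},
    {"zip": "799**", "age": "80-89", "diagnosis": "diabetes"},
    {"zip": "799**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "799**", "age": "30-39", "diagnosis": "asthma"},
]

level = anonymity_level(release, ("zip", "age"))
print(level)  # 2: the smallest group (ages 30-39) has two records
```

A release that passes this check for k = 2 on its own can still fail once it is combined with another release, which is the complementary release problem named on the slide.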

Slide 14: Other Solutions
- Add noise to the answers
- Add noise to the data
- Limit the kinds of queries allowed on the statistical database
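
A minimal sketch of the first option, adding noise to an answer. The slide does not say which noise distribution to use; Gaussian noise and the `noise_scale` parameter are assumptions chosen only to keep the example short, and the salaries are made up.

```python
import random


def noisy_average(values, noise_scale, rng):
    """Return the true average plus zero-mean Gaussian noise.

    noise_scale is the standard deviation of the noise (an assumed
    parameter): larger values hide individuals better but make the
    released statistic less useful.
    """
    true_average = sum(values) / len(values)
    return true_average + rng.gauss(0, noise_scale)


salaries = [31000, 42000, 52000, 38000]  # hypothetical data
rng = random.Random()

# Two askers of the same question receive slightly different answers.
print(noisy_average(salaries, 500.0, rng))
print(noisy_average(salaries, 500.0, rng))
```

The trade-off between privacy and usefulness from slide 2 shows up directly in the choice of `noise_scale`.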

Slide 15: Quantifying Information
Need a formal model, possibly based on information theory
Measure entropy in database records before and after a statistical release
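
One possible reading of the entropy measure, sketched under assumptions: the attacker's uncertainty about one confidential value is modeled as a probability distribution, and the information leaked by a release is the drop in Shannon entropy. The two distributions below are invented for illustration.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Before the release: 8 salary values are equally likely to the attacker.
before = [1 / 8] * 8
# After the release: the statistic rules out all but 2 of them.
after = [1 / 2, 1 / 2]

leaked_bits = entropy_bits(before) - entropy_bits(after)
print(entropy_bits(before), entropy_bits(after), leaked_bits)  # 3.0 1.0 2.0
```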

Slide 16: Further Complications
Some data is more sensitive than others
- Example: bits in a salary
Common knowledge, information from other databases
- Could define entropy conditional on the available information
- Very impractical in applications
Some people know some of the records

Slide 17: Non-Additivity
Data sensitivity is non-additive
- Example: one may not mind any single digit of an SSN being released, but would mind all digits being released
Privacy loss is non-additive
- Example: there could be 2 sets of information, each of which reveals nothing if released alone, but which reveal all the information if released together
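
One standard construction with the property in the second example is to split a secret into a random pad and the XOR of the secret with that pad. The slide does not name this construction; it is offered here only as an illustration, with a made-up secret and hypothetical function names.

```python
import secrets


def split(secret: bytes):
    """Split a secret into two pieces. Each piece on its own is a
    uniformly random byte string and says nothing about the secret."""
    pad = secrets.token_bytes(len(secret))
    masked = bytes(s ^ p for s, p in zip(secret, pad))
    return pad, masked


def combine(piece_a: bytes, piece_b: bytes) -> bytes:
    """XOR the two pieces together to recover the secret."""
    return bytes(a ^ b for a, b in zip(piece_a, piece_b))


secret = b"123-45-6789"  # made-up SSN-shaped value
piece_a, piece_b = split(secret)

# Either piece alone is random noise; both together give back the secret.
recovered = combine(piece_a, piece_b)
print(recovered == secret)  # True
```

Releasing `piece_a` costs no privacy and releasing `piece_b` costs no privacy, yet releasing both costs all of it, so the loss from the two releases cannot be obtained by adding the individual losses.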

Slide 18: Past Research
- Denning: "Cryptography and Data Security", 1982
- Sweeney: Ph.D. thesis, applications to medical data, 1996
- A few more scattered results; the topic is becoming popular again under the name "privacy-preserving data mining"