Versatile Publishing For Privacy Preservation


Versatile Publishing For Privacy Preservation
Xin Jin, Mingyang Zhang, Nan Zhang (George Washington University)
Gautam Das (University of Texas at Arlington)

Outline: Introduction; Inference For Multiple Privacy Rules; Guardian Normal Form; GD and UAD Algorithms; Experimental Results; Conclusion.

Privacy Preserving Data Publishing
QI → SA, i.e., an adversary who knows the quasi-identifier (QI) cannot infer the sensitive attribute (SA) of a tuple beyond a privacy guarantee. An example of a privacy guarantee is l-diversity.
Published table (2-diversity). Columns: Age, Gender (QI); Disease (SA). Rows: Allen [30-80] * HIV; Bob diabetes; Calvin [35-55] F; David flu; Eve [20-40] M drug; Grace.
Speaker notes: Give an example of 2-diversity in this particular example. Generally, protect privacy for any individual.
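To make the 2-diversity guarantee on this slide concrete, here is a minimal sketch of a check for distinct l-diversity, the simplest variant: every group of rows sharing the same QI values must contain at least l distinct SA values. The function name `is_l_diverse` and the sample rows are hypothetical; the rows are illustrative and are not the slide's actual table.

```python
from collections import defaultdict

def is_l_diverse(rows, qi_cols, sa_col, l):
    """Return True if every QI group holds at least l distinct SA values."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in qi_cols)  # rows with equal QI values form a group
        groups[key].add(row[sa_col])
    return all(len(values) >= l for values in groups.values())

# Hypothetical published rows (illustrative only).
published = [
    {"age": "[30-80]", "gender": "*", "disease": "HIV"},
    {"age": "[30-80]", "gender": "*", "disease": "diabetes"},
    {"age": "[20-40]", "gender": "M", "disease": "drug"},
    {"age": "[20-40]", "gender": "M", "disease": "flu"},
]

print(is_l_diverse(published, ["age", "gender"], "disease", 2))
```

An adversary who knows only a person's age range and gender can narrow that person's disease down to a group, but each group still leaves at least two candidates.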

A Sneak Peek at Real Application
Every year, the Texas Department of State Health Services publishes a table of all patients discharged from more than 450 state-licensed hospitals (www.dshs.state.tx.us/thcic/Hospitals/HospitalsData.shtm). It defines 9 privacy requirements. Examples:
If a hospital has fewer than five discharges of a particular gender, then suppress the zipcode of its patients of that gender.
Race is changed to ‘Other’ and ethnicity is suppressed if a hospital has fewer than ten discharges of a race.
The entire zipcode and gender code are suppressed if the ICD code indicates alcohol or drug use or an HIV diagnosis.
…
Speaker notes: Introduce utility first. All the published table wants to optimize the utility. Focus on Eve’s group.

Texas Inpatient Discharge Data
Example: If a hospital has fewer than five discharges of a particular gender, then suppress the zipcode of its patients of that gender.
As a privacy rule: hospital, gender → zipcode
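The suppression requirement quoted above can be sketched in a few lines. This is only an illustration of the stated rule, not the department's actual procedure; the field names (`hospital`, `gender`, `zipcode`), the function name, and the sample records are assumptions.

```python
from collections import Counter

def suppress_zipcode(records, threshold=5):
    """Blank the zipcode of every record whose (hospital, gender) group
    has fewer than `threshold` discharges. Returns new records."""
    counts = Counter((r["hospital"], r["gender"]) for r in records)
    result = []
    for r in records:
        copy = dict(r)
        if counts[(r["hospital"], r["gender"])] < threshold:
            copy["zipcode"] = None  # suppressed
        result.append(copy)
    return result

# Hypothetical discharge records: five female patients and one male patient.
records = [{"hospital": "A", "gender": "F", "zipcode": "75001"} for _ in range(5)]
records.append({"hospital": "A", "gender": "M", "zipcode": "75002"})

cleaned = suppress_zipcode(records)
print([r["zipcode"] for r in cleaned])
```

Only the single male discharge falls below the threshold of five, so only its zipcode is suppressed.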

Multiple SA Publishing
[MKGV06] defines publishing with multiple SA attributes: each Si is treated as the sole SA attribute, and {Q1, Q2, …, Qm, S1, …, Si-1, Si+1, …, Sn} is treated as the QI.
Lack of flexibility: this provides a stronger privacy definition than necessary.
Example with SA: race and state.
age, ICD, state, gender → race
age, ICD, hospital, race → state

A Novel Problem: Versatile Publishing
Allows the privacy requirement of publishing a table to be defined as an arbitrary set of privacy rules.
Each rule: {Q1, Q2, …, Qp} → {S1, S2, …, Sr} (LHS attributes → RHS attributes).
A rule assures that an adversary who learns the LHS attributes cannot learn the RHS attributes beyond a pre-defined privacy guarantee such as l-diversity or t-closeness.

A Running Example
Table (columns: hospital, age, gender, ICD, state, race): A 37 F HIV TX asian 71 M diabetes MN white B 55 CA black flu VA C 23 drug
Rule #1: age, ICD → race
Rule #2: gender, ICD → state
Rule #3: hospital, race → state
Privacy guarantee: 2-diversity

Simple Solution #1: Straight Decomposition
Publish one pair of sub-tables per rule: age, ICD → race; gender, ICD → state; hospital, race → state.
Sub-tables: age, ICD: 37 HIV 23 drug flu 55 diabetes 71; race: asian white black; gender, ICD: F HIV M drug flu diabetes; state: TX MN VA CA; hospital, race: A asian B black white C; state: TX CA MN VA.
After a join: Asian is linked with TX or MN. Asian is linked with TX or CA.
Intersection Attack [GKS08]: asian is linked with TX, violating hospital, race → state.
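The intersection attack on this slide is just a set intersection over the candidate values each published table leaves open. A minimal sketch, using the two candidate sets named on the slide (the function and variable names are hypothetical):

```python
def intersect_candidates(*candidate_sets):
    """Combine what several published tables reveal about one individual:
    only values consistent with every table remain possible."""
    remaining = set(candidate_sets[0])
    for candidates in candidate_sets[1:]:
        remaining &= set(candidates)
    return remaining

# Candidate states for the asian patient, as stated on the slide.
from_one_table = {"TX", "MN"}      # "Asian is linked with TX or MN"
from_other_table = {"TX", "CA"}    # "Asian is linked with TX or CA"

remaining = intersect_candidates(from_one_table, from_other_table)
print(remaining)  # {'TX'}
```

Each table on its own leaves two candidates and so satisfies 2-diversity, yet together they leave a single state, which is why rules must be checked over the whole collection of published tables.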

Multiple SA Publishing Method
Defines as SA all attributes that appear on the RHS of at least one privacy rule, and as QI the set of all other attributes.
Rule #1: age, ICD → race
Rule #2: gender, ICD → state
Rule #3: hospital, race → state
With these three rules there are 2 SA: race, state.
Rule #4: hospital, age → ICD
Rule #5: gender, race → hospital
With all five rules there are 4 SA: ICD, state, race, hospital.
Curse of dimensionality.
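The SA/QI split described on this slide can be computed directly from the rule set. The sketch below encodes the five rules as (LHS, RHS) tuples; the encoding and the function name `split_sa_qi` are assumptions made for illustration.

```python
ATTRIBUTES = {"hospital", "age", "gender", "ICD", "state", "race"}

# Rules #1 to #5 from the slide, written as (LHS attributes, RHS attributes).
RULES = [
    (("age", "ICD"), ("race",)),
    (("gender", "ICD"), ("state",)),
    (("hospital", "race"), ("state",)),
    (("hospital", "age"), ("ICD",)),
    (("gender", "race"), ("hospital",)),
]

def split_sa_qi(rules, attributes):
    """SA = every attribute on the RHS of at least one rule; QI = the rest."""
    sa = {attr for _, rhs in rules for attr in rhs}
    return sa, set(attributes) - sa

sa3, qi3 = split_sa_qi(RULES[:3], ATTRIBUTES)
sa5, qi5 = split_sa_qi(RULES, ATTRIBUTES)
print(sorted(sa3), sorted(qi3))
print(sorted(sa5), sorted(qi5))
```

With all five rules only age and gender remain as QI, and four of the six attributes must be protected as SA at once, which is the dimensionality problem the slide points to.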

Traditional Data Normalization
Step 1: Obtain irreducible functional dependencies (FDs).
Step 2: Test whether any FD violates the normal form over the large table.
Step 3: If there is a violation, decompose the table to remove it.

Inference For Multiple Rules
Inference on multiple privacy rules. Example: AB → C implies that A → C and B → C.
Completeness of inference rules.
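The one inference pattern shown on this slide can be enumerated mechanically: a rule with a given LHS implies the same rule for every non-empty proper subset of that LHS, since an adversary who knows less should not be able to learn more. The sketch below covers only this pattern and is not the full axiom set; the function name is hypothetical.

```python
from itertools import combinations

def implied_by_lhs_reduction(lhs, rhs):
    """List the rules implied by shrinking the LHS of the rule lhs -> rhs."""
    lhs = sorted(lhs)
    implied = []
    for size in range(1, len(lhs)):          # proper, non-empty subsets only
        for subset in combinations(lhs, size):
            implied.append((subset, tuple(rhs)))
    return implied

implied = implied_by_lhs_reduction({"A", "B"}, ("C",))
for subset, rhs in implied:
    print(", ".join(subset), "->", ", ".join(rhs))
```

Note that this runs in the opposite direction from functional dependencies, where a smaller LHS gives a stronger statement rather than an implied one.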

Guardian Normal Form (GNF)
Non-triviality: a privacy rule satisfied by each of two anonymized tables might still be broken by the combination of the two, due to an intersection attack.
Guardian Normal Form (GNF): a normal form for the schema of published tables which ensures that all privacy rules hold over the collection of published tables.
GNF is defined at the schema level of published tables rather than at the tuple level.

An Example
[Figure: published tables with their enforced rules. Labels: age, hospital, gender race (no privacy rule enforced); ICD, gender hospital; hospital state; age race; gender ICD.]
Rule #1: age, ICD → race. Here race is unreachable from age or ICD.
Rule #2: gender, ICD → state. Here state is reachable from either gender or ICD.

Guardian Decomposition Algorithm
Similar in spirit to the database normalization algorithm [EN03] (decomposition into BCNF).
Find a privacy rule that violates GNF, decompose the existing sub-tables to address the privacy rule, and continue until no offending privacy rule remains.
Greedily add attributes as long as GNF still holds.
End: no further decomposition; publish T11 and T12.

Utility Aware Decomposition Algorithm
Leverages the link between utility optimization and the MIN-VERTEX-COLORING problem.
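For readers unfamiliar with MIN-VERTEX-COLORING, the sketch below shows a standard greedy coloring heuristic: visit vertices in order of decreasing degree and give each the smallest color not used by its neighbors. This is a generic heuristic and not the UAD algorithm itself; the slide does not say how the graph is built from the privacy rules, so the graph here is purely hypothetical.

```python
def greedy_coloring(adjacency):
    """Assign each vertex the smallest color index unused by its neighbors,
    visiting vertices from highest to lowest degree."""
    colors = {}
    order = sorted(adjacency, key=lambda v: len(adjacency[v]), reverse=True)
    for vertex in order:
        used = {colors[n] for n in adjacency[vertex] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[vertex] = color
    return colors

# Hypothetical undirected graph: a triangle a-b-c plus a vertex d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

coloring = greedy_coloring(graph)
print(coloring)
```

Greedy coloring always yields a valid coloring but not necessarily a minimum one, since minimum vertex coloring is NP-hard in general.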

Experimental Results

Conclusion
Defined the novel problem of versatile publishing, which captures the real-world requirement of multiple privacy rules.
Derived a sound and complete set of inference axioms for privacy rules.
Defined guardian normal form (GNF).
Developed two decomposition algorithms, GD and UAD, and conducted comprehensive experiments.

References
[1] Texas Department of State Health Services. User manual of Texas hospital inpatient discharge public use data file, 2008.
[2] A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. In ICDE, 2006.
[3] S. R. Ganta, S. P. Kasiviswanathan, and A. Smith. Composition attacks and auxiliary information in data privacy. In KDD, 2008.
[4] R. Elmasri and S. B. Navathe. Fundamentals of Database Systems (4th Edition). Addison Wesley, 2003.

Thank You
