Jeremiah Blocki (CMU), Ryan Williams (IBM Almaden). ICALP 2010.

• Goal: Release a database that is both
  ◦ Useful to benign researchers who wish to study trends
  ◦ Useless to a malicious party who wishes to compromise the privacy of individuals
• Attempt 1: Remove identifying information (name, SSN, etc.) and hope that the adversary can’t identify any of the patients in the dataset.

[Figure: Medical Dataset (Medical History of Patient, Name, SSN) and Voter Registration Dataset (publicly available; Name, Political Affiliation), sharing DOB, Zip Code, Gender]
• The tuple (DOB, Zip Code, Gender) uniquely identifies many people [Sweeney, 2002]
• We can leave out more information, like DOB, but how can we be sure we have done enough to prevent similar attacks?

• K-Anonymity
  ◦ Sweeney (2002)
  ◦ Samarati (2001)
• L-Diversity
  ◦ Machanavajjhala, et al. (2007)
• Thm: 2-Diversity is NP-Hard with binary attributes and three sensitive attributes
• Thm: 3-Diversity is NP-Hard with binary attributes and only one sensitive ternary attribute
• Differential Privacy
  ◦ Dwork (2006)

• A database is a table with n rows (records) and m columns (attributes).
• The alphabet of a database (Σ) is the range of values that individual cells in the database can take.
• A database is said to be k-anonymous if for every row r there are at least k-1 other identical rows.
• To achieve k-anonymity we may replace cells with a special symbol *, which denotes suppression.
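The definition above can be checked mechanically. The following is a minimal sketch, assuming rows are represented as tuples and suppressed cells as the string "*"; the function name `is_k_anonymous` and the toy rows are illustrative and not taken from the slides.

```python
from collections import Counter

def is_k_anonymous(rows, k):
    """Return True if every row has at least k-1 other identical rows."""
    counts = Counter(tuple(row) for row in rows)
    return all(count >= k for count in counts.values())

# Hypothetical two-column database: the third row is unique.
raw = [("a", "b"), ("a", "b"), ("c", "d")]
# Suppressing every cell trivially makes all rows identical.
fully_suppressed = [("*", "*")] * 3

print(is_k_anonymous(raw, 2))               # False
print(is_k_anonymous(fully_suppressed, 2))  # True
```

Full suppression always succeeds, which is why the interesting question is how few stars are needed.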

First     Last     Age  Race
Harry     Stone    34   Afr-Am
John      Reyner   36   Cauc
Beatrice  Stone    34   Afr-Am
John      Delgado  22   Hisp

First  Last  Age  Race
*      *     *    *
*      *     *    *
*      *     *    *
*      *     *    *
Goal: Minimize the number of stars introduced to make the database k-anonymous

First     Last     Age  Race
Harry     Stone    34   Afr-Am
John      Reyner   36   Cauc
Beatrice  Stone    34   Afr-Am
John      Delgado  22   Hisp
Goal: Minimize the number of stars introduced to make the database k-anonymous

First  Last   Age  Race
*      Stone  34   Afr-Am
John   *      *    *
*      Stone  34   Afr-Am
John   *      *    *
Goal: Minimize the number of stars introduced to make the database k-anonymous
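The suppressed table above comes from grouping rows and starring every column on which a group disagrees. A minimal sketch of that step follows, assuming rows are tuples and the grouping is given; the helper names `anonymize_group` and `count_stars` are illustrative.

```python
def anonymize_group(group):
    """Replace with '*' every column on which the rows of the group disagree."""
    agree = [len(set(column)) == 1 for column in zip(*group)]
    return [tuple(v if ok else "*" for v, ok in zip(row, agree)) for row in group]

def count_stars(rows):
    """Total number of suppressed cells."""
    return sum(cell == "*" for row in rows for cell in row)

table = [
    ("Harry", "Stone", 34, "Afr-Am"),
    ("John", "Reyner", 36, "Cauc"),
    ("Beatrice", "Stone", 34, "Afr-Am"),
    ("John", "Delgado", 22, "Hisp"),
]
# Grouping used on the slide: the two Stones together, the two Johns together.
groups = [[table[0], table[2]], [table[1], table[3]]]
released = [row for group in groups for row in anonymize_group(group)]

print(count_stars(released))  # 8
```

The fully starred table needs 16 stars, so this grouping halves the cost.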

• 3-Anonymity is NP-Hard for |Σ| = O(n) [Meyerson & Williams, 2004]
• 3-Anonymity is NP-Hard for |Σ| = 3 [Aggarwal, et al., 2005]
• 3-Anonymity is APX-Hard for |Σ| = 2 [Bonizzoni, et al., 2007]
• 4-Anonymity is APX-Hard when m = O(1) [Bonizzoni, et al., 2009]
• 7-Anonymity is MAX SNP-Hard when m = 3 [Chakaravarthy, et al., 2010]

• 2-Anonymity is in P
• 3-Anonymity is MAX SNP-hard with m = 27
• k-Anonymity can be solved in time [formula missing]
• In particular, this is efficient whenever
  ◦ m ≤ (log log n)/log |Σ| and
  ◦ |Σ| ≤ log n

• Partition the rows into groups such that the size of each group is at least two.
• The cost is the number of stars necessary to anonymize each group.
• This is not reducible to weighted matching on simple graphs.
  ◦ Sometimes it is better to use groups of size 3
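To see why pairs alone are not enough, the sketch below exhaustively searches partitions of a tiny database, once with pairs only and once with pairs and triples. This is a brute-force illustration, not the polynomial-time algorithm from the talk; the six-row instance and the function names are assumptions made for the example.

```python
from itertools import combinations

def group_cost(group):
    """Stars needed to make all rows of the group identical."""
    differing_columns = sum(len(set(column)) > 1 for column in zip(*group))
    return differing_columns * len(group)

def best_partition_cost(rows, sizes=(2, 3)):
    """Minimum total cost over all partitions into groups of the allowed sizes."""
    if not rows:
        return 0
    first, rest = rows[0], rows[1:]
    best = float("inf")  # stays infinite if no valid partition exists
    for size in sizes:
        for chosen in combinations(range(len(rest)), size - 1):
            group = [first] + [rest[i] for i in chosen]
            remaining = [r for i, r in enumerate(rest) if i not in chosen]
            best = min(best, group_cost(group) + best_partition_cost(remaining, sizes))
    return best

# Hypothetical instance: two natural clusters of three rows each.
rows = [("a", "a", 1), ("a", "a", 2), ("a", "a", 3),
        ("b", "b", 1), ("b", "b", 2), ("b", "b", 3)]

print(best_partition_cost(rows, sizes=(2,)))  # 8: some pair must mix the clusters
print(best_partition_cost(rows))              # 6: two groups of size 3
```

Any perfect matching has to pair an "a" row with a "b" row, which is exactly the situation a group of size 3 avoids.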

• Given a hypergraph H with hyperedges of size 2 and 3, and a cost function C(e).
• Find the minimum cost node partition into hyperedges.
• Simplex Matching Constraints:
  1. (u,v,w) ∈ E(H) → (u,v), (v,w), (u,w) ∈ E(H)
  2. C(u,v) + C(u,w) + C(v,w) ≤ 2 C(u,v,w)
• Simplex Matching is in P [AK, 2007]

1. Add a vertex corresponding to each row
2. Add a hyperedge for every pair (or triple) of rows
3. The cost of an edge is the cost of anonymizing these rows.
• Now we just need to verify that the simplex conditions apply…
  ◦ Condition 1 applies trivially because H contains every possible hyperedge.
  ◦ (u,v,w) ∈ E(H) → (u,v), (v,w), (u,w) ∈ E(H)
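The three construction steps can be sketched as a dictionary from hyperedges to costs. This is only an illustration of the reduction's input, assuming rows are tuples and vertices are row indices; it does not implement the simplex matching algorithm itself, and `build_hypergraph` is a hypothetical name.

```python
from itertools import combinations

def group_cost(group):
    """Stars needed to make all rows of the group identical."""
    differing_columns = sum(len(set(column)) > 1 for column in zip(*group))
    return differing_columns * len(group)

def build_hypergraph(rows):
    """Map every pair and every triple of row indices to its anonymization cost."""
    return {
        edge: group_cost([rows[i] for i in edge])
        for size in (2, 3)
        for edge in combinations(range(len(rows)), size)
    }

rows = [
    ("John", "Smith", 24, "Male"),
    ("Jane", "Smith", 24, "Female"),
    ("Bill", "Johnson", 24, "Male"),
]
costs = build_hypergraph(rows)

print(costs[(0, 1)])     # 4
print(costs[(0, 1, 2)])  # 9
```

Because every pair and triple is present, condition 1 holds by construction; only the cost inequality needs an argument.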

• Fact: Given an anonymized group, adding another row to the group can only increase the number of stars per row.
First Name  Last Name  Age  Gender
John        Smith      24   Male
Jane        Smith      24   Female

• Fact: Given an anonymized group, adding another row to the group can only increase the number of stars per row.
First Name  Last Name  Age  Gender
*           Smith      24   *
*           Smith      24   *

• Fact: Given an anonymized group, adding another row to the group can only increase the number of stars per row.
First Name  Last Name  Age  Gender
*           Smith      24   *
*           Smith      24   *
Bill        Johnson    24   Male

• Fact: Given an anonymized group, adding another row to the group can only increase the number of stars per row.
First Name  Last Name  Age  Gender
*           *          24   *
*           *          24   *
*           *          24   *

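• By our fact, for all rows i, j, k: C(i,j)/2 ≤ C(i,j,k)/3
• By symmetry: C(i,k)/2 ≤ C(i,j,k)/3 and C(j,k)/2 ≤ C(i,j,k)/3
• Adding the inequalities: C(i,j) + C(i,k) + C(j,k) ≤ 2 C(i,j,k)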
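The second simplex condition, C(u,v) + C(u,w) + C(v,w) ≤ 2 C(u,v,w), can also be checked numerically on arbitrary data. The sketch below does so on a random table; the random instance, the seed, and the function names are assumptions made for illustration.

```python
import random
from itertools import combinations

def group_cost(group):
    """Stars needed to make all rows of the group identical."""
    differing_columns = sum(len(set(column)) > 1 for column in zip(*group))
    return differing_columns * len(group)

def satisfies_condition_2(rows):
    """Check C(u,v) + C(u,w) + C(v,w) <= 2 * C(u,v,w) for every triple of rows."""
    for u, v, w in combinations(rows, 3):
        pair_sum = group_cost([u, v]) + group_cost([u, w]) + group_cost([v, w])
        if pair_sum > 2 * group_cost([u, v, w]):
            return False
    return True

random.seed(0)
rows = [tuple(random.choice("abc") for _ in range(5)) for _ in range(8)]

print(satisfies_condition_2(rows))  # True
```

Column by column, a column that differs within the triple adds 6 to the right-hand side and at most 6 to the left, which is why the check cannot fail.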

• Theorem: 3-Anonymity is MAX-SNP Hard with only 27 attributes.
  ◦ Goal: Maximize the number of non-starred entries in the database
• Reduction is from bounded three-dimensional matching (MAX 3DM-3)

Thm: Max 3DM-3 is MAX-SNP complete [Kann,1991]

1. Every triple has its own symbol (Σ = M)
2. For every element r ∈ W ∪ X ∪ Y add the corresponding row r
3. Let S, T, U ∈ M be the 3 triples containing r
  ◦ If r ∈ W then add the row
Row r (27 attributes): S T U S T U S T U S T … U

1. Every triple has its own symbol (Σ = M)
2. For every element r ∈ W ∪ X ∪ Y add the corresponding row r
3. Let S, T, U ∈ M be the 3 triples containing r
  ◦ If r ∈ X then add the row
Row r (27 attributes): S S S T T T U U U S S … U

1. Every triple has its own symbol (Σ = M)
2. For every element r ∈ W ∪ X ∪ Y add the corresponding row r
3. Let S, T, U ∈ M be the 3 triples containing r
  ◦ If r ∈ Y then add the row
Row r (27 attributes): S S S S S S S S S T T … U
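The three row patterns can be generated by one rule: repeat each symbol 1, 3, or 9 times depending on whether r lies in W, X, or Y. The sketch below assumes the elided middle of each row continues periodically, as the visible prefixes and final symbol suggest; `build_row` and the placeholder triple names are hypothetical.

```python
def build_row(triples, part):
    """Build the 27-symbol row for an element contained in the given 3 triples.

    part is 'W', 'X', or 'Y'. Assumption: each symbol is repeated 1, 3, or 9
    times respectively, and the pattern continues periodically to column 27.
    """
    period = {"W": 1, "X": 3, "Y": 9}[part]
    return [triples[(j // period) % 3] for j in range(27)]

print("".join(build_row("STU", "W"))[:11])  # STUSTUSTUST
print("".join(build_row("STU", "X"))[:11])  # SSSTTTUUUSS
print("".join(build_row("STU", "Y"))[:11])  # SSSSSSSSSTT

# A triple t = (w, x, y): t is one of the three symbols in each of the rows
# w, x, y. The other symbols (a1, b1, ...) stand for unrelated triples.
w_row = build_row(["t", "a1", "a2"], "W")
x_row = build_row(["b1", "t", "b2"], "X")
y_row = build_row(["c1", "c2", "t"], "Y")
matches = [j for j in range(27) if w_row[j] == x_row[j] == y_row[j]]

print(matches)  # [21]
```

The 27 columns enumerate every combination of (position in w's list, position in x's list, position in y's list), so the rows of a shared triple agree in exactly one column.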

• Suppose M contains the triple t = (w,x,y)
• In fact, this is the only way any group of 3 rows can match in one attribute.
Row w: ttttttttt…
Row x: ttt………………t
Row y: t……t……t………

• Suppose M contains the triple t = (w,x,y)
• In fact, this is the only way any group of 3 rows can match in one attribute!
Row w: t * * * * * * * * …
Row x: t * * * * * * * * …
Row y: t * * * * * * * * …

• We can easily translate any 3-Anonymity solution to a 3DM-3 solution such that the following quantities are equal
  ◦ The number of non-suppressed entries in our 3-Anonymity solution
  ◦ The number of elements covered by our 3DM-3 solution

• Fact: There are at most |Σ|^m different possible rows.
• Idea: Doesn’t the optimal k-anonymity solution have to group identical rows?
  ◦ If this is the case then we could just automatically group rows with at least k identical copies.
  ◦ At most (k−1)|Σ|^m rows would remain.
  ◦ We could anonymize the remaining rows by brute force.
  ◦ Unfortunately, the answer is no.
  ◦ But there is a small threshold,…

• Lemma: If we have at least [threshold missing] identical copies of a row r, then the optimal k-anonymity solution must contain a group containing only copies of row r.

• Any k-anonymity instance D can be efficiently reduced to a smaller instance D’ of size at most [bound missing]
• Optimal k-anonymity solutions to D’ yield optimal solutions to D
• If D’ is small enough, then it can be anonymized by brute force

• 2-Anonymity is in P
• 3-Anonymity is MAX SNP-hard with m = 27
• k-Anonymity can be solved in time [formula missing]
• In particular, this is efficient whenever
  ◦ m ≤ (log log n)/log |Σ| and
  ◦ |Σ| ≤ log n

• Is there a faster 2-Anonymity algorithm than Simplex Matching, which would take [running time missing]?
• Is 3-Anonymity still hard with m < 27?
• Can the bound on grouping identical rows be improved?
• To what degree can k-anonymity be approximately solved when m is small?
• Is there an efficient algorithm for optimal l-diversity when the alphabet and the number of attributes are both small?