2. Attacks on Anonymized Social Networks

Seminar in Foundations of Privacy. 1. Adding Consistency to Differential Privacy 2. Attacks on Anonymized Social Networks. Inbal Talgam, March 2008.
Presentation transcript:

2. Attacks on Anonymized Social Networks

Setting
A social network
Edges may be private
–E.g., “communication graph”
The study of social structure through social networks
–E.g., the small world phenomenon
–Requires data
Common practice – anonymization
–“A rose by any other word would smell as sweet”
–An anonymized network has the same connectivity, clusterability, etc.
[Figure: a network whose nodes carry anonymized labels such as V2783!, R3579X, Y5873T, D2893L.]

Main Contribution
Raising a privacy concern
–Data is never released in a void
Proving the concern by presenting attacks
⇒ One cannot rely on anonymization
Thus, highlighting the need for mathematical rigor
–(But isn’t DP + calibrated noise mechanism rigorous enough?)

Key Idea
Goal: Given a single anonymized network, de-anonymize 2 nodes and learn whether they are connected
What is the challenge?
–Compare to breaking the anonymity of the Netflix data
What special kind of auxiliary data can be used?
–Hint: Active attacks in Cryptography
Solution
–“Steganography”

Outline
Attacks on anonymized networks – high level description
The “Walk-Based” active attack
–Description
–Analysis
–Experiments
Passive attack

Kinds of Attacks
Active attack
Passive attack
Hybrid attack

Active Attacks - Challenges
Let G be the network, H the subgraph the attacker plants
With high probability, H must be:
Uniquely identifiable in G
–For any G
Efficiently locatable
–Tractable instance of subgraph isomorphism
But undetectable
–From the point of view of the data curator

Active Attacks - Approaches
Basic idea: H is randomly generated
–Start with k nodes, add edges independently at random
Two variants:
–k = Θ(log n) de-anonymizes Θ(log² n) users
–k = Θ(√log n) de-anonymizes Θ(√log n) users
  H needs to be “more unique”
  Achieved by “thin” attachment of H to G
The “Walk-based” attack – better in practice
The “Cut-based” attack – matches theoretical bound

Outline
Attacks on anonymized networks – high level description
The Walk-Based active attack
–Description
–Analysis
–Experiments
Passive attack

The Walk-Based Attack – Simplified Version
Construction:
–Pick target users W = {w_1, …, w_k}
–Create new users X = {x_1, …, x_k} and random subgraph G[X] = H
–Add edges (x_i, w_i)
Recovery:
–Find H in G ↔ No other subgraph of G is isomorphic to H
–Label H as x_1, …, x_k ↔ H has no nontrivial automorphisms
–Find w_1, …, w_k
[Figure: new nodes x_1, x_2 linked to targets w_1, w_2.]

The Walk-Based Attack – Full Version
Construction:
–Pick target users W = {w_1, …, w_b}
–Create new users X = {x_1, …, x_k} and H
–Connect w_i to a unique subset N_i of X
–Between H and G – H
  Add Δ_i edges from x_i, where d_0 ≤ Δ_i ≤ d_1 = O(log n)
–Inside H, add edges (x_i, x_{i+1})
  To help find H
[Figure: the path x_1, x_2, x_3 inside H.]
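The construction steps above can be sketched in Python. This is a minimal illustration, not the authors' code: the names `add_edge`, `plant_h`, `max_subset` and `seed` are hypothetical, the edge probability 1/2 for non-path pairs inside H is an assumption, and so are two simplifications that keep targets distinguishable (every N_i has at least two nodes, and each padding node in G – H is linked to only one x_i).

```python
import random
from itertools import combinations

def add_edge(adj, u, v):
    """Insert the undirected edge {u, v} into an adjacency-set dict."""
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def plant_h(adj, targets, k, d0, d1, max_subset=3, seed=0):
    """Attach k new nodes to the graph `adj` (mutated in place).

    Returns (xs, subsets): the new nodes in path order and, for each
    target w, the subset N_w of xs it was linked to.
    """
    rng = random.Random(seed)
    originals = list(adj)                      # nodes of G - H
    xs = [("x", i) for i in range(k)]
    x_set = set(xs)
    for x in xs:
        adj[x] = set()

    # Inside H: always the path edge (x_i, x_{i+1}); every other pair
    # with probability 1/2 (assumed edge probability).
    for i, j in combinations(range(k), 2):
        if j == i + 1 or rng.random() < 0.5:
            add_edge(adj, xs[i], xs[j])

    # Give each target a distinct subset N_w of X with at least 2 nodes
    # (the size bounds are assumptions of this sketch).
    pool = [s for r in range(2, max_subset + 1) for s in combinations(xs, r)]
    if len(targets) > len(pool):
        raise ValueError("not enough distinct subsets for the targets")
    subsets = dict(zip(targets, rng.sample(pool, len(targets))))
    for w, n_w in subsets.items():
        for x in n_w:
            add_edge(adj, x, w)

    # Pad each x_i with edges to unused non-target nodes until its
    # external degree reaches a value drawn from [d0, d1] (or stays at
    # what the target links already require, if that is larger).
    spare = [v for v in originals if v not in subsets]
    rng.shuffle(spare)
    for x in xs:
        external = len(adj[x] - x_set)
        delta = max(external, rng.randint(d0, d1))
        for _ in range(delta - external):
            if not spare:
                raise ValueError("G - H is too small to pad the degrees")
            add_edge(adj, x, spare.pop())
    return xs, subsets
```

After the anonymized graph is released, a target w is the unique node outside X whose neighbours in X are exactly N_w, which is why the sketch keeps padding nodes away from the targets.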

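Construction of H
[Figure: H with (2+δ) log n nodes x_1, x_2, x_3, … attached to O(log² n) targets w_1, …, w_4 in G; N_1 is the subset of X linked to w_1, and Δ_3 counts the edges from x_3 into G – H.]
Total degree of x_i is Δ'_i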

Recovering H
Search G based on:
–Degrees Δ'_i
–Internal structure of H
[Figure: search tree T with a root; a path α_1, …, α_l in T corresponds to nodes f(α_1), …, f(α_l) of G.]
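The search can be sketched as a depth-first walk over G that keeps only partial paths matching the degree sequence and the internal edges of H. This is an illustrative sketch under assumptions: the function name `recover_h` and its argument layout are hypothetical, the attacker is assumed to remember the total degrees Δ'_i and which pairs (x_i, x_j) are edges, and no pruning beyond those two tests is applied.

```python
def recover_h(adj, degrees, h_edges):
    """Return every node sequence in `adj` that looks like x_1, ..., x_k.

    adj     : dict mapping node -> set of neighbours (undirected graph)
    degrees : degrees[i] is the total degree the attacker recorded for x_i
    h_edges : set of frozenset({i, j}) for the internal edges of H,
              which must include every path edge {i, i + 1}
    """
    k = len(degrees)
    matches = []

    def consistent(path, v):
        # v would become x_i: its adjacency to each earlier x_j must
        # agree with the recorded internal structure of H.
        i = len(path)
        return all(
            (frozenset((j, i)) in h_edges) == (path[j] in adj[v])
            for j in range(i)
        )

    def extend(path):
        if len(path) == k:
            matches.append(list(path))
            return
        i = len(path)
        # x_i is adjacent to x_{i-1}, so only neighbours of the last
        # node need to be tried (any node for the first position).
        candidates = adj[path[-1]] if path else list(adj)
        for v in candidates:
            if v not in path and len(adj[v]) == degrees[i] and consistent(path, v):
                path.append(v)
                extend(path)
                path.pop()

    extend([])
    return matches
```

If exactly one sequence comes back, H has been located and labelled; each target is then the node outside that sequence whose neighbours in it equal its subset N_i.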

Analysis
Theorem 1 [Correctness]: With high probability, H is unique in G. Formally:
–H is a random subgraph
–G is arbitrary
–Edges between H and G – H are arbitrary
–There are edges (x_i, x_{i+1})
⇒ Then WHP no other subgraph of G is isomorphic to H.
Theorem 2 [Efficiency]: Search tree T does not grow too large. Formally:
–For every ε > 0, WHP the size of T is O(n^(1+ε))

Theorem 1 [Correctness]
H is unique in G. Two cases:
–For no disjoint subset S is G[S] isomorphic to H
–For no overlapping S is G[S] isomorphic to H
Case 1:
–S = (s_1, …, s_k), nodes in G – H
–ℰ_S – the event that s_i ↔ x_i is an isomorphism
–[equation not preserved in transcript]
–By the union bound, [equation not preserved in transcript]
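The Case 1 calculation can be sketched as follows. This is a reconstruction under assumptions that the slide text does not state: every non-path pair inside H is an edge independently with probability 1/2, logarithms are base 2, and k = (2+δ) log n as on the construction slide. The slide's isomorphism event is written \mathcal{E}_S.

```latex
% Only the pairs of H that are not path edges (x_i, x_{i+1}) are random:
% there are \binom{k}{2} - (k-1) = \binom{k-1}{2} of them, and each must
% agree with the corresponding pair of the fixed sequence S.
\[
  \Pr[\mathcal{E}_S] \;\le\; 2^{-\binom{k-1}{2}} .
\]
% Union bound over fewer than n^k ordered sequences S in G - H:
\[
  \Pr\Bigl[\,\bigcup_S \mathcal{E}_S\Bigr]
  \;\le\; n^{k}\, 2^{-\binom{k-1}{2}}
  \;=\; 2^{\,k\log n \;-\; \frac{(k-1)(k-2)}{2}} .
\]
% Substituting k = (2+\delta)\log n:
\[
  k\log n - \frac{(k-1)(k-2)}{2}
  \;=\; -\frac{\delta}{2}\,(2+\delta)\log^{2} n \;+\; \frac{3k}{2} \;-\; 1
  \;\longrightarrow\; -\infty ,
\]
% so the probability of a disjoint second copy of H tends to 0.
```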

Theorem 1 continued
Case 2: S and X overlap.
Observation – H does not have much internal symmetry
Claim (a): WHP, there are no disjoint isomorphic subgraphs of size c_1 log k in H. Assume this from now on.
Claim (b): Most of A goes to B (except c_1 log k nodes), most of Y is fixed under f (except c_2 log k nodes)
[Figure: G containing X; f maps A ∪ Y onto B ∪ Y.]

Theorem 1 - Proof
What is the probability of an overlapping second copy of H in G?
f_ABCD : A ∪ Y → B ∪ Y = X
Let j = |A| = |B| = |C|
ℰ_ABCD – the event that f_ABCD is an isomorphism
#random edges inside C ≥ j(j-1)/2 – (j-1)
#random edges between C and Y' ≥ (|Y'|)j – 2j
Probability that the random edges match those of A:
Pr[ℰ_ABCD] ≤ 2^(–#random edges)
[Figure: the sets A, B, C, D and Y' relative to X.]

Theorem 2 [Efficiency]
Claim: Size of search tree T is near-linear.
Proof uses similar methods:
–Define random variables:
  #nodes in T = Γ
  Γ = Γ' + Γ'' = #paths in G – H + #paths passing through H
–This time we bound E(Γ') [and similarly E(Γ'')]
–Number of paths of length j with max degree d_1 is bounded
–Probability that such a path has the correct internal structure is bounded
⇒ E(Γ') ≤ (#paths * Pr[correct internal struct])

Experiments
Data: Network of friends on LiveJournal
–4.4·10⁶ nodes, 77·10⁶ edges
Uniqueness: With 7 nodes, an average of 70 nodes can be de-anonymized
–Although log(4.4·10⁶) ≈ 15
Efficiency: |T| is typically ~9·10⁴
Detectability:
–Only 7 nodes
–Many subgraphs of 7 nodes in G are dense and well-connected
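A quick arithmetic check of the logarithm quoted above, as a small Python sketch. The slide does not say which base it uses, so both the natural and the base-2 logarithm are computed; the variable names are illustrative.

```python
import math

nodes = 4.4e6  # number of LiveJournal nodes quoted on the slide

ln_n = math.log(nodes)     # natural logarithm
log2_n = math.log2(nodes)  # base-2 logarithm, for comparison

print(f"ln n = {ln_n:.1f}, log2 n = {log2_n:.1f}")
# prints "ln n = 15.3, log2 n = 22.1"
```

The quoted value of about 15 matches the natural logarithm; either way, 7 new nodes is well below log n.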

Probability that H is Unique

Outline
Attacks on anonymized networks – high level description
The Walk-Based active attack
–Description
–Analysis
–Experiments
Passive attack

Passive Attack
H is a coalition of existing users, recovered by the same search algorithm
Nothing guaranteed, but works in practice

Summary & Open Questions
One cannot rely on anonymization of social networks
Major open problem – what (if anything) can be done in the non-interactive model?
–Released object must answer many questions accurately while preserving privacy
–Noise must increase with number of questions [DN03]
Novel models

Any Questions?
Thank you

Passive Attack - Results