Privacy-Preserving Releases of Social Networks. Chih-Hua Tai, Dept. of CSIE, National Taipei University, New Taipei City, Taiwan.


1 Privacy-Preserving Releases of Social Networks. Chih-Hua Tai, Dept. of CSIE, National Taipei University, New Taipei City, Taiwan

2 Data Mining. The primary task in data mining: development of models about aggregated data. Finding frequent patterns. Finding rules.

3 Finding Patterns

4 Finding Rules

5 Data Mining vs. Privacy. The primary task in data mining: development of models about aggregated data. Finding frequent patterns. Finding rules. Can we develop accurate models without access to precise information in individual data records? Why?

6 Privacy Issues in Data


9 Data Mining and Privacy. The primary task in data mining: development of models about aggregated data. Can we develop accurate models without access to precise information in individual data records? Answer: yes, by randomization (R. Agrawal and R. Srikant, "Privacy-Preserving Data Mining," SIGMOD 2000). How about the data utility?
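
As a rough illustration of the randomization idea on this slide, the sketch below perturbs every record with zero-mean noise and compares an aggregate before and after. It is not the reconstruction procedure of the cited paper; the synthetic salary data, the uniform noise, and the names `randomize` and `spread` are assumptions made only for this example.

```python
import random

def randomize(values, spread, rng):
    """Add independent uniform noise in [-spread, spread] to every value."""
    return [v + rng.uniform(-spread, spread) for v in values]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Synthetic "salary" records (assumption: not data from the talk).
salaries = [rng.gauss(50_000, 10_000) for _ in range(20_000)]
released = randomize(salaries, spread=20_000, rng=rng)

# Individual records are heavily distorted, but the zero-mean noise
# largely cancels out in the aggregate.
true_mean = sum(salaries) / len(salaries)
released_mean = sum(released) / len(released)
print(f"true mean {true_mean:.0f}, mean of randomized data {released_mean:.0f}")
```

The utility question raised on the slide shows up here as the gap between the two means, which shrinks as the number of records grows and widens as `spread` increases.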

10 Data Mining vs. Social Networks. Attributes: Name, Salary, … Links: Friends, Neighborhood, … Communities: Interests, Activities, …

11 Privacy Issues on Social Networks. Personal information can leak even if the vertex identities are hidden. Much information can be used to re-associate a vertex with its identity. Vertex degree: k-degree anonymity, … Neighborhood configuration: k-neighborhood anonymity, k-automorphism anonymity, k-isomorphism anonymity, grouping-and-collapsing, …

12 Privacy Concerns in Data Sharing. Personal information can leak even if the vertex identities are hidden. E.g., friendship attacks. E.g., community identification. References: C.-H. Tai, P. S. Yu, D.-N. Yang, and M.-S. Chen, "Structural Diversity for Privacy in Publishing Social Networks," in SDM, 2011. C.-H. Tai, P. S. Yu, D.-N. Yang, and M.-S. Chen, "Privacy-Preserving Social Network Publication Against Friendship Attacks," in KDD, 2011. C.-H. Tai, P.-J. Tseng, P. S. Yu, and M.-S. Chen, "Identities Anonymization in Dynamic Social Networks," in ICDM, 2011.

13 C.-H. Tai, P. S. Yu, D.-N. Yang, and M.-S. Chen, "Privacy-Preserving Social Network Publication Against Friendship Attacks," in KDD, 2011.

14 Friendship Attack. There is yet another type of information that can be used for vertex re-identification: the friendship attack.

15 Friendship Attack. Given a target individual A and the degree pair information D_2 = (d_1, d_2), a friendship attack (D_2, A) exploits D_2 to identify a vertex v_1 corresponding to A in a published social network, where v_1 connects to another vertex v_2 with the degree pair (d_{v_1}, d_{v_2}) = (d_1, d_2). [Figure: example social network with vertices v_1 to v_10, one of them labeled Alice] Ex 1. Assume that an attacker knows that Alice has 3 connections, Bob has 2 connections, and Alice and Bob are friends. The attacker identifies v_9 as Alice with 100% confidence.
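
The definition above can be sketched as a small candidate search. The graph below is a hypothetical toy network (not the one in the slide figure), and the function names are made up for this illustration.

```python
def degrees(edges):
    """Map each vertex to its degree in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def degree_attack_candidates(edges, d1):
    """Vertices whose degree equals d1."""
    return {v for v, d in degrees(edges).items() if d == d1}

def friendship_attack_candidates(edges, d1, d2):
    """Vertices of degree d1 with at least one neighbour of degree d2."""
    deg = degrees(edges)
    candidates = set()
    for u, v in edges:
        if deg[u] == d1 and deg[v] == d2:
            candidates.add(u)
        if deg[v] == d1 and deg[u] == d2:
            candidates.add(v)
    return candidates

# Hypothetical toy graph: "a" and "e" both have degree 3,
# but only "a" has a friend of degree 2.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
         ("e", "f"), ("e", "g"), ("e", "h")]
print(degree_attack_candidates(edges, 3))         # two candidates
print(friendship_attack_candidates(edges, 3, 2))  # one candidate
```

In this toy graph the degree alone leaves two candidates, while the degree pair (3, 2) narrows them to one, which is the extra leverage the friendship attack provides.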

16 Friendship Attack. In the DBLP data set, the percentages of vertices that can be re-identified with a probability larger than 1/k by degree and friendship attacks:

k    Original network,   Original network,    k-degree anonymized
     Degree Attack       Friendship Attack    network
5    0.28%               5.37%                2.89%
10   0.53%               10.69%               4.65%
15   0.73%               14.71%               5.82%
20   0.93%               18.44%               7.23%

17 New Privacy Model Against Friendship Attack: k^2-degree Anonymity. A social network is k^2-degree anonymous if, for every vertex with an incident edge of degree pair (d_1, d_2), there exist at least k − 1 other vertices, each of which also has an incident edge of the same degree pair. [Figure: anonymized example social network with vertices v_1 to v_10] Ex 2. Even with the knowledge (D_2, A) = ((3, 2), Alice), the probability that an attacker can re-identify Alice in the 2^2-degree anonymous social network is limited to 1/2.
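
A minimal checker for the k^2-degree anonymity definition above might look as follows. It assumes a simple undirected graph given as an edge list; the function name and the test graphs are illustrative only.

```python
from collections import defaultdict

def is_k2_degree_anonymous(edges, k):
    """True if every degree pair (d1, d2) seen on an edge is held by >= k vertices."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # (d1, d2) -> vertices of degree d1 that have a neighbour of degree d2
    holders = defaultdict(set)
    for u, v in edges:
        holders[(deg[u], deg[v])].add(u)
        holders[(deg[v], deg[u])].add(v)
    return all(len(vs) >= k for vs in holders.values())

cycle = [(1, 2), (2, 3), (3, 4), (4, 1)]  # every vertex has degree pair (2, 2)
path = [("a", "b"), ("b", "c")]           # only "b" holds the pair (2, 1)
print(is_k2_degree_anonymous(cycle, 2), is_k2_degree_anonymous(path, 2))
```

Vertices without any incident edge are not constrained by the definition, so the sketch ignores them.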

18 The Anonymization. Problem formulation: Given a graph G(V, E) and an integer k, anonymize G to satisfy k^2-degree anonymity such that information distortion is minimized. The challenge: any alteration of an edge affects the degrees of two vertices.

19 Graph Anonymization Algorithms. Integer programming formulation: obtains the optimal solution but scales poorly. DEgree SEquence ANonymization (DESEAN): Step 1. Degree Sequence Anonymization - determine the groups of vertices that protect each other. Step 2. Privacy Constraint Satisfaction - eliminate the advantage of knowing friendship information. Step 3. Anonymous Degree Realization - make the vertices in the same group share the same vertex degree.

20 Algorithm DESEAN. Step 1. Degree Sequence Anonymization. Cluster vertices with similar degrees and select a target degree d_x for each cluster x s.t. each cluster contains at least k vertices and the weighted degree difference ω Σ_{v∈x} (d_x − d_v) + (1 − ω) Σ_{v∈x} (d_v − d_x) is as small as possible. [Figure: example graph with vertices v_1 to v_10] Ex 3. Given k = 2 and ω = 0.5.

21 Algorithm DESEAN. Step 1. Degree Sequence Anonymization (continued). [Figure: the example graph shown twice] Ex 3. Given k = 2 and ω = 0.5.
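
The weighted degree difference in Step 1 can be sketched for a single cluster as below. One assumption is made loudly here: the first sum is taken over vertices whose degree is below the target (degree must be raised) and the second over vertices above it (degree must be lowered), since the summation conditions did not survive in the transcript. The function names are hypothetical, and the clustering itself is not shown.

```python
def cluster_cost(cluster_degrees, target, omega):
    """Weighted degree difference of one cluster for a candidate target degree.

    Assumption: omega weights the total increase needed by vertices below
    the target, and (1 - omega) weights the total decrease needed by
    vertices above it.
    """
    increase = sum(target - d for d in cluster_degrees if d < target)
    decrease = sum(d - target for d in cluster_degrees if d > target)
    return omega * increase + (1 - omega) * decrease

def best_target(cluster_degrees, omega):
    """Target degree with the smallest cost (ties go to the smaller degree)."""
    lo, hi = min(cluster_degrees), max(cluster_degrees)
    return min(range(lo, hi + 1),
               key=lambda t: (cluster_cost(cluster_degrees, t, omega), t))

print(best_target([1, 2, 5], omega=0.5))  # cheapest target for degrees 1, 2, 5
```

With ω = 0.5, as in Ex 3, raising and lowering degrees cost the same, so the cheapest target is a median-like degree; a larger ω would favor lower targets under this reading.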

22 Algorithm DESEAN. Step 2. Privacy Constraint Satisfaction. Add or delete edges between clusters to ensure that, for each pair of clusters (x, y), the number of vertices in x directly connected to the vertices in y is either zero or not less than k. [Figure: example graph with vertices v_1 to v_10] Ex 3. Given k = 2 and ω = 0.5.

23 Algorithm DESEAN. Step 2. Privacy Constraint Satisfaction (continued). [Figure: the example graph shown twice] Ex 3. Given k = 2 and ω = 0.5.
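
The Step 2 condition can be checked with a short sketch like the one below, which only detects violating cluster pairs and does not add or delete edges. The names are hypothetical, and treating a cluster paired with itself the same way as two distinct clusters is an assumption of this example.

```python
from collections import defaultdict

def violating_cluster_pairs(edges, cluster_of, k):
    """Ordered cluster pairs (x, y) where between 1 and k - 1 vertices of x touch y."""
    # (x, y) -> vertices in cluster x adjacent to some vertex in cluster y
    connected = defaultdict(set)
    for u, v in edges:
        connected[(cluster_of[u], cluster_of[v])].add(u)
        connected[(cluster_of[v], cluster_of[u])].add(v)
    return {pair for pair, vs in connected.items() if len(vs) < k}

cluster_of = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
print(violating_cluster_pairs([("a", "c"), ("b", "d")], cluster_of, 2))  # no violations
print(violating_cluster_pairs([("a", "c")], cluster_of, 2))              # both directions violate
```

Pairs with zero connecting vertices never appear in the map, so they pass automatically, matching the "either zero or not less than k" wording.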

24 Algorithm DESEAN. Step 3. Anonymous Degree Realization. Adjust edges in G s.t. the vertices in each cluster x meet the target degree d_x selected in Step 1. [Figure: example graph with vertices v_1 to v_10] Ex 3. Given k = 2 and ω = 0.5.

25 Algorithm DESEAN. Step 3. Anonymous Degree Realization (continued). [Figure: the example graph shown twice] Ex 3. Given k = 2 and ω = 0.5.

26 C.-H. Tai, P. S. Yu, D.-N. Yang, and M.-S. Chen, "Structural Diversity for Privacy in Publishing Social Networks," in SDM, 2011.

27 Community Identification. Vertex identification is considered to be an important privacy issue in publishing social networks. ◦ k-degree anonymity, k-neighborhood anonymity, … In addition to a vertex identity, each individual is also associated with a community identity. ◦ It could be used to infer sensitive information such as political party affiliation or disease. ◦ It is a kind of structural information.

28 Community Identification. Community information is explicitly given. Ex. Alice recently learns that: Bob is sick; Bob participates in this social network; Bob has 5 friends (vertex degree attack). ⇒ Bob has AIDS! [Figure: social network with an AIDS community and an SLE community]

29 Community Identification. Community information is not given. Ex. Alice knows that Bob participates in this social network and has 5 friends (vertex degree attack). ⇒ Alice can learn an approximation of Bob's neighborhood.

30 Community Identification. [Figure: % of vertices violating k-structural diversity; panels (a) DBLP, (b) ca-CondMat] k-degree anonymization is insufficient.

31 Community Identification. [Figure: structural diversity; panels (a) original DBLP, (b) 10-degree anonymized DBLP] Vertices with large degrees appear in a small set of communities.

32 New Privacy Model Against Community Identification: k-Structural Diversity. To protect against the vertex degree attack, for each vertex there should be other vertices with the same degree located in at least k − 1 other communities. If a graph satisfies k-structural diversity, then it also satisfies k-degree anonymity.
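
The k-structural diversity condition above can be sketched as a check over degree groups. The sketch assumes each vertex belongs to exactly one community and takes precomputed degrees; both input dictionaries and the function name are hypothetical.

```python
from collections import defaultdict

def is_k_structurally_diverse(degree_of, community_of, k):
    """True if, for every degree value, vertices with that degree span >= k communities."""
    communities_by_degree = defaultdict(set)
    for vertex, degree in degree_of.items():
        communities_by_degree[degree].add(community_of[vertex])
    return all(len(cs) >= k for cs in communities_by_degree.values())

degree_of = {"a": 2, "b": 2, "c": 3, "d": 3}
spread_out = {"a": 1, "b": 2, "c": 1, "d": 2}    # each degree appears in 2 communities
concentrated = {"a": 1, "b": 2, "c": 1, "d": 1}  # degree 3 appears only in community 1
print(is_k_structurally_diverse(degree_of, spread_out, 2),
      is_k_structurally_diverse(degree_of, concentrated, 2))
```

Because k distinct communities require at least k distinct vertices, passing this check also implies k-degree anonymity, as the slide states.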

33 The Anonymization. Problem formulation: Given a graph G(V, E, C) and an integer k, 1 ≤ k ≤ |C|, anonymize G to satisfy k-structural diversity such that information distortion is minimized. The challenge: how to preserve community structures, even in the implicit cases, while preserving privacy.

34 Problem Formulation. Operation Adding Edge. ◦ Connect two vertices belonging to the same community. ◦ Can avoid destroying the communities.

35 Procedure Mergence. ◦ To protect a vertex v in an existing anonymous group, in which all the vertices have the same degree d. [Figure: vertex v and communities Com. 1, Com. 2, Com. 3]

36 Procedure Mergence (continued). [Figure: vertex v and communities Com. 1, Com. 2, Com. 3]

37 Procedure Creation. To create a new anonymous group for a vertex v, such that the vertices in the group are located in at least k different communities and all have the same degree as v. [Figure: vertex v and communities Com. 1, Com. 2, Com. 3]

38 Procedure Creation (continued). [Figure: vertex v and communities Com. 1, Com. 2, Com. 3]

39 The Edge-Redirection Mechanism. ◦ Is defined on w, v, x in the same community: w is an anonymized vertex; v and x are two not-yet-anonymized vertices. ◦ Replaces the edge (w, v) with the edge (w, x). [Figure: edge (w, v) before redirection and edge (w, x) after]
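
The redirection step can be sketched on an edge set as below. The representation (a set of `frozenset` pairs), the validity checks, and the function name are assumptions for illustration; the bookkeeping of which vertices are already anonymized is left out.

```python
def redirect_edge(edges, community_of, w, v, x):
    """Return a new edge set in which edge (w, v) is replaced by edge (w, x)."""
    old, new = frozenset((w, v)), frozenset((w, x))
    if old not in edges:
        raise ValueError("edge (w, v) does not exist")
    if x == w or new in edges:
        raise ValueError("edge (w, x) cannot be added")
    if not (community_of[w] == community_of[v] == community_of[x]):
        raise ValueError("w, v and x must be in the same community")
    return (edges - {old}) | {new}

community_of = {"w": 1, "v": 1, "x": 1}
edges = {frozenset(("w", "v"))}
print(redirect_edge(edges, community_of, "w", "v", "x"))
```

The degree of w stays the same, while v loses one edge and x gains one, so an already anonymized w is left undisturbed.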

40 The Edge-Redirection Mechanism. Q. Why is the Edge-Redirection mechanism needed? By mergence. [Figure: vertices v and w across Com. 1, Com. 2, Com. 3]

41 The Edge-Redirection Mechanism. Q. Why is the Edge-Redirection mechanism needed? By creation. [Figure: vertices v and w across Com. 1, Com. 2, Com. 3]

42 Algorithm EdgeConnect (EC). Procedures Mergence and Creation. ◦ Let R_v be the set of edges that could be redirected away from v. The Edge-Redirection mechanism.

43 Problem Formulation. Operation Adding Edge. ◦ Connect two vertices belonging to the same community. ◦ Can avoid destroying the communities. Operation Splitting Vertex. ◦ Replace a vertex v with a set of substitute vertices, such that each substitute vertex is connected with at least one edge originally incident to v. ◦ Each substitute vertex presents a partial truth of the vertex v.

44 Procedure CreationBySplit. Split a set of vertices, including v, to create a new anonymous group. [Figure: vertex v across Com. 1, Com. 2, Com. 3, split into v1 and v2]

45 Procedure MergeBySplit. Split v into a set of substitute vertices s.t. each substitute vertex is protected in some existing anonymous group. [Figure: vertex v across Com. 1, Com. 2, Com. 3, split into v1, v2, v3]

46 C.-H. Tai, P.-J. Tseng, P. S. Yu, and M.-S. Chen, "Identities Anonymization in Dynamic Social Networks," in ICDM, 2011.

47 The Problem in Dynamic Scenarios. A dynamic social network will be released sequentially. An attacker can monitor a victim for a period w. Therefore, the adversary knowledge includes: the releases G_{t−w+1}, G_{t−w+2}, …, G_t during w; a degree sequence Δ_v^w = (d_v^{t−w+1}, d_v^{t−w+2}, …, d_v^t) of a victim v during w. [Figure: releases G_1 and G_2] Ex. John has two friends at time 1, and three friends at time 2.

48 Privacy Model: k^w-Structural Diversity Anonymity. Base case of w = 1. The adversary knowledge includes: 1. the released social network G_t; 2. a degree sequence Δ_v^1 = (d_v^t) of a victim v. A group θ_d, consisting of all vertices of degree d, is a k-shielding group if there is a vertex subset θ ⊆ θ_d s.t. (1) |θ| ≥ k, and (2) for any two vertices u and v in θ, C_u ∩ C_v = ∅, where C is the community identity. [Figure: released network G] Ex. Mary has four friends.

49 Privacy Model: k^w-Structural Diversity Anonymity. Dynamic scenarios of w > 1. The adversary knowledge includes: 1. the releases G_{t−w+1}, G_{t−w+2}, …, G_t during w; 2. a degree sequence Δ_v^w = (d_v^{t−w+1}, d_v^{t−w+2}, …, d_v^t) of a victim v during w. A consistent group Θ_Δ is the set of vertices that always share the same degree during w. A consistent group Θ_Δ is k-shielding if, at each time instant t in w, there is a vertex subset Θ_t ⊆ Θ_Δ s.t. (1) |Θ_t| ≥ k, and (2) for any two vertices u and v in Θ_t, C_u^t ∩ C_v^t = ∅, where C^t is the community identity at time t.
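
The consistent-group condition above can be sketched as follows. The sketch assumes each vertex has exactly one community identity per release, so a subset with pairwise disjoint communities of size at least k exists exactly when the group spans at least k distinct communities at that time. The input dictionaries and the function name are hypothetical.

```python
from collections import defaultdict

def unprotected_vertices(degree_seq, community_seq, k):
    """Vertices whose consistent group is not k-shielding at some time in the window."""
    # Vertices with the same degree sequence over the window form a consistent group.
    groups = defaultdict(list)
    for vertex, seq in degree_seq.items():
        groups[tuple(seq)].append(vertex)
    exposed = set()
    for members in groups.values():
        window = len(community_seq[members[0]])
        for t in range(window):
            if len({community_seq[v][t] for v in members}) < k:
                exposed.update(members)
                break
    return exposed

# Hypothetical window of w = 2 releases.
degree_seq = {"john": (2, 3), "mary": (2, 3), "ann": (2, 2)}
community_seq = {"john": ("c1", "c1"), "mary": ("c2", "c2"), "ann": ("c1", "c1")}
print(unprotected_vertices(degree_seq, community_seq, 2))
```

With w = 1 the same check reduces to the base case on the previous slide, since every degree sequence then has a single entry.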

50 The Anonymization. Problem formulation: Suppose that every vertex in a series of sequential releases G_{t−w+1}, G_{t−w+2}, …, G_{t−1} is protected. Given G_{t−w+1}, G_{t−w+2}, …, G_{t−1} and k, anonymize the current social network G_t s.t. every vertex is protected in a k-shielding consistent group. The challenges: the anonymization depends not only on the current social network but also on the previous w − 1 releases. Searching through all the w − 1 releases to eliminate privacy leaks is time-consuming.

51 Anonymization Algorithm. Construct a CS-Table (Clustering Sequence Table) to avoid searching through the w graphs.
- The CS-Table summarizes the vertex information in the w − 1 previous releases. ⇒ Fetch v's information in the w graphs without scanning the graphs.
- According to the degree sequence during w, sort the vertices in a hierarchical clustering manner. ⇒ Vertices in the same k-shielding consistent group are close in the CS-Table. ⇒ The CS-Table can be incrementally updated.
- According to the vertex ranking in the CS-Table, anonymize each vertex to be protected.

52 Thank you!

