On Link Privacy in Randomizing Social Networks
Xiaowei Ying, Xintao Wu
Univ. of North Carolina at Charlotte
PAKDD-09, April 28, Bangkok, Thailand
Motivation
Privacy-preserving social network publishing:
- Node anonymization alone cannot guarantee identity or link privacy, because attackers can issue subgraph queries (Backstrom et al. WWW07; Hay et al. UMass TR07).
- Edge randomization: Random Add/Del, Random Switch.
- K-anonymity (Hay et al. VLDB08; Liu & Terzi SIGMOD08; Zhou & Pei ICDE08).
- Utility-preserving randomization: spectral-feature preserving (Ying & Wu SDM08); real-space-feature preserving (Ying & Wu SDM09).
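Problem Formalization
Prior belief vs. posterior belief:
- Prior belief (Ying & Wu SDM08): the attacker's confidence that an edge observed in the randomized graph is a true edge.
- Posterior belief (this paper): the same confidence when the attacker also exploits the similarity measure value between node i and node j.
Randomization: add k false edges, then delete k true edges.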
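As a minimal sketch of the add-k-then-delete-k randomization, assuming an undirected simple graph on nodes 0..n-1 given as an edge list (the names `rand_add_del` and `prior_belief` are hypothetical), the code below adds k random false edges and then deletes k random true edges. The randomized graph therefore keeps m edges, of which m - k are true, so a direct count gives (m - k)/m as the fraction of observed edges that are true.

```python
import random
from itertools import combinations

def rand_add_del(n, edges, k, seed=None):
    """Sketch of Rand Add/Del: add k false edges, then delete k true edges."""
    rng = random.Random(seed)
    true_edges = {tuple(sorted(e)) for e in edges}
    # Candidate false edges: every node pair that is not a true edge.
    non_edges = [p for p in combinations(range(n), 2) if p not in true_edges]
    added = set(rng.sample(non_edges, k))
    # Delete k of the original (true) edges; added edges are never deleted.
    deleted = set(rng.sample(sorted(true_edges), k))
    return (true_edges - deleted) | added

def prior_belief(m, k):
    """Fraction of observed edges that are true under this model: (m - k) / m."""
    return (m - k) / m

ring = [(i, (i + 1) % 10) for i in range(10)]   # 10-node cycle, so m = 10
randomized = rand_add_del(10, ring, k=4, seed=1)
true_set = {tuple(sorted(e)) for e in ring}
observed_true = sum(e in true_set for e in randomized)
print(len(randomized), observed_true / len(randomized), prior_belief(10, 4))  # 10 0.6 0.6
```

Because added pairs are drawn only from non-edges, the edge count is preserved exactly, which is why the observed true-edge fraction matches (m - k)/m for any seed.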
Polbooks network
Network of books about US politics sold by Amazon.com (105 nodes, 441 edges, r = 8%). Edges represent frequent co-purchasing of books by the same buyers. Nodes are colored blue, white, or red to indicate whether a book is "liberal", "neutral", or "conservative".
Proportion of true edges vs. similarity
After randomly adding and deleting 200 edges (441 edges in total).
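A plot of this kind can be reproduced roughly as follows, assuming the similarity is the number of common neighbors computed in the randomized graph (the slide does not state which measure or which graph the x-axis uses; `true_edge_proportion` is a hypothetical name). Observed edges are grouped by similarity value, and for each value the sketch reports the fraction that are also edges of the original graph.

```python
from collections import defaultdict

def adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def true_edge_proportion(original_edges, randomized_edges):
    """Map each common-neighbor count x to the share of observed edges with
    similarity x that are also edges of the original graph."""
    true_set = {tuple(sorted(e)) for e in original_edges}
    adj = adjacency(randomized_edges)
    total, hits = defaultdict(int), defaultdict(int)
    for u, v in randomized_edges:
        x = len(adj[u] & adj[v])          # similarity measured in the randomized graph
        total[x] += 1
        hits[x] += tuple(sorted((u, v))) in true_set
    return {x: hits[x] / total[x] for x in sorted(total)}

original = [(0, 1), (1, 2), (0, 2), (2, 3)]
randomized = [(0, 1), (1, 2), (0, 2), (1, 3)]   # (2, 3) deleted, (1, 3) added
print(true_edge_proportion(original, randomized))   # {0: 0.0, 1: 1.0}
```

In this toy example the only false edge, (1, 3), has no common neighbors, while all edges inside the triangle survive with similarity 1: the same qualitative pattern of true edges concentrating at higher similarity.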
Similarity measures vs. link prediction
Similarity measures:
- Common neighbors: the number of common neighbors.
- Adamic/Adar: the weighted number of common neighbors.
- Katz: a weighted sum of the number of paths connecting two nodes.
- Commute time: the expected number of steps of a random walk from node i to node j and back to i.
Similarity measures have been exploited in the classic link prediction problem (Liben-Nowell & Kleinberg CIKM03).
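The four measures can be sketched in plain Python as below. Assumptions: Adamic/Adar weights each shared neighbor z by 1/log(deg(z)); Katz counts walks (the usual A^l formulation) and is truncated at `max_len` with a hypothetical default `beta`; commute time is computed as 2m times the effective resistance between i and j on a connected graph, obtained by grounding node j and solving the reduced Laplacian system. Note that commute time is a distance: smaller values mean the nodes are closer.

```python
import math
from collections import defaultdict

def adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def common_neighbors(adj, i, j):
    return len(adj[i] & adj[j])

def adamic_adar(adj, i, j):
    # A shared neighbor has degree >= 2, so log(deg) > 0.
    return sum(1 / math.log(len(adj[z])) for z in adj[i] & adj[j])

def katz(adj, i, j, beta=0.05, max_len=4):
    """Truncated Katz: sum over l <= max_len of beta**l * (#walks of length l from i to j)."""
    walks = {i: 1}                      # number of walks of the current length ending at each node
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = defaultdict(int)
        for u, cnt in walks.items():
            for w in adj[u]:
                nxt[w] += cnt
        walks = nxt
        score += beta ** length * walks.get(j, 0)
    return score

def commute_time(adj, i, j):
    """Commute time = 2m * effective resistance between i and j (connected graph)."""
    nodes = sorted(adj)
    others = [u for u in nodes if u != j]           # ground node j (potential 0)
    idx = {u: r for r, u in enumerate(others)}
    size = len(others)
    # Reduced Laplacian augmented with right-hand side e_i (unit current injected at i).
    M = [[0.0] * (size + 1) for _ in range(size)]
    for u, r in idx.items():
        M[r][r] = float(len(adj[u]))
        for w in adj[u]:
            if w in idx:
                M[r][idx[w]] -= 1.0
    M[idx[i]][size] = 1.0
    # Gauss-Jordan elimination with partial pivoting; column c ends up only in row c.
    for c in range(size):
        piv = max(range(c, size), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(size):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    resistance = M[idx[i]][size] / M[idx[i]][idx[i]]   # potential at i
    two_m = sum(len(adj[u]) for u in nodes)
    return two_m * resistance

path = adjacency([(0, 1), (1, 2)])
print(common_neighbors(path, 0, 2), adamic_adar(path, 0, 2), commute_time(path, 0, 2))
```

On the path 0-1-2 the walk from 0 needs 4 expected steps to reach 2 and 4 to return, so the commute time is 8; in a triangle every pair has commute time 4.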
Proportion of true edges vs. similarity
After randomly adding and deleting 200 edges (441 edges in total).
Calculating the posterior belief
Apply Bayes' theorem. The attacker does not know one of the values the formula requires; what can he do?
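A worked sketch of the Bayes step, under assumptions that are ours rather than the slide's: each true edge survives the deletions with probability p = (m - k)/m, each non-edge is chosen for addition with probability q = k/(N - m), where N = n(n - 1)/2 is the number of node pairs, and randomization is independent of the similarity value. Writing ρ_x (our notation) for the proportion of true edges among node pairs with similarity value x in the original graph, Bayes' theorem gives posterior = ρ_x p / (ρ_x p + (1 - ρ_x) q). Here ρ_x plays the role of the unknown value.

```python
def posterior_belief(rho_x, m, k, n):
    """P(true edge | observed edge, similarity x) under Rand Add/Del (sketch)."""
    N = n * (n - 1) // 2          # number of node pairs
    p = (m - k) / m               # P(observed | true edge): survives the k deletions
    q = k / (N - m)               # P(observed | non-edge): picked as one of the k additions
    return rho_x * p / (rho_x * p + (1 - rho_x) * q)

# Polbooks sizes from the slides: n = 105, m = 441, k = 200.
n, m, k = 105, 441, 200
N = n * (n - 1) // 2
# Without similarity information (rho_x = m / N) the posterior reduces to (m - k) / m.
print(posterior_belief(m / N, m, k, n), (m - k) / m)
print(posterior_belief(0.5, m, k, n))
```

Setting ρ_x to the overall density m/N recovers the fraction of true edges among observed edges, which is a useful sanity check; similarity values with a higher true-edge proportion push the posterior above that baseline.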
MLE estimation
The attacker estimates the unknown value from the randomized graph, so the posterior belief can be calculated by attackers.
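One way to recover the unknown value from the randomized graph alone is sketched below. Assumptions (ours): common neighbors as the similarity, p = (m - k)/m and q = k/(N - m) as the probabilities that a true edge and a non-edge, respectively, appear in the randomized graph, and p > q. Write ρ_x for the proportion of true edges among node pairs with similarity x in the original graph. Among all node pairs with randomized similarity x, the observed edge fraction ρ̃_x has expectation ρ_x p + (1 - ρ_x) q; treating pairs as independent Bernoulli trials, the maximum-likelihood estimate inverts this, ρ̂_x = (ρ̃_x - q)/(p - q), clipped to [0, 1]. `estimate_rho` and `posterior_from_estimate` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations

def estimate_rho(n, randomized_edges, m, k):
    """Estimate rho_x for each similarity value x from the randomized graph (sketch)."""
    N = n * (n - 1) // 2
    p, q = (m - k) / m, k / (N - m)
    adj = defaultdict(set)
    observed = set()
    for u, v in randomized_edges:
        adj[u].add(v)
        adj[v].add(u)
        observed.add((min(u, v), max(u, v)))
    pairs, edges = defaultdict(int), defaultdict(int)
    for u, v in combinations(range(n), 2):
        x = len(adj[u] & adj[v])                 # similarity in the randomized graph
        pairs[x] += 1
        edges[x] += (u, v) in observed
    rho_hat = {}
    for x in sorted(pairs):
        rho_tilde = edges[x] / pairs[x]          # observed edge fraction at similarity x
        est = (rho_tilde - q) / (p - q)          # invert E[rho_tilde] = rho*p + (1 - rho)*q
        rho_hat[x] = min(1.0, max(0.0, est))     # clip to a valid proportion
    return rho_hat, p, q

def posterior_from_estimate(rho, p, q):
    """Bayes posterior with the estimated rho plugged in."""
    return rho * p / (rho * p + (1 - rho) * q)

# Toy randomized graph: n = 4, m = 4 edges, k = 1 edge added and 1 deleted.
rho_hat, p, q = estimate_rho(4, [(0, 1), (1, 2), (0, 2), (1, 3)], m=4, k=1)
print(rho_hat, {x: posterior_from_estimate(r, p, q) for x, r in rho_hat.items()})
```

Clipping matters on small groups: with few pairs at a given similarity value, the raw inversion can fall outside [0, 1], as it does for x = 0 in the toy graph.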
Comparison
Empirical Evaluation
Attacker's prediction strategy:
- Calculate the posterior probability of all node pairs.
- Choose the top t node pairs (those with the highest posterior probability) as predicted candidate links.
For each t, the precision of the predictions (k = 0.5m).
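A minimal sketch of this evaluation, assuming the posterior beliefs are already available as a dictionary keyed by node pair (`precision_at_t` and its tie-breaking rule are our choices, not the paper's): rank all pairs by posterior, keep the top t, and report the fraction that are true edges of the original graph.

```python
def precision_at_t(scores, true_edges, t):
    """Precision of the attacker's top-t predicted links.

    scores: dict mapping a node pair (i, j) with i < j to its posterior belief.
    true_edges: iterable of edges of the original graph.
    """
    truth = {tuple(sorted(e)) for e in true_edges}
    # Rank by descending posterior; break ties by pair so results are deterministic.
    ranked = sorted(scores, key=lambda pair: (-scores[pair], pair))
    top = ranked[:t]
    return sum(pair in truth for pair in top) / len(top)

scores = {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.3, (2, 3): 0.7}
original = [(0, 1), (3, 2)]
print([precision_at_t(scores, original, t) for t in (1, 2, 3)])
```

Sweeping t over a range of values produces a precision curve like the one on this slide.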
Empirical Evaluation
- Posterior beliefs that exploit similarity measures achieve higher precision than the belief that does not exploit them.
- A measure that is best for one dataset is not necessarily best for another.
Determining k to guarantee privacy
Data owner
Conclusion & Future Work
We have shown that attackers can exploit node proximity measures to breach link privacy in networks randomized by edge addition/deletion.
Future work:
- How about other topological properties?
- How about other randomization strategies?
- Privacy vs. utility tradeoff.
Questions?
Acknowledgments: This work was supported in part by U.S. National Science Foundation IIS and CNS.
Thank you!
Utility-preserving randomization (Ying & Wu, SDM09)
Graph space: {G: with the given degree seq. & }
Attacker's confidence in link (i, j): the proportion of sample graphs in which a link exists between node i and node j.
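The sampling idea can be sketched with degree-preserving random edge switches, assuming the graph space is explored by a simple switch-based random walk. The additional constraint in the graph-space definition is not modeled here, and `random_switch`, `link_proportion`, and their default parameters are hypothetical. Each switch replaces edges (a, b) and (c, d) with (a, d) and (c, b), which leaves every node's degree unchanged; the confidence in (i, j) is estimated as the fraction of sampled graphs that contain it.

```python
import random
from collections import Counter

def degree_seq(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def random_switch(edges, n_switches, rng):
    """Apply degree-preserving edge switches to a simple undirected graph (sketch)."""
    es = [tuple(sorted(e)) for e in edges]
    present = set(es)
    for _ in range(n_switches):
        x, y = rng.sample(range(len(es)), 2)
        (a, b), (c, d) = es[x], es[y]
        if rng.random() < 0.5:
            c, d = d, c                     # pick one of the two possible rewirings
        e1, e2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if a == d or c == b or e1 in present or e2 in present:
            continue                        # reject self-loops and multi-edges
        present -= {es[x], es[y]}
        present |= {e1, e2}
        es[x], es[y] = e1, e2
    return es

def link_proportion(edges, i, j, n_samples=200, n_switches=100, seed=0):
    """Fraction of sampled same-degree graphs that contain the link (i, j)."""
    rng = random.Random(seed)
    target = tuple(sorted((i, j)))
    g = list(edges)
    hits = 0
    for _ in range(n_samples):
        g = random_switch(g, n_switches, rng)   # consecutive samples are correlated
        hits += target in set(g)
    return hits / n_samples

graph = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4)]
print(link_proportion(graph, 0, 4))
```

Because consecutive samples come from one Markov chain, a larger `n_switches` between samples reduces their correlation at the cost of run time.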