Community-enhanced De-anonymization of Online Social Networks Shirin Nilizadeh, Apu Kapadia, Yong-Yeol Ahn Indiana University Bloomington CCS 2014.



Online Social Networks (OSNs) have revolutionized the way our society communicates. [Figure: monthly active users of several OSNs; the counts shown include 540 million, 225 million, 187 million, and 40 million, plus one figure in the billions]

3 Reference: OSN providers have become treasure troves of information for marketers and researchers

4 Reference: Social Data platforms gather, filter and deliver social data to enterprise-scale companies

5 Also, OSN providers publish their ‘anonymized’ social data for competitions and challenges

6 Several works have shown that this ‘anonymized’ published data can be de-anonymized

7 The Kaggle social network challenge: Link prediction on an anonymized dataset

8 Crawled Flickr and matched users across the public and the anonymized Flickr networks [Narayanan and Shmatikov, 2009]. [Figure panels: Public Flickr Network, Anonymized Flickr Network]

9 De-anonymizing a social network using another public social network. [Figure: a Flickr network and a Twitter network with example users Alice, Bob, Carol, Eve, Rob, and John; some users are labeled Republican or Democrat]

10 Narayanan and Shmatikov’s (NS) de-anonymization approach: (1) seed identification, (2) propagation. [Figure panels: Reference Network, Anonymized Network]

11 Seed identification: randomly samples a subset of k-cliques from the reference graph and finds the corresponding cliques in the other graph. For a given clique, it computes the degree sequence of the k nodes and the number of common neighbors between each of the C(k,2) pairs of users, compares the two sequences, and decides, based on an error parameter, whether they are the same people or not.
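
The clique comparison described on this slide can be sketched as follows. This is a minimal illustration, not the NS implementation: the function names, the adjacency format (a dict mapping each node to a set of neighbors), and the relative-error test with default `eps=0.1` are all assumptions. Sorting the sequences is a simplification that only decides whether two cliques look alike; it does not recover which node maps to which.

```python
from itertools import combinations


def clique_signature(adj, clique):
    """Return (sorted degrees, sorted pairwise common-neighbor counts)."""
    degrees = sorted(len(adj[v]) for v in clique)
    common = sorted(len(adj[u] & adj[v]) for u, v in combinations(clique, 2))
    return degrees, common


def within_error(a, b, eps):
    """True if paired values differ by at most a relative factor eps (assumed test)."""
    return len(a) == len(b) and all(
        abs(x - y) <= eps * max(x, y, 1) for x, y in zip(a, b)
    )


def cliques_match(adj_ref, clique_ref, adj_anon, clique_anon, eps=0.1):
    """Decide whether two k-cliques plausibly belong to the same k users."""
    deg_r, cn_r = clique_signature(adj_ref, clique_ref)
    deg_a, cn_a = clique_signature(adj_anon, clique_anon)
    return within_error(deg_r, deg_a, eps) and within_error(cn_r, cn_a, eps)


# Hypothetical toy graphs: a triangle with one pendant node, under two labelings.
adj_ref = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
adj_anon = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
adj_other = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}  # plain triangle, no pendant node

same = cliques_match(adj_ref, ["a", "b", "c"], adj_anon, [1, 2, 3])
different = cliques_match(adj_ref, ["a", "b", "c"], adj_other, [1, 2, 3])
```

Here `same` is true because both cliques have identical degree and common-neighbor sequences, while `different` is false because the degree sequences disagree by more than the error parameter allows.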

12 Propagation

13 Network communities provide an effective way to divide-and-conquer the problem

14 Comm-aware vs. Comm-blind

15 Step 1- Community Detection: slicing the network into smaller, dense chunks. [Figure panels: Reference Network, Anonymized Network]
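
To make the idea of slicing a network into dense chunks concrete, here is a small stand-in sketch. The slides do not say which community detection algorithm is used, so this example substitutes a simple label propagation scheme with deterministic tie-breaking; the function name and the tie-breaking rule are assumptions made for illustration, and node labels are assumed to be mutually comparable.

```python
from collections import Counter


def label_propagation(adj, max_iters=100):
    """Group nodes by repeatedly adopting the most common neighbor label."""
    labels = {v: v for v in adj}  # every node starts in its own community
    for _ in range(max_iters):
        changed = False
        for v in sorted(adj):  # fixed visiting order keeps the run deterministic
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            top = max(counts.values())
            best = [lab for lab, c in counts.items() if c == top]
            # Keep the current label on ties; otherwise take the largest label.
            new = labels[v] if labels[v] in best else max(best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break
    communities = {}
    for v, lab in labels.items():
        communities.setdefault(lab, set()).add(v)
    return list(communities.values())


# Hypothetical toy network: two triangles joined by the single edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
communities = label_propagation(adj)
```

On this toy input the two triangles end up as separate communities. Label propagation is sensitive to visiting order and tie-breaking, so a production pipeline would likely use a more robust detector.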

16 Step 2- Creating graph of communities and mapping communities. [Figure panels: Reference Network, Anonymized Network]

17 Step 2- Creating graph of communities and mapping communities
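
One way to picture the graph of communities is to collapse every community into a single node and weight each link by the number of edges crossing between the two communities. The sketch below assumes exactly that construction; the function name, the edge-list input, and the `membership` dict are hypothetical, and the slides do not specify the weighting used.

```python
from collections import defaultdict


def community_graph(edges, membership):
    """Collapse a node-level edge list into a weighted graph of communities."""
    weights = defaultdict(int)
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu != cv:  # edges inside one community do not appear in the coarse graph
            weights[tuple(sorted((cu, cv)))] += 1
    return dict(weights)


# Hypothetical toy network: triangles A and B, plus triangle-like group C.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (6, 7),
         (2, 3), (1, 3), (5, 6)]
membership = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "C", 7: "C"}
coarse = community_graph(edges, membership)
```

The resulting coarse graph is much smaller than the original network, which is what makes it practical to run a mapping algorithm on communities before descending to individual nodes.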

18 Step 3- Seed enrichment and local propagation: identifying more seeds using nodes’ degrees and clustering coefficients

19 Step 3- Seed enrichment and local propagation: the clustering coefficient is a property of a node in a network and quantifies how close its neighbors are to being a clique
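
The clustering coefficient and its use for finding extra seeds can be sketched as below. The coefficient follows the standard local definition (fraction of neighbor pairs that are themselves connected). The `enrich_seeds` function is a hypothetical illustration of matching nodes inside an already mapped pair of communities; its tolerances and its rule of accepting only unambiguous matches are assumptions, not the authors' procedure.

```python
from collections import Counter
from itertools import combinations


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are connected to each other."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def enrich_seeds(adj_ref, nodes_ref, adj_anon, nodes_anon, deg_tol=0.1, cc_tol=0.1):
    """Pair nodes whose degree and clustering coefficient match unambiguously."""
    seeds = {}
    for r in nodes_ref:
        dr, cr = len(adj_ref[r]), clustering_coefficient(adj_ref, r)
        candidates = []
        for a in nodes_anon:
            da, ca = len(adj_anon[a]), clustering_coefficient(adj_anon, a)
            if abs(dr - da) <= deg_tol * max(dr, da, 1) and abs(cr - ca) <= cc_tol:
                candidates.append(a)
        if len(candidates) == 1:  # skip nodes with zero or several look-alikes
            seeds[r] = candidates[0]
    claimed = Counter(seeds.values())
    # Drop anonymized nodes claimed by more than one reference node.
    return {r: a for r, a in seeds.items() if claimed[a] == 1}


# Hypothetical toy community: a triangle 0-1-2 with pendant node 3 attached to 2.
adj_ref = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
adj_anon = {"w": {"x", "y"}, "x": {"w", "y"}, "y": {"w", "x", "z"}, "z": {"y"}}
new_seeds = enrich_seeds(adj_ref, list(adj_ref), adj_anon, list(adj_anon))
```

Nodes 0 and 1 are structurally identical, so neither is accepted as a seed; only the two nodes with distinctive features are matched.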

20 Step 4- Global propagation further extends the mapping. [Figure panels: Reference Network, Anonymized Network]

21 We tested our approach on real-world datasets
Real-world data set          Number of nodes   Number of edges
arXiv collaboration network  36,458            171,735
Twitter mention network 1    90,332            377,588
Twitter mention network 2    9,745             50,164
Used the METIS graph partitioning algorithm to obtain a smaller network

22 Generating noisy anonymized networks with the same set of nodes and different but overlapping sets of edges
- Noise level: {0.1%, 1%, 5%, 10%, 15%, 20%, 30%, 40%}
- Generated an ensemble of 10 networks for each network
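
A noisy copy with the same nodes and an overlapping edge set could be generated along these lines. The slide does not define the noise model, so this sketch assumes that a noise level of x means removing a fraction x of the edges and adding the same number of previously absent edges; the function name and the seeding scheme are hypothetical.

```python
import random


def add_edge_noise(nodes, edges, noise, rng):
    """Replace a `noise` fraction of edges with randomly chosen non-edges."""
    nodes = sorted(nodes)
    edge_set = {tuple(sorted(e)) for e in edges}
    n_swap = round(noise * len(edge_set))
    max_non_edges = len(nodes) * (len(nodes) - 1) // 2 - len(edge_set)
    if n_swap > max_non_edges:
        raise ValueError("not enough non-edges to add the requested noise")
    removed = set(rng.sample(sorted(edge_set), n_swap))
    added = set()
    while len(added) < n_swap:
        u, v = rng.sample(nodes, 2)
        e = tuple(sorted((u, v)))
        if e not in edge_set and e not in added:  # only add edges absent originally
            added.add(e)
    return (edge_set - removed) | added


# Hypothetical toy network: a ring of 10 nodes, 20% noise, ensemble of 10 copies.
nodes = range(10)
edges = [(i, (i + 1) % 10) for i in range(10)]
original = {tuple(sorted(e)) for e in edges}
ensemble = [add_edge_noise(nodes, edges, 0.2, random.Random(seed)) for seed in range(10)]
```

Each copy keeps the node set and the edge count, and shares all but the swapped edges with the original network.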

23 Measuring performance using success rate and error rate. With 20% edge noise and 16 seeds, the NS algorithm can barely map any nodes, whereas our approach maps 40% of the nodes
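
The two metrics can be computed from a produced mapping and the ground truth as sketched below. The slide does not spell out the denominators, so this example assumes both rates are fractions of all nodes in the ground truth; the function name is hypothetical.

```python
def success_and_error_rate(mapping, truth):
    """Return (fraction correctly mapped, fraction incorrectly mapped)."""
    total = len(truth)
    correct = sum(1 for u, v in mapping.items() if truth.get(u) == v)
    wrong = len(mapping) - correct
    return correct / total, wrong / total


# Hypothetical example: four users, two mapped, one of them wrongly.
truth = {1: "a", 2: "b", 3: "c", 4: "d"}
mapping = {1: "a", 2: "c"}
success, error = success_and_error_rate(mapping, truth)
```

Unmapped nodes count toward neither rate, which is why the two values need not sum to one.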

24 Need to consider information gain: degree of anonymity. In practice, the mapping algorithm may still leave several nodes unmapped. For these unmapped nodes, however, the community structure reveals information about the true mapping.
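
A small worked example may help show why a community mapping leaks information even about unmapped nodes. It assumes, purely for illustration, that an unmapped user is equally likely to be any of the candidates: all N nodes when nothing is known, or only the |C| nodes of the matched community once communities are mapped. The symbols and the numbers are hypothetical.

```latex
% Uniform candidates: entropy is the log of the candidate-set size.
H_{\text{blind}} = \log_2 N, \qquad
H_{\text{aware}} = \log_2 |C|, \qquad
\Delta H = H_{\text{blind}} - H_{\text{aware}} = \log_2 \frac{N}{|C|}

% Hypothetical numbers: N = 1024 nodes, matched community of |C| = 64 nodes.
\Delta H = \log_2 \frac{1024}{64} = \log_2 16 = 4 \text{ bits}
```

Under these assumptions the user loses 4 of 10 bits of anonymity without ever being mapped individually.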

25 What is the degree of anonymity for Waldo?

26 The degree of anonymity for Waldo degrades once we know that he loves socks!

27 Calculating degree of anonymity

28 Three quantities are defined: the anonymity for a user u, which is the entropy over the probability distribution of potential mappings being true for user u; the normalized degree of anonymity for user u; and the degree of anonymity for the whole system.
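
The three quantities can be sketched in code as follows. Only the first definition (entropy over the candidate distribution) is stated in the transcript. The other two are assumptions based on the usual entropy-based anonymity metric: normalization divides by the maximum entropy log2 of the number of candidates, and the system value is the average over users. All function names are hypothetical.

```python
import math


def anonymity_bits(probs):
    """Shannon entropy (in bits) of a user's candidate-mapping distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def normalized_anonymity(probs, n_candidates):
    """Entropy divided by its maximum, log2(n_candidates) (assumed normalization)."""
    if n_candidates <= 1:
        return 0.0
    return anonymity_bits(probs) / math.log2(n_candidates)


def system_anonymity(per_user_probs, n_candidates):
    """Average normalized anonymity over all users (assumed aggregation)."""
    values = [normalized_anonymity(p, n_candidates) for p in per_user_probs]
    return sum(values) / len(values)


# Hypothetical example with 4 candidates: one fully anonymous user, one identified.
uniform_user = [0.25, 0.25, 0.25, 0.25]
identified_user = [1.0]
overall = system_anonymity([uniform_user, identified_user], 4)
```

A user whose candidates are all equally likely keeps the full 2 bits, while a user with a single certain mapping has no anonymity left.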

29 Calculating degree of anonymity, by case: comm-blind vs. comm-aware

30 Community-aware algorithm greatly improves de-anonymization performance under noise. With 15% edge noise and 16 seeds, the comm-blind technique reduces anonymity by 2.6 bits, whereas our approach reduces anonymity by bits

31 Community-aware algorithm is more robust to larger network size and a low number of seeds. For the Twitter dataset with 90K nodes, with 10% edge noise and only 4 seeds, the comm-blind technique reduces anonymity by 2.14 bits, whereas our approach reduces anonymity by bits

32 Limitations
- We didn’t have access to two real-world social network data sets with overlapping sets of users and edges
- Our measure estimates the upper bound of the degree of anonymity
- We approximate the real probabilities for calculating degree of anonymity by running simulations

33 Future work
- Advanced anonymization techniques are required
- Our approach can be improved by use of additional attributes for re-identifying communities and users
- Test other anonymization techniques using the comm-aware de-anonymization approach

34 Conclusion
- Our approach divides the problem into smaller sub-problems that can be solved by leveraging existing network alignment methods recursively on multiple levels.
- Our approach is more robust against added noise to the anonymized data set, and can perform well with fewer known seeds as well as larger networks.
- We analyzed the ‘degree of anonymity’ of users in the graph and showed that the mapping of communities may markedly reduce the degree of anonymity of users.

35 THANK YOU! QUESTIONS?