Xiaowei Ying, Leting Wu, Xintao Wu University of North Carolina at Charlotte Privacy and Spectral Analysis on Social Network Randomization.



Framework
Background & Motivation
Privacy in Randomized Graph
  Link privacy (3 methods to quantify link privacy)
  Node privacy
Feature Preserving Randomization
  Spectrum preserving randomization
  General feature preserving randomization (Markov chain based)
  Attacks on feature preserving randomization
Reconstruction from Randomized Graphs
Spectrum Based Fraud Detection
  A spectral framework to quantify non-randomness of social networks
  Spectrum based fraud detection
Future Work

Background & Motivation

Social Network
Friendship in Karate club [Zachary, 77]
Biological association network of dolphins [Lusseau et al., 03]
Collaboration network of scientists [Newman, 06]
Network of US political books (105 nodes, 441 edges): books about US politics sold by Amazon.com. Edges represent frequent co-purchasing of books by the same buyers. Nodes are colored blue, white, or red to indicate whether they are "liberal", "neutral", or "conservative".

Background & Motivation
Publish/outsource data for mining/analysis: the data owner releases the original graph data to the public, third parties, or research institutions.
Data miner: discovers patterns/features of the data (utility) -- find central nodes, community partition, link prediction
Attacker: breaches sensitive information in the data (privacy) -- identity of nodes (and sensitive attributes), sensitive relations between two individuals

Background & Motivation
Privacy issues in publishing social network data: anonymization alone is not enough to protect privacy, because of active/passive attacks [1] and subgraph attacks [2].
[1] L. Backstrom et al., Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. WWW07
[2] M. Hay et al., Resisting Structural Reidentification in Anonymized Social Networks. VLDB08

Privacy Preserving Social Network Publishing
Node anonymization cannot guarantee identity/link privacy due to subgraph queries.
K-anonymity generalization: the released graph has at least k nodes with the same degree/subgraph/neighborhood [Liu&Terzi SIGMOD08, Zhou&Pei ICDE08, Chen VLDB09]
Graph (edge) randomization: Random Add/Del & Random Switch; utility preserving randomization
Super graph generalization: generalize nodes into super nodes, and edges into super edges

Background & Motivation
Graph Randomization/Perturbation:
1. Random Add/Del edges (no. of edges unchanged)
2. Random Switch edges (nodes’ degrees unchanged)
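The two randomization strategies can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names (`random_add_del`, `random_switch`, `degrees`), the edge representation as sorted tuples, and the `max_tries` cap are all assumptions made for the example.

```python
import random
from itertools import combinations

def norm(u, v):
    """Store an undirected edge as a sorted tuple."""
    return (u, v) if u < v else (v, u)

def random_add_del(nodes, edges, k, rng):
    """Delete k true edges and add k false edges (edge count unchanged)."""
    edges = {norm(u, v) for u, v in edges}
    non_edges = [e for e in combinations(sorted(nodes), 2) if e not in edges]
    deleted = rng.sample(sorted(edges), k)
    added = rng.sample(non_edges, k)
    return (edges - set(deleted)) | set(added)

def random_switch(edges, k, rng, max_tries=10000):
    """Apply up to k switches (a,b),(c,d) -> (a,d),(c,b); degrees unchanged."""
    edges = {norm(u, v) for u, v in edges}
    done = tries = 0
    while done < k and tries < max_tries:
        tries += 1
        e1, e2 = rng.sample(sorted(edges), 2)
        (a, b), (c, d) = e1, e2
        new1, new2 = norm(a, d), norm(c, b)
        # Skip switches that would create a self-loop or a duplicate edge.
        if len({a, b, c, d}) < 4 or new1 in edges or new2 in edges:
            continue
        edges = (edges - {e1, e2}) | {new1, new2}
        done += 1
    return edges

def degrees(nodes, edges):
    """Degree of every node, used to check that switching preserves degrees."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg
```

Add/Del keeps the number of edges fixed but changes node degrees; Switch keeps every node's degree fixed, which is why the two strategies expose different information to an attacker.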

Background & Motivation
Graph Randomization/Perturbation:
Data privacy: How does graph randomization prevent privacy disclosure?
Data utility: How does the graph structure change due to randomization? How can graph structural features be better preserved?

Background & Motivation
Numerous topological measures of networks:
Harmonic mean of shortest distance
Transitivity (clustering coefficient)
Subgraph centrality
Modularity (community structure)
And many others

Background & Motivation
Spectral measures: adjacency matrix
Adjacency Matrix A (symmetric)
Adjacency Spectrum

Background & Motivation
Laplacian Matrix and Spectrum
Normal Matrix and Spectrum
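The matrix definitions on these slides did not survive extraction. As a sketch, assuming the standard definitions L = D - A for the Laplacian and N = D^{-1/2} A D^{-1/2} for the normal matrix (D is the diagonal degree matrix), the three matrices can be built as follows. The function names are hypothetical.

```python
import math

def adjacency_matrix(n, edges):
    """Symmetric 0-1 adjacency matrix of an undirected graph on nodes 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A

def laplacian_matrix(A):
    """Assumed definition: L = D - A, with D the diagonal degree matrix."""
    n = len(A)
    deg = [sum(row) for row in A]
    return [[(deg[i] if i == j else 0) - A[i][j] for j in range(n)]
            for i in range(n)]

def normal_matrix(A):
    """Assumed definition: N = D^{-1/2} A D^{-1/2}; every node needs degree >= 1."""
    n = len(A)
    deg = [sum(row) for row in A]
    return [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

# Path graph 0 - 1 - 2
A = adjacency_matrix(3, [(0, 1), (1, 2)])
L = laplacian_matrix(A)
N = normal_matrix(A)
```

Every row of L sums to zero, so the all-ones vector is always an eigenvector of L with eigenvalue 0.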

Many topological features are related to spectral measures:
No. of triangles:
Subgraph centrality:
Graph diameter:
k disconnected parts in the graph ⇔ k 0’s in the Laplacian spectrum.
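One of these relationships is easy to check directly. The slide's own formula is missing, but a standard identity is that the number of triangles equals trace(A^3)/6, and trace(A^3) is the sum of the cubed adjacency eigenvalues. The sketch below computes the trace with plain matrix products; the function names are hypothetical.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def triangle_count(A):
    """Triangles = trace(A^3) / 6, where trace(A^3) = sum of lambda_i^3."""
    A3 = matmul(matmul(A, A), A)
    return sum(A3[i][i] for i in range(len(A))) // 6

# Complete graph on 4 nodes: every triple of nodes forms a triangle.
K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
# Cycle on 4 nodes: no triangles.
C4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
```

Each triangle contributes six closed walks of length 3 (three starting nodes, two directions), which is where the division by 6 comes from.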

Background & Motivation
Two important eigenvalues: the largest adjacency eigenvalue λ1 and the second smallest Laplacian eigenvalue μ2
1. The maximum degree, chromatic number, clique number etc. are related to λ1;
2. The epidemic threshold for virus propagation in the network is related to λ1 [Wang et al., KDD03];
3. μ2 indicates the community structure of the graph: clear community structure ⇔ μ2 ≈ 0.

Basic Facts of Graph Spectrum
The Laplacian eigenvalues
Graph from: A. Capocci et al., Detecting communities in large networks

Basic Facts of Graph Spectrum The Laplacian eigenvectors

Privacy in Randomized Graph


Link Privacy: Prior & Posterior Beliefs
Quantify the attacker’s belief (assume that node identities are known)
Prior probabilities:
Posterior probability for node pair (i, j):
Privacy is seriously jeopardized when:

Link Privacy: Prior & Posterior Beliefs
Method I [Ying, Wu, SDM08]
Add & Del k links
Switch k times
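For the Add/Del case, prior and posterior beliefs follow from a simple counting argument. The slide's formulas are missing, so the sketch below is an assumption rather than the authors' exact expressions: with n nodes, m true links, and k links deleted and k added uniformly at random, m - k of the m observed links are true, and k of the observed non-links hide a deleted true link. The function name is hypothetical.

```python
def link_beliefs_add_del(n, m, k):
    """Prior and posterior link beliefs after deleting k true and adding k false links."""
    pairs = n * (n - 1) // 2            # number of node pairs
    prior = m / pairs                   # P(link) before seeing the released graph
    post_if_edge = (m - k) / m          # P(true link | link observed)
    post_if_no_edge = k / (pairs - m)   # P(true link | no link observed)
    return prior, post_if_edge, post_if_no_edge

# Political books network sizes from the slides (105 nodes, 441 edges);
# k = 44 is an arbitrary example value of roughly 10% of the links.
prior, post_edge, post_no_edge = link_beliefs_add_del(105, 441, 44)
```

Summing the posterior over all node pairs gives m * (m - k)/m + (pairs - m) * k/(pairs - m) = m, the same total as the prior, so randomization redistributes belief rather than removing it.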

Link Privacy: Prior & Posterior Beliefs
Method II [Ying, Wu, PAKDD09]
A common phenomenon: in real-world graphs, similar nodes tend to connect to each other

Link Privacy: Prior & Posterior Beliefs
Method II [Ying, Wu, PAKDD09]
Even after moderate randomization, the phenomenon still exists:

Link Privacy: Prior & Posterior Beliefs
Method II [Ying, Wu, PAKDD09]

Link Privacy: Prior & Posterior Beliefs
Method II [Ying, Wu, PAKDD09]
Add/Del:
1. True links are deleted w.p.
2. False links are added w.p.
With Bayes’ theorem

Link Privacy: Prior & Posterior Beliefs
Method II [Ying, Wu, PAKDD09]
Evaluation (add/del 50% true links)

Link Privacy: Prior & Posterior Beliefs
Method II [Ying, Wu, PAKDD09]
The total sum of prior and posterior probabilities is the same:
[Figure: prior prob., posterior prob. I, posterior prob. II]

Link Privacy: Prior & Posterior Beliefs
Method III [Ying, Wu, SDM09]
Intuition: the degree sequence specifies a graph space, and the true graph is just one member of that space.
Example: switch graph with degree sequence {3,2,2,2,3}. Are nodes 1 and 5 connected?

Link Privacy: Prior & Posterior Beliefs
Method III [Ying, Wu, SDM09]
Graph space = {G: with a given degree sequence}
Impractical to enumerate all members in the space
Sample the graph space through Markov chain:
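A sampling attack of this kind can be sketched as follows. This is an illustrative simplification, not the uniform switch procedure the authors use: it walks through the graph space with random degree-preserving switches and reports the fraction of sampled graphs that contain a given link. The function names, the example edge list (one graph with degree sequence {3,2,2,2,3}), and the parameter values are assumptions.

```python
import random

def switch_once(edges, rng):
    """Try one degree-preserving switch; return the new (or unchanged) edge set."""
    e1, e2 = rng.sample(sorted(edges), 2)
    (a, b), (c, d) = e1, e2
    if rng.random() < 0.5:
        c, d = d, c                      # consider both ways of re-pairing endpoints
    new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
    if len({a, b, c, d}) < 4 or new1 in edges or new2 in edges:
        return edges                     # rejected move: stay at the current graph
    return (edges - {e1, e2}) | {new1, new2}

def estimate_link_probability(edges, pair, n_samples, steps, rng):
    """Fraction of sampled graphs with the same degree sequence that contain `pair`."""
    pair = tuple(sorted(pair))
    hits = 0
    for _ in range(n_samples):
        g = set(edges)                   # every sample restarts from the released graph
        for _ in range(steps):
            g = switch_once(g, rng)
        hits += pair in g
    return hits / n_samples

# One graph on nodes 1..5 with degree sequence {3,2,2,2,3}.
released = {(1, 2), (1, 3), (1, 5), (2, 5), (3, 4), (4, 5)}
belief = estimate_link_probability(released, (1, 5), 200, 50, random.Random(0))
```

The estimate `belief` plays the role of the attacker's posterior that nodes 1 and 5 are connected, given only the degree sequence of the released graph.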

Link Privacy: Prior & Posterior Beliefs
Method III [Ying, Wu, SDM09]
Evaluation
[Figures: Polbooks (r=8%), Enron (r=8%)]

Node Identity Privacy
Identity privacy: re-identify nodes in the anonymous graphs based on some background information (e.g. degree)
Randomization reduces attackers’ beliefs
[Figures: Polbooks degree distribution; after randomization]

Node Identity Privacy
Nodes’ prior and posterior risks
Given an individual α with degree d_α and a randomized graph
Prior risk:
Posterior risks:

Node Identity Privacy
Ongoing work:
Compare the randomization and k-anonymity approaches -- to achieve the same privacy protection level, which approach achieves better utility?
Combine link privacy and node identity privacy.
Study node identity privacy under different background information (e.g., sub-graph, neighborhood).
K-degree generalization [Liu et al.]

Feature Preserving Randomization


Feature Preserving Randomization
Topological and spectral features change considerably as the perturbation grows. (Network of US political books, 105 nodes and 441 edges)
Can we better preserve the network structure?

Features in Social Network Data
Two important eigenvalues: the largest adjacency eigenvalue λ1 and the second smallest Laplacian eigenvalue μ2
1. The maximum degree, chromatic number, clique number etc. are related to λ1;
2. The epidemic threshold for virus propagation in the network is related to λ1 [Wang et al., KDD03];
3. μ2 indicates the community structure of the graph: clear community structure ⇔ μ2 ≈ 0.

Spectrum Preserving Randomization
Spectrum preserving approach [Ying, Wu, SDM08]
Intuition: since the spectrum is related to many graph topological features, can we preserve more structural features by controlling the movement of eigenvalues?

Spectrum Preserving Randomization
Spectral Switch (apply to adjacency matrix):
To increase the eigenvalue:
To decrease the eigenvalue:

Spectrum Preserving Randomization
Spectral Switch (apply to Laplacian matrix):
To decrease the eigenvalue:
To increase the eigenvalue:

Spectrum Preserving Randomization
Evaluation: (Network of US political books, 105 nodes and 441 edges)

Markov Chain Based Feature Preserving Randomization
Markov chain generation [Ying, Wu, SDM09]
The data owner puts feature range constraints on switching
Feature range constraints:
The data owner publishes the feature range constraints.

Markov Chain Based Feature Preserving Randomization
Markov chain generation [Ying, Wu, SDM09]
Markov chain with feature range constraint (uniformity for accessible graphs)

Markov Chain Based Feature Preserving Randomization
Markov chain generation [Ying, Wu, SDM09]
Problem: accessibility is not guaranteed
We propose the relaxed algorithm with feature range constraint (accessibility, approximate uniformity)
The relaxed algorithm also has applications in testing the significance of data mining results

Attacks on Feature Preserving Randomization
The data owner puts feature range constraints on switching
Feature range constraints:
Can attackers utilize the feature constraints to breach link privacy?

Attacks on Feature Preserving Randomization
Markov chain approach [Ying, Wu, SDM09]
Markov chain with feature range constraint
Graph space = {G: with a given deg. seq. & S(G) in R}
1. Starting with the randomized data, repeat the switch procedure many times and get one sample graph
2. Generate N graphs
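A constrained walk over this graph space can be sketched by rejecting any switch that moves the feature S(G) outside the published range R. The choice of the triangle count as S(G), the function names, and the example graph are assumptions for illustration; the sketch does not reproduce the authors' relaxed algorithm or its uniformity guarantees.

```python
import random

def triangles(edges):
    """Triangle count, used here as an example feature S(G)."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    # Each triangle is counted once per edge, i.e. three times.
    return sum(len(nbrs[u] & nbrs[v]) for u, v in edges) // 3

def constrained_switch_walk(edges, feature, lo, hi, steps, rng):
    """Degree-preserving switches that keep feature(G) inside [lo, hi]."""
    g = set(edges)
    for _ in range(steps):
        e1, e2 = rng.sample(sorted(g), 2)
        (a, b), (c, d) = e1, e2
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if len({a, b, c, d}) < 4 or new1 in g or new2 in g:
            continue                      # invalid switch
        candidate = (g - {e1, e2}) | {new1, new2}
        if lo <= feature(candidate) <= hi:
            g = candidate                 # accept only if S(G) stays in R
    return g

# Two triangles joined by two extra edges; S(G) = 2 for this graph.
start = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5), (0, 5)}
sample = constrained_switch_walk(start, triangles, 1, 2, 200, random.Random(1))
```

An attacker who knows R can run the same walk from the released graph, generate N such samples, and count how often a candidate link appears.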

Attacks on Utility Preserving Randomization
Markov chain approach [Ying, Wu, SDM09]
Evaluation
[Figures: Polbooks (r=8%), Enron (r=8%)]
Future work: what causes the difference? Which features will (not) disclose private information?

Reconstruction from Randomized Graphs (SDM10 paper)
Motivation
Low Rank Approximation on Graph Data
Reconstruction from Randomized Graph
Privacy Issue

Motivation
Our focus: whether we can reconstruct a graph from the randomized graph such that its features are closer to those of the original graph.

Revisit of LRA in Numerical Data
Spectral filter: derive an estimate of U from the perturbed data
1. Calculate the covariance matrix of the perturbed data, which is symmetric and positive definite
2. Apply spectral decomposition to the covariance matrix
3. Derive the eigenvalue information from the covariance matrix of the noise V and choose a proper number of dimensions, r
4. Obtain the estimated data set by projecting the perturbed data onto the r leading eigenvectors

Why it works
Original data are correlated
Noise is not correlated
[Figure: original signal + noise = perturbed data, with the 1st and 2nd principal vectors; 1-d estimation and 2-d estimation]

Determining r
Strategy 1: (Huang and Du, SIGMOD05)
Strategy 2: (Guo, Wu and Li, PKDD 2006)
The resulting estimated data is approximately optimal

Graph Data
Matrix Representation of Network
Adjacency Matrix A (symmetric)
Adjacency Spectrum

Low Rank Approximation
Low rank approximation by eigen-decomposition:
This provides the best rank-r approximation to A
To keep the structure of the adjacency matrix, discretize the approximation
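The idea can be sketched without a linear algebra library by extracting leading eigenpairs with power iteration and deflation, then thresholding the result back to a 0-1 matrix. Several choices here are assumptions rather than the slides' method: eigenpairs are ranked by absolute eigenvalue, the threshold is fixed at 0.5, and the function names are hypothetical. Power iteration can also fail to converge when two eigenvalues share the largest magnitude (for example on bipartite graphs).

```python
import math
import random

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def leading_eigenpair(M, rng, iters=500):
    """Power iteration: eigenpair of symmetric M with the largest |eigenvalue|."""
    x = [rng.gauss(0.0, 1.0) for _ in M]
    for _ in range(iters):
        y = matvec(M, x)
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            break
        x = [v / norm for v in y]
    lam = sum(a * b for a, b in zip(x, matvec(M, x)))  # Rayleigh quotient
    return lam, x

def low_rank_reconstruct(A, r, threshold=0.5, seed=0):
    """Sum the r leading terms lambda_i x_i x_i^T, then discretize to 0-1."""
    n = len(A)
    rng = random.Random(seed)
    residual = [[float(v) for v in row] for row in A]
    approx = [[0.0] * n for _ in range(n)]
    for _ in range(r):
        lam, x = leading_eigenpair(residual, rng)
        for i in range(n):
            for j in range(n):
                term = lam * x[i] * x[j]
                approx[i][j] += term
                residual[i][j] -= term    # deflation removes the found eigenpair
    # Keep a symmetric 0-1 matrix with an empty diagonal.
    return [[1 if i != j and approx[i][j] >= threshold else 0 for j in range(n)]
            for i in range(n)]
```

On a graph made of two disjoint cliques of different sizes, `low_rank_reconstruct(A, 2)` returns the original adjacency matrix, because each clique is captured by one leading eigenpair.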

New Challenges
A is a 0-1 adjacency matrix, whereas U is a numerical matrix
The covariance matrix has only non-negative eigenvalues, whereas A has both positive and negative eigenvalues
A covariance matrix cannot be defined for graph data
The strategy for determining the number of eigen components used for numerical data does not work for graph data, since the first eigenvalue of the noise matrix could be very large

Leading Eigenpairs vs. Graph Topology
Here we examine the role of positive and negative eigenvalues in graph topology
Without loss of generality, we partition the node set into two groups. The adjacency matrix can then be partitioned into blocks: two diagonal blocks representing the edges within the two groups, and an off-diagonal block representing the edges between the groups

Leading Eigenpairs vs. Graph Topology
[Figures: original graphs and their reconstructions with r = 1, r = 2, and r = 4]

Algorithm

Reconstructed Features (Political Blogs 40% Noise)

Determine Number of Eigenpairs
It is essential to find the best value of r given the randomized graph and the perturbation magnitude.
Choose as the indicator since it is closely related to the other features and there exists an explicit moment estimator

Data Sets
Political Blogs: based on incoming and outgoing links and posts around the time of the 2004 presidential election; links among 1222 US political blogs
Political Books: based on the political books sold by Amazon.com, where nodes represent the books and edges represent the co-purchasing of books; 105 nodes and 441 edges
Enron: based on the email corpus of a real organization covering a 3-year period, where an edge indicates that at least 5 emails were sent between two people; 151 nodes and 869 edges

Effect of Noise (Political Blogs)
The method works well up to a certain level of noise
Even with a high level of noise, the reconstructed features are still closer to the original than the randomized ones

Reconstruction Quality
Reconstructed features on 3 real network data sets
When the quality measure is positive, the reconstructed features are closer to the original ones than the randomized ones
It is positive for all three data sets

Privacy Issue
Question 1: Can this reconstruction be used by attackers?
Define the normalized Frobenius distance between A and the reconstructed graph
[Figure: normalized F norm for Political Books, Enron, and Political Blogs]
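The exact normalization on the slide is missing. As a sketch, assuming the distance is the Frobenius norm of the difference divided by the Frobenius norm of A, it can be computed as follows. The function names are hypothetical.

```python
import math

def frobenius_norm(M):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in M for v in row))

def normalized_f_distance(A, B):
    """Assumed form: ||A - B||_F / ||A||_F."""
    diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
    return frobenius_norm(diff) / frobenius_norm(A)

A = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]        # original graph: edges 0-1 and 0-2
A_hat = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]    # reconstruction that misses edge 0-2
```

For 0-1 matrices the squared numerator counts the entries on which the two graphs disagree, so a small distance means an attacker recovers most true links and non-links.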

Privacy Issue
Question 2: Which type of graphs would have privacy breached?
For low rank graphs, the distance between the reconstructed graph and the original graph can be very small
Randomizing Social Network: a Spectrum Preserving Approach, SDM08

Synthetic Low Rank Graphs
Here is a set of synthetic low rank graphs generated from Political Blogs; the reconstruction works on both the distance and the features

Conclusion
We have shown the close relationship between graph topological structure and the spectral spaces determined by eigen-pairs of the adjacency matrix.
We have presented a low rank approximation based reconstruction algorithm and a novel solution to determine the optimal rank in reconstruction.
We find that for most social networks, the reconstructed networks do not incur further disclosure risks of individual privacy beyond the released randomized graphs; only networks with low ranks or a small number of dominant eigenvalues may incur further privacy disclosure due to reconstruction.

Spectrum Based Fraud Detection


A Spectral Framework to Quantify Graph Non-randomness
Adjacency Matrix A (symmetric)
Adjacency Spectrum

A Spectral Framework to Quantify Graph Non-randomness
Graph non-randomness [Ying, Wu, SDM09]
Spectral coordinates:
Link non-randomness:
Node non-randomness:
Graph non-randomness:
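The formulas for these four quantities did not survive extraction. One consistent way to write them, offered here as an assumption rather than as the slide's exact notation, uses the k leading eigenpairs (λ_i, x_i) of A, with x_{iu} the entry of x_i for node u and Γ(u) the neighbors of u:

```latex
\begin{align*}
\alpha_u &= (x_{1u}, x_{2u}, \dots, x_{ku})
  && \text{spectral coordinate of node } u \\
R(u,v) &= \alpha_u \alpha_v^{T} = \sum_{i=1}^{k} x_{iu} x_{iv}
  && \text{link non-randomness} \\
R(u) &= \sum_{v \in \Gamma(u)} R(u,v) = \sum_{i=1}^{k} \lambda_i x_{iu}^{2}
  && \text{node non-randomness} \\
R_G &= \sum_{u} R(u) = \sum_{i=1}^{k} \lambda_i
  && \text{graph non-randomness}
\end{align*}
```

The second equality for R(u) uses A x_i = λ_i x_i, and the last equality for R_G uses the fact that each x_i has unit length.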

A Spectral Framework to Quantify Graph Non-randomness
Graph non-randomness [Ying, Wu, SDM09]
Spectral coordinates:

Background & Motivation
[Figures: Laplacian spectral space; Normal spectral space]

A Spectral Framework to Quantify Graph Non-randomness
Graph non-randomness [Ying, Wu, SDM09]
Link non-randomness:

A Spectral Framework to Quantify Graph Non-randomness
Graph non-randomness [Ying, Wu, SDM09]
Node non-randomness:

A Spectral Framework to Quantify Graph Non-randomness
Graph non-randomness [Ying, Wu, SDM09]
Graph non-randomness:
Normalized by the mean and standard deviation for ER-graphs
Properties:
Normally distributed, with mean equal to that of ER-graphs;
The complete and regular graph reach the positive and negative extreme values;
Randomization reduces the non-randomness value.

A Spectral Framework to Quantify Graph Non-randomness
Application: spectral switch (apply to adjacency matrix):
To preserve the non-randomness of the whole graph (eigenvalues), the deleted edges and the added fake edges should have comparable edge non-randomness values.

Collaborative Attacks
Some attackers join the social network
Attackers create links to regular users (victims)
Attackers form some inner structure among themselves

Graph Perturbation

Collaborative Attacks

Collaborative Attacks
Approximate the entries in the eigenvector (first order and second order terms)
Regular nodes are approximately unchanged
The attacker's entry is approximately expressed by the victims' entries
Inner structure among attackers affects the eigenvector in the second order term

Problem
We do not know the attackers/victims in advance, hence their specific spectral coordinates are unknown.
For Random Link Attacks, we can derive the distribution of the attacking nodes’ spectral coordinates.

Random Link Attacks
The attacker creates some fake nodes and controls them to connect to randomly selected regular nodes;
Fake nodes can mimic the real graph structure among themselves to evade detection.

Topology approach -- Shrivastava et al. ICDE08
Idea: count triangles around nodes --- regular connections produce many triangles, random connections do not
Algorithm:
Detecting suspects: clustering test and neighborhood independence test
Detecting RLAs: GREEDY and TRWALK
Limitations:
Difficult to detect when attackers create a dense subgraph among themselves
Too many parameters
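The triangle-counting idea behind the clustering test can be illustrated with the local clustering coefficient. This is only a sketch of the intuition, not the clustering test, GREEDY, or TRWALK from Shrivastava et al.; the function name and the toy graph are assumptions.

```python
def local_clustering(nbrs):
    """Map node -> fraction of its neighbor pairs that are themselves linked."""
    coeff = {}
    for u, nu in nbrs.items():
        k = len(nu)
        if k < 2:
            coeff[u] = 0.0
            continue
        # Each linked neighbor pair (a triangle through u) is seen twice.
        links = sum(len(nu & nbrs[v]) for v in nu) // 2
        coeff[u] = 2 * links / (k * (k - 1))
    return coeff

# Two triangles of regular nodes; node 9 links to one random node in each.
nbrs = {
    0: {1, 2, 9}, 1: {0, 2}, 2: {0, 1},
    3: {4, 5, 9}, 4: {3, 5}, 5: {3, 4},
    9: {0, 3},
}
coeff = local_clustering(nbrs)
```

Node 9 closes no triangles because its randomly chosen neighbors do not know each other, whereas the regular nodes 1, 2, 4 and 5 have a coefficient of 1. An attacker group that is densely linked internally raises its own coefficients, which is the limitation noted above.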

Spectrum based RLA detection
For Random Link Attacks (RLA): the eigenvector entries of the attacking nodes follow a normal distribution with mean and variance bounded by:
We can get the region in the spectral space where RLA attackers appear with high probability
The inner structure of the attackers does not affect the region!
[Figure: 20 attackers, each attacking 30 victims on average]

Using node non-randomness
Combine k dimensions together:
We can get the upper bounds of the mean and variance of R and get the decision line:
Nodes below the decision line are suspects

Example I
Spectral properties of normal nodes and attackers
20 attackers join the Polblogs network. Each attacker connects to 50 randomly selected victims. Attackers form a random graph among themselves

Example II
Spectral properties of normal nodes and attackers
40 attackers join the Polblogs network. In total, they attack 1000 randomly selected victims. Attackers mimic real network structure among themselves

Comparison
Topology based RLA detection approach -- Shrivastava et al. ICDE08: clustering test and neighborhood independence test; GREEDY and TRWALK
Experimental setting: Web Spam Challenge data (114K nodes and 1.8M links). Add 8 RLAs with varied sizes and connection patterns.

Accuracy

Execution time

Distributed Denial Of Service Attacks
Spectral properties of victim nodes
Attacker controls 200 normal nodes to attack one victim node.

Fraud Detection: Bipartite Core Attacks
Attacker creates two types of nodes:
Accomplices: connect to normal nodes and pretend to be normal. Accomplices also connect to fraudsters (and enhance fraudsters’ rating).
Fraudsters: nodes that actually commit fraud, mostly connected to accomplices
[Figure: bipartite core. From: Duen Horng Chau et al., Detecting Fraudulent Personalities in Networks of Online Auctioneers]

Future work
Compare randomization and k-anonymity
Combine link privacy and node privacy
Link and node privacy issues for feature preserving randomization
Spectrum based fraud detection for various random attacks

Thank you! Questions?
X. Wu, X. Ying, K. Liu and L. Chen. "A Survey of Algorithms for Privacy-Preservation of Graphs and Social Networks". Invited book chapter. Managing and Mining Graph Data. August
X. Ying, X. Wu, K. Pan, and L. Guo. "On the Quantification of Identity and Link Disclosures in Randomizing Social Networks". Invited book chapter. Advances in Information & Intelligent Systems. Springer,
X. Wu, X. Ying and L. Wu. "Analyzing Socio-technical Networks: a Spectrum Perspective". Invited book chapter. Socio-technical Networks: Science and Engineering Design,
X. Ying, K. Pan, X. Wu and L. Guo. "Comparisons of Randomization and K-degree Anonymization Schemes for Privacy Preserving Social Network Publishing". (SNA-KDD09).
X. Ying and X. Wu. "Graph Generation with Prescribed Feature Constraints". (SDM09).
X. Ying and X. Wu. "On Randomness Measures for Social Networks". (SDM09).
X. Ying and X. Wu. "On Link Privacy in Randomizing Social Networks". (PAKDD09, Best Student Paper Runner-up Award)
X. Ying and X. Wu. "Randomizing Social Networks: a Spectrum Preserving Approach". (SDM08).

Evaluation

Future Work: Random Attack Detection
Node randomness:

Fraud Detection: Bipartite Attacks
Algorithm outline:
1. Find the suspects according to the node non-randomness measure;
2. Compute the common neighbor (CN) matrix of suspects: Susp_CN(i,j) = # CN of i and j. Susp_CN is a weighted undirected graph!
3. Find dense subgraphs in the Susp_CN graph.
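Step 2 of the outline can be sketched directly from its definition. The function name, the neighbor-set representation, and the toy graph are assumptions; steps 1 and 3 (selecting suspects and finding dense subgraphs) are not shown.

```python
def suspect_common_neighbors(nbrs, suspects):
    """Susp_CN[i][j] = number of common neighbors of suspects i and j."""
    s = list(suspects)
    cn = [[0] * len(s) for _ in s]
    for i in range(len(s)):
        for j in range(i + 1, len(s)):
            cn[i][j] = cn[j][i] = len(nbrs[s[i]] & nbrs[s[j]])
    return cn

# Toy bipartite core: fraudsters f1, f2 both link to accomplices a1, a2;
# accomplice a1 also links to a normal node n1.
nbrs = {
    "f1": {"a1", "a2"}, "f2": {"a1", "a2"},
    "a1": {"f1", "f2", "n1"}, "a2": {"f1", "f2"},
    "n1": {"a1"},
}
susp_cn = suspect_common_neighbors(nbrs, ["f1", "f2", "n1"])
```

In this toy graph the two fraudsters share two neighbors, while the normal node shares only one with each of them, so the fraudsters form the heaviest edge in the Susp_CN graph.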

Fraud Detection: Bipartite Attacks
Spectral space of Susp_CN graph
Polblogs network, 20 accomplices, and 15 fraudsters

Future Work: Node Identity Privacy
Re-identification risk decreases as k increases;
The Add/Del strategy can efficiently reduce the risk.

Link Privacy: Prior & Posterior Beliefs
Method III [Ying, Wu, SDM09]
1. Uniform switch procedure [Taylor, 1981]
2. Starting with the randomized data, repeat the uniform switch procedure many times and get one sample graph
3. Generate N graphs
