Exploit of Online Social Networks with Community-Based Graph Semi-Supervised Learning
Mingzhen Mo and Irwin King
Department of Computer Science and Engineering, The Chinese University of Hong Kong
ICONIP 2010, Sydney, Australia
Motivation
Online social networks are an important way to interact with friends
A large number of users are attracted to them
– 500 million active users (Facebook)
– 700 billion minutes (Facebook)
The security of users' information attracts much attention from researchers and developers
Problem
[Figure: users U1 to U5 linked by friendship and by group and network membership; each user's profile contains hidden information ("?")]
Example on Facebook
Given:
– Users' profiles, e.g., age, location and phone
– Friendship relationships
– Member lists of groups and networks
Output:
– Predicted university information
Objective
Build a model with a suitable algorithm to predict the hidden information
Make better use of community information
Related work
– Graph theory [G. Flake et al., SIGKDD 2000]
– Supervised learning [E. Zheleva et al., WWW 2009]
– Semi-supervised learning [M. Mo et al., IJCNN 2010]
Contributions
Propose a novel community-based model
– Predicts hidden information more accurately
Provide two algorithms
– Handle different conditions (a closed-form algorithm, and an iterative one for large-scale data)
Help to understand the security level of social networks
Preparation for Modeling
Definitions
– Online social network: $G(V, E)$, with profile $P_i$ for user $i$ and friendship weight $W_{ij}$ between users $i$ and $j$
– Two sets: labeled data $V_l$ and unlabeled data $V_u$
[Figure: five users with profiles $P_1$ to $P_5$; labels $Y_1$ and $Y_5$ are known; friendship edges $W_{1,2}$, $W_{1,3}$, $W_{2,4}$, $W_{3,4}$, $W_{3,5}$, $W_{4,5}$]
Consistency on Graph
Model 1: Basic graph-based SSL with harmonic function
Model 2: Local and Global Consistency (LGC) graph SSL
Model 3: Community-based Graph (CG) SSL, which adds community consistency
– Local consistency: label $Y_1$ should be similar to label $Y_2$
– Global consistency: the predicted label should be close to the true label $Y_2$
– Community consistency: the predicted label should be close to the true label if user 2 and user 4 are in the same network
[Figure: users U1 to U5 with labels $Y_1$ and $Y_2$ and a shared network]
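As a rough illustration of Model 1, the sketch below computes the standard harmonic-function solution: each unlabeled vertex's score is the weighted average of its neighbours' scores, while labeled vertices stay clamped to their true labels. It assumes a hypothetical toy graph (the five users of the example, unit weights on the friendship edges, user 1 labeled class 0 and user 5 labeled class 1 as $V_l$); the function name `harmonic_function` is also illustrative.

```python
import numpy as np


def harmonic_function(W, labeled, y_labeled, n_classes):
    """Harmonic-function label propagation (sketch).

    W         : (n, n) symmetric, non-negative friendship weights W_ij.
    labeled   : indices of the labeled vertices V_l.
    y_labeled : integer class of each labeled vertex.
    Returns an (n, n_classes) score matrix; labeled rows stay clamped.
    """
    n = W.shape[0]
    labeled = np.asarray(labeled)
    unlabeled = np.setdiff1d(np.arange(n), labeled)  # V_u
    # One-hot encode the known labels.
    F_l = np.zeros((len(labeled), n_classes))
    F_l[np.arange(len(labeled)), y_labeled] = 1.0
    # Combinatorial graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    # Harmonic condition on V_u: L_uu F_u = W_ul F_l.
    L_uu = L[np.ix_(unlabeled, unlabeled)]
    W_ul = W[np.ix_(unlabeled, labeled)]
    F_u = np.linalg.solve(L_uu, W_ul @ F_l)
    F = np.zeros((n, n_classes))
    F[labeled] = F_l
    F[unlabeled] = F_u
    return F


# Hypothetical toy graph: users U1..U5 as indices 0..4, unit edge weights.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
W = np.zeros((5, 5))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0

F = harmonic_function(W, labeled=[0, 4], y_labeled=[0, 1], n_classes=2)
print(F.argmax(axis=1))  # → [0 0 1 1 1]
```

Under these assumptions, user 2 leans towards user 1's class, while users 3 and 4 lean towards user 5's class.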
Community-based Graph (CG) Model
Input: basic graph, community graph
Output: predicted labels
Objective: a Local & Global Consistency (LGC) learning part, which fits the true labels, plus a community term built on the Laplacian matrix of the community information; two parameters (Parameter 1 and Parameter 2) balance these terms
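The slide names the ingredients of the objective but not its formula. One form consistent with those ingredients, assumed here for illustration and not necessarily the paper's exact objective, combines LGC smoothness on the friendship graph, a fit to the true labels $Y$ weighted by $\mu$ (standing in for Parameter 1), and a community smoothness term weighted by $\nu$ (standing in for Parameter 2):

$$
\mathcal{Q}(F) = \operatorname{tr}\!\left(F^{\top} L F\right) + \mu\,\lVert F - Y \rVert_F^{2} + \nu \operatorname{tr}\!\left(F^{\top} L_c F\right),
\qquad \mu > 0,\ \nu \ge 0
$$

$$
\frac{\partial \mathcal{Q}}{\partial F} = 2LF + 2\mu\,(F - Y) + 2\nu L_c F = 0
\quad\Longrightarrow\quad
F^{*} = \mu\,\left(L + \nu L_c + \mu I\right)^{-1} Y
$$

Here $L = I - D^{-1/2} W D^{-1/2}$ and $L_c = I - D_c^{-1/2} W_c D_c^{-1/2}$ are the normalized Laplacians of the friendship graph and of an assumed community graph with weights $W_c$, and the rows of $Y$ are one-hot for $V_l$ and zero for $V_u$. Both Laplacians are positive semidefinite, so with $\mu > 0$ the matrix $L + \nu L_c + \mu I$ is positive definite and the minimizer is unique.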
Community-based Graph (CG) Model: Generation
– Cluster the vertices; the "distance" between vertices is measured by their group and network information
– Record each cluster in a matrix, e.g., one cluster contains vertices 1, 2 and 3; $n_c$ is the total number of clusters
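One way to carry out this generation step is sketched below. It assumes that each user's groups and networks are given as a set of IDs, that the "distance" is the Jaccard distance between these sets, and that clusters are formed by single-link grouping under a threshold. Each cluster then becomes one column of an $n \times n_c$ indicator matrix $C$, and users sharing a cluster are joined in a community graph with Laplacian $L_c = D_c - W_c$. The distance measure, the threshold, the membership IDs and all function names are hypothetical choices, not taken from the paper.

```python
import numpy as np


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as distance 1."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def cluster_by_membership(memberships, threshold):
    """Single-link clustering: users within `threshold` share a cluster.

    memberships : list of sets with the group/network IDs of each user.
    Returns one cluster index (0 .. n_c - 1) per user.
    """
    n = len(memberships)
    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if jaccard_distance(memberships[i], memberships[j]) <= threshold:
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    ids = {}  # relabel roots as 0..n_c-1 in order of first appearance
    return [ids.setdefault(r, len(ids)) for r in roots]


def community_matrices(cluster_ids):
    """Indicator matrix C (n x n_c) and community Laplacian L_c = D_c - W_c."""
    n = len(cluster_ids)
    n_c = max(cluster_ids) + 1
    C = np.zeros((n, n_c))
    C[np.arange(n), cluster_ids] = 1.0
    W_c = C @ C.T                 # 1 where two users share a cluster
    np.fill_diagonal(W_c, 0.0)    # no self-loops
    L_c = np.diag(W_c.sum(axis=1)) - W_c
    return C, L_c


# Hypothetical memberships for users 1..5 (indices 0..4).
memberships = [
    {"group_1", "net_a"},
    {"group_1", "net_a"},
    {"group_1", "group_2", "net_a"},
    {"group_3", "net_b"},
    {"group_3", "net_b"},
]
cluster_ids = cluster_by_membership(memberships, threshold=0.5)
C, L_c = community_matrices(cluster_ids)
print(cluster_ids)  # → [0, 0, 0, 1, 1]
```

With these inputs, users 1, 2 and 3 end up in one cluster and users 4 and 5 in another, so $n_c = 2$.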
Algorithms
Algorithm one
– Closed-form algorithm
– Simple and time-saving
[Figure: algorithm box with Input, Process and Output]
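Assuming, for illustration only, an objective of the form $\operatorname{tr}(F^{\top} L F) + \mu \lVert F - Y \rVert^2 + \nu \operatorname{tr}(F^{\top} L_c F)$ (LGC smoothness on the friendship graph plus a community Laplacian term; not necessarily the paper's exact objective), a closed-form algorithm reduces to one linear solve, $F^{*} = \mu (L + \nu L_c + \mu I)^{-1} Y$. The sketch implements that; the names `normalized_laplacian`, `cg_closed_form` and `toy_problem`, the toy graph with clusters {U1, U2, U3} and {U4, U5}, and the values $\mu = 0.5$, $\nu = 1.0$ are hypothetical.

```python
import numpy as np


def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2}; isolated vertices get an identity row."""
    d = W.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    nz = d > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])
    S = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    return np.eye(W.shape[0]) - S


def cg_closed_form(W, W_c, Y, mu, nu):
    """Minimizer of the assumed objective
    tr(F^T L F) + mu * ||F - Y||_F^2 + nu * tr(F^T L_c F)."""
    L = normalized_laplacian(W)        # friendship graph (LGC part)
    L_c = normalized_laplacian(W_c)    # community graph
    A = L + nu * L_c + mu * np.eye(W.shape[0])
    return mu * np.linalg.solve(A, Y)  # F* = mu (L + nu L_c + mu I)^{-1} Y


def toy_problem():
    """Hypothetical 5-user example: friendship graph, two clusters, two labels."""
    W = np.zeros((5, 5))
    for i, j in [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4), (3, 4)]:
        W[i, j] = W[j, i] = 1.0
    W_c = np.zeros((5, 5))
    for cluster in ([0, 1, 2], [3, 4]):
        for i in cluster:
            for j in cluster:
                if i != j:
                    W_c[i, j] = 1.0
    Y = np.zeros((5, 2))
    Y[0, 0] = 1.0  # user 1 is labeled class 0
    Y[4, 1] = 1.0  # user 5 is labeled class 1
    return W, W_c, Y


W, W_c, Y = toy_problem()
F = cg_closed_form(W, W_c, Y, mu=0.5, nu=1.0)
print(F.argmax(axis=1))  # predicted class per user
```

Solving the system with `np.linalg.solve` avoids forming the inverse explicitly. It is simple and fast for small graphs, but a dense solve costs roughly cubic time in the number of users, which is one reason to prefer an iterative variant on large-scale data.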
Algorithms
Algorithm two
– Iterative algorithm
– Able to deal with large-scale data
[Figure: algorithm flowchart with Input, Process, a True/False branch and Output]
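For an iterative version, again assume for illustration the objective $\operatorname{tr}(F^{\top} L F) + \mu \lVert F - Y \rVert^2 + \nu \operatorname{tr}(F^{\top} L_c F)$ with normalized Laplacians $L = I - S$ and $L_c = I - S_c$, where $S = D^{-1/2} W D^{-1/2}$ and $S_c$ is defined likewise. Setting the gradient to zero gives the fixed point $F = \alpha S F + \beta S_c F + \gamma Y$ with $\alpha = 1/(1+\nu+\mu)$, $\beta = \nu/(1+\nu+\mu)$ and $\gamma = \mu/(1+\nu+\mu)$. Since $\alpha + \beta < 1$ whenever $\mu > 0$, repeating this LGC-style update converges, and a tolerance test can play the role of the True/False branch. This is a sketch under those assumptions, not necessarily the paper's exact update; the names, the toy graph and the parameter values are hypothetical.

```python
import numpy as np


def sym_normalize(W):
    """S = D^{-1/2} W D^{-1/2}; rows of isolated vertices stay zero."""
    d = W.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    nz = d > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])
    return inv_sqrt[:, None] * W * inv_sqrt[None, :]


def cg_iterative(W, W_c, Y, mu, nu, tol=1e-10, max_iter=10_000):
    """Fixed-point iteration F <- alpha*S F + beta*S_c F + gamma*Y."""
    S, S_c = sym_normalize(W), sym_normalize(W_c)
    z = 1.0 + nu + mu
    alpha, beta, gamma = 1.0 / z, nu / z, mu / z
    F = Y.astype(float)  # start from the true labels (a copy)
    for _ in range(max_iter):
        F_next = alpha * (S @ F) + beta * (S_c @ F) + gamma * Y
        if np.max(np.abs(F_next - F)) < tol:  # converged: the "True" branch
            return F_next
        F = F_next                            # not yet: the "False" branch
    return F


# Hypothetical 5-user example: friendship edges, clusters {0,1,2} and {3,4}.
W = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4), (3, 4)]:
    W[i, j] = W[j, i] = 1.0
W_c = np.zeros((5, 5))
for cluster in ([0, 1, 2], [3, 4]):
    for i in cluster:
        for j in cluster:
            if i != j:
                W_c[i, j] = 1.0
Y = np.zeros((5, 2))
Y[0, 0] = 1.0  # user 1 is labeled class 0
Y[4, 1] = 1.0  # user 5 is labeled class 1

F = cg_iterative(W, W_c, Y, mu=0.5, nu=1.0)
print(F.argmax(axis=1))  # predicted class per user
```

Each step needs only one product with $S$ and one with $S_c$, so with sparse matrices the cost per iteration grows with the number of friendship and community edges rather than with the cube of the number of users.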
Experiments
Datasets
– One synthetic dataset: TwoMoons
– Two real-world datasets: StudiVZ and Facebook
Objectives
– Classification on TwoMoons
– Predict university names on StudiVZ and Facebook
Comparison
– Supervised learning
– Basic and LGC graph learning
Evaluation
– Accuracy and confidence
Datasets
[Table: dataset statistics]
[Figure: visualization of the TwoMoons dataset]
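A TwoMoons-style graph with 200 vertices can be produced roughly as sketched below: two interleaving half-circles of 100 points each, perturbed by Gaussian noise, then connected by a symmetric k-nearest-neighbour graph with Gaussian edge weights. The noise level, $k$, the kernel width `sigma` and the function names are hypothetical; the slides do not say how the paper built its TwoMoons graph.

```python
import numpy as np


def two_moons(n_per_moon=100, noise=0.1, seed=0):
    """Two interleaving half-circles with Gaussian noise; labels 0 and 1."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, np.pi, n_per_moon)
    upper = np.column_stack([np.cos(t), np.sin(t)])
    lower = np.column_stack([1.0 - np.cos(t), 0.5 - np.sin(t)])
    X = np.vstack([upper, lower])
    X = X + rng.normal(scale=noise, size=X.shape)
    y = np.repeat([0, 1], n_per_moon)
    return X, y


def knn_graph(X, k=10, sigma=0.2):
    """Symmetric k-nearest-neighbour graph with Gaussian edge weights."""
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
    masked = sq.copy()
    np.fill_diagonal(masked, np.inf)          # a point is not its own neighbour
    nn = np.argsort(masked, axis=1)[:, :k]    # k nearest neighbours per row
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-sq[rows, cols] / (2.0 * sigma ** 2))
    return np.maximum(W, W.T)                 # keep an edge if either end chose it


X, y = two_moons()
W = knn_graph(X)
print(X.shape, int((W > 0).sum()) // 2)  # number of points, undirected edges
```

Symmetrizing with the element-wise maximum keeps the weight matrix symmetric, which the graph-based SSL models above require.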
Experimental Results: TwoMoons (200 vertices)
Observations
– Community information does improve prediction accuracy
– CG SSL is consistently better than the other methods
Experimental Results: StudiVZ (1,423 users)
Observations
– All graph-based SSL methods outperform supervised learning
– CG SSL remains consistently better than the others
Experimental Results: Facebook (10,410 users)
Observations
– In most cases, CG SSL outperforms the other learning methods
– CG SSL shows slight instability
Experiments
– CG SSL performs best in most cases
– The curves of CG SSL do not increase monotonically
– Accuracies on the Facebook dataset are lower than on the other datasets
Conclusion
The community-based graph SSL model describes the real world more accurately
CG SSL predicts the hidden information in online social networks with higher accuracy and confidence
The security level of users' information is therefore lower
THANK YOU
Q & A