Exploit of Online Social Networks with Community-Based Graph Semi-Supervised Learning
Mingzhen Mo and Irwin King
Department of Computer Science and Engineering, The Chinese University of Hong Kong
Problem
[Figure: users' profiles, friendships, and group & network memberships are combined by "some technique" to reveal hidden information (?).]
ICONIP 2010, Sydney
Objective
Build a model with a suitable algorithm to predict the unknown information, making better use of community information.
Previous Works
Graph theory [G. Flake et al., SIGKDD 2000]; supervised learning [E. Zheleva et al., WWW 2009]; semi-supervised learning [M. Mo et al., IJCNN 2010].
Contributions
A novel model and algorithms are proposed, which help to understand the security level in social networks.
Model 1: basic graph-based SSL with a harmonic function (local consistency).
Model 2: Local and Global Consistency (LGC) graph SSL (local and global consistency).
Model 3: Community-Based Graph (CG) SSL (local, global, and community consistency).
Model Preparation
Online social network: G(V, E), whose vertices form two sets: labeled data and unlabeled data.
[Figure: example network of users P1 to P5 with edge weights W_{1,2}, W_{1,3}, W_{2,4}, W_{3,4}, W_{3,5} and W_{4,5}; P1 and P5 are labeled with Y1 and Y5.]
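Model 1 (basic graph-based SSL with a harmonic function) can be sketched on the five-user example above, with P1 to P5 mapped to indices 0 to 4. The slide gives neither the edge-weight values nor the label values, so this minimal sketch assumes every weight is 1.0 and that P1 and P5 carry the hypothetical binary labels 1 and 0. `EDGES`, `build_weight_matrix` and `harmonic_function` are illustrative names, not code from the paper.

```python
import numpy as np

# Hypothetical weights: the slide names the edges of the example graph
# (W_{1,2}, W_{1,3}, W_{2,4}, W_{3,4}, W_{3,5}, W_{4,5}) but not their values.
EDGES = {(0, 1): 1.0, (0, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0, (2, 4): 1.0, (3, 4): 1.0}


def build_weight_matrix(n, edges):
    """Symmetric weight matrix W of the undirected graph G(V, E)."""
    W = np.zeros((n, n))
    for (i, j), w in edges.items():
        W[i, j] = W[j, i] = w
    return W


def harmonic_function(W, labeled, y_labeled):
    """Harmonic solution f_u = (D_uu - W_uu)^{-1} W_ul f_l; labeled scores stay fixed."""
    n = W.shape[0]
    unlabeled = np.setdiff1d(np.arange(n), labeled)
    L = np.diag(W.sum(axis=1)) - W               # combinatorial Laplacian D - W
    L_uu = L[np.ix_(unlabeled, unlabeled)]       # equals D_uu - W_uu
    W_ul = W[np.ix_(unlabeled, labeled)]
    f = np.empty(n)
    f[labeled] = y_labeled
    f[unlabeled] = np.linalg.solve(L_uu, W_ul @ y_labeled)
    return f
```

Under these assumptions, `harmonic_function(build_weight_matrix(5, EDGES), np.array([0, 4]), np.array([1.0, 0.0]))` gives f = (1, 9/13, 6/13, 5/13, 0): each unlabeled user's score is the weighted average of its neighbours' scores, which is the local-consistency idea behind Model 1.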
Model
Objective: a Local & Global Consistency (LGC) learning term plus a community term, where the community term is built on the Laplace matrix of the community information. The community term is the improvement over LGC.
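The objective's equation did not survive on the slide, so the following is only a hedged sketch under an assumed form: the usual LGC smoothness and fitting terms plus a Laplacian-style community penalty, Q(F) = tr(F^T (I - S) F) + mu ||F - Y||^2 + gamma tr(F^T L_c F), with S = D^{-1/2} W D^{-1/2}. The weights `mu` and `gamma` and this exact community term are assumptions, not the authors' stated formula. The sketch also checks that the pairwise LGC smoothness sum equals the trace form, and it assumes every vertex has at least one edge.

```python
import numpy as np


def normalized_affinity(W):
    """S = D^{-1/2} W D^{-1/2}, the normalized affinity used by LGC."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def lgc_smoothness_pairwise(W, F):
    """(1/2) * sum_ij W_ij ||F_i / sqrt(D_ii) - F_j / sqrt(D_jj)||^2."""
    G = F / np.sqrt(W.sum(axis=1))[:, None]
    diff = G[:, None, :] - G[None, :, :]            # shape (n, n, n_classes)
    return 0.5 * np.sum(W * np.sum(diff ** 2, axis=2))


def cg_objective(W, L_c, F, Y, mu=0.1, gamma=0.5):
    """Assumed CG-SSL objective: LGC smoothness + label fitting + hypothetical community term."""
    S = normalized_affinity(W)
    smooth = np.trace(F.T @ (np.eye(len(W)) - S) @ F)   # equals lgc_smoothness_pairwise
    fit = mu * np.sum((F - Y) ** 2)
    community = gamma * np.trace(F.T @ L_c @ F)          # penalizes disagreement within a cluster
    return smooth + fit + community
```

Writing the smoothness as a trace is what makes a closed-form or iterative solution straightforward: all three terms are quadratic in F.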
Model Generation
Cluster the vertices, where the "distance" between users is measured by their group and network information.
Record each cluster in a matrix, where n_c is the number of clusters.
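The slide does not name the clustering method or the matrix layout, so here is a minimal sketch that swaps in plain k-means (with deterministic farthest-first seeding) on hypothetical binary group/network membership vectors. It then records the clusters in an n x n_c indicator matrix `C` and derives a community Laplacian L_c from the "same cluster" graph. The membership encoding, the k-means choice, and all function names are assumptions for illustration.

```python
import numpy as np


def farthest_first_init(X, n_clusters):
    """Deterministic seeding: start at row 0, then repeatedly take the farthest row."""
    idx = [0]
    d = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(1, n_clusters):
        nxt = int(d.argmax())
        idx.append(nxt)
        d = np.minimum(d, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[idx].astype(float)


def kmeans(X, n_clusters, n_iter=100):
    """Plain Lloyd's k-means on membership vectors (a stand-in clustering method)."""
    centers = farthest_first_init(X, n_clusters)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        new_centers = np.array([
            X[assign == k].mean(axis=0) if np.any(assign == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign


def community_laplacian(assign, n_clusters):
    """Cluster indicator C (n x n_c) and Laplacian L_c of the 'same cluster' graph."""
    C = np.zeros((len(assign), n_clusters))
    C[np.arange(len(assign)), assign] = 1.0
    W_c = C @ C.T                       # W_c[i, j] = 1 if users i and j share a cluster
    np.fill_diagonal(W_c, 0.0)
    L_c = np.diag(W_c.sum(axis=1)) - W_c
    return C, L_c
```

For example, six users whose membership rows are `[1,1,0,0]`, `[1,1,0,0]`, `[1,0,0,0]`, `[0,0,1,1]`, `[0,0,1,1]`, `[0,0,0,1]` are split by `kmeans(X, 2)` into clusters `[0, 0, 0, 1, 1, 1]`.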
Model Algorithms
Closed-form algorithm: simple and time-saving.
[Flowchart: Input → Process → Output]
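A closed-form algorithm can be sketched under the assumed objective tr(F^T (I - S) F) + mu ||F - Y||^2 + gamma tr(F^T L_c F), which is a hypothetical reconstruction rather than the paper's exact formula. Setting the gradient 2(I - S)F + 2 mu (F - Y) + 2 gamma L_c F to zero gives ((1 + mu) I - S + gamma L_c) F = mu Y, so the whole "Process" step is one linear solve. With gamma = 0 this reduces to the familiar LGC solution (1 - alpha)(I - alpha S)^{-1} Y with alpha = 1 / (1 + mu).

```python
import numpy as np


def cg_ssl_closed_form(W, L_c, Y, mu=0.1, gamma=0.5):
    """Closed-form minimizer of the assumed objective
    tr(F^T (I - S) F) + mu * ||F - Y||^2 + gamma * tr(F^T L_c F).

    Input: affinity W, community Laplacian L_c (hypothetical), one-hot labels Y
    (all-zero rows for unlabeled users). Output: scores F and predicted labels.
    """
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]    # D^{-1/2} W D^{-1/2}
    A = (1.0 + mu) * np.eye(len(W)) - S + gamma * L_c    # positive definite for mu > 0
    F = np.linalg.solve(A, mu * Y)
    return F, F.argmax(axis=1)
```

On two triangles joined by one edge, with one labeled user in each triangle and one cluster per triangle, every user receives the label of its own triangle. The cost is an n x n solve, which is why the slides pair this with an iterative variant for large data.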
Model Algorithms
Iterative algorithm: able to deal with large-scale data.
[Flowchart: Input → Process → converged? (False: repeat Process; True: Output)]
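An iterative counterpart can be sketched for the same assumed linear system ((1 + mu) I - S + gamma (D_c - W_c)) F = mu Y. Keeping only the diagonal on the left gives the update F ← (S F + gamma W_c F + mu Y) / (1 + mu + gamma d_c), repeated until the change falls below a tolerance (the False/True loop on the slide). Because the system matrix is symmetric positive definite with non-positive off-diagonal entries, this Jacobi-style splitting converges. The update rule, `tol`, and `max_iter` are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np


def cg_ssl_iterative(W, W_c, Y, mu=0.1, gamma=0.5, tol=1e-10, max_iter=10_000):
    """Fixed-point iteration for ((1 + mu) I - S + gamma (D_c - W_c)) F = mu Y.

    W_c is a hypothetical 'same cluster' affinity with zero diagonal; d_c are its row sums.
    """
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]        # D^{-1/2} W D^{-1/2}
    denom = (1.0 + mu + gamma * W_c.sum(axis=1))[:, None]    # diagonal part, one value per row
    F = np.asarray(Y, dtype=float).copy()
    for _ in range(max_iter):
        F_new = (S @ F + gamma * (W_c @ F) + mu * Y) / denom
        if np.max(np.abs(F_new - F)) < tol:   # "True" branch: converged, output
            return F_new, F_new.argmax(axis=1)
        F = F_new                             # "False" branch: repeat the update
    return F, F.argmax(axis=1)
```

Each step needs only matrix products with S and W_c, so with sparse matrices (which this dense NumPy sketch does not use) the per-iteration cost scales with the number of edges rather than n^3.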
Experiments
Datasets: one synthetic dataset (TwoMoons) and two real-world datasets (StudiVZ and Facebook).
Objectives: classification on TwoMoons; predicting university names on StudiVZ and Facebook.
Evaluation: accuracy and confidence.
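The slides do not say how TwoMoons was generated, how the graph was built, or how many labels were revealed, so the sketch below only illustrates the setup and the accuracy metric. It uses a hand-rolled two-moons generator, a Gaussian-weighted k-nearest-neighbour graph, one labeled point per moon, and the LGC baseline solved as F = (I - alpha S)^{-1} Y. Every parameter value (`n_per_moon`, `k`, `sigma`, `alpha`) is hypothetical, and the "confidence" measure is not sketched because the slides do not define it.

```python
import numpy as np


def two_moons(n_per_moon=100, noise=0.0, seed=0):
    """Usual two-moons layout: an upper half circle and a shifted lower half circle."""
    t = np.linspace(0.0, np.pi, n_per_moon)
    outer = np.column_stack([np.cos(t), np.sin(t)])
    inner = np.column_stack([1.0 - np.cos(t), 0.5 - np.sin(t)])
    X = np.vstack([outer, inner])
    X = X + noise * np.random.default_rng(seed).standard_normal(X.shape)
    y = np.repeat([0, 1], n_per_moon)
    return X, y


def knn_graph(X, k=7, sigma=0.1):
    """Symmetric k-nearest-neighbour graph with Gaussian edge weights."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]          # column 0 is the point itself
    rows = np.repeat(np.arange(len(X)), k)
    W = np.zeros_like(d2)
    W[rows, nn.ravel()] = np.exp(-d2[rows, nn.ravel()] / (2 * sigma ** 2))
    return np.maximum(W, W.T)                         # keep an edge if either end chose it


def lgc_predict(W, Y, alpha=0.99):
    """LGC baseline: labels from F = (I - alpha S)^{-1} Y."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    F = np.linalg.solve(np.eye(len(W)) - alpha * S, Y)
    return F.argmax(axis=1)


def accuracy(pred, truth, mask):
    """Fraction of correct predictions over the points selected by mask (the unlabeled ones)."""
    return float(np.mean(pred[mask] == truth[mask]))
```

With `noise=0.0` the two moons stay far apart compared with the length of any k-nearest-neighbour edge, so the graph splits into one component per moon and every unlabeled point inherits its moon's label (accuracy 1.0). Adding noise shrinks the gap and makes the task harder.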
Experiments
Datasets
Experiments
Results: Accuracy of Prediction
[Figure: prediction accuracy on (a) TwoMoons, (b) StudiVZ, (c) Facebook]
Experiments
CG SSL performs best in most cases.
The accuracy curves of CG SSL are not monotonically increasing.
The accuracies on the Facebook dataset are lower than those on the other datasets.
Conclusion
The community-based graph SSL model describes real-world social networks more closely.
CG SSL predicts the hidden information of online social networks with higher accuracy and confidence.
Users' hidden information in online social networks is therefore no longer secure.
Thank you
Q & A