1 Using Transactional Information to Predict Link Strength in Online Social Networks
Indika Kahanda and Jennifer Neville
Purdue University

2 Online social networks (OSNs)
Explosive growth of online communities enables the study of social processes and behavior at a larger scale than ever before
  Facebook: 200 million active users
  MySpace: 125 million active users
  LinkedIn: 40 million users
User-contributed data is much more extensive than the hand-collected networks previously studied in social science

3 OSNs are larger and more heterogeneous than manually collected social networks
Network                                                                  Min degree  Median degree  Max degree
Purdue Facebook Network                                                  1           81             2173
UNC National Longitudinal Study of Adolescent Health (In-School Survey)  1           7              10

4 High median degree implies the presence of many weak, or spurious, friendship links.
Conjecture: Strong relationships can be identified automatically from transactional link information.

5 OSNs contain additional information about user interactions
  Wall communications
  Group membership
  Photo postings

6 Purdue Facebook network
56,061 public users in March 2008
Undergrads, grad students, faculty, staff, alumni

7 Information about strong relationships
The Top Friends application allows users to nominate some of their friends as “best friends”
This provides us with positive and negative training examples of strong relationships
4,900 Purdue users (9%) have the Top Friends application publicly visible
17,393 Purdue users are nominated as a Top Friend
Max out-degree = 40; max in-degree = 14



10 Automatically identifying top friends
Formulate this as a link strength prediction task
  For each friend pair (u,v), predict whether they are “top friends” given their attributes, interactions, and network information
Use supervised learning methods
  Logistic regression, naïve Bayes classifiers, and bagged decision trees
Consider features from four categories
  Attribute similarity, topological connectivity, transactional connectivity, and network-transactional connectivity
Evaluate on data from the public Purdue Facebook network
  Use basic attribute information from profiles, friendship links, wall postings, picture postings, group memberships, and “top friend” nominations

11 Related work
Link prediction
  Focuses on predicting future links between any (u,v) pair in a network with a single edge type (i.e., friendship)
  Previous methods primarily use attribute similarity features (e.g., Taskar et al. ’03) or topological features of the network (e.g., Liben-Nowell & Kleinberg ’04)
  Adamic and Adar (’03) used ancillary network information for link prediction, but they focused on similarity-based features rather than transactions/interactions
Pruning spurious links
  Singh et al. (’05) and Hill et al. (’07) sample nodes and edges based on structural properties, but they do not consider transactional information

12 Feature types
(1) Attribute-based features
  Assess attribute similarity between users (e.g., number of matches)
  Example: user U (Gender: Male; Religious: Christian; Political: Moderate) and user V (Gender: Male; Religious: Agnostic; Political: Conservative)
(2) Topological features
  Assess connectivity of users in the friendship network (e.g., number of common neighbors)
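
The two feature families on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' feature code: the function names, the dictionary-based profile format, and the toy friendship graph are all assumptions, and which of the two example profiles belongs to U or V does not affect the result.

```python
def attribute_matches(profile_u, profile_v):
    """Count attributes that both users filled in with the same value."""
    shared = set(profile_u) & set(profile_v)
    return sum(1 for key in shared if profile_u[key] == profile_v[key])


def common_neighbors(friends, u, v):
    """Count users who are friends with both u and v."""
    return len(friends.get(u, set()) & friends.get(v, set()))


# Example profiles taken from the slide; the key names are hypothetical.
u_profile = {"gender": "Male", "religious": "Christian", "political": "Moderate"}
v_profile = {"gender": "Male", "religious": "Agnostic", "political": "Conservative"}

# Toy friendship graph (hypothetical data): user -> set of friends.
friends = {"u": {"v", "a", "b"}, "v": {"u", "a", "c"}}

print(attribute_matches(u_profile, v_profile))  # 1 (only gender matches)
print(common_neighbors(friends, "u", "v"))      # 1 (only "a" is shared)
```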

13 Feature types
(3) Transactional features
  Assess transactional activity between user pairs (e.g., number of bi-directional posts)
  Transaction types between U and V: wall post, photo post, same group
(4) Network-transactional features
  Assess connectivity of users in the transaction networks (i.e., moderate the transactional activity between a pair by each user's interactions with other users)
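
To make the difference between (3) and (4) concrete, here is a small sketch under stated assumptions. The slides do not give exact formulas, so the two definitions below (total posts exchanged by a pair, and the share of one user's outgoing posts that go to the other) are illustrative choices, and the function names and the wall post list are hypothetical.

```python
from collections import Counter


def posts_between(posts, u, v):
    """Transactional feature: posts exchanged by u and v in either direction."""
    counts = Counter(posts)
    return counts[(u, v)] + counts[(v, u)]


def moderated_posts(posts, u, v):
    """Network-transactional feature: share of u's outgoing posts sent to v.

    Dividing by u's total activity is one assumed way to moderate pair
    activity by the user's interactions with everyone else.
    """
    sent_by_u = sum(1 for sender, _ in posts if sender == u)
    if sent_by_u == 0:
        return 0.0
    sent_to_v = sum(1 for sender, receiver in posts if sender == u and receiver == v)
    return sent_to_v / sent_by_u


# Toy wall post log (hypothetical data): (sender, receiver) pairs.
wall_posts = [("u", "v"), ("u", "v"), ("v", "u"), ("u", "a"), ("a", "u"), ("u", "b")]

print(posts_between(wall_posts, "u", "v"))    # 3
print(moderated_posts(wall_posts, "u", "v"))  # 0.5 (2 of u's 4 posts go to v)
```

The moderated value separates a user who posts mostly to one friend from a user who posts the same raw count to everyone.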

14 Methodology
Models
  Bagged decision trees, naïve Bayes classifiers, and logistic regression
Experiments
  Feature ranking
  Feature type comparison
  Link type comparison
  Overall classification
Performance measure: area under the ROC curve (AUC)
  Measures the quality of the (probability) rankings produced by the model
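
AUC can be read as the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair, which is why it measures ranking quality. The sketch below computes it by direct pairwise comparison; the function name and the toy labels and scores are assumptions made for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical data): 1 = top friend, 0 = other friend.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(auc(labels, scores))  # 5 of the 6 positive/negative pairs are ordered correctly
```

The quadratic loop is fine for a sketch; a sort-based rank statistic gives the same value on larger samples.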

15 Facebook sample
Random sample of 500 users with the Top Friends application
Consider all friends of those 500 users
  Top friends → positive training examples
  Other friends → negative training examples
Restrict attention to pairs that have values for ≥ 4 common attributes
Final sample consisted of 8,766 linked friends with 896 (10.2%) positive examples
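
The labeling step can be sketched as follows. This is an assumed reconstruction of the procedure described on the slide, not the authors' pipeline: the function name, the data structures, and the toy users are hypothetical, and the attribute-count filter is left out.

```python
def build_examples(sampled_users, friends, top_friends):
    """Label every (user, friend) pair: 1 if the friend is a Top Friend, else 0."""
    examples = []
    for u in sampled_users:
        nominated = top_friends.get(u, set())
        for v in sorted(friends.get(u, ())):
            examples.append((u, v, 1 if v in nominated else 0))
    return examples


# Toy data (hypothetical): friendship lists and Top Friends nominations.
friends = {"u": {"a", "b", "c"}, "w": {"a", "d"}}
top_friends = {"u": {"a"}, "w": {"d"}}

examples = build_examples(["u", "w"], friends, top_friends)
positive_rate = sum(label for _, _, label in examples) / len(examples)
print(examples)
print(positive_rate)  # 0.4 here; the slide reports 896 / 8766, about 10.2%
```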

16 Experiment 1: Feature rankings
Compare the relative importance of each of the 50 features
Measures
  Information gain
  Chi-square statistic
Compute the average rank of each feature and look at the top 15
  12 are network-transactional features, 3 are transactional
  12 use wall information, 3 use picture information
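
The ranking procedure can be sketched as below, assuming discrete (here binary) feature values; the slides do not say how continuous features were handled. The function names, the two feature names, and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-by-label contingency table."""
    n = len(labels)
    feature_counts, label_counts = Counter(feature), Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for x in feature_counts:
        for y in label_counts:
            expected = feature_counts[x] * label_counts[y] / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def average_ranks(score_tables):
    """Average each feature's rank (1 = best) across several score tables."""
    totals = Counter()
    for scores in score_tables:
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return {name: totals[name] / len(score_tables) for name in totals}


# Toy data (hypothetical): one informative and one uninformative feature.
labels = [1, 1, 0, 0, 0, 0]
features = {
    "wall_moderated": [1, 1, 0, 0, 0, 0],
    "same_gender": [1, 0, 1, 0, 1, 0],
}
ig = {name: information_gain(col, labels) for name, col in features.items()}
chi = {name: chi_square(col, labels) for name, col in features.items()}
ranks = average_ranks([ig, chi])
print(sorted(ranks, key=ranks.get))  # features ordered by average rank
```

Averaging ranks rather than raw scores avoids having to put information gain and chi-square on a common scale.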

17 Experiment 2: Feature type comparison
Ablation study using features of each type separately
  Attribute-based: AUC = 50%
  Topological: AUC = 75%
  Transactional: AUC = 74%
  Network-transactional: AUC = 84%
Network-transactional features achieve the best performance

18 Experiment 3: Link type comparison
Ablation study using data from each link type separately (all features)
  Wall: AUC = 82%
  Picture: AUC = 62%
  Groups: AUC = 63%
  Friendship: AUC = 77%
Wall information results in the best performance
Why doesn’t picture information improve performance? Sparsity.
  28% of user pairs have ≥ 1 wall link
  4% of user pairs have ≥ 1 picture link
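
The sparsity figures can be understood as coverage statistics. The sketch below assumes they count friend pairs with at least one link of the given type; the function name and the toy counts are hypothetical.

```python
def link_coverage(pairs, link_counts):
    """Fraction of friend pairs that have at least one link of a given type."""
    if not pairs:
        return 0.0
    covered = sum(1 for pair in pairs if link_counts.get(pair, 0) >= 1)
    return covered / len(pairs)


# Toy data (hypothetical): friend pairs and per-pair link counts.
pairs = [("u", "v"), ("u", "a"), ("u", "b"), ("v", "a")]
wall_links = {("u", "v"): 3, ("u", "a"): 1}
picture_links = {("u", "v"): 1}

print(link_coverage(pairs, wall_links))     # 0.5
print(link_coverage(pairs, picture_links))  # 0.25
```

When coverage is low, most pairs share identical all-zero feature values for that link type, so a model has little to rank them by.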

19 Experiment 4: Overall classification results
Uses 50 features; compares the performance of three different models
  Bagged decision trees: AUC = 87%
  Naïve Bayes: AUC = 81%
  Logistic regression: AUC = 82%
Bagged decision trees achieve the best performance
Network-transactional features account for 97% of the performance observed using all features
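
For readers unfamiliar with bagging, the idea can be sketched with one-split trees (stumps) on a single numeric feature. This is a deliberately simplified stand-in, not the bagged decision trees used in the experiments: the function names, the stump learner, the single feature, and the toy data are all assumptions.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split tree; returns (threshold, left_rate, right_rate)."""
    values = sorted(set(xs))
    base_rate = sum(ys) / len(ys)
    if len(values) == 1:
        return (values[0], base_rate, base_rate)  # nothing to split on
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Misclassifications if each side predicts its majority class.
        errors = min(sum(left), len(left) - sum(left)) + min(sum(right), len(right) - sum(right))
        if best is None or errors < best[0]:
            best = (errors, t, sum(left) / len(left), sum(right) / len(right))
    return best[1:]


def fit_bagged_stumps(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_proba(models, x):
    """Average the positive-class rates predicted by all stumps."""
    votes = [left if x <= t else right for t, left, right in models]
    return sum(votes) / len(votes)


# Toy data (hypothetical): a moderated wall-post feature and top-friend labels.
xs = [0.0, 0.1, 0.2, 0.3, 0.7, 0.8, 0.9, 1.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
models = fit_bagged_stumps(xs, ys)
print(predict_proba(models, 0.95), predict_proba(models, 0.05))
```

Averaging over resampled models yields graded scores rather than hard labels, which suits a ranking measure such as AUC.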

20 Conclusion
Formulated a link strength prediction task to automatically identify stronger relationships among existing friendships
Compared the utility of attribute-based, topological, transactional, and network-transactional features
Showed good overall accuracy, with network-transactional features having the largest impact on model performance
Results indicate that transactional events are useful for predicting link strength
  However, it is also necessary to consider transactional events in the context of user behavior within the larger social network

21 Future work
Exploit the temporal aspect of transactions to improve predictions
Address the more general link-strength prediction task by formulating a latent variable model

22 Thank you!
Indika Kahanda:
Jennifer Neville:
Questions?

