1
Computing and Information Sciences, Kansas State University
ANNIE Conference, November 10, 2008

Predicting Links and Link Change in Friends Networks: Supervised Time Series Learning with Imbalanced Data
William H. Hsu, Tim Weninger and Martin S.R. Paradesi
Department of Computing and Information Sciences, Kansas State University, Manhattan, KS
ANNIE 2008, St. Louis, MO, USA
2
Outline
Introduction
Background
› Friends networks
› Methodologies for link mining
  Atemporal data set
  Temporal data set
› Link prediction vs. change prediction
Experiment Design
› LJCrawler 3.0
› Prediction tasks
› Handling imbalanced data
Results
Conclusions and Future Work
3
Introduction
Problem Definition
› Given: records of users of a social network service
› Discover:
  Features of entities: users, communities
  Relationships: friendship, membership
  Explanations and predictions for relationships
4
Background
What are friends networks?
5
Rationale
Friendships change (figure from Andrew Chen: http://andrewchenblog.com/2008/01/07/does-facebook-reflect-your-true-friendships-how-about-e-mail/).
We would like to predict those changes.
6
Methodologies for Link Prediction [1]
Atemporal features:
Node-dependent features: specific to one node (vertex) within the candidate pair (u, v)
› Indegree(u): "source popularity"
› Outdegree(u): "source fertility"
› Outdegree(v): "target fertility"
› Indegree(v): "target popularity"
Pair-dependent features: specific to one candidate pair of nodes (vertices)
› Common entities: interests, friends, schools, etc.
› Attributes of common entities
› Computed from a relational query on entities u, v
Link-dependent features: specific to one link (edge) u → v in the directed graph
› Past and predicted duration
› Diagnosed cause
› Computed and stored with the relationship set
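The node- and pair-dependent features above can be computed directly from a crawled friends graph. The sketch below is illustrative only: it assumes the graph is stored as a dict mapping each user to the set of users they list as friends, and that interests are a dict of sets; `pair_features` and the other helper names are hypothetical, not the authors' code. Link-dependent features (duration, cause) need stored relationship history and are not shown.

```python
from typing import Dict, Set, Tuple

# Assumed representation: a directed "friends" graph where an edge u -> v
# means u lists v as a friend.
Graph = Dict[str, Set[str]]

def indegree(g: Graph, x: str) -> int:
    """Number of users who list x as a friend ("popularity")."""
    return sum(1 for friends in g.values() if x in friends)

def outdegree(g: Graph, x: str) -> int:
    """Number of friends x lists ("fertility")."""
    return len(g.get(x, set()))

def pair_features(g: Graph, interests: Dict[str, Set[str]],
                  u: str, v: str) -> Tuple[int, int, int, int, int, int]:
    """Four node-dependent features of (u, v) plus two pair-dependent counts."""
    common_friends = len(g.get(u, set()) & g.get(v, set()))
    common_interests = len(interests.get(u, set()) & interests.get(v, set()))
    return (indegree(g, u),   # source popularity
            outdegree(g, u),  # source fertility
            outdegree(g, v),  # target fertility
            indegree(g, v),   # target popularity
            common_friends,
            common_interests)
```

Indegree is computed here by a full scan for clarity; on a large crawl one would precompute a reverse adjacency map once.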
7
Methodologies for Link Prediction [2]
Temporal features
› There are none! (We must discover them.)
Link prediction vs. change prediction
› We know how to do link prediction (see our work in AAAI-SS'06 and ICWSM'07).
› This work is specifically on detecting changes in links.
8
Experiment Design [1]
LJCrawler v3
› Crawled LiveJournal in a breadth-first manner
› Gathered friends, interests, communities, etc.
Maximum bandwidth allowed 200 users/sec. per computer
› LiveJournal's terms of service allowed 5 users/sec. (I obliged.)
The crawler was run for three hours, every six hours, for seven days
› 09/27/2007 00:00 CST to 10/03/2007 18:00 CST
› 28 crawls in total, mostly of the same users
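A breadth-first crawl with a fixed per-request delay is one simple way to respect a 5 users/sec. limit. This is a generic sketch, not LJCrawler's code: `fetch_friends` is a hypothetical callback standing in for whatever request retrieves a user's friend list, and the schedule helper just reproduces the arithmetic of 28 crawls started every six hours.

```python
import time
from collections import deque
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable, List

def bfs_crawl(seed: str,
              fetch_friends: Callable[[str], Iterable[str]],
              max_users: int,
              users_per_sec: float = 5.0,
              sleep: Callable[[float], None] = time.sleep) -> Dict[str, List[str]]:
    """Breadth-first crawl from `seed`, pausing after each fetch so that at
    most `users_per_sec` users are requested per second."""
    delay = 1.0 / users_per_sec
    graph: Dict[str, List[str]] = {}
    queue = deque([seed])
    seen = {seed}
    while queue and len(graph) < max_users:
        user = queue.popleft()
        friends = list(fetch_friends(user))  # hypothetical fetch callback
        graph[user] = friends
        for f in friends:
            if f not in seen:  # enqueue each user at most once
                seen.add(f)
                queue.append(f)
        sleep(delay)  # throttle to respect the terms-of-service limit
    return graph

def crawl_schedule(start: datetime, every_hours: int = 6,
                   count: int = 28) -> List[datetime]:
    """Start times of `count` crawls launched every `every_hours` hours."""
    return [start + timedelta(hours=every_hours * i) for i in range(count)]
```

With `start=datetime(2007, 9, 27, 0, 0)`, the 28th crawl starts 27 × 6 = 162 hours later, at 2007-10-03 18:00, matching the crawl window above.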
9
Prediction Tasks
First:
› Identify changes between the first crawl and the final crawl
› A pair is labeled true iff $(u, v) \notin E_i$ and $(u, v) \in E_f$, where $E_i$ and $E_f$ are the friendship edge sets of the initial and final crawls.
Second:
› Learning from incremental differences, where $f_{t-k}$ is the feature tuple from the crawl at time $t-k$ ($k$ crawls ago)
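The first task's label is a set test on two edge sets. For the second task, the exact encoding of the incremental differences is not given here, so the sketch assumes one plausible choice: concatenating consecutive feature deltas $f_{t-k+1} - f_{t-k}$ for $k = 1, \dots, K$. Both function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Pair = Tuple[str, str]

def become_friends_labels(pairs: Iterable[Pair], e_initial: Set[Pair],
                          e_final: Set[Pair]) -> Dict[Pair, bool]:
    """Task 1: label is True iff (u, v) is absent from E_i and present in E_f."""
    return {p: (p not in e_initial and p in e_final) for p in pairs}

def incremental_differences(history: Sequence[Sequence[float]],
                            t: int, k_max: int) -> List[float]:
    """Task 2 (assumed encoding): concatenate f_{t-k+1} - f_{t-k} for
    k = 1..k_max, where history[j] is one pair's feature tuple at crawl j."""
    if t - k_max < 0:
        raise ValueError("not enough earlier crawls for this window")
    out: List[float] = []
    for k in range(1, k_max + 1):
        newer, older = history[t - k + 1], history[t - k]
        out.extend(a - b for a, b in zip(newer, older))
    return out
```

For a pair whose features over three crawls are `[(1, 0), (3, 1), (4, 1)]`, `incremental_differences(history, t=2, k_max=2)` gives `[1, 0, 2, 1]`: the most recent delta first, then the one before it.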
10
Handling Imbalanced Data
Relatively few friendships change in six hours, so there are relatively few positive examples.
› Downsample negative examples
› Upsample positive examples
Fixed Ratio (FR)
› Positive : negative examples = 1:1
Fixed Count (FC)
› Random sample: negatives >> positives
Kubat, M. and Matwin, S. (1997). "Addressing the Curse of Imbalanced Training Sets: One-Sided Selection." In Proceedings of the 14th International Conference on Machine Learning (ICML'97), pp. 179-186.
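The two sampling schemes can be sketched as follows, assuming FR draws n/2 examples from each class (upsampling positives with replacement when there are too few, downsampling negatives without replacement) and FC draws n pairs uniformly from the whole pool. These are illustrative readings of the slide, not the authors' exact procedure.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def fixed_ratio_sample(pos: Sequence[T], neg: Sequence[T], n: int,
                       rng: random.Random) -> List[Tuple[T, int]]:
    """FR (assumed reading): n // 2 positives and n // 2 negatives, i.e. 1:1."""
    half = n // 2
    if len(pos) >= half:
        pos_part = rng.sample(list(pos), half)
    else:
        pos_part = rng.choices(list(pos), k=half)  # upsample: with replacement
    neg_part = rng.sample(list(neg), half)         # downsample: without replacement
    data = [(x, 1) for x in pos_part] + [(x, 0) for x in neg_part]
    rng.shuffle(data)
    return data

def fixed_count_sample(pos: Sequence[T], neg: Sequence[T], n: int,
                       rng: random.Random) -> List[Tuple[T, int]]:
    """FC (assumed reading): n pairs drawn uniformly at random, so the
    natural imbalance (negatives >> positives) is preserved."""
    pool = [(x, 1) for x in pos] + [(x, 0) for x in neg]
    return rng.sample(pool, n)
```

With 5 positives and 1000 negatives, an FR sample of 100 contains exactly 50 positives (mostly repeats), while an FC sample of 100 contains at most 5.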
11
Experiments
Data for 1000, 2000, and 4000 pairs
› Fixed Ratio (FR) with graph features
› FR without graph features
› Fixed Count (FC) with graph features
› FC without graph features
WEKA was used to train the classifiers.
FR/FC
› Train with 1:1 (FR) data, test with randomly sampled (FC) data
FC/FC
› Test with 10-fold cross-validation
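The FC/FC protocol relies on 10-fold cross-validation. The split itself can be sketched in plain Python; this unstratified version is an illustration only (a stratified variant would also balance the rare positive class across folds), and `kfold_indices` is a hypothetical name.

```python
import random
from typing import Iterator, List, Tuple

def kfold_indices(n: int, k: int = 10,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

For FR/FC there is no such loop: a classifier trained once on the FR sample is evaluated once on a separate FC sample.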
12
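Results [1]
J48, atemporal data
› Predicting becomeFriends
  Class: yes iff u, v were not friends before $T_0$ and were friends after $T_{28}$
m = number of pairs; Graph = graph features, All = all features; Acc. = accuracy, Prec. = precision, Rec. = recall (all in %)

FR/FC
| m | Graph Acc. | Graph Prec. | Graph Rec. | All Acc. | All Prec. | All Rec. |
|---|---|---|---|---|---|---|
| 1000 | 83.0 | 2.0 | 100.0 | 83.0 | 2.0 | 100.0 |
| 2000 | 90.7 | 4.6 | 90.0 | 92.1 | 5.0 | 90.0 |
| 4000 | 95.1 | 2.0 | 66.7 | 92.6 | 2.0 | 100.0 |

FC/FC
| m | Graph Acc. | Graph Prec. | Graph Rec. | All Acc. | All Prec. | All Rec. |
|---|---|---|---|---|---|---|
| 1000 | 99.5 | 0.0 | | 99.3 | 0.0 | |
| 2000 | 99.5 | 0.0 | | 99.4 | 33.3 | 2.0 |
| 4000 | 99.9 | 0.0 | | 99.7 | 11.1 | 16.7 |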
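The FC/FC columns show why accuracy alone is misleading on imbalanced test data. The sketch computes accuracy, precision and recall from predictions; the test set in the example is invented for illustration (1000 pairs, 5 positive) and is not the paper's data. Reporting precision as 0.0 when nothing is predicted positive is an assumed convention.

```python
from typing import Sequence, Tuple

def acc_prec_rec(y_true: Sequence[int],
                 y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Accuracy, precision and recall in percent (precision and recall are
    reported as 0.0 when their denominators are zero)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = 100.0 * sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    prec = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    rec = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    return acc, prec, rec

# Invented example: a classifier that always predicts "no change".
y_true = [1] * 5 + [0] * 995
all_negative = [0] * 1000
print(acc_prec_rec(y_true, all_negative))  # prints (99.5, 0.0, 0.0)
```

A trivial all-negative predictor already reaches 99.5% accuracy here while finding no new friendships, which is the pattern visible in several FC/FC cells.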
13
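Results [2]
Incremental time series task
› The realignment allows the inducer to learn temporal actions that may lead to changes in the friendship status of a pair.
FR/FR, cross-validation (all values in %)

Graph features
| m | J48 Acc. | J48 Prec. | J48 Rec. | Logistic Acc. | Logistic Prec. | Logistic Rec. | OneR Acc. | OneR Prec. | OneR Rec. |
|---|---|---|---|---|---|---|---|---|---|
| 1000 | 99.0 | 100.0 | 87.5 | 100.0 | | | | | |
| 2000 | 98.9 | 100.0 | 90.1 | 93.2 | 71.7 | 60.6 | 99.4 | 100.0 | 94.4 |
| 4000 | 99.4 | 99.8 | 97.8 | 77.9 | 58.4 | 27.3 | 95.1 | 88.9 | 90.9 |

All features
| m | J48 Acc. | J48 Prec. | J48 Rec. | Logistic Acc. | Logistic Prec. | Logistic Rec. | OneR Acc. | OneR Prec. | OneR Rec. |
|---|---|---|---|---|---|---|---|---|---|
| 1000 | 99.0 | 100.0 | 87.5 | 100.0 | | | | | |
| 2000 | 98.9 | 100.0 | 90.1 | 95.2 | 75.0 | 84.5 | 99.4 | 100.0 | 94.4 |
| 4000 | 99.3 | 99.5 | 97.5 | 83.1 | 75.1 | 43.7 | 95.1 | 88.7 | 90.9 |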
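OneR, one of the three classifiers compared here, builds a rule from a single attribute: for each attribute it predicts the majority class of every attribute value and keeps the attribute with the fewest training errors. The sketch below assumes nominal (already discretized) feature values; WEKA's OneR also discretizes numeric attributes, which is omitted here, and `one_r` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Optional, Sequence, Tuple

def one_r(rows: Sequence[Sequence[Hashable]],
          labels: Sequence[Hashable]) -> Tuple[int, Dict[Hashable, Hashable]]:
    """Return (attribute index, value -> predicted class) for the single
    attribute whose majority-class rule makes the fewest training errors."""
    best_attr: int = -1
    best_rule: Dict[Hashable, Hashable] = {}
    best_err: Optional[int] = None
    for a in range(len(rows[0])):
        counts: Dict[Hashable, Counter] = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[a]][y] += 1  # class frequencies per attribute value
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        err = sum(sum(c.values()) - c[rule[v]] for v, c in counts.items())
        if best_err is None or err < best_err:
            best_attr, best_rule, best_err = a, rule, err
    return best_attr, best_rule
```

Because it uses only one attribute, OneR is a useful baseline: when it scores close to J48, a single feature is carrying most of the signal.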
14
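Results [3]
FR/FC: train with FR, test with FC (all values in %)

Graph features
| m | J48 Acc. | J48 Prec. | J48 Rec. | Logistic Acc. | Logistic Prec. | Logistic Rec. | OneR Acc. | OneR Prec. | OneR Rec. |
|---|---|---|---|---|---|---|---|---|---|
| 1000 | 59.9 | 1.5 | 75.0 | 67.3 | 1.8 | 74.0 | 50.7 | 1.2 | 75.0 |
| 2000 | 85.5 | 2.9 | 49.3 | 80.1 | 0.6 | 14.1 | 80.2 | 1.7 | 37.4 |
| 4000 | 84.3 | 2.3 | 41.0 | 80.6 | 0.4 | 8.7 | 77.5 | 2.1 | 54.3 |

All features
| m | J48 Acc. | J48 Prec. | J48 Rec. | Logistic Acc. | Logistic Prec. | Logistic Rec. | OneR Acc. | OneR Prec. | OneR Rec. |
|---|---|---|---|---|---|---|---|---|---|
| 1000 | 59.9 | 1.5 | 75.0 | 59.9 | 1.5 | 75.0 | 50.7 | 1.2 | 75.0 |
| 2000 | 84.9 | 2.0 | 90.1 | 89.1 | 0.9 | 84.5 | 79.6 | 1.0 | 94.4 |
| 4000 | 84.0 | 1.0 | 20.0 | 79.3 | 0.5 | 13.9 | 76.9 | 1.5 | 46.7 |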
15
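Results [4]
FC/FC: train and test with FC, 10-fold cross-validation (all values in %)

Graph features
| m | J48 Acc. | J48 Prec. | J48 Rec. | Logistic Acc. | Logistic Prec. | Logistic Rec. | OneR Acc. | OneR Prec. | OneR Rec. |
|---|---|---|---|---|---|---|---|---|---|
| 1000 | 100.0 | | | 99.2 | 100.0 | 3.5 | 99.4 | 83.7 | 36.0 |
| 2000 | 99.9 | 100.0 | 99.1 | | 65.5 | 4.3 | 99.3 | 93.6 | 26.2 |

All features
| m | J48 Acc. | J48 Prec. | J48 Rec. | Logistic Acc. | Logistic Prec. | Logistic Rec. | OneR Acc. | OneR Prec. | OneR Rec. |
|---|---|---|---|---|---|---|---|---|---|
| 1000 | 100.0 | | | 99.3 | 69.9 | 29.0 | 99.4 | 67.9 | 45.5 |
| 2000 | 100.0 | | | 99.4 | 66.2 | 35.5 | 99.5 | 78.9 | 57.9 |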
16
Conclusions
We are able to learn a predictor for the time series problem.
Handling imbalanced data:
› FR/FR: does not represent the data (the 1:1 test sets do not reflect the true, highly imbalanced class distribution)
› FC/FC: does not train a classifier very well
› FR/FC: probably the most appropriate predictor, with high recall, low precision, and decent accuracy
Future Work
› Better, more descriptive time series features should be discovered.
› Apply preprocessing to the feature vector.
17
Questions?