Computing and Information Sciences Kansas State University ANNIE Conference November 10, 2008 Predicting Links and Link Change in Friends Networks: Supervised.

Presentation transcript:

Predicting Links and Link Change in Friends Networks: Supervised Time Series Learning with Imbalanced Data
William H. Hsu, Tim Weninger and Martin S.R. Paradesi
Department of Computing and Information Sciences, Kansas State University, Manhattan KS
ANNIE 2008, St. Louis, MO USA, November 10, 2008

Outline
Introduction
Background
›Friends networks
›Methodologies for link mining
Atemporal data set
Temporal data set
›Link prediction vs change prediction
Experiment Design
›LJCrawler 3.0
›Prediction Tasks
›Handling imbalanced data
Results
Conclusions and Future Work

Introduction
Problem Definition
›Given: records of users of a social network service
›Discover:
Features of entities: users, communities
Relationships: friendship, membership
Explanations and predictions for relationships

Background
What are friends networks?

Rationale
Friendships change:
[Figure; from: Andrew Chen]
We would like to predict those changes

Methodologies for Link Prediction [1]
Atemporal features:
Node-Dependent Features: specific to one node (vertex) within candidate pair
›Indegree (u): “Source popularity”
›Outdegree (u): “Source fertility”
›Outdegree (v): “Target fertility”
›Indegree (v): “Target popularity”
Pair-Dependent Features: specific to one candidate pair of nodes (vertices)
›Common entities: interests, friends, schools, etc.
›Attributes of common entities
›Computed from relational query on entities u, v
Link-Dependent Features: specific to one link (edge) in directed graph
›Past, predicted duration
›Diagnosed cause
›Computed and stored with relationship set
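The node-dependent and pair-dependent features listed above could be computed roughly as follows. This is a minimal sketch, not the authors' implementation: it assumes the friends network is given as a list of directed (u, v) edges and interests as a dict of sets, and the names `build_adjacency` and `pair_features` are hypothetical.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Index a directed edge list by source and by target."""
    out_nbrs = defaultdict(set)  # u -> users that u lists as friends
    in_nbrs = defaultdict(set)   # v -> users that list v as a friend
    for u, v in edges:
        out_nbrs[u].add(v)
        in_nbrs[v].add(u)
    return out_nbrs, in_nbrs

def pair_features(u, v, out_nbrs, in_nbrs, interests):
    """Atemporal features for the candidate pair (u, v)."""
    empty = set()
    return {
        # Node-dependent features
        "indegree_u": len(in_nbrs.get(u, empty)),    # source popularity
        "outdegree_u": len(out_nbrs.get(u, empty)),  # source fertility
        "outdegree_v": len(out_nbrs.get(v, empty)),  # target fertility
        "indegree_v": len(in_nbrs.get(v, empty)),    # target popularity
        # Pair-dependent features (counts of common entities)
        "common_friends": len(out_nbrs.get(u, empty) & out_nbrs.get(v, empty)),
        "common_interests": len(interests.get(u, empty) & interests.get(v, empty)),
    }

# Toy data, purely illustrative
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "a")]
interests = {"a": {"chess", "jazz"}, "b": {"jazz"}}
out_nbrs, in_nbrs = build_adjacency(edges)
features_ab = pair_features("a", "b", out_nbrs, in_nbrs, interests)
```

Link-dependent features (duration, diagnosed cause) are omitted here because they require data stored with each relationship, which this toy representation does not carry.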

Methodologies for Link Prediction [2]
Temporal Features
›There are none! (we must discover them)
Link Prediction vs Change Prediction
We know how to do link prediction
›(see our work in AAAI-SS'06, ICWSM'07)
This work is specifically on detecting changes in links

Experiment Design [1]
LJCrawler v3
›Crawled LiveJournal in a breadth-first manner
›Gathering friends, interests, communities, etc.
Max bandwidth allowed for 200 users/sec. per computer
›LiveJournal terms of service allowed for 5 users/sec. (I obliged)
Crawler was run for three hours, every six hours for seven days
›09/27/ :00 CST – 10/03/ :00 CST
›28 total crawls, mostly of the same users
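A breadth-first crawl throttled to the 5 users/sec. limit might look like the sketch below. It is not LJCrawler itself: `fetch_friends` is a hypothetical callable standing in for whatever request the real crawler made, and the `sleep` parameter exists only so the delay can be replaced in a dry run.

```python
import time
from collections import deque

# One crawl every six hours for seven days gives the 28 crawls on the slide.
TOTAL_CRAWLS = 7 * 24 // 6

def crawl_bfs(seed, fetch_friends, max_users, users_per_sec=5.0, sleep=time.sleep):
    """Visit users breadth-first from `seed`, at most `users_per_sec` per second.

    fetch_friends(user) is a hypothetical function returning an iterable
    of that user's friends.
    """
    delay = 1.0 / users_per_sec
    queued = {seed}          # users already placed on the queue
    queue = deque([seed])
    friends_of = {}          # crawled user -> list of friends
    while queue and len(friends_of) < max_users:
        user = queue.popleft()
        friends_of[user] = list(fetch_friends(user))
        for friend in friends_of[user]:
            if friend not in queued:
                queued.add(friend)
                queue.append(friend)
        sleep(delay)         # throttle to respect the rate limit
    return friends_of

# Dry run on a toy graph, with the delay disabled
toy = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
snapshot = crawl_bfs("a", lambda u: toy.get(u, []), max_users=3, sleep=lambda s: None)
```

Running the same function at each scheduled crawl time yields one snapshot per crawl; comparing snapshots is what exposes link changes.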

Prediction Tasks
First:
›Identify changes from the first crawl and the final crawl
›A true label is annotated iff (u, v) ∉ E_i and (u, v) ∈ E_f.
Second:
›Learning from incremental differences:
›Where f_{t-k} is the feature tuple from the crawl at time t-k (k crawls ago)
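The labeling rule for the first task can be written directly from its definition. A minimal sketch, assuming each crawl's edge set is a Python set of (u, v) tuples; the function name `label_new_links` is hypothetical.

```python
def label_new_links(candidate_pairs, edges_initial, edges_final):
    """Label (u, v) True iff the link is absent in E_i and present in E_f."""
    return {
        (u, v): (u, v) not in edges_initial and (u, v) in edges_final
        for (u, v) in candidate_pairs
    }

# Toy edge sets, purely illustrative
E_i = {("a", "b")}
E_f = {("a", "b"), ("a", "c")}
labels = label_new_links([("a", "b"), ("a", "c"), ("b", "c")], E_i, E_f)
```

A pair that was already linked in the first crawl is labeled False even though it is linked in the final crawl, because the task targets change rather than link existence.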

Handling Imbalanced Data
Relatively few changes in six hours. Therefore there are relatively few positive examples
›Downsample negative examples
›Upsample positive examples
Fixed Ratio (FR)
›Positive : Negative examples = 1:1
Fixed Count (FC)
›Random sample: Negative >> Positive
Kubat M., Matwin S., 1997, “Addressing the Curse of Imbalanced Training Sets: One-Sided Selection,” In Proceedings of the 14th International Conference on Machine Learning (ICML'97), pp
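The two sampling schemes can be sketched as below. This assumes FR is obtained by randomly downsampling negatives to match the number of positives and FC is a plain random sample of m pairs; the slides do not give the exact procedure, and the function names are hypothetical. It is not the one-sided selection method of the cited paper.

```python
import random

def fixed_ratio_sample(examples, labels, seed=0):
    """Fixed Ratio (FR): keep all positives, downsample negatives to 1:1."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y]
    neg = [i for i, y in enumerate(labels) if not y]
    kept_neg = rng.sample(neg, min(len(pos), len(neg)))
    keep = sorted(pos + kept_neg)
    return [examples[i] for i in keep], [labels[i] for i in keep]

def fixed_count_sample(examples, labels, m, seed=0):
    """Fixed Count (FC): m pairs drawn at random, so negatives dominate."""
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(examples)), min(m, len(examples))))
    return [examples[i] for i in keep], [labels[i] for i in keep]

# Toy data: 3 positives among 100 pairs
pairs = list(range(100))
changed = [True] * 3 + [False] * 97
fr_x, fr_y = fixed_ratio_sample(pairs, changed)
fc_x, fc_y = fixed_count_sample(pairs, changed, m=20)
```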

Experiments
Data for 1000, 2000, 4000 pairs
›Fixed Ratio (FR) with Graph Features
›FR without Graph Features
›Fixed Count (FC) with Graph Features
›FC without Graph Features
WEKA was used to train classifiers
FR/FC
›Train with 1:1 data, test with random data
FC/FC
›Test with 10-fold cross validation

Results [1]
J48 – Atemporal data
›Predicting becomeFriends
›Class: yes iff u, v were not friends before T_0 and were friends after T_28
[Table: Acc., Prec., Rec. per m under FR/FC and FC/FC, each with Graph Features (%) and All Features (%)]
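Accuracy, precision and recall, the three measures reported in the results tables, follow from the confusion counts. The sketch below uses made-up predictions on an imbalanced test set (10 positives in 1000 pairs); the numbers are illustrative only and are not results from this study.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical classifier output: finds 8 of 10 positives, raises 92 false alarms
y_true = [1] * 10 + [0] * 990
y_pred = [1] * 8 + [0] * 2 + [1] * 92 + [0] * 898
acc, prec, rec = evaluate(y_true, y_pred)
```

With so few positives, a modest number of false alarms among the negatives is enough to pull precision far below recall while accuracy stays high.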

Results [2]
Incremental time series task
›The realignment allows the inducer to learn temporal actions that may lead to changes in the friendship status of a pair.
FR/FR - cross validation
[Table: Acc., Prec., Rec. per m for J48, Logistic and OneR; rows: Graph Features, All Features]

Results [3]
FR/FC – train with FR, test with FC
[Table: Acc., Prec., Rec. per m for J48, Logistic and OneR; rows: Graph Features, All Features]

Results [4]
FC/FC – train with FC, test with FC
[Table: Acc., Prec., Rec. per m for J48, Logistic and OneR; rows: Graph Features, All Features]

Conclusions
We are able to learn a predictor for the time series problem.
Handling imbalanced data:
›FR/FR – Does not represent the data
›FC/FC – Does not train a classifier very well
›FR/FC – Probably the most appropriate predictor: high recall, low precision, and decent accuracy
Future Work
›Better, more descriptive time series features should be discovered.
›Apply preprocessing to the feature vector

Questions?