Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University,

Slides:



Advertisements
Similar presentations
Active Learning for Streaming Networked Data Zhilin Yang, Jie Tang, Yutao Zhang Computer Science Department, Tsinghua University.
Advertisements

Towards Twitter Context Summarization with User Influence Models Yi Chang et al. WSDM 2013 Hyewon Lim 21 June 2013.
Confluence: Conformity Influence in Large Social Networks
Analysis and Modeling of Social Networks Foudalis Ilias.
Exact Inference in Bayes Nets
Dynamic Bayesian Networks (DBNs)
Social Media Mining Chapter 5 1 Chapter 5, Community Detection and Mining in Social Media. Lei Tang and Huan Liu, Morgan & Claypool, September, 2010.
1 Social Influence Analysis in Large-scale Networks Jie Tang 1, Jimeng Sun 2, Chi Wang 1, and Zi Yang 1 1 Dept. of Computer Science and Technology Tsinghua.
An Approach to Evaluate Data Trustworthiness Based on Data Provenance Department of Computer Science Purdue University.
A Probabilistic Framework for Semi-Supervised Clustering
1 Yuxiao Dong *$, Jie Tang $, Sen Wu $, Jilei Tian # Nitesh V. Chawla *, Jinghai Rao #, Huanhuan Cao # Link Prediction and Recommendation across Multiple.
Markov Networks.
Belief Propagation on Markov Random Fields Aggeliki Tsoli.
1 1 Chenhao Tan, 1 Jie Tang, 2 Jimeng Sun, 3 Quan Lin, 4 Fengjiao Wang 1 Department of Computer Science and Technology, Tsinghua University, China 2 IBM.
Jing Gao 1, Feng Liang 1, Wei Fan 2, Chi Wang 1, Yizhou Sun 1, Jiawei Han 1 University of Illinois, IBM TJ Watson Debapriya Basu.
INFERRING NETWORKS OF DIFFUSION AND INFLUENCE Presented by Alicia Frame Paper by Manuel Gomez-Rodriguez, Jure Leskovec, and Andreas Kraus.
Conditional Random Fields
Graphical Models Lei Tang. Review of Graphical Models Directed Graph (DAG, Bayesian Network, Belief Network) Typically used to represent causal relationship.
Models of Influence in Online Social Networks
1 Yuxiao Dong *, Jie Tang $, Tiancheng Lou #, Bin Wu &, Nitesh V. Chawla * How Long will She Call Me? Distribution, Social Theory and Duration Prediction.
Computer vision: models, learning and inference
Social Network Analysis via Factor Graph Model
1 1 Chenhao Tan, 1 Jie Tang, 2 Jimeng Sun, 3 Quan Lin, 4 Fengjiao Wang 1 Department of Computer Science and Technology, Tsinghua University, China 2 IBM.
Active Learning for Networked Data Based on Non-progressive Diffusion Model Zhilin Yang, Jie Tang, Bin Xu, Chunxiao Xing Dept. of Computer Science and.
Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields Yong-Joong Kim Dept. of Computer Science Yonsei.
Free Powerpoint Templates Page 1 Free Powerpoint Templates Influence and Correlation in Social Networks Azad University KurdistanSocial Network.
Modeling Relationship Strength in Online Social Networks Rongjing Xiang: Purdue University Jennifer Neville: Purdue University Monica Rogati: LinkedIn.
12/07/2008UAI 2008 Cumulative Distribution Networks and the Derivative-Sum-Product Algorithm Jim C. Huang and Brendan J. Frey Probabilistic and Statistical.
Data Mining and Machine Learning Lab Network Denoising in Social Media Huiji Gao, Xufei Wang, Jiliang Tang, and Huan Liu Data Mining and Machine Learning.
1 From Sentiment to Emotion Analysis in Social Networks Jie Tang Department of Computer Science and Technology Tsinghua University, China.
Influence Maximization in Dynamic Social Networks Honglei Zhuang, Yihan Sun, Jie Tang, Jialin Zhang, Xiaoming Sun.
1 Linmei HU 1, Juanzi LI 1, Zhihui LI 2, Chao SHAO 1, and Zhixing LI 1 1 Knowledge Engineering Group, Dept. of Computer Science and Technology, Tsinghua.
Chengjie Sun,Lei Lin, Yuan Chen, Bingquan Liu Harbin Institute of Technology School of Computer Science and Technology 1 19/11/ :09 PM.
Predicting Positive and Negative Links in Online Social Networks
Markov Random Fields Probabilistic Models for Images
Mining Social Network for Personalized Prioritization Language Techonology Institute School of Computer Science Carnegie Mellon University Shinjae.
Mining Social Networks for Personalized Prioritization Shinjae Yoo, Yiming Yang, Frank Lin, II-Chul Moon [KDD ’09] 1 Advisor: Dr. Koh Jia-Ling Reporter:
Exploit of Online Social Networks with Community-Based Graph Semi-Supervised Learning Mingzhen Mo and Irwin King Department of Computer Science and Engineering.
Anant Pradhan PET: A Statistical Model for Popular Events Tracking in Social Communities Cindy Xide Lin, Bo Zhao, Qiaozhu Mei, Jiawei Han (UIUC)
Measuring Behavioral Trust in Social Networks
Lecture 2: Statistical learning primer for biologists
Inferring High-Level Behavior from Low-Level Sensors Donald J. Patterson, Lin Liao, Dieter Fox, and Henry Kautz.
Towards Social User Profiling: Unified and Discriminative Influence Model for Inferring Home Locations Rui Li, Shengjie Wang, Hongbo Deng, Rui Wang, Kevin.
Exact Inference in Bayes Nets. Notation U: set of nodes in a graph X i : random variable associated with node i π i : parents of node i Joint probability:
Maximum Entropy Model, Bayesian Networks, HMM, Markov Random Fields, (Hidden/Segmental) Conditional Random Fields.
Mining information from social media
1 CoupledLP: Link Prediction in Coupled Networks Yuxiao Dong #, Jing Zhang +, Jie Tang +, Nitesh V. Chawla #, Bai Wang* # University of Notre Dame + Tsinghua.
Supervised Random Walks: Predicting and Recommending Links in Social Networks Lars Backstrom (Facebook) & Jure Leskovec (Stanford) Proc. of WSDM 2011 Present.
Discriminative Training and Machine Learning Approaches Machine Learning Lab, Dept. of CSIE, NCKU Chih-Pin Liao.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Mining Advisor-Advisee Relationships from Research Publication.
Markov Networks: Theory and Applications Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208
Progress Report ekker. Problem Definition In cases such as object recognition, we can not include all possible objects for training. So transfer learning.
Biao Wang 1, Ge Chen 1, Luoyi Fu 1, Li Song 1, Xinbing Wang 1, Xue Liu 2 1 Shanghai Jiao Tong University 2 McGill University
哈工大信息检索研究室 HITIR ’ s Update Summary at TAC2008 Extractive Content Selection Using Evolutionary Manifold-ranking and Spectral Clustering Reporter: Ph.d.
Chapter 6 Activity Recognition from Trajectory Data Yin Zhu, Vincent Zheng and Qiang Yang HKUST November 2011.
1 Zi Yang Tsinghua University Joint work with Prof. Jie Tang, Prof. Juanzi Li, Dr. Keke Cai, Jingyi Guo, Chi Wang, etc. July 21, 2011, CASIN 2011, Tsinghua.
1 Zi Yang Tsinghua University Joint work with Prof. Jie Tang, Prof. Juanzi Li, Dr. Keke Cai, Jingyi Guo, Chi Wang, etc. July 21, 2011, CASIN 2011, Tsinghua.
Cross-lingual Knowledge Linking Across Wiki Knowledge Bases
Markov Networks.
Weakly Learning to Match Experts in Online Community
Markov Random Fields Presented by: Vladan Radosavljevic.
Binghui Wang, Le Zhang, Neil Zhenqiang Gong
Graph-based Security and Privacy Analytics via Collective Classification with Joint Weight Learning and Propagation Binghui Wang, Jinyuan Jia, and Neil.
GANG: Detecting Fraudulent Users in OSNs
Discriminative Probabilistic Models for Relational Data
Markov Networks.
GhostLink: Latent Network Inference for Influence-aware Recommendation
Presentation transcript:

Who Will Follow You Back? Reciprocal Relationship Prediction* 1 John Hopcroft, 2 Tiancheng Lou, 3 Jie Tang 1 Department of Computer Science, Cornell University, 2 Institute for Interdisciplinary Information Sciences, Tsinghua University 3 Department of Computer Science, Tsinghua University

Motivation Two kinds of relationships in social network,  one-way(called parasocial) relationship and,  two-way(called reciprocal) relationship Two-way(reciprocal) relationship  usually developed from a one-way relationship  more trustful. Try to understand(predict) the formation of two- way relationships  micro-level dynamics of the social network.  underlying community structure?  how users influence each other? after 3 days prediction reciprocal parasocial

100% 30% 1% 60% Example : real friend relationship On Twitter : Who Will Follow You Back? Ladygaga ? ? ? ? Shiteng Obama Huwei JimmyQiao

Several key challenges How to model the formation of two-way relationships?  SVM & LRC How to combine many social theories into the prediction model?

Outline Previous works Our approach Experimental results Conclusion & future works

Link prediction Unsupervised link prediction  Scores & intution, such as preferential attachment [N01]. Supervised link prediction  supervised random walks [BL11].  logistic regression model to predict positive and negative links [L10]. Main differences:  We predict a directed link instead of only handles undirected social networks.  Our model is dynamic and learned from the evolution of the Twitter network.

Social behavior analysis Existing works on social behavior analysis:  The difference of the social influence on difference topics and to model the topic- level social influence in social networks. [T09]  How social actions evolve in a dynamic social network? [T10] Main differences:  The proposed methods in previous work can be used here  but the problem is fundamentally different.

Twitter study The twitter network.  The topological and geographical properties. [J07]  Twittersphere and some notable properties, such as a non-power-law follower distribution, and low reciprocity. [K10] The twitter users.  Influential users.  Tweeting behaviors of users. The tweets.  Utilize the real-time nature to detect a target event. [S10]  TwitterMonitor, to detect emerging topics. [M10]

Outline Previous works Our approach Experimental results Conclusion & future works

Factor graph model Problem definition  Given a network at time t, i.e., G t = (V t, E t, X t, Y t )  Variables y are partially labeled.  Goal : infer unknown variables. Factor graph model  P(Y | X, G) = P(X, G|Y) P(Y) / P(X, G) = C 0 P(X | Y) P(Y | G)  In P(X | Y), assuming that the generative probability is conditionally independent,  P(Y | X, G) = C 0 P(Y | G)ΠP(x i |y i )  How to deal with P(Y | G) ?

Random Field Given an undirected graph G=(V, E) such that Y={Y v |v ∈ V}, if the probability of Y v given X and those random variables corresponding to nodes neighboring v in G. Then (X, Y) is a conditional random field. undirected graphical model globally conditioned on X

Conditional random field(CRF) Model them in a Markov random field, by the Hammersley-Clifford theorem,  P(x i |y i ) = 1/Z 1 * exp {Σα j f j (x ij, y i )}  P(Y|G) = 1/Z 2 * exp {Σ c Σ k μ k h k (Y c )}  Z 1 and Z 2 are normalization factors. So, the likelihood probability is :  P(Y | X, G) = 1/Z 0 * exp {Σα j f j (x ij, y i ) + Σ c Σ k μ k h k (Y c )}  Z 0 is normalization factors.

Maximize likelihood Objective function  O( θ ) = log P θ (Y | X, G) = Σ i Σ j α j f j (x ij, y i ) + ΣΣμ k h k (Y c ) – log Z Learning the model to  estimate a parameter configuration θ= { α, μ } to maximize the objective function :  that is, the goal is to compute θ * = argmax O(θ)

Learning algorithm Goal : θ * = argmax O(θ) Newton methold. The gradient of each μ k with regard to the objective function.  dθ/ dμ k = E[h k (Y c )] – E Pμ k (Yc|X, G) [h k (Y c )]

Learning algorithm One challenge : how to calculate the marginal distribution P μ k (y c |x, G).  Intractable to directly calculate the marginal distribution.  Use approximation algorithm : Belief propagation.

Belief propagation Sum-product  The algorithm works by passing real values functions called “messages” along the edges between the nodes.  Extend edges to triangles

Belief propagation A “message” from a variable node v to a factor node u is μ v->u (x v ) = Π u* ∈ N(v)\{u} μ u*->v (x v ) A “message” from a factor node u to a variable node v is μ u->v (x u ) = Σ x ’ u x ’ v =x v Π u* ∈ N(u)\{v} μ v*->u (x ’ u )

Learning algorithm(TriFG model) Input : network G t, learning rateη Output : estimated parametersθ Initalize θ= 0; Repeat Perform LBP to calculate marginal distribution of unknown variables P(y i |x i, G); Perform LBP to calculate marginal distribution of triad c, i.e. P(y c |X c, G); Calculate the gradient of μ k according to : dθ/ d μ k = E[h k (Y c )] – E Pμ k (Yc|X, G) [h k (Y c )] Update parameter θ with the learning rate η: θ new = θ old + ηd θ Until Convergence;

LocalGlobal Prediction features Geographic distance  Global vs Local Homophily  Link homophily  Status homophily Implicit structure  Retweet or reply  Retweeting seems to be more helpful Structural balance  Two-way relationships are balanced (88%),  But, one-way relationships are not (only 29%). Users who share common links will have a tendency to follow each other. Elite users have a much stronger tendency to follow each other (A) and (B) are balanced, but (C) and (D) are not.

Our approach : TriFG TriFG model  Features based on observations  Partially labeled  Conditional random field  Triad correlation factors

Outline Previous works Our approach Experimental results Conclusion & future works

Data collection Huge sub-network of twitter  13,442,659 users and 56,893,234 following links.  Extracted 35,746,366 tweets. Dynamic networks  With an average of 728,509 new links per day.  Averagely 3,337 new follow-back links per day.  13 time stamps by viewing every four days as a time stamp

Prediction performance Baseline algorithms  SVM & LRC & CRF Accurately infer 90% of reciprocal relationships in twitter. DataAlgotithmPrecisionRecallF1MeasureAccuracy Test Case 1 SVM LRC CRF TriFG Test Case 2 SVM LRC CRF TriFG

Convergence property BLP-based learning algorithm can converge in less than 10 iterations. After only 7 learning iterations, the prediction performance of TriFG becomes stable.

Effect of Time Span Distribution of time span in which follow back action is performed.  60% of follow-backs are performed in the next time stamp,  37% of follow-backs would be still performed in the following 3 time stamps. How different settings of the time span.  The prediction performance of TriFG drops sharply when setting the time span as two or less.  The performance is acceptable, when setting it as three time stamps.

Factor contribution analysis We consider five different factor functions  Geographic distance(G)  Link homophily(L)  Status homophily(S)  Implicit network correlation(I)  Structural balance correlation(B)

Outline Previous works Our approach Experimental results Conclusion & future works

Conclusion Reciprocal relationship prediction in social network Incorporates social theories into prediction model. Several interesting phenomena.  Elite users tend to follow each other.  Two-way relationships on Twitter are balanced, but one-way relationships are not.  Social networks are going global, but also stay local.

Future works Other social theories for reciprocal relationship prediction. User feedback. Incorporating user interactions. Building a theory for different kinds of networks.

Thanks! Q & A

Reference [BL11] L.Backstrom and J.Leskovec. Supervised random walks : predicting and recommending links in social networks. In WSDM’11 [C10] D.J.Crandall, L.Backstrom, D. Cosley, S.Suri, D.Huttenlocher, and J. Kleinberg. Inferring social ties from geographic coincidences. PNAS, Dec [W10] C.Wang, J. Han, Y.Jia, J.Tang, D.Zhang, Y. Yu and J.Guo. Mining advisor-advisee relationships from research publication networks. In KDD’10. [N01]M.E.J. Newman. Clustering and preferential attachment in growing networks. Phys. Rev. E, 2001 [L10] J.Leskovec, D.Huttenlocher, and J.Kleinberg. Predicting positive and negative links in online social networks. In WWW10. [T10] C.Tan, J. Tang, J. Sun, Q.Lin, and F.Wang. Social action tracking via noise tolerant time-varying factor graphs. In KDD10 [T09] J. Tang, J. Sun, C. Wang, and Z. Yang. Social influence analysis in large-scale networks. In KDD09.

Reference [J07]A. Java, X.Song, T.Finin, and B.L. Tseng. Why we twitter : An analysis of a microblogging community. In KDD2007. [K10]H. Kwak, C.Lee, H.Park, and S.B. Moon. What is twitter, a social network or a news media? In WWW2010. [M10]M.Mathioudakis and N.Koudas. Twittermonitor : trend detection over the twitter stream. In SIGMOD10. [S10]T. Sakaki, M. Okazaki, and Y.Matsuo. Earthquake shakes twitter users: real-time event detection by social sensors. In WWW10.