On Ranking and Influence in Social Networks
Huy Nguyen
Lab seminar, November 2, 2012



Agenda
- Part I. Motivation and Background
- Part II. Learning Influence Models and Probabilities
- Part III. Learning Social Rank and Hierarchy
- Part IV. Research Challenges

Part I. Motivation and Background

Social Influence is Everywhere
- Stay connected, stay influenced [Nguyen, 2012]
- Real-world story: 12K people, 50K links, medical records from 1971 to 2003
  - Obese friend → 57% increase in chances of obesity
  - Obese sibling → 40% increase in chances of obesity
  - Obese spouse → 37% increase in chances of obesity
[Christakis and Fowler, New England Journal of Medicine, 2007]

Top Influencers (by Klout)

How Are Ranking and Influence Related?
- Conventional beliefs
  - Higher rank → more influence
  - Higher rank → less response delay (e.g., replies)
  - Higher rank → more (quality) followers
- How many of them are true?
- What is the true underlying relationship?
- The impact is big
  - Devising a new influence model (with ranking)
  - Improving influence maximization results
  - Novel ranking algorithms

Influence Maximization (IM) Problem
[Figure: example message “iPhone 5 is great”]

Independent Cascade (IC) Model
- A spread probability is associated with each edge
- Influence spread = expected number of influenced nodes
[Figure: network diagram with the seed node marked]
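The definition of influence spread above can be made concrete with a small Monte Carlo sketch. It is only an illustration: the adjacency-list representation, the function names `simulate_ic` and `estimate_spread`, and the default number of runs are assumptions, not something taken from the slides.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one Independent Cascade simulation; return the set of activated nodes.

    graph: dict mapping node -> list of (neighbor, spread_probability) pairs.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly active node gets a single chance to activate
                # each inactive neighbor, succeeding with probability p.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Estimate the influence spread (expected number of activated nodes)."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs))
    return total / runs
```

For example, `estimate_spread({"a": [("b", 0.5)]}, ["a"])` averages the cascade size over 1000 simulated runs starting from seed `a`.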

Traditional Solutions
- Greedy hill-climbing is guaranteed to reach at least ~63% (1 - 1/e) of the optimal influence spread
- Problems
  - Influence spread computation is expensive
  - Too many spread evaluations in each iteration
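A minimal sketch of the greedy approach is given below, assuming the IC model and a Monte Carlo spread estimate. The names `ic_spread` and `greedy_seeds`, the graph representation, and the run count are illustrative assumptions. The inner loop shows where the cost comes from: every iteration re-estimates the spread for every remaining candidate node.

```python
import random


def ic_spread(graph, seeds, runs, rng):
    """Monte Carlo estimate of the expected number of activated nodes under IC."""
    total = 0
    for _ in range(runs):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            u = frontier.pop()
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / runs


def greedy_seeds(graph, k, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the largest estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v, _ in nbrs}
    seeds = []
    for _ in range(k):
        best_node, best_spread = None, float("-inf")
        # One spread estimation per remaining candidate: this is the bottleneck.
        for v in sorted(nodes - set(seeds)):
            spread = ic_spread(graph, seeds + [v], runs, rng)
            if spread > best_spread:
                best_node, best_spread = v, spread
        seeds.append(best_node)
    return seeds
```

With n nodes, k seeds and R simulation runs, this sketch performs on the order of n·k·R cascade simulations, which is what makes the plain greedy method slow on large graphs.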

Part II. Learning Influence Models and Probabilities

Learning Influence Models
- Where do the numbers come from?
- Which propagation model is correct? LT, IC, N-IC, SIS, SIR, …
- Real-world social networks don’t have probabilities
  - Can we learn the probabilities from the action log?
- Sometimes we don’t even know the social network
  - Can we learn the social network too?
- Influence probability does change over time
  - How can we take time into account?

Naïve Weight Assignment Models [Nguyen & Zheng, ECML-PKDD 2012]

Weight Inference Problems

P2. Social Network is Not Given
- Observe activation times
  - E.g., product purchases, blog posts, virus infections
- Assume
  - Independent cascade model
  - Probability of a successful activation decays (exponentially) with time
[Gomez-Rodriguez, Leskovec, & Krause, KDD 2010]

Cascade Generation Model [Gomez-Rodriguez, Leskovec, & Krause, KDD 2010]
[Figure: a cascade over nodes a to f, with activation times t_a, t_b, t_c, t_e, t_f and time differences Δ1 to Δ4]

Likelihood of a Cascade
- If u infected v in a cascade c, the transmission probability is P_c(u, v) ∝ f(t_v - t_u), with t_v > t_u and (u, v) neighbors
- To model that, in reality, any node v in a cascade may have been infected by an external influence m: P_c(m, v) = ε
- Probability that cascade c propagates in a tree T: [equation not preserved in the transcript]
[Figure: a propagation tree over nodes a, b, c, e with external source m and ε-weighted edges]
[Gomez-Rodriguez, Leskovec, & Krause, KDD 2010]

Finding the Diffusion Network
- There are many possible propagation trees for a cascade, e.g., c: (a, 1), (c, 2), (b, 3), (e, 4)
- Need to consider all possible propagation trees T supported by G
- Likelihood of a set of cascades C on G: [equation not preserved in the transcript]
- Want to find: [equation not preserved in the transcript]
[Figure: alternative propagation trees for the same cascade over nodes a, b, c, d, e]
[Gomez-Rodriguez, Leskovec, & Krause, KDD 2010]
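The equations on these slides were images and are missing from the transcript. The block below is a hedged reconstruction based on the cited paper's formulation as best recalled; the symbols E_T (edge set of tree T), \mathcal{T}_c(G) (propagation trees of cascade c supported by G) and the edge budget k are notation assumed here, and the proportionality in the second line assumes every tree is equally likely a priori.

```latex
\begin{align}
P(c \mid T) &= \prod_{(u,v) \in E_T} P_c(u, v)
  && \text{cascade $c$ propagating in tree $T$} \\
P(c \mid G) &\propto \sum_{T \in \mathcal{T}_c(G)} P(c \mid T)
  && \text{all trees supported by $G$} \\
P(C \mid G) &= \prod_{c \in C} P(c \mid G)
  && \text{set of independent cascades} \\
\hat{G} &= \operatorname*{arg\,max}_{|G| \le k} P(C \mid G)
  && \text{network to be found}
\end{align}
```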

An Alternative Formulation
- Consider only the most likely tree
- Maximum log-likelihood for a cascade c under a graph G: [equation not preserved in the transcript]
- Log-likelihood of G given a set of cascades C: [equation not preserved in the transcript]
- The problem is NP-hard (Max-k-Cover)
- Devise an algorithm that finds a near-optimal solution in O(N^2)
[Gomez-Rodriguez, Leskovec, & Krause, KDD 2010]
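The "most likely tree" idea can be sketched as follows. Because infection times order the nodes, the best tree can be found by letting each node independently pick its most likely parent among earlier-infected neighbors, or the external source m. This is a simplified illustration, not the paper's algorithm: the function name, the unnormalized exponential model exp(-alpha·Δt), and the default values of `alpha` and `eps` are assumptions.

```python
import math


def best_tree_log_likelihood(cascade, edges, alpha=1.0, eps=1e-4):
    """Log-likelihood of the most likely propagation tree of one cascade.

    cascade: dict node -> infection time.
    edges:   set of directed (u, v) pairs of the candidate graph G.
    Returns (log_likelihood, parents), where parents[v] is None when v is
    best explained by the external source m.
    """
    total = 0.0
    parents = {}
    for v, t_v in cascade.items():
        # Option 1: v was infected by the external source with probability eps.
        best_parent, best_logp = None, math.log(eps)
        # Option 2: v was infected by an earlier-infected in-neighbor u in G.
        for u, t_u in cascade.items():
            if t_u < t_v and (u, v) in edges:
                logp = -alpha * (t_v - t_u)  # log of exp(-alpha * dt)
                if logp > best_logp:
                    best_parent, best_logp = u, logp
        parents[v] = best_parent
        total += best_logp
    return total, parents
```

Using the cascade from the slides, `best_tree_log_likelihood({"a": 1, "c": 2, "b": 3, "e": 4}, {("a", "c"), ("a", "b"), ("c", "b"), ("b", "e")})` selects the chain a → c → b → e, with a attributed to the external source.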

P3. Social Network is Given
- Input data: (1) the social graph and (2) an action log of past propagations
- Find: propagation weights on edges

Constant Weight Model
- Assume the independent cascade model
- Assume weights remain constant over time
- Given
  - Network graph G
  - D(0), D(1), …, D(t), where D(t) is the set of newly activated nodes at time t
- For a link (v, w), node w is activated at time (t+1) with probability: [equation not preserved in the transcript; its labeled terms are the parent set, the diffusion probability, and the current active set]
[Saito et al., KES 2008]
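Under the independent cascade assumption, the activation probability is one minus the probability that every newly active parent fails. The sketch below illustrates that standard IC expression; the function name and the dictionaries `parents` and `kappa` (diffusion probability per link) are assumed names, not taken from the slide.

```python
def activation_probability(w, parents, newly_active, kappa):
    """Probability that inactive node w becomes active at step t+1.

    parents:      dict node -> set of parent nodes (in-neighbors).
    newly_active: the set D(t) of nodes activated at step t.
    kappa:        dict (v, w) -> constant diffusion probability on link (v, w).
    """
    failure = 1.0
    # Only parents that became active at step t get a chance to activate w.
    for v in parents[w] & newly_active:
        failure *= 1.0 - kappa[(v, w)]
    return 1.0 - failure
```

For instance, two newly active parents with diffusion probability 0.5 each give an activation probability of 1 - 0.5·0.5 = 0.75.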

Constant Weight Model [Saito et al., KES 2008]
[Equation labels: success probability, failure probability]

Static Models [Goyal, Bonchi, & Lakshmanan, WSDM 2010]
[Equation labels: actions spread u → v; total actions of u; actions of either u or v]
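Reading the labels above as count ratios, two static estimates can be sketched: a Bernoulli-style ratio (actions spread from u to v over total actions of u) and a Jaccard-style ratio (actions spread from u to v over actions performed by either u or v). The log format, the function name, and the rule that an action "spreads" when u performs it strictly before v are simplifying assumptions for illustration.

```python
from collections import defaultdict


def static_influence(action_log, edges):
    """Estimate static influence weights for each edge (u, v).

    action_log: iterable of (user, action, time) tuples.
    edges:      iterable of (u, v) pairs, meaning u may influence v.
    Returns dict (u, v) -> (bernoulli_estimate, jaccard_estimate).
    """
    times = defaultdict(dict)  # user -> {action: time performed}
    for user, action, t in action_log:
        times[user][action] = t

    result = {}
    for u, v in edges:
        acts_u, acts_v = times.get(u, {}), times.get(v, {})
        # An action spreads u -> v if both performed it and u did so first.
        spread = sum(1 for a, t in acts_u.items() if a in acts_v and t < acts_v[a])
        either = len(set(acts_u) | set(acts_v))
        bernoulli = spread / len(acts_u) if acts_u else 0.0
        jaccard = spread / either if either else 0.0
        result[(u, v)] = (bernoulli, jaccard)
    return result
```

Both estimates need only one pass over the log plus one pass over the edges, which is why such static models are cheap to learn.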

Time Varying Models [Goyal, Bonchi, & Lakshmanan, WSDM 2010]
[Equation labels: max strength of u’s influence on v; mean lifetime (parameter); time difference]
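The three labels above are consistent with an exponentially decaying influence probability. The following is a hedged reconstruction rather than the slide's exact formula; the symbols p^0_{u,v} (maximum strength), \tau_{u,v} (mean lifetime) and t_u (the time at which u performed the action) are assumed notation.

```latex
p^{t}_{u,v} \;=\; p^{0}_{u,v} \, \exp\!\left( -\,\frac{t - t_u}{\tau_{u,v}} \right),
\qquad t \ge t_u
```

At t = t_u the probability equals its maximum p^0_{u,v}, and it falls by a factor of e for every \tau_{u,v} units of elapsed time.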

Data-based Influence Maximization

Why Learning from Data Matters [Goyal, Bonchi, & Lakshmanan, VLDB 2012]


Direct Mining: The Sparsity Issue [Goyal, Bonchi, & Lakshmanan, VLDB 2012]

Credit Distribution Model [Goyal, Bonchi, & Lakshmanan, VLDB 2012]


Key Takeaways
- The influence network and weights are not always available
- They can be learned from the action log
  - [Gomez-Rodriguez et al. 2010] Infer the social network
  - [Saito et al. 2008] Infer edge weights using EM
  - [Goyal et al. 2010] Infer static and time-conscious models
  - [Goyal et al. 2012] IM directly from the action log
- Watch out for the sparsity issue

Part III. Learning Social Rank and Hierarchy

Social Rank and Hierarchy
- Hierarchical vs. non-hierarchical networks
  - E.g., corporate network vs. Twitter
- Real-world social networks don’t have ranks (or do they?)
  - Can we study the ranking of each individual?
  - Are current ranking systems correct?
- What is the best way to rank people on social networks?
  - # followers, influenceability, actions, recommendations, acknowledgement?
- What kind of data is needed?

PageRank
- Named after Larry Page (not because it ranks pages!)
- The importance of a page is given by the importance of the pages that link to it: r_i = Σ_{j → i} r_j / d_j, where r_i is the importance of page i, the sum runs over the pages j that link to page i, d_j is the number of outlinks from page j, and r_j is the importance of page j
- Two-step calculation
  - Initialize the same value for all pages
  - Repeat until convergence
- The same concept can be applied to social ranking
[Page & Brin, 1998]
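The two-step calculation can be sketched as a power iteration. The damping factor and the even redistribution of rank from pages without outlinks are common additions assumed here so that the iteration converges on arbitrary graphs; they are not stated on the slide, and the function and parameter names are illustrative.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Compute PageRank by power iteration.

    out_links: dict page -> list of pages it links to.
    Returns dict page -> importance score (scores sum to 1).
    """
    nodes = set(out_links) | {i for targets in out_links.values() for i in targets}
    n = len(nodes)
    # Step 1: initialize the same value for all pages.
    rank = {i: 1.0 / n for i in nodes}
    # Step 2: repeat until convergence.
    for _ in range(max_iter):
        new_rank = {i: (1.0 - damping) / n for i in nodes}
        for j in nodes:
            targets = out_links.get(j, [])
            if targets:
                # Page j passes its importance equally to the pages it links to.
                share = damping * rank[j] / len(targets)
                for i in targets:
                    new_rank[i] += share
            else:
                # Assumption: a page with no outlinks spreads its rank evenly.
                share = damping * rank[j] / n
                for i in nodes:
                    new_rank[i] += share
        delta = sum(abs(new_rank[i] - rank[i]) for i in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

For social ranking, the same function could be applied with users as nodes and, for example, "follows" relations as links, so that a user followed by important users becomes important.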

Finding Maximum Likelihood Hierarchy [Maiya & Berger-Wolf, CSE 2009]

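Finding Maximum Likelihood Hierarchy
- For any pair (v, w), log-likelihood function for the weight: [equation not preserved in the transcript; its labeled terms are weight(v, w) and the probability of interaction under the given model]
- Log-likelihood function of the entire hierarchy: [equation not preserved in the transcript]
- Use a greedy algorithm to find the hierarchy H with the highest log-likelihood score and its model M
[Maiya & Berger-Wolf, CSE 2009]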

Finding Maximum Likelihood Hierarchy
- Weight(x, y) = Google search for “x told y”
- High accuracy
- Small-scale data experiment
[Maiya & Berger-Wolf, CSE 2009]

Hierarchy by Network Analysis [Rowe, Creamer, Hershkop, & Stolfo, SNA-KDD 2007]

Hierarchy by Network Analysis
- The inferred hierarchy is not even close to the ground truth
[Rowe, Creamer, Hershkop, & Stolfo, SNA-KDD 2007]

Hierarchy by Social Network Direction [Gupte et al., WWW 2011]

Hierarchy Score of Different Networks [Gupte et al., WWW 2011]

Finding the Rank
- Find rank r to maximize the hierarchy score
- Modeled as an integer programming problem
- Form a dual problem
- Problem solved
[Gupte et al., WWW 2011]
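The objective can be illustrated with a brute-force sketch for tiny graphs. It assumes the hierarchy score used in the cited paper as best recalled: an edge u → v (read as v being the higher-ranked endpoint) incurs a penalty of max(r(u) - r(v) + 1, 0), and the score is one minus the average penalty per edge. The exhaustive search replaces the integer program and its dual purely for illustration, and all function names are hypothetical.

```python
from itertools import product


def hierarchy_score(edges, rank):
    """Score in (-inf, 1]; 1 means every edge points strictly up the ranking."""
    penalty = sum(max(rank[u] - rank[v] + 1, 0) for u, v in edges)
    return 1.0 - penalty / len(edges)


def best_ranking(nodes, edges):
    """Exhaustively search integer rank assignments (feasible for tiny graphs only)."""
    nodes = sorted(nodes)
    best_rank, best_score = None, float("-inf")
    # Ranks 0..n-1 suffice to place every node on its own level.
    for levels in product(range(len(nodes)), repeat=len(nodes)):
        rank = dict(zip(nodes, levels))
        score = hierarchy_score(edges, rank)
        if score > best_score:
            best_rank, best_score = rank, score
    return best_rank, best_score
```

Under this definition a chain a → b → c reaches the maximum score of 1, while a directed 3-cycle cannot do better than 0. The search is exponential in the number of nodes, which is why a polynomial-time formulation matters in practice.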

Key Takeaways
- Hierarchy affects social ranking
- Many possible problem formulations and techniques
  - Make observations and assumptions carefully
- There is no ground truth on social ranking
  - Obtaining a dataset with ranking is difficult
  - Difficult to say one method outperforms another
- Scalability is an important factor
  - Should be considered when designing a solution

Part IV. Research Challenges

Data Availability
- Data availability limits research
- Often you have to pick two of those: [figure not preserved in the transcript]
- Data availability classification
  - Proprietary, impossible or very hard to reproduce (e.g., shopping history) → increasingly being rejected in IR, DM communities
  - Proprietary, reproducible (e.g., web crawl of a public website)
  - Existing open dataset – extensively studied
  - New open dataset

Value for Business and Social Sciences
- Measuring the effectiveness of influence and ranking is not easy in general
  - Compare viral vs. traditional marketing?
  - How does ranking help except for “showing off”?
- Online data may be huge, but it is often neither representative nor complete
  - Can someone prove the effectiveness of Obama’s 2012 presidential campaign by Twitter?
- Offline data (human interaction) is difficult to obtain
  - Also suffers from external influence (e.g., mass media, online …)
  - Lab experiment?

Learn to Design for Virality
- What makes a product/idea/technology viral?
  - Role of content?
  - Role of seeds?
  - Other factors?
- How can we artificially design something that goes viral or achieves a high ranking?
- What do we know about the factors behind successful viral phenomena (e.g., Gangnam Style, Justin Bieber …)?

Misc. Technical Challenges
- Algorithmic challenge: O(n^2) algorithms are not feasible for large graphs (e.g., n = 1 billion)
  - Need near-linear time algorithms (O(n log n), maybe?)
- Many ranking systems exist
  - Which one should we trust?
- Dynamic factor of social networks
  - Influenceability and rank change over time
- Competitive diffusion and ranking
  - Measure the effect of adversaries?

Concluding Remarks
- Great advances in theory, analysis, and algorithms
- Many challenges exist down the line
- Many problems are yet to be defined and solved
- Big thanks if you haven’t fallen asleep :)