On Ranking and Influence in Social Networks Huy Nguyen Lab seminar November 2, 2012.

Agenda
- Part I. Motivation and Background
- Part II. Learning Influence Models and Probabilities
- Part III. Learning Social Rank and Hierarchy
- Part IV. Research Challenges

Part I Motivation and Background

Social Influence is Everywhere
- Stay connected, stay influenced [Nguyen, 2012]
- Real-world story: 12K people, 50K links, medical records from 1971 to 2003
  - Obese friend → 57% increase in chances of obesity
  - Obese sibling → 40% increase in chances of obesity
  - Obese spouse → 37% increase in chances of obesity
[Christakis and Fowler, New England Journal of Medicine, 2007]

Top Influencers (by Klout)

How Are Ranking and Influence Related?
- Conventional beliefs
  - Higher rank → more influence
  - Higher rank → less response delay (e.g., email reply)
  - Higher rank → more (quality) followers
- How many of them are true?
- What is the true underlying relationship?
- The impact is big
  - Devising a new influence model (with ranking)
  - Improving influence maximization results
  - Novel ranking algorithms

Influence Maximization (IM) Problem
[Figure: example message "iPhone 5 is great" spreading through a network]

Independent Cascade (IC) Model
- A spread probability is associated with each edge
- Influence spread = expected number of influenced nodes
[Figure: example network with a seed node and edge probabilities 0.6, 0.4, 0.7, 0.2]
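A minimal sketch of how the influence spread under the IC model could be estimated by Monte Carlo simulation. The function names and the toy graph are assumptions made for illustration; the edge probabilities reuse the numbers from the slide's figure, but the topology is invented.

```python
import random

def simulate_ic(graph, seeds, rng):
    """Run one Independent Cascade diffusion and return the set of active nodes.

    graph maps each node to a list of (neighbor, probability) pairs.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # A newly active node gets one chance to activate each inactive neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def estimate_spread(graph, seeds, runs=10000, seed=0):
    """Estimate the expected number of influenced nodes by averaging many runs."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs))
    return total / runs

# Hypothetical toy graph (topology is an assumption, not taken from the slide).
toy_graph = {
    "s": [("a", 0.6), ("b", 0.4)],
    "a": [("c", 0.7)],
    "b": [("c", 0.2)],
}
spread = estimate_spread(toy_graph, ["s"])
```

For this toy graph the exact expected spread is 1 + 0.6 + 0.4 + (1 - 0.58 × 0.92) = 2.4664, so the estimate should land close to that value.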

Traditional Solutions
- As good as ~63% (1 - 1/e) of the optimal solution
- Problems
  - Influence spread computation
  - Too many evaluations in each iteration
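The ~63% guarantee is the one usually associated with greedy hill-climbing on a monotone submodular spread function. A minimal sketch follows, assuming the slide refers to that greedy scheme; `greedy_seeds`, `coverage`, and the `reach` table are hypothetical names, and a deterministic coverage function stands in for the (normally simulated) influence spread.

```python
def greedy_seeds(nodes, spread, k):
    """Pick k seeds by repeatedly adding the node with the largest marginal gain.

    spread is any callable mapping a set of seeds to its (estimated) influence spread.
    """
    seeds = set()
    for _ in range(k):
        base = spread(seeds)
        best_node, best_gain = None, float("-inf")
        for v in nodes:
            if v in seeds:
                continue
            # One spread evaluation per remaining candidate: this is the costly part.
            gain = spread(seeds | {v}) - base
            if gain > best_gain:
                best_node, best_gain = v, gain
        if best_node is None:
            break
        seeds.add(best_node)
    return seeds

# Hypothetical stand-in for influence spread: the nodes each seed can reach.
reach = {
    "a": {"a", "b", "c"},
    "b": {"b", "c"},
    "c": {"c"},
    "d": {"d", "e"},
    "e": {"e"},
}

def coverage(seeds):
    """Number of distinct nodes reached by the seed set."""
    covered = set()
    for s in seeds:
        covered |= reach[s]
    return len(covered)

chosen = greedy_seeds(list(reach), coverage, k=2)
```

Each round calls `spread` once per remaining candidate, which illustrates why the number of evaluations becomes the bottleneck when every call is itself a batch of simulations.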

Part II Learning Influence Models and Probabilities

Learning Influence Models
- Where do the numbers come from?
- Which propagation model is correct?
  - LT, IC, N-IC, SIS, SIR, …
- Real-world social networks don’t have probabilities
  - Can we learn the probabilities from the action log?
- Sometimes we don’t even know the social network
  - Can we learn the social network too?
- Influence probability changes over time
  - How can we take time into account?

Naïve Weight Assignment Models [Nguyen & Zheng, ECML-PKDD 2012]

Weight Inference Problems

P2. Social Network is Not Given
- Observe activation times
  - E.g., product purchases, blog posts, virus infections
- Assume the independent cascade model
  - The probability of a successful activation decays (exponentially) with time
[Gomez-Rodriguez, Leskovec, & Krause, KDD 2010]

Cascade Generation Model [Gomez-Rodriguez, Leskovec, & Krause, KDD 2010]
[Figure: a cascade over nodes a to f with activation times t_a, t_b, t_c, t_e, t_f and time differences Δ1 to Δ4]

Likelihood of a Cascade
- If u infected v in a cascade c, the transmission probability is $P_c(u, v) \propto f(t_v - t_u)$, where $t_v > t_u$ and (u, v) are neighbors
- To model that, in reality, any node v in a cascade may have been infected by an external influence m: $P_c(m, v) = \varepsilon$
- Probability that cascade c propagates in a tree T: [equation not captured in this transcript]
[Figure: example propagation tree over nodes a to e with an external source m and ε-edges]
[Gomez-Rodriguez, Leskovec, & Krause, KDD 2010]

Finding the Diffusion Network
- There are many possible propagation trees for a cascade, e.g., c: (a, 1), (c, 2), (b, 3), (e, 4)
- Need to consider all possible propagation trees T supported by G
- Likelihood of a set of cascades C on G: [equation not captured in this transcript]
- Want to find: [equation not captured in this transcript]
[Figure: three alternative propagation trees over nodes a to e]
[Gomez-Rodriguez, Leskovec, & Krause, KDD 2010]

An Alternative Formulation
- Consider only the most likely tree
- Maximum log-likelihood for a cascade c under a graph G: [equation not captured in this transcript]
- Log-likelihood of G given a set of cascades C: [equation not captured in this transcript]
- The problem is NP-hard (Max-k-Cover)
- Devise an algorithm that finds a near-optimal solution in O(N^2)
[Gomez-Rodriguez, Leskovec, & Krause, KDD 2010]
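A hedged reconstruction of the quantities named on the last three slides, following the general shape of the formulation in the cited KDD 2010 paper. The symbols $E_T$ (edge set of tree T), $\mathcal{T}_c(G)$ (propagation trees of cascade c supported by G), and the edge budget k are notational assumptions; the paper's exact objective also normalizes against the ε-only graph, which is omitted here.

```latex
% Probability that cascade c propagates in a given tree T
P(c \mid T) = \prod_{(u,v) \in E_T} P_c(u, v)

% Likelihood of one cascade and of a set of cascades C on G
P(c \mid G) = \sum_{T \in \mathcal{T}_c(G)} P(c \mid T),
\qquad
P(C \mid G) = \prod_{c \in C} P(c \mid G)

% Network to find (at most k edges)
\hat{G} = \operatorname*{arg\,max}_{|G| \le k} P(C \mid G)

% Alternative formulation: keep only the most likely tree per cascade
\log P(c \mid G) \approx \max_{T \in \mathcal{T}_c(G)} \sum_{(u,v) \in E_T} \log P_c(u, v),
\qquad
\log P(C \mid G) \approx \sum_{c \in C} \max_{T \in \mathcal{T}_c(G)} \sum_{(u,v) \in E_T} \log P_c(u, v)
```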

P3. Social Network is Given
- Input data: (1) social graph and (2) action log of past propagations
- Find: propagation weights on edges

Constant Weight Model
- Assume the independent cascade model
- Assume weights remain constant over time
- Given
  - Network graph G
  - D(0), D(1), …, D(t), where D(t) is the set of newly activated nodes at time t
- For a link (v, w), node w is activated at time t+1 with probability: [equation not captured in this transcript; its annotations name the parent set, the diffusion probability, and the current active set]
[Saito et al., KES 2008]
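A minimal sketch of the activation probability under the IC model, assuming the standard form in which w becomes active unless every newly active parent fails independently: P_w(t+1) = 1 - ∏ (1 - κ_{v,w}) over parents v of w that are in D(t). The names `parents`, `newly_active`, and `kappa` are hypothetical, and the numbers are invented.

```python
from math import prod

def activation_probability(w, parents, newly_active, kappa):
    """Probability that node w becomes active at time t+1.

    parents[w]   : set of nodes with a link into w (the parent set)
    newly_active : D(t), the nodes that became active at time t
    kappa[(v, w)]: diffusion probability on link (v, w)
    """
    attempting = parents[w] & newly_active
    # w stays inactive only if every attempting parent fails independently.
    return 1.0 - prod(1.0 - kappa[(v, w)] for v in attempting)

# Hypothetical example: parents a and b were just activated, c was not.
parents = {"w": {"a", "b", "c"}}
kappa = {("a", "w"): 0.5, ("b", "w"): 0.2, ("c", "w"): 0.9}
p_w = activation_probability("w", parents, {"a", "b"}, kappa)
```

Here the result is 1 - (0.5 × 0.8) = 0.6; parent c does not contribute because it is not in the current active set.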

Constant Weight Model [Saito et al., KES 2008]
[Equation not captured in this transcript; its annotations name the success probability and the failure probability]

Static Models [Goyal, Bonchi, & Lakshmanan, WSDM 2010]
[Equations not captured in this transcript; their annotations name the actions spread from u to v, the total actions of u, and the actions of either u or v]
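A minimal sketch of two static estimators built from the quantities annotated on this slide, assuming the Bernoulli form (actions spread u → v divided by total actions of u) and the Jaccard form (actions spread u → v divided by actions of either u or v). The function name, the log format, and the sample data are assumptions.

```python
from collections import defaultdict

def static_influence(action_log, edges):
    """Estimate (bernoulli, jaccard) influence probabilities per directed edge (u, v).

    action_log: iterable of (user, action, time) tuples.
    edges: iterable of (u, v) pairs, meaning u may influence v.
    """
    times = defaultdict(dict)  # action -> {user: first time the user performed it}
    count = defaultdict(int)   # user -> total number of distinct actions
    for user, action, t in action_log:
        if user not in times[action]:
            times[action][user] = t
            count[user] += 1

    result = {}
    for u, v in edges:
        # Actions that u performed strictly before v did.
        propagated = sum(
            1 for per_user in times.values()
            if u in per_user and v in per_user and per_user[u] < per_user[v]
        )
        # Actions performed by u or by v (or both).
        either = sum(1 for per_user in times.values() if u in per_user or v in per_user)
        bernoulli = propagated / count[u] if count[u] else 0.0
        jaccard = propagated / either if either else 0.0
        result[(u, v)] = (bernoulli, jaccard)
    return result

# Hypothetical action log.
log = [
    ("u", "a1", 1), ("v", "a1", 2),  # a1 spreads from u to v
    ("u", "a2", 1),                  # only u performs a2
    ("v", "a3", 5),                  # only v performs a3
    ("u", "a4", 3), ("v", "a4", 1),  # v performed a4 first, so no u -> v spread
]
estimates = static_influence(log, [("u", "v")])
```

With this log, one of u's three actions propagates to v, and four actions involve either user, giving a Bernoulli estimate of 1/3 and a Jaccard estimate of 1/4.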

Time-Varying Models [Goyal, Bonchi, & Lakshmanan, WSDM 2010]
[Equation not captured in this transcript; its annotations name the maximum strength of u's influence on v, the mean lifetime (a parameter), and the time difference]
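A hedged reconstruction of the time-varying probability from the three annotated quantities, assuming exponential decay. The symbols $p^0_{u,v}$ (maximum strength of u's influence on v), $\tau_{u,v}$ (mean lifetime), and $t_u$ (the time at which u performed the action) are notational assumptions.

```latex
p^{t}_{u,v} = p^{0}_{u,v} \, \exp\!\left( -\frac{t - t_u}{\tau_{u,v}} \right),
\qquad t \ge t_u
```

For example, with $p^0_{u,v} = 0.4$ and $\tau_{u,v} = 10$, the probability one mean lifetime after u acted is $0.4 \, e^{-1} \approx 0.147$.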

Data-based Influence Maximization

Why Learning from Data Matters [Goyal, Bonchi, & Lakshmanan, VLDB 2012]

Why Learning from Data Matters

Direct Mining: The Sparsity Issue [Goyal, Bonchi, & Lakshmanan, VLDB 2012]

Credit Distribution Model [Goyal, Bonchi, & Lakshmanan, VLDB 2012]

Key Takeaways
- The influence network and weights are not always available
- They can be learned from the action log
  - [Gomez-Rodriguez et al. 2010] Infer the social network
  - [Saito et al. 2008] Infer edge weights using EM
  - [Goyal et al. 2010] Infer static and time-conscious models
  - [Goyal et al. 2012] IM directly from the action log
- Watch out for the sparsity issue

Part III Learning Social Rank and Hierarchy

Social Rank and Hierarchy
- Hierarchical vs. non-hierarchical networks
  - E.g., corporate network vs. Twitter
- Real-world social networks don’t have ranks (or do they?)
  - Can we study the ranking of each individual?
  - Are current ranking systems correct?
- What is the best way to rank people on social networks?
  - # followers, influenceability, actions, recommendations, acknowledgement?
- What kind of data is needed?

PageRank
- Named after Larry Page (not because it ranks pages!)
- The importance of a page is given by the importance of the pages that link to it: $r_i = \sum_{j \to i} r_j / N_j$, where $r_i$ is the importance of page i, the sum runs over the pages j that link to page i, $N_j$ is the number of outlinks from page j, and $r_j$ is the importance of page j
- Two-step calculation
  - Initialize the same value for all pages
  - Repeat until convergence
- The same concept can be applied to social ranking
[Page & Brin, 1998]
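A minimal sketch of the two-step calculation as a power iteration. The `damping` parameter and the uniform handling of pages without outlinks are common additions that are not on the slide; setting `damping=1.0` gives the plain update described above. The function name and the sample graphs are assumptions.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Iteratively compute page importance from a dict: page -> list of linked pages."""
    nodes = sorted(set(out_links) | {i for targets in out_links.values() for i in targets})
    n = len(nodes)
    # Step 1: initialize the same value for all pages.
    rank = {v: 1.0 / n for v in nodes}
    # Step 2: repeat the update until the ranks stop changing.
    for _ in range(max_iter):
        # Importance held by pages with no outlinks is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for j in nodes:
            targets = out_links.get(j, [])
            for i in targets:
                # Page j passes an equal share of its importance to each page it links to.
                new_rank[i] += damping * rank[j] / len(targets)
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank

# Hypothetical graphs: a 3-cycle, and a graph where both a and b link to c.
cycle_rank = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
star_rank = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In the cycle every page ends up equally important; in the second graph c ranks highest because two pages link to it, and b ranks lowest because nothing links to it.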

Finding Maximum Likelihood Hierarchy [Maiya & Berger-Wolf, CSE 2009]

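Finding Maximum Likelihood Hierarchy
- For any pair (v, w), the log-likelihood (LL) function for the weight: [equation not captured in this transcript; its annotations name weight(v, w) and the probability of interaction under the given model]
- LL function of the entire hierarchy: [equation not captured in this transcript]
- Use a greedy algorithm to find the hierarchy H with the highest LL score and its model M
[Maiya & Berger-Wolf, CSE 2009]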

Finding Maximum Likelihood Hierarchy
- Weight(x, y) obtained by Googling “x told y”
- High accuracy
- Small-scale data experiment
[Maiya & Berger-Wolf, CSE 2009]

Hierarchy by Email Network Analysis [Rowe, Creamer, Hershkop, & Stolfo, SNA-KDD 2007]

Hierarchy by Email Network Analysis
- The inferred hierarchy is not even close to the ground truth
[Rowe, Creamer, Hershkop, & Stolfo, SNA-KDD 2007]

Hierarchy by Social Network Direction [Gupte et al., WWW 2011]

Hierarchy Score of Different Networks [Gupte et al., WWW 2011]

Finding the Rank
- Find the rank r that maximizes the hierarchy score
- Modeled as an integer programming problem
- Form the dual problem
- Problem solved
[Gupte et al., WWW 2011]
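A minimal sketch of scoring one candidate ranking, assuming the "agony" definition associated with the cited paper: an edge u → v costs max(r(u) - r(v) + 1, 0), and the hierarchy score of a ranking is 1 minus the average cost per edge. This only evaluates a given ranking; it does not solve the integer program that searches for the best one. The function names and sample graphs are assumptions.

```python
def agony(edges, rank):
    """Total penalty of edges that do not point strictly upward in rank."""
    return sum(max(rank[u] - rank[v] + 1, 0) for u, v in edges)

def hierarchy_score(edges, rank):
    """Score in [0, 1] for a given ranking: 1 means every edge points strictly upward."""
    return 1.0 - agony(edges, rank) / len(edges)

# Hypothetical graphs; an edge (u, v) points from u to a presumably higher-ranked v.
chain = [("a", "b"), ("b", "c")]
cycle = [("a", "b"), ("b", "c"), ("c", "a")]
ranks = {"a": 0, "b": 1, "c": 2}

chain_score = hierarchy_score(chain, ranks)
cycle_score = hierarchy_score(cycle, ranks)
```

The chain is a perfect hierarchy under this ranking (score 1.0), while the edge c → a in the cycle costs 3 and brings the score down to 0.0.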

Key Takeaways
- Hierarchy affects social ranking
- Many possible problem formulations and techniques
  - Make observations and assumptions carefully
- There is no ground truth for social ranking
  - Obtaining a dataset with ranking is difficult
  - Difficult to say one method outperforms another
- Scalability is an important factor
  - Should be considered when designing a solution

Part IV Research Challenges

Data Availability
- Data availability limits research
- Often you have to pick two of those: [figure not captured in this transcript]
- Data availability classification
  - Proprietary, impossible or very hard to reproduce (e.g., shopping history) → increasingly being rejected in the IR and DM communities
  - Proprietary, reproducible (e.g., web crawl of a public website)
  - Existing open dataset: extensively studied
  - New open dataset

Value for Business and Social Sciences
- Measuring the effectiveness of influence and ranking is not easy in general
  - Compare viral vs. traditional marketing?
  - How does ranking help except for “showing off”?
- Online data may be huge, but it is often neither representative nor complete
  - Can someone prove the effectiveness of Obama’s 2012 presidential campaign using Twitter?
- Offline data (human interaction) is difficult to obtain
  - Also suffers from external influence (e.g., mass media, online …)
  - Lab experiment?

Learn to Design for Virality
- What makes a product/idea/technology viral?
  - Role of content? Role of seeds? Other factors?
- How can we artificially design something that goes viral or achieves a high ranking?
- What do we know about the factors behind successful viral phenomena (e.g., Gangnam Style, Justin Bieber …)?

Misc. Technical Challenges
- Algorithmic challenge: O(n^2) algorithms are not feasible for large graphs (e.g., n = 1 billion)
  - Need near-linear time algorithms (O(n log n), maybe?)
- Many ranking systems exist
  - Which one should we trust?
- Dynamic nature of social networks
  - Influenceability and rank change over time
- Competitive diffusion and ranking
  - Measure the effect of adversaries?

Concluding Remarks
- Great advances in theory, analysis, and algorithms
- Many challenges exist down the line
- Many problems are yet to be defined and solved
- Big thanks if you haven’t fallen asleep :)
