Modeling the Spread of Influence on the Blogosphere Akshay Java, Pranam Kolari, Tim Finin, and Tim Oates UMBC Tech Report 04/12/06
Outline What is influence? Basic Influence Model Influence models for the blogosphere Results Conclusions
What is Influence?
Main Entry: in·flu·ence
Pronunciation: 'in-"flü-&n(t)s, esp Southern in-'
Function: noun
Etymology: Middle English, from Middle French, from Medieval Latin influentia, from Latin influent-, influens, present participle of influere to flow in, from in- + fluere to flow -- more at FLUID
1 a : an ethereal fluid held to flow from the stars and to affect the actions of humans b : an emanation of occult power held to derive from stars
2 : an emanation of spiritual or moral force
3 a : the act or power of producing an effect without apparent exertion of force or direct exercise of command b : corrupt interference with authority for personal gain
4 : the power or capacity of causing an effect in indirect or intangible ways : SWAY
5 : one that exerts influence
- under the influence : affected by alcohol : DRUNK
NOT This Kind of Influence! ;-)
Motivation Influence models have been studied for cocitation graphs: David Kempe, Jon Kleinberg, Eva Tardos, "Maximizing the Spread of Influence through a Social Network", KDD 2003. They apply to blogs as well. Recent examples: Startups, Microsoft Origami, Walmart, DoD. GOAL: Predict influential blogs. Target nodes to help achieve a “Tipping Point”* (* The Tipping Point, Malcolm Gladwell)
Influence on the Blogosphere Post was Influenced by NPR, eWeek
Influence Models for the Blogosphere [Figure: a blog graph and the corresponding influence graph with weighted edges] $W_{u,v} = C_{u,v} / d_v$ U links to V => U is influenced by V
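The conversion from blog graph to influence graph can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that $C_{u,v}$ is the number of parallel links behind the influence edge from u to v, and that $d_v$ is the total number of influence links arriving at v; the slide does not define either symbol. The function name `influence_weights` and the input format are hypothetical.

```python
from collections import Counter, defaultdict


def influence_weights(blog_links):
    """Turn blog hyperlinks into weighted influence edges.

    blog_links: iterable of (source, target) pairs, one per hyperlink,
    meaning "source links to target".
    Returns {(u, v): weight} where u influences v.
    """
    # Reverse every link: if s links to t, then s is influenced by t,
    # so the influence edge runs from t to s.
    counts = Counter((t, s) for s, t in blog_links)

    # ASSUMPTION: d[v] is the total number of influence links arriving at v.
    d = defaultdict(int)
    for (_, v), c in counts.items():
        d[v] += c

    # W[u, v] = C[u, v] / d[v]
    return {(u, v): c / d[v] for (u, v), c in counts.items()}


if __name__ == "__main__":
    links = [("a", "b"), ("a", "b"), ("a", "c"), ("d", "b")]
    for edge, w in sorted(influence_weights(links).items()):
        print(edge, round(w, 3))
```

Under this reading, the weights on the edges entering any node sum to 1, which is convenient for a threshold-style model.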
Basic Influence Models Linear Threshold Model: v becomes active when $\sum_{w} b_{v,w} \ge \theta_v$, where w ranges over the active neighbors of v. Cascade Model: $p_{v,w}$ is the probability with which a node can activate each of its neighbors, independent of history. [Figure: influence graph with weighted edges, a threshold $\theta_v$, and active/inactive nodes]
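The two models can be sketched in a few lines each. This is a hedged illustration under stated assumptions: edges are stored as `{(u, v): weight}` with u influencing v, thresholds are supplied by the caller, and the function names `linear_threshold` and `independent_cascade` are hypothetical rather than taken from the talk.

```python
import random


def linear_threshold(weights, seeds, thresholds):
    """Activate v once the summed weight of its active in-neighbors
    reaches thresholds[v]. weights: {(u, v): b}, u influences v."""
    in_edges = {}
    for (u, v), b in weights.items():
        in_edges.setdefault(v, []).append((u, b))

    active = set(seeds)
    changed = True
    while changed:  # repeat until no new node activates
        changed = False
        for v, edges in in_edges.items():
            if v in active:
                continue
            total = sum(b for u, b in edges if u in active)
            if total >= thresholds[v]:
                active.add(v)
                changed = True
    return active


def independent_cascade(probs, seeds, rng):
    """Each newly active node gets one chance to activate each inactive
    out-neighbor v, succeeding with probability probs[(u, v)]."""
    out_edges = {}
    for (u, v), p in probs.items():
        out_edges.setdefault(u, []).append((v, p))

    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in out_edges.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


if __name__ == "__main__":
    w = {("a", "b"): 0.5, ("c", "b"): 0.5, ("b", "d"): 1.0}
    theta = {"b": 0.6, "d": 0.9}
    print(sorted(linear_threshold(w, {"a", "c"}, theta)))
    print(sorted(independent_cascade(w, {"a"}, random.Random(0))))
```

In the threshold model the final active set does not depend on the order in which nodes are examined, because activation only ever adds nodes. The cascade model is random, so its result varies from run to run unless the generator is seeded.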
Node Selection Heuristics Inlinks: easily spammed. Centrality: expensive to compute for very large graphs. PageRank: requires link information, but is easy to compute. Greedy heuristic: computationally expensive, but performs better.
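One common reading of the greedy heuristic, following the hill-climbing approach in the Kempe et al. paper cited under Motivation, is to repeatedly add the node that gives the largest estimated spread. The sketch below assumes an independent-cascade spread estimated by Monte Carlo simulation; the function names, the `runs` parameter, and the data layout are all hypothetical, and the talk does not state which spread estimate was used.

```python
import random


def simulate_cascade(out_edges, seeds, rng):
    """One independent-cascade run; returns the number of active nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in out_edges.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(out_edges, seeds, rng, runs=200):
    """Monte Carlo estimate of the mean cascade size for a seed set."""
    return sum(simulate_cascade(out_edges, seeds, rng) for _ in range(runs)) / runs


def greedy_select(out_edges, k, rng, runs=200):
    """Pick k seeds, each time adding the node with the best estimated spread."""
    nodes = set(out_edges) | {v for edges in out_edges.values() for v, _ in edges}
    chosen = []
    for _ in range(k):
        candidates = sorted(nodes - set(chosen))  # sorted for stable tie-breaking
        best = max(
            candidates,
            key=lambda n: expected_spread(out_edges, chosen + [n], rng, runs),
        )
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    graph = {"a": [("b", 0.4), ("c", 0.4)], "d": [("e", 0.9)]}
    print(greedy_select(graph, 2, random.Random(0)))
```

Each selection step runs `runs` simulations for every remaining candidate, which illustrates why the slide calls the greedy heuristic computationally expensive compared with ranking by inlinks or PageRank.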
Effect of Splogs on Node Selection (indegree vs pagerank) Almost 54% of the links were from splogs/failed to splogs/failed!
Effect of Splogs on Inlinks
rank  URL                                                        #inlinks
1     http://
2     http://
3     http://
4     http://
5     http://profiles.blogdrive.com                              1526
6     http://michellemalkin.com                                  1242
7     http://
8     http://instapundit.com                                     1187
9     http://slashdot.org
10    http://
11    http://
12    http://corner.nationalreview.com                           853
13    http://
14    http://
15    http://espn-presents2003-world-seriesofpoker.blogspot.com  711
16    http://3-world-series-of-poker-online-3.blogspot.com       711
17    http://worldseries-of-poker-network-tv-show.blogspot.com   711
18    http://wsop2003.blogspot.com                               711
19    http://wsop-bracelet1.blogspot.com                         711
20    http://worldseries-poker.blogspot.com                      711
21    http://worldseries-of-poker-official.blogspot.com          711
22    http://worldseries-of-poker-wsop.blogspot.com              711
23    http://world-series-of-poker-nocd-patch66.blogspot.com     711
24    http://4-world-series-of-poker-past-winners.blogspot.com   711
25    http://7-wsop-games-7.blogspot.com                         711
Tightly Knit Community of Splogs
Influence Models (without splog detection) [Plot; x-axis: number of nodes selected]
Influence Models (After splog removal)
Influence Models (w.r.t. Technorati Ranks)
Conclusions Influence models can be applied to blogs, not just cocitation graphs. Splogs are a problem. Greedy heuristics work well; PageRank is an inexpensive approximation.
Ideas for CIKM 06 Good or bad influence? Associating sentiment with links. Finding influential blogs for a topic. (SVM accuracy %) Community structure of blogs.
Questions Comments/ Feedback? Thanks! Acknowledgement: Buzzmetrics/Blogpulse for the dataset.