Information Diffusion in Social Media


Information Diffusion in Social Media
Kristina Lerman, University of Southern California
CS 599: Social Media Analysis
Access to data allows us to ask new questions and empirically measure effects.

Information diffusion on Twitter follower graph

Diffusion on networks: The spread of disease, ideas, behaviors, and so on over a network can be described as a contagion process. An active (infected, informed, or adopted) node activates its non-active neighbors with some probability, creating a cascade on the network. How large do cascades become? What determines their growth?
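A minimal sketch of such a contagion process, assuming an independent-cascade rule: each newly activated node gets one chance to activate each non-active neighbor with probability m. The graph is stored as an adjacency dict, and the helper name `simulate_cascade` is hypothetical, not taken from the lecture.

```python
import random
from collections import deque


def simulate_cascade(adj, seeds, m, rng=None):
    """Hypothetical helper: spread a contagion from `seeds` over `adj`.

    adj maps each node to a list of its neighbors. When a node becomes
    active, it makes one independent attempt to activate each of its
    non-active neighbors with probability m. Returns the final active set.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = deque(seeds)  # active nodes that still have to try their neighbors
    while frontier:
        node = frontier.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in active and rng.random() < m:
                active.add(nbr)
                frontier.append(nbr)
    return active


# Usage: a four-node path 2-0-1-3, seeded at node 0.
graph = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(len(simulate_cascade(graph, [0], m=0.5, rng=random.Random(1))))
```

The cascade size is simply the size of the returned set; running many seeds and values of m gives the "how large do cascades become?" curve.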

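Diffusion models: The exposure response function gives the infection probability as a function of the number of infected neighbors. In complex response, infection requires multiple exposures, and the exposure response can be non-monotonic. [Figure: exposure response functions for the threshold model and for complex contagion; infection probability (up to 1) vs. number of infected neighbors.]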
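A sketch of the threshold model, assuming each node adopts once the number of its adopted neighbors reaches an integer threshold stored in a hypothetical `thresholds` dict. Thresholds above 1 make adoption require multiple exposures, as in complex contagion. The rule is deterministic, so the exposure response is a step from 0 to 1 at the threshold.

```python
def threshold_cascade(adj, seeds, thresholds):
    """Hypothetical helper: deterministic threshold model on `adj`.

    A non-adopted node adopts once at least thresholds[node] of its
    neighbors have adopted. Repeats until no node changes state.
    """
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in adj.items():
            if node in adopted:
                continue
            k = sum(1 for n in nbrs if n in adopted)  # number of adopted neighbors
            if k >= thresholds[node]:
                adopted.add(node)
                changed = True
    return adopted


# Node 2 needs two adopted neighbors (multiple exposures); the others need one.
graph = {0: [2], 1: [2], 2: [0, 1, 3], 3: [2]}
thresholds = {0: 1, 1: 1, 2: 2, 3: 1}
print(sorted(threshold_cascade(graph, [0], thresholds)))     # [0]
print(sorted(threshold_cascade(graph, [0, 1], thresholds)))  # [0, 1, 2, 3]
```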

Epidemic diffusion model: a popular metaphor used to study information spread. Infected nodes propagate the contagion to susceptible neighbors with probability m (the transmissibility, or virality, of the contagion). [Figure: exposure response function; infection probability (up to 1) vs. number of infected neighbors, for infected and exposed nodes.]
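In this model each infected neighbor is an independent chance of infection. So a node with k infected neighbors is infected with probability 1 - (1 - m)^k. This exposure response keeps growing with every additional exposure. A one-line sketch follows; the function name is hypothetical.

```python
def epidemic_response(m, k):
    """Hypothetical helper: infection probability for a node with k infected
    neighbors when each neighbor independently transmits with probability m."""
    return 1.0 - (1.0 - m) ** k


for k in range(1, 6):
    print(k, round(epidemic_response(0.2, k), 3))
```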

Epidemic threshold: There is an epidemic threshold t. For m < t, cascades stay localized and the epidemic dies out. For m > t, cascades become global. The epidemic threshold depends on topology only: it is determined by the largest eigenvalue of the network's adjacency matrix. This holds for any network. [Figure: cascade size N vs. transmissibility m, with the epidemic threshold marked.]
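The eigenvalue result can be illustrated with power iteration. This sketch assumes the threshold has the form t = 1/λ_max, where λ_max is the largest eigenvalue of the symmetric adjacency matrix. That is the form in the eigenvalue viewpoint of Wang, Chakrabarti et al. for SIS-type models. The sketch iterates on A + I so that bipartite graphs such as stars do not make the iteration oscillate. Both function names are hypothetical.

```python
import math


def largest_eigenvalue(adj, iters=10_000, tol=1e-12):
    """Hypothetical helper: estimate the largest adjacency eigenvalue.

    adj: node -> list of neighbors (undirected, every node present as a key).
    Power iteration runs on A + I; the shift keeps bipartite graphs
    (e.g., stars) from oscillating between two vectors.
    """
    nodes = list(adj)
    idx = {n: i for i, n in enumerate(nodes)}
    x = [1.0 / math.sqrt(len(nodes))] * len(nodes)  # unit-norm start vector
    lam = 0.0
    for _ in range(iters):
        ax = [sum(x[idx[nbr]] for nbr in adj[n]) for n in nodes]  # A x
        lam_new = sum(xi * axi for xi, axi in zip(x, ax))        # Rayleigh quotient x^T A x
        y = [xi + axi for xi, axi in zip(x, ax)]                  # (A + I) x
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
        if abs(lam_new - lam) < tol:
            return lam_new
        lam = lam_new
    return lam


def epidemic_threshold(adj):
    """Hypothetical helper using the assumed form t = 1 / lambda_max."""
    return 1.0 / largest_eigenvalue(adj)


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(largest_eigenvalue(star), epidemic_threshold(star))  # about 1.732 and 0.577
```

Denser or more heterogeneous graphs have a larger λ_max and therefore a lower threshold, which matches the slide's point that topology alone sets t.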

Differences in the Mechanics of Information Diffusion across Topics: Idioms, Political Hashtags and Complex Contagion on Twitter
Daniel M. Romero, Brendan Meeder and Jon Kleinberg
Presentation by Aswin Rajkumar

Motivation and Contribution
Information diffusion and topics
- E.g., controversial political topics have high information diffusion.
- A scientific study of the variation in diffusion mechanics across topics.
Contribution of the paper
- Empirical analysis of real-world data.
- The observation that the mechanics of spread can be characterized by two variables: stickiness and persistence.
- Confirmation of sociological theories from the offline world: diffusion of innovations.

The Study: How?
- Dataset: a Twitter snapshot covering several months (August 2009 to January 2010), with 3 billion messages from over 60 million users.
- Hashtags (#) serve as tokens; the study uses the top 500 hashtags.
- @-mentions define the network: Y is in X's neighbor set if X sent at least t @-mentions to Y, with t = 3. Why? Repeated mentions show X's attention to Y.
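The neighbor-set rule can be sketched as follows. This assumes the mentions arrive as a list of (x, y) pairs, one per @-mention of y by x. The input format and the helper name `mention_neighbors` are hypothetical.

```python
from collections import Counter, defaultdict


def mention_neighbors(mentions, t=3):
    """Hypothetical helper: build @-mention neighbor sets.

    mentions: iterable of (x, y) pairs, one per @-mention of y by x.
    Returns {x: set of users y that x mentioned at least t times}.
    """
    counts = Counter(mentions)
    neighbors = defaultdict(set)
    for (x, y), c in counts.items():
        if c >= t:
            neighbors[x].add(y)
    return dict(neighbors)


log = [("alice", "bob")] * 3 + [("alice", "carol")] * 2 + [("bob", "alice")] * 4
print(mention_neighbors(log))  # {'alice': {'bob'}, 'bob': {'alice'}}
```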

The Study: What?
- Adoption and spread of hashtags: diffusion across topics (Politics, Celebrity, Music, Movies, Games, Idioms, Sports and Technology).
- Stickiness: the probability that a piece of information will pass from a person who knows or mentions it to another person who is exposed to it.
- Persistence and "complex contagion", a principle from sociology. Persistence is the relative extent to which repeated exposures to a hashtag continue to have significant marginal effects on adoption, i.e., the rate of decay of those effects.

Complex Contagion: Complex contagion refers to the phenomenon in social networks in which multiple sources of exposure to an innovation are required before an individual adopts the change of behavior. (Wikipedia)

[Figure: an exposure curve P(k), annotated with its stickiness and persistence.]

Analysis: Stickiness and Persistence
- Take the top 500 hashtags.
- Classify them into 8 topics or categories.
- Construct p(k) curves for each hashtag and average them separately within each category.
- Compare the shapes.
Political hashtags have high stickiness and high persistence; Twitter idioms have high stickiness but low persistence.
Example hashtags: #mw2, #mafiawars, #lost, #newmoon, #mj, #brazilwantsjb, #pandora, #thisiswar, #obama, #hcr, #cricket, #nhl, #photoshop, #digg.
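The averaging step can be sketched as a pointwise mean over curves of equal length K. The function name and the example curves below are hypothetical, invented for illustration and not taken from the study.

```python
from collections import defaultdict


def average_by_category(curves, categories):
    """Hypothetical helper: pointwise average of exposure curves per category.

    curves: {hashtag: [P(1), ..., P(K)]}, all of the same length K.
    categories: {hashtag: category name}.
    Returns {category: averaged curve}.
    """
    grouped = defaultdict(list)
    for tag, curve in curves.items():
        grouped[categories[tag]].append(curve)
    return {
        cat: [sum(values) / len(values) for values in zip(*group)]
        for cat, group in grouped.items()
    }


# Illustrative (made-up) curves for K = 2.
curves = {"#obama": [0.1, 0.2], "#hcr": [0.3, 0.4], "#followfriday": [0.5, 0.1]}
categories = {"#obama": "Politics", "#hcr": "Politics", "#followfriday": "Idioms"}
print(average_by_category(curves, categories))
```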

Twitter idioms: #cantlivewithout, #musicmonday, #iloveitwhen, #followfriday

Analysis: Subgraph Structure
Interconnections among early adopters.
- Subgraphs for political hashtags: high in-degree, large number of triangles.
- Tie strength: strong vs. weak.
(Image credit: Bridge-talent.com)

Exposure Curve: Definitions
- k-exposed: a user is k-exposed to a tag h if they have not used h but are connected to k other users who have used h in the past.
- Question: what is the probability that a k-exposed user u will use hashtag h in the future?
- Ordinal time estimate: the probability that a k-exposed user u uses hashtag h before becoming (k+1)-exposed, P(k) = I(k) / E(k). Here E(k) is the number of k-exposed users, and I(k) is the number of k-exposed users who used h before becoming (k+1)-exposed.
- Snapshot estimate: similar, but based on time. E(k) is the number of users k-exposed at time t1, and I(k) is the number of users k-exposed at t1 who used h before time t2; again P(k) = I(k) / E(k).
The resulting function P(k) is the exposure curve.
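The ordinal time estimate can be sketched from per-user adoption times. For each user, the input holds the times at which the user's neighbors first used h and the user's own first use, or None if the user never used it. This input format and the helper name are hypothetical. We also assume that a neighbor adopting at exactly the same time as the user does not count as a prior exposure.

```python
def exposure_curve(users, max_k):
    """Hypothetical helper: ordinal-time estimate of P(k) = I(k) / E(k).

    users: list of (neighbor_times, own_time) pairs for one hashtag h.
    A user is k-exposed between the k-th and (k+1)-th neighbor adoption,
    as long as they have not used h yet.
    """
    E = [0] * (max_k + 1)  # E[k]: users who were ever k-exposed
    I = [0] * (max_k + 1)  # I[k]: users who used h while k-exposed
    for neighbor_times, own_time in users:
        if own_time is None:
            k_final = len(neighbor_times)  # exposure level reached without adopting
        else:
            # only neighbors who adopted strictly before the user count
            k_final = sum(1 for t in neighbor_times if t < own_time)
        for k in range(1, min(k_final, max_k) + 1):
            E[k] += 1
        if own_time is not None and 1 <= k_final <= max_k:
            I[k_final] += 1
    return {k: I[k] / E[k] for k in range(1, max_k + 1) if E[k] > 0}


users = [([1, 2], 3), ([1], None), ([1, 5], 3)]
print(exposure_curve(users, max_k=3))
```

The third user adopted at time 3, before their second neighbor adopted at time 5, so they never become 2-exposed and count toward I(1).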

Comparison Parameters
- Persistence parameter: F(P) = A(P) / R(P), where A(P) is the area under the curve P and R(P) is the area of the rectangle of length K and height max P(k).
- Stickiness parameter: M(P) = max P(k).
[Figure: curve comparisons. A curve that increases rapidly and then falls vs. one that increases slowly and saturates; a curve that increases slowly and saturates vs. one that increases rapidly.]
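Both parameters are easy to compute from a sampled curve P(1), ..., P(K). This sketch assumes the area A(P) is taken as the discrete sum of P(k) for k = 1..K; the paper may integrate differently. It also assumes that max P(k) > 0. The curves in the usage example are made up to match the two shapes compared on the slide.

```python
def stickiness(p):
    """M(P): the maximum value of the exposure curve."""
    return max(p)


def persistence(p):
    """F(P) = A(P) / R(P), with A(P) assumed to be the sum of P(k) for
    k = 1..K and R(P) = K * max P(k), the bounding rectangle."""
    K = len(p)
    return sum(p) / (K * max(p))


rise_and_fall = [0.02, 0.05, 0.02, 0.01]  # hypothetical: peaks early, then decays
slow_saturate = [0.01, 0.02, 0.03, 0.03]  # hypothetical: rises slowly, then flattens
for name, curve in [("rise and fall", rise_and_fall), ("slow saturate", slow_saturate)]:
    print(name, stickiness(curve), round(persistence(curve), 3))
```

The rise-and-fall curve is stickier (higher peak) but less persistent (F = 0.5 vs. 0.75), because its area fills only half of its bounding rectangle.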

Plots of the persistence parameter F(P) = A(P) / R(P) and the stickiness parameter M(P) = max P(k).

Improvements and Related Work
- The @-mention network is not very representative; also, attention should be measured from Y to X.
- Only average persistence is considered; the median and variance should be analyzed too.
- Other types of networks, e.g., blogs [Gruhl, Guha, Liben-Nowell, Tomkins: Information Diffusion through Blogspace].
- Influence on online behavior, e.g., games [Woo, Kang, Kim: The Contagion of Malicious Behaviors in Online Games].
- Network structure is dynamic in real life [Baños, Borge-Holthoefer, Wang, Moreno, González-Bailón: Diffusion Dynamics with Changing Network Composition].

Conclusion
- Hashtags on different topics exhibit different mechanics of spread.
- Politically controversial hashtags have the highest diffusion.
- Information diffusion depends on the probability that users adopt a hashtag after repeated exposure to it: both on the magnitude of these probabilities and on their rate of decay.
- This confirms the sociological theory of complex contagion.
- Higher in-degree and stronger ties result in better spread.

Questions?

What Stops Social Epidemics? (Ver Steeg et al.) Why do information cascades in social media grow quickly at first but remain much smaller than epidemic models predict? Information cascades differ from viral contagion. The response to repeated exposure matters on Digg (and Twitter), and it drastically alters predictions about the size of epidemics.

Social news (Digg):
- Users submit news stories or vote for them (i.e., become "infected" by them).
- Social network: users follow "friends" to see the stories their friends submit and the stories their friends vote for.
- Trending stories: Digg promotes the most popular stories to its Top News page.

How large are cascades in social media? Cascade size is the number of people who share a message (with a URL).
- Digg: 3.5K URLs, 258K users, 1.7M edges
- Twitter: 70K URLs, 700K users, 36M edges
Most cascades are smaller than 1% of the total network size! [Lerman et al., "Social Contagion: An Empirical Study of Information Spread on Digg and Twitter Follower Graphs," arXiv:1202.3162]

Why are these cascades so small? [Figure: cascade size vs. transmissibility m predicted by a standard model of epidemic growth (heterogeneous mean field theory, SIR model, same degree distribution as Digg); most observed cascades fall in a narrow range.]
What should cascade sizes look like as a function of transmissibility under a standard epidemic model? First, several models predict a threshold. Then, reading the observed cascade sizes off the curve tells us the transmissibility of our stories. The transmissibility of almost all Digg stories would have to fall within the width of this line!

Maybe graph structure is responsible? [Figure: cascade size vs. transmissibility m, comparing the mean-field prediction (same degree distribution), simulated cascades on a random graph with the same degree distribution, and simulated cascades on the observed Digg graph; the epidemic threshold is marked.] Clustering reduces the epidemic threshold and cascade size, but not enough!
A first explanation might be graph structure: mean-field theory neglects rich cluster structure and finite-graph effects. Here we switched back from a log-log plot to make these small differences visible. We simulated cascades on graphs to see how structure affects cascade size. A finite random graph with the same degree distribution slightly reduces cascade sizes, because a finite graph has unavoidable loops and clustering. The Digg graph has even more structure, which reduces the threshold and yields smaller cascades. And yet, this still doesn't match the observed cascade sizes.
For both the random and the actual graph, we also observe a transmissibility threshold below which cascades die out. Clustering lowers this threshold. Clustering does limit the size of cascades: at the same transmissibility, cascades on the random graph are bigger than on the actual Digg graph. The golden line shows the expected cascade size under heterogeneous mean field (HMF) theory. HMF predicts cascade size from the long-tailed degree distribution in the limit of large graphs. For the random graph, the epidemic threshold and the cascade sizes are very close to the HMF prediction (the golden line). Because the randomized graph is still finite, some clustering inevitably occurs (its clustering coefficient is about 0.02). This decreases cascade size relative to the HMF prediction.

What about the spreading mechanism? [Figure: infected and not-infected nodes; what happens to a node exposed by more than one infected neighbor?] If not structure, maybe the spreading mechanism has some effect. The mechanism is very simple, but not without choices: we have to decide what to do about repeat exposures. Is this a big effect?

Are repeat exposures a big effect? Yes: more than half of the users are exposed to the same information more than once! The plot shows the probability of having exactly n friends voting on a story. It is on a log-log scale with a longish tail, and there is a significant probability of having 10 friends voting. Clearly, repeat exposures are important. So how do people respond?
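The two quantities on this slide can be computed from a list with one entry per (user, story) pair, each entry being the number of friends who voted. This sketch computes the distribution P(n) over exposed pairs (n ≥ 1) and the fraction exposed more than once. The input format and the helper name are hypothetical.

```python
from collections import Counter


def exposure_stats(friends_voting):
    """Hypothetical helper.

    friends_voting: number of friends who voted, one entry per (user, story) pair.
    Returns (P(n) over exposed pairs, fraction of exposed pairs with n > 1).
    """
    exposed = [n for n in friends_voting if n >= 1]
    total = len(exposed)
    if total == 0:
        return {}, 0.0
    counts = Counter(exposed)
    dist = {n: c / total for n, c in sorted(counts.items())}
    repeat = sum(1 for n in exposed if n > 1) / total
    return dist, repeat


print(exposure_stats([0, 1, 2, 3, 1]))
```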

How do people respond to repeated exposure? [Figure: exposure response.] Not much. We have similar results for Twitter, also noted by Romero et al. (WWW 2011). Consider the probability of voting on a story given that n friends have voted on it. Having 30 friends voting does not make a user significantly more likely to vote than having 1 friend voting. In the independent cascade model (ICM), by contrast, if the probability of voting after one friend votes is λ, each subsequent friend's vote gives another independent chance λ. Our Twitter data also deviate from the ICM, and a similar observation has been reported elsewhere. Now the important question: what is the effect of this observation?

Big consequences for cascade growth
- Most people are exposed to a story more than once.
- Repeated exposures have little effect.
- The growth of epidemics is severely curtailed (especially compared to the independent cascade model).

Weak response to repeated exposures suppresses outbreaks. Taking the effect of repeat exposure into account: [Figure: cascade size vs. transmissibility m (λ*, m* marked), comparing actual Digg cascades with simulation results; the epidemic threshold is unchanged.]
Back to our graph of cascade size as a function of transmissibility. Now we simulate cascades again, taking into account only the graph structure and the fact that repeated exposure does not increase the probability to vote. The simulations predict the threshold and match cascade sizes perfectly, orders of magnitude smaller than viral epidemic predictions. That's really nice: essentially a zero-parameter fit. But we can explain more than just cascade size.
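One way to encode "repeated exposure does not increase the probability to vote" is to let only a node's first exposure count. Under the ICM, by contrast, every exposure is a fresh chance. This is an illustrative assumption, not necessarily the exact mechanism of the authors' simulations. The helper names and the complete-graph toy example are hypothetical.

```python
import random
from collections import deque


def cascade_size(adj, seeds, m, first_exposure_only, rng):
    """Hypothetical helper: simulate one cascade and return its size.

    first_exposure_only=False: independent cascade model; every exposure is
    a new chance (probability m) to become infected.
    first_exposure_only=True: only a node's first exposure can infect it.
    """
    active = set(seeds)
    exposed = set(seeds)
    frontier = deque(seeds)
    while frontier:
        node = frontier.popleft()
        for nbr in adj[node]:
            if nbr in active:
                continue
            if first_exposure_only:
                if nbr in exposed:
                    continue  # already used its single chance
                exposed.add(nbr)
            if rng.random() < m:
                active.add(nbr)
                frontier.append(nbr)
    return len(active)


def mean_size(adj, m, first_exposure_only, runs=500, seed=0):
    """Average cascade size over `runs` cascades from random single seeds."""
    rng = random.Random(seed)
    nodes = list(adj)
    total = 0
    for _ in range(runs):
        start = rng.choice(nodes)
        total += cascade_size(adj, [start], m, first_exposure_only, rng)
    return total / runs


# Dense toy graph (complete graph on 30 nodes), where repeat exposures are common.
complete = {i: [j for j in range(30) if j != i] for i in range(30)}
print("ICM:", mean_size(complete, 0.05, first_exposure_only=False))
print("first exposure only:", mean_size(complete, 0.05, first_exposure_only=True))
```

On a dense graph the seed exposes almost everyone at once, so ignoring repeat exposures leaves most nodes with a single chance and the cascade stalls quickly.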

How Limited Visibility and Divided Attention Constrain Social Contagion (Hodas & Lerman, 2012)
Questions
- How do people respond to exposures to information by friends on social media?
- What role does content play in information diffusion?
Findings
- Users have a finite ability to process information: the most recently received messages are retweeted, and the rest are overlooked.
- Highly connected users (hubs) are far less likely than poorly connected users to retweet any message they receive.
- The reduced susceptibility of hubs to "infection" explains why cascades are small.

Mechanics of information diffusion: A user must see an item and find it interesting before spreading it (e.g., by retweeting it, voting for it, or liking it). [Diagram: See? (cognitive factors, interface) → Interesting? (tastes, content) → Respond (retweet).]

Cognitive factors: position bias. People pay more attention to items at the top of the screen or of a list [Payne, The Art of Asking Questions (1951)] [Buscher et al., CHI'09] [Counts & Fisher, ICWSM'11]. This limits how far down the list or page a user navigates.

Measuring position bias: In Amazon Mechanical Turk experiments, users were asked to recommend science stories, and we controlled the order in which stories were presented to them. Position bias: stories at top list positions received more recommendations. Can we control user attention through story ordering so as to improve the outcomes of peer recommendation? [Lerman & Hogg (2014), "Leveraging position bias to improve peer recommendation," PLoS ONE]

Position bias creates "limited attention". [Figure: probability to view a post vs. its position (post visibility).] A new post appears at the top of the user's screen, and a post near the top is most likely to be seen.

Position bias creates "limited attention" (continued). Some time later, newer posts appear at the top, so the post is less likely to be seen. [Figure: probability to view a post vs. its position.]

Position bias and number of friends. [Figure panels: few friends vs. many friends.] Some time later, newer posts appear at the top and the post is less likely to be seen. A post of the same age is even less visible to a highly connected user.
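The "same age, less visible" effect can be sketched with a toy model. We assume, hypothetically, that visibility decays as a power law of screen position. We also assume that each new post from any friend pushes an older post down one position, so the expected position of a post grows with its age times the combined posting rate of the user's friends. Neither the functional form nor the parameter b comes from the lecture.

```python
def visibility(position, b=1.0):
    """Hypothetical probability of viewing a post at a given position
    (1 = top of the screen), decaying as a power law with exponent b."""
    return position ** (-b)


def post_visibility(age, friend_post_rate, b=1.0):
    """Hypothetical: expected visibility of a post of a given age for a user whose
    friends together post `friend_post_rate` items per unit time; each newer item
    pushes the post one position down (expected position 1 + rate * age)."""
    return visibility(1.0 + friend_post_rate * age, b)


few_friends = post_visibility(age=10, friend_post_rate=0.5)
many_friends = post_visibility(age=10, friend_post_rate=5.0)
print(few_friends, many_friends)
```

With these assumptions, a user whose friends post ten times as often sees the same ten-unit-old post at position 51 instead of 6, so its visibility drops accordingly.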

Friends are a source of distraction: users with more friends are more active, and users with more friends are distracted by more content. [Figure: x-axis is the number of friends, nf.] Limited attention makes hubs less susceptible to "infection".

Users retweet the most recent messages. [Figure: "time response function" for high-connectivity and low-connectivity users.] Users retweet the newest messages (at the top of their screen); hubs are much less likely to retweet an older message.

Does content matter? [Figure: the probability to tweet a message in terms of its visibility and its "virality"; distribution of estimated virality.]

Do "viral" messages spread farther? [Figure: plotted against ln("virality").] Not necessarily: "viral" messages can reach many or few people.

How do people respond to multiple exposures? [Figure: exposure response vs. number of tweeting friends.] Is this evidence for complex contagion?

"Complex contagion": an artifact of heterogeneity. [Figure: exposure response for low-connectivity and high-connectivity users.] Breaking down the exposure response by sub-population, separated according to the number of friends users follow, reveals a simple, monotonic response.
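A toy calculation shows how pooling heterogeneous users can produce an apparently non-monotonic, "complex contagion" curve even though each group's response increases with k. All numbers are invented for illustration and are not data from the study. Each hypothetical group responds to independent exposures with its own per-exposure probability m. High-connectivity users respond weakly but dominate the high-k bins.

```python
def response(m, k):
    """Per-group exposure response: k independent exposures, each with probability m."""
    return 1.0 - (1.0 - m) ** k


# Hypothetical sub-populations: number of users with k tweeting friends,
# and a per-exposure probability m for each group (illustrative numbers only).
groups = {
    "low connectivity": {"m": 0.20, "users": {1: 1000, 2: 500, 3: 100, 4: 10}},
    "high connectivity": {"m": 0.01, "users": {1: 100, 2: 500, 3: 1000, 4: 1000}},
}


def aggregate_response(groups, ks):
    """Expected fraction of k-exposed users who respond, pooled across groups."""
    pooled = {}
    for k in ks:
        responders = sum(g["users"][k] * response(g["m"], k) for g in groups.values())
        exposed = sum(g["users"][k] for g in groups.values())
        pooled[k] = responders / exposed
    return pooled


pooled = aggregate_response(groups, range(1, 5))
for k, p in pooled.items():
    print(k, round(p, 3))
```

The pooled curve rises from k = 1 to k = 2 and then falls, while each group's own curve increases at every step. Separating users by number of friends removes the apparent non-monotonicity.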

Summary: "A meme is not a virus"
- Information spread ≠ disease spread, with big consequences for modeling information spread in social media.
- Highly connected people (hubs) act as firewalls to information spread: they have a hard time finding messages in their stream.
- People have a finite capacity to process information; the more messages they receive, the less likely they are to respond to any given one.
- Information overload actually reduces the size of information cascades.