Information Diffusion Mary McGlohon CMU 10-802 3/23/10.



Outline Intro: Models for diffusion ▫Epidemiological: SIS/SIR/SIRS ▫Threshold models Case studies ▫SIR: Info diffusion in blogs ▫SIS: Cascades in blogs ▫Timing: Cascades in chain letters ▫A closer look: Network-based Marketing

Epidemiological: SIS Susceptible, Infected, Susceptible ▫Infected for t_I timesteps ▫While infected, transmits with probability β ▫After t_I steps, returns to susceptible

Epidemiological: SIR Susceptible, Infected, Removed ▫Infected for t_I timesteps ▫While infected, transmits with probability β ▫After t_I steps, goes to removed/recovered

Epidemiological: SIRS Susceptible, Infected, Removed, Susceptible ▫Combination of SIS+SIR ▫After t_I steps, goes to removed/recovered ▫After t_R steps, returns to susceptible
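The three state machines above differ only in what happens when the infectious period ends, which a small simulation makes concrete. The following is a minimal sketch, not code from the talk: the function name `simulate_epidemic`, the adjacency-dict graph format, and the synchronous update order are all assumptions made for illustration. `beta`, `t_i` and `t_r` stand for the transmission probability and the durations t_I and t_R on the slides.

```python
import random

def simulate_epidemic(neighbors, seeds, beta, t_i, t_r=None, model="SIR",
                      steps=50, rng=None):
    """Discrete-time SIS / SIR / SIRS on a graph given as {node: [neighbors]}.

    Returns a list with the number of infected nodes after each timestep.
    (Hypothetical helper; the update order is an illustrative assumption.)
    """
    rng = rng or random.Random()
    state = {v: "S" for v in neighbors}   # "S", "I" or "R"
    clock = {v: 0 for v in neighbors}     # timesteps spent in the current state
    for s in seeds:
        state[s] = "I"
    history = []
    for _ in range(steps):
        # 1. Every infected node tries to infect each susceptible neighbor.
        newly_infected = set()
        for v, st in state.items():
            if st == "I":
                for u in neighbors[v]:
                    if state[u] == "S" and rng.random() < beta:
                        newly_infected.add(u)
        # 2. Advance clocks and apply the model-specific transitions.
        for v in neighbors:
            if state[v] == "I":
                clock[v] += 1
                if clock[v] >= t_i:
                    # SIS returns to susceptible; SIR and SIRS go to removed.
                    state[v] = "S" if model == "SIS" else "R"
                    clock[v] = 0
            elif state[v] == "R" and model == "SIRS":
                clock[v] += 1
                if clock[v] >= t_r:
                    state[v] = "S"
                    clock[v] = 0
        # 3. Nodes infected in this step start their infectious period.
        for u in newly_infected:
            state[u] = "I"
            clock[u] = 0
        history.append(sum(1 for st in state.values() if st == "I"))
    return history

path = {0: [1], 1: [0, 2], 2: [1]}
print(simulate_epidemic(path, [0], beta=1.0, t_i=1, model="SIR", steps=4))
print(simulate_epidemic(path, [0], beta=1.0, t_i=1, model="SIS", steps=4))
```

On this three-node path with certain transmission, the SIR run burns out once every node has been removed, while the SIS run keeps passing the infection back and forth.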

Epidemiological: Networks Historically, SIS/SIR models assumed that a person could infect anybody else, i.e. the contact network is a full clique. SIS has an epidemic threshold, below which the infection dies out. For random power-law networks, the threshold is 0 [Pastor-Satorras and Vespignani] ▫(But not for power-law networks with high clustering coefficients [Eguíluz and Klemm])

Threshold Models Each node in the network has a weighted threshold. If the weight of a node's adopted neighbors reaches its threshold, the node adopts.
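As a rough illustration of the threshold rule, the sketch below repeatedly sweeps over the nodes until no further node adopts. The name `run_threshold_model`, the per-edge weights, and the per-node thresholds are hypothetical; the slide does not say how weights or thresholds are chosen.

```python
def run_threshold_model(in_weights, thresholds, seeds):
    """in_weights[v] = {u: w}: the weight node v places on its neighbor u.

    Returns the set of nodes that have adopted once the process stabilizes.
    (Illustrative sketch; weights and thresholds are assumed inputs.)
    """
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, weights in in_weights.items():
            if v in adopted:
                continue
            # Total weight of neighbors that have already adopted.
            influence = sum(w for u, w in weights.items() if u in adopted)
            if influence >= thresholds[v]:
                adopted.add(v)
                changed = True
    return adopted

in_weights = {"a": {}, "b": {"a": 0.6}, "c": {"a": 0.3, "b": 0.3}, "d": {"c": 0.2}}
thresholds = {"a": 1.0, "b": 0.5, "c": 0.5, "d": 0.5}
print(sorted(run_threshold_model(in_weights, thresholds, {"a"})))
```

Here "c" only adopts after "b" has, because neither neighbor alone carries enough weight, and "d" never adopts.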


Info Diffusion in Blogs D. Gruhl, R. Guha, D. Liben-Nowell, A. Tomkins. Information Diffusion Through Blogspace. In WWW '04 (2004). Goal: How do topics trend in blogs, and how can we model diffusion of topics?

Info Diffusion in Blogs Data: Crawled 11K blogs, 400K posts. Found 340 topics: ▫apple arianna ashcroft astronaut blair boykin bustamante chibi china davis diana farfarello guantanamo harvard kazaa longhorn schwarzenegger udell siegfried wildfires zidane gizmodo microsoft saddam

Info Diffusion in Blogs Topics = Chatter + Spikes ▫Chatter: Alzheimer ▫Spike: Chibi ▫Spiky Chatter: Microsoft

Info Diffusion in Blogs Modeled as SIR ▫Some set of authors is infected to write about a topic ▫Then propagate, as others write new posts on that topic ▫Measure the topic over time and other properties Fit using EM ▫Compute probability of propagation along each edge

Info Diffusion in Blogs Validation: ▫Synthetic ▪Used a modified Erdős-Rényi graph and created propagation on it ▪Found that EM was able to identify transmission along most edges ▫Real ▪Found “internet-only” topics ▪Looked at the most highly ranked expected transmission links and identified a real link in 90% of cases

Info Diffusion in Blogs Limitations of SIR ▫No multiple postings ▫No “stickiness”, i.e. which topics resonate with whom ▫No time-limiting factor in topics ▫“Closed world assumption” ▪No outside influences after initial infection


Cascades in Blogs Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst. Cascading Behavior in Large Blog Graphs: Patterns and a Model. In SIAM International Conference on Data Mining (SDM07) (2007). Goal: What do cascades (conversation trees) in blogs look like, and how can we model them?

Cascades in Blogs Data: ▫Gathered from August-September 2005 ▫Used set of 44,362 blogs, 2.4 million posts ▫245,404 blog-to-blog links [Figure: number of posts per day over time; x-axis ticks at Jul 4, Aug 1, Sep 29]

Cascades in Blogs [Figure: a blogosphere of blogs B1-B4 and the cascades of posts a-e extracted from it, with example “star” and “chain” shapes] What is the timing of links? What are cascade sizes? What are cascade shapes?

Cascades in Blogs What is the timing of links? Does popularity decay at a constant rate? With an exponential (“half life”)? [Figure: three panels on linear-linear, log-linear, and log-log scales]

Cascades in Blogs Observation: The probability that a post written at time $t_p$ acquires a link at time $t_p + \Delta$ is $p(t_p + \Delta) \propto \Delta^{-1.5}$. [Figure: log(# in-links) vs. log(days after post), slope = -1.5]
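A slope like the -1.5 above is read off a straight line on log-log axes. The sketch below estimates such a slope by ordinary least squares on the logged values; this is a simple illustrative estimator, not necessarily the fitting procedure used in the paper, and `loglog_slope` and the synthetic `link_counts` are made up for the example.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x); skips non-positive points."""
    pts = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    n = len(pts)
    mean_x = sum(lx for lx, _ in pts) / n
    mean_y = sum(ly for _, ly in pts) / n
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in pts)
    sxx = sum((lx - mean_x) ** 2 for lx, _ in pts)
    return sxy / sxx

# Synthetic in-link counts that decay exactly as delay^-1.5 (not real data).
delays = list(range(1, 31))
link_counts = [1000 * d ** -1.5 for d in delays]
print(round(loglog_slope(delays, link_counts), 3))
```

On real, noisy counts the fitted slope depends on binning and on how the sparse tail is handled, so the estimate should be treated as approximate.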

Cascades in Blogs How are cascade sizes distributed? Geometric distribution? [Figure: three panels on linear-linear, log-linear, and log-log scales, next to an example cascade of posts a-e]

Cascades in Blogs Q: What size distribution do cascades follow? Are large cascades frequent? Observation: The probability of observing a cascade of n blog posts follows a Zipf distribution: $p(n) \propto n^{-2}$. [Figure: log(Count) vs. log(cascade size, # of nodes), slope = -2]
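To make the exponent concrete, here is a short worked example. The ratios follow directly from $p(n) \propto n^{-2}$; the normalizing constant additionally assumes that n ranges over all positive integers, which the slide does not state.

```latex
p(n) = C\,n^{-2}
\quad\Longrightarrow\quad
\frac{p(10)}{p(1)} = 10^{-2} = \frac{1}{100},
\qquad
\frac{p(100)}{p(10)} = \frac{1}{100}.
% Assumption: n ranges over all positive integers.
C = \frac{1}{\sum_{n \ge 1} n^{-2}} = \frac{6}{\pi^{2}} \approx 0.608
```

Each tenfold increase in cascade size makes a cascade a hundred times rarer. A geometric distribution would instead shrink by a constant factor with every additional post, so large cascades would be far rarer than the data shows.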

Cascades in Blogs How are cascade shapes distributed? More stars? More chains? [Figure: example cascade of posts a-e]

Cascades in Blogs Q: What is the distribution of particular cascade shapes? Observation: Stars and chains in blog cascades also follow a power law, with different exponents (star -3.1, chain -8.5). [Figure: log(Count) vs. log(size of chain, # nodes), a = -8.5; log(Count) vs. log(size of star, # nodes), a = -3.1]

Cascades in Blogs Based on SIS model in epidemiology ▫Randomly pick blog to infect, add post to cascade ▫Infect each in-linked neighbor with probability β ▫Add infected neighbors’ posts to cascade. ▫Set old infected node to uninfected. [Figure: animation over blogs B1-B4 in which posts p1,1, p2,1, and p4,1 are added to the cascade step by step]
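The four steps of the cascade generation model can be sketched as follows. This is an illustrative reading of the slide rather than the authors' code: `generate_cascade`, the `in_links` dictionary format, and the `max_posts` safety cap (added so the loop always terminates) are assumptions.

```python
import random

def generate_cascade(in_links, beta, rng=None, max_posts=10_000):
    """in_links[b] lists the blogs that link to b, i.e. the blogs b can infect.

    Every blog must appear as a key. Returns (posts, edges): each post is a
    (blog, index) pair and each edge is (child_post, parent_post).
    (Hypothetical sketch; max_posts is an added safety cap.)
    """
    rng = rng or random.Random()
    start = rng.choice(sorted(in_links))   # randomly pick a blog to infect
    posts = [(start, 0)]                   # its post starts the cascade
    edges = []
    infected = [posts[0]]                  # posts of currently infected blogs
    while infected and len(posts) < max_posts:
        next_infected = []
        for parent in infected:
            blog = parent[0]
            for neighbor in in_links[blog]:
                # Infect each in-linked neighbor with probability beta.
                if rng.random() < beta:
                    child = (neighbor, len(posts))
                    posts.append(child)
                    edges.append((child, parent))
                    next_infected.append(child)
        # Old infected blogs become uninfected and may be infected again (SIS).
        infected = next_infected
    return posts, edges

in_links = {"B1": ["B2", "B4"], "B2": [], "B3": [], "B4": ["B3"]}
posts, edges = generate_cascade(in_links, beta=0.5, rng=random.Random(0))
print(len(posts), len(edges))
```

Because every new post has exactly one parent, each generated cascade is a tree, and a blog can contribute several posts if it is infected more than once.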

Cascades in Blogs [Figure: model vs. data for the most frequent cascades; log(Count) vs. log(cascade size, # nodes), log(star size), and log(chain size)]

Cascades in Blogs Limitations of SIS ▫Closed world assumption ▫Forced to set infection probability low to avoid large epidemics, which possibly limits stars. ▫No time limit, possibly overestimates chains.


Chain Letter Cascades David Liben-Nowell, Jon Kleinberg. Tracing the Flow of Information on a Global Scale Using Internet Chain-Letter Data. Proceedings of the National Academy of Sciences, Vol. 105, No. 12 (March 2008). Goal: How can we trace the path of a meme, and explain these paths?

Chain Letter Cascades Data: NPR chain letter records. ▫People directed to sign and send back to admin ▫Had several copies of lists, overlaps ▫Reconstructed the trees using edit distance
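Edit distance is what lets near-identical copies of a signature list be matched despite typos or dropped names. The sketch below shows only the distance measure itself (standard Levenshtein distance over arbitrary sequences); how the paper turns these distances into a tree is not described on the slide, and the signer names are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, e.g. lists of signer names."""
    prev = list(range(len(b) + 1))        # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

# Two copies of a petition that differ only by one appended signature.
copy_1 = ["ann", "bob", "cat"]
copy_2 = ["ann", "bob", "cat", "dan"]
print(edit_distance(copy_1, copy_2))
```

A small distance between two lists suggests that one copy descends from the other or that both share a recent ancestor in the tree.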

Chain Letter Cascades A reconstruction:

Chain Letter Cascades The tree:

Chain Letter Cascades How to model? ▫These trees have much longer paths ▫2 considerations ▪Spatial distance (geographic) ▪Timing

Chain Letter Cascades Model: based on a delay distribution. Nodes reply-to-all, so latecomers just append.

Chain Letter Cascades Validation: Simulated on a real social network (Livejournal), produced similar trees. Limitations: ▫The chain letter mechanism is somewhat nontraditional diffusion ▫Closed-world assumption is perhaps OK


Network-Based Marketing Shawndra Hill, Foster Provost, Chris Volinsky. Network-based marketing: Identifying likely adopters via consumer networks. Statistical Science, Vol. 21, No. 2 (2006). Question: Is there statistical evidence that network linkage directly affects product adoption?

Network-Based Marketing Data: Direct-mail marketing campaign for adopting a new communications service. ▫21 target segments, millions of customers ▫Divided based on: ▪Loyalty ▪Previous adoptions ▪Predictive scores based on other demographics ▪Different marketing campaigns (postcards, calls)

Network-Based Marketing

Hypothesis: A customer who has had direct communication with a subscriber is more likely to adopt. ▫Data: (incomplete) network information ▪ID of users, Timestamp, Duration. To test this, a “NN” (network neighbor) flag was added to the features if a customer had communicated with a subscriber (0.3% overall).

Network-Based Marketing Created baseline statistical model based on node attributes. ▫“Loyalty”- how consumer used services in past ▫Geographic - city, state, etc. ▫Demographic- census-type data, credit score Added a variable for NN, performed logistic regression on each segment, with response variable being “take rate”.

Network-Based Marketing Log-odds ratio for NN variable

Network-Based Marketing Take rates and lift ratios
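The quantities named on these slides can be computed from simple counts. The sketch below assumes that the take rate is adopters divided by targeted customers and that the lift ratio is the NN take rate divided by the non-NN take rate; the slides do not define either term, and the counts are invented. The log-odds ratio from a 2x2 table equals the NN coefficient of a logistic regression that has the NN flag as its only predictor, so it is a simplification of the per-segment model with node attributes described above.

```python
import math

def take_rate(adopters, targeted):
    """Fraction of targeted customers who adopted (assumed definition)."""
    return adopters / targeted

def lift_ratio(nn_adopters, nn_targeted, other_adopters, other_targeted):
    """Take rate of network neighbors relative to everyone else."""
    return (take_rate(nn_adopters, nn_targeted)
            / take_rate(other_adopters, other_targeted))

def log_odds_ratio(nn_adopters, nn_targeted, other_adopters, other_targeted):
    """Log-odds ratio of adoption for NN vs. non-NN customers (2x2 table)."""
    odds_nn = nn_adopters / (nn_targeted - nn_adopters)
    odds_other = other_adopters / (other_targeted - other_adopters)
    return math.log(odds_nn / odds_other)

# Hypothetical segment: 30 of 1,000 NN customers and 100 of 10,000 others adopt.
print(lift_ratio(30, 1000, 100, 10000))
print(log_odds_ratio(30, 1000, 100, 10000))
```

A positive log-odds ratio, or a lift ratio above 1, means the network neighbors adopted at a higher rate than the other customers in the segment.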

Network-Based Marketing Added a “segment 22” consisting of only NN, but made up of less promising customers.

Network-Based Marketing What about causality? What if the adoption is due to homophily? To address this, sample from non-NN to make a similar data set to the NN group. Performed logistic regression, showed that network impact is highest for the least loyal group.

Network-Based Marketing Lift curve for NN

Network-Based Marketing What about other network features? ▫Degree, transactions, connectedness, etc. Added network features to existing regression model, tested lift.

Network-Based Marketing Lift in sales for both models

Conclusion There are several ways of approaching the study of diffusion. No model is perfect. Considerations: ▫Closed world assumption vs. external effects ▫Homophily and node attributes ▫Network structure. Network information is valuable, but (usually) does not account for everything.