Making Diffusion Work for You B. Aditya Prakash Computer Science Virginia Tech. GraphEx Symposium, MIT Endicott House, Aug 21, 2014.


Thanks! Ali Pinar, Ben Miller. Prakash 2014

Networks are everywhere! Human Disease Network [Barabasi 2007]; Gene Regulatory Network [Decourty 2008]; Facebook Network [2010]; The Internet [2005].

Dynamical processes over networks are also everywhere!

Why do we care? Social collaboration; information diffusion; viral marketing; epidemiology and public health; cyber security; human mobility; games and virtual worlds; ecology; ...

Why do we care? (1: Epidemiology) Dynamical processes over networks: diseases over contact networks (SI model). [AJPH 2007] CDC data: visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts.

Why do we care? (1: Epidemiology) Dynamical processes over networks. Each circle is a hospital: ~3000 hospitals, more than 30,000 patients transferred [US-MEDICARE NETWORK 2005]. Problem: given k units of disinfectant, whom to immunize?

Why do we care? (1: Epidemiology) Current practice vs. our method: ~6x fewer! [US-MEDICARE NETWORK 2005] Hospital-acquired infections took 99K+ lives and cost $5B+ (all per year).

Why do we care? (2: Online Diffusion) > 800m users, ~$1B revenue [WSJ 2010]; ~100m active users; > 50m users.

Why do we care? (2: Online Diffusion) Dynamical processes over networks. Social media marketing: Celebrity ("Buy Versace™!") → Followers.

Why do we care? (3: To change the world?) Dynamical processes over networks. Social networks and collaborative action.

High Impact – Multiple Settings. Q. How to squash rumors faster? Q. How do opinions spread? Q. How to market better? Settings: epidemic out-breaks; products/viruses; transmit s/w patches.

Research Theme. DATA: Large real-world networks & processes. ANALYSIS: Understanding. POLICY/ACTION: Managing/Utilizing.

Research Theme – Public Health. DATA: Modeling # patient transfers. ANALYSIS: Will an epidemic happen? POLICY/ACTION: How to control out-breaks?

Research Theme – Social Media. DATA: Modeling Tweets spreading. ANALYSIS: # cascades in future? POLICY/ACTION: How to market better?

In this talk. POLICY/ACTION (Utilizing): Q1: How to ‘zoom-out’ of graphs? Q2: How to control out-breaks?

In this talk. DATA (Large real-world networks & processes): Q3: How does ‘activity’ evolve over time?

Outline: Motivation; Part 1: Policy and Action (Algorithms); Part 2: Learning Models (Empirical Studies); Conclusion.

Part 1: Algorithms. Q1: How to zoom-out of a network? Q2: How to control out-breaks? (Broad theme: Network Topology Manipulation)

“Zoom-out” of the network. “Zoom-out” of the cascade graph to get a quick picture (= summarization). Coarsening: big graph (nodes A–F) → zoom-out → smaller representation of the network. [Purohit, Prakash, et al. SIGKDD 2014]

Challenges. C1: How do we maintain diffusive characteristics when coarsening networks? C2: How do we merge nodes to get the coarse network? C3: How do we find the best nodes to merge, fast?

C1: Modeling diffusion. Information spreads over networks, e.g., a rumor/meme spreads over the Twitter following network (meme spreading). Independent cascade model (IC) [Kempe+, KDD03] – Weights p_ij: propagation probability from i to j – Each node has only one chance to infect its neighbors.
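
The two IC rules above (edge weights p_ij, and a single infection attempt per newly active node) can be sketched as a small simulation. This is a minimal illustration, not code from the talk; the function name `independent_cascade` and the dict-of-dicts graph layout are assumptions made for the example.

```python
import random


def independent_cascade(graph, seeds, rng=None):
    """Simulate one run of the independent cascade (IC) model.

    graph: dict mapping node i -> dict {neighbor j: p_ij} (hypothetical layout).
    seeds: initially active nodes.
    Returns the set of nodes active when the cascade stops.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for i in frontier:
            for j, p_ij in graph.get(i, {}).items():
                # Node i gets exactly one chance to activate each inactive neighbor j.
                if j not in active and rng.random() < p_ij:
                    active.add(j)
                    next_frontier.append(j)
        # Only newly activated nodes try to spread in the next round.
        frontier = next_frontier
    return active
```

Averaging `len(independent_cascade(graph, seeds))` over many runs gives a Monte Carlo estimate of the expected spread from a seed set.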

Diffusive characteristics. The first eigenvalue λ1 (of the adjacency matrix) is sufficient for most diffusion models [Prakash et al. ICDM’11, selected for best papers]. λ1 determines the epidemic threshold (will there be an epidemic?). “Safe” → “Vulnerable” → “Deadly”: increasing λ1, increasing vulnerability.
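
As a rough illustration of how λ1 can be computed and used, the sketch below estimates the first eigenvalue with shifted power iteration and checks a threshold condition. The function names are hypothetical, and the specific condition λ1 · β/δ < 1 is the well-known SIS form, chosen here as an assumption; the slide only says that λ1 governs the threshold.

```python
def leading_eigenvalue(adj, iters=1000, tol=1e-12):
    """Estimate lambda_1 of a non-negative adjacency matrix (list of lists)."""
    n = len(adj)
    v = [1.0] * n
    prev = 0.0
    shifted = 1.0
    for _ in range(iters):
        # Multiply by (A + I): the shift keeps power iteration from
        # oscillating on bipartite graphs such as stars and paths.
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        shifted = max(w)              # converges to lambda_1 + 1
        v = [x / shifted for x in w]
        if abs(shifted - prev) < tol:
            break
        prev = shifted
    return shifted - 1.0


def sis_below_threshold(lambda_1, beta, delta):
    """Assumed SIS condition: the infection dies out when lambda_1 * beta / delta < 1.

    beta is the per-contact infection rate, delta the recovery rate.
    """
    return lambda_1 * beta / delta < 1.0
```

For a triangle the estimate is 2, so with β = 0.1 and δ = 0.5 the product is 0.4 and the network sits on the "safe" side under this assumed condition.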

C1: Maintain diffusive characteristics. Goal: maintain the diffusive characteristics of the original network in the coarsened network, i.e., make the coarsened network have the least change in the first eigenvalue. (Figure: original network with nodes A–F, coarsened into a smaller network.)

C2: How to merge nodes. Goal: merge nodes of graph G to get a coarsened graph that “approximates” G with respect to diffusion. Merging a and b gives the least change in λ1. Original network: influence from d to b: 0.5; influence from d to a: 0.25; average: 0.375. Is this correct? 0.375!
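
Written out, the averaging in this example reads as follows. The symbol c for the merged node and the notation w for edge influence are introduced here for illustration only; the general merge rule is in the cited paper and is not reproduced on this slide.

```latex
w_{d \to c} \;=\; \frac{w_{d \to a} + w_{d \to b}}{2}
          \;=\; \frac{0.25 + 0.5}{2}
          \;=\; 0.375
```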

In general: C2: How to merge nodes. Merging a, b.

Problem Definition

C3: Which nodes to merge. Goal: – Find the best nodes to merge – Fast, scalable to large networks. (Figure: original network with nodes A–F, coarsened into a smaller network.)

Naive Greedy Heuristic. Step: score every edge by the change in eigenvalue; greedily choose the edge (a,b) with the least score and merge (a,b); re-evaluate the scores of every edge and repeat. Too slow! O(m^2) time to score all edges; we lose the time benefits of analyzing the smaller graph.
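
The naive loop can be sketched as below to make its cost visible: every candidate edge triggers a fresh eigenvalue computation on a merged copy of the graph. All function names are hypothetical, and the merge rule (averaging the two endpoints' weights to and from every other node) is an assumption that only mirrors the 0.375 example from the earlier slide.

```python
def leading_eigenvalue(adj, iters=500):
    """Power iteration on A + I; returns an estimate of lambda_1 of A."""
    n = len(adj)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)
        v = [x / lam for x in w]
    return lam - 1.0


def merge(adj, a, b):
    """Return a new matrix with a and b merged (assumed rule: average weights)."""
    keep = [i for i in range(len(adj)) if i not in (a, b)]
    size = len(keep) + 1                      # merged node takes the last index
    out = [[0.0] * size for _ in range(size)]
    for x, i in enumerate(keep):
        for y, j in enumerate(keep):
            out[x][y] = adj[i][j]
        out[x][-1] = (adj[i][a] + adj[i][b]) / 2   # edges into the merged node
        out[-1][x] = (adj[a][i] + adj[b][i]) / 2   # edges out of the merged node
    return out


def naive_greedy_coarsen(adj, target_nodes):
    """Repeatedly merge the edge whose merge changes lambda_1 the least."""
    while len(adj) > target_nodes:
        base = leading_eigenvalue(adj)
        n = len(adj)
        edges = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if adj[i][j] > 0 or adj[j][i] > 0]
        if not edges:
            break
        # One eigenvalue computation per edge, per round: this is the slow part.
        a, b = min(edges,
                   key=lambda e: abs(base - leading_eigenvalue(merge(adj, *e))))
        adj = merge(adj, a, b)
    return adj
```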

CoarseNet: idea. Can we approximate the edge scores faster? – Yes! Use matrix perturbation arguments to estimate (up to first-order terms) the score of an edge in constant time (skipping details). Score all edges in O(m) time – Naive heuristic: O(m^2) time.
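
For context, the standard first-order perturbation result for a simple eigenvalue is given below. It is the generic textbook statement, with u and v denoting the left and right eigenvectors of A for λ1 and ΔA the change to the matrix; how CoarseNet specializes it to a constant-time per-edge score is in the cited paper and is not derived here.

```latex
\lambda_1(A + \Delta A) \;\approx\; \lambda_1(A) + \frac{u^{\top} \, \Delta A \, v}{u^{\top} v},
\qquad A v = \lambda_1 v, \quad u^{\top} A = \lambda_1 u^{\top}
```

Because u and v are computed once for the original graph, each candidate change can be scored without recomputing an eigenvalue.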

CoarseNet: Complete algorithm. Step 1: Compute scores for all edges (node pairs). Step 2: Merge the nodes with the smallest score. Step 3: Go to Step 1 until αn nodes are left. (Figure: original network (weight=0.5) → assigning scores → merging edges → coarsened network.)

How do we perform? The first eigenvalue is preserved well up to large coarsening factors! (Plots: Amazon and DBLP; higher is better. See more results in the paper.)

Application 1: Influence Maximization. Methodology: Step 1: Coarsen the large social network using CoarseNet. Step 2: Solve influence maximization on the coarsened network. Step 3: Randomly select one node from each selected “supernode” (in the figure, one node is randomly selected from supernode C). We call it CSPIN.
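
The three steps can be wired together as in the sketch below. It is only a skeleton: `coarsen` and `solve_influence_max` are hypothetical callables standing in for CoarseNet and for whichever influence maximization solver is used, and the return shape of `coarsen` (a coarse graph plus a supernode-to-members map) is an assumption.

```python
import random


def cspin(coarsen, solve_influence_max, graph, k, rng=None):
    """Sketch of the coarsen / solve / expand pipeline.

    coarsen(graph) -> (coarse_graph, members), where members maps each
        supernode to the set of original nodes it contains (assumed shape).
    solve_influence_max(coarse_graph, k) -> list of k chosen supernodes.
    """
    rng = rng or random.Random()
    coarse_graph, members = coarsen(graph)            # Step 1: coarsen
    chosen = solve_influence_max(coarse_graph, k)     # Step 2: solve on small graph
    # Step 3: pick one original node at random from each chosen supernode.
    return [rng.choice(sorted(members[s])) for s in chosen]
```

Sorting the members before sampling only makes runs reproducible for a fixed random seed; it does not change the distribution.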

Up to 95% of the vertices can be merged without significantly affecting the influence spread!

Application 2: Diffusion Characterization. Goal: use graph coarsening to understand information cascades. Dataset: Flixster – a friendship network with movie ratings – Cascade: the same movie rating from friends. Methodology – Coarsen the network using CoarseNet with reduction factor α=0.5 – Study the formed groups (supernodes) – Can get non-network surrogates.

Diffusion observation. Observation 1: a very large fraction of movies propagate in a small number of groups. Observation 2: a multi-modal distribution. Stats: 1891 groups; mean group size: 16.6; largest group: 22061 nodes (roughly 40% of nodes). (See more results in the paper.)

Future work… How is it related to community structure? More applications, like visualization… Parallelization.

Part 1: Algorithms. Q1: How to zoom-out of a network? Q2: How to control out-breaks?

Immunization (= Interventions). Different flavors: – Pre-emptive – Data-aware.

Pre-emptive: Vulnerability (Again!) The first eigenvalue λ1 (of the adjacency matrix) is sufficient for most diffusion models [Prakash et al. ICDM’11, selected for best papers]. λ1 determines the epidemic threshold. “Safe” → “Vulnerable” → “Deadly”: increasing λ1, increasing vulnerability.

Goal: decrease λ1 as much as possible. Node-based [Tong, P.+ ICDM 2010]; Edge-based [Tong, P., Eliassi-Rad+ CIKM 2012, Best Paper Award]; Edge-manipulation [P., Adamic+ SDM 2013].

Latest results. First (provable) approximation algorithms for the edge-based problem (under submission [Saha, Adiga, P., Vullikanti 2014]) – O(log^2 n)-factor (can be improved to O(log n)), based on the idea of removing closed walks – Semi-definite programming rounding-based O(1) factor.

Data-aware Immunization. Given: graph and infected nodes. Find: ‘best’ nodes for immunization. Complexity – NP-hard – Hard to approximate within an absolute error. DAVA-tree – Optimal solution on the tree. DAVA and DAVA-fast – Merging infected nodes – Build a “dominator tree”, and run DAVA-tree. Running time: subquadratic – DAVA: O(k(|E| + |V| log |V|)) – DAVA-fast: O(|E| + |V| log |V|). (Figures: graph with infected nodes; dominator tree.) [Zhang and Prakash, SDM 2014]
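
The two preprocessing steps named above (merge the infected nodes, then build a dominator tree) can be sketched as follows. This is an illustration under stated assumptions, not the DAVA implementation: it assumes a directed, unweighted graph, uses the simple iterative dominator-set algorithm rather than a faster one, and leaves out DAVA-tree itself. The names `merge_infected`, `immediate_dominators`, and the super node label `"I"` are hypothetical.

```python
def merge_infected(succ, infected, super_node="I"):
    """Collapse all infected nodes into one super node (assumed preprocessing)."""
    merged = {super_node: set()}
    for u, vs in succ.items():
        src = super_node if u in infected else u
        for v in vs:
            dst = super_node if v in infected else v
            if src != dst:
                merged.setdefault(src, set()).add(dst)
    return merged


def immediate_dominators(succ, root):
    """Return {node: immediate dominator} for nodes reachable from root."""
    order, seen, stack = [], {root}, [root]
    while stack:                                  # collect reachable nodes
        u = stack.pop()
        order.append(u)
        for v in succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    preds = {u: set() for u in order}
    for u in order:
        for v in succ.get(u, ()):
            preds[v].add(u)
    # Iterate dom(u) = {u} | intersection of dom(p) over predecessors p.
    dom = {u: set(order) for u in order}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for u in order:
            if u == root:
                continue
            new = set.intersection(*(dom[p] for p in preds[u])) | {u}
            if new != dom[u]:
                dom[u] = new
                changed = True
    # The immediate dominator is the strict dominator deepest in the tree,
    # i.e. the one with the largest dominator set of its own.
    return {u: max(dom[u] - {u}, key=lambda d: len(dom[d]))
            for u in order if u != root}
```

In the resulting tree, a node's parent is the single node that every path from the infected super node must pass through to reach it, which is what makes the tree a natural place to reason about which nodes to immunize.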

Extensions. Can be extended to uncertain and noisy initial data as well! [Zhang and Prakash, CIKM 2014] (Figure: Twitter Firehose API, 1% sample.)

Outline: Motivation; Part 1: Policy and Action (Algorithms); Part 2: Learning Models (Empirical Studies); Conclusion.

Part 2: Empirical Studies. Q3: How does activity evolve over time?

Google Search Volume, e.g., given (1) the first spike, (2) the release dates of two sequel movies, and (3) the access volume before the release date: ? (Figure markers: (1) first spike; (2) release date; (3) two weeks before release.)

Patterns (X vs. Y)

Patterns (X vs. Y): more data

Patterns (X vs. Y): anomaly?

Patterns (X vs. Y): anomaly? Extrapolation

Patterns (X vs. Y): anomaly, imputation, extrapolation

Patterns: anomaly, imputation, extrapolation, compression

Rise and fall patterns in social media. Meme (# of mentions in blogs) – short phrases sourced from U.S. politics in 2008: “you can put lipstick on a pig”, “yes we can”.

Rise and fall patterns in social media. Can we find a unifying model that includes these patterns? Four classes on YouTube [Crane et al. ’08]; six classes on Meme [Yang et al. ’11].

Rise and fall patterns in social media. Answer: YES! We can represent all patterns by a single model. In Matsubara, Sakurai, Prakash+ SIGKDD 2012.

Main idea - SpikeM. 1. Un-informed bloggers (uninformed about rumor) at time n=0. 2. External shock at time n_b (e.g., breaking news). 3. Infection (word-of-mouth) from time n_b+1. Infectiveness of a blog-post at age n: – Strength of infection β (quality of news) – Decay function (how infective a blog posting is): power law.
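
The three ingredients above can be turned into a small simulation. The sketch below is one plausible reading of the slide, not the published equations: the parameter names `N`, `S_b`, and `eps`, the exact update rule, the -1.5 decay exponent, and the clamp that stops more bloggers being informed than remain are all assumptions to be checked against the cited SIGKDD 2012 paper.

```python
def spikem(N, beta, n_b, S_b, eps, steps):
    """Simulate a SpikeM-style rise-and-fall curve (no periodicity).

    N: total bloggers; beta: strength of infection; n_b: time of the shock;
    S_b: size of the shock; eps: background noise; steps: number of time ticks.
    Returns dB, where dB[n] is the number of bloggers newly informed at time n.
    """
    dB = [0.0] * steps
    uninformed = float(N)
    for n in range(steps - 1):
        influence = 0.0
        for t in range(n_b, n + 1):
            shock = S_b if t == n_b else 0.0
            # A post of age (n + 1 - t) infects with power-law decaying strength.
            influence += (dB[t] + shock) * beta * (n + 1 - t) ** -1.5
        new = uninformed * influence + eps
        new = min(new, uninformed)   # assumption: cannot inform more than remain
        dB[n + 1] = new
        uninformed -= new
    return dB
```

With `eps = 0` nothing happens before the shock, the first tick after `n_b` jumps to `N * S_b * beta`, and the curve then follows the rise-and-fall shape until the pool of un-informed bloggers is depleted.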

-1.5 slope. J. G. Oliveira et al. Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005). (Also in Leskovec, McGlohon+, SDM 2007.)

SpikeM - with periodicity. Full equation of SpikeM includes periodicity: bloggers change their activity over time (e.g., daily, weekly, yearly). (Figure: activity over time n, with peak activity at 12pm and low activity at 3am.)
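
One simple way to express such a daily rhythm is a sinusoidal factor between 0 and 1 that scales the activity at each time tick. The form below is an illustrative assumption, as are the parameter names `P_a` (strength), `P_p` (period), and `P_s` (phase shift); the full SpikeM equation is not reproduced in this transcript and should be taken from the cited paper.

```python
import math


def periodicity(n, P_a, P_p, P_s):
    """Assumed periodic modulation of activity at time n.

    P_a: strength of the periodicity (0 disables it, must be <= 1),
    P_p: period in time ticks (e.g. 24 for hourly data with a daily cycle),
    P_s: phase shift in time ticks.
    Returns a factor in [1 - P_a, 1].
    """
    return 1.0 - 0.5 * P_a * (math.sin(2 * math.pi / P_p * (n + P_s)) + 1.0)
```

Multiplying the newly informed count at each tick by this factor dampens activity in the quiet hours while leaving the peak hours untouched.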

Tail-part forecasts. SpikeM can capture the tail part.

“What-if” forecasting, e.g., given (1) the first spike, (2) the release dates of two sequel movies, and (3) the access volume before the release date: ? (Figure markers: (1) first spike; (2) release date; (3) two weeks before release.)

“What-if” forecasting. SpikeM can forecast not only the tail part, but also the rise part! SpikeM can forecast upcoming spikes. (Figure markers: (1) first spike; (2) release date; (3) two weeks before release.)

Modeling Malware Penetration. Worldwide Intelligence Network – Which machine got which malware (or legitimate files) – 1 billion nodes – 37 billion edges. Q: Temporal patterns? [Papalexakis et al. 2013]

Q: Temporal Patterns. Looks familiar?

SpikeM again (or SharkFin). 7 parameters only! ~400 points.

Latent Propagation Patterns

Bonus: Protest Predictions. Can Twitter provide a lead time? South American Twitter dataset – Language: Spanish/Portuguese – Idea: 1. Look for trending keywords. 2. Predict event type for protest using SpikeM parameters! (Figure: a political tweet; Violent Protest (VP) vs. Non-Violent Protest (P).) [Sundereisan et al. ASONAM 2014] [Jin et al. SIGKDD 2014]

Outline: Motivation; Part 1: Learning Models (Empirical Studies); Part 2: Understanding Epidemics (Theory); Part 3: Policy and Action (Algorithms); Conclusion and Future Plans.

Future Plans. DATA: Large real-world networks & processes. ANALYSIS: Understanding. POLICY/ACTION: Managing.

Scalability – Big Data. Datasets of unprecedented scale – High dimensionality and sample size! Need scalable algorithms for – Learning models – Developing policy. Leverage parallel systems – Map-Reduce clusters (like Hadoop) for data-intensive jobs (more than 6000 machines) – Parallelized compute-intensive simulations (like Condor).

Uncertain Data in Cascade Analysis. Correcting for missing data [Sundereisan, Vreeken, Prakash 2014] (figure panels: original; nodes sampled off; culprits, and missing nodes filled in). Designing more robust immunization policies [Zhang and Prakash, CIKM 2014].

Social  Biological Contagion. Automatically learn models [Chen, Prakash+ 2014].

References
1. Scalable Vaccine Distribution in Large Graphs given Uncertain Data (Yao Zhang and B. Aditya Prakash) -- In CIKM 2014.
2. Fast Influence-based Coarsening for Large Networks (Manish Purohit, B. Aditya Prakash, Chanhyun Kang, Yao Zhang and V. S. Subrahmanian) -- In SIGKDD 2014.
3. DAVA: Distributing Vaccines over Large Networks under Prior Information (Yao Zhang and B. Aditya Prakash) -- In SDM 2014.
4. Fractional Immunization on Networks (B. Aditya Prakash, Lada Adamic, Jack Iwashyna, Hanghang Tong, Christos Faloutsos) -- In SDM 2013.
5. Spotting Culprits in Epidemics: Who and How many? (B. Aditya Prakash, Jilles Vreeken, Christos Faloutsos) -- In ICDM 2012, Brussels (Invited to KAIS Journal Best Papers of ICDM).
6. Gelling, and Melting, Large Graphs through Edge Manipulation (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad, Michalis Faloutsos, Christos Faloutsos) -- In ACM CIKM 2012, Hawaii (Best Paper Award).
7. Rise and Fall Patterns of Information Diffusion: Model and Implications (Yasuko Matsubara, Yasushi Sakurai, B. Aditya Prakash, Lei Li, Christos Faloutsos) -- In SIGKDD 2012, Beijing.
8. Interacting Viruses on a Network: Can both survive? (Alex Beutel, B. Aditya Prakash, Roni Rosenfeld, Christos Faloutsos) -- In SIGKDD 2012, Beijing.
9. Winner-takes-all: Competing Viruses or Ideas on fair-play networks (B. Aditya Prakash, Alex Beutel, Roni Rosenfeld, Christos Faloutsos) -- In WWW 2012, Lyon.
10. Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks (B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos) -- In IEEE ICDM 2011, Vancouver (Invited to KAIS Journal Best Papers of ICDM).
11. Time Series Clustering: Complex is Simpler! (Lei Li, B. Aditya Prakash) -- In ICML 2011, Bellevue.
12. Epidemic Spreading on Mobile Ad Hoc Networks: Determining the Tipping Point (Nicholas Valler, B. Aditya Prakash, Hanghang Tong, Michalis Faloutsos and Christos Faloutsos) -- In IEEE NETWORKING 2011, Valencia, Spain.
13. Formalizing the BGP stability problem: patterns and a chaotic model (B. Aditya Prakash, Michalis Faloutsos and Christos Faloutsos) -- In IEEE INFOCOM NetSciCom Workshop, 2011.
14. On the Vulnerability of Large Graphs (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad and Christos Faloutsos) -- In IEEE ICDM 2010, Sydney, Australia.
15. Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms (B. Aditya Prakash, Hanghang Tong, Nicholas Valler, Michalis Faloutsos and Christos Faloutsos) -- In ECML-PKDD 2010, Barcelona, Spain.
16. MetricForensics: A Multi-Level Approach for Mining Volatile Graphs (Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos, Leman Akoglu, Lei Li, Koji Maruhashi, B. Aditya Prakash and Hanghang Tong) -- In SIGKDD 2010, Washington D.C.

Acknowledgements: Collaborators. Christos Faloutsos, Roni Rosenfeld, Michalis Faloutsos, Lada Adamic, Theodore Iwashyna (M.D.), Dave Andersen, Tina Eliassi-Rad, Iulian Neamtiu, Varun Gupta, Jilles Vreeken, V. S. Subrahmanian, John Brownstein (M.D.), Deepayan Chakrabarti, Hanghang Tong, Kunal Punera, Ashwin Sridharan, Sridhar Machiraju, Mukund Seshadri, Alice Zheng, Lei Li, Polo Chau, Nicholas Valler, Alex Beutel, Xuetao Wei.

Acknowledgements: Students. Liangzhe Chen, Shashidhar Sundereisan, Benjamin Wang, Yao Zhang.

Acknowledgements: Funding

Making Diffusion Work for You. Data, Analysis, Policy/Action. B. Aditya Prakash, http://www.cs.vt.edu/~badityap
