Published by Scott Harvey. Modified over 9 years ago.
1
Understanding and Managing Cascades on Large Graphs. B. Aditya Prakash, Computer Science, Virginia Tech. CS Seminar, 11/30/2012.
2
Networks are everywhere! Examples: Human Disease Network [Barabasi 2007], Gene Regulatory Network [Decourty 2008], Facebook Network [2010], the Internet [2005]. Prakash 2012
3
Dynamical processes over networks are also everywhere!
4
Why do we care? Social collaboration, information diffusion, viral marketing, epidemiology and public health, cyber security, human mobility, games and virtual worlds, ecology, localized effects (riots…).
5
Why do we care? (1: Epidemiology) Dynamical processes over networks: diseases over contact networks. CDC data [AJPH 2007]: visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts.
6
Why do we care? (1: Epidemiology) Dynamical processes over networks. Each circle is a hospital (~3000 hospitals); more than 30,000 patients transferred [US-MEDICARE NETWORK 2005]. Problem: Given k units of disinfectant, whom to immunize?
7
Why do we care? (1: Epidemiology) Current practice vs. our method [US-MEDICARE NETWORK 2005]: ~6x fewer! Hospital-acquired infections took 99K+ lives and cost $5B+ (all per year).
8
Why do we care? (2: Online Diffusion) > 800m users, ~$1B revenue [WSJ 2010]; ~100m active users; > 50m users.
9
Why do we care? (2: Online Diffusion) Dynamical processes over networks: social media marketing. A celebrity posts "Buy Versace™!" to their followers.
10
Why do we care? (4: To change the world?) Dynamical processes over networks: social networks and collaborative action.
11
High Impact: Multiple Settings. Q. How to squash rumors faster? Q. How do opinions spread? Q. How to market better? Settings: epidemic outbreaks, products/viruses, transmitting software patches.
12
Research Theme. DATA (large real-world networks & processes); ANALYSIS (understanding); POLICY/ACTION (managing).
13
Research Theme: Public Health. DATA (modeling # patient transfers); ANALYSIS (will an epidemic happen?); POLICY/ACTION (how to control outbreaks?).
14
Research Theme: Social Media. DATA (modeling tweets spreading); ANALYSIS (# cascades in the future?); POLICY/ACTION (how to market better?).
15
In this talk (POLICY/ACTION: managing). Q1: How to immunize and control outbreaks better? Q2: How to find the culprits of epidemics?
16
In this talk (DATA: large real-world networks & processes). Q3: What do cascades look like? Q4: How does activity evolve over time?
17
Outline: Motivation; Part 1: Policy and Action (Algorithms); Part 2: Learning Models (Empirical Studies); Conclusion.
18
Part 1: Algorithms. Q1: Whom to immunize? Q2: How to detect culprits?
19
Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad, Michalis Faloutsos, Christos Faloutsos. “Gelling, and Melting, Large Graphs by Edge Manipulation.” In ACM CIKM 2012 (Best Paper Award). [Thanks to Hanghang Tong for some slides!]
20
An Example: Flu/Virus Propagation. Nodes are healthy or sick; edges are contacts. 1: A sick node sneezes on its neighbors. 2: Some neighbors become sick. 3: Sick nodes try to recover. Q: How to guide propagation by optimizing the link structure? Q1: Understand the tipping point (existing work). Q2: Minimize the propagation (this paper). Q3: Maximize the propagation (this paper).
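The three-step flu model above is essentially a susceptible-infected-susceptible (SIS) process. Here is a minimal discrete-time simulation sketch; the function name `simulate_sis`, the adjacency-dict representation, and the synchronous update rule are illustrative assumptions, not the paper's exact model.

```python
import random

def simulate_sis(adj, seeds, beta, delta, steps, rng=None):
    """Discrete-time SIS sketch; returns the number of sick nodes after each step.

    adj   -- dict: node -> list of neighbours (undirected contact graph)
    beta  -- probability that a sick node infects a given healthy neighbour per step
    delta -- probability that a sick node recovers per step
    """
    if rng is None:
        rng = random.Random(0)
    infected = set(seeds)
    history = []
    for _ in range(steps):
        next_infected = set()
        for node in infected:
            # Steps 1 and 2: sneeze on neighbours; some healthy ones get sick.
            for nb in adj[node]:
                if nb not in infected and rng.random() < beta:
                    next_infected.add(nb)
            # Step 3: try to recover (stay sick with probability 1 - delta).
            if rng.random() >= delta:
                next_infected.add(node)
        infected = next_infected
        history.append(len(infected))
    return history
```

For example, on the path 0-1-2-3 with `beta=1.0` and `delta=0.0`, the infection advances one hop per step.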
21
Vulnerability measure λ [ICDM 2011, PKDD 2010]: λ is the largest eigenvalue of the adjacency matrix A, and it determines the epidemic threshold. Increasing λ means increasing vulnerability, from “Safe” to “Vulnerable” to “Deadly”.
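To make the threshold concrete: for the SIS model, the classical condition is that an epidemic dies out when the effective strength s = λ·β/δ is below 1, where β is the per-contact infection probability and δ the recovery probability. The sketch below computes λ with NumPy for an undirected graph and evaluates s; the helper names are hypothetical, and the ICDM 2011 paper generalizes the condition to other cascade models, which this snippet does not attempt.

```python
import numpy as np

def leading_eigenvalue(A):
    """Largest eigenvalue λ of a symmetric (undirected) adjacency matrix A."""
    return float(np.linalg.eigvalsh(A)[-1])  # eigvalsh sorts ascending

def sis_strength(A, beta, delta):
    """Effective strength s = λ·β/δ; s < 1 means the SIS epidemic dies out."""
    return leading_eigenvalue(A) * beta / delta
```

On the complete graph K4 (λ = 3), β = 0.1 and δ = 0.5 give s = 0.6, below the threshold.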
22
Minimizing Propagation: Edge Deletion. Given: a graph A, a virus propagation model, and a budget k. Find: the k ‘best’ edges to delete from A to minimize λ. Figure: a bad vs. a good choice of deleted edges.
23
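Q: How to find the k best edges to delete efficiently? A: Score each existing edge e = (i, j) by u(i) · v(j), the left eigen-score of its source times the right eigen-score of its target, and delete the k edges with the highest scores.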
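The eigen-score idea can be sketched in a few lines of NumPy. This is a hedged illustration of scoring edges by u(i)·v(j), assuming a non-negative, possibly directed adjacency matrix and a dense eigendecomposition (fine for small graphs only); `perron_vector` and `edges_to_delete` are hypothetical names, and the published algorithm may compute the eigenvectors differently.

```python
import numpy as np

def perron_vector(M):
    """Eigenvector of the largest real eigenvalue of M, made non-negative."""
    vals, vecs = np.linalg.eig(M)
    idx = int(np.argmax(vals.real))
    return np.abs(vecs[:, idx].real)

def edges_to_delete(A, k):
    """Return the k existing edges (i, j) with the highest u(i) * v(j)."""
    u = perron_vector(A.T)  # left eigenvector:  u^T A = λ u^T
    v = perron_vector(A)    # right eigenvector: A v = λ v
    src, dst = np.nonzero(A)
    scores = u[src] * v[dst]
    top = np.argsort(-scores)[:k]
    return [(int(src[t]), int(dst[t])) for t in top]
```

On a graph with a dense core and a sparse tail, the highest-scoring edge falls inside the core, and deleting it lowers λ.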
24
Minimizing Propagation: Evaluations. Figure: log (infected ratio) vs. time ticks, our method highlighted; the arrow marks the better direction. Data set: Oregon Autonomous System graph (14K nodes, 61K edges).
25
Discussions: Node Deletion vs. Edge Deletion. Observations: both node deletion and edge deletion decrease λ; edges on A = nodes on its line graph L(A). Questions: Is edge deletion on A = node deletion on L(A)? Which strategy is better (when both are feasible)? Figure: original graph A and its line graph L(A).
26
Discussions: Node Deletion vs. Edge Deletion. Q: Is edge deletion on A = node deletion on L(A)? A: Yes! But node deletion itself is not easy. Theorem (Hardness of Node Deletion): finding the optimal k-node immunization is NP-hard. Theorem (Line Graph Spectrum): relates the eigenvalues of A to the eigenvalues of L(A).
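A small check of the line-graph view, assuming simple undirected graphs given as edge lists. For a d-regular graph, a classical result (Sachs) says each eigenvalue μ of A maps to μ + d - 2 in L(A); the theorem on the slide may be stated more generally, so treat this only as a sanity check on K4 (3-regular, λ = 3), whose line graph should have leading eigenvalue 3 + 3 - 2 = 4. The helper names are hypothetical.

```python
import itertools
import numpy as np

def adjacency(n, edges):
    """Symmetric adjacency matrix of a simple undirected graph."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

def line_graph(edges):
    """Adjacency matrix of L(A): one node per edge of A; two nodes are
    adjacent when the corresponding edges share an endpoint."""
    m = len(edges)
    L = np.zeros((m, m))
    for a, b in itertools.combinations(range(m), 2):
        if set(edges[a]) & set(edges[b]):
            L[a, b] = L[b, a] = 1.0
    return L

k4_edges = list(itertools.combinations(range(4), 2))
lam_A = np.linalg.eigvalsh(adjacency(4, k4_edges))[-1]
lam_L = np.linalg.eigvalsh(line_graph(k4_edges))[-1]
```

Deleting node e from L(A) removes exactly the same adjacencies as deleting edge e from A, which is why the two problems coincide.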
27
Discussions: Node Deletion vs. Edge Deletion. Q: Which strategy is better (when both are feasible)? A: Edge deletion > node deletion. Figure (the arrow marks the better direction): green = node deletion (e.g., shut down a Twitter account); red = edge deletion (e.g., un-friend two users).
28
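Maximizing Propagation: Edge Addition. Given: a graph A, a virus propagation model, and a budget k. Find: the k ‘best’ new edges to add into A. By first-order perturbation, $\lambda_S - \lambda \approx G_v(S) = c \sum_{e \in S} u(i_e)\, v(j_e)$, where $u(i_e)$ is the left eigen-score of the source $i_e$ and $v(j_e)$ is the right eigen-score of the target $j_e$ of edge e. So, are we done? Not quite: scoring every candidate new edge naively needs $O(n^2 - m)$ complexity. Figure: candidate edges with low vs. high $G_v$.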
29
Maximizing Propagation: Edge Addition. $\lambda_S - \lambda \approx G_v(S) = c \sum_{e \in S} u(i_e)\, v(j_e)$. Q: How to find the k new edges with the highest $G_v(S)$? A: A modified Fagin's algorithm. #1: Sort sources by u. #2: Sort targets by v. #3: Restrict the search space to the top k + d sources and targets (d: maximum degree), skipping existing edges. Time complexity: $O(m + nt + kt^2)$, $t = \max(k, d)$.
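The sort-and-truncate idea behind the edge-addition search can be sketched as follows. This is a simplified stand-in for the modified Fagin's algorithm, not the published code: it assumes non-negative u and v (true for Perron eigenvectors of a non-negative A), uses t = k + d + 1 so that skipped self-loops are also covered, and sorts in O(n log n) rather than matching the slide's exact bound. `top_new_edges` is a hypothetical name.

```python
import numpy as np

def top_new_edges(A, u, v, k):
    """Return k non-existing edges (i, j), i != j, with the largest u[i] * v[j].

    With u, v >= 0, only the top t = k + d + 1 sources (by u) and targets
    (by v) need examining, where d is the maximum in- or out-degree: any pair
    outside that block is dominated by at least k valid pairs inside it.
    """
    n = A.shape[0]
    nonzero = A != 0
    d = int(max(nonzero.sum(axis=0).max(), nonzero.sum(axis=1).max()))
    t = min(n, k + d + 1)
    sources = np.argsort(-u)[:t]  # step 1: sources sorted by u
    targets = np.argsort(-v)[:t]  # step 2: targets sorted by v
    candidates = []
    for i in sources:             # step 3: scan only the t x t block
        for j in targets:
            if i != j and not nonzero[i, j]:
                candidates.append((u[i] * v[j], int(i), int(j)))
    candidates.sort(reverse=True)
    return [(i, j) for _, i, j in candidates[:k]]
```

The result can be checked against a brute-force scan of all n^2 - m candidate pairs on a small random graph.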
30
Maximizing Propagation: Evaluation. Figure: log (infected ratio) vs. time ticks, our method highlighted; the arrow marks the better direction.
31
Fractional Immunization of Networks. B. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos. Under submission.
32
Previously: Full Static Immunization. Given: a graph A, a virus propagation model, and a budget k. Find: the k ‘best’ nodes for immunization (removal). Figure: example with k = 2.
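Because optimal k-node immunization is NP-hard (per the theorem on slide 26), a common baseline is greedy: repeatedly remove the node whose removal lowers λ the most. The sketch below is that naive baseline, not the method from this work; it recomputes a dense eigendecomposition for every candidate, so it only suits tiny undirected graphs. `greedy_immunize` is a hypothetical name.

```python
import numpy as np

def leading_eigenvalue(A):
    """Largest eigenvalue of a symmetric matrix (0.0 for an empty graph)."""
    return float(np.linalg.eigvalsh(A)[-1]) if A.size else 0.0

def greedy_immunize(A, k):
    """Remove k nodes one at a time, each time the one that minimizes the new λ."""
    remaining = list(range(A.shape[0]))
    chosen = []
    for _ in range(k):
        best_node, best_lam = None, None
        for node in remaining:
            keep = [x for x in remaining if x != node]
            lam = leading_eigenvalue(A[np.ix_(keep, keep)])
            if best_lam is None or lam < best_lam:
                best_node, best_lam = node, lam
        chosen.append(best_node)
        remaining.remove(best_node)
    return chosen
```

On a star graph the greedy choice for k = 1 is the hub, since removing it drops λ to 0.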
33
Fractional Asymmetric Immunization. Fractional effect [f(x)]; asymmetric effect. Figure: example with # antidotes = 3.
34
Now: Fractional Asymmetric Immunization. Fractional effect [f(x)]; asymmetric effect. Figure: example with # antidotes = 3.
36
Fractional Asymmetric Immunization. Figure: drug-resistant bacteria (like XDR-TB) spreading from a hospital to another hospital.
37
Fractional Asymmetric Immunization. Figure: drug-resistant bacteria (like XDR-TB) spreading from a hospital to another hospital; the link between them is marked = f.
38
Fractional Asymmetric Immunization. Figure: a hospital and another hospital. Problem: Given k units of disinfectant, how to distribute them to maximize the number of hospitals saved?
39
Our Algorithm “SMART-ALLOC”. Current practice vs. SMART-ALLOC [US-MEDICARE NETWORK 2005]: ~6x fewer! Each circle is a hospital (~3000 hospitals); more than 30,000 patients transferred.
40
Running time, wall-clock (lower is better): simulations, > 1 week; SMART-ALLOC, 14 secs. > 30,000x speed-up!
41
Experiments (lower is better): PENN-NETWORK with k = 200, ~5x; SECOND-LIFE with k = 2000, ~2.5x.
42
Part 1: Algorithms. Q1: Whom to immunize? Q2: How to detect culprits?
43
B. Aditya Prakash, Jilles Vreeken, Christos Faloutsos. ‘Detecting Culprits in Epidemics: Who and How many?’ In ICDM 2012, Brussels. Prakash and Faloutsos 2012
44
Culprits: Problem definition. Figure: a 2-d grid where ‘+’ marks infected nodes. Who started it?
45
Culprits: Problem definition. Figure: a 2-d grid where ‘+’ marks infected nodes. Who started it? Prior work: [Lappas et al. 2010, Shah et al. 2011].
46
Culprits: Exoneration
47
Culprits: Exoneration
48
Who are the culprits? Two-part solution: use MDL for the number of seeds; for a given number, exoneration = centrality + penalty. Running time: linear! (in edges and nodes)
49
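Modeling using MDL. Minimum Description Length principle == induction by compression; related to Bayesian approaches. MDL = model + data. Model: scoring the seed set S as $L(S) = L_{\mathbb{N}}(|S|) + \log \binom{N}{|S|}$, i.e., the cost of encoding the integer |S| plus the log of the number of possible |S|-sized sets among the N nodes.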
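The model cost can be computed directly. This sketch assumes Rissanen's universal code for integers, L_N(n) = log2(c0) + log2(n) + log2(log2(n)) + … (positive terms only, c0 ≈ 2.865064), which is a common choice in MDL work; whether NetSleuth uses exactly this integer code is an assumption, and the function names are hypothetical.

```python
import math

LOG2_C0 = math.log2(2.865064)  # Rissanen's normalizing constant

def universal_int_bits(n):
    """Rissanen's universal code length L_N(n) in bits, for integers n >= 1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    bits = LOG2_C0
    x = math.log2(n)
    while x > 0:            # add log2(n), log2(log2(n)), ... while positive
        bits += x
        x = math.log2(x)
    return bits

def seed_set_model_cost(num_nodes, seed_size):
    """L(S) = L_N(|S|) + log2(C(N, |S|)): encode |S|, then which set it is."""
    return universal_int_bits(seed_size) + math.log2(math.comb(num_nodes, seed_size))
```

For a single seed in a 10-node graph the cost is log2(c0) + log2(10) bits; larger seed sets pay more for both terms until |S| approaches N/2.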
50
Modeling using MDL. Data: propagation ripples. Figure: original graph, infected snapshot, and two example ripples, R1 and R2.
51
Modeling using MDL. Ripple cost: how long the ripple R is, plus how the ‘frontier’ advances. Total MDL cost = model cost + ripple cost.
52
How to optimize the score? Two-step process: (1) given k, quickly identify a high-quality seed set; (2) given these nodes, optimize the ripple R.
53
Optimizing the score. High-quality k-seed set via exoneration: the best single seed comes from the eigenvector of the smallest eigenvalue of the Laplacian sub-matrix (restricted to the infected nodes); exonerate its neighbors; repeat.
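A hedged sketch of the exoneration loop on an undirected graph. Assumptions (not confirmed by the slides): the sub-matrix is the full-graph Laplacian D - A restricted to infected nodes, the best seed is the entry of largest magnitude in the eigenvector of its smallest eigenvalue, and "exonerating" means dropping the seed and its neighbors before recomputing. `pick_seeds` is a hypothetical name, and this omits the penalty term and the MDL step.

```python
import numpy as np

def pick_seeds(A, infected, k):
    """Pick up to k seeds by repeated eigenvector centrality on the infected
    Laplacian sub-matrix, exonerating each seed's neighbours in turn."""
    L = np.diag(A.sum(axis=1)) - A          # Laplacian of the whole graph
    candidates = sorted(infected)
    seeds = []
    for _ in range(k):
        if not candidates:
            break
        sub = L[np.ix_(candidates, candidates)]
        _, vecs = np.linalg.eigh(sub)        # eigenvalues in ascending order
        score = np.abs(vecs[:, 0])           # eigenvector of the smallest one
        seed = candidates[int(np.argmax(score))]
        seeds.append(seed)
        # Exonerate: drop the seed and its neighbours from further rounds.
        cleared = {seed} | set(np.nonzero(A[seed])[0].tolist())
        candidates = [c for c in candidates if c not in cleared]
    return seeds
```

Using full-graph degrees makes infected nodes on the boundary less central, so on a path with nodes 1 to 9 infected the first seed is the middle node, 5.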
54
Optimizing the score. Optimizing R: just get the MLE ripple! Finally, use the MDL score to tell us the best set. NetSleuth: linear running time in nodes and edges.
55
Experiments. Evaluation functions: MDL based; overlap based (JD == Jaccard distance). Closer to 1 is better.
56
Experiments
58
Part 2: Empirical Studies. Q3: What do cascades look like? Q4: How does activity evolve over time?
59
Cascading Behavior in Large Blog Graphs. How does information propagate over the blogosphere? Figure: blogs, posts, links, and an information cascade. J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, M. Hurst. Cascading Behavior in Large Blog Graphs. SDM 2007.
60
Cascades on the Blogosphere. A cascade is the graph induced by a time-ordered propagation of information (edges). Figure: the blogosphere (blogs + posts; here blogs B1 to B4 with posts a to e), the blog network (links among blogs), the post network (links among posts), and the resulting cascades.
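One simple reading of the cascade definition: keep only post-to-post links that point back in time, then group the posts they connect. The sketch below does that with union-find; treating each connected group of two or more posts as one cascade is an assumption, and the SDM 2007 paper's extraction procedure may differ. `extract_cascades` is a hypothetical name.

```python
def extract_cascades(post_time, links):
    """Group posts into cascades: connected components of the post network
    built only from time-ordered links (the citing post is later)."""
    parent = {p: p for p in post_time}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for citing, cited in links:
        if post_time[citing] > post_time[cited]:  # time-ordered propagation only
            parent[find(citing)] = find(cited)

    groups = {}
    for p in post_time:
        groups.setdefault(find(p), set()).add(p)
    # Singletons (posts that neither cite nor are cited) are not cascades here.
    return sorted((sorted(g) for g in groups.values() if len(g) > 1), key=lambda g: g[0])
```

A link from an older post to a newer one is ignored, so it cannot merge two unrelated cascades.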
61
Blog data: 45,000 blogs participating in cascades; all their posts for 3 months (Aug-Sept ‘05); 2.4 million posts; ~5 million links (245,404 inside the dataset). Figure: number of posts vs. time (1-day bins).
62
Popularity over time. Does post popularity drop off exponentially? Figure: # in-links vs. lag (days after the post), for a post published @t receiving links @t + lag.
63
Popularity over time. Does post popularity drop off exponentially? No: POWER LAW! Exponent? Figure: # in-links (log) vs. days after post (log).
64
Popularity over time. Does post popularity drop off exponentially? No: POWER LAW! Exponent: -1.6, close to -1.5, as in Barabási's stack model and like the zero-crossings of a random walk. Figure: # in-links (log) vs. days after post (log), slope -1.6.
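A quick way to read an exponent like -1.6 off such a plot is a least-squares line fit in log-log space. This is a rough sketch (log-log regression is simple but can be biased on noisy tails, so maximum-likelihood estimators are usually preferred for careful work); `power_law_exponent` is a hypothetical name.

```python
import numpy as np

def power_law_exponent(lags, counts):
    """Slope of the least-squares line through (log lag, log count)."""
    slope, _intercept = np.polyfit(np.log(lags), np.log(counts), 1)
    return float(slope)
```

On noiseless data counts = 1000 · lag^-1.6, the fitted slope recovers -1.6.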
65
-1.5 slope. J. G. Oliveira & A.-L. Barabási. Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005).
66
Part 2: Empirical Studies. Q3: What do cascades look like? Q4: How does activity evolve over time?
67
Rise and fall patterns in social media. Meme (# of mentions in blogs): short phrases sourced from U.S. politics in 2008, e.g., “you can put lipstick on a pig”, “yes we can”.
68
Rise and fall patterns in social media. Can we find a unifying model that includes these patterns? Four classes on YouTube [Crane et al. ’08]; six classes on memes [Yang et al. ’11].
69
Rise and fall patterns in social media. Answer: YES! We can represent all patterns by a single model [Matsubara+ SIGKDD 2012].
70
Main idea: SpikeM. 1. Un-informed bloggers (uninformed about the rumor). 2. External shock at time $n_b$ (e.g., breaking news). 3. Infection (word-of-mouth). Infectiveness of a blog post at age n: $f(n) = \beta \, n^{-1.5}$, where β is the strength of infection (quality of news) and the power-law decay $n^{-1.5}$ captures how infective a blog posting remains. Figure: timeline at $n = 0$, $n = n_b$, and $n = n_b + 1$.
72
SpikeM with periodicity. Full equation of SpikeM (figure). Periodicity: bloggers change their activity over time (e.g., daily, weekly, yearly). Figure: activity vs. time n, with peak activity at 12pm and low activity at 3am.
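As a rough illustration, here is a sketch of the SpikeM recurrence as described in Matsubara et al. (SIGKDD 2012), written from memory: uninformed bloggers U(n), newly informed bloggers ΔB(n), an external shock S_b at n_b, infectiveness f(n) = β·n^-1.5, and an optional periodic activity factor p(n) with amplitude P_a, period P_p and phase shift P_s. Treat the exact form and the parameter names as assumptions to check against the paper; the clamp that keeps ΔB(n) ≤ U(n) is an added safeguard, not part of the published model.

```python
import math

def spikem(N, beta, n_b, S_b, steps, P_a=0.0, P_p=24, P_s=0, eps=0.0):
    """Return ΔB(n), the number of newly informed bloggers at each step n."""
    def f(age):                  # infectiveness of a post of age >= 1
        return beta * age ** -1.5

    def p(n):                    # periodic activity in [1 - P_a, 1]; 1 if P_a = 0
        return 1.0 - 0.5 * P_a * (math.sin(2 * math.pi / P_p * (n + P_s)) + 1.0)

    def shock(t):                # external shock only at time n_b
        return S_b if t == n_b else 0.0

    U = float(N)                 # un-informed bloggers
    dB = [0.0] * steps
    for n in range(n_b, steps - 1):
        # word-of-mouth pressure from all earlier posts, decayed by age
        pressure = sum((dB[t] + shock(t)) * f(n + 1 - t) for t in range(n_b, n + 1))
        new = p(n + 1) * (U * pressure + eps)
        new = min(new, U)        # added safeguard: cannot exceed remaining bloggers
        dB[n + 1] = new
        U -= new
    return dB
```

With P_a = 0 and eps = 0, the first response after the shock is N · S_b · β, and the tail then decays following the power law.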
73
Tail-part forecasts: SpikeM can capture the tail part.
74
“What-if” forecasting: e.g., given (1) the first spike, (2) the release date of two sequel movies, and (3) the access volume before the release date, what happens next? Figure panels: (1) first spike, (2) release date, (3) two weeks before release.
75
“What-if” forecasting: SpikeM can forecast not only the tail part, but also the rise part! SpikeM can forecast upcoming spikes. Figure panels: (1) first spike, (2) release date, (3) two weeks before release.
76
Outline: Motivation; Part 1: Policy and Action (Algorithms); Part 2: Learning Models (Empirical Studies); Conclusion.
77
Conclusions. Fast immunization: min/max drop in eigenvalue (NetGel, NetMelt). Finding culprits automatically: MDL + exoneration, linear-time algorithm. Bursts: SpikeM model, exponential growth and power-law decay.
78
Figure: Propagation on Networks at the intersection of ML & statistics, computer systems, theory & algorithms, biology, economics, social science, and engineering.
79
References
1. Winner-takes-all: Competing Viruses or Ideas on fair-play networks (B. Aditya Prakash, Alex Beutel, Roni Rosenfeld, Christos Faloutsos). In WWW 2012, Lyon.
2. Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks (B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos). In IEEE ICDM 2011, Vancouver. (Invited to KAIS Journal Best Papers of ICDM.)
3. Time Series Clustering: Complex is Simpler! (Lei Li, B. Aditya Prakash). In ICML 2011, Bellevue.
4. Epidemic Spreading on Mobile Ad Hoc Networks: Determining the Tipping Point (Nicholas Valler, B. Aditya Prakash, Hanghang Tong, Michalis Faloutsos, Christos Faloutsos). In IEEE NETWORKING 2011, Valencia, Spain.
5. Formalizing the BGP stability problem: patterns and a chaotic model (B. Aditya Prakash, Michalis Faloutsos, Christos Faloutsos). In IEEE INFOCOM NetSciCom Workshop, 2011.
6. On the Vulnerability of Large Graphs (Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad, Christos Faloutsos). In IEEE ICDM 2010, Sydney, Australia.
7. Virus Propagation on Time-Varying Networks: Theory and Immunization Algorithms (B. Aditya Prakash, Hanghang Tong, Nicholas Valler, Michalis Faloutsos, Christos Faloutsos). In ECML-PKDD 2010, Barcelona, Spain.
8. MetricForensics: A Multi-Level Approach for Mining Volatile Graphs (Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos, Leman Akoglu, Lei Li, Koji Maruhashi, B. Aditya Prakash, Hanghang Tong). In SIGKDD 2010, Washington D.C.
9. Parsimonious Linear Fingerprinting for Time Series (Lei Li, B. Aditya Prakash, Christos Faloutsos). In VLDB 2010, Singapore.
10. EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju, Christos Faloutsos). In PAKDD 2010, Hyderabad, India.
11. BGP-lens: Patterns and Anomalies in Internet-Routing Updates (B. Aditya Prakash, Nicholas Valler, David Andersen, Michalis Faloutsos, Christos Faloutsos). In ACM SIGKDD 2009, Paris, France.
12. Surprising Patterns and Scalable Community Detection in Large Graphs (B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju, Christos Faloutsos). In IEEE ICDM Large Data Workshop 2009, Miami.
13. FRAPP: A Framework for high-Accuracy Privacy-Preserving Mining (Shipra Agrawal, Jayant R. Haritsa, B. Aditya Prakash). In Intl. Journal on Data Mining and Knowledge Discovery (DMKD), Springer, vol. 18, no. 1, February 2009, Ed: Johannes Gehrke.
14. Complex Group-By Queries For XML (C. Gokhale, N. Gupta, P. Kumar, L. V. S. Lakshmanan, R. Ng, B. Aditya Prakash). In IEEE ICDE 2007, Istanbul, Turkey.
80
Understanding and Managing Cascades on Large Networks. B. Aditya Prakash, http://www.cs.vt.edu/~badityap. Sounds interesting? I am looking for Ph.D. students: drop me an email with your CV!