On the Vulnerability of Large Graphs


On the Vulnerability of Large Graphs
Joint work by Hanghang Tong, B. Aditya Prakash, Charalampos Tsourakakis, Tina Eliassi-Rad, Christos Faloutsos, Duen Horng (Polo) Chau
ICDM 2010, December 13-17, 2010, Sydney, Australia

Motivation: Immunization
Given: a graph A, a virus propagation model, and a budget k.
Find: the k ‘best’ nodes for immunization.
[Figure: example graph with nodes labeled 1 to 34]
SARS cost 700+ lives and $40+ bn; H1N1 cost Mexico $2.3 bn.

Challenges
Given a graph A and a budget k:
Q1 (Measure): how do we measure the ‘shield-value’ of a subset of nodes S?
Q2 (Algorithm): how do we find a subset of k nodes with the highest ‘shield-value’?

Roadmap
Motivation
(Q1) ‘Shield-value’
(Q2) Netshield Algorithm
Experimental Results
Conclusion

Background: SIS Virus Model [Chakrabarti+ 2008]
‘Flu’-like: Susceptible-Infectious-Susceptible.
If the virus ‘strength’ s < 1/λ, an epidemic cannot happen.
Intuition: s ≈ # of sneezes before healing; the largest eigenvalue λ ≈ # of edges/paths.
λ indicates the vulnerability of the whole graph A!
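
The threshold condition on this slide can be checked numerically. The following is a minimal sketch, not code from the talk: the helper names `largest_eigenvalue` and `below_threshold` and the two example graphs are hypothetical, and λ is estimated with plain power iteration on A + I (the shift is an implementation choice here, so the iteration also settles on bipartite graphs such as a star).

```python
def largest_eigenvalue(adj, iters=500):
    """Estimate the largest eigenvalue of a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    u = [1.0 / n ** 0.5] * n
    for _ in range(iters):
        # One power-iteration step on (A + I).
        v = [u[i] + sum(adj[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        u = [x / norm for x in v]
    # The Rayleigh quotient u^T A u gives the eigenvalue of A itself.
    return sum(u[i] * adj[i][j] * u[j] for i in range(n) for j in range(n))


def below_threshold(adj, strength):
    """True when strength < 1 / lambda, i.e. the slide's no-epidemic condition."""
    return strength * largest_eigenvalue(adj) < 1.0


# Hypothetical example graphs: a 4-clique and a star with one hub and four leaves.
clique = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
star = [[1 if (i == 0) != (j == 0) else 0 for j in range(5)] for i in range(5)]

print(largest_eigenvalue(clique), below_threshold(clique, 0.4))
print(largest_eigenvalue(star), below_threshold(star, 0.4))
```

The clique has the larger λ, so the same virus strength that stays below the threshold on the star can exceed it on the clique.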

Eigen-Drop: an Ideal ‘shield-value’?
Eigen-Drop(S) = $\lambda - \lambda_s$
[Figure: two panels. Original graph: $\lambda$. Same graph without nodes {2, 6}: $\lambda_s$.]

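Why not Use Eigen-Drop as ‘Shield-Value’?
Goal: select the k nodes whose absence creates the largest drop in λ, i.e. the largest $\lambda - \lambda_s$, where $\lambda_s$ is the largest eigenvalue without the subset of nodes S.
Cost: exhaustive search needs one eigenvalue computation for each of the $\binom{n}{k}$ candidate subsets.
Example: 1,000 nodes with 10,000 edges. It takes 0.01 seconds to compute λ, so it takes 2,615 years to find the best 5 nodes!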
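
The 2,615-year figure can be reproduced with a back-of-the-envelope calculation. This sketch assumes the cost is one eigenvalue computation (0.01 seconds, from the slide) for every 5-node subset of the 1,000 nodes; the variable names are illustrative only.

```python
import math

n, k = 1000, 5                  # graph size and budget from the slide's example
seconds_per_eigenvalue = 0.01   # time for one computation of lambda, from the slide

subsets = math.comb(n, k)       # number of candidate k-node subsets
total_seconds = subsets * seconds_per_eigenvalue
years = total_seconds / (365 * 24 * 3600)

print(f"{subsets:,} subsets, roughly {years:,.0f} years")
```

The result lands on the order of 2,600 years, which matches the slide's number up to rounding and the choice of year length.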

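Our ‘Shield-Value’: Approx Eigen-Drop
Theorem: (1) $\lambda - \lambda_s \approx Sv(S) = \sum_{i \in S} 2\lambda\, u(i)^2 - \sum_{i,j \in S} A(i,j)\, u(i)\, u(j)$
Here u is the eigenvector for λ, i.e. $A u = \lambda u$, and u(i) is the eigen-score of node i.
Footnote: u(i) ~ PageRank(i) ~ in-degree(i)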
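
To make the approximation concrete, the sketch below evaluates Sv(S) from the formula on this slide and compares it with the exact eigen-drop on a tiny graph. It is an illustration under stated assumptions, not the authors' code: the function names, the power-iteration eigen-solver, and the star graph are all hypothetical choices, and u is normalized to unit length.

```python
def leading_eigenpair(adj, iters=500):
    """Largest eigenvalue and unit-norm eigenvector of a symmetric 0/1 matrix."""
    n = len(adj)
    if n == 0:
        return 0.0, []
    u = [1.0 / n ** 0.5] * n
    for _ in range(iters):
        # Power iteration on (A + I) so that bipartite graphs also converge.
        v = [u[i] + sum(adj[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        u = [x / norm for x in v]
    lam = sum(u[i] * adj[i][j] * u[j] for i in range(n) for j in range(n))
    return lam, u


def shield_value(adj, nodes):
    """Sv(S) = sum_i 2*lam*u(i)^2 - sum_{i,j} A(i,j)*u(i)*u(j), with i, j in S."""
    lam, u = leading_eigenpair(adj)
    own = sum(2 * lam * u[i] ** 2 for i in nodes)
    overlap = sum(adj[i][j] * u[i] * u[j] for i in nodes for j in nodes)
    return own - overlap


def eigen_drop(adj, nodes):
    """Exact drop in the largest eigenvalue after deleting the given nodes."""
    keep = [i for i in range(len(adj)) if i not in nodes]
    sub = [[adj[i][j] for j in keep] for i in keep]
    return leading_eigenpair(adj)[0] - leading_eigenpair(sub)[0]


# Hypothetical example: a star with hub 0 and leaves 1..4.
star = [[1 if (i == 0) != (j == 0) else 0 for j in range(5)] for i in range(5)]

for nodes in ([0], [1]):
    print(nodes, shield_value(star, nodes), eigen_drop(star, nodes))
```

On this graph the two quantities coincide when the hub is deleted and differ for a single leaf, which is what one would expect from an approximation rather than an identity.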

Intuition of the Proposed ‘Shield-value’
Theorem (Tong+ 2009): (1) $\lambda - \lambda_s \approx Sv(S) = \sum_{i \in S} 2\lambda\, u(i)^2 - \sum_{i,j \in S} A(i,j)\, u(i)\, u(j)$
Find a set of nodes S (e.g. k = 4) whose members (C1) each have a high eigen-score and (C2) are diverse among themselves.
[Figure: three panels: Original Graph; Select by C1; Select by C1+C2]

The Proposed Alg.: Netshield
Theorem: (1) $\lambda - \lambda_s \approx Sv(S) = \sum_{i \in S} 2\lambda\, u(i)^2 - \sum_{i,j \in S} A(i,j)\, u(i)\, u(j)$; (2) Sv(S) is sub-modular.
Corollary: (3) Netshield is near-optimal (wrt max Sv(S)); (4) Netshield is $O(nk^2 + m)$.
Example: 1,000 nodes with 10,000 edges. Netshield takes < 0.1 seconds to find the best 5 nodes, as opposed to 2,615 years.
Footnote: near-optimal means $Sv(S_{Netshield}) \ge (1 - 1/e)\, Sv(S_{Opt})$
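
A greedy maximizer of Sv(S) can be sketched in a few lines. This is an assumption-laden illustration of the greedy idea, not the published Netshield implementation: the names `greedy_shield` and `from_edges`, the dense-matrix representation, and the two-hub example graph are hypothetical, and the sketch does not attempt to reach the O(nk² + m) bound stated on the slide.

```python
def leading_eigenpair(adj, iters=500):
    """Largest eigenvalue and unit-norm eigenvector of a symmetric 0/1 matrix."""
    n = len(adj)
    u = [1.0 / n ** 0.5] * n
    for _ in range(iters):
        # Power iteration on (A + I) so that bipartite graphs also converge.
        v = [u[i] + sum(adj[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        u = [x / norm for x in v]
    lam = sum(u[i] * adj[i][j] * u[j] for i in range(n) for j in range(n))
    return lam, u


def greedy_shield(adj, k):
    """Pick k nodes, each time adding the node with the largest marginal gain in Sv."""
    lam, u = leading_eigenpair(adj)
    n = len(adj)
    chosen = []
    for _ in range(min(k, n)):
        best, best_gain = None, float("-inf")
        for j in range(n):
            if j in chosen:
                continue
            # Sv(S + {j}) - Sv(S): the node's own term minus its interaction with S.
            own = (2 * lam - adj[j][j]) * u[j] ** 2
            interaction = sum((adj[i][j] + adj[j][i]) * u[i] * u[j] for i in chosen)
            if own - interaction > best_gain:
                best, best_gain = j, own - interaction
        chosen.append(best)
    return chosen


def from_edges(n, edges):
    """Build a symmetric adjacency matrix from an undirected edge list."""
    adj = [[0] * n for _ in range(n)]
    for a, b in edges:
        adj[a][b] = adj[b][a] = 1
    return adj


# Hypothetical example: hubs 0 and 4, three leaves each, joined by the edge (0, 4).
two_hubs = from_edges(8, [(0, 1), (0, 2), (0, 3), (4, 5), (4, 6), (4, 7), (0, 4)])
print(sorted(greedy_shield(two_hubs, 2)))
```

After the first hub is chosen, the interaction term lowers the gain of its neighbours, so the selection moves on to the other hub rather than to a leaf of the first.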

Why is Netshield Near-Optimal?
[Figure: two copies of a 10-node graph. Left: benefit of deleting {1,2}, with marginal benefit δ of deleting {5,6}. Right: benefit of deleting {1,2,3,4}, with marginal benefit δ of deleting {5,6}.]
Sub-modular (i.e., diminishing returns): δ (left) >= δ (right)

Why is Netshield Near-Optimal?
Sub-modular (i.e., diminishing returns): δ (left) >= δ (right)
Theorem: a k-step greedy algorithm to maximize a monotone sub-modular function guarantees a (1-1/e)-optimal solution [Nemhauser+ 78]

T4: Why is Sv(S) sub-modular?
[Figure: 10-node graph with some nodes marked ‘newly deleted’ and others ‘already deleted’]

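T4: Why is Sv(S) sub-modular?
Marginal benefit of deleting {5,6} (newly deleted, blue) when {1,2} are already deleted (green):
$Sv(S_{green} \cup S_{blue}) - Sv(S_{green}) = \left[ \sum_{i \in S_{blue}} 2\lambda\, u(i)^2 - \sum_{i,j \in S_{blue}} A(i,j)\, u(i)\, u(j) \right] - \left[ \sum_{i \in S_{blue},\, j \in S_{green}} A(i,j)\, u(i)\, u(j) + \sum_{i \in S_{green},\, j \in S_{blue}} A(i,j)\, u(i)\, u(j) \right]$
The first bracket is the pure benefit from {5,6}; the second bracket (the purple term) is the interaction between {5,6} and {1,2}.
Only the purple term depends on {1,2}!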

T4: Why is Sv(S) sub-modular?
[Figure: two copies of the 10-node graph, with more green nodes on the right]
Marginal Benefit = Blue - Purple
More Green => More Purple => Less Red
Marginal Benefit of Left >= Marginal Benefit of Right
Footnote: green nodes are already deleted; the blue nodes {5,6} are the nodes to be deleted
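
The diminishing-returns argument can be checked numerically on a small graph. The sketch below is illustrative only: the 8-node graph, the 0-based node labels, and the function names are hypothetical and do not reproduce the graph in the slide's figure. It compares the marginal gain of deleting the same two nodes on top of a small and a larger already-deleted set.

```python
def leading_eigenpair(adj, iters=500):
    """Largest eigenvalue and unit-norm eigenvector of a symmetric 0/1 matrix."""
    n = len(adj)
    u = [1.0 / n ** 0.5] * n
    for _ in range(iters):
        # Power iteration on (A + I) so that bipartite graphs also converge.
        v = [u[i] + sum(adj[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        u = [x / norm for x in v]
    lam = sum(u[i] * adj[i][j] * u[j] for i in range(n) for j in range(n))
    return lam, u


def marginal_gain(adj, lam, u, new, deleted):
    """Sv(deleted + new) - Sv(deleted); `new` and `deleted` must be disjoint lists."""
    def sv(nodes):
        own = sum(2 * lam * u[i] ** 2 for i in nodes)
        overlap = sum(adj[i][j] * u[i] * u[j] for i in nodes for j in nodes)
        return own - overlap
    return sv(deleted + new) - sv(deleted)


# Hypothetical 8-node graph: a ring plus a few chords.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 0),
         (0, 4), (1, 5), (2, 4), (3, 5)]
adj = [[0] * 8 for _ in range(8)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1

lam, u = leading_eigenpair(adj)
gain_small = marginal_gain(adj, lam, u, new=[4, 5], deleted=[0, 1])
gain_large = marginal_gain(adj, lam, u, new=[4, 5], deleted=[0, 1, 2, 3])
print(gain_small, gain_large)
```

The gain on top of the larger deleted set is smaller because more edges connect the new nodes to already-deleted ones, which enlarges the interaction term.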

Experiment: Quality of Netshield
[Figure: quality vs. # of vaccines; plot labels: Optimal, Netshield, Eig-Drop, (1-1/e) x Optimal]

Experiment: Comparison of Immunization
[Figure: log(fraction of infected nodes) vs. time epoch; methods: Abnormality, PageRank, Between (short), Degree, Between (RW), Acquaintance, Eigs (=HITS), Netshield]

Experiment: Speed of Netshield
[Figure: time in seconds (log scale, 0.01 to >= 1,000,000) vs. # of vaccines; methods: Com-Eigs, Com-Eval, Netshield; annotations: > 10 days, 0.1 seconds, 10,000,000x]
NIPS co-authorship network: 3K nodes, 15K edges

Experiment: Scalability of Netshield
[Figure: time vs. # of edges (x 10^8)]

Conclusion
Problem definition (Immunization): given a graph A, a virus propagation model, and a budget k, find the k ‘best’ nodes for immunization.
Our contributions:
Measure (‘shield-value’): Approx Eigen-Drop
Algorithm: scalable and near-optimal