Published by Lesly Plume. Modified over 9 years ago.
1
DAVA: Distributing Vaccines over Networks under Prior Information
Yao Zhang, B. Aditya Prakash
Department of Computer Science, Virginia Tech
SDM, Philadelphia, April 24, 2014
Zhang and Prakash, SDM 2014
2
Motivation: Epidemiology
Virus spreads over contact networks
SIR model [Anderson+ 1991]: Susceptible-Infectious-Recovered
  Weights pij: propagation probability from i to j
  Recovery probability δ for each node
  (models mumps-like infections)
3
Motivation: Social Media
Meme/rumor spreads over friendship networks
  E.g.: Twitter following network
Independent cascade (IC) model [Kempe+ KDD 2003]
  Each node has only one chance to infect its neighbors
  Special case of the SIR model
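The "one chance per neighbor" rule of the IC model can be made concrete with a short simulation. This is a minimal sketch, not the authors' code: the adjacency format and the function names `simulate_ic` and `expected_infected` are assumptions made for illustration.

```python
import random
from collections import deque

def simulate_ic(graph, seeds, rng):
    """One independent-cascade run; returns the set of nodes ever infected.

    graph maps each node to a list of (neighbor, probability) pairs
    (an assumed format, chosen for this sketch).
    """
    infected = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        # u gets exactly one attempt on each still-healthy neighbor.
        for v, p in graph.get(u, []):
            if v not in infected and rng.random() < p:
                infected.add(v)
                frontier.append(v)
    return infected

def expected_infected(graph, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the expected final number of infected nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs))
    return total / runs
```

Averaging many runs gives an estimate of the expected epidemic size, which is the quantity the vaccination problem later tries to minimize.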
4
Immunization
Centers for Disease Control (CDC) cares about containing epidemic diseases
  E.g.: ~400 million dollars used for vaccines for children in 2013
Twitter tries to stop rumor spread
  E.g.: rumors of victims after the Boston Marathon bombing in 2013
How to choose the best nodes to vaccinate (remove)?
5
Immunization
Pre-emptive immunization (choose nodes before the epidemic starts)
  Acquaintance strategy [Cohen+ 2003]: pick a random person, immunize one of its neighbors at random
  Netshield [Tong+ 2010]: minimize the epidemic threshold (the point at which the virus takes off)
Good as baseline strategies
6
In reality
Pre-emptive immunization (choose nodes before the epidemic starts)
  Acquaintance strategy [Cohen+ 2003]
  Netshield [Tong+ 2010]
But typically the epidemic has already started!
  More realistic intervention: which nodes to vaccinate now?
  We call it Data-Aware Immunization (this paper)
7
Outline
  Motivation
  Problem Definition
  Complexity
  Our Proposed Methods
  Experiments
  Conclusion
8
Data-Aware Vaccination Problem
Problem: Given a set of infected nodes and a contact graph, how should k vaccines be distributed (node removal) to minimize the expected number of infected nodes at the end of the epidemic?
Example (figure: graph with nodes A to F, pij = 1 for all edges): where to place 1 vaccine?
  Remove A, save {A, D}
  Remove B, save {B}
  Remove C, save {C}
  Best solution: remove A, which saves the most nodes
10
Complexity of DAV
NP-hard (see paper for details)
  Reduction from the Maximum K-Intersection problem (MaxKI: maximizing the intersection of k subsets)
  MaxKI is NP-complete [Vinterbo 2004]
Approximation algorithm?
  Not submodular
  Actually, DAV is hard to approximate within an absolute error!
11
Our Proposed Methods
  (assume IC model and undirected graph)
12
1: Simplify - Merging infected nodes
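Idea: merge all the infected nodes into a single 'super infected' node I
Figure: original graph and the equivalent merged graph with super node I
  Nodes A and C keep their edge probabilities pA and pC
  Node B, reached with probabilities pX and pY in the original graph, gets a single edge via logical OR: pB = 1 - (1 - pX)(1 - pY)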
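The merging step can be sketched as follows. The dict-of-dicts graph format, the function name `merge_infected`, and the assumption that no existing node is already called "I" are illustrative choices, not taken from the paper.

```python
def merge_infected(graph, infected, super_node="I"):
    """Collapse all infected nodes into one super node.

    graph is an undirected weighted graph stored as graph[u][v] = p_uv
    (with graph[v][u] present as well). Edges from several infected
    nodes to the same healthy node are combined with a logical OR.
    """
    infected = set(infected)
    merged = {super_node: {}}
    for u, nbrs in graph.items():
        if u in infected:
            continue  # infected nodes are absorbed into the super node
        merged.setdefault(u, {})
        for v, p in nbrs.items():
            if v in infected:
                # OR-combine with any probability already accumulated.
                prev = merged[u].get(super_node, 0.0)
                combined = 1.0 - (1.0 - prev) * (1.0 - p)
                merged[u][super_node] = combined
                merged[super_node][u] = combined
            else:
                merged[u][v] = p  # healthy-healthy edges are unchanged
    return merged
```

For example, a healthy node with two infected neighbors at probability 0.5 each ends up with a single edge of probability 1 - 0.5 * 0.5 = 0.75 to the super node.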
13
2: DAVA-Tree Algorithm: Idea
Select nodes with the largest "benefit"
  Benefit of a set S: the expected number of saved nodes after removing set S on graph G
  Benefit of adding an additional node j into S: the additional number of saved nodes when adding node j into S
Figure: neighbors of the merged infected node labeled Benefit: 5, Benefit: 4, and Benefit: 2 (pij = 1 for all edges)
14
DAVA-Tree Alg.: Optimal on Trees
For any set S:
  Fact 1: the chosen nodes in the optimal set must be neighbors of the infected node I
  Fact 2: the benefit of each such node is independent of the rest of the set S
DAVA-tree algorithm: select the top k nodes among I's neighbors with the maximum benefit
  Linear time
Figure: merged infected node with neighbors labeled Benefit: 5, Benefit: 4, and Benefit: 2 (pij = 1 for all edges)
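A minimal sketch of the selection on a tree, under the assumption that a neighbor's benefit is the sum, over its subtree, of each node's probability of being reached from I (the product of edge probabilities along the unique path). The input format and the names `subtree_benefit` and `dava_tree` are hypothetical.

```python
import heapq

def subtree_benefit(children, prob, parent, node):
    """Expected number of infections in node's subtree, i.e. nodes saved by removing it."""
    total = 0.0
    stack = [(node, prob[(parent, node)])]
    while stack:
        u, reach = stack.pop()
        total += reach  # probability that u is reached from the root
        for c in children.get(u, []):
            stack.append((c, reach * prob[(u, c)]))
    return total

def dava_tree(children, prob, root, k):
    """Pick the k neighbors of the root with the largest benefit.

    children maps a node to its child list; prob maps (parent, child) to p.
    """
    benefits = {j: subtree_benefit(children, prob, root, j)
                for j in children.get(root, [])}
    selected = heapq.nlargest(k, benefits, key=benefits.get)
    return selected, benefits
```

With pij = 1 everywhere, the benefit of a neighbor is simply the size of its subtree, which reproduces benefit values like the 5, 4, and 2 on the slide for subtrees of those sizes.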
15
3: General Case – Arbitrary Graphs
Idea
  We have the optimal algorithm for a tree
  Extract a spanning tree, then run DAVA-tree
What kind of tree?
  Minimum spanning tree?
Figure (pij = 1 for all edges): MST, with labels "Optimal on MST by DAVA-tree" and "Optimal solution"
16
3: General Case – Arbitrary Graphs
Idea
  We have the optimal algorithm for a tree
  Build a spanning tree first
What kind of tree?
  Minimum spanning tree?
  We propose to use the dominator tree (from software engineering)
u dominates v: every path from I to v contains u
  E.g.: 4 dominates 8, 9, 10, 11 (pij = 1 for all edges)
Dom captures the 'closeness' of nodes to the infectious nodes, and the importance of saving nodes.
17
Dominator Tree
u is the immediate dominator of v: u dominates v AND every other dominator of v dominates u
Dominator tree: add an edge between every such u and v
  Linear time [Buchsbaum, Tarjan 1998]
Fact 1: for any arbitrary graph, the optimal solution should be among the children of root I in the dominator tree
Fact 2: (for the special case k = 1, p = 1) running DAVA-tree on the dominator tree gives the optimal solution
Figure: merged graph and its dominator tree (pij = 1 for all edges), with labels "Optimal from DAVA-tree" and "Optimal solution"
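The definitions of "dominates" and "immediate dominator" can be checked on small graphs with the sketch below. It uses a simple iterative set-intersection method, which is not the linear-time algorithm cited on the slide; the function names and adjacency format are assumptions.

```python
def dominators(succ, root):
    """Return dom[v] = set of nodes lying on every path from root to v."""
    # Collect the nodes reachable from the root.
    order, seen, stack = [], {root}, [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in succ.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    preds = {v: [] for v in order}
    for u in order:
        for v in succ.get(u, []):
            preds[v].append(u)
    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {v: set(order) for v in order}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for v in order:
            if v == root:
                continue
            new = set.intersection(*(dom[p] for p in preds[v])) | {v}
            if new != dom[v]:
                dom[v] = new
                changed = True
    return dom

def immediate_dominators(succ, root):
    """Map each non-root node to its immediate dominator (its dominator-tree parent)."""
    dom = dominators(succ, root)
    idom = {}
    for v, ds in dom.items():
        if v != root:
            # Strict dominators form a chain; the closest one has the largest dom set.
            idom[v] = max(ds - {v}, key=lambda u: len(dom[u]))
    return idom
```

For an undirected graph, list each edge in both directions in `succ`. The resulting `idom` map gives the parent of every node in the dominator tree rooted at I.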
18
Weighting the dominator tree
#P-complete
Our solution: maximum propagation path probability between nodes I and v (using Dijkstra's algorithm)
Figure: merged graph with edge probabilities p1, p3, p6 and dominator tree with weights w1, w3, w6
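The maximum propagation path probability can be computed with Dijkstra's algorithm by using -log(p) as the edge length, since maximizing a product of probabilities is the same as minimizing the sum of their negative logs. The sketch below covers only this path computation; how the paper turns these values into dominator-tree weights is not shown, and the function name and graph format are assumptions.

```python
import heapq
import itertools
import math

def max_path_probability(graph, source):
    """Best path probability from source to every reachable node.

    graph is stored as graph[u][v] = p_uv with 0 < p_uv <= 1.
    """
    cost = {source: 0.0}        # cost = -log(best probability found so far)
    tie = itertools.count()     # tie-breaker so heap never compares nodes
    heap = [(0.0, next(tie), source)]
    done = set()
    while heap:
        c, _, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, p in graph.get(u, {}).items():
            if p <= 0.0:
                continue  # an edge that can never transmit
            new_cost = c - math.log(p)
            if new_cost < cost.get(v, math.inf):
                cost[v] = new_cost
                heapq.heappush(heap, (new_cost, next(tie), v))
    return {v: math.exp(-c) for v, c in cost.items()}
```

Because every edge length -log(p) is non-negative, the usual correctness argument for Dijkstra's algorithm applies unchanged.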
19
DAVA algorithm
Steps:
  1. T = Build a dominator tree
  2. v = Run DAVA-tree on T with budget = 1
  3. Remove v from G
  4. Go to Step 1 until |S| = k
Figure: merged graph (pij = 1 for all edges) and dominator tree, with labels "|S|=2", "Iteration=1", "Not finished"
20
DAVA algorithm
Steps:
  1. T = Build a dominator tree
  2. v = Run DAVA-tree on T with budget = 1
  3. Remove v from G
  4. Go to Step 1 until |S| = k
Running time: O(k(|E| + |V| log |V|))
  Too slow for large networks!
Figure: merged graph and dominator tree after removing the selected node, with labels "|S|=2", "Iteration=1", "Iteration=2", "Not finished"
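The outer loop of Steps 1 to 4 can be sketched for the special case pij = 1. Note the substitution: instead of building a dominator tree, the hypothetical helper `best_single_node` counts by plain reachability how many nodes each candidate cuts off from I. For pij = 1 this yields the same kind of benefit as a dominator-subtree size, but it is much slower and is only meant to show the pick, remove, recompute structure. Node labels are assumed to be sortable (e.g. strings).

```python
def reachable(graph, root, removed=frozenset()):
    """Nodes reachable from root when the nodes in removed are deleted."""
    seen, stack = {root}, [root]
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            if v not in seen and v not in removed:
                seen.add(v)
                stack.append(v)
    return seen

def best_single_node(graph, root, removed):
    """Stand-in for Steps 1-2: the node whose removal saves the most nodes."""
    base = reachable(graph, root, removed)
    best, best_saved = None, 0
    for v in sorted(base - {root}):
        saved = len(base) - len(reachable(graph, root, removed | {v}))
        if saved > best_saved:
            best, best_saved = v, saved
    return best

def dava_p1(graph, root, k):
    """Greedy loop: pick one node, remove it, recompute, until k are chosen."""
    removed, selected = set(), []
    while len(selected) < k:
        v = best_single_node(graph, root, removed)
        if v is None:
            break  # nothing left to save
        selected.append(v)
        removed.add(v)  # Step 3: remove v from G
    return selected
```

Recomputing after every removal is what makes the cost grow with k, which is the motivation for a single-pass variant.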
21
DAVA-fast: a faster algorithm
Steps:
  1. T = Build a dominator tree
  2. S = Run DAVA-tree on T with budget = k
In practice, the performance of DAVA-fast is very close to DAVA
Time complexity: subquadratic!
  DAVA-fast: O(|V| log |V| + |E|)
Figure: merged graph and dominator tree, with labels "|S|=2", "Not finished"
22
Extending to SIR model
See the paper
24
Experiments
Virus propagation models
  IC and SIR
Settings (see more settings in the paper)
  Initial infected nodes chosen uniformly at random
Baseline algorithms
  RANDOM: healthy nodes chosen uniformly at random
  DEGREE: choose nodes with top weighted degrees
  PAGERANK: choose nodes with top PageRank scores
  NETSHIELD: state-of-the-art pre-emptive immunization algorithm that minimizes the epidemic threshold of the graph [Tong+ ICDM 2010]
    Assumes no data is given before the epidemic starts
25
Experiments: datasets
Datasets are chosen from different domains
Social media (IC model)
  OREGON: AS router graph
  STANFORD: hyperlink network
  GNUTELLA: peer-to-peer network
  BRIGHTKITE: friendship network
Epidemiology (SIR model)
  PORTLAND and MIAMI: large urban social-contact graphs used in national smallpox modeling studies [Eubank+, 2004]

Dataset      |V|           |E|
OREGON       633           2,172
STANFORD     8,929         53,829
GNUTELLA     10,876        39,994
BRIGHTKITE   58,228        214,078
PORTLAND     0.5 million   1.6 million
MIAMI        0.6 million   2.1 million
26
Experiments: Quality
Figure panels: GNUTELLA (IC model), PORTLAND (SIR model). Higher is better.
DAVA consistently outperforms the baseline algorithms. Further, DAVA-fast performs almost as well as DAVA.
(See more results in the paper)
27
Experiments: Scalability
Figure: running time (sec.) of each algorithm; lower is better. Some entries are marked "did not finish within 10 hours".
29
Conclusion
Data-Aware Vaccination problem
  Given: graph and infected nodes
  Find: 'best' nodes for immunization
Complexity
  NP-hard
  Hard to approximate within an absolute error
DAVA-tree
  Optimal solution on the tree
DAVA and DAVA-fast
  Merging infected nodes
  Build a dominator tree, and run DAVA-tree
  Running time: subquadratic
    DAVA: O(k(|E| + |V| log |V|))
    DAVA-fast: O(|E| + |V| log |V|)
Figure: graph with infected nodes, merged graph, dominator tree
30
Any Questions?
Code at: http://people.cs.vt.edu/~yaozhang
Yao Zhang, B. Aditya Prakash
Thanks for the support of NSF (Grant No. IIS ).