DAVA: Distributing Vaccines over Networks under Prior Information

1 DAVA: Distributing Vaccines over Networks under Prior Information
Yao Zhang, B. Aditya Prakash
Department of Computer Science, Virginia Tech
SDM, Philadelphia, April 24, 2014
Zhang and Prakash, SDM 2014

2 Motivation: Epidemiology
Viruses spread over contact networks.
SIR model (Susceptible-Infectious-Recovered) [Anderson+ 1991]:
Weights pij: propagation probability from node i to node j.
Recovery probability δ for each node (models mumps-like infections).

3 Motivation: Social Media
Memes and rumors spread over friendship networks, e.g., the Twitter following network.
Independent cascade (IC) model [Kempe+ KDD 2003]: each node has only one chance to infect its neighbors.
The IC model is a special case of the SIR model.
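The one-chance rule of the IC model can be sketched in a few lines. This is a minimal illustration, not code from the paper: the function name simulate_ic, the dictionary-based graph format, and the toy graph are all assumptions made for the example.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one independent-cascade simulation; return the set of infected nodes.

    graph: dict mapping node -> {neighbor: propagation probability p_ij}
           (hypothetical format chosen for this sketch).
    """
    infected = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            # Each newly infected node gets exactly one attempt per neighbor.
            for v, p in graph.get(u, {}).items():
                if v not in infected and rng.random() < p:
                    infected.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return infected


# Toy graph: probabilities 1.0 and 0.0 make the outcome deterministic.
toy = {"a": {"b": 1.0, "c": 0.0}, "b": {"d": 1.0}, "c": {}, "d": {}}
result = simulate_ic(toy, ["a"], random.Random(0))
print(sorted(result))
```

Averaging the size of the returned set over many runs gives a Monte Carlo estimate of the expected number of infected nodes.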

4 Immunization
The Centers for Disease Control (CDC) cares about containing epidemic diseases, e.g., about 400 million dollars were used for vaccines for children in 2013.
Twitter tries to stop rumors from spreading, e.g., rumors about victims after the Boston Marathon bombing in 2013.
How should we choose the best nodes to vaccinate (remove)?

5 Immunization
Pre-emptive immunization (choose nodes before the epidemic starts):
Acquaintance strategy [Cohen+ 2003]: pick a random person, then immunize one of its neighbors at random.
Netshield [Tong+ 2010]: minimize the epidemic threshold (the point at which the virus takes off).
Good for baseline strategies.
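The acquaintance strategy is simple enough to sketch directly. The code below is only an illustration under assumptions: the function name, the adjacency-list format, and the rule of repeating until k distinct nodes are found are choices made here, not details taken from [Cohen+ 2003].

```python
import random


def acquaintance_immunization(graph, k, rng):
    """Pick up to k distinct nodes: choose a random person, then one random neighbor.

    graph: dict mapping node -> list of neighbors (hypothetical format).
    """
    people = sorted(u for u in graph if graph[u])
    reachable = {v for u in people for v in graph[u]}
    target = min(k, len(reachable))  # cannot pick more nodes than can ever be selected
    chosen = set()
    while len(chosen) < target:
        person = rng.choice(people)
        chosen.add(rng.choice(sorted(graph[person])))
    return chosen


# Star graph: a random person is usually a leaf, and a leaf's only neighbor is the hub.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
picked = acquaintance_immunization(star, 1, random.Random(0))
everyone = acquaintance_immunization(star, 10, random.Random(1))
```

The star graph shows why the strategy works without global knowledge: high-degree nodes are neighbors of many people, so they are selected more often than under uniform sampling.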

6 In reality
Pre-emptive immunization (choose nodes before the epidemic starts): Acquaintance strategy [Cohen+ 2003], Netshield [Tong+ 2010].
But typically the epidemic has already started!
A more realistic intervention: which nodes should we vaccinate now?
We call it Data-Aware Immunization (this paper).

7 Outline
Motivation, Problem Definition, Complexity, Our Proposed Methods, Experiments, Conclusion

8 Data-Aware Vaccination Problem
Problem: Given a set of infected nodes and a contact graph, how should we distribute k vaccines (node removals) to minimize the expected number of infected nodes at the end of the epidemic?
Example with 1 vaccine (figure: graph with nodes A to F, pij = 1 for all edges): removing A saves {A, D}; removing B saves {B}; removing C saves {C}. Removing A is the best solution.
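With pij = 1 on every edge, the final infected set is just the set of nodes reachable from the infected nodes, so the one-vaccine example can be checked by brute force. The sketch below uses a hypothetical graph (the slide's figure is not recoverable from the transcript) that was built to reproduce the stated outcomes; the function names and the seed node "I" are also assumptions.

```python
from collections import deque


def final_infected(adj, seeds, removed=()):
    """Nodes reachable from the seeds when vaccinated (removed) nodes block the spread."""
    blocked = set(removed)
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen and v not in blocked:
                seen.add(v)
                queue.append(v)
    return seen


def best_single_vaccine(adj, seeds):
    """Try every healthy node as the single vaccine and report what each one saves."""
    baseline = final_infected(adj, seeds)
    healthy = sorted(set(adj) - set(seeds))
    saved = {v: baseline - final_infected(adj, seeds, [v]) for v in healthy}
    best = max(healthy, key=lambda v: len(saved[v]))
    return best, saved


# Hypothetical undirected graph with one infected node "I".
edges = [("I", "A"), ("A", "D"), ("I", "B"), ("I", "C"),
         ("B", "E"), ("C", "E"), ("B", "F"), ("C", "F")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

best, saved = best_single_vaccine(adj, ["I"])
print(best, sorted(saved[best]))
```

Brute force like this needs one graph search per candidate, and the number of candidate sets grows combinatorially with k, which is why the rest of the talk looks for something smarter.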

9 Outline
Motivation, Problem Definition, Complexity, Our Proposed Methods, Experiments, Conclusion

10 Complexity of DAV
NP-hard: reduction from the Maximum K-Intersection problem (MaxKI: maximizing the intersection of k subsets). MaxKI is NP-complete [Vinterbo 2004].
Approximation algorithm? The problem is not submodular. In fact, DAV is hard to approximate within an absolute error!
See the paper for details.

11 Outline
Motivation, Problem Definition, Complexity, Our Proposed Methods (assuming the IC model and an undirected graph), Experiments, Conclusion

12 1: Simplify - Merging infected nodes
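Idea: merge all the infected nodes into a single ‘super infected’ node I. The merged graph is equivalent to the original graph.
(Figure: original graph and merged graph with super node I and healthy nodes A, B, C, reached with probabilities pA, pB, pC.)
When two infected nodes reach B with probabilities pX and pY, the merged edge is their logical OR: pB = 1 - (1 - pX)(1 - pY).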
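The merging step can be sketched as follows. This is an illustration under assumptions rather than the authors' code: the graph format, the function name merge_infected, the node names X and Y, and the example probabilities are all made up for the sketch. Only the logical-OR rule comes from the slide.

```python
def merge_infected(graph, infected, super_node="I"):
    """Collapse all infected nodes into one super node.

    graph: dict node -> {neighbor: p}, with undirected edges stored in both directions.
    Parallel edges into the same healthy node are combined with a logical OR:
    p = 1 - (1 - p1)(1 - p2).
    """
    infected = set(infected)
    merged = {super_node: {}}
    for u, nbrs in graph.items():
        if u in infected:
            for v, p in nbrs.items():
                if v in infected:
                    continue  # edges between infected nodes disappear
                prev = merged[super_node].get(v, 0.0)
                merged[super_node][v] = 1 - (1 - prev) * (1 - p)
        else:
            merged.setdefault(u, {})
            for v, p in nbrs.items():
                if v not in infected:
                    merged[u][v] = p
    # Mirror the super-node edges so the merged graph stays undirected.
    for v, p in merged[super_node].items():
        merged.setdefault(v, {})[super_node] = p
    return merged


# X and Y are infected; both can infect B (pX = 0.5, pY = 0.4).
g = {
    "X": {"A": 0.3, "B": 0.5},
    "Y": {"B": 0.4, "C": 0.2},
    "A": {"X": 0.3},
    "B": {"X": 0.5, "Y": 0.4},
    "C": {"Y": 0.2},
}
merged = merge_infected(g, ["X", "Y"])
print(round(merged["I"]["B"], 3))  # 1 - 0.5 * 0.6
```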

13 2: DAVA-Tree Algorithm: Idea
Select the nodes with the largest “benefit”.
Saved nodes: the expected number of saved nodes after removing set S on graph G.
Benefit of adding an additional node j into S: the additional number of saved nodes when j is added to S.
(Figure: tree rooted at the merged infected node, pij = 1 for all edges; nodes labeled with benefits 5, 4, and 2.)

14 DAVA-Tree Alg.: Optimal on Trees
For any set S:
Fact 1: the chosen nodes in the optimal set must be neighbors of the infected node I.
Fact 2: the benefit of each such node is independent of the rest of the set S.
DAVA-tree algorithm: select the top k nodes with the maximum benefit from I’s neighbors. Linear time.
(Figure: tree rooted at the merged infected node, pij = 1 for all edges; nodes labeled with benefits 5, 4, and 2.)
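A minimal sketch of the DAVA-tree selection is shown below. It assumes that on a tree the probability of a node becoming infected is the product of the edge probabilities on its unique path from I, so the benefit of a neighbor j is the expected number of infections in j's subtree. The function name, the data layout, and the example tree are hypothetical; the tree was shaped to give benefits 5, 4, and 2 as in the figure.

```python
def dava_tree(children, prob, root, k):
    """Return the top-k neighbors of the root by benefit, plus all benefits.

    children: dict node -> list of child nodes (tree rooted at `root`)
    prob:     dict (parent, child) -> propagation probability
    """
    def expected_infected(node, reach):
        # `reach` is the probability that the infection reaches `node` from the root.
        return reach + sum(
            expected_infected(c, reach * prob[(node, c)])
            for c in children.get(node, [])
        )

    benefit = {j: expected_infected(j, prob[(root, j)]) for j in children.get(root, [])}
    chosen = sorted(benefit, key=benefit.get, reverse=True)[:k]
    return chosen, benefit


# Hypothetical tree with p = 1 on every edge: subtree sizes are 5, 4, and 2.
children = {
    "I": ["a", "b", "c"],
    "a": ["a1", "a2"], "a1": ["a3", "a4"],
    "b": ["b1", "b2", "b3"],
    "c": ["c1"],
}
prob = {(u, c): 1.0 for u, cs in children.items() for c in cs}
chosen, benefit = dava_tree(children, prob, "I", k=2)
print(chosen, benefit)
```

The recursion visits every node once, which matches the linear-time claim; a production version would use an explicit stack to avoid Python's recursion limit on deep trees.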

15 3: General Case – Arbitrary Graphs
Idea: we have the optimal algorithm for a tree, so extract a spanning tree, then run DAVA-tree.
What kind of tree? Minimum spanning tree (MST)?
(Figure, pij = 1 for all edges: the solution that is optimal on the MST by DAVA-tree, shown next to the optimal solution.)

16 3: General Case – Arbitrary Graphs
Idea: we have the optimal algorithm for a tree, so build a spanning tree first.
What kind of tree? Minimum spanning tree? We propose to use the dominator tree (from software engineering).
u dominates v: every path from I to v contains u.
The dominator tree captures the ‘closeness’ of nodes to the infectious nodes, and the importance of saving nodes.
(Figure, pij = 1 for all edges: node 4 dominates nodes 8, 9, 10, and 11.)

17 Dominator Tree
u is the immediate dominator of v: u dominates v AND every other dominator of v dominates u.
Dominator tree: add an edge between every such u and v. Linear time [Buchsbaum, Tarjan 1998].
Fact 1: for any arbitrary graph, the optimal solution should be among the children of root I in the dominator tree.
Fact 2 (special case, k = 1, p = 1): running DAVA-tree on the dominator tree gives the optimal solution.
(Figure, pij = 1 for all edges: merged graph and its dominator tree; the solution that is optimal from DAVA-tree and the optimal solution.)
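The definitions above can be turned into a short computation. The sketch below uses the simple iterative set-intersection method, which is not the linear-time algorithm cited on the slide; it is chosen only because it follows the definition closely. The graph is hypothetical (the original figure is not recoverable) and was built so that node 4 dominates nodes 8, 9, 10, and 11.

```python
def immediate_dominators(adj, root):
    """Immediate dominator of every node reachable from `root`.

    adj: dict node -> iterable of neighbors (undirected, stored both ways).
    """
    # Collect reachable nodes with a depth-first search.
    order, seen, stack = [], {root}, [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)

    # dom[v] = set of nodes that lie on every path from root to v.
    dom = {v: set(order) for v in order}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for v in order:
            if v == root:
                continue
            new = set.intersection(*(dom[u] for u in adj[v])) | {v}
            if new != dom[v]:
                dom[v], changed = new, True

    # Strict dominators form a chain; the closest one has the largest dominator set.
    return {v: max(dom[v] - {v}, key=lambda u: len(dom[u])) for v in order if v != root}


edges = [("I", 1), ("I", 2), (1, 3), (2, 3), ("I", 4),
         (4, 8), (4, 9), (8, 10), (9, 10), (10, 11)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

idom = immediate_dominators(adj, "I")
print(idom)
```

Node 3 hangs directly under I in the dominator tree because it can be reached through either 1 or 2, so vaccinating neither of them alone protects it.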

18 Weighting the dominator tree
Exact weighting is #P-complete.
Our solution: use the maximum propagation path probability between nodes I and v (computed with Dijkstra’s algorithm).
(Figure: merged graph with edge probabilities p1, p3, p6 and dominator tree with weights w1, w3, w6.)
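The maximum propagation path probability can be found with a Dijkstra-style search, because multiplying by probabilities of at most 1 never increases a path's value. The sketch below is an illustration with assumed names and an invented example graph, not the authors' implementation.

```python
import heapq


def max_path_probability(graph, source):
    """Best (maximum-product) path probability from `source` to every reachable node.

    graph: dict node -> {neighbor: probability in [0, 1]} (hypothetical format).
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # negate so the largest probability is popped first
    while heap:
        neg_p, u = heapq.heappop(heap)
        p = -neg_p
        if p < best.get(u, 0.0):
            continue  # stale queue entry
        for v, p_uv in graph[u].items():
            cand = p * p_uv
            if cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return best


# The direct edge I -> a (0.5) loses to the two-hop path I -> b -> a (0.9 * 0.8).
g = {
    "I": {"a": 0.5, "b": 0.9},
    "a": {"c": 0.5},
    "b": {"a": 0.8},
    "c": {},
}
weights = max_path_probability(g, "I")
print({v: round(w, 2) for v, w in weights.items()})
```

An equivalent formulation runs ordinary shortest-path Dijkstra on edge lengths -log(p); the product form is used here to keep the numbers readable.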

19 DAVA algorithm
Steps:
1. T = Build a dominator tree
2. v = Run DAVA-tree on T with budget = 1
3. Remove v from G
4. Go to Step 1 until |S| = k
(Figure: merged graph with pij = 1 for all edges and its dominator tree; |S| = 2, iteration 1, not finished.)

20 DAVA algorithm
Steps:
1. T = Build a dominator tree
2. v = Run DAVA-tree on T with budget = 1
3. Remove v from G (remove the selected node)
4. Go to Step 1 until |S| = k
Time complexity: O(k(|E| + |V| log |V|)). Too slow for large networks!
(Figure: merged graph and dominator tree; |S| = 2, iterations 1 and 2.)
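The four steps can be sketched end to end for the special case pij = 1, where the benefit of a child of I in the dominator tree is simply the size of its subtree. Everything below is an illustrative assumption: the function names, the quadratic set-based dominator computation (not the linear-time method the complexity bound relies on), and the example graph. With general probabilities the tree would also need the path-probability weights.

```python
def dominator_children(adj, root, removed):
    """Dominator-tree child lists for nodes reachable from root, skipping removed nodes."""
    order, seen, stack = [], {root}, [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v not in seen and v not in removed:
                seen.add(v)
                stack.append(v)
    dom = {v: set(order) for v in order}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for v in order:
            if v == root:
                continue
            preds = [u for u in adj[v] if u in seen]  # undirected: neighbors act as predecessors
            new = set.intersection(*(dom[u] for u in preds)) | {v}
            if new != dom[v]:
                dom[v], changed = new, True
    children = {v: [] for v in order}
    for v in order:
        if v != root:
            idom = max(dom[v] - {v}, key=lambda u: len(dom[u]))
            children[idom].append(v)
    return children


def subtree_size(children, node):
    return 1 + sum(subtree_size(children, c) for c in children[node])


def dava(adj, root, k):
    """Steps 1-4: rebuild the dominator tree, pick one node, remove it, repeat."""
    removed = set()
    for _ in range(k):
        children = dominator_children(adj, root, removed)  # Step 1
        if not children[root]:
            break
        best = max(children[root], key=lambda j: subtree_size(children, j))  # Step 2
        removed.add(best)  # Step 3
    return removed


edges = [("I", 1), ("I", 2), (1, 3), (2, 3), (3, 5), ("I", 4),
         (4, 8), (4, 9), (8, 10), (9, 10), (10, 11)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(dava(adj, "I", 2))
```

In this example the first pick is node 4 (it shields five nodes), and after it is removed the rebuilt tree makes node 3 the next pick, since it shields itself and node 5.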

21 DAVA-fast: a faster algorithm
Steps:
1. T = Build a dominator tree
2. S = Run DAVA-tree on T with budget = k
In practice, the performance of DAVA-fast is very close to that of DAVA.
Time complexity: subquadratic! DAVA-fast: O(|V| log |V| + |E|).
(Figure: merged graph and dominator tree; |S| = 2.)

22 Extending to SIR model
See the paper.

23 Outline
Motivation, Problem Definition, Complexity, Our Proposed Methods, Experiments, Conclusion

24 Experiments
Virus propagation models: IC and SIR.
Settings (see more settings in the paper): initial infected nodes are chosen uniformly at random.
Baseline algorithms:
RANDOM: choose healthy nodes uniformly at random.
DEGREE: choose nodes with the top weighted degrees.
PAGERANK: choose nodes with the top PageRank scores.
NETSHIELD: state-of-the-art pre-emptive immunization algorithm that minimizes the epidemic threshold of the graph [Tong+ ICDM 2010]. Assumes no data is given before the epidemic starts.

25 Experiments: datasets
Datasets are chosen from different domains.
Social media (IC model):
OREGON: AS router graph
STANFORD: hyperlink network
GNUTELLA: peer-to-peer network
BRIGHTKITE: friendship network
Epidemiology (SIR model):
PORTLAND and MIAMI: large urban social-contact graphs used in national smallpox modeling studies [Eubank+, 2004]

Dataset      |V|           |E|
OREGON       633           2,172
STANFORD     8,929         53,829
GNUTELLA     10,876        39,994
BRIGHTKITE   58,228        214,078
PORTLAND     0.5 million   1.6 million
MIAMI        0.6 million   2.1 million

26 Experiments: Quality
(Figure panels: GNUTELLA (IC model) and PORTLAND (SIR model); higher is better.)
DAVA consistently outperforms the baseline algorithms. Further, DAVA-fast performs almost as well as DAVA. (See more results in the paper.)

27 Experiments: Scalability
(Figure: running time in seconds; lower is better. Annotation: did not finish within 10 hours.)

28 Outline
Motivation, Problem Definition, Complexity, Our Proposed Methods, Experiments, Conclusion

29 Conclusion
Data-Aware Vaccination problem. Given: graph and infected nodes. Find: ‘best’ nodes for immunization.
Complexity: NP-hard; hard to approximate within an absolute error.
DAVA-tree: optimal solution on the tree.
DAVA and DAVA-fast: merge the infected nodes, build a dominator tree, and run DAVA-tree.
Running time: subquadratic. DAVA: O(k(|E| + |V| log |V|)). DAVA-fast: O(|E| + |V| log |V|).
(Figure: graph with infected nodes, merged graph, dominator tree.)

30 Any Questions?
Code at: http://people.cs.vt.edu/~yaozhang
Yao Zhang, B. Aditya Prakash
Thanks for the support of NSF (Grant No. IIS ).
(Figure: graph with infected nodes, merged graph, dominator tree.)

