DAVA: Distributing Vaccines over Networks under Prior Information
  Yao Zhang, B. Aditya Prakash
  Department of Computer Science, Virginia Tech
  SDM, Philadelphia, April 24, 2014
  Zhang and Prakash, SDM 2014
Motivation: Epidemiology
  Viruses spread over contact networks.
  SIR (Susceptible-Infectious-Recovered) model [Anderson+ 1991]:
    Weights p_ij: propagation probability from i to j.
    Recovery probability δ for each node (models mumps-like infections).
Motivation: Social Media
  Memes and rumors spread over friendship networks, e.g. the Twitter following network.
  Independent cascade (IC) model [Kempe+ KDD 2003]:
    Each node has only one chance to infect its neighbors.
    It is a special case of the SIR model.
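To make the IC model concrete, here is a minimal Python sketch of one cascade plus a Monte Carlo estimate of the expected number of infected nodes. The graph representation (a dict of dicts holding p_ij) and the function names `simulate_ic` and `expected_infections` are assumptions made for illustration; they are not taken from the talk or the authors' code.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, rng):
    """Run one Independent Cascade and return the set of nodes ever infected.

    graph: dict mapping node -> {neighbor: propagation probability p_ij}
           (assumed representation; undirected edges are stored in both directions)
    seeds: iterable of initially infected nodes
    rng:   a random.Random instance
    """
    infected = set(seeds)
    frontier = deque(infected)
    while frontier:
        i = frontier.popleft()
        # Node i is processed exactly once, so it gets a single chance
        # to infect each neighbor that is still susceptible.
        for j, p in graph.get(i, {}).items():
            if j not in infected and rng.random() < p:
                infected.add(j)
                frontier.append(j)
    return infected


def expected_infections(graph, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the expected final number of infected nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs))
    return total / runs
```

With p_ij = 1 on every edge the cascade reaches every node connected to a seed, which is the setting used in the figures of this talk.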
Immunization
  The Centers for Disease Control (CDC) cares about containing epidemic diseases, e.g. ~400 million dollars was used for vaccines for children in 2013.
  Twitter tries to stop rumors from spreading, e.g. rumors about victims after the Boston Marathon bombing in 2013.
  How do we choose the best nodes to vaccinate (remove)?
Immunization
  Pre-emptive immunization (choose nodes before the epidemic starts):
    Acquaintance strategy [Cohen+ 2003]: pick a random person and immunize one of its neighbors at random.
    Netshield [Tong+ 2010]: minimize the epidemic threshold (the point at which the virus takes off).
  These are good baseline strategies.
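The acquaintance strategy is simple enough to sketch in a few lines of Python. The adjacency-list representation and the name `acquaintance_immunize` are assumptions for illustration; the loop repeats the "random person, random neighbor" draw until k distinct nodes have been chosen, which is one possible way to handle repeated picks.

```python
import random


def acquaintance_immunize(graph, k, rng):
    """Pick up to k nodes by repeatedly choosing a random neighbor of a random person.

    graph: dict mapping node -> list of neighbors (assumed representation)
    rng:   a random.Random instance
    """
    people = [v for v in graph if graph[v]]        # nodes with at least one neighbor
    pool = {u for v in people for u in graph[v]}   # nodes that can ever be selected
    target = min(k, len(pool))                     # cap k so the loop terminates
    chosen = set()
    while len(chosen) < target:
        person = rng.choice(people)
        chosen.add(rng.choice(graph[person]))
    return chosen
```

High-degree nodes are neighbors of many people, so they are selected more often than under uniform sampling, without the strategy needing any global knowledge of the graph.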
In reality
  Pre-emptive immunization (choose nodes before the epidemic starts): Acquaintance strategy [Cohen+ 2003], Netshield [Tong+ 2010].
  But typically the epidemic has already started!
  A more realistic intervention: which nodes should we vaccinate now?
  We call this Data-Aware Immunization (this paper).
Outline
  Motivation
  Problem Definition
  Complexity
  Our Proposed Methods
  Experiments
  Conclusion
Data-Aware Vaccination Problem
  Problem: given a set of infected nodes and a contact graph, how should we distribute k vaccines (node removals) to minimize the expected number of infected nodes at the end of the epidemic?
  Example with 1 vaccine (figure: graph on nodes A to F, p_ij = 1 for all edges):
    Remove A, save {A, D} (best solution).
    Remove B, save {B}.
    Remove C, save {C}.
Complexity of DAV
  NP-hard: reduction from the Maximum K-Intersection problem (MaxKI: maximizing the intersection of k subsets). MaxKI is NP-complete [Vinterbo 2004].
  Approximation algorithm? The objective is not submodular. In fact, DAV is hard to approximate within an absolute error!
  See the paper for details.
Our Proposed Methods
  Assumptions: IC model and undirected graph.
1: Simplify - Merging infected nodes
  Idea: merge all the infected nodes into a single 'super infected' node I.
  Figure: the original graph and the equivalent merged graph with super node I. The edges p_A and p_C carry over unchanged. Node B, adjacent to two infected nodes with probabilities p_X and p_Y, gets a single edge from I by logical OR:
    p_B = 1 - (1 - p_X)(1 - p_Y)
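The merging step can be sketched as follows. This is an illustrative Python version, not the authors' code: the dict-of-dicts graph, the function name `merge_infected` and the default super-node label "I" are assumptions, and the label is assumed not to clash with an existing node.

```python
def merge_infected(graph, infected, super_node="I"):
    """Collapse all infected nodes into one super node.

    graph:    dict mapping node -> {neighbor: p_ij}, undirected edges stored
              in both directions (assumed representation)
    infected: set of currently infected nodes
    Returns a new graph in which every healthy node keeps its healthy
    neighbors and has at most one edge to the super node.
    """
    merged = {super_node: {}}
    for u, nbrs in graph.items():
        if u in infected:
            continue
        merged.setdefault(u, {})
        for v, p in nbrs.items():
            if v in infected:
                # Logical OR over parallel edges: u escapes infection only if
                # every infected neighbor independently fails to infect it.
                prev = merged[super_node].get(u, 0.0)
                combined = 1.0 - (1.0 - prev) * (1.0 - p)
                merged[super_node][u] = combined
                merged[u][super_node] = combined
            else:
                merged[u][v] = p
    return merged
```

For a node B with two infected neighbors at probability 0.5 each, the merged edge gets 1 - 0.5 * 0.5 = 0.75.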
2: DAVA-Tree Algorithm: Idea
  Select the nodes with the largest 'benefit'.
  Benefit of a set S: the expected number of saved nodes after removing S from graph G.
  Benefit of adding an additional node j to S: the additional number of saved nodes when j is added to S.
  Figure: tree rooted at the merged infected node, p_ij = 1 for all edges. The root's neighbors have benefits 5, 4 and 2.
DAVA-Tree Alg.: Optimal on Trees
  For any set S:
    Fact 1: the chosen nodes in the optimal set must be neighbors of the infected node I.
    Fact 2: the benefit of each such node is independent of the rest of the set S.
  DAVA-tree algorithm: select the top k nodes among I's neighbors with the maximum benefit. It runs in linear time.
  Figure: tree rooted at the merged infected node, p_ij = 1 for all edges. The root's neighbors have benefits 5, 4 and 2.
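A minimal Python sketch of the DAVA-tree selection follows. It assumes a rooted tree stored as a dict of {child: probability} and computes the benefit of a neighbor j of I as the sum, over the nodes in j's subtree, of the product of edge probabilities on the path from I (under IC on a tree this is the probability that the node gets infected). The names `subtree_expectation` and `dava_tree` are hypothetical.

```python
def subtree_expectation(tree, root):
    """g[v] = expected number of infected nodes in v's subtree, given v is infected.

    tree: dict mapping node -> {child: edge probability}, rooted at `root`
          (assumed representation; leaves may be absent from the dict)
    """
    order, stack = [], [root]
    while stack:                       # iterative preorder traversal
        v = stack.pop()
        order.append(v)
        stack.extend(tree.get(v, {}))
    g = {}
    for v in reversed(order):          # children are visited before their parent
        g[v] = 1.0 + sum(p * g[c] for c, p in tree.get(v, {}).items())
    return g


def dava_tree(tree, root, k):
    """Return the k neighbors of the root with the largest benefit, plus all benefits."""
    g = subtree_expectation(tree, root)
    benefit = {j: p * g[j] for j, p in tree.get(root, {}).items()}
    chosen = sorted(benefit, key=benefit.get, reverse=True)[:k]
    return chosen, benefit
```

On a tree whose root has three neighbors with subtrees of 5, 4 and 2 nodes and p_ij = 1 everywhere, the benefits are 5, 4 and 2, matching the figure, and a budget of k = 2 picks the first two.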
3: General Case - Arbitrary Graphs
  Idea: we have the optimal algorithm for a tree, so extract a spanning tree and then run DAVA-tree.
  What kind of tree? A minimum spanning tree (MST)?
  Figure (p_ij = 1 for all edges): the solution that is optimal on the MST by DAVA-tree, shown next to the optimal solution.
3: General Case - Arbitrary Graphs
  Idea: we have the optimal algorithm for a tree, so build a spanning tree first.
  What kind of tree? Instead of a minimum spanning tree, we propose to use the dominator tree (a notion from software engineering).
  u dominates v if every path from I to v contains u.
  Example (p_ij = 1 for all edges): node 4 dominates nodes 8, 9, 10 and 11.
  The dominator tree captures the 'closeness' of nodes to the infectious nodes and the importance of saving nodes.
Dominator Tree
  u is the immediate dominator of v if u dominates v AND every other dominator of v dominates u.
  Dominator tree: add an edge between every such u and v.
  It can be built in linear time [Buchsbaum, Tarjan 1998].
  Fact 1: for any arbitrary graph, the optimal solution should be among the children of root I in the dominator tree.
  Fact 2 (special case k = 1, p = 1): running DAVA-tree on the dominator tree gives the optimal solution.
  Figure (p_ij = 1 for all edges): merged graph and its dominator tree, with labels 'Optimal from DAVA-tree' and 'Optimal solution'.
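The definitions above can be turned into a short Python sketch. Note that this is the simple iterative set-intersection method, chosen for readability; it is not the linear-time algorithm of [Buchsbaum, Tarjan 1998] cited on the slide. The adjacency-list representation and the name `immediate_dominators` are assumptions.

```python
def immediate_dominators(graph, root):
    """Return {v: immediate dominator of v} for every node v != root reachable from root.

    graph: dict mapping node -> iterable of successors (for an undirected
           graph, store each edge in both directions)
    """
    # Restrict attention to nodes reachable from the root.
    reach, stack = {root}, [root]
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            if v not in reach:
                reach.add(v)
                stack.append(v)
    preds = {v: set() for v in reach}
    for u in reach:
        for v in graph.get(u, ()):
            preds[v].add(u)

    # dom[v] = set of nodes lying on every path from root to v (including v).
    dom = {v: set(reach) for v in reach}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for v in reach:
            if v == root:
                continue
            new = set.intersection(*(dom[p] for p in preds[v])) | {v}
            if new != dom[v]:
                dom[v] = new
                changed = True

    # The strict dominators of v form a chain; the immediate dominator is the
    # one closest to v, i.e. the one that itself has the most dominators.
    return {
        v: max(dom[v] - {v}, key=lambda d: len(dom[d]))
        for v in reach
        if v != root
    }
```

Adding an edge from each returned dominator to its node yields the dominator tree rooted at I.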
Weighting the dominator tree
  Computing the exact weights is #P-complete.
  Our solution: use the maximum propagation path probability between nodes I and v (computed with Dijkstra's algorithm).
  Figure: merged graph with edge probabilities p1, p3, p6 and dominator tree with weights w1, w3, w6.
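One standard way to obtain the maximum propagation path probability with Dijkstra's algorithm is to run it on edge lengths -log(p_ij), since maximizing a product of probabilities is the same as minimizing the sum of their negative logarithms. The Python sketch below takes that route; the function name `max_path_probability` and the dict-of-dicts graph are assumptions, and how the authors map these values onto dominator-tree edges is described in the paper rather than here.

```python
import heapq
import math


def max_path_probability(graph, source):
    """Return {v: largest product of edge probabilities over paths from source to v}.

    graph: dict mapping node -> {neighbor: p_ij with 0 < p_ij <= 1}
           (assumed representation; edges with p_ij <= 0 are skipped)
    """
    dist = {source: 0.0}            # dist[v] = smallest sum of -log(p) found so far
    heap = [(0.0, 0, source)]       # (distance, tie-breaker, node)
    counter = 1
    done = set()
    while heap:
        d, _, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, p in graph.get(u, {}).items():
            if p <= 0.0:
                continue
            nd = d - math.log(p)    # -log(p) >= 0, so Dijkstra's algorithm applies
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, counter, v))
                counter += 1
    return {v: math.exp(-d) for v, d in dist.items()}
```

For example, if I reaches b directly with probability 0.2 but through a with probabilities 0.5 and 0.5, the value for b is 0.25.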
DAVA algorithm
  Steps:
    1. T = build a dominator tree
    2. v = run DAVA-tree on T with budget = 1
    3. Remove v from G
    4. Go to step 1 until |S| = k
  Figure: merged graph (p_ij = 1 for all edges) and its dominator tree at iteration 1, with target |S| = 2. Not finished yet.
DAVA algorithm
  Iteration 2: remove the selected node from the merged graph, rebuild the dominator tree, and run DAVA-tree again until |S| = k.
  Running time: O(k(|E| + |V| log |V|)). Too slow for large networks!
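The select, remove, recompute loop can be illustrated in Python for the special case p_ij = 1 on every edge. In that case the nodes saved by removing j are exactly the nodes that become unreachable from I, which equals the size of j's subtree in the dominator tree. The sketch below therefore substitutes brute-force reachability for the dominator-tree step: it shows the shape of the loop only, it is not the authors' implementation, and it does not achieve the running time stated above. The names `reachable` and `greedy_dava_unit_prob` are hypothetical.

```python
def reachable(graph, root, removed):
    """Set of nodes reachable from root when the nodes in `removed` are deleted."""
    seen, stack = {root}, [root]
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            if v not in seen and v not in removed:
                seen.add(v)
                stack.append(v)
    return seen


def greedy_dava_unit_prob(graph, root, k):
    """Pick up to k nodes one at a time, recomputing the benefits after each removal.

    graph: dict mapping node -> iterable of neighbors, with every node as a key
           (assumed representation of the merged graph; p_ij = 1 on all edges)
    root:  the merged infected node I
    """
    removed = []
    for _ in range(k):
        base = reachable(graph, root, set(removed))
        best, best_gain = None, 0
        for j in graph:                       # dict order makes ties deterministic
            if j == root or j not in base:
                continue
            left = reachable(graph, root, set(removed) | {j})
            gain = len(base) - len(left)      # nodes saved by removing j
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:                      # nothing left to save
            break
        removed.append(best)
    return removed
```

Recomputing after every removal matters: a node that saved little at first can become valuable once an alternative route to its neighbors has been cut.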
DAVA-fast: a faster algorithm
  Steps:
    1. T = build a dominator tree
    2. S = run DAVA-tree on T with budget = k
  In practice, the performance of DAVA-fast is very close to that of DAVA.
  Time complexity: subquadratic! DAVA-fast: O(|V| log |V| + |E|).
  Figure: merged graph and its dominator tree, |S| = 2.
Extending to SIR model
  See the paper.
Experiments
  Virus propagation models: IC and SIR.
  Settings (see the paper for more): initially infected nodes are chosen uniformly at random.
  Baseline algorithms:
    RANDOM: choose healthy nodes uniformly at random.
    DEGREE: choose the nodes with the highest weighted degrees.
    PAGERANK: choose the nodes with the highest PageRank scores.
    NETSHIELD: state-of-the-art pre-emptive immunization algorithm that minimizes the epidemic threshold of the graph [Tong+ ICDM 2010]. It assumes no data is given before the epidemic starts.
Experiments: datasets
  Datasets are chosen from different domains.
  Social media (IC model):
    OREGON: AS router graph
    STANFORD: hyperlink network
    GNUTELLA: peer-to-peer network
    BRIGHTKITE: friendship network
  Epidemiology (SIR model):
    PORTLAND and MIAMI: large urban social-contact graphs used in national smallpox modeling studies [Eubank+, 2004]
  Dataset      |V|           |E|
  OREGON       633           2,172
  STANFORD     8,929         53,829
  GNUTELLA     10,876        39,994
  BRIGHTKITE   58,228        214,078
  PORTLAND     0.5 million   1.6 million
  MIAMI        0.6 million   2.1 million
Experiments: Quality
  Figure: results on GNUTELLA (IC model) and PORTLAND (SIR model). Higher is better.
  DAVA consistently outperforms the baseline algorithms. Furthermore, DAVA-fast performs almost as well as DAVA.
  (See more results in the paper.)
Experiments: Scalability
  Figure: running time (sec.). Lower is better. Annotation: did not finish within 10 hours.
Conclusion
  Data-Aware Vaccination problem
    Given: graph and infected nodes
    Find: 'best' nodes for immunization
  Complexity
    NP-hard
    Hard to approximate within an absolute error
  DAVA-tree
    Optimal solution on the tree
  DAVA and DAVA-fast
    Merge infected nodes
    Build a dominator tree, and run DAVA-tree
    Running time: subquadratic
      DAVA: O(k(|E| + |V| log |V|))
      DAVA-fast: O(|E| + |V| log |V|)
  Figure: graph with infected nodes, merged graph, dominator tree.
Any Questions?
  Code at: http://people.cs.vt.edu/~yaozhang
  Yao Zhang, B. Aditya Prakash
  Thanks for the support of NSF (Grant No. IIS-1353346).