

Simulating Diffusion Processes on Very Large Complex Networks
Jiangzhuo Chen
Joint work with Keith Bisset, Xizhou Feng, Madhav Marathe, and Anil Vullikanti
Network Dynamics & Simulation Science Laboratory
SIAM Conference on Parallel Processing for Scientific Computing (PP10)
February 25, 2010

Talk Outline
Background: simulation of infectious disease propagation on social contact networks
HPC-based parallel epidemic simulation tools developed at NDSSL:
– EpiSims
– EpiSimdemics
– EpiFast
– Indemics
Summary

Epidemic Simulation
This talk focuses on simulating one major diffusion process, epidemic evolution, on large-scale social contact networks.
The models and ideas extend to other diffusion processes on large-scale networks, such as:
– Norms and fads in social networks
– Worms in communication networks

Disease Spread in a Social Network
Within-host disease progression is modeled as a local state transition function called a PTTS (probabilistic timed transition system):
– Health states include: susceptible; infected but not infectious (incubating); infectious but asymptomatic; infectious and symptomatic; recovered
– State transitions are probabilistic and timed
Between-host disease transmission occurs along the edges of a social contact network:
– Represented as a people-location graph or a people-people graph
– Transmissions are probabilistic
– The probability that a person gets infected depends only on their local properties and on the people they have contact with (symmetry assumption)
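A minimal Python sketch of a PTTS, for illustration only. The state names follow the slide. The transition probabilities and dwell-time ranges are invented placeholders, not the calibrated values used in the NDSSL tools. Entering a state samples two things: the next state and how many days the person stays. Transmission, the between-host part, is assumed to happen elsewhere and simply calls enter(p, "incubating", rng).

```python
import random
from dataclasses import dataclass

# Hypothetical PTTS: state -> [(next_state, probability, (min_days, max_days))].
# Probabilities out of each state sum to 1; all numbers are placeholders.
PTTS = {
    "susceptible": [],   # left only through between-host transmission
    "incubating": [("asymptomatic", 0.33, (1, 2)), ("symptomatic", 0.67, (1, 2))],
    "asymptomatic": [("recovered", 1.0, (3, 5))],
    "symptomatic": [("recovered", 1.0, (3, 6))],
    "recovered": [],     # absorbing state
}


@dataclass
class Person:
    state: str = "susceptible"
    next_state: str = ""   # empty: no timed transition pending
    days_left: int = 0     # days remaining in the current state


def enter(p: Person, state: str, rng: random.Random) -> None:
    """Move p into `state` and sample its next state and dwell time."""
    p.state, p.next_state, p.days_left = state, "", 0
    options = PTTS[state]
    if not options:
        return
    r, acc = rng.random(), 0.0
    for nxt, prob, (lo, hi) in options:
        acc += prob
        if r < acc:
            break
    # If rounding leaves r >= acc, the loop falls through to the last option.
    p.next_state, p.days_left = nxt, rng.randint(lo, hi)


def step(p: Person, rng: random.Random) -> None:
    """Advance within-host progression by one simulated day."""
    if not p.next_state:
        return
    p.days_left -= 1
    if p.days_left == 0:
        enter(p, p.next_state, rng)


rng = random.Random(1)
p = Person()
enter(p, "incubating", rng)   # infection event triggered by transmission
for _ in range(10):           # at most 2 + 6 = 8 days are needed to recover
    step(p, rng)
print(p.state)  # recovered
```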

Interventions
Pharmaceutical interventions (PIs): vaccination or antivirals change an individual's role in the transmission chain:
– Lower susceptibility to infection
– Lower infectiousness if infected
– How much these are lowered depends on the efficacy of the vaccine or antiviral
Non-pharmaceutical interventions (NPIs): social distancing measures change people's activities and hence the social network:
– Generic social distancing, school closure, isolation, etc.
Interventions are often associated with trigger conditions and selected subpopulations:
– When, how, and to whom they are applied can have different impacts on the course of the epidemic
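The slide gives no formula for how PI efficacy enters transmission. One common choice, assumed here, works in two steps. First, scale a per-minute transmission rate r by the susceptible person's susceptibility s and the infected person's infectivity i. Second, compound the result over the contact duration t: p = 1 - (1 - r * s * i)^t. The rate value below is a placeholder.

```python
def transmission_prob(minutes: float, rate: float, susceptibility: float = 1.0,
                      infectivity: float = 1.0) -> float:
    """Probability that one contact of `minutes` transmits infection,
    assuming independent per-minute transmission events (an assumption)."""
    per_minute = rate * susceptibility * infectivity
    return 1.0 - (1.0 - per_minute) ** minutes


def after_pi(efficacy_s: float = 0.0, efficacy_i: float = 0.0) -> tuple[float, float]:
    """(susceptibility, infectivity) multipliers after a vaccine or antiviral."""
    return 1.0 - efficacy_s, 1.0 - efficacy_i


RATE = 0.0005                    # placeholder per-minute transmission rate
base = transmission_prob(60, RATE)
s, _ = after_pi(efficacy_s=0.7)  # vaccinated susceptible person
vaccinated = transmission_prob(60, RATE, susceptibility=s)
print(f"one-hour contact: {base:.4f} unvaccinated vs {vaccinated:.4f} vaccinated")
```

Under this model, an efficacy of 1.0 drives the per-contact probability to zero, and partial efficacy lowers it roughly in proportion when the rate is small.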

EpiSims: Precise Simulation
Parallel discrete event simulation (PDES)
Within-host disease progression: coupled PTTS (probabilistic timed transition system)
Between-host disease transmission: people-location bipartite graph
Complex interventions:
– NPIs change people's activity schedules and therefore the people-location graph
– PIs change people's susceptibility/infectivity

EpiSims: Parallel Algorithm
Symmetric computations among processors
Locations are partitioned and assigned to processors
Each person v has corresponding data D_v (demographics, health state, activity schedule, etc.)
If v moves from location A to location B, D_v is moved from the processor that owns A to the processor that owns B
The system synchronizes at every event: a person changes activity location, a person's health state changes, etc.
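To see where the communication cost comes from, the Python sketch below counts how often a person's record D_v must cross processors as they follow their schedule. Several things are assumptions for illustration: the modulo location-to-processor mapping, the PersonData fields, and the function names. They are not the EpiSims data layout.

```python
from dataclasses import dataclass, field


def owner(location: int, num_procs: int) -> int:
    """Processor that owns a location; a simple modulo partition is assumed."""
    return location % num_procs


@dataclass
class PersonData:
    """D_v: the per-person record that travels with v (fields are illustrative)."""
    pid: int
    health: str = "susceptible"
    schedule: list[int] = field(default_factory=list)  # location ids, in visit order


def migrations(person: PersonData, num_procs: int):
    """Yield (src, dst, pid) for every move that forces D_v onto another processor."""
    for a, b in zip(person.schedule, person.schedule[1:]):
        src, dst = owner(a, num_procs), owner(b, num_procs)
        if src != dst:
            yield src, dst, person.pid


v = PersonData(pid=7, schedule=[0, 4, 5, 1])
print(list(migrations(v, num_procs=4)))  # [(0, 1, 7)]
```

Only the move from location 4 (processor 0) to location 5 (processor 1) crosses a partition boundary. With millions of people and several activities each, these transfers, plus a synchronization per event, are what the performance slide identifies as the bottleneck.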

EpiSims: Performance
Too many events, and therefore too many synchronizations
High communication cost for moving data between processors
Scales poorly for large urban populations (size > 10 million)
Simulation running time: on the order of hours to days for Chicago (9 million people)
Good for small populations or a small number of replicates

EpiSimdemics: Fast Simulation
Parallel discrete time simulation
Within-host disease progression: coupled PTTS (highly configurable disease model)
Between-host disease transmission: people-location bipartite graph
Complex interventions specified with a scenario scripting language:
– NPIs change people's activity schedules and therefore the people-location graph
– PIs change people's susceptibility/infectivity

EpiSimdemics: Performance
Parallel algorithm:
– Partition locations and assign them to processors
– Partition people and assign them to processors
– At each time step, compute a serial DES for each location, followed by system synchronization
Approximations (relative to PDES):
– Discrete time simulation
– Relaxation of the causality constraint: the system synchronizes at every time step (every simulation day)
C++/MPI implementation
Scaling: 100 million people on 1K cores
Simulation running time: on the order of hours for large urban populations

EpiSimdemics Algorithm
Generate the population
Set initial infections
Move people to locations according to their activities
Compute interactions among the people at each location
Some exposed people may become infected
After their activities, move people back to their home PE (processing element)
Update each person's state at their home PE
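The Python sketch below runs one simulated day through the steps above, sequentially. It rests on several assumptions:
– Visits are (pid, location, start_minute, end_minute) tuples.
– Health is reduced to 'S'/'I'/'R'.
– Transmission uses a per-minute probability compounded over the overlap time.

EpiSimdemics itself runs a discrete event simulation per location and distributes people and locations across PEs. Here, "move to location" collapses into grouping visits by location, and "update at home PE" collapses into a final state update.

```python
import random
from collections import defaultdict


def simulate_day(people, visits, p_per_minute, rng):
    """One EpiSimdemics-style day, run sequentially (a sketch).

    people: dict pid -> health state ('S', 'I', or 'R'); updated in place
    visits: list of (pid, location, start_min, end_min)
    Returns the set of pids newly infected today.
    """
    # 1. Move people to locations: group visits by location.
    at_location = defaultdict(list)
    for pid, loc, start, end in visits:
        at_location[loc].append((pid, start, end))

    # 2. At each location, compute interactions between infectious and susceptible visitors.
    newly_infected = set()
    for occupants in at_location.values():
        for i_pid, i_start, i_end in occupants:
            if people[i_pid] != "I":
                continue
            for s_pid, s_start, s_end in occupants:
                if people[s_pid] != "S" or s_pid in newly_infected:
                    continue
                overlap = min(i_end, s_end) - max(i_start, s_start)
                # 3. Some exposed people may become infected.
                if overlap > 0 and rng.random() < 1 - (1 - p_per_minute) ** overlap:
                    newly_infected.add(s_pid)

    # 4. Back at the home PE: update states once, at the end of the day.
    for pid in newly_infected:
        people[pid] = "I"
    return newly_infected


people = {1: "I", 2: "S", 3: "S"}
visits = [(1, "home", 0, 60), (2, "home", 30, 90), (3, "home", 60, 120)]
new = simulate_day(people, visits, p_per_minute=1.0, rng=random.Random(0))
print(sorted(new), people)  # [2] {1: 'I', 2: 'I', 3: 'S'}
```

Person 3 overlaps only with person 2, who was still susceptible when the day began. Person 3 therefore stays susceptible, which illustrates the once-per-day synchronization approximation listed on the performance slide.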

EpiSimdemics: Use Case
Flu in Alabama:
– Similar to studies done for sponsors (NIH, CDC, DTRA, etc.)
– Population size = 4.3 million
4 interventions (16 combinations):
– Vaccination (prevaccinate children, critical workers)
– School closure (close/reopen on a per-county basis)
– Quarantine of critical workers
– Self-isolation (when the global attack rate is high)

Disease Model

Results

Results for Critical Workers
Quarantine of critical workers has no effect on the general population
When critical workers are both vaccinated and quarantined, their infection rate drops from 40% to 18%
This may be important for the continued functioning of society

EpiFast: Faster Simulation
Parallel discrete time simulation based on a percolation model
Within-host disease progression: standard SEIR disease model
Between-host disease transmission: along the edges of the people-people contact network
Interventions:
– Pharmaceutical or non-pharmaceutical: predefined impacts on network nodes and edges
– Applied to a subpopulation on a given day or when a given threshold is met

EpiFast: Social Contact Network
From people-location graph to people-people contact network:
– People follow daily activity schedules
– Activities take them to locations
– At locations they interact with each other
– Interactions form the contact network
– Nodes are people, edges are contacts, and edge weights are contact durations
Interactions in a population can get very complex:
– e.g. New York City has 18 million people and a total of 1 billion interactions
Disease spread in the contact network depends on:
– Duration of contact
– Types of activities while in contact
– Characteristics of the infectious person
– Characteristics of the susceptible person
EpiFast assumes the network remains the same from day to day unless interventions change it
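The conversion from a people-location graph to a weighted people-people network can be sketched in Python as follows. Two people are assumed to be in contact whenever their visits to the same location overlap in time, and the edge weight is their total overlap in minutes. Real synthetic populations also model sublocations (rooms, classrooms) and activity types, which this sketch ignores. The tuple layout and function name are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def contact_network(visits):
    """Convert people-location visits into a people-people contact network.

    visits: list of (pid, location, start_min, end_min)
    Returns {(u, v): total_contact_minutes} with u < v.
    """
    by_location = defaultdict(list)
    for pid, loc, start, end in visits:
        by_location[loc].append((pid, start, end))

    edges = defaultdict(int)
    for occupants in by_location.values():
        for (u, s1, e1), (v, s2, e2) in combinations(occupants, 2):
            overlap = min(e1, e2) - max(s1, s2)
            if overlap > 0 and u != v:
                edges[(min(u, v), max(u, v))] += overlap  # weight = contact duration
    return dict(edges)


visits = [(1, "school", 480, 900), (2, "school", 600, 900),
          (3, "shop", 600, 660), (2, "shop", 630, 700), (1, "shop", 1000, 1060)]
print(contact_network(visits))  # {(1, 2): 300, (2, 3): 30}
```

The pairwise loop per location is quadratic in occupancy. That is one reason the contact network of a city like New York grows to around a billion interactions.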

EpiFast: Algorithm
Parallel implementation:
– Master-slave model
– Partitioned contact network: each slave processor is assigned a subset of nodes and all their outgoing edges
– A single master processor handles communication; many slave processors do the computation
– Highly portable C++/MPI implementation
Approximations (relative to PDES):
– Network edges (contacts) are not ordered by time
– The network remains the same from day to day unless interventions change it
– Synchronization once per simulation day
– Interventions change the contact network (node or edge properties); these changes are approximate
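A sequential Python sketch of this structure, under stated assumptions. Each "worker" owns the nodes with node % num_workers == rank plus their outgoing edges. Each day, a worker tries every outgoing edge of its infectious nodes with a duration-based transmission probability (percolation-style). The "master" then merges all proposals once per day. The SEIR timing uses fixed latent and infectious periods. In the real tool the workers are MPI processes. The mapping, tau, and the periods here are placeholders.

```python
import random


def partition(adj, num_workers):
    """Give each worker a subset of nodes plus all their outgoing edges (modulo mapping, assumed)."""
    parts = [{} for _ in range(num_workers)]
    for u, out_edges in adj.items():
        parts[u % num_workers][u] = out_edges
    return parts


def worker_step(part, infectious, susceptible, tau, rng):
    """One worker's daily work: try each outgoing edge of its infectious nodes."""
    proposed = set()
    for u in infectious:
        for v, minutes in part.get(u, ()):
            if v in susceptible and rng.random() < 1 - (1 - tau) ** minutes:
                proposed.add(v)
    return proposed


def simulate(adj, seeds, days, num_workers=4, tau=0.001,
             latent_days=1, infectious_days=3, seed=0):
    """Return {node: day infected}; adj maps node -> [(neighbor, contact_minutes)]."""
    rng = random.Random(seed)
    parts = partition(adj, num_workers)
    nodes = set(adj) | {v for edges in adj.values() for v, _ in edges}
    infected_on = {s: 0 for s in seeds}
    for day in range(days):
        # SEIR timing: exposed for latent_days, then infectious for infectious_days.
        infectious = {v for v, d in infected_on.items()
                      if d + latent_days <= day < d + latent_days + infectious_days}
        susceptible = nodes - set(infected_on)
        # "Master" merges every worker's proposals: one synchronization per day.
        new = set().union(*(worker_step(p, infectious, susceptible, tau, rng) for p in parts))
        for v in new:
            infected_on[v] = day + 1
    return infected_on


adj = {0: [(1, 10)], 1: [(2, 10)], 3: []}
print(simulate(adj, seeds=[0], days=5, tau=1.0))  # {0: 0, 1: 2, 2: 4}
```

With tau = 1.0 every contact transmits, so the output just shows the latent delay along the chain 0 -> 1 -> 2. Node 3 has no contacts and stays uninfected.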

EpiFast: Performance
To the best of our knowledge, by far the fastest among epidemic simulations that can handle realistic synthetic populations and provide comparable support for realistic intervention measures:
– Network of 16 million nodes and 900 million edges: < 20 minutes per replicate on as few as 32 processors
Scales well on distributed memory systems:
– Good strong and weak scaling properties

EpiFast Performance: Strong Scaling

EpiFast Performance: Weak Scaling
[Table: weak scaling on the Miami, Boston, and Chicago populations; columns: population, population size, number of CPUs, running time per simulation day (seconds)]

EpiFast: Use Case
Factorial design: four two-level factors (2 × 2 × 2 × 2 = 16 combinations) × 25 replicates = 400 runs
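The run count follows from enumerating every cell of the full factorial design and repeating each cell 25 times. The slide does not name the four factors, so factor_1 to factor_4 below are placeholders. The sketch only shows how the 400 runs arise.

```python
from itertools import product

# Placeholder factor names: the slide only says there are four two-level factors.
FACTORS = {f"factor_{k}": (False, True) for k in range(1, 5)}
REPLICATES = 25


def design():
    """Yield (settings, replicate) for every cell of the full factorial design."""
    names = list(FACTORS)
    for levels in product(*FACTORS.values()):
        settings = dict(zip(names, levels))
        for rep in range(REPLICATES):
            yield settings, rep


runs = list(design())
print(len(runs))  # 16 cells x 25 replicates = 400
```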

Indemics: Interactive Simulation
Indemics: Interactive Epidemic Simulation and Modeling Environment
New data-centric architecture for interactive epidemic simulation environments
Decouples data, disease diffusion, intervention, and user interaction:
– Simplifies the design and implementation of the simulation engine
– Performance can be optimized separately
High performance computing service architecture:
– Users can access the system via a web server from anywhere
– The HPC-based system supports coordination, data management, and disease diffusion
– The reduction in speed is easily compensated by ease of interaction and a rich feature set

Indemics: System Architecture
[Figure: Indemics system architecture]
– Interactive client and batch client: the Indemics web interface, a client on the analyst's PC (the analyst sees only this module); sends queries & interventions
– Indemics Server, running on the head node of the HPC system
– Indemics database, running on a data server: relational, temporal, and semi-structured databases
– Indemics Adapter, connecting to the HPC epidemic simulator (e.g. EpiFast) and exchanging new interventions and new disease states

Indemics: Abstractions
Data models:
– Relational data about individuals (P)
– Social contact network (N)
– Transmission network/dendrogram (D)
Models of interaction between user, data, and model:
– Query: a function of (P, N, D), e.g. who is sick, or how many people in a subpopulation of concern are sick; does not change the disease progression; can be expressed as an SQL script
– Intervention: an active interaction that changes the social network, individual behavior, or disease model, and moves the simulation forward: apply intervention to subpopulation with parameter

Indemics: Queries
Queries on a single data type:
– (P) Find all school-age people in Seattle
– (N) Find all network neighbors (contacts) of a specific person
– (D) Find all people infected in the last week
Queries across multiple data types:
– Count the number of infected persons in a zip code (Blacksburg, VA)
– Find infectious students in Blacksburg high school and their family members
Users interact with the system using well-defined languages:
– Indemics commands: count infected persons : group = seniors, infected day = between 20 and 22
– SQL statements: SELECT SN.edge_head FROM network SN, infection INF WHERE SN.edge_tail = INF.infected_pid AND INF.infection_day = 20 (finds contacts of people infected on day 20)
– Libraries of queries can be predefined by expert users
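The cross-type query "contacts of people infected on day 20" joins the network (N) with the transmission data (D). Below, it runs in Python against an in-memory SQLite database with toy data. The column names come from the slide. The table names network and infection, and all rows, are assumptions for illustration and not the actual Indemics schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE network (edge_tail INTEGER, edge_head INTEGER);      -- contact edges (N)
CREATE TABLE infection (infected_pid INTEGER, infection_day INTEGER);  -- transmissions (D)
INSERT INTO network VALUES (1, 2), (1, 3), (4, 5), (6, 7);
INSERT INTO infection VALUES (1, 20), (4, 19), (6, 20);
""")

# Contacts of people infected on a given day.
rows = conn.execute("""
    SELECT SN.edge_head
    FROM network AS SN
    JOIN infection AS INF ON SN.edge_tail = INF.infected_pid
    WHERE INF.infection_day = ?
    ORDER BY SN.edge_head
""", (20,)).fetchall()
contacts = [r[0] for r in rows]
print(contacts)  # [2, 3, 7]
```

Person 4 was infected on day 19, so contact 5 is excluded. A parameterized query like this is the kind of entry an expert user could add to a predefined query library.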

Indemics: Interventions
Intervention abstraction: apply intervention to subpopulation with parameter
Intervention types: vaccination, antiviral, school closure, work closure, generic social distancing, etc.
Subpopulations (to which interventions are applied):
– Predefined groups: preschool, school age, adult, senior, critical workers, etc.
– Dynamic groups: the result of any query, e.g. group g = {family members of persons who were infected on day 10}
Indemics commands:
– apply interventions: type = antivirus, duration = 20, group = school age, infected_day = between 24 and 30
– apply interventions: type=work closure, duration = 20, group = adults, infected day = between 20 and 21; type = school closure, duration = 5, group = school age
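The dynamic-group example can be sketched in Python. A query produces the group (household members of people infected on day 10), and an intervention is then applied to that group with a parameter. Everything here is hypothetical and stands in for Indemics commands, not for its API:
– the Person record
– apply_intervention
– modeling vaccination as scaling susceptibility by (1 - efficacy)

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    pid: int
    household: int
    infected_day: Optional[int] = None
    susceptibility: float = 1.0


def family_of_infected_on(people, day):
    """Dynamic group g: household members of persons infected on `day` (the cases excluded)."""
    households = {p.household for p in people if p.infected_day == day}
    return [p for p in people if p.household in households and p.infected_day != day]


def apply_intervention(kind, group, efficacy):
    """Hypothetical 'apply <kind> to <group> with <efficacy>': scales susceptibility."""
    if kind not in ("vaccination", "antiviral"):
        raise ValueError(f"unsupported intervention type: {kind}")
    for p in group:
        p.susceptibility *= 1.0 - efficacy
    return len(group)


people = [Person(1, household=10, infected_day=10), Person(2, household=10),
          Person(3, household=11), Person(4, household=11, infected_day=9)]
g = family_of_infected_on(people, day=10)
print([p.pid for p in g])                         # [2]
print(apply_intervention("vaccination", g, 0.8))  # 1
```

Because the group is computed from the current data at the moment the intervention is issued, the same command targets different people as the epidemic evolves. Predefined groups cannot do this.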

Indemics: Performance
Simulates what the original simulator can simulate, with small overhead:
– E.g. for the same scenario (vaccinate preschoolers when a threshold condition is met), Indemics incurs 70% running-time overhead on top of EpiFast
Far larger modeling capabilities with reasonably good performance:
– E.g. household-level interventions were not supported in EpiFast
– Indemics handles them easily
The cost of interaction and data communication is marginal (~20%) compared with the simulation cost:
– The interaction overhead is easily offset by the much larger capability and flexibility that Indemics provides

Indemics Performance: Cost of Interaction/Communication

Indemics: Use Case
Public intervention versus private intervention:
Ring vaccination:
– Administered by public health authorities
– All direct contacts of any infectious individual are identified and vaccinated (by the government)
D1 vaccination:
– Individual, self-motivated
– People voluntarily get vaccinated when the number of infectious people among their direct contacts exceeds some threshold
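The two policies differ only in who decides and on what local information. A Python sketch over an undirected contact network follows. The adjacency-list format and the function names are assumptions. Excluding already-infectious people from vaccination is also an assumption, since the slide does not say.

```python
def ring_vaccination(adj, infectious):
    """Public policy: vaccinate every direct contact of any infectious individual."""
    # Already-infectious people are skipped (assumption: vaccinating them is moot).
    return {v for u in infectious for v in adj.get(u, ()) if v not in infectious}


def d1_vaccination(adj, infectious, threshold):
    """Private policy: a person self-vaccinates when the number of infectious
    direct contacts exceeds `threshold`."""
    chosen = set()
    for v, neighbors in adj.items():
        if v in infectious:
            continue
        if sum(1 for u in neighbors if u in infectious) > threshold:
            chosen.add(v)
    return chosen


adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}   # undirected, listed both ways
print(sorted(ring_vaccination(adj, {1})))               # [2, 3]
print(sorted(d1_vaccination(adj, {1, 2}, threshold=1))) # [3]
```

Ring vaccination reacts to a single infectious contact. D1 vaccination reacts only after local exposure crosses each person's threshold, so it typically vaccinates fewer people, and later.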

Results

Summary of Our Epidemic Simulation Tools
EpiSims: precise but slow; good for smaller populations with complicated scenarios and interventions
EpiSimdemics: fast, good scaling; good for large populations with complicated scenarios and interventions, but only a few replicates
EpiFast: fastest, good scaling; good for large populations with simple scenarios and predefined interventions, and a large number of replicates
Indemics: most capable and flexible, at the cost of extra running time; good for exploring various intervention strategies

Thanks!

Work funded in part by NIGMS, NIH MIDAS program, CDC, Center of Excellence in Medical Informatics, DTRA CNIMS, NSF, NeTs, NECO and OCI program, VT Foundation.