Computational Methods for Testing Adequacy and Quality of Massive Synthetic Proximity Social Networks Huadong Xia, Christopher Barrett, Jiangzhuo Chen,


Computational Methods for Testing Adequacy and Quality of Massive Synthetic Proximity Social Networks
Huadong Xia, Christopher Barrett, Jiangzhuo Chen, Madhav Marathe
IEEE BDSE2013
Network Dynamics and Simulation Science Laboratory, Virginia Tech
NDSSL TR

Acknowledgement
We thank our external collaborators and members of the Network Dynamics and Simulation Science Laboratory (NDSSL) for their suggestions and comments. This work has been partially supported by DTRA Grant HDTRA , DTRA CNIMS Contract HDTRA1-11-D , NIH MIDAS Grant 2U01GM , NSF PetaApps Grant OCI , and NSF NetSE Grant CNS .

Outline
- Background and Contributions
- Methods: Network Synthesis
- Comparison of Large Scale Networks
- Conclusions

Importance of Computational Epidemiological Models
Pandemics cause substantial social, economic and health impacts:
- 1918 flu pandemic: killed … million people, or 3 to 5 percent of the world population
- …
- SARS 2003, H1N1 2009, Avian flu (H7N9) 2013
Mathematical and computational models have played an important role in understanding and controlling epidemics:
- Controlled experiments are not permitted for ethical reasons.
- Models help us understand the space-time dynamics of epidemics.

Networked Epidemiology
- Heterogeneous spatial-temporal features of populations
- Massive, irregular, dynamic and unstructured
- Social contact networks are usually synthesized
(Figure from the Internet)

The Four V's in Networked Epidemiology
- Volume (facts in Delhi): 13.85M population; 2.67M households; > 200M contacts; 2.64M locations
- Velocity: interactions change every second; node status changes every second; both are modeled at minute scale
- Variety: demographics, geographic, temporal features, virus infectivity, ...
- Veracity:
  - Data: do we collect enough raw data to render a clear picture?
  - Method: do we extract all useful information from the available raw data?
(Figure labels: 7am, 9am, 3pm, 8pm)

Social Contact Network Modeling and Analysis
The veracity of the network one builds depends on:
- The time available to build such a network (human, computational)
- The data available to build the network
- The specific question that one would like to investigate
Networks at different levels of detail may be constructed for the same region.
How do we evaluate networks that span large regions?
- How do we compare two networks constructed for the same population?
- When is the synthesized network adequate?

Contributions
- Propose a number of network measurements to understand and compare urban-scale social contact networks, which are extremely large, dynamic and unstructured.
- Explore quantitatively the adequacy standards in modeling proximity networks.


Synthetic Populations and Their Contact Networks
Goal:
- Determine who is where, and when.
Process:
- Create a statistically accurate baseline population
- Assign each individual to a home
- Estimate their activities and where these take place
- Determine each individual's contacts and locations throughout a day

Constructing Synthetic Social Contact Networks

What Is a Network
- Networks capture social interaction pertinent to the disease.
- We focus on flu-like diseases, for which the appropriate network is a social contact network based on proximity relationships.
- Vertex attributes (people): age, household size, gender, income, ...
- Vertex attributes (locations): (x, y, z), land use, ...
- Edge attributes: activity type (shop, work, school); (start time 1, end time 1), (start time 2, end time 2), ...

Two Sets of Data Sources and Generation Methods for Delhi Synthetic Population and Network

Data & Methods | the coarse network | the detailed network
Data: demographics | India census 2001 | India census; micro-data (India Human Development Survey - UMD)
Data: geographic data | LandScan 2007 | MapMyIndia
Data: activity | generic activity templates | Thane travel survey; residential contact survey
Method: people | distribution | distribution/IPF
Method: locations | density | real locations + homes along roads
Method: activity schedules | categorized templates | decision tree + templates; configuration model
Method: activity locations | gravity model

Residential Contacts: for the Detailed Network Only
(Figure labels: Office, Mall, School, Residential Area)

Generation of the Residential Network
Motivation for the residential contact network:
- Approximately 40% of adults in India do not travel to work. The network models interaction among them around their homes (within the residential area).
Survey data collected:
- Age and gender of people who stay at home: node label
- Contact durations/frequencies of each person near their home: edge label/node degree
Formal question: generate a random network such that
- The degree distribution of a set of nodes is given
- The label of each node is given
- Assumption: the network tends to be homophilous (nodes with similar labels are connected with higher probability)
Method:
- Configuration model with the added feature of node homophily.
- Details are given in the "Random Network Generation" slide among the extra slides.

Population Synthesis
(Figure panels: Population for the coarse network; Population for the detailed network. Figure labels: Split into HHs; Extract individuals. Nodes carry labels such as M47, F22, F4, M71.)

How to Compare Two Networks
Metrics:
- Entity level: the population, built infrastructure and their layout
- Collective level: validate against aggregate statistics
- Network level: structural properties
- Epidemic dynamics level: policy effects

Comparison for Synthetic Populations
- Individual level: age-gender structure
- Household level: demographic structure
- Entropy: 1.35 vs. 1.02

Precision of Location Distribution
(Figure panels: the coarse network; the detailed network. Figure labels: LandScan grid; synthetic locations; real locations.)

Activity Statistics

Temporal Visiting Degree in Randomly Selected Locations
Note: first row: the coarse network; second row: the detailed network.

G_PL: Temporal and Spatial Properties
(Figure panels: travel distance distribution; radius of gyration distribution)
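
The slide names the radius of gyration without defining it. A common definition is the root-mean-square distance of a person's visited locations from their centroid. The sketch below assumes that definition, planar (x, y) coordinates and optional visit weights; the function name and the weighting are illustrative and are not taken from the slides.

```python
import math


def radius_of_gyration(points, weights=None):
    """RMS distance of visited points from their (weighted) centroid.

    points  -- list of (x, y) coordinates visited by one person
    weights -- optional visit weights (e.g. time spent); defaults to 1 each
    """
    if not points:
        raise ValueError("need at least one visited location")
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    # Weighted centroid of the visited locations.
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total
    # Weighted mean squared distance from the centroid.
    sq = sum(w * ((x - cx) ** 2 + (y - cy) ** 2)
             for (x, y), w in zip(points, weights))
    return math.sqrt(sq / total)
```

For geographic coordinates, the Euclidean distance would need to be replaced by a great-circle distance.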

G_PL: Structural Properties
In the people-location network G_PL, the degrees of a large portion of non-home locations follow a power-law-like distribution.

People-People Network G_P

Disease Spread in a Social Network
- Within-host disease model: SEIR
- Between-host disease model:
  - probabilistic transmissions along edges of the social contact network
  - from infectious people to susceptible people
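
The two model layers above can be sketched as a discrete-time simulation. This is a minimal illustration, not the simulator used in the study: it assumes one time step per day, a single transmission probability per contact per day, and fixed incubation and infectious durations (the slides use heterogeneous durations). All names and parameters are hypothetical.

```python
import random


def simulate_seir(contacts, seeds, p_transmit, incubation_days,
                  infectious_days, max_days=300, rng=None):
    """Discrete-time SEIR on a contact network.

    contacts -- dict: person -> list of neighbors (each undirected edge
                appears in both adjacency lists)
    seeds    -- people who are infectious on day 0
    Returns a dict: person -> day on which they were infected.
    """
    rng = rng or random.Random()
    state = {v: "S" for v in contacts}
    timer = {}          # days left in the current E or I stage
    infected_on = {}
    for s in seeds:
        state[s] = "I"
        timer[s] = infectious_days
        infected_on[s] = 0
    for day in range(1, max_days + 1):
        # Between-host model: infectious people may infect susceptible
        # neighbors along edges of the contact network.
        newly_exposed = []
        for v, st in state.items():
            if st != "I":
                continue
            for u in contacts[v]:
                if state[u] == "S" and rng.random() < p_transmit:
                    newly_exposed.append(u)
        # Within-host model: advance E -> I -> R.
        for v in list(timer):
            timer[v] -= 1
            if timer[v] == 0:
                if state[v] == "E":
                    state[v] = "I"
                    timer[v] = infectious_days
                else:
                    state[v] = "R"
                    del timer[v]
        for u in newly_exposed:
            if state[u] == "S":
                state[u] = "E"
                timer[u] = incubation_days
                infected_on[u] = day
        if not timer:   # nobody is exposed or infectious any more
            break
    return infected_on
```

Running the function many times with different random generators gives the replicates from which attack rates and epidemic curves can be averaged.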

Epidemic Simulations to Study the Delhi Population
Disease model:
- Flu similar to H1N1 in 2009: assume R_0 = 1.35, 1.40, 1.45, 1.60 (only the results for R_0 = 1.35 are shown; the others are similar)
- SEIR model: heterogeneous incubation and infectious durations
- 10 random seeds every day
Interventions:
- Vaccination: implemented at the beginning of the epidemic; compliance rate 25%
- Antiviral: implemented when 1% of the population is infectious; covers 50% of the population; effective for 15 days
- School closure: implemented when 1% of the population is infectious; compliance rate 60%; lasts for 21 days
- Work closure: implemented when 1% of the population is infectious; compliance rate 50%; lasts for 21 days
Five configurations in total (including the base case). Each configuration is simulated for 300 days with 30 replicates.

Comparison in Epidemic Simulations
Impact on epidemic dynamics (R_0 = 1.35):
- The coarse network uses generic activity schedules, in which people travel much more frequently. As a result, the two networks show very different epidemic dynamics in the base case.

Epidemic Simulation Results: Interventions
Similarities between the two networks:
- Vaccination is still the most effective strategy.
- Pharmaceutical interventions are more effective than non-pharmaceutical ones.
- School closure is more effective than work closure.
Differences between the two networks:
- Severity is significantly different.
- In delaying the outbreak of disease, school closure is more effective than antivirals in the coarse network; the opposite holds in the detailed network.

Metrics Review

Categories | Metrics
Underlying Synthetic Population | Household Structure; Location Layout; Duration of Activities; Number of Daily Activities
G_PL | Travel Distance; Radius of Gyration; Temporal Degree of Random Locations; Degree of People-Location Graphs
G_P | Degree; Clustering Coefficient; Contact Duration; Shortest Path
Epidemic Dynamics | No Interventions; Pharmaceutical Interventions; Non-Pharmaceutical Interventions

Conclusions
- Novel methodologies for creating a realistic social contact network for a typical urban area in developing countries.
- Comparison to a coarser network suggests:
  - Similarities reflect generic properties of social contact networks.
  - Region-specific features are captured in the detailed model.
  - The epidemic dynamics of the region are strongly influenced by the activity patterns and demographic structure of local residents.
  - A higher-resolution social contact network helps us make better public health policy.
- A realistic representation of social networks requires adequate empirical input. We propose these criteria of adequacy:
  - Does the new input decrease the uncertainty of the system?
  - Does the new input significantly change epidemics and intervention policy?

END Questions?

EXTRA SLIDES

Epidemic Simulation Results: Vulnerability
- R_0 is calibrated to 1.35.
- Vulnerability is defined as the normalized number of infections over 10,000 runs of random simulations.
- The vulnerability distribution of the detailed network is flat compared with that of the coarse network, and the detailed network is less vulnerable because of less frequent travel.
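
One reading of this definition is a per-person quantity: the fraction of simulation runs in which that person becomes infected. The sketch below assumes that reading. `run_once` is a hypothetical callable that runs one simulation and returns the people infected in it.

```python
from collections import Counter


def vulnerability(nodes, run_once, n_runs=10_000):
    """Fraction of runs in which each node ends up infected.

    nodes    -- iterable of all people in the network
    run_once -- callable taking the run index and returning an iterable
                of the people infected in that run (hypothetical interface)
    """
    counts = Counter()
    for run in range(n_runs):
        # set() guards against a person being reported twice in one run.
        counts.update(set(run_once(run)))
    return {v: counts[v] / n_runs for v in nodes}
```

A histogram of the returned values gives a vulnerability distribution of the kind compared on this slide.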

Epidemic Simulation Results
- R_0 is calibrated to 1.35.

Delhi: National Capital Territory of India
Case study:
- Delhi (NCT-I): a representative South Asian city that has not been studied before.
Statistics:
- … million people in 2001; 22 million in 2011
- Most populous metropolis: 2nd in India; 4th in the world
- 573 square miles, 9 regions (refer to the picture)
- The Yamuna river runs through the urban area.
Unique socio-cultural characteristics:
- Large slum area
- Tropical weather
- Environmental hygiene

Two Versions of Delhi Networks
The coarse network:
- Based on very limited data
- Generic methodology applicable to any region in the world
The detailed network:
- Requires household-level micro-sample data and other detailed data, which are not available for all countries
An improvement in results is expected. The comparison serves:
- to evaluate the network generation model;
- to understand the importance of different levels of detail.

V1: Synthetic Population Generation
Population generation
Input: joint distribution of age and gender of the population in Delhi (from the India census 2001)
Algorithm:
- Normalize the counts in the joint distribution of age and gender into a joint probability table.
- Create … million individuals one by one. For each individual:
  - Randomly select a cell c according to the cell probabilities.
  - Create a person with the age and gender corresponding to the cell c.
Output: … million individuals, each associated with the disaggregate attributes of gender and age.
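
The V1 algorithm can be sketched in a few lines. This is an illustration under assumptions: the joint table is represented as a dict keyed by (age, gender) cells, and the function name and record layout are hypothetical.

```python
import random


def generate_population(joint_counts, n_people, rng=None):
    """Sample individuals from a joint age-gender count table.

    joint_counts -- dict: (age, gender) -> census count for that cell
    n_people     -- number of synthetic individuals to create
    """
    rng = rng or random.Random()
    cells = list(joint_counts)
    total = sum(joint_counts.values())
    # Normalize the counts into a joint probability table.
    probs = [joint_counts[c] / total for c in cells]
    # Draw one cell per individual according to the cell probabilities.
    draws = rng.choices(cells, weights=probs, k=n_people)
    return [{"id": i, "age": age, "gender": gender}
            for i, (age, gender) in enumerate(draws)]
```

Because each person is drawn independently, this version carries no household structure, which is the gap the V2 method addresses.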

Data Input
Demographic data: basic census data + India micro-sample
- India Census 2001
- Micro-sample for household structure: India Human Development Survey 2005 by the University of Maryland and the National Council of Applied Economic Research. For each sampled household it records hh size, hh head's age, hh income, house type and animal care; for each individual in the hh it records demographic details, religion, work, marital status, relationship to head, etc.
Activity data: Thane travel survey + residential contacts survey
- Activity templates from the 2001 Household Travel Survey statistics for Thane, India, and school attendance statistics from the UNESCO Institute of Statistics (UIS)
  - Activity templates are extracted with CART and assigned to the synthetic population with a decision tree.
- Survey on residential area contacts in India, conducted by NDSSL
  - Approximately 40% of adults in India do not travel to work. The survey focused on them.
  - Collected people's age, gender, and contact durations/frequencies near their home.
Location data: MapMyIndia data
- Ward-wise statistics for population and households
- Coordinates for locations such as schools, shopping centers, hotels, etc.
- Infrastructure such as roads, railway stations, land use, etc.
- Boundaries of each city, town and ward

V2: synthetic population creation method
Same methodology as used for US populations.
Input:
- Total # of households
- Aggregate distributions of demographic properties from the Census: hh size, householder's age
- Household micro-samples
Output: synthetic population with household structure. Each individual is assigned an age and gender.
Algorithm:
1. Estimate the joint distribution of household size and householder's age:
   1) Construct a joint table of hh size and householder's age: fill in the # of samples for each cell.
   2) Multiply the total # of households by the marginal distributions to calculate the marginal totals for the table.
   3) Run IPF to get a converged joint table.
   4) Normalize: divide the count in each cell by the total # of samples to get the probability of each cell (illustrated in the next slide).
2. Create the synthetic households and population:
   1) Randomly select a cell with the probability given in the joint table.
   2) Select a household sample h uniformly at random from all samples associated with that cell.
   3) Create a synthetic household H with the same members as h; each member of H has the same demographic attributes as the corresponding member of h.
   4) Repeat the steps above until the # of synthetic households equals the total # of households from the Census.
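
The IPF and normalization steps of part 1 can be sketched as follows. This is a generic two-dimensional iterative proportional fitting routine, not the authors' code. It assumes the row and column targets have the same sum, and it normalizes by the fitted table's own total so that the probabilities sum to one; function names and tolerances are illustrative.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Fit a 2-D table to row and column marginal totals.

    seed        -- list of rows of sample counts (e.g. hh size x head's age)
    row_targets -- desired row totals
    col_targets -- desired column totals (same grand total as row_targets)
    """
    table = [list(map(float, row)) for row in seed]
    for _ in range(max_iter):
        # Row adjustment: scale each row to its target total.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [x * target / s for x in table[i]]
        # Column adjustment: scale each column to its target total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns now match exactly; stop once the rows match too.
        err = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if err < tol:
            break
    return table


def to_probabilities(table):
    """Normalize a fitted table so that its cells sum to 1."""
    total = sum(sum(row) for row in table)
    return [[x / total for x in row] for row in table]
```

The resulting probability table is what part 2 samples from when it picks a cell and then a household sample within that cell.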

IPF example
(Figure: a starting table followed by alternating row and column adjustments over successive iterations; finished at iteration 3.)

V2: household distribution – a snapshot
Households are distributed along real streets and community blocks. V2 avoids placing households on rivers, lakes, green land, etc. (V1 distributes them uniformly within each 1 mile × 1 mile block.)

Activity templates generation
Flowchart: Generating Activity Sequences based on Thane Survey for Delhi-V2
(Flowchart labels:
- Data sources: demographics of the Thane sample population; UIS statistics
- Frequency distribution of reported activity sequences
- Frequency distribution of trips: trip start time, trip length
- Commute categories
- Activity sequences sampling; sampling decision tree
- Outcome: 1) demographics; 2) activity template: activity sequence, activity duration)


Random Network Generation: configuration model with the added feature of node homophily
For each edge type in (long-dur, mid-dur, short-dur), do:
1. Initialize each node with a degree drawn i.i.d. from the degree distribution for its label (age/gender).
2. Form a list of "stubs": connections of nodes that have not yet been matched with neighbors. Call it stubList.
3. Pick a starting node v0 at random.
4. For each of v0's stubs, choose an element v1 from stubList as follows:
   1) Choose v1 at random from stubList.
   2) If v1 is the same as v0 or is already connected to v0, go to 4.1).
   3) With probability p (> 0.5), test whether v1 is similar to v0; if it is not, go to 4.1) and repeat the selection.
   4) Create an edge between v0 and v1; its duration is drawn at random based on the edge type (long, mid or short duration).
Done.
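
The stub-matching procedure for a single edge type can be sketched as follows. Several details are assumptions added for illustration: "similar" is taken to mean equal labels, the retry loop is capped by a hypothetical `max_tries` so that the sketch always terminates, every node is visited in random order rather than starting from one node, and edge durations are left out.

```python
import random


def homophilous_configuration_model(degrees, labels, p_test=0.7,
                                    max_tries=100, rng=None):
    """Match stubs at random, rejecting dissimilar partners with prob. p_test.

    degrees -- dict: node -> target degree (drawn beforehand per label)
    labels  -- dict: node -> label (e.g. an age/gender group)
    Returns a set of undirected edges, each a frozenset of two nodes.
    """
    rng = rng or random.Random()
    # One stub per unit of degree; a stub is an unmatched edge endpoint.
    stubs = [v for v, d in degrees.items() for _ in range(d)]
    edges = set()
    order = list(degrees)
    rng.shuffle(order)
    for v0 in order:
        while v0 in stubs:
            partner = None
            for _ in range(max_tries):
                v1 = rng.choice(stubs)
                # No self-loops and no repeated edges.
                if v1 == v0 or frozenset((v0, v1)) in edges:
                    continue
                # Homophily test: applied with probability p_test.
                if rng.random() < p_test and labels[v1] != labels[v0]:
                    continue
                partner = v1
                break
            if partner is None:
                break   # leave v0's remaining stubs unmatched
            stubs.remove(v0)
            stubs.remove(partner)
            edges.add(frozenset((v0, partner)))
    return edges
```

Because of the retry cap, some stubs may stay unmatched, so realized degrees can fall slightly below their targets; calling the function once per edge type and attaching a duration drawn for that type would complete the procedure on the slide.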