Neza Vodopivec Applied Math and Scientific Computation Program Advisor: Dr. Jeffrey Herrmann Department of Mechanical Engineering.

Slides:



Advertisements
Similar presentations
Jacob Goldenberg, Barak Libai, and Eitan Muller
Advertisements

Chapter 7 Hypothesis Testing
Modeling and Simulation By Lecturer: Nada Ahmed. Introduction to simulation and Modeling.
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
LEARNING INFLUENCE PROBABILITIES IN SOCIAL NETWORKS Amit Goyal Francesco Bonchi Laks V. S. Lakshmanan University of British Columbia Yahoo! Research University.
New Mexico Computer Science For All Designing and Running Simulations Maureen Psaila-Dombrowski.
11 Simple Linear Regression and Correlation CHAPTER OUTLINE
Overarching Goal: Understand that computer models require the merging of mathematics and science. 1.Understand how computational reasoning can be infused.
Neza Vodopivec Applied Math and Scientific Computation Program Advisor: Dr. Jeffrey Herrmann Department of Mechanical Engineering.
Lecture 2 Today: Statistical Review cont’d:
Theoretical Program Checking Greg Bronevetsky. Background The field of Program Checking is about 13 years old. Pioneered by Manuel Blum, Hal Wasserman,
An Approach to Evaluate Data Trustworthiness Based on Data Provenance Department of Computer Science Purdue University.
1 Complexity of Network Synchronization Raeda Naamnieh.
Simulation.
Development of Empirical Models From Process Data
1 Validation and Verification of Simulation Models.
1 Simple Linear Regression Chapter Introduction In this chapter we examine the relationship among interval variables via a mathematical equation.
Experimental Evaluation
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Short Term Load Forecasting with Expert Fuzzy-Logic System
New Mexico Computer Science for All Computational Science Investigations (from the Supercomputing Challenge Kickoff 2012) Irene Lee December 9, 2012.
Analysis of Simulation Results Andy Wang CIS Computer Systems Performance Analysis.
Inference for regression - Simple linear regression
Modeling Information Diffusion in Networks with Unobserved Links Quang Duong Michael P. Wellman Satinder Singh Computer Science and Engineering University.
Chapter 8: Systems analysis and design
Computational Biology BS123A/MB223 UC-Irvine Ray Luo, MBB, BS.
1 Performance Evaluation of Computer Networks: Part II Objectives r Simulation Modeling r Classification of Simulation Modeling r Discrete-Event Simulation.
1 Least squares procedure Inference for least squares lines Simple Linear Regression.
BsysE595 Lecture Basic modeling approaches for engineering systems – Summary and Review Shulin Chen January 10, 2013.
Zhiyong Wang In cooperation with Sisi Zlatanova
Generic Approaches to Model Validation Presented at Growth Model User’s Group August 10, 2005 David K. Walters.
The Examination of Residuals. Examination of Residuals The fitting of models to data is done using an iterative approach. The first step is to fit a simple.
ENM 503 Lesson 1 – Methods and Models The why’s, how’s, and what’s of mathematical modeling A model is a representation in mathematical terms of some real.
Topology and Evolution of the Open Source Software Community Advisors: Dr. Vincent W. Freeh Dr. Kevin Bowyer Supported in part by the National Science.
Conceptual Modelling and Hypothesis Formation Research Methods CPE 401 / 6002 / 6003 Professor Will Zimmerman.
CS 782 – Machine Learning Lecture 4 Linear Models for Classification  Probabilistic generative models  Probabilistic discriminative models.
Modeling and Simulation Discrete-Event Simulation
1 11 Simple Linear Regression and Correlation 11-1 Empirical Models 11-2 Simple Linear Regression 11-3 Properties of the Least Squares Estimators 11-4.
Chapter 5 Parameter estimation. What is sample inference? Distinguish between managerial & financial accounting. Understand how managers can use accounting.
Chapter Three TWO-VARIABLEREGRESSION MODEL: THE PROBLEM OF ESTIMATION
FUNCTIONS AND MODELS 1. The fundamental objects that we deal with in calculus are functions.
MODES-650 Advanced System Simulation Presented by Olgun Karademirci VERIFICATION AND VALIDATION OF SIMULATION MODELS.
Project Estimation techniques Estimation of various project parameters is a basic project planning activity. The important project parameters that are.
5-1 ANSYS, Inc. Proprietary © 2009 ANSYS, Inc. All rights reserved. May 28, 2009 Inventory # Chapter 5 Six Sigma.
CDA6530: Performance Models of Computers and Networks Chapter 8: Statistical Simulation --- Discrete-Time Simulation TexPoint fonts used in EMF. Read the.
Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 21 The Simple Regression Model.
Introduction to Loops For Loops. Motivation for Using Loops So far, everything we’ve done in MATLAB, you could probably do by hand: Mathematical operations.
Online Social Networks and Media
28. Multiple regression The Practice of Statistics in the Life Sciences Second Edition.
MA354 An Introduction to Math Models (more or less corresponding to 1.0 in your book)
McGraw-Hill/IrwinCopyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved. Simple Linear Regression Analysis Chapter 13.
Algorithms For Solving History Sensitive Cascade in Diffusion Networks Research Proposal Georgi Smilyanov, Maksim Tsikhanovich Advisor Dr Yu Zhang Trinity.
From the population to the sample The sampling distribution FETP India.
Simulation. Types of simulation Discrete-event simulation – Used for modeling of a system as it evolves over time by a representation in which the state.
Building Valid, Credible & Appropriately Detailed Simulation Models
Your friend has a hobby of generating random bit strings, and finding patterns in them. One day she come to you, excited and says: I found the strangest.
or items of information; these will be numbers in context
Chapter 7. Classification and Prediction
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Debugging Intermittent Issues
Linear Regression.
Debugging Intermittent Issues
CHAPTER 29: Multiple Regression*
Hidden Markov Models Part 2: Algorithms
6-1 Introduction To Empirical Models
TexPoint fonts used in EMF.
1 FUNCTIONS AND MODELS.
Test Case Test case Describes an input Description and an expected output Description. Test case ID Section 1: Before execution Section 2: After execution.
GhostLink: Latent Network Inference for Influence-aware Recommendation
Presentation transcript:

Neza Vodopivec Applied Math and Scientific Computation Program Advisor: Dr. Jeffrey Herrmann Department of Mechanical Engineering Abstract: Understanding how information spreads throughout a population can help public health officials improve how they communicate with the public in emergency situations. In this project, I implement an agent-based information diffusion model inspired by the Bass model. I compare my discrete-time implementation to a traditional differential-equation conceptualization of the Bass model. Finally, I test my model by seeing how well it predicts the spread of information through an actual Twitter network.

 In the weeks following the events of 9/11, seven letters containing dangerous strains of Bacillus anthracis were mailed to senators and news agencies.  Although the FBI never determined a sender or motive, the national panic following the anthrax attacks spurred public health agencies to plan out responses to similar, larger-scale scenarios.

 Anthrax is not contagious, but its dynamics require a fast dissemination of targeted public health information because newly infected individuals have a far better prognosis when they are treated quickly.  In order to increase effectiveness of a targeted public health message, we must understand how information spreads through a population.

To help us understand patterns of communication, we develop models. The goal of an information diffusion model is to depict how a piece of information spreads through a given population over time. We are interested in the successive increases in the fraction of people who are aware of the information.

 Typically based on differential equations which describe only aggregate behavior.  Typically allow predictions in the time domain but not the spatial domain.  Simple, generalizable, easily debugged.

Unlike traditional differential-equation models that treat population as an aggregate, agent-based models keep track of individual agents and their relationships to one another.

My project is divided into two parts: 1. An agent-based information diffusion simulation. 2. A statistical analysis of my model.

The Bass model (Bass, 1969), which was originally developed to model the diffusion of new products in marketing, can be applied to the diffusion of information. The model is based on the assumption that people get their information from two sources:

advertising

The Bass model (Bass, 1969), which was originally developed to model the diffusion of new products in marketing, can be applied to the diffusion of information. The model is based on the assumption that people get their information from two sources: word of mouthadvertising

The Bass model describes the change in the fraction of a population that has become aware of a piece of information: where F(t) is the aware fraction of the population, p is the advertising coefficient, and q is the word-of- mouth coefficient.

We can formulate an agent-based model inspired by the classical Bass model. We discretize the problem and make the following modifications: 1. Instead of taking a deterministic time aggregate, we update probabilistically. 2. Instead of allowing each agent to be influenced by the entire population, it is influenced only by its neighbors.

 The agent-based Bass model assumes agents are arranged in some fixed, known network.  Formally, the network is a directed graph with agents as its nodes. An agent’s neighbors are those who connect to it. That is, agent i is j ’s neighbor if there is an edge from node i to j.  The networks I will use were obtained from Twitter follower data. Twitter is a service which allows its users to post short messages and list which other users they read (“follow”). A directed edge from agent i to agent j represents that agent j “follows” agent i on Twitter.

 The agent-based Bass model is a discrete-time model in which each agent has one of two states at each time step t: (1) unaware or (2) aware.  At time t=0, all agents are unaware.  At each time step, an unaware agent has an opportunity to become aware. Its state changes with p, the probability that it becomes aware due to advertising or due to word of mouth.  The probability of that an agent becomes aware due to word of mouth increases as a function of the fraction of its neighbors who became aware in previous time steps.  Once an agent becomes aware, it remains aware for the rest of the simulation.

At each iteration, the probability that an unaware agent i becomes aware is: P i (t) = p ∆t + q ∆t [n i (t) /m i ] – (p q ∆t 2 [n i (t) /m i ]) m i is the number of neighbors of agent i. n i (t) is the number of neighbors of agent i that became aware before time t. p and q are parameters which indicate the effectiveness of advertising and WOM per unit of time, respectively.

At each iteration, the probability that an unaware agent i becomes aware is: P i (t) = p ∆t + q ∆t [n i (t) /m i ] – (p q ∆t 2 [n i (t) /m i ]) Probability that agent becomes aware due to advertising. m i is the number of neighbors of agent i. n i (t) is the number of neighbors of agent i that became aware before time t. p and q are parameters which indicate the effectiveness of advertising and WOM per unit of time, respectively.

At each iteration, the probability that an unaware agent i becomes aware is: P i (t) = p ∆t + q ∆t [n i (t) /m i ] – (p q ∆t 2 [n i (t) /m i ]) Probability that agent Probability that agent becomes aware due to becomes aware due to advertising. WOM. m i is the number of neighbors of agent i. n i (t) is the number of neighbors of agent i that became aware before time t. p and q are parameters which indicate the effectiveness of advertising and WOM per unit of time, respectively.

At each iteration, the probability that an unaware agent i becomes aware is: P i (t) = p ∆t + q ∆t [n i (t) /m i ] – (p q ∆t 2 [n i (t) /m i ]) Probability that agent Probability that agent Probability that agent becomes aware due to becomes aware due to becomes aware due to advertising. WOM. both advertising and WOM. m i is the number of neighbors of agent i. n i (t) is the number of neighbors of agent i that became aware before time t. p and q are parameters which indicate the effectiveness of advertising and WOM per unit of time, respectively.

Arbitrarily identify the N agents with the set 1,…, N. Let A denote the E×2 matrix listing all (directed) edges of the graph as ordered pairs of nodes. INPUT: matrix A, parameters p and q. 1. Keep track of the state of the agents in a length-N bit vector initialized to all zeros. 2. At each time step, for each agent: 1.Check the bit vector to determine if the agent is already aware. If so, skip it. 2.Make the agent newly aware with probability p. 3.Look up the agent’s neighbors in A. Determine what fraction of them are aware. Make the agent newly aware with probability q times that fraction. 4.Once all agents have been processed, record the newly aware ones as aware in the bit vector. 3. Stop once all agents have become aware or after a maximum number of iterations. OUTPUT: complete history of the bit vector.

I plan to run the simulation numerous times and analyse the resulting data. I wish to examine the empirical distribution of the aware fraction F(t) of the network at each time t. To do so, I will compute the first two moments of the distributions. Then I will plot, as a function of time, the mean F̅(t) surrounded by 90 percent confidence intervals.

 All code will be implemented in MATLAB.  Outside Software: an implementation of the agent- based Bass model written in NetLogo, a programming language used to develop agent- based simulations.  Hardware: AMD Opteron computer, 32 cores, 256 GB of RAM.  Parallelization: multiple simulations will run in parallel. Each run will be logged and later analysed.

 The network structure for my simulations will be derived from real- world Twitter data.  I will use a database containing two such networks, each given in the form of an E×2 matrix listing the E directed edges of a graph as ordered pairs of nodes. The graphs contain approximately 5,000 and 2,000 edges, respectively.  I will also have data for testing my algorithm to see how well it predicts the actual spread of information through a Twitter network. I will use the above matrices along with an M-long vector giving the time (measured from t=0) when a node changed states from unaware to aware, where M is at most the size of the network.

To verify that I have implemented the (conceptual) agent- based Bass model correctly, I will validate my code in the following ways: 1. I will compare my results to those obtained in a simulation performed with NetLogo, software used in agent-based modeling. 2. I will perform traditional verification techniques used in agent-based modeling. 3. I will verify that my results are well approximated by the analytical differential equation-based Bass model.

I will perform three types of the validation methods traditionally used with agent-based models: (1) Corner Cases, (2) Sampled Cases, and (3) Relative Value Testing. Corner Cases test to make sure that the model behaves as expected when extreme values are given as inputs.  If p=0 and q=1, then no one should be aware at the end of the simulation.  If p=1 and q=0, then all agents should be aware after the first iteration.

Sampled Cases test to see that the model produces a reasonable range of results.  If p > 0, then all agents in the network should eventually become aware.  Moreover, with this assumption, the fraction of agents who are aware should increase at each iteration.

Relative Value Testing verifies that the relationship between inputs and outputs is reasonable.  As we increase p and q, the time until all agents become aware should decrease.  We record two outputs separately: the fraction of agents who become aware due to advertising and fraction of agents who become aware due to WOM. If we increase q while keeping p constant, the fraction of the population that becomes aware due to WOM should increase, but this should not affect the fraction that becomes aware due to advertising.

To validate the agent-based model, we make a simplifying assumption: all agents are connected. As a result, local network structure is no longer important. Because each agent i has the whole network, including itself, as its neighbor set, n i (t)/m i is simply F(t), the aware fraction of the network. Therefore, we can rewrite P i (t) = p ∆t + q ∆t [n i (t)/m i ] – (p q ∆t 2 [n i (t) /m i ]) as P(t) = p ∆t + q ∆t F(t) – p q ∆t 2 F(t) for all i.

P(t) = p ∆t + q ∆t F(t) – p q (∆t) 2 F(t) Multiplying the probability that an agent becomes aware by the fraction of unaware agents, we obtain ∆F(t), the change in the aware fraction of the population: ∆F(t) = P(t) [1- F(t)] = [p ∆t + q ∆t F(t) – p q (∆t) 2 F(t)] [1- F(t)] Dividing through by ∆t and letting ∆t  0 recovers the analytical Bass model: 0 ∆F/∆t = [p + q F(t) - p q (∆t) F(t)] [1- F(t)]

 In the special case of a completely connected network, the dynamics of the agent-based Bass model are well approximated by the analytical Bass model.  If they are viewed as physical quantities measured in, say, s -1, the coefficients p and q of the agent-based Bass model are identical to the p and q of the analytical model.

 I will test my model by seeing how well it predicts the actual spread of information through a Twitter network.  The two real-world cases, I will use to assess my model measure the diffusion of the following information, respectively, through Twitter networks: 1.The attack that killed Osama bin Laden 2.News of Hurricane Irene.  Additionally, I will test the efficiency of my code against an existing NetLogo implementation.

OctoberDevelop basic simulation code. Develop code for statistical analysis of results. NovemberValidate simulation code by checking corner cases, sampled cases, and by relative testing. Validate code against analytic model. DecemberValidate simulation against existing NetLogo implementation. Prepare mid-year presentation and report.

JanuaryInvestigate efficiency improvements to code. Incorporate sparse data structures. FebruaryParallelize code. Test code efficiency against existing NetLogo implementation. MarchTest model against empirical Twitter data. Create visualization of model, time permitting. AprilWrite final project report and prepare presentation.

 Simulation code.  Code for statistical analysis.  A graph with the following three curves based on data collected from numerous runs of the simulation: mean and both ends of a 90 percent confidence interval at each time step.  A detailed comparison of my code’s running time against that of the existing NetLogo implementation.  Side by side, the graphs of simulation results compared with the real-world observed Twitter data.

Bass, Frank (1969). “A new product growth model for consumer durables”. Management Science 15 (5): p. 215–227. Chandrasekaran, Deepa and Tellis, Gerard J. (2007). “A Critical Review of Marketing Research on Diffusion of New Products”. Review of Marketing Research, p ; Marshall School of Business Working Paper No. MKT Dodds, P.S. and Watts, D.J. (2004). “Universal behavior in a generalized model of contagion”. Phys. Rev. Lett. 92, Mahajan, Vijay; Muller, Eitan and Bass, Frank (1995). “Diffusion of new products: Empirical generalizations and managerial uses”. Marketing Science 14 (3): G79–G88. Rand, William M. and Rust, Roland T. (2011). “Agent-Based Modeling in Marketing: Guidelines for Rigor (June 10, 2011)”. International Journal of Research in Marketing; Robert H. Smith School Research Paper No. RHS