Click to Edit Talk Title USMA Network Science Center Specific Communication Network Measure Distribution Estimation Daniel.

Slides:



Advertisements
Similar presentations
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Advertisements

Emergence of Scaling in Random Networks Albert-Laszlo Barabsi & Reka Albert.
Statistics Review – Part II Topics: – Hypothesis Testing – Paired Tests – Tests of variability 1.
Spatial Dependency Modeling Using Spatial Auto-Regression Mete Celik 1,3, Baris M. Kazar 4, Shashi Shekhar 1,3, Daniel Boley 1, David J. Lilja 1,2 1 CSE.
Inference for Regression
FTP Biostatistics II Model parameter estimations: Confronting models with measurements.
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
11 Simple Linear Regression and Correlation CHAPTER OUTLINE
Overarching Goal: Understand that computer models require the merging of mathematics and science. 1.Understand how computational reasoning can be infused.
Approaches to Data Acquisition The LCA depends upon data acquisition Qualitative vs. Quantitative –While some quantitative analysis is appropriate, inappropriate.
Application of Confidence Intervals to Text-based Social Network Construction By CDT Julie Jorgensen, 06, G4 Advisors: MAJ Ian McCulloh, D/MATH LTC John.
Sam Pfister, Stergios Roumeliotis, Joel Burdick
Funding Networks Abdullah Sevincer University of Nevada, Reno Department of Computer Science & Engineering.
Network Statistics Gesine Reinert. Yeast protein interactions.
Detecting Changes in Social Networks Using Statistical Process Control Cadet Matthew R. Webb Advisor: Major Ian McCulloh, D/Math USMA 20 May 2007 NetSci.
Chapter Topics Types of Regression Models
1 Validation and Verification of Simulation Models.
Simple Linear Regression Analysis
1 Sensor Placement and Lifetime of Wireless Sensor Networks: Theory and Performance Analysis Ekta Jain and Qilian Liang, Department of Electrical Engineering,
Pattern Recognition Topic 2: Bayes Rule Expectant mother:
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Variance Fall 2003, Math 115B. Basic Idea Tables of values and graphs of the p.m.f.’s of the finite random variables, X and Y, are given in the sheet.
Sequence comparison: Significance of similarity scores Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
BPT 2423 – STATISTICAL PROCESS CONTROL.  Frequency Distribution  Normal Distribution / Probability  Areas Under The Normal Curve  Application of Normal.
WARM – UP 1.Phrase a survey or experimental question in such a way that you would obtain a Proportional Response. 2.Phrase a survey or experimental question.
Text Analysis Using Automated Language Translators CDT John Stanford MAJ Ian McCulloh.
Information Networks Power Laws and Network Models Lecture 3.
Key ideas of analysis & interpretation of data Visualize data – (tables, pictures, graphs, statistics, etc. to reveal patterns & relationships). Making.
Spectral coordinate of node u is its location in the k -dimensional spectral space: Spectral coordinates: The i ’th component of the spectral coordinate.
The Properties of Pennies. Science Labs Begin with a Question Do all groups of pennies have the same physical properties?
SIMULATION USING CRYSTAL BALL. WHAT CRYSTAL BALL DOES? Crystal ball extends the forecasting capabilities of spreadsheet model and provide the information.
Two Approaches to Calculating Correlated Reserve Indications Across Multiple Lines of Business Gerald Kirschner Classic Solutions Casualty Loss Reserve.
“The Geography of the Internet Infrastructure: A simulation approach based on the Barabasi-Albert model” Sandra Vinciguerra and Keon Frenken URU – Utrecht.
1 Statistical Distribution Fitting Dr. Jason Merrick.
From Theory to Practice: Inference about a Population Mean, Two Sample T Tests, Inference about a Population Proportion Chapters etc.
Various topics Petter Mostad Overview Epidemiology Study types / data types Econometrics Time series data More about sampling –Estimation.
PCB 3043L - General Ecology Data Analysis. OUTLINE Organizing an ecological study Basic sampling terminology Statistical analysis of data –Why use statistics?
1 11 Simple Linear Regression and Correlation 11-1 Empirical Models 11-2 Simple Linear Regression 11-3 Properties of the Least Squares Estimators 11-4.
Scatterplots & Regression Week 3 Lecture MG461 Dr. Meredith Rolfe.
Percolation Processes Rajmohan Rajaraman Northeastern University, Boston May 2012 Chennai Network Optimization WorkshopPercolation Processes1.
Yongqin Gao, Greg Madey Computer Science & Engineering Department University of Notre Dame © Copyright 2002~2003 by Serendip Gao, all rights reserved.
Class 9: Barabasi-Albert Model-Part I
Monitoring High-yield processes MONITORING HIGH-YIELD PROCESSES Cesar Acosta-Mejia June 2011.
PCB 3043L - General Ecology Data Analysis.
Topic 6: The distribution of the sample mean and linear combinations of random variables CEE 11 Spring 2002 Dr. Amelia Regan These notes draw liberally.
Performance Evaluation Lecture 1: Complex Networks Giovanni Neglia INRIA – EPI Maestro 10 December 2012.
Exponential random graphs and dynamic graph algorithms David Eppstein Comp. Sci. Dept., UC Irvine.
Statistics Sampling Distributions and Point Estimation of Parameters Contents, figures, and exercises come from the textbook: Applied Statistics and Probability.
Machine Learning in Practice Lecture 9 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.
Rodney Nielsen Many of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall Data Science Credibility: Evaluating What’s Been Learned Predicting.
Fundamentals of Data Analysis Lecture 3 Basics of statistics.
Machine Learning in Practice Lecture 9 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute.
Intelligent and Adaptive Systems Research Group A Novel Method of Estimating the Number of Clusters in a Dataset Reza Zafarani and Ali A. Ghorbani Faculty.
Network (graph) Models
Random Walk for Similarity Testing in Complex Networks
Statistics 200 Lecture #4 Thursday, September 1, 2016
A tale of many cities: universal patterns in human urban mobility
PCB 3043L - General Ecology Data Analysis.
Hypothesis Testing and Confidence Intervals (Part 1): Using the Standard Normal Lecture 8 Justin Kern October 10 and 12, 2017.
Simulation-Based Approach for Comparing Two Means
Statistical significance & the Normal Curve
CSE 4705 Artificial Intelligence
Department of Computer Science University of York
Instructors: Fei Fang (This Lecture) and Dave Touretzky
CASE − Cognitive Agents for Social Environments
Additional notes on random variables
Additional notes on random variables
Essential Statistics Two-Sample Problems - Two-sample t procedures -
Essential Statistics Sampling Distributions
Shan Lu, Jieqi Kang, Weibo Gong, Don Towsley UMASS Amherst
Presentation transcript:

Click to Edit Talk Title USMA Network Science Center Specific Communication Network Measure Distribution Estimation Daniel P. Baller, Joshua Lospinoso 17 June th ICCRTS

2 Agenda Background & Motivation ORA Visualization Assumptions NPM Simulated Networks Results Future Research

3 Background & Motivation Simulating random instances of networks is a hot topic in today’s Network Science research –Erdos-Renyi, Watts-Strogatz, Barabasi-Alberts, NPM How do networks arrive at structure? How do we explore these structures? Many methods of simulating random networks exist No sound methodology exists for measuring “goodness” We propose a methodology for testing how well simulations perform under a rigorous statistical framework, and execute testing for two data sets under the Network Probability Matrix and Erdos-Renyi.

4 Data Set 1 (Net07) Warfighting Simulation run at FT Leavenworth, KS in April Mid Career Army Officers 4 day simulated exercise Self Reported Communications survey Surveys conducted 2 x per day 8 Total Data Sets

5 Data Set 2 (Net05) Warfighting Simulation run at FT Leavenworth, KS in Mid Career Army Officers 5 day simulated exercise Self Reported Communications survey Surveys conducted 2 x per day 9 Total data sets

6 Data Data symmetrized and dichotomized Square symmetric matrix Over time data compiled by agent

7 ORA Visualization (Net07)

8 ORA Visualization (Net05)

9 Assumptions Network in Dynamic Equilibrium Observations based on Underlying edge probability structure Maintain ergodicity Graph courtesy of David Krackhardt

10 Network Probability Matrix

Graph courtesy of ORA Erdsös-Réni Random Graph Assumptions Network in Dynamic Equilibrium Observations based on Underlying edge probability structure Maintain ergodicity All nodes have same probability distribution.

Erdsös-Réni Random Graph All edges have the same probability Same random Bernoulli trial Connectivity threshold Graph courtesy of ORA

13 Simulated Network Monte Carlo Simulation N = 100,000

14 The Statistical Test We use Hamming Distance (HD) as a measure of similarity between networks. We find HD for all combinations of time periods in the empirical data. We take all of the simulated graphs, and find HD between them and each time period from empiricals. Perform T-TEST between these two sets of HDs. This test tells us if the simulation (NPM/E-R) does a BETTER JOB at explaining a given time period than the rest of the empirical data does.

15 The Statistical Test 1.Create a vector of hamming distances between all possible combinations of empirical vectors. Group them by time period. 2.Create a vector of hamming distances between all simulated graphs and each time period, grouping the vectors by time period. 3.Perform a T-test between each corresponding vector to answer the following question: Does the NPM/E-R, on average, more closely match any particular time period from the empirical data than the rest of the empirical data?

16 Results (NET07): NPM M8N60000 Data Set Mean Hamming Distance to Empirical Networks Standard Deviation of Hamming Distance to Empirical Networks Mean Hamming Distance to Simulated Networks Standard Deviation of Hamming Distance to Simulated Networkst-testp-value Comparison of simulated vs. empirical data

17 Results (NET07): E-R M8N60000 Data Set Mean Hamming Distance to Empirical Networks Standard Deviation of Hamming Distance to Empirical Networks Mean Hamming Distance to Simulated Networks Standard Deviation of Hamming Distance to Simulated Networkst-testp-value Comparison of simulated vs. empirical data

18 Results (NET05): NPM M9N60000 Data Set Mean Hamming Distance to Empirical Networks Standard Deviation of Hamming Distance to Empirical Networks Mean Hamming Distance to Simulated Networks Standard Deviation of Hamming Distance to Simulated Networksp-value t-test

19 Results (NET05): E-R M9N60000 Data Set Mean Hamming Distance to Empirical Networks Standard Deviation of Hamming Distance to Empirical Networks Mean Hamming Distance to Simulated Networks Standard Deviation of Hamming Distance to Simulated Networksp-value t-test

20 Significance NPM performs well, E-R does not. –Why? (Net-07 Clustering) Since the NPM does a good job of representing the laws which govern the network, we can use simulation to: –Explore large numbers of “instances” of the graphs –Create distributions of network and agent-level measures With a validated simulation, we facilitate further statistical analysis of the network and its measures! –Statistical Process Control: When has the network undergone a significant change? –Percolation: What is the likelihood that a rumor/ideology/belief spreads throughout the network?

21 Results (NET07) Fit a normal distribution to densities

22 Results (NET05) Fit a normal distribution to densities

23 Conclusion When highly complex systems are being simulated, and empirical data is available, we can use this methodology to test whether our simulation is at least as close to each time series in a data set as the rest of the time periods are. –Which model (Erdos, Watts, Barabasi, NPM) most accurately describes the empirical data? The “simple case” of the NPM is shown to be a viable explanation of social networks.

24 Acknowledgements U.S. Army Research Institute U.S. Military Academy Network Science Center U.S. Army Research Labs Dr. Kathleen Carley COL Steve Horton LTC John Graham MAJ Tony Johnson MAJ Ian McCulloh Dr. Craig Schrieber