Autocorrelation in Social Networks: A Preliminary Investigation of Sampling Issues. Antonio Páez, Darren M. Scott, Erik Volz. Sunbelt XXVI – International Network for Social Network Analysis.


Network Autocorrelation Analysis

Spatial analysis ◘Central tenet: First Law of Geography “Everything is related to everything else, but near things are more related than distant things” (Tobler, 1970) ◘Spatial analysis (Miller, 2004)

Spatial statistical models ◘Statistical representation of this principle ○Spatially autoregressive model: Y = Xβ + ρWY + ε (spatial spillovers, economic externalities; e.g. Fingleton, 2003; 2004)

Connectivity matrix W ◘Key element of the model ○Defines the spatial structure of the study area ○Position relative to other units
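As a concrete illustration of the role of W, here is a minimal sketch (not from the presentation; the 4-node adjacency matrix is hypothetical) of building a row-standardized connectivity matrix with NumPy:

```python
import numpy as np

# Hypothetical 4-node network: 1 marks a connection between two units.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)

# Row-standardize so each row sums to 1: each unit's neighbours receive
# equal weight, a common convention for W in autoregressive models.
W = A / A.sum(axis=1, keepdims=True)
```

With this convention, WY averages each unit's neighbours' outcomes, which is what the autoregressive term ρWY captures.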

First Law: General principle ◘Distance in social space “Everyone is related to everyone else, but near people are more related than distant people” ◘Akerlof’s social distance (1997) “Agents who are initially close interact more strongly while those who are socially distant have little interaction”

Social network analysis ◘Network models: Y = Xβ + ρWY + ε Social influence (e.g. Leenders, 2002; Marsden and Friedkin, 1994)

Geo-referencing ◘Nature of connectivity is relatively unambiguous even if definition of weights is not

Social referencing ◘Identification of network connections

Specification of W ◘Research questions Within a linear autoregressive framework: ○What is the effect of under-specifying matrix W…? (how much effort should go into trying to observe/identify network connections?) ○What is the effect of different network topologies…? (on the quality of estimators and model identification) (Previous work by Stetzer, 1982; Griffith, 1996)

Experimental setup ◘Assumptions ○Closed system (interactions with the rest of the world are negligible) ○All individuals are observed, their attributes can be obtained ○Not all network connections are identified »Deliberate effort to minimize observation cost: select individuals and identify all their connections

Experimental setup ◘Simulate networks with different topologies (matrix W ) ○Poisson / exponential degree distributions ○Mean degree ( d ): 1.5, 3.5, 5.5, 7.5 ○Clustering ( c ): 0.2, 0.3, 0.4, 0.5, 0.6, 0.7 ○Random networks with tunable degree distribution and clustering (Volz, 2004)
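The presentation generates networks with tunable degree distribution and clustering via Volz (2004); as a much simpler stand-in, the sketch below draws an Erdős–Rényi graph whose degrees are approximately Poisson with the target mean (clustering is not controlled here; all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
n, mean_degree = 100, 3.5            # n and one of the degree settings above

# Erdos-Renyi G(n, p): with p = d/(n-1), degrees are ~ Poisson(d).
p = mean_degree / (n - 1)
coin = rng.random((n, n)) < p        # independent edge coin flips
A = np.triu(coin, k=1)               # keep upper triangle, no self-loops
A = (A | A.T).astype(float)          # symmetrize -> undirected network
```

Unlike the Volz (2004) algorithm, G(n, p) cannot hold clustering at a chosen level, so this only reproduces the degree dimension of the experiment.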

Experimental setup ◘Simulate data Y = ρWY + X₁β₁ + X₂β₂ + ε ○ρ: 0.1, 0.3, 0.5, 0.7, 0.9 ○β₁ = 2.0; β₂ = 1.0 ○X₁: constant; X₂: uniform(1, 10) (see Anselin and Florax, 1995) ○ε: standard normal ○n = 100 (number of observations)
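The data-generating step can be sketched by solving the reduced form Y = (I − ρW)⁻¹(Xβ + ε) directly; the ring-network W below is a hypothetical stand-in for the simulated topologies:

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho, beta1, beta2 = 100, 0.5, 2.0, 1.0   # one of the rho settings above

# Hypothetical W: a ring network (each node tied to its two neighbours),
# row-standardized.
A = np.zeros((n, n))
idx = np.arange(n)
A[idx, (idx + 1) % n] = A[(idx + 1) % n, idx] = 1
W = A / A.sum(axis=1, keepdims=True)

x2 = rng.uniform(1, 10, n)                  # X2 ~ uniform(1, 10)
eps = rng.standard_normal(n)                # standard normal errors
xb = beta1 + beta2 * x2                     # X1 is the constant term

# Reduced form of Y = rho*W*Y + X1*b1 + X2*b2 + eps:
Y = np.linalg.solve(np.eye(n) - rho * W, xb + eps)
```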

Experimental setup ◘Randomly sample from connectivity matrix W (e.g. 95% of individual connections) ○s: 0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50 ◘Estimate coefficients ○1,000 replications at each level of sampling ◘Calculate MSE: bias – variance ◘Model identification: likelihood ratio test
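The under-specification step, observing only a fraction s of the true connections, might look like the sketch below (hypothetical network; dropping each undirected edge independently with probability 1 − s is one simple reading of sampling a share s of individual connections):

```python
import numpy as np

rng = np.random.default_rng(1)
s = 0.75                                  # one of the sampling rates above

# Hypothetical true network (50 nodes, ~10% tie density).
A = rng.random((50, 50)) < 0.1
A = (np.triu(A, 1) | np.triu(A, 1).T).astype(int)

# Keep each undirected edge with probability s; drop the rest.
edges = np.argwhere(np.triu(A, 1))        # each undirected edge listed once
keep = rng.random(len(edges)) < s
A_obs = np.zeros_like(A)
i, j = edges[keep].T
A_obs[i, j] = A_obs[j, i] = 1             # observed (under-specified) network
```

Estimating the model with a W built from A_obs instead of A is what produces the bias and power results summarized below.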

Results: estimator MSE and test outcomes tabulated by degree distribution ( d ), clustering ( c ), and autocorrelation ( ρ ) [figures not reproduced]
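The estimation and likelihood-ratio test behind these results can be sketched as a concentrated (profile) log-likelihood for the SAR model, maximized by grid search over ρ; the network, data, and grid below are hypothetical, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho_true = 100, 0.5

# Hypothetical W: ring lattice where each node ties to its 2 nearest
# neighbours on each side (degree 4), row-standardized.
A = np.zeros((n, n))
idx = np.arange(n)
for k in (1, 2):
    A[idx, (idx + k) % n] = A[(idx + k) % n, idx] = 1
W = A / A.sum(axis=1, keepdims=True)

# Data from Y = rho*W*Y + X*beta + eps (setup as in the experiment).
X = np.column_stack([np.ones(n), rng.uniform(1, 10, n)])
Y = np.linalg.solve(np.eye(n) - rho_true * W,
                    X @ np.array([2.0, 1.0]) + rng.standard_normal(n))

def loglik(rho):
    """Concentrated SAR log-likelihood, profiled over beta and sigma^2."""
    Ay = Y - rho * (W @ Y)
    b = np.linalg.lstsq(X, Ay, rcond=None)[0]     # beta-hat given rho
    e = Ay - X @ b
    _, logdet = np.linalg.slogdet(np.eye(n) - rho * W)
    return -0.5 * n * np.log(e @ e / n) + logdet

grid = np.linspace(-0.5, 0.95, 200)
rho_hat = grid[np.argmax([loglik(r) for r in grid])]

# LR test of H0: rho = 0 (chi-square, 1 df; 5% critical value 3.84).
LR = 2 * (loglik(rho_hat) - loglik(0.0))
```

Repeating this with W built from a sampled network, over many replications, yields the MSE and test-power summaries the slides report.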

Summary and conclusions ◘Specification of connectivity matrix in social network settings ◘Resources available for observing network connections – sampling strategies ◘Simulation experiment using networks with controlled topologies: quality of estimators, power of identification tests

Summary and conclusions ◘The main factor is the degree of network autocorrelation (ρ) ◘Clustering: relatively small effect

Summary and conclusions ◘Weak network autocorrelation (ρ = 0.1 ~ 0.3) ○Effect of under-specification on coefficients is relatively small ○Tests may fail to identify the effect ◘Moderate network autocorrelation (ρ = 0.5) ○Effect on coefficients becomes noticeable around s ≈ 0.75, and is sharper with increasing mean degree ○Tests correctly reject the null hypothesis of no autocorrelation about 90% of the time at p = 0.05

Summary and conclusions ◘Strong network autocorrelation (ρ = 0.7 ~ 0.9) ○Quality of estimators deteriorates very rapidly, even at s ≈ 0.90 ○Tests lose power at higher mean degrees ◘Further research ○Alternative sampling schemes (e.g. snowball, referral) ○Over-specification of connectivity matrix W ○“Seeding” matrix W

Appendix: result plots for every combination of d ∈ {1.5, 3.5, 5.5, 7.5}, c ∈ {0.2, 0.3, 0.4, 0.5, 0.6, 0.7}, and ρ ∈ {0.1, 0.3, 0.5, 0.7, 0.9} [figures not reproduced]