
Spatial Autocorrelation Basics
NR 245, Austin Troy, University of Vermont

SA basics
Spatial autocorrelation (SA) is a lack of independence among nearby observations.
It can be negative, positive, or absent (random).
Induced vs. inherent spatial autocorrelation: SA is induced when the thing being measured is a function of the variable that is actually autocorrelated.
First order (gradient) vs. second order (patchiness).
SA can occur at two scales: within patch and between patch.
Directional patterns: anisotropy.
SA is measured based on point pairs.

Spatial lags Source: ESRI, ArcGIS help

Statistical ramifications
Spatial autocorrelation is the spatial version of redundancy.
If the observations are spatially clustered, the estimates obtained from the correlation coefficient or the OLS estimator will be biased and the confidence intervals too narrow.
The estimates are biased because areas with a higher concentration of events have a greater impact on the model estimate.
Precision is overestimated because, since events tend to be concentrated, there are actually fewer independent observations than the degrees of freedom indicate.
It is like measuring the same thing many times: the estimated standard errors will be too low.
This might lead you to believe that some coefficients are significant when in fact they are not (false positives, or Type I errors).
You end up with a systematic bias in models towards variables that are correlated in space.
This is similar to pseudo-replication in ecology.

Example
Example from Prof. Allan Frei: “Imagine if you had data from 10 counties spread all over the northeast. The data includes variables such as median income, # of highways, and access to the internet. You regress access to the internet (dependent variable) against the other two (independent) variables. You realize that n=10 is not enough to get significant results, so you need more data. So, you go get data from 10 additional counties, but you choose the counties that are immediately adjacent to the original 10. If incomes and infrastructure hardly change from one county to the next, you are not really getting any additional information. This would be spatial autocorrelation. So how does this affect the regression results? You would be doing the calculations as if you had n=20 cases, but in reality you only had 10 independent cases. So, you would be overestimating the number of degrees of freedom, getting unrealistic t and p values, and underestimating the standard errors of the coefficients.”
Source: http://www.geography.hunter.cuny.edu/~afrei/gtech702_fall03_webpages/notes_spatial_autocorrelation.htm
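The effect described in the quotation can be sketched numerically. The snippet below is only an illustration under strong assumptions: the ten county values are hypothetical, and each adjacent county is assumed to be an exact copy of its neighbor, which is the extreme case of "hardly change". The function name `naive_standard_error` is invented for this sketch.

```python
import math
import statistics


def naive_standard_error(sample):
    """Standard error of the mean, treating every value as independent."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Hypothetical values for 10 well-separated counties (not real data).
original = [41.0, 52.5, 47.3, 60.1, 55.8, 49.9, 44.2, 58.7, 51.0, 46.5]

# Extreme assumption: each adjacent county repeats its neighbor exactly,
# so the 10 extra counties carry no new information.
with_neighbors = original + original

se_10 = naive_standard_error(original)
se_20 = naive_standard_error(with_neighbors)

print(f"n=10 standard error: {se_10:.3f}")
print(f"n=20 standard error: {se_20:.3f}")
print(f"ratio: {se_20 / se_10:.3f}")
```

The mean is unchanged, yet the naive standard error shrinks by roughly 30 percent, because the formula divides by the square root of 20 even though only 10 values are independent.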

Tests
Null hypothesis: the observed spatial pattern of values is as likely as any other spatial pattern.
The tests ask whether the values observed at a location are independent of the values observed at neighboring locations.

Moran’s I
Ratio of two expressions: the similarity of pairs (including only those within the distance threshold), adjusted for the number of items, over the variance of the data.
Applied to zones or points with continuous variables associated with them.
Compares the value of the variable at any one location with the value at all other locations:
$$I = \frac{N}{\sum_i \sum_j W_{ij}} \cdot \frac{\sum_i \sum_j W_{ij}\,(X_i - \bar{X})(X_j - \bar{X})}{\sum_i (X_i - \bar{X})^2}$$
where N is the number of cases, $X_i$ is the variable value at a particular location, $X_j$ is the variable value at another location, $\bar{X}$ is the mean of the variable, and $W_{ij}$ is a weight applied to the comparison between location i and location j.
For a binary contiguity matrix, the sum of the weights $\sum_i \sum_j W_{ij}$ is the number of connections in the matrix.
See http://www.spatialanalysisonline.com/output/html/MoranIandGearyC.html#_ref177275168 for more detail.

Moran’s I
Wij is a contiguity matrix: if zone j is adjacent to zone i, the interaction receives a weight of 1 (see the lags on the second slide).
Another option is to make Wij a distance-based weight, the inverse of the distance between locations i and j (1/dij).
Both can also be used together (this is an option in ArcGIS).
With inverse-distance weights, the statistic compares the sum of the cross-products of values at different locations, two at a time, weighted by the inverse of the distance between the locations.
I generally varies between –1.0 and +1.0.
When autocorrelation is high, the coefficient is high: a high I value indicates positive autocorrelation.
Significance is tested with a Z statistic.
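A minimal sketch of the calculation, assuming zones arranged in a single row so that the binary contiguity matrix is easy to write down. The helper names (`chain_weights`, `morans_i`, `permutation_p_value`) are invented for illustration. For significance, the sketch swaps in a simple one-sided permutation test instead of the Z statistic mentioned above; it matches the null hypothesis that every spatial arrangement of the values is equally likely.

```python
import random


def chain_weights(n):
    """Binary contiguity for n zones in a row: 1 if zones are adjacent."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


def morans_i(values, weights):
    """Moran's I: (N / sum of weights) * weighted cross-products / sum of squares."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def permutation_p_value(values, weights, n_perm=999, seed=0):
    """One-sided p-value for positive autocorrelation by shuffling locations."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (n_perm + 1)


w5 = chain_weights(5)
print(morans_i([1, 2, 3, 4, 5], w5))   # smooth gradient: about 0.5
print(morans_i([1, 5, 1, 5, 1], w5))   # alternating values: about -1.0
print(permutation_p_value(list(range(12)), chain_weights(12)))
```

Similar neighbors give a positive I, alternating neighbors a negative one. The function assumes the values are not all identical, since the sum of squares would then be zero.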

Geary’s C
One problem with Moran’s I is that it is based on averages, so it is easily biased by a skewed distribution with outliers.
Geary’s C deals with this because the interaction is not the cross-product of the deviations from the mean, as in Moran’s I, but the difference in intensity between each pair of observation locations.
However, it can also be biased in the presence of skewness, for another reason: squared differences between one value and an outlier value will have a disproportionate effect on the index.

Geary’s C
Values typically range between 0 and 2.
If the values of any one zone are spatially unrelated to those of any other zone, the expected value of C is 1.
Values less than 1 indicate positive spatial autocorrelation; values greater than 1 (between 1 and 2) indicate negative spatial autocorrelation.
C is inversely related to Moran’s I.
It does not provide identical inference because it emphasizes the differences in values between pairs of observations, rather than the covariation between the pairs.
Moran’s I gives a more global indicator, whereas the Geary coefficient is more sensitive to differences in small neighborhoods.
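As a rough sketch, Geary’s C can be computed with the standard definition, C = (N - 1) Σi Σj Wij (Xi - Xj)² / (2 ΣΣWij Σi (Xi - X̄)²). The slides do not show this formula, so treat it as an assumption about what they intended. The helper names and the row-of-zones layout are invented for illustration.

```python
def chain_weights(n):
    """Binary contiguity for n zones in a row: 1 if zones are adjacent."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


def gearys_c(values, weights):
    """Geary's C from squared differences between paired locations."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(sum(row) for row in weights)
    squared_diffs = sum(weights[i][j] * (values[i] - values[j]) ** 2
                        for i in range(n) for j in range(n))
    sum_squares = sum((v - mean) ** 2 for v in values)
    return (n - 1) * squared_diffs / (2 * s0 * sum_squares)


w5 = chain_weights(5)
print(gearys_c([1, 2, 3, 4, 5], w5))   # similar neighbors: below 1
print(gearys_c([1, 5, 1, 5, 1], w5))   # dissimilar neighbors: above 1
```

The smooth gradient falls well below 1 (positive autocorrelation) and the alternating pattern lands between 1 and 2 (negative autocorrelation), the reverse of the direction in which Moran’s I moves for the same data.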

Scale effects
I or C can be measured at different spatial lags to reveal scale dependency, plotted as a spatial correlogram.
Source: http://iussp2005.princeton.edu/download.aspx?submissionId=51529
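A correlogram can be sketched by recomputing Moran’s I with a weight matrix that connects only pairs a given number of steps apart. The example assumes a one-dimensional transect with hypothetical values forming patches three cells wide; `lag_weights` and `correlogram` are names invented for this sketch.

```python
def lag_weights(n, lag):
    """Connect only the pairs of cells exactly `lag` steps apart on a transect."""
    return [[1 if abs(i - j) == lag else 0 for j in range(n)] for i in range(n)]


def morans_i(values, weights):
    """Moran's I for the given values and weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def correlogram(values, max_lag):
    """Moran's I at each lag from 1 to max_lag."""
    n = len(values)
    return {lag: morans_i(values, lag_weights(n, lag))
            for lag in range(1, max_lag + 1)}


# Hypothetical transect with alternating low and high patches, 3 cells wide.
transect = [1, 1, 1, 5, 5, 5, 1, 1, 1, 5, 5, 5]
for lag, value in correlogram(transect, 6).items():
    print(lag, round(value, 3))
```

I is positive at lag 1 (within a patch), strongly negative at lag 3 (one patch width, so low is paired with high), and positive again at lag 6 (the spacing between like patches), which is the kind of scale dependency a correlogram is meant to reveal.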

Source: Fortin and Dale, Spatial Analysis

LISA
Local version of Moran’s I: maps the individual covariance components of the global Moran’s I.
Requires some adjustment: standardize the row totals in the weight matrix (number of neighbors) to sum to 1, which allows for weighted averaging of the neighbors’ influence.
Also uses n-1 instead of n as the multiplier.
Usually standardized with z-scores; +/- 1.96 is the usual critical threshold value for Z.
The expected value under the null hypothesis is -Σj wij/(n-1), which equals -1/(n-1) once the rows are standardized.
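The local statistic can be sketched as follows, assuming a simplified form Ii = (n - 1) (Xi - X̄) Σj wij (Xj - X̄) / Σk (Xk - X̄)² with row-standardized weights. This follows the n-1 multiplier mentioned above, but it is a simplification: implementations such as ArcGIS compute the variance term slightly differently, so the numbers need not match theirs. The function names are invented for illustration.

```python
def chain_weights(n):
    """Binary contiguity for n zones in a row: 1 if zones are adjacent."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


def row_standardize(weights):
    """Scale each row to sum to 1 so neighbors are averaged, not summed."""
    standardized = []
    for row in weights:
        total = sum(row)
        standardized.append([w / total if total else 0.0 for w in row])
    return standardized


def local_morans_i(values, weights):
    """Local Moran's I per location (simplified form, n - 1 multiplier)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    sum_squares = sum(d * d for d in dev)
    w = row_standardize(weights)
    local = []
    for i in range(n):
        # Weighted average of the neighbors' deviations from the mean.
        neighbor_lag = sum(w[i][j] * dev[j] for j in range(n) if j != i)
        local.append((n - 1) * dev[i] * neighbor_lag / sum_squares)
    return local


values = [1, 2, 3, 4, 5]
local = local_morans_i(values, chain_weights(len(values)))
print([round(v, 3) for v in local])
```

In this gradient the two ends contribute most, because they are far from the mean and their neighbors deviate in the same direction, while the middle zone sits at the mean and contributes nothing. With this scaling the local values sum to (n - 1) times the global Moran’s I computed on the row-standardized weights.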

Gwynns Falls Crime Data

Local Moran

Local Getis with 2000 m weights

Local Getis with adjacency weights

Local Getis-Ord Statistic
Used in ArcGIS.
Indicates both clustering and whether the clustered values are high or low.
For a chosen critical distance d, the statistic G(d) is computed from xi, the value of the ith point, and w(d), the weight for points i and j at distance d.
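The formula itself did not survive in this transcript. The sketch below therefore assumes the basic local form from Getis and Ord, Gi(d) = Σj wij(d) xj / Σj xj with j ≠ i and binary weights (1 when point j lies within distance d of point i). This may differ from the slide and from the standardized z-score variant that GIS software typically reports. The point coordinates and values are hypothetical, and `local_g` is a name invented for this example.

```python
import math


def local_g(points, values, d):
    """Share of the total value (excluding point i) found within distance d of i."""
    n = len(points)
    result = []
    for i in range(n):
        nearby_total = 0.0
        overall_total = 0.0
        for j in range(n):
            if j == i:
                continue  # the basic G_i form leaves out the point itself
            overall_total += values[j]
            if math.dist(points[i], points[j]) <= d:
                nearby_total += values[j]
        result.append(nearby_total / overall_total)
    return result


# Hypothetical layout: three high-value points close together, two low-value
# points close together, and a large gap between the groups.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
values = [10, 10, 10, 1, 1]

g = local_g(points, values, d=2)
print([round(v, 3) for v in g])
```

Points in the high-value group get a large G because most of the total value lies within distance d of them, while points in the low-value group get a small G. This is how the statistic separates clusters of high values from clusters of low values, which a local Moran’s I alone does not distinguish. The sketch assumes positive values, as the ratio is not meaningful otherwise.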