Exploratory Tools for Spatial Data: Diagnosing Spatial Autocorrelation


Main message when modeling and analyzing spatial data: SPACE MATTERS! Relationships among spatially referenced observations can be analyzed in numerous ways. Some include:

1. Estimation through stochastic dependencies.
2. Spatial regression: deterministic structure of the mean function.
3. Lattice modeling: expressing observations as functions of neighboring values.

Chapter emphasis: exploratory tools for spatial data must allow some insight into the spatial structure of the data.

For instance, stem & leaf plots and histograms pictorially represent the data but tell us nothing about the data's spatial orientation or structure. (Figures: stem & leaf plot; histogram)

Example of using lattice data to demonstrate the importance of retaining spatial information: two 10 x 10 lattices are filled with the same 100 observations drawn at random. Lattice A is a completely random assignment of observations to lattice positions; Lattice B is an assignment to positions such that each value is surrounded by values similar in magnitude.

Histograms of the 100 observed values, which do not take spatial position into account, are identical for the two lattices. Note: the density estimate is not an estimate of the probability distribution of the data; that requires a different formula. Even if a histogram calculated by lumping data across spatial locations appears Gaussian, this does not imply that the data are a realization of a Gaussian random field.

When the observed values are plotted against the average value of their nearest neighbors, the difference in the spatial distribution between the two lattices emerges: the data in lattice A are not spatially correlated, while the data in lattice B are very strongly autocorrelated.
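The two-lattice example can be sketched as follows. This is an illustration only, not the book's actual data: the serpentine layout of the sorted values is just one simple way to surround each value with values of similar magnitude.

```python
import numpy as np

def rook_neighbor_mean(g):
    """Mean of each cell's rook neighbors (up/down/left/right); edge cells
    are averaged over the neighbors they actually have."""
    total = np.zeros_like(g, dtype=float)
    count = np.zeros_like(g, dtype=float)
    total[1:, :] += g[:-1, :]; count[1:, :] += 1   # neighbor above
    total[:-1, :] += g[1:, :]; count[:-1, :] += 1  # neighbor below
    total[:, 1:] += g[:, :-1]; count[:, 1:] += 1   # neighbor to the left
    total[:, :-1] += g[:, 1:]; count[:, :-1] += 1  # neighbor to the right
    return total / count

rng = np.random.default_rng(42)
values = rng.normal(size=100)          # 100 observations drawn at random

# Lattice A: completely random assignment of the values to positions
A = values.reshape(10, 10)

# Lattice B: the same values, arranged so that neighbors are similar in
# magnitude (sorted, laid out along a serpentine path)
B = np.sort(values).reshape(10, 10)
B[1::2] = B[1::2, ::-1].copy()         # reverse every other row

# Lag correlation: each value vs. the mean of its rook neighbors
r_A = np.corrcoef(A.ravel(), rook_neighbor_mean(A).ravel())[0, 1]
r_B = np.corrcoef(B.ravel(), rook_neighbor_mean(B).ravel())[0, 1]
print(f"lag correlation A: {r_A:.2f}, B: {r_B:.2f}")
```

Both lattices produce the same histogram, yet the lag correlation is near zero for A and much larger for B.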

Distinguishing between spatial and non-spatial arrangements also helps detect outliers. In a box plot or a stem & leaf plot, outliers are termed "distributional." A "spatial" outlier is an observation that is unusual compared to its surrounding values. Diagnosing spatial outliers: median-polish the data, meaning remove the large-scale trends in the data by an outlier-resistant method, and look for outlying observations in a box plot of the median-polished residuals. Lag plots (as in the previous example) can also be used.
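The median-polish diagnostic can be sketched generically as follows. This is not the book's code; the synthetic grid and the injected outlier are made up for illustration.

```python
import numpy as np

def median_polish(table, max_iter=10, tol=1e-6):
    """Tukey's median polish: iteratively sweep row and column medians out of
    the table, leaving outlier-resistant residuals."""
    resid = np.array(table, dtype=float)
    row_eff = np.zeros(resid.shape[0])
    col_eff = np.zeros(resid.shape[1])
    for _ in range(max_iter):
        row_med = np.median(resid, axis=1)
        resid -= row_med[:, None]
        row_eff += row_med
        col_med = np.median(resid, axis=0)
        resid -= col_med[None, :]
        col_eff += col_med
        if max(np.abs(row_med).max(), np.abs(col_med).max()) < tol:
            break
    return resid, row_eff, col_eff

# Smooth row + column trend, with one spatial outlier injected at (4, 7):
# the cell is unremarkable in a plain histogram of the data, but unusual
# relative to its surroundings.
rng = np.random.default_rng(0)
grid = (2.0 * np.arange(10)[:, None] + 0.5 * np.arange(12)[None, :]
        + rng.normal(scale=0.1, size=(10, 12)))
grid[4, 7] += 5.0

resid, _, _ = median_polish(grid)

# The spatial outlier stands out in the median-polished residuals
flagged = np.unravel_index(np.argmax(np.abs(resid)), resid.shape)
```

Because medians are resistant, the trend is removed without the outlier dragging the fit toward itself, so the residual at the outlying cell stays large.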

Concerning the Mercer and Hall grain yield data, the S+SpatialStats code for box plots of grain yield by row and by column is:

bwplot(y ~ grain, data=wheat, ylab="Row", xlab="Grain Yield")
bwplot(x ~ grain, data=wheat, ylab="Column", xlab="Grain Yield")

Describing, Diagnosing, and Testing the Degree of Spatial Autocorrelation

For geostatistical data, the empirical semivariogram provides an estimate of the spatial structure. For lattice data, join-count statistics have been developed for binary and nominal attributes, and Moran (1950) and Geary (1954) developed autocorrelation coefficients for continuous attributes observed on lattices: Moran's I and Geary's c. Both work by comparing an estimate of the covariation among the Z(s) to an estimate of their variation.

Let Z(s_i), i = 1, 2, ..., n denote the attribute Z observed at site s_i, and let U_i = Z(s_i) - Z̄ denote its centered version. w_ij denotes the neighborhood connectivity weight between sites s_i and s_j, with w_ii = 0.
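With this notation, the two coefficients take their standard forms (the formulas themselves did not survive transcription, so they are reproduced here from the usual definitions):

```latex
I = \frac{n}{\sum_{i \ne j} w_{ij}} \cdot
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,U_i U_j}{\sum_{i=1}^{n} U_i^2},
\qquad
c = \frac{n-1}{2\sum_{i \ne j} w_{ij}} \cdot
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,\{Z(s_i)-Z(s_j)\}^2}{\sum_{i=1}^{n} U_i^2}.
```

Moran's I behaves like a spatial correlation (values above its expectation indicate positive autocorrelation), while Geary's c behaves inversely: values below 1 indicate positive autocorrelation.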

In the absence of spatial autocorrelation, I has expected value E[I] = -1/(n-1). Values I > E[I] indicate positive autocorrelation; values I < E[I] indicate negative autocorrelation. To determine whether a deviation of I from its expectation is statistically significant, one relies on the asymptotic distribution of I, which is Gaussian with mean -1/(n-1) and variance σ²_I. The hypothesis of no spatial autocorrelation is rejected at the α x 100% significance level if |Z_obs| = |I - E[I]| / σ_I is more extreme than the z_{α/2} cutoff of a standard Gaussian distribution.
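The test under the Gaussian assumption can be sketched as follows. The moment formulas are the standard Cliff-Ord expressions; the strongly autocorrelated rook-lattice data are synthetic, built only to exercise the test.

```python
import numpy as np
from math import erf, sqrt

def rook_W(rows, cols):
    """0/1 rook contiguity matrix for a rows x cols lattice (row-major sites)."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for rr, cc in [(r-1, c), (r+1, c), (r, c-1), (r, c+1)]:
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[r*cols + c, rr*cols + cc] = 1
    return W

def moran_test(z, W):
    """Moran's I with mean and variance under the Gaussian assumption."""
    n = len(z)
    U = z - z.mean()
    S0 = W.sum()
    I = (n / S0) * (U @ W @ U) / (U @ U)
    E_I = -1.0 / (n - 1)
    S1 = 0.5 * ((W + W.T) ** 2).sum()
    S2 = ((W.sum(axis=0) + W.sum(axis=1)) ** 2).sum()
    var_I = (n*n*S1 - n*S2 + 3*S0*S0) / ((n*n - 1) * S0*S0) - E_I**2
    z_obs = (I - E_I) / np.sqrt(var_I)
    p = 2 * (1 - 0.5 * (1 + erf(abs(z_obs) / sqrt(2))))  # two-sided Gaussian p
    return I, z_obs, p

# Strongly autocorrelated arrangement: sorted values on a serpentine path
rng = np.random.default_rng(7)
vals = np.sort(rng.normal(size=100)).reshape(10, 10)
vals[1::2] = vals[1::2, ::-1].copy()
I, z_obs, p = moran_test(vals.ravel(), rook_W(10, 10))
print(f"I = {I:.2f}, z_obs = {z_obs:.1f}")
```

For data like these the observed I lies many standard deviations above -1/(n-1), so the null hypothesis of no spatial autocorrelation is emphatically rejected.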

Two frameworks for obtaining σ²_I:

1. Gaussian assumption: under the null hypothesis the Z(s_i) are assumed G(μ, σ²), so that U_i ~ G(0, σ²(1 - 1/n)).
2. Randomization framework: the Z(s_i) are considered fixed and are randomly permuted among the n lattice sites. There are n! equally likely permutations, and σ²_I is the variance of the n! Moran's I values. Since n! is rarely enumerable, a Monte Carlo sample of permutations is the practical alternative to full randomization.
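The Monte Carlo version of the randomization framework can be sketched as follows (synthetic data; the number of permutations, 1999, is an arbitrary choice):

```python
import numpy as np

def rook_W(rows, cols):
    """0/1 rook contiguity matrix for a rows x cols lattice (row-major sites)."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            for rr, cc in [(r-1, c), (r+1, c), (r, c-1), (r, c+1)]:
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[r*cols + c, rr*cols + cc] = 1
    return W

def moran_I(z, W):
    U = z - z.mean()
    return (len(z) / W.sum()) * (U @ W @ U) / (U @ U)

rng = np.random.default_rng(3)
W = rook_W(8, 8)
z = np.sort(rng.normal(size=64)).reshape(8, 8)   # strongly autocorrelated grid
z[1::2] = z[1::2, ::-1].copy()
z = z.ravel()

I_obs = moran_I(z, W)

# Reference distribution: Moran's I over random relabelings of the sites
perm = np.array([moran_I(rng.permutation(z), W) for _ in range(1999)])

# Monte Carlo p-value: how extreme is I_obs among the permuted values?
p = (1 + (perm >= I_obs).sum()) / (1 + len(perm))
```

The permutation mean approximates E[I] = -1/(n-1), and for spatially sorted data the observed I exceeds essentially every permuted value.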

The SAS® macro %MoranI calculates the Z_obs statistics and p-values under both the Gaussian and the randomization assumptions. The data set containing the W matrix (W = [w_ij]) is passed to the macro through the w_data option. For rectangular lattices, the macro %ContWght (in file \SASMacros\ContiguityWeights.sas) calculates the W matrices for the classical neighborhood definitions.
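A language-neutral sketch of what a contiguity-weight macro such as %ContWght computes (the Python function name and signature here are hypothetical, not the macro's interface):

```python
import numpy as np

def contiguity_weights(rows, cols, move="rook"):
    """0/1 connectivity matrix W for a rows x cols lattice.
    rook: cells sharing an edge; bishop: cells touching only at a corner;
    queen: either (rook + bishop)."""
    rook = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    bishop = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    steps = {"rook": rook, "bishop": bishop, "queen": rook + bishop}[move]
    n = rows * cols
    W = np.zeros((n, n), dtype=int)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in steps:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[r * cols + c, rr * cols + cc] = 1
    return W

W = contiguity_weights(3, 3, move="rook")
print(W.sum())  # 24: twice the 12 rook adjacencies of a 3 x 3 grid
```

The resulting W is symmetric with a zero diagonal (w_ii = 0), as required by the Moran's I definitions above.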

%include 'DriveLetterofCDROM:\Data\SAS\MercerWheatYieldData.sas';
%include 'DriveLetterofCDROM:\SASMacros\ContiguityWeights.sas';
%include 'DriveLetterofCDROM:\SASMacros\MoranI.sas';
Title1 "Moran's I for Mercer and Hall Wheat Yield, Rook's Move";
%ContWght(rows=20, cols=25, move=rook, out=rook);
%MoranI(data=mercer, y=grain, row=row, col=col, w_data=rook);

Caveats of Moran's I:
1. It is sensitive to large-scale trends in the data.
2. It is very sensitive to the choice of the neighborhood matrix W.

If the rook definition (edges abut) is replaced by the bishop's move (touching corners only), the autocorrelation remains significant, but the value of the test statistic is reduced by about 50%.

Title1 "Moran's I for Mercer and Hall Wheat Grain Data, Bishop's Move";
%ContWght(rows=20, cols=25, move=bishop, out=bishop);
%MoranI(data=mercer, y=grain, row=row, col=col, w_data=bishop);

Linear model: Z = β1·x + 0.2·y + β2·x² + e, e ~ iid G(0,1), where x and y are the lattice coordinates.

data simulate;
  do x = 1 to 10;
    do y = 1 to 10;
      z = &beta1*x + 0.2*y + &beta2*x*x + rannor(2334); /* &beta1, &beta2 hold the trend coefficients */
      output;
    end;
  end;
run;
Title1 "Moran's I for independent data with large-scale trend";
%ContWght(rows=10, cols=10, move=rook, out=rook);
%MoranI(data=simulate, y=z, row=x, col=y, w_data=rook);

The test indicates strong positive "autocorrelation," which is an artifact of the changes in E[Z] rather than of stochastic spatial dependency among the sites.

If trend contamination distorts inferences about the spatial autocorrelation coefficient, then it seems reasonable to remove the trend and calculate the autocorrelation coefficient from the RESIDUALS. The modified test statistic I* is computed from the residual vector of the trend fit rather than from the centered observations. Its mean and variance differ somewhat from those of I: E[I*] now depends on the weights W and on the X matrix of the trend model.
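In symbols (standard regression notation, with X the n x k matrix of the large-scale trend model; the explicit formulas were lost in transcription and are reconstructed here from the usual Cliff-Ord results):

```latex
\hat{e} = \bigl(\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\bigr)\mathbf{Z}
        = \mathbf{M}\mathbf{Z},
\qquad
I^{*} = \frac{n}{S_0}\,\frac{\hat{e}'\mathbf{W}\hat{e}}{\hat{e}'\hat{e}},
\qquad
S_0 = \sum_{i \ne j} w_{ij},
\qquad
\mathrm{E}[I^{*}] = \frac{n\,\mathrm{tr}(\mathbf{M}\mathbf{W})}{(n-k)\,S_0}.
```

Because M depends on X, both the mean and the variance of I* change with the fitted trend model, which is why E[I*] is no longer simply -1/(n-1).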

Title1 "Moran's I for Mercer and Hall Wheat Yield Data";
Title2 "Calculated for Regression Residuals";
%include 'DriveLetterofCDROM:\SASMacros\MoranResiduals.sas';
data xmat;
  set mercer;
  x1 = col;
  x2 = col**2;
  x3 = col**3;
  keep x1 x2 x3;
run;
%RegressI(xmat=xmat, data=mercer, z=grain, weight=rook, local=1);

This code fits a large-scale mean model with cubic column effects and no row effects. Adding higher-order terms for the column effects leaves the results essentially unchanged.

The value of Z_obs is slightly reduced relative to Output 9.3 (slide 14), indicating that the column trends did add some spurious autocorrelation. The p-value remains highly significant, so conventional tests that assume independent data are not appropriate here.

The optional parameter local= requests LISAs: Local Indicators of Spatial Association. Interpretation: if the local test statistic is less than its expected value, the sites connected to site s_i have attribute values dissimilar from Z(s_i); a high (low) value at s_i is surrounded by low (high) values. If the local test statistic is greater than its expected value, a high (low) value at s_i is surrounded by high (low) values at the connected sites.

The graph shows the detrended Mercer and Hall grain yield data with the sites that have positive LISAs marked. Hot spots, where autocorrelation is locally much greater than for the remainder of the lattice, are obvious.