Testing spatial correlation (autocorrelation): 1. Moran’s I 2. Geary’s c 3. Variogram 4. Join counts. Cliff, A. D. & Ord, J. K. 1981. Spatial processes: models and applications.


1 Testing spatial correlation (autocorrelation)
1. Moran’s I
2. Geary’s c
3. Variogram
4. Join counts
Cliff, A. D. & Ord, J. K. 1981. Spatial processes: models and applications. Pion.
Chapter 12 – Correlation between two maps

2 Testing correlation between two maps (continuous variables)
[Figure: two maps. x_1: proportion of land area classified as hydric; x_2: ln(elevation), with elevation in feet]
Gumpertz, M.L., Wu, C.-T. & Pye, J.M. Logistic regression for southern pine beetle outbreaks with spatial and temporal autocorrelation. Forest Science.

3 Assume the correlation coefficient between the two maps is r. The null hypothesis is H_0: r = 0. If y = (y_1, y_2, …, y_N) is a random, independent sample, and x = (x_1, x_2, …, x_N) is also an independent sample, the test of H_0 is straightforward. Under H_0, r has the distribution (N is the sample size, e.g., the number of cells):
$$f(r) = \frac{(1 - r^2)^{(N-4)/2}}{B\left(\tfrac{1}{2}, \tfrac{N-2}{2}\right)}, \quad -1 \le r \le 1 \qquad (*)$$
Therefore, the p-value for observing a value as extreme as r_obs is:
$$p = 2 \int_{|r_{obs}|}^{1} f(r)\, dr$$
Equivalently, the test of H_0 can be done using a t-test, because
$$t = \frac{r \sqrt{N-2}}{\sqrt{1 - r^2}}$$
has a t-distribution with N − 2 degrees of freedom. Note that these two tests are identical.
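The equivalence of the two tests under independence can be checked numerically. The following is a minimal base R sketch, not part of the original program; `r.test.independent` is a hypothetical helper name, the simulated data are purely illustrative, and the density used is the standard null distribution of the sample correlation coefficient for independent, normally distributed samples.

```r
# Two-sided p-value for H0: r = 0 when both samples are independent.
# r.test.independent is a hypothetical helper, not part of association.main.
r.test.independent <- function(x, y) {
  N <- length(x)
  r <- cor(x, y)
  # Null density of r: (1 - r^2)^((N - 4) / 2) / B(1/2, (N - 2) / 2)
  dens <- function(u) (1 - u^2)^((N - 4) / 2) / beta(0.5, (N - 2) / 2)
  p.density <- 2 * integrate(dens, lower = abs(r), upper = 1)$value
  # Equivalent t-test with N - 2 degrees of freedom
  t.stat <- r * sqrt(N - 2) / sqrt(1 - r^2)
  p.t <- 2 * pt(-abs(t.stat), df = N - 2)
  list(r = r, p.density = p.density, p.t = p.t)
}

set.seed(1)
x <- rnorm(50)               # illustrative data only
y <- 0.3 * x + rnorm(50)
res <- r.test.independent(x, y)
print(res)
```

The two p-values agree up to the numerical error of `integrate`, and `p.t` matches what `cor.test(x, y)` reports.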

4 However, in reality y = (y_1, y_2, …, y_N) is rarely an independent sample, and neither is x = (x_1, x_2, …, x_N). This nuisance is caused by autocorrelation. Autocorrelation inflates the type I error: two uncorrelated maps are more likely to be mistakenly declared significantly correlated (a true null hypothesis is rejected). In order to make a correct inference, we need to penalize the sample size. Although the sample size is N, the effective sample size should be much smaller than N because of autocorrelation. The effective sample size can be calculated following the method of Clifford et al. (1989), or Dutilleul’s method for small sample sizes.
Clifford, P., Richardson, S. and Hemon, D. 1989. Assessing the significance of the correlation between two spatial processes. Biometrics 45.
Dutilleul, P. Modifying the t test for assessing the correlation between two spatial processes. Biometrics 49.
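The inflation of the type I error can be illustrated with a small simulation. The sketch below is not from the original program: it uses one-dimensional AR(1) transects as a simplified stand-in for autocorrelated maps, and `n.sim`, `N` and `phi` are illustrative choices.

```r
# Apply the naive correlation test to pairs of mutually independent series,
# with and without autocorrelation inside each series.
set.seed(42)
n.sim <- 500   # number of simulated pairs (illustrative)
N <- 100       # number of cells along a transect (illustrative)
phi <- 0.9     # AR(1) coefficient, i.e. strength of autocorrelation (illustrative)

naive.p <- function(phi) {
  sim <- function() {
    # phi = 0 means independent cells; otherwise simulate an AR(1) series
    if (phi == 0) rnorm(N) else as.numeric(arima.sim(model = list(ar = phi), n = N))
  }
  cor.test(sim(), sim())$p.value   # the two series are generated independently
}

reject.indep <- mean(replicate(n.sim, naive.p(0)) < 0.05)
reject.auto  <- mean(replicate(n.sim, naive.p(phi)) < 0.05)
cat("Rejection rate, independent cells:   ", reject.indep, "\n")
cat("Rejection rate, autocorrelated cells:", reject.auto, "\n")
```

With independent cells the rejection rate stays near the nominal 5%. With strong autocorrelation it is many times larger, even though the two series are unrelated.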

5 The effective sample size can be calculated following the method of Clifford et al. (1989). The calculation uses Σ, the covariance matrix among the N locations. It is an N×N symmetric matrix and can be estimated from the variogram of geostatistics. Calculating the variogram is the most important step in testing H_0. The major part of the computation is estimating the variogram and the covariance (covariogram) matrix. The covariogram is a decreasing function of distance, i.e., two nearby locations have higher covariance than two locations far apart. Therefore, the covariance matrix captures the spatial correlation structure of the data.
[Figure: covariance plotted against distance]
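How a fitted covariogram becomes an N×N covariance matrix can be sketched in base R. This is only an illustration under assumptions: the exponential covariance model, the parameter values, the tiny grid and the helper name `exp.cov.matrix` are not taken from the original program, which fits a theoretical model to the empirical variogram.

```r
# Build an N x N covariance matrix from a covariance (covariogram) model.
# The exponential model and all parameter values are illustrative assumptions.
exp.cov.matrix <- function(coords, sill, range.par, nugget = 0) {
  d <- as.matrix(dist(coords))        # pairwise distances between cell centres
  S <- sill * exp(-d / range.par)     # covariance decreases with distance
  diag(S) <- sill + nugget            # variance at distance zero
  S
}

# Cell centres of a small 4 x 3 grid with 10 m cells
coords <- expand.grid(x = seq(5, 35, by = 10), y = seq(5, 25, by = 10))
Sigma <- exp.cov.matrix(coords, sill = 2, range.par = 30)
dim(Sigma)
isSymmetric(Sigma)
```

Neighbouring cells (10 m apart) get covariance `2 * exp(-10 / 30)`, and the value shrinks toward zero as the distance between cells grows.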

6 Once we have estimated the covariance matrix, the effective sample size M can be computed from it. The test of H_0 then follows the same probability distribution as (*), with N in (*) replaced by the effective sample size M. The p-value is calculated as:
$$p = 2 \int_{|r_{obs}|}^{1} \frac{(1 - r^2)^{(M-4)/2}}{B\left(\tfrac{1}{2}, \tfrac{M-2}{2}\right)}\, dr$$
Note that the W-test described in Clifford et al. is very similar to the above test and is therefore not included in my R program. Under H_0 its statistic W follows N(0,1), a standard normal distribution.
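One way to write the effective sample size is the trace form M = 1 + tr(BΣ_x) tr(BΣ_y) / tr(BΣ_x BΣ_y), with centring matrix B = (I − J/N)/N, where J is the N×N matrix of ones. This is the form usually associated with Dutilleul's modification. Whether the original program uses exactly this expression is an assumption. The sketch below uses hypothetical helper names (`effective.n`, `adjusted.p`) and an illustrative exponential covariance on a transect.

```r
# Effective sample size from two N x N covariance matrices (trace form).
# effective.n and adjusted.p are hypothetical helpers, not the original code.
effective.n <- function(Sigma.x, Sigma.y) {
  N <- nrow(Sigma.x)
  B <- (diag(N) - matrix(1 / N, N, N)) / N   # centring matrix scaled by 1/N
  Bx <- B %*% Sigma.x
  By <- B %*% Sigma.y
  1 + sum(diag(Bx)) * sum(diag(By)) / sum(diag(Bx %*% By))
}

# t-test of H0: r = 0 with the sample size replaced by M
adjusted.p <- function(r, M) {
  t.stat <- r * sqrt(M - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t.stat), df = M - 2)
}

N <- 50
d <- abs(outer(1:N, 1:N, "-"))   # distances along a transect
Sigma.ind <- diag(N)             # independent cells
Sigma.auto <- exp(-d / 5)        # exponential covariance, range 5 (illustrative)

M.ind <- effective.n(Sigma.ind, Sigma.ind)
M.auto <- effective.n(Sigma.auto, Sigma.auto)
print(c(M.ind = M.ind, M.auto = M.auto))
print(c(naive = adjusted.p(0.3, N), adjusted = adjusted.p(0.3, M.auto)))
```

For independent cells this expression returns M = N, so the adjusted test reduces to the ordinary one. With autocorrelation M drops well below N and the same observed r yields a larger p-value.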

7 Description of R program
The main program is called “association.main”. It has five functions:
boxcox.fn: apply a Box-Cox transformation to bring the data closer to normality.
generatexy.fn: generate a location matrix and plot the map (image).
variogram.fn: calculate the empirical variogram for a data set.
varcov.fn: estimate the covariance by fitting a theoretical model to the empirical variogram.
test.association.fn: calculate the p-value for the test.
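The empirical variogram step can be illustrated with the classical (Matheron) estimator, in which γ(h) is half the mean squared difference between values at pairs of locations roughly h apart. The base R sketch below is a hypothetical stand-in, not the original variogram.fn, which relies on the geoR package. The function name, the binning and the toy data are assumptions.

```r
# Classical empirical semivariogram on binned distances.
# empirical.variogram is a hypothetical stand-in for variogram.fn.
empirical.variogram <- function(coords, z, breaks) {
  d <- as.matrix(dist(coords))                           # pairwise distances
  half.sq.diff <- outer(z, z, function(a, b) (a - b)^2) / 2
  pairs <- upper.tri(d)                                  # count each pair once
  bin <- cut(d[pairs], breaks = breaks)                  # distance classes
  data.frame(
    bin = levels(bin),
    n.pairs = as.vector(table(bin)),
    gamma = as.vector(tapply(half.sq.diff[pairs], bin, mean))
  )
}

# Four points on a line, one unit apart (toy data)
coords <- cbind(x = 0:3, y = 0)
z <- c(1, 2, 4, 7)
vg <- empirical.variogram(coords, z, breaks = c(0.5, 1.5, 2.5, 3.5))
print(vg)
```

A theoretical model fitted to `gamma` as a function of distance then gives the covariance at any separation, which is the role varcov.fn plays in the program.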

8 Example: BCI plot – correlation between number of recruits and number of species.
Cell size = 10×10 m. Total number of cells N = 5000.
Data file name in R: bci.recruit.dat
[Figure: maps of number of recruits and number of species]
> bci.recruit.dat[1:10,]
abund nsp recruit simpson
…
5000 …
A question of great ecological interest is whether diversity (species richness) promotes recruitment and seedling survival.
Wills, C. et al. Non-random processes contribute to the maintenance of diversity in tropical forests. Science 311.

9 Example: BCI plot – correlation between number of recruits and number of species.
> association.main(bci.recruit.dat, map1=2, map2=3, cellsize=10, boxcox="no")
The results are:
Correlation coef. r =
Original sample size = 5000
p-value = 1e-04
Effective sample size =
p-value =
map1=2 is “number of species”; map2=3 is “number of recruits”.
Without considering autocorrelation, the correlation between the two maps is highly significant (p-value = 1e-04). After taking account of spatial autocorrelation, it is only marginally different from 0: it is significant at the p = 0.05 level, but not at the p = 0.001 level.
Note: You need the package geoR to run this program.