Spatial Association Defining the relationship between two variables.

1 Spatial Association Defining the relationship between two variables.

2 Method Depends On Data Type
The statistical/spatial analysis method is a function of the measurement level and the spatial data model.
–Nominal by Nominal: Chi-sq
–Nominal by Ordinal: Median by nominal class
–Nominal by Interval/Ratio: Mean by nominal class; K-S test
–Ordinal by Ordinal: Rank correlation coefficient
–Ordinal by Interval/Ratio: Mean of the ordinal class
–Interval/Ratio by Interval/Ratio: Covariance; cross correlation; correlation

3 Chi-Square
Chi-Square can be used to compare:
–Area A to Area B
–Area A to Line
–Area A to Point
The hypothesis is always the same:
–H0: The distribution of observations across Area A is equal to the Expected Distribution, where the Expected Distribution is usually CSR (complete spatial randomness). In other words, we ask whether the distribution of Area A explains the distribution of the observations, assuming that random observations would be distributed proportionally across Area A.
–HA: Not equal to the Expected Distribution, indicating a potential first-order effect.

4 Chi-Square

5 Advantages:
–Easy to compute and interpret
–Non-parametric, distribution neutral
–“Easy” to determine expected values: proportional to area
–Can be applied to nominal (count) data
Disadvantages:
–Results are influenced by the scale of the observations; use other indices
–Ideal for points, more problematic with areas and lines
–Influenced by zone systems (arbitrary areas)

6 Cramer V Statistic
Cramer's coefficient is a measure of association that ranges from 0 to 1.
A Cramer's coefficient of 0 indicates that the calculated chi-square is 0, i.e., the observed frequencies are all equal to the expected frequencies. This means that there is perfect independence between the rows and columns, and the column variable provides no information about the row variable.
A Cramer's coefficient of 1 indicates that the calculated chi-square is the highest possible chi-square value, n(L-1), where L is the smaller of the number of rows and the number of columns. This indicates a perfect relationship between the rows and columns: the column variable provides perfect information about the row variable.
A V greater than 0.7 is strong, between 0.4 and 0.7 is moderate, and between 0.2 and 0.4 is weak.
V equals the square root of chi-square divided by the product of the sample size, n, and m, which is the smaller of (rows - 1) or (columns - 1):
–V = SQRT(χ² / (n × m))
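The formula and the strength bands above can be sketched in a few lines of Python. This is only an illustration: the function names `cramers_v` and `strength` are made up for this example, and the "negligible" label for V below 0.2 is an assumption, since the slide does not name that range.

```python
import math

def cramers_v(chi_square, n, rows, cols):
    """Cramer's V = sqrt(chi-square / (n * m)), with m = min(rows - 1, cols - 1)."""
    m = min(rows - 1, cols - 1)
    return math.sqrt(chi_square / (n * m))

def strength(v):
    """Verbal label following the bands quoted on the slide."""
    if v > 0.7:
        return "strong"
    if v >= 0.4:
        return "moderate"
    if v >= 0.2:
        return "weak"
    return "negligible"  # assumption: the slide gives no label below 0.2

# Chi-square of 0 gives V = 0; the maximum chi-square n * m gives V = 1.
print(cramers_v(0.0, 20, 2, 3))       # no association
print(cramers_v(20.0 * 1, 20, 2, 3))  # perfect association
```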

7 Chi-Square
Calculate the chi-square statistic and the Cramer's coefficient for the following data. Test for significance at the 0.05 level.
Veg Type   Veg Area (sq km)   Fraction of Area   Observed Fire Area (sq km)   Expected Fire Area (sq km)   Chi-sq
A          1000               0.25               0.2                          2                            1.62
B          1000               0.25               3.5                          2                            1.125
C          800                0.2                2.3                          1.6                          0.3063
D          1000               0.25               1.9                          2                            0.005
E          200                0.05               0.1                          0.4                          0.225
Total      4000               1                  8                            8                            3.2813
The table value for a chi-square statistic with 4 degrees of freedom at the 0.05 level is 9.488. The calculated value of 3.28 is below 9.488 (it falls between right-tail probabilities of 0.7 and 0.5), so the result is not significant at the 0.05 level.
V = [3.28 / (16 × 1)]^(1/2) = 0.453
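The worked example can be checked with a short Python sketch. It assumes, as the slides do, that the expected fire area in each vegetation type is proportional to that type's share of the total area. The function name `chi_square_by_area` is hypothetical; the numbers are taken from the table.

```python
# Vegetation areas and observed fire areas (sq km) from the table.
areas = {"A": 1000, "B": 1000, "C": 800, "D": 1000, "E": 200}
observed = {"A": 0.2, "B": 3.5, "C": 2.3, "D": 1.9, "E": 0.1}

def chi_square_by_area(areas, observed):
    """Per-zone chi-square terms, with expected values proportional to area."""
    total_area = sum(areas.values())
    total_observed = sum(observed.values())
    terms = {}
    for zone, area in areas.items():
        expected = total_observed * area / total_area
        terms[zone] = (observed[zone] - expected) ** 2 / expected
    return terms

terms = chi_square_by_area(areas, observed)
chi2 = sum(terms.values())

critical_value = 9.488  # table value for 4 degrees of freedom at the 0.05 level
for zone, term in terms.items():
    print(zone, round(term, 4))
print("chi-square:", round(chi2, 4))
print("significant at 0.05:", chi2 > critical_value)
```

Types A and B contribute most of the statistic, which matches the per-row values in the table.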

8 Chi-Square
Chi-square is sensitive to scale: the bigger the numbers, the larger the chi-square. V will normalize for scale. Here the data from the previous slide are expressed in hectares. Although the chi-square is very high, the association may still be only moderately strong, with a V of 0.453 (note vegetation types A and B).
Veg Type   Veg Area (ha)   Fraction of Area   Observed Fire Area (ha)   Expected Fire Area (ha)   Chi-sq
A          100000          0.25               20                        200                       162
B          100000          0.25               350                       200                       112.5
C          80000           0.2                230                       160                       30.625
D          100000          0.25               190                       200                       0.5
E          20000           0.05               10                        40                        22.5
Total      400000          1                  800                       800                       328.13
V = 0.453
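A small Python sketch of the scale effect: converting the same fire areas from square kilometres to hectares multiplies every value by 100, which multiplies chi-square by 100 but leaves V unchanged. The choice of n = 16 and m = 1 follows the slide 7 calculation; n = 1600 for the hectare version is an assumption that n is rescaled along with the data.

```python
import math

def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed and expected fire areas in sq km (vegetation types A to E).
obs_km = [0.2, 3.5, 2.3, 1.9, 0.1]
exp_km = [2.0, 2.0, 1.6, 2.0, 0.4]

# The same data in hectares (1 sq km = 100 ha).
obs_ha = [o * 100 for o in obs_km]
exp_ha = [e * 100 for e in exp_km]

chi_km = chi_square(obs_km, exp_km)
chi_ha = chi_square(obs_ha, exp_ha)

# n = 16 and m = 1 as on slide 7; n = 1600 is assumed to scale with the units.
v_km = math.sqrt(chi_km / (16 * 1))
v_ha = math.sqrt(chi_ha / (1600 * 1))

print(round(chi_km, 2), round(chi_ha, 2))
print(round(v_km, 3), round(v_ha, 3))
```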

9 Kolmogorov-Smirnov Test
Compare the observed CDF to the expected CDF:
–H0: Observed EQ Expected
–HA: Observed NE Expected, an indication of a first-order effect
The expected distribution can be any distribution, usually CSR.
Advantages:
–Ideal for comparing points to fields; more problematic with areas and lines
–Nonparametric, distribution neutral
–Easy to compute and interpret
Disadvantages:
–How to compute the expected CDF? Two options:
–Use random points, with the same sample size as the observed. If the sample is “small”, your random points may not appear random.
–Create a CDF for the population using all measurements or a large sample. In this case you are sampling the environment and asking whether the sample points are randomly located across the environment.
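The comparison of the two CDFs can be sketched as follows. This is a minimal pure-Python illustration, not the procedure used in the presentation: the helper names `ecdf` and `d_max` are hypothetical, and the distance values are invented purely to exercise the code.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical CDF of a sample as a function of x."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect_right(data, x) / n

def d_max(observed, expected):
    """Largest absolute gap between the two empirical CDFs.

    Both CDFs are step functions that only change at data values,
    so it is enough to evaluate the gap at every data value.
    """
    f_obs, f_exp = ecdf(observed), ecdf(expected)
    points = sorted(set(observed) | set(expected))
    return max(abs(f_obs(x) - f_exp(x)) for x in points)

# Invented distances (e.g. metres to a feature) for observed and random points.
observed_distances = [10, 20, 30, 40]
random_distances = [30, 40, 50, 60]
print(d_max(observed_distances, random_distances))
```

The resulting Dmax is then compared against a critical value for the chosen significance level.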

10 K-S Test
Archeology sites vs. distance from wadis
Random n = 250; Sites n = 84
Critical values of Dmax:
–P = 0.01; Dmax = 0.21
–P = 0.05; Dmax = 0.17
–P = 0.10; Dmax = 0.15
A Dmax of 0.12 is below all three critical values, so H0 is not rejected: the distance distributions may be the same.
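The critical values listed above are consistent with the standard large-sample approximation for the two-sample K-S test, D = c(α) × SQRT((n1 + n2) / (n1 × n2)) with c(α) = SQRT(-0.5 × ln(α / 2)). The slides do not say how the values were obtained, so the sketch below is an assumption about their origin; the function name `ks_critical` is hypothetical.

```python
import math

def ks_critical(alpha, n1, n2):
    """Approximate two-sample K-S critical value for large samples."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n1 + n2) / (n1 * n2))

n_random, n_sites = 250, 84
observed_d_max = 0.12

for alpha in (0.01, 0.05, 0.10):
    critical = ks_critical(alpha, n_random, n_sites)
    print(alpha, round(critical, 2), "reject H0:", observed_d_max > critical)
```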

11 Dmax = 0.98 – 0.86 = 0.12

12 Analysis of Environmental Justice Point in Polygon Analysis

13

14

15 Erie Chi-Squared
V = 0.86; V = 0.37

16 Interpreting Chi Square
–Zero indicates no relationship
–Large numbers indicate a stronger relationship
–Or, a table of significance can be consulted to determine whether the specific value is statistically significant
The fact that we have shown that there is a correlation between variables does NOT mean that we have found out anything about WHY this is so. In our analysis we might state our assumptions as to why this is so, but we would need to perform other analyses to show causation.

17 Spatial Correspondence of Areal Distributions
–Quadrat and nearest-neighbor analysis deal with a single distribution of points
–Often, we want to measure the distribution of two or more variables
–The coefficient of areal correspondence and chi-square statistics perform these tasks

18 Coefficient of Areal Correspondence
Simple measure of the extent to which two distributions correspond to one another
–Compare wheat farming to areas of minimal rainfall
Based on the approach of overlay analysis

19 Overlay Analysis Two distributions of interest are mapped at the same scale and the outline of one is overlaid with the other

20 Coefficient of Areal Correspondence
CAC is the ratio between the area of the region where the two distributions overlap and the total area covered by either of the individual distributions (their combined extent).

21

22 Result of CAC
–Where there is no correspondence, CAC is equal to 0
–Where there is total correspondence, CAC is equal to 1
–CAC provides a simple measure of the extent of spatial association between two distributions, but it cannot provide any information about the statistical significance of the relationship
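As a rough illustration of the CAC ratio, the sketch below represents each distribution as a set of equal-area grid cells, so that counting cells stands in for measuring area. The raster representation, the function name `cac`, and the cell coordinates are all assumptions made for this example; an overlay of real polygons would use measured areas instead.

```python
def cac(cells_a, cells_b):
    """Coefficient of areal correspondence for two sets of equal-area cells.

    Overlap area (intersection) divided by the area covered by
    either distribution (union).
    """
    union = cells_a | cells_b
    if not union:
        raise ValueError("both distributions are empty")
    return len(cells_a & cells_b) / len(union)

# Hypothetical (row, col) grid cells for two mapped distributions.
wheat = {(0, 0), (0, 1), (1, 0), (1, 1)}
low_rainfall = {(1, 0), (1, 1), (2, 0), (2, 1)}

print(cac(wheat, low_rainfall))  # 2 shared cells out of 6 covered cells
print(cac(wheat, wheat))         # total correspondence
print(cac(wheat, {(5, 5)}))      # no correspondence
```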

23 Resemblance Matrix
Proposed by Court (1970)
Advantages over CAC:
–Limits are –1 to +1, with a perfect negative correspondence given a value of –1
–Sampling distribution is roughly normal, so you can test for statistical significance

