Point Pattern Analysis

Presentation transcript:

Point Pattern Analysis using Spatial Inferential Statistics Briggs Henan University 2010

Last time
- Concept of statistical inference: drawing conclusions about populations from samples
- Null hypothesis of no difference
- Alternative hypothesis (the one we really want to accept)
- Random point pattern: is our observed point pattern “significantly different from random”?

How Point Pattern Analysis (PPA) is different
- From Centrographic Statistics (previously):
  - Centrographic Statistics calculates single, summary measures
  - PPA analyses the complete set of points
- From Spatial Autocorrelation (discussed later):
  - With PPA, the points have location only; there is no “magnitude” value
  - With Spatial Autocorrelation, points have different magnitudes; there is an attribute variable

Approaches to Point Pattern Analysis
Two primary approaches:
- Point Density using Quadrat Analysis
  - Based on polygons: analyze points using polygons!
  - Uses the frequency distribution or density of points within a set of grid squares
- Point Association using Nearest Neighbor Analysis
  - Based on points
  - Uses distances between the points
Although the above would suggest that the first approach examines first-order effects and the second approach examines second-order effects, in practice the two cannot be separated.

Quadrat Analysis: The problem of selecting quadrat size
- Too small: many quadrats with zero points
- Too big: many quadrats have a similar number of points
- O.K.: length of quadrat edge = $\sqrt{2A/N}$, where A = study area and N = number of points
- The choice of quadrat size is an instance of the Modifiable Areal Unit Problem
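To make the quadrat-size choice concrete, here is a minimal Python sketch that lays a uniform grid over a rectangular study area and counts the points in each cell. The edge length uses the common rule of thumb $\sqrt{2A/N}$, which is an assumption here; the function name `quadrat_counts` and the sample coordinates are made up for illustration and do not come from the slides.

```python
import math
from collections import Counter

def quadrat_counts(points, width, height, edge):
    """Count points per cell of a uniform grid over a width x height region."""
    n_cols = math.ceil(width / edge)
    n_rows = math.ceil(height / edge)
    counts = Counter()
    for x, y in points:
        # Points lying exactly on the upper boundary go into the last cell.
        col = min(int(x // edge), n_cols - 1)
        row = min(int(y // edge), n_rows - 1)
        counts[(row, col)] += 1
    # Return one count per quadrat, including empty quadrats.
    return [counts[(r, c)] for r in range(n_rows) for c in range(n_cols)]

# Hypothetical data: 8 points in a 10 x 10 study area.
points = [(1.0, 1.0), (1.5, 2.0), (7.0, 8.0), (9.0, 9.5),
          (4.0, 6.0), (5.5, 2.5), (8.0, 3.0), (2.5, 7.5)]
width = height = 10.0
area = width * height

# Assumed rule of thumb: edge = sqrt(2A / N).
edge = math.sqrt(2 * area / len(points))
counts = quadrat_counts(points, width, height, edge)
print(edge, counts)
```

Re-running the same points with a different `edge` value is a quick way to see how much the counts, and therefore any statistic built on them, depend on quadrat size.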

Types of Quadrats
- Uniform grid: used for secondary data
- Random sampling: useful in field work
- A frequency count of points per quadrat is tabulated for each scheme
- There are multiple ways to create quadrats, and results can differ accordingly!
- Quadrats don’t have to be square, and their size has a big influence

Quadrat Analysis: Variance/Mean Ratio (VMR)
- Apply a uniform or random grid over the area (A), with the width of each square given by $\sqrt{2A/n}$, where A = area of region and n = number of points
- Treat each cell as an observation and count the number of points within it, to create the variable X
- Calculate the variance and mean of X, and form the variance-to-mean ratio: VMR = variance / mean
- For a uniform distribution, the variance is zero, so we expect a VMR close to 0
- For a random distribution, the variance and mean are the same (a property of the Poisson distribution), so we expect a VMR around 1
- For a clustered distribution, the variance is relatively large, so we expect a VMR above 1
See following slide for example. See O&U p. 98-100 for another example.
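The VMR calculation can be sketched in a few lines of Python. The three count lists below are hypothetical: they were chosen only so that each has 10 quadrats and 20 points, matching the totals used in the significance-test slide, and they are not the lecture's original data. The sketch also assumes the sample variance (divisor N - 1).

```python
def variance_mean_ratio(counts):
    """Variance-to-mean ratio of quadrat counts (sample variance, divisor N - 1)."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((x - mean) ** 2 for x in counts) / (n - 1)
    return variance / mean

# Hypothetical quadrat counts: 10 quadrats, 20 points in each pattern.
random_counts = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
uniform_counts = [2] * 10
clustered_counts = [0] * 8 + [10, 10]

vmr_random = variance_mean_ratio(random_counts)
vmr_uniform = variance_mean_ratio(uniform_counts)
vmr_clustered = variance_mean_ratio(clustered_counts)
print(vmr_random, vmr_uniform, vmr_clustered)
```

With these counts the random pattern gives a ratio a little above 1, the uniform pattern gives exactly 0, and the clustered pattern gives a ratio far above 1.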

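[Figure: quadrat counts x for three example patterns, RANDOM, UNIFORM/DISPERSED and CLUSTERED, with the variance and variance/mean ratio of each]
Formulae for variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{N - 1} = \frac{\sum x_i^2 - (\sum x_i)^2 / N}{N - 1}$
Note: N = number of quadrats = 10
Ratio = variance / mean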

Significance Test for VMR
- A significance test can be conducted based upon the chi-square frequency distribution
- The test statistic is given by (sum of squared differences) / mean: $\chi^2 = \frac{\sum x_i^2 - (\sum x_i)^2 / N}{\bar{x}}$
- The test will ascertain if a pattern is significantly more clustered than would be expected by chance (but does not test for uniformity)
- The values of the test statistic in our cases would be:
  random: $(60 - 20^2/10) / 2 = 10$
  uniform: $(40 - 20^2/10) / 2 = 0$
  clustered: $(200 - 20^2/10) / 2 = 80$
- For degrees of freedom N - 1 = 10 - 1 = 9, the value of chi-square at the 1% level is 21.666. Thus, there is only a 1% chance of obtaining a value of 21.666 or greater if the points had been allocated randomly. Since our test statistic for the clustered pattern is 80, we conclude that there is (considerably) less than a 1% chance that the clustered pattern could have resulted from a random process.
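The arithmetic of the test can be checked with a short Python sketch. The count lists are hypothetical, picked so that their sums (20) and sums of squares (60, 40, 200) match the figures quoted above; the critical value 21.666 is taken from the slide rather than computed.

```python
def vmr_chi_square(counts):
    """Return (test statistic, degrees of freedom) for the VMR chi-square test."""
    n = len(counts)
    mean = sum(counts) / n
    # Sum of squared differences, computational form: sum(x^2) - (sum x)^2 / N.
    sum_sq_diff = sum(x * x for x in counts) - sum(counts) ** 2 / n
    return sum_sq_diff / mean, n - 1

# Chi-square critical value at the 1% level for 9 degrees of freedom (from the slide).
CRITICAL_1PCT_DF9 = 21.666

# Hypothetical counts matching the slide's sums and sums of squares.
patterns = {
    "random": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
    "uniform": [2] * 10,
    "clustered": [0] * 8 + [10, 10],
}

results = {name: vmr_chi_square(c) for name, c in patterns.items()}
for name, (stat, df) in results.items():
    verdict = "more clustered than random" if stat > CRITICAL_1PCT_DF9 else "not significant"
    print(f"{name}: chi-square = {stat}, df = {df}, {verdict}")
```

Only the clustered pattern exceeds the critical value, which agrees with the conclusion stated in the slide.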

Quadrat Analysis: Frequency Distribution Comparison
Rather than base the conclusion on the variance/mean ratio, we can compare observed frequencies in the quadrats (Q = number of quadrats) with expected frequencies that would be generated by:
- a random process (modeled by the Poisson frequency distribution)
- a clustered process (e.g. one cell with P points, Q-1 cells with 0 points)
- a uniform process (e.g. each cell has P/Q points)
The standard Kolmogorov-Smirnov test for comparing two frequency distributions can then be applied (see next slide).
See Lee and Wong pp. 62-68 for another example and further discussion.

Kolmogorov-Smirnov (K-S) Test
- The test statistic D is simply given by $D = \max |\text{Cum. Obs. Freq.} - \text{Cum. Exp. Freq.}|$: the largest difference (irrespective of sign) between observed cumulative frequency and expected cumulative frequency
- The critical value at the 5% level is given by $D_{0.05} = 1.36 / \sqrt{Q}$, where Q is the number of quadrats
- Expected frequencies for a random spatial distribution are derived from the Poisson frequency distribution and can be calculated with:
  $p(0) = e^{-\lambda} = 1 / 2.71828^{P/Q}$ and $p(x) = p(x - 1) \cdot \lambda / x$
  where x = number of points in a quadrat, p(x) = the probability of x points, P = total number of points, Q = number of quadrats, and λ = P/Q (the average number of points per quadrat)
See next slide for a worked example for the clustered case.
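A minimal Python sketch of this comparison follows, using the Poisson recurrence and the 5% critical value given above. The quadrat counts are hypothetical (10 quadrats, 20 points, all points in two cells), and cumulative frequencies are expressed as proportions of Q, which is an assumption about how the slide's frequencies are scaled.

```python
import math
from collections import Counter

def poisson_probs(lam, max_x):
    """p(0) = exp(-lam); p(x) = p(x - 1) * lam / x, for x = 0..max_x."""
    probs = [math.exp(-lam)]
    for x in range(1, max_x + 1):
        probs.append(probs[-1] * lam / x)
    return probs

def ks_against_poisson(counts):
    """Return (D, critical value at 5%) comparing quadrat counts with a Poisson model."""
    q = len(counts)
    lam = sum(counts) / q              # average number of points per quadrat
    max_x = max(counts)
    observed = Counter(counts)         # number of quadrats holding x points
    expected = poisson_probs(lam, max_x)
    cum_obs = cum_exp = 0.0
    d = 0.0
    for x in range(max_x + 1):
        cum_obs += observed[x] / q     # cumulative observed proportion
        cum_exp += expected[x]         # cumulative expected proportion
        d = max(d, abs(cum_obs - cum_exp))
    return d, 1.36 / math.sqrt(q)

# Hypothetical clustered pattern: 8 empty quadrats, 2 quadrats with 10 points each.
clustered_counts = [0] * 8 + [10, 10]
d_stat, d_crit = ks_against_poisson(clustered_counts)
print(d_stat, d_crit, d_stat > d_crit)
```

For this pattern the largest gap occurs at x = 0, where 80% of quadrats are empty but the Poisson model expects only about 13.5%, so D exceeds the critical value.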

The spreadsheet spatstat.xls contains worked examples for the Uniform/Clustered/Random data previously used, as well as for Lee and Wong’s data.

Weakness of Quadrat Analysis
- Results may depend on quadrat size and orientation (Modifiable Areal Unit Problem): test different sizes (or orientations) to determine the effect of each on the results
- It is a measure of dispersion, and not really pattern, because it is based primarily on the density of points, and not their arrangement in relation to one another. For example, quadrat analysis cannot distinguish between two obviously different patterns that produce the same counts per quadrat
- It results in a single measure for the entire distribution, so variations within the region are not recognized (there could be clustering locally in some areas, but not overall). For example, an overall pattern can be dispersed yet contain some local clusters

Nearest-Neighbor Index (NNI) (O&U p. 100)
- Uses distances between points
- It compares the mean of the distance observed between each point and its nearest neighbor with the expected mean distance if the distribution were random:
  NNI = Observed Average Distance / Expected Average Distance
- For a random pattern, NNI = 1
- For a completely clustered pattern (all points at the same location), NNI = 0
- For a maximally dispersed pattern, NNI = 2.149
See next slide for formulae for calculation.

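Calculating Nearest Neighbor Index
$NNI = \bar{d} / E(\bar{d})$, where:
- $\bar{d} = \frac{1}{n} \sum_{i=1}^{n} d_i$ is the average distance to nearest neighbor
- $E(\bar{d}) = 0.5 \sqrt{A/n}$ is the expected average distance for a random pattern
- A = area of region: the result is very dependent on this value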

Significance Test for NNI
- The test statistic is calculated as follows: Z = (Av. Distance Observed - Av. Distance Expected) / Standard Error
- It has a Normal frequency distribution
- It tests if the observed pattern is significantly different from random
- If Z is below -1.96 or above +1.96, we are “95% confident that the distribution is not randomly distributed.” Or we can say: if the observed pattern were random, there are fewer than 5 chances in 100 that we would have observed a Z value this extreme.
Note: in the example on the next slide, the fact that the NNI for uniform is 1.96 is coincidence!

Calculating Test Statistic for Nearest Neighbor Index
$Z = \frac{\bar{d} - E(\bar{d})}{SE}$, where $SE = \frac{0.26136}{\sqrt{n^2 / A}}$ (standard error)
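The NNI and its Z score can be sketched in Python as below. The formulas assumed are the standard ones for this index, expected mean distance $0.5\sqrt{A/n}$ and standard error $0.26136/\sqrt{n^2/A}$, with no edge correction; the function name and the lattice example are illustrative and not taken from the lecture.

```python
import math

def nearest_neighbor_index(points, area):
    """Return (NNI, Z) for a list of (x, y) points in a study region of the given area."""
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to its closest other point (brute force).
        nearest.append(min(math.hypot(xi - xj, yi - yj)
                           for j, (xj, yj) in enumerate(points) if j != i))
    observed = sum(nearest) / n
    expected = 0.5 * math.sqrt(area / n)        # assumed expectation for a random pattern
    std_error = 0.26136 / math.sqrt(n * n / area)
    return observed / expected, (observed - expected) / std_error

# Hypothetical dispersed pattern: a 4 x 4 square lattice with unit spacing in a 4 x 4 region.
lattice = [(col + 0.5, row + 0.5) for row in range(4) for col in range(4)]
nni, z = nearest_neighbor_index(lattice, area=16.0)
print(nni, z)
```

Every lattice point is exactly 1 unit from its nearest neighbor while the expected distance is 0.5, so the index is 2 and Z is well above +1.96. Passing a larger `area` for the same points lowers the index, which illustrates how strongly the result depends on the study-area boundary.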

[Figure: example RANDOM, CLUSTERED and UNIFORM point patterns, each annotated with its NNI, mean distance and Z score; the Z values shown are 5.508, -0.1515 and 5.855. Source: Lembro]

Running in ArcGIS
- Example data: Telecom and Software Companies
- The result is very dependent on the area of the region. There is an option to insert your own value.
- The default value is the “minimum enclosed rectangle” that encompasses all features.

Results
- Scroll up the window to see all the results.
- The graphic is produced if the “Display output graphically” box is checked.
- Note: the progress box continues to run until the graphic is closed. Always close the graphic window first.

Evaluating the Nearest Neighbor Index
Advantages
- Unlike quadrats, the NNI considers distances between points
- No quadrat size problem
However, NNI has problems
- Very dependent on the value of A, the area of the study region. What boundary do we use for the study area?
  - Minimum enclosing rectangle? (highly affected by a few outliers)
  - Convex hull?
  - Convex hull with buffer? What size buffer?
- There is an “adjustment for edge effects” but problems remain
- Based on only the mean distance to the nearest neighbor
- Doesn’t incorporate local variations or clustering scale: there could be clustering locally in some areas, but not overall
- Based on point location only, and does not incorporate the magnitude of the phenomenon (quantity) at that point

Ripley’s K(d) Function (O&U p. 135-137)
- Ripley’s K is calculated multiple times, each for a different distance band, so it is represented as K(d): K is a function of distance, d
- The distance bands are placed around every point
- K(d) is the average number of points found within distance d of each point, divided by the average density of points in the entire area (n/a):
  $K(d) = \frac{a}{n} \cdot \frac{1}{n} \sum_{i=1}^{n} \#[S \in C(s_i, d)]$
  where S is a point, and $C(s_i, d)$ is a circle of radius d, centered at $s_i$
- If the density is high for a particular band, then clustering is occurring at that distance
Ripley B.D. 1976. The second-order analysis of stationary point processes. Journal of Applied Probability 13: 255-266
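A naive Python sketch of K(d) follows: for each distance it counts, around every point, the other points within that radius, averages the counts, and divides by the overall density n/a. It applies no edge correction and is not the ArcGIS implementation; the function name and the two-cluster sample data are hypothetical. Under a random pattern the expected value is $\pi d^2$, which is used here for comparison.

```python
import math

def ripley_k(points, area, distances):
    """Naive Ripley's K (no edge correction) evaluated at each distance in `distances`."""
    n = len(points)
    k_values = []
    for d in distances:
        pairs = 0
        for i, (xi, yi) in enumerate(points):
            for j, (xj, yj) in enumerate(points):
                # Count every other point inside the circle of radius d around point i.
                if i != j and math.hypot(xi - xj, yi - yj) <= d:
                    pairs += 1
        # (a / n) * (1 / n) * total count of neighbors within d.
        k_values.append(area * pairs / (n * n))
    return k_values

# Hypothetical data: two tight clusters in a 10 x 10 study area.
points = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1),
          (9.0, 9.0), (9.1, 9.0), (9.0, 9.1)]
distances = [0.05, 0.5]
observed_k = ripley_k(points, area=100.0, distances=distances)
expected_k = [math.pi * d * d for d in distances]   # expectation for a random pattern
print(observed_k, expected_k)
```

At d = 0.5 each point has two neighbors, so observed K is far above the random expectation, signalling clustering at that distance; at d = 0.05 no pairs are close enough and K is 0.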

[Figure: K(d) plotted against distance for a clustered and a dispersed pattern. Source: O’Sullivan & Unwin, p.]
- Not this simple with real data!!!
- Clustered pattern: the low end (0.2) corresponds to distances within the clusters; the high end (0.6) corresponds to distances between the clusters
- Dispersed pattern: the curve begins flat
- The distance bands are placed around every point. Note the big problem of edge effects from circles outside the study area.

Running in ArcGIS
- Example data: Telecom and Software Companies
- Distance bands
- Weight field: number of points at that location
- Confidence envelope: use 9 iterations for tests; 99 takes a long time!
- The result is very dependent on the area of the region. You can insert your own value. Again, the study area has a big effect, so there are several options for this.

Interpreting the Results
- Expected is based on a random pattern
- The pattern is clustered, since observed is above expected; it would be dispersed if observed were below expected
- Not this simple with real data!!!

Results for 10,000 feet Bands
- Distance bands: start 5,000 feet, size 10,000 feet
- Expected assumes a random pattern
- Confidence band: 9 iterations (takes a long time for 99!)

Results for 20,000 feet Bands
- Distance bands: start 10,000 feet, size 20,000 feet
- Also experiment with different region (study area) boundaries.

Plotting the Difference Between Observed and Expected K, versus Distance
- Y field = Diffk = ObservedK - ExpectedK
- X field = ExpectedK or HiConfEnv
- Distance between clusters: 70,000 feet, about 13 miles or 21 km

Problems with Ripley’s K(d)
- Dependent on study area boundary (edge effect)
  - Circles go outside the study area
  - Special adjustments are available (see O&U p. 148)
  - Try different options for boundary in ArcGIS
- Affected by the circle radii selected
  - Try different values
- Each point has unit value: no magnitude or quantity
  - Weight field assumes “X” points at that location, e.g. if X = 3, then 3 points at that location

What have we learned?
How to measure and test if spatial patterns are clustered or dispersed.

Why is this important?
Is it clustered? We can measure and test, not just look and guess! That is science.

Not just GIS!
- I taught these tools to senior undergraduate geography students.
- They are also used in Earth Management.
- A former Henan University student and faculty member (now at UT-Dallas) is using Ripley’s K function for research on urban forests.

Next Time
- No classes next week
- Next class will be Wednesday November 17
- Topic: Spatial Autocorrelation. Unlike PPA, in Spatial Autocorrelation points have different magnitudes; there is an attribute variable.
