Point Pattern Analysis
Point patterns fall between two extremes: highly clustered and highly dispersed. Most tests of point patterns compare the observed pattern to complete spatial randomness (CSR). The two measurements used to describe a pattern are:
* Density of points across the analysis area
* Distance between points within the analysis area
Distance Methods
Distance methods are becoming more common:
* They do not require rasterization
* They are easy to compute with GIS
Example: Point 1 (10, 15), Point 2 (15, 20)
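The worked calculation for these two points did not survive extraction. A minimal sketch, assuming the points are (x, y) coordinates in the same projected units, computes their straight-line (Euclidean) distance; the function name is illustrative:

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


point_1 = (10, 15)
point_2 = (15, 20)
print(round(euclidean_distance(point_1, point_2), 2))  # 7.07, i.e. sqrt(5^2 + 5^2)
```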
Issues with Length Measurement
Measurements in GIS are often made on horizontal projections of objects, so length and area may be substantially lower than on the true three-dimensional surface.
Be careful
Slope (rise:run) and surface length (hypotenuse) per unit of horizontal distance:
0.25:1 – Hypotenuse = 1.03
…
3:1 – Hypotenuse = 3.16
Not an issue if the gradient is uniform, because every distance is then scaled by the same factor.
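The hypotenuse values follow from Pythagoras: on a uniform slope with a rise:run ratio of r:1, one unit of horizontal distance corresponds to √(1 + r²) units along the surface. A short sketch; the list of ratios is hypothetical and only illustrates the range of slopes the slide covered:

```python
import math


def surface_length_factor(rise_over_run):
    """Length along a uniform slope per unit of horizontal (map) distance."""
    return math.sqrt(1.0 + rise_over_run ** 2)


# Hypothetical set of slope ratios (rise:run) for illustration.
for ratio in (0.25, 0.5, 1, 2, 3):
    print(f"{ratio}:1 -> hypotenuse = {surface_length_factor(ratio):.2f}")
```

A 0.25:1 slope adds only about 3% to the measured length, while a 3:1 slope more than triples it (factor of about 3.16).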
Manhattan Distance
Distance is computed between two points (cells) by moving only N-S or E-W.
Cell 1 (15, 15), Cell 2 (10, 20) (row, column)
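A minimal sketch of the Manhattan (city-block) distance for the two cells above, assuming (row, column) indices on a grid with equal cell spacing; the function name is illustrative:

```python
def manhattan_distance(cell_a, cell_b):
    """Grid distance moving only N-S or E-W; cells are (row, column)."""
    return abs(cell_a[0] - cell_b[0]) + abs(cell_a[1] - cell_b[1])


cell_1 = (15, 15)
cell_2 = (10, 20)
print(manhattan_distance(cell_1, cell_2))  # 10 (5 rows + 5 columns)
```

The straight-line distance between the same two cells would be √50 ≈ 7.07 cells, so Manhattan distance is never shorter than Euclidean distance.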
Distance Methods
Nearest-Neighbor Distance (NND)
* Basic statistics from the sample (mean, SD)
* Compare to the expected population mean and SD
* Z statistic, R statistic
* Assumes a normal distribution to compute expected values
* Global estimate of pattern
Nearest Neighbor Distance
[Figure: example patterns with R < 1, R > 1, and R = 1]
Nearest Neighbor Analysis
Nearest neighbor analysis examines the distance between each point and the point closest to it, and then compares these distances to the values expected for a random sample of points from a complete spatial randomness (CSR) pattern. CSR rests on two assumptions: 1) all places are equally likely to receive a case (event), and 2) all cases are located independently of one another.
The mean nearest neighbor distance is
$$\bar{d} = \frac{1}{N}\sum_{i=1}^{N} d_i$$
where N is the number of points and $d_i$ is the nearest neighbor distance for point i.
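The mean nearest neighbor distance can be sketched as a brute-force search, which is fine for small point sets (it is O(N²)); GIS tools typically use a spatial index instead. The point list below is hypothetical:

```python
import math


def nearest_neighbor_distances(points):
    """Return d_i, the distance from each point to its closest other point."""
    dists = []
    for i, (xi, yi) in enumerate(points):
        best = math.inf
        for j, (xj, yj) in enumerate(points):
            if i != j:
                best = min(best, math.hypot(xj - xi, yj - yi))
        dists.append(best)
    return dists


def mean_nnd(points):
    """Mean nearest neighbor distance: (1/N) * sum of d_i."""
    d = nearest_neighbor_distances(points)
    return sum(d) / len(d)


pts = [(0, 0), (1, 0), (5, 5), (5, 7)]  # hypothetical events
print(mean_nnd(pts))  # 1.5, from d_i = 1, 1, 2, 2
```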
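The expected value of the mean nearest neighbor distance in a random pattern is
$$E(\bar{d}) = 0.5\sqrt{\frac{A}{N}} + \left(0.0514 + \frac{0.041}{\sqrt{N}}\right)\frac{B}{N}$$
where A is the area and B is the length of the perimeter of the study area. The variance is
$$\operatorname{Var}(\bar{d}) = 0.070\,\frac{A}{N^2} + 0.037\,B\sqrt{\frac{A}{N^5}}$$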
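A sketch of the expected value and variance, assuming the perimeter-corrected approximations attributed to Donnelly (1978), which use both the area A and the perimeter B. These coefficients are an assumption here, although they reproduce the variance values in the worked example that follows:

```python
import math


def expected_mean_nnd(n, area, perimeter):
    """E(d-bar) under CSR with a perimeter (edge) correction (assumed Donnelly form)."""
    return 0.5 * math.sqrt(area / n) + (0.0514 + 0.041 / math.sqrt(n)) * perimeter / n


def variance_mean_nnd(n, area, perimeter):
    """Var(d-bar) under CSR with the same assumed approximation."""
    return 0.070 * area / n ** 2 + 0.037 * perimeter * math.sqrt(area / n ** 5)


# Hypothetical N = 100 points in a unit square (A = 1, B = 4).
print(expected_mean_nnd(100, 1.0, 4.0), variance_mean_nnd(100, 1.0, 4.0))
```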
And the Z statistic is
$$Z = \frac{\bar{d} - E(\bar{d})}{\sqrt{\operatorname{Var}(\bar{d})}}$$
This approach assumes:
* The study area is a regular rectangle or square; the equations for the expected mean and variance cannot be used for irregularly shaped study areas.
* Area (A) is calculated as (Xmax – Xmin) * (Ymax – Ymin), where these values represent the study area boundaries.
R statistic = observed mean d / expected mean d
* R = 1: random
* R → 0: clustered
* R → 2+: uniform
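A minimal sketch of the rectangular area, R, and Z calculations. Using the extent of the points as the study area boundary is an assumption; when the true boundaries are known, use those instead. Function names are illustrative:

```python
import math


def bounding_box_area(points):
    """A = (Xmax - Xmin) * (Ymax - Ymin), the rectangle assumed by the classic equations."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def r_statistic(observed_mean, expected_mean):
    """R = observed mean NND / expected mean NND."""
    return observed_mean / expected_mean


def z_statistic(observed_mean, expected_mean, variance):
    """Z = (observed - expected) / sqrt(variance of the mean NND)."""
    return (observed_mean - expected_mean) / math.sqrt(variance)


# Hypothetical values: observed mean 1.5, expected 1.0, variance 0.25.
print(r_statistic(1.5, 1.0), z_statistic(1.5, 1.0, 0.25))
```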
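2 × 0.5 rectangle: A = 1, B = 5, $E(\bar{d})$ = …, $\operatorname{Var}(\bar{d}) = 8.85 \times 10^{-6}$
1 × 1 square: A = 1, B = 4, $E(\bar{d})$ = …, $\operatorname{Var}(\bar{d}) = 8.48 \times 10^{-6}$
… 2: $E(\bar{d})$ = …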
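The variance values for the two rectangles can be reproduced with the perimeter-corrected approximation if one assumes N = 100 points. N is not given on the slide, so treat it as an assumption. Under the same assumption, the expected mean NND works out to about 0.0528 and 0.0522:

```latex
\begin{aligned}
\operatorname{Var}(\bar d) &= 0.070\,\frac{A}{N^2} + 0.037\,B\sqrt{\frac{A}{N^5}},
  \qquad A = 1,\; N = 100,\; \sqrt{A/N^5} = 10^{-5} \\
2 \times 0.5\ (B = 5):\quad \operatorname{Var}(\bar d) &= 7.0\times10^{-6} + 1.85\times10^{-6} = 8.85\times10^{-6} \\
1 \times 1\ (B = 4):\quad \operatorname{Var}(\bar d) &= 7.0\times10^{-6} + 1.48\times10^{-6} = 8.48\times10^{-6} \\
E(\bar d) &= 0.5\sqrt{A/N} + \left(0.0514 + 0.041/\sqrt{N}\right)\frac{B}{N}
  = 0.05 + 0.0555\,\frac{B}{100} \\
&\approx 0.0528\ (B = 5), \qquad \approx 0.0522\ (B = 4)
\end{aligned}
```

The longer perimeter of the 2 × 0.5 rectangle raises both values: more of its area lies near an edge, where nearest neighbors tend to be farther away.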
Real-world study areas are complex and violate the assumptions of most equations for expected values.
[Figure: Wilderness Campsites]
Solution
Simulate randomization using Monte Carlo methods, and compare the simulated distribution to the observed value.
* If possible, use the “true” area and perimeter to compute the expected value.
* Software that does not ask for the area/perimeter or a shapefile of the study area will assume a rectangle.
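One way to sketch the Monte Carlo approach for an irregular study area: generate CSR patterns inside the actual polygon by rejection sampling (uniform points in the bounding box, kept only if a ray-casting test places them inside), then compare the observed mean NND with the simulated ones. The polygon format, function names, and pseudo p-value convention are assumptions of this sketch:

```python
import math
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices, closing edge implied."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges straddling the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(n, polygon, rng):
    """CSR inside the polygon: uniform points in the bounding box, rejected if outside."""
    xs = [vx for vx, _ in polygon]
    ys = [vy for _, vy in polygon]
    points = []
    while len(points) < n:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points


def mean_nnd(points):
    """Mean distance from each point to its nearest other point (brute force)."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xj - xi, yj - yi)
                     for j, (xj, yj) in enumerate(points) if j != i)
    return total / len(points)


def monte_carlo_nnd(observed_points, polygon, n_sims=99, seed=0):
    """Compare the observed mean NND with n_sims CSR patterns of the same size."""
    rng = random.Random(seed)
    observed = mean_nnd(observed_points)
    simulated = [mean_nnd(random_points_in_polygon(len(observed_points), polygon, rng))
                 for _ in range(n_sims)]
    # One-sided pseudo p-values; the observed pattern counts as one of n_sims + 1.
    p_clustered = (1 + sum(s <= observed for s in simulated)) / (n_sims + 1)
    p_dispersed = (1 + sum(s >= observed for s in simulated)) / (n_sims + 1)
    return observed, simulated, p_clustered, p_dispersed
```

Because the simulated patterns share the real boundary, edge and shape effects are built into the reference distribution instead of being approximated by a rectangle.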
[Figure: Auto theft – Within City]
[Figure: Auto theft – Downtown]
[Figure: Auto theft – Neighborhood]
Nearest Neighbor – ArcMap
[Table: columns Method, Area, Observed NND, Expected NND, Z Score, P-Value; rows Euclidean, Euclidean, Manhattan, Manhattan, Manhattan]
Distance Methods
G Function (Revised NND)
* Same measurements as NND
* Analyzed using a CDF, compared to the expected CDF
* The expected CDF can be theoretical or generated: $E(G(d)) = 1 - e^{-\lambda \pi d^2}$
* d statistic (maximum distance between the observed and expected CDFs)
* The d statistic can be tested with the Kolmogorov–Smirnov test
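A sketch of the G function: the empirical CDF of nearest neighbor distances compared with the theoretical CSR curve 1 − exp(−λπd²), where λ is the density n/area. Evaluating the gap only at a user-supplied list of distances is a simplification of this sketch, and it stops at the d statistic without computing a Kolmogorov–Smirnov p-value:

```python
import math


def nearest_neighbor_distances(points):
    """d_i for every event: distance to its closest other event (brute force)."""
    return [min(math.hypot(xj - xi, yj - yi)
                for j, (xj, yj) in enumerate(points) if j != i)
            for i, (xi, yi) in enumerate(points)]


def g_observed(nnd, d):
    """Empirical G(d): fraction of events whose nearest-neighbor distance is <= d."""
    return sum(1 for x in nnd if x <= d) / len(nnd)


def g_expected(d, density):
    """Theoretical CSR value E(G(d)) = 1 - exp(-lambda * pi * d^2)."""
    return 1.0 - math.exp(-density * math.pi * d * d)


def g_d_statistic(points, area, distances):
    """Largest absolute gap between observed and expected G over the given distances."""
    nnd = nearest_neighbor_distances(points)
    density = len(points) / area
    return max(abs(g_observed(nnd, d) - g_expected(d, density)) for d in distances)
```

For a clustered pattern, the observed G rises faster than the expected curve at short distances.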
G Function
[Figure from O’Sullivan and Unwin, Geographic Information Analysis; annotation: 1/12 = 0.083]
Distance Methods
F Function
* Similar to G, but measures the distance from a set of random points to the nearest event
* Also uses a CDF and the same expected distribution function as G
* Harder to interpret!
* I have never used it. I also do not like it!
Both the G and F functions have edge and area problems; it is better to use a generated expected distribution.
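A sketch of the F function, assuming a rectangular study area for the random sample locations; the function names and the choice of uniform sample points are illustrative:

```python
import math
import random


def f_observed(events, sample_points, d):
    """Empirical F(d): fraction of sample locations whose nearest event is within d."""
    hits = 0
    for sx, sy in sample_points:
        nearest = min(math.hypot(ex - sx, ey - sy) for ex, ey in events)
        if nearest <= d:
            hits += 1
    return hits / len(sample_points)


def random_sample_points(m, xmin, xmax, ymin, ymax, seed=0):
    """m uniformly random sample locations in a rectangular study area."""
    rng = random.Random(seed)
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(m)]


def f_expected(d, density):
    """Same CSR expectation as G: 1 - exp(-lambda * pi * d^2)."""
    return 1.0 - math.exp(-density * math.pi * d * d)
```

This illustrates why F is harder to read: in a clustered pattern, many sample locations fall in empty space, so F rises more slowly than expected, which is the opposite of how G behaves.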
G and F Functions
[Figure from O’Sullivan and Unwin, Geographic Information Analysis: clustered and evenly spaced patterns]
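Distance Methods
K Function (Ripley, 1976)
* The statistic is based on the number of points within a distance d of each observation:
$$K(d) = \frac{\sum_{i=1}^{n} \#\left[S \in C(s_i, d)\right]}{n\lambda}$$
where n = number of points, λ = density (n/area), C(s_i, d) = a circle of radius d centered at point s_i, and #[S ∈ C(s_i, d)] is the number of other points of the pattern S inside that circle.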
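A naive sketch of the K function, with no edge correction (an assumption; production tools usually apply one). Dividing the pair count by nλ with λ = n/area is the same as multiplying it by area/n², which is how the code computes it:

```python
import math


def k_function(points, area, d):
    """Naive Ripley's K: sum over i of #(other points within d of s_i) / (n * lambda).

    With lambda = n / area this equals area * count / n**2.
    """
    n = len(points)
    count = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xj - xi, yj - yi) <= d:
                count += 1
    return area * count / n ** 2


pts = [(0, 0), (1, 0), (5, 5), (5, 7)]  # hypothetical events in a 100-unit area
print([k_function(pts, 100.0, d) for d in (0.5, 1.0, 2.0)])
```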
Ripley’s K counts the number of points found within distance r of each point. The maximum r should be about half the shorter dimension of the study area (the extent of the input points). If K increases faster than expected, the points are clustered; if K increases more slowly than expected, the points are dispersed.
Distance Methods
Expected K(d):
$$E(K(d)) = \frac{\lambda \pi d^2}{\lambda} = \pi d^2$$
$$L(d) = \left(\frac{K(d)}{\pi}\right)^{1/2}, \qquad E(L(d)) = d$$
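The L transform turns the parabola πd² into a straight line, so departures from CSR are easier to see. A short sketch (function names are illustrative) confirming that E(L(d)) = d:

```python
import math


def l_from_k(k):
    """L(d) = sqrt(K(d) / pi)."""
    return math.sqrt(k / math.pi)


def expected_k(d):
    """Under CSR, E(K(d)) = pi * d^2."""
    return math.pi * d ** 2


for d in (10.0, 50.0, 100.0):
    print(d, l_from_k(expected_k(d)))  # L(d) equals d under CSR (up to rounding)
```

Plotting L(d) − d is a common variant: under CSR the curve hovers around zero, with positive values suggesting clustering and negative values suggesting dispersion.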
K Function
[Figure from O’Sullivan and Unwin, Geographic Information Analysis: L(d) for clustered and evenly spaced patterns]
There are a total of 32 points in this analysis. New Mexico is approximately 500 km per side, so we set the maximum study distance at 250 km. We choose 25 increments so that the observed L(d) and its confidence interval are calculated every 10 km. We use 99 permutations to create the confidence envelope, which tests the null hypothesis at approximately the α = 0.01 level.
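A sketch of the simulation envelope with the parameters described above. Treating New Mexico as a 500 km square and skipping edge correction are simplifying assumptions; the observed points are not included here, so `observed_points` in the usage note is a hypothetical name:

```python
import math
import random


def l_function(points, area, distances):
    """Naive L(d) = sqrt(K(d) / pi), K(d) = area * (#ordered pairs within d) / n**2."""
    n = len(points)
    pair_dists = [math.hypot(xj - xi, yj - yi)
                  for i, (xi, yi) in enumerate(points)
                  for j, (xj, yj) in enumerate(points) if i != j]
    values = []
    for d in distances:
        count = sum(1 for r in pair_dists if r <= d)
        values.append(math.sqrt(area * count / n ** 2 / math.pi))
    return values


def csr_envelope(n, side, distances, n_sims=99, seed=1):
    """Min/max L(d) over n_sims CSR patterns of n points in a side x side square."""
    rng = random.Random(seed)
    area = side * side
    sims = []
    for _ in range(n_sims):
        pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
        sims.append(l_function(pts, area, distances))
    lower = [min(s[k] for s in sims) for k in range(len(distances))]
    upper = [max(s[k] for s in sims) for k in range(len(distances))]
    return lower, upper


def classify(observed, lower, upper):
    """Label each distance band relative to the simulation envelope."""
    labels = []
    for obs, lo, hi in zip(observed, lower, upper):
        if obs < lo:
            labels.append("dispersed")
        elif obs > hi:
            labels.append("clustered")
        else:
            labels.append("within envelope")
    return labels


# 32 points, ~500 km square, 25 bands of 10 km up to 250 km, 99 simulations.
SIDE_KM = 500.0
DISTANCES_KM = [10.0 * k for k in range(1, 26)]
lower, upper = csr_envelope(32, SIDE_KM, DISTANCES_KM, n_sims=99)
```

With real data, `classify(l_function(observed_points, SIDE_KM ** 2, DISTANCES_KM), lower, upper)` would flag the bands where the observed curve leaves the envelope. Simulating inside the actual state boundary would remove the square-area assumption.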
Figure 2: Graph of K-Function Results
Figure 2 shows the K-function results. The observed L(d) is 0 at 10 km and 20 km because the closest pair of points is approximately 29 km apart. At 30 km, the observed L(d) falls within the generated confidence envelope. However, for distances between 40 km and 90 km, the observed L(d) lies outside the envelope, so we can reject the null hypothesis of CSR. Because the observed L(d) is below the minimum simulated L(d) (the lower edge of the envelope), the points have a statistically significant dispersed (regular) distribution.