Point Pattern Analysis


Point Pattern Analysis
Point patterns fall between two extremes: highly clustered and highly dispersed. Most tests of point patterns compare the observed pattern to complete spatial randomness (CSR). Two measurements are used to describe pattern:
* Density of points across the analysis area
* Distance between points within the analysis area

Distance Methods
Distance methods are becoming more common:
* They do not require rasterization
* They are easy to do with GIS
Example: Point 1 (10, 15), Point 2 (15, 20)
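As a quick illustration of the distance measurement behind these methods, the sketch below computes the straight-line distance between the two example points. Reading the slide's coordinates as (x, y) pairs is an assumption, and the variable names are illustrative only.

```python
import math

# Coordinates from the slide; treating them as (x, y) is an assumption.
point_1 = (10, 15)
point_2 = (15, 20)

# Euclidean distance: sqrt(dx^2 + dy^2) = sqrt(5^2 + 5^2).
euclidean = math.dist(point_1, point_2)
print(round(euclidean, 2))
```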

Issues with Length Measurement
Measurements in GIS are often made on horizontal projections of objects, so length and area may be substantially lower than on the true three-dimensional surface.

Be careful (gradient given as rise:run, hypotenuse per unit of run):
* 0.25:1, hypotenuse = 1.03
* […]:1, hypotenuse = […]
* […]:1, hypotenuse = […]
* […]:1, hypotenuse = […]
* 3:1, hypotenuse = 3.16
Not an issue if the gradient is uniform.
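The slope effect can be sketched with the Pythagorean theorem: for a rise:run gradient, the true surface length per unit of horizontal distance is the hypotenuse. The function name is hypothetical, and the 3:1 gradient is inferred from the surviving value 3.16 rather than stated on the slide.

```python
import math

def surface_length(rise, run=1.0):
    """True (sloped) length covered while moving `run` units horizontally."""
    return math.hypot(rise, run)

# 0.25:1 appears on the slide; 3:1 is inferred from the hypotenuse 3.16.
for rise in (0.25, 3.0):
    print(f"{rise}:1 -> hypotenuse = {surface_length(rise):.2f}")
```

The steeper the gradient, the more a horizontal (map) measurement understates the distance actually travelled on the surface.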

Manhattan Distance
Distance is computed between two points (cells) by moving only N-S or E-W.
Example: Cell 1 (15, 15), Cell 2 (10, 20), given as (row, column)
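A minimal sketch of the Manhattan calculation for the two example cells follows. The helper name is hypothetical, and the result is in cell units (multiply by the cell size to get map units).

```python
def manhattan(cell_a, cell_b):
    """Sum of the N-S (row) and E-W (column) offsets between two cells."""
    return abs(cell_a[0] - cell_b[0]) + abs(cell_a[1] - cell_b[1])

cell_1 = (15, 15)  # (row, column), from the slide
cell_2 = (10, 20)

steps = manhattan(cell_1, cell_2)
print(steps)  # 5 rows + 5 columns
```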

Distance Methods
Nearest-Neighbor Distance (NND)
* Basic statistics from the sample (mean, SD)
* Compare to the expected population mean and SD
* Z statistic, R statistic
* Assumes a normal distribution to compute expected values
* Global estimate of pattern

Nearest Neighbor Distance: R < 1 (clustered), R > 1 (dispersed), R = 1 (random)

Nearest Neighbor Analysis
Nearest neighbor analysis examines the distance between each point and the closest point to it, and then compares these distances to the values expected for a random sample of points from a CSR (complete spatial randomness) pattern. CSR rests on two assumptions: 1) all places are equally likely to be the recipient of a case (event), and 2) all cases are located independently of one another.
The mean nearest neighbor distance is
$$\bar{d} = \frac{1}{N}\sum_{i=1}^{N} d_i$$
where $N$ is the number of points and $d_i$ is the nearest neighbor distance for point $i$.

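The expected value of the nearest neighbor distance in a random pattern is
$$E(d_i) = 0.5\sqrt{\frac{A}{N}} + \left(0.0514 + \frac{0.041}{\sqrt{N}}\right)\frac{B}{N}$$
where $A$ is the area and $B$ is the length of the perimeter of the study area. The variance is
$$\mathrm{Var}(d) = 0.070\,\frac{A}{N^{2}} + 0.037\,B\sqrt{\frac{A}{N^{5}}}$$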

And the Z statistic is
$$Z = \frac{\bar{d} - E(d_i)}{\sqrt{\mathrm{Var}(d)}}$$
where $\bar{d}$ is the observed mean nearest neighbor distance.
This approach assumes:
* The study area is a regular rectangle or square. The equations for the expected mean and variance cannot be used for irregularly shaped study areas.
* Area (A) is calculated as (Xmax – Xmin) * (Ymax – Ymin), where these values represent the study area boundaries.
R statistic = observed mean d / expected d
R = 1: random; R → 0: clustered; R → 2+: uniform
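The nearest neighbor statistics can be sketched in a few lines of standard-library Python. This sketch assumes the edge-corrected expected value 0.5·sqrt(A/N) + (0.0514 + 0.041/sqrt(N))·B/N and variance 0.070·A/N² + 0.037·B·sqrt(A/N⁵). The function names and the lattice test pattern are hypothetical, and the brute-force neighbor search is only suitable for small point sets.

```python
import math

def nearest_neighbor_distances(points):
    """Distance from each point to its closest other point (brute force)."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def nnd_statistics(points, area, perimeter):
    """Observed mean NND, expected NND, Z and R for a rectangular study area."""
    n = len(points)
    mean_d = sum(nearest_neighbor_distances(points)) / n
    # Assumed edge-corrected expectation and variance (A = area, B = perimeter).
    expected = 0.5 * math.sqrt(area / n) + (0.0514 + 0.041 / math.sqrt(n)) * perimeter / n
    variance = 0.070 * area / n**2 + 0.037 * perimeter * math.sqrt(area / n**5)
    z = (mean_d - expected) / math.sqrt(variance)
    return {"mean_d": mean_d, "expected": expected, "z": z, "r": mean_d / expected}

# Hypothetical test pattern: a regular 5 x 5 lattice in a 5 x 5 study area.
lattice = [(x + 0.5, y + 0.5) for x in range(5) for y in range(5)]
stats = nnd_statistics(lattice, area=25.0, perimeter=20.0)
print(stats)
```

For the evenly spaced lattice, R comes out well above 1 and Z is positive, which is the signature of a dispersed pattern.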

Same area, different shape:
* 2 x 0.5: A = 1, B = 5, E(di) = […], Var(d) = 8.85 x […]
* 1 x 1: A = 1, B = 4, E(di) = […], Var(d) = 8.48 x […]
* […] x 2: E(di) = […]
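The effect of study-area shape can be reproduced with a short calculation. This sketch assumes the edge-corrected formulas E(d) = 0.5·sqrt(A/N) + (0.0514 + 0.041/sqrt(N))·B/N and Var(d) = 0.070·A/N² + 0.037·B·sqrt(A/N⁵). The value N = 100 is an assumption: the slide does not state N, but this value reproduces the two variance mantissas that survive on it (8.85 and 8.48).

```python
import math

def expected_nnd(n, area, perimeter):
    """Assumed edge-corrected expected nearest neighbor distance under CSR."""
    return 0.5 * math.sqrt(area / n) + (0.0514 + 0.041 / math.sqrt(n)) * perimeter / n

def variance_nnd(n, area, perimeter):
    """Assumed edge-corrected variance of the mean nearest neighbor distance."""
    return 0.070 * area / n**2 + 0.037 * perimeter * math.sqrt(area / n**5)

N = 100  # assumption, not stated on the slide

for label, area, perimeter in [("2 x 0.5", 1.0, 5.0), ("1 x 1", 1.0, 4.0)]:
    e = expected_nnd(N, area, perimeter)
    v = variance_nnd(N, area, perimeter)
    print(f"{label}: E(d) = {e:.5f}, Var(d) = {v:.2e}")
```

Both rectangles have the same area, so any difference in the expected value and variance comes from the longer perimeter of the elongated shape.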

Real world study areas are complex and violate the assumptions of most equations for expected values. Wilderness Campsites

Solution
Simulate randomization using Monte Carlo methods and compare the simulated distribution to the observed value.
* If possible, use the "true" area and perimeter to compute the expected value.
* Software that does not ask for the area/perimeter or a shapefile of the study area will assume a rectangle.
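A Monte Carlo comparison might look like the sketch below. It is a simplified illustration, not the procedure used by any particular software: random patterns are drawn uniformly in a rectangle, the function names are hypothetical, and the one-sided pseudo p-value tests for clustering only. For an irregular study area, the uniform draw would need to be replaced with sampling restricted to the true boundary.

```python
import math
import random

def mean_nnd(points):
    """Mean distance from each point to its nearest other point."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def monte_carlo_nnd(observed, width, height, n_sim=99, seed=0):
    """Compare the observed mean NND with n_sim random (CSR) patterns."""
    rng = random.Random(seed)
    n = len(observed)
    obs = mean_nnd(observed)
    sims = []
    for _ in range(n_sim):
        pattern = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        sims.append(mean_nnd(pattern))
    # Rank of the observed value among all n_sim + 1 values (small = clustered).
    rank = 1 + sum(s <= obs for s in sims)
    return obs, sims, rank / (n_sim + 1)

# Hypothetical clustered pattern: 20 points packed into a corner of a 10 x 10 area.
clustered = [(5 + 0.05 * i, 5 + 0.05 * j) for i in range(5) for j in range(4)]
observed_mean, simulated_means, p_clustered = monte_carlo_nnd(clustered, 10.0, 10.0)
print(observed_mean, min(simulated_means), p_clustered)
```

Because the expected distribution is generated inside the same study area as the data, edge effects are built into the comparison instead of being approximated by a formula.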

Autotheft – Within City

Autotheft - Downtown

Autotheft - Neighborhood

Nearest Neighbor - ArcMap
Table columns: Method, Area, Observed NND, Expected NND, Z Score, P-Value
Table rows (values not preserved): Euclidean, Euclidean, Manhattan, Manhattan, Manhattan

Distance Methods
G Function (Revised NND)
* Same measurements as NND
* Analyzed using a CDF and compared to the expected CDF
* Expected CDF can be theoretical or generated; under CSR, $E(G(d)) = 1 - e^{-\lambda \pi d^{2}}$, where $\lambda$ is the point density
* d statistic: the maximum distance between the observed and expected CDFs
* The d statistic can be tested with the Kolmogorov-Smirnov test
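The G function can be sketched as an empirical CDF of nearest neighbor distances, compared against the theoretical CSR curve 1 - exp(-λπd²). The function names and the lattice pattern are hypothetical. Note that the gap is evaluated only at the listed distances here, so it approximates the d statistic rather than finding its exact maximum.

```python
import math

def nearest_neighbor_distances(points):
    """Distance from each point to its closest other point (brute force)."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def g_function(points, distances):
    """Share of points whose nearest neighbor lies within each distance d."""
    nnd = nearest_neighbor_distances(points)
    return [sum(d_i <= d for d_i in nnd) / len(nnd) for d in distances]

def expected_g(intensity, distances):
    """Theoretical G(d) under CSR for point density `intensity`."""
    return [1 - math.exp(-intensity * math.pi * d**2) for d in distances]

def max_cdf_gap(observed, expected):
    """Largest absolute difference between the two CDFs at the sampled distances."""
    return max(abs(o - e) for o, e in zip(observed, expected))

# Hypothetical evenly spaced pattern: 5 x 5 lattice, unit spacing, area 25.
lattice = [(x + 0.5, y + 0.5) for x in range(5) for y in range(5)]
distances = [0.25, 0.5, 0.75, 1.0, 1.25]
g_obs = g_function(lattice, distances)
g_exp = expected_g(len(lattice) / 25.0, distances)
print(g_obs, max_cdf_gap(g_obs, g_exp))
```

For the evenly spaced lattice the observed G stays at 0 and then jumps to 1 at the lattice spacing, while the CSR curve rises gradually.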

G Function From O’Sullivan and Unwin Geographic Information Analysis 1/12 = 0.083

Distance Methods
F Function
* Similar to G, but measures the distance from a set of random points to the nearest event
* Also uses a CDF and the same expected distribution function as G
* Harder to interpret!
* I have never used it. I also do not like it!
Both the G and F functions have edge and area problems. It is better to use a generated expected distribution.
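For comparison with G, a minimal F function sketch follows. It assumes a rectangular study area with its origin at (0, 0); the function name, the number of random locations, and the lattice pattern are all hypothetical choices.

```python
import math
import random

def f_function(points, distances, width, height, n_random=200, seed=0):
    """Share of random locations whose nearest event lies within each distance d."""
    rng = random.Random(seed)
    locations = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_random)]
    nearest = [min(math.dist(loc, p) for p in points) for loc in locations]
    return [sum(d_i <= d for d_i in nearest) / n_random for d in distances]

# Hypothetical evenly spaced pattern: 5 x 5 lattice, unit spacing, in a 5 x 5 area.
lattice = [(x + 0.5, y + 0.5) for x in range(5) for y in range(5)]
f_obs = f_function(lattice, [0.25, 0.5, 0.75], width=5.0, height=5.0)
print(f_obs)
```

In an evenly spaced pattern no location is far from an event, so F rises quickly; in a clustered pattern large empty areas keep F low for longer. This is the reverse of how G behaves, which is part of why F is harder to interpret.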

G and F Functions Clustered Evenly Spaced From O’Sullivan and Unwin Geographic Information Analysis

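Distance Methods
K Function (Ripley, 1976)
* The statistic is based on the number of points within a distance d of each observation, summed over all observations:
$$K(d) = \frac{1}{\lambda n}\sum_{i=1}^{n} \#\left[S \in C(s_i, d)\right]$$
where n = number of points, λ = density (n/area), C(s_i, d) = a circle with radius d centered at point s_i, and #[S ∈ C(s_i, d)] is the number of other points in the pattern S that fall inside that circle.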

Ripley's K counts the number of points found within distance r of each point. The maximum distance r should be about ½ the short dimension of the extent of the input points. If K increases more quickly than expected, the points are clustered. If K increases more slowly than expected, the points are dispersed.

Distance Methods
Expected K(d) under CSR:
$$E(K(d)) = \frac{\lambda \pi d^{2}}{\lambda} = \pi d^{2}$$
$$L(d) = \sqrt{\frac{K(d)}{\pi}}, \qquad E(L(d)) = d$$
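A naive version of K(d) and L(d) can be sketched as below. This sketch applies no edge correction, which real software normally does, and the function names and lattice pattern are hypothetical.

```python
import math

def k_function(points, distances, area):
    """Naive Ripley's K: mean count of other points within d, divided by density."""
    n = len(points)
    intensity = n / area  # lambda = n / area
    k_values = []
    for d in distances:
        count = sum(
            1
            for i, p in enumerate(points)
            for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= d
        )
        k_values.append(count / (intensity * n))
    return k_values

def l_function(k_values):
    """L(d) = sqrt(K(d) / pi), which equals d under CSR."""
    return [math.sqrt(k / math.pi) for k in k_values]

# Hypothetical evenly spaced pattern: 5 x 5 lattice, unit spacing, area 25.
lattice = [(x + 0.5, y + 0.5) for x in range(5) for y in range(5)]
k_obs = k_function(lattice, [0.5, 1.0], area=25.0)
l_obs = l_function(k_obs)
print(k_obs, l_obs)
```

At d = 0.5 no pair of lattice points is close enough to be counted, so L(d) = 0, well below the CSR expectation of L(d) = d; that shortfall at short distances is what a dispersed pattern looks like.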

K Function Clustered Evenly Spaced From O’Sullivan and Unwin Geographic Information Analysis L(d)

There are a total of 32 points in this analysis. New Mexico is approximately 500 km per side, so we set our maximum study distance at 250 km. We choose 25 increments so that the observed L(d) and the confidence envelope are calculated every 10 km. 99 permutations are used to create the confidence envelope in order to test the null hypothesis at approximately the α = 0.01 level.
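The setup described here can be sketched as follows. The sketch treats the state as a 500 km square (a simplification of the true boundary), applies no edge correction, and uses hypothetical names throughout. With 99 random patterns plus the observed one, the smallest attainable pseudo p-value is 1/(99 + 1) = 0.01.

```python
import math
import random
from bisect import bisect_right

def l_values(points, distances, area):
    """Naive L(d) = sqrt(K(d) / pi) at each distance, without edge correction."""
    n = len(points)
    pair_d = sorted(
        math.dist(points[i], points[j]) for i in range(n) for j in range(i + 1, n)
    )
    out = []
    for d in distances:
        ordered_pairs = 2 * bisect_right(pair_d, d)  # each pair counted from both ends
        k = area * ordered_pairs / n**2              # same as count / (lambda * n)
        out.append(math.sqrt(k / math.pi))
    return out

SIDE_KM = 500.0   # approximate side length from the text
N_POINTS = 32
N_SIM = 99
distances = [10.0 * step for step in range(1, 26)]  # 10, 20, ..., 250 km

rng = random.Random(42)
simulated = [
    l_values(
        [(rng.uniform(0, SIDE_KM), rng.uniform(0, SIDE_KM)) for _ in range(N_POINTS)],
        distances,
        SIDE_KM**2,
    )
    for _ in range(N_SIM)
]
lower = [min(column) for column in zip(*simulated)]  # minimum L(d) per distance
upper = [max(column) for column in zip(*simulated)]  # maximum L(d) per distance
alpha = 1 / (N_SIM + 1)
print(alpha, lower[:3], upper[:3])
```

An observed L(d) below `lower` at some distance suggests dispersion at that scale, and one above `upper` suggests clustering.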

Figure 2: Graph of K-Function Results

The K-function results are graphed in Figure 2. The observed L(d) is 0 at 10 km and 20 km because the closest pair of points is approximately 29 km apart. At a distance of 30 km, the observed L(d) falls within the generated confidence envelope. For distances between 40 km and 90 km, however, the observed L(d) lies outside the envelope, so we can reject the null hypothesis of CSR. Because the observed L(d) is less than the minimum simulated L(d) over this range, the points show a statistically significant dispersed (regular) distribution.