Local Indicators of Categorical Data
Boots, B. (2003). Developing local measures of spatial association for categorical data. Journal of Geographical Systems, 5(2).



Why does space matter?
Tobler's first law: "Everything is related to everything else, but near things are more related than distant things." [1]
Spatial autocorrelation
  Observations are located in space or have a spatial component
  Where did someone get sick? Where do richer people live?
  A wide range of questions can be evaluated from a spatial perspective
  Similar properties are likely when distance (physical, but also social etc.) is small
Data often have distinct spatial characteristics
  Clustering vs. randomness vs. uniform distribution

Spatial Data Basics
Spatial data are stored together with attributes in two formats
  Raster data: area represented by equally sized squares
  Vector data: data represented as points, lines or polygons

Global and local measures
Both express the similarity of values in space
Global measures: a single value for the entire data set
  Moran's I (deviation from mean)
  Geary's C (actual values)
  Getis-Ord (identifies general clustering of high or low values)
  Join-count statistic (binary data)
Local measures: a value for each observation
  E.g. local Getis-Ord and local Moran's I

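Example: Global Moran's I
I = \frac{N}{\sum_i \sum_j W_{ij}} \cdot \frac{\sum_i \sum_j W_{ij} (X_i - \bar{X})(X_j - \bar{X})}{\sum_i (X_i - \bar{X})^2}
  N is the number of observations (points or polygons)
  \bar{X} is the mean of the variable
  X_i is the variable value at a particular location
  X_j is the variable value at another location
  W_{ij} is a weight indexing the location of i relative to j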
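
A minimal sketch of how the global Moran's I described by these symbols could be computed, assuming the standard definition (N divided by the sum of all weights, times the weighted cross-products of deviations from the mean, divided by the sum of squared deviations). The function name `morans_i`, the binary neighbour weights and the four-location data are hypothetical and only serve as illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an N x N weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]           # deviations X_i - mean
    w_sum = sum(sum(row) for row in weights)   # sum of all W_ij
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

# Four locations on a line; direct neighbours get weight 1 (hypothetical data).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, 0, 0], chain)    # similar values sit together
alternating = morans_i([1, 0, 1, 0], chain)  # neighbours always differ
```

In this toy case the clustered pattern gives a positive value and the alternating pattern a negative one, which matches the usual reading of the statistic.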

Measures of Local Spatial Association
Common uses
  Assessing the assumption of stationarity for a given study region
  Identifying pockets of distinctive data values (hot and cold spots)
  Identifying the scale (spatial extent) at which there is no discernible association of data values

Measures of Local Spatial Association
Examples
  Local Moran's I: measures similarity for each region
  Local Getis-Ord
The sum of the local values is proportional to the global test statistic
All common measures are for continuous (and ordinal) variables
  Developed in the context of regression to identify residuals
  Applying them would quantify categorical data, implying a measurable distance between classes
No established measures for the local spatial association of categorical data

Categorical Data
Join-count is widely applied as a global measure
  Mostly for binary data
  More classes are problematic and require large subregions to ensure sufficient counts
  Only for cells and polygons
  Counts links (joins) between cells
    Values assigned based on occurrence or non-occurrence
    A join is a border between two cells
  From now on, assume a raster dataset with black and white cells
New: local join-count statistic
Different from quantitative data; two base concepts:
  Composition: the aspatial characteristics of the different classes
    Global and local concentration
  Configuration: the characteristics of the spatial distribution of the classes
    Clustering

Categorical Data
Global composition: share of one class in the overall count
  15 cells black, 85 white; total: 100
  Share: 15% black
Local composition: if the global composition is known, the likelihood of finding x members of a class is given by the binomial distribution
  Pr(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, with n = m^2 cells in the subregion and p the global share of the class
  Evaluate significant presence or absence of cells with this formula for a specific m by m subregion
  Adjust for multiple testing; assumes no spatial dependence
  Significant if Pr(X = x) < 0.05
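
The local composition test can be sketched in a few lines, assuming the usual binomial probability mass function with n = m x m cells and the global share as success probability. The 15% share comes from the slide; the 3 x 3 subregion, the count of 5 black cells and the helper names are hypothetical.

```python
from math import comb

def binom_pmf(x, n, p):
    """Probability of exactly x black cells among n, given global share p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def upper_tail(x, n, p):
    """Probability of x or more black cells among n."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

p_black = 15 / 100   # global composition from the slide
n_cells = 3 * 3      # hypothetical 3 x 3 subregion
prob_exact = binom_pmf(5, n_cells, p_black)   # exactly 5 black cells
prob_tail = upper_tail(5, n_cells, p_black)   # 5 or more black cells
is_unusual = prob_exact < 0.05                # threshold used on the slide
```

The upper-tail version is shown next to the exact probability because a tail probability is the more common way to test for an unusually high local concentration; whether the original method uses the exact or the tail probability is not clear from the transcript.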

Join-Count Test Statistic
Test statistic: Z = (Observed - Expected) / SD of Expected
Expected = randomly generated
Expected values and SD of expected (formulas not preserved in the transcript) use:
  k = total number of joins
  p_b = expected proportion black (random or given)
  p_w = proportion white
  M is based on k
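
Since the analytical formulas are not preserved here, the following is only a sketch of the same Z statistic with the expected value and its standard deviation estimated by simulation. It assumes a raster of 1 (black) and 0 (white) cells, rook adjacency, and random permutations that keep the number of black cells fixed, which is not necessarily the sampling scheme of the original formulas. All function names and the 6 x 6 grid are hypothetical.

```python
import random
from statistics import mean, pstdev

def count_bb_joins(grid):
    """Count rook-adjacent black/black joins (1 = black, 0 = white)."""
    rows, cols = len(grid), len(grid[0])
    joins = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue
            if c + 1 < cols and grid[r][c + 1] == 1:   # join to the right
                joins += 1
            if r + 1 < rows and grid[r + 1][c] == 1:   # join below
                joins += 1
    return joins

def join_count_z(grid, n_sim=999, seed=0):
    """Return (observed, expected, z) with expected and SD from permutations."""
    rng = random.Random(seed)
    rows, cols = len(grid), len(grid[0])
    observed = count_bb_joins(grid)
    flat = [v for row in grid for v in row]
    simulated = []
    for _ in range(n_sim):
        rng.shuffle(flat)   # random rearrangement, same number of black cells
        shuffled = [flat[r * cols:(r + 1) * cols] for r in range(rows)]
        simulated.append(count_bb_joins(shuffled))
    expected = mean(simulated)
    sd = pstdev(simulated)
    z = (observed - expected) / sd if sd > 0 else float("nan")
    return observed, expected, z

# Hypothetical 6 x 6 raster with one 3 x 3 black block in the corner.
block = [[1 if r < 3 and c < 3 else 0 for c in range(6)] for r in range(6)]
observed, expected, z = join_count_z(block)
```

A strongly positive Z, as for this block pattern, points to more black/black joins than random placement would produce, i.e. clustering.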

Categorical Data
Global configuration
  Counts all possible links and the share of b/b, w/w and b/w links
  Rarely used
  A high share of b/b and w/w relative to b/w indicates clustering
  A high relative share of b/w indicates dispersal
Local configuration
  Depends on global and local composition: a conditional relationship
  Is the number of join counts different from that of a random distribution of black cells?

Categorical Data
Local configuration continued
  Using the global composition, derive the join counts expected under a random distribution
  Distribution of join counts
    Large datasets with counts for b/b, w/w and b/w larger than 30: normal approximation
    Smaller: total count or simulation of sample configurations
  Count all links in the subregion around each spatial unit
  Identify all cells whose b/b, w/w and b/w counts differ significantly from the global value, assuming randomness
Local composition and configuration can be combined as a tool for visualization
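
Counting the links in the subregion around a spatial unit might look like the sketch below. It assumes a raster of 1 (black) and 0 (white) cells, rook adjacency, and an m x m window that is simply truncated at the raster edge; the function name and the 5 x 5 grid are hypothetical.

```python
def local_join_counts(grid, row, col, m):
    """Count b/b, w/w and b/w rook joins inside the m x m window centred on (row, col)."""
    half = m // 2
    r0, r1 = max(0, row - half), min(len(grid), row + half + 1)
    c0, c1 = max(0, col - half), min(len(grid[0]), col + half + 1)
    counts = {"bb": 0, "ww": 0, "bw": 0}
    for r in range(r0, r1):
        for c in range(c0, c1):
            for dr, dc in ((0, 1), (1, 0)):     # right and lower neighbour
                rr, cc = r + dr, c + dc
                if rr >= r1 or cc >= c1:        # join would leave the window
                    continue
                pair = grid[r][c] + grid[rr][cc]
                key = "bb" if pair == 2 else "ww" if pair == 0 else "bw"
                counts[key] += 1
    return counts

grid = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
centre_counts = local_join_counts(grid, 1, 1, 3)   # full 3 x 3 window
corner_counts = local_join_counts(grid, 0, 0, 3)   # window cut off at the edge
```

Each cell's counts would then be compared with the counts expected from the global composition. Truncating the window at the border is one simple choice and leaves the edge cells with fewer joins than interior cells.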

An example
Perennial shrub Atriplex hymenelytra
Study area: Death Valley, CA
Black: presence of perennial shrub / White: absence
Global composition: 65/256 = 0.254
Insignificant global test, no spatial association
Local tests for matrices: 3x3, 5x5 and 7x7

Example
Significant deviations from global composition under the assumption of non-dependence

Example
Significant deviation from global configuration under the assumption of non-dependence

Example
Combination of both
Interpretation can be difficult
  "Hot clumps", "hot only" and "clump only" indicate areas specifically suitable for growth of the shrub
Explorative data analysis: the next question is what makes this area special

Problems
Assumption of global spatial non-dependence
  Problematic: truly random patterns are very unlikely
  With global spatial dependence the test is too liberal: many local hotspots are identified
Suggested method:
  1. Identify cells with significant local composition
  2. Compare their number and distribution with random simulations
  3. Identify cells with significant local configuration (clumps)
  4. Compute the probability of encountering black cells in clumps and outside of clumps
  5. Evaluate local composition using an additive binomial with all subregions
Useful?
  Step two enables evaluation via Monte Carlo simulation of whether numbers and distribution vary significantly
  We are still often interested in the hotspots as targets for intervention etc.

Potential Problems
Vector data are characterized by unequally sized polygons
  How to define areas? Steps to central polygon
  Potential bias towards large polygons with many boundaries
  Highly complex data are problematic
  What if a polygon has multiple borders with a second polygon?
Other methods also yield results
  Moran's I and Getis-Ord produce results with binary data
    Though conceptually inappropriate, they might provide hints, include global composition and standardize
  Scan statistic identifies hot spots but requires conversion to point data

Potential Problems
Edge effects
  What to do with missing values at the edge of the study area?
  Use of count data to estimate edge effects is highly problematic
Modifiable areal unit problem (MAUP)
  Testing across varying subregion sizes (steps)
  Clustering varies across geographic scales
Multiple testing
  Can be too conservative
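
To illustrate why a multiple-testing adjustment can be too conservative, here is a small sketch that assumes a Bonferroni correction; the slides do not name the adjustment actually used, so this choice and the function names are assumptions. The 256 tests correspond to one local test per cell of the 256-cell example raster.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the overall error rate at alpha."""
    return alpha / n_tests

def significant(p_values, alpha=0.05):
    """Flag the p-values that pass the Bonferroni-adjusted threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]

# One local test per cell of a 256-cell raster.
per_cell_alpha = bonferroni_threshold(0.05, 256)

# Hypothetical p-values from three local tests.
flags = significant([0.001, 0.04, 0.02])
```

With 256 tests the per-test level drops to roughly 0.0002, so only very extreme local counts remain significant.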

Conclusion
Join counts are a well-established measure of global spatial association for categorical data
Development of local spatial statistics for categorical data
  More accurate conceptual treatment of categorical data
  Can visualize clustering and concentration of categorical data
  Useful for explorative spatial data analysis
  But often limited to binary problems
Practical improvement?
  A local Moran's I may provide an indication
  It depends on the question asked
Assessing the impact of global measures
  Complicated and not fully developed
  Necessity depends on the question asked

Software to deal with spatial problems
GIS
  Spatial data tool
  Spatial properties (adjacency etc.) are inherent to datasets: worry free
  Tools can be created in Python; integrated tools for spatial statistics
  Push a button, but limited options in non-spatial statistics
R
  Flexible, with a large variety of available tools
  Data have to be preprocessed to allow spatial calculation (adjacency etc.)
  Can take some time
(Matlab)
(SAS)
  Seems to have a variety of procedures for point data analysis

Code in R
Introduction to spatial R:
Creating neighbors in spatial data:
  This can also be used to create all subregions
Global join count (spdep package):
  Perform this test on all subregions using the global values as expected values
Procedure to test for differences in local composition (has to be performed for all spatial units) (stats package): devel/library/stats/html/prop.test.html

References
Anselin, L. (1995). Local indicators of spatial association-LISA. Geographical Analysis, 27(2).
Boots, B. (2003). Developing local measures of spatial association for categorical data. Journal of Geographical Systems, 5(2).
Rogerson, P., & Yamada, I. (2008). Statistical detection and surveillance of geographic clusters. CRC Press.
Tobler, W. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46(2).