Published by Tobias Banks. Modified over 8 years ago.
1
Introduction to Spatial Statistics and Regression Models
Shuming Bao, China Data Center, University of Michigan
2
Topics
I. An Overview
II. An Overview of Spatial Statistics
III. Geostatistical Analysis
IV. An Overview of Spatial Econometric Models
V. Applications of Spatial Statistics and Models
3
I. An Overview
4
Why Spatial is Special?
- Why is spatial data different from non-spatial data? (spatial neighborhood)
- Statistical properties of spatial data:
  - Spatial dependence (autocorrelation)
  - Heterogeneity
  - Spatial trend (non-stationarity)
- Sensitive to spatial boundaries and spatial units (country, county, tract)
[Figure: elevation, major cities and lat/long grid]
5
Difference between Conventional Statistics and Spatial Statistics
             | Conventional statistics | Spatial statistics
Data         | Time-series data | Spatial data (cross-sectional)
Relationship | Time ($y_{t-1}, y_t, y_{t+1}$) | Topology ($y_{i-1}, y_i, y_{i+1}$)
Process      | $\{Z(t),\ t \in T\}$ | $\{Z(s;t),\ s \in D(t),\ t \in T\}$
Model        | Time-series autocorrelation, $t = 1, 2, 3, \ldots$ | Spatial autocorrelation, $Y = \rho W Y + \varepsilon$, with $w_{ij} = 1$ if $i$ is adjacent to $j$
6
Issues of Spatial Statistics
Tests on spatial patterns:
- Tests on spatial non-stationarity
- Tests on spatial autocorrelation
Data-driven approaches (Exploratory Spatial Data Analysis):
- Global statistics
- Local statistics
Model-driven approaches:
- Spatial linear and non-linear models
- Space-temporal models
7
Spatial Data
Types of Spatial Data:
- Geospatial data: polygons, points, lines, images/grids
- Socioeconomic data: county/province statistics, census data, social surveys
Spatial Data Sources:
- Geographic data (polygons, points and lines): Arc/Info data, shape files (*.shp, *.shx, and *.dbf), grid
- Image data (ERDAS Image, JPEG, TIFF, BMP and Arc/Info Image)
- Tabular data (dBASE, INFO and TEXT)
- SQL
- SDE (Spatial Data Engine)
8
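Defining Spatial Weight Matrices
Adjacency criterion:
$$w_{ij} = \begin{cases} 1 & \text{if location } j \text{ is adjacent to } i, \\ 0 & \text{if location } j \text{ is not adjacent to } i. \end{cases}$$
Distance criterion:
$$w_{ij}(d) = \begin{cases} 1 & \text{if location } j \text{ is within distance } d \text{ from } i, \\ 0 & \text{otherwise.} \end{cases}$$
A general spatial distance weight matrix:
$$w_{ij}(d) = d_{ij}^{-a}\,\beta_{ij}^{\,b}$$
where $d_{ij}$ is the distance between $i$ and $j$, $\beta_{ij}$ is the share of the boundary of $i$ that is common with $j$, and $a$ and $b$ are parameters.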
9
Defining Spatial Linkage (Weights)
Criteria (theoretical and empirical):
- Accessibility (roads, rivers, railways, airlines and Internet)
- Economic linkage (commuter flows, migrations, trade flows)
- Social linkage (college admission, language)
- Locational linkage (neighborhood, geographical distance)
Methodology:
- Binary matrix
- Row-standardized matrix
- Weight function ($w_{ij} = f(x, y, \ldots)$)
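The distance criterion and the row standardization listed on these slides can be sketched in a few lines of Python. This is only an illustration: the helper names `distance_weights` and `row_standardize`, the sample coordinates and the threshold `d = 1.0` are assumptions made for the example, not part of the original material.

```python
import math


def distance_weights(coords, d):
    """Binary distance-criterion weights: w[i][j] = 1 if j is within distance d of i."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # A location is never treated as its own neighbour.
            if i != j and math.dist(coords[i], coords[j]) <= d:
                w[i][j] = 1.0
    return w


def row_standardize(w):
    """Divide each row by its sum; rows without neighbours stay all zero."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out


# Hypothetical point locations (x, y).
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
w = distance_weights(coords, d=1.0)
w_std = row_standardize(w)
```

Row standardization makes each location's weights sum to one, so a spatial lag such as WY becomes a weighted average of neighbouring values.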
10
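Questions and Answers
What is the difference between spatial dependence and spatial heterogeneity?
Spatial dependence: the dependence among observations at different locations in space (at least two).
Spatial heterogeneity: observations at different locations in space follow different distributions. For example, the random observations obtained at different locations follow normal distributions with different means or variances. Example: comparing life expectancy, Wuhan differs from Shanghai, and Tibet differs from Xinjiang.
Non-stationarity: instability displayed over the surface of the spatial distribution.
Whether the process is non-stationary or stationary, the spatial distribution may still show differences (in mean and variance) and may therefore also exhibit spatial heterogeneity.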
11
II. An Overview of Spatial Statistics
12
2.1. Spatial Analysis & Statistical Analysis of Spatial Data
Spatial analysis recognizes location as a property and context of data, and ascribes explanatory power to location and spatial interaction effects.
Statistical analysis of spatial data recognizes location as a factor producing effects through spatial interaction, but the procedures developed control for these effects in the application of analytical techniques and the interpretation of results. It treats the geography of phenomena as an external influence rather than allowing spatial relationships to have explanatory power.
13
2.2 Tests for Spatial Patterns
To detect spatial patterns (spatial association and spatial autocorrelation), some standard global and new local spatial statistics have been developed. These include the Moran I and Geary C (see Cliff and Ord 1973, 1981), G statistics (Getis 1992), LISA (Anselin 1995) and GLISA (Bao and Henry 1996).
All of these spatial analytical techniques share two features. First, they start from the assumption of a randomized distribution of the spatial pattern. Second, the spatial pattern, spatial structure, or form of the spatial dependence is derived from the data only, without any preconceived theoretical notion.
14
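2.3 Test for Spatial Autocorrelation: Moran I
Moran I:
$$I = \frac{n}{S_0}\,\frac{\sum_i \sum_j w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}, \qquad S_0 = \sum_i \sum_j w_{ij}$$
The standardized Moran I:
$$Z(I) = \frac{I - E(I)}{\sqrt{\operatorname{Var}(I)}}$$
where $\operatorname{Var}(I)$ is computed either under the assumption of a normal distribution or under the assumption of a random distribution (randomization).
Moran I (Z value) is:
- positive: observations tend to be similar;
- negative: observations tend to be dissimilar;
- approximately zero: observations are arranged randomly over space.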
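As a rough illustration of the test, the sketch below computes the global Moran I and its standardized value under the normality assumption, using the Cliff and Ord moments $E(I) = -1/(n-1)$ and $\operatorname{Var}(I) = (n^2 S_1 - n S_2 + 3 S_0^2)/((n^2 - 1) S_0^2) - E(I)^2$. The function names, the chain of six locations and the data values are assumptions chosen for the example.

```python
import math


def chain_weights(n):
    """Binary adjacency weights for n locations arranged along a line."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1.0
    return w


def morans_i(x, w):
    """Return (I, Z) with Z standardized under the normality assumption."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in dev)
    i_stat = (n / s0) * num / den

    # S1 and S2 are the usual weight-matrix summaries used in Var(I).
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum((sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2 for i in range(n))
    e_i = -1.0 / (n - 1)
    var_i = (n * n * s1 - n * s2 + 3 * s0 * s0) / ((n * n - 1) * s0 * s0) - e_i ** 2
    return i_stat, (i_stat - e_i) / math.sqrt(var_i)


# Hypothetical data: a steadily increasing variable along the chain.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
i_stat, z_value = morans_i(x, chain_weights(len(x)))
```

Because neighbouring values are similar here, both I and its Z value come out positive, which matches the interpretation given on the slide.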
15
2.3 Test for Spatial Autocorrelation: Geary C
Geary C:
$$C = \frac{(n - 1)\,\sum_i \sum_j w_{ij}\,(x_i - x_j)^2}{2\,S_0\,\sum_i (x_i - \bar{x})^2}, \qquad S_0 = \sum_i \sum_j w_{ij}$$
- A large C value (>> 1) indicates that observations tend to be dissimilar.
- A small C value (<< 1) indicates that they tend to be similar.
The Geary statistic is always non-negative and asymptotically normal. The null hypothesis for the Geary test is that the mean of the Geary statistic is 1 if there is no spatial autocorrelation.
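A small sketch of the Geary C computation, assuming the standard Cliff and Ord form $C = (n-1)\sum_i\sum_j w_{ij}(x_i - x_j)^2 / (2 S_0 \sum_i (x_i - \bar{x})^2)$. The helper names and both data sets are made up for illustration.

```python
def chain_weights(n):
    """Binary adjacency weights for n locations arranged along a line."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1.0
    return w


def gearys_c(x, w):
    """Global Geary C: squared differences between neighbours, scaled by the variance."""
    n = len(x)
    mean = sum(x) / n
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((v - mean) ** 2 for v in x)
    return (n - 1) * num / (2.0 * s0 * den)


w = chain_weights(6)
c_smooth = gearys_c([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], w)       # neighbours are similar
c_alternating = gearys_c([1.0, 6.0, 1.0, 6.0, 1.0, 6.0], w)  # neighbours are dissimilar
```

The smooth series gives a value well below 1 and the alternating series a value above 1, in line with the reading of small and large C values above.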
16
2.4 Test for Local Spatial Association
Questions:
(1) Is the observed value at location i surrounded by a cluster of high or low values?
(2) Is the observed value at location i associated positively with the surrounding observations (similarity) or negatively with the surrounding observations (dissimilarity)?
Local Spatial Statistics:
(1) The G statistics (Ord and Getis 1992; Getis and Ord 1994)
(2) LISA (Anselin 1995)
17
2.4 Test for Local Spatial Association: LISA
Local Moran:
$$I_i = Z_i \sum_j w_{ij} Z_j$$
A pseudo-significance level for $I_i$ may be obtained by a "conditional" randomization or permutation approach (Anselin 1995). The observed value $Z_i$ at location i is held fixed and the remaining values are randomly permuted over all the other locations with equal probability. In actual computation, each resampled data set can be drawn from the population randomly without replacement. The p-value is the proportion of permuted data sets whose simulated $I_i$ is greater than (or less than) or equal to the actual $I_i$.
A small p-value (such as p < 0.05) indicates that location i is associated with relatively high values in the surrounding locations. A large p-value (such as p > 0.95) indicates that location i is associated with relatively low values in the surrounding locations.
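The conditional permutation procedure described above can be sketched as follows, taking the local Moran as $I_i = Z_i \sum_j w_{ij} Z_j$ (Anselin 1995) and counting simulated values greater than or equal to the observed one. The function names, the population standard deviation used for standardizing, the 999 permutations, the seed and the data are all assumptions for the example.

```python
import random


def chain_weights_row_std(n):
    """Row-standardized adjacency weights for n locations along a line."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1.0
    return [[v / sum(row) for v in row] for row in w]


def standardize(x):
    """Z-scores using the population standard deviation (an assumption)."""
    n = len(x)
    mean = sum(x) / n
    sd = (sum((v - mean) ** 2 for v in x) / n) ** 0.5
    return [(v - mean) / sd for v in x]


def local_moran(z, w, i):
    return z[i] * sum(w[i][j] * z[j] for j in range(len(z)) if j != i)


def local_moran_pvalue(x, w, i, permutations=999, seed=0):
    """Hold z[i] fixed, permute the other values, and count simulated I_i >= observed."""
    rng = random.Random(seed)
    z = standardize(x)
    observed = local_moran(z, w, i)
    others = [z[j] for j in range(len(z)) if j != i]
    count = 0
    for _ in range(permutations):
        shuffled = others[:]
        rng.shuffle(shuffled)  # resampling without replacement
        trial = shuffled[:i] + [z[i]] + shuffled[i:]
        if local_moran(trial, w, i) >= observed:
            count += 1
    return observed, count / permutations


# Hypothetical data: a cluster of high values at the start of the chain.
x = [10.0, 9.0, 8.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0]
w = chain_weights_row_std(len(x))
observed_i, p_value = local_moran_pvalue(x, w, i=1)
```

Location 1 sits between the two largest remaining values, so very few permutations reproduce an $I_i$ as large as the observed one and the p-value is small.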
18
2.4 Test for Local Spatial Association: LISA
Local Geary:
$$C_i = \sum_j w_{ij}\,(Z_i - Z_j)^2$$
A large p-value (such as p > 0.95) indicates an extremely small $C_i$, which suggests a positive spatial association (similarity) of observation i with its surrounding observations.
A small p-value (such as p < 0.05) indicates an extremely large $C_i$, which suggests a negative spatial association (dissimilarity) of observation i with its surrounding observations.
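For comparison with the local Moran, here is a short sketch of the local Geary values, assuming the form $C_i = \sum_j w_{ij}(Z_i - Z_j)^2$ from Anselin (1995) with row-standardized weights. Only the statistic is computed; p-values would need the same kind of permutation as for the local Moran. Names and data are illustrative assumptions.

```python
def chain_weights_row_std(n):
    """Row-standardized adjacency weights for n locations along a line."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1.0
    return [[v / sum(row) for v in row] for row in w]


def local_geary(x, w):
    """Local Geary C_i for every location, on standardized values."""
    n = len(x)
    mean = sum(x) / n
    sd = (sum((v - mean) ** 2 for v in x) / n) ** 0.5
    z = [(v - mean) / sd for v in x]
    return [sum(w[i][j] * (z[i] - z[j]) ** 2 for j in range(n)) for i in range(n)]


# Hypothetical data with one spike in the middle of an otherwise smooth series.
x = [1.0, 2.0, 3.0, 10.0, 3.0, 2.0, 1.0]
c_local = local_geary(x, chain_weights_row_std(len(x)))
most_dissimilar = c_local.index(max(c_local))
```

The spike produces the largest $C_i$, which is the dissimilarity signal described on the slide.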
19
2.5 Test for Local Spatial Association: GLISA
Generalized Forms for Local Moran (Bao and Henry 1996)
where $\{Z_i\}$ is a series of standardized observations and $\{d_{ij}\}$ is a row-standardized spatial weight matrix (where $p_j = P(X = x_j \mid X \neq x_i)$).
20
The generalized local Geary.
21
III. Geostatistical Analysis
22
Spatial Trend: Stationarity & Non-Stationarity
23
Simple Linear Surface Model
$$Y = aX_1 + bX_2$$
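One way to estimate the two coefficients of this surface is ordinary least squares through the 2 x 2 normal equations. The sketch below assumes that X1 and X2 are the two spatial coordinates of each observation, which the slide does not state, and the function name and data are invented for the example.

```python
def fit_linear_surface(x1, x2, y):
    """Least-squares estimates of a and b in Y = a*X1 + b*X2 (no intercept)."""
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * t for u, t in zip(x1, y))
    s2y = sum(v * t for v, t in zip(x2, y))
    det = s11 * s22 - s12 * s12  # zero if X1 and X2 are collinear
    a = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    return a, b


# Hypothetical grid of locations with values generated as 2*X1 + 3*X2.
x1 = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
x2 = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
y = [2.0 * u + 3.0 * v for u, v in zip(x1, x2)]
a_hat, b_hat = fit_linear_surface(x1, x2, y)
```

Residuals from such a fitted surface are what one would then examine for remaining spatial structure.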
24
Linear & Non-Linear Models
25
Identifying Spatial Trend
Theoretical Variogram:
[Figure: theoretical variogram, gamma plotted against distance h, with the nugget, sill and range marked]
26
Identifying Spatial Trend
Experimental Variogram:
$$\hat{\gamma}(h_k) = \frac{1}{2\,|N(h_k)|} \sum_{(i,j) \in N(h_k)} \left[ z(x_i) - z(x_j) \right]^2$$
where $N(h_k) = \{(i, j) : x_i - x_j = h_k\}$ and $|N(h_k)|$ is the number of distinct elements of $N(h_k)$.
[Figure: experimental variogram, gamma plotted against distance h, with the nugget, sill and range marked]
- Nugget: represents micro-scale variation or measurement error. It is estimated by $\gamma(0)$.
- Sill: represents the variance of the random field, $\lim_{h \to \infty} \gamma(h)$.
- Range: the distance at which data are no longer autocorrelated.
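A minimal sketch of the experimental variogram follows. It departs from the exact definition of N(h) in one respect that should be read as an assumption: pairs are grouped by distance only (isotropic case), in bins centred on multiples of a chosen `lag_width`. The function name, the bin rule and the transect data are illustrative.

```python
import math


def experimental_variogram(coords, values, lag_width, n_lags):
    """Return [(lag distance, gamma, number of pairs)] for lags 1..n_lags times lag_width."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])
            # Bin k collects distances within half a lag width of (k + 1) * lag_width.
            k = int(h / lag_width + 0.5) - 1
            if 0 <= k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    result = []
    for k in range(n_lags):
        gamma = sums[k] / (2.0 * counts[k]) if counts[k] else None
        result.append(((k + 1) * lag_width, gamma, counts[k]))
    return result


# Hypothetical transect: six equally spaced points with a linear trend in the values.
coords = [(float(i), 0.0) for i in range(6)]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
variogram = experimental_variogram(coords, values, lag_width=1.0, n_lags=3)
gammas = [g for _, g, _ in variogram]
```

With a pure linear trend the estimated values keep growing with distance instead of levelling off at a sill, which is the kind of behaviour a variogram is used to reveal when identifying spatial trend.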
27
Nugget – Range - Sill
29
Robust Variogram
30
Isotropic and Anisotropic Models
Isotropic model: the variogram depends only on the distance between the points.
Anisotropic models: the variogram also depends on the direction.
(1) Geometric anisotropy: the range of the variogram changes in different directions while the sill remains constant.
(2) Zonal anisotropy: the sill of the variogram changes with direction while the range remains constant.
31
Stationary Models
Stationary model: the variogram converges to the sill at large distances.
- Gaussian model
- Exponential model
- Spherical model
- Hole-effect model
- Nugget-effect model
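Three of the stationary models named above can be written out as follows. The parameterization is one common textbook choice (nugget, partial sill `psill`, range parameter `a`) and is an assumption here; other sources scale the exponential and Gaussian forms differently, for example with a factor of 3 in the exponent.

```python
import math


def spherical(h, nugget, psill, a):
    """Spherical model: reaches the sill (nugget + psill) exactly at h = a."""
    if h == 0:
        return 0.0
    if h >= a:
        return nugget + psill
    r = h / a
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, psill, a):
    """Exponential model: approaches the sill asymptotically."""
    if h == 0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-h / a))


def gaussian(h, nugget, psill, a):
    """Gaussian model: parabolic near the origin, approaches the sill asymptotically."""
    if h == 0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-((h / a) ** 2)))


# Hypothetical parameters: nugget 0.1, partial sill 0.9, range parameter 10.
distances = [0.0, 2.5, 5.0, 10.0, 50.0]
curve_sph = [spherical(h, 0.1, 0.9, 10.0) for h in distances]
curve_exp = [exponential(h, 0.1, 0.9, 10.0) for h in distances]
curve_gau = [gaussian(h, 0.1, 0.9, 10.0) for h in distances]
```

All three curves start at zero, jump to the nugget just beyond h = 0, and level off at the sill, which is what distinguishes them from the non-stationary models.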
32
Non-Stationary Models
Non-stationary model: the variogram tends to infinity as h tends to infinity.
- Power model
- Logarithmic model
- Linear model
33
Variogram Fitting
- Prior information
- Experimental variogram
- Diagnostic of residuals
- Model validation
34
Interpolation (Kriging)
- Ordinary kriging
- Universal kriging
- Block kriging
- Point kriging
35
Variogram & LISA and Moran I Identify effective range of spatial weights: Spatial range
36
IV. An Overview of Spatial Econometric Models
37
Spatial Regression
- Spatially autoregressive model
- Spatial moving average model
- Semi-parametric model
38
Simple Spatial Autoregressive Model
$$Y = \rho W Y + \varepsilon$$
where Y is an observed variable over space D: $\{Y(s_i) : s_i \in D,\ i = 1, \ldots, n\}$, W is an n x n spatial weight matrix, $\rho$ is the spatial autoregressive parameter, and $\varepsilon \sim N(0, \sigma^2)$.
OLS estimates are biased and inconsistent.
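A short derivation may help to see why OLS fails in this model. It is a sketch under the assumption of independent errors with common variance, $\varepsilon \sim N(0, \sigma^2 I)$, and follows the usual textbook argument rather than any formula shown on the slide.

```latex
\begin{aligned}
Y &= \rho W Y + \varepsilon
  \;\Longrightarrow\; Y = (I - \rho W)^{-1}\varepsilon, \\
\hat{\rho}_{OLS} &= (Y'W'WY)^{-1}\,Y'W'Y
  = \rho + (Y'W'WY)^{-1}\,Y'W'\varepsilon, \\
E\!\left[(WY)'\varepsilon\right]
  &= E\!\left[\varepsilon'\,\big((I - \rho W)^{-1}\big)'\,W'\,\varepsilon\right]
  = \sigma^2 \operatorname{tr}\!\left[W (I - \rho W)^{-1}\right].
\end{aligned}
```

Unlike a time-series lag, the spatial lag WY depends on the contemporaneous errors, so the last expression is generally non-zero when $\rho \neq 0$ and the second term in the OLS estimate does not vanish. This is the reason maximum likelihood or instrumental-variable estimators are normally used instead.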
39
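General Form of Spatial Model
$$Y = \rho W_1 Y + X\beta + u, \qquad u = \lambda W_2 u + \varepsilon$$
where $W_1$ and $W_2$ are spatial weight matrices, and $\varepsilon \sim N(0, \Omega)$.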
40
Test for Spatial Dependence
Test for spatial error dependence:
- Moran I test: $I = e'We / e'e \sim N(\mu_I, \sigma_I^2)$, where e is a vector of OLS residuals and W is a spatial weight matrix.
- Lagrange Multiplier test
Test for substantive spatial dependence:
- Lagrange Multiplier test
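The two tests for spatial error dependence can be illustrated with a few lines of Python. The Moran statistic follows the expression $I = e'We / e'e$, which presumes a row-standardized W. For the Lagrange Multiplier test the sketch assumes the standard form $LM = (e'We / \hat{\sigma}^2)^2 / \operatorname{tr}(W'W + W^2)$ with $\hat{\sigma}^2 = e'e/n$, compared against a chi-squared distribution with one degree of freedom; this form is an assumption, as are the function names and the residual vector.

```python
def chain_weights_row_std(n):
    """Row-standardized adjacency weights for n locations along a line."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1.0
    return [[v / sum(row) for v in row] for row in w]


def residual_moran_and_lm(e, w):
    """Moran I of residuals and the LM statistic for spatial error dependence."""
    n = len(e)
    we = [sum(w[i][j] * e[j] for j in range(n)) for i in range(n)]
    e_we = sum(e[i] * we[i] for i in range(n))
    e_e = sum(v * v for v in e)
    moran = e_we / e_e
    # tr(W'W + W W) written element by element.
    trace_term = sum(w[i][j] ** 2 + w[i][j] * w[j][i] for i in range(n) for j in range(n))
    sigma2 = e_e / n
    lm_error = (e_we / sigma2) ** 2 / trace_term
    return moran, lm_error


# Hypothetical OLS residuals: positive on one side of the chain, negative on the other.
residuals = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
moran_e, lm_error = residual_moran_and_lm(residuals, chain_weights_row_std(6))
```

Residuals of the same sign sit next to each other here, so the Moran statistic is clearly positive. With only six observations the LM value stays below the 5% chi-squared critical value of 3.84, a reminder that these tests rely on large-sample approximations.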
41
Tests for Heterogeneity in the Presence of Spatial Dependence
Chow test of H0 against H1, where the spatial parameter is replaced by its ML estimate, $e_R$ ($e_U$) are the residuals for a restricted (unrestricted) regression, and $\hat{\sigma}^2$ is the estimate for the error variance for either the restricted model, the unrestricted model or both.
42
Issues on Spatial Data Analysis
- Heterogeneity in spatial data
- Boundary effect
- Scale effect
- Missing data
- Spatial weights
43
Research Topics: Theory and Methodology
- Spatial equilibrium theories and disequilibrium theories
- Exploratory spatial data analysis
- Spatial sampling
- Geostatistics, such as multivariate variograms and kriging
- Local spatial statistics
- Spatial cluster analysis
- Spatial data visualization: dynamic visualization, statistical graphics, multi-dimensional data visualization
44
Spatial Modeling
- Specification and tests of spatial models
- Space-time models
- Logistic model for categorical data
- Spatially weighted regression model
- Non-linear spatial models
45
Tools for Spatial Analysis
Desktop Software:
- SpaceStat for ArcView
- SPLUS for ArcView
- GeoDa
- The SAS® Enterprise Intelligence
- ….
Web Based Applications:
- DemographicsNow.com
- ….
46
Spatial Explorer for Statistics and Census Data
47
V. Applications
48
Local Pattern Test : Counties of the Local Gearys for 1980/1990 Population Growth in South Carolina
49
Spatial Range Test : Census Tracts of the Local Morans for 1980/1990 Population Growth in Columbia FEA
50
Learning from Spatial Data
- Is there any spatial cluster over space?
- Are spatial observations distributed randomly over space?
- Are spatial observations correlated (autocorrelation)?
- Is there any spatial outlier?
- Is there any spatial trend?
- What is the interaction (statistically and theoretically) between different variables?
- How do the spatial patterns change?
- How can an unknown spatial value be predicted at a specific location?
51
Spatial Learning – Understand China and Global Changes
- Understanding the changes in our earth system environment
- Understanding the impacts of human activities on the earth system
- Improving public knowledge of the earth system and environmental crises
52
Related Publications
Bao, Shuming, Orn Bodvarsson, Jack Hou, and Yaohui Zhao. 2009. Migration in China from 1985 to 2000. The Chinese Economy, Vol. 42 (4).
Shi, Anqing, and Shuming Bao. 2007. Migration, Education and Rural Development: Evidence from China 2000 Population Census Data. Journal of Chinese Economic and Business Studies, Vol. 5 (2): 163-177.
Bao, Shuming, Anqing Shi, and Jack W. Hou. 2006. Migration and Regional Development in China. In Shuming Bao, Shuanglin Lin, Changwen Zhao (Eds.), Chinese Economy after WTO Accession. Ashgate.
Bao, Shuming, Anqing Shi, and Jack W. Hou. 2005. An Analysis of the Spatial Changing Patterns of Migration in China. China Population Science, 2005 (5).
Bao, Shuming, Mark Henry, and David Barkley. 2004. Identifying Urban-Rural Linkages: Tests for Spatial Effects in the Carlino-Mills Model. In Luc Anselin, Raymond Florax and Sergio J. Rey (Eds.), Advances in Spatial Econometrics. Springer-Verlag New York, LLC.
Bao, Shuming, Gene Chang, Jeffrey D. Sachs, and Wing Thye Woo. 2002. Geographic Factors and China's Regional Development Under Market Reforms, 1978–98. China Economic Review, Vol. 13: 89-111.
Bao, Shuming, Luc Anselin, Doug Martin, and Diana Stralberg. 2000. A Seamless Integration of Spatial Statistics and GIS: the S-PLUS for ArcView and the S+GRASSLAND Link. Geographical Systems, Vol. 2: 287-306.
Bao, Shuming, and Luc Anselin. 1997. Linking Spatial Statistics with GIS: Operational Issues in SpaceStat/ArcView Interface and S+Grassland Link. In Proceedings of 1997 American Statistical Association Meeting, 61-66.
Bao, Shuming, and Mark S. Henry. 1996. Heterogeneity Issues in Local Measurements of Spatial Association. Geographical Systems, Vol. 3: 1-13.
Bao, Shuming, Mark S. Henry, David L. Barkley, and Kerry Brooks. 1995. RAS: An Integrated Regional Analysis System with ARC/INFO. Computers, Environment, and Urban Systems, 19 (1): 37-56.