Shuming Bao China Data Center University of Michigan Introduction to Spatial Statistics and Regression Models.


1 Shuming Bao China Data Center University of Michigan Introduction to Spatial Statistics and Regression Models

2 Topics I. An Overview II. An Overview of Spatial Statistics III. Geostatistical Analysis IV. An Overview of Spatial Econometric Models V. Applications of Spatial Statistics and Models

3 I. An Overview

4 Why Spatial is Special?
- Why is spatial data different from non-spatial data? (spatial neighborhood)
- Statistical properties of spatial data:
  - Spatial dependence (autocorrelation)
  - Heterogeneity
  - Spatial trend (non-stationarity)
- Sensitive to spatial boundaries and spatial units (country, county, tract)
[Figure: elevation, major cities and lat/long grid]

5 Difference between Conventional Statistics and Spatial Statistics
                  Conventional statistics             Spatial statistics
Data:             Time-series data                    Spatial data (cross-sectional)
Relationship:     Time ($y_{t-1}, y_t, y_{t+1}$)      Topology ($y_{i-1}, y_i, y_{i+1}$)
Process:          $\{Z(t),\ t \in T\}$                $\{Z(s;t),\ s \in D(t),\ t \in T\}$
Model:            $t = 1, 2, 3, \ldots$               $Y = \rho W Y + \varepsilon$, with $w_{ij} = 1$ if $i$ is adjacent to $j$
Autocorrelation:  time-series autocorrelation         $\rho$ - spatial autocorrelation

6 Issues of Spatial Statistics
Tests on spatial patterns:
- Tests on spatial non-stationarity
- Tests on spatial autocorrelation
Data-driven approaches (Exploratory Spatial Data Analysis):
- Global statistics
- Local statistics
Model-driven approaches:
- Spatial linear and non-linear models
- Space-time models

7 Spatial Data
Types of spatial data:
- Geospatial data: polygons, points, lines, images/grid
- Socioeconomic data: county/province statistics, census data, social surveys
Spatial data sources:
- Geographic data (polygons, points and lines): Arc/Info data, shape files (*.shp, *.shx, and *.dbf)
- Grid
- Image data (ERDAS Image, JPEG, TIFF, BMP and Arc/Info Image)
- Tabular data (dBASE, INFO and TEXT)
- SQL
- SDE (Spatial Data Engine)

8 Defining Spatial Weight Matrices
Adjacency criterion: $w_{ij} = 1$ if location $j$ is adjacent to $i$, and $w_{ij} = 0$ if location $j$ is not adjacent to $i$.
Distance criterion: $w_{ij}(d) = 1$ if location $j$ is within distance $d$ from $i$, and $w_{ij}(d) = 0$ otherwise.
A general spatial distance weight matrix: $w_{ij}(d) = d_{ij}^{-a}\,\beta^{b}$
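The distance criterion and the distance-decay part of the general form can be sketched in a few lines of Python. This is a minimal illustration rather than code from the presentation: the function names `distance_weights` and `inverse_distance_weights` and the sample coordinates are hypothetical, and the general form is shown with b = 0 so that only the d_ij^(-a) term remains. It also assumes no two locations coincide, since a zero distance cannot be raised to a negative power.

```python
import math

def distance_weights(coords, d):
    """Binary distance criterion: w[i][j] = 1 if j lies within distance d of i."""
    n = len(coords)
    return [
        [1.0 if i != j and math.dist(coords[i], coords[j]) <= d else 0.0
         for j in range(n)]
        for i in range(n)
    ]

def inverse_distance_weights(coords, a=1.0):
    """Distance-decay weights: w[i][j] = d_ij ** (-a), with zeros on the diagonal."""
    n = len(coords)
    return [
        [math.dist(coords[i], coords[j]) ** (-a) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]

# Hypothetical sample: three locations on a line.
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
W_binary = distance_weights(coords, d=1.5)
W_decay = inverse_distance_weights(coords, a=2.0)
```

With a threshold of 1.5, only the first two locations are neighbors, and the third location has no neighbors at all. That is one reason the choice of d matters in practice.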

9 Defining Spatial Linkage (Weights)
Criteria (theoretical and empirical):
- Accessibility (roads, rivers, railways, airlines and Internet)
- Economic linkage (commuter flows, migrations, trade flows)
- Social linkage (college admission, language)
- Locational linkage (neighborhood, geographical distance)
Methodology:
- Binary matrix
- Row-standardized matrix
- Weight function ($w_{ij} = f(x, y, \ldots)$)
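Row standardization, listed under methodology, divides each row of the weight matrix by its row sum so that the weights of every location's neighbors add up to 1. A minimal sketch follows. The function name `row_standardize` is hypothetical, and leaving an all-zero row (a location with no neighbors) unchanged is an assumption made here, not something the slides specify.

```python
def row_standardize(weights):
    """Divide each row by its sum; rows that sum to zero are left unchanged."""
    standardized = []
    for row in weights:
        total = sum(row)
        if total > 0:
            standardized.append([w / total for w in row])
        else:
            standardized.append(list(row))
    return standardized

# Hypothetical binary matrix: location 0 has two neighbors, location 2 has none.
W = [[0.0, 1.0, 1.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
W_std = row_standardize(W)
```

After standardization, the product of the matrix with a vector of observations gives, for each location, the average of its neighbors' values (the spatial lag).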

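10 Q&A: What is the difference between spatial dependence and spatial heterogeneity?
Spatial dependence: the dependence relationship among spatial observation points (at least two).
Spatial heterogeneity: observations at different points in space follow different distributions. For example, the random observations obtained at different observation points may follow normal distributions with different means or variances. Example: comparing life expectancy, Wuhan differs from Shanghai, and Tibet differs from Xinjiang.
Non-stationarity: instability exhibited across the spatial distribution surface. Whether a process is non-stationary or stationary, it may still differ across space (in its mean and variance), and so it can also exhibit spatial heterogeneity.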

11 II. An Overview of Spatial Statistics

12 2.1. Spatial Analysis & Statistical Analysis of Spatial Data
Spatial analysis recognizes location as a property and context of data, and ascribes explanatory power to location and spatial interaction effects.
Statistical analysis of spatial data recognizes location as a factor producing effects through spatial interaction, but the procedures developed control for these effects in the application of analytical techniques and the interpretation of results. It treats the geography of phenomena as an external influence rather than allowing spatial relationships to have explanatory power.

13 2.2 Tests for Spatial Patterns
To detect spatial patterns (spatial association and spatial autocorrelation), some standard global and new local spatial statistics have been developed. These include the Moran I and Geary C (see Cliff and Ord 1973, 1981), G statistics (Getis 1992), LISA (Anselin 1995) and GLISA (Bao and Henry 1996). All of these spatial analytical techniques have two aspects in common. First, they start from the assumption of a randomized distribution of the spatial pattern. Second, the spatial pattern, spatial structure, or form of the spatial dependence is derived from the data only, without any preconceived theoretical notion.

14 2.3 Test for Spatial Autocorrelation: Moran I
The variance of Moran I can be derived under the assumption of a normal distribution or under the assumption of a random distribution (randomization).
The standardized Moran I: $Z(I) = (I - E[I]) / \sqrt{\mathrm{Var}(I)}$
If Moran I (Z value) is
- positive: observations tend to be similar;
- negative: observations tend to be dissimilar;
- approximately zero: observations are arranged randomly over space.
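As a rough illustration of the test, the sketch below computes Moran I and its standardized value under the normality assumption. The formulas in the docstrings are the textbook Cliff and Ord forms, which are assumed here because the slide's own equations did not survive in the transcript. The function names and the four-location sample are hypothetical.

```python
import math

def morans_i(x, w):
    """I = (n / S0) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2."""
    n = len(x)
    m = sum(x) / n
    z = [v - m for v in x]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return (n / s0) * num / den

def morans_i_z_normal(x, w):
    """Standardized I using E[I] = -1/(n-1) and the variance under normality."""
    n = len(x)
    s0 = sum(sum(row) for row in w)
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2 for i in range(n) for j in range(n))
    s2 = sum((sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2 for i in range(n))
    e_i = -1.0 / (n - 1)
    var_i = (n * n * s1 - n * s2 + 3 * s0 * s0) / ((n * n - 1) * s0 * s0) - e_i ** 2
    return (morans_i(x, w) - e_i) / math.sqrt(var_i)

# Hypothetical sample: four locations on a line, neighbors share an edge.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
x = [1.0, 2.0, 3.0, 4.0]
I = morans_i(x, W)
Z = morans_i_z_normal(x, W)
```

The sample values rise steadily along the line, so neighboring values are similar and both I and Z come out positive.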

15 2.3 Test for Spatial Autocorrelation: Geary C
Geary C: a large C value (>> 1) indicates that observations tend to be dissimilar; a small C value (<< 1) indicates that they tend to be similar.
The Geary statistic is always positive and asymptotically normal. Under the null hypothesis of no spatial autocorrelation, the mean of the Geary statistic is 1.
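A minimal sketch of the Geary statistic, using the textbook Cliff and Ord definition (assumed here, since the slide's formula is missing from the transcript). The function name `gearys_c` and the sample data are hypothetical.

```python
def gearys_c(x, w):
    """C = (n - 1) / (2 * S0) * sum_ij w_ij (x_i - x_j)^2 / sum_i (x_i - m)^2."""
    n = len(x)
    m = sum(x) / n
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((v - m) ** 2 for v in x)
    return (n - 1) * num / (2.0 * s0 * den)

# Hypothetical sample: four locations on a line, neighbors share an edge.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
c_smooth = gearys_c([1.0, 2.0, 3.0, 4.0], W)       # neighbors are similar
c_alternating = gearys_c([1.0, 4.0, 1.0, 4.0], W)  # neighbors are dissimilar
```

The smoothly increasing series gives a value below 1 and the alternating series gives a value above 1, matching the interpretation on the slide.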

16 2.4 Test for Local Spatial Association
Questions:
(1) Is the observed value at location i surrounded by a cluster of high or low values?
(2) Is the observed value at location i associated positively with the surrounding observations (similarity) or negatively with the surrounding observations (dissimilarity)?
Local spatial statistics:
(1) The G statistics (Ord and Getis 1992; Getis and Ord 1994)
(2) LISA (Anselin 1995)

17 2.4 Test for Local Spatial Association: LISA
Local Moran: $I_i = Z_i \sum_j w_{ij} Z_j$
A pseudo-significance level of $I_i$ may be obtained by a "conditional" randomization or permutation approach (Anselin 1995). The observed value of $Z_i$ at location i is held fixed and the remaining values are randomly permuted over all the other locations with equal probability. In actual computation, each resampled data set can be selected from the population randomly without replacement. The significance level (p-value) can be obtained by calculating the proportion of permuted data sets whose simulated $I_i$ is greater than (or less than) or equal to the actual $I_i$.
A small p-value (such as p < 0.05) indicates that location i is associated with relatively high values in the surrounding locations. A large p-value (such as p > 0.95) indicates that location i is associated with relatively low values in the surrounding locations.
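The conditional permutation procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: it uses Anselin's local Moran form (noted in the docstring), counts permutations whose statistic is greater than or equal to the observed one, and applies the common (count + 1) / (permutations + 1) correction. The function names, the seed, and the sample data are hypothetical.

```python
import random
import statistics

def local_moran(values, w, i):
    """I_i = z_i * sum_j w_ij z_j, with z the standardized observations."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    z = [(v - m) / sd for v in values]
    return z[i] * sum(w[i][j] * z[j] for j in range(len(values)) if j != i)

def conditional_permutation_p(values, w, i, n_perm=999, seed=0):
    """Pseudo p-value: hold values[i] fixed and shuffle all other values."""
    rng = random.Random(seed)
    observed = local_moran(values, w, i)
    others = [v for k, v in enumerate(values) if k != i]
    count = 0
    for _ in range(n_perm):
        rng.shuffle(others)
        permuted = others[:i] + [values[i]] + others[i:]
        if local_moran(permuted, w, i) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Hypothetical sample: four locations on a line, neighbors share an edge.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
values = [10.0, 9.0, 1.0, 2.0]
I_0 = local_moran(values, W, 0)
p_0 = conditional_permutation_p(values, W, 0)
```

Because the mean and standard deviation of the data do not change under permutation, only the neighbors' values differ between resampled data sets. With so few locations the pseudo p-value is coarse, so real applications would use many more observations.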

18 2.4 Test for Local Spatial Association: LISA
Local Geary: $C_i = \sum_j w_{ij} (Z_i - Z_j)^2$
A large p-value (such as p > 0.95) indicates an extremely small $C_i$, which suggests a positive spatial association (similarity) of observation i with its surrounding observations. A small p-value (such as p < 0.05) indicates an extremely large $C_i$, which suggests a negative spatial association (dissimilarity) of observation i with its surrounding observations.

19 2.5 Test for Local Spatial Association: GLISA
Generalized forms for local Moran (Bao and Henry 1996), where $\{Z_i\}$ is a series of standardized observations and $\{d_{ij}\}$ is a row-standardized spatial weight matrix (where $p_j = P(X = x_j \mid X \neq x_i)$).

20 The generalized local Geary.

21 III. Geostatistical Analysis

22 Spatial Trend: Stationarity & Non-Stationarity

23 Simple Linear Surface Model $Y = aX_1 + bX_2$
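One way to fit this surface is ordinary least squares on the two coordinates, solving the 2 x 2 normal equations directly. The sketch below assumes X_1 and X_2 are the coordinates of each observation and that there is no intercept, as in the slide's formula. The function name `fit_linear_surface` and the sample data are hypothetical.

```python
def fit_linear_surface(x1, x2, y):
    """Least-squares estimates of a and b in Y = a*X1 + b*X2 (no intercept)."""
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * t for u, t in zip(x1, y))
    s2y = sum(v * t for v, t in zip(x2, y))
    det = s11 * s22 - s12 * s12  # zero if X1 and X2 are collinear
    a = (s1y * s22 - s12 * s2y) / det
    b = (s11 * s2y - s12 * s1y) / det
    return a, b

# Hypothetical sample generated exactly from Y = 2*X1 + 3*X2.
x1 = [1.0, 0.0, 1.0, 2.0]
x2 = [0.0, 1.0, 1.0, 1.0]
y = [2.0, 3.0, 5.0, 7.0]
a, b = fit_linear_surface(x1, x2, y)
```

Subtracting the fitted surface from the observations leaves residuals with the linear trend removed, which is the usual starting point for the variogram analysis that follows.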

24 Linear & Non-Linear Models

25 Identifying Spatial Trend: Theoretical Variogram
[Figure: theoretical variogram, gamma against distance h, with nugget, sill and range marked]

26 Identifying Spatial Trend: Experimental Variogram
$\hat{\gamma}(h_k) = \frac{1}{2|N(h_k)|} \sum_{(i,j) \in N(h_k)} (Z(x_i) - Z(x_j))^2$, where $N(h_k) = \{(i,j): x_i - x_j = h_k\}$ and $|N(h_k)|$ is the number of distinct elements of $N(h_k)$.
- Nugget: represents micro-scale variation or measurement error. It is estimated by $\gamma(h)$ as $h \to 0$.
- Sill: represents the variance of the random field, $\lim_{h \to \infty} \gamma(h)$.
- Range: the distance at which data are no longer autocorrelated.
[Figure: experimental variogram, gamma against distance h, with nugget, sill and range marked]
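A minimal sketch of an experimental variogram, assuming the classical estimator (half the average squared difference over all pairs separated by roughly the lag distance). The tolerance band around each lag, the function name `experimental_variogram`, and the sample data are assumptions made for illustration.

```python
import math

def experimental_variogram(coords, values, lags, tol):
    """Return gamma(h) for each lag h, or None where no pair falls in the band."""
    gamma = []
    n = len(coords)
    for h in lags:
        sq_diffs = []
        for i in range(n):
            for j in range(i + 1, n):
                d = math.dist(coords[i], coords[j])
                if abs(d - h) <= tol:  # pair belongs to N(h)
                    sq_diffs.append((values[i] - values[j]) ** 2)
        gamma.append(sum(sq_diffs) / (2 * len(sq_diffs)) if sq_diffs else None)
    return gamma

# Hypothetical sample: four locations on a line with an increasing trend.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
gamma = experimental_variogram(coords, values, lags=[1.0, 2.0, 3.0], tol=0.1)
```

In this sample the estimates keep growing with distance instead of leveling off at a sill, which is the kind of behavior that points to a spatial trend rather than a stationary field.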

27 Nugget – Range - Sill

28

29 Robust Variogram

30 Isotropic and Anisotropic Models Isotropic model - the variogram depends only on the distance between the points. Anisotropic models - the variogram depends also on the direction (1) Geometric anisotropy - the range of the variogram changes in different directions while the sill remains constant. (2) Zonal anisotropy - the sill of the variogram changes with direction while the range remains constant.

31 Stationary Models
Stationary model - the variogram at large distances converges to the sill.
Models: Gaussian, exponential, spherical, hole-effect and nugget-effect.
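For illustration, three of these stationary models are sketched below in one common parameterization. The slide's own formulas are missing from the transcript, so the exact forms here are an assumption: `nugget` is the jump at the origin, `psill` is the partial sill (sill minus nugget), and `a` is the range parameter. Other texts scale the exponential and Gaussian models differently, for example with a factor of 3 in the exponent.

```python
import math

def spherical(h, nugget, psill, a):
    """Reaches the sill (nugget + psill) exactly at h = a."""
    if h == 0:
        return 0.0
    if h >= a:
        return nugget + psill
    r = h / a
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)

def exponential(h, nugget, psill, a):
    """Approaches the sill asymptotically."""
    if h == 0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-h / a))

def gaussian(h, nugget, psill, a):
    """Approaches the sill asymptotically, with parabolic behavior near the origin."""
    if h == 0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-((h / a) ** 2)))

# Hypothetical parameters: nugget 1, partial sill 4, range 10.
curve = [spherical(h, 1.0, 4.0, 10.0) for h in (0.0, 5.0, 10.0, 20.0)]
```

All three functions return 0 at h = 0 and jump to the nugget for any positive distance, which is how the nugget effect appears in a fitted model.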

32 Non-Stationary Models
Non-stationary model - the variogram tends to infinity as h tends to infinity.
Models: power, logarithmic and linear.

33 Variogram Fitting - Prior information - Experimental variogram - Diagnostic of residuals - Model validation

34 Interpolation (Kriging) Ordinary kriging Universal kriging Block kriging Point kriging

35 Variogram & LISA and Moran I Identify effective range of spatial weights: Spatial range

36 IV. An Overview of Spatial Econometric Models

37 Spatial Regression Spatially autoregressive model Spatial moving average model Semi-parametric model

38 Simple Spatial Autoregressive Model
$Y = \rho W Y + \varepsilon$
where $Y$ is an observed variable over space $D$: $\{Y(s_i): s_i \in D,\ i = 1, \ldots, n\}$, $W$ is a spatial weight matrix ($n \times n$), $\rho$ is the spatial autoregressive parameter, and $\varepsilon \sim N(0, \sigma^2)$.
OLS estimates are biased and inconsistent.
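The model can be explored by simulation. The sketch below generates one realization of the simple spatial autoregressive model and computes the OLS slope from regressing Y on its spatial lag WY. It is an illustration under stated assumptions: a row-standardized ring of neighbors, a true parameter of 0.5, and a fixed-point iteration in place of a matrix inverse (valid here because the parameter is below 1 in absolute value and the rows of W sum to 1). All names and settings are hypothetical.

```python
import random

def spatial_lag(w, y):
    """Return the vector WY."""
    n = len(y)
    return [sum(w[i][j] * y[j] for j in range(n)) for i in range(n)]

def simulate_sar(w, rho, eps, n_iter=200):
    """Solve Y = rho * W * Y + eps by repeated substitution."""
    y = list(eps)
    for _ in range(n_iter):
        wy = spatial_lag(w, y)
        y = [rho * wy[i] + eps[i] for i in range(len(eps))]
    return y

def ols_rho(w, y):
    """OLS slope of Y on WY without an intercept."""
    wy = spatial_lag(w, y)
    return sum(a * b for a, b in zip(wy, y)) / sum(a * a for a in wy)

# Hypothetical layout: 50 units on a ring, each with two equally weighted neighbors.
n = 50
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    W[i][(i - 1) % n] = 0.5
    W[i][(i + 1) % n] = 0.5

rng = random.Random(42)
rho_true = 0.5
eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
Y = simulate_sar(W, rho_true, eps)
rho_hat = ols_rho(W, Y)
```

The regressor WY depends on the same errors as Y, which is the source of the problem with OLS. A single draw shows little, so one would repeat the simulation over many error draws and compare the average of `rho_hat` with `rho_true`.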

39 General Form of Spatial Model
$Y = \rho W_1 Y + X\beta + u$, $u = \lambda W_2 u + \varepsilon$, where $W_1$ and $W_2$ are spatial weight matrices and $\varepsilon \sim N(0, \Omega)$.

40 Test for Spatial Dependence
Test for spatial error dependence:
- Moran I test: $I = e'We / e'e \sim N(\mu_I, \sigma_I^2)$, where $e$ is a vector of OLS residuals and $W$ is a spatial weight matrix.
- Lagrange Multiplier test.
Test for substantive spatial dependence.

41 Tests for Heterogeneity in the Presence of Spatial Dependence
Chow test of $H_0$ against $H_1$, where the spatial parameter is replaced by its ML estimate, $e_R$ ($e_U$) are the residuals for a restricted (unrestricted) regression, and $\hat{\sigma}^2$ is the estimate for the error variance for either the restricted model, the unrestricted model or both.

42 Issues on Spatial Data Analysis Heterogeneity in spatial data Boundary effect Scale effect Missing data Spatial weights

43 Research Topics: Theory and Methodology
- Spatial equilibrium theories and disequilibrium theories
- Exploratory spatial data analysis
- Spatial sampling
- Geostatistics, such as multivariate variograms and kriging
- Local spatial statistics
- Spatial cluster analysis
- Spatial data visualization: dynamic visualization, statistical graphics, multi-dimensional data visualization

44 Spatial Modeling
- Specification and tests of spatial models
- Space-time models
- Logistic model for categorical data
- Spatially weighted regression model
- Non-linear spatial models

45 Tools for Spatial Analysis
Desktop software:
- SpaceStat for ArcView
- SPLUS for ArcView
- GeoDa
- The SAS ® Enterprise Intelligence
- ….
Web-based applications:
- DemographicsNow.com
- ….

46 Spatial Explorer for Statistics and Census Data

47 V. Applications

48 Local Pattern Test : Counties of the Local Gearys for 1980/1990 Population Growth in South Carolina

49 Spatial Range Test : Census Tracts of the Local Morans for 1980/1990 Population Growth in Columbia FEA

50 Learning from Spatial Data
- Is there any spatial cluster over space?
- Are spatial observations distributed randomly over space?
- Are spatial observations correlated (autocorrelation)?
- Is there any spatial outlier?
- Is there any spatial trend?
- What is the interaction (statistically and theoretically) between different variables?
- How do the spatial patterns change?
- How can an unknown spatial value at a specific location be predicted?

51 Spatial Learning – Understand China and Global Changes
- Understanding the changes in our earth system environment
- Understanding the impacts of human activities on the earth system
- Improving public knowledge of the earth system and environmental crises

52 Related Publications
- Bao, Shuming, Orn Bodvarsson, Jack Hou, and Yaohui Zhao. 2009. Migration in China from 1985 to 2000. The Chinese Economy, Vol. 42 (4).
- Shi, Anqing, and Shuming Bao. 2007. Migration, Education and Rural Development: Evidence from China 2000 Population Census Data. Journal of Chinese Economic and Business Studies, Vol. 5 (2): 163-177.
- Bao, Shuming, Anqing Shi, and Jack W. Hou. 2006. Migration and Regional Development in China. In Shuming Bao, Shuanglin Lin, Changwen Zhao (Eds.), Chinese Economy after WTO Accession. Ashgate.
- Bao, Shuming, Anqing Shi, and Jack W. Hou. 2005. An Analysis of the Spatial Changing Patterns of Migration in China. China Population Science, 2005 (5).
- Bao, Shuming, Mark Henry, and David Barkley. 2004. Identifying Urban-Rural Linkages: Tests for Spatial Effects in the Carlino-Mills Model. In Luc Anselin, Raymond Florax and Sergio J. Rey (Eds.), Advances in Spatial Econometrics. Springer-Verlag New York, LLC.
- Bao, Shuming, Gene Chang, Jeffrey D. Sachs, and Wing Thye Woo. 2002. Geographic Factors and China’s Regional Development Under Market Reforms, 1978-98. China Economic Review, Vol. 13: 89-111.
- Bao, Shuming, Luc Anselin, Doug Martin, and Diana Stralberg. 2000. A Seamless Integration of Spatial Statistics and GIS: The S-PLUS for ArcView and the S+GRASSLAND Link. Geographical Systems, Vol. 2: 287-306.
- Bao, Shuming, and Luc Anselin. 1997. Linking Spatial Statistics with GIS: Operational Issues in SpaceStat/ArcView Interface and S+Grassland Link. In Proceedings of 1997 American Statistical Association Meeting, 61-66.
- Bao, Shuming, and Mark S. Henry. 1996. Heterogeneity Issues in Local Measurements of Spatial Association. Geographical Systems, Vol. 3: 1-13.
- Bao, Shuming, Mark S. Henry, David L. Barkley, and Kerry Brooks. 1995. RAS - an Integrated Regional Analysis System with ARC/INFO. Computers, Environment, and Urban Systems, 19 (1): 37-56.

