Introduction to Spatial Data Analysis in the Social Sciences RSOC597A: Special Topics in Methods/Statistics Kathy Brasier Penn State University June 14,


1 Introduction to Spatial Data Analysis in the Social Sciences RSOC597A: Special Topics in Methods/Statistics Kathy Brasier Penn State University June 14, 2005

2 Session Objectives
- Understand why spatial data analysis is important
- Identify types of questions for which SDA is relevant
- Gain basic knowledge of the concepts, statistics, and methods of SDA
- Identify some important issues and decision points within SDA
- Learn about some resources for doing spatial data analysis (software, web sites, books, etc.)
- Avoid getting lost in equations!

3 Why Do Spatial Analysis? “Everything is related to everything else, but closer things more so.” (attributed to Tobler)

4 Examples
- Is your educational level likely to be similar to your neighbor’s?
- Are farm practices likely to be similar on neighboring farms?
- Are housing values likely to be similar in nearby developments?
- Do nearby neighborhoods have similar burglary rates?

5 County Homicide Rates 1990

6 What Is Spatial Data?
- 4 main types: event data, spatially continuous data, zonal data, spatial interaction data
- Most frequently used in social sciences is zonal data
  - Data aggregated to a set of areal units (counties, MSAs, census blocks, ZIP codes, watersheds, etc.)
  - Variables measured over the set of units
  - Examples: Census, REIS, County and City Databook, etc.

7 What is Spatial Data Analysis? “The analysis of data on some process operating in space, where methods are sought to describe or explain the behavior of this process and its possible relationship to other spatial phenomena.” Bailey and Gatrell (1995:7) Objective of spatial data analysis: to understand the spatial arrangement of variable values, detect patterns, and examine relationships among variables

8 Why Do Spatial Data Analysis?
- To learn more about what you’re studying
- To avoid specification problems (missing variables, measurement error)
- To ensure satisfaction of statistical assumptions
- To be cool! To go crazy! To learn more about statistics than you ever wanted or thought possible!
- To learn the limitations of statistics

9 Theoretical Reasons for Spatial Analysis
- It tells us something more about what we’re studying
- Is there an unmeasured process that affects the phenomenon?
- Does this process manifest itself in space?
- Examples: interaction processes, diffusion, historical or ethnic legacy, programmatic effects

10 Statistical Reasons for Spatial Analysis
- Violation of regression assumptions
  - Units of analysis might not be independent
  - Parameter estimates are inefficient
  - Estimated error variance is downwardly biased, which inflates the observed R² values
- If spatial effects are present, and you don’t account for them, your model is not accurate!

11 Examples of Research Using SDA
- Epidemiology (environmental exposure research)
- Criminology (crime patterns)
- Education (neighborhood effects on attainment)
- Diffusion/adoption (technologies)
- Social movements (trade unions, demonstrations)
- Market analysis (housing and land price variation)
- Spillover effects (economic spillovers of universities)
- Regional studies (regional income variation & inequality)
- Demography (segregation patterns)
- Political science (election studies)

12 BREAK!!

13 When do you need to do SDA?
- Is there a theoretical reason to suspect differences across space?
  - Differences in phenomena (variable values)
  - Differences in relationships between phenomena (covariances)
- Are you using data with a spatial referent?
- If yes to both, it is a good idea to at least explore any potential spatial effects
- Exploration will tell you more about the subject you’re studying

14 Spatial Independence
- Null hypothesis (H0)
  - Any event has an equal probability of occurring at any position in the region
  - Position of any event is independent of the position of any other
- Implicit assumption of much work in social sciences

15 Spatial Effects
- Test Hypothesis (H1)
  - Probability of an event occurring is not equal for each location within the region
  - Position of any one event is dependent on the position of any other event
- Methods and statistics of SDA test this hypothesis
  - If supported, can tell us more about what we’re studying; can improve our models
  - If not supported, we know that we have satisfied assumptions

16 First Order Spatial Effects
- Non-uniform distribution of observations over space
- Large-scale variation in mean across the spatial units
- Values of the variables are not independent of their spatial location
- Results from interaction of unique characteristics of the units and their spatial location
- Ex: magnets and iron filings (Bailey & Gatrell)
- Referred to as spatial heterogeneity

17 Causes of Spatial Heterogeneity
- Patterns of social interaction that create unique characteristics of spatial units
  - Spatial regimes: legacies of regional core-periphery relationships => differences between units (population, economic development, etc.)
- Differences in physical features of spatial units
  - Size of counties
- Combination:
  - Differences in topography of units => different patterns of economic development (extractive industries)

18 County Homicide Rates 1990 First order effects?

19 Second Order Spatial Effects
- Localized covariation among means (or other statistics) within the region
- Tendency for means to ‘follow’ each other in space
- Results in clusters of similar values
- Ex: magnets and iron filings (Bailey & Gatrell)
- Referred to as spatial dependence (spatial autocorrelation)

20 Causes of Spatial Dependence
- Underlying socio-economic process has led to clustered distribution of variable values
  - Grouping processes: grouping of similar people in localized areas
  - Spatial interaction processes: people near each other are more likely to interact, share
  - Diffusion processes: neighbors learn from each other
  - Dispersal processes: people move, but tend to move short distances, and take their knowledge with them
  - Spatial hierarchies: economic influences that bind people together
- Mis-match of process and spatial units
  - Counties vs retail trade zones
  - Census block groups vs neighborhood networks

21 County Homicide Rates 1990 Second order effects?

22 So now that I’ve convinced you that spatial data analysis is an important consideration…. What Do We Do About It?

23 Goals of SDA
- To identify spatial effects and their causes
- To appropriately measure spatial effects
- To incorporate spatial effects into models
- To improve our knowledge of the process and how it occurs over space
- All of these goals require both theory and methods

24 Exploratory Spatial Data Analysis
- Start with questions about your theory and data:
  - Are there likely to be spatial processes at work (diffusion, interaction, etc.)?
  - Do your data units match the process? (Messner et al. reading)
- Visually and statistically explore your data
  - Run basic descriptive statistics
  - Map variables
  - Look for patterns, outliers
  - Look for spatial effects (large-scale variation, localized clusters)

25 Gini Index 1989

26 How to Measure ‘Space’?
- Need to define space in order to measure its effects
- Traditional ways (regional dummy variables, distance measures, etc.)
- Neighborhood structure
  - Weights matrix
  - n x n matrix, where: 0 = not neighbor, 1 = neighbor

27 Weights Matrix
- ‘Neighbors’ can be defined as:
  - Boundaries:
    - Adjacent units (rook or queen)
    - Those units sharing some minimum/maximum proportion of common boundary
  - Centroids
    - If centroids are within some specified distance
    - If unit is one of k nearest neighbors defined by centroid distance
  - Others?
- Decision to use one over another somewhat arbitrary
  - Simpler is generally better
  - Closer is generally better
  - Rely on theory, your knowledge, and the ESDA to guide you
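The two centroid-based definitions can be sketched in a few lines of Python. This is a minimal illustration, not how GeoDa or spdep build weights. The centroid coordinates, the 2.5 distance threshold, and the function names are all hypothetical, and units are numbered from 0.

```python
import math

def distance_band_neighbors(centroids, threshold):
    """Units whose centroids lie within `threshold` of each other are neighbors."""
    n = len(centroids)
    return {
        i: [j for j in range(n)
            if j != i and math.dist(centroids[i], centroids[j]) <= threshold]
        for i in range(n)
    }

def knn_neighbors(centroids, k):
    """Each unit's neighbors are its k nearest units by centroid distance."""
    n = len(centroids)
    result = {}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(centroids[i], centroids[j]))
        result[i] = sorted(others[:k])
    return result

# Hypothetical centroids for four units (arbitrary coordinates).
centroids = [(0, 0), (1, 0), (0, 2), (4, 4)]
band = distance_band_neighbors(centroids, threshold=2.5)
knn = knn_neighbors(centroids, k=1)
print(band)
print(knn)
```

With these made-up points, the distance band leaves the remote unit 3 with no neighbors at all. The k-nearest-neighbor rule gives every unit a neighbor, but the structure is asymmetric: unit 3 lists unit 2, and unit 2 does not list unit 3.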

28 Weights Matrix Example
Sample Region and Units:
  1 2 3
  4 5 6
  7 8 9
Simple Contiguity (rook) Matrix:
      1 2 3 4 5 6 7 8 9
  1   0 1 0 1 0 0 0 0 0
  2   1 0 1 0 1 0 0 0 0
  3   0 1 0 0 0 1 0 0 0
  4   1 0 0 0 1 0 1 0 0
  5   0 1 0 1 0 1 0 1 0
  6   0 0 1 0 1 0 0 0 1
  7   0 0 0 1 0 0 0 1 0
  8   0 0 0 0 1 0 1 0 1
  9   0 0 0 0 0 1 0 1 0
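A rook contiguity matrix for a regular grid like the sample region can be generated rather than typed by hand. The sketch below assumes units are numbered row by row, as in the 3 x 3 sample region; the function name is hypothetical.

```python
def rook_weights(n_rows, n_cols):
    """Binary rook-contiguity matrix for a regular grid, units numbered row by row."""
    n = n_rows * n_cols
    w = [[0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            # Rook neighbors share an edge: up, down, left, right.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    w[i][rr * n_cols + cc] = 1
    return w

W = rook_weights(3, 3)
for unit, row in enumerate(W, start=1):
    print(unit, "".join(str(v) for v in row))
```

The result is symmetric, so unit 9 lists units 6 and 8 as neighbors, just as units 6 and 8 each list unit 9.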

29 Statistical Tests for Spatial Dependence (Autocorrelation)
- Univariate Global Moran’s I
  - Indicates presence and degree of spatial autocorrelation among variable values across spatial units
  - z is a vector of variable values expressed as deviations from the mean
  - W is the weights matrix
  - Expected value of I converges on 0 when n is large; can do significance tests
  - Large positive => strong clustering of similar values
  - Large negative => strong clustering of dissimilar values
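A minimal sketch of the statistic, using the standard definition I = (n / S0) * (z'Wz) / (z'z), where S0 is the sum of all weights (a symbol not on the slide). The four-unit weights matrix and the data values are made up for illustration, and no significance test is attempted here.

```python
def morans_i(values, w):
    """Global Moran's I: (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i ** 2)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]       # deviations from the mean
    s0 = sum(sum(row) for row in w)      # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi ** 2 for zi in z)

# Four hypothetical units in a row; each unit neighbors the unit(s) next to it.
w_line = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, 0, 0], w_line)    # similar values sit side by side
alternating = morans_i([1, 0, 1, 0], w_line)  # every neighbor pair differs
print(clustered, alternating)
```

The clustered pattern gives a positive I and the alternating pattern a negative one, matching the interpretation of large positive and large negative values.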

30 Global Moran’s I and Moran Scatterplot
Plots the variable value for each unit (x axis) against the average of the values of its neighbors (y axis)
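When the weights are row-standardized (each unit's weights sum to 1), the slope of a least-squares line through the Moran scatterplot equals global Moran's I. The sketch below illustrates this with made-up data. The function names are hypothetical, and every unit is assumed to have at least one neighbor.

```python
def row_standardize(w):
    """Divide each row by its sum so that each unit's weights sum to 1."""
    return [[v / sum(row) for v in row] for row in w]

def moran_scatterplot_points(values, w):
    """Return (z_i, average z of unit i's neighbors) pairs."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    ws = row_standardize(w)
    lag = [sum(ws[i][j] * z[j] for j in range(n)) for i in range(n)]
    return list(zip(z, lag))

def ols_slope(points):
    """Least-squares slope of y on x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx

# Four hypothetical units in a row; each unit neighbors the unit(s) next to it.
w_line = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
points = moran_scatterplot_points([1, 1, 0, 0], w_line)
slope = ols_slope(points)
print(points)
print(slope)
```

Points in the upper-right and lower-left quadrants are units whose values resemble their neighbors'; points in the other two quadrants are spatial outliers.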

31 Local Indicators of Spatial Autocorrelation (LISA)
- Local Moran’s I
  - Decomposes global measure into each unit’s contribution
  - Identifies the local ‘hotspots’, areas which contribute disproportionately to global Moran’s I
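The decomposition can be sketched with the usual local Moran formula, I_i = (z_i / m2) * sum_j(w_ij * z_j), where m2 is the mean squared deviation. Under this definition the local values sum to S0 times the global statistic, with S0 the sum of all weights. The data and weights below are made up, and the permutation-based significance tests that LISA maps rely on are not shown.

```python
def local_morans_i(values, w):
    """Local Moran's I_i = (z_i / m2) * sum_j(w_ij * z_j), with m2 = sum(z^2) / n."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi ** 2 for zi in z) / n
    return [z[i] / m2 * sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]

# Four hypothetical units in a row; each unit neighbors the unit(s) next to it.
w_line = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
local = local_morans_i([1, 1, 0, 0], w_line)
s0 = sum(sum(row) for row in w_line)
global_i = sum(local) / s0   # recovers the global Moran's I
print(local, global_i)
```

Here the two end units carry the whole global value, while the two middle units, each with one similar and one dissimilar neighbor, contribute nothing.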

32 LISA Cluster Maps Homicide Rate 1990 Gini Index 1989

33 Additional Suggestions for ESDA
- Identify outliers and hotspots both statistically and visually
- Try taking outlier units out of analysis and see what happens (does Moran’s I change?)
- Explore changes in spatial patterns over time
- Compare two (or more) regions
- Split your sample by a variable of interest
- Try different weights matrices
- Play around with different covariates – get into your data!

34 BREAK!!!

35 Regression Modeling and SDA
- Use theory and ESDA findings to craft your model
- Procedure:
  - Run OLS model
  - Assess diagnostics
  - If diagnostics indicate no spatial autocorrelation (or other violations of regression assumptions), OLS model is fine
  - If diagnostics indicate spatial autocorrelation present, need to consider ways to measure and incorporate spatial structure
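The first two steps of the procedure can be sketched as follows: fit OLS, then check the residuals for spatial autocorrelation. This is a simplified stand-in, assuming one predictor and using Moran's I on the residuals as the only diagnostic. It does not implement the Lagrange Multiplier tests, and a residual Moran's I needs its own significance test. All data are made up.

```python
def ols_simple(x, y):
    """One-predictor OLS; returns intercept, slope, and residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    return intercept, slope, residuals

def morans_i(values, w):
    """Global Moran's I: (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i ** 2)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi ** 2 for zi in z)

# Six hypothetical units in a row; unit i neighbors units i - 1 and i + 1.
n_units = 6
w_line = [[1 if abs(i - j) == 1 else 0 for j in range(n_units)]
          for i in range(n_units)]

# Made-up data: x alternates across space, but the unexplained part of y
# is high in the first three units and low in the last three.
x = [1, 2, 1, 2, 1, 2]
y = [3, 5, 3, 3, 1, 3]

intercept, slope, residuals = ols_simple(x, y)
resid_i = morans_i(residuals, w_line)
print(intercept, slope, resid_i)
```

A clearly positive Moran's I in the residuals, as in this toy case, is the kind of signal that would send you on to the last step of the procedure.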

36 OLS Diagnostics
- Diagnostics of OLS model will indicate type of spatial effects
- If either present, need to identify likely source
- Remedies
  - Spatial heterogeneity (Koenker-Bassett test)
    - Include covariate which accounts for heterogeneity?
    - Split region?
  - Spatial autocorrelation (Lagrange Multiplier tests)
    - Identify missing variables?
    - Explore effects of spatially-lagged independent variables?
    - Use appropriate spatial regression model?

37 Spatial Regression Models
- ESDA and OLS diagnostics tell you that there is spatial autocorrelation
- Identify the source (LM tests will help)
  - Regression residuals (LM-Error)
    - Mis-match of process and spatial units => systematic errors, correlated across spatial units
  - Dependent variable (LM-Lag)
    - Underlying socio-economic process has led to clustered distribution of variable values => influence of neighboring values on unit values
  - Spatial autocorrelation in both

38 Spatial Autocorrelation in Residuals => Spatial Error Model
  y = Xβ + ε
  ε = λWε + ξ
- ε is the vector of error terms, spatially weighted (W); λ is the coefficient; and ξ is the vector of uncorrelated, homoskedastic errors
- Incorporates spatial effects through error term
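Solving the error equation for ε shows how the model spreads each unit's shock across its neighbors. This is a standard rearrangement, assuming the matrix (I − λW) is invertible. Here I is the n x n identity matrix, a symbol not used on the slide.

```latex
\begin{aligned}
\varepsilon &= \lambda W \varepsilon + \xi \\
(I - \lambda W)\,\varepsilon &= \xi \\
\varepsilon &= (I - \lambda W)^{-1}\,\xi \\
y &= X\beta + (I - \lambda W)^{-1}\,\xi
\end{aligned}
```

When λ = 0 the last line reduces to the ordinary regression model with uncorrelated errors.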

39 Spatial Autocorrelation in Dep. Variable => Spatial Lag Model
  y = ρWy + Xβ + ε
- y is the vector of the dependent variable, spatially weighted (W); ρ is the coefficient
- Incorporates spatial effects by including a spatially lagged dependent variable as an additional predictor
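Because y appears on both sides of the lag model, it can help to solve for y. This is a standard rearrangement, assuming the matrix (I − ρW) is invertible. Here I is the n x n identity matrix, a symbol not used on the slide.

```latex
\begin{aligned}
y &= \rho W y + X\beta + \varepsilon \\
(I - \rho W)\,y &= X\beta + \varepsilon \\
y &= (I - \rho W)^{-1}\,(X\beta + \varepsilon)
\end{aligned}
```

The last line shows that each unit's value depends on the covariates and errors of other units through W, so the lagged term Wy is correlated with ε.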

40 Spatial Lag Example
Sample Region and Units (unit number: value):
  1: 7   2: 6   3: 4
  4: 4   5: 5   6: 4
  7: 5   8: 6   9: 3
- Spatial lag = sum of spatially-weighted values of neighboring cells
  = 1/3(7) + 1/3(5) + 1/3(4) = 5.3
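The arithmetic on this slide can be reproduced in a few lines. The sketch reads the sample region as pairs of unit number and value (an interpretation of the flattened slide text) and assumes rook contiguity with row-standardized weights. Under those assumptions the slide's calculation matches unit 2, whose neighbors are units 1, 3, and 5.

```python
# Values from the sample region, read as unit number -> value
# (an interpretation of the flattened slide text).
values = {1: 7, 2: 6, 3: 4, 4: 4, 5: 5, 6: 4, 7: 5, 8: 6, 9: 3}

# Rook neighbors on the 3 x 3 grid numbered 1..9 row by row.
rook = {1: [2, 4], 2: [1, 3, 5], 3: [2, 6],
        4: [1, 5, 7], 5: [2, 4, 6, 8], 6: [3, 5, 9],
        7: [4, 8], 8: [5, 7, 9], 9: [6, 8]}

def spatial_lag(unit):
    """Row-standardized spatial lag: the average value of the unit's neighbors."""
    nbrs = rook[unit]
    weight = 1 / len(nbrs)
    return sum(weight * values[j] for j in nbrs)

lags = {u: spatial_lag(u) for u in values}
print(round(lags[2], 1))  # 5.3
```

Computing the lag for every unit in this way gives the Wy vector that enters the spatial lag model as an additional predictor.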

41 Example: Change in Farm Numbers 1982-1992
- RQ: How do changes in agricultural structure affect the rates of farm loss during the Farm Crisis?
- Hypothesized spatial effect: spatial dependence through clustering of similar types of farms

42 Farm Structure Example: Moran’s I Statistics
  Matrix        Moran’s I for dep var
  Contiguity    0.465***
  45-mile       0.413***
  100-mile      0.267***

43 Farm Structure Example: LISA Maps

44 Farm Structure Example: OLS Regression & Diagnostics
  Variable (sig. only)    Coeff.
  Prime farmland          -0.343*
  Corporate Farming       0.196***
  Small-scale Farming     0.904***
  …
  Adj. R²                 0.696
  Likelihood (L)          -410.187
  AIC                     862.374

                          Prob.
  LM-Error                0.000
  R-LM-Error              0.024
  LM-Lag                  0.000
  R-LM-Lag                0.000

45 Farm Structure Example: Spatial Error – Spatial Lag Regression
  Variable (sig. only)    Coeff.
  Prime farmland          -0.243*
  Corporate Farming       0.180***
  Small-scale Farming     0.820***
  Rho (dep var)           0.381***
  Lambda (error)          0.044
  Adj. R²                 0.740
  Likelihood (L)          -381.736
  AIC                     807.473

                                                      Prob.
  LM-Error                                            0.212
  Likelihood ratio test for spatial lag dependence    0.768

46 Practical Issues with SDA
- Scale of observations vs scale of process
- Time as a factor in analysis (no natural order)
- Definition of proximity
- Edge/boundary effects
- Modifiable areal unit problem
- Complexity of topography
- Assumptions related to ‘sample’ of attributes

47 How in the Heck Do I Actually Do This?
- Existing statistical software packages (SPSS, SAS)
  - Have trouble with weights matrix, so need to bring in by hand
  - Some routines exist, but limited
- Comprehensive software packages
  - S+ Spatialstats
    - Linear spatial regression; weights construction
    - Not transparent; no diagnostics; not compatible with ArcView 8.2
  - Spatial Toolbox (LeSage)
    - Matlab routines
    - Linear spatial regression; weights construction; Bayesian estimation; spatial probit/tobit models

48 Software Packages (2)
- SpaceStat
  - Linear spatial regression; weights construction; diagnostics; multiple options
  - Outdated architecture and interface; not supported by Anselin; not compatible with ArcView 8.2
- GeoDa & Spdep (R)
  - GeoDa strong in ESDA, mapping; weights construction; basic linear spatial regression w/ diagnostics
  - Spdep has linear spatial regression w/ diagnostics; greater functionality than GeoDa; driven by command language
  - Both shareware, downloadable
  - Little support, other than network of those using software
- Anselin’s working on PySpace, software to have greater breadth of options, diagnostics, models, and estimation procedures

49 Additional Resources
- Handout has resources listed (web, articles, etc.)
  - Particularly CSISS, SAL
- If interested, consider joining Openspace listserve
- AERS faculty
- Geographic Information Analysis group within PRI

50 Assignment
- Details in handout
- Article choices – Use those with *
- Due Date
  - June 17 (Fri.) by 5:00 pm (email preferred)
  - NOTE CHANGE: I will email you comments/grades
  - Re-writes due June 23 (Thur.) by 5:00 pm
- Questions?

