Rodolphe Devillers (Almost) everything you always wanted to know (or maybe not…) about Geographically Weighted Regressions JCU Stats Group, March 2012
Outline Background Spatial autocorrelation Spatial non-stationarity Geographically Weighted Regressions (GWR)
Background
Decrease in cod populations, 1984 to 1994 (one slide per year)
GeoCod Project (2006-…) Goal: Get a better understanding of the spatial and temporal dynamics of some fish/shellfish species in the NW Atlantic region, and their relationship with the physical environment. Biological data: scientific surveys, fisheries observers, 4 species, > records. Environmental data: temperature, salinity, remote sensing, > 300 GB.
GeoCod project Four steps: (1) Collection of fisheries data, environmental data and other data (bathymetry, etc.); (2) Integration in a normalized database; (3) Analysis; (4) Visualization
Context Species / Environment? A number of statistical methods can be used. Testing spatial statistics.
Outline Background Spatial autocorrelation Spatial non-stationarity Geographically Weighted Regressions (GWR)
Spatial autocorrelation “…the property of random variables taking values, at pairs of locations a certain distance apart, that are more similar (positive autocorrelation) or less similar (negative autocorrelation) than expected for randomly associated pairs of observations.” (Legendre, 1993)
Spatial autocorrelation - Basics Positive (Neighbours more similar) Neutral (Random) Negative (Neighbours less similar)
Spatial autocorrelation – is it common? Elevation Air/water temperature Air humidity Disease distribution Species abundance Housing value Etc.
Spatial autocorrelation – why bother? Spatial autocorrelation in the data leads to spatial autocorrelation in the residuals
Spatial autocorrelation – why bother? Most statistics are based on the assumption that the values of observations in each sample are independent of one another. Consequence: spatial autocorrelation violates the assumption of independent residuals and calls into question the validity of hypothesis testing. Main effect: standard errors are underestimated and t-scores are overestimated (= increased chance of a Type I error, i.e. incorrect rejection of a null hypothesis). It sometimes inverts the slope of relationships.
Spatial autocorrelation – how to measure it? Measures of spatial autocorrelation: Moran’s I Geary’s C Others (e.g. Getis’ G)
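To make the two first measures concrete, here is a minimal sketch of Moran's I and Geary's C on a small regular grid. The binary rook-contiguity weights, the helper names (`rook_weights`, `morans_i`, `gearys_c`) and the two toy grids are assumptions made for illustration; they do not come from the talk.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Binary rook-contiguity weights for a regular grid (hypothetical helper)."""
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if r + 1 < nrows:            # neighbour below
                j = (r + 1) * ncols + c
                w[i, j] = w[j, i] = 1.0
            if c + 1 < ncols:            # neighbour to the right
                j = r * ncols + c + 1
                w[i, j] = w[j, i] = 1.0
    return w

def morans_i(values, w):
    """Moran's I: > 0 positive, < 0 negative spatial autocorrelation."""
    z = values - values.mean()
    return len(values) / w.sum() * (z @ w @ z) / (z @ z)

def gearys_c(values, w):
    """Geary's C: < 1 positive, > 1 negative spatial autocorrelation."""
    z = values - values.mean()
    diff2 = (values[:, None] - values[None, :]) ** 2
    return (len(values) - 1) / (2 * w.sum()) * (w * diff2).sum() / (z @ z)

w = rook_weights(6, 6)
rows, cols = np.indices((6, 6))
gradient = cols.ravel().astype(float)             # neighbours more similar
checker = ((rows + cols) % 2).ravel().astype(float)  # neighbours less similar

print("gradient:", morans_i(gradient, w), gearys_c(gradient, w))
print("checker: ", morans_i(checker, w), gearys_c(checker, w))
```

The smooth gradient gives a clearly positive Moran's I and a Geary's C below 1; the checkerboard gives the opposite. In practice a significance test (for example by permutation) would be added.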
Spatial autocorrelation – How can I deal with it? Many ways to handle this: subsampling, adjusting the type I error, adjusting the effective sample size, etc. (Dale and Fortin (2002) Ecoscience 9(2)); autocovariate regressions, spatial eigenvector mapping (SEVM), generalised least squares (GLS), conditional autoregressive models (CAR), simultaneous autoregressive models (SAR), generalised linear mixed models (GLMM), generalised estimating equations (GEE), etc. (more details: Dormann et al. (2007) Ecography 30). If spatial autocorrelation is not stationary: GWR
Outline Background Spatial autocorrelation Spatial non-stationarity Geographically Weighted Regressions (GWR)
Stationarity Classical regression models are valid under the assumption that phenomena are stationary in time and space (i.e. statistical parameters such as the mean, the variance or the spatial autocorrelation do not vary with geographic position). E.g. Coral bleaching = 0.55 Temperature Nutrients + … - … Studies in various fields, including terrestrial ecology, have shown that such phenomena are rarely stationary.
Global vs Local Statistics Simpson’s paradox
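A small numerical illustration of how a global statistic can contradict the local ones: in the sketch below, two regions each show a negative relationship, while the pooled data show a positive one. The numbers and the region names are invented for illustration only.

```python
import numpy as np

# Two hypothetical regions; within each, y decreases as x increases.
x_a = np.array([1.0, 2.0, 3.0, 4.0])
y_a = np.array([6.0, 5.0, 4.0, 3.0])
x_b = np.array([6.0, 7.0, 8.0, 9.0])
y_b = np.array([11.0, 10.0, 9.0, 8.0])

# Local (per-region) regression slopes.
slope_a = np.polyfit(x_a, y_a, 1)[0]
slope_b = np.polyfit(x_b, y_b, 1)[0]

# Global regression slope on the pooled data.
x_all = np.concatenate([x_a, x_b])
y_all = np.concatenate([y_a, y_b])
global_slope = np.polyfit(x_all, y_all, 1)[0]

print("local slopes:", slope_a, slope_b)
print("global slope:", global_slope)
```

Both local slopes are negative while the global slope is positive, which is the kind of reversal a single global model can hide.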
Local spatial statistics Local Indicators of Spatial Association (LISA): Local Moran’s I (used to detect clustering), Getis-Ord Gi* (hotspot analysis). Look at GeoDa (free software from Luc Anselin). Local regressions: GWR
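As a rough sketch of what a local indicator computes, the code below evaluates a local Moran's I along a 1-D transect with binary neighbour weights. The transect values, the unstandardised weights and the function name `local_morans_i` are assumptions for illustration; this is not GeoDa's implementation, and no significance test is included.

```python
import numpy as np

def local_morans_i(values, w):
    """Local Moran's I_i = z_i * sum_j(w_ij * z_j) / m2, with m2 = sum(z^2) / n."""
    z = values - values.mean()
    m2 = (z @ z) / len(values)
    return z * (w @ z) / m2

# Hypothetical transect with a cluster of high values in the middle.
values = np.array([1, 1, 1, 1, 9, 9, 9, 1, 1, 1], dtype=float)
n = len(values)

# Binary weights: each site is a neighbour of the sites next to it.
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

local_i = local_morans_i(values, w)

# The local values sum (after scaling by the total weight) to the global Moran's I.
z = values - values.mean()
global_i = n / w.sum() * (z @ w @ z) / (z @ z)

print(np.round(local_i, 2))
print(global_i, local_i.sum() / w.sum())
```

Sites inside the high-value cluster get positive local values, while sites on its edge, whose neighbours differ from them, get negative ones.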
Outline Background Spatial autocorrelation Spatial non-stationarity Geographically Weighted Regressions (GWR)
Brunsdon, Fotheringham and Charlton GWR
Increasingly used in various fields (mostly since 2006, and even more since integrated into ArcGIS) Sally: yes, it is also available in R… (spgwr)
GWR Criticized by some authors (e.g. Wheeler 2005, Cho et al. 2009) when using collinear data, potentially leading to: occasional inflation of the variance; rare inversion of the sign of the regression coefficients
Windle, M., Rose, G., Devillers, R. and Fortin, M.-J. Exploring spatial non-stationarity of fisheries survey data using geographically weighted regression (GWR): an example from the Northwest Atlantic. ICES Journal of Marine Science, 67:
GWR Multiple regression model (global): y: dependent variable; x1 to xp: independent variables; β0: intercept; β1 to βp: coefficients; ε: error. Geographically Weighted Regression (GWR): (μ,ν): geographic coordinates of the samples.
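The equations themselves did not survive in this transcript. The block below writes out the standard textbook form of the two models using the symbols defined on the slide; the observation index i, the design matrix X and the diagonal weight matrix W are notation assumed here, not taken from the slide.

```latex
% Global multiple regression: one set of coefficients for the whole study area
y_i = \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik} + \varepsilon_i

% GWR: the coefficients vary with the location (\mu_i, \nu_i) of sample i
y_i = \beta_0(\mu_i, \nu_i) + \sum_{k=1}^{p} \beta_k(\mu_i, \nu_i)\, x_{ik} + \varepsilon_i

% Local estimate by weighted least squares; W(\mu_i, \nu_i) is a diagonal matrix
% of kernel weights that decrease with distance from (\mu_i, \nu_i)
\hat{\boldsymbol{\beta}}(\mu_i, \nu_i) =
  \left( X^{\top} W(\mu_i, \nu_i)\, X \right)^{-1} X^{\top} W(\mu_i, \nu_i)\, \mathbf{y}
```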
Method Government fisheries scientific survey data (Fisheries and Oceans Canada). Cod presence/absence (threshold at 5 kg) for Fall 2001
Method – Data interpolation
Method
Method Combining data (temperature, cod, crab, shrimp; year 2001) in a single point data file. Exporting data points to a file (.dbf)
GWR software (version 3.0). 200 km used for tests. About 25 minutes per file of 5500 points
Fixed Variable
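Assuming the "Fixed" and "Variable" labels refer to the two usual kernel choices in GWR (a fixed bandwidth in distance units versus a variable, or adaptive, bandwidth defined by a number of nearest neighbours), the sketch below fits a local weighted regression at every sample with a Gaussian kernel. The function `gwr_coefficients`, the synthetic data and the bandwidth values are all hypothetical; this is not the GWR 3.0 software or the spgwr API.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth, adaptive=False):
    """Local coefficients at every sample point (hypothetical helper).

    Fixed kernel: bandwidth is a distance.
    Adaptive kernel: bandwidth is a number of neighbours k, and the kernel
    width at each point is the distance to its k-th nearest neighbour.
    """
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])        # add intercept column
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        h = np.sort(d)[int(bandwidth)] if adaptive else bandwidth
        w = np.exp(-0.5 * (d / h) ** 2)          # Gaussian kernel weights
        XtW = Xd.T * w                           # same as Xd.T @ diag(w)
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)
    return betas

# Synthetic, non-stationary relationship: the slope grows from west to east.
rng = np.random.default_rng(0)
n = 400
coords = rng.uniform(0, 100, size=(n, 2))
x = rng.normal(size=n)
true_slope = 1.0 + coords[:, 0] / 50.0           # from 1 (west) to 3 (east)
y = 2.0 + true_slope * x + rng.normal(scale=0.1, size=n)

betas_fixed = gwr_coefficients(coords, x, y, bandwidth=15.0)
betas_adaptive = gwr_coefficients(coords, x, y, bandwidth=40, adaptive=True)
global_slope = np.polyfit(x, y, 1)[0]

west = coords[:, 0] < 25
east = coords[:, 0] > 75
print("global slope:", global_slope)
print("fixed kernel, west/east:", betas_fixed[west, 1].mean(), betas_fixed[east, 1].mean())
print("adaptive kernel, west/east:", betas_adaptive[west, 1].mean(), betas_adaptive[east, 1].mean())
```

A fixed kernel uses the same distance everywhere, so sparse areas contribute few points to each local fit; an adaptive kernel widens where samples are sparse. Real GWR software also selects the bandwidth (for example by cross-validation or AIC), which this sketch leaves out.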
Results Test of spatial stationarity of independent variables used in the regression: spatial stationarity vs. spatial non-stationarity
Results Spatial stationarity: stationarity of bottom temperature used to model shrimp biomass. Windle et al. (accepted), MEPS
Results Comparison of regression models
Results Test of the spatial auto-correlation of the residuals
Results
K-means clustering of the t values of the GWR coefficients. Clusters: positive relationship between crab and shrimp, weak relationship with the coast; negative relationship with crab and distance, positive with shrimp; stronger negative relationship with crab
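The clustering step can be sketched as follows: each sample point carries one t value per GWR coefficient, and k-means groups points with similar t values. The t values below are synthetic stand-ins (two hypothetical coefficients, three groups), and the farthest-point initialisation is a choice made here for reproducibility, not something stated in the talk.

```python
import numpy as np

def farthest_point_init(points, k):
    """Pick k starting centres, each as far as possible from those already chosen."""
    centers = [points[0]]
    for _ in range(k - 1):
        dist = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[dist.argmax()])
    return np.array(centers)

def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm: assign to nearest centre, then update centres."""
    centers = farthest_point_init(points, k)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic t values for two hypothetical coefficients, 50 points per group.
rng = np.random.default_rng(0)
group_means = np.array([[3.0, 3.0], [-3.0, 2.0], [-4.0, -4.0]])
t_values = np.vstack(
    [m + rng.normal(scale=0.3, size=(50, 2)) for m in group_means]
)

labels, centers = kmeans(t_values, k=3)
print(np.round(centers, 2))
```

Mapping the resulting labels back to the sample coordinates gives zones where the relationships behave similarly, which is how the cluster descriptions above can be read.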
Results GAM systematically has lower AIC values (AIC: Akaike Information Criterion), suggesting a non-linear relationship between cod and the variables used in the analysis. Figure legend: Strong / Weak
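To show why a lower AIC for a more flexible model points to non-linearity, the sketch below compares a straight-line fit with a quadratic fit on synthetic curved data. The quadratic polynomial is only a simple stand-in for a GAM, the data are invented, and the AIC is the least-squares (Gaussian) form, known up to an additive constant.

```python
import numpy as np

def aic_least_squares(y, fitted, n_coef):
    """AIC for a least-squares fit with Gaussian errors (up to a constant)."""
    n = len(y)
    rss = np.sum((y - fitted) ** 2)
    # +1 counts the estimated error variance as a parameter
    return n * np.log(rss / n) + 2 * (n_coef + 1)

# Synthetic data with a curved relationship.
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 200)
y = x ** 2 + rng.normal(scale=0.5, size=x.size)

linear_fit = np.polyval(np.polyfit(x, y, 1), x)
quadratic_fit = np.polyval(np.polyfit(x, y, 2), x)

aic_linear = aic_least_squares(y, linear_fit, n_coef=2)
aic_quadratic = aic_least_squares(y, quadratic_fit, n_coef=3)
print("AIC linear:", aic_linear, "AIC quadratic:", aic_quadratic)
```

The flexible model pays a penalty for its extra parameter but still reaches a much lower AIC, because the straight line cannot follow the curvature.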
Results Min and max GWR coefficients (R²). The explanatory power of the model decreases over the years
GWR coefficients – Capelin
GWR coefficients – Catch per Unit Effort
Conclusions The spatial structure of data matters. Ecology (and marine ecology in particular) is still in the process of adopting such methods. GWR is an interesting method, but it can be hard to interpret and should be used together with other methods.
Questions? Technical questions beyond my knowledge: Matt Windle Technical questions beyond Matt’s knowledge: (allow for several months for an answer)