R for Macroecology Spatial data continued
Projections Cylindrical projections Lambert CEA
Behrmann EA Latitude of true scale = 30
Choosing a projection What properties are important? Angles (conformal) Area (equal area) Distance from a point (equidistant) Directions should be strait lines (gnomonic) Minimize distortion Cylindrical, conic, azimuthal
Projecting points project() function in the proj4 package is good Can also use project() from rgdal spTransform() (in rgdal) works for SpatialPoints, SpatialLines, SpatialPolygons... Can also handle transformations from one datum to another
Projecting points > lat = rep(seq(-90,90,by = 5),(72+1)) > long = rep(seq(-180,180,by = 5),each = (36+1)) > xy = project(cbind(long,lat),"+proj=cea +datum=WGS84 +lat_ts=30") > par(mfrow = c(1,2)) > plot(long,lat) > plot(xy)
Projecting a grid > mat = raster("MAT.tif") > mat = aggregate(mat,10) > bea = projectExtent(mat,"+proj=cea +datum=WGS84 +lat_ts=30") > mat class : RasterLayer dimensions : 289, 288, (nrow, ncol, ncell) resolution : , (x, y) extent : 0, 12, , (xmin, xmax, ymin, ymax) projection : +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0 values : in memory min value : max value : > bea class : RasterLayer dimensions : 289, 288, (nrow, ncol, ncell) resolution : , (x, y) extent : 0, , , (xmin, xmax, ymin, ymax) projection : +proj=cea +datum=WGS84 +lat_ts=30 +ellps=WGS84 +towgs84=0,0,0 values : none
Projecting a grid > bea = projectExtent(mat,"+proj=cea +datum=WGS84 +lat_ts=30") > res(bea) = xres(bea) > matBEA = projectRaster(mat,bea) > mat class : RasterLayer dimensions : 289, 288, (nrow, ncol, ncell) resolution : , (x, y) extent : 0, 12, , (xmin, xmax, ymin, ymax) projection : +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs +towgs84=0,0,0 values : in memory min value : max value : > matBEA class : RasterLayer dimensions : 169, 288, (nrow, ncol, ncell) resolution : , (x, y) extent : 0, , , (xmin, xmax, ymin, ymax) projection : +proj=cea +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 +lat_ts=30 values : in memory min value : max value :
How does it look?
What happened? x = xFromCell(bea,1:ncell(bea)) y = yFromCell(bea,1:ncell(bea)) plot(x,y,pch = ".") xyLL = project(cbind(x,y), "+proj=cea +datum=WGS84 +latts=30”,inverse = T) plot(xyLL,pch = ".")
What happened Grid of points in lat-long (where each point corresponds with a BEA grid cell) Sample original raster at those points (with interpolation) Identical spacing in x direction Different spacing in y direction
What are the units? > matBEA class : RasterLayer dimensions : 169, 288, (nrow, ncol, ncell) resolution : , (x, y) extent : 0, , , (xmin, xmax, ymin, ymax) projection : +proj=cea +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 +lat_ts=30 values : in memory min value : max value : Meters, along the latitude of true scale (30N and 30S)
A break to try things out Projections
Spatially structured data Tobler’s first law of geography “Everything is related to everything else, but near things are more related than distant things.” Waldo Tobler Nearby data points often have similar conditions Understanding these patterns can provide insights into the data, and are critical for statistical tests
Visualizing spatial structure The correlogram Often based on Moran’s I, a measure of spatial correlation Positive autocorrelation Negative autocorrelation
Making Correlograms First goal – get x, y and z vectors x = xFromCell(mat,1:ncell(mat)) y = yFromCell(mat,1:ncell(mat)) z = getValues(mat) assignColors = function(z) { z = (z-min(z,na.rm=T))/(max(z,na.rm=T)-min(z,na.rm=T)) color = rep(NA,length(z)) index = which(!is.na(z)) color[index] = rgb(z[index],0,(1-z[index])) return(color) } plot(x,y,col = assignColors(z),pch = 15, cex = 0.2)
Pairwise distance matrix Making a correlogram starts with a pairwise distance matrix! (N data points requires ~ N 2 calculations) Big data sets need to be subsetted down > co = correlog(x,y,z,increment = 10,resamp = 0, latlon = T,na.rm=T) Error in outer(zscal, zscal) : allocMatrix: too many elements specified
Pairwise distance matrix Making a correlogram starts with a pairwise distance matrix! (N data points requires ~ N 2 calculations) Big data sets need to be subsetted down sample() can help us do this > co = correlog(x,y,z,increment = 100,resamp = 0, latlon = T,na.rm=T) Error in outer(zscal, zscal) : allocMatrix: too many elements specified > length(x) [1] > length(x)^2 [1]
Quick comments on sample() sample() takes a vector and draws elements out of it > v = c("a","c","f","g","r") > sample(v,3) [1] "r" "f" "c" > sample(v,3,replace = T) [1] "c" "a" "a" > sample(v,6,replace = F) Error in sample(v, 6, replace = F) : cannot take a sample larger than the population when 'replace = FALSE‘ > sample(1:20,20) [1]
Sampling What’s wrong with this? co = correlog( x[sample(1:length(x),1000,replace = F)], y[sample(1:length(y),1000,replace = F)], z[sample(1:length(z),1000,replace = F)], increment = 10, resamp = 0, latlon = T,na.rm=T)
Sampling, the right way > index = sample(1:length(x),1000,replace = F) > co = correlog(x[index],y[index],z[index],increment = 100,resamp = 0, latlon = T,na.rm=T) > plot(co)
Autocorrelation significance Assessed through random permutation tests Reassign z values randomly to x and y coordinates Calculate correlogram Repeat many times Does the observed correlogram differ from random? > index = sample(1:length(x),1000,replace = F) > co = correlog(x[index],y[index],z[index],increment = 100,resamp = 100, latlon = T,na.rm=T) > plot(co) Downside – this is slow!
Significant deviation from random
Why is spatial dependence important? Classic statistical tests assume that data points are independent Can bias parameter estimates Leads to incorrect P value estimates A couple of methods to deal with this (next week!) Simultaneous autoregressive models Spatial filters
Just for fun – 3d surfaces R can render cool 3d surfaces Demonstrate live rgl.surface() (package rgl)