Spatial modelling: an introduction

Spatial modelling: an introduction
Duncan Lee, Adrian Bowman and Marian Scott
Environmental statistics course, August 2008

Outline
1. Spatial point processes
2. Areal unit data
3. Geostatistics
4. Spatio-temporal modelling

1. Spatial point processes
‘A Spatial point process is a set of locations, irregularly distributed within a designated region and presumed to have been generated by some form of stochastic mechanism’ - Diggle (2003).
A realisation from a spatial point process is termed a spatial point pattern: a countable collection of events at locations $\{u_i\}$. Here the locations of the events $\{u_i\}$ are random and are themselves the data; no other variable is collected.

Example 1

Example 2

Notation
A spatial point process is defined for a region $A$. Sub-regions within $A$ are denoted $A_1, A_2, \ldots$, and single locations within $A$ are denoted $u_1, u_2, \ldots$. We denote by $N(A)$ the random variable representing the number of events in the region $A$. Similar definitions apply to $N(A_k)$ and $N(u_k)$.

Question of interest
Does the point pattern have any spatial dependence? Three general types of structure are possible:
Complete spatial randomness (CSR): events occur at random.
Clustered process: events occur close to existing events.
Regular process: events occur away from existing events.

Complete spatial randomness
CSR asserts that:
(i) for any sub-region $A_k$, $N(A_k) \sim \mathrm{Poisson}(\lambda|A_k|)$;
(ii) for disjoint sub-regions $A_1$ and $A_2$, $N(A_1)$ and $N(A_2)$ are independent.
Here $\lambda$ is termed the intensity: it is the expected number of events per unit area, so that $\lambda|A|$ is the expected number of events in $A$. A process satisfying (i) and (ii) is called a homogeneous Poisson process (with intensity $\lambda$).
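Properties (i) and (ii) translate directly into a simulation recipe. The sketch below is a minimal Python illustration (standard library only; the slides themselves point to R packages for real work). It draws $N(A)$ from a Poisson distribution with mean $\lambda|A|$ and then uses the standard fact that, given $N(A) = n$, the events of a homogeneous Poisson process are independent and uniform over $A$. The function names, the rectangular region and the intensity value are illustrative assumptions.

```python
import math
import random


def poisson_draw(mean, rng):
    """Draw from Poisson(mean) by Knuth's multiplication method (fine for moderate means)."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_csr(intensity, width, height, rng):
    """Homogeneous Poisson process with the given intensity on [0, width] x [0, height]."""
    n = poisson_draw(intensity * width * height, rng)   # (i): N(A) ~ Poisson(lambda |A|)
    # Given N(A) = n, the n events are independent and uniformly distributed over A.
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


rng = random.Random(2008)
pattern = simulate_csr(intensity=100, width=2.0, height=1.0, rng=rng)
lambda_hat = len(pattern) / (2.0 * 1.0)   # estimated intensity N(A)/|A|
print(len(pattern), "events, estimated intensity", lambda_hat)
```

Knuth's method is simple but slow for large means. For big regions or intensities, a library generator (for example in spatstat) would be the practical choice.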

Mean and covariance
For a CSR process, $N(A) \sim \mathrm{Poisson}(\lambda|A|)$.
Mean: constant across $A$. Therefore, treating a single location $u_1$ as a region of unit area, the mean of $N(u_1)$ equals $\lambda$.
Covariance: the spatial dependence of the process between two points $u_1$ and $u_2$ is determined by the second-order intensity function $\lambda_2(u_1, u_2)$. However, the latter is hard to work with.

K function
Instead of working with the second-order intensity function $\lambda_2(u_1, u_2)$ to measure spatial dependence, we work with the K function
$K(t) = E\{N_0(t)\}/\lambda$,
where $N_0(t)$ is the number of further events within a distance $t$ of an arbitrary event.

Why is the K function useful?
Recall that $K(t) = E(\text{number of events within } t \text{ of an arbitrary event})/\lambda$.
For a CSR process, $K(t) = \pi t^2$: the expected number of further events in a disc of radius $t$ is $\lambda \pi t^2$, and dividing by $\lambda$ gives $\pi t^2$.
For a clustered process we would expect more points close together than under CSR, so for small $t$, $K(t) > \pi t^2$.
For a regular process we would expect fewer points close together than under CSR, so for small $t$, $K(t) < \pi t^2$.

Determining if CSR holds
Step 1: estimate the intensity $\lambda$ by $\hat\lambda = N(A)/|A|$.
Step 2: estimate $K(t)$ for a given distance $t$ by counting, for each event, the number of other events within distance $t$ of it, averaging these counts over all events in the pattern, and dividing by $\hat\lambda$.
Step 3: plot the theoretical function for CSR, $K(t) = \pi t^2$, against $t$, and add a second line for the estimated K function of the point pattern. If CSR is reasonable, the two lines will be very similar.
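The three steps can be sketched in a few lines of Python. This is a naive estimator under stated assumptions: the region is the unit square, and no edge correction is applied. Events near the boundary therefore undercount their neighbours, and the estimate is biased downwards as $t$ grows. Edge-corrected estimators, such as those offered by spatstat, are preferable in practice. The function name `k_hat` is hypothetical.

```python
import math
import random


def k_hat(points, area, t):
    """Naive estimate of K(t): average number of other events within t, divided by lambda-hat.

    No edge correction is applied, so events near the boundary undercount their
    neighbours and the estimate is biased downwards as t grows.
    """
    n = len(points)
    lam_hat = n / area                          # Step 1: lambda-hat = N(A) / |A|
    close_pairs = 0
    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            if math.hypot(xi - xj, yi - yj) <= t:
                close_pairs += 2                # counts both (i, j) and (j, i)
    mean_count = close_pairs / n                # Step 2: average count per event
    return mean_count / lam_hat                 # K(t) = E{N0(t)} / lambda


# A CSR pattern on the unit square (given n, events are independent and uniform).
rng = random.Random(1)
csr = [(rng.random(), rng.random()) for _ in range(1000)]
for t in (0.02, 0.05, 0.1):
    # Step 3: compare the estimate with the CSR curve pi t^2.
    print(f"t={t}: K-hat={k_hat(csr, 1.0, t):.5f}  pi*t^2={math.pi * t * t:.5f}")
```

For a CSR pattern the printed estimates should sit close to, and slightly below, $\pi t^2$. The shortfall comes from the missing edge correction.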

Some examples (figure: estimated K(t) plotted against t)

Further models for spatial point processes
If CSR does not hold for the data in question, other models can be used, for example:
Poisson cluster process: models clustering.
Inhomogeneous Poisson process: spatially varying intensity.
Cox process: the intensity is itself a random (stochastic) process.
Inhibition process: models regular patterns.
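To see how a clustered process departs from CSR in the K function, here is a minimal sketch of one Poisson cluster process, a Thomas-type process. Parents form a CSR pattern, and each parent has a Poisson number of offspring displaced by Gaussian shifts. All names and parameter values (`kappa`, `mean_offspring`, `sigma`) are illustrative assumptions. As a simplification, parents are placed only inside the unit square and offspring falling outside it are dropped. A careful simulation would place parents in an enlarged window.

```python
import math
import random


def poisson_draw(mean, rng):
    """Poisson(mean) variate by Knuth's multiplication method."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_thomas(kappa, mean_offspring, sigma, rng):
    """Poisson cluster (Thomas-type) process on the unit square.

    Parents form a CSR pattern with intensity kappa; each parent has a Poisson number
    of offspring displaced by independent N(0, sigma^2) shifts. Only offspring are kept,
    and those falling outside the square are dropped (a simplification).
    """
    n_parents = poisson_draw(kappa, rng)
    parents = [(rng.random(), rng.random()) for _ in range(n_parents)]
    points = []
    for px, py in parents:
        for _ in range(poisson_draw(mean_offspring, rng)):
            x, y = rng.gauss(px, sigma), rng.gauss(py, sigma)
            if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
                points.append((x, y))
    return points


def k_hat_unit_square(points, t):
    """Naive K(t) estimate on the unit square (|A| = 1, so lambda-hat = n), no edge correction."""
    n = len(points)
    pairs = sum(1 for i in range(n) for j in range(i + 1, n)
                if math.dist(points[i], points[j]) <= t)
    return (2 * pairs / n) / n


rng = random.Random(7)
clustered = simulate_thomas(kappa=20, mean_offspring=25, sigma=0.02, rng=rng)
t = 0.05
print(len(clustered), "events; K-hat:", k_hat_unit_square(clustered, t), "CSR:", math.pi * t * t)
```

At small $t$ the estimate should lie well above $\pi t^2$, which is the signature of clustering described earlier.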

Another example

Implementing point process models
Point process models (including CSR and others) can be implemented in R using the add-on libraries spatstat and splancs. For further details see http://lib.stat.cmu.edu/R/CRAN/.

2. Areal unit data
The region of interest $A$ is split into $n$ non-overlapping sub-regions $A_1, \ldots, A_n$. The random variable of interest is only available as an aggregated average or total for each sub-region, and is represented by $Z_1, \ldots, Z_n$. The sub-regions are fixed, and it is the variable measured for each region that is random. In contrast, for point processes no variable $Z$ was measured, as it was the location of each event that was random.

Motivating example
Lip cancer rates for the 56 counties in Scotland. Two possible questions of interest:
Does any environmental variable affect the number of new cases?
Is there an outbreak of lip cancer cases in any part of Scotland?
Map taken from a paper by Wakefield in Biostatistics 2007, 158-183.

Modelling areal unit data
When modelling areal unit data $z_1, \ldots, z_n$ from sub-regions $A_1, \ldots, A_n$, consider the following:
Response distribution: normal, Poisson, binomial, etc.
Regression variables: e.g. sunlight in the lip cancer example.
Spatial dependence: are areas close together related?
Method of analysis: frequentist or Bayesian methods.

Spatial dependence
Spatial dependence quantifies how the values $z_1, \ldots, z_n$ are related to each other. There are three general types of dependence:
Independence: the values $z_1, \ldots, z_n$ are not related.
Negative dependence: if areas $i$ and $j$ are close together, then $z_i$ and $z_j$ tend to have dissimilar values.
Positive dependence: if areas $i$ and $j$ are close together, then $z_i$ and $z_j$ tend to have similar values.

Modelling positive dependence
A common method for modelling positive dependence is based on a neighbourhood (or weight) matrix $W$: an $n \times n$ matrix of 1s and 0s whose $(i, j)$th element $w_{ij}$ is 1 if areas $i$ and $j$ are neighbours and 0 otherwise (an area is not its own neighbour, so $w_{ii} = 0$). Neighbours can be defined in many ways, including:
Areas sharing a common border.
Areas less than a distance $d$ apart.
Area $i$ is one of the closest areas (in terms of distance) to area $j$.
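The two distance-based neighbourhood rules can be sketched from area centroids. The "common border" rule needs the polygon boundaries, so it is not shown. This Python sketch assumes hypothetical centroid coordinates and a hypothetical number of neighbours `k`. Note that the nearest-neighbour rule can give an asymmetric $W$ (in practice it is often symmetrised), and that an isolated area can end up with no neighbours at all under the distance rule.

```python
import math


def distance_weights(centroids, d):
    """Binary W: w_ij = 1 if centroids i and j are less than d apart (and i != j)."""
    n = len(centroids)
    return [[1 if i != j and math.dist(centroids[i], centroids[j]) < d else 0
             for j in range(n)] for i in range(n)]


def knn_weights(centroids, k):
    """Binary W with w_ij = 1 if area i is one of the k closest areas to area j."""
    n = len(centroids)
    W = [[0] * n for _ in range(n)]
    for j in range(n):
        others = sorted((i for i in range(n) if i != j),
                        key=lambda i: math.dist(centroids[i], centroids[j]))
        for i in others[:k]:
            W[i][j] = 1
    return W


centroids = [(0, 0), (1, 0), (2, 0), (0, 1), (5, 5)]   # hypothetical area centroids
W_dist = distance_weights(centroids, d=1.5)
W_knn = knn_weights(centroids, k=1)
for row in W_dist:
    print(row)
```

Here the far-away fifth area has no neighbours under the distance rule but does get one under the nearest-neighbour rule.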

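Conditional autoregressive (CAR) models
For simplicity, assume that $z_1, \ldots, z_n$ are normally distributed and there are no covariates. Then the CAR model is given by
$Z_i \mid Z_{-i} \sim N\left(\frac{1}{n_i}\sum_{j=1}^{n} w_{ij} z_j,\ \frac{\sigma^2}{n_i}\right)$, where $n_i = \sum_{j} w_{ij}$.
So the expected value of $z_i$ is the mean of its neighbours' values, as $n_i$ is the number of neighbours of area $i$.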
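The conditional moments are easy to compute once $W$ is known. This sketch assumes the commonly used intrinsic CAR form: $Z_i \mid Z_{-i}$ is normal with mean equal to the average of the neighbouring values and variance $\sigma^2/n_i$, where $n_i$ is the number of neighbours. The variance form is an assumption of this example. The chain of four areas and the data values are hypothetical.

```python
def car_conditional(W, z, sigma2):
    """For each area i: (mean of neighbours' z values, sigma2 / n_i) under an intrinsic CAR."""
    moments = []
    for i, row in enumerate(W):
        n_i = sum(row)                                    # number of neighbours of area i
        if n_i == 0:
            raise ValueError(f"area {i} has no neighbours")
        mean = sum(w * zj for w, zj in zip(row, z)) / n_i  # average of neighbouring values
        moments.append((mean, sigma2 / n_i))               # more neighbours, smaller variance
    return moments


# Hypothetical chain of four areas: 1-2, 2-3, 3-4 are neighbours.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
z = [1.0, 2.0, 4.0, 3.0]
for i, (m, v) in enumerate(car_conditional(W, z, sigma2=1.0)):
    print(f"area {i + 1}: conditional mean {m}, conditional variance {v}")
```

Areas with more neighbours borrow more strength and so have a smaller conditional variance, which is how the model encodes positive spatial dependence.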

3. Geostatistical data
For a fixed region $A$, the variable of interest could be measured at any location. However, due to time and cost constraints, it has only been measured at $n$ locations $u_1, \ldots, u_n$, which are typically chosen rather than random. The random variables measured at the $n$ locations are denoted by $Z(u_1), \ldots, Z(u_n)$. This is therefore different from:
point processes, where the locations themselves are random;
areal data, where the variable can only be measured as $n$ aggregated averages (or totals) for sub-regions $A_1, \ldots, A_n$.

Goals of geostatistics
Given observations $Z(u_1), \ldots, Z(u_n)$, a geostatistical analysis has three general goals:
How best to model the data?
How to estimate $Z(u_0)$, where $u_0$ is an unobserved location?
How to draw a map of $Z(u)$ for all points $u$ in the region?

Modelling geostatistical data
When modelling geostatistical data, consider the following:
Response distribution: normal, Poisson, binomial, etc.
Spatial trend: e.g. regression variables or other trends.
Spatial dependence: how are values at locations close to each other related?
Method of analysis: frequentist or Bayesian methods.

General geostatistical model
A general model for data $Z = (Z(u_1), \ldots, Z(u_n))$ is
$Z = \mu + S$.
The data $Z$ are assumed to be normally distributed, $\mu$ is the mean function and models spatial trend, and $S$ is a stochastic process that models spatial dependence.

Modelling spatial trend
A spatial trend is a systematic change in the mean function $\mu$ over the area of interest. It is generally smooth, although it may change abruptly in response to environmental forcing variables (e.g. bedrock geology). It can be modelled in numerous ways:
Regression variables, such as geology.
Polynomials in the coordinates of $u_1, \ldots, u_n$.
Within the spatial dependence component $S$ (non-stationary).
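As an illustration of the polynomial option, this sketch fits the simplest trend surface, a plane $\mu(x, y) = b_0 + b_1 x + b_2 y$ in the coordinates, by ordinary least squares. It then forms the residuals to which a spatial dependence model would be fitted. The grid of sites, the "true" coefficients and the noise level are hypothetical. The normal equations are solved by hand only to keep the example dependency-free.

```python
import random


def fit_plane(coords, z):
    """Least-squares trend surface mu(x, y) = b0 + b1*x + b2*y (first-order polynomial)."""
    X = [(1.0, x, y) for x, y in coords]
    # Augmented normal equations [X^T X | X^T z].
    M = [[sum(r[a] * r[b] for r in X) for b in range(3)]
         + [sum(r[a] * zi for r, zi in zip(X, z))] for a in range(3)]
    # Gaussian elimination with partial pivoting, then back substitution.
    for col in range(3):
        p = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    b = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        b[r] = (M[r][3] - sum(M[r][c] * b[c] for c in range(r + 1, 3))) / M[r][r]
    return b


rng = random.Random(3)
coords = [(x, y) for x in range(5) for y in range(5)]                    # hypothetical sites
z = [2.0 + 0.5 * x - 1.5 * y + rng.gauss(0, 0.1) for x, y in coords]      # hypothetical data
b0, b1, b2 = fit_plane(coords, z)
residuals = [zi - (b0 + b1 * x + b2 * y) for (x, y), zi in zip(coords, z)]
print(f"trend: {b0:.3f} + {b1:.3f} x + {b2:.3f} y")
```

Higher-order polynomials or covariates such as geology would simply add columns to `X`.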

Spatial dependence
For the remainder of this course we assume that any spatial trend has been removed by the mean function $\mu$. We also assume positive rather than negative spatial dependence; that is, the closer two points are, the more similar their values of the variable will be.

Modelling spatial dependence
A common model for spatial dependence is $S \sim N(0, C)$, which implies the data are normally distributed. Here $C$ is the variance-covariance matrix, which can be written as a rescaled correlation matrix: if all observations have the same variance, then $C = \sigma^2 V$, where $V$ is the correlation matrix and $\sigma^2$ is the common variance of each observation.
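One way to make $Z = \mu + S$ with $S \sim N(0, \sigma^2 V)$ concrete is to simulate from it. This sketch assumes a constant mean and an exponential correlation $V_{ij} = \exp(-\|u_i - u_j\|/\phi)$. The exponential form is one of the models listed later in these notes, but its use here and all parameter values are assumptions. The draw uses the Cholesky factor $L$ of $C$: if $e \sim N(0, I)$ then $Le \sim N(0, C)$.

```python
import math
import random


def cholesky(C):
    """Lower-triangular L with L L^T = C (C symmetric positive definite)."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def covariance_matrix(locs, sigma2, phi):
    """C = sigma^2 V with exponential correlation V_ij = exp(-||u_i - u_j|| / phi)."""
    return [[sigma2 * math.exp(-math.dist(a, b) / phi) for b in locs] for a in locs]


def simulate_field(locs, mu, sigma2, phi, rng):
    """One draw of Z = mu + S, S ~ N(0, C), via S = L e with e ~ N(0, I)."""
    L = cholesky(covariance_matrix(locs, sigma2, phi))
    e = [rng.gauss(0.0, 1.0) for _ in locs]
    return [mu + sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(len(locs))]


locs = [(x / 4, y / 4) for x in range(5) for y in range(5)]   # hypothetical 5 x 5 grid
z = simulate_field(locs, mu=10.0, sigma2=2.0, phi=0.5, rng=random.Random(42))
print([round(v, 2) for v in z[:5]])
```

The diagonal of $C$ is $\sigma^2$ and off-diagonal entries shrink with distance, matching the description of $V$ that follows.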

Correlation matrix $V$
The correlation matrix typically has the following characteristics:
The diagonal elements equal 1, as they represent the correlation of an observation with itself.
The $(i, j)$th element of $V$ is close to 1 if locations $u_i$ and $u_j$ are close.
As locations $u_i$ and $u_j$ get further apart, the $(i, j)$th element gets closer to 0.
Negative dependence (i.e. negative values in $V$) is rarely seen in geostatistical data.

Simplifying $V$ or $C$
The covariance/correlation (spatial dependence) structure in the data can have two simplifying properties:
Stationarity: the covariance (or correlation) between $Z(u_i)$ and $Z(u_j)$ depends only on the difference $u_i - u_j$, so the absolute locations of the two points do not matter, only their distance and direction from each other.
Isotropy: the covariance (or correlation) between $Z(u_i)$ and $Z(u_j)$ depends only on the magnitude of the difference, $\|u_i - u_j\|$, so only the distance between the two points matters, not their direction.

Assuming the spatial dependence is stationary and isotropic, the covariance between two points $Z(u)$ and $Z(u + t)$ simplifies to a function of the scalar distance $t$ between them:
$C(t) = \mathrm{Cov}\{Z(u), Z(u + t)\}$.
Similarly, the correlation function is given by
$\rho(t) = C(t)/C(0) = C(t)/\sigma^2$,
where $\sigma^2$ is the variance, also denoted by $C(0)$.

Semi-variogram modelling
However, in the geostatistical literature spatial dependence is modelled in terms of the semi-variogram rather than the covariance function:
$\gamma(t) = \tfrac{1}{2}\mathrm{Var}\{Z(u + t) - Z(u)\} = C(0) - C(t) = \sigma^2 - C(t)$.
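The identity linking the semi-variogram to the covariance function follows in one line from expanding the variance of a difference. The derivation below assumes, as above, a stationary process, so that $\mathrm{Var}\{Z(u)\} = \mathrm{Var}\{Z(u+t)\} = C(0)$:

```latex
\begin{aligned}
\gamma(t) &= \tfrac{1}{2}\,\mathrm{Var}\{Z(u+t) - Z(u)\} \\
          &= \tfrac{1}{2}\left[\mathrm{Var}\{Z(u+t)\} + \mathrm{Var}\{Z(u)\}
             - 2\,\mathrm{Cov}\{Z(u+t), Z(u)\}\right] \\
          &= \tfrac{1}{2}\left[C(0) + C(0) - 2\,C(t)\right] \\
          &= C(0) - C(t) = \sigma^2 - C(t).
\end{aligned}
```

So a semi-variogram that rises with distance corresponds to a covariance that falls with distance.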

Estimating the semi-variogram
The semi-variogram for data $Z(u_1), \ldots, Z(u_n)$ can be estimated for any value of $t$ by
$\hat\gamma(t) = \frac{1}{2|N(t)|}\sum_{(u_i, u_j) \in N(t)} \{Z(u_i) - Z(u_j)\}^2$.
Here $N(t)$ is the set of pairs of points $(u_i, u_j)$ that are distance $t$ apart, and $|N(t)|$ is the number of such pairs. This function is called the empirical semi-variogram, and it can be plotted against $t$ to see its general shape.

Alternatively, you could plot the semi-variogram cloud, which is a plot of $\tfrac{1}{2}\{Z(u_i) - Z(u_j)\}^2$ against $\|u_i - u_j\|$ for all pairs of points. This form gives more than one value for each distance $t$, so it is a scatterplot.
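Both the cloud and the empirical semi-variogram can be computed from the same list of pairs. With irregularly spaced data, few pairs are exactly distance $t$ apart. This sketch therefore assumes that $N(t)$ is approximated by all pairs whose distance lies within a tolerance `tol` of $t$. The binning rule, the function names and the toy data are assumptions, not part of the formula.

```python
import math


def semivariogram_cloud(coords, z):
    """All unordered pairs: (distance ||u_i - u_j||, 0.5 * (z_i - z_j)^2)."""
    n = len(z)
    return [(math.dist(coords[i], coords[j]), 0.5 * (z[i] - z[j]) ** 2)
            for i in range(n) for j in range(i + 1, n)]


def empirical_semivariogram(coords, z, lags, tol):
    """gamma-hat(t): mean of 0.5 * (z_i - z_j)^2 over pairs with |d_ij - t| <= tol."""
    cloud = semivariogram_cloud(coords, z)
    result = []
    for t in lags:
        vals = [g for d, g in cloud if abs(d - t) <= tol]
        # (lag, number of pairs |N(t)|, estimate); NaN if the bin is empty
        result.append((t, len(vals), sum(vals) / len(vals) if vals else float("nan")))
    return result


# Toy transect: values alternate, so neighbours differ and lag-2 pairs agree.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
z = [0.0, 1.0, 0.0, 1.0]
for t, npairs, g in empirical_semivariogram(coords, z, lags=[1, 2, 3], tol=0.5):
    print(f"t={t}: {npairs} pairs, gamma-hat={g}")
```

Reporting the number of pairs per lag is useful, since estimates built from very few pairs are unreliable.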

What should a semi-variogram look like?

The nugget is the limiting value of the semi-variogram as the distance $t$ approaches zero. It quantifies the amount of spatial variability at very small spatial scales (those less than the separation between observations) and also measurement error.
The sill is the horizontal asymptote of the semi-variogram, if it exists, and represents the overall variance of the random process.
The range is the distance $t^*$ at which the semi-variogram reaches the sill. Pairs of points that are further apart than the range are uncorrelated.

But what about in practice?
Sometimes the semi-variogram only approaches the sill asymptotically, and in this case we define the practical range as the lag $t^*$ at which $\gamma(t^*) = 0.95 \times \text{sill} = 0.95\,\sigma^2$.
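For the exponential model, which approaches its sill only asymptotically, the practical range has a closed form. This sketch assumes the parametrisation $\gamma(t) = c_0 + c_1(1 - e^{-t/\phi})$ for $t > 0$, with nugget $c_0$, partial sill $c_1$ and total sill $c_0 + c_1$. Other software may scale $\phi$ differently. Solving $\gamma(t^*) = 0.95(c_0 + c_1)$ gives $t^* = \phi \ln\{c_1 / (0.05(c_0 + c_1))\}$. With no nugget this reduces to $\phi \ln 20 \approx 3\phi$.

```python
import math


def exponential_semivariogram(t, nugget, partial_sill, phi):
    """gamma(t) = nugget + partial_sill * (1 - exp(-t / phi)) for t > 0, gamma(0) = 0."""
    if t == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-t / phi))


def practical_range(nugget, partial_sill, phi):
    """Lag t* with gamma(t*) = 0.95 * sill, where sill = nugget + partial_sill.

    Requires nugget < 19 * partial_sill, otherwise 95% of the sill is already
    reached by the nugget alone.
    """
    # nugget + c1 (1 - e^{-t/phi}) = 0.95 (nugget + c1)  =>  e^{-t/phi} = 0.05 sill / c1
    return phi * math.log(partial_sill / (0.05 * (nugget + partial_sill)))


print(practical_range(0.0, 1.0, 1.0))   # ln(20), about 3 range parameters
print(practical_range(0.2, 0.8, 3.0))
```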

Modelling spatial dependence
Spatial dependence in the data can now be modelled in two stages:
1. Plot the empirical semi-variogram and determine which family of semi-variogram models it resembles.
2. Estimate the parameters (sill, nugget, range) of the chosen semi-variogram model by least squares.

Semi-variogram models
A number of semi-variogram models can be used, including:
Nugget: purely random data, with no spatial correlation.
Spherical.
Exponential.
However, these models may not fit a given data set particularly well.
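The three models and the least-squares stage can be sketched as follows. The parametrisations are common textbook forms: $c_0$ is the nugget, $c_1$ the partial sill, $a$ the spherical range and $\phi$ the exponential range parameter. They are assumptions, since other sources scale these differently. For a fixed $\phi$ the exponential model is linear in $(c_0, c_1)$, so this sketch solves a 2 × 2 least-squares problem for each $\phi$ on a grid and keeps the best. It uses ordinary, unconstrained least squares, so negative estimates are possible. Software such as gstat's `fit.variogram` uses weighted least squares instead.

```python
import math


def nugget_model(t, c0):
    """Pure nugget: no spatial correlation at any positive lag."""
    return 0.0 if t == 0 else c0


def spherical_model(t, c0, c1, a):
    """Spherical: reaches the sill c0 + c1 exactly at the range a."""
    if t == 0:
        return 0.0
    if t >= a:
        return c0 + c1
    r = t / a
    return c0 + c1 * (1.5 * r - 0.5 * r ** 3)


def exponential_model(t, c0, c1, phi):
    """Exponential: approaches the sill c0 + c1 only asymptotically."""
    return 0.0 if t == 0 else c0 + c1 * (1.0 - math.exp(-t / phi))


def fit_exponential(lags, gammas, phi_grid):
    """Least squares: for each phi, solve the 2x2 normal equations for (c0, c1)."""
    best = None
    n = len(lags)
    for phi in phi_grid:
        x = [1.0 - math.exp(-t / phi) for t in lags]      # model: gamma = c0 + c1 * x
        sx, sxx = sum(x), sum(v * v for v in x)
        sy, sxy = sum(gammas), sum(v * g for v, g in zip(x, gammas))
        det = n * sxx - sx * sx
        if det == 0:
            continue
        c1 = (n * sxy - sx * sy) / det
        c0 = (sy - c1 * sx) / n
        sse = sum((g - c0 - c1 * v) ** 2 for v, g in zip(x, gammas))
        if best is None or sse < best[0]:
            best = (sse, c0, c1, phi)
    return best[1:]                                        # (c0, c1, phi)


# Hypothetical empirical semi-variogram values, generated from a known model.
lags = [0.5 * k for k in range(1, 13)]
gammas = [exponential_model(t, 0.1, 0.9, 2.0) for t in lags]
print(fit_exponential(lags, gammas, [0.25 * k for k in range(1, 41)]))
```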

Spatial prediction
Once a trend and a spatial dependence model have been fitted, it is of interest to estimate $Z$ at some unobserved location $u_0$. There are many methods for doing this, including:
Regression modelling using generalised least squares.
Inverse distance weighted interpolation.
Kriging.

The majority of these approaches predict $z^*(u_0)$, the variable at location $u_0$, by a weighted average of the form
$z^*(u_0) = \sum_{i=1}^{n} w_i\, z(u_i)$.
The main difference between the methods is how the weights $w_i$ are estimated. A map can then be produced by predicting the surface at a regular grid of points.
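Inverse distance weighting is the simplest instance of this weighted-average form. In this sketch the weights are proportional to $1/d_i^p$, where $d_i$ is the distance from $u_0$ to $u_i$, and they are normalised to sum to 1. The power $p = 2$ is a common default but is an assumption here, as are the data values. Predicting on a regular grid, as the slide suggests, yields a map.

```python
import math


def idw_predict(coords, z, u0, power=2.0):
    """Inverse distance weighted prediction z*(u0) = sum_i w_i z(u_i), weights summing to 1."""
    dists = [math.dist(u, u0) for u in coords]
    for d, zi in zip(dists, z):
        if d == 0.0:
            return zi                        # u0 coincides with a data location
    raw = [1.0 / d ** power for d in dists]
    total = sum(raw)
    weights = [r / total for r in raw]       # normalise so the weights sum to 1
    return sum(w * zi for w, zi in zip(weights, z))


coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hypothetical data locations
z = [1.0, 2.0, 3.0, 4.0]                                      # hypothetical observations
grid = [[idw_predict(coords, z, (i / 4, j / 4)) for i in range(5)] for j in range(5)]
for row in grid:
    print(" ".join(f"{v:.2f}" for v in row))
```

Unlike kriging, the IDW weights ignore the fitted spatial dependence model and depend only on distance.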

¹³⁷Cs deposition maps in SW Scotland prepared by different European teams (ECCOMAGS, 2002)

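Kriging (1): ordinary kriging
First, the trend is estimated using least squares methods. Then the observed values are de-trended by subtracting the estimated trend from the data. Finally, a variogram model is fitted to the de-trended data and used to generate the weights for the prediction. (Strictly, ordinary kriging assumes a constant but unknown mean; kriging the residuals from an estimated trend in this way is sometimes called residual kriging.)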
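The final step, turning a fitted dependence model into prediction weights, can be sketched as the ordinary kriging system. The weights $w$ and a Lagrange multiplier $m$ solve $Cw + m\mathbf{1} = c_0$ and $\sum_i w_i = 1$, where $C$ holds covariances among data locations and $c_0$ holds covariances between the data and $u_0$. The kriging variance is $C(0) - w^\top c_0 - m$. This sketch assumes a hypothetical fitted exponential covariance with no nugget, which corresponds to $C(t) = \sigma^2 - \gamma(t)$ for an exponential semi-variogram. The de-trended values are also hypothetical. Evaluating the variance over a grid gives the kind of uncertainty map mentioned in these notes.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ordinary_kriging(coords, z, u0, cov):
    """Ordinary kriging prediction, kriging variance and weights at u0."""
    n = len(z)
    A = [[cov(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0] for i in range(n)]
    A.append([1.0] * n + [0.0])                      # unbiasedness constraint: sum(w) = 1
    c0 = [cov(math.dist(u, u0)) for u in coords]
    sol = solve(A, c0 + [1.0])
    w, m = sol[:n], sol[n]                           # weights and Lagrange multiplier
    pred = sum(wi * zi for wi, zi in zip(w, z))
    var = cov(0.0) - sum(wi * ci for wi, ci in zip(w, c0)) - m
    return pred, var, w


def exp_cov(t, sigma2=1.0, phi=0.5):
    """Hypothetical fitted covariance C(t) = sigma^2 exp(-t / phi), no nugget."""
    return sigma2 * math.exp(-t / phi)


coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2)]
resid = [0.3, -0.1, 0.4, -0.2, 0.1]                  # hypothetical de-trended values
pred, var, w = ordinary_kriging(coords, resid, (0.4, 0.6), exp_cov)
print(f"prediction {pred:.3f}, kriging variance {var:.3f}")
```

The trend estimated in the first step would then be added back to `pred` to give the prediction on the original scale.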

Kriging (2)
There are a number of other kriging methods, such as block kriging, indicator kriging and co-kriging. Some interesting issues concern the uncertainty in the prediction: the kriging procedure can be used to produce uncertainty maps, and recent work has developed approaches that incorporate uncertainty about the variogram model itself.

Kriging in R
There are routines for kriging in the R libraries geoR, fields, gstat, sgeostat, spatstat and spatdat.

Choosing the locations $u_1, \ldots, u_n$
The desired set of locations depends on the goal of the analysis:
Point prediction: locate points on a regular grid, so that every prediction location will be highly correlated with a few observed data points.
Average estimation: if the aim is to estimate the average value of $Z$ over the region $A$, then highly correlated points provide redundant information. You therefore want the distance between pairs of points to be roughly the variogram range.

4. Spatio-temporal statistical modelling
Spatio-temporal statistical modelling is a real challenge because:
the data sets are usually very large;
one ‘dimension’ may be richer than the other, e.g. lots of stations with limited measurement in time, or few stations monitored very frequently in time;
we need to combine the techniques found in time series and spatial analysis.

Modelling spatial and temporal dependence
One major difficulty concerns how to jointly model correlation through time and correlation over space. Is correlation through space constant over time, and correlation through time constant over space?
If yes, then we have a ‘separable’ and stationary process.
If not, then we need to build a space-time correlation structure (hard work).

General approach
The general approach to spatio-temporal models is through stochastic spatio-temporal processes $Z(u, t)$, where $u$ represents space and $t$ represents time, which may be a combination of a spatial process and a time series process.

Simplifying assumptions
Stationarity: a natural extension from time series and spatial models.
Isotropy: a natural extension from spatial models.
Separability: the covariance function of $Z(u, t)$ can be split into space and time parts, i.e.
$\mathrm{Cov}\{Z(u_1, t_1), Z(u_2, t_2)\} = C_U(u_1, u_2)\, C_T(t_1, t_2)$,
which means we can use the tools we have met previously.
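Under separability, the full space-time covariance matrix is the Kronecker product of a purely spatial matrix and a purely temporal one. This is why the spatial and time-series tools can be reused. This sketch assumes hypothetical sites and time points, an exponential spatial covariance and an AR(1)-type temporal correlation $0.6^{|t_1 - t_2|}$. These choices are illustrative, not taken from the notes. Observations are ordered site by site, with all times for one site before moving to the next.

```python
import math


def kron(A, B):
    """Kronecker product A (x) B of two square matrices given as lists of lists."""
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]     # hypothetical monitoring sites
times = [0, 1, 2, 3]                              # hypothetical time points
C_U = [[math.exp(-math.dist(a, b) / 1.5) for b in sites] for a in sites]   # spatial part
C_T = [[0.6 ** abs(s - t) for t in times] for s in times]                  # temporal part
C = kron(C_U, C_T)       # row/column index = site * len(times) + time
print(len(C), "x", len(C[0]), "space-time covariance matrix")
```

Only the small spatial and temporal matrices need to be modelled and stored. With many stations and long records this is a large saving, but it holds only if separability is a reasonable assumption.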

Spatial analysis across time
At each time point, a plane was fitted across space and Gaussian variograms of the residuals were computed. The averages of the variogram parameter estimates were then used to obtain the spatial covariance matrix.
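The averaging step can be sketched as follows. It assumes one set of Gaussian-variogram estimates (nugget $c_0$, partial sill $c_1$, range parameter $\phi$) per time point, under the parametrisation $\gamma(t) = c_0 + c_1\{1 - e^{-(t/\phi)^2}\}$. That parametrisation and all the numbers below are hypothetical. The implied covariance is $C(0) = c_0 + c_1$ on the diagonal and $C(t) = c_1 e^{-(t/\phi)^2}$ for $t > 0$.

```python
import math

# Hypothetical Gaussian-variogram estimates (nugget, partial sill, range parameter),
# one tuple per time point, each fitted to residuals from that time's plane.
estimates = [(0.10, 0.90, 1.2), (0.12, 0.85, 1.0), (0.08, 0.95, 1.4)]

n_times = len(estimates)
nugget = sum(e[0] for e in estimates) / n_times   # averaged parameter estimates
psill = sum(e[1] for e in estimates) / n_times
phi = sum(e[2] for e in estimates) / n_times


def gaussian_cov(t):
    """Covariance implied by gamma(t) = c0 + c1 (1 - exp(-(t/phi)^2)); C(0) = c0 + c1."""
    return nugget + psill if t == 0 else psill * math.exp(-(t / phi) ** 2)


sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.5, 1.5)]   # hypothetical sites
Sigma = [[gaussian_cov(math.dist(a, b)) for b in sites] for a in sites]
for row in Sigma:
    print(" ".join(f"{v:.3f}" for v in row))
```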

Non-separable processes
A much harder problem, and still the subject of much statistical research.