Spatial Data Analysis: Surfaces

Presentation on theme: "Spatial Data Analysis: Surfaces". Presentation transcript:

1 Spatial Data Analysis: Surfaces

2 Model-Driven Approaches
Model of discrete spatial variation
- Each subregion $i$ is described by a statistical distribution $Z_i$ (e.g., homicide counts follow a Poisson distribution).
- The main objective of the analysis is to estimate the joint distribution of the random variables $Z = \{Z_1, \ldots, Z_n\}$.
Model of continuous spatial variation
- The whole area is treated as a continuous surface.
- The main objective is to estimate the distribution of $Z(x)$, $x \in A$.

3 Models of Discrete Spatial Variation. Random variable in area $i$: number of ill people, number of newborn babies, per capita income.

4 Models of Continuous Spatial Variation. Sampling stations measure variables such as temperature, water pH, or soil acidity. [Figure: map marking the sampling station locations and the location where a value is to be predicted]

5 From Areas to Surfaces [Diagram labels: polygon data; sample generation; samples (X, Y, Z); geostatistics; continuous surface / grid]

6 Space as a planar subdivision

7 From Areas to Surfaces: space as a planar subdivision versus space as a continuous surface

8 From Areas to Surfaces

9 Geostatistics. Applicable to spatial distributions (fields). Typical situation: interpolation from field samples. [Figure panels: Water Availability Index; estimated surface; estimated uncertainty]

10 What is Geostatistics?
Analysis and inference of continuously distributed variables
- Examples: pollution, zinc concentration, infant mortality rate
Analysis
- Describing the spatial variability of the phenomenon under study
Inference
- Estimating the unknown values
[Diagram labels: study area; field samples; inferences]

11 Why Geostatistics? Techniques appropriate to statistical estimation of spatial phenomena. [Diagram labels: study area; field samples; deterministic procedures; geostatistics]

12 Thinking spatially
$Z_1 \sim N(\mu_1, \sigma_1)$, $Z_2 \sim N(\mu_2, \sigma_2)$
- How are they distributed?
- How are they related to each other?
- How can I infer a distribution from one sample?
$\mu_1 = \mu_2$, $\sigma_1 = \sigma_2$, $\mathrm{corr}(Z_1, Z_2) = f(h)$

13 Steps of the Geostatistical Process: data → exploratory analysis → structural analysis → inference and interpolation → results

14 Concept of a Regionalized Variable
Regionalized variable = structure + randomness
Structure
- Global distribution of natural phenomena
- The average value of a phenomenon in a given area is constant
Random
- Local variation within a given area
- Values fluctuate around a mean
[Figure: pollution scale from − to +; Zone A; Zone B; polluted area]

15 Regionalized Variable
$Z(x) = m(x) + \varepsilon'(x) + \varepsilon''$
- $m(x)$: structural component (constant mean value)
- $\varepsilon'(x)$: random component, spatially variant around $m(x)$
- $\varepsilon''$: uncorrelated random noise
[Figure: Zone A and Zone B, showing $m(x)$, $\varepsilon'(x)$, and $\varepsilon''$]

16 Geostatistics
Each position on the field is a random variable
- $E$: extent of the field
- $\forall u \in E$, $Z(u)$ is a random variable
Each measurement is a realization of a random variable
- Let $z(u_1), \ldots, z(u_n)$ be the set of measurements
- Then $z(u_\alpha)$ is a realization of $Z(u_\alpha)$, $\alpha = 1, \ldots, n$
Problem
- How can we estimate the joint distribution?

17 Uncertainty: the Statistical Approach
Basic hypothesis: differences in values are similar for similar distances, i.e. $\mathrm{Var}\{Z(u+h) - Z(u)\} = 2\gamma(h)$.
We call this a “stationary” spatial process.
We can find the “structure” of a stationary spatial process using a very simple technique: the variogram.

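18 Experimental Semivariogram
$\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ z(x_i) - z(x_i + h) \right]^2$, where $N(h)$ is the number of pairs of samples separated by $h$.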

19 Building the Experimental Semivariogram. Step 1 (optional): transforming area maps into samples.

20 Building the Experimental Semivariogram. Step 2: measuring spatial variation. For each pair $Z(x)$ and $Z(x+h)$, separated by a distance vector $h$, we measure the square of the difference between them. [Figure: sample pairs joined by the distance vector $h$]
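The pairing step described on this slide can be sketched in a few lines of Python. This is only an illustrative, omnidirectional version: the function name, the lag-binning rule (each pair goes to the nearest multiple of `lag_width`), and the toy samples are assumptions, not part of the presentation.

```python
import math
from collections import defaultdict


def experimental_semivariogram(points, lag_width, n_lags):
    """Omnidirectional experimental semivariogram (illustrative sketch).

    points: list of (x, y, z) samples.
    Returns {lag_distance: (semivariance, number_of_pairs)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            h = math.hypot(xi - xj, yi - yj)
            # Assign the pair to the nearest multiple of lag_width.
            k = int(h / lag_width + 0.5)
            if 1 <= k <= n_lags:
                sums[k] += (zi - zj) ** 2  # squared difference for this pair
                counts[k] += 1
    # Semivariance = sum of squared differences / (2 * number of pairs).
    return {
        k * lag_width: (sums[k] / (2 * counts[k]), counts[k])
        for k in sorted(counts)
    }


# Hypothetical toy transect: four samples, one unit apart.
samples = [(0, 0, 1.0), (1, 0, 2.0), (2, 0, 4.0), (3, 0, 7.0)]
result = experimental_semivariogram(samples, lag_width=1.0, n_lags=3)
for lag, (gamma, n_pairs) in result.items():
    print(f"h = {lag:.1f}  gamma = {gamma:.3f}  pairs = {n_pairs}")
```

Note how the number of pairs shrinks as the lag grows, which is why estimates at long lags are less reliable.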

21 HDI Variograms

22 Spatial Model Fitting for Variograms. After building an experimental variogram, we need to fit a theoretical function in order to model the spatial variation. The fitting procedure is interactive: the user selects the theoretical model that best fits the data. Some useful models: Gaussian, exponential, and spherical.

23 Fitting the Semivariogram [Figure: experimental and theoretical semivariograms, $\gamma(h)$ versus $h$, annotated with range, sill, and nugget effect]

24 Plotting the variogram

25 Analysing the variogram
Later we will look at fitting a model to the variogram; but even without a model we can notice some features, which we define here only qualitatively:
- Sill: maximum semivariance; represents variability in the absence of spatial dependence.
- Range: separation between point-pairs at which the sill is reached; distance at which there is no evidence of spatial dependence.
- Nugget: semivariance as the separation approaches zero; represents variability at a point that can’t be explained by spatial structure.
In the previous slide, we can estimate the sill ≈ 1.9, the range ≈ 1200 m, and the nugget ≈ 0.5, i.e. about 25% of the sill.

26 Using the experimental variogram to model the random process. Notice that the semivariance $\gamma(h)$ at separation vector $h$ now serves as the estimate of covariance in the spatial field, so it models the spatially correlated component of the regionalized variable. We must go from the experimental variogram to a variogram model in order to be able to model the random process at any separation.

27 Modelling the variogram
From the empirical variogram we now derive a variogram model which expresses semivariance as a function of the separation vector. It allows us to:
- Infer the characteristics of the underlying process from the functional form and its parameters;
- Compute the semivariance between any point-pair, separated by any vector;
- Interpolate between sample points using an optimal interpolator (“kriging”).

28 “Authorized” Models
Any variogram function must be able to model the following:
1. Monotonic increase, possibly with a fluctuation (hole)
2. Constant or asymptotic maximum (sill)
3. Non-negative intercept (nugget)
4. Anisotropy
Variograms must obey mathematical constraints so that the resulting kriging equations are solvable (e.g., positive definite between-sample covariance matrices). The permitted functions are called authorized models.

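29 Spherical Model [Figure: $\gamma(h)$ versus $h$, showing nugget $C_0$, contribution $C_1$, range $a$, and sill $C = C_0 + C_1$]

30 Exponential Model [Figure: $\gamma(h)$ versus $h$, showing nugget $C_0$, contribution $C_1$, and range $a$]

31 Gaussian Model [Figure: $\gamma(h)$ versus $h$, showing nugget $C_0$, contribution $C_1$, and range $a$]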
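The transcript keeps only the parameter labels of these three slides, not the curves' equations. The sketch below uses one common textbook parameterization of the spherical, exponential, and Gaussian models; this is an assumption, since some texts scale the range of the exponential and Gaussian models differently (for example with a factor of 3 for the "practical range"). The function and argument names are illustrative.

```python
import math


def spherical(h, nugget, contribution, a):
    """Spherical model: reaches the sill (nugget + contribution) exactly at h = a."""
    if h == 0:
        return 0.0  # semivariance is zero at zero separation by definition
    if h >= a:
        return nugget + contribution
    r = h / a
    return nugget + contribution * (1.5 * r - 0.5 * r ** 3)


def exponential(h, nugget, contribution, a):
    """Exponential model: approaches the sill asymptotically."""
    if h == 0:
        return 0.0
    return nugget + contribution * (1.0 - math.exp(-h / a))


def gaussian(h, nugget, contribution, a):
    """Gaussian model: parabolic near the origin, asymptotic sill."""
    if h == 0:
        return 0.0
    return nugget + contribution * (1.0 - math.exp(-((h / a) ** 2)))


# Parameters from the kriging example later in the deck: C0 = 2, C1 = 20.
# The range a = 200 is NOT stated in the transcript; it is inferred here
# because it reproduces the covariances listed in that example.
c0, c1, a = 2.0, 20.0, 200.0
covariance_at_50 = (c0 + c1) - spherical(50.0, c0, c1, a)
print(round(covariance_at_50, 2))
```

With these values the spherical model gives a covariance of about 12.66 at a separation of 50, which matches the value of $C_{01}$ listed in the kriging example.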

32 What sample size to fit a variogram model?
Non-spatial formulas for sample size cannot be used, because spatial samples are correlated and each sample is used multiple times in the variogram estimate.
- There is no way to estimate the true error, since we have only one realisation.
- Stochastic simulation from an assumed true variogram suggests:
  - < 50 points: not at all reliable
  - 100 to 150 points: more or less acceptable
  - > 250 points: almost certainly reliable
More points are needed to estimate an anisotropic variogram. This is very worrying for many environmental datasets (soil cores, vegetation plots, ...), especially from short-term fieldwork, where sample sizes of 40 to 60 are typical. Should variograms even be attempted on such small samples?

33 Cross Validation
Re-estimate the samples to find errors in the model:
- Error statistics
- Error histogram
- Spatial diagram of the errors
- Observed versus estimated values
[Flowchart: variogram model → cross validation → OK? If yes, proceed; if no, return to the variogram model]
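Re-estimating each sample from the others is a leave-one-out loop. The sketch below shows only that loop and two of the error statistics. The predictor is pluggable, and the nearest-value predictor used here is a deliberately simple stand-in, not kriging; all names and the toy data are assumptions made for illustration.

```python
import math


def leave_one_out(samples, predict):
    """Re-estimate every sample from the remaining ones.

    samples: list of (x, y, z); predict(samples, x0, y0) -> estimated z.
    Returns the list of errors (estimated minus observed).
    """
    errors = []
    for i, (x, y, z) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        errors.append(predict(others, x, y) - z)
    return errors


def error_statistics(errors):
    """Mean error (bias) and root-mean-square error."""
    n = len(errors)
    mean_error = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mean_error, rmse


def nearest_value(samples, x0, y0):
    """Stand-in predictor (NOT kriging): value of the closest sample."""
    return min(samples, key=lambda s: math.hypot(s[0] - x0, s[1] - y0))[2]


# Hypothetical toy data.
samples = [(0, 0, 1.0), (1, 0, 2.0), (3, 0, 4.0)]
errors = leave_one_out(samples, nearest_value)
mean_error, rmse = error_statistics(errors)
print(errors, mean_error, rmse)
```

In a real workflow `predict` would be a kriging predictor built from the fitted variogram model, and the errors would also be mapped and plotted against the observed values.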

34 Cross Validation

35 Approaches to spatial prediction. This is the prediction of the value of some variable at an unsampled point, based on the values at the sampled points. This is often called interpolation, but strictly speaking that term applies only to points that are geographically inside the sample set (otherwise it is extrapolation).

36 Approaches to prediction: Local predictors
The value of the variable is predicted from “nearby” samples.
- Example: concentrations of soil constituents (e.g. salts, pollutants)
- Example: vegetation density

37 Local Predictors
Each interpolator has its own assumptions, i.e. its own theory of spatial variability:
- Nearest neighbour
- Average within a radius
- Average of n nearest neighbours
- Distance-weighted average within a radius
- Distance-weighted average of n nearest neighbours
- “Optimal” weighting: kriging
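Three of the simpler predictors in this list can be sketched as follows. The function names, the inverse-distance power of 2, and the toy samples are assumptions made for illustration; the radius-based variants differ only in how the neighbours are selected.

```python
import math


def _by_distance(samples, x0, y0):
    """Return (distance, z) pairs sorted from nearest to farthest."""
    return sorted((math.hypot(x - x0, y - y0), z) for x, y, z in samples)


def nearest_neighbour(samples, x0, y0):
    """Value of the single closest sample."""
    return _by_distance(samples, x0, y0)[0][1]


def mean_of_n_nearest(samples, x0, y0, n):
    """Unweighted average of the n closest samples."""
    nearest = _by_distance(samples, x0, y0)[:n]
    return sum(z for _, z in nearest) / len(nearest)


def idw_of_n_nearest(samples, x0, y0, n, power=2.0):
    """Inverse-distance-weighted average of the n closest samples."""
    nearest = _by_distance(samples, x0, y0)[:n]
    if nearest[0][0] == 0.0:
        return nearest[0][1]  # prediction point coincides with a sample
    weights = [1.0 / d ** power for d, _ in nearest]
    return sum(w * z for w, (_, z) in zip(weights, nearest)) / sum(weights)


# Hypothetical samples (x, y, z) and prediction point.
samples = [(0, 0, 10.0), (2, 0, 20.0), (0, 4, 30.0)]
x0, y0 = 0.5, 0.0
nn = nearest_neighbour(samples, x0, y0)
avg = mean_of_n_nearest(samples, x0, y0, n=2)
idw = idw_of_n_nearest(samples, x0, y0, n=2)
print(nn, avg, idw)
```

All three are weighted averages with weights that sum to 1; they differ only in how the weights are chosen, which is the choice kriging makes from the variogram model instead.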

38 Ordinary Kriging The theory of regionalised variables leads to an “optimal” interpolation method, in the sense that the prediction variance is minimized. This is based on the theory of random functions, and requires certain assumptions.

39 Optimal local interpolation: motivation
Problems with average-in-circle methods:
1. No objective way to select the radius of the circle or the number of points
Problems with inverse-distance methods:
1. How to choose the power (inverse, inverse squared, ...)?
2. How to choose the limiting radius?
In both cases:
1. Uneven distribution of samples could over- or under-emphasize some parts of the field
2. Prediction error must be estimated from a separate validation dataset

40 An “optimal” local predictor would have these features:
- Prediction is made as a linear combination of known data values (a weighted average).
- Prediction is unbiased and exact at known points.
- Points closer to the point to be predicted have larger weights.
- Clusters of points “reduce to” single equivalent points, i.e., over-sampling in a small area can’t bias the result.
- Closer sample points “mask” further ones in the same direction.
- Error estimate is based only on the sample configuration, not the data values.
- Prediction error should be as small as possible.

41 Kriging
- A “Best Linear Unbiased Predictor” (BLUP) that satisfies certain criteria for optimality.
- It is only “optimal” with respect to the chosen model!
- Based on the theory of random processes, with covariances depending only on separation (i.e. a variogram model).
- Theory developed several times (Kolmogorov 1930s, Wiener 1949), but current practice dates back to Matheron (1963), formalizing the practical work of the mining engineer D. G. Krige (RSA).

42 How do we use Kriging?
1. Sample, preferably at different resolutions.
2. Calculate the experimental variogram.
3. Model the variogram with one or more authorized functions.
4. Apply the kriging system, with the variogram model of spatial dependence, at each point to be predicted.
   - Predictions are often at each point on a regular grid (e.g. a raster map).
   - These “points” are actually blocks the size of the sampling support.
   - Can also predict in blocks larger than the original support.
5. Calculate the error of each prediction; this is based only on the sample point locations, not their data values.

43 Prediction with Ordinary Kriging (OK). In OK, we model the value of variable $z$ at location $s_i$ as the sum of a regional mean $m$ and a spatially correlated random component $e(s_i)$: $Z(s_i) = m + e(s_i)$. The regional mean $m$ is estimated from the sample, but not as the simple average, because there is spatial dependence. It is implicit in the OK system.

44 Prediction with Ordinary Kriging (OK)
- Predict at points, with unknown mean (which must also be estimated) and no trend.
- Each point $x_0$ is predicted as the weighted average of the values at all samples: $Z^*(x_0) = \sum_{i=1}^{n} \lambda_i Z(x_i)$, where the weights $\lambda_i$ are still to be determined.
- The weights assigned to each sample point sum to 1: $\sum_{i=1}^{n} \lambda_i = 1$.
- Therefore, the prediction is unbiased.
- “Ordinary”: no trend or strata; the regional mean must be estimated from the sample.
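The step from "weights sum to 1" to "unbiased" can be written out explicitly. The short derivation below assumes, as in the ordinary kriging model of the previous slide, that every location shares the same unknown mean $m$; the notation $x_0$ for the prediction point and $\lambda_i$ for the weights follows the kriging slides.

```latex
\begin{aligned}
E\left[Z^*(x_0)\right]
  &= E\left[\sum_{i=1}^{n} \lambda_i Z(x_i)\right]
   = \sum_{i=1}^{n} \lambda_i \, E\left[Z(x_i)\right]
   = m \sum_{i=1}^{n} \lambda_i \\
  &= m = E\left[Z(x_0)\right]
  \quad \text{when } \sum_{i=1}^{n} \lambda_i = 1 .
\end{aligned}
```

The constraint removes any dependence on the unknown mean, which is why ordinary kriging never needs to estimate $m$ explicitly.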

45 Simple and Ordinary Kriging
Linear combination of the nearest neighbours $x_1, x_2, x_3, x_4$ of the prediction point $x_0$: $Z^*(x_0) = \sum_{i=1}^{n} \lambda_i Z(x_i)$
- Local means: equal weights
- Inverse distance weights: $\lambda_i \propto 1 / d_i^2$
- Kriging: $\lambda_i = \,?$ (to be determined from the variogram model)

46 Ordinary Kriging
1. Variogram analysis
2. Variogram adjustment
3. Fitted semivariogram model
4. Kriging estimator
[Figure: samples $x_1, x_2, x_3, x_4$ around the prediction point $x_0$]

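47 Ordinary Kriging
Kriging estimator: $Z^*(x_0) = \sum_{i=1}^{n} \lambda_i Z(x_i)$
Substituting the values we find the weights, $\lambda = K^{-1} k$:
$$
\begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \\ \alpha \end{bmatrix}
=
\begin{bmatrix}
C_{11} & C_{12} & \cdots & C_{1n} & 1 \\
C_{21} & C_{22} & \cdots & C_{2n} & 1 \\
\vdots & \vdots &        & \vdots & \vdots \\
C_{n1} & C_{n2} & \cdots & C_{nn} & 1 \\
1 & 1 & \cdots & 1 & 0
\end{bmatrix}^{-1}
\begin{bmatrix} C_{10} \\ C_{20} \\ \vdots \\ C_{n0} \\ 1 \end{bmatrix}
$$
Covariance matrix elements: $C_{ij} = C(0) - \gamma(h) = (C_0 + C_1) - \gamma(h)$, where $h$ is the separation between $x_i$ and $x_j$
Variance: $\sigma^2_{ko} = (C_0 + C_1) - \lambda^T k$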

48 Kriging example
[Figure: samples $x_1, x_2, x_3, x_4$ around the prediction point $x_0$; grid spacing labelled 50]
Estimator: $Z^*(x_0) = \sum_{i=1}^{4} \lambda_i Z(x_i)$
Matrix elements: $C_{ij} = C_0 + C_1 - \gamma(h)$, with $\gamma(h)$ taken from the theoretical model and $C_0 + C_1 = 2 + 20$
$C_{12} = C_{21} = C_{04} = (C_0 + C_1) - \gamma(50\sqrt{2}) = 9.84$

49 Kriging example
$C_{14} = C_{41} = C_{02} = (C_0 + C_1) - \gamma\left(\sqrt{100^2 + 50^2}\right) = 4.98$
$C_{13} = C_{31} = (C_0 + C_1) - \gamma\left(\sqrt{150^2 + 50^2}\right) = 1.23$
$C_{23} = C_{32} = (C_0 + C_1) - \gamma\left(\sqrt{100^2 + 100^2}\right) = 2.33$
$C_{24} = C_{42} = (C_0 + C_1) - \gamma\left(\sqrt{100^2 + 150^2}\right) = 0.29$
$C_{34} = C_{43} = (C_0 + C_1) - \gamma\left(\sqrt{200^2 + 50^2}\right) = 0$
$C_{01} = (C_0 + C_1) - \gamma(50) = 12.66$
$C_{03} = (C_0 + C_1) - \gamma(150) = 1.72$
$C_{11} = C_{22} = C_{33} = C_{44} = (C_0 + C_1) - \gamma(0) = 22$

50 Kriging example
Substituting the values $C_{ij}$, we find the following weights: $\lambda_1 = 0.518$, $\lambda_2 = 0.022$, $\lambda_3 = 0.089$, $\lambda_4 = 0.371$
The estimator is $Z^*(x_0) = 0.518\, z(x_1) + 0.022\, z(x_2) + 0.089\, z(x_3) + 0.371\, z(x_4)$
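The weights in this example can be checked by assembling the ordinary kriging system from the covariances listed on the slides and solving it. The sketch below uses a small hand-written Gaussian elimination so that it has no dependencies. The solver, the variable names, and the variance line (sill minus the product of the solution vector and the right-hand side) are illustrative choices, and the transcript does not report a variance value to compare against.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


# Covariances between the four samples, as listed in the example.
C = [
    [22.00, 9.84, 1.23, 4.98],
    [9.84, 22.00, 2.33, 0.29],
    [1.23, 2.33, 22.00, 0.00],
    [4.98, 0.29, 0.00, 22.00],
]
# Covariances between each sample and the prediction point: C01, C02, C03, C04.
c0 = [12.66, 4.98, 1.72, 9.84]
sill = 22.0  # C0 + C1 = 2 + 20

# Augment with the unbiasedness constraint (weights sum to 1).
K = [row + [1.0] for row in C] + [[1.0, 1.0, 1.0, 1.0, 0.0]]
k = c0 + [1.0]

solution = solve(K, k)
weights, alpha = solution[:4], solution[4]
variance = sill - sum(s * v for s, v in zip(solution, k))

print([round(w, 3) for w in weights], round(alpha, 3), round(variance, 2))
```

The computed weights agree with the values on the slide to within rounding. The largest weights go to $x_1$ and $x_4$, the two samples closest to $x_0$.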

51 Sampling configurations
There is no agreement on a “universally” optimal sampling configuration for geostatistical research (i.e., variogram modelling, followed by spatial prediction), but:
- For spatial prediction, regular (lattice or triangular) sampling is optimal (in case of isotropy; otherwise stretched lattices).
- For variogram modelling, all distances should be present, including sufficient information about short distances (which are not present when sampling regularly).
- Cross validation on a regular sampling grid will not reveal deficiencies in the modelled short-distance behaviour of the variogram; interpolated maps will be dominated by this short-distance variogram behaviour.
- Compromise: most effort put into regular spread, sufficient effort into short-distance replicates.
- Related questions: adding sampling points to an existing design, or reducing (“optimizing”) an existing monitoring network.

52 Questions about kriging
- What do sill, nugget, range, and anisotropy tell us about the spatial variability of an observed variable?
- What happens if we predict a value at an observation location?
- What does the prediction variance measure?
- Why is the interpolator discontinuous at observation locations when the nugget is positive?
- Why is the pattern of the prediction variance independent of the data values and dependent only on the data configuration?
- What are the causes of a positive nugget effect?

53 Spatial Indices. H.D.I.: Human Development Index (UN). $\mathrm{HDI} = (\text{longevity} + \text{education} + \text{income}) / 3$, with 0 < HDI < 1.

54 HDI – From Areas to Surfaces

55 HDI Variograms

56 Human Development Index in São Paulo [Map legend: from HDI = 0 to HDI = 1]

57 Trend Surfaces for Homicide Rates in São Paulo: estimates of homicide rates using ordinary kriging [Map panels: 1996; 1999]

58 Trend Surfaces for Homicide Rates: Binomial Kriging [Map panels: 1996; 1999]

59 Binomial versus Ordinary Kriging, 1996 [Map panels: ordinary kriging; binomial kriging]

60 Binomial versus Ordinary Kriging, 1999 [Map panels: ordinary kriging; binomial kriging]

61 Practical Example
Analysis of Apgar scores of newborns by borough, Rio de Janeiro, 1994.
Apgar index
- Vitality of a newborn baby in the first and fifth minute after birth
- Respiration, heartbeat, response to stimuli
152 georeferenced samples.
Thematic classification
- High: 77.4 to 83.3
- Medium high: 74.4 to 77.4
- Average: 69.5 to 74.4
- Medium low: 63.4 to 69.5
- Low: 44.1 to 63.4

62 Practical Example [Map: boroughs of the municipality of Rio de Janeiro; excluded boroughs marked]

63 Exploratory Data Analysis

64 Spatial Correlation Analysis [Semivariograms: omnidirectional, 45°, 90°, 135°]. Fitted model: type Gaussian; nugget effect = 16; contribution = 128; range = 32000.

65 Kriging results [Map panels: spatial variability of the Apgar index; kriging variance; each with a legend from − to +]

66 Comparison [Map: areal data grouped by quintiles: 44.1 to 63.4; 66.4 to 69.5; 69.5 to 74.4; 74.4 to 77.4; 77.4 to 83.3; excluded. Legend from − to +]

