Introduction Statistical options tend to be limited in most GIS applications. This is likely to be redressed in the future. We will look at spatial statistics in general terms, and conclude with a review of the software available.

Basic Concepts Spatial statistics differ from ‘ordinary’ statistics by the inclusion of locational properties. This makes spatial statistics more complex. The book by Bailey and Gatrell (1995) provides an accessible introduction. They identify four categories: –Point pattern data; –Spatially continuous data; –Areal data; and –Interaction data. Obvious correspondence with conceptual models.

Scale Levels Attribute data can be classified by measurement scale: –Nominal: e.g. 1=females, 2=males. –Ordinal: e.g. 1=good, 2=medium, 3=poor. –Interval (+ ratio): e.g. degrees Centigrade, percentages. Bailey and Gatrell classify techniques by purpose: –Visualisation –Exploration –Modelling (this is involved in all statistical inference and hypothesis testing)

Random Variables Statistical models deal with phenomena that are stochastic (i.e. are subject to uncertainty). A random variable Y has values that are subject to uncertainty (but may not necessarily be random). The distribution of possible values is referred to as the probability distribution, represented by a function $f_Y(y)$. Random variables may be discrete or continuous.

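Probabilities Probability that Y is between a and b is given by: $P(a \le Y \le b) = \sum_{y=a}^{b} f_Y(y)$ if Y is discrete; $P(a \le Y \le b) = \int_a^b f_Y(y)\,dy$ if Y is continuous (probability density). Cumulative probability (or distribution function) $F_Y$ is given by: $F_Y(y) = \sum_{u \le y} f_Y(u)$ if Y is discrete; $F_Y(y) = \int_{-\infty}^{y} f_Y(u)\,du$ if Y is continuous.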

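Expected Values The expected value of Y is its mean E(Y): $E(Y) = \sum_y y\,f_Y(y)$ or $E(Y) = \int y\,f_Y(y)\,dy$. The expected value of a function of Y, say g(Y), is: $E(g(Y)) = \sum_y g(y)\,f_Y(y)$ or $E(g(Y)) = \int g(y)\,f_Y(y)\,dy$. Variance is: $\mathrm{VAR}(Y) = E([Y - E(Y)]^2)$. The square root of this is the standard deviation ($\sigma_Y$).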
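The discrete-case definitions of interval probability, expected value and variance can be checked with a small sketch. The fair six-sided die used here is an assumed example, and the helper names `interval_probability` and `expectation` are illustrative only.

```python
from fractions import Fraction

# Assumed example: probability mass function f_Y(y) of a fair six-sided die.
pmf = {y: Fraction(1, 6) for y in range(1, 7)}

def interval_probability(pmf, a, b):
    """P(a <= Y <= b) for a discrete Y: sum f_Y(y) over a <= y <= b."""
    return sum(p for y, p in pmf.items() if a <= y <= b)

def expectation(pmf, g=lambda y: y):
    """E(g(Y)) = sum of g(y) * f_Y(y); with the default g this is the mean E(Y)."""
    return sum(g(y) * p for y, p in pmf.items())

mean = expectation(pmf)
# VAR(Y) is the expected value of the function g(Y) = (Y - E(Y))^2.
variance = expectation(pmf, lambda y: (y - mean) ** 2)
std_dev = float(variance) ** 0.5

print(mean, variance, interval_probability(pmf, 2, 4))
```

Exact fractions are used so that the results can be compared with a hand calculation without rounding noise.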

Joint Probability Can generalise to situations where there is more than one random variable. Joint probability distribution (or density): $f_{XY}(x,y)$. Covariance: $\mathrm{COV}(X,Y) = E((X - E(X))(Y - E(Y)))$. Correlation: $\rho_{X,Y} = \mathrm{COV}(X,Y) / (\sigma_X \sigma_Y)$. Independence: Neither variable affects the other. Joint probability is product of individual probabilities: $f_{XY}(x,y) = f_X(x)\,f_Y(y)$.
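A minimal sketch of covariance, correlation and the independence check for a discrete joint distribution. The two examples are assumptions chosen for illustration: two fair coins (independent), and the first coin paired with the total of both coins (dependent). All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

def marginal(joint, axis):
    """Marginal distribution of the first (axis=0) or second (axis=1) variable."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0) + p
    return out

def expect(joint, g):
    """E(g(X, Y)) under the joint probability mass function."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def covariance(joint):
    ex = expect(joint, lambda x, y: x)
    ey = expect(joint, lambda x, y: y)
    return expect(joint, lambda x, y: (x - ex) * (y - ey))

def correlation(joint):
    ex = expect(joint, lambda x, y: x)
    ey = expect(joint, lambda x, y: y)
    var_x = expect(joint, lambda x, y: (x - ex) ** 2)
    var_y = expect(joint, lambda x, y: (y - ey) ** 2)
    return float(covariance(joint)) / sqrt(float(var_x) * float(var_y))

def is_independent(joint):
    """True if f_XY(x, y) = f_X(x) * f_Y(y) for every pair of values."""
    fx, fy = marginal(joint, 0), marginal(joint, 1)
    return all(joint.get((x, y), 0) == fx[x] * fy[y] for x in fx for y in fy)

# Two fair coins coded 0/1: every combination has probability 1/4.
coins = {(x, y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

# First coin paired with the total of both coins: clearly dependent.
first_and_total = {}
for x, y in product((0, 1), repeat=2):
    key = (x, x + y)
    first_and_total[key] = first_and_total.get(key, 0) + Fraction(1, 4)

print(covariance(coins), is_independent(coins))
print(covariance(first_and_total), is_independent(first_and_total))
```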

Statistical Models A statistical model specifies the probability distribution for the phenomenon being modelled. If modelling ozone levels in a region R we would have a probability distribution for each location s (where s is a 2×1 vector holding an x,y coordinate pair). Individual points can be referred to as $s_1$, $s_2$ etc. The complete set of random variables may be referred to as a spatial stochastic process. The probability distribution for near points will probably be more similar than for distant points, so our random variables will probably not be independent.

Specifying Models To specify a model we need to specify its probability distribution. For the ozone model we would need to specify the joint distribution of every possible subset of random variables. For a fair die: $f_Y(y) = 1/6$. For more complex models (e.g. ozone) we can use observed data: $(y_1, y_2, \ldots)$. These data are a realisation – i.e. one outcome from the joint probability distribution of $\{Y_1, Y_2, \ldots\}$. One set of data does not get us very far. Even with more data observations we must make reasonable assumptions, based either on theory or prior observations.

Specifying Models(2) Assumptions may be expressed in general terms (e.g. a Normal distribution, a regression model) with unspecified parameters. The model can be fitted using observed data to estimate the parameters. After evaluating the model we may decide to change its general form.

A Regression Model To illustrate, to model our ozone data we might make the following assumptions: –The random variables $\{Y(s), s \in R\}$ are independent; –They have the same distribution, but different means; –Their means are a simple linear function of location, say $E(Y(s)) = \beta_0 + \beta_1 s_1 + \beta_2 s_2$ (here $s_1$ and $s_2$ denote the two coordinates of s); –Each Y(s) has a normal distribution about this mean with the same variance $\sigma^2$. These assumptions would enable us to estimate the parameters from the available data.

Maximum Likelihood The most frequently used method is maximum likelihood. We can write down the general form of the joint probability distribution, e.g. $f(y_1, y_2, \ldots, y_n; \theta)$, where $\theta$ is a vector of parameters: $(\beta_0, \beta_1, \beta_2, \sigma^2)$ in our regression model. Given that we have actual values for $y_1, \ldots, y_n$, this joint probability distribution gives the probability (or probability density) of getting these actual values, viewed as a function of $\theta$. This is referred to as the likelihood and would usually be denoted $L(y_1, y_2, \ldots, y_n; \theta)$. Our objective is to identify the parameter values $\theta$ that maximise L. In practice we usually maximise the logarithm of L (log likelihood), denoted $l(y_1, y_2, \ldots, y_n; \theta)$.
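As a worked illustration, the log likelihood can be written out for the regression assumptions used in the ozone example (independent, normally distributed observations with a common variance). The notation $s_{i1}$, $s_{i2}$ for the two coordinates of the i-th sample location is an assumption introduced here.

```latex
\[
l(y_1,\ldots,y_n;\theta)
  = -\frac{n}{2}\log\bigl(2\pi\sigma^2\bigr)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}
      \bigl(y_i - \beta_0 - \beta_1 s_{i1} - \beta_2 s_{i2}\bigr)^2
\]
\[
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}
      \bigl(y_i - \hat{\beta}_0 - \hat{\beta}_1 s_{i1} - \hat{\beta}_2 s_{i2}\bigr)^2
\]
```

For any fixed $\sigma^2$, the first term does not involve the $\beta$ parameters, so maximising $l$ over them amounts to minimising the sum of squared deviations. Setting the derivative with respect to $\sigma^2$ to zero then gives the second expression, the residual sum of squares divided by n.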

Parameter Estimation This is the basic approach, but the actual estimation may be complicated. Parameter estimation for our multiple linear regression, under the assumptions of independence, normal distributions and equal variance, reduces to the method of ordinary least squares. If the independence and equal-variance assumptions are relaxed, we can still use generalised least squares. Standard errors provide a measure of the reliability of each parameter estimate. Likelihood ratios can be used to compare alternative models.
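The reduction to ordinary least squares can be sketched for a linear trend in the two location coordinates. This is a minimal illustration that solves the normal equations directly. The sample locations and values are made up, and `solve` and `fit_trend` are hypothetical helper names.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_trend(points, values):
    """OLS estimates (b0, b1, b2) for E(Y(s)) = b0 + b1*s1 + b2*s2."""
    rows = [(1.0, s1, s2) for s1, s2 in points]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    return solve(xtx, xty)  # normal equations: (X'X) b = X'y

# Made-up sample locations (s1, s2) and observed values.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
values = [2.1, 2.4, 1.1, 1.6, 2.0, 0.4]

b0, b1, b2 = fit_trend(points, values)
residuals = [y - (b0 + b1 * s1 + b2 * s2) for (s1, s2), y in zip(points, values)]
# Maximum likelihood estimate of the common variance: RSS / n.
sigma2_ml = sum(r * r for r in residuals) / len(residuals)
print(b0, b1, b2, sigma2_ml)
```

Solving the normal equations by hand keeps the sketch dependency-free. In practice a numerically more stable routine from a statistics library would normally be preferred.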

Hypothesis Testing Hypothesis testing entails comparing the fit of two models, one of which incorporates assumptions which reflect the hypothesis, the other incorporating a less specific set of assumptions. All modelling inevitably involves some assumptions about the phenomenon under study; hence hypothesis testing will always involve comparison of the fit of a hypothesised model with that of an alternative which also incorporates assumptions, albeit of a more general nature.

Spatial Data Modelling Spatial data often exhibit spatial correlation (or autocorrelation). Assumptions of independence may therefore be unrealistic. Can make a distinction between: –First order effects: variation in the mean due to global trend; –Second order effects: caused by spatial correlation. Can illustrate using analogy of iron filings and magnets. Real-world patterns are often an outcome of a mix of first and second order effects.

Spatial Data Modelling(2) To allow for second order effects, spatial models may need to assume a covariance structure. The second order effects may be modelled as a stationary spatial process – i.e. –Its statistical properties (mean, variance) are independent of absolute location; –Covariance depends only on relative location. A process is said to be isotropic if it is stationary, and covariance depends only on distance and not direction. If the mean, variance or covariance ‘drifts’ over the study area, then the process exhibits non-stationarity or heterogeneity.
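A small sketch of what stationarity and isotropy imply for a covariance structure. The exponential covariance function and its parameter values are assumptions chosen for illustration, not a model named in the slides. Because covariance here depends only on the distance between two points, shifting or rotating the whole set of points leaves the covariance matrix unchanged.

```python
from math import exp, hypot

def exp_cov(h, sill=1.0, range_par=2.0):
    """Assumed exponential covariance model: depends only on separation distance h."""
    return sill * exp(-h / range_par)

def cov_matrix(points):
    """Covariance between every pair of locations under the isotropic model."""
    return [[exp_cov(hypot(x1 - x2, y1 - y2)) for (x2, y2) in points]
            for (x1, y1) in points]

points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (6.0, 8.0)]
shifted = [(x + 100.0, y - 50.0) for x, y in points]   # change of absolute location
rotated = [(-y, x) for x, y in points]                  # 90 degree rotation

base = cov_matrix(points)
print(base[0][1], cov_matrix(shifted)[0][1], cov_matrix(rotated)[0][1])
```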

Spatial Data Modelling(3) Heterogeneity in the mean, combined with stationarity in second order effects, is a useful spatial modelling assumption. The modelling of a spatial process often tends to proceed by first identifying any heterogeneous 'trend' in mean value and then modelling the 'residuals', or deviations from this 'trend', as a stationary process.

Geographically Weighted Regression Covariates are often incorporated in a multiple regression model taking the general form: $y_i = \beta_0 + \sum_k \beta_k x_{ik} + \varepsilon_i$. The model assumes the coefficients are homogeneous or stationary. Fotheringham et al. proposed an alternative model: $y_i = \beta_0(s_i) + \sum_k \beta_k(s_i)\,x_{ik} + \varepsilon_i$, where $s_i$ is the location of observation i. To allow the model to be fitted, it is assumed the parameters are non-stationary but are a function of location. Parameters can be mapped.
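One way to picture location-dependent coefficients is a weighted least squares fit at each target location, with nearby observations weighted more heavily. The sketch below assumes a single covariate, a Gaussian distance-decay kernel and a fixed bandwidth. These choices, the made-up data and the function names are illustrative assumptions, not the calibration procedure of any particular software package.

```python
from math import exp, hypot

def gaussian_weight(distance, bandwidth):
    """Assumed kernel: weight decays smoothly with distance from the target location."""
    return exp(-0.5 * (distance / bandwidth) ** 2)

def local_coefficients(target, locations, x, y, bandwidth):
    """Weighted least squares fit of y = b0 + b1*x, centred on the target location."""
    w = [gaussian_weight(hypot(target[0] - u, target[1] - v), bandwidth)
         for u, v in locations]
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Made-up data along a transect: the slope is 1 in the west and 3 in the east.
locations = [(float(i), 0.0) for i in range(10)]
x = [float(i % 5) for i in range(10)]
y = [(1.0 if i < 5 else 3.0) * xi for i, xi in enumerate(x)]

west = local_coefficients(locations[0], locations, x, y, bandwidth=1.5)
east = local_coefficients(locations[-1], locations, x, y, bandwidth=1.5)
print(west, east)
```

Repeating the fit at every location yields one set of coefficients per location, which is what makes the parameters mappable. The bandwidth controls how local each fit is.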

Point Pattern Techniques Bailey and Gatrell discuss various techniques, organised by data type. Point pattern techniques include: –Quadrat analysis –Kernel estimation –Nearest neighbour analysis –K-functions Normally used to test null hypothesis of complete spatial randomness (i.e. homogeneous Poisson process), but can also examine heterogeneous Poisson processes.
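Nearest neighbour analysis can be sketched with the nearest neighbour index of Clark and Evans: the observed mean nearest neighbour distance divided by the value expected under complete spatial randomness, $1/(2\sqrt{\lambda})$, where $\lambda$ is the number of points per unit area. The version below applies no edge correction, and both point patterns are made up for illustration.

```python
from math import hypot, sqrt

def nearest_neighbour_index(points, area):
    """Ratio of observed to expected mean nearest neighbour distance (no edge correction).

    Values near 1 are consistent with complete spatial randomness, values below 1
    suggest clustering and values above 1 suggest regularity.
    """
    n = len(points)
    nn_distances = []
    for i, (xi, yi) in enumerate(points):
        nn_distances.append(min(hypot(xi - xj, yi - yj)
                                for j, (xj, yj) in enumerate(points) if j != i))
    observed = sum(nn_distances) / n
    intensity = n / area                     # points per unit area
    expected = 0.5 / sqrt(intensity)         # expected mean distance under CSR
    return observed / expected

# Two made-up patterns of 25 points in a 5 x 5 study area.
regular = [(i + 0.5, j + 0.5) for i in range(5) for j in range(5)]
clustered = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]

print(nearest_neighbour_index(regular, area=25.0))
print(nearest_neighbour_index(clustered, area=25.0))
```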

Spatially Continuous Data Techniques used to explore field data. Sometimes referred to as geostatistics. –Spatial moving averages –Trend surface analysis –Delaunay triangulation / Thiessen polygons / TINs –Kernel estimation (for the values at sample points) –Variograms / covariograms / kriging –Principal components analysis / factor analysis –Procrustes analysis –Cluster analysis –Canonical correlation
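Of the techniques listed, the variogram is simple to sketch. The empirical semivariance for a lag class is half the mean squared difference between values at pairs of points whose separation falls in that class. The binning rule (equal-width lag classes), the sample data and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from math import hypot

def empirical_semivariogram(points, values, lag_width):
    """Return {lag class k: semivariance}; class k covers distances
    from k * lag_width up to (but not including) (k + 1) * lag_width."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
            k = int(d // lag_width)
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    # Semivariance: sum of squared differences / (2 * number of pairs).
    return {k: sums[k] / (2 * counts[k]) for k in sorted(counts)}

# Made-up samples along a line, one unit apart.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]

gamma = empirical_semivariogram(points, values, lag_width=1.0)
print(gamma)
```

Semivariance that rises with lag, as in this example, indicates that nearby values are more alike than distant ones.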

Area Data Techniques for analysing areal data (i.e. polygon attributes) include: –Spatial moving averages –Kernel estimation –Spatial autocorrelation (Moran’s I, Geary’s c) –Spatial correlation and regression Generalised linear models provide a family of techniques for dealing with special types of data: e.g. counts (Poisson regression), proportions (logistic regression). Bayesian techniques often used to model rates based on small numbers.
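Moran's I, one of the spatial autocorrelation measures listed, can be sketched as follows. The example assumes binary contiguity weights (1 for neighbouring areas, 0 otherwise) and four made-up areas arranged in a row. The neighbour dictionary and function name are hypothetical.

```python
def morans_i(values, neighbours):
    """Moran's I with binary weights.

    neighbours maps each area index to the list of indices it is adjacent to.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(js) for js in neighbours.values())        # sum of all weights
    cross = sum(dev[i] * dev[j] for i, js in neighbours.items() for j in js)
    return (n / s0) * cross / sum(d * d for d in dev)

# Four areas in a row: 0 - 1 - 2 - 3.
row = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

trend = morans_i([1.0, 2.0, 3.0, 4.0], row)         # similar values are adjacent
alternating = morans_i([1.0, 4.0, 1.0, 4.0], row)   # dissimilar values are adjacent
print(trend, alternating)
```

Positive values indicate that neighbouring areas tend to have similar values, and negative values indicate that they tend to differ.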

Spatial Interaction Data Techniques for modelling spatial interactions are mostly based on some variant of the gravity model. This postulates that the amount of interaction between two places is a function of their sizes (measured using an appropriate metric) and is inversely related to the distance between them.
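A minimal sketch of one common variant of the gravity model, in which interaction is proportional to the product of the two sizes and inversely proportional to a power of distance. The functional form, the scaling constant `k` and the distance exponent `beta` are assumptions for illustration. In practice they would be estimated from observed flows.

```python
def gravity_interaction(size_i, size_j, distance, k=1.0, beta=2.0):
    """Predicted interaction: k * size_i * size_j / distance ** beta."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return k * size_i * size_j / distance ** beta

# Made-up example: two places with sizes 100 and 200.
near = gravity_interaction(100.0, 200.0, distance=10.0)
far = gravity_interaction(100.0, 200.0, distance=20.0)
print(near, far)
```

With `beta=2.0`, doubling the distance reduces the predicted interaction to a quarter.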

Software ArcGIS. Geostatistical Analyst is a step forward. Idrisi. The GIS Analysis | Statistics menu has a lot of options. S-Plus. The S+SpatialStats add-on provides a lot of options. R. R is an open-source implementation of the S language, on which S-Plus is also based. There are a number of projects currently developing tools for spatial statistics (e.g. sp, spatstat, DCluster, spgwr). BUGS. Software for Bayesian statistics. There is a free version for Windows (WinBUGS). Includes a spatial subset called GeoBUGS.