Edoardo PIZZOLI, Chiara PICCINI NTTS 2013 - New Techniques and Technologies for Statistics SPATIAL DATA REPRESENTATION: AN IMPROVEMENT OF STATISTICAL DISSEMINATION.

Presentation transcript:

Edoardo PIZZOLI, Chiara PICCINI NTTS New Techniques and Technologies for Statistics SPATIAL DATA REPRESENTATION: AN IMPROVEMENT OF STATISTICAL DISSEMINATION FOR POLICY ANALYSIS Brussels, 5-7 March 2013

Introduction/1: Hypothesis
Statistical units carry information about the territory they belong to: their individual characteristics are proxies for the territorial characteristics. Data available at the statistical-unit level can be used to expand information on administrative areas, provided that the phenomena studied show an empirical spatial correlation. If such a statistical relationship exists among a set of units with respect to a specific characteristic, a spatial analysis is possible. A map or cartogram is always used to represent in space the phenomena under investigation.

Introduction/2: Geostatistics
Usually applied to the analysis of natural phenomena. The main assumptions are that point values are influenced by the values at the nearest points (spatial autocorrelation) and that the phenomenon is continuously distributed over the territory. Socio-economic variables, discrete by nature, can be processed with geostatistical methods by assuming both that their values are spatially dependent and that the analyzed phenomenon is continuously distributed over the territory. The basic hypothesis is that the analyzed units can represent measurement points of phenomena distributed over the whole territory. The graphical result is better than a thematic cartogram, which forces the data into administrative boundaries that are often arbitrarily set.

Spatial autocorrelation and geostatistics
Spatial data: each data value is associated with a location in space, and there is at least an implied connection between the location and the data value. Spatial autocorrelation is defined as the variation of a property within a geo-space: characteristics at proximal locations appear to be correlated, either positively or negatively. Spatial autocorrelation is the subject matter of geostatistics.

Geostatistical modelling: spatial prediction models (algorithms)
Deterministic: arbitrary or empirical models. No estimate of model error. E.g. Thiessen polygons, inverse distance weighting, trend surfaces, splines. More primitive and often suboptimal; in some situations they can perform as well as the statistical methods or even better.
Statistical: model parameters estimated in an objective way. Estimate of the prediction error available. E.g. kriging, environmental correlation, Bayesian-based models, mixed models. Input datasets usually need to satisfy strict statistical assumptions.
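To make the deterministic family concrete, here is a minimal sketch of inverse distance weighting, one of the interpolators named on this slide. The function name `idw_predict`, the power of 2 and the sample points are illustrative assumptions, not taken from the presentation. Note that the method returns an estimate only, with no prediction error.

```python
import math

def idw_predict(points, values, target, power=2.0):
    """Inverse distance weighting estimate at `target` (hypothetical helper)."""
    weights = []
    for (x, y), v in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # IDW is an exact interpolator at data locations
        weights.append(1.0 / d ** power)
    total = sum(weights)
    # Weighted mean of the observations; closer points weigh more.
    return sum(w * v for w, v in zip(weights, values)) / total

# Illustrative data: four corners of a square (assumed, not from the slides).
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
values = [10.0, 20.0, 30.0, 40.0]
center = idw_predict(points, values, (1.0, 1.0))
```

At the centre of the square all four points are equidistant, so the estimate reduces to their plain average.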

Basic steps of geostatistical analysis:
1. estimation of the semivariogram;
2. estimation of the parameters of the semivariogram model;
3. estimation of the surface.
A validation of the estimation can be added. The kriging algorithm is an optimal interpolator: it generates the best linear unbiased estimate at each location, employing the semivariogram model. The most commonly used kriging algorithm is Ordinary Kriging (OK). A normal distribution of the data is usually a prerequisite for the application of geostatistics: OK may give unacceptable results if the data are severely non-normal.
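The three steps can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions, not the software workflow used by the authors: the empirical semivariogram uses the classical Matheron estimator, the model parameters are set by hand rather than fitted, the exponential model uses the "practical range" convention (factor 3), and all function names and data are hypothetical.

```python
import numpy as np

def empirical_semivariogram(coords, values, lag_edges):
    """Step 1: half the mean squared difference of pairs in each distance bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)  # every pair counted once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    gamma = []
    for lo, hi in zip(lag_edges[:-1], lag_edges[1:]):
        mask = (dist > lo) & (dist <= hi)
        gamma.append(0.5 * sqdiff[mask].mean() if mask.any() else np.nan)
    return np.array(gamma)

def exponential_model(h, nugget=0.0, partial_sill=1.0, rng=1.0):
    """Step 2 (model form only): exponential semivariogram, gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + partial_sill * (1.0 - np.exp(-3.0 * h / rng))
    return np.where(h > 0.0, gamma, 0.0)

def ordinary_kriging(coords, values, target, model):
    """Step 3: solve the OK system for one target location."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))   # last row/column enforce sum(weights) = 1
    A[:n, :n] = model(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = model(np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1))
    sol = np.linalg.solve(A, b)
    weights, mu = sol[:n], sol[n]  # mu is the Lagrange multiplier
    estimate = weights @ values
    variance = weights @ b[:n] + mu  # kriging (prediction error) variance
    return estimate, variance, weights

# Illustrative data and hand-picked parameters (assumptions).
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2)]
values = [3.0, 5.0, 4.0, 6.0, 4.2]
model = lambda h: exponential_model(h, nugget=0.0, partial_sill=1.0, rng=1.5)
gamma_hat = empirical_semivariogram(coords, values, [0.0, 0.75, 1.5])
estimate, variance, weights = ordinary_kriging(coords, values, (0.5, 0.5), model)
```

Unlike a deterministic interpolator, the function returns a kriging variance alongside the estimate, which is what the prediction error maps later in the presentation display.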

Indicator kriging (IK) is another geostatistical approach to geospatial modelling, which makes no assumption of normality and is essentially a non-parametric counterpart to OK. IK uses indicator (0 or 1) variables to generate, at each location in the study area, the probability that a critical value is exceeded, and then proceeds in the same way as OK. If a threshold is used to create the indicator variable, the resulting interpolation map shows the probabilities of exceeding (or being below) the threshold. This makes it possible to produce probability maps and risk maps.
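The indicator coding itself is simple and can be sketched as below. The threshold 26 is the one reported in the application part of this presentation; the sample values and function names are illustrative assumptions. The clipping helper reflects the fact that kriged indicator estimates are not guaranteed to stay within [0, 1] and are commonly corrected before being read as probabilities.

```python
def indicator_transform(values, threshold):
    """Code each observation as 1 if it exceeds the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]

def clip_probability(p):
    """Force a kriged indicator estimate into the valid range [0, 1]."""
    return min(1.0, max(0.0, p))

# Illustrative R/N values (assumed, not from the dataset).
ratios = [12.0, 30.5, 26.0, 48.2, 8.9]
indicators = indicator_transform(ratios, threshold=26.0)
share_exceeding = sum(indicators) / len(indicators)
```

The resulting 0/1 variable is then interpolated like any other variable, and the interpolated surface is interpreted as the probability of exceeding the threshold.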

The problem of outliers
Outliers may provide useful information about the magnitude of the phenomenon; at the same time, they may unduly influence the results of the analysis. Possible solutions:
- logarithmic transformation;
- winsorization;
- trimming.
A logarithmic transformation brings the data closer to a normal distribution but flattens them out, completely losing the information carried by the outliers. In a trimmed dataset, the extreme values are discarded; in a winsorized dataset, the extreme values are instead replaced by other values.
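The three treatments can be compared on a small example. This is a sketch under assumptions: the percentile cut-offs (10% and 90%), the linear-interpolation percentile rule, the helper names and the sample values are all illustrative, since the presentation does not state which limits were used.

```python
import math

def percentile(sorted_vals, q):
    """Percentile by linear interpolation, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac

def log_transform(values):
    """Natural logarithm; requires strictly positive values."""
    return [math.log(v) for v in values]

def winsorize(values, lower=0.1, upper=0.9):
    """Replace values beyond the cut-offs with the cut-off values."""
    s = sorted(values)
    lo, hi = percentile(s, lower), percentile(s, upper)
    return [min(max(v, lo), hi) for v in values]

def trim(values, lower=0.1, upper=0.9):
    """Discard values beyond the cut-offs."""
    s = sorted(values)
    lo, hi = percentile(s, lower), percentile(s, upper)
    return [v for v in values if lo <= v <= hi]

# Illustrative data with one extreme value (assumed).
data = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 11.0, 13.0, 12.0, 500.0]
logged = log_transform(data)
winsorized = winsorize(data)
trimmed = trim(data)
```

Winsorizing keeps the number of observations unchanged and only lowers the extreme value, while trimming shortens the dataset, which matches the contrast drawn on the slide.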

Application example - the Province of Palermo
Data from ASIA-Agricoltura (year 2009). The process of representing the data is as follows:
1. specification of the territorial administrative level of interest;
2. normalization of the units' addresses (data quality control);
3. assignment of geographical coordinates to units, starting from addresses;
4. correction of errors in the coordinates;
5. mapping of point locations, visualizing their spatial distribution;
6. variographic analysis of the data;
7. estimation at non-sampled points by means of the appropriate algorithm;
8. mapping of the estimated values over the territory;
9. cross-validation to test the accuracy of the estimation.
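The final cross-validation step is often carried out as leave-one-out: each unit is removed in turn, its value is predicted from the remaining units, and the prediction errors are summarized. The sketch below assumes this variant, which the presentation does not specify, and uses a simple inverse distance weighting predictor as a stand-in for whichever interpolator is being tested. All names and data are hypothetical.

```python
import math

def idw(points, values, target, power=2.0):
    """Stand-in predictor (inverse distance weighting), not the authors' method."""
    num = den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v
        w = d ** -power
        num += w * v
        den += w
    return num / den

def leave_one_out(points, values, predictor):
    """Return (mean error, RMSE) of predictions at each held-out point."""
    errors = []
    for k in range(len(points)):
        rest_pts = points[:k] + points[k + 1:]
        rest_vals = values[:k] + values[k + 1:]
        errors.append(predictor(rest_pts, rest_vals, points[k]) - values[k])
    mean_error = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean_error, rmse

# Illustrative data (assumed).
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
vals = [3.0, 5.0, 4.0, 6.0, 4.4]
mean_error, rmse = leave_one_out(pts, vals, idw)
```

A mean error close to zero suggests little systematic bias, and a smaller RMSE indicates a more accurate interpolation, so the same routine can be used to compare candidate methods.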

Results/1: Revenues/number of employees ratio (R/N)
[Figure panels: a) original dataset; b) logarithmic transformation; c) winsorized dataset; d) trimmed dataset]

Results/2: Deterministic interpolator: the Radial Basis Function (RBF)

Variogram modelling
The semivariogram is a function that describes the differences between samples separated by varying distances. In both cases the spatial autocorrelation is weak, but a model can be fitted to the point semivariograms: Spherical for R/N and Exponential for log(R/N). Their parameters can be used in the OK algorithm to estimate maps.
[Figure panels: a) R/N; b) log(R/N)]
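For reference, the two model forms mentioned here can be written as short functions. The parameterization below (nugget, partial sill, range, with the exponential model using the "practical range" factor 3) is one common convention and is an assumption; the fitted values from the presentation are not given, so the numbers are purely illustrative.

```python
import math

def spherical(h, nugget, partial_sill, rng):
    """Spherical model: rises to the sill exactly at the range, then stays flat."""
    if h <= 0.0:
        return 0.0
    if h >= rng:
        return nugget + partial_sill
    r = h / rng
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

def exponential(h, nugget, partial_sill, rng):
    """Exponential model: approaches the sill asymptotically."""
    if h <= 0.0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / rng))

# Illustrative parameters (assumed): nugget 0.2, partial sill 0.8, range 1.0.
lags = [0.0, 0.5, 1.0, 2.0]
sph = [spherical(h, 0.2, 0.8, 1.0) for h in lags]
expo = [exponential(h, 0.2, 0.8, 1.0) for h in lags]
```

A high nugget relative to the sill corresponds to the weak spatial autocorrelation described on this slide: most of the variability is already present at very short distances.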

Estimated maps/1
[Figure panels: estimated map of R/N by OK; prediction error map of R/N by OK]

Estimated maps/2
[Figure panels: estimated map of log(R/N) by OK; prediction error map of log(R/N) by OK]
The arrangement of the contour lines looks very flattened out towards low values.

Estimated maps/3
[Figure panels: estimated map of R/N by OK (winsorized dataset); prediction error map of R/N by OK (winsorized dataset)]
The value of the outlier has been lowered.

Estimated maps/4
[Figure panels: estimated map of R/N by OK (trimmed dataset); prediction error map of R/N by OK (trimmed dataset)]
Deleting the outlier causes a loss of information. The estimation error is high everywhere.

Indicator Kriging/1: Indicator variable
The variable R/N was transformed into an indicator variable using the value 26 (proposed by the software) as the threshold. The spatial autocorrelation is almost absent, but an Exponential model can be fitted to the point semivariogram. Its parameters are used in the kriging algorithm to estimate a probability map.

Indicator Kriging/2
[Figure panels: probability of exceeding the threshold; prediction error map]

Conclusions
Estimated maps of socio-economic indicators, based on the micro-data available at the statistical-unit level, represent a clear enhancement in the dissemination of statistical information. Visualizing statistical information on a map does not merely add a further dimension, space, to the data; it improves socio-economic information by linking it to geographical characteristics. Maps are useful to identify areas of policy intervention, to plan and evaluate actions, to perform simulations and to build future scenarios. The error map is a further improvement, showing the limitations and the reliability of the available statistical information. The choice of the interpolation method depends essentially on the type of available data and on the objective of the analysis.

Thank you for your attention