A Comparison of Zonal Smoothing Techniques. Prof. Chris Brunsdon, Dept. of Geography, University of Leicester


Background: Much social science data comes aggregated over irregular spatial zones: census wards, police beat zones, neighbourhood renewal areas, CDRP special areas.

Typical Problems: Changing from one set of geographical units to another. Areas of special concern for crime reduction are not the aggregation units used to report crime rates. Comparing crime rates with social data means working with different aggregation units. One solution: convert the zonal data to a surface, then re-aggregate to the new zones.
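
The "convert to surface, then re-aggregate" idea can be sketched as follows. This is a minimal illustration, assuming the surface is already available as a flat list of per-pixel estimated counts; the function name `reaggregate`, the zone labels and the numbers are hypothetical and do not come from the slides.

```python
def reaggregate(surface, zone_of_pixel):
    """Sum per-pixel estimates into a total for each zone label."""
    totals = {}
    for value, zone in zip(surface, zone_of_pixel):
        totals[zone] = totals.get(zone, 0.0) + value
    return totals


# Hypothetical surface of estimated counts on six pixels (flattened grid).
surface = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Two different zonings of the same six pixels (both made up).
reporting_zones = ["ward_A", "ward_A", "ward_A", "ward_B", "ward_B", "ward_B"]
target_zones = ["area_1", "area_1", "area_2", "area_2", "area_2", "area_2"]

ward_totals = reaggregate(surface, reporting_zones)
area_totals = reaggregate(surface, target_zones)
```

The same surface yields totals for either zoning, which is what makes a pixel surface a convenient intermediate between incompatible sets of zones.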

Factors to Consider: Data collection; statistical issues; software issues; underlying theory; diagnostics; organisational issues.

Background (1): The CAMSTATS web site. Developed at UCL as a consultancy (Muki Haklay). Gives public access to crime data going back to April 2000. Designed so that police officers (or civilians) can update the web page with a single button click. Has run without problems or need for advice or intervention.

Background (2): Crime rates are mapped for a number of areal units: wards, police sectors, neighbourhood renewal areas, special areas.

Approaches: Roughness penalty; pycnophylactic interpolation; naive averaging.

Form of Problem: Estimate an underlying crime risk surface from zonal data. Continuous version of model: (equation missing from the transcript; its error term appears in some approaches only).

Discrete Approximation: (equation missing from the transcript). This is an over-specified regression model. NB: the error term appears in some approaches only.
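
The equations for the continuous model and its discrete approximation did not survive in the transcript. A plausible reconstruction, consistent with the later description of X as a pixel-to-zone indicator matrix, is sketched below. All symbols other than X are assumptions: y_i is the observed count in zone Z_i, f is the underlying risk surface, β_j is the value of the surface at pixel j, and ε_i is the error term that only some of the approaches include.

```latex
% Continuous model (assumed form): the zonal count is the surface
% integrated over the zone, plus an optional error term.
y_i = \int_{Z_i} f(\mathbf{u}) \, d\mathbf{u} + \varepsilon_i

% Discrete approximation (assumed form): the integral becomes a sum
% over the pixels that fall in zone i.
y_i = \sum_{j \in Z_i} \beta_j + \varepsilon_i
\quad\Longleftrightarrow\quad
\mathbf{y} = X \boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad
X_{ij} =
\begin{cases}
1 & \text{if pixel } j \text{ lies in zone } i \\
0 & \text{otherwise}
\end{cases}
```

Written this way, there is one unknown per pixel but only one equation per zone, which is why the regression is over-specified.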

Over-Specified? What does this mean? There are more variables than observations, so the solution is not unique. For example, for a given zone: set all pixels to zero except one, and set that one to the crime count; or set every pixel to 1/n of the crime count, where n is the number of pixels in the zone.
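
The two candidate solutions mentioned on this slide can be checked in a few lines. The zone size and crime count below are made-up numbers for illustration only.

```python
# One hypothetical zone with n pixels and a single observed crime count.
crime_count = 12.0
n = 4

# Solution 1: all pixels zero except one, which carries the whole count.
spike = [crime_count] + [0.0] * (n - 1)

# Solution 2: every pixel gets 1/n of the count.
flat = [crime_count / n] * n

# Both reproduce the observed zonal total, yet they are different surfaces.
spike_total = sum(spike)
flat_total = sum(flat)
```

Since both surfaces fit the single observation exactly, the data alone cannot choose between them; some extra criterion such as smoothness is needed.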

Roughness Penalty: In fact there are infinitely many solutions to the equation on the earlier slide. Favour those with a lower roughness penalty (cf. regularization problems). Aim to minimise: sum of squared errors + const. × roughness. A discrete roughness penalty: roughness at a pixel (formula missing from the transcript).

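This can be solved by matrix algebra. The solution formula is missing from the transcript; its annotated parts are: a term containing information relating pixels to zones; a term encapsulating ‘total roughness’ for all pixels; a constant controlling the roughness penalty; and the observed zonal counts. X is an indicator matrix showing which pixel is in which zone.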
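
A minimal sketch of the matrix-algebra solution, under stated assumptions: the slide's formula is not preserved, so this uses the standard penalised least squares form (XᵀX + λR)β = Xᵀy, which matches the annotations (X relates pixels to zones, R holds the total roughness, λ controls the penalty, y is the observed zonal counts). The roughness is assumed to be the sum of squared differences between adjacent pixels on a one-dimensional strip, which may differ from the definition used in the talk. The talk used R; plain Python is used here, and all function names and numbers are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def roughness_penalty_fit(zone_of_pixel, counts, lam):
    """Minimise squared zonal errors + lam * roughness on a 1-D pixel strip."""
    n = len(zone_of_pixel)
    zones = sorted(counts)
    # X[i][j] = 1 if pixel j lies in zone i (the indicator matrix).
    X = [[1.0 if zone_of_pixel[j] == z else 0.0 for j in range(n)] for z in zones]
    y = [counts[z] for z in zones]
    # D takes differences between adjacent pixels; R = D^T D (an assumption).
    D = [[0.0] * n for _ in range(n - 1)]
    for k in range(n - 1):
        D[k][k], D[k][k + 1] = -1.0, 1.0
    # Normal equations: (X^T X + lam * D^T D) beta = X^T y.
    A = [[sum(row[r] * row[c] for row in X) + lam * sum(d[r] * d[c] for d in D)
          for c in range(n)] for r in range(n)]
    b = [sum(X[i][r] * y[i] for i in range(len(zones))) for r in range(n)]
    return solve(A, b)


# Hypothetical example: six pixels, two zones of three pixels each.
zone_of_pixel = ["a", "a", "a", "b", "b", "b"]
counts = {"a": 3.0, "b": 9.0}
beta = roughness_penalty_fit(zone_of_pixel, counts, lam=0.5)
```

Because an error term is allowed, the fitted zone totals need not match the observed counts exactly; a larger `lam` trades fidelity to the counts for a smoother surface.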

Software: The techniques here are not ‘off the shelf’; they need statistical/numerical as well as GIS techniques. Here the ‘R’ package was used: a statistical programming language with good graphical support; open source, with lots of libraries, including GIS-type support.

Pycnophylactic Interpolation: Similar to the roughness penalty, but no errors are allowed (cf. Tobler 1979). Can be solved as a quadratic programming problem.
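
The slide describes a quadratic programming solution. The sketch below swaps in a different and simpler technique: an iterative "smooth, then restore the zone totals" scheme in the spirit of Tobler's original procedure, reduced to a one-dimensional strip with multiplicative rescaling. It is an illustration of the mass-preserving constraint, not the method used in the talk, and all names and numbers are hypothetical.

```python
def pycnophylactic_1d(zone_of_pixel, counts, iterations=200):
    """Smooth a 1-D surface while keeping every zone total exactly fixed."""
    n = len(zone_of_pixel)
    members = {}
    for j, z in enumerate(zone_of_pixel):
        members.setdefault(z, []).append(j)
    # Start from the constant-density surface within each zone.
    surface = [counts[z] / len(members[z]) for z in zone_of_pixel]
    for _ in range(iterations):
        # Smoothing step: average each pixel with its neighbours
        # (edge pixels reuse their own value for the missing neighbour).
        smoothed = []
        for j in range(n):
            left = surface[max(j - 1, 0)]
            right = surface[min(j + 1, n - 1)]
            smoothed.append((left + surface[j] + right) / 3.0)
        # Constraint step: rescale each zone so its total matches the count.
        for z, idx in members.items():
            total = sum(smoothed[j] for j in idx)
            if total > 0:
                scale = counts[z] / total
                for j in idx:
                    smoothed[j] *= scale
        surface = smoothed
    return surface


# Hypothetical example: six pixels, two zones of three pixels each.
zone_of_pixel = ["a", "a", "a", "b", "b", "b"]
counts = {"a": 3.0, "b": 9.0}
surface = pycnophylactic_1d(zone_of_pixel, counts)
total_a = sum(surface[:3])
total_b = sum(surface[3:])
```

Unlike the roughness penalty fit, the zone totals here are reproduced exactly (up to floating-point rounding), which is the "no errors allowed" property.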

Naive Approach: Assume that the density within each areal unit is constant.
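
As a minimal sketch, the constant-density assumption amounts to dividing each zone's count equally among its pixels. The function name and the example data are hypothetical.

```python
def naive_surface(zone_of_pixel, counts):
    """Spread each zone's count evenly over the pixels in that zone."""
    sizes = {}
    for z in zone_of_pixel:
        sizes[z] = sizes.get(z, 0) + 1
    return [counts[z] / sizes[z] for z in zone_of_pixel]


# Hypothetical example: zone "a" covers three pixels, zone "b" covers one.
zone_of_pixel = ["a", "a", "a", "b"]
counts = {"a": 6.0, "b": 5.0}
surface = naive_surface(zone_of_pixel, counts)
```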

HOUSING DENSITY: Is it sensible to assume intensity of household burglaries is smooth?

Model Modification: Household densities can be obtained with David Martin’s SURPOP approach. This modification can be applied to all the approaches described earlier.
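
One way to picture the household modification, applied here to the naive approach only: instead of assuming a constant count per pixel, assume a constant risk per household within each zone, so each pixel receives a share of the zone count proportional to its households. This is a sketch under that assumption. The per-pixel household numbers are made up, and SURPOP itself is not implemented here; its output is simply taken as a given input list.

```python
def naive_household_surface(zone_of_pixel, counts, households):
    """Allocate each zone's count to pixels in proportion to households."""
    zone_households = {}
    zone_sizes = {}
    for z, h in zip(zone_of_pixel, households):
        zone_households[z] = zone_households.get(z, 0.0) + h
        zone_sizes[z] = zone_sizes.get(z, 0) + 1
    surface = []
    for z, h in zip(zone_of_pixel, households):
        if zone_households[z] > 0:
            surface.append(counts[z] * h / zone_households[z])
        else:
            # No household data for the zone: fall back to an even split.
            surface.append(counts[z] / zone_sizes[z])
    return surface


# Hypothetical example: the third pixel of zone "a" has no households.
zone_of_pixel = ["a", "a", "a", "b", "b"]
households = [10.0, 30.0, 0.0, 5.0, 5.0]
counts = {"a": 8.0, "b": 3.0}
surface = naive_household_surface(zone_of_pixel, counts, households)
```

A pixel with no households receives no burglaries, which a count-based smoother could not guarantee.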

Routine Activity Theory: We now assume risk per household is smooth. Perhaps in line with Cohen & Felson’s routine activity theory? Offenders choose targets according to their usual movement patterns. Familiarity with a pixel suggests familiarity with its neighbours. But potential targets have to be there as well!

Evaluation: Camstats web site. Monthly household burglary rates from April 2003 to March 2006, aggregated over a number of different zones. Models are calibrated by UK census wards (64x64 pixels), then tested against two special interest areas: Camden Town and King’s Cross.

Results: Table with columns Method, King’s Cross, Camden Town and rows Pycnophylactic (HH), Pycnophylactic, Naive (HH), Naive, Roughness (HH), Roughness (numeric values missing from the transcript). Numbers are mean absolute deviations in estimated burglary counts; the lowest is shown in red, the runner-up in green.
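
The evaluation measure named on this slide, mean absolute deviation, can be sketched as below. The assumption here is that the deviation is averaged over the monthly counts for a test area; the slide does not say exactly what is averaged, and the numbers are invented purely to show the calculation, not taken from the results table.

```python
def mean_absolute_deviation(estimated, observed):
    """Average absolute difference between estimated and observed counts."""
    if len(estimated) != len(observed):
        raise ValueError("estimated and observed must have the same length")
    return sum(abs(e - o) for e, o in zip(estimated, observed)) / len(observed)


# Hypothetical monthly burglary counts for one test area (made-up data).
estimated = [10.0, 12.0, 9.0]
observed = [11.0, 12.0, 6.0]
mad = mean_absolute_deviation(estimated, observed)
```

A lower value means the re-aggregated surface tracks the observed counts for the test area more closely.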

Discussion: Is simplest best? Further findings show that simple estimators work best on areas close to the edge of the region, but smoothing-based approaches work best further inside the region.

Camden Isn’t An ISLAND!

Consequences: Smoothing-based approaches ‘borrow information’ from nearby places (cf. Tobler’s First Law of Geography: everything is related to everything else, but near things are more related than distant things). Because Camden isn’t an island, things are going on beyond the ‘edges’. But we don’t know what they are! So we can’t reliably borrow information, and simpler methods probably perform better near the ‘edges’.

A Real-World Problem: In practice, organisations sub-divide data geographically. But without data sharing, individual regions appear (at least mathematically) as islands!

Conclusions - Further Work? For Camden Town, the roughness penalty performed best. For King’s Cross, the naive method worked best. In both cases, taking household density into account proved best. Open questions: edge effects? Merging predictors? Further work: kernel-based approaches...