IS415 Geospatial Analytics for Business Intelligence


IS415 Geospatial Analytics for Business Intelligence Lesson 10: Geospatial Data Analysis Part 1: Lattice Analysis

What you will learn from this lesson: the differences between GIS analysis and geospatial data analysis; the challenges faced in analysing geospatial data; the basic concepts of point patterns and point pattern analysis techniques.

Core Competencies: able to set up a geospatial web server; able to design and implement web map services that conform to the OGC WMS standard; able to design and implement web feature services that conform to the OGC WFS standard.

Tobler’s First Law of Geography: everything is related to everything else, but near things are more related than distant things. Structure of spatial dependence: distance decay; closeness = similarity. Source: http://en.wikipedia.org/wiki/Tobler%27s_first_law_of_geography and http://en.wikipedia.org/wiki/Waldo_R._Tobler

Question?

Defining Spatial Weight Matrices. Adjacency criterion: $w_{ij} = 1$ if location $j$ is adjacent to $i$, and $w_{ij} = 0$ if location $j$ is not adjacent to $i$. Distance criterion: $w_{ij}(d) = 1$ if location $j$ is within distance $d$ from $i$, and $w_{ij}(d) = 0$ otherwise. A general spatial distance weight matrix: $w_{ij}(d) = d_{ij}^{-a}\,\beta^{b}$.
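The adjacency and distance criteria above can be sketched in a few lines of plain Python. The centroids, the neighbour list, and the function names are hypothetical illustration data, not part of the lesson; the inverse-distance function implements only a distance-decay term with an assumed exponent a.

```python
import math

# Hypothetical data: centroids of five areal units and, for each unit,
# the indices of the units that share a border with it.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (3.0, 3.0)]
neighbours = {0: [1, 3], 1: [0, 2], 2: [1], 3: [0], 4: []}


def adjacency_weights(neighbours, n):
    """Adjacency criterion: w_ij = 1 if j is adjacent to i, else 0."""
    w = [[0.0] * n for _ in range(n)]
    for i, adjacent in neighbours.items():
        for j in adjacent:
            w[i][j] = 1.0
    return w


def distance_band_weights(coords, d):
    """Distance criterion: w_ij(d) = 1 if j is within distance d of i, else 0."""
    n = len(coords)
    return [[1.0 if i != j and math.dist(coords[i], coords[j]) <= d else 0.0
             for j in range(n)] for i in range(n)]


def inverse_distance_weights(coords, a=1.0):
    """Distance decay only: w_ij = d_ij ** -a for i != j (diagonal kept at 0)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) ** -a if i != j else 0.0
             for j in range(n)] for i in range(n)]


w_adj = adjacency_weights(neighbours, len(coords))
w_band = distance_band_weights(coords, d=1.0)
w_inv = inverse_distance_weights(coords, a=2.0)
```

Unit 4 has no neighbour under either binary criterion, which is the kind of isolated observation that usually needs attention before any autocorrelation statistic is computed.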

Spatial Weight Matrix: Contiguity Weight

Identifying Global Patterns of Spatial Distribution. Moran's I (Z value): positive, observations tend to be similar; negative, observations tend to be dissimilar; approximately zero, observations are arranged randomly over space. Geary's C: a large C value (> 1) indicates that observations tend to be dissimilar; a small C value (< 1) indicates that they tend to be similar.
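As a minimal sketch of the two global statistics, the code below uses their standard textbook definitions: Moran's I is $(n/S_0)\sum_{ij} w_{ij} z_i z_j / \sum_i z_i^2$ and Geary's C is $((n-1)/(2S_0))\sum_{ij} w_{ij}(x_i - x_j)^2 / \sum_i z_i^2$, with $z_i = x_i - \bar{x}$ and $S_0$ the sum of all weights. The four-unit chain and its values are hypothetical, and the sketch returns the raw statistics only, not the Z values used for significance testing.

```python
def morans_i(values, w):
    """Global Moran's I for a list of values and a weight matrix w."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


def gearys_c(values, w):
    """Global Geary's C for a list of values and a weight matrix w."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(sum(row) for row in w)
    sq_diff = sum(w[i][j] * (values[i] - values[j]) ** 2
                  for i in range(n) for j in range(n))
    return ((n - 1) / (2.0 * s0)) * sq_diff / sum((v - mean) ** 2 for v in values)


# Hypothetical example: four units in a row, each adjacent to the next.
w_chain = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]
smooth = [1.0, 2.0, 3.0, 4.0]       # neighbours hold similar values
alternating = [1.0, 4.0, 1.0, 4.0]  # neighbours hold dissimilar values

i_smooth, c_smooth = morans_i(smooth, w_chain), gearys_c(smooth, w_chain)
i_alt, c_alt = morans_i(alternating, w_chain), gearys_c(alternating, w_chain)
```

The smooth gradient gives a positive I and a C below 1, while the alternating pattern gives a negative I and a C above 1, matching the interpretation rules on the slide.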

Identifying Local Patterns of Spatial Distribution. Local Moran: significant and positive if location i is surrounded by values similar to its own (high among high, or low among low); significant and negative if location i is surrounded by dissimilar values (a spatial outlier). Local Geary: a significant and small Local Geary (t < 0) suggests a positive spatial association (similarity); a significant and large Local Geary (t > 0) suggests a negative spatial association (dissimilarity).
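A minimal sketch of the local Moran statistic, assuming the common form $I_i = (z_i / m_2) \sum_j w_{ij} z_j$ with $z_i = x_i - \bar{x}$ and $m_2 = \sum_i z_i^2 / n$. The chain of four units and its values are hypothetical, and no significance test (for example a permutation test) is included.

```python
def local_moran(values, w):
    """Local Moran I_i for every location, given a weight matrix w."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    return [z[i] / m2 * sum(w[i][j] * z[j] for j in range(n))
            for i in range(n)]


# Hypothetical example: four units in a row, each adjacent to the next.
w_chain = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]

local_smooth = local_moran([1.0, 2.0, 3.0, 4.0], w_chain)
local_alternating = local_moran([1.0, 4.0, 1.0, 4.0], w_chain)
```

With these definitions the local values sum to $S_0$ times the global Moran's I, which is why the local statistic is often read as a decomposition of the global one.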

Modifiable Areal Unit Problem (MAUP). Aggregation problem: special case of ecological fallacy; spatial heterogeneity; a million spatial autocorrelation coefficients. Zonation problem: both in size and spatial arrangement.

Why do relationships vary spatially? Sampling variation: nuisance variation, not real spatial non-stationarity. Relationships intrinsically different across space: real spatial non-stationarity. Model misspecification: can significant local variations be removed?

Some definitions. Spatial non-stationarity: the same stimulus provokes a different response in different parts of the study region. Global models: statements about processes which are assumed to be stationary and as such are location independent. Local models: spatial decompositions of global models; the results of local models are location dependent, a characteristic we usually anticipate from geographic (spatial) data.

Regression. Regression establishes the relationship between a dependent variable and a set of independent variables. A typical linear regression model looks like: $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_n x_{ni} + \varepsilon_i$, with $y_i$ the dependent variable, $x_{ji}$ ($j$ from 1 to $n$) the set of independent variables, and $\varepsilon_i$ the residual, all at location $i$. When applied to spatial data, the model assumes a stationary spatial process: the same stimulus provokes the same response in all parts of the study region. This assumption is highly untenable for spatial processes.

Geographically weighted regression. A local statistical technique to analyze spatial variations in relationships. Spatial non-stationarity is assumed and will be tested. Based on the “First Law of Geography”: everything is related to everything else, but closer things are more related.

GWR. Addresses the non-stationarity directly. Allows the relationships to vary over space, i.e., the $\beta$s do not need to be the same everywhere. This is the essence of GWR; in linear form: $y_i = \beta_{i0} + \beta_{i1} x_{1i} + \beta_{i2} x_{2i} + \dots + \beta_{in} x_{ni} + \varepsilon_i$. Instead of remaining the same everywhere, the $\beta$s now vary with location ($i$).
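The idea of location-specific coefficients can be sketched with a single predictor: at each location a separate weighted least-squares line is fitted, with nearby observations weighted more heavily. Everything below is an illustrative assumption rather than the lesson's own implementation: the synthetic data (whose true slope grows from west to east), the Gaussian weight, the bandwidth of 2.0, and the function names.

```python
import math


def gaussian_weight(dist, bandwidth):
    """Weight that decays smoothly with distance from the regression point."""
    return math.exp(-0.5 * (dist / bandwidth) ** 2)


def local_fit(target, coords, x, y, bandwidth):
    """Weighted least squares for y = b0 + b1 * x, centred on `target`."""
    w = [gaussian_weight(math.dist(target, c), bandwidth) for c in coords]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Synthetic, noise-free data: ten locations along a line; the true slope
# at location u is 1 + 0.5 * u, so the relationship is non-stationary.
coords = [(float(u), 0.0) for u in range(10)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0]
y = [(1.0 + 0.5 * c[0]) * xi for c, xi in zip(coords, x)]

# One (b0, b1) pair per location: the local, GWR-style estimates.
local_betas = [local_fit(c, coords, x, y, bandwidth=2.0) for c in coords]

# A very large bandwidth gives every observation the same weight,
# which reduces the local fit to the ordinary global regression.
global_beta = local_fit(coords[0], coords, x, y, bandwidth=1e9)
```

The single global slope sits between the local slopes estimated at the two ends of the line, which is the averaging effect that a global model imposes on a non-stationary process.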

A Hedonic House Pricing Model. Housing hedonic model in Milwaukee. Data: MPROP 2004, 3430+ samples used. Dependent variable: the assessed value (price). Independent variables: air conditioner, floor size, fireplace, house age, number of bathrooms, and soil and impervious surface (acquired by remote sensing).

The global model

The global model. 62% of the dependent variable’s variation is explained. All determinants are statistically significant. Floor size is the largest positive determinant; house age is the largest negative determinant. A deteriorated environmental condition (a large portion of soil and impervious surface) has a significant negative impact.

GWR run: summary. Number of nearest neighbors for calibration: 176 (adaptive scheme). AIC: 76317.39 (global: 81731.63). GWR performs better than the global model.

GWR run: non-stationarity check
Variable | F statistic | Numerator DF | Denominator DF* | Pr (> F)
Floor Size | 2.51 | 325.76 | 1001.69 | 0.00
House Age | 1.40 | 192.81 | 1001.69 |
Fireplace | 1.46 | 80.62 | | 0.01
Air Conditioner | 1.23 | 429.17 | |
Number of Bathrooms | 2.49 | 262.39 | |
Soil & Imp. Surface | 1.42 | 375.71 | |
Tests are based on the variance of the coefficients; all independent variables vary significantly over space.

General conclusions. Except for floor size, the established relationships between house values and the predictors are not necessarily significant everywhere in the City. The same amount of change in these attributes (ceteris paribus) brings a larger change in house values for houses located near the Lake than for those farther away.

General conclusions. In the northwest and central eastern parts of the City, the relationship between house age and house value is the opposite of what the global model suggests. This is where the original immigrants built their houses, and historical value outweighs the negative impact of house age on house values.

Calibration of GWR. Local weighted least squares. Weights are attached to locations. Based on the “First Law of Geography”: everything is related to everything else, but closer things are more related than remote ones.

Weighting schemes. A weighting scheme determines the weights. Most schemes tend to be Gaussian or Gaussian-like, reflecting the type of dependency found in most spatial processes. A scheme can be either fixed or adaptive. Both kinds of scheme, based on Gaussian or Gaussian-like functions, are implemented in GWR3.0 and R.

Fixed weighting scheme Bandwidth Weighting function

Problems of fixed schemes. They might produce large estimate variances where data are sparse, while masking subtle local variations where data are dense. In extreme cases, fixed schemes might not be able to calibrate in local areas where data are too sparse to satisfy the calibration requirements (there must be more observations than parameters).
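The sparse-data problem of a fixed bandwidth can be illustrated with a small sketch. It assumes a Gaussian weighting function of the form $w_{ij} = \exp(-\tfrac{1}{2}(d_{ij}/h)^2)$ with a single bandwidth $h$ for every location; the point layout, the bandwidth of 1.0, and the helper names are hypothetical.

```python
import math


def gaussian_kernel(dist, bandwidth):
    """Fixed Gaussian weight: 1 at the regression point, decaying with distance."""
    return math.exp(-0.5 * (dist / bandwidth) ** 2)


def effective_support(target, coords, bandwidth):
    """Sum of kernel weights around `target`: a rough count of how many
    observations actually inform the local calibration there."""
    return sum(gaussian_kernel(math.dist(target, c), bandwidth) for c in coords)


# Hypothetical layout: a dense 5 x 5 cluster and three isolated points.
dense = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
sparse = [(10.0, 10.0), (13.0, 10.0), (10.0, 14.0)]
coords = dense + sparse

h = 1.0  # the same bandwidth everywhere (fixed scheme)
dense_support = effective_support((0.2, 0.2), coords, h)
sparse_support = effective_support((10.0, 10.0), coords, h)
```

With the same bandwidth, the regression point inside the cluster draws on almost all 25 nearby observations, while the isolated point is informed by little more than itself, too few observations to estimate even an intercept and one slope.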

Adaptive weighting schemes Weighting function Bandwidth

Adaptive weighting schemes. Adaptive schemes adjust themselves according to the density of the data: shorter bandwidths where data are dense and longer bandwidths where data are sparse. Finding the nearest neighbors is one of the most often used approaches.

Calibration. Surprisingly, the results of GWR appear to be relatively insensitive to the choice of weighting function, as long as it is a continuous distance-based function (Gaussian or Gaussian-like). Whichever weighting function is used, however, the result will be sensitive to the bandwidth(s).
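A minimal sketch of a nearest-neighbour adaptive scheme: the bandwidth at each regression point is set to the distance of its k-th nearest observation, and weights fall to zero at that distance. The bi-square function used here is one common Gaussian-like choice and is an assumption, as are the point layout, k = 10, and the function names.

```python
import math


def adaptive_bandwidth(target, coords, k):
    """Distance from `target` to its k-th nearest observation.
    If `target` is itself an observation, it counts as the nearest one."""
    return sorted(math.dist(target, c) for c in coords)[k - 1]


def bisquare_weights(target, coords, k):
    """Bi-square weights: (1 - (d/h)^2)^2 inside the bandwidth h, 0 outside."""
    h = adaptive_bandwidth(target, coords, k)
    weights = []
    for c in coords:
        d = math.dist(target, c)
        weights.append((1.0 - (d / h) ** 2) ** 2 if d < h else 0.0)
    return h, weights


# Hypothetical layout: a dense 5 x 5 cluster and three isolated points.
dense = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
sparse = [(10.0, 10.0), (13.0, 10.0), (10.0, 14.0)]
coords = dense + sparse

h_dense, w_dense = bisquare_weights((0.2, 0.2), coords, k=10)
h_sparse, w_sparse = bisquare_weights((10.0, 10.0), coords, k=10)
```

The bandwidth shrinks inside the cluster and stretches around the isolated point, so both local calibrations draw on a comparable number of observations.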

Calibration. An optimal bandwidth (or number of nearest neighbors) minimizes either the cross-validation (CV) score, which measures the difference between the observed values and the GWR-calibrated values obtained with that bandwidth or number of nearest neighbors, or the Akaike Information Criterion (AIC), an information criterion that accounts for the added complexity of GWR models.
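Bandwidth selection by cross-validation can be sketched as a search over candidate bandwidths. The sketch assumes the usual leave-one-out form of the CV score, the sum of squared differences between each observed value and the value predicted by a local fit calibrated without that observation. The synthetic data, the Gaussian weight, the candidate list, and the function names are hypothetical, and the AIC alternative is not implemented here.

```python
import math


def loo_prediction(i, coords, x, y, bandwidth):
    """Predict y[i] from a local fit of y = b0 + b1 * x that excludes observation i."""
    w = [0.0 if j == i
         else math.exp(-0.5 * (math.dist(coords[i], c) / bandwidth) ** 2)
         for j, c in enumerate(coords)]
    sw = sum(w)
    x_bar = sum(wj * xj for wj, xj in zip(w, x)) / sw
    y_bar = sum(wj * yj for wj, yj in zip(w, y)) / sw
    sxy = sum(wj * (xj - x_bar) * (yj - y_bar) for wj, xj, yj in zip(w, x, y))
    sxx = sum(wj * (xj - x_bar) ** 2 for wj, xj in zip(w, x))
    b1 = sxy / sxx
    return (y_bar - b1 * x_bar) + b1 * x[i]


def cv_score(coords, x, y, bandwidth):
    """Leave-one-out CV score: sum of squared prediction errors."""
    return sum((y[i] - loo_prediction(i, coords, x, y, bandwidth)) ** 2
               for i in range(len(y)))


# Synthetic, noise-free data with a slope that changes along the line.
coords = [(float(u), 0.0) for u in range(10)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0]
y = [(1.0 + 0.5 * c[0]) * xi for c, xi in zip(coords, x)]

# Candidate fixed bandwidths; 1e6 behaves like the global model.
candidates = [1.0, 2.0, 4.0, 8.0, 1e6]
scores = {h: cv_score(coords, x, y, h) for h in candidates}
best_bandwidth = min(scores, key=scores.get)
```

A grid search keeps the sketch short; in practice the candidate set could equally be a list of nearest-neighbour counts for an adaptive scheme, scored in the same way.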