Our Approach: Use a separate regression function for different regions. Problem: Need to find regions with a strong relationship between the dependent.

Slides:



Advertisements
Similar presentations
Regression Analysis: Estimating Relationships Regression Analysis is a study of relationship between a set of independent variables and the dependent variable.
Advertisements

CHAPTER 13: Alpaydin: Kernel Machines
Chapter 5 Multiple Linear Regression
Introduction to Smoothing and Spatial Regression
Statistical Techniques I EXST7005 Simple Linear Regression.
Geographically weighted regression
11 Pre-conference Training MCH Epidemiology – CityMatCH Joint 2012 Annual Meeting Intermediate/Advanced Spatial Analysis Techniques for the Analysis of.
Introduction to Applied Spatial Econometrics Attila Varga DIMETIC Pécs, July 3, 2009.
Applications of Scaling to Regional Flood Analysis Brent M. Troutman U.S. Geological Survey.
GIS and Spatial Statistics: Methods and Applications in Public Health
Correlation and Autocorrelation
1-1 Regression Models  Population Deterministic Regression Model Y i =  0 +  1 X i u Y i only depends on the value of X i and no other factor can affect.
Simple Linear Regression
1 Simple Linear Regression Chapter Introduction In this chapter we examine the relationship among interval variables via a mathematical equation.
CORRELATION AND REGRESSION Research Methods University of Massachusetts at Boston ©2006 William Holmes.
Bagging LING 572 Fei Xia 1/24/06. Ensemble methods So far, we have covered several learning methods: FSA, HMM, DT, DL, TBL. Question: how to improve results?
Why Geography is important.
1 An Introduction to Nonparametric Regression Ning Li March 15 th, 2004 Biostatistics 277.
Business Statistics - QBM117 Statistical inference for regression.
Introduction to Regression Analysis, Chapter 13,
Simple Linear Regression. Introduction In Chapters 17 to 19, we examine the relationship between interval variables via a mathematical equation. The motivation.
1 Simple Linear Regression 1. review of least squares procedure 2. inference for least squares lines.
Ensemble Learning (2), Tree and Forest
CSCI 347 / CS 4206: Data Mining Module 04: Algorithms Topic 06: Regression.
Spatial Regression Model For Build-Up Growth Geo-informatics, Mahasarakham University.
IS415 Geospatial Analytics for Business Intelligence
Data Mining Techniques
Chapter 7: Demand Estimation and Forecasting
Spatial Statistics and Spatial Knowledge Discovery First law of geography [Tobler]: Everything is related to everything, but nearby things are more related.
1 Least squares procedure Inference for least squares lines Simple Linear Regression.
Model Building III – Remedial Measures KNNL – Chapter 11.
Introduction to Linear Regression
Extracting Regional Knowledge from Spatial Datasets Christoph F. Eick Department of Computer Science, University of Houston 1.Motivation: Why is Regional.
Spatial Data Analysis Yaji Sripada. Dept. of Computing Science, University of Aberdeen2 In this lecture you learn What is spatial data and their special.
MTH 161: Introduction To Statistics
Chapter 9 Analyzing Data Multiple Variables. Basic Directions Review page 180 for basic directions on which way to proceed with your analysis Provides.
Geographic Information Science
Managerial Economics Demand Estimation & Forecasting.
Chapter 7: Demand Estimation and Forecasting McGraw-Hill/Irwin Copyright © 2011 by the McGraw-Hill Companies, Inc. All rights reserved.
Kernel Methods Jong Cheol Jeong. Out line 6.1 One-Dimensional Kernel Smoothers Local Linear Regression Local Polynomial Regression 6.2 Selecting.
Chapter 11 Correlation and Simple Linear Regression Statistics for Business (Econ) 1.
Geo479/579: Geostatistics Ch4. Spatial Description.
Stat 112 Notes 9 Today: –Multicollinearity (Chapter 4.6) –Multiple regression and causal inference.
Chapter 11 Statistical Techniques. Data Warehouse and Data Mining Chapter 11 2 Chapter Objectives  Understand when linear regression is an appropriate.
Spatial Analysis of Crime Data: A Case Study Mike Tischler Presented by Arnold Boedihardjo.
Copyright © 2010 Pearson Education, Inc Chapter Seventeen Correlation and Regression.
Correlation. Correlation Analysis Correlations tell us to the degree that two variables are similar or associated with each other. It is a measure of.
Correlation & Regression Analysis
Linear Correlation (12.5) In the regression analysis that we have considered so far, we assume that x is a controlled independent variable and Y is an.
Statistical methods for real estate data prof. RNDr. Beáta Stehlíková, CSc
Linear Regression Linear Regression. Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. Purpose Understand Linear Regression. Use R functions.
Geog. 579: GIS and Spatial Analysis - Lecture 10 Overheads 1 1. Aspects of Spatial Autocorrelation 2. Measuring Spatial Autocorrelation Topics: Lecture.
Tree and Forest Classification and Regression Tree Bagging of trees Boosting trees Random Forest.
Introduction Many problems in Engineering, Management, Health Sciences and other Sciences involve exploring the relationships between two or more variables.
URBDP 422 URBAN AND REGIONAL GEO-SPATIAL ANALYSIS
The simple linear regression model and parameter estimation
Why Model? Make predictions or forecasts where we don’t have data.
Labs Put your name on your labs! Layouts: Site 1 Photos Title Legend
Spatial statistics: Spatial Autocorrelation
Linear Regression.
Chapter 5 Part B: Spatial Autocorrelation and regression modelling.
Supervised Time Series Pattern Discovery through Local Importance
CH 5: Multivariate Methods
Ch9 Random Function Models (II)
Predict House Sales Price
Simple Linear Regression
Spatial Autocorrelation
Regression in the 21st Century
Spatial interpolation
Product moment correlation
Presentation transcript:

Our Approach: Use a separate regression function for different regions. Problem: Need to find regions with a strong relationship between the dependent and independent variable. Problems to be solved: 1. Discovering the Regions 2. Extracting Regional Regression Functions 3. Develop a method to select which regression function to use for a new object to be predicted. Idea Regional Regression Source:

Motivation In geo-referenced dataset, most relationships only exist at regional level but not at the global level. 1st law of geography: Everything is related to everything else but nearby things are more related than distant things (Tobler) Coefficient estimates in geo-referenced datasets spatially vary we need regression methods to discover regional coefficient estimates that captures underlying structure of data. Using human-made boundaries (zip code etc.) is not good idea since they do not reflect patterns in spatially variance. Regional Knowledge & Coefficient Estimates

Motivation Regression Trees Data is split in a top-down approach using a greedy algorithm; uses constants as regression functions Discovers only rectangular shapes Geographically Weighted Regression (GWR) an instance-based, local spatial statistical technique used to analyze spatial non-stationarity. generates a separate regression for each possible query point online determined using a grid or kernel a weight assigned to each observation that is based on its distance to the query point. Geo-Regression Analysis Methods

Motivation Regression Result: A positive linear regression line (Arsenic increases with increasing Fluoride concentration) Example 1: Why We Need Regional Knowledge? Fluoride Arsenic

Motivation A negative linear Regression line in both locations (Arsenic decreases with increasing Fluoride concentration) A reflection of Simpsons paradox[16]. Example 1: Why We Need Regional Knowledge? Fluoride Arsenic Location 1 Location 2

Motivation Example 2: Houston House Price Estimate Dependent variable: House_Price Independent variables: noOfRooms, squareFootage, yearBuilt, havePool, attachedGarage, etc..

Global Regression (OLS) produces the coefficient estimates, R 2 value, and error etc.. a model This model assumes all areas have same coefficients E.g. attribute havePool has a coefficient of +9,000 (~having a pool adds $9,000 to a house price) In reality this changes. A house of $100K and a house of $500K or different zip codes or locations. Having a pool in a house in luxury areas is very different (~$40K) than having a pool in a house in Suburbs(~$5K). Example 2: Houston House Price Estimate Motivation

Solution: To apply local regression to each zip code produces 50+ sets of parameter estimates it captures spatial variations in the relationship better than global model But it is very naïve and has problems there is spatial variation within zip codes assumes discontinuity but most spatial patterns are continuous and they do not stop & start at the border. Example 2: Houston House Price Estimate Motivation

Example 2: Houston House Price Estimate $180,000 $350,000 Houses A, B have very similar characteristics OLS produces single parameter estimates for predictor variables like noOfRooms, squareFootage, yearBuilt, etc

Motivation Example 2: Houston House Price Estimate If we use zip code as regions, they are in same region If we use a grid structure They are in different regions but some houses similar to B (lake view) are in same region with A and this will effect coefficient estimate More importantly, the house around U-shape lake show similar pattern and should be in the same region, we miss important information.