Briggs Henan University 2010


Regression in geoDA
Example regression analyses for Illiteracy Rate (ILLITERACY)
ChinaData.shp (n=35)
1. Simple regression with URBAN_POP_
ChinaData_29 (n=29)
2. Simple regression with URBAN_POP
3. Multiple regression with URBAN_POP and RMB_PC_UR_
4. Spatial lag and error multiple regression
5. Multiple regression with log of Illiteracy

Running Regression in geoDA: I
1. File > Open Shape File: ChinaData
Tools > Weights > Open or Create
--weights are needed to test for spatial autocorrelation (needed for Moran's I for residuals)
--generally, always use a weights file
You can begin directly with Methods > Regress if
--very large number of observations (over 1,000)
--no spatial weights
--data only in a .dbf file
2. Methods > Regress
--place the settings as shown on the slide
--if you have a large number of observations, do not [remainder of this callout is truncated in the transcript]

Running Regression in geoDA: II
Select one dependent variable
Select one or more independent variables
Select type of regression: Classic, Lag, or Error
Click RUN, then click SAVE
Warning (bug): the names are reversed in the save dialog. Use the suggested names.
Click OK to save these. This saves the values for Predicted Y and Residuals in the table
--use Table >> Promotion to see them in the table
--you can map them or draw graphs
--use Table >> Save to Shapefile if you want to keep them permanently

Running Regression in geoDA: III
Results are saved in a text file, in the same folder as the shapefile. You can rename it and change its location.
Click OK to see the results. (You can also open the file later with a program such as Notepad.)
--scroll to the end of the file, since results are added to the end if the file already exists
Warning: if you want the residuals (see previous slide), you must click Save before clicking OK
Click Reset to run a different regression

Summary: Running Regression in geoDA
File > Open Shape File: ChinaData
Tools > Weights > Open or Create (need weights to test for spatial autocorrelation in residuals)
Methods > Regress: place the settings as shown on the slide
Select variables as shown on the slide
Select type of regression: Classic, Lag, or Error
Click RUN, then click SAVE
Warning (bug): the names are reversed in the save dialog. Use the suggested names.
Click OK to save these. Use Table > Promotion to see them in the table.
Click OK in the Regression window to see results
--scroll to the end of the file, since results are added to the end if the file already exists

Regression for Provinces: n = 35
The next slide shows results from running a simple regression with ChinaData.shp
Y = Illiteracy rate (ILLITERACY)
X = % of population urban (URBAN_POP_)
All provinces included
Note problems with
--extreme value for Xizang/Tibet
--zeros (0) for missing data on the X variable (Taiwan, Macau, Hong Kong, P'eng-hu)
Solution: reduced the data set to 29 using ArcGIS (do not know how to do this in geoDA!)
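The clean-up step described above can also be sketched outside a GIS. The following is a minimal illustration in plain Python, not the ArcGIS procedure the author used. The record names and values are invented, and the helper `drop_problem_rows` is hypothetical; only the field names ILLITERACY and URBAN_POP_ come from the slides.

```python
# Hypothetical records: names and values are invented for illustration only.
provinces = [
    {"name": "A", "ILLITERACY": 6.1, "URBAN_POP_": 0.45},
    {"name": "B", "ILLITERACY": 9.8, "URBAN_POP_": 0.31},
    {"name": "C", "ILLITERACY": 4.2, "URBAN_POP_": 0.0},   # zero stands in for missing X
    {"name": "D", "ILLITERACY": 44.0, "URBAN_POP_": 0.20},  # extreme value on Y
]


def drop_problem_rows(rows, x_field, exclude_names=()):
    """Keep rows whose X value is non-zero and whose name is not excluded."""
    return [
        r for r in rows
        if r[x_field] != 0 and r["name"] not in exclude_names
    ]


# Zeros are dropped automatically; extreme cases are excluded by name,
# because deciding what counts as "extreme" is a judgment call.
cleaned = drop_problem_rows(provinces, "URBAN_POP_", exclude_names={"D"})
print([r["name"] for r in cleaned])
```

Excluding the extreme case by name rather than by a numeric threshold keeps the decision explicit and easy to report.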

Results for simple regression
Display table: Table > Promotion
Plot using: Explore > ScatterPlot
Note: mean of residuals is always zero
Scatter plots shown
--Total Variation: Illiteracy v. Urban Pop%
--Predicted by Regression: OLS_Predict v. Urban Pop%
--Residual Variation: OLS_Resid v. Urban Pop%
Extreme value identified by linking: Xizang/Tibet

Partitioning the Variance on Y
--Total Variation (Illiteracy v. Urban Pop%): deviations (Y - Ȳ); SS Total or Total Sum of Squares
--Predicted by Regression (OLS_Predict v. Urban Pop%): deviations (Ŷ - Ȳ); SS Regression or Explained Sum of Squares
--Residual Variation (OLS_Resid v. Urban Pop%): deviations (Y - Ŷ); SS Residual or Error Sum of Squares
Here Ȳ is the mean of Y and Ŷ is the value predicted by the regression, so SS Total = SS Regression + SS Residual
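The partition can be checked numerically. The sketch below fits a simple least-squares line in plain Python and confirms that the total sum of squares splits into the explained and residual parts. The x and y values are made up, and `simple_ols` is a helper written for this illustration, not a geoDA function.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Invented data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = simple_ols(x, y)
y_hat = [b0 + b1 * xi for xi in x]
y_bar = sum(y) / len(y)

ss_total = sum((yi - y_bar) ** 2 for yi in y)                    # (Y - mean of Y)
ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)           # (predicted - mean of Y)
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # (Y - predicted)

r_squared = ss_regression / ss_total
print(ss_total, ss_regression + ss_residual, r_squared)
```

The two printed sums agree up to floating-point rounding, and their ratio is the R2 reported in regression output.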

Simple Regression Results from GeoDA: general
Statistics for dependent variable: n = 35
Results for overall regression
--explains only 4.6% of variance in Y
--not statistically significant
--Sigma-square = variance of the estimate = 1368.89/33 = 41.4816
--SE of regression = standard error of the estimate = √41.4816 = 6.44062
Results for each regression coefficient
--Y = 11.3146 - 6.578X
--identical in simple regression: the probability for the overall F test and for the t test on the slope (F = t²)
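The figures on this slide can be reproduced from the standard formulas. The sketch below uses only numbers quoted in the slides (n = 35, residual sum of squares 1368.89, R2 = 0.046). Because those inputs are rounded, the derived values match the reported ones only approximately. The variable names are chosen for this illustration.

```python
import math

n = 35                  # observations
k = 1                   # independent variables in the simple regression
ss_residual = 1368.89   # residual sum of squares, as quoted on the slide
r2 = 0.046              # rounded R2, as quoted on the slide

df = n - k - 1                          # degrees of freedom: 33
sigma_sq = ss_residual / df             # variance of the estimate
se_regression = math.sqrt(sigma_sq)     # standard error of the estimate

adj_r2 = 1 - (1 - r2) * (n - 1) / df    # adjusted R2
f_stat = (r2 / k) / ((1 - r2) / df)     # overall F statistic

print(round(sigma_sq, 4), round(se_regression, 4))
print(round(adj_r2, 3), round(f_stat, 2))
```

With a single independent variable, the F statistic is the square of the slope's t statistic, which is why the two tests give the same probability.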

Simple Regression Results from GeoDA: spatial
Moran's I for regression residuals
--not statistically significant (p = .09)
Space > Univariate Moran for variable OLS_Resid gives the same results
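For readers who want to see what the statistic measures, here is a minimal sketch of Moran's I in plain Python, using the usual definition I = (n / S0) * (z'Wz / z'z), where z holds deviations from the mean and S0 is the sum of all weights. It computes the statistic only. It does not reproduce geoDA's significance test, and the four-unit weights matrix and the residual values are invented.

```python
def morans_i(values, w):
    """Moran's I for a list of values and a square weights matrix w."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)               # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    total = sum(zi * zi for zi in z)
    return (n / s0) * (cross / total)


# Four units in a row; neighbours share a border (binary contiguity).
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [1.0, 1.0, -1.0, -1.0]     # similar residuals sit next to each other
alternating = [1.0, -1.0, 1.0, -1.0]   # neighbours always differ

i_clustered = morans_i(clustered, w)
i_alternating = morans_i(alternating, w)
print(i_clustered, i_alternating)
```

Positive values indicate that similar residuals cluster in space; negative values indicate that neighbours tend to differ.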

Results with omitted observations: much better!
Data for China Provinces 29: excludes Xizang/Tibet, Macao, Hong Kong, Hainan, Taiwan, P'eng-hu
--now explains 33.41% of the variance
--statistically significant
--but the relationship is probably non-linear
--spatial autocorrelation not a problem

Multiple Regression Results, n = 29
Illiteracy with % Pop Urban and Urban Income
Overall results
Results for each variable
--% Pop Urban: significant
--Urban Income: not significant
Spatial results
--not significant

Residual Analysis: Illiteracy v. Urban Pop % and UrbanIncomePerCapita
Moran's I = .0226, p = 0.5520
Not statistically significant: no spatial autocorrelation in residuals

Spatial Error Model
Results illustrative only: not needed
Spatial error not significant

Spatial Lag Model
Results illustrative only: not needed
Spatial lag not significant

Regression Results Summary
Column groups: Overall (R2 to F-prob); Pop = Urban Pop; Inc = Urban Income; Sp = *Spatial Term; stat = test statistic; "." = no value shown

Model           R2     Adj R2  Akaike  F      F-prob  Pop coef  Pop stat  Pop prob  Inc coef  Inc stat  Inc prob  Sp coef  Sp stat  Sp prob
Simple-35       0.046  0.017   231.65  1.60   0.215   -6.58     -1.263    .         .         .         .         0.1636   1.678    0.0934
Simple-29       0.334  0.309   155.42  13.55  0.001   -16.15    -3.681    .         .         .         .         0.0272   0.578    0.5631
Multiple        0.384  0.337   155.16  8.11   0.002   -26.80    -3.151    0.000     0.00041   1.452     0.159     -0.0226  0.383    0.7015
Spatial Error   0.385  .       155.13  .      .       -27.02    -3.411    .         .         1.572     0.116     -0.0389  -0.162   0.8716
Spatial Lag     0.387  .       157.05  .      .       -26.00    -3.128    0.006     0.00040   1.486     0.137     0.0720   0.340    0.7339

Note: in the Spatial Error row two values are missing from the transcript; the placement of 1.572 and 0.116 under Urban Income is inferred from the pattern of the other rows.
*Spatial Term
--OLS: for Moran's I
--Lag: for W_Illiteracy
--Error: for Lambda
For Multiple Regression 29
--Robust LM (lag): 1.312, prob 0.2520
--Robust LM (error): 1.220, prob 0.2693

Note on: Variables Saved for Spatial Models
Again, labels are reversed. Use the suggested variable names.
--OLS_ indicates use of the classic model
--LAG_ indicates use of the Spatial Lag model
--ERR_ indicates use of the Spatial Error model
For the spatial lag model, there is a distinction between the residual and the prediction error. The latter is the difference between the observed value and the predicted value that uses only exogenous variables, rather than treating the spatial lag Wy as observed. (Documentation for GeoDa 0.9.5-i, page 53)
--Prediction error (xxx_PRDERR): calculated without including the spatial term
--Residual error (xxx_RESIDU): calculated including the spatial term
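The distinction can be made concrete with a small sketch. It assumes the usual spatial lag model y = rho*W*y + X*b + e and reads "prediction that uses only exogenous variables" as the reduced form (I - rho*W)^-1 X*b. That reading is an assumption here, not a statement of geoDA's exact computation. The weights, rho, and data are invented, and the helper names are hypothetical.

```python
def mat_vec(w, v):
    """Multiply square matrix w by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def lag_residuals(y, xb, w, rho):
    """Residuals that treat the spatial lag W*y as observed."""
    wy = mat_vec(w, y)
    return [yi - rho * wyi - xbi for yi, wyi, xbi in zip(y, wy, xb)]


def lag_prediction_errors(y, xb, w, rho, iterations=200):
    """Errors of the reduced-form prediction p, where p = rho*W*p + X*b.

    Solved by fixed-point iteration, which converges when |rho| < 1 and
    w is row-standardized (assumed here).
    """
    p = list(xb)
    for _ in range(iterations):
        wp = mat_vec(w, p)
        p = [rho * wpi + xbi for wpi, xbi in zip(wp, xb)]
    return [yi - pi for yi, pi in zip(y, p)]


# Invented example: three units in a row, row-standardized weights.
w = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
rho = 0.5
xb = [1.0, 2.0, 3.0]   # X*b, the part explained by the exogenous variables
y = [3.0, 4.5, 5.0]    # observed values

residuals = lag_residuals(y, xb, w, rho)
prediction_errors = lag_prediction_errors(y, xb, w, rho)
print(residuals)
print(prediction_errors)
```

Under these assumptions the two quantities are linked: the residual equals the prediction error minus rho times its spatial lag, so they coincide only when rho is zero.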

Improving the model
Relationship is non-linear: use log of Illiteracy
Table >> Add Column, then Table >> Field Calculator

The same plots using Excel
Relationship is non-linear
[Plots: Illiteracy v. Urban pop %; Log of Illiteracy v. Urban pop %]

Y = Log of Illiteracy
R2 increases from 38% to 83%!
Urban Income is now significant and Urban Population is not!
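Why a log transformation can raise R2 is easy to show on synthetic data. The sketch below builds a roughly exponential relationship, then compares R2 for the raw and the logged dependent variable. All values are invented, the natural log is an assumption (the slides do not say which base was used), and `r_squared` is a helper written for this illustration.

```python
import math


def r_squared(x, y):
    """R2 of a simple regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return (sxy * sxy) / (sxx * syy)


# Invented data: y decays exponentially with x, with small fixed perturbations.
x = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
noise = [1.05, 0.97, 1.02, 0.96, 1.04, 0.99, 1.01]
y = [math.exp(3.0 - 4.0 * xi) * f for xi, f in zip(x, noise)]

log_y = [math.log(yi) for yi in y]   # the transformed dependent variable

r2_raw = r_squared(x, y)
r2_log = r_squared(x, log_y)
print(round(r2_raw, 3), round(r2_log, 3))
```

The straight line fits the logged values far better because the transformation turns the curve into an approximately linear relationship.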

Log of Illiteracy: makes relationship linear
Column groups: Overall (R2 to F-prob); Pop = Urban Pop; Inc = Urban Income; Sp = *Spatial Term; stat = test statistic; "." = no value shown

Model           R2     Adj R2  Akaike  F      F-prob  Pop coef  Pop stat  Pop prob  Inc coef  Inc stat  Inc prob  Sp coef  Sp stat  Sp prob
Simple-35       0.046  0.017   231.65  1.60   0.215   -6.58     -1.263    .         .         .         .         0.1636   1.678    0.0934
Simple-29       0.334  0.309   155.42  13.55  0.001   -16.15    -3.681    .         .         .         .         0.0272   0.578    0.5631
Multiple        0.384  0.337   155.16  8.11   0.002   -26.80    -3.151    0.000     0.00041   1.452     0.159     -0.0226  0.383    0.7015
Multiple Log Y  0.837  0.824   560.07  66.69  .       -3962.73  -1.800    0.083     -6446.67  -2.975    0.006     -0.1192  -0.548   0.5839

*Spatial Term, OLS: for Moran's I
Urban Income now significant, and % urban not significant.
--these two variables are highly intercorrelated
--see next slide

Inter-Correlation between Urban Population and Urban Income
N = 29
R2 for Urban Pop versus Urban Income: 0.84
R is .92
[Scatter plot: Urban Population v. Urban Income]

Creating a better model
Transforming dependent and/or independent variables can often improve the predictive capability of regression models
geoDA has several capabilities to support this
--Table >> Add Column, then use Table >> Field Calculator

Other software options for multiple regression
Multiple regression of the type discussed here is not available in ArcGIS
--only geographically weighted regression is available
--(there is a multiple regression for raster data, but it is only in ArcInfo Workstation and is difficult to use)
Use geoDA to create spatial lag variables, then use standard statistical packages such as SAS, SPSS or STATA
Use R
--free open source software, but difficult to use
--http://cran.r-project.org/web/views/Spatial.html
CrimeStat III has some support for spatial regression
--http://www.icpsr.umich.edu/NACJD/crimestat.html
For a good list of spatial software sources, go to
--http://en.wikipedia.org/wiki/List_of_spatial_analysis_software
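A spatial lag variable is simply the weighted average of a variable over each unit's neighbours, so it can also be computed by hand before moving to a statistical package. The sketch below is plain Python, not geoDA's implementation. It assumes row-standardized contiguity weights; the adjacency matrix and values are invented.

```python
def row_standardize(adjacency):
    """Scale each row of a 0/1 adjacency matrix so that it sums to 1."""
    weights = []
    for row in adjacency:
        total = sum(row)
        # Units with no neighbours keep a row of zeros.
        weights.append([v / total if total else 0.0 for v in row])
    return weights


def spatial_lag(weights, values):
    """Weighted average of neighbouring values for each unit (W*y)."""
    return [sum(wij * vj for wij, vj in zip(row, values)) for row in weights]


# Invented example: unit 0 borders units 1 and 2; units 1 and 2 border only unit 0.
adjacency = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
illiteracy = [10.0, 20.0, 30.0]

w = row_standardize(adjacency)
w_illiteracy = spatial_lag(w, illiteracy)
print(w_illiteracy)
```

The resulting column can be exported alongside the original data and used as an ordinary variable in SAS, SPSS, STATA, or R.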

What have we learned today?
How to use geoDA to run
--classic regression models
--Spatial Lag models
--Spatial Error models
Importance of examining data for "problems"
--can have a very large effect on results
--missing data and zeros
--extreme values can dominate results
Using transformations to create a better model


Geographically Weighted Regression

Geographically Weighted Regression
The idea of Local Indicators can also be applied to regression
It is called geographically weighted regression
It calculates a separate regression for each polygon and its neighbors, then maps the parameters from the model, such as the regression coefficient (b) and/or its significance value
Mathematically, this is done by applying the spatial weights matrix (Wij) to the standard formulae for regression
See Fotheringham, Brunsdon and Charlton, Geographically Weighted Regression, Wiley, 2002
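The "separate regression for each polygon and its neighbors" idea can be sketched in a few lines. The version below is a deliberate simplification: it uses one independent variable and binary contiguity weights (each polygon plus the polygons it borders), whereas GWR software normally uses a distance-decay kernel with a bandwidth. The data and function names are invented for this illustration.

```python
def weighted_simple_ols(x, y, w):
    """Weighted least-squares line; returns (intercept, slope)."""
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def local_regressions(x, y, contiguity):
    """One regression per polygon, using the polygon itself and its neighbours."""
    n = len(x)
    results = []
    for i in range(n):
        # Weight 1 for polygon i and its neighbours, 0 for everything else.
        w = [1.0 if j == i else float(contiguity[i][j]) for j in range(n)]
        results.append(weighted_simple_ols(x, y, w))
    return results


# Invented example: five polygons in a row, each bordering the next.
contiguity = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * xi + 1.0 for xi in x]   # the same relationship everywhere

local_fits = local_regressions(x, y, contiguity)
for intercept, slope in local_fits:
    print(round(intercept, 6), round(slope, 6))
```

Because the relationship here is identical everywhere, every local fit returns the same intercept and slope; with real data the mapped coefficients would vary from place to place. The end polygons are fitted on only two observations each, which illustrates how little data each local regression may rest on.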

Problems with Geographically Weighted Regression
Each regression is based on few observations, so the estimates of the regression parameters (b) are unreliable
Need to use more observations than just those with a shared border, but how far out do we go? How far out is the "local effect"?
Need strong theory to explain why the regression parameters are different at different places
Serious questions about the validity of statistical inference tests, since observations are not independent

GWR in ArcGIS
Requires ArcInfo, Spatial Analyst or Geostatistical Analyst license
A shapefile is created: open its table to see results
--for each polygon there are standard regression results
Condition variable: indicates when the results are unstable due to local multicollinearity
--results not good if condition > 30, Null, or -1.79e+308
Use Source_ID to join with FID of original data to identify observations
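The rule of thumb for the condition variable is easy to apply once the output table has been exported. The sketch below encodes the three cases listed on the slide in plain Python. The function name and the sample values are invented, and treating NaN as unusable is an added assumption.

```python
import math

# Sentinel quoted on the slide; the stored value may be slightly more negative,
# so anything at or below it is treated as the sentinel.
UNSTABLE_SENTINEL = -1.79e+308


def is_unstable(condition):
    """True if a local regression should not be trusted, per the slide's rule."""
    if condition is None:                 # Null in the output table
        return True
    if math.isnan(condition):             # assumption: treat NaN like Null
        return True
    if condition <= UNSTABLE_SENTINEL:    # sentinel for "could not be computed"
        return True
    return condition > 30                 # local multicollinearity


# Invented sample of condition values.
sample = [12.0, 45.2, None, -1.7976931348623157e+308]
flags = [is_unstable(c) for c in sample]
print(flags)
```

Rows flagged this way are best excluded from maps of local coefficients rather than interpreted.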

Usage Tips from ArcGIS Help
Use projected data
Observations included in each regression depend on the kernel type, bandwidth method and bandwidth distance parameters set by the user
--max of 1,000 observations in any one local regression
Multicollinearity (intercorrelation between independent variables) can be a problem
--if variables cluster spatially
--if you use binary/nominal/categorical variables
--never use dummy variables (1/0) to index spatial regions
Not appropriate for small data sets: need several hundred observations
Shapefiles cannot store "null" values: they are treated as zero. Be sure there is no missing data

Running GWR in ArcGIS

Execution Dialog for GWR in ArcGIS
Results are presumably for the global regression?
--R2 value does not agree with results from geoDA?

Mapping Results from GWR in ArcGIS
(Default) standardized residuals
--the bigger the absolute value, the poorer the prediction?
Regression coefficient for % Urban Pop
--larger impact of urban pop in south east China

Join with the original shapefile using FID and Source_Id in order to identify provinces
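The join itself is a key lookup: each GWR output row carries a Source_ID that matches the FID of one record in the original data. A minimal sketch in plain Python follows. Only the key names FID and Source_ID come from the slides; the other field names and all values are invented.

```python
# Invented stand-ins for the two attribute tables.
original_rows = [
    {"FID": 0, "NAME": "Province A"},
    {"FID": 1, "NAME": "Province B"},
    {"FID": 2, "NAME": "Province C"},
]
gwr_rows = [
    {"Source_ID": 2, "LocalR2": 0.52},   # LocalR2 is a hypothetical field name
    {"Source_ID": 0, "LocalR2": 0.41},
]


def join_on_fid(gwr_table, original_table):
    """Attach original attributes to each GWR row where Source_ID equals FID."""
    by_fid = {row["FID"]: row for row in original_table}
    joined = []
    for row in gwr_table:
        match = by_fid.get(row["Source_ID"])
        if match is not None:             # skip rows with no matching FID
            joined.append({**match, **row})
    return joined


joined = join_on_fid(gwr_rows, original_rows)
for row in joined:
    print(row["NAME"], row["LocalR2"])
```

Building the lookup dictionary once keeps the join fast even for tables with many rows.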

GWR output: R2 and Y values
Output table (part). (Columns reordered. Highlighted columns obtained from join with original data.)
Observed: values on the dependent variable Y
Predicted values and residuals are based upon each local regression and are not the same as those for a global regression.

GWR output: regression coefficients and standard errors
Columns shown: standard error of the estimate; regression coefficients (b); standard error of the coefficients
No statistical significance results provided
--statistical significance tests in GWR have been severely criticized