
Robust Regression

Regression Methods
We are going to look at four approaches to robust regression:
• Regression with robust standard errors
• Regression with robust standard errors, using the cluster option
• Regression with random effects
• Regression with fixed effects

We will look at a model that predicts the API 2000 scores. Our focus is whether average class size in kindergarten through grade 3 (acs_k3) and average class size in grades 4 through 6 (acs_46) affect academic performance.

We use a new data set, mapi2.

4.1.1 Regression with Robust Standard Errors
The Stata regress command includes a robust option for estimating the standard errors using the Huber-White sandwich estimator. Such robust standard errors can address several minor failures to meet the regression assumptions:
• minor problems with normality
• heteroscedasticity
• some observations that exhibit large residuals, leverage, or influence
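For reference, the sandwich estimator for OLS is usually written as below. Here X is the n × k design matrix (including the intercept column), x_i is its i-th row, and ê_i are the OLS residuals. The n/(n-k) factor is the small-sample adjustment Stata applies with the robust option (often labelled HC1); treat this as a sketch of the standard formula rather than a transcription of Stata's source.

```latex
\hat\beta = (X^\top X)^{-1} X^\top y, \qquad \hat e_i = y_i - x_i^\top \hat\beta

\widehat{V}_{\mathrm{OLS}} = s^2 (X^\top X)^{-1}, \qquad s^2 = \frac{1}{n-k}\sum_{i=1}^{n} \hat e_i^{2}

\widehat{V}_{\mathrm{robust}} = \frac{n}{n-k}\,(X^\top X)^{-1}\Big(\sum_{i=1}^{n} \hat e_i^{2}\, x_i x_i^\top\Big)(X^\top X)^{-1}
```

The "bread" (X'X)^{-1} is the same in both; only the "meat" changes, replacing one pooled variance s^2 with each observation's own squared residual. The coefficient vector itself is untouched.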

With the robust option, the point estimates of the coefficients are exactly the same as in OLS, but the standard errors take into account heterogeneity of the error variance and lack of normality.
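That the robust option changes only the standard errors can be checked numerically. The sketch below uses NumPy on simulated heteroscedastic data (not the school data); it computes the OLS coefficients once and derives both classical and sandwich standard errors from them. The HC1 scaling n/(n-k) is assumed to correspond to Stata's robust option.

```python
import numpy as np


def ols_with_robust_se(X, y):
    """OLS coefficients with classical and HC1 (Huber-White) standard errors.

    X is an n-by-k design matrix that already contains a column of ones.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y              # identical point estimates either way
    resid = y - X @ beta

    # Classical OLS variance: s^2 (X'X)^-1 with one pooled error variance
    s2 = resid @ resid / (n - k)
    se_ols = np.sqrt(np.diag(s2 * XtX_inv))

    # Sandwich: bread (X'X)^-1, meat sum_i e_i^2 x_i x_i', HC1 factor n/(n-k)
    meat = (X * resid[:, None] ** 2).T @ X
    V_robust = n / (n - k) * XtX_inv @ meat @ XtX_inv
    se_robust = np.sqrt(np.diag(V_robust))
    return beta, se_ols, se_robust


# Simulated (hypothetical) data whose error spread grows with x
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(15, 25, size=n)
X = np.column_stack([np.ones(n), x])
y = 700 - 5 * x + rng.normal(0, 1, size=n) * 10 * (x - 14)

beta, se_ols, se_robust = ols_with_robust_se(X, y)
print("coefficients:", beta)
print("OLS SE:      ", se_ols)
print("robust SE:   ", se_robust)
```

Because the error variance grows with x here, the two sets of standard errors differ even though the coefficients are shared.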

Using the Cluster Option
The elemapi2 dataset contains data on 400 schools that come from 37 school districts. It is quite possible that the scores within each school district are not independent, which could lead to residuals that are correlated within districts. We can use the cluster option to indicate that the observations are clustered into districts (based on dnum) and that they may be correlated within districts but are independent between districts.

As with the robust option, the coefficient estimates are the same as the OLS estimates, but the standard errors take into account that the observations within districts are not independent. If you have a very small number of clusters compared to your overall sample size, the standard errors could be considerably larger than the OLS results. For example, if there were only 3 districts, the standard errors would effectively be computed from the aggregate scores of just 3 districts.
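A minimal sketch of the cluster-robust idea, assuming NumPy and simulated data that only mimics the slide's structure (400 schools in 37 districts): residual-weighted scores are summed within each cluster before forming the sandwich "meat", so cross-products are kept within a district but not between districts. The finite-sample factor G/(G-1) · (n-1)/(n-k) is assumed to match Stata's regress with clustering.

```python
import numpy as np


def cluster_robust_se(X, y, groups):
    """OLS coefficients with cluster-robust standard errors.

    groups holds one cluster label per row (for example a district id).
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y               # same coefficients as plain OLS
    resid = y - X @ beta

    labels = np.unique(groups)
    meat = np.zeros((k, k))
    for g in labels:
        in_g = groups == g
        score = X[in_g].T @ resid[in_g]    # residual-weighted sum within cluster g
        meat += np.outer(score, score)     # cross-products kept within clusters only

    G = len(labels)
    c = G / (G - 1) * (n - 1) / (n - k)    # finite-sample adjustment
    V = c * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))


# Hypothetical data: 400 schools in 37 districts with a shared district shock
rng = np.random.default_rng(1)
n, G = 400, 37
district = rng.integers(0, G, size=n)
x = rng.uniform(15, 25, size=n) + rng.normal(0, 2, size=G)[district]
shock = rng.normal(0, 30, size=G)[district]
y = 700 - 5 * x + shock + rng.normal(0, 20, size=n)
X = np.column_stack([np.ones(n), x])

beta, se_cluster = cluster_robust_se(X, y, district)
print("coefficients:", beta)
print("cluster SE:  ", se_cluster)
```

If every observation is put in its own cluster, the formula collapses to the HC1 robust standard errors; grouping by district is what lets residuals covary within a district.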

Control for a random effect (school district)
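The slides do not spell out the random-effects model. A common form, sketched here under the assumption that only the two class-size predictors are included (other covariates would add further terms), is a random-intercept model in which school i in district j shares a district-level effect u_j. The outcome label "api" is chosen here for illustration.

```latex
\mathrm{api}_{ij} = \beta_0 + \beta_1\,\mathrm{acs\_k3}_{ij} + \beta_2\,\mathrm{acs\_46}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)

\operatorname{Corr}(\mathrm{api}_{ij}, \mathrm{api}_{i'j} \mid X) = \rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}, \qquad i \neq i'
```

The shared u_j is what makes two schools in the same district correlated. The random-effects estimator also assumes u_j is uncorrelated with the predictors; the fixed-effect approach does not need that assumption.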

Control for a fixed effect (school district)
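One way to see what controlling for a district fixed effect does: subtracting each district's mean from the outcome and the predictors (the "within" transformation) gives the same slopes as adding one indicator variable per district. The sketch below assumes NumPy and simulated data in which the district effect is correlated with two class-size-like predictors; the helper names are hypothetical.

```python
import numpy as np


def within_fe(x, y, groups):
    """Fixed-effects slopes via the within (group-demeaning) transformation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xd, yd = x.copy(), y.copy()
    for g in np.unique(groups):
        in_g = groups == g
        xd[in_g] -= x[in_g].mean(axis=0)   # remove the district mean of each predictor
        yd[in_g] -= y[in_g].mean()         # and of the outcome
    beta, *_ = np.linalg.lstsq(xd, yd, rcond=None)
    return beta


def dummy_fe(x, y, groups):
    """Same slopes from OLS with one indicator column per group (LSDV)."""
    x = np.asarray(x, dtype=float)
    labels = np.unique(groups)
    D = (groups[:, None] == labels[None, :]).astype(float)
    Z = np.column_stack([x, D])            # the dummies replace the intercept
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[: x.shape[1]]


# Hypothetical data: predictors correlated with a district effect
rng = np.random.default_rng(2)
n, G = 400, 37
district = rng.integers(0, G, size=n)
effect = rng.normal(0, 40, size=G)[district]
x = np.column_stack([rng.uniform(15, 25, size=n), rng.uniform(25, 35, size=n)])
x += effect[:, None] / 20
y = 750 - 3 * x[:, 0] - 2 * x[:, 1] + effect + rng.normal(0, 20, size=n)

print("within slopes:", within_fe(x, y, district))
print("dummy slopes: ", dummy_fe(x, y, district))
```

The equality of the two sets of slopes is the Frisch-Waugh-Lovell result; it also means any predictor that is constant within a district is absorbed by the fixed effect.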

2.4 Examine the Distribution Assumption
The classical regression model assumes that the errors, and hence the outcome (dependent variable) conditional on the predictors, are normally distributed.
• In large samples this assumption is less important, because by the central limit theorem the sampling distribution of the coefficient estimates is approximately normal anyway.
• In small samples, however, the distribution assumption can matter.
We will now investigate issues concerning normality.
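A small simulation, not part of the original material, illustrates the large-sample argument. With clearly non-normal (centred exponential) errors and a fixed right-skewed design, the OLS slope's sampling distribution is noticeably skewed when n is tiny but close to symmetric when n is large. The design, sample sizes, and error distribution are arbitrary choices for illustration.

```python
import numpy as np


def simulate_slopes(n, reps=2000, seed=0):
    """Sampling distribution of the OLS slope under right-skewed errors."""
    rng = np.random.default_rng(seed)
    x = -np.log(1 - (np.arange(n) + 0.5) / n)    # fixed, right-skewed design
    xc = x - x.mean()
    slopes = np.empty(reps)
    for r in range(reps):
        err = rng.exponential(1.0, size=n) - 1.0  # mean 0, heavily skewed errors
        y = 1.0 + 0.5 * x + err
        slopes[r] = xc @ y / (xc @ xc)            # OLS slope formula
    return slopes


def skewness(a):
    """Sample skewness (moment estimator, no small-sample correction)."""
    a = np.asarray(a, dtype=float)
    d = a - a.mean()
    return (d**3).mean() / (d**2).mean() ** 1.5


small = simulate_slopes(10)
large = simulate_slopes(500)
print("skewness of slopes, n=10: ", round(skewness(small), 2))
print("skewness of slopes, n=500:", round(skewness(large), 2))
```

The skewness of the slope estimates should shrink toward 0 as n grows, which is why normality of the errors matters mainly in small samples.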

Here we check the normality of enroll. We start by making some graphs:
• a histogram
• a kernel density (kdensity) plot

We can use the normal option to superimpose a normal curve on this graph and the bin(20) option to use 20 bins. The distribution looks skewed to the right.

An alternative to histograms is the kernel density plot, which approximates the probability density of the variable. Kernel density plots have the advantage of being smooth and, unlike histograms, of being independent of the choice of origin. Stata implements kernel density plots with the kdensity command.
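To show what a kernel density estimate computes, here is a minimal NumPy sketch: one Gaussian bump per observation, averaged and scaled so the curve integrates to 1. It uses a Gaussian kernel with Silverman's rule-of-thumb bandwidth; Stata's kdensity defaults to the Epanechnikov kernel, so its curve will differ slightly. The sample is simulated, not the enroll data.

```python
import numpy as np


def gaussian_kde(data, grid, bandwidth=None):
    """Gaussian kernel density estimate evaluated at each point of grid.

    With bandwidth=None, Silverman's rule of thumb is used:
    h = 0.9 * min(sd, IQR / 1.349) * n ** (-1/5).
    """
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = data.size
    if bandwidth is None:
        sd = data.std(ddof=1)
        q75, q25 = np.percentile(data, [75, 25])
        spread = min(sd, (q75 - q25) / 1.349) if q75 > q25 else sd
        bandwidth = 0.9 * spread * n ** (-0.2)
    # One Gaussian bump per observation, centred at that observation
    z = (grid[:, None] - data[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return bumps.sum(axis=1) / (n * bandwidth)


# Hypothetical right-skewed sample standing in for a variable like enroll
rng = np.random.default_rng(3)
sample = rng.lognormal(mean=6.0, sigma=0.5, size=400)
grid = np.linspace(0, sample.max() * 1.2, 200)
density = gaussian_kde(sample, grid)
```

Unlike a histogram, nothing here depends on where bins start; only the bandwidth controls smoothness.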

Having concluded that enroll is not normally distributed, how should we address this problem? We may try to transform enroll to make it more normally distributed. Potential transformations include taking the log or the square root, or raising the variable to a power. Stata includes the ladder and gladder commands to help select the right transformation: ladder reports numeric results and gladder produces a graphic display.
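The idea behind ladder can be sketched by applying each rung of the ladder of powers and checking which result is most symmetric. This is a simplified analogue, not Stata's algorithm: Stata's ladder reports a chi-square normality test for each transformation, whereas this NumPy sketch ranks transformations by the absolute value of the sample skewness. The data are simulated and the helper names are hypothetical.

```python
import numpy as np


def skewness(a):
    """Sample skewness (moment estimator, no small-sample correction)."""
    a = np.asarray(a, dtype=float)
    d = a - a.mean()
    return (d**3).mean() / (d**2).mean() ** 1.5


# Ladder of powers; the log and negative powers need strictly positive values
LADDER = {
    "cubic": lambda v: v**3,
    "square": lambda v: v**2,
    "identity": lambda v: v,
    "square root": np.sqrt,
    "log": np.log,
    "1/(square root)": lambda v: 1 / np.sqrt(v),
    "inverse": lambda v: 1 / v,
    "1/square": lambda v: 1 / v**2,
    "1/cubic": lambda v: 1 / v**3,
}


def ladder_by_skewness(v):
    """Return the rung whose result is closest to symmetric, plus all scores."""
    v = np.asarray(v, dtype=float)
    if np.any(v <= 0):
        raise ValueError("ladder of powers needs strictly positive values")
    scores = {name: skewness(f(v)) for name, f in LADDER.items()}
    best = min(scores, key=lambda name: abs(scores[name]))
    return best, scores


# Hypothetical right-skewed, enroll-like sample
rng = np.random.default_rng(5)
sample = rng.lognormal(mean=6.0, sigma=0.6, size=400)
best, scores = ladder_by_skewness(sample)
for name, s in scores.items():
    print(f"{name:16s} skewness = {s:6.2f}")
print("closest to symmetric:", best)
```

For log-normal data like this, the log rung should come out closest to symmetric, mirroring the conclusion drawn for enroll.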

This indicates that the log transformation would help make enroll more normally distributed. Let's use the generate command with the log function to create the variable lenroll, the log of enroll. Note that log() in Stata gives the natural log, not log base 10; to get log base 10, use log10(var).
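The natural-log versus base-10 distinction is the same in most languages. A short Python illustration, using a hypothetical enrollment value: `math.log` is the natural log (like Stata's `log()`), `math.log10` is base 10 (like Stata's `log10()`), and the two differ only by the constant factor ln(10).

```python
import math

enroll = 500.0                       # hypothetical enrollment value
lenroll = math.log(enroll)           # natural log
lenroll10 = math.log10(enroll)       # base-10 log

# log10(x) = ln(x) / ln(10), so the two versions are proportional
assert math.isclose(lenroll10, lenroll / math.log(10))
print(lenroll, lenroll10)
```

Because the two are proportional, using log10 instead of the natural log only rescales the coefficient on the transformed variable; the fit and the t-statistics are unchanged.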

2.5 Summary
• Simple regression
• Multiple regression
• Hypothesis testing
• Examining the normality assumption

Quiz I
• Make graphs of api99: a histogram and a kdensity plot.
• What is the correlation between api99 and meals?
• Regress api99 on meals.
• Create and list the fitted (predicted) values.
• Graph meals and api99 with and without the regression line.

Quiz II
• Look at the correlations among the variables api99, meals, ell, and avg_ed using the corr and pwcorr commands.
• Perform a regression predicting api99 from meals and ell. Interpret the output.
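A hint for comparing the two commands: corr uses listwise (casewise) deletion, so every correlation is based only on observations with no missing values on any listed variable, while pwcorr uses pairwise deletion, so each pair uses every observation available for that pair. The NumPy sketch below mimics that difference on simulated data; the columns only echo the quiz variable names and the values are invented.

```python
import numpy as np


def listwise_corr(data):
    """Correlations using only rows with no missing values (corr-style)."""
    data = np.asarray(data, dtype=float)
    complete = ~np.isnan(data).any(axis=1)
    return np.corrcoef(data[complete], rowvar=False)


def pairwise_corr(data):
    """Correlations where each pair uses all rows observed on both (pwcorr-style)."""
    data = np.asarray(data, dtype=float)
    k = data.shape[1]
    out = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            ok = ~np.isnan(data[:, i]) & ~np.isnan(data[:, j])
            out[i, j] = out[j, i] = np.corrcoef(data[ok, i], data[ok, j])[0, 1]
    return out


# Hypothetical columns ordered like api99, meals, ell, avg_ed
rng = np.random.default_rng(4)
n = 400
meals = rng.uniform(0, 100, size=n)
ell = 0.5 * meals + rng.normal(0, 15, size=n)
avg_ed = 4 - 0.02 * meals + rng.normal(0, 0.4, size=n)
api99 = 850 - 3 * meals - ell + 40 * avg_ed + rng.normal(0, 40, size=n)
data = np.column_stack([api99, meals, ell, avg_ed])
data[rng.random(n) < 0.2, 3] = np.nan        # avg_ed missing for about 20% of rows

print(np.round(listwise_corr(data), 3))
print(np.round(pairwise_corr(data), 3))
```

With missing values in avg_ed only, the api99-meals correlation uses all rows under the pairwise rule but only the complete rows under the listwise rule, so the two matrices differ; with no missing data they coincide.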