Checking Regression Model Assumptions

Presentation transcript:

Checking Regression Model Assumptions NBA 2013/14 Player Heights and Weights

Data Description / Model
Heights (X) and Weights (Y) for 505 NBA players in the 2013/14 season. Other variables included in the dataset: Age, Position.
Simple Linear Regression Model: Y = β0 + β1X + ε
Model Assumptions:
- ε ~ N(0, σ²)
- Errors are independent
- Error variance (σ²) is constant
- The relationship between Y and X is linear
- No important (available) predictors have been omitted
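A minimal sketch in R of setting up this model (the file name nba_ht_wt.csv is an assumption; the data frame is attached so that Height and Weight can be referenced directly, as in the R code later in the slides):

nba <- read.csv("nba_ht_wt.csv")   # hypothetical file name; substitute the actual data file
attach(nba)                        # makes Height and Weight available directly
nba.mod1 <- lm(Weight ~ Height)    # fit Weight = b0 + b1*Height + error
summary(nba.mod1)
e1 <- residuals(nba.mod1)          # residuals used in the diagnostics that follow
yhat1 <- fitted(nba.mod1)          # fitted (predicted) weights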

Regression Model

Checking Normality of Errors
Graphically:
- Histogram: should be mound-shaped around 0.
- Normal probability plot: residuals plotted against their expected values under normality should follow a straight line.
  1. Rank the residuals from smallest (largest negative) to largest (k = 1, ..., n).
  2. Compute the percentile for each ranked residual: p = (k - 0.375)/(n + 0.25).
  3. Obtain the z-score corresponding to each percentile: z(p).
  4. Expected residual = √MSE · z(p).
  5. Plot the ordered residuals versus the expected residuals.
Numerical tests:
- Correlation test: obtain the correlation between the ordered residuals and z(p). Critical values for n up to 100 are provided by Looney and Gulledge (1985).
- Shapiro-Wilk test: similar to the correlation test, with more complex calculations; computed directly by statistical software packages.
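A sketch of the normal probability plot and correlation test in R, following the steps above (e1 and nba.mod1 come from the earlier sketch):

n   <- length(e1)
MSE <- sum(e1^2) / df.residual(nba.mod1)   # mean squared error
k   <- rank(e1)                            # ranks of the residuals
p   <- (k - 0.375) / (n + 0.25)            # approximate percentiles
exp.res <- sqrt(MSE) * qnorm(p)            # expected residuals under normality

plot(exp.res, e1, xlab = "Expected Residual", ylab = "Observed Residual")
abline(0, 1)                               # points should fall near this line

cor(e1, exp.res)                           # correlation test statistic (compare to Looney-Gulledge critical value)
shapiro.test(e1)                           # Shapiro-Wilk test of normality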

Normal Probability Plot / Correlation Test
[Normal probability plot and table of extreme and middle residuals]
The correlation between the residuals and their expected values under normality is 0.9972. Based on the Shapiro-Wilk test in R, the P-value for H0: errors are normal is P = 0.0859 (do not reject normality).

Checking the Constant Variance Assumption
Plot the residuals versus X or the predicted values (sketched in R below):
- A random cloud around 0 suggests the linear relation holds with constant variance.
- A funnel shape indicates non-constant variance.
- Outliers fall far above (positive) or far below (negative) the general cloud pattern.
Plot the absolute residuals, squared residuals, or square root of the absolute residuals versus X or the predicted values:
- A positive association indicates non-constant variance.
Numerical tests:
- Brown-Forsythe test: a 2-sample t-test of the absolute deviations from the group medians.
- Breusch-Pagan test: regresses the squared residuals on the model predictors (X variables).
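A minimal sketch of these residual plots in R (e1 and yhat1 are the residuals and fitted values from the earlier fit):

plot(yhat1, e1, xlab = "Predicted Weight", ylab = "Residual")        # look for a funnel shape
abline(h = 0, lty = 2)

plot(yhat1, abs(e1), xlab = "Predicted Weight", ylab = "|Residual|") # a positive trend suggests non-constant variance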

Equal (Homogeneous) Variance - I

Equal (Homogeneous) Variance - II

Brown-Forsythe and Breusch-Pagan Tests
Brown-Forsythe test: Group 1: Heights ≤ 79”, Group 2: Heights ≥ 80”. H0: equal variances among errors (Reject H0).
Breusch-Pagan test: H0: equal variances among errors (Reject H0).
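A sketch of the two tests in R, using the height grouping above (e1 comes from the earlier sketch; bptest() is from the lmtest package):

grp <- ifelse(Height <= 79, "Group 1", "Group 2")       # split players into the two height groups
d   <- abs(e1 - ave(e1, grp, FUN = median))             # |residual - group median residual|
t.test(d ~ grp, var.equal = TRUE)                       # Brown-Forsythe test

library(lmtest)                                         # provides bptest()
bptest(Weight ~ Height, studentize = FALSE)             # Breusch-Pagan test on the original model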

Linearity of Regression

Linearity of Regression

Height and Weight Data – n = 505, c = 18 height groups. Lack-of-fit F test: do not reject H0: μj = β0 + β1Xj (the group means follow the fitted line).
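A sketch of this lack-of-fit F test in R, comparing the linear model to the full model that fits a separate mean at each of the 18 observed heights (nba.mod1 comes from the earlier sketch):

nba.modF <- lm(Weight ~ factor(Height))   # full model: one mean weight per height group
anova(nba.mod1, nba.modF)                 # F test of H0: the group means follow the line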

Box-Cox Transformations
Automatically selects a transformation from the power family with the goal of obtaining normality, linearity, and constant variance (not always successful, but widely used).
Goal: fit the model Y’ = β0 + β1X + ε for various power transformations of Y, selecting the transformation that produces the minimum SSE (maximum likelihood).
Procedure: over a range of λ from, say, −2 to +2, obtain the standardized transformed values Wi and regress Wi on X (assuming all Yi > 0, although adding a constant to Y won’t affect the shape or spread of its distribution); choose the λ with the smallest SSE.
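A hedged sketch of this grid search in R, using the usual standardized form of the Box-Cox response (the exact formula for Wi is an assumption here, following the common textbook definition):

box.cox.sse <- function(lambda, y, x) {
  # Standardized Box-Cox response (assumed form):
  #   W = K1*(y^lambda - 1) for lambda != 0,  W = K2*log(y) for lambda = 0,
  #   where K2 = geometric mean of y and K1 = 1/(lambda*K2^(lambda-1))
  K2 <- exp(mean(log(y)))                       # geometric mean (requires all y > 0)
  if (abs(lambda) < 1e-8) {
    W <- K2 * log(y)
  } else {
    W <- (y^lambda - 1) / (lambda * K2^(lambda - 1))
  }
  sum(residuals(lm(W ~ x))^2)                   # SSE from regressing W on X
}

lambdas <- seq(-2, 2, by = 0.1)
sse <- sapply(lambdas, box.cox.sse, y = Weight, x = Height)
lambdas[which.min(sse)]                         # lambda producing the minimum SSE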

Box-Cox Transformation – Obtained in R
The maximum of the profile log-likelihood occurs near λ = 0 (the confidence interval contains 0) – try taking logs of Weight.
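One common way to obtain this plot is with MASS::boxcox, which profiles the log-likelihood over a grid of λ values (a sketch, assuming the same model and attached data as above):

library(MASS)
boxcox(lm(Weight ~ Height), lambda = seq(-2, 2, by = 0.1))   # peak near lambda = 0 suggests the log transformation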

Results of Tests (Using R Functions) on ln(WT)

Normality of Errors (Shapiro-Wilk Test):
> shapiro.test(e2)
        Shapiro-Wilk normality test
data:  e2
W = 0.9976, p-value = 0.679

> nba.mod2 <- lm(log(Weight) ~ Height)
> summary(nba.mod2)
Call: lm(formula = log(Weight) ~ Height)
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    3.0781      0.0696    44.20    <2e-16
Height         0.0292      0.0009    33.22    <2e-16
Residual standard error: 0.06823 on 503 degrees of freedom
Multiple R-squared: 0.6869,  Adjusted R-squared: 0.6863
F-statistic: 1104 on 1 and 503 DF,  p-value: < 2.2e-16

Constant Error Variance (Breusch-Pagan Test):
> bptest(log(Weight) ~ Height, studentize = FALSE)
        Breusch-Pagan test
data:  log(Weight) ~ Height
BP = 0.4711, df = 1, p-value = 0.4925

Linearity of Regression (Lack of Fit Test):
> nba.mod3 <- lm(log(Weight) ~ factor(Height))
> anova(nba.mod2, nba.mod3)
Analysis of Variance Table
Model 1: log(Weight) ~ Height
Model 2: log(Weight) ~ factor(Height)
  Res.Df     RSS  Df  Sum of Sq      F  Pr(>F)
1    503  2.3414
2    487  2.2478  16   0.093642  1.268  0.2131

The model fits well on all assumptions.