Checking Regression Model Assumptions: NBA 2013/14 Player Heights and Weights

Data Description / Model

Heights (X) and weights (Y) for 505 NBA players in the 2013/14 season. Other variables included in the dataset: Age, Position.

Simple linear regression model: Y = β₀ + β₁X + ε

Model assumptions:
- ε ~ N(0, σ²)
- Errors are independent
- Error variance (σ²) is constant
- Relationship between Y and X is linear
- No important (available) predictors have been omitted
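A minimal R sketch of the model fit (the names nba.mod1, e1, and yhat1 are assumptions; the slides show only the fitted results, and Height and Weight are assumed to be available in the workspace, e.g. via attach()):

# Fit the simple linear regression of Weight on Height
nba.mod1 <- lm(Weight ~ Height)
summary(nba.mod1)
e1    <- resid(nba.mod1)    # residuals, used in the diagnostics below
yhat1 <- fitted(nba.mod1)   # fitted values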

Regression Model

Checking Normality of Errors

Graphically:
- Histogram – should be mound-shaped around 0.
- Normal probability plot – residuals versus their expected values under normality should follow a straight line:
  1. Rank the residuals from smallest (largest negative) to largest (k = 1,…,n).
  2. Compute the percentile for each ranked residual: p = (k − 0.375)/(n + 0.25).
  3. Obtain the Z-score corresponding to each percentile: z(p).
  4. Expected residual = √MSE · z(p).
  5. Plot the ordered residuals versus the expected residuals.

Numerical tests (sketched after this slide):
- Correlation test: obtain the correlation between the ordered residuals and z(p). Critical values for n up to 100 are provided by Looney and Gulledge (1985).
- Shapiro-Wilk test: similar to the correlation test, with more complex calculations. Printed directly by statistical software packages.
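A sketch of these steps in R, reusing e1 and nba.mod1 from the earlier sketch (none of this code appears on the slides; it simply follows the recipe above):

# Normal probability plot "by hand"
n <- length(e1)
p <- ((1:n) - 0.375) / (n + 0.25)                 # percentile for each rank k
mse <- deviance(nba.mod1) / df.residual(nba.mod1) # MSE of the fitted model
exp.res <- sqrt(mse) * qnorm(p)                   # expected residuals under normality
plot(exp.res, sort(e1), xlab = "Expected residual", ylab = "Ordered residual")
abline(0, 1)

cor(sort(e1), exp.res)  # correlation test: compare with Looney-Gulledge critical value
shapiro.test(e1)        # Shapiro-Wilk test, as printed by R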

Normal Probability Plot / Correlation Test

[Normal probability plot of the residuals, highlighting the extreme and middle residuals]

The correlation between the ordered residuals and their expected values under normality is compared with the Looney and Gulledge critical value. Based on the Shapiro-Wilk test in R, the P-value for H₀: errors are normal is P = 0.0859 (do not reject normality).

Checking the Constant Variance Assumption

Plot the residuals versus X or the predicted values (see the sketch following this slide):
- Random cloud around 0 ⇒ linear relation with constant variance.
- Funnel shape ⇒ non-constant variance.
- Outliers fall far above (positive) or far below (negative) the general cloud pattern.
- Plot the absolute residuals, squared residuals, or square root of the absolute residuals: a positive association with X ⇒ non-constant variance.

Numerical tests:
- Brown-Forsythe test – a 2-sample t-test of the absolute deviations from the group medians.
- Breusch-Pagan test – regresses the squared residuals on the model predictors (X variables).
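For instance, the residual plots might be drawn as follows (a sketch; e1 and yhat1 come from the earlier model fit):

plot(yhat1, e1, xlab = "Fitted value", ylab = "Residual")        # funnel shape?
abline(h = 0)
plot(yhat1, abs(e1), xlab = "Fitted value", ylab = "|Residual|") # positive trend?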

Equal (Homogeneous) Variance - I

Equal (Homogeneous) Variance - II

Brown-Forsythe and Breusch-Pagan Tests

Brown-Forsythe test: Group 1: Heights ≤ 79"; Group 2: Heights ≥ 80". H₀: equal variances among errors (Reject H₀).

Breusch-Pagan test: H₀: equal variances among errors (Reject H₀).
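A sketch of both tests in R (bptest comes from the add-on lmtest package, which the final slide also uses; the Brown-Forsythe test is coded directly from its definition on the previous slide, and grp is a hypothetical name for the height grouping):

library(lmtest)  # provides bptest()

# Brown-Forsythe: 2-sample t-test of absolute deviations from group medians
grp <- ifelse(Height <= 79, 1, 2)            # split at 79" vs 80"+
d   <- abs(e1 - ave(e1, grp, FUN = median))  # |residual - group median residual|
t.test(d ~ grp, var.equal = TRUE)

# Breusch-Pagan: regresses squared residuals on the predictor
bptest(Weight ~ Height, studentize = FALSE)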

Linearity of Regression

Height and Weight Data – n = 505, c = 18 groups. Do not reject H₀: μⱼ = β₀ + β₁Xⱼ
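In R, this lack-of-fit F-test can be run by comparing the linear model with a cell-means model that fits a separate mean at each of the c = 18 distinct heights, the same comparison the final slide performs on the log scale (nba.full is a hypothetical name):

nba.full <- lm(Weight ~ factor(Height))  # full model: one mean per distinct height
anova(nba.mod1, nba.full)                # F-test of H0: group means fall on a line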

Box-Cox Transformations

Automatically selects a transformation from the power family, with the goal of obtaining normality, linearity, and constant variance (not always successful, but widely used).

Goal: fit the model Y′ = β₀ + β₁X + ε for various power transformations of Y, selecting the transformation that produces the minimum SSE (maximum likelihood).

Procedure: over a range of λ from, say, −2 to +2, obtain the transformed values Wᵢ and regress Wᵢ on X (assuming all Yᵢ > 0, although adding a constant won't affect the shape or spread of the Y distribution). In the standard form of the transformation:

Wᵢ = K₁(Yᵢ^λ − 1)  (λ ≠ 0)
Wᵢ = K₂ ln(Yᵢ)     (λ = 0)

where K₂ = (Y₁Y₂⋯Yₙ)^(1/n) is the geometric mean of Y and K₁ = 1/(λK₂^(λ−1)).
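The next slide says the plot was obtained in R; the usual route is boxcox() from the MASS package (a sketch, since the slides do not name the function used):

library(MASS)  # provides boxcox()

# Profile log-likelihood over lambda in [-2, 2]; the maximum and its
# approximate 95% interval are marked on the resulting plot
boxcox(nba.mod1, lambda = seq(-2, 2, by = 0.1))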

Box-Cox Transformation – Obtained in R. Maximum occurs near λ = 0 (the interval contains 0) – try taking logs of Weight.
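Refitting on the log scale before rerunning the diagnostics (a sketch; the nba.mod2 line is taken from the next slide, and e2 is the residual vector its shapiro.test() call uses):

nba.mod2 <- lm(log(Weight) ~ Height)  # as on the next slide
e2 <- resid(nba.mod2)                 # residuals for the normality re-test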

Results of Tests (Using R Functions) on ln(WT)

Normality of errors (Shapiro-Wilk test):
> shapiro.test(e2)
        Shapiro-Wilk normality test
data:  e2
W = , p-value =

Model fit on the log scale:
> nba.mod2 <- lm(log(Weight) ~ Height)
> summary(nba.mod2)
Call:
lm(formula = log(Weight) ~ Height)
Coefficients:
            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                                  <2e-16
Height                                       <2e-16
Residual standard error:  on 503 degrees of freedom
Multiple R-squared:  , Adjusted R-squared:
F-statistic: 1104 on 1 and 503 DF, p-value: < 2.2e-16

Constant error variance (Breusch-Pagan test):
> bptest(log(Weight) ~ Height, studentize = FALSE)
        Breusch-Pagan test
data:  log(Weight) ~ Height
BP = , df = 1, p-value =

Linearity of regression (lack-of-fit test):
> nba.mod3 <- lm(log(Weight) ~ factor(Height))
> anova(nba.mod2, nba.mod3)
Analysis of Variance Table
Model 1: log(Weight) ~ Height
Model 2: log(Weight) ~ factor(Height)
  Res.Df  RSS  Df  Sum of Sq  F  Pr(>F)

The model fits well on all assumptions.