Model Selection. Response: Highway MPG. Explanatory: 13 explanatory variables. Indicator variables for types of car: Sports Car, SUV, Wagon, Minivan.

Presentation transcript:

1 Model Selection. Response: Highway MPG. Explanatory: 13 explanatory variables. Indicator variables for types of car – Sports Car, SUV, Wagon, Minivan.

2 Explanatory Variables: Engine size (liters), Cylinders (number), Horsepower, Weight, Wheel Base, Length, Width.

3 “Best” Model. The 7-variable model with SUV, Minivan, All Wheel, Engine, Horsepower, Weight and Wheel Base appears to be the “best” model.

4 Prediction Equation Predicted Highway MPG = – 3.15*SUV – 3.28*Minivan – 2.08*All Wheel – 1.65*Engine – *Horsepower – *Weight *Wheel Base

5 Summary All variables add significantly. R 2 = adj R 2 = RMSE = C p =
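The summary statistics named on this slide can be computed for any candidate model. The following is a minimal sketch, assuming an ordinary least squares fit with an intercept; the function name `fit_summary` and the argument `mse_full` (the mean squared error of the model containing all candidate variables) are illustrative choices, not taken from the slides.

```python
import numpy as np

def fit_summary(X, y, mse_full):
    """Return R^2, adjusted R^2, RMSE and Mallows' Cp for an OLS fit.

    X        : (n, k) array of explanatory variables, without an intercept column
    y        : (n,) response
    mse_full : MSE of the full model with all candidate variables (assumed input)
    """
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])      # add the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = float(resid @ resid)                # error sum of squares
    sst = float(((y - y.mean()) ** 2).sum())  # total sum of squares
    p = k + 1                                 # parameters, including intercept
    return {
        "r2": 1 - sse / sst,
        "adj_r2": 1 - (sse / (n - p)) / (sst / (n - 1)),
        "rmse": float(np.sqrt(sse / (n - p))),
        "cp": sse / mse_full - (n - 2 * p),
    }
```

When the candidate model is the full model itself, Cp equals p exactly, which is why values of Cp close to p are usually read as a sign of little bias.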

9 Box Plot – Potential Outliers. Table columns: Vehicle Name, Highway MPG, Predicted MPG, Residual, Standardized Residual (z). Vehicles listed: Honda Civic HX 2dr; Toyota Echo 2dr manual; Toyota Prius 4dr (gas/electric); Volkswagen Jetta GLS TDI 4dr.

10 Bonferroni Correction Adjust what is a small P-value. If a P-value is less than , then the standardized residual is statistically significant.
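The Bonferroni idea can be sketched in a few lines. This is an illustration only: the overall significance level `alpha = 0.05` and the use of a two-sided normal P-value are assumptions, and the function names are hypothetical. The cutoff for a "small" P-value becomes alpha divided by the number of residuals examined.

```python
import math

def two_sided_p(z):
    """Two-sided standard normal P-value, Prob > |z|."""
    return math.erfc(abs(z) / math.sqrt(2))

def bonferroni_flags(z_values, n_tests, alpha=0.05):
    """Flag standardized residuals whose P-value is below alpha / n_tests.

    n_tests is the total number of residuals examined (for example, the
    number of vehicles), not just the handful picked out by a box plot.
    alpha = 0.05 is an assumed overall significance level.
    """
    cutoff = alpha / n_tests
    return cutoff, [two_sided_p(z) < cutoff for z in z_values]
```

For example, `bonferroni_flags([2.0, 5.0], n_tests=100)` flags only the second residual, because a z of 2.0 is unremarkable once 100 residuals are being screened.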

11 Standardized Residual. Table columns: Vehicle Name, Standardized Residual (z), Prob > |z|. Vehicles listed: Honda Civic HX 2dr; Toyota Echo 2dr manual; Toyota Prius 4dr (gas/electric); Volkswagen Jetta GLS TDI 4dr.

12 Outliers. Both the Toyota Prius and the Volkswagen Jetta have standardized residuals so extreme that they are considered statistically significant (P-value < ).

13 Leverage. Because we have multiple explanatory variables, there is not an easy formula for leverage, h. The leverage value, h, takes into account all of the explanatory variables.
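Although there is no simple hand formula, software obtains the leverage values as the diagonal of the hat matrix. A minimal sketch with NumPy follows; the function name `leverage` is illustrative, and the code assumes the explanatory variables are not perfectly collinear.

```python
import numpy as np

def leverage(X):
    """Leverage h_i for each observation: the diagonal of the hat matrix.

    X is an (n, k) array of explanatory variables without an intercept
    column; the intercept is added here.
    """
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    # With the reduced QR factorization A = QR, the hat matrix is Q Q^T,
    # so its diagonal is the row-wise sum of squares of Q.
    Q, _ = np.linalg.qr(A)
    return (Q ** 2).sum(axis=1)
```

The leverage values always sum to the number of estimated coefficients (k + 1 with an intercept), so a value far above the average flags an observation with unusual explanatory values.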

14 Rule of Thumb High Leverage Value if n = 100, p = 7,

15 Leverage. There are 10 vehicles that have leverage, h, greater than Of these, 2 have F-statistics large enough to produce P-values smaller than

16 Leverage. Table columns: Vehicle, h, F, Prob > F. Vehicles listed: Chevrolet Corvette convertible 2 dr; Porsche 911 GT2 2 dr.

17 Leverage What makes the leverage so high? Have to look for extreme values for the explanatory variables.

18 Leverage. Chevy Corvette: has the 2nd largest engine of all the vehicles (5.7 liters) and the 2nd highest horsepower (350 horsepower). Porsche 911: has the highest horsepower of all the vehicles (477 horsepower).

19 Influence – Cook’s D None of the vehicles has a value of Cook’s D that is greater than 1. The largest value of Cook’s D is 0.16 for the Toyota Prius 4dr (gas/electric). The second largest is 0.12 for the VW Jetta.

20 Influence. Even though no vehicle has a Cook’s D greater than 1, you should still look at the Studentized residuals.
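Both quantities can be computed from the residuals and the leverage values. The sketch below assumes the internally Studentized residual (each residual divided by its estimated standard error); the slides do not say which variant the software reports, so treat that choice, and the function name `cooks_distance`, as assumptions.

```python
import numpy as np

def cooks_distance(X, y):
    """Return (Studentized residuals, Cook's D) for an OLS fit with intercept."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]                       # coefficients, including intercept
    Q, _ = np.linalg.qr(A)
    h = (Q ** 2).sum(axis=1)             # leverage values
    resid = y - Q @ (Q.T @ y)            # residuals from the fit
    mse = resid @ resid / (n - p)
    r = resid / np.sqrt(mse * (1 - h))   # internally Studentized residuals
    d = r ** 2 * h / (p * (1 - h))       # Cook's D
    return r, d
```

Cook's D combines a large residual with high leverage, so an observation can have a noticeable Studentized residual and still show a small D when its leverage is low.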

22 Studentized Residual. Table columns: Vehicle Name, Studentized Residual (r s), Prob > | r s |. Vehicles listed: Honda Civic HX 2dr; Toyota Echo 2dr manual; Toyota Prius 4dr (gas/electric); Volkswagen Jetta GLS TDI 4dr.

23 Outliers. Both the Toyota Prius and the Volkswagen Jetta have studentized residuals so extreme that they are considered statistically significant (P-value < ).

24 Summary. Toyota Prius and Volkswagen Jetta are statistically significant outliers. Chevy Corvette and Porsche 911 are statistically significant high leverage values. Toyota Prius and Volkswagen Jetta exert statistically significant influence.

26 Prediction Equation All 100 vehicles Predicted Highway MPG = – 3.15*SUV – 3.28*Minivan – 2.08*All Wheel – 1.65*Engine – *Horsepower – *Weight *Wheel Base

28 Prediction Equation Excluding Prius and Jetta Predicted Highway MPG = – 2.54*SUV – 2.41*Minivan – 1.65*All Wheel – 1.15*Engine – *Horsepower – *Weight *Wheel Base

29 Comment Note that Engine has a P-value of and so is not significant at the 0.05 level. This suggests that there is a different “best” model if we exclude Prius and Jetta.

30 Comment A similar thing happens if you exclude the Porsche 911 and the Chevy Corvette.

31 Multicollinearity. High correlation among explanatory variables is called multicollinearity. Multicollinearity causes the standard errors of the estimates to be larger than they would be if the explanatory variables were uncorrelated.

32 Variance Inflation Factor A general measure of the effect of multicollinearity is the variance inflation factor, VIF.

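33 Multiple R 2. $R_i^2$ is the value of $R^2$ from regressing explanatory variable i on the other k – 1 explanatory variables, and $\mathrm{VIF}_i = 1/(1 - R_i^2)$. There are k values of $R_i^2$, one for each explanatory variable.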

34 Variance Inflation Factor The VIF gives how much the variance of an estimate is inflated by multicollinearity. The square root of the VIF gives how much the standard error of an estimate is inflated by multicollinearity.
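The VIF for each explanatory variable can be computed by regressing that variable on all of the others and applying VIF = 1 / (1 − R²). A minimal sketch with NumPy, where the function name `vif` is illustrative:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.

    X is an (n, k) array of explanatory variables without an intercept
    column. Column i is regressed on the remaining k - 1 columns (plus an
    intercept) to obtain R_i^2, and VIF_i = 1 / (1 - R_i^2).
    """
    n, k = X.shape
    out = np.empty(k)
    for i in range(k):
        target = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        sse = ((target - others @ coef) ** 2).sum()
        sst = ((target - target.mean()) ** 2).sum()
        r2_i = 1 - sse / sst
        out[i] = 1 / (1 - r2_i)
    return out
```

A variable that is uncorrelated with the others gets a VIF of about 1; the value grows without bound as the variable becomes closer to a linear combination of the others.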

35 Multiple R 2 SUV excluded: Minivan excluded: All Wheel excluded: Engine excluded: Horsepower excluded: Weight excluded: Wheel Base excluded: 0.673

36 VIF i. SUV: 2.19, Minivan: 1.44, All Wheel: 1.55, Engine: 4.52, Horsepower: 2.75, Weight: 7.46, Wheel Base: 3.06.

37 Interpretation The VIF = 2.19 for SUV. This means that the standard error for SUV is 1.48 (the square root of 2.19) times bigger than it would be if SUV were uncorrelated with the other explanatory variables.

38 Interpretation The VIF = 7.46 for Weight. This means that the standard error for Weight is 2.73 (the square root of 7.46) times bigger than it would be if Weight were uncorrelated with the other explanatory variables.
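The same arithmetic applies to every variable. The sketch below takes the VIF values listed on slide 36 and computes the standard error inflation (the square root of the VIF) and the R² implied by VIF = 1 / (1 − R²). The implied R² values are back-calculated from rounded VIFs, so they are approximate; the helper names are illustrative.

```python
import math

# VIF values as listed on slide 36
vif_values = {
    "SUV": 2.19, "Minivan": 1.44, "All Wheel": 1.55, "Engine": 4.52,
    "Horsepower": 2.75, "Weight": 7.46, "Wheel Base": 3.06,
}

def se_inflation(v):
    """Factor by which multicollinearity inflates the standard error."""
    return math.sqrt(v)

def implied_r2(v):
    """R^2 implied by a VIF value, from VIF = 1 / (1 - R^2)."""
    return 1 - 1 / v

for name, v in vif_values.items():
    print(f"{name:11s} VIF={v:.2f}  SE x{se_inflation(v):.2f}  "
          f"implied R^2={implied_r2(v):.3f}")
```

Weight has the largest inflation and Minivan the smallest, which matches the intuition that vehicle weight is strongly tied to engine size, horsepower and wheel base.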

39 Comment If the standard error is 2.73 times what it could be, then the t-statistic is 2.73 times smaller than it could be. The corresponding P-value would be less than