Model Selection

Presentation transcript:

1 Model Selection Response: Highway MPG. Explanatory: 13 explanatory variables. Indicator variables for types of car: Sports Car, SUV, Wagon, Minivan. There is an indicator for Pickup, but there are no pickups in the data.

2 Indicator Variables The indicator variable takes on the value 1 if it is that kind of vehicle and 0 otherwise. If all four indicator variables are 0, then the vehicle is a Sedan.

3 Explanatory Variables Indicator variables for All Wheel and Rear Wheel drive. If both indicator variables are 0, then the vehicle has Front Wheel drive.
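The indicator coding described above can be sketched in a few lines of Python. The function names `indicators` and `encode` are hypothetical helpers introduced for illustration; only the level names and the two baselines (Sedan, Front Wheel) come from the slides.

```python
# Non-baseline levels; the baseline level gets all zeros.
BODY_STYLES = ["Sports Car", "SUV", "Wagon", "Minivan"]   # baseline: Sedan
DRIVE_TYPES = ["All Wheel", "Rear Wheel"]                 # baseline: Front Wheel


def indicators(value, levels):
    """Return one 0/1 indicator per non-baseline level."""
    return {level: int(value == level) for level in levels}


def encode(body_style, drive):
    """Combine the body-style and drive-type indicators for one vehicle."""
    row = indicators(body_style, BODY_STYLES)
    row.update(indicators(drive, DRIVE_TYPES))
    return row


suv_awd = encode("SUV", "All Wheel")
sedan_fwd = encode("Sedan", "Front Wheel")
print(suv_awd)
print(sedan_fwd)
```

A front-wheel-drive sedan comes out with every indicator equal to 0, which is why no separate Sedan or Front Wheel column is needed.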

4 Explanatory Variables Engine size (liters), Cylinders (number), Horsepower, Weight (pounds), Wheel Base (inches), Length (inches), Width (inches).

5 Forward Selection Fit Model – Personality: Stepwise. Y, Response – Highway MPG. Put all 13 variables into the Construct Model Effects box. Click on Run Model.

6 Stepwise Fit Stopping Rule: P-value Threshold. Prob to Enter = , Prob to Leave = . Direction: Forward. Click on Go.

8 Forward Selection Three variables are added: Weight, Horsepower, Wheel Base. All variables added are still statistically significant.
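The logic of forward selection can be sketched without any statistics package. This is a minimal illustration, not the stepwise routine used in the slides: it swaps the p-value threshold for a fixed F-to-enter cutoff (`f_to_enter`, an assumed value of 4.0), and the data are small made-up columns named `x1`, `x2`, `x3`, not the car data.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares from least squares of y on an intercept plus columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_select(data, y, f_to_enter=4.0):
    """Add the variable with the largest partial F until none reaches the cutoff."""
    selected, remaining = [], list(data)
    current = rss([], y)
    while remaining:
        df = len(y) - len(selected) - 2  # residual df once one more variable is in
        if df <= 0:
            break
        best_name, best_f, best_rss = None, float("-inf"), None
        for name in remaining:
            trial = rss([data[v] for v in selected + [name]], y)
            f = float("inf") if trial == 0 else (current - trial) / (trial / df)
            if f > best_f:
                best_name, best_f, best_rss = name, f, trial
        if best_f < f_to_enter:
            break  # no remaining variable adds enough; stop
        selected.append(best_name)
        remaining.remove(best_name)
        current = best_rss
    return selected


# Toy data (illustrative only): y depends on x1 and x2; x3 is unrelated.
data = {
    "x1": [-1, -1, -1, -1, 1, 1, 1, 1],
    "x2": [-1, -1, 1, 1, -1, -1, 1, 1],
    "x3": [-1, 1, -1, 1, -1, 1, -1, 1],
}
y = [3.5, 4.5, 8.5, 7.5, 12.5, 11.5, 15.5, 16.5]
chosen = forward_select(data, y)
print(chosen)
```

As in the slides, a variable that has entered is never removed in this direction, even if a later addition would make it redundant.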

9 Forward Selection Model with Weight, Horsepower and Wheel Base. R² = , adj R² = , RMSE = , AICc = , BIC = , Cp =

10 Stepwise Fit Stopping Rule: P-value Threshold. Prob to Enter = , Prob to Leave = . Direction: Backward. Enter All. Click on Go.

12 Backward Selection Eight variables are removed: Length, Rear Wheel, Wagon, Width, Engine, Wheel Base, Weight, Sports Car. All variables left are statistically significant.

13 Backward Selection Model with SUV, Minivan, All Wheel, Cylinders and Horsepower. R² = , adj R² = , RMSE = , AICc = , BIC = , Cp =

14 Backward Selection The final model from Backward selection is better than the final model from Forward selection. It has a higher R² value, a higher adj R² value, and lower RMSE, AICc, BIC and Cp values.
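The criteria used in this comparison can all be computed from a model's residual sum of squares. The sketch below uses common textbook definitions. The Gaussian log-likelihood form and the convention of counting the error variance as an extra parameter in AICc and BIC are assumptions, and packages differ on these details. The function name `fit_criteria` and the numbers passed to it are illustrative, not values from the car data.

```python
import math


def fit_criteria(rss, tss, n, p, sigma2_full):
    """Model comparison criteria for a least squares fit.

    rss: residual sum of squares of the candidate model
    tss: total sum of squares of the response about its mean
    n: number of observations
    p: number of coefficients, including the intercept
    sigma2_full: mean squared error of the full model (used by Mallows' Cp)
    """
    r2 = 1 - rss / tss
    adj_r2 = 1 - (rss / (n - p)) / (tss / (n - 1))
    rmse = math.sqrt(rss / (n - p))
    cp = rss / sigma2_full - n + 2 * p
    # Assumed convention: k counts the coefficients plus the error variance.
    k = p + 1
    neg2loglik = n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    aicc = neg2loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)
    bic = neg2loglik + k * math.log(n)
    return {"R2": r2, "adjR2": adj_r2, "RMSE": rmse, "Cp": cp, "AICc": aicc, "BIC": bic}


# Illustrative numbers only: 8 observations, 3 coefficients.
result = fit_criteria(rss=2.0, tss=162.0, n=8, p=3, sigma2_full=2.0 / 5)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Higher is better for R² and adjusted R²; lower is better for RMSE, AICc, BIC and Cp. When the candidate model is the full model itself, Cp equals p, which is a quick check on the formula.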

15 Mixed Selection (Forward) Stopping Rule: P-value Threshold. Prob to Enter = , Prob to Leave = . Direction: Mixed. Click on Go.

17 Mixed Selection (Forward) Three variables are added: Weight, Horsepower, Wheel Base. No variables are removed. This is the same as with Forward Selection.

18 Mixed Selection (Backward) Stopping Rule: P-value Threshold. Prob to Enter = , Prob to Leave = . Direction: Mixed. Enter All. Click on Go.

20 Mixed Selection (Backward) Eight variables are removed: Length, Rear Wheel, Wagon, Width, Engine, Wheel Base, Weight, Sports Car. No variables are added. This is the same as with Backward Selection.

21 All Possible Models 2^13 – 1 = 8191 models possible. 1-variable models – listed in order of the R² value. 2-variable models – listed in order of the R² value. Etc. 13-variable (full) model.
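The model count above can be checked by enumerating every non-empty subset of the 13 explanatory variables. The variable names follow the slides; the list name `VARIABLES` is introduced here for illustration.

```python
from itertools import combinations
from math import comb

VARIABLES = [
    "Sports Car", "SUV", "Wagon", "Minivan",      # body-style indicators
    "All Wheel", "Rear Wheel",                    # drive-type indicators
    "Engine", "Cylinders", "Horsepower", "Weight",
    "Wheel Base", "Length", "Width",
]

# Every non-empty subset of the variables is one candidate model.
models = [
    subset
    for size in range(1, len(VARIABLES) + 1)
    for subset in combinations(VARIABLES, size)
]
print(len(models))  # 8191

# Number of candidate models for each model size.
per_size = {size: comb(len(VARIABLES), size) for size in range(1, len(VARIABLES) + 1)}
print(per_size[1], per_size[7], per_size[13])  # 13 1716 1
```

The per-size counts show why it helps to cap both the number of variables and the number of models displayed per size: there are 1716 different 7-variable models alone.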

22 All Possible Models Can specify the maximum number of variables in a model. Can specify the maximum number of models displayed for each number of variables.

23 All Possible Models Model with all 13 variables has the highest R² value. R² = . Is this the “best” model? No, several variables are not statistically significant.

24 All Possible Models Model with 7 variables has the lowest RMSE value: Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight. RMSE =

25 Model with lowest RMSE Is this the “best” model? No, several variables are not statistically significant. Sports Car: F=3.847, P-value= ; Horsepower: F=3.761, P-value= ; Weight: F=3.653, P-value=0.0591.

26 All Possible Models Model with 7 variables has the lowest Cp value: Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight. Cp = . This is the same model as the one with the lowest RMSE.

27 All Possible Models Model with 7 variables has the lowest AICc and BIC values: Sports Car, SUV, Minivan, All Wheel, Cylinders, Horsepower, Weight. AICc = , BIC = . This is the same model as the one with the lowest RMSE and Cp.

28 Strategies Start with the “best” 1-variable model. Find a 2-variable model that beats it. Find a 3-variable model that beats the “best” 2-variable model. Etc.

29 Strategies Start with the full (13-variable) model. Is it “best”? Go to the 12-variable models. Are any of these “best”? Etc.

30 “Best” Model The 7-variable model with SUV, Minivan, All Wheel, Engine, Horsepower, Weight and Wheel Base appears to be the “best” model.

31 Prediction Equation Predicted Highway MPG = – 3.15*SUV – 3.28*Minivan – 2.08*All Wheel – 1.65*Engine – *Horsepower – *Weight *Wheel Base

32 Summary All variables add significantly. R² = 0.705, adj R² = , RMSE = , AICc = , BIC = , Cp =

35 Box Plot – Potential Outliers Table columns: Vehicle Name, Highway MPG, Predicted MPG, Residual. Vehicles listed: Honda Civic HX 2dr; Toyota Echo 2dr manual; Toyota Prius 4dr (gas/electric); Volkswagen Jetta GLS TDI 4dr.

36 Outlier How do we determine if a potential outlier identified on the box plot is statistically significant?
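For context, a box plot flags potential outliers with a simple fence rule rather than a significance test. The sketch below applies the usual 1.5 × IQR fences to a list of residuals. The residual values are made up for illustration, the helper name `boxplot_outliers` is hypothetical, and quartile conventions vary between packages, so the flagged points may differ slightly from a given program's box plot.

```python
from statistics import quantiles


def boxplot_outliers(values, k=1.5):
    """Return the values lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Illustrative residuals only (not the car data).
residuals = [-3, -2, -1, -1, 0, 0, 1, 1, 2, 3, 12]
flagged = boxplot_outliers(residuals)
print(flagged)  # [12]
```

A point flagged this way is only a candidate; whether it is statistically significant is the separate question the slide raises.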

37 Unusual Points in Regression Outlier for Regression: a point with an unusually large residual.

38 Unusual Points in Regression High leverage point: a point with an extreme value for one or more of the explanatory variables.

39 Influential Points Does a point influence where the regression line goes? An outlier can. A high leverage point can. Are they statistically significant?
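The three ideas above (large residual, high leverage, influence) each have a standard numerical measure. The sketch below computes them for a regression with a single explanatory variable, where the leverage has a closed form. It is an illustration under that simplifying assumption, not the multiple regression from the slides; the data and the function name `regression_diagnostics` are made up for the example.

```python
import math


def regression_diagnostics(x, y):
    """Leverage, studentized residuals and Cook's distance for y = b0 + b1*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))  # RMSE with 2 coefficients
    # Leverage: how far each x value is from the mean of x.
    leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    # Internally studentized residual: residual scaled by its standard error.
    student = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]
    # Cook's distance combines residual size and leverage (2 coefficients).
    cooks = [r * r * h / (2 * (1 - h)) for r, h in zip(student, leverage)]
    return leverage, student, cooks


# Illustrative data only: the last point has an extreme x value.
x = [1, 2, 3, 4, 5, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 25.0]
leverage, student, cooks = regression_diagnostics(x, y)
for i, (h, r, d) in enumerate(zip(leverage, student, cooks)):
    print(f"point {i}: leverage={h:.3f} studentized={r:.2f} cooks_d={d:.3f}")
```

The leverages always sum to the number of coefficients (2 here), so a point whose leverage is far above the average of 2/n stands out as a high leverage point. A large Cook's distance indicates that removing the point would noticeably move the fitted line.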