1 Chapter 9 Variable Selection and Model building Ray-Bing Chen Institute of Statistics National University of Kaohsiung.

2 9.1 Introduction
The Model-Building Problem
Ensure that the functional form of the model is correct and that the underlying assumptions are not violated.
A pool of candidate regressors is available.
Variable selection problem: find an appropriate subset of these regressors.
Two conflicting objectives:
–Include as many regressors as possible: the information content in these factors can influence the predicted values of y.

3 –Include as few regressors as possible: the variance of the prediction increases as the number of regressors increases.
Which is the “best” regression equation?
Several algorithms can be used for variable selection, but these procedures frequently identify different subsets of the candidate regressors as best.
An idealized setting:
–The correct functional forms of the regressors are known.
–There are no outliers or influential observations.

4 Residual analysis
Iterative approach:
1. Apply a variable selection strategy.
2. Check for correct functional forms, outliers, and influential observations.
None of the variable selection procedures is guaranteed to produce the best regression equation for a given data set.

Consequences of Model Misspecification
The full model
The subset model
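In the usual matrix notation (an assumption here, since the transcript does not preserve the slide's own symbols), the full model with K candidate regressors and the subset model that retains only some of them can be sketched as follows. The partition into retained columns X_p and deleted columns X_r, and the matrix A, are labels introduced for illustration.

```latex
% Full model: all K candidate regressors plus the intercept
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
           = \mathbf{X}_p\boldsymbol{\beta}_p + \mathbf{X}_r\boldsymbol{\beta}_r + \boldsymbol{\varepsilon}

% Subset model: the regressors in X_r are deleted
\mathbf{y} = \mathbf{X}_p\boldsymbol{\beta}_p + \boldsymbol{\varepsilon}

% Least-squares estimate from the subset model
\hat{\boldsymbol{\beta}}_p = (\mathbf{X}_p'\mathbf{X}_p)^{-1}\mathbf{X}_p'\mathbf{y}

% If the full model is the true one, the subset estimate is biased
% unless \beta_r = 0 or X_p' X_r = 0:
E(\hat{\boldsymbol{\beta}}_p) = \boldsymbol{\beta}_p + \mathbf{A}\boldsymbol{\beta}_r,
\qquad \mathbf{A} = (\mathbf{X}_p'\mathbf{X}_p)^{-1}\mathbf{X}_p'\mathbf{X}_r
```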

9 Motivation for variable selection:
–Deleting variables from the model can improve the precision of the parameter estimates of the retained variables. The same holds for the variance of the predicted response.
–Deleting variables from the model introduces bias.
–However, if the deleted variables have small effects, the MSE of the biased estimates will be less than the variance of the unbiased estimates.
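The trade-off above can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not an example from the slides: the coefficients, the correlation of 0.9 between the two regressors, the sample size, and all variable names are assumptions chosen so that the deleted regressor has a small effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, sigma = 20, 2000, 1.0
beta0, beta1, beta2 = 1.0, 2.0, 0.1   # hypothetical values; x2 has a small effect

# Fixed synthetic design with strongly correlated regressors.
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)
X_full = np.column_stack([np.ones(n), x1, x2])   # full model
X_sub = X_full[:, :2]                            # subset model: x2 deleted

est_full = np.empty(reps)
est_sub = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x1 + beta2 * x2 + sigma * rng.normal(size=n)
    # Keep only the estimate of beta1 from each fit.
    est_full[r] = np.linalg.lstsq(X_full, y, rcond=None)[0][1]
    est_sub[r] = np.linalg.lstsq(X_sub, y, rcond=None)[0][1]

var_full = est_full.var()                    # variance of the unbiased estimate
bias_sub = est_sub.mean() - beta1            # bias introduced by deleting x2
mse_sub = np.mean((est_sub - beta1) ** 2)    # variance plus squared bias

print("variance of beta1 estimate, full model:", var_full)
print("bias of beta1 estimate, subset model:  ", bias_sub)
print("MSE of beta1 estimate, subset model:   ", mse_sub)
```

Under these assumptions the subset estimate is biased, yet its MSE is smaller than the variance of the full-model estimate, because the collinearity inflates the latter.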

Criteria for Evaluating Subset Regression Models
Coefficient of Multiple Determination:
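For reference, the standard definition of the coefficient of multiple determination for a subset model can be written as below. The symbol p is taken here to mean the number of terms in the subset model including the intercept, which is an assumption about the slide's notation.

```latex
R_p^2 = \frac{SS_R(p)}{SS_T} = 1 - \frac{SS_{Res}(p)}{SS_T}
```

Because $SS_{Res}(p)$ cannot increase when a regressor is added, $R_p^2$ never decreases as p grows, so it is maximized by the model containing all K regressors.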

11 –Aitkin (1974): $R^2$-adequate subset: a subset of regressor variables that produces $R^2 > R_0^2$

17 Uses of Regression and Model Evaluation Criteria
–Data description: minimize $SS_{Res}$ while using as few regressors as possible.
–Prediction and estimation: minimize the mean square error of prediction; use the PRESS statistic.
–Parameter estimation: see Chapter 10.
–Control: minimize the standard errors of the regression coefficients.
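The criteria named above can be computed from a single least-squares fit. The following is a minimal sketch with NumPy on synthetic data; the function name `fit_criteria` and the data are hypothetical, and PRESS is obtained through the standard identity that the leave-one-out residual equals e_i / (1 - h_ii), where h_ii is the i-th diagonal element of the hat matrix.

```python
import numpy as np

def fit_criteria(X, y):
    """Return SS_Res, R^2 and PRESS for a least-squares fit with an intercept."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])            # add the intercept column
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    ss_res = float(resid @ resid)
    ss_t = float(((y - y.mean()) ** 2).sum())
    # Diagonal of the hat matrix H = Z (Z'Z)^{-1} Z'.
    h = np.einsum("ij,ij->i", Z @ np.linalg.inv(Z.T @ Z), Z)
    # PRESS: sum of squared leave-one-out prediction errors.
    press = float(((resid / (1.0 - h)) ** 2).sum())
    return {"ss_res": ss_res, "r2": 1.0 - ss_res / ss_t, "press": press}

# Synthetic data for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=30)

out = fit_criteria(X, y)
print(out)
```

PRESS is never smaller than $SS_{Res}$, since each residual is divided by a factor between 0 and 1 before squaring.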

Computational Techniques for Variable Selection
All Possible Regressions
Fit all possible regression equations, then select the best one according to suitable criteria.
Assume the model includes the intercept term.
If there are K candidate regressors, there are $2^K$ total equations to be estimated and examined.
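The enumeration described above can be sketched in a few lines. This is an illustrative implementation on synthetic data, not the Hald cement data; the helper names are hypothetical, and $R^2$ is used as the criterion only because it is the simplest one to compute.

```python
from itertools import combinations
import numpy as np

def ss_res(X, y, cols):
    """Residual sum of squares for the model with an intercept and columns `cols`."""
    n = len(y)
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ beta
    return float(r @ r)

def all_possible_regressions(X, y):
    """Fit every subset of the K columns of X; return (subset, R^2) pairs."""
    K = X.shape[1]
    ss_t = float(((y - y.mean()) ** 2).sum())
    results = []
    for size in range(K + 1):                    # size 0 is the intercept-only model
        for cols in combinations(range(K), size):
            results.append((cols, 1.0 - ss_res(X, y, cols) / ss_t))
    return results

# Synthetic data: only columns 0 and 2 carry signal (an assumption for the demo).
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=40)

results = all_possible_regressions(X, y)

# Best subset of each size according to R^2.
best_by_size = {}
for cols, r2 in results:
    if len(cols) not in best_by_size or r2 > best_by_size[len(cols)][1]:
        best_by_size[len(cols)] = (cols, r2)

print(len(results), "models fitted")
for size, (cols, r2) in sorted(best_by_size.items()):
    print(size, cols, round(r2, 4))
```

The number of fits doubles with each additional candidate regressor, which is why this approach is practical only for moderate K.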

19 Example 9.1 The Hald Cement Data

21 $R_p^2$ criterion:

Stepwise Regression Methods
Three broad categories:
1. Forward selection
2. Backward elimination
3. Stepwise regression

31 Backward elimination
–Start with a model that contains all K candidate regressors.
–Compute the partial F-statistic for each regressor, and drop the regressor with the smallest partial F-statistic if that value is less than $F_{OUT}$.
–Stop when all partial F-statistics exceed $F_{OUT}$.
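The backward elimination steps above can be sketched as follows. This is a minimal illustration on synthetic data; the helper names and the fixed cutoff `f_out=4.0` are assumptions (in practice the cutoff would come from an F distribution at a chosen significance level).

```python
import numpy as np

def ss_res(X, y, cols):
    """Residual sum of squares for the model with an intercept and columns `cols`."""
    n = len(y)
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ beta
    return float(r @ r)

def partial_f(X, y, cols):
    """Partial F-statistic of each regressor in `cols`, given the others in `cols`."""
    n = len(y)
    full = ss_res(X, y, cols)
    mse = full / (n - len(cols) - 1)             # residual mean square of the current model
    return {j: (ss_res(X, y, [c for c in cols if c != j]) - full) / mse
            for j in cols}

def backward_elimination(X, y, f_out=4.0):
    """Drop the weakest regressor until every partial F is at least f_out."""
    cols = list(range(X.shape[1]))               # start with all K regressors
    while cols:
        f = partial_f(X, y, cols)
        worst = min(f, key=f.get)
        if f[worst] >= f_out:                    # nothing left to drop
            break
        cols.remove(worst)
    return cols

# Synthetic data: only columns 0 and 2 carry signal (an assumption for the demo).
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 4))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=60)

kept = backward_elimination(X, y)
print("retained regressors:", kept)
```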

32 Stepwise Regression
A modification of forward selection: at each step, all regressors entered previously are reassessed via their partial F-statistics.
A regressor added at an earlier step may now be redundant; if so, it is dropped from the model.
Two cutoff values: $F_{IN}$ and $F_{OUT}$.
Usually choose $F_{IN} > F_{OUT}$, which makes it more difficult to add a regressor than to delete one.
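One way to sketch the stepwise procedure is to alternate a forward step with a backward check. The code below is an illustration under stated assumptions, not the slides' own algorithm listing: the helper names, the cutoffs `f_in=4.0` and `f_out=3.9`, the `max_steps` guard, and the synthetic data are all hypothetical.

```python
import numpy as np

def ss_res(X, y, cols):
    """Residual sum of squares for the model with an intercept and columns `cols`."""
    n = len(y)
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ beta
    return float(r @ r)

def partial_f(X, y, cols):
    """Partial F-statistic of each regressor in `cols`, given the others in `cols`."""
    n = len(y)
    full = ss_res(X, y, cols)
    mse = full / (n - len(cols) - 1)
    return {j: (ss_res(X, y, [c for c in cols if c != j]) - full) / mse
            for j in cols}

def stepwise(X, y, f_in=4.0, f_out=3.9, max_steps=100):
    """Forward selection with a backward check after every step."""
    K = X.shape[1]
    cols = []
    for _ in range(max_steps):                   # guard against endless cycling
        changed = False
        # Forward step: add the candidate with the largest partial F, if above f_in.
        candidates = [j for j in range(K) if j not in cols]
        if candidates:
            f_enter = {j: partial_f(X, y, cols + [j])[j] for j in candidates}
            best = max(f_enter, key=f_enter.get)
            if f_enter[best] > f_in:
                cols.append(best)
                changed = True
        # Backward check: drop the weakest regressor in the model, if below f_out.
        if cols:
            f = partial_f(X, y, cols)
            worst = min(f, key=f.get)
            if f[worst] < f_out:
                cols.remove(worst)
                changed = True
        if not changed:
            break
    return sorted(cols)

# Synthetic data: only columns 0 and 2 carry signal (an assumption for the demo).
rng = np.random.default_rng(4)
X = rng.normal(size=(60, 4))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=60)

selected = stepwise(X, y)
print("selected regressors:", selected)
```

Keeping `f_in` at least as large as `f_out` means a regressor that has just entered cannot be removed in the same step, which avoids add/drop cycles.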