BUILDING THE REGRESSION MODEL


- Data preparation
- Variable reduction
- Model selection
- Model validation
- Procedures for variable reduction

- List the independent variables that could conceivably be related to the dependent variable under study.
- Some of these independent variables can be screened out. An independent variable:
  1. May not be fundamental to the problem
  2. May be subject to large measurement errors, and/or
  3. May effectively duplicate another independent variable in the list
- Independent variables that cannot be measured may either be deleted or replaced by proxy variables that are highly correlated with them.

- The number of cases to be collected depends on the size of the pool of independent variables (usually 6 to 10 cases for every variable in the pool).
- After the data have been collected, edit checks and plots should be performed to identify gross data errors as well as extreme outliers.
- The formal modeling process can then begin. A variety of diagnostics should be employed to identify the important independent variables, the functional forms in which they should enter the regression model, and important relationships.

[Flowchart] Collect data -> Preliminary checks on data quality -> Diagnostics for relationships and strong interactions -> Are remedial measures needed? (Yes: Remedial measures / No)

- The selection of a few “good” subsets of X variables should cover not only the potential independent variables in first-order form but also any needed quadratic and other curvature terms and any necessary interaction terms.
- Several reasons for reducing the number of independent variables:
  1. A regression model with a large number of independent variables is difficult to maintain.
  2. Regression models with a limited number of independent variables are easier to work with and understand.
  3. The presence of many highly intercorrelated independent variables may add little to the predictive power of the model while substantially increasing the sampling variation of the regression coefficients, detracting from the model’s descriptive abilities, and increasing the problem of roundoff errors.
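The third reason can be illustrated numerically. The sketch below is not from the slides: it uses synthetic data and the standard least-squares result that the variance of a coefficient is sigma^2 times the matching diagonal entry of (X'X)^-1. The function name `variance_factors` and all data values are assumptions made for illustration.

```python
import numpy as np

def variance_factors(X):
    """Return diag((X'X)^-1) for the design [1, X]; Var(b_k) = sigma^2 * factor_k."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return np.diag(np.linalg.inv(Xd.T @ Xd))

rng = np.random.default_rng(0)  # synthetic data, for illustration only
n = 200
x1 = rng.normal(size=n)
x2_unrelated = rng.normal(size=n)             # carries new information
x2_duplicate = x1 + 0.1 * rng.normal(size=n)  # almost a copy of x1

f_unrelated = variance_factors(np.column_stack([x1, x2_unrelated]))
f_duplicate = variance_factors(np.column_stack([x1, x2_duplicate]))

# Index 1 is the coefficient of x1 (index 0 is the intercept).
inflation = f_duplicate[1] / f_unrelated[1]
print(f"variance of b1 is inflated by a factor of roughly {inflation:.0f}")
```

Pairing x1 with a near-duplicate leaves the error variance unchanged but makes the coefficient of x1 far less stable, which is the sampling-variation problem described in item 3.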

- After successfully reducing the number of independent variables, select a small number of potentially “good” regression models, each of which contains those independent variables that are known to be essential.
- More detailed checks of curvature and interaction effects are desirable.
- Residual diagnostics are needed to identify influential outlying observations, multicollinearity, etc.

- The final step in the model-building process is to validate the selected regression model.
- Model validity refers to the stability and reasonableness of the regression coefficients, the plausibility and usability of the regression function, and the ability to generalize inferences drawn from the regression analysis.

- Three basic ways of validating a regression model are:
  1. Collection of new data to check the model and its predictive ability.
  2. Comparison of results with theoretical expectations, earlier empirical results, and simulation results.
  3. Use of a hold-out sample to check the model and its predictive ability.

The purpose of collecting new data is to examine whether the regression model developed from the earlier data is still applicable to the new data. Some methods of examining the validity of the regression model against the new data:
- Re-estimate the model form chosen earlier using the new data, then compare the estimated coefficients and various characteristics of the fitted values with those of the regression model based on the earlier data.
- Re-estimate from the new data all of the “good” models that had been considered, to see whether the selected regression model is still the preferred model according to the new data.
- Calibrate the predictive capability of the selected regression model against the new data.

Data splitting
- The preferred method, validation with newly collected data, is often neither practical nor feasible.
- A reasonable alternative when the data set is large enough is to split the data into two sets: the model-building set and the validation (or prediction) set.
- This type of validation is called cross-validation.

A means of measuring the actual predictive capability of the selected regression model is to use this model to predict each case in the new data set and then to calculate the mean of the squared prediction errors, denoted MSPR (mean squared prediction error):

$$\mathrm{MSPR} = \frac{\sum_{i=1}^{n^*} (Y_i - \hat{Y}_i)^2}{n^*}$$

where
- $Y_i$: the value of the response in the ith validation case
- $\hat{Y}_i$: the predicted value for the ith validation case, based on the model-building data set
- $n^*$: the number of cases in the validation data set

If the MSPR is fairly close to the MSE based on the regression fit to the model-building data set, then the MSE for the selected regression model is not seriously biased and gives an appropriate indication of the predictive ability of the model.
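The data-splitting check can be sketched in a few lines. The example below is a minimal illustration on synthetic data, not the authors' procedure: the 50/50 random split, the sample size, and the helper name `design` are all assumptions. It fits the model on the model-building set, then compares its MSE with the MSPR computed on the validation set.

```python
import numpy as np

rng = np.random.default_rng(1)  # synthetic data, for illustration only
n = 1000
X = rng.normal(size=(n, 3))
y = 2.0 + X @ np.array([1.5, -1.0, 0.5]) + rng.normal(size=n)

# Randomly split the cases into a model-building set and a validation set.
idx = rng.permutation(n)
build, valid = idx[: n // 2], idx[n // 2:]

def design(X):
    """Prepend an intercept column to the predictor matrix."""
    return np.column_stack([np.ones(len(X)), X])

# Fit by least squares on the model-building set only.
Xb = design(X[build])
b, *_ = np.linalg.lstsq(Xb, y[build], rcond=None)
p = Xb.shape[1]
mse = np.sum((y[build] - Xb @ b) ** 2) / (len(build) - p)

# MSPR: mean squared prediction error over the n* validation cases.
mspr = np.mean((y[valid] - design(X[valid]) @ b) ** 2)

print(f"MSE (model-building set) = {mse:.3f}")
print(f"MSPR (validation set)    = {mspr:.3f}")
```

Because the synthetic model is correctly specified, the two numbers should be close. An MSPR much larger than the MSE would suggest that the MSE overstates the model's predictive ability.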

- Some procedures for variable reduction are:
  1. Forward procedure
  2. Backward procedure
- Some criteria for comparing the regression models: $R^2_p$, $MSE_p$, $C_p$, and $PRESS_p$. Here P is the number of potential parameters and p is the number of parameters in a subset model. The all-possible-regressions approach assumes that the number of observations n exceeds the maximum number of potential parameters, n > P, with 1 ≤ p ≤ P.
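The slides name the forward procedure without spelling out its steps. One common form is sketched below under stated assumptions: at each step the variable giving the largest drop in SSE is added, and the search stops when its partial F statistic falls below an F-to-enter threshold. The threshold of 4.0, the function names, and the synthetic data are illustrative choices, not values from the source.

```python
import numpy as np

def sse(X, y, cols):
    """Error sum of squares for the model with an intercept plus columns `cols`."""
    Xd = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r = y - Xd @ b
    return float(r @ r)

def forward_select(X, y, f_to_enter=4.0):
    """Greedy forward selection using a partial F test (threshold is an assumption)."""
    n, k = X.shape
    selected = []
    current = sse(X, y, selected)
    while len(selected) < k:
        candidates = [j for j in range(k) if j not in selected]
        trial = {j: sse(X, y, selected + [j]) for j in candidates}
        best = min(trial, key=trial.get)
        # Residual df of the enlarged model: intercept + selected + new variable.
        df_resid = n - (len(selected) + 2)
        f_stat = (current - trial[best]) / (trial[best] / df_resid)
        if f_stat < f_to_enter:
            break
        selected.append(best)
        current = trial[best]
    return selected

rng = np.random.default_rng(2)  # synthetic data: only columns 0 and 2 matter
X = rng.normal(size=(100, 5))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.5 * rng.normal(size=100)

selected = forward_select(X, y)
print("order of entry:", selected)
```

A backward procedure would run the same test in reverse, starting from the full model and dropping the variable with the smallest partial F statistic.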

- The $R^2_p$ criterion examines the coefficient of multiple determination:

  $$R^2_p = \frac{SSM_p}{SST} = 1 - \frac{SSE_p}{SST}$$

- SST is constant for all possible regressions, so $R^2_p$ varies inversely with $SSE_p$.
- $R^2_p$ is at its maximum when all P-1 potential X variables are included in the regression model.
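The behaviour of this criterion can be checked with an all-possible-regressions loop. The sketch below uses synthetic data with four candidate predictors (an illustrative assumption) and records the best $R^2_p$ for each model size p. The helper name `r2` is hypothetical.

```python
import itertools
import numpy as np

def r2(X, y, cols):
    """R^2_p = 1 - SSE_p / SST for the model with an intercept plus columns `cols`."""
    Xd = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    sse_p = np.sum((y - Xd @ b) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse_p / sst

rng = np.random.default_rng(3)  # synthetic data, for illustration only
n, k = 60, 4
X = rng.normal(size=(n, k))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

P = k + 1  # number of potential parameters, intercept included
best_r2 = {}
for size in range(k + 1):
    p = size + 1  # parameters in the subset model
    best_r2[p] = max(r2(X, y, cols)
                     for cols in itertools.combinations(range(k), size))

for p, value in best_r2.items():
    print(f"p = {p}: best R^2_p = {value:.4f}")
```

The best $R^2_p$ never decreases as p grows, so the criterion cannot be used to pick the largest value outright. The usual reading is to look for the point beyond which adding variables yields only a marginal increase.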

- The $MSE_p$ criterion is equivalent to the adjusted coefficient of multiple determination $R^2_a$, which takes the number of parameters in the model into account through the degrees of freedom.
- Seek the subset with the minimum $MSE_p$ (equivalently, the maximum $R^2_a$).
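The link between the two criteria can be made explicit. The short derivation below assumes the usual definitions, which the slide does not write out: $MSE_p = SSE_p/(n-p)$ and the standard degrees-of-freedom form of $R^2_a$.

```latex
R^2_a \;=\; 1 - \frac{n-1}{n-p}\cdot\frac{SSE_p}{SST}
      \;=\; 1 - \frac{SSE_p/(n-p)}{SST/(n-1)}
      \;=\; 1 - \frac{MSE_p}{SST/(n-1)}
```

Since $SST/(n-1)$ is fixed for a given data set, $R^2_a$ increases exactly when $MSE_p$ decreases, so ranking subsets by minimum $MSE_p$ or by maximum $R^2_a$ gives the same order.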

- The $C_p$ criterion is concerned with the total mean squared error of the n fitted values for each subset regression model.
- The model that includes all P-1 potential X variables is assumed to have been carefully chosen so that $MSE(X_1, \ldots, X_{P-1})$ is an unbiased estimator of $\sigma^2$, and $SSE_p$ is the error sum of squares for the fitted subset regression model with p parameters. The $C_p$ criterion is defined as follows:

  $$C_p = \frac{SSE_p}{MSE(X_1, \ldots, X_{P-1})} - (n - 2p)$$

- The criterion favors subsets for which $C_p$ is small and close to p.
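A small computation makes the criterion concrete. The sketch below assumes the standard Mallows form, C_p = SSE_p / MSE(full model) - (n - 2p), and evaluates it for every subset of four synthetic predictors. The data and the helper name `sse` are illustrative assumptions.

```python
import itertools
import numpy as np

def sse(X, y, cols):
    """Error sum of squares for the model with an intercept plus columns `cols`."""
    Xd = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r = y - Xd @ b
    return float(r @ r)

rng = np.random.default_rng(4)  # synthetic data: only columns 0 and 1 matter
n, k = 60, 4
X = rng.normal(size=(n, k))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

P = k + 1                                # potential parameters, intercept included
mse_full = sse(X, y, range(k)) / (n - P)  # estimate of sigma^2 from the full model

cp = {}
for size in range(k + 1):
    p = size + 1
    for cols in itertools.combinations(range(k), size):
        cp[cols] = sse(X, y, cols) / mse_full - (n - 2 * p)

# List the subsets from smallest to largest C_p.
for cols, value in sorted(cp.items(), key=lambda kv: kv[1]):
    print(f"columns {cols}: p = {len(cols) + 1}, C_p = {value:.2f}")
```

By construction the full model always has C_p equal to P, so the value for the full model carries no information. The subsets of interest are the smaller ones whose C_p is both small and near their own p.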

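- The PRESS (prediction sum of squares) selection criterion is based on the deleted residuals $d_i = Y_i - \hat{Y}_{i(i)}$, where $\hat{Y}_{i(i)}$ is the predicted value for case i from the model fitted without that case: $PRESS_p = \sum_{i=1}^{n} d_i^2$.
- Models with small PRESS values are considered good candidate models.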
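PRESS does not require n separate refits. The sketch below relies on the standard identity d_i = e_i / (1 - h_ii), where e_i is the ordinary residual and h_ii the ith diagonal element of the hat matrix. The slide does not state this identity, so the code also verifies it against an explicit leave-one-out loop. The data are synthetic and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)  # synthetic data, for illustration only
n = 40
X = rng.normal(size=(n, 2))
y = 0.5 + X @ np.array([1.0, -2.0]) + rng.normal(size=n)
Xd = np.column_stack([np.ones(n), X])

# Ordinary residuals from the fit to all n cases.
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
e = y - Xd @ b

# Leverages h_ii: squared row norms of Q from the reduced QR factorization.
Q, _ = np.linalg.qr(Xd)
h = np.sum(Q ** 2, axis=1)

# Deleted residuals and PRESS via the shortcut.
d = e / (1.0 - h)
press_shortcut = float(d @ d)

# Explicit check: refit without case i, then predict case i.
press_loo = 0.0
for i in range(n):
    keep = np.arange(n) != i
    bi, *_ = np.linalg.lstsq(Xd[keep], y[keep], rcond=None)
    press_loo += float((y[i] - Xd[i] @ bi) ** 2)

sse_full = float(e @ e)
print(f"PRESS (shortcut)      = {press_shortcut:.4f}")
print(f"PRESS (leave-one-out) = {press_loo:.4f}")
print(f"SSE                   = {sse_full:.4f}")
```

PRESS is never smaller than SSE, because each deleted residual is at least as large in magnitude as the ordinary residual for the same case.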

Predicting survival in patients undergoing a particular type of liver operation:
- $X_1$: blood clotting score
- $X_2$: prognostic index, which includes the age of the patient
- $X_3$: enzyme function test score
- $X_4$: liver function test score
- $Y$: survival time
- n = 54 patients