DSCI 346 Yamasaki Lecture 6 Multiple Regression and Model Building.

Presentation transcript:

DSCI 346 Yamasaki Lecture 6 Multiple Regression and Model Building

Multiple Regression
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$
So we will be using p different variables to predict y. Conceptually we will be doing the same thing as in simple linear regression, except having more variables. We will be approaching this topic mostly by example. Several of the concepts are also applicable to simple linear regression.
DSCI 346 Lect 6 (15 pages)
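As a rough illustration of what fitting this equation involves, the sketch below estimates the coefficients by ordinary least squares with NumPy. The function name `fit_ols` and the small data set are assumptions made for this example; they do not come from the lecture.

```python
import numpy as np

def fit_ols(X, y):
    """Return least-squares coefficients [b0, b1, ..., bp] for y ~ X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2 (no noise).
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]])
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]

coef = fit_ols(X, y)
print(np.round(coef, 6))  # intercept first, then one slope per predictor
```

Because the example data contain no noise, the fitted coefficients recover the generating values; with real data they would only be estimates.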

Example
We are trying to see which factors influence the total amount of dollars spent on medical care by a person with diabetes.
Our outcome (dependent variable) is the amount spent (net pay).
Our predictors (independent variables) will be age, gender, severity of illness score, whether or not the patient had a test for blood sugar level, whether or not the patient had a test for cholesterol level, whether or not the patient was on a blood pressure medication, and the percent of the severity of illness score that was attributable to each of the following disorders: metabolic, musculoskeletal, psychiatric, respiratory, diabetes, neoplasms, cardiovascular.

Normality and Transformations
Do a histogram of the net pay variable.
Not normal, so transform the variable by taking ln (base e logarithm) of net pay.

Normality and Transformations
Do a histogram of the severity of illness variable.
Not normal, so transform the variable by taking ln (base e logarithm).
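The effect of the ln transformation on a right-skewed variable can be checked numerically as well as with a histogram. The sketch below compares sample skewness before and after taking logs; the `skewness` helper and the spending amounts are hypothetical, not the lecture's data, and the transformation assumes all values are positive.

```python
import math

def skewness(values):
    """Sample skewness (third standardized moment); 0 for symmetric data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed spending amounts in dollars (all positive).
net_pay = [120, 150, 200, 260, 340, 500, 900, 2500, 8000, 30000]
ln_net_pay = [math.log(v) for v in net_pay]

# Skewness closer to 0 suggests a more symmetric, more normal-looking shape.
print("raw skewness:", round(skewness(net_pay), 2))
print("ln skewness: ", round(skewness(ln_net_pay), 2))
```

A skewness value nearer zero after the transformation is consistent with the histogram looking more symmetric, though it is not by itself a test of normality.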

Checking model
Do a scattergram of ln net pay vs ln severity and look for a non-linear pattern.
Do a scattergram of ln net pay vs age.

Other data transformations
Yes/no variables are transformed into 1/0 variables (e.g. the gender variable becomes female = 1 and male = 0), since in regression models all variables must be numeric.
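A minimal sketch of this 1/0 recoding follows. The helper `to_indicator` and the sample values are illustrative assumptions, not taken from the lecture's data set.

```python
def to_indicator(values, one_value):
    """Map a two-level variable to 1/0: 1 where the value equals one_value."""
    return [1 if v == one_value else 0 for v in values]

# Hypothetical records; the variable names are illustrative only.
gender = ["female", "male", "male", "female"]
had_blood_sugar_test = ["yes", "no", "yes", "yes"]

female = to_indicator(gender, "female")            # female = 1, male = 0
sugar_test = to_indicator(had_blood_sugar_test, "yes")  # yes = 1, no = 0

print(female, sugar_test)
```

Which level is coded as 1 is a free choice; it only changes the sign and interpretation of the fitted coefficient, not the quality of the fit.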

Multicollinearity
Multicollinearity exists when independent variables are correlated with each other and therefore have some redundancy with regard to the information they provide in explaining the variation in the dependent variable.
One major impact of multicollinearity is that the significance tests of the individual independent variables are not accurate.

Multicollinearity
One way to measure multicollinearity is the Variance Inflation Factor (VIF).
For each independent variable, $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ is the coefficient of determination when the $j$th independent variable is regressed against the remaining k-1 independent variables (k is the number of independent variables).
VIFs > 5.0 indicate issues with multicollinearity.
Example (Birthweight data), tabulating $R_j^2$ and $VIF_j$ by variable: Age, LWT, and FTV each show no multicollinearity.
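The VIF definition can be computed directly by regressing each predictor on the others. The following is a sketch under the assumption that NumPy is available; the function names and the simulated predictors are made up for illustration and are not the Birthweight data.

```python
import numpy as np

def r_squared(X, y):
    """Coefficient of determination for a least-squares fit of y on X."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), one value per column of X."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)   # all predictors except column j
        r2_j = r_squared(others, X[:, j])  # regress x_j on the others
        out.append(1.0 / (1.0 - r2_j))
    return np.array(out)

# Simulated predictors: x3 is nearly a copy of x1, x2 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)
vifs = vif(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 1))
```

In this simulated case the two nearly identical predictors should exceed the 5.0 rule of thumb while the unrelated one stays near 1.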

Interactions
Sometimes the effect of one variable depends on the value of another variable. To model this, a new variable is created by multiplying the two variables together. For example, if you believed that the impact of the severity of illness was different for females than for males, you could create an interaction variable by multiplying the gender variable by the severity variable. In this example, I believed that the impact of the blood sugar test, the cholesterol test, and the use of blood pressure medication depended on the severity, so I created three interaction terms.
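Creating an interaction variable amounts to an elementwise product of two predictor columns. The sketch below uses hypothetical values and variable names; with a model of the form y = b0 + b1*female + b2*ln_severity + b3*(female*ln_severity), the severity slope is b2 for males and b2 + b3 for females.

```python
import numpy as np

# Hypothetical values: female is a 1/0 indicator, ln_severity is log severity.
female = np.array([1, 0, 1, 0])
ln_severity = np.array([0.5, 1.2, 2.0, 0.8])

# The interaction variable is the elementwise product of the two predictors.
female_x_severity = female * ln_severity

# Design matrix: intercept, both main effects, and the interaction column.
design = np.column_stack(
    [np.ones(len(female)), female, ln_severity, female_x_severity]
)
print(design)
```

The interaction column is zero for every male record, which is what lets the fitted model assign a separate severity slope to females.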

Fit the model and check for outliers by looking at standardized residuals (residuals that have been rescaled so that they approximately follow a standard normal distribution).

Remove outliers by deleting those data points and refit the model.
Check again for severe outliers, delete those data points, and repeat until there are no more severe outliers.
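The fit, flag, delete, and refit loop can be sketched as follows. The slides do not state what counts as a severe outlier, so the cutoff of |standardized residual| > 3 is an assumption, as are the function names and the simulated data. The residuals here are the internally studentized form, e_i / (s * sqrt(1 - h_ii)).

```python
import numpy as np

def standardized_residuals(X, y):
    """Internally studentized residuals: e_i / (s * sqrt(1 - h_ii))."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    n, k = design.shape
    s = np.sqrt(resid @ resid / (n - k))
    # Leverages h_ii: diagonal of the hat matrix design @ pinv(design).
    hat_diag = np.einsum("ij,ji->i", design, np.linalg.pinv(design))
    return resid / (s * np.sqrt(1.0 - hat_diag))

def drop_outliers(X, y, cutoff=3.0):
    """Refit repeatedly, dropping points beyond the (assumed) cutoff."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = np.ones(len(y), dtype=bool)
    # Stop early if too few points remain to estimate the residual variance.
    while keep.sum() > X.shape[1] + 2:
        z = standardized_residuals(X[keep], y[keep])
        flagged = np.abs(z) > cutoff
        if not flagged.any():
            break
        keep[np.flatnonzero(keep)[flagged]] = False
    return keep

# Simulated data with one planted severe outlier at index 0.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = 1.0 + X_demo @ np.array([2.0, -1.0]) + 0.5 * rng.normal(size=100)
y_demo[0] += 20.0
keep = drop_outliers(X_demo, y_demo)
print(int(keep.sum()), "points kept; planted outlier kept:", bool(keep[0]))
```

Deleting points purely on a numeric cutoff is a blunt tool; in practice each flagged observation is usually inspected before it is removed.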

Model fitting and variable selection
Backwards selection
You want a significant fit, all the variables to be significant, and a reasonable adjusted $R^2$.
Significant fit since p-value < .05.
Not all variables significant.

Remove the least significant predictor, refit, and repeat until all variables are significant.
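A hedged sketch of this backward elimination loop is given below. To stay dependency-light it approximates each coefficient's two-sided p-value with the standard normal distribution rather than the t distribution, which is only reasonable for fairly large samples. The function names, the 0.05 threshold applied per coefficient, and the simulated data are assumptions for illustration.

```python
import math
import numpy as np

def coef_p_values(X, y):
    """Two-sided p-values for each slope (normal approximation to t)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    n, k = design.shape
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    t = coef / np.sqrt(np.diag(cov))
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])
    return p[1:]  # drop the intercept's p-value

def backward_select(X, y, names, alpha=0.05):
    """Drop the least significant predictor until all have p <= alpha."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cols = list(range(X.shape[1]))
    while cols:
        p = coef_p_values(X[:, cols], y)
        worst = int(np.argmax(p))
        if p[worst] <= alpha:
            break            # every remaining predictor is significant
        cols.pop(worst)      # remove the least significant one and refit
    return [names[c] for c in cols]

# Simulated data: x1 and x2 drive y, x3 is unrelated noise.
rng = np.random.default_rng(1)
X_sim = rng.normal(size=(300, 3))
y_sim = 3.0 + 2.0 * X_sim[:, 0] - 1.5 * X_sim[:, 1] + rng.normal(size=300)
selected = backward_select(X_sim, y_sim, ["x1", "x2", "x3"])
print(selected)
```

Only one predictor is removed per pass because p-values change after every refit, so a variable that looks insignificant at first can become significant once a correlated variable is gone.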

Check model fit using residuals
Underpredicting high values and overpredicting low values.

Other model building strategies
Forward selection: start with the independent variable with the highest $R^2$, add the second significant independent variable that creates the highest model $R^2$, etc.
Best possible subset: fit all possible combinations, then choose the best. The issue is that "best" is not universally defined; common criteria include adjusted $R^2$, Mallows' $C_p$, all independent variables significant, makes sense, etc.
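As one possible reading of the best-subset strategy, the sketch below fits every non-empty combination of predictors and keeps the one with the highest adjusted $R^2$, computed as $1 - (1 - R^2)(n - 1)/(n - p - 1)$. Using adjusted $R^2$ alone is an assumption (it is just one of the criteria listed), and the function names and simulated data are illustrative.

```python
from itertools import combinations
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    p = design.shape[1] - 1  # number of predictors, excluding the intercept
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def best_subset(X, y, names):
    """Try every non-empty subset; return the names and best adjusted R^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    best_score, best_cols = -np.inf, ()
    for size in range(1, X.shape[1] + 1):
        for cols in combinations(range(X.shape[1]), size):
            score = adjusted_r2(X[:, list(cols)], y)
            if score > best_score:
                best_score, best_cols = score, cols
    return [names[c] for c in best_cols], best_score

# Simulated data: x1 and x2 drive y, x3 is unrelated noise.
rng = np.random.default_rng(2)
X_sim = rng.normal(size=(200, 3))
y_sim = 3.0 + 2.0 * X_sim[:, 0] - 1.5 * X_sim[:, 1] + rng.normal(size=200)
chosen, score = best_subset(X_sim, y_sim, ["x1", "x2", "x3"])
print(chosen, round(score, 3))
```

The number of subsets doubles with each added predictor, so an exhaustive search like this is only practical for a modest number of candidate variables.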