Multiple Regression Farrokh Alemi, Ph.D. Kashif Haqqi M.D.


Additional Reading For additional reading, see Chapter 15 and Chapter 14 in Michael R. Middleton's Data Analysis Using Excel, Duxbury Thompson Publishers. The example described in this lecture is based in part on Chapter 17 and Chapter 18 of Keller and Warrack's Statistics for Management and Economics, Fifth Edition, Duxbury Thompson Learning Publisher, 2000.

Objectives To learn the assumptions behind, and the interpretation of, multiple variable regression. To use Excel to calculate multiple regression. To test hypotheses using multiple regression.

Multiple Regression Model We assume that k independent variables are potentially related to the dependent variable through the equation y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ + ε. The objective is to find coefficient estimates b₀, b₁, …, bₖ such that the difference between y and the predicted value ŷ = b₀ + b₁x₁ + b₂x₂ + … + bₖxₖ is minimized.
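For readers who want a scriptable counterpart to the Excel workflow used in this lecture, here is a minimal sketch of the least-squares idea in Python with numpy; the data are synthetic and the variable names are made up.

```python
# Minimal sketch of multiple regression by least squares (synthetic data).
# The coefficients b minimize the sum of squared differences between y and y-hat.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)                       # first independent variable
x2 = rng.normal(size=n)                       # second independent variable
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])     # design matrix with an intercept column
b, *_ = np.linalg.lstsq(X, y, rcond=None)     # least-squares estimates b0, b1, b2

y_hat = X @ b                                 # predicted values
print("coefficients:", b)
print("sum of squared residuals:", np.sum((y - y_hat) ** 2))
```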

Similarity With Single Variable Regression
Same method of finding the best fit to the data by minimizing the sum of squared residuals.
Same assumptions regarding the Normal distribution of residuals and the constant standard deviation of residuals.
New issues related to finding the optimal combination of variables that can predict the response variable.

Multiple Regression in Excel
Arrange the y and x variables as columns, with each case as a row.
Select Tools, Data Analysis, Regression.
Enter the range for the Y variable.
Enter the range for all X variables.
Select an output range and, at a minimum, request the normal probability plot and the residual plots.

Example Examine which variables affect the profitability of health centers. Download the data. Regress the profit measure (profit divided by revenue) on:
(1) Number of visits
(2) Maximum distance among clinics in the center
(3) Number of employers in the area
(4) Percent of the community enrolled in college
(5) Median income of the community in thousands
(6) Distance to downtown
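The same analysis can be scripted outside Excel. Below is a hedged sketch with statsmodels; the file name and column names are hypothetical stand-ins for the downloadable data set and would need to be matched to the actual file.

```python
# Sketch of the regression for the profitability example using statsmodels.
# "health_centers.csv" and the column names are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("health_centers.csv")

model = smf.ols(
    "profit_margin ~ visits + clinic_distance + employers"
    " + college_pct + median_income + downtown_distance",
    data=data,
)
results = model.fit()
print(results.summary())   # reports R-squared, the ANOVA F test, and coefficient t tests
```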

Regression Statistics 49% of variance in Y is explained by the regression equation

ANOVA for Regression
Null hypothesis: MSR is equal to MSE.
Alternative hypothesis: MSR is greater than MSE.
The F statistic is 17, with a probability of 0.00 of being observed under the null hypothesis.
The null hypothesis is rejected.
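As a sketch of the arithmetic behind this test, the F statistic is the ratio of the mean square due to regression to the mean square error; the sums of squares and counts below are placeholders, not the values from the lecture's output.

```python
# F test for the overall regression: F = MSR / MSE.
# The numbers below are placeholders; take them from your own regression output.
from scipy import stats

n, k = 100, 6              # number of observations and of independent variables
ssr, sse = 450.0, 470.0    # regression and error sums of squares

msr = ssr / k                               # mean square due to regression
mse = sse / (n - k - 1)                     # mean square error
f_stat = msr / mse
p_value = stats.f.sf(f_stat, k, n - k - 1)  # P(F > f_stat) under the null hypothesis
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```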

Analysis of Coefficients
Null hypotheses: the coefficients are zero.
Alternative hypotheses: the coefficients are different from zero.
Are the P values below 0.05? All null hypotheses are rejected except for college enrollment and distance to downtown.

Discussion of Direction of Coefficients
One more visit to the competitor decreases the operating margin by 0.01.
One more mile of distance among clinics decreases the operating margin by 1.65.
One more employer in the community increases the operating margin by 0.02.
One thousand dollars more income decreases the operating margin by 0.41.

Check Assumptions
1. Does the residual have a Normal distribution?
2. Is the variance of the residuals constant?
3. Are the errors independent?
4. Are there observations that are inaccurate or do not belong to the target population?

Does the Residual Have a Normal Distribution? Plot the normal probability plot. It looks nearly Normal, so the assumption seems reasonable.
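A quick way to produce this check outside Excel is a normal probability (Q-Q) plot of the residuals; this sketch uses synthetic residuals as a stand-in for the ones from the regression output.

```python
# Normal probability (Q-Q) plot of residuals; synthetic residuals used as a placeholder.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

residuals = np.random.default_rng(1).normal(size=80)   # replace with your residuals

stats.probplot(residuals, dist="norm", plot=plt)        # points near the line suggest normality
plt.title("Normal probability plot of residuals")
plt.show()
```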

Is the Variance of Residuals Constant? The residuals seem randomly distributed, and the range of the residuals at particular values of visits to competitors seems similar.
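The usual picture for this check is a plot of residuals against predicted values (or against an individual x variable); the sketch below uses placeholder values in place of the regression output.

```python
# Residuals versus predicted values, to eyeball whether the spread is constant.
# y_hat and residuals below are placeholders for your regression output.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
y_hat = rng.uniform(0, 10, size=80)           # predicted values (placeholder)
residuals = rng.normal(scale=1.0, size=80)    # residuals (placeholder)

plt.scatter(y_hat, residuals)
plt.axhline(0, color="gray", linewidth=1)
plt.xlabel("Predicted value")
plt.ylabel("Residual")
plt.title("Residuals vs. predicted values")
plt.show()
```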

What If Assumptions Are Violated? Consider non-linear regression (see the options under trend line). Or transform the response variable: instead of Y, use whichever of the following best corrects the problem:
Log of y
y raised to a power (a constant)
Reciprocal of y, that is, 1/y

Example of Violation of Assumption Suppose that in a regression of Y on X we observe a residual plot in which the variance of the residuals depends on the value of X.

Correcting the Violation Create a new column named "transformed Y," which is the log of y, and repeat the regression. The variability in the variance at different levels of x is reduced.
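Here is a small sketch of that correction on synthetic data whose noise grows with x: regressing log(y) instead of y evens out the residual spread. The data and names are made up.

```python
# Log-transforming the response variable and repeating the regression (synthetic data).
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.uniform(1, 10, size=n)
y = np.exp(0.3 * x + rng.normal(scale=0.2, size=n))    # noise multiplies, so variance grows with x

X = np.column_stack([np.ones(n), x])
b_raw, *_ = np.linalg.lstsq(X, y, rcond=None)          # regression on y
b_log, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)  # regression on the "transformed Y" column

print("slope using raw y:   ", b_raw[1])
print("slope using log of y:", b_log[1])   # residuals of this fit have a much more even spread
```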

What to Do If Variables Are Non-Linear? Use nonlinear regression (see the trend line command), or use linear regression after transforming the x variable: create a new column of data, choosing the transformation based on the shape of the relationship you see in the data:
Use x to the power of a constant
Use the log of x
Use the reciprocal of x
Then use the transformed column of data in the linear regression.
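As a sketch of the same idea, the snippet below adds a transformed column (x squared) to synthetic data with a curved relationship and shows how much the sum of squared errors drops; the choice of x² is illustrative, not from the lecture.

```python
# Adding a transformed x column (here x squared) to capture a curved relationship.
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = rng.uniform(0, 5, size=n)
y = 1.0 + 0.5 * x ** 2 + rng.normal(scale=0.5, size=n)   # curved relationship with x

X_linear = np.column_stack([np.ones(n), x])               # straight-line model
X_curved = np.column_stack([np.ones(n), x, x ** 2])       # with the transformed column added

def sse(X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ b) ** 2)

print("SSE, straight line: ", sse(X_linear))
print("SSE, with x squared:", sse(X_curved))   # should be much smaller
```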

Relationship Among Regression Components
When SSE is 0 and R² is 1, the model fits the data perfectly.
When SSE is small, R² is close to 1, and F is large, the model is good.
When SSE is large, R² is close to 0, and F is small, the model is poor.
When R² is 0 and F is 0, the model is useless.
The number of observations is shown as n and the number of independent variables as k. The total variation in Y is SS(Total) = Σ(yᵢ − ȳ)².
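A brief sketch of how these components tie together numerically, on synthetic data: the total variation splits into an explained part (SSR) and an unexplained part (SSE), and R² and F are built from that split.

```python
# Decomposing the variation in Y and computing R-squared and F (synthetic data).
import numpy as np

rng = np.random.default_rng(5)
n, k = 60, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

sst = np.sum((y - y.mean()) ** 2)     # total variation in Y
sse = np.sum((y - y_hat) ** 2)        # unexplained variation (residuals)
ssr = sst - sse                       # variation explained by the regression

r_squared = ssr / sst
f_stat = (ssr / k) / (sse / (n - k - 1))
print(f"R^2 = {r_squared:.3f}, F = {f_stat:.2f}")
```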

Multicollinearity: A New Assumption Unique to Multiple Regression. Multicollinearity is a problem in the interpretation of regression coefficients that arises when the independent variables are correlated.

Sample Problem Download the data. Construct a measure of severity of substance abuse to predict the length of stay of patients in treatment programs. The greater the severity, the shorter the stay. 30 patients were followed, and their length of stay as well as their scores on 10 co-morbidities were recorded. A higher score indicates that more of the factor is present.

Correlations Between Response and Independent Variables Length of stay is related to the individual variables; 4 to 46 percent of the variance would be explained by a single variable. For example, regress length of stay on the teen or elderly variable: the R² explained is significant at alpha levels lower than 0.01.

Multiple Regression Note that adjusted R² measures the percent of variance in Y (length of stay) explained. 31% is explained by the linear combination of the variables.

ANOVA Statistics We cannot reject the null hypothesis; the variation explained by the regression is not significantly higher than the variation in the residuals. The regression model is not effective.

Test of Coefficients
Null hypothesis: the coefficient is zero.
Alternative hypothesis: the coefficient is different from zero.
None of the null hypotheses can be rejected.

But if we look at it in single-variable regressions… Regress length of stay on the teen or the elderly variable alone, and the coefficient is statistically significant.

Why would a single-variable relationship disappear when looking at it in a multiple regression?

Explanation of Collinearity
Collinearity exists when independent variables are correlated.
Collinearity increases sample variability and SSE.
Previously significant relationships are no longer significant when they enter the equation with other collinear variables.
Conceptually, the percent of variance explained by collinear independent variables is shared among those independent variables.

Detection of Collinearity
Collinearity exists in almost all situations, except when a full factorial design is used to set up the experiment.
The key question is how much collinearity is too much.
A heuristic is that correlations above 0.30 are problematic.
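A quick way to apply that heuristic is to compute the correlation matrix of the independent variables and flag any pair above 0.30; the sketch below uses synthetic data with one deliberately collinear pair.

```python
# Computing the correlation matrix of the independent variables and flagging
# pairs above the 0.30 heuristic (synthetic data; x2 is built to be collinear with x1).
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
x1 = rng.normal(size=100)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=100)
x3 = rng.normal(size=100)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

corr = X.corr()
print(corr.round(2))
for i, a in enumerate(corr.columns):
    for b in corr.columns[i + 1:]:
        if abs(corr.loc[a, b]) > 0.30:
            print(f"possible collinearity: {a} and {b} (r = {corr.loc[a, b]:.2f})")
```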

Correlations

How to Correct for Collinearity? Choose independent variables with low collinearity, or use stepwise regression: a procedure in which the most correlated variable is entered into the equation first, then the remaining variance in Y is explained by the next most correlated variable, and so on. In this procedure the order of entry of the variables matters.
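The sketch below illustrates only the ordering idea in that description: enter the variable most correlated with Y, then the one most correlated with what is still unexplained, and so on. Real stepwise procedures also apply entry and removal tests, which are omitted here.

```python
# Simplified illustration of forward (stepwise) ordering of variables.
import numpy as np

def forward_order(X, y):
    """Return column indices in the order a forward procedure would add them."""
    n, p = X.shape
    remaining, order = list(range(p)), []
    residual = y - y.mean()
    while remaining:
        # pick the remaining column most correlated with the current residuals
        best = max(remaining,
                   key=lambda j: abs(np.corrcoef(X[:, j], residual)[0, 1]))
        order.append(best)
        remaining.remove(best)
        # refit on the columns chosen so far and update the unexplained part
        design = np.column_stack([np.ones(n), X[:, order]])
        b, *_ = np.linalg.lstsq(design, y, rcond=None)
        residual = y - design @ b
    return order

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 4))
y = 3 * X[:, 2] + 1 * X[:, 0] + rng.normal(size=100)
print(forward_order(X, y))   # expect column 2 to enter first
```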

Non-Interval Independent Variables Sometimes the independent variable is measured on an ordinal or nominal scale, e.g., gender. To use regression, assign 0 to the absence of the variable and 1 to its presence, and use this indicator variable in your regression analysis. If there are more than two levels, use multiple indicator variables, one for each level except for the reference level.

Example of an Indicator Variable Type of clinician includes the following levels: psychiatrist, psychologist, counselor, social worker. Use 3 indicator variables:
Presence of psychiatrist
Presence of psychologist
Presence of counselor
When all three are zero, it is assumed that the clinician was a social worker.
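A minimal sketch of that coding with pandas: one 0/1 column per level except the reference level, so a row of all zeros means social worker. The data frame is made up.

```python
# Indicator (dummy) variables for type of clinician, with social worker as the reference level.
import pandas as pd

df = pd.DataFrame({"clinician": ["psychiatrist", "psychologist",
                                 "counselor", "social worker", "counselor"]})
for level in ["psychiatrist", "psychologist", "counselor"]:
    df[f"is_{level}"] = (df["clinician"] == level).astype(int)   # 1 if present, 0 otherwise
print(df)   # rows with all three indicators equal to 0 are social workers
```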

Another Example of an Indicator Variable A patient's diagnosis may be any of the following levels:
No MI
MI
MI with complications
Use 2 indicator variables: presence of MI, and presence of MI with complications. When both indicators are zero, the diagnosis is no MI.

Test for Interactions Consider two variables x₁ and x₂. We had assumed the first-order linear model y = β₀ + β₁x₁ + β₂x₂ + ε. Sometimes there are interactions, so we can instead look at y = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂ + ε. To do so, multiply the x₁ column of data by the x₂ column of data and put the product into a separate column.

Example Anger and Abused are independent variables. A new variable named "angry and abused" is created as the multiplication of these two variables. Note that the new variable is 0 when either of the two components is zero.
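A minimal sketch of building that column with pandas; the data frame and its 0/1 values are hypothetical.

```python
# Creating the interaction column by multiplying the two variables.
import pandas as pd

df = pd.DataFrame({
    "angry":  [1, 0, 1, 0, 1],
    "abused": [1, 1, 0, 0, 1],
})
df["angry_and_abused"] = df["angry"] * df["abused"]   # 0 whenever either component is 0
print(df)   # include the new column as an extra independent variable in the regression
```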

Regression with Interaction Terms Include the new column in the regression. Note that being abused is not related to length of stay, but being angry and abused is related.

Test for Interactions (Continued) The previous example showed an interaction term between two variables. You can include interaction terms between any pair of variables, but be careful not to have too many variables in the model: the number of observations should be at least 3-4 times the number of variables in the regression equation.

Which Interactions to Include? Do not go fishing for interaction terms in the data by including all interactions until something significant is found. Look at the underlying problem and think through whether an interaction term makes sense conceptually.

Take Home Lesson Multiple regression is similar to single variable regression in concept: a similar F test for the regression, a similar t test for the coefficients, and a similar concept of R². Test the assumptions that the residuals have a Normal distribution, constant variance, and are independent. Test for collinearity. Test for interactions. Test whether there are non-linear relationships.