Multiple Linear Regression: a method for analyzing the effects of several predictor variables concurrently.

Presentation transcript:

Multiple Linear Regression: a method for analyzing the effects of several predictor variables concurrently. - Simultaneous entry - Stepwise entry. The coefficients are chosen by minimizing the squared deviations from a plane. We will begin by focusing on simultaneous regression. The regression coefficient for each predictor is estimated while holding the other predictor variables constant. Thus, the slope for a particular predictor variable may change in the presence of different co-predictors, or when it is used as a solitary predictor.
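The point about co-predictors can be seen in a small simulation (a minimal numpy sketch with invented data and coefficients): the slope of x1 fitted alone absorbs part of the effect of a correlated co-predictor, while the simultaneous fit holds x2 constant and recovers the true slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(scale=0.5, size=n)   # x2 correlated with x1
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients for y = X @ b (X includes an intercept column)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

ones = np.ones(n)
b_solo = ols(np.column_stack([ones, x1]), y)      # x1 as a solitary predictor
b_both = ols(np.column_stack([ones, x1, x2]), y)  # x1 holding x2 constant

print("slope of x1 alone:  ", round(b_solo[1], 2))  # inflated: absorbs x2's effect
print("slope of x1 with x2:", round(b_both[1], 2))  # near the true value of 2.0
```

The solitary slope is biased upward because x2, which also raises y, rises with x1; the simultaneous fit separates the two effects.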

After testing bivariate assumptions, there may remain multivariate outliers, that is, outliers based on a combination of scores. For example, being 6 feet tall will not make one an outlier, nor will weighing 120 pounds; being 6 feet tall and 120 pounds will. Distance: based on residuals; identifies outliers on the criterion. Leverage: identifies outliers on the predictors (multivariate). Influence: combines distance and leverage to identify unusually influential observations. Cook's D measures how much the estimated slopes would change if a single observation were removed (it is calculated for each observation). Tolerance: the proportion of a predictor's variance that is NOT predictable from the other predictors; low tolerance signals collinearity. Singularity occurs when a predictor is perfectly predictable from the other predictors.
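Leverage and Cook's D can be computed directly from the hat matrix. Below is a sketch using the height/weight example from the slide (the data, sample size, and the tall-but-light case are invented for illustration): the combination outlier dominates Cook's D even though neither of its values is extreme on its own.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
height = rng.normal(69, 3, size=n)                   # inches
weight = 5.5 * height - 220 + rng.normal(0, 10, n)   # pounds, tied to height
# One multivariate outlier: tall (72") but very light (120 lb).
# Neither value alone is extreme; only the combination is.
height = np.append(height, 72.0)
weight = np.append(weight, 120.0)

X = np.column_stack([np.ones(n + 1), height])
H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
leverage = np.diag(H)                          # outliers among the predictors
b, *_ = np.linalg.lstsq(X, weight, rcond=None)
resid = weight - X @ b                         # distance: outliers on the criterion
p = X.shape[1]                                 # parameters (intercept + slope)
mse = resid @ resid / (n + 1 - p)
# Cook's D combines distance and leverage into a single influence measure
cooks_d = (resid**2 / (p * mse)) * (leverage / (1 - leverage) ** 2)

print("Cook's D of the combination outlier:", round(cooks_d[-1], 2))
print("max Cook's D among the other points:", round(cooks_d[:-1].max(), 2))
```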

Standardized Regression Coefficients. These relate to the question of the relative importance of the predictors. Remember, the slope is sensitive to units of measurement: for the same relationship, larger units of measurement produce a larger slope than smaller units do. For example, if x is measured in seconds, the slope will be smaller than if x were measured in minutes. Standardized coefficients (betas) measure the change in the criterion, now expressed in standard deviations, produced by a one-standard-deviation change in the predictor. Determining which predictor is more important is not merely a matter of comparing the betas, as some textbooks suggest: there are theoretical and practical matters to be considered, and the variances of the predictors (i.e., their standard deviations) may change across samples.
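A short sketch of the units point (the seconds/minutes example and all numbers are invented): the raw slope changes by a factor of 60 when the same variable is rescaled, while the standardized beta is unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
seconds = rng.normal(600, 120, size=n)        # study time in seconds
minutes = seconds / 60.0                      # the same variable in minutes
score = 50 + 0.05 * seconds + rng.normal(0, 5, n)

def slope(x, y):
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def beta(x, y):
    """Standardized coefficient: the slope after z-scoring both variables."""
    z = lambda v: (v - v.mean()) / v.std(ddof=1)
    return slope(z(x), z(y))

print("raw slope (per second):", round(slope(seconds, score), 3))
print("raw slope (per minute):", round(slope(minutes, score), 3))  # 60x larger
print("beta (seconds):", round(beta(seconds, score), 3))
print("beta (minutes):", round(beta(minutes, score), 3))           # identical
```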

Adding predictors may change the regression coefficients and the betas. R is the multiple correlation coefficient. It measures the degree of association (0 to 1.0) between the criterion and the predictor variables taken simultaneously. R² is the coefficient of multiple determination. It indicates the percentage of the variance in the criterion variable accounted for by the predictors taken together. Adding predictor variables will never reduce R², but that does not necessarily mean the added predictors are either statistically or theoretically important. (An additional variable fails to add to R² only if it is completely uncorrelated with the part of the criterion left unexplained by the other predictors.) Just as r is adjusted for n, so R² can be adjusted for the number of predictor variables used in the regression.
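Both claims are easy to check numerically. The sketch below (invented data; the "junk" predictors are pure noise) shows R² never decreasing when predictors are added, and the standard adjustment 1 − (1 − R²)(n − 1)/(n − k − 1) penalizing the extra, useless predictors.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
junk = rng.normal(size=(n, 5))     # five predictors unrelated to the criterion
y = 2 * x1 + rng.normal(size=n)

def r_squared(X, y):
    X = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def adjusted(r2, n, k):
    """Adjusted R^2 for k predictors: charges a price for each predictor used."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

r2_small = r_squared(x1[:, None], y)
r2_big = r_squared(np.column_stack([x1[:, None], junk]), y)

print("R^2 with 1 predictor:", round(r2_small, 3))
print("R^2 with 6 predictors:", round(r2_big, 3))   # never lower than the above
print("adjusted, 1 predictor:", round(adjusted(r2_small, n, 1), 3))
print("adjusted, 6 predictors:", round(adjusted(r2_big, n, 6), 3))
```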

The Standard Error of an Estimated Coefficient is a measure of the variability that would be found among the slopes estimated from other samples drawn from the same population (with n held constant). This is analogous to the standard error of the mean and serves an analogous purpose. One way to look at the standard error of b is as a measure of how sensitive the slope is to a change in a small number of data points in the sample.

[Figure: three scatterplot panels A, B, C, each shown in two versions (A1/A2, B1/B2, C1/C2).] In the three panels the data points are fixed except for the five larger points. In A and B, changing those few data points produced large changes in the slope; this is not the case with C. Why? The variability of the x variable is greater in C. Furthermore, the correlation is greater in C (which means the standard error is smaller).

Standard Error of the Slope: SE(b) = s_e / (s_x · √(n − 1)), with s_e = √(Σ(y − ŷ)² / (n − 2)), the standard error of the estimate.
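The formula can be checked against its own definition by simulation (a sketch with an invented population): the analytic SE computed from one sample should be close to the actual standard deviation of slopes across many fresh samples from the same population.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30

def fitted_slope(x, y):
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Analytic SE from a single sample: s_e / (s_x * sqrt(n - 1))
x = rng.normal(0, 2, size=n)
y = 1 + 0.5 * x + rng.normal(0, 1, n)
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
s_e = np.sqrt(resid @ resid / (n - 2))            # standard error of the estimate
se_analytic = s_e / (x.std(ddof=1) * np.sqrt(n - 1))

# Empirical check: spread of slopes over many samples of the same size
slopes = []
for _ in range(2000):
    xs = rng.normal(0, 2, size=n)
    ys = 1 + 0.5 * xs + rng.normal(0, 1, n)
    slopes.append(fitted_slope(xs, ys))

print("analytic SE from one sample:", round(se_analytic, 3))
print("empirical SD of slopes:     ", round(np.std(slopes), 3))
```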

In multiple regression we may wish to test hypotheses concerning all of the predictors, or some subset of them, in addition to tests of the individual slopes: t-tests of individual coefficients, with all other predictors held constant; an F-test of whether the predictors taken together significantly predict the criterion; and F-tests of whether some subset of predictors is significant. Strange outcomes can occur: the t-test of b1 may be non-significant, the t-test of b2 may be non-significant, yet the F-test of b1 and b2 jointly may be significant. When two predictors are correlated, the standard errors of their coefficients are larger than they would be in the absence of the other predictor. With p (or k) predictors, the overall F-test has df = p and N − p − 1.
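The "strange outcome" is a direct consequence of inflated standard errors. A sketch (invented data; two nearly collinear predictors that jointly drive y): the individual t statistics are small while the overall F is large, and the SE of x1's slope is many times what it would be with x1 alone.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # x2 almost collinear with x1
y = x1 + x2 + rng.normal(scale=2.0, size=n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
p = 2                                     # number of predictors
mse = resid @ resid / (n - p - 1)
cov_b = mse * np.linalg.inv(X.T @ X)      # coefficient covariance matrix
t_stats = b[1:] / np.sqrt(np.diag(cov_b)[1:])

ss_total = (y - y.mean()) @ (y - y.mean())
ss_model = ss_total - resid @ resid
F = (ss_model / p) / mse                  # overall F, df = p and n - p - 1

# For comparison: SE of x1's slope when x1 is the only predictor
X1 = np.column_stack([np.ones(n), x1])
b1, *_ = np.linalg.lstsq(X1, y, rcond=None)
r1 = y - X1 @ b1
se_alone = np.sqrt((r1 @ r1 / (n - 2)) * np.linalg.inv(X1.T @ X1)[1, 1])

print("individual t statistics:", np.round(t_stats, 2))  # typically small
print("overall F statistic:", round(F, 1))               # clearly large
print("SE of x1 slope, alone vs with x2:",
      round(se_alone, 2), round(np.sqrt(cov_b[1, 1]), 2))
```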

Limitations of tests of significance, both of individual predictors and of the overall model: if there are only small differences among the betas of the various predictors, a different pattern of significance may easily arise in another sample, and the relative variabilities of the predictors may change. A significant beta does NOT necessarily mean that the variable is of theoretical or of practical importance. The issue of IMPORTANCE is a difficult one, and the relative size of the betas is not always the solution. (For example, your ability to manipulate a variable may be just as important in practical terms.)

Difference between two R²s (nested models): F = [(R²_full − R²_reduced) / (K_full − K_reduced)] / [(1 − R²_full) / (N − K_full − 1)], where K = # of variables in the regression.
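This nested-model F can be written as a one-line function (a sketch; the example numbers are invented): it tests whether the increment in R² from the added predictors is more than chance would give.

```python
import numpy as np

def r2_change_F(r2_full, r2_reduced, n, k_full, k_reduced):
    """F test for the increment in R^2 when k_full - k_reduced predictors are added.

    Degrees of freedom: (k_full - k_reduced) and (n - k_full - 1).
    """
    num = (r2_full - r2_reduced) / (k_full - k_reduced)
    den = (1 - r2_full) / (n - k_full - 1)
    return num / den

# e.g. adding 2 predictors raises R^2 from .40 to .45 with N = 100:
print(round(r2_change_F(0.45, 0.40, 100, 5, 3), 2))
```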

Types of Analysis: Data types. Cross-sectional: cases represent different objects at one point in time. Time series: the same object and variables are measured over time; a lagged value of the dependent variable (criterion), i.e., its value at the previous time point, can be used as an independent variable (predictor). Continuous versus dummy variables. Dummy variables are categorical, binary, dichotomous (0 and 1). There may be more than two categories: four categories, for example, would produce three dummy variables. Say there are four types of people: A, B, C, and D. There would be three variables, A (yes/no, 0/1), B (0/1), and C (0/1); all zeros identifies the fourth category, which is absorbed into the intercept.
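The four-category coding above can be sketched directly (the group means are invented and noise-free so the result is exact): the intercept recovers the reference category's mean and each dummy coefficient is that category's difference from the reference.

```python
import numpy as np

# Four categories (A, B, C, D) -> three dummy columns; D is the reference
# category, absorbed by the intercept (all three dummies = 0).
groups = np.array(["A", "B", "C", "D", "B", "D", "A", "C"])
d_A = (groups == "A").astype(float)
d_B = (groups == "B").astype(float)
d_C = (groups == "C").astype(float)

means = {"A": 10.0, "B": 12.0, "C": 7.0, "D": 9.0}
y = np.array([means[g] for g in groups])

X = np.column_stack([np.ones(len(y)), d_A, d_B, d_C])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept (mean of D):", round(b[0], 1))  # 9.0
print("b_A (A minus D):", round(b[1], 1))        # 1.0
print("b_B (B minus D):", round(b[2], 1))        # 3.0
print("b_C (C minus D):", round(b[3], 1))        # -2.0
```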

Interactions: derived variables, formed between two continuous variables or between one continuous and one dummy variable. If x1 is the continuous variable and x2 the dummy, then b1 tells us the effect of x1 on the criterion when x2 = 0; b1 + b3 gives the effect when x2 = 1; and b3 tells us the difference between the two slopes. For two continuous variables, the added interaction term indicates whether the effect of x1 at low values of x2 is greater or less than its effect at higher values of x2. Remember, adding predictor variables, even interaction terms, can change the betas of all the other predictors.
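The continuous-by-dummy case can be sketched as follows (invented data; the true slopes are 1.0 in the x2 = 0 group and 3.0 in the x2 = 1 group): the fit recovers b1 as the first slope, b1 + b3 as the second, and b3 as their difference.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
x1 = rng.normal(size=n)                        # continuous predictor
x2 = rng.integers(0, 2, size=n).astype(float)  # dummy predictor
# True model: the slope of x1 is 1.0 when x2 = 0 and 3.0 when x2 = 1
y = 0.5 + 1.0 * x1 + 0.8 * x2 + 2.0 * x1 * x2 + rng.normal(0, 0.5, n)

# The interaction is just a derived column: the product x1 * x2
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

print("slope of x1 when x2 = 0 (b1):     ", round(b1, 2))
print("slope of x1 when x2 = 1 (b1 + b3):", round(b1 + b3, 2))
print("difference in slopes (b3):        ", round(b3, 2))
```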

Basic issues. If you omit a relevant variable from your regression, the betas of the included variables will be at best unreliable and at worst invalid. If you include an irrelevant predictor variable, the betas of the relevant variables remain unbiased; however, to the extent that the irrelevant variable is correlated with some of the other predictors, it will increase the size of their standard errors (reducing power). If the underlying function between one or more of the predictors and the criterion is something other than linear, the betas will be biased and unreliable. This is one reason why it is important to look at all of the bivariate plots prior to the analysis.

Addressing Collinearity. Ideally, you would collect new data free of collinearity. This usually requires an experimental design (creating true independent variables) and is usually not feasible, or it would have been done in the first place. 1. Model respecification: combining correlated variables through various techniques, or choosing to remove some (on theoretical and statistical grounds). 2. Statistical variable selection: a. stepwise procedures can be deceptive and often fail to maximize R²; b. examining all subsets may reveal subsets with similar R², but the resulting solution may not fit with either the research question or the theoretical approach.
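Before respecifying a model, the collinearity itself can be diagnosed with the tolerance statistic defined earlier. A sketch (invented predictors: x2 is deliberately made nearly redundant with x1, x3 is independent): each predictor's tolerance is 1 − R² from regressing it on the others, and VIF is its reciprocal.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + rng.normal(scale=0.3, size=n)  # nearly redundant with x1
x3 = rng.normal(size=n)                          # independent of both

def tolerance(X, j):
    """1 - R^2 from regressing column j on the remaining columns."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    b, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ b
    tot = X[:, j] - X[:, j].mean()
    return (resid @ resid) / (tot @ tot)

X = np.column_stack([x1, x2, x3])
for j, name in enumerate(["x1", "x2", "x3"]):
    tol = tolerance(X, j)
    print(f"{name}: tolerance = {tol:.2f}, VIF = {1 / tol:.1f}")
```

Low tolerance (equivalently, high VIF) for x1 and x2 flags exactly the pair a respecification would need to combine or prune.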