
1 Multiple Regression Interpretation

2 Correlation, Causation Think about a light switch and the light that is on the electrical circuit. If you and I collect data about someone flipping the switch and the lights going on and off, we would be able to say, from a statistical point of view, that there is correlation. In fact, you and I know we can say something even stronger: in this case there is causation. In the world of business (and other areas) we want to find relationships between variables. We would hope to find correlation, and if we have a compelling theory, maybe we could say we have causation.

3 Example Say we are interested in crop yield on a farm. What variables are correlated with crop yield? You and I know the amount of water has been shown to have an impact on yield, as have fertilizer and soil type, among other things. In a multiple regression setting, if y = yield, x1 = amount of water, and x2 = amount of fertilizer, the multiple regression model would be of the form y = B0 + B1x1 + B2x2 + e, and our estimated regression would be of the form y hat = b0 + b1x1 + b2x2.
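Here is a minimal sketch (not from the slides) of fitting a model like this in Python with statsmodels. The water, fertilizer, and yield numbers are made up purely to show the mechanics, with n = 10 observations and p = 2 x variables so the degrees of freedom match the example used a few slides later.

```python
import numpy as np
import statsmodels.api as sm

# made-up data: 10 observations of water, fertilizer, and crop yield
water = np.array([10, 12, 8, 15, 11, 9, 14, 13, 10, 12])         # x1
fertilizer = np.array([3, 4, 2, 5, 3, 2, 5, 4, 3, 4])            # x2
crop_yield = np.array([42, 48, 35, 60, 45, 38, 58, 52, 41, 49])  # y

X = sm.add_constant(np.column_stack([water, fertilizer]))  # adds the intercept term b0
model = sm.OLS(crop_yield, X).fit()

print(model.params)     # b0, b1, b2 for y hat = b0 + b1x1 + b2x2
print(model.summary())  # also shows the F statistic, its p-value, the t tests, and r square
```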

4 F Test In a multiple regression, the case of more than one x variable, we conduct a statistical test about the overall model. The basic idea is: do all the x variables as a package have a relationship with the y variable? The null hypothesis is that there is no relationship, and we write this in shorthand notation as Ho: B1 = B2 = … = 0. If this null hypothesis is true, the equation for the line would mean the x's do not have an influence on y. The alternative hypothesis is that at least one of the betas is not zero. Rejecting the null means that the x's as a group are related to y. The test is performed with what is called the F test. From the sample of data we can calculate a number called the F statistic and use this value to perform the test. In our class we will have F calculated for us because it is a tedious calculation.
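The slide says F will be calculated for us, but for the curious, here is a sketch of the ratio behind the number on the printout. It uses the standard regression sums of squares (SSR for the regression, SSE for the error), which appear in the ANOVA part of a typical regression printout.

```python
def f_statistic(SSR, SSE, p, n):
    """Overall F statistic: the mean square due to regression divided by
    the mean square error. p = number of x variables, n = sample size."""
    MSR = SSR / p            # numerator mean square
    MSE = SSE / (n - p - 1)  # denominator mean square
    return MSR / MSE
```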

5 F Under the null hypothesis, the F statistic we calculate from a sample has a distribution similar to the one shown on the slide. The F test here is a one-tailed test. The farther to the right the statistic from our sample falls, the more inclined we are to reject the null, because extreme values are not very likely to occur when the null hypothesis is true. In practice we pick a level of significance and use a critical F to define the boundary between accepting the null and rejecting the null.

6 F To pick the critical F we have two types of degrees of freedom to worry about: the numerator and the denominator degrees of freedom. They are called this because the F stat is a fraction. Numerator degrees of freedom = number of x's, in general called p. Denominator degrees of freedom = n – p – 1, where n is the sample size. As an example, if n = 10 and p = 2 we would say the degrees of freedom are 2 and 7, where we start with the numerator value. You would see from a book (maybe page 672 of a stats book) that the critical F is 4.74 when alpha is .05. Many times the book also has a table for alpha = .025 and .01. [Figure: F distribution with the area to the right of the critical F shaded and labeled alpha.]
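If a table is not handy, the critical F can be looked up in software instead. A small sketch (assuming scipy is available) that reproduces the 4.74 from the slide:

```python
from scipy.stats import f

alpha, p, n = 0.05, 2, 10
critical_F = f.ppf(1 - alpha, dfn=p, dfd=n - p - 1)  # numerator df = 2, denominator df = 7
print(round(critical_F, 2))  # 4.74, matching the table value
```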

7 [Excel regression output for the example; the F statistic and Significance F referenced on the following slides come from this printout.]

8 F In our example here the critical F is 4.74. If from the sample we get an F statistic that is greater than 4.74, we would reject the null and conclude the x's as a package have a relationship with the variable y. On the previous slide is an example, and the F stat there exceeds 4.74, so the null hypothesis would be rejected in that case. [Figure: F distribution with the area to the right of 4.74 shaded and labeled alpha = .05.]

9 F P-value The computer printout has a number on it that means we do not even have to look at the F table if we do not want to, but the idea is based on the table. Here you see the F statistic from the printout falls in the rejection region; I have colored in the tail area for this number. Since 4.74 has a tail area = alpha = .05 here, we know the tail area for an F statistic farther to the right must be less than .05. This tail area is the p-value for the test stat calculated from the sample, and on the computer printout it is labeled Significance F. In the example the value is less than .05. [Figure: F distribution with the critical F at 4.74 and the smaller tail area beyond the sample F statistic shaded.]
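The same idea in code: Significance F is just the right-tail area of the F distribution evaluated at the sample F statistic. The F_stat value below is hypothetical, not the one from the slide's printout.

```python
from scipy.stats import f

p, n = 2, 10
F_stat = 32.9                                # hypothetical sample F statistic
sig_F = f.sf(F_stat, dfn=p, dfd=n - p - 1)   # right-tail area = Significance F (the p-value)
print(sig_F, sig_F < 0.05)                   # a tiny tail area -> reject the null
```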

10 SOOOOOOO, using the F table, reject the null if the F stat > the critical F in the table, or if the Significance F < alpha. If you can NOT reject the null, then at this stage of the game there is no relation between the x's and the y, and our work here would be done. So from here on out I assume we have rejected the null. T tests After the F test we would do a t test on each of the slopes, similar to what we did in the simple linear regression case, to make sure that each variable on its own has a relationship with y. There we reject the null of a zero slope when the p-value on the slope is less than alpha.
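Continuing the earlier crop-yield sketch (this reuses the fitted `model` object, so it is not standalone), the individual slope p-values can be read straight off the results and compared to alpha:

```python
alpha = 0.05
for name, p_val in zip(["intercept", "water", "fertilizer"], model.pvalues):
    decision = "reject H0: coefficient = 0" if p_val < alpha else "cannot reject H0"
    print(f"{name}: p-value = {p_val:.4f} -> {decision}")
```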

11 Multicollinearity Can you say multicollinearity? Sure you can. Let's all say it together on the count of 3. 1, 2, 3: multicollinearity! Very good class, now listen up! Multicollinearity is an idea that volumes have been written about; we just want a basic feel for the problem here. You and I want x variables that help explain y, so that we can predict and explain movement in y. As an example, if we can predict and explain crop yield, maybe we can make yield higher so that we can feed the world! So, we want x's that are correlated with y. This is a good thing. But sometimes the x's will be correlated with each other. This is called multicollinearity. The problem is that sometimes we cannot see the separate influence an x has on y because the other x's have picked up that influence due to their correlation.

12 From a practical point of view, multicollinearity could have the following effect on your research. You reject the null hypothesis of no relationship between all the x variables and y with the F test, but you cannot reject some or all of the separate t tests for the separate slopes. Don't freak out (yet!). Let's think about crop yield. Some farmers have water systems. The more it rains in a summer, the less water the farmers directly apply. (Okay, maybe I am ignorant here and farmers can use all the water they can apply – it's an example.) If you included both inches of rain and water applied, there is a correlation between the two. This may make it difficult to see the separate impact of either the rain or the water from the system. If the x's (the independent variables) have correlations more extreme than .7 or -.7, then multicollinearity could be a problem.
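One quick way to screen for this (a sketch, with made-up rain and watering numbers) is to look at the correlations among the x variables themselves before running the regression:

```python
import numpy as np

rain = np.array([20, 25, 15, 30, 22, 18, 28, 26, 21, 24])  # inches of rain (made up)
water = np.array([10, 8, 14, 5, 9, 12, 6, 7, 10, 8])       # water applied (made up)

r = np.corrcoef(rain, water)[0, 1]
print(f"correlation between rain and water applied: {r:.2f}")
if abs(r) > 0.7:
    print("more extreme than +/-0.7, so multicollinearity could be a problem")
```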

13 r square r square on the regression printout is a measure designed to indicate the strength of the impact of the x's on y. The number can be between 0 and 1, with values closer to 1 meaning a stronger relationship. r square is actually the percentage of the variation in y that is accounted for by the x variables. This is an important idea because, although we may have a significant relationship, we may not be explaining much. From the yield example, the more variation we can explain, the more we can control yield and thus feed the world, perhaps. Or maybe in a business setting, the more variation we can explain, the more profit we can make.
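In terms of the earlier crop-yield sketch (again reusing `model` and the made-up data, so not standalone), r square is one minus the share of the variation in y that is left unexplained:

```python
SST = ((crop_yield - crop_yield.mean()) ** 2).sum()  # total variation in y
SSE = (model.resid ** 2).sum()                       # variation the x's do not explain
print(1 - SSE / SST)                                 # same number as model.rsquared
```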