TESTING THE STRENGTH OF THE MULTIPLE REGRESSION MODEL.

Test 1: Are Any of the x's Useful in Predicting y?
We are asking: can we conclude that at least one of the β's (other than β0) is ≠ 0?
H0: β1 = β2 = β3 = β4 = 0
HA: at least one of these β's ≠ 0
α = .05

Idea of the Test
Measure the overall "average variability" due to changes in the x's.
Measure the overall "average variability" due to randomness (error).
If the "average variability" due to changes in the x's IS A LOT LARGER than the "average variability" due to error, we conclude that at least one β is non-zero, i.e. at least one factor (x) is useful in predicting y.

"Total Variability"
Just as with simple linear regression, we have a sum of squares due to regression, SSR, and a sum of squares due to error, SSE, both printed on the Excel output.
–The formulas are more complicated than in the simple case (they involve matrix operations).
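The SSR/SSE decomposition can be sketched numerically. The house data below are made up purely for illustration (square feet, land, age as predictors of price, echoing the slides' example), not the slides' actual dataset:

```python
import numpy as np

# Hypothetical data: y = sale price ($1000s); columns of X = square feet, land (acres), age.
X = np.array([[2100, 0.50, 12],
              [1600, 0.25, 30],
              [2400, 0.75,  5],
              [1800, 0.40, 22],
              [3000, 1.00,  2],
              [1500, 0.30, 40],
              [2200, 0.60, 15],
              [2700, 0.80,  8]], dtype=float)
y = np.array([250, 180, 300, 210, 390, 160, 260, 340], dtype=float)

# Add an intercept column and fit by least squares.
A = np.column_stack([np.ones(len(y)), X])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ b

SST = np.sum((y - y.mean()) ** 2)      # total sum of squares
SSE = np.sum((y - y_hat) ** 2)         # sum of squares due to error
SSR = np.sum((y_hat - y.mean()) ** 2)  # sum of squares due to regression

print(SSR, SSE, SST)  # with an intercept in the model, SSR + SSE = SST
```

The identity SSR + SSE = SST is what the Excel ANOVA table's SS column reports.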

"Average Variability"
The "average variability" (mean variability) for a group is the total variability divided by the degrees of freedom associated with that group:
Mean Square Due to Regression: MSR = SSR/DFR
Mean Square Due to Error: MSE = SSE/DFE

Degrees of Freedom
Total degrees of freedom: DF(Total) = n - 1, always.
Degrees of freedom for regression: DFR = the number of factors in the regression (i.e. the number of x's in the linear regression).
Degrees of freedom for error: DFE = the difference between the two = DF(Total) - DFR.

The F-Statistic
The F-statistic is defined as the ratio of the two measures of variability: F = MSR/MSE.
Recall we are saying that if MSR is "large" compared to MSE, at least one β ≠ 0. Thus if F is "large", we conclude that HA is true, i.e. at least one β ≠ 0.

The F-test
"Large" compared to what? F-tables give critical values for a given α.
TEST: Reject H0 (accept HA) if F = MSR/MSE > Fα,DFR,DFE
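The decision rule can be sketched in code. The sums of squares below are invented for illustration; the degrees of freedom (DFR = 3, DFE = 34, so n = 38 with 3 predictors) are chosen to match the slides' example:

```python
import numpy as np
from scipy.stats import f

n, k = 38, 3                 # 38 observations, 3 predictors (assumed to match the slides)
DFR, DFE = k, n - 1 - k      # DFR = 3, DFE = 34

# Illustrative sums of squares (not the slides' actual printout values).
SSR, SSE = 12_000.0, 14_000.0
MSR, MSE = SSR / DFR, SSE / DFE
F = MSR / MSE

F_crit = f.ppf(1 - 0.05, DFR, DFE)   # critical value; Excel: FINV(0.05, 3, 34)
p_value = f.sf(F, DFR, DFE)          # the "Significance F" on the printout

print(F, F_crit, p_value)
if F > F_crit:
    print("Reject H0: at least one beta is non-zero")
```

Comparing F to the critical value and comparing Significance F to α are equivalent tests; the printout's Significance F just saves the table lookup.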

RESULTS
If we do not get a large F statistic:
–We cannot conclude that any of the variables in this model are significant in predicting y.
If we do get a large F statistic:
–We can conclude that at least one of the variables is significant in predicting y.
–NATURAL QUESTION -- WHICH ONES?

(Annotated Excel ANOVA table:) DFR = # of x's; DFE = Total DF - DFR; Total DF = n - 1; the SS column shows SSR, SSE, and Total SS = Σ(yi - ȳ)².

(Annotated Excel ANOVA table, continued:) MSR = SSR/DFR; MSE = SSE/DFE; F = MSR/MSE; and the p-value for the F test.

Results
The F statistic on the printout would be compared to F.05,3,34.
–From the F.05 table, the value of F.05,3,34 is not given.
–But F.05,3,30 = 2.92 and F.05,3,40 are, and the printed F statistic is larger than either of these numbers.
–The actual value of F.05,3,34 can be calculated in Excel by FINV(.05,3,34).
USE SIGNIFICANCE F
–Significance F on the printout is the p-value for the F-test.
–Here Significance F = 7.46 × 10^(−…), which is < .05.
–So we can conclude that at least one x is useful in predicting y.

Test 2: Which Variables Are Significant IN THIS MODEL?
The question we are asking is: taking all the other factors (x's) into consideration, does a change in a particular x (say x3) significantly affect y?
This is another hypothesis test (a t-test). To test whether the age of the house is significant in this model:
H0: β3 = 0 (x3 is not significant in this model)
HA: β3 ≠ 0 (x3 is significant in this model)

The t-test for a particular factor IN THIS MODEL
Reject H0 (accept HA) if |t| = |b3 / s_b3| > t(α/2),DFE

(On the printout: the t-value for the test of β3 = 0, and the p-value for the test of β3 = 0.)

Reading the Printout for the t-test
Simply look at the p-value.
–The p-value for β3 = 0 is < .05, thus the age of the house is significant in this model.
The other variables:
–The p-value for β1 = 0 is < .05, thus square feet is significant in this model.
–The p-value for β2 = 0 is > .05, thus the land (acres) is not significant in this model.
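These per-coefficient t-values and p-values can be computed directly from the least-squares fit. The sketch below uses simulated data (not the slides' house data); the true coefficient on x2 is deliberately set to zero, so its t-test should usually look insignificant:

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(0)
n, k = 40, 3
X = rng.normal(size=(n, k))
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 2] + rng.normal(size=n)  # beta2 = 0 by construction

A = np.column_stack([np.ones(n), X])           # add intercept column
XtX_inv = np.linalg.inv(A.T @ A)
b = XtX_inv @ A.T @ y                          # least-squares coefficients
resid = y - A @ b
DFE = n - (k + 1)
MSE = resid @ resid / DFE                      # estimate of error variance

se = np.sqrt(MSE * np.diag(XtX_inv))           # standard error of each coefficient
t_stats = b / se                               # t = b_j / s_{b_j}
p_values = 2 * t_dist.sf(np.abs(t_stats), DFE) # two-sided p-values

for name, tv, pv in zip(["intercept", "x1", "x2", "x3"], t_stats, p_values):
    print(f"{name}: t = {tv:.2f}, p = {pv:.4f}")
```

This is exactly what the t Stat and P-value columns of the Excel coefficient table report.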

Does a Poor t-value Imply the Variable Is Not Useful in Predicting y?
NO. It says the variable is not significant IN THIS MODEL, when we consider all the other factors.
In this model, land is not significant when included with square footage and age.
But if we had run the model without square footage, we would have gotten the output on the next slide.

(On that printout:) the p-value for land is now small (< .05). In this model, land is significant.

Can it even happen that F says at least one variable is significant, but none of the t's indicates a useful variable?
YES. Examples in which this might happen:
–Miles per gallon vs. horsepower and engine size
–Salary vs. GPA and GPA in major
–Income vs. age and experience
–HOUSE PRICE vs. SQUARE FOOTAGE OF HOUSE AND LAND
In each case there is a relation between the x's:
–Multicollinearity
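The slides diagnose multicollinearity informally. One standard numeric check, not shown on the slides, is the variance inflation factor, VIF_j = 1/(1 - R²_j), where R²_j comes from regressing predictor j on the other predictors. The data below are fabricated so that land is nearly a function of square footage:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the predictor matrix X."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        b, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ b
        sst = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1 - resid @ resid / sst          # R^2 of column j on the others
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(1)
sqft = rng.uniform(1000, 3000, size=50)
land = sqft / 4000 + rng.normal(scale=0.01, size=50)  # nearly determined by sqft
age  = rng.uniform(0, 40, size=50)                    # unrelated to the others

vifs = vif(np.column_stack([sqft, land, age]))
print(vifs)  # sqft and land should show large VIFs; age should be near 1
```

A common rule of thumb treats VIF above roughly 10 as a sign of serious multicollinearity; here sqft and land will far exceed that while age will not.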

Approaches That Could Be Used When Multicollinearity Is Detected
–Eliminate some variables and run the regression again.
–Stepwise regression. This is discussed in a future module.

Test 3 -- What Proportion of the Overall Variability in y Is Due to Changes in the x's?
R² = SSR/SST = .44
Overall, 44% of the total variation in sales price is explained by changes in square footage, land, and age of the house.

What is Adjusted R²?
Adjusted R² adjusts R² to take degrees of freedom into account.
By assuming a higher-order equation for y, we can force the curve to fit this one set of data points, eliminating much of the variability (see next slide). But this is not what is really going on!
R² might be higher, but adjusted R² might be much lower. Adjusted R² takes this into account:
Adjusted R² = 1 - MSE/MST, where MST = SST/(n - 1).
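Equivalently, adjusted R² can be recomputed from R², n, and the number of predictors p. The example below plugs in R² = .44 from the slides, assuming n = 38 (as implied by DFE = 34 with 3 predictors):

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), equal to 1 - MSE/MST."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Slides' example: R^2 = .44, assumed n = 38 observations, p = 3 predictors.
print(round(adjusted_r2(0.44, 38, 3), 4))
```

Note that adjusted R² is always at most R², and the gap widens as more predictors are added relative to the sample size.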

Scatterplot (figure: a high-order curve forced through every data point). This is not what is really going on.

Review
Are any of the x's useful in predicting y in this model?
–Look at the p-value for the F-test: Significance F.
–F = MSR/MSE would be compared to Fα,DFR,DFE.
Which variables are significant in this model?
–Look at the p-values for the individual t-tests.
What proportion of the total variance in y can be explained by changes in the x's?
–R².
–Adjusted R² takes into account the reduced degrees of freedom for the error term from including more terms in the model.

4 Places to Look on the Excel Printout
1 - the regression equation
2 - Significance F: are any variables useful?
3 - p-values for the t-tests: which variables are significant in this model?
4 - R²: what proportion of the variation in y is explained by changes in the x's?