Week 10, Slide 1: ANOVA F Test in Multiple Regression

In multiple regression, the ANOVA F test is designed to test the hypothesis

  H0: β1 = β2 = … = βk = 0  versus  Ha: at least one βj ≠ 0.

This test assesses whether or not the model has any predictive ability. The test statistic is

  F = MSR / MSE = (SSR / k) / (SSE / (n − k − 1)).

If H0 is true, this test statistic has an F distribution with k and n − k − 1 degrees of freedom.
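Not part of the original slides: a minimal numpy sketch of this computation on simulated data, with all names and data invented for the example. It builds the design matrix, splits the variation into SSR and SSE, and forms the F statistic above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 50, 3                          # n observations, k predictors

# Simulated data: y truly depends on the first predictor only
X = rng.normal(size=(n, k))
y = 2.0 + 1.5 * X[:, 0] + rng.normal(size=n)

# Design matrix with intercept; least-squares fit
Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
yhat = Xd @ beta

SSE = np.sum((y - yhat) ** 2)         # error sum of squares
SSR = np.sum((yhat - y.mean()) ** 2)  # regression sum of squares

# Overall F test of H0: beta_1 = ... = beta_k = 0
F = (SSR / k) / (SSE / (n - k - 1))
p = stats.f.sf(F, k, n - k - 1)
print(f"F = {F:.2f}, p-value = {p:.4g}")
```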

Week 10, Slide 2: F-Test versus t-Tests in Multiple Regression

In multiple regression, the F-test is designed to test the overall model while the t-tests are designed to test individual coefficients.
- If the F-test is significant and all or some of the t-tests are significant, then there are some useful explanatory variables for predicting Y.
- If the F-test is not significant (large P-value) and all the t-tests are not significant, then no explanatory variable contributes to the prediction of Y.
- If the F-test is significant but all the t-tests are not significant, this is an indication of multicollinearity, i.e., correlated X's: individual X's do not contribute to the prediction of Y over and above the other X's. A toy illustration follows below.
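A toy simulation (my illustration, not from the slides) of the last bullet: two nearly identical predictors give a significant F-test while both t-tests are non-significant. The statsmodels calls and simulated data are assumptions of this sketch.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 40

# Two nearly identical predictors: each is useful for predicting y,
# but not over and above the other.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # almost a copy of x1
y = 3.0 + 2.0 * x1 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()
print(f"overall F p-value: {fit.f_pvalue:.3g}")            # small (significant)
print("t-test p-values:  ", np.round(fit.pvalues[1:], 3))  # both large
```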

Week 10, Slide 3

If the F-test is not significant and some of the t-tests are significant, it is an indication of one of two things:
- The model has no predictive ability, but with many predictors we can expect some type I errors among the t-tests.
- The predictors were chosen poorly. If one useful predictor is added to many that are unrelated to the outcome, its contribution may not be enough for the model to have statistically significant predictive ability.
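A hedged sketch (not from the slides) of the first point: regressing pure noise on many predictors typically leaves the F-test non-significant while a t-test or two comes out significant by chance.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, k = 100, 20

# y is pure noise: no predictor is truly related to it.  With 20
# t-tests at the 5% level we still expect about one spuriously
# significant coefficient per fit.
X = sm.add_constant(rng.normal(size=(n, k)))
y = rng.normal(size=n)

fit = sm.OLS(y, X).fit()
print(f"overall F p-value: {fit.f_pvalue:.3f}")          # typically large
print("significant t-tests:", int(np.sum(fit.pvalues[1:] < 0.05)))
```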

Week 10, Slide 4: CIs and PIs in Multiple Regression

The standard error of the estimate of the mean value of Y at new values of the explanatory variables X_h is

  s(ŷ_h) = sqrt( MSE · X_h′(X′X)⁻¹X_h ).

A 100(1 − α)% CI for the mean value of Y at X_h is

  ŷ_h ± t(α/2; n − k − 1) · s(ŷ_h).

The standard error of the predicted value of Y at new values of the explanatory variables X_h is

  s(pred) = sqrt( MSE · (1 + X_h′(X′X)⁻¹X_h) ).

A 100(1 − α)% PI for the predicted value of Y at X_h is

  ŷ_h ± t(α/2; n − k − 1) · s(pred).
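A minimal numpy/scipy sketch (not from the slides) of these interval formulas, computed directly from the matrix expressions above on simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k, alpha = 30, 2, 0.05

X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])
XtX_inv = np.linalg.inv(Xd.T @ Xd)
beta = XtX_inv @ Xd.T @ y
MSE = np.sum((y - Xd @ beta) ** 2) / (n - k - 1)

# New observation (leading 1 for the intercept)
xh = np.array([1.0, 0.5, -0.2])
yhat_h = xh @ beta
t_crit = stats.t.ppf(1 - alpha / 2, n - k - 1)

se_mean = np.sqrt(MSE * xh @ XtX_inv @ xh)        # SE of mean response
se_pred = np.sqrt(MSE * (1 + xh @ XtX_inv @ xh))  # SE of prediction

print("95% CI for mean response:",
      yhat_h - t_crit * se_mean, yhat_h + t_crit * se_mean)
print("95% PI for new response: ",
      yhat_h - t_crit * se_pred, yhat_h + t_crit * se_pred)
```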

Week 10, Slide 5: Example

Consider the house prices example. Suppose we are interested in predicting the price of a house with 2 bedrooms, 750 square feet, 1 fireplace, 5 rooms, storm windows (st = 1), a 25-foot lot, 1.5 baths, and a 1-car garage. Then X_h is ….
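The slide leaves X_h elided. Purely as a hypothetical illustration, assuming the predictors enter the model in the order bdr, sqft, fp, rms, st, lot, baths, garage (the course's actual ordering may differ), the vector could be built as:

```python
import numpy as np

# Hypothetical ordering: [intercept, bdr, sqft, fp, rms, st, lot, baths, garage]
xh = np.array([1, 2, 750, 1, 5, 1, 25, 1.5, 1])
```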

Week 10, Slide 6: Multicollinearity

Multicollinearity occurs when explanatory variables are highly correlated, in which case it is difficult or impossible to measure their individual influence on the response. The fitted regression equation is unstable: the estimated regression coefficients vary widely from data set to data set (even when the data sets are very similar) and depending on which predictor variables are in the model. The estimated regression coefficients may even have the opposite sign to what is expected (e.g., bedrooms in the house price example).

Week 10, Slide 7

The regression coefficients may not be statistically significantly different from 0 even when the corresponding explanatory variable is known to have a relationship with the response. When some X's are perfectly correlated, we cannot estimate β because X′X is singular. Even if X′X is merely close to singular, its determinant will be close to 0 and the standard errors of the estimated coefficients will be large.
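An illustrative simulation (not from the slides): refitting the same model on several very similar data sets with highly correlated predictors shows the coefficient estimates swinging widely, as described above.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40

# Same model, three similar data sets, highly correlated predictors:
# the individual coefficient estimates are unstable.
for rep in range(3):
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.05, size=n)   # corr(x1, x2) near 1
    y = 1.0 + x1 + x2 + rng.normal(size=n)
    Xd = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    print(f"fit {rep}: b1 = {beta[1]:6.2f}, b2 = {beta[2]:6.2f}")
```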

Week 10, Slide 8: Assessing Multicollinearity

To assess multicollinearity we calculate the Variance Inflation Factor for each of the predictor variables in the model. The variance inflation factor for the i-th predictor variable is defined as

  VIF_i = 1 / (1 − R_i²),

where R_i² is the coefficient of multiple determination obtained when the i-th predictor variable is regressed against the other predictor variables. A large value of VIF_i is a sign of multicollinearity.
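A minimal Python sketch (my illustration, not from the slides) computing VIF_i directly from its definition, by regressing each predictor on the others:

```python
import numpy as np

def vif(X):
    """VIF_i = 1 / (1 - R_i^2), where R_i^2 comes from regressing
    the i-th column of X on the remaining columns (plus intercept)."""
    n, k = X.shape
    out = []
    for i in range(k):
        xi = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, *_ = np.linalg.lstsq(others, xi, rcond=None)
        resid = xi - others @ coef
        r2 = 1 - resid.var() / xi.var()
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(5)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.1, size=50)   # highly correlated with x1
x3 = rng.normal(size=50)
print(np.round(vif(np.column_stack([x1, x2, x3])), 1))  # large VIFs for x1, x2
```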

Week 10, Slide 9: Indicator Variables

Often a data set will contain categorical variables that are potential predictor variables. To include these categorical variables in the model we define dummy variables. A dummy variable takes only two values, 0 and 1. For a categorical variable with j categories we need j − 1 indicator variables.
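As a small illustration (not from the slides), pandas' get_dummies can build the j − 1 indicator variables; the data here are invented:

```python
import pandas as pd

# A categorical variable with j = 3 categories needs j - 1 = 2 dummies.
df = pd.DataFrame({"colour": ["red", "blue", "green", "blue", "red"]})
dummies = pd.get_dummies(df["colour"], drop_first=True)  # drops one baseline level
print(dummies)
```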

Week 10, Slide 10: Example

Meadowfoam is a small plant found in the US Pacific Northwest. Its seed oil is unique among vegetable oils for its long carbon strings, and it is nongreasy and highly stable. A study was conducted to find out how to elevate meadowfoam production to a profitable crop. In a growth chamber, plants were grown under 6 light intensities (in micromol/m²/sec) and two timings of the onset of the light treatment, either late (coded 0) or early (coded 1). The response variable is the average number of flowers per plant for 10 seedlings grown under each of the 12 treatment conditions. This is an example of an experiment in which we can make causal conclusions. There are two explanatory variables, light intensity and timing. There are 24 data points, 2 at each treatment combination.

Week 10, Slide 11: Questions of Interest

What is the effect of timing on seedling growth? What are the effects of the different light intensities? Does the effect of intensity depend on timing?

Week 10, Slide 12: Indicator Variables in the Meadowfoam Example

To include the variable time in the model we define a dummy variable that takes the value 1 for early timing and 0 for late timing. The variable intensity has 6 levels (150, 300, 450, 600, 750, 900). We will treat these levels as 6 categories. It is useful to do so if we expect a complex relationship between the response variable and intensity, and if the goal is to determine which intensity level is "best". The cost of using dummy variables is degrees of freedom, since we need j − 1 dummy variables for a categorical variable with j categories. We define the dummy variables as follows….
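A hypothetical sketch (not from the slides) of this encoding: timing is already 0/1, and the six intensity levels become five dummies, with the omitted level (150) as the baseline. The column names are my invention.

```python
import pandas as pd

# One row per treatment combination (the real data have 2 replicates each).
df = pd.DataFrame({
    "time": [0, 1] * 6,
    "intensity": [150, 150, 300, 300, 450, 450,
                  600, 600, 750, 750, 900, 900],
})

# Five dummies for six intensity levels; 150 is the dropped baseline.
dummies = pd.get_dummies(df["intensity"].astype("category"),
                         prefix="int", drop_first=True)
design = pd.concat([df["time"], dummies], axis=1)
print(design.head())
```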

Week 10, Slide 13: Partial F-test

The partial F-test is designed to test whether a subset of the β's are simultaneously 0. The approach has two steps. First we fit a model with all predictor variables; we call this the "full model". Then we fit a model without the predictor variables whose coefficients we are interested in testing; we call this the "reduced model". We then compare the SSR and SSE in these two models….

Week 10, Slide 14: Test Statistic for the Partial F-test

To test whether some of the coefficients of the explanatory variables are all 0 we use the test statistic

  F = (Extra SS / Extra df) / MSE_full,

where Extra SS = SSE_red − SSE_full and Extra df = the number of parameters being tested. Under H0, this statistic has an F distribution with Extra df and the full model's error degrees of freedom. To get the Extra SS in SAS we can simply fit two regressions (reduced and full), or we can look at the Type I SS, also called Sequential Sums of Squares. The Sequential SS gives the additional contribution to SSR that each variable makes over and above the variables previously listed. The Sequential SS depends on the order in which variables appear in the model statement; the variables whose coefficients we want to test must be listed last.
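A minimal numpy/scipy sketch (not from the slides) of the partial F-test, fitting the full and reduced models and forming the statistic above; SAS's Type I SS route is not reproduced here, and all data are simulated:

```python
import numpy as np
from scipy import stats

def sse(X, y):
    """Error sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

rng = np.random.default_rng(6)
n = 60
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)    # x2, x3 are truly useless

full = np.column_stack([np.ones(n), x1, x2, x3])
reduced = np.column_stack([np.ones(n), x1])

# Partial F-test of H0: beta_2 = beta_3 = 0
extra_ss = sse(reduced, y) - sse(full, y)
extra_df = 2                               # number of parameters tested
df_full = n - full.shape[1]                # error df of the full model
F = (extra_ss / extra_df) / (sse(full, y) / df_full)
print(f"F = {F:.2f}, p = {stats.f.sf(F, extra_df, df_full):.3f}")
```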

Week 10, Slide 15: Partial Correlation

Recall that for simple regression the correlation between X and Y is

  r = S_xy / sqrt(S_xx · S_yy).

When comparing the reduced and full models where the full model has only 1 additional predictor variable, the coefficient of partial correlation is

  r = ± sqrt( (SSE_red − SSE_full) / SSE_red ),

taking the sign of the coefficient of the additional predictor variable: it is negative if that coefficient is negative and positive otherwise. It is a measure of the contribution of the additional predictor variable, given that the others are in the model.
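A short sketch (not from the slides) computing the partial correlation from the reduced and full model SSEs, with the sign taken from the fitted coefficient, on simulated data:

```python
import numpy as np

def fit(X, y):
    """Least-squares fit; returns (SSE, coefficient vector)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2), beta

rng = np.random.default_rng(7)
n = 50
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + x1 - 2.0 * x2 + rng.normal(size=n)

reduced = np.column_stack([np.ones(n), x1])
full = np.column_stack([np.ones(n), x1, x2])

sse_red, _ = fit(reduced, y)
sse_full, beta_full = fit(full, y)

# Partial correlation of x2 with y, given x1: signed square root of
# (SSE_red - SSE_full) / SSE_red, sign taken from x2's coefficient.
r = np.sign(beta_full[2]) * np.sqrt((sse_red - sse_full) / sse_red)
print(f"partial correlation = {r:.3f}")    # negative, matching the -2.0 * x2 term
```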