

Indicator Variables
Often, a data set will contain categorical variables that are potential predictor variables. To include these categorical variables in the model we define dummy variables. A dummy variable takes only two values, 0 and 1. For a categorical variable with j categories we need j-1 indicator variables.
STA302/1001 - week 12

Meadowfoam Example
Meadowfoam is a small plant found in the US Pacific Northwest. Its seed oil is unique among vegetable oils for its long carbon strings, and it is nongreasy and highly stable. A study was conducted to find out how to elevate meadowfoam production to a profitable crop. In a growth chamber, plants were grown under 6 light intensities (in micromol/m^2/sec) and two timings of the onset of the light treatment, either late (coded 0) or early (coded 1). The response variable is the average number of flowers per plant for 10 seedlings grown under each of the 12 treatment conditions. This is an example of an experiment in which we can make causal conclusions. There are two explanatory variables, light intensity and timing. There are 24 data points, 2 at each treatment combination.

Questions of Interest
What is the effect of timing on seedling growth? What are the effects of the different light intensities? Does the effect of intensity depend on timing?

Indicator Variables in Meadowfoam Example
To include the variable timing in the model we define a dummy variable that takes the value 1 for early timing and the value 0 for late timing. The variable intensity has 6 levels (150, 300, 450, 600, 750, 900). We will treat these levels as 6 categories. Doing so is useful if we expect a complex relationship between the response variable and intensity and if the goal is to determine which intensity level is “best”. The cost of using dummy variables is degrees of freedom: a variable with 6 categories needs 5 dummy variables, and therefore 5 coefficients, where a quantitative variable would need a single slope. We define the dummy variables as follows….
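The slide's own definition of the dummy variables is not preserved in the transcript. As a minimal sketch of one possible coding, the Python below builds a design-matrix row with 5 intensity indicators and 1 timing indicator. The choice of 150 as the reference level and the function names `intensity_dummies`, `timing_dummy`, and `design_row` are assumptions made for illustration, not the course's definitions.

```python
# Intensity levels from the meadowfoam study. Taking 150 as the
# reference category is an arbitrary assumption for this sketch.
LEVELS = [150, 300, 450, 600, 750, 900]
REFERENCE = 150


def intensity_dummies(intensity):
    """Return the 5 indicator values (levels 300..900) for one reading."""
    if intensity not in LEVELS:
        raise ValueError(f"unknown intensity level: {intensity}")
    return [1 if intensity == level else 0
            for level in LEVELS if level != REFERENCE]


def timing_dummy(timing):
    """Return 1 for early onset and 0 for late onset."""
    return {"late": 0, "early": 1}[timing]


def design_row(intensity, timing):
    """Intercept, then 5 intensity indicators, then the timing indicator."""
    return [1] + intensity_dummies(intensity) + [timing_dummy(timing)]


print(design_row(150, "late"))   # reference level: every indicator is 0
print(design_row(450, "early"))  # indicator for 450 and for early are 1
```

With 6 categories only 5 indicators are needed, because the reference level is identified by all five being 0; adding a sixth would duplicate the intercept column.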

Partial F-test
The partial F-test is designed to test whether a subset of the β’s are all 0 simultaneously. The approach has two steps. First we fit a model with all predictor variables. We call this model the “full model”. Then we fit a model without the predictor variables whose coefficients we are interested in testing. We call this model the “reduced model”. We then compare the SSReg and RSS in these two models….

Test Statistic for Partial F-test
To test whether the coefficients of some of the explanatory variables are all 0 we use the following test statistic:

$$F = \frac{\text{Extra SS} / \text{Extra df}}{\text{RSS}_{\text{full}} / \text{df}_{\text{full}}}$$

where Extra SS = RSS_red - RSS_full, Extra df = number of parameters being tested, and df_full is the residual degrees of freedom of the full model. To get the Extra SS in SAS we can simply fit two regressions (reduced and full), or we can look at the Type I SS, which are also called sequential sums of squares. The sequential SS gives the additional contribution to SSR that each variable makes over and above the variables previously listed. The sequential SS depends on the order in which the variables are stated in the model statement; the variables whose coefficients we want to test must be listed last.
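As a small sketch of the arithmetic, the Python function below computes the standard partial F statistic from the two residual sums of squares. The function name `partial_f` and the numbers in the usage line are hypothetical and are not taken from the meadowfoam analysis.

```python
def partial_f(rss_reduced, rss_full, extra_df, df_full):
    """Partial F statistic for comparing a reduced model to a full model.

    rss_reduced, rss_full: residual sums of squares of the two fits
    extra_df: number of coefficients being tested
    df_full: residual degrees of freedom of the full model
    """
    if extra_df <= 0 or df_full <= 0:
        raise ValueError("degrees of freedom must be positive")
    if rss_full <= 0:
        raise ValueError("rss_full must be positive")
    extra_ss = rss_reduced - rss_full          # Extra SS
    return (extra_ss / extra_df) / (rss_full / df_full)


# Hypothetical values: Extra SS = 40 on 2 df, full-model MSE = 60 / 20 = 3
f_stat = partial_f(rss_reduced=100.0, rss_full=60.0, extra_df=2, df_full=20)
print(round(f_stat, 3))
```

Under the null hypothesis the statistic is compared with an F distribution on (Extra df, df_full) degrees of freedom; the p-value lookup is left out here to keep the sketch free of third-party dependencies.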

Meadowfoam Example Continuation
Suppose now we treat the variable light intensity as a quantitative variable. There are three possible models to look at the relationship between seedling growth and the two predictor variables… If we want to know whether the effect of light intensity on number of flowers per plant depends on timing we need to include in the model an interaction term….
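The slide's model equations are not preserved in the transcript. As a hedged illustration of what an interaction term does, the block below writes out the usual interaction model, assuming the notation x for light intensity and early for the timing indicator (both symbols are choices made here, not the slide's).

```latex
% x = light intensity; early = 1 for early onset, 0 for late onset
\begin{align*}
E(Y) &= \beta_0 + \beta_1 x + \beta_2\,\mathrm{early}
        + \beta_3\,(x \times \mathrm{early}) \\
\text{late } (\mathrm{early} = 0):\quad
E(Y) &= \beta_0 + \beta_1 x \\
\text{early } (\mathrm{early} = 1):\quad
E(Y) &= (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\,x
\end{align*}
```

In this form the slope for intensity is β1 under late timing and β1 + β3 under early timing, so testing β3 = 0 asks whether the effect of intensity depends on timing. Setting β3 = 0 gives two parallel lines that differ only by the vertical shift β2.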

Meadowfoam Example – Summary of Findings
There is no evidence that the effect of light intensity on flowers depends on timing (P-value = 0.91). That means that the interaction effect is not significant. If an interaction did exist, it would be difficult to talk about the effect of light intensity on Y, as it would vary with timing. Since the interaction was not significant, we remove it from the model. For the same timing, increasing light intensity by 100 micromol/m^2/sec decreases the mean number of flowers per plant by 4.0 flowers per plant. 95% CI: (-5.1, -3). For the same light intensity, beginning the light treatment early increases the mean number of flowers per plant by 12.2 flowers per plant. 95% CI: (6.7, 17.6).