
1 Indicator Variables
Often, a data set will contain categorical variables that are potential predictor variables. To include these categorical variables in the model we define dummy variables. A dummy variable takes only two values, 0 and 1. For a categorical variable with j categories we need j-1 indicator variables. STA302/ week 12
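As a rough illustration of the j-1 rule, the sketch below codes a categorical variable as 0/1 indicator columns, leaving one category out as the baseline. The function name `indicator_columns`, the soil data, and the choice of baseline are all hypothetical and are not part of the lecture.

```python
def indicator_columns(values, baseline):
    """Code a categorical variable as 0/1 indicator columns, omitting the baseline."""
    # Sort the distinct categories so the column order is reproducible.
    categories = sorted(set(values))
    if baseline not in categories:
        raise ValueError("baseline must be one of the observed categories")
    kept = [c for c in categories if c != baseline]
    # One column per non-baseline category: 1 where the observation matches, else 0.
    return {c: [1 if v == c else 0 for v in values] for c in kept}


# Hypothetical data: a variable with 3 categories needs 3 - 1 = 2 indicators.
soil = ["clay", "sand", "loam", "sand", "clay"]
columns = indicator_columns(soil, baseline="clay")
print(columns)
```

An observation in the baseline category has 0 in every column, which is why a j-th indicator would be redundant once the model has an intercept.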

2 Meadowfoam Example
Meadowfoam is a small plant found in the US Pacific Northwest. Its seed oil is unique among vegetable oils for its long carbon strings, and it is nongreasy and highly stable. A study was conducted to find out how to elevate meadowfoam production to a profitable crop. In a growth chamber, plants were grown under 6 light intensities (in micromol/m^2/sec) and two timings of the onset of the light treatment, either late (coded 0) or early (coded 1). The response variable is the average number of flowers per plant for 10 seedlings grown under each of the 12 treatment conditions. This is an experiment, so we can draw causal conclusions from it. There are two explanatory variables, light intensity and timing. There are 24 data points, 2 at each treatment combination.

3 Questions of Interest
What is the effect of timing on seedling growth? What are the effects of the different light intensities? Does the effect of intensity depend on timing?

4 Indicator Variables in Meadowfoam Example
To include the variable timing in the model we define a dummy variable that takes the value 1 for early timing and the value 0 for late timing. The variable intensity has 6 levels (150, 300, 450, 600, 750, 900). We will treat these levels as 6 categories. It is useful to do so if we expect a complex relationship between the response variable and intensity and if the goal is to determine which intensity level is “best”. The cost of using dummy variables is degrees of freedom, since a variable with several categories requires several dummy variables. We define the dummy variables as follows….
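The degrees-of-freedom cost can be made concrete with a small count. The sketch below assumes an additive model (intercept, timing, intensity, no interaction) and uses 150 as the baseline intensity; both choices are assumptions made for illustration, since the slide's own dummy-variable definitions are not reproduced here. The helper names are hypothetical.

```python
N_OBS = 24  # 12 treatment combinations x 2 replicates, as described in the study
INTENSITY_LEVELS = [150, 300, 450, 600, 750, 900]


def design_row(intensity, early):
    """Row of the design matrix: intercept, timing dummy, 5 intensity dummies.

    The first level (150) is the assumed baseline and gets no column.
    """
    return [1, early] + [1 if intensity == lvl else 0 for lvl in INTENSITY_LEVELS[1:]]


def residual_df(n_obs, n_coefficients):
    """Residual degrees of freedom left after estimating the coefficients."""
    return n_obs - n_coefficients


# Intensity as 6 categories: intercept + timing + 5 indicators = 7 coefficients.
categorical_df = residual_df(N_OBS, len(design_row(150, 0)))
# Intensity as a single quantitative predictor: intercept + timing + slope = 3.
quantitative_df = residual_df(N_OBS, 3)

print(design_row(450, 1))
print(categorical_df, quantitative_df)
```

Under these assumptions, treating intensity as categorical leaves 17 residual degrees of freedom instead of 21, four fewer than the straight-line alternative.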

5 Partial F-test
The partial F-test is designed to test whether a subset of the β’s are simultaneously 0. The approach has two steps. First we fit a model with all predictor variables. We call this model the “full model”. Then we fit a model without the predictor variables whose coefficients we are interested in testing. We call this model the “reduced model”. We then compare the SSReg and RSS of these two models….

6 Test Statistic for Partial F-test
To test whether the coefficients of some of the explanatory variables are all 0 we use the following test statistic: $F = \dfrac{\text{Extra SS} / \text{Extra df}}{\text{RSS}_{\text{full}} / \text{df}_{\text{full}}}$, where $\text{Extra SS} = \text{RSS}_{\text{red}} - \text{RSS}_{\text{full}}$, Extra df = number of parameters being tested, and $\text{df}_{\text{full}}$ is the residual degrees of freedom of the full model. To get the Extra SS in SAS we can simply fit two regressions (reduced and full), or we can look at the Type I SS, which are also called Sequential Sums of Squares. The Sequential SS gives the additional contribution to SSReg that each variable makes over and above the variables previously listed. The Sequential SS depends on the order in which the variables are stated in the model statement; the variables whose coefficients we want to test must be listed last.
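The arithmetic of the test can be sketched in a few lines. The function below uses the conventional form of the partial F statistic: the extra sum of squares per extra degree of freedom, divided by the full model's residual mean square. The function name and the input numbers are made up for illustration and do not come from the meadowfoam data.

```python
def partial_f(rss_reduced, rss_full, extra_df, df_full):
    """Partial F statistic from the residual sums of squares of two nested models.

    extra_df: number of coefficients set to 0 in the reduced model.
    df_full:  residual degrees of freedom of the full model.
    """
    if extra_df <= 0 or df_full <= 0:
        raise ValueError("degrees of freedom must be positive")
    extra_ss = rss_reduced - rss_full  # Extra SS = RSS_red - RSS_full
    return (extra_ss / extra_df) / (rss_full / df_full)


# Made-up values: dropping 2 predictors raises RSS from 100 to 150.
f_stat = partial_f(rss_reduced=150.0, rss_full=100.0, extra_df=2, df_full=20)
print(f_stat)
```

The resulting value would then be compared with an F distribution on (extra_df, df_full) degrees of freedom to obtain a P-value.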

7 Meadowfoam Example Continuation
Suppose now we treat the variable light intensity as a quantitative variable. There are three possible models for the relationship between seedling growth and the two predictor variables… If we want to know whether the effect of light intensity on the number of flowers per plant depends on timing, we need to include an interaction term in the model….
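One common way to write a model with such an interaction term is shown below. The symbols (light, early, and the β's) are notation chosen here for illustration and may differ from the notation on the original slide.

```latex
\[
\begin{aligned}
\mu\{\text{flowers} \mid \text{light}, \text{early}\}
  &= \beta_0 + \beta_1\,\text{light} + \beta_2\,\text{early}
     + \beta_3\,(\text{light} \times \text{early}) \\
\text{late } (\text{early} = 0):\quad
  &\ \beta_0 + \beta_1\,\text{light} \\
\text{early } (\text{early} = 1):\quad
  &\ (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\,\text{light}
\end{aligned}
\]
```

In this form the slope for light intensity is β1 under late timing and β1 + β3 under early timing, so asking whether the effect of intensity depends on timing amounts to testing whether β3 = 0.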

8 Meadowfoam Example – Summary of Findings
There is no evidence that the effect of light intensity on the number of flowers depends on timing (P-value = 0.91). That means that the interaction effect is not significant. If an interaction did exist, it would be difficult to talk about the effect of light intensity on Y, since it would vary with timing. Since the interaction was not significant, we remove it from the model. For the same timing, increasing light intensity by 100 micromol/m^2/sec decreases the mean number of flowers per plant by 4.0 (95% CI: (-5.1, -3)). For the same light intensity, beginning the light treatment early increases the mean number of flowers per plant by 12.2 (95% CI: (6.7, 17.6)).
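To show how these two estimates combine in the model without interaction, the sketch below computes the estimated change in mean flowers per plant for a given change in treatment. It uses only the point estimates quoted above, ignores their uncertainty, and is meant only for intensities within the studied range of 150 to 900; the function name is hypothetical.

```python
SLOPE_PER_100 = -4.0  # change in mean flowers per +100 micromol/m^2/sec, same timing
EARLY_EFFECT = 12.2   # change in mean flowers for early vs late, same intensity


def change_in_mean_flowers(intensity_change, switch_to_early=False):
    """Estimated change in mean flowers per plant under the additive model."""
    change = SLOPE_PER_100 * intensity_change / 100.0
    if switch_to_early:
        change += EARLY_EFFECT
    return change


# Raising intensity from 300 to 600 with timing held fixed.
print(round(change_in_mean_flowers(300), 1))
# The same increase combined with a switch from late to early timing.
print(round(change_in_mean_flowers(300, switch_to_early=True), 1))
```

Because the model has no interaction term, the two effects simply add: the intensity effect is the same under either timing.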

