Published by Barnard Patterson. Modified over 6 years ago.
1
Rainfall Example
The data set contains corn yield (bushels per acre) and rainfall (inches) in six US corn-producing states (Iowa, Nebraska, Illinois, Indiana, Missouri and Ohio). A straight-line model is not adequate: yield increases with rainfall up to 12″ and then starts to decrease. A better model for these data is a quadratic model: Yield = β0 + β1∙rain + β2∙rain² + ε. This is still a multiple linear regression model, since it is linear in the β’s. However, we cannot interpret the individual coefficients, since we can’t change one variable (rain) while holding the other (rain²) constant… STA302/ week 11
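A minimal sketch of why the quadratic model is still an ordinary least-squares problem: the design matrix simply gains a column holding rain². The numbers below are made-up, noise-free values generated from a known curve, not the corn data set from the lecture, and `fit_quadratic` is a hypothetical helper name.

```python
import numpy as np

def fit_quadratic(rain, yield_):
    """Least-squares fit of yield = b0 + b1*rain + b2*rain**2."""
    rain = np.asarray(rain, dtype=float)
    # Columns: intercept, rain, rain squared. The model is linear in the betas.
    X = np.column_stack([np.ones_like(rain), rain, rain ** 2])
    beta, *_ = np.linalg.lstsq(X, np.asarray(yield_, dtype=float), rcond=None)
    return beta

# Hypothetical noise-free data from a known curve (NOT the lecture data).
rain = np.array([7.0, 9.0, 10.0, 11.0, 12.0, 14.0, 16.0])
yield_ = -5.0 + 6.0 * rain - 0.25 * rain ** 2

b0, b1, b2 = fit_quadratic(rain, yield_)
# The fitted curve peaks where the derivative b1 + 2*b2*rain equals zero.
peak_rain = -b1 / (2.0 * b2)
print(b0, b1, b2, peak_rain)
```

In this sketch the change in mean yield for one more inch of rain is b1 + 2∙b2∙rain, which depends on the current rainfall. That is one way to see why neither coefficient can be read on its own.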
2
More on Rainfall Example
Examination of the residuals (from the quadratic model) versus year suggested a possible pattern of increase over time. Fit a model with year… To assess whether yield’s relationship with rainfall depends on year, we include an interaction term in the model…
3
Interaction
Two predictor variables are said to interact if the effect that one of them has on the response depends on the value of the other. To include an interaction term in a model, we simply take the product of the two predictor variables and include the resulting variable in the model as an additional predictor. Interaction terms should not routinely be added to the model. Why? We should add interaction terms when the question of interest has to do with interaction or when we suspect interaction exists (e.g., from a plot of residuals versus the interaction term). If an interaction term for two predictor variables is in the model, we should also include the terms for the predictor variables themselves, even if their coefficients are not statistically significantly different from 0.
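As a small illustration of the definition above, the sketch below builds the product column and shows how the slope for one predictor changes with the value of the other. The coefficient values and the helper names `design_row` and `slope_of_x1` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def design_row(x1, x2, interaction=True):
    """One row of the design matrix: intercept, x1, x2, and optionally x1*x2."""
    row = [1.0, x1, x2]
    if interaction:
        row.append(x1 * x2)  # the interaction predictor is just the product
    return row

def slope_of_x1(beta, x2):
    """Change in mean response per unit of x1 at a given x2.

    beta = (b0, b1, b2, b3) for mean = b0 + b1*x1 + b2*x2 + b3*x1*x2.
    """
    b0, b1, b2, b3 = beta
    return b1 + b3 * x2

# Hypothetical coefficients, chosen only to illustrate the idea.
beta = (10.0, 2.0, -1.0, 0.5)
print(design_row(3.0, 4.0))        # [1.0, 3.0, 4.0, 12.0]
print(slope_of_x1(beta, x2=0.0))   # 2.0
print(slope_of_x1(beta, x2=4.0))   # 4.0
```

With b3 = 0 the slope of x1 would be b1 at every value of x2, which is the no-interaction case.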
4
Indicator Variables
Often, a data set will contain categorical variables that are potential predictor variables. To include these categorical variables in the model we define dummy (indicator) variables. A dummy variable takes only two values, 0 and 1. For a categorical variable with j categories we need j − 1 indicator variables.
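The j − 1 rule can be sketched as follows. The function name `dummy_code`, the choice of the first sorted level as the reference category, and the three-level example variable are all assumptions made for illustration.

```python
def dummy_code(values, baseline=None):
    """Encode a categorical variable with j categories as j-1 indicator columns.

    Returns (levels, rows): the non-baseline levels and one 0/1 row per observation.
    """
    categories = sorted(set(values))
    if baseline is None:
        baseline = categories[0]  # assumption: first sorted level is the reference
    levels = [c for c in categories if c != baseline]
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Hypothetical three-category variable.
levels, rows = dummy_code(["low", "high", "medium", "low"])
print(levels)  # ['low', 'medium']
print(rows)    # [[1, 0], [0, 0], [0, 1], [1, 0]]
```

The baseline category is the one whose row is all zeros; its mean is absorbed into the intercept, which is why a j-th indicator would be redundant.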
5
Meadowfoam Example
Meadowfoam is a small plant found in the US Pacific Northwest. Its seed oil is unique among vegetable oils for its long carbon strings, and it is nongreasy and highly stable. A study was conducted to find out how to elevate meadowfoam production to a profitable crop. In a growth chamber, plants were grown under 6 light intensities (in micromol/m²/sec) and two timings of the onset of the light treatment, either late (coded 0) or early (coded 1). The response variable is the average number of flowers per plant for 10 seedlings grown under each of the 12 treatment conditions. This is an example of an experiment from which we can draw causal conclusions. There are two explanatory variables, light intensity and timing. There are 24 data points, 2 at each treatment combination.
6
Questions of Interest
What is the effect of timing on seedling growth? What are the effects of the different light intensities? Does the effect of intensity depend on timing?
7
Indicator Variables in Meadowfoam Example
To include the variable timing in the model we define a dummy variable that takes the value 1 for early timing and the value 0 for late timing. The variable intensity has 6 levels (150, 300, 450, 600, 750, 900). We will treat these levels as 6 categories. It is useful to do so if we expect a complex relationship between the response variable and intensity and if the goal is to determine which intensity level is “best”. The cost of using dummy variables is degrees of freedom, since a variable with several categories needs several dummy variables. We define the dummy variables as follows…
8
Partial F-test
The partial F-test is designed to test whether a subset of the β’s are all 0 simultaneously. The approach has two steps. First we fit a model with all predictor variables. We call this model the “full model”. Then we fit a model without the predictor variables whose coefficients we are interested in testing. We call this model the “reduced model”. We then compare the SSReg and RSS in these two models…
9
Test Statistic for Partial F-test
To test whether the coefficients of some of the explanatory variables are all 0, we use the following test statistic: $F = \dfrac{\text{Extra SS} / \text{Extra df}}{\text{RSS}_{\text{full}} / \text{df}_{\text{full}}}$, where $\text{Extra SS} = \text{RSS}_{\text{red}} - \text{RSS}_{\text{full}}$, Extra df = number of parameters being tested, and $\text{df}_{\text{full}}$ is the residual degrees of freedom of the full model. To get the Extra SS in SAS we can simply fit two regressions (reduced and full), or we can look at the Type I SS, which are also called Sequential Sums of Squares. The Sequential SS gives the additional contribution to SSReg that each variable makes over and above the variables previously listed. The Sequential SS depends on the order in which the variables are stated in the model statement; the variables whose coefficients we want to test must be listed last.
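The two-step procedure (fit the full model, fit the reduced model, compare residual sums of squares) can be sketched with NumPy as below. This is an illustrative stand-in for the SAS workflow mentioned on the slide, not a reproduction of it; the simulated data and the helper names `rss` and `partial_f` are assumptions.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def partial_f(X_full, X_reduced, y):
    """Partial F statistic and its (numerator, denominator) degrees of freedom."""
    n, p_full = X_full.shape
    extra_df = p_full - X_reduced.shape[1]   # number of coefficients tested
    df_full = n - p_full                     # residual df of the full model
    rss_full = rss(X_full, y)
    extra_ss = rss(X_reduced, y) - rss_full
    f_stat = (extra_ss / extra_df) / (rss_full / df_full)
    return f_stat, extra_df, df_full

# Hypothetical data: y depends on x1 only; x2 and x3 are the terms under test.
rng = np.random.default_rng(0)
n = 30
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
ones = np.ones(n)
X_full = np.column_stack([ones, x1, x2, x3])
X_reduced = np.column_stack([ones, x1])
f_stat, df1, df2 = partial_f(X_full, X_reduced, y)
print(f_stat, df1, df2)
```

The statistic is compared with an F distribution on (Extra df, full-model residual df) degrees of freedom. If SciPy is available, `scipy.stats.f.sf(f_stat, df1, df2)` would give the corresponding P-value.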
10
Meadowfoam Example Continuation
Suppose now we treat the variable light intensity as a quantitative variable. There are three possible models for the relationship between seedling growth and the two predictor variables… If we want to know whether the effect of light intensity on the number of flowers per plant depends on timing, we need to include an interaction term in the model…
11
Meadowfoam Example – Summary of Findings
There is no evidence that the effect of light intensity on flowers depends on timing (P-value = 0.91); that is, the interaction effect is not significant. If interaction did exist, it would be difficult to talk about the effect of light intensity on Y, as it would vary with timing. Since the interaction was not significant, we remove it from the model. For the same timing, increasing light intensity by 100 micromol/m²/sec decreases the mean number of flowers per plant by 4.0 (95% CI for the change: (−5.1, −3.0)). For the same light intensity, beginning the light treatment early increases the mean number of flowers per plant by 12.2 (95% CI: (6.7, 17.6)).
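The per-100-unit statement is a rescaling of the intensity coefficient. As a worked sketch, write the fitted no-interaction model with $\hat\beta_1$ for intensity and $\hat\beta_2$ for the early-timing indicator; these symbols are notation assumed here, and the per-unit values are simply the rounded figures above divided by 100.

```latex
\begin{align*}
\widehat{\text{flowers}} &= \hat\beta_0 + \hat\beta_1\,\text{intensity} + \hat\beta_2\,\text{early},\\
100\,\hat\beta_1 &\approx -4.0 \;\Rightarrow\; \hat\beta_1 \approx -0.040,\\
\text{95\% CI for } \beta_1 &\approx \left(\tfrac{-5.1}{100},\ \tfrac{-3.0}{100}\right) = (-0.051,\ -0.030),\\
\hat\beta_2 &\approx 12.2, \qquad \text{95\% CI } (6.7,\ 17.6).
\end{align*}
```

Because the interaction term has been dropped, $\hat\beta_1$ applies at both timings and $\hat\beta_2$ applies at every light intensity.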