1 Announcements
There's an in-class exam one week from today (4/30). It will not include ANOVA or regression. On Thursday, I will list the covered material and put practice questions, etc., on the web.
No office hours on Wednesday this week. (I'm out of town all day.)
See the web for the homework due this Thursday (4/25).
I read your proposals and made comments/suggestions on the ones that analyze their own data.

2 Multiple Regression
Cheese Example: In a study of cheddar cheese from the La Trobe Valley of Victoria, Australia, samples of cheese were analyzed to determine the amount of acetic acid and hydrogen sulfide they contained. An overall taste score for each cheese was obtained by combining the scores from several tasters. The goal is to predict the taste score from the acetic acid and hydrogen sulfide content. (From Matt Wand)

3 Model: A simple model for taste is:

Taste_i = β0 + β1·acetic_i + β2·H2S_i + error_i,   i = 1, …, n = 30

Again, the intercept and slopes are chosen to minimize the error sum of squares:

SSE = {taste_1 - (b0 + b1·acetic_1 + b2·H2S_1)}² + … + {taste_30 - (b0 + b1·acetic_30 + b2·H2S_30)}²

Geometrically: the simple linear model estimated a line. A model with an intercept and 2 slopes estimates a plane (a surface). (See Matlab.) Note that you could add more predictors too…
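
For reference, a minimal Python sketch of the same least-squares fit (assuming the 30 samples are in a hypothetical file cheese.csv with columns taste, acetic, and H2S; these names are illustrative):

    # Fit taste = b0 + b1*acetic + b2*H2S by ordinary least squares.
    import pandas as pd
    import statsmodels.api as sm

    cheese = pd.read_csv("cheese.csv")               # hypothetical file with the 30 samples
    X = sm.add_constant(cheese[["acetic", "H2S"]])   # adds the intercept column
    fit = sm.OLS(cheese["taste"], X).fit()           # minimizes the error sum of squares

    print(fit.params)                                # b0, b1 (acetic coef), b2 (H2S coef)
    print(fit.ssr)                                   # SSE = sum of squared residuals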

4 Minitab: Stat: Regression: Regression
- Response is taste
- Predictors are acetic and H2S

Output:

The regression equation is
taste = - 34.0 - 7.57 H2S + 14.8 acetic

Predictor        Coef   SE Coef      T      P
Constant       -33.99     26.53  -1.28  0.211
H2S            -7.570     3.474  -2.18  0.038
acetic         14.763     4.242   3.48  0.002

S = 12.98   R-Sq = 40.6%   R-Sq(adj) = 36.2%

Analysis of Variance
Source            DF      SS      MS     F      P
Regression         2  3114.0  1557.0  9.24  0.001
Residual Error    27  4548.9   168.5
Total             29  7662.9

5 Minitab:
The regression equation is
taste = - 34.0 - 7.57 H2S + 14.8 acetic

Predictor        Coef   SE Coef      T      P
Constant       -33.99     26.53  -1.28  0.211
H2S            -7.570     3.474  -2.18  0.038
acetic         14.763     4.242   3.48  0.002

Test statistic: T = Coef / SE Coef
The p-value is for the test H0: Coef = 0 vs. HA: Coef ≠ 0 (if p-value < α, then reject H0).
1-α CI for Coef: Coef +/- (SE Coef)·t_{α/2, df = error df}
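
The t statistic, p-value, and CI for a coefficient can be checked by hand. A sketch for the H2S row of the table above (Coef = -7.570, SE Coef = 3.474, error df = 27), assuming scipy is available:

    # t statistic, two-sided p-value, and 95% CI for the H2S coefficient.
    from scipy import stats

    coef, se, df_error = -7.570, 3.474, 27            # values from the table above
    alpha = 0.05

    t_stat = coef / se                                # T = Coef / SE Coef, about -2.18
    p_value = 2 * stats.t.sf(abs(t_stat), df_error)   # two-sided p-value, about 0.038
    t_crit = stats.t.ppf(1 - alpha / 2, df_error)     # t_{alpha/2, error df}
    ci = (coef - t_crit * se, coef + t_crit * se)     # Coef +/- (SE Coef) * t_crit
    print(t_stat, p_value, ci)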

6 Minitab: This is a test of the "usefulness of regression."

Analysis of Variance
Source            DF      SS      MS     F      P
Regression         2  3114.0  1557.0  9.24  0.001
Residual Error    27  4548.9   168.5
Total             29  7662.9

The regression equation is
taste = - 34.0 - 7.57 H2S + 14.8 acetic

The model is the regression equation plus error:
taste = - 34.0 - 7.57 H2S + 14.8 acetic + error

MSE = 168.5 = estimated variance of the error.
Test statistic: F = MSR / MSE
The p-value is for the test H0: β1 = β2 = 0 (both slopes = 0) vs. HA: at least one slope is not 0.
This is an overall test of whether or not the regression is useful.
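
The F statistic and its p-value can be recomputed from the sums of squares in the ANOVA table above; a sketch assuming scipy:

    # Overall F test for the regression (H0: beta1 = beta2 = 0).
    from scipy import stats

    ss_reg, df_reg = 3114.0, 2            # regression sum of squares and df
    ss_err, df_err = 4548.9, 27           # residual (error) sum of squares and df

    ms_reg = ss_reg / df_reg              # MSR = 1557.0
    ms_err = ss_err / df_err              # MSE = 168.5 (estimated error variance)
    f_stat = ms_reg / ms_err              # F = MSR / MSE, about 9.24
    p_value = stats.f.sf(f_stat, df_reg, df_err)   # P(F > f_stat), about 0.001
    print(f_stat, p_value)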

7 Using the regression equation:
taste = - 34.0 - 7.57 H2S + 14.8 acetic

If H2S = 3 and acetic = 5, then what is the expected taste score? (NOTE that this is not an extrapolation…)

For the predicted value, just plug H2S = 3 and acetic = 5 into the equation.
For a "confidence interval" (CI): Stat: Regression: Regression, Options button: prediction interval for new obs (enter the values in the order the predictors appear in the regression equation).

New Obs     Fit   SE Fit         95.0% CI           95.0% PI
1         17.11     3.17  ( 10.60, 23.63)   ( -10.30, 44.53)

Prediction interval: wider than the CI since the prediction includes the "error" variability as well as the variability in estimating the parameters.
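
The same fitted value, CI for the mean response, and prediction interval can be obtained in Python; a sketch under the same hypothetical cheese.csv assumption as before:

    # 95% CI for the mean taste and 95% PI for a new cheese with H2S = 3, acetic = 5.
    import pandas as pd
    import statsmodels.api as sm

    cheese = pd.read_csv("cheese.csv")                       # hypothetical file
    X = sm.add_constant(cheese[["acetic", "H2S"]])
    fit = sm.OLS(cheese["taste"], X).fit()

    new_obs = pd.DataFrame({"const": [1.0], "acetic": [5.0], "H2S": [3.0]})
    pred = fit.get_prediction(new_obs).summary_frame(alpha=0.05)
    print(pred[["mean", "mean_ci_lower", "mean_ci_upper"]])  # fitted value and 95% CI
    print(pred[["obs_ci_lower", "obs_ci_upper"]])            # wider 95% prediction interval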

8 Dummy (or indicator) variables: When some predictor variables are categorical, regression can still be used. Dummy variables are used to indicate the group (here, the fabric) of each observation…

9 Regression Model for Burn Time Data

Burn time = β1 if fabric 1, β2 if fabric 2, β3 if fabric 3, β4 if fabric 4, plus error

or

y_i = β1·x_1i + β2·x_2i + β3·x_3i + β4·x_4i + ε_i   (the x's are "indicator variables")

x_1i = 1 if observation i is fabric 1 and 0 otherwise
x_2i = 1 if observation i is fabric 2 and 0 otherwise
x_3i = 1 if observation i is fabric 3 and 0 otherwise
x_4i = 1 if observation i is fabric 4 and 0 otherwise

The β's are fabric-specific means. The model does not have an intercept. (Stat: Regression: Regression, Options: uncheck the "Fit intercept" box.)

10 An Equivalent Model:

y_i = β0 + β2·x_2i + β3·x_3i + β4·x_4i + ε_i

x_2i = 1 if observation i is fabric 2 and 0 otherwise
x_3i = 1 if observation i is fabric 3 and 0 otherwise
x_4i = 1 if observation i is fabric 4 and 0 otherwise

Fabric 1 mean = β0
Fabric 2 mean = β0 + β2
Fabric 3 mean = β0 + β3
Fabric 4 mean = β0 + β4

This model does have an intercept. β0 is the mean for fabric 1; the rest of the β's are "offsets."
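
Both parameterizations can be fit directly and give equivalent fits; only the interpretation of the coefficients changes. A sketch assuming the burn time data sit in a hypothetical file burntime.csv with columns time and fabric (coded 1-4):

    # Cell-means model (no intercept) vs. offset model (with intercept).
    import pandas as pd
    import statsmodels.formula.api as smf

    burn = pd.read_csv("burntime.csv")                  # hypothetical file

    # No intercept: each coefficient is a fabric-specific mean (slide 9).
    cell_means = smf.ols("time ~ C(fabric) - 1", data=burn).fit()

    # With intercept: intercept = fabric 1 mean, the rest are offsets (slide 10).
    offsets = smf.ols("time ~ C(fabric)", data=burn).fit()

    print(cell_means.params)   # the four fabric means
    print(offsets.params)      # fabric 1 mean, then fabric 2-4 offsets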

11 The regression equation is
Burn Time = 16.9 - 5.90 Fabric 2 - 6.35 Fabric 3 - 5.85 Fabric 4

Predictor       Coef   SE Coef      T      P
Constant     16.8500    0.5806  29.02  0.000
Fabric 2     -5.9000    0.8211  -7.19  0.000
Fabric 3     -6.3500    0.8211  -7.73  0.000
Fabric 4     -5.8500    0.8211  -7.12  0.000

S = 1.161   R-Sq = 87.2%   R-Sq(adj) = 83.9%

Analysis of Variance (Note that this is the same as before!)
Source            DF       SS      MS      F      P
Regression         3  109.810  36.603  27.15  0.000
Residual Error    12   16.180   1.348
Total             15  125.990

95% CIs for the fabric means: (point estimate of the mean) +/- t_{0.025,12}·sqrt(MSE / 4)

Fabric 2: (16.85 - 5.90) +/- 2.179·sqrt(1.348 / 4) = 10.95 +/- 2.179·(0.5806)
(0.5806 is the std dev of the estimate of β0 + β2.)

(As usual, we're assuming the errors are independent and normal with constant variance.)
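
The interval for a fabric mean can be checked with a few lines, using the numbers from the output above (4 observations per fabric, error df = 12); scipy assumed:

    # 95% CI for the fabric 2 mean.
    from math import sqrt
    from scipy import stats

    mse, df_error, n_per_fabric = 1.348, 12, 4
    point_est = 16.85 - 5.90                    # b0 + b2 = estimated fabric 2 mean = 10.95
    se_mean = sqrt(mse / n_per_fabric)          # about 0.5806
    t_crit = stats.t.ppf(0.975, df_error)       # t_{0.025,12} = 2.179
    ci = (point_est - t_crit * se_mean, point_est + t_crit * se_mean)
    print(ci)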

12 Back to cheese
Suppose the cheeses come from two regions of Australia and we want to include that info in the model:

Taste_i = β0 + β1·acetic_i + β2·H2S_i + β3·Region_i + error_i,   i = 1, …, n = 30

Region_i = 1 if the ith sample comes from region 1 and 0 otherwise.
β3 is the effect of region 1… If b3 > 0, then region 1 tends to increase the mean score (and vice versa).
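
Adding the region indicator is just one extra 0/1 column in the design matrix. A sketch, again under the hypothetical cheese.csv assumption, with an assumed region column coded 1 or 2:

    # Add a 0/1 region indicator to the cheese model.
    import pandas as pd
    import statsmodels.api as sm

    cheese = pd.read_csv("cheese.csv")                        # hypothetical file
    cheese["region1"] = (cheese["region"] == 1).astype(int)   # 1 if region 1, else 0
    X = sm.add_constant(cheese[["acetic", "H2S", "region1"]])
    fit_region = sm.OLS(cheese["taste"], X).fit()
    print(fit_region.params["region1"])                       # b3: estimated shift for region 1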

