Published by Edwina Cobb. Modified over 8 years ago.
1
1 Experimental Statistics - week 12
Chapter 11: Linear Regression and Correlation
Chapter 12: Multiple Regression
2
2 April 5 -- Lab
3
3 Analysis of Variance Approach
Mathematical Fact (p. 649): SS(Total) = SS(Regression) + SS(Residuals)
- SS(Total) = S_yy
- SS(Regression): SS “explained” by the model
- SS(Residuals): SS “unexplained” by the model
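Written out as sums, the decomposition above takes the following form. This is a sketch in standard notation: the symbols $\hat{y}_i$ (fitted value) and $\bar{y}$ (sample mean of Y) are assumed here and are not taken from the slide.

```latex
\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{\text{SS(Total)} = S_{yy}}
  = \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{\text{SS(Regression)}}
  + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{\text{SS(Residuals)}}
```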
4
4 Plot of Production vs Cost
5
5 SS(???)
6
6
7
7
8
8 R² = SS(Regression) / SS(Total) measures the proportion of the variability in Y that is explained by the regression on X
9
9 12 8 7 12 4 15 11 10 15 12 20 8 17 14 24 7 8 12 4 12 11 15 YXX
10
10 The GLM Procedure
Dependent Variable: y

Source             DF   Sum of Squares
Model               1           19.575
Error               6          174.425
Corrected Total     7          194.000

The GLM Procedure
Dependent Variable: y

Source             DF   Sum of Squares
Model               1          170.492   = SS(Reg)
Error               6           23.508   = SS(Res)
Corrected Total     7          194.000   = SS(Total)
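The two GLM outputs above share the same SS(Total) but split it very differently between model and error. A minimal Python sketch of the proportion of variability explained, SS(Regression) / SS(Total), for each fit follows. The variable and function names are illustrative and not part of the original slides.

```python
# Sums of squares copied from the two GLM outputs above.
ss_total = 194.000
ss_reg_first, ss_res_first = 19.575, 174.425
ss_reg_second, ss_res_second = 170.492, 23.508


def r_squared(ss_reg, ss_tot):
    """Proportion of the total variability in Y explained by the regression."""
    return ss_reg / ss_tot


r2_first = r_squared(ss_reg_first, ss_total)
r2_second = r_squared(ss_reg_second, ss_total)

print(f"first fit:  R^2 = {r2_first:.4f}")
print(f"second fit: R^2 = {r2_second:.4f}")
```

In both outputs the model and error sums of squares add up to the corrected total, as the decomposition requires.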
11
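11 RECALL
Theoretical Model: $y = \beta_0 + \beta_1 x + \varepsilon$
Regression line: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
Residuals: $e_i = y_i - \hat{y}_i$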
12
12 Residual Analysis
Examination of residuals to help determine if:
- assumptions are met
- regression model is appropriate
Residual Plot:
- plot of x vs residuals
13
13
14
14
15
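15 Study Time Data
/* Fit score on time and save the residuals in data set "new" */
PROC GLM;
  MODEL score=time;
  OUTPUT out=new r=resid;
RUN;
/* Residual plot: residuals against the predictor */
PROC GPLOT;
  TITLE 'Plot of Residuals';
  PLOT resid*time;
RUN;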
16
16 Average Height of Girls by Age
17
17 Average Height of Girls by Age
18
18 Residual Plot
19
19 Residual Analysis
Examination of residuals to help determine if:
- assumptions are met
- regression model is appropriate
Residual Plot:
- plot of x vs residuals
Normality of Residuals:
- probability plot
- histogram
20
20 Residuals from Car Dataset fit using √ hp
21
21 Residuals from Car Dataset fit using log(hp)
22
22 Data – Page 572
Weight loss in a chemical compound as a function of how long it is exposed to air
Y = weight loss (wtloss)
X = exposure time (exptime)

  Y     X
 4.3    4
 5.5    5
 6.8    6
 8.0    7
 4.0    4
 5.2    5
 6.6    6
 7.5    7
 2.0    4
 4.0    5
 5.7    6
 6.5    7
23
23
PROC REG;
  MODEL wtloss=exptime / r cli clm;
  OUTPUT out=new r=resid;
RUN;

The REG Procedure
Dependent Variable: wtloss

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1         26.00417      26.00417     40.22   <.0001
Error              10          6.46500       0.64650
Corrected Total    11         32.46917

Root MSE           0.80405    R-Square   0.8009
Dependent Mean     5.50833    Adj R-Sq   0.7810
Coeff Var         14.59701

Parameter Estimates
Variable     DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept     1             -1.73333          1.16518     -1.49     0.1677
exptime       1              1.31667          0.20761      6.34     <.0001
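The estimates and sums of squares in this output can be reproduced by hand from the weight-loss data. The following is a minimal Python sketch using only the standard library and the usual least-squares formulas (slope = S_xy / S_xx, intercept = ȳ − slope · x̄). The variable names are illustrative; this is not the SAS procedure itself.

```python
# Weight-loss data from the slide above: y = wtloss, x = exptime.
x = [4, 5, 6, 7] * 3
y = [4.3, 5.5, 6.8, 8.0, 4.0, 5.2, 6.6, 7.5, 2.0, 4.0, 5.7, 6.5]
n = len(y)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross-products.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # SS(Total)

# Least-squares estimates of slope and intercept.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Analysis-of-variance quantities for simple linear regression.
ss_reg = b1 * s_xy           # SS(Regression), 1 degree of freedom
ss_res = s_yy - ss_reg       # SS(Residuals), n - 2 degrees of freedom
mse = ss_res / (n - 2)
f_value = ss_reg / mse

print(f"intercept = {b0:.5f}, slope = {b1:.5f}")
print(f"SS(Reg) = {ss_reg:.5f}, SS(Res) = {ss_res:.5f}, SS(Total) = {s_yy:.5f}")
print(f"MSE = {mse:.5f}, F = {f_value:.2f}")
```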
24
24 Plot of Residuals - MLR Model
The REG Procedure
Dependent Variable: wtloss

Output Statistics
       Dependent   Predicted   Std Error
Obs    Variable    Value       Mean Predict   95% CL Mean         95% CL Predict      Residual
  1    4.3000      3.5333      0.3884         2.6679   4.3987     1.5437   5.5229      0.7667
  2    5.5000      4.8500      0.2543         4.2835   5.4165     2.9710   6.7290      0.6500
  3    6.8000      6.1667      0.2543         5.6001   6.7332     4.2877   8.0456      0.6333
  4    8.0000      7.4833      0.3884         6.6179   8.3487     5.4937   9.4729      0.5167
  5    4.0000      3.5333      0.3884         2.6679   4.3987     1.5437   5.5229      0.4667
  6    5.2000      4.8500      0.2543         4.2835   5.4165     2.9710   6.7290      0.3500
  7    6.6000      6.1667      0.2543         5.6001   6.7332     4.2877   8.0456      0.4333
  8    7.5000      7.4833      0.3884         6.6179   8.3487     5.4937   9.4729      0.0167
  9    2.0000      3.5333      0.3884         2.6679   4.3987     1.5437   5.5229     -1.5333
 10    4.0000      4.8500      0.2543         4.2835   5.4165     2.9710   6.7290     -0.8500
 11    5.7000      6.1667      0.2543         5.6001   6.7332     4.2877   8.0456     -0.4667
 12    6.5000      7.4833      0.3884         6.6179   8.3487     5.4937   9.4729     -0.9833
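The Residual column above is the observed value minus the predicted value. A small Python sketch recomputes the residuals from the reported parameter estimates and groups them by exposure time, as a text stand-in for a residual plot. The names are illustrative, and the estimates are the rounded values from the REG output, so results agree with the table only up to rounding.

```python
# Weight-loss data: y = wtloss, x = exptime.
x = [4, 5, 6, 7] * 3
y = [4.3, 5.5, 6.8, 8.0, 4.0, 5.2, 6.6, 7.5, 2.0, 4.0, 5.7, 6.5]

# Rounded parameter estimates as reported by PROC REG.
b0, b1 = -1.73333, 1.31667

predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

# Group residuals by x value: a crude, text-only residual plot.
by_x = {}
for xi, ri in zip(x, residuals):
    by_x.setdefault(xi, []).append(round(ri, 4))

for xi in sorted(by_x):
    print(f"exptime = {xi}: residuals {by_x[xi]}")

# Least-squares residuals sum to zero (up to rounding of the estimates).
print(f"sum of residuals = {sum(residuals):.4f}")
```

In the table, observations 9 through 12 all have negative residuals, which is the kind of pattern a residual analysis is meant to surface.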
25
25
26
26 The REG Procedure
Dependent Variable: wtloss

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1         26.00417      26.00417     40.22   <.0001
Error              10          6.46500       0.64650
Corrected Total    11         32.46917

Root MSE           0.80405    R-Square   0.8009
Dependent Mean     5.50833    Adj R-Sq   0.7810
Coeff Var         14.59701

Parameter Estimates
Variable     DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept     1             -1.73333          1.16518     -1.49     0.1677
exptime       1              1.31667          0.20761      6.34     <.0001

???
For testing H0: β0 = 0 (t Value for Intercept)
For testing H0: β1 = 0 (t Value for exptime)
27
27 Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1         26.00417      26.00417     40.22   <.0001
Error              10          6.46500       0.64650
Corrected Total    11         32.46917

Recall:
SS(Regression) = “Model SS”
SS(Residual) = “Error SS”
28
28 H0: there is no linear relationship between X and Y
H1: there is a linear relationship between X and Y
Reject H0 if F > F(1, n – 2), where F = MS(Regression) / MSE and MSE = SS(Residuals) / (n – 2)
29
29 H0: there is no linear relationship between weight loss and exposure time
H1: there is a linear relationship between weight loss and exposure time
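For the weight-loss example, the F test can be carried out directly from the analysis of variance table. The sketch below makes two assumptions that the slides do not state: a significance level of 0.05, and a critical value F(1, 10) ≈ 4.96 read from a standard F table. The names are illustrative.

```python
# Values from the REG analysis of variance table for wtloss on exptime.
ss_reg = 26.00417
ss_res = 6.46500
n = 12

ms_reg = ss_reg / 1          # one independent variable
mse = ss_res / (n - 2)       # error degrees of freedom = n - 2
f_value = ms_reg / mse

# Assumption: alpha = 0.05; 4.96 is the tabled upper 5% point of F(1, 10).
f_crit = 4.96
reject_h0 = f_value > f_crit

print(f"F = {f_value:.2f}, critical value = {f_crit}, reject H0: {reject_h0}")
```

Under these assumptions H0 is rejected, which agrees with the reported Pr > F of <.0001.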
30
30 Note: In simple linear regression, the hypotheses
H0: there is no linear relationship between X and Y
H1: there is a linear relationship between X and Y
and
H0: β1 = 0
H1: β1 ≠ 0
are equivalent, and F = t²
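The relationship between the F statistic and the squared t statistic for the slope can be checked against the weight-loss output. This is a quick numerical check in Python using the reported (rounded) estimate and standard error; the names are illustrative.

```python
# Slope estimate and standard error from the Parameter Estimates table.
estimate = 1.31667
std_error = 0.20761

t_value = estimate / std_error
f_from_t = t_value ** 2

# F value reported in the analysis of variance table.
f_reported = 40.22

print(f"t = {t_value:.2f}, t^2 = {f_from_t:.2f}, reported F = {f_reported}")
```

The two agree up to rounding of the printed values.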
31
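31 Multiple Regression
Use of more than one independent variable to predict Y
Assumptions: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$, where the errors $\varepsilon$ are independent and normally distributed with mean 0 and constant variance $\sigma^2$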
32
32 Data
$x_{ij}$: i-th observation, j-th independent variable
and so we have $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i$
33
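33 Goal: Find “best” prediction equation of the form $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k$
As before: “best” means minimizing $\mathrm{SSE} = \sum_i (y_i - \hat{y}_i)^2$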
34
34 Again: the solution involves calculus -- solving the Normal Equations on page 627
35
35 Analysis of Variance
Source         DF        Sum of Squares   Mean Square               F Value
Model          k         SS(Reg.)         MS(Reg.) = SS(Reg.)/k     MS(Reg.)/MSE
Error          n-k-1     SSE              MSE = SSE/(n-k-1)
Corr. Total    n-1       SS(Total)
36
36 Multiple Regression Setting
H0: there is no linear relationship between Y and the independent variables
H1: there is a linear relationship between Y and the independent variables
Reject H0 if F > F(k, n – k – 1), where F = MS(Reg.) / MSE
37
37 R² = SS(Regression) / SS(Total) measures the proportion of the variability in Y that is explained by the regression
- in the MLR setting, R² has the same interpretation as before
38
38 Data – Page 628
Weight loss in a chemical compound as a function of exposure time and humidity
Y = weight loss (wtloss)
X1 = exposure time (exptime)
X2 = relative humidity (humidity)

  Y     X1    X2
 4.3    4     0.2
 5.5    5     0.2
 6.8    6     0.2
 8.0    7     0.2
 4.0    4     0.3
 5.2    5     0.3
 6.6    6     0.3
 7.5    7     0.3
 2.0    4     0.4
 4.0    5     0.4
 5.7    6     0.4
 6.5    7     0.4
39
39 Chemical Weight Loss – MLR Output
The REG Procedure
Dependent Variable: wtloss

Number of Observations Read   12
Number of Observations Used   12

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2         31.12417      15.56208    104.13   <.0001
Error               9          1.34500       0.14944
Corrected Total    11         32.46917

Root MSE           0.38658    R-Square   0.9586
Dependent Mean     5.50833    Adj R-Sq   0.9494
Coeff Var          7.01810

Parameter Estimates
Variable     DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept     1              0.66667          0.69423      0.96     0.3620
exptime       1              1.31667          0.09981     13.19     <.0001
humidity      1             -8.00000          1.36677     -5.85     0.0002
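The parameter estimates above can be reproduced by solving the normal equations (X'X)b = X'y directly. The Python sketch below uses only the standard library. It assumes the data columns are exposure time 4, 5, 6, 7 repeated at humidity 0.2, 0.3, and 0.4, which is a reading of the data slide rather than something the slides state explicitly. The helper `solve` and all variable names are illustrative.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination
    with partial pivoting (a is a small square matrix as nested lists)."""
    size = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(size):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]


# Assumed reading of the data: y = wtloss, x1 = exptime, x2 = humidity.
y = [4.3, 5.5, 6.8, 8.0, 4.0, 5.2, 6.6, 7.5, 2.0, 4.0, 5.7, 6.5]
x1 = [4, 5, 6, 7] * 3
x2 = [0.2] * 4 + [0.3] * 4 + [0.4] * 4

# Design matrix rows: intercept column, x1, x2.
rows = [[1.0, a, b] for a, b in zip(x1, x2)]

# Build X'X and X'y, then solve the normal equations.
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
b0, b1, b2 = solve(xtx, xty)

fitted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

print(f"intercept = {b0:.5f}, exptime = {b1:.5f}, humidity = {b2:.5f}")
print(f"SSE = {sse:.5f}")
```

That the solution matches the reported estimates and error sum of squares supports this reading of the data columns.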
40
40 H0: there is no linear relationship between weight loss and the variables exposure time and humidity
H1: there is a linear relationship between weight loss and the variables exposure time and humidity
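These hypotheses are tested with the overall F statistic, F = MS(Reg.) / MSE, on k and n − k − 1 degrees of freedom. The Python sketch below builds the analysis of variance quantities from the sums of squares in the MLR output. The significance level of 0.05 and the tabled critical value F(2, 9) ≈ 4.26 are assumptions not given in the slides, and `anova_table` is a hypothetical helper name.

```python
def anova_table(ss_reg, sse, n, k):
    """Degrees of freedom, mean squares, and F for a regression
    with k independent variables and n observations."""
    df_error = n - k - 1
    ms_reg = ss_reg / k
    mse = sse / df_error
    return {
        "df_model": k,
        "df_error": df_error,
        "ms_reg": ms_reg,
        "mse": mse,
        "f": ms_reg / mse,
    }


# Sums of squares from the MLR output for the weight-loss data.
table = anova_table(ss_reg=31.12417, sse=1.34500, n=12, k=2)

# Assumption: alpha = 0.05; 4.26 is the tabled upper 5% point of F(2, 9).
f_crit = 4.26
reject_h0 = table["f"] > f_crit

print(f"MS(Reg.) = {table['ms_reg']:.5f}, MSE = {table['mse']:.5f}")
print(f"F = {table['f']:.2f}, critical value = {f_crit}, reject H0: {reject_h0}")
```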
41
41 Examining Contributions of Individual X variables
Use t-test for the X variable in question.
- this tests the effect of that particular independent variable while all other independent variables stay constant.

Parameter Estimates
Variable     DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept     1              0.66667          0.69423      0.96     0.3620
exptime       1              1.31667          0.09981     13.19     <.0001
humidity      1             -8.00000          1.36677     -5.85     0.0002
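Each t Value in the table is the parameter estimate divided by its standard error, on n − k − 1 = 9 degrees of freedom. The Python sketch below recomputes the t statistics and compares them with a two-sided critical value. The significance level of 0.05 and the tabled value t(0.025, 9) ≈ 2.262 are assumptions not given in the slides; the names are illustrative.

```python
# (estimate, standard error) pairs from the Parameter Estimates table.
estimates = {
    "Intercept": (0.66667, 0.69423),
    "exptime": (1.31667, 0.09981),
    "humidity": (-8.00000, 1.36677),
}

# Assumption: two-sided test at alpha = 0.05 with 9 error degrees of freedom.
t_crit = 2.262

results = {}
for name, (estimate, std_error) in estimates.items():
    t_value = estimate / std_error
    results[name] = (t_value, abs(t_value) > t_crit)

for name, (t_value, significant) in results.items():
    print(f"{name:<10} t = {t_value:6.2f}  significant at 0.05: {significant}")
```

Under these assumptions both exposure time and humidity contribute significantly given the other variable, while the intercept does not differ significantly from zero, in line with the reported Pr > |t| values.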