1 Experimental Statistics - Week 12
Chapter 11: Linear Regression and Correlation
Chapter 12: Multiple Regression
2 April 5 -- Lab
3 Analysis of Variance Approach
Mathematical Fact (p. 649): SS(Total) = SS(Regression) + SS(Residuals)
- SS(Total) = $S_{yy}$
- SS(Regression): SS “explained” by the model
- SS(Residuals): SS “unexplained” by the model
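The decomposition can be checked numerically. The following is a minimal sketch, assuming a small invented data set (the `x` and `y` values and the helper names `fit_line` and `sums_of_squares` are hypothetical, not taken from the text); it fits the least-squares line and compares SS(Total) with SS(Regression) + SS(Residuals).

```python
# Hypothetical (x, y) data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_line(x, y):
    """Return the least-squares intercept b0 and slope b1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sums_of_squares(x, y):
    """Return SS(Total), SS(Regression), SS(Residuals) for the fitted line."""
    b0, b1 = fit_line(x, y)
    y_bar = sum(y) / len(y)
    fitted = [b0 + b1 * xi for xi in x]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)              # S_yy
    ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)           # "explained"
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # "unexplained"
    return ss_total, ss_reg, ss_res


ss_total, ss_reg, ss_res = sums_of_squares(x, y)
print("SS(Total)                    =", ss_total)
print("SS(Regression)+SS(Residuals) =", ss_reg + ss_res)
```

The identity holds for the least-squares fit specifically; with an arbitrary line the two sides generally differ.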
4 Plot of Production vs Cost
5 SS(???)
8 $R^2$ = SS(Regression) / SS(Total) measures the proportion of the variability in Y that is explained by the regression on X
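As a rough illustration of this interpretation, the sketch below computes the explained proportion as 1 - SS(Residuals)/SS(Total) and compares it with the squared sample correlation between X and Y, which coincide in simple linear regression. The data values and variable names are hypothetical.

```python
import math

# Hypothetical (x, y) data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares fit and its residual sum of squares.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Proportion of the variability in y explained by the regression on x.
r_squared = 1.0 - ss_res / s_yy

# Sample correlation coefficient between x and y.
r = s_xy / math.sqrt(s_xx * s_yy)

print("R-square        =", r_squared)
print("correlation ^ 2 =", r ** 2)
```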
10 The GLM Procedure, Dependent Variable: y
Source            DF   Sum of Squares
Model                  = SS(Reg)
Error                  = SS(Res)
Corrected Total        = SS(Total)
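11 RECALL
Theoretical Model: $y = \beta_0 + \beta_1 x + \varepsilon$
Regression line: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
Residuals: $e_i = y_i - \hat{y}_i$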
12 Residual Analysis
Examination of residuals to help determine if:
- assumptions are met
- regression model is appropriate
Residual Plot:
- plot of x vs residuals
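15 Study Time Data
PROC GLM;
  MODEL score=time;          /* regress score on time */
  OUTPUT out=new r=resid;    /* save the residuals as resid in data set new */
RUN;
PROC GPLOT;                  /* defaults to the most recently created data set (new) */
  TITLE 'Plot of Residuals';
  PLOT resid*time;           /* residuals (vertical) vs time (horizontal) */
RUN;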
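For readers without SAS, the quantity stored as `resid` can be sketched in a few lines of Python. The `time` and `score` values below are invented for illustration and are not the Study Time Data; the printed pairs are what a residual plot would display.

```python
# Hypothetical study-time data, invented for illustration only.
time = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
score = [55.0, 61.0, 70.0, 74.0, 83.0, 88.0]

n = len(time)
t_bar = sum(time) / n
s_bar = sum(score) / n

# Least-squares slope and intercept for score = b0 + b1 * time.
b1 = (sum((t - t_bar) * (s - s_bar) for t, s in zip(time, score))
      / sum((t - t_bar) ** 2 for t in time))
b0 = s_bar - b1 * t_bar

# Residual = observed score minus fitted score.
resid = [s - (b0 + b1 * t) for t, s in zip(time, score)]

# The (time, resid) pairs are the points of the residual plot.
for t, e in zip(time, resid):
    print(f"time={t:4.1f}  resid={e:7.3f}")
```

With an intercept in the model, least-squares residuals sum to zero and are uncorrelated with the predictor, so a residual plot is read for patterns such as curvature or changing spread rather than for an overall trend.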
16 Average Height of Girls by Age
17 Average Height of Girls by Age
18 Residual Plot
19 Residual Analysis
Examination of residuals to help determine if:
- assumptions are met
- regression model is appropriate
Residual Plot:
- plot of x vs residuals
Normality of Residuals:
- probability plot
- histogram
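One way to build the coordinates of a normal probability plot is sketched below. The residual values are hypothetical, the helper name `normal_plot_points` is made up for this example, and the plotting position (i - 0.5)/n is only one of several common conventions, so other software may place the points slightly differently.

```python
from statistics import NormalDist

# Hypothetical residuals, invented for illustration only.
residuals = [0.12, -0.30, 0.25, -0.08, 0.01, -0.15, 0.22, -0.07]


def normal_plot_points(values):
    """Pair each ordered residual with a standard normal quantile."""
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.5) / n      # assumed plotting position
        points.append((std_normal.inv_cdf(p), value))
    return points


points = normal_plot_points(residuals)
for quantile, value in points:
    print(f"normal quantile={quantile:7.3f}  residual={value:7.3f}")
```

If the residuals are roughly normal, these points fall close to a straight line.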
20 Residuals from Car Dataset fit using √ hp
21 Residuals from Car Dataset fit using log(hp)
22 Data – Page 572
Weight loss in a chemical compound as a function of how long it is exposed to air
Y = weight loss (wtloss)
X = exposure time (exptime)
23 PROC REG;
  MODEL wtloss=exptime / r cli clm;
  OUTPUT out=new r=resid;
RUN;
The REG Procedure, Dependent Variable: wtloss
Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                                                           <.0001
Error
Corrected Total
Root MSE   R-Square   Dependent Mean   Adj R-Sq   Coeff Var
Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept
exptime                                                          <.0001
24 Plot of Residuals - MLR Model
The REG Procedure, Dependent Variable: wtloss
Output Statistics
Obs   Dependent Variable   Predicted Value   Std Error Mean Predict   95% CL Mean   95% CL Predict   Residual
26 The REG Procedure, Dependent Variable: wtloss (same output as slide 23, with annotations)
???
t Value for Intercept: for testing H0: $\beta_0 = 0$
t Value for exptime: for testing H0: $\beta_1 = 0$
27 Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                                                           <.0001
Error
Corrected Total
Recall:
SS(Regression) = “Model SS”
SS(Residual) = “Error SS”
28 H0: there is no linear relationship between X and Y
H1: there is a linear relationship between X and Y
Reject H0 if F > $F_{\alpha}(1, n-2)$, where F = MS(Regression) / MS(Error) = [SS(Regression) / 1] / [SS(Residual) / (n - 2)]
29 H0: there is no linear relationship between weight loss and exposure time
H1: there is a linear relationship between weight loss and exposure time
30 Note: In simple linear regression, the hypotheses
H0: there is no linear relationship between X and Y
H1: there is a linear relationship between X and Y
and
H0: $\beta_1 = 0$
H1: $\beta_1 \neq 0$
are equivalent, and F = $t^2$
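The relation F = t² can be confirmed on any data set. Below is a minimal sketch on invented numbers (the data and variable names are hypothetical); it computes the ANOVA F statistic and the t statistic for the slope separately and compares them.

```python
import math

# Hypothetical (x, y) data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = s_xy / s_xx                 # estimated slope

# ANOVA quantities for simple linear regression.
ss_reg = b1 * s_xy               # SS(Regression), 1 degree of freedom
ss_res = s_yy - ss_reg           # SS(Residual), n - 2 degrees of freedom
mse = ss_res / (n - 2)
f_stat = (ss_reg / 1) / mse

# t statistic for H0: beta_1 = 0.
se_b1 = math.sqrt(mse / s_xx)    # standard error of the slope
t_stat = b1 / se_b1

print("F   =", f_stat)
print("t^2 =", t_stat ** 2)
```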
31 Multiple Regression Use of more than one independent variable to predict Y Assumptions:
32 Data: $(y_i, x_{i1}, x_{i2}, \dots, x_{ik})$ for $i = 1, \dots, n$, and so we have $x_{ij}$ = value for the i-th observation of the j-th independent variable
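33 Goal: Find “best” prediction equation of the form $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \dots + \hat{\beta}_k x_k$
As before: “best” means the equation that minimizes the sum of squared residuals $\sum_i (y_i - \hat{y}_i)^2$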
34 Again: the solution involves calculus -- solving the Normal Equations on page 627
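To make the Normal Equations concrete, the sketch below sets them up in matrix form, (X'X) b = X'y, for two independent variables and solves them by Gaussian elimination. This is an illustration under stated assumptions rather than the textbook's presentation: the data are invented, and `solve_linear_system` and `fit_multiple_regression` are hypothetical helper names.

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                  # back substitution
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def fit_multiple_regression(rows, y):
    """Return [b0, b1, ..., bk] solving the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(row) for row in rows]    # leading 1 = intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: each row is (x1, x2); values invented for illustration.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0), (6.0, 5.0)]
y = [5.1, 5.8, 10.2, 10.9, 15.1, 16.2]

beta = fit_multiple_regression(rows, y)
print("b0, b1, b2 =", beta)
```

In practice statistical software solves these equations with more numerically stable methods; the explicit elimination here is only meant to show what is being solved.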
35 Analysis of Variance
Source        DF       Sum of Squares   Mean Square               F Value
Model         k        SS(Reg.)         MS(Reg.) = SS(Reg.)/k     MS(Reg.)/MSE
Error         n-k-1    SSE              MSE = SSE/(n-k-1)
Corr. Total   n-1      SS(Total)
36 Multiple Regression Setting
H0: there is no linear relationship between Y and the independent variables
H1: there is a linear relationship between Y and the independent variables
Reject H0 if F > $F_{\alpha}(k, n-k-1)$, where F = MS(Reg.) / MSE
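The arithmetic behind the F statistic can be sketched as a small function. The sums of squares passed in below are invented placeholders, not values from any output in these slides; n = 12 and k = 2 are chosen only so the degrees of freedom are easy to check. The function name `anova_table` is hypothetical.

```python
def anova_table(ss_reg, ss_res, n, k):
    """Build the regression ANOVA table for n observations, k predictors."""
    df_model = k
    df_error = n - k - 1
    ms_reg = ss_reg / df_model   # MS(Reg.) = SS(Reg.) / k
    mse = ss_res / df_error      # MSE = SSE / (n - k - 1)
    return {
        "Model": {"df": df_model, "ss": ss_reg, "ms": ms_reg, "F": ms_reg / mse},
        "Error": {"df": df_error, "ss": ss_res, "ms": mse},
        "Corr. Total": {"df": n - 1, "ss": ss_reg + ss_res},
    }


# Hypothetical sums of squares, invented for illustration only.
table = anova_table(ss_reg=30.0, ss_res=4.5, n=12, k=2)
for source, row in table.items():
    print(source, row)
```

The resulting F value would then be compared with the critical value for k and n - k - 1 degrees of freedom, taken from an F table or statistical software.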
37 $R^2$ = SS(Reg.) / SS(Total) measures the proportion of the variability in Y that is explained by the regression
- in the MLR setting it has the same interpretation as before
38 Data – Page 628
Weight loss in a chemical compound as a function of exposure time and humidity
Y = weight loss (wtloss)
X1 = exposure time (exptime)
X2 = relative humidity (humidity)
39 Chemical Weight Loss – MLR Output
The REG Procedure, Dependent Variable: wtloss
Number of Observations Read   12
Number of Observations Used   12
Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model                                                           <.0001
Error
Corrected Total
Root MSE   R-Square   Dependent Mean   Adj R-Sq   Coeff Var
Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept
exptime                                                          <.0001
humidity
40 H0: there is no linear relationship between weight loss and the variables exposure time and humidity
H1: there is a linear relationship between weight loss and the variables exposure time and humidity
41 Examining Contributions of Individual X variables
Use the t-test for the X variable in question.
- This tests the effect of that particular independent variable while all other independent variables are held constant.
Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept
exptime                                                          <.0001
humidity
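In symbols, the t Value column is the estimate divided by its standard error. The notation below (index $j$, the two-sided rejection rule, and the significance level $\alpha$) is an assumption added for illustration; the two-sided form is chosen to match the Pr > |t| column.

```latex
t = \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)},
\qquad
\text{reject } H_0\colon \beta_j = 0 \ \text{ if } \ |t| > t_{\alpha/2}(n - k - 1)
```

Here $n - k - 1$ is the Error degrees of freedom from the Analysis of Variance table.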