Chapter 11: Inferential methods in Regression and Correlation
Example: distribution of y
The relationship between age and change in systolic blood pressure (BP, mm Hg) after 24 hours in response to a particular treatment has a linear regression equation of y = – 0.526x + e with σ =
a) What is the mean value of y when x = 30? x = 50? x = 70?
b) What is the standard deviation of y when x = 30? x = 50? x = 70?
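The idea behind parts a) and b) can be sketched in Python: under the simple linear regression model the mean of y at a fixed x is intercept + slope·x, while the standard deviation of y is σ at every x. The slope –0.526 is from the slide; the intercept and σ below are hypothetical placeholders, because those values are not shown here.

```python
# Mean and standard deviation of y at a fixed x under y = a + b*x + e.
# NOTE: INTERCEPT and SIGMA are hypothetical placeholders, not the slide's values.
SLOPE = -0.526      # from the slide
INTERCEPT = 0.0     # hypothetical placeholder
SIGMA = 1.0         # hypothetical placeholder


def mean_y(x, intercept=INTERCEPT, slope=SLOPE):
    """Mean of y at x: the height of the population regression line."""
    return intercept + slope * x


def sd_y(x, sigma=SIGMA):
    """Standard deviation of y at x: sigma, whatever the value of x."""
    return sigma


for x in (30, 50, 70):
    print(x, round(mean_y(x), 3), sd_y(x))
```

The mean changes by 20 × slope between successive x values, while the standard deviation stays the same.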
Example: Estimating the intercept and slope
The cetane number is a critical property in specifying the ignition quality of a fuel used in a diesel engine. Determining this number for a biodiesel fuel is expensive and time-consuming, so a way of predicting it is wanted. The data on the next slide are x = iodine value (g) and y = cetane number for a sample of 14 biofuels. The iodine value is the amount of iodine necessary to saturate a 100 g sample of oil.
a) What are the point estimates of the intercept and slope?
b) What is a point estimate of the true average cetane number for biofuels whose iodine value is 100?
Example: Estimating the intercept and slope (cont)
[Data table: x = iodine value (g), y = cetane number; n = 14 biofuels]
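A minimal Python sketch of how parts a) and b) are computed by hand: slope b = Sxy/Sxx, intercept a = ȳ – b·x̄, and the estimated mean at a chosen x* is a + b·x*. The (x, y) pairs and the value x* = 3.5 are made up for illustration; they are not the cetane data.

```python
# Least-squares point estimates for simple linear regression.
# The data below are made-up illustrative values, NOT the cetane data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx          # slope estimate
    a = y_bar - b * x_bar    # intercept estimate
    return a, b


a, b = fit_line(x, y)
x_star = 3.5                 # hypothetical x value of interest
print("intercept:", round(a, 4), "slope:", round(b, 4))
print("estimated mean of y at x* =", x_star, ":", round(a + b * x_star, 4))
```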
Example: Estimating the intercept and slope (cont)
For the same cetane number data:
c) Find the point estimate of the error standard deviation, σ.
d) What proportion of the observed variation in y can be attributed to the simple linear regression relationship between x and y?
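Parts c) and d) use the residual sum of squares: s_e = sqrt(SSResid/(n – 2)) and r² = 1 – SSResid/Syy, with SSResid = Syy – b·Sxy. A short Python sketch, again on made-up data rather than the cetane data:

```python
from math import sqrt

# Made-up illustrative data, NOT the cetane data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]


def error_sd_and_r2(x, y):
    """Return (s_e, r_squared) for the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares
    b = s_xy / s_xx
    ss_resid = s_yy - b * s_xy                  # residual (error) sum of squares
    s_e = sqrt(ss_resid / (n - 2))              # estimate of sigma
    r_squared = 1 - ss_resid / s_yy             # proportion of variation explained
    return s_e, r_squared


s_e, r_squared = error_sd_and_r2(x, y)
print("s_e =", round(s_e, 4), "r^2 =", round(r_squared, 4))
```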
Example: Estimating the intercept and slope (SAS)
The REG Procedure
Model: MODEL1
Dependent Variable: cetane

Number of Observations Read                  15
Number of Observations Used                  14
Number of Observations with Missing Values    1

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                                                                <.0001
Error
Corrected Total

Root MSE            R-Square
Dependent Mean      Adj R-Sq
Coeff Var

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept                                                             <.0001
iodine                                                                <.0001
Example: CI
For the same cetane number data:
e) What is the 95% CI for the true slope?
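The interval in part e) has the form b ± t*·s_b, where s_b = s_e/sqrt(Sxx) and t* is the t critical value with n – 2 degrees of freedom. The Python sketch below uses made-up data with n = 5 (so df = 3 and t* = 3.182 for 95%); for the 14 biofuels the degrees of freedom would be 12, where the 95% critical value is 2.179. The function name is hypothetical.

```python
from math import sqrt

# Made-up illustrative data, NOT the cetane data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]


def slope_ci(x, y, t_crit):
    """Return (b, s_b, lower, upper) for the slope, given a t critical value."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    b = s_xy / s_xx
    s_e = sqrt((s_yy - b * s_xy) / (n - 2))   # estimate of sigma
    s_b = s_e / sqrt(s_xx)                    # standard error of the slope
    return b, s_b, b - t_crit * s_b, b + t_crit * s_b


# n = 5 here, so df = 3 and the two-sided 95% t critical value is 3.182.
b, s_b, lower, upper = slope_ci(x, y, t_crit=3.182)
print(f"b = {b:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```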
Example: Output (SAS)
The SAS System    09:20 Thursday, November 10,
The REG Procedure
Model: MODEL1
Dependent Variable: cetane

Number of Observations Read                  15
Number of Observations Used                  14
Number of Observations with Missing Values    1

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                                                                <.0001
Error
Corrected Total

Root MSE            R-Square
Dependent Mean      Adj R-Sq
Coeff Var

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    95% Confidence Limits
Intercept                                                             <
iodine                                                                <

Sxx =
Example: Hypothesis test
For the same cetane number data:
f) Is the model useful (that is, is there a useful linear relationship between x and y)?
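The model utility test in part f) tests H0: slope = 0 with the statistic t = b/s_b on n – 2 degrees of freedom. A minimal Python sketch on made-up data (not the cetane data), using a table value for the critical point rather than a computed P-value:

```python
from math import sqrt

# Made-up illustrative data, NOT the cetane data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]


def slope_t_statistic(x, y):
    """t statistic for H0: slope = 0 in simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    b = s_xy / s_xx
    s_e = sqrt((s_yy - b * s_xy) / (n - 2))
    s_b = s_e / sqrt(s_xx)
    return b / s_b


t = slope_t_statistic(x, y)
t_crit = 3.182   # two-sided 5% critical value for df = 3 (n = 5 here)
print("t =", round(t, 2), "| reject H0:", abs(t) > t_crit)
```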
Summary Slide
Source                df       SS               MS
Model (Regression)    1        SSR
Error                 n - 2    SST – b Sxy
Total                 n - 1    Syy
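The table can be filled in from Sxx, Sxy and Syy alone. The Python sketch below takes SST = Syy, the error sum of squares as Syy – b·Sxy, and each mean square as SS/df, then forms F = MSR/MSE. The data are made up for illustration, and the dictionary layout is just one convenient choice.

```python
# Build the simple-linear-regression ANOVA table from sums of squares.
# Made-up illustrative data, NOT the cetane data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]


def anova_table(x, y):
    """Return df, SS and MS for model and error, the total, and F."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    b = s_xy / s_xx
    ss_total = s_yy
    ss_error = s_yy - b * s_xy
    ss_model = ss_total - ss_error          # equals b * Sxy
    ms_model = ss_model / 1
    ms_error = ss_error / (n - 2)
    return {
        "model": (1, ss_model, ms_model),
        "error": (n - 2, ss_error, ms_error),
        "total": (n - 1, ss_total),
        "F": ms_model / ms_error,
    }


table = anova_table(x, y)
for source in ("model", "error", "total"):
    print(source, table[source])
print("F =", round(table["F"], 2))
```

In simple linear regression this F equals the square of the slope t statistic, so the two tests agree.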
Example: Hypothesis test for ρ
For the same cetane number data:
g) Is the model useful (that is, is there a useful linear relationship between x and y) using the population correlation coefficient?
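For part g), the test of H0: ρ = 0 uses t = r·sqrt(n – 2)/sqrt(1 – r²) on n – 2 degrees of freedom, where r = Sxy/sqrt(Sxx·Syy). A short Python sketch on made-up data (not the cetane data):

```python
from math import sqrt

# Made-up illustrative data, NOT the cetane data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]


def correlation_test(x, y):
    """Return (r, t) for testing H0: rho = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    r = s_xy / sqrt(s_xx * s_yy)               # sample correlation coefficient
    t = r * sqrt(n - 2) / sqrt(1 - r * r)      # test statistic, df = n - 2
    return r, t


r, t = correlation_test(x, y)
print("r =", round(r, 4), "t =", round(t, 2))
```

This t value is the same as the slope t statistic b/s_b, so the correlation test and the model utility test reach the same conclusion.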
Example: Hypothesis test for ρ (2)
In some locations, there is a strong association between the concentrations of two different pollutants. The following data consist of the concentrations of x = ozone (ppm) and y = secondary carbon (μg/m³).
[Data table: x = ozone (ppm), y = secondary carbon concentration (μg/m³)]
Using the population correlation coefficient, is this model useful? The summary statistics are:
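When only summary statistics are given, r can be computed from the sums with the shortcut formulas Sxx = Σx² – (Σx)²/n, Syy = Σy² – (Σy)²/n and Sxy = Σxy – (Σx)(Σy)/n. The Python sketch below uses hypothetical sums, not the ozone data.

```python
from math import sqrt

# Hypothetical summary statistics, NOT the ozone / secondary carbon data.
n = 5
sum_x, sum_y = 15.0, 30.0
sum_x2, sum_y2, sum_xy = 55.0, 218.9, 109.7

# Shortcut formulas for the corrected sums of squares and cross products.
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

r = s_xy / sqrt(s_xx * s_yy)                # sample correlation coefficient
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)      # statistic for H0: rho = 0, df = n - 2
print("r =", round(r, 4), "t =", round(t, 2))
```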
Example: Hypothesis test for ρ (2)
Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model
Error
Corrected Total

Root MSE            R-Square
Dependent Mean      Adj R-Sq
Coeff Var

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept
x
Multiple Linear Regression
Example: Multiple Linear Regression
In an industrial setting, it is important to know how long a tool will last (min). The cutting tool in this study is used to cut a particular type and size of cold-rolled steel. The predictors of interest are x1 = cutting speed (feet/min), x2 = feed rate (in/revolution) and x3 = depth of cut (in). The predicted model is y = – x 1 – x x 3 + e
a) What is the mean life of a tool that is being used to cut depths of 0.03 inch at a cutting speed of 450 feet/min with a feed rate of 0.01 in/revolution?
b) What is the interpretation of β1 = ? Of β2 = ? Of β3 = ?
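A Python sketch of both parts, under loudly stated assumptions: every coefficient below is a hypothetical placeholder (the slide's values are not reproduced here). It evaluates the mean life at the stated settings and shows that raising one predictor by one unit, with the others held fixed, changes the mean by that predictor's coefficient.

```python
# Mean response and coefficient interpretation in multiple linear regression.
# ALL coefficients are hypothetical placeholders, NOT the fitted values from the slide.
ALPHA = 100.0
BETA = {"speed": -0.1, "feed": -500.0, "depth": 10.0}


def mean_life(speed, feed, depth):
    """Mean tool life (min) at the given cutting speed, feed rate and depth of cut."""
    return ALPHA + BETA["speed"] * speed + BETA["feed"] * feed + BETA["depth"] * depth


base = mean_life(speed=450, feed=0.01, depth=0.03)
faster = mean_life(speed=451, feed=0.01, depth=0.03)

print("mean life at the stated settings:", round(base, 3))
# One more foot/min of cutting speed, feed and depth unchanged:
print("change in mean life:", round(faster - base, 3))
```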
Example: Polynomial Regression
Suppose the mean daily peak load (MW) for a power plant and the maximum outdoor temperature (°F) for a sample of 10 days are given below.
a) What is the estimated regression equation using a quadratic regression model? Besides the equation, include the values of adjusted r² and s_e.
b) Using the fitted equation, predict the required peak power if the temperature is 98 °F.
[Data table: x_i = temperature (°F), y_i = peak load (MW); 10 days]
Example: Polynomial Regression (SAS)
/* add the squared temperature as a second predictor */
data newpower;
  set power;
  temp2 = temp*temp;
run;
/* fit load on temp and temp2, and save the residuals */
proc reg data=newpower;
  model load=temp temp2;
  output out=fit r=res;
run;

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                                                                <.0001
Error
Corrected Total

Root MSE            R-Square
Dependent Mean      Adj R-Sq
Coeff Var

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept
temp
temp2
Example: Polynomial Regression (cont)
b) Using the fitted equation, predict the required peak power if the temperature is 98 °F.

             line                       adj r²    s_e
linear       ŷ = temp
quadratic    ŷ = temp + 0.27 temp²
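A quadratic model is still linear in its coefficients, so it can be fitted by least squares with temp and temp² as two predictors. The Python sketch below solves the normal equations directly. The temperatures and loads are made up (they follow an exact quadratic so the fit is easy to check), the temperature is centered at 90 °F to keep the arithmetic well behaved, and the helper names are hypothetical. None of this reproduces the power-plant data.

```python
# Quadratic least squares via the normal equations (pure Python sketch).
# Made-up data, NOT the power-plant data: load = 100 + 3u + 0.2u^2 with u = temp - 90.
temps = [80.0, 85.0, 90.0, 95.0, 100.0]
loads = [90.0, 90.0, 100.0, 120.0, 150.0]
CENTER = 90.0   # centering value, an assumption made for numerical convenience


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def fit_quadratic(u, y):
    """Least-squares coefficients (b0, b1, b2) of y = b0 + b1*u + b2*u^2."""
    cols = [[1.0] * len(u), list(u), [ui * ui for ui in u]]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


b0, b1, b2 = fit_quadratic([t - CENTER for t in temps], loads)


def predict(temp):
    """Predicted load at a given temperature, using the centered quadratic fit."""
    u = temp - CENTER
    return b0 + b1 * u + b2 * u * u


print("coefficients:", round(b0, 4), round(b1, 4), round(b2, 4))
print("predicted load at 98 F:", round(predict(98.0), 3))
```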
Residual Plots
Interaction Effect
Example: Multiple Regression Qualitative Predictors
A study is conducted to determine the effects of x1 = company size and x2 = the presence (1) or absence (0) of a safety program on y = the number of work hours lost due to work-related accidents (thousands). Twenty companies with no active safety program and 20 companies with an active safety program were randomly chosen. The SAS file (qualpred.txt) is on the class notes web site. The estimated regression line is ŷ = x1 – x2
What are the interpretations of β1 = and β2 = ?
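With a 0/1 indicator, the coefficient of x2 is the difference in predicted hours lost between companies with and without a safety program that have the same size. A Python sketch with hypothetical coefficients (not the fitted values from qualpred.txt):

```python
# Interpreting a 0/1 (qualitative) predictor.
# B0, B1, B2 are hypothetical placeholders, NOT the fitted values from the slide.
B0, B1, B2 = 1.0, 0.02, -3.0


def predicted_hours_lost(size, has_program):
    """Predicted work hours lost (thousands); has_program is 1 or 0."""
    return B0 + B1 * size + B2 * has_program


for size in (200, 500, 1000):
    gap = predicted_hours_lost(size, 1) - predicted_hours_lost(size, 0)
    # The gap is the same at every company size and equals B2.
    print(size, round(gap, 6))
```

The two groups therefore share the slope B1 and differ only in intercept, by B2.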
Conceptual Understanding
[Figure: X1, X2, X3 and the Total Variation of Y]
ANOVA table - MRR
Source                df           SS                 MS
Model (Regression)    k            SSM (from data)
Error                 n – k - 1    SSE (from data)
Total                 n - 1        SST (from data)
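Given the sums of squares, the rest of the table follows: each mean square is SS/df, F = MSM/MSE, R² = SSM/SST, and adjusted R² = 1 – MSE/(SST/(n – 1)). A Python sketch with hypothetical inputs (the function name and the numbers are illustrative only):

```python
def mlr_anova_summary(ssm, sse, n, k):
    """Mean squares, F, R^2 and adjusted R^2 for a regression with k predictors."""
    sst = ssm + sse
    msm = ssm / k                 # model mean square, df = k
    mse = sse / (n - k - 1)       # error mean square, df = n - k - 1
    return {
        "MSM": msm,
        "MSE": mse,
        "F": msm / mse,
        "R2": ssm / sst,
        "adj_R2": 1 - mse / (sst / (n - 1)),
    }


# Hypothetical values: SSM = 300, SSE = 100, n = 24 observations, k = 3 predictors.
summary = mlr_anova_summary(ssm=300.0, sse=100.0, n=24, k=3)
for name, value in summary.items():
    print(name, round(value, 4))
```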
Example: Multiple Linear Regression (cont)
For the cutting tool data (x1 = cutting speed, x2 = feed rate, x3 = depth of cut):
a) Is there a useful linear relationship between the cutting tool lifetime and the predictors?
Example: MLR (cont)
Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                                                                <.0001
Error
Corrected Total

Root MSE            R-Square
Dependent Mean      Adj R-Sq
Coeff Var

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept                                                             <.0001
speed                                                                 <.0001
feed
depth
Example: MLR (backwards elimination)
Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                                                                <.0001
Error
Corrected Total

Root MSE            R-Square
Dependent Mean      Adj R-Sq
Coeff Var

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept                                                             <.0001
speed                                                                 <.0001
feed
depth
Example: MLR (backwards elimination) (cont)
Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model                                                                <.0001
Error
Corrected Total

Root MSE            R-Square
Dependent Mean      Adj R-Sq
Coeff Var

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept                                                             <.0001
speed                                                                 <.0001
depth
Example: MLR (backwards elimination) (cont)
                 full                           w/o feed
line             y = – speed feed – depth       y = – speed – depth
R²
adjusted R²
ANOVA table      Model                          Model
                 Error                          Error
                 Total                          Total
Example: MLR (backwards elimination) (cont)
                  full      full - P    w/o    w/o - P
F-test            20.93     <                  <
t-test: speed     -6.72     <                  <
t-test: feed
t-test: depth
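One step of backward elimination can be sketched as follows: look at the t-test P-values in the current model, and if the largest one exceeds the chosen significance level, drop that predictor and refit. The Python sketch below assumes a 0.05 cutoff and uses hypothetical P-values, not those from the SAS output; the function name is also hypothetical.

```python
def next_to_drop(p_values, alpha=0.05):
    """Return the predictor with the largest P-value if it exceeds alpha, else None."""
    name, p = max(p_values.items(), key=lambda item: item[1])
    return name if p > alpha else None


# Hypothetical P-values for the full model (NOT taken from the SAS output).
full_model = {"speed": 0.0001, "feed": 0.40, "depth": 0.03}
print(next_to_drop(full_model))

# Hypothetical P-values after refitting without the dropped predictor.
reduced_model = {"speed": 0.0001, "depth": 0.03}
print(next_to_drop(reduced_model))
```

The procedure stops when the function returns None, that is, when every remaining predictor is significant at the chosen level.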