1 Slide Simple Linear Regression, Part A: Simple Linear Regression Model; Least Squares Method; Coefficient of Determination; Model Assumptions; Testing for Significance
2 Slide Simple Linear Regression Model. The equation that describes how y is related to x and an error term is called the regression model. The simple linear regression model is: $y = \beta_0 + \beta_1 x + \epsilon$, where y is the dependent variable, x is the independent variable, $\beta_0$ and $\beta_1$ are called parameters of the model, and $\epsilon$ is a random variable called the error term.
3 Slide Simple Linear Regression Equation. The simple linear regression equation is: $E(y) = \beta_0 + \beta_1 x$. The graph of the regression equation is a straight line. $\beta_0$ is the y-intercept of the regression line. $\beta_1$ is the slope of the regression line. $E(y)$ is the expected value of y for a given x value.
4 Slide Simple Linear Regression Equation. Positive Linear Relationship. [Figure: regression line of $E(y)$ against x with intercept $\beta_0$; slope $\beta_1$ is positive.]
5 Slide Simple Linear Regression Equation. Negative Linear Relationship. [Figure: regression line of $E(y)$ against x with intercept $\beta_0$; slope $\beta_1$ is negative.]
6 Slide Simple Linear Regression Equation. No Relationship. [Figure: regression line of $E(y)$ against x with intercept $\beta_0$; slope $\beta_1$ is 0.]
7 Slide Estimated Simple Linear Regression Equation. The estimated simple linear regression equation is: $\hat{y} = b_0 + b_1 x$. The graph is called the estimated regression line. $b_0$ is the y-intercept of the line. $b_1$ is the slope of the line. $\hat{y}$ is the estimated value of y for a given x value.
8 Slide Estimation Process. [Diagram: the regression model $y = \beta_0 + \beta_1 x + \epsilon$ and the regression equation $E(y) = \beta_0 + \beta_1 x$ have unknown parameters $\beta_0$, $\beta_1$. Sample data $(x_1, y_1), \ldots, (x_n, y_n)$ yield the sample statistics $b_0$, $b_1$ and the estimated regression equation $\hat{y} = b_0 + b_1 x$; $b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$.]
9 Slide Least Squares Method. Least Squares Criterion: $\min \sum (y_i - \hat{y}_i)^2$, where: $y_i$ = observed value of the dependent variable for the ith observation; $\hat{y}_i$ = estimated value of the dependent variable for the ith observation.
10 Slide Least Squares Method. Slope for the Estimated Regression Equation: $b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
11 Slide Least Squares Method. y-Intercept for the Estimated Regression Equation: $b_0 = \bar{y} - b_1 \bar{x}$, where: $x_i$ = value of the independent variable for the ith observation; $y_i$ = value of the dependent variable for the ith observation; $\bar{x}$ = mean value for the independent variable; $\bar{y}$ = mean value for the dependent variable; n = total number of observations.
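The slope and intercept formulas can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name `least_squares` and the toy data are hypothetical and were chosen so that the points lie exactly on y = 1 + 2x.

```python
def least_squares(x, y):
    """Return (b0, b1) for the estimated regression equation y_hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n  # mean of the independent variable
    y_bar = sum(y) / n  # mean of the dependent variable
    # Numerator and denominator of the slope formula
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx             # slope
    b0 = y_bar - b1 * x_bar    # y-intercept
    return b0, b1


# Hypothetical toy data lying exactly on y = 1 + 2x
b0, b1 = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```

The intercept is computed after the slope because $b_0$ depends on $b_1$.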
12 Slide Simple Linear Regression. Example: Reed Auto Sales. Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown on the next slide.
13 Slide Simple Linear Regression. Example: Reed Auto Sales. [Table columns: Number of TV Ads; Number of Cars Sold.]
14 Slide Estimated Regression Equation. Slope for the Estimated Regression Equation: $b_1 = 5$. y-Intercept for the Estimated Regression Equation: $b_0 = 10$. Estimated Regression Equation: $\hat{y} = 10 + 5x$.
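As a rough check of this slide, the sketch below applies the least squares formulas to the Reed Auto sample. The five data pairs are an assumption: the data table does not survive in this transcript, and the values used are the ones commonly published for this textbook example. They are consistent with the slope of 5 and the sums of squares quoted on later slides.

```python
# ASSUMPTION: the data table is missing from this transcript; these five
# (TV ads, cars sold) pairs are assumed from the published textbook example.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]

n = len(tv_ads)
x_bar = sum(tv_ads) / n      # mean number of TV ads
y_bar = sum(cars_sold) / n   # mean number of cars sold

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold))
sxx = sum((x - x_bar) ** 2 for x in tv_ads)

b1 = sxy / sxx               # slope
b0 = y_bar - b1 * x_bar      # y-intercept
print(f"y_hat = {b0:g} + {b1:g}x")  # y_hat = 10 + 5x
```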
15 Slide Estimated Regression Equation. Excel Worksheet (showing data). Download Ch12-CarSales.xlsx
16 Slide Estimated Regression Equation. Producing a Scatter Diagram and Trend Line. Step 1 Select cells B2:C6. Step 2 Click the Insert tab on the Ribbon. Step 3 In the Charts group, click Scatter. Step 4 When the list of scatter diagram subtypes appears, click Scatter with only Markers. Step 5 In the Chart Layouts group, click Layout 1. Step 6 Right-click on the Chart Title to display options; choose Delete. Step 7 Select the Horizontal (Value) Axis Title and replace it with TV Ads.
17 Slide Estimated Regression Equation. Producing a Scatter Diagram and Trend Line. Step 8 Select the Vertical (Value) Axis Title and replace it with Cars Sold. Step 9 Right-click on the Series 1 Legend Entry to display a list of options; choose Delete. Step 10 Position the mouse pointer over any Vertical (Value) Axis Major Gridline in the diagram and right-click to display a list of options; choose Delete.
18 Slide Estimated Regression Equation. Producing a Scatter Diagram and Trend Line. Step 11 Position the mouse pointer over any data point in the diagram and right-click to display a list of options; choose Add Trendline. Step 12 When the Format Trendline dialog box appears, select Trendline Options, then choose Linear from the Trend/Regression Type list, choose Display Equation on chart, and click Close.
19 Slide Scatter Diagram and Trend Line
20 Slide Coefficient of Determination. Relationship Among SST, SSR, SSE: $SST = SSR + SSE$, that is, $\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$, where: SST = total sum of squares; SSR = sum of squares due to regression; SSE = sum of squares due to error.
21 Slide Coefficient of Determination. The coefficient of determination is: $r^2 = SSR/SST$, where: SSR = sum of squares due to regression; SST = total sum of squares.
22 Slide Coefficient of Determination. $r^2 = SSR/SST = 100/114 = .8772$. The regression relationship is very strong; 88% of the variability in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
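The three sums of squares and $r^2$ can be reproduced with a short Python sketch. The data pairs are assumed (the table is not preserved in this transcript), and the coefficients $b_0 = 10$ and $b_1 = 5$ are taken as given; with these inputs the script reproduces SSR = 100 and SST = 114.

```python
# ASSUMPTION: data pairs assumed from the published textbook example;
# they are not present in this transcript.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0  # estimated regression equation y_hat = 10 + 5x

y_bar = sum(cars_sold) / len(cars_sold)
y_hat = [b0 + b1 * x for x in tv_ads]

sst = sum((y - y_bar) ** 2 for y in cars_sold)               # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)                 # due to regression
sse = sum((y - yh) ** 2 for y, yh in zip(cars_sold, y_hat))  # due to error

r_squared = ssr / sst
print(sst, ssr, sse, round(r_squared, 4))  # 114.0 100.0 14.0 0.8772
```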
23 Slide Coefficient of Determination. Using Excel to Produce $r^2$. Step 1 Position the mouse pointer over any data point in the scatter diagram and right-click to display a list of options. Step 2 Choose Add Trendline. Step 3 When the Format Trendline dialog box appears, select Trendline Options, then choose Display R-squared value on chart and click Close.
24 Slide Coefficient of Determination. Excel Value Worksheet (showing $r^2$)
25 Slide Sample Correlation Coefficient. $r_{xy} = (\text{sign of } b_1)\sqrt{r^2}$, where: $b_1$ = the slope of the estimated regression equation $\hat{y} = b_0 + b_1 x$. Excel approach: $r_{xy}$ = CORREL(x range, y range)
26 Slide Sample Correlation Coefficient. The sign of $b_1$ in the equation $\hat{y} = 10 + 5x$ is “+”. $r_{xy} = +\sqrt{.8772} = +.9366$
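A small sketch can confirm that the sign-and-square-root shortcut agrees with the direct definition of the sample correlation coefficient. The helper name `sample_correlation` is hypothetical, and the data pairs are assumed because the table is not preserved in this transcript.

```python
import math


def sample_correlation(x, y):
    """Sample correlation computed directly from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# ASSUMPTION: data pairs assumed from the published textbook example.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]

r_direct = sample_correlation(tv_ads, cars_sold)
# Shortcut: (sign of b1) * sqrt(r^2), with b1 = 5 and r^2 = SSR/SST = 100/114
r_shortcut = math.copysign(math.sqrt(100 / 114), 5)
print(round(r_direct, 4), round(r_shortcut, 4))  # 0.9366 0.9366
```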
27 Slide Using Excel’s Regression Tool. Up to this point, you have seen how Excel can be used for various parts of a regression analysis. Excel also has a comprehensive tool in its Data Analysis package called Regression. The Regression tool can be used to perform a complete regression analysis.
28 Slide Using Excel’s Regression Tool. Excel Worksheet (showing data)
29 Slide Using Excel’s Regression Tool. Performing the Regression Analysis. Step 1 Click the Data tab on the Ribbon. Step 2 In the Analysis group, click Data Analysis. Step 3 Choose Regression from the list of Analysis Tools.
30 Slide Using Excel’s Regression Tool. Excel Regression Dialog Box
31 Slide Using Excel’s Regression Tool. Excel Value Worksheet. [Figure callouts: Data; Regression Statistics Output; ANOVA Output; Estimated Regression Equation Output.]
32 Slide Using Excel’s Regression Tool. Excel Value Worksheet (bottom-left portion), showing the Estimated Regression Equation. Note: Columns F-I are not shown.
33 Slide Using Excel’s Regression Tool. Excel Value Worksheet (bottom-right portion). Note: Columns C-E are hidden.
34 Slide Using Excel’s Regression Tool. Excel Value Worksheet (middle portion)
35 Slide Using Excel’s Regression Tool. Excel Value Worksheet (top portion)
36 Slide Assumptions About the Error Term $\epsilon$. 1. The error $\epsilon$ is a random variable with mean of zero. 2. The variance of $\epsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable. 3. The values of $\epsilon$ are independent. 4. The error $\epsilon$ is a normally distributed random variable.
37 Slide Testing for Significance. To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero. Two tests are commonly used: the t test (individual significance test) and the F test (overall significance test). Both the t test and F test require an estimate of $\sigma^2$, the variance of $\epsilon$ in the regression model.
38 Slide Testing for Significance. An Estimate of $\sigma^2$. The mean square error (MSE) provides the estimate of $\sigma^2$, and the notation $s^2$ is also used: $s^2 = MSE = SSE/(n - 2)$, where: $SSE = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2$.
39 Slide Testing for Significance. An Estimate of $\sigma$. To estimate $\sigma$ we take the square root of $s^2$. The resulting s is called the standard error of the estimate: $s = \sqrt{MSE} = \sqrt{SSE/(n - 2)}$.
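For the Reed Auto example these two estimates follow directly from the sums of squares quoted earlier (SST = 114, SSR = 100, n = 5). A minimal sketch of the arithmetic:

```python
import math

sst, ssr = 114, 100  # sums of squares quoted on the r^2 slide
n = 5                # sample of 5 previous sales

sse = sst - ssr      # SST = SSR + SSE, so SSE = 14
mse = sse / (n - 2)  # s^2, the estimate of sigma^2
s = math.sqrt(mse)   # standard error of the estimate

print(round(mse, 3), round(s, 3))  # 4.667 2.16
```

The value 4.667 matches the MSE used in the F test slides.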
40 Slide Testing for Significance: F Test. Hypotheses: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$. Test Statistic: $F = MSR/MSE$
41 Slide Testing for Significance: F Test. Rejection Rule. p-value approach: Reject $H_0$ if p-value $\le \alpha$. Critical value approach: Reject $H_0$ if $F \ge F_\alpha$, where: $F_\alpha$ is based on an F distribution with 1 degree of freedom in the numerator and n - 2 degrees of freedom in the denominator.
42 Slide Testing for Significance: F Test (p-value and critical value approach). 1. Determine the hypotheses: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$. 2. Specify the level of significance: $\alpha = .05$. 3. Compute the value of the test statistic: $F = MSR/MSE = 100/4.667 = 21.43$. Use Ch12-CarSales.xlsx. [Worksheet callouts: test statistic; p-value.]
43 Slide Testing for Significance: F Test (p-value approach). 4. Compute the p-value. The p-value is the area to the right of F = 21.43 under an F distribution with 1 numerator degree of freedom and n - 2 = 5 - 2 = 3 denominator degrees of freedom: 0.01 < p-value < 0.025.
44 Slide Testing for Significance: F Test (p-value approach). 5. Determine whether to reject $H_0$. The p-value corresponding to F = 21.43 is less than .05. Hence, we reject $H_0$. The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.
45 Slide Testing for Significance: F Test (critical value approach). 4. Compute the critical value. [Worksheet callouts: alpha level; critical value.]
46 Slide Testing for Significance: F Test (critical value approach). 5. Determine whether to reject $H_0$. The critical value corresponding to the level of significance 0.05 is $F_{.05} = 10.13$. The test statistic F = 21.43 is greater than the critical value. Hence, we reject $H_0$. The statistical evidence is sufficient to conclude that we have a significant relationship between the number of TV ads aired and the number of cars sold.
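Both F test approaches can be sketched with the Python standard library. The sketch relies on two facts that are not on the slides: an F variable with 1 and 3 degrees of freedom is the square of a t variable with 3 degrees of freedom, and the t distribution with 3 degrees of freedom has a closed-form CDF. The helper `t3_upper_tail` is hypothetical and valid only for 3 degrees of freedom; the critical value 10.13 is a tabulated value assumed here.

```python
import math


def t3_upper_tail(t):
    """Upper-tail area of a t distribution with exactly 3 degrees of freedom."""
    u = t / math.sqrt(3)
    cdf = 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi
    return 1 - cdf


ssr, sse, n = 100, 14, 5
msr = ssr / 1           # 1 numerator degree of freedom
mse = sse / (n - 2)     # n - 2 = 3 denominator degrees of freedom
f_stat = msr / mse

# P(F(1,3) > f) equals P(|T(3)| > sqrt(f)), a two-tailed t probability
p_value = 2 * t3_upper_tail(math.sqrt(f_stat))

alpha = 0.05
f_crit = 10.13          # ASSUMPTION: tabulated F_.05 with (1, 3) df

print(round(f_stat, 2))                # 21.43
print("p-value approach:", p_value <= alpha)        # True, reject H0
print("critical value approach:", f_stat >= f_crit) # True, reject H0
```

For other sample sizes the closed-form helper does not apply, and a statistics library or an F table would be needed instead.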
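47 Slide Testing for Significance: t Test. Hypotheses: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$. Test Statistic: $t = b_1 / s_{b_1}$, where $s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$
48 Slide Testing for Significance: t Test. Rejection Rule. p-value approach: Reject $H_0$ if p-value $\le \alpha$. Critical value approach: Reject $H_0$ if $t \le -t_{\alpha/2}$ or $t \ge t_{\alpha/2}$, where: $t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom.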
49 Slide Testing for Significance: t Test (p-value and critical value approach). 1. Determine the hypotheses: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$. 2. Specify the level of significance: $\alpha = .05$. 3. Compute the test statistic: $t = b_1 / s_{b_1} = 5/1.08 = 4.63$. Use Ch12-CarSales.xlsx. [Worksheet callouts: test statistic; p-value.]
50 Slide Testing for Significance: t Test (p-value approach). 4. Compute the p-value. The p-value is the area in both tails beyond the test statistic t, based on n - 2 = 3 degrees of freedom. For t = 4.63 the upper-tail area is between .005 and .01; because the p-value covers both tails, the range is doubled: 0.01 < p-value < 0.02.
51 Slide Testing for Significance: t Test (p-value approach). 5. Determine whether to reject $H_0$. The p-value is less than the alpha level. We can reject $H_0$. The TV Ads independent variable is a significant factor at the 0.05 level.
52 Slide Testing for Significance: t Test (critical value approach). 4. Compute the critical value and identify the rejection rule. Critical values for the two-tailed test are $-t_{\alpha/2}$ and $t_{\alpha/2}$. At the .05 level, they are $-t_{.025}$ and $t_{.025}$ (with 3 degrees of freedom). Rejection Rule: Reject $H_0$ if $t \le -t_{.025}$ or $t \ge t_{.025}$. Excel approach: $t_{.025}$ = TINV(0.025*2,3) = 3.182 and $-t_{.025}$ = -TINV(0.025*2,3) = -3.182.
53 Slide Testing for Significance: t Test (critical value approach). 5. Determine whether to reject $H_0$. $t = 4.63 > t_{.025} = 3.182$. We can reject $H_0$. The TV Ads independent variable is a significant factor at the 0.05 level.
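The t statistic and the critical value comparison can be sketched as follows. The TV ads values are an assumption (the data table is not preserved in this transcript); the slope 5, SSE = 14, and the critical value 3.182 come from the slides.

```python
import math

# ASSUMPTION: x values assumed from the published textbook example.
tv_ads = [1, 3, 2, 1, 3]
n = len(tv_ads)
x_bar = sum(tv_ads) / n
sxx = sum((x - x_bar) ** 2 for x in tv_ads)  # sum of squared x deviations

b1 = 5.0     # slope of the estimated regression equation
sse = 14.0   # SST - SSR = 114 - 100
s = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b1 = s / math.sqrt(sxx)      # estimated standard deviation of b1
t_stat = b1 / s_b1

t_crit = 3.182  # t_.025 with 3 degrees of freedom, as given on the slide
reject_h0 = t_stat <= -t_crit or t_stat >= t_crit

print(round(s_b1, 2), round(t_stat, 2), reject_h0)  # 1.08 4.63 True
```

With a single independent variable the two tests agree: the square of the t statistic equals the F statistic (about 21.43).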
54 Slide Confidence Interval for $\beta_1$. We can use a 95% confidence interval for $\beta_1$ to test the hypotheses just used in the t test. $H_0$ is rejected if 0 is not included in the confidence interval for $\beta_1$.
55 Slide Confidence Interval for $\beta_1$. The form of a confidence interval for $\beta_1$ is: $b_1 \pm t_{\alpha/2} s_{b_1}$, where $b_1$ is the point estimator, $t_{\alpha/2} s_{b_1}$ is the margin of error, and $t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with n - 2 degrees of freedom.
56 Slide Confidence Interval for $\beta_1$. Rejection Rule: Reject $H_0$ if 0 is not included in the confidence interval for $\beta_1$. 95% Confidence Interval for $\beta_1$: $b_1 \pm t_{\alpha/2} s_{b_1} = 5 \pm 3.182(1.08) = 5 \pm 3.44$, or 1.56 to 8.44, where $t_{.025}$ = TINV(0.025*2,3) = 3.182. Conclusion: 0 is not included in the confidence interval. Reject $H_0$.
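The interval arithmetic is easy to verify in code. This sketch uses only the numbers quoted on the slide (point estimate 5, standard error 1.08, t value 3.182); the variable names are illustrative.

```python
b1 = 5.0       # point estimator of beta_1
s_b1 = 1.08    # estimated standard deviation of b1
t_025 = 3.182  # t value with 3 degrees of freedom, from TINV(0.025*2, 3)

margin = t_025 * s_b1                  # margin of error
lower, upper = b1 - margin, b1 + margin

# H0: beta_1 = 0 is rejected when 0 falls outside the interval
reject_h0 = not (lower <= 0 <= upper)

print(round(margin, 2), round(lower, 2), round(upper, 2), reject_h0)
# 3.44 1.56 8.44 True
```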
57 Slide Confidence Interval for $\beta_1$ at a new level. [Excel dialog callout: a new level can be specified here.]
58 Slide Some Cautions about the Interpretation of Significance Tests. Rejecting $H_0: \beta_1 = 0$ and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y. Just because we are able to reject $H_0: \beta_1 = 0$ and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.
59 Slide Simple Linear Regression, Part B: Using the Estimated Regression Equation for Estimation; Residual Analysis: Validating Model Assumptions; Outliers and Influential Observations
60 Slide Estimation of y. If 3 TV ads are run prior to a sale, what is the estimated number of cars sold? $\hat{y} = 10 + 5(3) = 25$ cars
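The same calculation as a small function, assuming the estimated coefficients $b_0 = 10$ and $b_1 = 5$ implied by this example; the function name is hypothetical.

```python
def estimated_cars_sold(tv_ads, b0=10.0, b1=5.0):
    """Evaluate the estimated regression equation y_hat = b0 + b1 * x."""
    return b0 + b1 * tv_ads


print(estimated_cars_sold(3))  # 25.0
```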
61 Slide Residual Analysis. If the assumptions about the error term $\epsilon$ appear questionable, the hypothesis tests about the significance of the regression relationship and the interval estimation results may not be valid. The residuals provide the best information about $\epsilon$. Residual for Observation i: $y_i - \hat{y}_i$. Much of the residual analysis is based on an examination of graphical plots.
62 Slide Residual Plot Against x. If the assumption that the variance of $\epsilon$ is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points.
63 Slide Residual Plot Against x. [Figure: residuals plotted against x around 0. Good Pattern.]
64 Slide Residual Plot Against x. [Figure: residuals plotted against x around 0. Nonconstant Variance.]
65 Slide Residual Plot Against x. [Figure: residuals plotted against x around 0. Model Form Not Adequate.]
66 Slide Residual Plot Against x. Residuals
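A residual table of the kind shown on this slide can be generated with a few lines of Python. The data pairs are assumed (they are not preserved in this transcript), and the coefficients are those of the estimated regression equation $\hat{y} = 10 + 5x$.

```python
# ASSUMPTION: data pairs assumed from the published textbook example.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0  # y_hat = 10 + 5x

residuals = []
print("x   y   y_hat  residual")
for x, y in zip(tv_ads, cars_sold):
    y_hat = b0 + b1 * x
    residuals.append(y - y_hat)  # residual for this observation: y - y_hat
    print(f"{x}   {y}  {y_hat:5.1f}  {y - y_hat:6.1f}")

# Least squares residuals sum to zero when the model has an intercept
print(sum(residuals))  # 0.0
```

Plotting `residuals` against `tv_ads` gives the residual plot against x.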
67 Slide Residual Plot Against x. Using Excel to Produce a Residual Plot. The steps outlined earlier to obtain the regression output are performed with one change. When the Regression dialog box appears, we must also select the Residual Plot option. The output will include two new items: a plot of the residuals against the independent variable, and a list of predicted values of y and the corresponding residual values.
68 Slide Residual Plot Against x. Excel Value Worksheet (bottom portion)
69 Slide Residual Plot Against x. [Figure: Excel residual plot.]
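70 Slide Standardized Residuals. Standardized Residual for Observation i: $\frac{y_i - \hat{y}_i}{s_{y_i - \hat{y}_i}}$, where: $s_{y_i - \hat{y}_i} = s\sqrt{1 - h_i}$ and $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum (x_i - \bar{x})^2}$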
71 Slide Standardized Residual Plot. The standardized residual plot can provide insight about the assumption that the error term $\epsilon$ has a normal distribution. If this assumption is satisfied, the distribution of the standardized residuals should appear to come from a standard normal probability distribution.
72 Slide Standardized Residual Plot. Excel’s Regression tool can be used to obtain the standardized residuals. The steps described earlier to conduct a regression analysis are performed with one change: when the Regression dialog box appears, we must select the Standardized Residuals option. The Standardized Residuals option does not automatically produce a standardized residual plot.
73 Slide Standardized Residual Plot. Excel Value Worksheet
74 Slide Standardized Residual Plot. Excel’s Chart Wizard can be used to construct the standardized residual plot. A scatter diagram is developed in which the values of the independent variable are placed on the horizontal axis and the values of the standardized residuals are placed on the vertical axis.
75 Slide Standardized Residual Plot. Excel Standardized Residual Plot
76 Slide Standardized Residual Plot. All of the standardized residuals are between -1.5 and +1.5, indicating that there is no reason to question the assumption that $\epsilon$ has a normal distribution.
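The check described above can be sketched in Python. Two assumptions apply: the data pairs are not preserved in this transcript and are assumed from the published example, and the sketch uses the leverage-based definition, in which each residual is divided by $s\sqrt{1 - h_i}$ with $h_i = 1/n + (x_i - \bar{x})^2 / \sum (x_i - \bar{x})^2$. A spreadsheet that scales residuals differently may report slightly different values.

```python
import math

# ASSUMPTION: data pairs assumed from the published textbook example.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10.0, 5.0  # y_hat = 10 + 5x

n = len(tv_ads)
x_bar = sum(tv_ads) / n
sxx = sum((x - x_bar) ** 2 for x in tv_ads)

residuals = [y - (b0 + b1 * x) for x, y in zip(tv_ads, cars_sold)]
s = math.sqrt(sum(e * e for e in residuals) / (n - 2))  # standard error

std_residuals = []
for x, e in zip(tv_ads, residuals):
    h = 1 / n + (x - x_bar) ** 2 / sxx  # leverage of this observation
    std_residuals.append(e / (s * math.sqrt(1 - h)))

print([round(z, 2) for z in std_residuals])
print(all(abs(z) < 1.5 for z in std_residuals))  # True
```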
77 Slide Outliers and Influential Observations. Detecting Outliers. An outlier is an observation that is unusual in comparison with the other data. Minitab classifies an observation as an outlier if its standardized residual value is < -2 or > +2. This standardized residual rule sometimes fails to identify an unusually large observation as being an outlier. This rule’s shortcoming can be circumvented by using studentized deleted residuals. The |ith studentized deleted residual| will be larger than the |ith standardized residual|.