Doing Statistics for Business: Data, Inference, and Decision Making
Marilyn K. Pelosi, Theresa M. Sandifer
Chapter 11: Regression Analysis
Chapter 11 Objectives
- Find the linear regression equation for a dependent variable Y as a function of a single independent variable X
- Determine whether a relationship between X and Y exists
- Analyze the results of a regression analysis to determine whether the simple linear model is appropriate
Figure 11.1 Deterministic Relationship Between Total Order Cost and Number of Items Ordered
Figure 11.2 Statistical Relationship Between Revenue and Advertising Expenditures
TRY IT NOW! Increasing Capacity: Plotting Data to Look at the Relationship
An oil company is trying to determine how the number of refining sites available for refining crude oil relates to the overall refining capacity. It would use this information to determine whether expansion will provide the increase in capacity that it wants or whether other steps to increase capacity will be necessary. The company collects data on other competitive companies and finds the following:
[Data table: number of refining sites and refining capacity]
Use a grid to create a scatter plot of the data. Do you think that a linear model is a good one?
The true relationship between the variables X and Y, the Simple Linear Regression Model, can be described by the equation $y = \beta_0 + \beta_1 x + \varepsilon$.
Figure 11.3 The True Regression Model Showing how Y Varies for a Given Value of X
Figure 11.4 Straight Line Approximating the Relationship Between Advertising and Revenue
Figure 11.5 A Single Criterion Can Produce Many Different Lines
The distance between the predicted value of Y, $\hat{y}$, and the actual value of Y, $y$, is called the deviation or error.
Figure 11.6 Deviations Between the Data Points and the Line
The technique that finds the equation of the line that minimizes the total or sum of the squared deviations between the actual data points and the line is called the least squares method.
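The least squares calculation can be sketched in a few lines of Python. This is a minimal sketch, not the textbook's worked solution: the five (x, y) pairs are hypothetical illustration data, and the function name `least_squares_line` is an assumption introduced here. It uses the standard formulas: slope = (sum of cross-products about the means) / (sum of squares of x about its mean), and intercept = mean of y minus slope times mean of x.

```python
def least_squares_line(x, y):
    """Return (intercept b0, slope b1) of the least-squares line y-hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares, both taken about the means
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx        # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Hypothetical data, for illustration only (not the oil company data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```

For these illustration data the fitted line is y-hat = 2.20 + 0.60 x.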
TRY IT NOW! Increasing Capacity: Finding the Equation of the Least-Squares Regression Line
The oil company that is looking at increasing refining capacity has decided that a linear relationship is appropriate. Fill in the table shown on the following slide or use some other means to find the equation of the least-squares line:
[Table: worksheet for computing the least-squares line]
Interpret the meaning of the estimate of the slope of the line. Does the y intercept make sense for these data?
The value of $\hat{y}$ that we find is really a prediction of the mean value of Y for a given value of X.
Using the equation to predict values of Y within the range of the X data is called interpolation. Predicting values of Y for values of X outside the observed range is called extrapolation.
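The distinction between interpolation and extrapolation can be made explicit when predicting. The sketch below is illustrative only: the fitted coefficients and observed X values are hypothetical, and the helper `predict` is a name introduced here, not something from the text.

```python
def predict(x_new, b0, b1, x_observed):
    """Predict the mean of Y at x_new and label the kind of prediction."""
    y_hat = b0 + b1 * x_new
    # Inside the observed X range -> interpolation; outside -> extrapolation
    inside = min(x_observed) <= x_new <= max(x_observed)
    kind = "interpolation" if inside else "extrapolation"
    return y_hat, kind


# Hypothetical fitted line and observed X values, for illustration only
b0, b1 = 2.2, 0.6
x_observed = [1, 2, 3, 4, 5]

for x_new in (3, 10):
    y_hat, kind = predict(x_new, b0, b1, x_observed)
    print(f"x = {x_new}: y-hat = {y_hat:.1f} ({kind})")
```

Flagging extrapolated predictions is a design choice: the fitted line is only supported by data inside the observed range of X, so predictions outside it deserve extra caution.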
TRY IT NOW! Increasing Capacity: Using the Regression Equation to Predict the Value of Y
Use the equation of the regression line you found earlier to predict the refining capacity for each of the observed values of X, the number of sites.
The difference between the observed value of Y ($y_i$) and the predicted value of Y from the regression equation ($\hat{y}_i$), for a value of X = $x_i$, is called the ith residual, $e_i = y_i - \hat{y}_i$.
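Computing residuals is a direct application of this definition. The sketch below uses hypothetical data and the least-squares line fitted to those same hypothetical points; none of the numbers come from the oil company exercise.

```python
# Hypothetical data and fitted line, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6  # least-squares intercept and slope for these data

predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

for xi, yi, y_hat, e in zip(x, y, predicted, residuals):
    print(f"x = {xi}  y = {yi}  y-hat = {y_hat:.1f}  e = {e:+.1f}")

# For a least-squares line with an intercept, the residuals sum to zero
# (up to floating-point rounding).
total = sum(residuals)
```

A positive residual means the observed point lies above the line; a negative residual means it lies below.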
TRY IT NOW! Increasing Capacity: Calculating the Residuals
The oil company that is looking at the relationship between refining capacity and the number of refining sites wants to get a better idea of how the regression line relates to the actual data. It decides to calculate the residuals for each observed value of X, the number of sites. Find the residuals and fill in the table found on the following slide:
[Table: residuals worksheet]
To get a picture of how the residuals and the regression line fit together, the company also decides to graph the regression line on a plot of the data. Graph the regression line on the data plot. How well do you think the line represents the data?
The standard error of the estimate, $s_{y|x}$, is a measure of how much the data vary around the regression line.
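The standard error of the estimate is usually computed as the square root of SSE / (n - 2), where SSE is the sum of squared residuals and n - 2 is the degrees of freedom in simple linear regression. A minimal sketch, using hypothetical data and a function name (`standard_error_of_estimate`) introduced here for illustration:

```python
import math


def standard_error_of_estimate(x, y, b0, b1):
    """Square root of SSE / (n - 2) for the line y-hat = b0 + b1*x."""
    n = len(x)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))


# Hypothetical data and fitted line, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

s_yx = standard_error_of_estimate(x, y, b0, b1)
print(f"standard error of the estimate = {s_yx:.4f}")
```

The result is in the same units as Y, which makes it easier to judge against the size of the values being predicted.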
Figure 11.8 Computer Output Showing the Standard Error of the Estimate (Excel Output, Minitab Output)
Figure 11.9 (a) Line with non-zero slope (b) Line with zero slope
Figure t-test Portion of Computer Output (Minitab, Excel)
TRY IT NOW! Increasing Capacity: Testing for Significance of the Regression Model
The oil company that is looking at increasing capacity wants to determine whether the relationship between refining capacity and number of refining sites that it calculated is significant. Write down the hypotheses that the company needs to test.
The company decides to use a 0.01 level of significance for the test. Find the critical values for the test. It used a computer software package to run the analysis and obtained the following output:
[Figure: computer regression output]
From the computer output, find the slope of the regression line, the standard error of the slope, and the value of the t statistic. Perform the hypothesis test and make a decision about the regression line.
Find the p value of the test from the output and explain how you could use the p value on the output to make the same decision.
Once we have determined that the relationship between X and Y is significant, we can perform some additional analyses to see if the predictions we obtain are useful for the purposes of decision making and to determine the strength of the relationship.
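The t test for the slope used in the exercise above can be sketched as follows. This is an illustration under stated assumptions, not the exercise solution: the data are hypothetical (n = 5, so n - 2 = 3 degrees of freedom), and the critical value 5.841 is the two-tailed t table value for a 0.01 level of significance with 3 degrees of freedom. The hypotheses are H0: slope = 0 versus HA: slope is not 0, and the test statistic is the estimated slope divided by its standard error.

```python
import math

# Hypothetical data, for illustration only (not the oil company data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = ss_xy / ss_xx       # estimated slope
b0 = y_bar - b1 * x_bar  # estimated intercept

# Standard error of the estimate, then standard error of the slope
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(sse / (n - 2))
s_b1 = s_yx / math.sqrt(ss_xx)

t_stat = b1 / s_b1

# Two-tailed critical value from a t table: alpha = 0.01, df = n - 2 = 3
T_CRITICAL = 5.841
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

With these hypothetical data the t statistic is about 2.12, which does not exceed the critical value, so the null hypothesis of zero slope would not be rejected at the 0.01 level.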
Figure Components of the Variation in y Value
Figure Computer ANOVA Output for Regression Analysis (Excel Output, Minitab Output)
A Confidence Interval provides an estimate for the mean value of Y ($\mu_{y|x}$) at a particular value of X.
Figure Confidence Interval for the Mean Estimate
TRY IT NOW! Increasing Capacity: Finding Confidence Intervals for the Mean Predicted Value
After calculating the regression model and deciding that the model is significant, the analysts at the oil company would like to know about the accuracy of the estimates from the model. They decide to calculate 95% confidence intervals for X = 8 and 13 sites. They know from previous work that for the set of 10 observations in the model, $s_{y|x} = 13.43$, $\sum x = 78$, and $\sum x^2 = 714$.
Find 95% confidence intervals for the mean estimates. Do you think that these estimates would be useful for planning purposes? Why or why not?
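The half-width of each confidence interval can be computed from the summary values given in the exercise. The sketch below assumes the standard formula for the mean response, t × s × sqrt(1/n + (x − x̄)² / SSxx) with SSxx = Σx² − (Σx)²/n, and takes 2.306 as the two-tailed t table value for 95% confidence with n − 2 = 8 degrees of freedom. The regression equation itself is not reproduced here, so only the margin around the predicted value is computed; the function name `ci_half_width` is introduced for illustration.

```python
import math

# Summary values stated in the exercise
n = 10
s_yx = 13.43   # standard error of the estimate
sum_x = 78
sum_x2 = 714

# Assumption: t table value for 95% confidence, df = n - 2 = 8
T_CRITICAL = 2.306

x_bar = sum_x / n
ss_xx = sum_x2 - sum_x ** 2 / n  # sum of squares of x about its mean


def ci_half_width(x0):
    """Half-width of the confidence interval for the mean of Y at X = x0."""
    return T_CRITICAL * s_yx * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_xx)


for x0 in (8, 13):
    print(f"X = {x0}: predicted value +/- {ci_half_width(x0):.2f}")
```

The margin is roughly 9.8 at X = 8 and roughly 18.5 at X = 13. It grows as X moves away from the mean of 7.8 sites because of the (x − x̄)² term.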
A Prediction Interval gives an estimate for an individual value of Y at a particular value of X.
TRY IT NOW! Increasing Capacity: Calculating Prediction Intervals for Regression Estimates
The oil company analysts decide to calculate 95% prediction intervals for the two X values that they are interested in. The relevant values from the set of 10 observations are $s_{y|x} = 13.43$, $\sum x = 78$, and $\sum x^2 = 714$. Find 95% prediction intervals for X = 8 and X = 13 refining sites.
Do you think that confidence intervals or prediction intervals would be more appropriate for the oil company's purpose?
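The prediction interval half-width uses the same summary values but adds 1 under the square root to account for the variation of an individual observation: t × s × sqrt(1 + 1/n + (x − x̄)² / SSxx). As before, this is a sketch under assumptions: 2.306 is the t table value for 95% and 8 degrees of freedom, the function name `pi_half_width` is introduced here, and the predicted values themselves are not computed because the fitted equation is not reproduced in the text.

```python
import math

# Summary values stated in the exercise
n = 10
s_yx = 13.43   # standard error of the estimate
sum_x = 78
sum_x2 = 714

# Assumption: t table value for 95%, df = n - 2 = 8
T_CRITICAL = 2.306

x_bar = sum_x / n
ss_xx = sum_x2 - sum_x ** 2 / n


def pi_half_width(x0):
    """Half-width of the prediction interval for an individual Y at X = x0."""
    return T_CRITICAL * s_yx * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_xx)


for x0 in (8, 13):
    print(f"X = {x0}: predicted value +/- {pi_half_width(x0):.2f}")
```

The margins come out to roughly 32.5 at X = 8 and 36.1 at X = 13, considerably wider than the corresponding confidence intervals for the mean.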
The Correlation Coefficient is used as a measure of the strength of a linear relationship. A correlation of -1 corresponds to a perfect negative relationship, a correlation of 0 corresponds to no relationship, and a correlation of +1 corresponds to a perfect positive relationship.
Figure Types of Relationships: Perfect Negative, No Relationship, and Perfect Positive
TRY IT NOW! Increasing Capacity: Calculating the Correlation Coefficient
The relevant data to calculate the correlation coefficient for the oil company problem are $n = 10$, $\sum x = 78$, $\sum y =$ [value missing], $\sum xy =$ [value missing], $\sum x^2 = 714$, $\sum y^2 =$ 25,[digits missing].
Find the correlation coefficient for the data.
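The correlation coefficient can be computed directly from the six summary quantities listed in the exercise using the computational formula r = (nΣxy − ΣxΣy) / sqrt[(nΣx² − (Σx)²)(nΣy² − (Σy)²)]. Because several of the exercise's sums are not available here, the sketch below applies the formula to hypothetical data; the function name `correlation_from_sums` is an assumption introduced for illustration.

```python
import math


def correlation_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson correlation coefficient from summary sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Hypothetical data, for illustration only (not the oil company data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation_from_sums(
    n=len(x),
    sum_x=sum(x),
    sum_y=sum(y),
    sum_xy=sum(xi * yi for xi, yi in zip(x, y)),
    sum_x2=sum(xi ** 2 for xi in x),
    sum_y2=sum(yi ** 2 for yi in y),
)
print(f"r = {r:.4f}")
```

For these hypothetical data r is about 0.77, a moderately strong positive linear relationship.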
Figure Examples of Residual Plots
Figure Histograms of Residuals
A Normal Probability Plot is a plot of the ordered data against their expected values under a normal distribution. When data are normally distributed, the plot will be a straight line.
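The coordinates of a normal probability plot can be generated with Python's standard library. This is a sketch under assumptions: the residuals are hypothetical, the function name `normal_scores` is introduced here, and the plotting positions (i − 0.375) / (n + 0.25) are one common convention (Blom's). Statistical packages differ in the exact positions they use.

```python
from statistics import NormalDist


def normal_scores(data):
    """Pair each ordered value with its expected standard normal score."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed convention: Blom plotting positions for ranks i = 1..n
    probs = [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
    return [(std_normal.inv_cdf(p), value) for p, value in zip(probs, ordered)]


# Hypothetical residuals, for illustration only
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]

points = normal_scores(residuals)
for z, value in points:
    print(f"expected z = {z:+.3f}   ordered residual = {value:+.1f}")
```

Plotting the ordered residuals against the expected z scores and checking for a roughly straight line is the visual test the slide describes.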
Figure Regression Diagnostic Plots
Figure Warning Output from Minitab
Simple Linear Regression Model in Excel
1. From the list of data analysis tools, select Regression.
2. Position the cursor in the textbox labeled Input Y Range: and highlight the data range for the Y variable, in this case, Revenues.
3. Move the cursor to the textbox for Input X Range: and highlight the data range of the X variable, in this case, Members.
4. If the data ranges contain labels, click on the Labels checkbox. If you want confidence intervals for the regression estimates, click the checkbox for Confidence Level.
5. Specify the location where you want the output to appear, either in the current sheet, in a new worksheet, or in a new workbook.
6. Click the checkbox for Residuals. Do not check the Standardized Residuals checkbox. Excel does not calculate these values correctly.
7. Click the checkboxes for Residual Plots and Line Fit Plots. Do not click the checkbox for Normal Probability Plot. The plot is not created correctly.
8. Click on OK. The output will appear in the location you specified.
Figure Completed Regression Dialog Box
Figure Summary Section of Regression Output
Figure Residual Output
Figure Plots from Regression Analysis
- Although Excel does perform linear regression, KaddStat can also be used for the analysis. The basic input is the same, although KaddStat has slightly different output.
- From the Kadd menu select Regression and correlation > Single/Multiple. The dialog box shown in Figure opens.
Figure Regression Dialog Box in KaddStat
1. Position the cursor in the box labeled Input Range and highlight your data in the Excel worksheet. Although nothing changes immediately, if you click on the drop down arrow in the box labeled Dependent Variable all of the variable names in the Input Range appear in the boxes for Dependent and Independent Variables as shown in Figure
Figure Variable lists for regression analysis
2. From the drop down list, select Rev $bn for the Dependent Variable.
3. Move the cursor over to the box labeled Independent Variable and from the list, click on the variable that you want to use for the independent variable, in this case, Members (m).
4. In the bottom part of the dialog box indicate which plots you want included in the output.
5. Indicate where you want the output to appear and click OK.
The main portion of the output is shown in Figure 11.27.
The remainder of the output consists of the graphs requested and the residuals and standardized residuals shown in Figure 11.28.
Kadd will calculate the predicted values for the data points, or for any other x values. Click on the box labeled Forecast and the dialog box will open.
- Place the cursor in the Forecast Data Range box and highlight the location of the values of the independent variable for which you want predictions.
- Indicate where you want the output located.
- Click OK.
Chapter 11 Summary
In this chapter you have learned:
- Linear regression analysis is a powerful tool for determining how two variables are related.
- The regression equation can be used for:
  - Description - the model is used simply to understand the way that two variables are related.
  - Control - the model is used to set standards or reduce variability.
  - Predictability - the model is used to determine what the resulting Y value should be when X takes on certain values.
- Although the simple linear model may be significant, it might not be correct.
- It is necessary to test the assumptions of the linear model to see whether the model you obtain is appropriate.