HAWKES LEARNING SYSTEMS Students Matter. Success Counts. Copyright © 2013 by Hawkes Learning Systems/Quant Systems, Inc. All rights reserved. Chapter 13.

1 Chapter 13: Regression, Inference, and Model Building

2 Building a Simple Linear Regression Model

3 Simple Linear Regression Model. Definition: The simple linear regression model is given by the linear equation $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, where $\beta_0$ is the y-intercept for the population data, $\beta_1$ is the slope coefficient for the population data,

4 Simple Linear Regression Model. Definition (cont.): $x_i$ is the value of the independent (or predictor) variable for observation $i$, $\varepsilon_i$ is the random error in $y$ for observation $i$, and $y_i$ is the value of the dependent (or response) variable for observation $i$.

5 Estimated Simple Linear Regression Equation. Definition: The estimated simple linear regression equation is $\hat{y}_i = b_0 + b_1 x_i$, where $b_0$ and $b_1$ are estimates of their population counterparts. Specifically, $b_0$ is an estimate of $\beta_0$ and $b_1$ is an estimate of $\beta_1$. $\hat{y}_i$ is the predicted value of $y$ for a given value of $x_i$ and is pronounced y-hat. The symbol $y_i$ is reserved for the observed value of $y$.

6 Defining a Linear Relationship

7 How Do We Measure How Close a Line Is to the Data? Definition: The difference between the observed value of $y$ and the predicted value of $y$ is called the error, estimated error, or residual ($e_i$). The error for each observation is given by $e_i = y_i - \hat{y}_i$.

8 Sum of Squared Errors. Formula: The sum of squared errors (SSE) is given by $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2$.

9 Least Squares Line. Definition: The least squares line is the line that has the smallest sum of squared errors. This is the line of best fit.
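To make the "smallest sum of squared errors" idea concrete, the following minimal sketch compares the SSE of two candidate lines. It assumes the weekly production data of Table 13.7 and the coefficients reported in Example 13.3 (2227.96 and 53.88); the second line, with intercept 2000 and slope 60, is an arbitrary competitor chosen only for illustration.

```python
# Weekly production data (Table 13.7, Example 13.3).
items = [22, 30, 36, 41, 27, 45, 30, 37, 32, 31]
cost = [3500, 3800, 4500, 4200, 3700, 4600, 3600, 4550, 3990, 3675]


def sse(b0, b1, x, y):
    """Sum of squared residuals e_i = y_i - (b0 + b1 * x_i)."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Coefficients reported for the least squares line in Example 13.3.
sse_fitted = sse(2227.96, 53.88, items, cost)
# A hypothetical competing line, not taken from the slides.
sse_other = sse(2000.0, 60.0, items, cost)

print(f"SSE, least squares line: {sse_fitted:.1f}")
print(f"SSE, competing line:     {sse_other:.1f}")
print("Least squares line fits better:", sse_fitted < sse_other)
```

Any other choice of intercept and slope should likewise produce an SSE at least as large as that of the least squares line.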

10 Finding the Least Squares Line

11 Slope and y-Intercept of the Least Squares Line. Formula: The equation for finding the slope is given by $b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$, where $\bar{x} = \frac{1}{n} \sum x_i$

12 Slope and y-Intercept of the Least Squares Line. Formula (cont.): and $\bar{y} = \frac{1}{n} \sum y_i$. The slope can also be calculated using $b_1 = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{n \sum x_i^2 - (\sum x_i)^2}$.

13 Slope and y-Intercept of the Least Squares Line. Formula (cont.): The estimate of the intercept is given by $b_0 = \bar{y} - b_1 \bar{x}$. The $x_i$ and $y_i$ referred to in the expressions are the observed data values of $x$ and $y$, respectively.
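As an illustration of the slope and intercept formulas, here is a minimal sketch in plain Python. The function name `least_squares_line` is a hypothetical helper, and the data are the weekly production figures from Table 13.7 (Example 13.3), for which the slides report an intercept of 2227.96 and a slope of 53.88.

```python
# Weekly production data (Table 13.7, Example 13.3).
items = [22, 30, 36, 41, 27, 45, 30, 37, 32, 31]
cost = [3500, 3800, 4500, 4200, 3700, 4600, 3600, 4550, 3990, 3675]


def least_squares_line(x, y):
    """Return (b0, b1) for the fitted line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares_line(items, cost)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # the slides report 2227.96 and 53.88
```

The deviation form of the slope formula is used here because it tends to be less sensitive to rounding than the computational form with raw sums.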

14 Intercept and Slope Coefficients. Definition: The intercept coefficient, $b_0$, is the average value of the dependent variable, $y$, when the independent variable, $x$, is equal to zero. The slope coefficient, $b_1$, is the average change in the dependent variable, $y$, for a one-unit change in the independent variable, $x$.

15 Mean Square Error. Formula: The variance of the error terms is also known as the mean square error and is given by $s_e^2 = \frac{SSE}{n - 2} = \frac{\sum (y_i - \hat{y}_i)^2}{n - 2}$. The square root of the mean square error is the standard error, $s_e$, or the standard deviation of the error terms.
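A short sketch of the mean square error calculation, again assuming the Table 13.7 production data from Example 13.3. The divisor is n − 2 because two coefficients are estimated from the data; the variable names are illustrative.

```python
import math

# Weekly production data (Table 13.7, Example 13.3).
items = [22, 30, 36, 41, 27, 45, 30, 37, 32, 31]
cost = [3500, 3800, 4500, 4200, 3700, 4600, 3600, 4550, 3990, 3675]

n = len(items)
x_bar = sum(items) / n
y_bar = sum(cost) / n

# Least squares estimates of the slope and intercept.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(items, cost)) / sum(
    (x - x_bar) ** 2 for x in items
)
b0 = y_bar - b1 * x_bar

# Sum of squared errors, mean square error, and standard error.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(items, cost))
mse = sse / (n - 2)
s_e = math.sqrt(mse)

print(f"MSE = {mse:.1f}, standard error = {s_e:.2f}")
```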

16 Evaluating the Fit of a Model

17 Total Sum of Squares (TSS). Formula: The total variation in $y$ is given by the total sum of squares, $TSS = \sum (y_i - \bar{y})^2$.

18 Sum of Squares of Regression. Definition: The sum of squares of regression (SSR) denotes the explained variation in the model. SSR = TSS – SSE (the explained variation, SSR, is equal to the total variation minus the unexplained variation).

19 Coefficient of Determination. Formula: The coefficient of determination, $R^2$, is given by $R^2 = \frac{SSR}{TSS}$. The coefficient of determination is a value between 0 and 1, inclusive. That is, $0 \le R^2 \le 1$.
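The decomposition TSS = SSR + SSE and the resulting $R^2$ can be sketched as follows. This example assumes the Table 13.7 production data from Example 13.3 rather than the SAT data, and the variable names are illustrative.

```python
# Weekly production data (Table 13.7, Example 13.3).
items = [22, 30, 36, 41, 27, 45, 30, 37, 32, 31]
cost = [3500, 3800, 4500, 4200, 3700, 4600, 3600, 4550, 3990, 3675]

n = len(items)
x_bar = sum(items) / n
y_bar = sum(cost) / n

# Least squares estimates of the slope and intercept.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(items, cost)) / sum(
    (x - x_bar) ** 2 for x in items
)
b0 = y_bar - b1 * x_bar

tss = sum((y - y_bar) ** 2 for y in cost)                          # total variation
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(items, cost))   # unexplained variation
ssr = tss - sse                                                    # explained variation
r_squared = ssr / tss

print(f"TSS = {tss:.1f}, SSE = {sse:.1f}, SSR = {ssr:.1f}")
print(f"R^2 = {r_squared:.4f}")
```

For these data roughly three quarters of the variation in cost is explained by the number of items produced, a much tighter fit than the 19% reported for the SAT/GPA model in Example 13.1.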

20 Example 13.1 The SAT Reasoning Test has been used for years as a predictor of academic success. If SAT scores are predictors of academic success, they should be positively related to the grade point average upon graduation. Twenty-seven graduates of a state college were sampled, and their grade point averages (GPA) upon graduation and their SAT scores reported upon admission were recorded. The data are given in Table 13.5.

21 Example 13.1 (cont.) Table 13.5 – SAT Scores and Graduating GPA
Student  SAT Critical Reading  SAT Math  SAT Writing  SAT Total  Graduating GPA
1        440                   550       495          1485       2.105
2        390                   480       435          1305       2.484
3        410                   360       385          1155       2.537
4        390                   350       370          1110       2.969
5        490                   590       540          1620       3.619
6        400                   550       475          1425       2.303
7        450                   430       440          1320       2.602
8        420                   350       385          1155       2.195
9        370                   390       380          1140       2.112
10       460                   600       530          1590       3.482
11       370                   400       385          1155       2.367
12       410                   530       470          1410       2.082

22 Example 13.1 (cont.) Table 13.5 – SAT Scores and Graduating GPA Student SAT Critical Reading SAT Math SAT Writing SAT Total Graduating GPA 13 14 15 16 17 18 19 20 21 22 23 24 470 490 540 560 440 580 360 440 290 440 510 570 610 630 620 470 530 670 420 460 410 500 570 520 550 585 590 455 485 625 390 450 350 470 540 1560 1650 1755 1770 1365 1455 1875 1170 1350 1050 1410 1620 2.346 3.484 2.446 2.820 2.556 3.357 3.269 2.964 2.642 2.297 2.388 2.850

23 Example 13.1 (cont.) Table 13.5 – SAT Scores and Graduating GPA
Student  SAT Critical Reading  SAT Math  SAT Writing  SAT Total  Graduating GPA
25       320                   540       430          1290       2.742
26       470                   580       525          1575       2.347
27       550                   550       550          1650       3.025

24 Example 13.1 (cont.) Figure 13.14

25 Example 13.1 (cont.) The scatterplot in Figure 13.14 suggests that as SAT scores increase the GPA tends to increase, although there is a substantial amount of variability in the relationship. The upward sloping pattern of the data suggests a linear model could be constructed. However, a great deal of variation in the model’s errors should be expected. What percent of the variation in final grade point average can be explained by the model relating total SAT score to graduating GPA?

26 Example 13.1 (cont.) Solution Using the least squares method, the estimated model is given by

27 Example 13.1 (cont.) Figure 13.15

28 Example 13.1 (cont.) Figure 13.16

29 Example 13.1 (cont.) One of the differences between the production model and the SAT/GPA model is the manner in which the data seem to fit the model. In the production model, the data seemed to fit closely around the line, while in the SAT/GPA model the data are loosely clustered about the line. While tight and loose are interesting portrayals of the relative fit of the data, it would be desirable to have a numerical measure to describe fit. $R^2$ is such a measure.

30 Example 13.1 (cont.) Thus, approximately 19% of the variation in graduating GPA is explained by this model.

31 What is a good $R^2$?

32 Fitting a Linear Time Trend

33 Example 13.2 Many analysts believe that college tuition prices may soon be in the same situation as housing prices were when the housing bubble burst (causing home prices to drop significantly). Table 13.6 contains data for the Tuition Consumer Price Index (TCPI) from 1978 to 2009. Use a linear time trend to model the data.

34 Example 13.2 (cont.) Table 13.6 – Tuition Consumer Price Index
Year  TCPI     Year  TCPI
1978  60.89    1990  175.93
1979  65.66    1991  193.73
1980  71.80    1992  181.14
1981  80.58    1993  234.48
1982  91.33    1994  250.80
1983  100.73   1995  249.17
1984  110.94   1996  264.15
1985  112.61   1997  278.42
1986  130.63   1998  307.51
1987  140.41   1999  319.63
1988  125.98   2000  307.80
1989  137.86   2001  324.73

35 Example 13.2 (cont.) Table 13.6 – Tuition Consumer Price Index
Year  TCPI     Year  TCPI
2002  348.54   2006  507.90
2003  371.42   2007  456.30
2004  343.05   2008  531.55
2005  476.08   2009  607.60

36 Example 13.2 (cont.) Solution. Figure 13.17: Tuition Consumer Price Index

37 Example 13.2 (cont.) A graph of the data reveals an upward trend in the tuition consumer price index. The data appear to be a nonstationary time series with an upward trend. To describe the data, we will model the trend by fitting a line through the data with the notion of capturing how fast (on average) the series is changing over time. Estimating the slope of the line will provide the average rate of change per year in the TCPI. The line is fitted using least squares estimates in exactly the same way as other regression models have been constructed.

38 Example 13.2 (cont.) The independent variable in a linear trend model is always time. In this case, the dependent variable is TCPI. The estimated least squares equation is

39 Example 13.2 (cont.) Figure 13.18: Tuition Consumer Price Index. The computer output for the problem is given in Figure 13.19.

40 Example 13.2 (cont.) Figure 13.19

41 Example 13.2 (cont.) The estimate of the slope, 15.4794, tells us that on average the TCPI is increasing at a rate of 15.4794 index points per year. Given how well the line fits the data, the trend line is a good descriptor of the data. The trend line can also be used for short-term prediction. Suppose you wanted to estimate the TCPI for 2010. If the data are not available, the trend model can be used.
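The trend fit and the short-term prediction can be sketched as follows. This is a minimal illustration, assuming the calendar year itself is used as the time variable (the slides do not show how time was coded, which affects the intercept but not the slope). The data are the 32 values of Table 13.6.

```python
# Tuition Consumer Price Index, 1978-2009 (Table 13.6).
years = list(range(1978, 2010))
tcpi = [
    60.89, 65.66, 71.80, 80.58, 91.33, 100.73, 110.94, 112.61,
    130.63, 140.41, 125.98, 137.86, 175.93, 193.73, 181.14, 234.48,
    250.80, 249.17, 264.15, 278.42, 307.51, 319.63, 307.80, 324.73,
    348.54, 371.42, 343.05, 476.08, 507.90, 456.30, 531.55, 607.60,
]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(tcpi) / n

# Least squares slope and intercept with time as the independent variable.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, tcpi)) / sum(
    (x - x_bar) ** 2 for x in years
)
b0 = y_bar - b1 * x_bar

# Extrapolate the fitted trend one year past the end of the data.
forecast_2010 = b0 + b1 * 2010

print(f"slope = {b1:.4f}")  # the slides report 15.4794
print(f"trend-line estimate for 2010 = {forecast_2010:.2f}")
```

Because the estimate extrapolates beyond the observed years, it is best treated as a rough short-term projection rather than a precise forecast.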

42 Confidence Interval

43 Confidence Interval for $\beta_1$. Formula: $100(1 - \alpha)\%$ Confidence Interval for $\beta_1$. The $100(1 - \alpha)\%$ confidence interval for $\beta_1$ is given by $b_1 \pm t_{\alpha/2,\, n-2} \, s_{b_1}$, where $s_{b_1}$ is the standard deviation of the slope estimate $b_1$.

44 Example 13.3 Table 13.7 – Weekly Production
Week  Items Produced  Cost ($)
1     22              3500
2     30              3800
3     36              4500
4     41              4200
5     27              3700
6     45              4600
7     30              3600
8     37              4550
9     32              3990
10    31              3675

45 Example 13.3 (cont.) In Section 13.3, a model relating the number of items produced to total cost was constructed. If the relationship is to be applicable for the entire production process, then a substantial amount of data will be required, more than we could hope to collect. If the data given in Table 13.7 are considered a random sample of weekly production, then a relationship can be constructed from the sample data.

46 Example 13.3 (cont.) Specifically, the estimated least squares regression line relating items produced to total cost is $\hat{y} = 2227.96 + 53.88x$, where $b_0 = \$2227.96$ (the sample estimate of $\beta_0$, the y-intercept), and $b_1 = \$53.88$ (the sample estimate of $\beta_1$, the slope). Note: Both estimates were determined using Microsoft Excel and rounded to the nearest hundredth.

47 Example 13.3 (cont.) Figure 13.20

48 Example 13.3 (cont.) The manual calculation of $s_{b_1}$ is tedious. Virtually every statistical analysis program that performs regression analysis calculates $s_{b_1}$. The summary output from Microsoft Excel is given in Figure 13.20. Most software packages will automatically include a confidence interval for $\beta_1$ or will include the pieces required to compute one. Microsoft Excel automatically displays the 95% confidence interval for $\beta_1$, and is capable of displaying an interval for any level of confidence you choose.

49 Example 13.3 (cont.) 95% Confidence Interval for $\beta_1$

50 Example 13.4 (cont.) 99% Confidence Interval for $\beta_1$

51 Example 13.4 (cont.) Note: The confidence interval in this example was calculated using rounded values from the summary output. Microsoft Excel calculates the confidence interval using unrounded values as 17.05 to 90.72.
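The two intervals can be reproduced from the rounded summary values with a few lines of Python. This sketch assumes the slope estimate 53.88 and its standard deviation 10.9778 from the regression results of Example 13.6 (Table 13.9), and hard-codes the critical values 2.306 and 3.355 for 8 degrees of freedom from a standard t-table. The function name `slope_interval` is hypothetical.

```python
# Rounded values from the regression output (Table 13.9).
b1 = 53.88        # estimated slope
s_b1 = 10.9778    # standard deviation of the slope estimate
n = 10            # number of weekly observations, so df = n - 2 = 8

# Two-tailed critical values t_{alpha/2, 8} taken from a standard t-table.
t_critical = {0.95: 2.306, 0.99: 3.355}


def slope_interval(level):
    """Return (lower, upper) limits of the confidence interval for the slope."""
    margin = t_critical[level] * s_b1
    return b1 - margin, b1 + margin


lower_95, upper_95 = slope_interval(0.95)
lower_99, upper_99 = slope_interval(0.99)

print(f"95% CI: {lower_95:.2f} to {upper_95:.2f}")
print(f"99% CI: {lower_99:.2f} to {upper_99:.2f}")
```

Because rounded inputs are used, the 99% limits may differ in the last decimal place from the unrounded Excel values of 17.05 and 90.72 quoted above.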

52 Testing a Hypothesis Concerning $\beta_1$

53 Example 13.6 Using the data in Example 13.3, determine if there is overwhelming evidence at the $\alpha = 0.05$ level of a relationship between the number of items produced and the total production cost. Solution Step 1: State the hypotheses in plain English. Null Hypothesis: There is not a linear relationship between the number of items produced and total production cost.

54 Example 13.6 (cont.) Alternative Hypothesis: There is a linear relationship between the number of items produced and total production cost. Step 2: Select the appropriate statistical measure. Since we are interested in determining if $\beta_1 = 0$, the sample estimate of $\beta_1$, namely $b_1$, will be used to evaluate whether the hypothesis $\beta_1 = 0$ is reasonable.

55 Example 13.6 (cont.) Step 3: Determine whether the hypothesis should be one-sided or two-sided. The alternative hypothesis should be two-sided since we are interested in discovering any relationship (positive or negative) between items produced and total cost. Step 4: Specify the hypotheses using the appropriate statistical measure. $H_0: \beta_1 = 0$ and $H_a: \beta_1 \neq 0$.

56 Example 13.6 (cont.) Step 5: Specify the level of the test. The level of the test has been given in the problem statement as the 0.05 level. Step 6: Select the appropriate test statistic.

57 Example 13.6 (cont.) Formula: Test Statistic for Testing the Hypothesis $\beta_1 \neq 0$. The test statistic for testing the hypothesis $\beta_1 \neq 0$ is given by $t = \frac{b_1}{s_{b_1}}$. The test statistic follows a t-distribution with $n - 2$ degrees of freedom.

58 Example 13.6 (cont.) The test statistic is similar in nature to the other test statistics developed in Chapter 10. It measures how far $b_1$ is from the hypothesized value of $\beta_1$, which is 0. This distance is measured in standard deviation units. If $t$ is close to 0, then $b_1$ is close to 0 and $H_0: \beta_1 = 0$ is the more reasonable conclusion. However, if $t$ is far from zero, then $b_1$ is far from its hypothesized value and $H_a: \beta_1 \neq 0$ would seem more reasonable. How far is far enough is defined by the critical value of the test statistic.

59 Example 13.6 (cont.) Step 7: Determine the critical value. The test is two-tailed and the level of the test is specified to be 0.05, which implies $\alpha/2 = 0.025$. The test statistic has a t-distribution with $n - 2 = 10 - 2 = 8$ degrees of freedom. The critical value corresponds to $t_{0.025,\, 8} = 2.306$.

60 Example 13.6 (cont.) Figure 13.22: t-Distribution, df = 8

61 Example 13.6 (cont.) Step 8: Compute the test statistic. Table 13.9 – Regression Results
Predictor       Coefficient  Standard Deviation of Coefficient  t-value
Intercept       2227.96      370.1488                           6.019
Items Produced  53.88        10.9778                            4.908
The estimated value of $b_1$ is almost five standard deviations above zero. This is very persuasive evidence that $\beta_1 \neq 0$.

62 Example 13.6 (cont.) Step 9: Make the decision. Since the value of the test statistic falls into the rejection region, reject the null hypothesis in favor of the alternative.
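Steps 7 through 9 amount to a single comparison, sketched below. The sketch assumes the rounded slope and standard deviation from Table 13.9 and a critical value of 2.306 for a two-tailed test at the 0.05 level with 8 degrees of freedom, taken from a standard t-table. The variable names are illustrative.

```python
# Rounded values from the regression results (Table 13.9).
b1 = 53.88        # estimated slope
s_b1 = 10.9778    # standard deviation of the slope estimate

# Two-tailed critical value t_{0.025, 8} from a standard t-table.
t_critical = 2.306

# Test statistic: distance of b1 from the hypothesized value 0,
# measured in standard deviation units.
t_stat = (b1 - 0) / s_b1

# Reject H0 when the statistic falls in either tail of the rejection region.
reject_null = abs(t_stat) > t_critical

print(f"t = {t_stat:.3f}")  # Table 13.9 reports 4.908
print("Reject H0: beta_1 = 0" if reject_null else "Fail to reject H0")
```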

63 Example 13.6 (cont.) Step 10: State the conclusion in terms of the original question. There is overwhelming evidence at the 0.05 level that $\beta_1 \neq 0$, so we reject the null hypothesis in favor of the alternative. This implies that it is reasonable to believe (at the 0.05 level) that there is a linear relationship between the number of items produced and total cost. In fact, there appears to be a positive linear relationship between items produced and production cost. However, our hypothesis test did not address the issue of a positive relationship, so we cannot make this conclusion.

