Diploma in Statistics, Introduction to Regression, Lecture 3.1: Multiple Regression (continued)

1 Diploma in Statistics Introduction to Regression Lecture 3.1
Lecture 3.1 Multiple Regression (continued)
Review Homework
Review Analysis of Variance
Review model fitting and testing procedure
Case study: Predicting stamp sales for An Post
–Problem formulation
–Initial data analysis
–Fitting and checking
–Application

2 Homework 2.2.1
Extend the table of predictions for small, medium and large jobs to include predictions based on the final fit. Compare and contrast.

3 Homework 2.2.2
You have been asked to comment, as a statistical consultant, on a prediction formula for forecasting job completion times prepared by a former employee. The formula is, effectively, the one derived from the first fit discussed above. Write a report for management. Your report should refer to
(i) the practical usefulness of the employee's prediction formula, from a customer's perspective,
(ii) the significance of the exceptional cases from the customer's and management's perspectives, and
(iii) your recommended formula, with its relative advantages.

4 Outline solution
(i) This formula is biased upwards for small jobs and downwards for large jobs. Also, the prediction error associated with this prediction formula is ± 75 hours, that is, ± 2 working weeks. This means that we can only predict the delivery time to fall somewhere within a 4-week period. This is unlikely to be acceptable to our customers, who have to meet exacting scheduling requirements of their own.

5 Outline solution
(ii) There was one small job which took an excessively long time to complete. The causes for this need to be established with a view to preventing their recurrence. The two longest jobs were subject to excessive variability, one taking an excessively long time and the other taking a remarkably short time. Again, the causes for these need to be established, with a view to reducing variability. In the meantime, while the recommended prediction formula (see next) may be used with caution for long jobs, the prediction error is not valid for jobs longer than around 600 hours. Further experience with longer jobs is needed to establish a valid prediction formula.

6 Outline solution
(iii) The recommended prediction formula is
Jobtime = 44.2 - 0.0693 × Units + 9.83 × Ops + 0.108 × T_Ops hours, less 38 hours for Rushed jobs, ± 15 hours.
This formula is unbiased and has a suitably small prediction interval width, likely to be acceptable to our customers.
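The recommended formula can be written as a small function, which may help when extending the table of predictions. This is only a sketch: the function names and the example job (100 units, 5 operations, T_Ops of 500) are hypothetical and are not taken from the job times data, and the ± 15 hours is applied as a fixed margin, as on the slide.

```python
def predict_jobtime(units, ops, t_ops, rushed=False):
    """Predicted job time in hours, using the coefficients quoted in the slide."""
    hours = 44.2 - 0.0693 * units + 9.83 * ops + 0.108 * t_ops
    if rushed:
        hours -= 38.0  # rushed jobs are predicted to take 38 hours less
    return hours


def prediction_interval(units, ops, t_ops, rushed=False, margin=15.0):
    """Prediction plus or minus the quoted prediction error of 15 hours."""
    centre = predict_jobtime(units, ops, t_ops, rushed)
    return centre - margin, centre + margin


# Hypothetical job, for illustration only (not a case from the data set).
example = predict_jobtime(units=100, ops=5, t_ops=500)
example_rushed = predict_jobtime(units=100, ops=5, t_ops=500, rushed=True)
low, high = prediction_interval(units=100, ops=5, t_ops=500)
print(f"Predicted: {example:.1f} hours, interval {low:.1f} to {high:.1f} hours")
print(f"If rushed: {example_rushed:.1f} hours")
```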

7 Homework 2.2.3
Make a table of the t values and corresponding s values for the three regressions. Compare, contrast and explain.

9 Analysis of Variance
S = 7.41272   R-Sq = 99.8%   R-Sq(adj) = 99.7%

Analysis of Variance
Source          DF      SS     MS        F      P
Regression       4  299165  74791  1361.12  0.000
Residual Error  12     659     55
Total           16  299824

Residual Mean Square = s²: check!

10 Analysis of Variance
Regression Sum of Squares measures explained variation
Residual Sum of Squares measures unexplained (chance) variation
Total Variation = Explained + Unexplained
Coefficient of Determination: R-Sq = SS(Regression) / SS(Total). Check it!

11 Analysis of Variance
F = MS(Reg) / MS(Res) with 4 and 12 degrees of freedom. Check it! Check F tables.
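The "check it" prompts can be worked through numerically from the Analysis of Variance table. The sketch below uses only the rounded sums of squares and degrees of freedom printed in the table, so the results agree with the Minitab values only up to rounding (F in particular comes out slightly above the printed 1361.12). The variable names are illustrative.

```python
import math

# Values as printed in the Analysis of Variance table (rounded by Minitab).
ss_regression, df_regression = 299165, 4
ss_residual, df_residual = 659, 12
ss_total = ss_regression + ss_residual   # Total = Explained + Unexplained
df_total = df_regression + df_residual

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual   # Residual Mean Square = s squared
s = math.sqrt(ms_residual)

r_sq = ss_regression / ss_total                      # coefficient of determination
r_sq_adj = 1 - ms_residual / (ss_total / df_total)   # adjusted for degrees of freedom
f_ratio = ms_regression / ms_residual                # compare with F tables on 4 and 12 df

print(f"MS(Reg) = {ms_regression:.0f}, MS(Res) = {ms_residual:.1f}, s = {s:.2f}")
print(f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"F = {f_ratio:.1f}")
```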

12 Analysis of Variance

13 Reduction in Prediction Error
No fit prediction error:  s(no fit) = s(Y) = 202
1st fit prediction error: s(1st fit) = 37.5, less by a factor of 5.4
2nd fit prediction error: s(2nd fit) = 13.8, less by a factor of 2.7
3rd fit prediction error: s(3rd fit) = 7.4, less by a factor of 1.9
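The reduction factors are simply ratios of successive values of s. A short check, using the rounded values quoted above (the variable names are illustrative):

```python
# Prediction error s at each stage, as quoted on the slide.
s_values = [("no fit", 202.0), ("1st fit", 37.5), ("2nd fit", 13.8), ("3rd fit", 7.4)]

# Factor by which each fit reduces s relative to the previous stage.
factors = {}
for (_, previous), (label, current) in zip(s_values, s_values[1:]):
    factors[label] = previous / current
    print(f"{label}: s = {current}, less by a factor of {factors[label]:.1f}")

# Overall reduction from no fit to the final fit.
overall = s_values[0][1] / s_values[-1][1]
print(f"Overall reduction factor: {overall:.0f}")
```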

15 Step 1: Initial data analysis
standard single variable summaries
–to determine extent of variation
–possible exceptional values;
scatter plot matrix
–to view pairwise relationships between the response and the explanatory variables and
–to view pairwise relationships between the explanatory variables themselves.

16 Step 2: Least squares fit and interpretation
calculate the best fitting regression coefficients
–check meaningfulness and statistical significance;
calculate s
–check its usefulness for prediction
–its usefulness relative to alternative estimates of standard deviation.

17 Step 3: Diagnostic analysis of residuals
diagnostic plot
–check for exceptional residuals or patterns of residuals,
–possible explanations in terms of the fitted values;
Normal plot
–check for exceptional residuals or non-linear patterns in the residuals

18 Step 4: Iterate fit and check
determine cases for deletion
–repeat steps 2 and 3 until checks are passed.
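Steps 2 to 4 can be sketched as a loop: fit, inspect the residuals, delete an exceptional case, and refit. The sketch below makes several simplifying assumptions that are not part of the lecture: it uses a single explanatory variable, it replaces the diagnostic and Normal plots with a crude numerical rule (the largest residual exceeding 2s), and it runs on made-up data with one planted exceptional case. In practice the decision to delete a case rests on the plots and on subject-matter judgement.

```python
import math


def fit_line(x, y):
    """Least squares fit of y = a + b*x; returns (a, b, residuals, s)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return a, b, residuals, s


def fit_and_check(x, y, threshold=2.0):
    """Repeat fit and residual check, deleting the worst case each time.

    The rule |residual| > threshold * s is an assumed stand-in for the
    graphical checks described in Step 3.
    """
    cases = list(range(len(x)))
    deleted = []
    while True:
        xs = [x[i] for i in cases]
        ys = [y[i] for i in cases]
        a, b, residuals, s = fit_line(xs, ys)
        worst = max(range(len(cases)), key=lambda k: abs(residuals[k]))
        checks_passed = s == 0 or abs(residuals[worst]) <= threshold * s
        if checks_passed or len(cases) <= 3:
            return a, b, s, deleted
        deleted.append(cases.pop(worst))


# Made-up data: y = 2 + 3x with small noise and one planted exceptional case.
x = list(range(10))
y = [2 + 3 * xi + (0.1 if xi % 2 == 0 else -0.1) for xi in x]
y[4] += 20

a, b, s, deleted = fit_and_check(x, y)
print(f"Deleted cases: {deleted}; fit: y = {a:.2f} + {b:.2f} x, s = {s:.2f}")
```

Comparing s before and after deletion, as in the "Reduction in Prediction Error" slide, shows how strongly a single exceptional case can inflate the prediction error.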

20 The Stamp Sales Case Study
The problem
January 1984, An Post established
New business plan; sales forecasts required
Historical sales data available

21 Historical data

22 Trend projection? Hire a consultant!

23 Factors influencing sales
Economic growth
Stamp prices
Alternative product prices
measurement problems!

24 Project: develop a sales forecasting system for An Post
Terms of reference
1. Identify and collect the relevant macro-economic data.
2. Establish a database containing the data needed for model building.
3. Identify, estimate and check a dynamic regression model suitable for the purposes outlined below:

25
(a) medium-term (one to five years) forecasting of aggregate demand for postal services;
(b) analysis of the effects of levels of general economic activity, postal prices and the prices of competing services, on aggregate demand for postal services;
(c) use as a benchmark for the analysis of the effects of demand stimulation activities.

28 Explanatory variables
General economic activity:
–Gross National Product (GNP)
Postal prices:
–Real Letter Price (RLP)
Prices of competing services:
–Real Phone Charge (RPC)

29 Definitions
GNP measures the value of all goods and services produced by all residents of the state.

30 Definitions
Real Letter Price: the price of a standard sealed internal letter divided by the Consumer Price Index (CPI); it measures the change in the price of a stamp relative to changes in the prices of other goods and services.

31 Definitions
Real Phone Charge: the price of a local telephone call divided by the Consumer Price Index (CPI).

32 Table 8.7: Annual postage stamp sales, GNP, real letter prices and real phone charges, 1949-1983

33 Prediction model? Multiple linear regression
How are Stamp Sales (Y) related to
Gross National Product (GNP = X1),
Real Letter Price (RLP = X2),
Real Phone Charge (RPC = X3)?
Try
Y = β0 + β1 X1 + β2 X2 + β3 X3 + ε

34 Example
Best prediction equation:
Predicted Sales = 343 - 0.0577 GNP - 53.2 RLP
To calculate the predicted sales for any year, find the values of GNP and RLP for that year and substitute them in the equation.

35 Application
Evaluate the effect on sales of the industrial action in 1979.
Actual sales (1979): 112.5
GNP (1979): 1,422.8
RLP (1979): 1.526
"Predicted" Sales = 343 - 0.0577 GNP - 53.2 RLP
= 343 - 0.0577 × 1422.8 - 53.2 × 1.526
= 179.7
Effect = 112.5 - 179.7 = -67.2
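The 1979 calculation can be reproduced directly from the prediction equation and the figures quoted in the slide. The function and variable names below are illustrative.

```python
def predicted_sales(gnp, rlp):
    """Prediction equation quoted in the Example slide."""
    return 343 - 0.0577 * gnp - 53.2 * rlp


# Figures for 1979, as given in the slide.
actual_1979 = 112.5
gnp_1979 = 1422.8
rlp_1979 = 1.526

predicted_1979 = predicted_sales(gnp_1979, rlp_1979)
effect_1979 = actual_1979 - predicted_1979  # shortfall attributed to the industrial action

print(f"Predicted sales 1979: {predicted_1979:.1f}")
print(f"Estimated effect:     {effect_1979:.1f}")
```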

37 Step 1: Initial data analysis, dotplots

38 Initial data analysis, time plots

39-41 Initial data analysis, scatterplot matrix

43 Step 2: Regression Analysis, First Fit
The regression equation is
Stamp Sales = 300 - 0.0603 GNP - 54.6 RLP + 73.5 RPC

Predictor      Coef  SE Coef      T      P
Constant     300.26    19.34  15.52  0.000
GNP        -0.06033  0.02406  -2.51  0.018
RLP          -54.57    21.15  -2.58  0.015
RPC           73.52    32.83   2.24  0.032

S = 15.2996   R-Sq = 86.6%   R-Sq(adj) = 85.3%

Analysis of Variance
Source          DF     SS     MS      F      P
Regression       3  46807  15602  66.65  0.000
Residual Error  31   7256    234
Total           34  54063

44 Exercise
Explain the Degrees of Freedom.
Check the calculation of:
MS(Regression)
MS(Error)
s
R²
F
T
Check the statistical significance of the coefficients.
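One way to carry out these checks is sketched below, using the rounded figures printed in the first-fit output, so agreement is only up to rounding. The degrees of freedom assume one case per year for 1949-1983 (35 cases) and 3 explanatory variables; the variable names are illustrative. Significance still has to be judged against t tables with 31 degrees of freedom, or read from the P column.

```python
import math

n_cases = 1983 - 1949 + 1      # one observation per year, 1949-1983
n_predictors = 3               # GNP, RLP, RPC

df_regression = n_predictors
df_residual = n_cases - n_predictors - 1
df_total = n_cases - 1

# Sums of squares as printed in the Analysis of Variance table.
ss_regression, ss_residual, ss_total = 46807, 7256, 54063

ms_regression = ss_regression / df_regression
ms_error = ss_residual / df_residual
s = math.sqrt(ms_error)
r_sq = ss_regression / ss_total
f_ratio = ms_regression / ms_error

# T = Coef / SE Coef for each row of the coefficient table.
coefficients = {
    "Constant": (300.26, 19.34),
    "GNP": (-0.06033, 0.02406),
    "RLP": (-54.57, 21.15),
    "RPC": (73.52, 32.83),
}
t_values = {name: coef / se for name, (coef, se) in coefficients.items()}

print(f"DF: regression {df_regression}, residual {df_residual}, total {df_total}")
print(f"MS(Regression) = {ms_regression:.0f}, MS(Error) = {ms_error:.0f}, s = {s:.2f}")
print(f"R-Sq = {100 * r_sq:.1f}%, F = {f_ratio:.2f}")
for name, t in t_values.items():
    print(f"{name}: T = {t:.2f}")
```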

45 Step 3: Diagnostic Analysis

46 Step 4: Iterate the analysis, 1979 deleted

Predictor      Coef  SE Coef      T      P
Constant     317.96    11.90  26.71  0.000
GNP        -0.00771  0.01614  -0.48  0.636
RLP          -92.18    13.72  -6.72  0.000
RPC           43.29    20.21   2.14  0.040

S = 9.22460

Exercise: Compare s to previous value. Compare coefficient estimates to previous values.

47 Compare fits

48 Diagnostic plots, 1979 deleted (plot labels: 1980; up to 1970; after 1970)

49 Next step: Model recent data, 1971-83 (excluding 1979)

50 Regression 1971-1983, excluding 1979

Predictor      Coef  SE Coef      T      P
Constant     327.99    29.03  11.30  0.000
GNP        -0.05480  0.01664  -3.29  0.011
RLP          -56.65    13.45  -4.21  0.003
RPC           29.50    46.78   0.63  0.546

S = 5.8924

53 Regression with 1980 and RPC deleted

Predictor      Coef  SE Coef      T      P
Constant     339.58    10.62  31.96  0.000
GNP        -0.03158  0.01329  -2.38  0.045
RLP         -70.155    9.660  -7.26  0.000

S = 3.92988   R-Sq = 96.8%

55 Exercise
Calculate the predicted stamp sales for 1984 and 1985. Assume no change in nominal stamp price. Compare with the actual outcomes:

         1984    1985
Sales   163.6   172.1
GNP    1487.5  1466.6
RLP     1.835   1.741

Comment on the prediction errors.

56 Exercise
Predicted Sales = 340 - 0.0316 GNP - 70.155 RLP
To calculate the predicted sales for any year, find the values of GNP and RLP for that year and substitute them in the equation.
Problem: how to get GNP and RLP for future years?
Answer: use "official" predictions.

57 Central Bank predictions for 1984, 1985

             1984    1985
GNP:        +1.5%   +1.5%
Inflation:  +8.6%   +5.5%

NB: no change in nominal stamp price in 1984 or 1985
GNP(83) = 1462.6; predicted GNP(84) = 1462.6 × 1.015 = 1484.5
RLP(83) = 1.993; assuming no change in nominal stamp price, predicted RLP(84) = 1.993 / 1.086 = 1.835

58 Prediction for 1984
GNP(84) = 1484.5
RLP(84) = 1.835
Predicted Sales = 340 - 0.0316 × GNP - 70.155 × RLP
= 340 - 0.0316 × 1484.5 - 70.155 × 1.835
= 164.4
Actual outcome: 163.6
Prediction for 1985? Homework 3.1.1
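The 1984 forecast chains three steps: project GNP forward by the predicted growth rate, deflate the real letter price by predicted inflation (the nominal price being unchanged), and substitute both into the prediction equation. A sketch of that arithmetic follows; the variable names are illustrative, and the inputs are rounded to the precision shown on the slides so that the result matches the quoted 164.4.

```python
# 1983 values and Central Bank predictions for 1984, as quoted in the slides.
gnp_83 = 1462.6
rlp_83 = 1.993
gnp_growth_84 = 0.015   # GNP predicted to grow by 1.5%
inflation_84 = 0.086    # prices predicted to rise by 8.6%

# Projected explanatory variables, rounded as on the slide.
gnp_84 = round(gnp_83 * (1 + gnp_growth_84), 1)
rlp_84 = round(rlp_83 / (1 + inflation_84), 3)   # unchanged nominal price, higher CPI


def predicted_sales_recent(gnp, rlp):
    """Prediction equation from the final fit, with coefficients rounded as on the slide."""
    return 340 - 0.0316 * gnp - 70.155 * rlp


forecast_84 = predicted_sales_recent(gnp_84, rlp_84)
actual_84 = 163.6
error_84 = actual_84 - forecast_84

print(f"GNP(84) = {gnp_84}, RLP(84) = {rlp_84}")
print(f"Predicted sales 1984: {forecast_84:.1f}; actual: {actual_84}; error: {error_84:.1f}")
```

The prediction error here is well within the final fit's s of about 3.9.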

59 Homework 3.1.2
Carry out the analysis of stamp sales data prior to 1970, leading to the prediction formula
Sales = 371 - 176 RLP + 84 RPC, s = 5.5.
Compare early and recent prediction formulas, including prediction errors.
Ref: SA pp. 282-4

60 Reading
SA §1.6, §8.7
Alternative readings
Hamilton, Ch 2, up to p. 53; Ch 3, pp. 65-72, 74-75, 77-80; Ch 4, pp. 109-117

