Published by Brian Watkins. Modified over 9 years ago.
1
Diploma in Statistics
Introduction to Regression Lecture 2.2

1. Review of Lecture 2.1
   – Homework
   – Multiple regression
   – Job times case study
2. Job times continued
   – residual analysis
   – model fitting and testing
3. Model fitting and testing procedure
4. t-tests
5. Analysis of Variance
2
Update: Accessing data files
Access the data in mstuart's get folder:
– in ISS Public Access labs, click Start, then Network Shortcuts, open Get
– on your own computer with TCD network access, navigate to Ntserver-usr / get
– once in get, type ms, open mstuart, Diploma Reg, Excel Data,
or
Access the data on the Diploma web page at https://www.scss.tcd.ie:453/courses/dipstats/Local/ST7002_0809.php
Open the relevant Excel file and copy the data.
3
Homework 2.1.1
The shelf life of packaged foods depends on many factors. Dry cereal (such as corn flakes) is considered to be a moisture-sensitive product, with the shelf life determined primarily by moisture. In a study of the shelf life of one brand of cereal, packets of cereal were stored in controlled conditions (23°C and 50% relative humidity) for a range of times, and moisture content was measured. The results were as follows.
Draw a scatter diagram. Comment. What action is suggested? Why?
4
Draw a scatter diagram. Comment. What action is suggested? Why?
Two exceptional cases; delete and investigate.
5
Following appropriate action, the following regression was computed.

The regression equation is
Moisture = 2.86 + 0.0417 Storage

Predictor   Coef       SE Coef    T        P
Constant    2.86122    0.02488    115.01   0.000
Storage     0.041660   0.001177   35.40    0.000

S = 0.0493475

Calculate a 95% confidence interval for the daily change in moisture content; show details.
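The confidence interval asked for here can be sketched as estimate ± multiplier × standard error. The number of cases is not shown in the output, so the exact t critical value cannot be looked up; the multiplier of 2 below is an assumption standing in for it.

```python
# Approximate 95% confidence interval for the Storage coefficient,
# i.e. the daily change in moisture content.
coef = 0.041660      # Coef for Storage, from the regression output
se_coef = 0.001177   # SE Coef for Storage, from the regression output

# Assumption: 2 replaces the exact t critical value, because the
# degrees of freedom are not given in the output.
multiplier = 2.0

lower = coef - multiplier * se_coef
upper = coef + multiplier * se_coef
print(f"Approximate 95% CI: {lower:.4f} to {upper:.4f}")
```

With the exact t value the interval would be slightly wider for a small data set.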
6
Was the action you suggested on studying the scatter diagram in part (a) justified? Explain.
Predict the moisture content of a packet of cereal stored under these conditions for 5 weeks; calculate a prediction interval.
What would be the effect on your interval of not taking the action you suggested on studying the scatter diagram? Why?
Taste tests indicate that this brand of cereal is unacceptably soggy when the moisture content exceeds 4. Based on your prediction interval, do you think that a box of cereal that has been on the shelf for 5 weeks will be acceptable? Explain. What about 4 weeks? 3 weeks? What is acceptable?
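One way to approach these questions is sketched below. It assumes Storage is measured in days (suggested by "daily change" in the previous question) and uses the rough interval prediction ± 2S, which ignores the uncertainty in the estimated coefficients; the function names are illustrative only.

```python
# Rough prediction intervals for moisture content after 3, 4 and 5 weeks.
intercept = 2.86122   # Constant, from the regression output
slope = 0.041660      # Storage coefficient, from the regression output
s = 0.0493475         # S, from the regression output
moisture_limit = 4.0  # cereal is unacceptably soggy above this value

def predict_moisture(days):
    """Fitted moisture content after the given number of days in storage."""
    return intercept + slope * days

def approx_prediction_interval(days, multiplier=2.0):
    """Approximate 95% prediction interval: fit +/- multiplier * S (an assumption)."""
    fit = predict_moisture(days)
    return fit - multiplier * s, fit + multiplier * s

for weeks in (3, 4, 5):
    days = 7 * weeks  # assumption: Storage is recorded in days
    low, high = approx_prediction_interval(days)
    if high < moisture_limit:
        verdict = "interval entirely below the limit"
    elif low > moisture_limit:
        verdict = "interval entirely above the limit"
    else:
        verdict = "interval includes the limit"
    print(f"{weeks} weeks: {predict_moisture(days):.2f} ({low:.2f} to {high:.2f}), {verdict}")
```

Comparing the whole interval with the limit, and not just the point prediction, is what distinguishes the 4-week case from the other two.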
8
Example 5: A production prediction problem
Erie Metal Products: The problem
Metal products fabrication:
– customers order varying quantities of products of varying complexity;
– customers demand accurate and precise order delivery times.
9
Erie Metal Products: The data
10
The multiple linear regression model
Jobtime = α + β_Units × Units + β_Ops × Ops + β_T_Ops × T_Ops + β_Rushed × Rushed + ε
11
Model parameters
The regression coefficients: β_Units, β_Ops, β_T_Ops, β_Rushed
The "uncertainty" parameter: σ, the standard deviation of ε
12
Regression of Jobtime on other variables

Predictor   Coef      SE Coef   T       P
Constant    77.24     44.76     1.73    0.105
Units       -0.1507   0.1121    -1.34   0.199
Ops         7.152     4.305     1.66    0.117
T_Ops       0.11460   0.01322   8.67    0.000
Rushed      -24.94    19.11     -1.31   0.211

S = 37.4612
13
Homework
Predict job times for small (U=100, O=5), medium (U=300, O=10) and large (U=500, O=15) jobs, both normal and rushed. Present the results in tabular form.
14
Homework Solution
15
Are these predictions useful?
What is S? What is 2S?
When will my order arrive?
NEXT: Diagnostics; analysis of residuals
17
Checking model fit
Assumptions:
– explanatory variables are adequate
– error term (ε):
  – variation is Normal
  – variation is stable
Check via residuals:
Response = Fit + Residual
18
Regression diagnostics
The diagnostic plot, 'deleted' residuals vs fitted values
– checking for homogeneity of error
The Normal residual plot
– checking the Normal model
19
Residuals
Job 9, a rushed job with 21 units and 9 operations per unit, took 260 hours to complete.
Prediction: Jobtime = 77 – 0.15 × 21 + 7.1 × 9 + 0.11 × 189 – 25 = 135
Residual = 260 – 135 = 125
20
Deleted residuals
Job 9, a rushed job with 21 units and 9 operations per unit, took 260 hours to complete.
Deleted prediction, regression with case 9 deleted:
Jobtime = 42 – 0.08 × 21 + 10 × 9 + 0.11 × 189 – 38 = 113
Deleted Residual = 260 – 113 = 147
Standardised deleted residual ≈ DR / s = 147 / 14 = 10.5
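The two calculations for Job 9 can be reproduced with the unrounded coefficients from the first fit and from the fit with case 9 deleted, both reported elsewhere in this lecture. The sketch assumes T_Ops = Units × Ops (consistent with 21 × 9 = 189 above) and that Rushed is coded 1 for a rushed job and 0 otherwise; `predict_jobtime` is an illustrative helper, not part of the lecture.

```python
def predict_jobtime(coefs, units, ops, rushed):
    """Fitted job time in hours for a job with the given characteristics."""
    # Assumption: T_Ops is the total number of operations, Units * Ops.
    t_ops = units * ops
    return (coefs["Constant"] + coefs["Units"] * units + coefs["Ops"] * ops
            + coefs["T_Ops"] * t_ops + coefs["Rushed"] * rushed)

# Coefficients copied from the regression outputs in this lecture.
first_fit = {"Constant": 77.24, "Units": -0.1507, "Ops": 7.152,
             "T_Ops": 0.11460, "Rushed": -24.94}
case9_deleted_fit = {"Constant": 41.72, "Units": -0.08349, "Ops": 10.022,
                     "T_Ops": 0.110016, "Rushed": -38.217}
s_case9_deleted = 13.7952  # S from the fit with case 9 deleted

observed = 260               # hours actually taken by Job 9
units, ops, rushed = 21, 9, 1

residual = observed - predict_jobtime(first_fit, units, ops, rushed)
deleted_residual = observed - predict_jobtime(case9_deleted_fit, units, ops, rushed)
# Approximation used on the slide: divide the deleted residual by s.
approx_std_deleted_residual = deleted_residual / s_case9_deleted

print(f"Residual: {residual:.0f}")
print(f"Deleted residual: {deleted_residual:.0f}")
print(f"Approx. standardised deleted residual: {approx_std_deleted_residual:.1f}")
```

Using the unrounded s gives a ratio slightly above the 10.5 obtained with s rounded to 14; either way the case is far outside the range expected for a standardised residual.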
21
Deleted residuals
Residual
– observed – fitted
Standardised Residual
– using an estimate of σ based on current data
Standardised Deleted Residual
– calculated from data with suspect case deleted
– σ estimated from data with suspect case deleted
22
The Diagnostic Plot
23
Scatterplot of artificial data with a highly exceptional case
NB: exceptionally large Y value corresponds to small X value
24
Scatter plot and diagnostic plot for artificial data
25
Normal plot of residuals
26
Statistical Analysis Section 8.4: Iterating the analysis
Revising the fit
– revised prediction formula
– revised diagnostics
A further iteration
27
Revised fit, case 9 deleted

The regression equation is
Jobtime = 41.7 – 0.0835 Units + 10.0 Ops + 0.110 T_Ops – 38.2 Rushed

19 cases used, 1 cases contain missing values

Predictor   Coef       SE Coef    T       P
Constant    41.72      16.87      2.47    0.027
Units       -0.08349   0.04186    -1.99   0.066
Ops         10.022     1.612      6.22    0.000
T_Ops       0.110016   0.004891   22.49   0.000
Rushed      -38.217    7.166      -5.33   0.000

S = 13.7952
28
Revised fit: Exercise
Predict job times for small (U=100, O=5), medium (U=300, O=10) and large (U=500, O=15) jobs, normal and rushed.
29
Revised predictions
30
Recall scatter plot for artificial data
31
Revised diagnostics, case 9 deleted
32
Revised fit, cases 9, 11, 16 deleted

The regression equation is
Jobtime = 44.2 – 0.0693 Units + 9.83 Ops + 0.108 T_Ops – 38.0 Rushed

17 cases used, 3 cases contain missing values

Predictor   Coef       SE Coef    T       P
Constant    44.216     9.080      4.87    0.000
Units       –0.06931   0.02853    –2.43   0.032
Ops         9.8286     0.8873     11.08   0.000
T_Ops       0.107795   0.004114   26.20   0.000
Rushed      –37.960    3.857      –9.84   0.000

S = 7.41272
33
Revised diagnostics, cases 9, 11, 16 deleted
34
Coefficient estimates from three fits
35
Homework 2.2.1
Extend the table of predictions for small, medium and large jobs to include predictions based on the final fit. Compare and contrast.
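A possible starting point for this comparison is sketched below: it evaluates the three fitted equations at the small, medium and large job sizes and prints 2S for each fit. The coefficients are copied from the three regression outputs; the assumption that T_Ops = Units × Ops, the 0/1 coding of Rushed, and the names `FITS`, `JOBS` and `prediction_table` are illustrative, not part of the lecture.

```python
# Coefficients and S from the three regression outputs in this lecture.
FITS = {
    "First fit": {"s": 37.4612, "coefs": {
        "Constant": 77.24, "Units": -0.1507, "Ops": 7.152,
        "T_Ops": 0.11460, "Rushed": -24.94}},
    "Case 9 deleted": {"s": 13.7952, "coefs": {
        "Constant": 41.72, "Units": -0.08349, "Ops": 10.022,
        "T_Ops": 0.110016, "Rushed": -38.217}},
    "Cases 9, 11, 16 deleted": {"s": 7.41272, "coefs": {
        "Constant": 44.216, "Units": -0.06931, "Ops": 9.8286,
        "T_Ops": 0.107795, "Rushed": -37.960}},
}

# Job sizes from the homework: (Units, Ops).
JOBS = {"small": (100, 5), "medium": (300, 10), "large": (500, 15)}

def predict_jobtime(coefs, units, ops, rushed):
    """Fitted job time in hours; rushed is 1 for a rushed job, else 0."""
    t_ops = units * ops  # assumption: T_Ops = Units * Ops
    return (coefs["Constant"] + coefs["Units"] * units + coefs["Ops"] * ops
            + coefs["T_Ops"] * t_ops + coefs["Rushed"] * rushed)

def prediction_table():
    """One row per job size and type, with one prediction per fit."""
    rows = []
    for size, (units, ops) in JOBS.items():
        for rushed in (0, 1):
            preds = [predict_jobtime(fit["coefs"], units, ops, rushed)
                     for fit in FITS.values()]
            rows.append((size, "rushed" if rushed else "normal", preds))
    return rows

for size, kind, preds in prediction_table():
    print(f"{size:<8}{kind:<8}" + "".join(f"{p:>10.1f}" for p in preds))
for name, fit in FITS.items():
    print(f"{name}: 2S = {2 * fit['s']:.1f} hours")
```

Printing 2S alongside the predictions makes it easier to judge whether differences between the fits matter relative to the prediction error of each.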
37
The model fitting and testing procedure
Step 1: Initial data analysis
Step 2: Least squares fit and interpretation
Step 3: Diagnostic analysis of residuals
Step 4: Iterate fit and check
38
Step 1: Initial data analysis
standard single variable summaries
– to determine extent of variation
– possible exceptional values;
scatter plot matrix
– to view pairwise relationships between the response and the explanatory variables and
– to view pairwise relationships between the explanatory variables themselves.
39
Step 2: Least squares fit and interpretation
calculate the best fitting regression coefficients
– check meaningfulness and statistical significance;
calculate s
– check its usefulness for prediction
– its usefulness relative to alternative estimates of standard deviation.
40
Step 3: Diagnostic analysis of residuals
diagnostic plot
– check for exceptional residuals or patterns of residuals,
– possible explanations in terms of the fitted values;
Normal plot
– check for exceptional residuals or non-linear patterns in the residuals
41
Step 4: Iterate fit and check
determine cases for deletion
– repeat steps 2 and 3 until checks are passed.
42
Homework 2.2.2
You have been asked to comment, as a statistical consultant, on a prediction formula for forecasting job completion times prepared by a former employee. The formula is, effectively, the one derived from the first fit discussed above. Write a report for management. Your report should refer to
(i) the practical usefulness of the employee's prediction formula, from a customer's perspective,
(ii) the significance of the exceptional cases from the customer's and management's perspectives, and
(iii) your recommended formula, with its relative advantages.
44
t-tests
First fit

The regression equation is
Jobtime = 77.2 – 0.151 Units + 7.15 Ops + 0.115 T_Ops – 24.9 Rushed

Predictor   Coef      SE Coef   T       P
Constant    77.24     44.76     1.73    0.105
Units       –0.1507   0.1121    –1.34   0.199
Ops         7.152     4.305     1.66    0.117
T_Ops       0.11460   0.01322   8.67    0.000
Rushed      –24.94    19.11     –1.31   0.211

S = 37.4612
45
Revised fit, case 9 deleted

The regression equation is
Jobtime = 41.7 – 0.0835 Units + 10.0 Ops + 0.110 T_Ops – 38.2 Rushed

19 cases used, 1 cases contain missing values

Predictor   Coef       SE Coef    T       P
Constant    41.72      16.87      2.47    0.027
Units       -0.08349   0.04186    -1.99   0.066
Ops         10.022     1.612      6.22    0.000
T_Ops       0.110016   0.004891   22.49   0.000
Rushed      -38.217    7.166      -5.33   0.000

S = 13.7952
46
Revised fit, cases 9, 11, 16 deleted

The regression equation is
Jobtime = 44.2 – 0.0693 Units + 9.83 Ops + 0.108 T_Ops – 38.0 Rushed

17 cases used, 3 cases contain missing values

Predictor   Coef       SE Coef    T       P
Constant    44.216     9.080      4.87    0.000
Units       –0.06931   0.02853    –2.43   0.032
Ops         9.8286     0.8873     11.08   0.000
T_Ops       0.107795   0.004114   26.20   0.000
Rushed      –37.960    3.857      –9.84   0.000

S = 7.41272
47
Homework 2.2.3
Make a table of the t values and corresponding s values for the three regressions. Compare, contrast and explain.
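Each T value in the regression outputs is the coefficient divided by its standard error, so the table can be rebuilt directly from the Coef and SE Coef columns. The sketch below does this for the three fits and lists s alongside; the data structure and fit labels are illustrative choices.

```python
# (Coef, SE Coef) pairs and S, copied from the three regression outputs.
FITS = {
    "First fit": (37.4612, {
        "Constant": (77.24, 44.76), "Units": (-0.1507, 0.1121),
        "Ops": (7.152, 4.305), "T_Ops": (0.11460, 0.01322),
        "Rushed": (-24.94, 19.11)}),
    "Case 9 deleted": (13.7952, {
        "Constant": (41.72, 16.87), "Units": (-0.08349, 0.04186),
        "Ops": (10.022, 1.612), "T_Ops": (0.110016, 0.004891),
        "Rushed": (-38.217, 7.166)}),
    "Cases 9, 11, 16 deleted": (7.41272, {
        "Constant": (44.216, 9.080), "Units": (-0.06931, 0.02853),
        "Ops": (9.8286, 0.8873), "T_Ops": (0.107795, 0.004114),
        "Rushed": (-37.960, 3.857)}),
}

# t value = coefficient / standard error, for every predictor in every fit.
t_values = {
    name: {pred: coef / se for pred, (coef, se) in rows.items()}
    for name, (s, rows) in FITS.items()
}

predictors = ["Constant", "Units", "Ops", "T_Ops", "Rushed"]
print(f"{'Fit':<26}{'s':>8}" + "".join(f"{p:>10}" for p in predictors))
for name, (s, rows) in FITS.items():
    cells = "".join(f"{t_values[name][p]:>10.2f}" for p in predictors)
    print(f"{name:<26}{s:>8.2f}{cells}")
```

Reading across a row shows which coefficients are significant in that fit; reading down a column shows how the t values change as s falls.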
48
Diploma in Statistics Introduction to Regression Lecture 2.248 Introduction to Regression Lecture 2.2 1.Review of Lecture 2.1 –Homework –Multiple regression –Job times case study 2.Job times continued –residual analysis –model fitting and testing 3.Model fitting and testing procedure 4.t-tests 5.Analysis of Variance
49
Analysis of Variance

S = 7.41272   R-Sq = 99.8%   R-Sq(adj) = 99.7%

Analysis of Variance
Source           DF   SS       MS      F         P
Regression        4   299165   74791   1361.12   0.000
Residual Error   12      659      55
Total            16   299824

Residual Mean Square = s²: check!
50
Analysis of Variance
Regression Sum of Squares measures explained variation
Residual Sum of Squares measures unexplained (chance) variation
Total Variation = Explained + Unexplained
Check it!
51
Analysis of Variance
Regression Sum of Squares measures explained variation
Residual Sum of Squares measures unexplained (chance) variation
Total Variation = Explained + Unexplained
F = MS(Reg) / MS(Res) with 4 and 12 degrees of freedom.
Check it! Check F tables.
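The "check it" prompts can be worked through numerically from the printed Analysis of Variance table. The sketch below uses the rounded sums of squares as printed, so the recomputed F differs slightly from the 1361.12 in the output, which was presumably computed from unrounded values; the R-Sq formulas are the standard definitions.

```python
# Values copied from the Analysis of Variance table for the final fit.
ss_reg, df_reg = 299165, 4
ss_res, df_res = 659, 12
ss_total, df_total = 299824, 16
s = 7.41272

# Total variation = explained + unexplained.
assert ss_reg + ss_res == ss_total
assert df_reg + df_res == df_total

ms_reg = ss_reg / df_reg      # mean square for regression
ms_res = ss_res / df_res      # residual mean square, should be close to s squared
f_ratio = ms_reg / ms_res     # F with 4 and 12 degrees of freedom

r_sq = ss_reg / ss_total                              # proportion of variation explained
r_sq_adj = 1 - (ss_res / df_res) / (ss_total / df_total)  # adjusted for degrees of freedom

print(f"MS(Reg) = {ms_reg:.0f}, MS(Res) = {ms_res:.2f}, s^2 = {s**2:.2f}")
print(f"F = {f_ratio:.1f}")
print(f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
```

The remaining step, comparing F with the tabulated critical value for 4 and 12 degrees of freedom, is left to the F tables as the slide suggests.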
52
Reduction in Prediction Error
No fit prediction error: s_No fit = s_Y = 202
1st fit prediction error: s_1st fit = 37.5, less by factor of 5.4
2nd fit prediction error: s_2nd fit = 13.8, less by factor of 2.7
3rd fit prediction error: s_3rd fit = 7.4, less by factor of 1.9
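The reduction factors are ratios of successive values of s. A short check, using the rounded values quoted on this slide:

```python
# Prediction error (s) at each stage, as quoted on the slide.
stages = [("No fit", 202.0), ("1st fit", 37.5), ("2nd fit", 13.8), ("3rd fit", 7.4)]

# Each factor is the previous s divided by the current s.
factors = [prev_s / cur_s
           for (_, prev_s), (_, cur_s) in zip(stages, stages[1:])]

for (name, cur_s), factor in zip(stages[1:], factors):
    print(f"{name}: s = {cur_s}, less by factor of {factor:.1f}")

# Overall reduction from no fit to the final fit.
overall = stages[0][1] / stages[-1][1]
print(f"Overall reduction: factor of {overall:.1f}")
```

The overall factor is the product of the three stage factors, which shows how much of the total improvement comes from fitting the model at all and how much from deleting the exceptional cases.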
53
Reading
SA §§ 8.2 - 8.6, § 1.6
Extra Notes:
– Degrees of Freedom
– R² and Adjusted R² (Further Interpretation of the Correlation Coefficient)