Published by Emory Allison. Modified over 9 years ago.
1
Simple Linear Regression
2
Data available : (X, Y)
Goal : To predict the response Y (i.e., to obtain the fitted response function f(X)).
Least Squares Fitting Method
How do we determine this regression function? (We need to estimate its parameters.)
3
Least Squares Regression Function : Least Squares Estimates
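The formulas on this slide did not survive extraction. As a minimal sketch, assuming the usual closed form for a straight-line fit (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x), the estimates could be computed as follows. The function name `least_squares` and the toy data are illustrative and not taken from the slides.

```python
def least_squares(x, y):
    """Return (b0, b1) minimizing sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sum of squares of x and corrected cross-product of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / sxx            # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1


# Toy data lying exactly on the line y = 1 + 2x (illustrative only).
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
b0, b1 = least_squares(x, y)
print(b0, b1)
```

Because the toy points lie exactly on a line, the fit recovers that line's intercept and slope, which makes a convenient sanity check.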
6
How do we know that the two estimators minimize Q?
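The slide's own argument is not recoverable, so the following is a sketch of the standard calculus argument. It assumes that Q is the residual sum of squares of a straight-line fit and writes the candidate intercept and slope as b_0 and b_1, which are notational assumptions.

```latex
\begin{align*}
Q(b_0, b_1) &= \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \\
\frac{\partial Q}{\partial b_0} &= -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0 \\
\frac{\partial Q}{\partial b_1} &= -2 \sum_{i=1}^{n} x_i \,(y_i - b_0 - b_1 x_i) = 0 \\
\Rightarrow \quad b_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad b_0 = \bar{y} - b_1 \bar{x} \\
H &= 2 \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix},
\qquad \det H = 4n \sum_{i=1}^{n} (x_i - \bar{x})^2 > 0
\end{align*}
```

Since the leading entry 2n and the determinant of the Hessian H are both positive whenever the x values are not all equal, H is positive definite, so the stationary point given by the normal equations is the unique minimizer of Q.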
7
Terminology
Fitted model
True model
Fitted regression function
8
It can be shown that
13
REGRESSION ON MIDTERM GRADE

Obs   MIDTERM   FINAL
  1        68      75
  2        49      63
  3        60      57
  4        68      88
  5        97      88
  6        82      79
  7        59      82
  8        50      73
  9        73      90
 10        39      62
 11        71      70
 12        95      96
 13        61      76
 14        72      75
 15        87      85
 16        40      40
 17        66      74
 18        58      70
 19        58      75
 20        77      72

Figure 1.4 SAS PROC PRINT output for the grade data problem.
14
TITLE 'REGRESSION ON MIDTERM GRADE';
DATA;
  INPUT MIDTERM FINAL;
  CARDS;
68 75
49 63
60 57
...
77 72
;
PROC PRINT;                               /* list the raw data */
PROC REG;
  MODEL FINAL=MIDTERM / P;                /* P prints predicted values and residuals */
  OUTPUT PREDICTED=PRED RESIDUAL=RESID;
PROC PLOT;                                /* runs after PROC REG so that PRED exists */
  PLOT FINAL*MIDTERM='O' PRED*MIDTERM='P' / OVERLAY;
  LABEL FINAL='FINAL';
PROC RANK NORMAL=VW;                      /* normal scores of the residuals */
  VAR RESID;
  RANKS NSCORE;
PROC PLOT;
  PLOT RESID*NSCORE='R';
  LABEL NSCORE='NORMAL SCORE';
RUN;
15
REGRESSION ON MIDTERM GRADE

Model: MODEL1
Dependent Variable: FINAL

Analysis of Variance

                            Sum of         Mean
Source             DF      Squares       Square   F Value   Pr > F
Model               1   1774.44117   1774.44117     24.26   0.0001
Error              18   1316.55883     73.14216
Corrected Total    19   3091.00000

Root MSE          8.55232    R-Square   0.5741
Dependent Mean   74.50000    Adj R-Sq   0.5504
Coeff Var        11.47962

Parameter Estimates

                  Parameter    Standard
Variable    DF     Estimate       Error   t Value   Pr > |t|
Intercept    1     34.56757     8.32984      4.15     0.0006
MIDTERM      1      0.60049     0.12192      4.93     0.0001
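The PROC REG figures above can be cross-checked by hand. The sketch below is plain Python rather than SAS; it takes the 20 observations from the Figure 1.4 listing and recomputes the estimates, the analysis-of-variance entries, and the standard errors using the standard simple-regression formulas. The variable names are illustrative.

```python
from math import sqrt

# Grade data from the PROC PRINT listing (Figure 1.4).
midterm = [68, 49, 60, 68, 97, 82, 59, 50, 73, 39,
           71, 95, 61, 72, 87, 40, 66, 58, 58, 77]
final = [75, 63, 57, 88, 88, 79, 82, 73, 90, 62,
         70, 96, 76, 75, 85, 40, 74, 70, 75, 72]

n = len(midterm)
x_bar = sum(midterm) / n
y_bar = sum(final) / n

sxx = sum((x - x_bar) ** 2 for x in midterm)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(midterm, final))
sst = sum((y - y_bar) ** 2 for y in final)   # corrected total sum of squares

b1 = sxy / sxx               # slope (MIDTERM row)
b0 = y_bar - b1 * x_bar      # intercept

ssr = b1 * sxy               # model sum of squares (1 df)
sse = sst - ssr              # error sum of squares (n - 2 df)
mse = sse / (n - 2)          # error mean square
f_value = ssr / mse          # model mean square equals ssr because it has 1 df
root_mse = sqrt(mse)

se_b1 = sqrt(mse / sxx)                          # standard error of the slope
se_b0 = sqrt(mse * (1 / n + x_bar ** 2 / sxx))   # standard error of the intercept

print(f"b0={b0:.5f} b1={b1:.5f} SSR={ssr:.5f} SSE={sse:.5f} F={f_value:.2f}")
print(f"se(b0)={se_b0:.5f} se(b1)={se_b1:.5f} t0={b0 / se_b0:.2f} t1={b1 / se_b1:.2f}")
```

Dividing each estimate by its standard error gives the t values in the Parameter Estimates table, and in simple regression the F value is the square of the slope's t value.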
16
       Dep Var   Predicted
Obs      FINAL       Value    Residual
  1    75.0000     75.4007     -0.4007
  2    63.0000     63.9915     -0.9915
  3    57.0000     70.5968    -13.5968
  4    88.0000     75.4007     12.5993
  5    88.0000     92.8149     -4.8149
  6    79.0000     83.8076     -4.8076
  7    82.0000     69.9963     12.0037
  8    73.0000     64.5920      8.4080
  9    90.0000     78.4032     11.5968
 10    62.0000     57.9866      4.0134
 11    70.0000     77.2022     -7.2022
 12    96.0000     91.6139      4.3861
 13    76.0000     71.1973      4.8027
 14    75.0000     77.8027     -2.8027
 15    85.0000     86.8100     -1.8100
 16    40.0000     58.5871    -18.5871
 17    74.0000     74.1998     -0.1998
 18    70.0000     69.3959      0.6041
 19    75.0000     69.3959      5.6041
 20    72.0000     80.8051     -8.8051

Sum of Residuals                          0
Sum of Squared Residuals         1316.55883
Predicted Residual SS (PRESS)    1668.47241
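The predicted values, residuals, and PRESS statistic in this listing can be reproduced in a few lines. The sketch below is Python rather than SAS and assumes the usual leverage shortcut for PRESS in simple regression: each residual is divided by one minus its leverage h_i = 1/n + (x_i - x̄)² / Sxx before squaring. The slides do not state this formula, so treat it as an assumption.

```python
# Grade data from the PROC PRINT listing (Figure 1.4).
midterm = [68, 49, 60, 68, 97, 82, 59, 50, 73, 39,
           71, 95, 61, 72, 87, 40, 66, 58, 58, 77]
final = [75, 63, 57, 88, 88, 79, 82, 73, 90, 62,
         70, 96, 76, 75, 85, 40, 74, 70, 75, 72]

n = len(midterm)
x_bar = sum(midterm) / n
y_bar = sum(final) / n
sxx = sum((x - x_bar) ** 2 for x in midterm)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(midterm, final))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in midterm]            # "Predicted Value" column
resid = [y - f for y, f in zip(final, fitted)]     # "Residual" column

# Leverage of each observation in a straight-line fit.
leverage = [1 / n + (x - x_bar) ** 2 / sxx for x in midterm]

sse = sum(e ** 2 for e in resid)
# Leave-one-out (deleted) residuals via the leverage shortcut.
press = sum((e / (1 - h)) ** 2 for e, h in zip(resid, leverage))

print(f"sum of residuals = {sum(resid):.6f}")
print(f"SSE = {sse:.5f}  PRESS = {press:.5f}")
```

PRESS exceeds the ordinary sum of squared residuals here because observations far from the mean midterm score, such as observation 16, have both high leverage and a large residual.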
17
[Line-printer scatter plot: FINAL (vertical axis, 40 to 100) against MIDTERM (horizontal axis, 30 to 100); 'o' marks observed values and 'p' marks predicted values. NOTE: 6 obs hidden.]
Figure 1.6 Output for the first PROC PLOT step for the grade data problem.
18
[Line-printer scatter plot: Residual (vertical axis, -20 to 20, reference line at 0) against Predicted Value of FINAL (horizontal axis, 55 to 95); points marked 'R'.]
Figure 1.7 The remainder of the output from the first PROC PLOT step.
19
[Line-printer scatter plot: Residual (vertical axis, -20 to 20) against NORMAL SCORE (horizontal axis, -2.0 to 2.0); points marked 'R'.]
20
* Confidence Interval
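The interval formula on this slide was lost, and it is not clear which interval the slide covered. As one plausible illustration, the sketch below builds a two-sided 95% confidence interval for the slope from the estimate and standard error reported by PROC REG for the grade data. The choice of the slope, the 95% level, and the tabulated t quantile are assumptions made for this example.

```python
# Estimate and standard error of the MIDTERM coefficient, as reported
# in the Parameter Estimates table for the grade data.
estimate = 0.60049
std_error = 0.12192

# Assumption: two-sided 95% interval. 2.101 is the tabulated 0.975
# quantile of Student's t with n - 2 = 18 degrees of freedom.
t_crit = 2.101

half_width = t_crit * std_error
lower = estimate - half_width
upper = estimate + half_width
print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")
```

The interval runs from roughly 0.34 to 0.86 and excludes zero, which agrees with the small p-value reported for the MIDTERM coefficient.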
21
* Pearson's Correlation Coefficient
Goal : To measure the degree of linear correlation between two variables.
The range lies between -1 and 1.
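The defining formula is missing from the slide. A minimal sketch, assuming the usual definition r = Sxy / sqrt(Sxx · Syy), is shown below and applied to the grade data from Figure 1.4. The function name `pearson_r` is illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("neither variable may be constant")
    return sxy / sqrt(sxx * syy)


# Grade data from the PROC PRINT listing (Figure 1.4).
midterm = [68, 49, 60, 68, 97, 82, 59, 50, 73, 39,
           71, 95, 61, 72, 87, 40, 66, 58, 58, 77]
final = [75, 63, 57, 88, 88, 79, 82, 73, 90, 62,
         70, 96, 76, 75, 85, 40, 74, 70, 75, 72]

r = pearson_r(midterm, final)
print(f"r = {r:.4f}, r squared = {r * r:.4f}")
```

For a straight-line fit with one predictor, the square of r equals the R-Square value that PROC REG reports for the same data.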
22
* Coefficient of Determination : the fraction of the variance in y that is explained by regression on x.
Definition : R² = (model sum of squares) / (corrected total sum of squares)
Goal : It may be used as an index of linearity for the relation of y to x.
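As a worked illustration, the R-Square and Adj R-Sq values printed by PROC REG for the grade data can be recovered from the sums of squares in its analysis-of-variance table. The sketch assumes the standard definitions, including the degrees-of-freedom adjustment for the adjusted version.

```python
# Sums of squares from the Analysis of Variance table for the grade data.
ss_model = 1774.44117
ss_error = 1316.55883
ss_total = 3091.0      # corrected total

n = 20                 # observations
p = 1                  # predictors (MIDTERM only)

r_square = ss_model / ss_total
r_square_alt = 1 - ss_error / ss_total      # same quantity, written via the error SS

# Adjusted R-square compares mean squares instead of sums of squares.
adj_r_square = 1 - (ss_error / (n - p - 1)) / (ss_total / (n - 1))

print(f"R-Square = {r_square:.4f}, Adj R-Sq = {adj_r_square:.4f}")
```

The adjusted value is smaller than the unadjusted one because it charges one degree of freedom for the fitted slope.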
23
[Line-printer scatter plot: pressure (vertical axis, 20 to 120) against VOLUME (horizontal axis, 10 to 50); points marked 'o'.]
Figure 3.3: A plot of the air pressure data (an example of residual analysis).
24
[Line-printer scatter plot: Residual (vertical axis, -10 to 30, reference line at 0) against Predicted Value of P (horizontal axis, 16.357 to 94.210); points marked '*'.]
Figure 3.4 The residual on fit plot after fitting the model P = a + bV + e to the air pressure data.
25
[Line-printer scatter plot: Residual (vertical axis, -0.75 to 0.50, reference line at 0) against Predicted Value of P (horizontal axis, 20 to 120); points marked '*'.]
Figure 3.5 The residual on fit plot using the model P = a + b/V + e for the air pressure data.
26
Weighted Regression
Problem : (unequal variance)
Model :
Claim : minimize
Ordinary Regression
Model :
Claim : minimize
27
How do we determine the weights?
So the optimal weights are inversely proportional to the variances of the y values.
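The weighted criterion itself is missing from the slides. Assuming it is the weighted residual sum of squares, sum of w_i (y_i - b0 - b1 x_i)², the minimizing estimates have the same closed form as ordinary least squares with weighted means in place of plain means. The sketch below uses weights equal to 1/variance; the function name, the data, and the variances are hypothetical.

```python
def weighted_least_squares(x, y, w):
    """Return (b0, b1) minimizing sum(w_i * (y_i - b0 - b1 * x_i) ** 2)."""
    if not (len(x) == len(y) == len(w)) or len(x) < 2:
        raise ValueError("x, y and w must have the same length (at least 2)")
    if any(wi <= 0 for wi in w):
        raise ValueError("weights must be positive")
    sw = sum(w)
    # Weighted means of x and y.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data: the last two responses are assumed to be noisier.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
variances = [1.0, 1.0, 4.0, 4.0]          # assumed known, for illustration
weights = [1.0 / v for v in variances]    # weights inversely proportional to variance

b0_w, b1_w = weighted_least_squares(x, y, weights)
b0_o, b1_o = weighted_least_squares(x, y, [1.0] * len(x))   # equal weights = ordinary fit
print("weighted:", b0_w, b1_w)
print("ordinary:", b0_o, b1_o)
```

Only the relative sizes of the weights matter: multiplying every weight by the same positive constant leaves the estimates unchanged, which is why "inversely proportional to the variances" is enough to pin the weights down.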
28
DATA;
  INPUT V P;
  VI=1/V;
  CARDS;
48 29.1
...
12 117.6
;
PROC REG;                  /* ordinary fit, used only to build the weights */
  MODEL P=VI;
  OUTPUT P=LSFIT;
DATA;
  SET;
  W=1/LSFIT;               /* weights from the ordinary fitted values */
PROC REG;                  /* weighted fit */
  MODEL P=VI;
  WEIGHT W;
  OUTPUT P=FIT R=RES;
DATA;
  SET;
  WRES=SQRT(W)*RES;        /* weighted residuals */
PROC RANK NORMAL=VW;
  VAR WRES;
  RANKS NSCORE;
PROC PLOT;
  PLOT WRES*FIT='*' / VREF=0 VPOS=30;
  PLOT WRES*NSCORE='*' / VPOS=30;
  LABEL WRES='WEIGHTED RESIDUAL' NSCORE='NORMAL SCORE';
RUN;
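The two-stage logic of this program (ordinary fit, weights W = 1/LSFIT, weighted refit, weighted residuals WRES = SQRT(W)*RES) can be sketched in Python. Most of the air pressure data is elided in the listing, so the four middle (V, P) pairs below are made up for illustration only; only the first and last pairs appear in the source. The helper `wls` is likewise hypothetical.

```python
from math import sqrt


def wls(x, y, w):
    """Closed-form weighted straight-line fit; returns (intercept, slope)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Illustration only: the middle four pairs are invented, not source data.
volume = [48.0, 40.0, 32.0, 24.0, 16.0, 12.0]
pressure = [29.1, 35.2, 43.9, 58.6, 87.9, 117.6]

vi = [1.0 / v for v in volume]                  # VI = 1/V

# Stage 1: ordinary least squares (all weights equal) to obtain LSFIT.
a_ls, b_ls = wls(vi, pressure, [1.0] * len(vi))
lsfit = [a_ls + b_ls * x for x in vi]

# Stage 2: weights W = 1/LSFIT, then the weighted fit.
w = [1.0 / f for f in lsfit]
a_w, b_w = wls(vi, pressure, w)
fit = [a_w + b_w * x for x in vi]
res = [p - f for p, f in zip(pressure, fit)]

# Weighted residuals, as in WRES = SQRT(W)*RES.
wres = [sqrt(wi) * ri for wi, ri in zip(w, res)]

for f, r in zip(fit, wres):
    print(f"{f:10.4f} {r:10.5f}")
```

Plotting `wres` against `fit` corresponds to the first PLOT statement; a roughly even vertical spread across the fitted values would suggest that the chosen weights have stabilized the variance.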
29
[Line-printer scatter plot: WEIGHTED RESIDUAL (vertical axis, -0.075 to 0.050, reference line at 0) against Predicted Value of P (horizontal axis, 20 to 120); points marked '*'.]
Figure 3.13 Weighted residual plot for a weighted fit of the model P = a + b/V + e to the air pressure data.
30
[Line-printer scatter plot: Residual (vertical axis, -0.0003 to 0.0002, reference line at 0) against Predicted Value of PT (horizontal axis, -0.034 to -0.009); points marked '*'.]
Figure 3.17 Residual on fit plot for the model -1/P = α + βV + e in the air pressure data.
31
[Line-printer scatter plot: Residual (vertical axis, -0.0003 to 0.0002) against NORMAL SCORE (horizontal axis, -2 to 2); points marked '*'.]
Figure 3.18 Residual normal probability plot for the model -1/P = α + βV + e in the air pressure data.
32
[Line-printer scatter plot: Residual (vertical axis, -0.00015 to 0.0001, reference line at 0) against Predicted Value of PT (horizontal axis, -0.033 to -0.007); points marked '*'.]
Figure 3.19 Residual on fit plot for the model -1/P = α + βV + e in Example 3.4 after deleting the first data point.
33
[Line-printer scatter plot: Residual (vertical axis, -0.00015 to 0.0001) against NORMAL SCORE (horizontal axis, -2 to 2); points marked '*'.]
Figure 3.20 Residual normal probability plot for the model -1/P = α + βV + e in Example 3.4 after deleting the first data point.
34
How to determine the weights of transformation T such that (assuming T is monotonic increasing)