Simple Linear Regression
Data available: (X, Y)
Goal: to predict the response Y (i.e., to obtain the fitted response function f(X)).

Least Squares Fitting Method
How do we determine this regression function? (We need to estimate the parameters.)
Least Squares Regression Function : Least Squares Estimates
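The closed-form least squares estimates for a straight-line fit can be sketched as follows. This is a minimal illustration that assumes the fitted function has the form f(X) = b0 + b1 X; the function name `least_squares_fit` and the data values are hypothetical and are not the grade data used later in these slides.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) minimizing sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sum of squares of x and corrected cross product of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx            # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1


# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = least_squares_fit(x, y)

# Residuals from the fitted line; with an intercept in the model they sum to zero.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

Because the hypothetical points lie exactly on a line, the fit recovers that line and every residual is zero. With real data the residuals are nonzero but still sum to zero, which is what the "Sum of Residuals" entry in regression output reports.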
How do we know that the two estimators minimize Q?
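One way to answer this, assuming Q denotes the sum of squared deviations for a straight-line model (the symbols b_0, b_1 and H are notation chosen here for illustration), is to set the first derivatives to zero and then check the second-order condition:

```latex
\begin{align*}
Q(b_0, b_1) &= \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2, \\
\frac{\partial Q}{\partial b_0} &= -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0, \\
\frac{\partial Q}{\partial b_1} &= -2 \sum_{i=1}^{n} x_i \,(y_i - b_0 - b_1 x_i) = 0, \\
\hat{b}_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad \hat{b}_0 = \bar{y} - \hat{b}_1 \bar{x}, \\
H &= 2 \begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix},
\qquad \det H = 4n \sum_{i=1}^{n} (x_i - \bar{x})^2 > 0 .
\end{align*}
```

Since the leading entry 2n is positive and det H is positive whenever the x values are not all equal, H is positive definite. Q is a quadratic function of (b_0, b_1), so the unique stationary point is its global minimum.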
Terminology: fitted model, true model, fitted regression function.
It can be shown that
Figure 1.4 SAS PROC PRINT output for the grade data problem (title: REGRESSION ON MIDTERM GRADE; columns: Obs, MIDTERM, FINAL).
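TITLE 'REGRESSION ON MIDTERM GRADE';
* Read the grade data (data lines omitted);
DATA;
  INPUT MIDTERM FINAL;
  CARDS;
;
PROC PRINT;
* Fit FINAL on MIDTERM and save predicted values and residuals;
PROC REG;
  MODEL FINAL=MIDTERM / P;
  OUTPUT PREDICTED=PRED RESIDUAL=RESID;
* Overlay observed (O) and predicted (P) values against MIDTERM;
PROC PLOT;
  PLOT FINAL*MIDTERM='O' PRED*MIDTERM='P' / OVERLAY;
  LABEL FINAL='FINAL';
* Normal scores of the residuals for a normal probability plot;
PROC RANK NORMAL=VW;
  VAR RESID;
  RANKS NSCORE;
PROC PLOT;
  PLOT RESID*NSCORE='R';
  LABEL NSCORE='NORMAL SCORE';
RUN;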
REGRESSION ON MIDTERM GRADE
Model: MODEL1
Dependent Variable: FINAL

Analysis of Variance
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model
Error
Corrected Total

Root MSE           R-Square
Dependent Mean     Adj R-Sq
Coeff Var

Parameter Estimates
Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept
MIDTERM

Obs    Dep Var FINAL    Predicted Value    Residual

Sum of Residuals                  0
Sum of Squared Residuals
Predicted Residual SS (PRESS)
Figure 1.6 Output for the first PROC PLOT step for the grade data problem: FINAL (o) and the predicted value (p) plotted against MIDTERM. NOTE: 6 obs hidden.
Figure 1.7 The remainder of the output from the first PROC PLOT step: residuals (R) plotted against the predicted value of FINAL.
Normal probability plot for the grade data problem: residuals (R) plotted against NORMAL SCORE.
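The normal scores on the horizontal axis come from PROC RANK with NORMAL=VW, which computes van der Waerden scores: the standard normal quantile of r/(n+1), where r is the rank of each residual among the n residuals. A minimal sketch of that calculation follows. It assumes there are no tied values; the function name `van_der_waerden_scores` and the residual values are hypothetical.

```python
from statistics import NormalDist


def van_der_waerden_scores(values):
    """Return Phi^{-1}(r_i / (n + 1)) for each value, where r_i is its rank.

    Assumes no ties; tied values would need an explicit tie-handling rule.
    """
    n = len(values)
    # Indices of the values in ascending order.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


# Hypothetical residuals, not taken from the grade data.
resid = [-4.0, 1.5, 0.2, 6.3, -2.1]
scores = van_der_waerden_scores(resid)
```

If the residuals are roughly normal, plotting them against these scores gives points close to a straight line; systematic curvature suggests skewness or heavy tails.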
* Confidence Interval
* Pearson's Correlation Coefficient
Goal: to measure the degree of linear correlation between two variables.
Its value lies between -1 and 1.
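As an illustration, the sample correlation coefficient can be computed from the corrected sums of squares and cross products. This is a minimal sketch using the standard sample formula; the function name `pearson_r` and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with a positive but imperfect linear association.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x, y)

# Exact increasing and decreasing lines give the two extreme values.
r_up = pearson_r(x, [2.0 * xi + 1.0 for xi in x])
r_down = pearson_r(x, [-3.0 * xi for xi in x])
```

The extremes -1 and 1 are reached only when the points fall exactly on a line with negative or positive slope, respectively.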
* Coefficient of Determination: the fraction of the variance in y that is explained by regression on x.
Definition:
Goal: it may be used as an index of linearity for the relation of y to x.
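For reference, the usual textbook definition can be written as below. The notation (SST, SSR, SSE, and r for Pearson's correlation coefficient) is chosen here for illustration and may differ from the notation used elsewhere in the course.

```latex
\begin{align*}
SST &= \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \\
R^2 &= \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad 0 \le R^2 \le 1 .
\end{align*}
```

For a least squares straight-line fit with an intercept, SST = SSR + SSE and R^2 equals r^2, the square of Pearson's correlation coefficient between x and y.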
Figure 3.3: A plot of the air pressure data, pressure (o) against VOLUME (an example of residual analysis).
Figure 3.4 The residual on fit plot after fitting the model P = a + bV + e to the air pressure data (residual versus predicted value of P).
Figure 3.5 The residual on fit plot using the model P = a + b/V + e for the air pressure data (residual versus predicted value of P).
Weighted Regression
Problem: unequal variance
Model:
Claim: minimize

Ordinary Regression
Model:
Claim: minimize
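The two criteria can be written side by side as follows. This is the standard formulation for a straight-line model; the symbols \beta_0, \beta_1, \sigma_i^2 and w_i are notation assumed here for illustration.

```latex
\begin{align*}
\text{Weighted:} \quad & y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad
\operatorname{Var}(\varepsilon_i) = \sigma_i^2, \quad
\text{minimize } \sum_{i=1}^{n} w_i \,(y_i - \beta_0 - \beta_1 x_i)^2, \\
\text{Ordinary:} \quad & y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad
\operatorname{Var}(\varepsilon_i) = \sigma^2, \quad
\text{minimize } \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 .
\end{align*}
```

Ordinary regression is the special case in which all weights w_i are equal.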
How do we determine the weights? The optimal weights are inversely proportional to the variances of the y values.
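A minimal sketch of a weighted straight-line fit with weights set to the reciprocal of each observation's variance is shown below. The function name `weighted_least_squares_fit`, the data, and the variances are all hypothetical; in practice the variances are rarely known and must be estimated or modeled.

```python
def weighted_least_squares_fit(x, y, w):
    """Return (b0, b1) minimizing sum(w_i * (y_i - b0 - b1 * x_i) ** 2)."""
    if not (len(x) == len(y) == len(w)) or len(x) < 2:
        raise ValueError("x, y and w must have the same length (at least 2)")
    if any(wi <= 0 for wi in w):
        raise ValueError("weights must be positive")
    sw = sum(w)
    # Weighted means of x and y.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted corrected sum of squares and cross products.
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data in which the variance of y grows with x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
variances = [1.0, 4.0, 9.0, 16.0]       # assumed known for illustration
w = [1.0 / v for v in variances]        # weights inversely proportional to variance
b0, b1 = weighted_least_squares_fit(x, y, w)

# Only the relative sizes of the weights matter: rescaling leaves the fit unchanged.
b0_scaled, b1_scaled = weighted_least_squares_fit(x, y, [10.0 * wi for wi in w])
```

Because the estimates depend only on the ratios of the weights, it is enough to know the variances up to a common constant factor.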
* Read the air pressure data (data lines omitted) and form 1/V;
DATA;
  INPUT V P;
  VI=1/V;
  CARDS;
;
* Unweighted fit, used only to obtain fitted values for the weights;
PROC REG;
  MODEL P=VI;
  OUTPUT P=LSFIT;
DATA;
  SET;
  W=1/LSFIT;
* Weighted fit of P on 1/V;
PROC REG;
  MODEL P=VI;
  WEIGHT W;
  OUTPUT P=FIT R=RES;
DATA;
  SET;
  WRES=SQRT(W)*RES;
PROC RANK NORMAL=VW;
  VAR WRES;
  RANKS NSCORE;
PROC PLOT;
  PLOT WRES*FIT='*' / VREF=0 VPOS=30;
  PLOT WRES*NSCORE='*' / VPOS=30;
  LABEL WRES='WEIGHTED RESIDUAL' NSCORE='NORMAL SCORE';
RUN;
Figure 3.13 Weighted residual plot for a weighted fit of the model P = a + b/V + e to the air pressure data (weighted residual versus predicted value of P).
Figure 3.17 Residual on fit plot for the model -1/P = α + βV + e for the air pressure data (residual versus predicted value of PT).
Figure 3.18 Residual normal probability plot for the model -1/P = α + βV + e for the air pressure data (residual versus NORMAL SCORE).
Figure 3.19 Residual on fit plot for the model -1/P = α + βV + e in Example 3.4 after deleting the first data point (residual versus predicted value of PT).
Figure 3.20 Residual normal probability plot for the model -1/P = α + βV + e in Example 3.4 after deleting the first data point (residual versus NORMAL SCORE).
How do we determine the weights for a transformation T such that (assuming T is monotonically increasing)
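One standard way to approach this question is a first-order Taylor (delta method) approximation. The sketch below assumes that y_i has mean \mu_i and variance \sigma_i^2 and that T is differentiable; this notation is introduced here for illustration and is not necessarily the argument used in the course.

```latex
\begin{align*}
T(y_i) &\approx T(\mu_i) + T'(\mu_i)\,(y_i - \mu_i), \\
\operatorname{Var}\bigl(T(y_i)\bigr) &\approx \bigl[T'(\mu_i)\bigr]^2 \sigma_i^2, \\
w_i &\propto \frac{1}{\bigl[T'(\mu_i)\bigr]^2 \sigma_i^2} .
\end{align*}
```

Under this approximation, the weights for a regression on the transformed response T(y) are inversely proportional to the approximate variance of T(y_i). If the standard deviation is a known function \sigma(\mu) of the mean, choosing T with T'(\mu) \propto 1/\sigma(\mu) makes the approximate variance constant, so that equal weights can be used. For instance, if \sigma(\mu) were proportional to \mu^2, this would give T(\mu) = -1/\mu, which is increasing for \mu > 0.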