1
Correlation and Regression-III
QSCI 381 – Lecture 38 (Larson and Farber, Sect 9.3)
2
Overview
In the last two lectures, you have learnt how to identify whether two variables are correlated, to determine the best fit line that represents the data, and to use this line to make predictions. Today, we focus on representing the uncertainty associated with those predictions.
3
Deviations-I
4
Deviations-II
The total variation about a regression line is the sum of the squares of the differences between the y-value of each ordered pair and the mean of y:
total variation = Σ(y_i − ȳ)^2
The explained variation is the sum of the squares of the differences between each predicted y-value and the mean of y:
explained variation = Σ(ŷ_i − ȳ)^2
5
Deviations-III
The unexplained variation is the sum of the squares of the differences between the y-value of each ordered pair and the corresponding predicted y-value:
unexplained variation = Σ(y_i − ŷ_i)^2
The sum of the explained and unexplained variations is equal to the total variation.
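The identity on this slide can be checked numerically. The sketch below is illustrative only: the five data points and the helper names (`fit_line`, `variations`) are invented for the example and do not come from the lecture. It fits a least-squares line and computes the three sums of squares.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = (sy - m * sx) / n
    return m, b


def variations(xs, ys):
    """Return (total, explained, unexplained) variation about the fitted line."""
    m, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    y_hat = [m * x + b for x in xs]
    total = sum((y - y_bar) ** 2 for y in ys)
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)
    unexplained = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
    return total, explained, unexplained


# Hypothetical data, chosen only to illustrate the decomposition.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

total, explained, unexplained = variations(xs, ys)
print(f"total       = {total:.4f}")
print(f"explained   = {explained:.4f}")
print(f"unexplained = {unexplained:.4f}")
print(f"explained + unexplained = {explained + unexplained:.4f}")
```

The decomposition holds exactly only when the predicted values come from the least-squares line; for an arbitrary line the two parts need not add up to the total.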
6
The Coefficient of Determination-I
The square of the correlation coefficient is called the coefficient of determination. The coefficient of determination is also equal to the ratio of the explained variation to the total variation, i.e.:
r^2 = (explained variation) / (total variation)
7
The Coefficient of Determination-II (Example)
Compute the fraction of the variation that is unexplained if the correlation coefficient is 0.7. The coefficient of determination is r^2 = 0.7 × 0.7 = 0.49. This implies that 49% of the total variation is explained. The percentage of the variation that is unexplained is therefore 51%.
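The same arithmetic as a minimal sketch. The function names are hypothetical and only wrap the two relationships stated on the slides: the explained fraction is r^2 and the unexplained fraction is 1 − r^2.

```python
def coefficient_of_determination(r):
    """Fraction of the total variation explained by the regression line."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return r ** 2


def unexplained_fraction(r):
    """Fraction of the total variation left unexplained."""
    return 1.0 - coefficient_of_determination(r)


r = 0.7
print(f"explained:   {coefficient_of_determination(r):.0%}")
print(f"unexplained: {unexplained_fraction(r):.0%}")
```

Because r is squared, a correlation of −0.7 gives the same explained and unexplained fractions as +0.7.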
8
Quantifying the Uncertainty of a Prediction-I (The Standard Error of Estimate)
The standard error of estimate, s_e, is the standard deviation of the observed y-values about the predicted y-value for a given x-value:
s_e = sqrt( Σ(y_i − ŷ_i)^2 / (n − 2) )
The standard error of estimate is also known as the residual standard deviation.
9
Standard Error of Estimate (Blue warehou example)
X       Y       XY       X^2      Y^2      Y-hat   (Y − Y-hat)^2
3.13    5.51    17.25    9.79     30.39    5.53    0.0004
3.39    6.29    21.33    11.49    39.60    6.32    0.0007
3.38    6.39    21.62    11.46    40.78    6.30    0.0068
2.71    4.21    11.39    7.34     17.69    4.27    0.0038
3.95    7.99    31.54    15.59    63.82    8.00    0.0002
3.83    7.62    29.17    14.65    58.08    7.64    --
3.44    6.58    22.66    11.84    43.35    6.47    0.0124
--      7.94    31.40    15.63    63.10    8.02    0.0053
3.60    7.01    25.20    12.95    49.07    6.95    0.0036
3.63    --      25.45    13.20    --       7.05    0.0020
Total:
35.01   66.54   237.01   123.93   454.94   --      0.0354
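As a rough check, the standard error of estimate for this example can be computed from the total of the squared-residual column (0.0354) and n = 10 ordered pairs. The helper name `standard_error_of_estimate` is an assumption made for this sketch.

```python
import math


def standard_error_of_estimate(sum_sq_residuals, n):
    """s_e = sqrt( sum of squared residuals / (n - 2) )."""
    if n <= 2:
        raise ValueError("need more than two ordered pairs")
    return math.sqrt(sum_sq_residuals / (n - 2))


# Values taken from the table: total of the last column, and ten rows.
s_e = standard_error_of_estimate(0.0354, 10)
print(f"s_e = {s_e:.4f}")
```

The divisor is n − 2 rather than n − 1 because two parameters (slope and intercept) were estimated from the data.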
10
Quantifying the Uncertainty of a Prediction-II (Constructing Prediction Intervals)
Given a linear regression equation ŷ = mx + b and x_0, a specific value of x, a c-prediction interval for y is:
ŷ − E < y < ŷ + E
where
E = t_c s_e sqrt( 1 + 1/n + n(x_0 − x̄)^2 / (nΣx^2 − (Σx)^2) )
The point estimate is ŷ and the maximum error of estimate is E. The probability that the prediction interval contains y is c.
11
Quantifying the Uncertainty of a Prediction-III (Constructing a prediction interval for a specific value of x)
1. Identify n and the degrees of freedom (d.f. = n − 2).
2. Fit the regression line and compute the point estimate ŷ at x_0.
3. Find the critical value t_c that corresponds to the given level of confidence c.
4. Find the standard error of estimate s_e.
5. Find the maximum error of estimate E.
6. Construct the prediction interval.
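The steps above can be sketched as a single function. This is an illustration under stated assumptions, not the lecture's own code: the function name and the five data points are hypothetical, and the critical value t_c is passed in by hand (3.182 is the two-tailed 95% value for 3 degrees of freedom from a t-table) so that only the standard library is needed.

```python
import math


def prediction_interval(xs, ys, x0, t_c):
    """Return (lower, y_hat, upper) for a new observation at x0."""
    n = len(xs)  # step 1: d.f. = n - 2, used when looking up t_c
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx ** 2

    # Step 2: fit the least-squares line and compute the point estimate.
    m = (n * sxy - sx * sy) / denom
    b = (sy - m * sx) / n
    y_hat = m * x0 + b

    # Step 4: standard error of estimate from the residuals.
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))

    # Step 5: maximum error of estimate E.
    x_bar = sx / n
    E = t_c * s_e * math.sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / denom)

    # Step 6: the interval is y_hat - E < y < y_hat + E.
    return y_hat - E, y_hat, y_hat + E


# Hypothetical data: n = 5, so d.f. = 3 and t_c = 3.182 for c = 0.95 (step 3).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
lower, y_hat, upper = prediction_interval(xs, ys, x0=3.5, t_c=3.182)
print(f"{lower:.2f} < y < {upper:.2f} (point estimate {y_hat:.2f})")
```

Passing t_c as an argument keeps the lookup step explicit; with a statistics library available, the critical value could instead be computed from c and the degrees of freedom.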
12
Example (Find the 95% prediction interval for the log-weight of a 50 cm blue warehou)
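n = 10, so d.f. = 8. The regression line gives the point estimate ŷ at x_0, the logarithm of 50 cm. The value of t_c for c = 0.95 and d.f. = 8 is 2.306. The standard error of estimate is s_e = sqrt(0.0354 / 8) ≈ 0.067, using the sum of squared residuals from the blue warehou table. The maximum error of estimate E then follows from t_c, s_e and x_0, and the prediction interval is therefore ŷ − E < y < ŷ + E.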
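The numbers for this example can be approximated from the column totals in the blue warehou table. Treat the sketch below as a rough reconstruction rather than the slide's own figures: it assumes X is the natural logarithm of length in cm (so x_0 = ln 50), it takes t_c = 2.306 from a t-table, and because the totals are rounded to two decimals the fitted slope and intercept may differ slightly from those used in the lecture.

```python
import math

# Column totals from the blue warehou table (rounded values).
n = 10
sum_x, sum_y = 35.01, 66.54
sum_xy, sum_x2 = 237.01, 123.93
sse = 0.0354                 # total of the squared-residual column

t_c = 2.306                  # t-table value for c = 0.95, d.f. = 8
x0 = math.log(50.0)          # ASSUMPTION: X is ln(length in cm)

# Least-squares line from the totals.
denom = n * sum_x2 - sum_x ** 2
m = (n * sum_xy - sum_x * sum_y) / denom
b = (sum_y - m * sum_x) / n
y_hat = m * x0 + b

# Standard error of estimate and maximum error of estimate.
s_e = math.sqrt(sse / (n - 2))
x_bar = sum_x / n
E = t_c * s_e * math.sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / denom)

print(f"regression line: y-hat = {m:.3f} x + ({b:.3f})")
print(f"point estimate at x0 = {x0:.3f}: {y_hat:.3f}")
print(f"s_e = {s_e:.4f}, E = {E:.3f}")
print(f"95% prediction interval: {y_hat - E:.2f} < y < {y_hat + E:.2f}")
```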
13
Uncertainty and Extrapolation-I
[Figure: 99% prediction intervals for predicted log-weight; the region beyond the data is labelled "Extrapolation".]
14
Uncertainty and Extrapolation-II
The uncertainty is smallest for a prediction made at the mean of the x-values and grows as predictions are made further from the mean. Beware of making predictions beyond the data!
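The widening of the interval comes from the (x_0 − x̄)^2 term in the maximum error of estimate. The sketch below evaluates E at three x_0 values using the column totals from the blue warehou table; t_c = 3.355 is the t-table value for a 99% interval with 8 degrees of freedom, the function name `max_error` is hypothetical, and x_0 = 5.0 is an arbitrary point chosen to lie beyond the largest X in the table (3.95).

```python
import math

# Column totals from the blue warehou table (rounded values).
n = 10
sum_x, sum_x2 = 35.01, 123.93
sse = 0.0354
t_c = 3.355  # t-table value for c = 0.99, d.f. = 8

x_bar = sum_x / n
s_e = math.sqrt(sse / (n - 2))
denom = n * sum_x2 - sum_x ** 2


def max_error(x0):
    """Half-width E of the prediction interval at x0."""
    return t_c * s_e * math.sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / denom)


# At the mean, at the edge of the data, and well outside the data.
for x0 in (x_bar, 3.95, 5.0):
    print(f"x0 = {x0:.3f}  ->  E = {max_error(x0):.3f}")
```

The formula keeps producing an interval outside the data range, but it assumes the fitted line still applies there, which the data cannot confirm.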
15
Final Warning
All of the methods in this part of the course are based on the assumption of a linear relationship between the variables. This will not always be the case. Ways to allow for non-linearity exist. For example, instead of including only x as an independent variable, consider both x and x^2.
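One way to include both x and x^2 is to fit y = b0 + b1 x + b2 x^2 by least squares. The sketch below is an assumption about how this could be done, not a method given in the lecture: it builds the normal equations for the three columns (1, x, x^2) and solves them by Gaussian elimination, with hypothetical helper names and made-up data that lie exactly on a known quadratic.

```python
def solve_linear_system(a, b):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (aug[r][n] - tail) / aug[r][r]
    return coef


def fit_quadratic(xs, ys):
    """Least-squares coefficients (b0, b1, b2) for y = b0 + b1*x + b2*x^2."""
    cols = [[1.0] * len(xs), list(xs), [x * x for x in xs]]
    # Normal equations: (A^T A) coef = A^T y, with A's columns = cols.
    ata = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    aty = [sum(p * y for p, y in zip(ci, ys)) for ci in cols]
    return solve_linear_system(ata, aty)


# Hypothetical data generated from y = 1 + 2x + 0.5x^2 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 0.5 * x * x for x in xs]

b0, b1, b2 = fit_quadratic(xs, ys)
print(f"y-hat = {b0:.3f} + {b1:.3f} x + {b2:.3f} x^2")
```

The model is still linear in its coefficients, which is why the same least-squares machinery applies even though the fitted curve is not a straight line.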