Published by Scot Mason. Modified over 9 years ago.
1
Regression
Regression is about fitting a line or curve to bivariate data to predict the value of a variable y based on the value of an independent variable x.
Regression relationship = trend + scatter
Observed value = predicted value + prediction error
[Figure: the line y = 5 + 2x with the data point (8, 25). At x = 8 the predicted value is 21 and the observed value is 25; the gap between them is the prediction error.]
2
Residual
A line of best fit will be used to predict a value of y for a given value of x. The difference between the measured value y and the predicted value ŷ is called the residual.
Residual = y - ŷ
Residual = observed value - predicted value
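As a small worked example of the definition, the sketch below uses the line y = 5 + 2x and the data point (8, 25) from the first slide. The function names `predict` and `residual` are illustrative choices, not part of the slides.

```python
def predict(x, intercept=5.0, slope=2.0):
    """Predicted value (y-hat) on the line y = intercept + slope * x."""
    return intercept + slope * x


def residual(x, y_observed):
    """Residual = observed value - predicted value."""
    return y_observed - predict(x)


x, y = 8, 25
print("predicted:", predict(x))    # predicted: 21.0
print("residual:", residual(x, y)) # residual: 4.0
```

A positive residual means the observed point lies above the line; a negative one means it lies below.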
3
Regression Line
Obviously, we would like all these residuals to be as small as possible. A technique called least squares regression minimises the sum of the squares of the residuals. The line found by this technique is therefore called the least squares regression line of y on x, or simply the regression line.
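One way to see how such a line is obtained is the standard closed-form solution for a straight-line fit. The sketch below is only an illustration: it borrows the three (x, y) points that appear in the table on the following slides as sample data, and the function name `least_squares_line` is an assumption made for this example.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares regression line of y on x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    # The intercept is chosen so the line passes through (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return intercept, slope


xs = [8, 3, -2]   # sample x-values (illustration only)
ys = [25, 7, -3]  # sample y-values (illustration only)
intercept, slope = least_squares_line(xs, ys)
print(f"y = {intercept:.4f} + {slope:.4f}x")
```

In practice a calculator or statistics package does this calculation; the point here is only that the line is fully determined by the data.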
4
Complete the table below
Data point                        (8, 25)   (3, 7)   (-2, -3)   (x, y)
Observed y-value                  25                            y
Fitted line
Predicted value / fitted value    21                            ŷ
Prediction error / residual       4                             y - ŷ
5
Complete the table below
Data point                        (8, 25)   (3, 7)   (-2, -3)   (x, y)
Observed y-value                  25        7        -3         y
Fitted line
Predicted value / fitted value    21        19       -1         ŷ
Prediction error / residual       4         -12      -2         y - ŷ
6
The Least Squares Regression Line
Which line? Choose the line with the smallest sum of squared prediction errors.
Minimise the sum of squared prediction errors: minimise Σ (y - ŷ)²
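The selection rule can be tried out by scoring a few candidate lines. In this sketch the three sample points are reused purely as illustration, the first two candidate lines are hypothetical, and the helper name `sse` is an assumption for the example.

```python
def sse(intercept, slope, xs, ys):
    """Sum of squared prediction errors for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


xs = [8, 3, -2]   # sample x-values (illustration only)
ys = [25, 7, -3]  # sample y-values (illustration only)

candidates = {
    "y = 1 + 3x": (1.0, 3.0),         # hypothetical candidate line
    "y = 2 + 2.5x": (2.0, 2.5),       # hypothetical candidate line
    "least squares": (19 / 15, 2.8),  # closed-form least squares fit to these points
}

scores = {name: sse(a, b, xs, ys) for name, (a, b) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: SSE = {score:.2f}")
print("smallest SSE:", best)
```

Any other line that is tried will give a sum at least as large as the least squares line's.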
7
The Least Squares Regression Line
There is one and only one least squares regression line for every linear regression. The sum of the prediction errors is zero for the least squares line, but this is also true for many other lines. The point of means (x̄, ȳ) is on the least squares line. A calculator or computer gives the equation of the least squares line.
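These properties can be checked numerically. The sketch below assumes the usual reading of this slide: the residuals of the least squares line sum to zero, the line passes through the point of means, and other lines through that point also have residuals summing to zero. The sample points and the comparison slope of 1.0 are arbitrary choices for illustration.

```python
xs = [8, 3, -2]   # sample x-values (illustration only)
ys = [25, 7, -3]  # sample y-values (illustration only)
n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Least squares slope and intercept from the standard closed-form formulas.
ls_slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
            / sum((x - x_mean) ** 2 for x in xs))
ls_intercept = y_mean - ls_slope * x_mean


def residuals(intercept, slope):
    """Observed minus predicted value for every data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# A different line through (x_mean, y_mean); the slope 1.0 is arbitrary.
other_slope = 1.0
other_intercept = y_mean - other_slope * x_mean

ls_res = residuals(ls_intercept, ls_slope)
other_res = residuals(other_intercept, other_slope)

print("sum of residuals, least squares line:", round(sum(ls_res), 9))
print("sum of residuals, other line:", round(sum(other_res), 9))
print("sum of squares, least squares line:", round(sum(r * r for r in ls_res), 4))
print("sum of squares, other line:", round(sum(r * r for r in other_res), 4))
```

Both lines have residuals that cancel out, which is why the sum of residuals alone cannot pick the best line; the sum of squared residuals can.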
8
Residuals Plot The pattern of residuals allows you to see if your regression line is a good fit for the data and how reliable interpolation and extrapolation will be. If the model is a good fit, the residuals will oscillate closely above and below the zero line.
9
Temperature (°F)    Chirps per second
69.4                15.4
69.7                14.7
71.6                16.0
75.2                15.5
76.3                14.4
79.6                15.0
80.6                17.1
82.0
82.6                17.2
83.3                16.2
83.5                17.0
84.3                18.4
88.6                20.0
93.3                19.8

Correlation coefficient = 0.8352
This is √0.6975
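The arithmetic behind the two quoted figures can be checked directly. The sketch assumes that 0.6975 is the squared correlation (r²) and that the positive root applies because chirps increase with temperature; the variable names are illustrative.

```python
import math

r_squared = 0.6975        # value quoted on the slide, assumed to be r squared
r = math.sqrt(r_squared)  # positive root, since the trend is upward

print(round(r, 4))  # 0.8352
```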
10
Residuals
Predicted chirps per second    Observed chirps per second    Residual
14.4                           15.4                          1.0
14.5                           14.7                          0.2
14.9                           16.0                          1.1
15.6                           15.5                          -0.1
15.9                           14.4                          -1.5
16.6                           15.0                          -1.6
16.8                           17.1                          0.3
                                                             -0.8
                                                             0.0
17.2                           17.2                          0.0
17.3                           16.2                          -1.1
17.4                           17.0                          -0.4
17.6                           18.4                          0.8
18.5                           20.0                          1.5
19.5                           19.8                          0.3

The regression line is y = … x - …, which is what we use to get the predicted value of y.
E.g. for x = 69.4 °F: y = … (69.4) - … = 14.4 chirps per second
Residual = observed value - predicted value
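The residual column can be checked against its definition. The sketch below uses only the rows for which the slide lists all three values (predicted, observed, residual); the variable names are illustrative.

```python
import math

# (predicted, observed, listed residual) for the rows given in full on the slide.
rows = [
    (14.4, 15.4, 1.0),
    (14.5, 14.7, 0.2),
    (14.9, 16.0, 1.1),
    (15.6, 15.5, -0.1),
    (16.6, 15.0, -1.6),
    (16.8, 17.1, 0.3),
    (17.3, 16.2, -1.1),
    (17.4, 17.0, -0.4),
    (17.6, 18.4, 0.8),
    (18.5, 20.0, 1.5),
]

# Residual = observed value - predicted value, compared with the listed figure.
computed = [observed - predicted for predicted, observed, _ in rows]
all_match = all(
    math.isclose(value, listed, abs_tol=1e-9)
    for value, (_, _, listed) in zip(computed, rows)
)

for (predicted, observed, listed), value in zip(rows, computed):
    print(f"{observed:5.1f} - {predicted:5.1f} = {value:5.1f}  (listed: {listed})")
print("all listed residuals match:", all_match)
```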
11
Residuals
The plot of the residuals shows that they are randomly scattered, so in this case a linear model is appropriate.