1
The Least Squares Line Finite 1.3
2
Estimating equation for line from data
Given individual data points, what should be considered a good line?
[Figure: scatterplot of the data points, with axes x and y]
3
A good line is one that minimizes the sum of squared differences between the points and the line.
4
The Least Squares (Regression) Line
Let us compare two lines for the points (1,2), (2,4), (3,1.5), (4,3.2). The second line is horizontal, at y = 2.5.
First line: Sum of squared differences = (2 - 1)^2 + (4 - 2)^2 + ( )^2 + ( )^2 = 6.89
Second line: Sum of squared differences = (2 - 2.5)^2 + ( )^2 + ( )^2 + ( )^2 = 3.99
The smaller the sum of squared differences, the better the fit of the line to the data.
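The two totals can be checked with a short calculation. The Python sketch below assumes that the first line is y = x (inferred from the two terms that survive, (2 - 1)^2 and (4 - 2)^2) and that the horizontal line is y = 2.5. The function name is illustrative only.

```python
# Data points as labelled in the slide's figure.
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]

def sum_squared_differences(points, predict):
    """Sum of squared vertical differences between the points and a line."""
    return sum((y - predict(x)) ** 2 for x, y in points)

# Assumption: the first line is y = x; the second is the horizontal line y = 2.5.
sse_first = sum_squared_differences(points, lambda x: x)
sse_horizontal = sum_squared_differences(points, lambda x: 2.5)

print(round(sse_first, 2), round(sse_horizontal, 2))
```

Under these assumptions the horizontal line gives 3.99, matching the slide. The first line gives 7.89 rather than the 6.89 shown, so either the slide's total contains an arithmetic slip or its first line is not exactly y = x.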
5
We are given the following ordered pairs: (1.2,1), (1.3,1.6), (1.7,2.7), (2,2), (3,1.8), (3,3), (3.8,3.3), (4,4.2). They are shown in the scatterplot below:
6
If we draw a line, not necessarily the best line, as shown, we can begin to consider how well it fits the data. From each data point, we construct a vertical line segment to the line. The length of this segment gives us an indication of the error, the difference between the predicted and actual y values. The error may be positive or negative; squaring it gives only nonnegative values, an advantage in finding a total. The sum of the squares gives us a measure of the scatter of the data away from the line.
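The idea of errors and their squares can be sketched in Python using the eight ordered pairs given earlier. The trial line y = 0.5x + 1 is hypothetical and is not one of the lines drawn on the slides. The sign convention (actual minus predicted) is also an assumption; it does not affect the squares.

```python
# The eight ordered pairs given earlier in the presentation.
data = [(1.2, 1), (1.3, 1.6), (1.7, 2.7), (2, 2),
        (3, 1.8), (3, 3), (3.8, 3.3), (4, 4.2)]

def residuals(data, slope, intercept):
    """Actual y minus predicted y for each point; may be positive or negative."""
    return [y - (slope * x + intercept) for x, y in data]

def sum_of_squares(data, slope, intercept):
    """Total area of the squares built on the vertical segments."""
    return sum(e ** 2 for e in residuals(data, slope, intercept))

# Hypothetical trial line y = 0.5x + 1 (not taken from the slides).
errors = residuals(data, 0.5, 1.0)
total = sum_of_squares(data, 0.5, 1.0)

print([round(e, 2) for e in errors])
print(round(total, 3))
```

Trying other slopes and intercepts in `sum_of_squares` mimics the guesswork of drawing one candidate line after another.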
7
We try drawing another line, this time a horizontal one. The squares are still fairly large.
8
This line seems like a better fit. It has a positive slope and an intercept that seems reasonable. The total sum of the squares is less than that for the two previous lines.
9
Wow, this line with a negative slope does not fit so well. The sum of the squares will be very large. This would make a very poor model.
10
Again, this line looks much better. This is clearly a better model than some of the earlier attempts.
11
This line does not look as good as the purple or pink ones.
12
This line has larger squares than some of the others. This is not the best model.
13
This is the line based on calculations. It is very similar to the purple one. The equation is [equation not recovered]. (The graphs are just approximations, and are not exact.)
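The calculated line can be approximated from the eight ordered pairs given earlier. The Python sketch below uses the standard least squares formulas for slope and intercept. This is an assumption about how the slide's line was obtained, and the result may differ in rounding from the equation on the original slide.

```python
# The eight ordered pairs given earlier in the presentation.
data = [(1.2, 1), (1.3, 1.6), (1.7, 2.7), (2, 2),
        (3, 1.8), (3, 3), (3.8, 3.3), (4, 4.2)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Sums of products and squares of the deviations from the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
sxx = sum((x - mean_x) ** 2 for x, _ in data)

slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"y = {intercept:.3f} + {slope:.3f}x")
```

With these eight points the sketch gives roughly y = 0.497 + 0.781x.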
14
This shows the approximate sums of the squares of the previous examples. The smaller this quantity, the better the model. Fortunately, we have a technique that allows us to go straight to the equation without all the guesswork. Each colored square represents the total area of the squares for an earlier example. The yellow was the worst of the proposals, and the dark green the best.
15
Below are the standardized coordinates. All ordered pairs (x,y) are now represented as (Zx, Zy). The best fit line will pass through the origin. Remember, to standardize is to calculate a z-score.
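Standardizing can be sketched in Python with the standard library. The sketch assumes the sample standard deviation (dividing by n - 1) is used for the z-scores, and then checks that the least squares line through the standardized points has an intercept of zero.

```python
from statistics import mean, stdev

# The eight ordered pairs given earlier in the presentation.
data = [(1.2, 1), (1.3, 1.6), (1.7, 2.7), (2, 2),
        (3, 1.8), (3, 3), (3.8, 3.3), (4, 4.2)]
xs = [x for x, _ in data]
ys = [y for _, y in data]

def z_scores(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)  # stdev divides by n - 1
    return [(v - m) / s for v in values]

zx, zy = z_scores(xs), z_scores(ys)

# The z-scores already have mean 0, so no further centering is needed.
slope = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)
intercept = mean(zy) - slope * mean(zx)

print(round(slope, 3), round(abs(intercept), 6))
```

The intercept comes out as zero up to rounding error, which is why the best fit line passes through the origin in standardized coordinates.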
16
The line drawn is the best fit line.
The difference between the data point and the line is shown.
17
In order to find the best fit line we want to minimize the quantity [formula not recovered].
This is the standardized sum of the squares of the differences, divided by degrees of freedom to adjust for sample size.
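The formula itself did not survive in this transcript. A plausible reconstruction from the description is sketched below. It assumes the degrees of freedom are n - 1, and it writes the predicted value as \hat{z}_y = m z_x for a line through the origin with slope m; both the notation and the choice of n - 1 are assumptions.

```latex
\frac{\sum \left(z_y - \hat{z}_y\right)^2}{n-1}
  = \frac{\sum \left(z_y - m z_x\right)^2}{n-1}
  = 1 - 2mr + m^2,
\qquad \text{where } r = \frac{\sum z_x z_y}{n-1}.
```

The last step uses the fact that z-scores computed with n - 1 satisfy \sum z_x^2 / (n-1) = \sum z_y^2 / (n-1) = 1. Under these assumptions the expression is a quadratic in m, smallest at m = r, where it equals 1 - r^2.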
18
The equation of the best fit line is [equation not recovered], where [slope and intercept formulas not recovered].
This means that the equation can be found if you have the means and standard deviations for both x and y, along with the correlation coefficient r, even without knowing all of the data values. We usually make use of technology to carry out these calculations, and formulas are always provided, but do know how to use the formulas.
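Finding the line from summary values alone can be sketched in Python. The formulas on the original slide are not preserved here, so the sketch assumes the usual textbook forms: slope = r * sd_y / sd_x and intercept = mean_y - slope * mean_x. The function name and the numbers are hypothetical.

```python
def line_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Return (intercept, slope) of the least squares line from summary values.

    Assumes the textbook formulas slope = r * sd_y / sd_x and
    intercept = mean_y - slope * mean_x.
    """
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical summary values, chosen only for illustration.
b0, b1 = line_from_summary(mean_x=10.0, mean_y=50.0, sd_x=2.0, sd_y=8.0, r=0.5)

print(b0, b1)
```

For these made-up values the slope is 2 and the intercept is 30, so the line is y = 30 + 2x. No individual data values were needed.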
19
Correlation Coefficient
Technology can calculate the quality of a least squares line. The correlation coefficient r takes values from -1 to 1. Often r^2 is used instead, which takes values from 0 to 1; the closer to 1, the better the fit.
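As an illustration of what the technology computes, the Python sketch below finds r and r^2 for the eight ordered pairs given earlier. It uses the standard formula for the correlation coefficient, which is assumed here because the slides do not state one.

```python
from math import sqrt

# The eight ordered pairs given earlier in the presentation.
data = [(1.2, 1), (1.3, 1.6), (1.7, 2.7), (2, 2),
        (3, 1.8), (3, 3), (3.8, 3.3), (4, 4.2)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Sums of products and squares of the deviations from the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
sxx = sum((x - mean_x) ** 2 for x, _ in data)
syy = sum((y - mean_y) ** 2 for _, y in data)

r = sxy / sqrt(sxx * syy)
r_squared = r ** 2

print(round(r, 3), round(r_squared, 3))
```

For this data r is about 0.82 and r^2 about 0.68, a fairly strong positive relationship.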
20
Homework: pages 34 – 37: 13, 17, 19, 25