1
Calculating the Least Squares Regression Line
Lecture 45 Sec Fri, Nov 30, 2007
2
The Least Squares Regression Line
The equation of the regression line is of the form $\hat{y} = a + bx$. We need to find the coefficients a and b from the data.
3
The Least Squares Regression Line
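The formula for b is
$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$
or
$$b = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}.$$
The formula for a is
$$a = \bar{y} - b\bar{x}.$$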
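The slope b can be written either in terms of deviations from the means or in terms of raw sums (the two forms are worked through as Formula 1 and Formula 2 in the slides). A short derivation, assuming the usual definitions $\bar{x} = \sum x / n$ and $\bar{y} = \sum y / n$, suggests why the two forms agree:

```latex
\begin{align*}
\sum (x - \bar{x})(y - \bar{y})
  &= \sum xy - \bar{y}\sum x - \bar{x}\sum y + n\bar{x}\bar{y} \\
  &= \sum xy - \frac{(\sum x)(\sum y)}{n}, \\
\sum (x - \bar{x})^2
  &= \sum x^2 - 2\bar{x}\sum x + n\bar{x}^2 \\
  &= \sum x^2 - \frac{(\sum x)^2}{n}.
\end{align*}
```

Dividing the first result by the second gives the raw-sums form of b, so either form can be used with the same data.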
4
Formula 1
Consider again the data set

 x    y
 1    8
 3   12
 4    9
 5   14
 8   16
 9   20
11   17
15   24
5
Formula 1
Compute the x and y deviations.

 x    y   x - x̄   y - ȳ
 1    8    -6      -7
 3   12    -4      -3
 4    9    -3      -6
 5   14    -2      -1
 8   16     1       1
 9   20     2       5
11   17     4       2
15   24     8       9
6
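Formula 1
Compute the squared deviations and the products of the deviations.

 x    y   x - x̄   y - ȳ   (x - x̄)²   (y - ȳ)²   (x - x̄)(y - ȳ)
 1    8    -6      -7        36         49            42
 3   12    -4      -3        16          9            12
 4    9    -3      -6         9         36            18
 5   14    -2      -1         4          1             2
 8   16     1       1         1          1             1
 9   20     2       5         4         25            10
11   17     4       2        16          4             8
15   24     8       9        64         81            72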
7
Formula 1
Find the sums.

 x    y   x - x̄   y - ȳ   (x - x̄)²   (y - ȳ)²   (x - x̄)(y - ȳ)
 1    8    -6      -7        36         49            42
 3   12    -4      -3        16          9            12
 4    9    -3      -6         9         36            18
 5   14    -2      -1         4          1             2
 8   16     1       1         1          1             1
 9   20     2       5         4         25            10
11   17     4       2        16          4             8
15   24     8       9        64         81            72
Sum                         150        206           165
8
Formula 1
Compute b: $b = \frac{165}{150} = 1.1$.
Then compute a: $a = \bar{y} - b\bar{x} = 15 - 1.1(7) = 7.3$.
The equation is $\hat{y} = 7.3 + 1.1x$.
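The Formula 1 steps (deviations, squared deviations, products, sums) can be sketched in a few lines of Python. This is only an illustration, not part of the lecture; the variable names are made up here, and the rows with x = 8 and x = 9 are an assumption recovered from the column sums (56 and 120) because those cells did not survive in the transcript.

```python
# Data set from the slides; x = 8 and x = 9 are recovered from the column sums.
xs = [1, 3, 4, 5, 8, 9, 11, 15]
ys = [8, 12, 9, 14, 16, 20, 17, 24]

n = len(xs)
x_bar = sum(xs) / n  # mean of x
y_bar = sum(ys) / n  # mean of y

# Sum of squared x deviations and sum of products of deviations.
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = sxy / sxx          # slope: sum of products over sum of squared x deviations
a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)

print(f"sxx = {sxx:.0f}, sxy = {sxy:.0f}, b = {b:.1f}, a = {a:.1f}")
```

The two sums should match the 150 and 165 in the table's sum row, which is a quick check that the recovered rows are consistent with the slides.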
9
Formula 2
Consider yet again the data set

 x    y
 1    8
 3   12
 4    9
 5   14
 8   16
 9   20
11   17
15   24
10
Formula 2
Square x and y and find xy.

 x    y    x²    y²    xy
 1    8     1    64     8
 3   12     9   144    36
 4    9    16    81    36
 5   14    25   196    70
 8   16    64   256   128
 9   20    81   400   180
11   17   121   289   187
15   24   225   576   360
11
Formula 2
Add up the columns.

  x     y    x²     y²     xy
  1     8     1     64      8
  3    12     9    144     36
  4     9    16     81     36
  5    14    25    196     70
  8    16    64    256    128
  9    20    81    400    180
 11    17   121    289    187
 15    24   225    576    360
 56   120   542   2006   1005   (sums)
12
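Method 2
Compute b: $b = \frac{1005 - \frac{(56)(120)}{8}}{542 - \frac{56^2}{8}} = \frac{1005 - 840}{542 - 392} = \frac{165}{150} = 1.1$.
Then compute a as before: $a = \bar{y} - b\bar{x} = 15 - 1.1(7) = 7.3$.
The equation is $\hat{y} = 7.3 + 1.1x$.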
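For comparison, here is a minimal Python sketch of the raw-sums approach (Formula 2). It is an illustration only; the variable names are invented for this example, and the rows with x = 8 and x = 9 are again an assumption recovered from the column sums.

```python
# Same data set; x = 8 and x = 9 are recovered from the column sums.
xs = [1, 3, 4, 5, 8, 9, 11, 15]
ys = [8, 12, 9, 14, 16, 20, 17, 24]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Slope from raw sums only; no deviations are needed.
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
# Intercept as before: a = y_bar - b * x_bar.
a = sum_y / n - b * sum_x / n

print(f"sums: {sum_x}, {sum_y}, {sum_x2}, {sum_xy}")
print(f"b = {b:.1f}, a = {a:.1f}")
```

The four sums are the same quantities that 2-Var Stats reports on the calculator, which is why this form is convenient when working by hand or from calculator output.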
13
Example
The second method is usually easier (really) if you are doing it by hand. By either method, we get the equation $\hat{y} = 7.3 + 1.1x$.
14
TI-83 – Regression Line
On the TI-83, we can use 2-Var Stats to get the basic summations, then use Formula 2 for b. Try it: enter 2-Var Stats L1, L2.
15
TI-83 – Regression Line
2-Var Stats L1, L2 reports that
Σx = 56
Σx² = 542
Σy = 120
Σy² = 2006
Σxy = 1005
Then use the formulas.
16
TI-83 – Regression Line
Or we can use the LinReg function (#8).
1. Put the x values in L1 and the y values in L2.
2. Select STAT > CALC > LinReg(a+bx).
3. Press Enter. LinReg(a+bx) appears in the display.
4. Enter L1, L2.
5. Press Enter.
17
TI-83 – Regression Line
The following appear in the display.
The title LinReg.
The equation y = a + bx.
The value of a.
The value of b.
The value of r² (to be discussed later).
The value of r (to be discussed later).
18
TI-83 – Regression Line
To graph the regression line along with the scatterplot:
1. Put the x values in L1 and the y values in L2.
2. Select STAT > CALC > LinReg(a+bx).
3. Press Enter. LinReg(a+bx) appears in the display.
4. Enter L1, L2, Y1.
5. Press Enter.
19
TI-83 – Regression Line
The only change from the earlier LinReg procedure is the third argument: entering L1, L2, Y1 (instead of L1, L2) stores the regression equation in Y1 so that it can be graphed.
20
TI-83 – Regression Line
Press Y= to see the equation.
Press ZOOM > ZoomStat to see the graph.
21
Free Lunch Participation vs. Graduation Rate
Find the equation of the regression line for the school-district data on the free-lunch participation rate vs. the graduation rate. Let x be the free-lunch participation. Let y be the graduation rate.
22
Free Lunch Participation vs. Graduation Rate
District          Free Lunch (%)   Grad. Rate (%)
Amelia                 41.2             68.9
Caroline               40.2             62.9
Charles City           45.8             67.7
Chesterfield           22.5             80.5
Colonial Hgts          25.7             73.0
Cumberland             55.3             63.9
Dinwiddie              45.2             71.4
Goochland              23.3             76.3
Hanover                13.7             90.1
Henrico                30.2             81.1
Hopewell               63.1             63.4
King and Queen         59.9             64.1
King William           27.9             67.0
Louisa                 44.9             80.1
New Kent               13.9             77.0
Petersburg             61.6             54.6
Powhatan               12.2             89.3
Prince George          30.9             85.0
Richmond               74.0             46.9
Sussex                 74.8             59.0
West Point             19.1             82.0
23
Free Lunch Participation vs. Graduation Rate
The regression equation is approximately $\hat{y} = 91.0 - 0.494x$.
24
Scatter Plot
[Figure: scatter plot of graduation rate (vertical axis) against free-lunch rate (horizontal axis).]
25
Scatter Plot with Regression Line
[Figure: the scatter plot of graduation rate against free-lunch rate with the regression line drawn through it.]
26
Predicting y
What graduation rate would we predict for a district if we knew that its free-lunch participation rate was 50%?
27
Scatter Plot with Regression Line
[Figure: the scatter plot of graduation rate against free-lunch rate with the regression line; the predicted point is marked on the line.]
28
Scatter Plot with Regression Line
[Figure: the same plot with the predicted point marked; the predicted graduation rate, read from the vertical axis, is 66.3.]
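The fit and the prediction at a 50% free-lunch rate can be reproduced from the district table with a short Python sketch. This is an illustration under stated assumptions, not the lecture's own method: `fit_line` is a hypothetical helper written for this example, it uses the raw-sums form of the slope, and the pairs are copied from the table in alphabetical order by district.

```python
# (free-lunch rate, graduation rate) pairs copied from the district table.
districts = [
    (41.2, 68.9), (40.2, 62.9), (45.8, 67.7), (22.5, 80.5), (25.7, 73.0),
    (55.3, 63.9), (45.2, 71.4), (23.3, 76.3), (13.7, 90.1), (30.2, 81.1),
    (63.1, 63.4), (59.9, 64.1), (27.9, 67.0), (44.9, 80.1), (13.9, 77.0),
    (61.6, 54.6), (12.2, 89.3), (30.9, 85.0), (74.0, 46.9), (74.8, 59.0),
    (19.1, 82.0),
]


def fit_line(points):
    """Return (a, b) for the least squares line y = a + b*x (raw-sums form)."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    sum_xy = sum(x * y for x, y in points)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = fit_line(districts)
predicted = a + b * 50  # predicted graduation rate at a 50% free-lunch rate

print(f"y-hat = {a:.1f} + ({b:.3f})x")
print(f"prediction at x = 50: {predicted:.1f}")
```

With these 21 pairs the slope should come out near -0.494 and the prediction near 66.3, in line with the values shown on the slides.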
29
Variation in the Model
There is a very simple relationship between the variation in the observed y values and the variation in the predicted y values.
30
Observed y and Predicted y
[Figure: graduation rate against free-lunch rate, showing the observed y values and the predicted y values on the regression line.]
31
SST = Variation in the Observed y
[Figure: the observed graduation rates plotted against free-lunch rate, illustrating the variation in the observed y.]
32
SSR = Variation in the Predicted y
[Figure: the predicted graduation rates plotted against free-lunch rate, illustrating the variation in the predicted y.]
33
Variation in Observed y
The variation in the observed y is measured by SST (same as SSY), where $SST = \sum (y - \bar{y})^2$. For graduation rate data (L2), SST = 2598.18.
34
Variation in Predicted y
The variation in the predicted y is measured by SSR, where $SSR = \sum (\hat{y} - \bar{y})^2$. For predicted graduation rate data, let L3 = Y1(L1). SSR = 1896.67.
35
SSE = Residual Sum of Squares
It turns out that SST = SSE + SSR. That is,
$$\sum (y - \bar{y})^2 = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2.$$
36
Sum Squared Error
In the example, SST – SSR = 2598.18 – 1896.67 = 701.51.
If we compute the sum of the squared residuals directly, we get the same value: SSE = 701.51.
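The identity SST = SSE + SSR can be checked numerically on the district data. The sketch below is an illustration only: the variable names are invented here, and the two lists are copied from the district table in alphabetical order by district.

```python
# Free-lunch rates (xs) and graduation rates (ys) from the district table.
xs = [41.2, 40.2, 45.8, 22.5, 25.7, 55.3, 45.2, 23.3, 13.7, 30.2, 63.1,
      59.9, 27.9, 44.9, 13.9, 61.6, 12.2, 30.9, 74.0, 74.8, 19.1]
ys = [68.9, 62.9, 67.7, 80.5, 73.0, 63.9, 71.4, 76.3, 90.1, 81.1, 63.4,
      64.1, 67.0, 80.1, 77.0, 54.6, 89.3, 85.0, 46.9, 59.0, 82.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares slope and intercept (deviation form).
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

y_hat = [a + b * x for x in xs]  # predicted graduation rates

sst = sum((y - y_bar) ** 2 for y in ys)               # variation in observed y
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # variation in predicted y
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # squared residuals

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"SSE + SSR = {sse + ssr:.2f}")
```

Here SSE is computed directly from the residuals rather than as SST minus SSR, so agreement between the two printed lines is a genuine check of the identity.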
37
Explaining the Variability
In the equation SST = SSE + SSR, SSR is the amount of variability in y that is explained by the model. SSE is the amount of variability in y that is not explained by the model.
38
Explaining the Variability
In the last example, how much variability in graduation rate is explained by the model (by free-lunch participation)?
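One way to read this question is as the ratio SSR / SST, the share of the total variation that the model accounts for. A minimal sketch using the SST and SSR values quoted in the slides (the variable names are made up for this example):

```python
sst = 2598.18  # variation in the observed graduation rates (from the slides)
ssr = 1896.67  # variation in the predicted graduation rates (from the slides)

explained = ssr / sst        # share of the variation explained by the model
unexplained = 1 - explained  # share left in the residuals (SSE / SST)

print(f"explained: {explained:.1%}, unexplained: {unexplained:.1%}")
```

On these figures, roughly 73% of the variation in graduation rate is accounted for by free-lunch participation and roughly 27% is not.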