Published by Bartholomew Cain. Modified over 9 years ago.
1
Unit 4: Modeling
Topic 6: Least Squares Method
April 1, 2003
2
Mathematical Modeling: Least Squares
Section 2.3
Three Modeling Methods
Known Relationship: the underlying mathematical setting is known
Finite Differences: theoretical data or hard science data with little scatter
Least Squares: modeling data with scatter
3
Model: Global Warming
Global warming is partly the result of burning fuels, which increases the amount of carbon dioxide in the air. Cars are one of the major sources of fuel consumption. Let's examine the number of cars in the U.S. (in millions) as one variable of global warming.
Year   Cars (millions)
1940   27.5
1950   40.3
1960   61.7
1970   89.3
1980   121.6
1990   150.5
4
Linear or Curvilinear?
Numeric Method: use the Finite Differences method to determine whether the data has a nearly linear trend.
Is the first difference nearly constant?
5
Linear or Curvilinear?
Numeric Method: use the Finite Differences method to determine whether the data has a nearly linear trend.
Is the first difference nearly constant?
Solution: Nearly so after 1960.
Year   Cars    1st Finite Difference
1940   27.5
1950   40.3    12.8
1960   61.7    21.4
1970   89.3    27.6
1980   121.6   32.3
1990   150.5   28.9
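The finite difference check can be sketched in a few lines of Python. This is only an illustration using the car counts from the table; the helper name `finite_differences` is an assumption, not something from the slides.

```python
# Car counts (millions) by decade, copied from the slide table.
years = [1940, 1950, 1960, 1970, 1980, 1990]
cars = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

def finite_differences(values):
    """Return the differences between consecutive entries, rounded to 1 decimal."""
    return [round(later - earlier, 1) for earlier, later in zip(values, values[1:])]

first = finite_differences(cars)     # roughly constant would suggest a linear trend
second = finite_differences(first)   # roughly constant would suggest a quadratic trend

print("1st differences:", first)
print("2nd differences:", second)
```

The first differences settle near 28 to 32 only after 1960, which is why the slide calls the data "nearly" linear from that point on.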
6
Linear or Curvilinear?
Graphic Method: plot the data to see the trend.
Estimate a line of best fit.
What point should the line pass through?
Year (scaled)   Cars
1               27.5
2               40.3
3               61.7
4               89.3
5               121.6
6               150.5
7
Eyeball line of fit
8
Linear or Curvilinear?
Graphic Method: plot the data to see the trend.
Estimate a line of best fit.
What point should the line pass through?
Solution: the midpoint (3.5, 81.8), which is the mean of the scaled years paired with the mean of the car counts.
Estimate the trend with an approximate slope of m = 30.
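As a rough sketch, the eyeball line can be written in point-slope form through the midpoint with the estimated slope of 30. The variable names below are illustrative assumptions; the data and the slope come from the slides.

```python
# Scaled years and car counts (millions) from the slide table.
x = [1, 2, 3, 4, 5, 6]
cars = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

# The midpoint is the pair of means (x_bar, y_bar).
x_bar = sum(x) / len(x)
y_bar = sum(cars) / len(cars)

# Point-slope form: y - y_bar = m * (x - x_bar), with the eyeballed slope m = 30.
slope = 30
intercept = y_bar - slope * x_bar
predicted = [slope * xi + intercept for xi in x]

print(f"midpoint: ({x_bar:.1f}, {y_bar:.1f})")
print(f"eyeball line: y = {slope}x + ({intercept:.1f})")
print("predicted:", [round(p, 1) for p in predicted])
```

Rounded to one decimal, these predictions match the 6.8, 36.8, ..., 156.8 column used later for the Mean Squared Error calculation.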
9
Error in Model Error = observed - predicted
10
Error
The error (or residual) for a model is the vertical distance between the actual (observed) value and the predicted (fitted) value.
Error = observed - predicted
Actual value: $(x, y)$
Predicted value: $(x, \hat{y})$
So Error = $y - \hat{y}$
11
Finding Total Error Why not just add the errors to get total error?
12
Error Cancellation
13
Finding Total Error
Why not just add the errors to get total error?
Solution: positive and negative values cancel out. A total near zero would then give a false sense that the model fits well.
14
How do we find total error?
Make the errors positive in one of two ways: take the absolute value of each error, or square each error.
Sum the positive errors to find the total error.
The best fit line makes this sum of positive errors as small as possible.
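To see the cancellation concretely, here is a small sketch that totals the errors three ways for the eyeballed line y = 30x - 23.2 (slope 30 through the midpoint (3.5, 81.8)). The variable names are assumptions made for illustration.

```python
# Scaled years and observed car counts (millions).
x = [1, 2, 3, 4, 5, 6]
observed = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

# Eyeballed line: slope 30 through the midpoint (3.5, 81.8).
predicted = [30 * xi - 23.2 for xi in x]
errors = [obs - pred for obs, pred in zip(observed, predicted)]

total_signed = sum(errors)                    # positives and negatives cancel
total_absolute = sum(abs(e) for e in errors)  # absolute value method
total_squared = sum(e ** 2 for e in errors)   # least squares method

print(f"signed total:   {total_signed:.2f}")
print(f"absolute total: {total_absolute:.2f}")
print(f"squared total:  {total_squared:.2f}")
```

The signed total comes out close to zero even though individual errors are as large as 20.7, which is the false impression the slides warn about.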
15
Question
Why square deviations vs. absolute deviations?
Gauss-Markov Theorem: when the errors have mean zero, equal variance, and are uncorrelated, the least squares estimates have the minimum variance (a measure of uncertainty) among all linear unbiased estimators.
16
Least Squares Method vs. Absolute Value Method
The Least Squares Method yields a unique best fit line (as long as the x-values are not all equal), while the Absolute Value Method may not.
Example: Use the NCTM Applet for Least Squares to approximate the line of best fit for the following set of points: (0,2), (0,4), (6,3), (6,5), (3,3.5)
17
Absolute Value Method Pts: (0,2),(0,4),(6,3),(6,5),(3,3.5)
20
Least Square Method Pts: (0,2),(0,4),(6,3),(6,5),(3,3.5)
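Without the applet, the same comparison can be sketched numerically. The three candidate lines below are chosen for illustration (they are not taken from the slides); each passes through (3, 3.5). The point is that they tie on total absolute error but not on total squared error.

```python
# The five example points from the slide.
points = [(0, 2), (0, 4), (6, 3), (6, 5), (3, 3.5)]

def total_abs_error(intercept, slope):
    """Sum of |observed - predicted| over all points."""
    return sum(abs(y - (intercept + slope * x)) for x, y in points)

def total_sq_error(intercept, slope):
    """Sum of (observed - predicted)^2 over all points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points)

# Illustrative candidate lines as (intercept, slope) pairs.
candidates = {
    "y = 3.5": (3.5, 0.0),
    "y = 3 + x/6": (3.0, 1 / 6),
    "y = 2 + x/2": (2.0, 0.5),
}

for label, (a, b) in candidates.items():
    print(f"{label:12s} abs = {total_abs_error(a, b):.2f}  sq = {total_sq_error(a, b):.2f}")
```

All three lines have a total absolute error of 4, so the absolute value criterion cannot choose among them, while the squared error singles out y = 3 + x/6.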
21
Outliers
An outlier is a point that lies farther from the line of best fit than most other points.
The least squares method is more sensitive to outliers than the absolute value method, because squaring magnifies large errors.
22
Absolute Value Method Pts: 4 near line, one outlier
23
Least Square Method Pts: 4 near line, one outlier
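The slide's actual points are not listed, so the sketch below uses hypothetical data (four points on y = x plus one outlier) to illustrate the effect. The function names are also assumptions. It compares the least squares line with the line through the four well-behaved points under both error totals.

```python
# Hypothetical data: four points on y = x plus one outlier at x = 5.
points = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 10)]

def least_squares_line(pts):
    """Return (intercept, slope) of the least squares line."""
    n = len(pts)
    x_bar = sum(x for x, _ in pts) / n
    y_bar = sum(y for _, y in pts) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pts)
    s_xx = sum((x - x_bar) ** 2 for x, _ in pts)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

def total_abs(pts, intercept, slope):
    return sum(abs(y - (intercept + slope * x)) for x, y in pts)

def total_sq(pts, intercept, slope):
    return sum((y - (intercept + slope * x)) ** 2 for x, y in pts)

ls_intercept, ls_slope = least_squares_line(points)
print(f"least squares line: y = {ls_slope:.1f}x + ({ls_intercept:.1f})")
print("least squares line:  abs =", total_abs(points, ls_intercept, ls_slope),
      " sq =", total_sq(points, ls_intercept, ls_slope))
print("line y = x:          abs =", total_abs(points, 0, 1),
      " sq =", total_sq(points, 0, 1))
```

In this made-up case the least squares line tilts toward the outlier (slope 2 instead of 1), whereas the absolute error total is smaller for the line that ignores it.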
24
Mean Squared Error
For a set of data (x, y) with n elements modeled by a line with predicted values $\hat{y}_i$: $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
25
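Standard Deviation
Standard Deviation: a measure of error found by taking the square root of the MSE, which gives a measure of error in the same units as the original data.
$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, so the standard deviation is $\sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$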
26
Mean Squared Error
Calculate the MSE for the Global Warming data when modeled by the eyeball line of best fit.
Year (scaled)   Cars Observed   Cars Predicted
1               27.5            6.8
2               40.3            36.8
3               61.7            66.8
4               89.3            96.8
5               121.6           126.8
6               150.5           156.8
27
Mean Squared Error
Calculate the MSE for the Global Warming data when modeled by the eyeball line of best fit.
Solution: MSE ≈ 98.3
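The calculation can be checked with a short sketch that averages the squared errors over the n = 6 data points and then takes the square root for the standard deviation. Variable names are illustrative.

```python
import math

# Observed and eyeball-predicted car counts (millions) from the slide table.
observed = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]
predicted = [6.8, 36.8, 66.8, 96.8, 126.8, 156.8]

squared_errors = [(obs - pred) ** 2 for obs, pred in zip(observed, predicted)]
mse = sum(squared_errors) / len(observed)   # mean squared error
sd = math.sqrt(mse)                         # same units as the data (millions of cars)

print(f"MSE = {mse:.1f}")
print(f"standard deviation = {sd:.2f}")
```

The first point alone contributes 20.7 squared, about 428 of the roughly 590 total, which shows how heavily one large error weighs in the MSE.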
28
Some History - Gauss Gauss came up with a mathematical model for fitting a line to data when he was 17 years old (in 1794)! However, he did not publish his findings until 1809.
29
Some History - Legendre Legendre published the first explicit account of the method of least squares in 1805, 4 years before Gauss published his. But in Gauss’s publication, he referred to his earlier (1794) work, which created a controversy about who had first discovered the method. Today both Gauss and Legendre are given credit for discovering the method of least squares independently.
30
Line of Best Fit
Least Squares Fit Method: an algebraic method of finding the best fit line.
This method gives the line with the smallest possible mean squared error (and therefore the smallest standard deviation).
31
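Verification of Least Squares Fit Method
Minimize the MSE: $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
32
Verification of LS Method
n is a fixed constant, so minimize the numerator $\sum (y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the predicted value and $y_i$ is the actual observed value for $x_i$.
33
Verification of LS Method
Substitute $a + b x_i$ for $\hat{y}_i$ (writing the line as $\hat{y} = a + bx$): $\sum (y_i - a - b x_i)^2$
Square: $\sum \left(y_i^2 + a^2 + b^2 x_i^2 - 2a y_i - 2b x_i y_i + 2ab x_i\right)$
34
Verification of LS Method
Now expand the summation, which requires some summation rules:
Distribute Summation: $\sum (u_i + v_i) = \sum u_i + \sum v_i$
Summation of Constant: $\sum c = nc$
Summation and Mean: $\sum x_i = n\bar{x}$
35
Verification of LS Method
$\sum y_i^2 + n a^2 + b^2 \sum x_i^2 - 2an\bar{y} - 2b\sum x_i y_i + 2abn\bar{x}$
36
Now the line of best fit passes through $(\bar{x}, \bar{y})$.
Substitute for $a$ with $\bar{y} - b\bar{x}$ and simplify: $\left(\sum y_i^2 - n\bar{y}^2\right) + b^2\left(\sum x_i^2 - n\bar{x}^2\right) - 2b\left(\sum x_i y_i - n\bar{x}\bar{y}\right)$
37
Verification of LS Method
The left expression, $\sum y_i^2 - n\bar{y}^2$, is a constant value since the data is known, so we ignore it when minimizing.
38
Verification of LS Method
Complete the square on the remaining expression, writing $S_{xx} = \sum x_i^2 - n\bar{x}^2$ and $S_{xy} = \sum x_i y_i - n\bar{x}\bar{y}$, to get $S_{xx}\left(b - \frac{S_{xy}}{S_{xx}}\right)^2 - \frac{S_{xy}^2}{S_{xx}}$
Both terms are built from squares, so neither is negative.
The subtracted second term does not depend on b, so the choice of line cannot change it.
Thus minimizing the first term will minimize the entire expression.
39
Verification of LS Method
The entire expression is minimal when $b = \frac{S_{xy}}{S_{xx}}$
Thus $b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$ and $a = \bar{y} - b\bar{x}$ in the line of best fit.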
40
Line of Best Fit
Derive’s calculated line of best fit is y = 25.3x - 6.8.
The Mean Squared Error is reduced from 98.3 for the eyeballed line of fit to 34.6 for the line of best fit.
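Derive's result can be reproduced with the standard closed-form least squares formulas (slope equals the sum of cross-deviations divided by the sum of squared x-deviations, and the line passes through the means). The sketch below is not Derive's implementation; the function name is an assumption.

```python
# Scaled years and car counts (millions) from the slides.
x = [1, 2, 3, 4, 5, 6]
cars = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

def least_squares_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar   # forces the line through (x_bar, y_bar)
    return intercept, slope

intercept, slope = least_squares_line(x, cars)
mse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, cars)) / len(x)

print(f"y = {slope:.1f}x + ({intercept:.1f})")
print(f"MSE = {mse:.1f}")
```

Rounded to one decimal, this gives the same slope, intercept, and MSE as the values quoted from Derive.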
41
Line of Best Fit (blue)
42
Polynomial of Best Fit
Is the line of best fit a better model than a quadratic or cubic polynomial?
Examine the line of best fit with the data. Is the data curvilinear?
43
Polynomial of Best Fit
Is the line of best fit a better model than a quadratic or cubic polynomial?
Examine the line of best fit with the data. Is the data curvilinear?
Solution: The data starts above the line of best fit, then goes below it, and finishes back above it. This pattern indicates that the data is curvilinear.
Extension of Linear Method to Other Polynomial Functions
44
Polynomial of Best Fit
Derive calculated the quadratic of best fit for the Global Warming data.
The MSE is reduced from 34.6 for the line of best fit to about 4.08 for the quadratic of best fit (a standard deviation of about 2.02).
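One way to reproduce a quadratic of best fit without Derive is to solve the normal equations directly. The sketch below does this with plain Gaussian elimination; it is an illustrative approach, not Derive's algorithm, and the function names are assumptions. Because the standard deviation is the square root of the MSE, both are printed.

```python
import math

x = [1, 2, 3, 4, 5, 6]
cars = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution

def polynomial_fit(xs, ys, degree):
    """Least squares coefficients [c0, c1, ...] of c0 + c1*x + ... via normal equations."""
    size = degree + 1
    matrix = [[sum(xi ** (i + j) for xi in xs) for j in range(size)] for i in range(size)]
    rhs = [sum(yi * xi ** i for xi, yi in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(matrix, rhs)

coeffs = polynomial_fit(x, cars, 2)
fitted = [sum(c * xi ** k for k, c in enumerate(coeffs)) for xi in x]
mse = sum((yi - fi) ** 2 for yi, fi in zip(cars, fitted)) / len(x)
sd = math.sqrt(mse)

print("coefficients (constant, linear, quadratic):", [round(c, 3) for c in coeffs])
print(f"MSE = {mse:.2f}, standard deviation = {sd:.2f}")
```

Calling `polynomial_fit(x, cars, 1)` instead returns the coefficients of the line of best fit, so the same helper covers both models.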
45
Polynomial of Best Fit
46
Model: Higher Ed. Cost
Find a model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S.
What is the predicted cost in 1990? In 2005?
When will the cost reach $3500?
Year   Cost ($)
1975   599
1980   840
1985   1386
1988   1726
1989   1846
1990   2006
47
Higher Ed. Cost
Is the data linear or curvilinear?
Scale the data (years since 1975), then use Derive or Grapher to fit:
C(y) = 95.7y + 491.4 (sd 80.5)
C(y) = 6.1y^2 + 671.7 (sd 56.0)
Year   Scaled year   Cost ($)
1975   0             599
1980   5             840
1985   10            1386
1988   13            1726
1989   14            1846
1990   15            2006
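Both models can be checked with the same closed-form line fit: the linear model regresses cost on the scaled year, and the quadratic model (which has no linear term) regresses cost on the square of the scaled year. This is a sketch of one way to reproduce the numbers, not necessarily how Derive or Grapher computed them; the helper name is an assumption.

```python
import math

scaled_years = [0, 5, 10, 13, 14, 15]          # years since 1975
cost = [599, 840, 1386, 1726, 1846, 2006]      # dollars per semester

def fit_line(inputs, outputs):
    """Return (intercept, slope, sd) for the least squares line outputs ~ inputs."""
    n = len(inputs)
    in_bar = sum(inputs) / n
    out_bar = sum(outputs) / n
    s_xy = sum((u - in_bar) * (v - out_bar) for u, v in zip(inputs, outputs))
    s_xx = sum((u - in_bar) ** 2 for u in inputs)
    slope = s_xy / s_xx
    intercept = out_bar - slope * in_bar
    mse = sum((v - (intercept + slope * u)) ** 2 for u, v in zip(inputs, outputs)) / n
    return intercept, slope, math.sqrt(mse)

# Linear model: C(y) = slope * y + intercept.
lin_intercept, lin_slope, lin_sd = fit_line(scaled_years, cost)

# Quadratic model without a linear term: C(y) = coef * y^2 + intercept.
squares = [y ** 2 for y in scaled_years]
quad_intercept, quad_coef, quad_sd = fit_line(squares, cost)

print(f"linear:    C(y) = {lin_slope:.1f}y + {lin_intercept:.1f}  (sd {lin_sd:.1f})")
print(f"quadratic: C(y) = {quad_coef:.1f}y^2 + {quad_intercept:.1f}  (sd {quad_sd:.1f})")
```

The smaller standard deviation for the quadratic model is what motivates using it for the predictions that follow.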
48
Scatter plot & Models
49
Predicting Cost
The model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S. is C(y) = 6.1y^2 + 671.7
What is the predicted cost in 1990?
What is the error in the prediction?
50
Predicting Cost
The model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S. is C(y) = 6.1y^2 + 671.7
Predicted cost in 1990, which is scaled year y = 15: C(15) = 2044.2
Error in the prediction: Error = 2006 - 2044.2 = -38.2
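A brief sketch of the prediction step, using the rounded model coefficients from the slide. The function name `predicted_cost` is an assumption; the 2005 value is an extrapolation well beyond the data, so it should be read with caution.

```python
def predicted_cost(year):
    """Evaluate C(y) = 6.1*y^2 + 671.7, where y is years since 1975."""
    y = year - 1975
    return 6.1 * y ** 2 + 671.7

observed_1990 = 2006
predicted_1990 = predicted_cost(1990)
error_1990 = observed_1990 - predicted_1990   # error = observed - predicted
predicted_2005 = predicted_cost(2005)

print(f"1990: predicted {predicted_1990:.1f}, error {error_1990:.1f}")
print(f"2005: predicted {predicted_2005:.1f}")
```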
51
Predicting Year
The model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S. is C(y) = 6.1y^2 + 671.7
When will the cost reach $3500?
Solve the quadratic equation 3500 = 6.1y^2 + 671.7
52
Quadratic Equations
Methods of Solving Quadratic Equations:
Factoring Method
Square Root Method
Completing the Square
Quadratic Formula
53
Square Root Method
Let y = 3500 in the model y = 6.1x^2 + 671.7 (here x is the scaled year and y is the cost).
The resulting quadratic equation has no linear term: 6.1x^2 + 671.7 = 3500
How can we parallel the method of solving linear equations to solve this quadratic equation?
54
Square Root Method
Solving a quadratic with no linear term
Isolate the square term:
6.1x^2 + 671.7 = 3500
6.1x^2 + 671.7 - 671.7 = 3500 - 671.7
6.1x^2 = 2828.3
x^2 = 2828.3/6.1
Take the square root to find x: x ≈ ±21.5, and only the positive root is meaningful here (about 21.5 years after 1975).
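The same steps can be sketched in code. Note that 3500 - 671.7 = 2828.3. Variable names are illustrative, and the conversion back to a calendar year assumes the scaling "years since 1975" used for this data.

```python
import math

target_cost = 3500
coef = 6.1          # coefficient of x^2 in the model
constant = 671.7    # constant term in the model

# Isolate the square term, then take the square root.
x_squared = (target_cost - constant) / coef
x = math.sqrt(x_squared)        # keep the positive root: years after 1975
year = 1975 + x

print(f"x^2 = {x_squared:.2f}")
print(f"x = {x:.2f} years after 1975, i.e. during {int(year)}")
```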
55
Solving Quadratic Equation with a Linear Term
Quadratic of Best Fit with Linear Term: y = 3.7x^2 + 38.9x + 587.6 (sd 22.4)
Let y = 3500 and solve the resulting quadratic equation with a linear term: 3.7x^2 + 38.9x + 587.6 = 3500
How do we solve such equations?
56
Completing the Square Method
Solve by converting to a perfect square and using the Square Root Method
x^2 + 4x - 5 = 0
Isolate the x terms: x^2 + 4x = 5
Complete the square: x^2 + 4x + 2^2 = 5 + 2^2
(x + 2)^2 = 9
57
Completing the Square Method
Square root and solve
(x + 2)^2 = 9
x + 2 = 3 or x + 2 = -3
x = 1 or x = -5
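The steps above can be sketched as a small function for equations of the form x^2 + px + q = 0 (leading coefficient 1). The function name and the p, q parameter names are assumptions made for this illustration.

```python
import math

def solve_by_completing_square(p, q):
    """Solve x^2 + p*x + q = 0 by rewriting it as (x + p/2)^2 = (p/2)^2 - q."""
    half = p / 2
    right_side = half ** 2 - q      # value on the right after completing the square
    if right_side < 0:
        return []                   # no real solutions
    root = math.sqrt(right_side)
    # x + half = +root or x + half = -root; a set removes a repeated solution.
    return sorted({-half + root, -half - root})

solutions = solve_by_completing_square(4, -5)   # x^2 + 4x - 5 = 0
print(solutions)
```

For the slide's example this returns -5 and 1, matching the hand calculation.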
58
Quadratic Formula
Complete the square on the general quadratic to get a general solution
ax^2 + bx + c = 0
ax^2 + bx = -c
59
Quadratic Formula
$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$
60
Use the Quadratic Formula to solve
3.7x^2 + 38.9x + 587.6 = 3500
3.7x^2 + 38.9x - 2912.4 = 0
So a = 3.7, b = 38.9, and c = -2912.4
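A minimal sketch of applying the quadratic formula to these coefficients follows. The function name is an assumption, and the calendar-year conversion assumes x counts years since 1975 as in the scaled data.

```python
import math

def quadratic_roots(a, b, c):
    """Return the real roots of a*x^2 + b*x + c = 0 in increasing order."""
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []
    root = math.sqrt(discriminant)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])

a, b, c = 3.7, 38.9, -2912.4
roots = quadratic_roots(a, b, c)
positive_root = max(roots)          # the negative root lies before 1975 and is discarded

print("roots:", [round(r, 2) for r in roots])
print(f"cost reaches $3500 about {positive_root:.1f} years after 1975")
```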
61
Graphic Method
Graph the related function for the Higher Education problem.
Equation: 3.7x^2 + 38.9x - 2912.4 = 0
Related Function: f(x) = 3.7x^2 + 38.9x - 2912.4
Use Derive or a graphing calculator to generate a graph and zoom in until the error is less than 0.01.
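Zooming in on a graph until the crossing is pinned down to within 0.01 is closely related to bisection: repeatedly halve an interval that contains the zero. The sketch below swaps in bisection as a numeric stand-in for the graphical zoom, using the constant -2912.4 from the equation. The starting interval [20, 30] and the function names are assumptions.

```python
def f(x):
    """Related function for the Higher Education problem."""
    return 3.7 * x ** 2 + 38.9 * x - 2912.4

def bisect(func, low, high, tolerance=0.01):
    """Return an approximate zero of func in [low, high] with error below tolerance."""
    if func(low) * func(high) > 0:
        raise ValueError("func must change sign on the interval")
    while high - low > 2 * tolerance:
        mid = (low + high) / 2
        if func(low) * func(mid) <= 0:
            high = mid      # the zero lies in the left half
        else:
            low = mid       # the zero lies in the right half
    return (low + high) / 2

approx_root = bisect(f, 20, 30)
print(f"x is approximately {approx_root:.2f}, f(x) = {f(approx_root):.2f}")
```

Each pass plays the role of one zoom step, and the loop stops once the interval is narrow enough that its midpoint is within 0.01 of the true zero.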
62
Graphic Solution
63
What is the approximation and error from this graph?
64
Least Squares Modeling The End