Unit 4: Modeling Topic 6: Least Squares Method April 1, 2003.

Mathematical Modeling: Least Squares
Section 2.3
Three Modeling Methods
  Known Relationship: the underlying mathematical setting is known
  Finite Differences: theoretical data or hard science data with little scatter
  Least Squares: modeling data with scatter

Model: Global Warming
Global warming is partly the result of burning fuels, which increases the amount of carbon dioxide in the air. Cars are one of the major sources of fuel consumption. Let’s examine the number of cars in the U.S. (in millions) as one variable of global warming.
Year   Cars (millions)
1940    27.5
1950    40.3
1960    61.7
1970    89.3
1980   121.6
1990   150.5

Linear or Curvilinear?
Numeric Method: use the Finite Differences method to determine if the data has a near linear trend. Is the first difference nearly constant?
Solution: Nearly so after 1960.
Year   Cars    1st Finite Difference
1940    27.5
1950    40.3   12.8
1960    61.7   21.4
1970    89.3   27.6
1980   121.6   32.3
1990   150.5   28.9
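The first differences on this slide can be reproduced with a few lines of Python. This is a minimal sketch; the function name `first_differences` is a hypothetical helper, and each difference is rounded to one decimal place to match the table.

```python
# Car counts (millions) by decade, copied from the slide's table.
years = [1940, 1950, 1960, 1970, 1980, 1990]
cars = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]


def first_differences(values):
    """Return the differences between consecutive entries, rounded to 0.1."""
    return [round(later - earlier, 1) for earlier, later in zip(values, values[1:])]


diffs = first_differences(cars)
for start, end, diff in zip(years, years[1:], diffs):
    print(f"{start} to {end}: {diff}")
```

If the data were exactly linear, every entry of `diffs` would be the same number.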

Eyeball line of fit

Linear or Curvilinear?
Graphic Method: plot the data to see the trend
Year (scaled)   Cars
1                27.5
2                40.3
3                61.7
4                89.3
5               121.6
6               150.5
Estimate a line of best fit
What point should the line pass through?
Solution: The midpoint (3.5, 81.8), which is the mean of the scaled years paired with the mean of the car counts
Estimate the trend with an approximate slope of m = 30
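The eyeballed line can be written down once the midpoint and slope are chosen. The sketch below assumes the slide's slope of 30 and computes the intercept that makes the line pass through the midpoint; the variable names are illustrative.

```python
# Scaled years and car counts (millions) from the slide.
xs = [1, 2, 3, 4, 5, 6]
cars = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

# Midpoint of the data: mean of x paired with mean of y.
x_mean = sum(xs) / len(xs)
y_mean = sum(cars) / len(cars)

# Eyeballed slope taken from the slide; the intercept follows from
# requiring the line y = slope * x + intercept to pass through the midpoint.
slope = 30
intercept = y_mean - slope * x_mean

print(f"midpoint: ({x_mean:.1f}, {y_mean:.1f})")
print(f"eyeball line: y = {slope}x + ({intercept:.1f})")
```

Rounded to one decimal place, this gives the line y = 30x - 23.2, which is the line behind the predicted values 6.8, 36.8, and so on used later in the deck.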

Error in Model Error = observed - predicted

Error
The Error or Residual for a model is the vertical distance between the actual (observed) value and the predicted (fitted) value
Error = observed - predicted
Actual value: $(x, y)$
Predicted value: $(x, \hat{y})$
So Error = $y - \hat{y}$

Error Cancellation

Finding Total Error
Why not just add the errors to get total error?
Solution: Positive and negative errors cancel out, so a total of zero can occur even when every individual error is large. This gives a false sense of what an error of zero should mean.
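A small numeric check makes the cancellation concrete. The flat line at the mean used here is a hypothetical, deliberately poor model chosen for illustration; it is not one of the models in the slides.

```python
# Car counts (millions) from the Global Warming example.
cars = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

# Hypothetical poor model: predict the mean number of cars for every year.
flat_prediction = sum(cars) / len(cars)

# Error = observed - predicted, as defined on the Error slide.
errors = [observed - flat_prediction for observed in cars]

signed_total = sum(errors)                     # positive and negative errors cancel
absolute_total = sum(abs(e) for e in errors)   # absolute value method
squared_total = sum(e ** 2 for e in errors)    # least squares method

print(f"sum of errors:          {signed_total:.6f}")
print(f"sum of absolute errors: {absolute_total:.1f}")
print(f"sum of squared errors:  {squared_total:.1f}")
```

The signed total is zero up to rounding even though the flat line misses every point, while the absolute and squared totals are both large.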

How do we find total error?
Make errors positive by either
  taking the absolute value of the errors, or
  squaring the errors
Sum the positive errors to find the total error
The best fit line makes the positive sum of errors as small as possible

Question
Why square deviations vs. absolute deviations?
Gauss-Markov Theorem: Among all linear unbiased methods for fitting a line to data, the least squares method provides estimates with minimum variance (a measure of uncertainty), provided the errors have mean zero, equal variance, and are uncorrelated.

Least Squares Method vs. Absolute Value Method
The Least Squares Method always yields a unique best fit line (as long as the x-values are not all the same), while the Absolute Value Method may not.
Example: Use the NCTM Applet for Least Squares to approximate the line of best fit for the following set of points: (0,2), (0,4), (6,3), (6,5), (3,3.5)

Absolute Value Method Pts: (0,2),(0,4),(6,3),(6,5),(3,3.5)

Least Square Method Pts: (0,2),(0,4),(6,3),(6,5),(3,3.5)
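The non-uniqueness can be checked without the applet. In this sketch the three candidate lines are illustrative choices, not taken from the slides. Each one passes between the two points at x = 0, between the two points at x = 6, and through (3, 3.5), so each has a total absolute error of 4. No line can do better, because the pair of points at each end already contributes at least 2.

```python
# Points from the slide's example.
points = [(0, 2), (0, 4), (6, 3), (6, 5), (3, 3.5)]

# Hypothetical candidate lines, stored as (slope, intercept).
candidates = {
    "slope 1/2": (0.5, 2.0),
    "slope 1/6": (1 / 6, 3.0),
    "slope -1/6": (-1 / 6, 4.0),
}


def total_absolute_error(slope, intercept):
    """Sum of |observed - predicted| over all points."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points)


def total_squared_error(slope, intercept):
    """Sum of (observed - predicted)^2 over all points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


results = {
    name: (total_absolute_error(m, b), total_squared_error(m, b))
    for name, (m, b) in candidates.items()
}
for name, (abs_err, sq_err) in results.items():
    print(f"{name:>10}: absolute = {abs_err:.3f}, squared = {sq_err:.3f}")
```

All three lines tie under the absolute value criterion, but the squared criterion singles out the line with slope 1/6 and intercept 3, which is the least squares line for these points.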

Outliers
An outlier is a point that lies farther from the line of best fit than most other points.
The least squares method is more sensitive to outliers than the absolute value method, because squaring magnifies large errors.

Absolute Value Method Pts: 4 near line, one outlier

Least Square Method Pts: 4 near line, one outlier

Mean Squared Error
For a set of data (x, y) with n elements modeled by a line with predicted values $\hat{y}_i$:
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

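Standard Deviation
Standard Deviation: a measure of error found by taking the square root of the MSE, which gives a measure of error in the same units as the original data.
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
so the standard deviation is
$$\text{SD} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$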

Mean Squared Error
Calculate the MSE for the Global Warming data when modeled by the eyeball line of best fit
Year (scaled)   Cars Observed   Cars Predicted
1                27.5              6.8
2                40.3             36.8
3                61.7             66.8
4                89.3             96.8
5               121.6            126.8
6               150.5            156.8
Solution: MSE ≈ 98.3
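The MSE for the eyeball line can be checked directly from the table. This is a minimal sketch that divides the sum of squared errors by n, which is the convention that reproduces the slide's value; the helper name `mean_squared_error` is illustrative.

```python
import math

# Observed and predicted car counts (millions) from the slide's table.
observed = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]
predicted = [6.8, 36.8, 66.8, 96.8, 126.8, 156.8]


def mean_squared_error(actual, fitted):
    """Average of the squared errors (observed - predicted)."""
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(actual, fitted)]
    return sum(squared) / len(squared)


mse = mean_squared_error(observed, predicted)
sd = math.sqrt(mse)  # standard deviation: same units as the data

print(f"MSE = {mse:.1f}")
print(f"SD  = {sd:.1f}")
```

Most of the total comes from the first point, where the eyeball line predicts 6.8 against an observed 27.5.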

Some History - Gauss Gauss came up with a mathematical model for fitting a line to data when he was 17 years old (in 1794)! However, he did not publish his findings until 1809.

Some History - Legendre Legendre published the first explicit account of the method of least squares in 1805, 4 years before Gauss published his. But in Gauss’s publication, he referred to his earlier (1794) work, which created a controversy about who had first discovered the method. Today both Gauss and Legendre are given credit for discovering the method of least squares independently.

Line of Best Fit Least Squares Fit Method – algebraic method of finding the best fit line This method gives a line which has the smallest possible mean squared error or standard deviation.

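Verification of Least Square Fit Method
Minimize the MSE:
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Verification of LS Method
n is a fixed constant, so minimize the numerator
$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
where $\hat{y}_i$ is the predicted value and $y_i$ is the actual observed value for $x_i$

Verification of LS Method
Substitute $a + b x_i$ for $\hat{y}_i$ (writing the line as $\hat{y} = a + bx$, with intercept $a$ and slope $b$):
$$\sum (y_i - a - b x_i)^2$$
Square:
$$\sum \left( y_i^2 + a^2 + b^2 x_i^2 - 2a y_i - 2b x_i y_i + 2ab x_i \right)$$

Verification of LS Method
Now expand the summation, which requires some summation rules:
Distribute Summation: $\sum (u_i + v_i) = \sum u_i + \sum v_i$
Summation of Constant: $\sum c = nc$
Summation and Mean: $\sum x_i = n\bar{x}$

Verification of LS Method
$$\sum y_i^2 + n a^2 + b^2 \sum x_i^2 - 2an\bar{y} - 2b \sum x_i y_i + 2abn\bar{x}$$

Now the line of best fit passes through $(\bar{x}, \bar{y})$
Substitute for a with $a = \bar{y} - b\bar{x}$ and simplify:
$$\left( \sum y_i^2 - n\bar{y}^2 \right) + \left[ b^2 \left( \sum x_i^2 - n\bar{x}^2 \right) - 2b \left( \sum x_i y_i - n\bar{x}\bar{y} \right) \right]$$

Verification of LS Method
The left expression, $\sum y_i^2 - n\bar{y}^2$, is a constant value since the data is known, so we ignore it when minimizing.

Verification of LS Method
Write $S_{xx} = \sum x_i^2 - n\bar{x}^2$ and $S_{xy} = \sum x_i y_i - n\bar{x}\bar{y}$. Complete the square on the second expression to get
$$S_{xx} \left( b - \frac{S_{xy}}{S_{xx}} \right)^2 - \frac{S_{xy}^2}{S_{xx}}$$
Since both terms are built from squares, neither is negative (this assumes the $x_i$ are not all equal, so $S_{xx} > 0$)
The second term is subtracted, but it does not depend on b, so it cannot be changed by the choice of line
Thus minimizing the first term will minimize the entire expression

Verification of LS Method
The entire expression is minimal when $b - \frac{S_{xy}}{S_{xx}} = 0$
Thus
$$b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} \quad \text{and} \quad a = \bar{y} - b\bar{x}$$
in the line of best fit $\hat{y} = a + bx$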

Line of Best Fit
Derive’s calculated line of best fit is y = 25.3x - 6.8
The Mean Squared Error is reduced from 98.3 for the eyeballed line of fit to 34.6 for the line of best fit.
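The same line can be recovered without Derive by applying the standard closed-form least squares formulas (slope from the centered sums, intercept from the mean point). This is a sketch under the assumption that the line is written as y = a + bx with intercept a and slope b; the function names are illustrative.

```python
# Scaled years and car counts (millions) from the Global Warming example.
xs = [1, 2, 3, 4, 5, 6]
cars = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]


def least_squares_line(x_values, y_values):
    """Return (intercept a, slope b) of the least squares line y = a + b*x."""
    n = len(x_values)
    x_mean = sum(x_values) / n
    y_mean = sum(y_values) / n
    s_xy = sum(x * y for x, y in zip(x_values, y_values)) - n * x_mean * y_mean
    s_xx = sum(x * x for x in x_values) - n * x_mean ** 2
    if s_xx == 0:
        raise ValueError("all x-values are equal; the line is not determined")
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean  # line passes through the mean point
    return intercept, slope


def mean_squared_error(x_values, y_values, intercept, slope):
    """Average squared error of the line over the data."""
    errors = [y - (intercept + slope * x) for x, y in zip(x_values, y_values)]
    return sum(e ** 2 for e in errors) / len(errors)


a, b = least_squares_line(xs, cars)
mse = mean_squared_error(xs, cars, a, b)
print(f"y = {b:.1f}x + ({a:.1f}), MSE = {mse:.1f}")
```

Rounded to one decimal place, the slope, intercept, and MSE agree with the values 25.3, -6.8, and 34.6 reported on the slide.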

Line of Best Fit (blue)

Polynomial of Best Fit
Is the line of best fit a better model than a quadratic or cubic polynomial? Examine the line of best fit with the data. Is the data curvilinear?
Solution: The data starts above the line of best fit, then goes below, and finishes back above, which is an indication that the data is curvilinear
Extension of Linear Method to Other Polynomial Functions

Polynomial of Best Fit
Derive calculated the quadratic of best fit for the Global Warming data
The MSE is reduced from 34.6 for the line of best fit to about 4.08 for the quadratic of best fit, which corresponds to a standard deviation of about 2.02
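The quadratic fit can be sketched in plain Python by solving the normal equations for a polynomial. This is not how Derive works internally (that is not stated in the slides); `polynomial_least_squares` is a hypothetical helper that uses Gaussian elimination and is only intended for small, well-scaled problems like this one.

```python
import math

xs = [1, 2, 3, 4, 5, 6]
cars = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]


def polynomial_least_squares(x_values, y_values, degree):
    """Return coefficients [c0, c1, ...] of the least squares polynomial."""
    size = degree + 1
    # Normal equations: sum(x^(j+k)) * c_k = sum(y * x^j)
    matrix = [[sum(x ** (j + k) for x in x_values) for k in range(size)]
              for j in range(size)]
    rhs = [sum(y * x ** j for x, y in zip(x_values, y_values)) for j in range(size)]
    # Forward elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda row: abs(matrix[row][col]))
        matrix[col], matrix[pivot] = matrix[pivot], matrix[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, size):
            factor = matrix[row][col] / matrix[col][col]
            for k in range(col, size):
                matrix[row][k] -= factor * matrix[col][k]
            rhs[row] -= factor * rhs[col]
    # Back substitution.
    coeffs = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(matrix[row][k] * coeffs[k] for k in range(row + 1, size))
        coeffs[row] = (rhs[row] - tail) / matrix[row][row]
    return coeffs


def mse_of(coeffs, x_values, y_values):
    """Mean squared error of the polynomial over the data."""
    fitted = [sum(c * x ** p for p, c in enumerate(coeffs)) for x in x_values]
    return sum((y - f) ** 2 for y, f in zip(y_values, fitted)) / len(y_values)


line = polynomial_least_squares(xs, cars, 1)
quadratic = polynomial_least_squares(xs, cars, 2)
mse_line = mse_of(line, xs, cars)
mse_quadratic = mse_of(quadratic, xs, cars)
print(f"line MSE = {mse_line:.2f}, quadratic MSE = {mse_quadratic:.2f}, "
      f"quadratic SD = {math.sqrt(mse_quadratic):.2f}")
```

For this data the quadratic's mean squared error comes out near 4.08 and its square root near 2.02, so a reported 2.02 lines up with the standard deviation rather than with the MSE itself.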

Polynomial of Best Fit

Model: Higher Ed. Cost
Find a model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S.
What is the predicted cost in 1990? 2005?
When will the cost reach $3500?
Year   Cost
1975    599
1980    840
1985   1386
1988   1726
1989   1846
1990   2006

Higher Ed. Cost
Is the data linear or curvilinear?
Scale the data
Use Derive or Grapher to fit:
C(y) = 95.7y + 491.4 (sd 80.5)
C(y) = 6.1y^2 + 671.7 (sd 56.0)
Year   Scaled Year   Cost
1975    0             599
1980    5             840
1985   10            1386
1988   13            1726
1989   14            1846
1990   15            2006
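Both fitted models can be reproduced with ordinary least squares on the scaled data. The sketch below is an illustration, not the Derive or Grapher procedure: it fits the line by regressing cost on the scaled year, and fits the quadratic without a linear term by regressing cost on the square of the scaled year. The standard deviation divides by n, matching the deck's MSE definition.

```python
import math

scaled_years = [0, 5, 10, 13, 14, 15]
costs = [599, 840, 1386, 1726, 1846, 2006]


def fit_line(x_values, y_values):
    """Least squares (slope, intercept) for y = slope * x + intercept."""
    n = len(x_values)
    x_mean = sum(x_values) / n
    y_mean = sum(y_values) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_values, y_values))
    s_xx = sum((x - x_mean) ** 2 for x in x_values)
    slope = s_xy / s_xx
    return slope, y_mean - slope * x_mean


def standard_deviation(x_values, y_values, slope, intercept):
    """Square root of the mean squared error (dividing by n)."""
    squared = [(y - (slope * x + intercept)) ** 2
               for x, y in zip(x_values, y_values)]
    return math.sqrt(sum(squared) / len(squared))


# Linear model C(y) = m*y + c
lin_slope, lin_intercept = fit_line(scaled_years, costs)
lin_sd = standard_deviation(scaled_years, costs, lin_slope, lin_intercept)

# Quadratic model with no linear term, C(y) = q*y^2 + c: a line in z = y^2
squares = [y ** 2 for y in scaled_years]
quad_coeff, quad_intercept = fit_line(squares, costs)
quad_sd = standard_deviation(squares, costs, quad_coeff, quad_intercept)

print(f"C(y) = {lin_slope:.1f}y + {lin_intercept:.1f} (sd {lin_sd:.1f})")
print(f"C(y) = {quad_coeff:.1f}y^2 + {quad_intercept:.1f} (sd {quad_sd:.1f})")
```

Treating y^2 as the predictor turns the quadratic without a linear term into a straight-line fit, which is why the same helper handles both models.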

Scatter plot & Models

Predicting Cost
The model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S. is C(y) = 6.1y^2 + 671.7
Predicted cost in 1990, which is scaled year y = 15: C(15) = 2044.2
Error in the prediction: Error = 2006 - 2044.2 = -38.2
Year   Cost
1975    599
1980    840
1985   1386
1988   1726
1989   1846
1990   2006
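The prediction and its error are a direct evaluation of the model. In this sketch `cost_model` is a hypothetical helper wrapping the slide's formula, and the 2005 value is an extrapolation well beyond the data, so it should be read with caution.

```python
BASE_YEAR = 1975  # scaled year 0


def cost_model(scaled_year):
    """Quadratic model from the slide: C(y) = 6.1 * y^2 + 671.7."""
    return 6.1 * scaled_year ** 2 + 671.7


observed_1990 = 2006
predicted_1990 = cost_model(1990 - BASE_YEAR)
error_1990 = observed_1990 - predicted_1990  # Error = observed - predicted

# Extrapolation: 2005 is scaled year 30, far outside the 0 to 15 range of the data.
predicted_2005 = cost_model(2005 - BASE_YEAR)

print(f"1990: predicted {predicted_1990:.1f}, error {error_1990:.1f}")
print(f"2005: predicted {predicted_2005:.1f}")
```

The negative error means the model overestimates the observed 1990 cost.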

Predicting Year
The model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S. is C(y) = 6.1y^2 + 671.7
When will the cost reach $3500?
Solve the quadratic equation 3500 = 6.1y^2 + 671.7
Year   Cost
1975    599
1980    840
1985   1386
1988   1726
1989   1846
1990   2006

Quadratic Equations
Methods of Solving Quadratic Equations
  Factoring Method
  Square Root Method
  Completing the Square
  Quadratic Formula

Square Root Method
Let y = 3500 in the model y = 6.1x^2 + 671.7
The resulting quadratic equation has no linear term:
6.1x^2 + 671.7 = 3500
How can we parallel the method of solving linear equations to solve this quadratic equation?

Square Root Method
Solving a quadratic with no linear term
Isolate the square term:
6.1x^2 + 671.7 = 3500
6.1x^2 + 671.7 - 671.7 = 3500 - 671.7
6.1x^2 = 2828.3
x^2 = 2828.3/6.1
Square root to find x: x ≈ ±21.53
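The square root method translates into two steps of code: isolate the squared term, then take both square roots. This is a minimal sketch; `solve_square_root_method` is a hypothetical helper, and converting the scaled year back to a calendar year assumes scaled year 0 is 1975, as in the Higher Ed. Cost table.

```python
import math


def solve_square_root_method(coefficient, constant, target):
    """Solve coefficient * x^2 + constant = target; return real roots, if any."""
    x_squared = (target - constant) / coefficient  # isolate the square term
    if x_squared < 0:
        return ()  # no real solution
    root = math.sqrt(x_squared)
    return (root, -root)


x_pos, x_neg = solve_square_root_method(6.1, 671.7, 3500)
calendar_year = 1975 + x_pos  # only the positive root lies after 1975

print(f"x = {x_pos:.2f} or x = {x_neg:.2f}")
print(f"cost reaches $3500 around {calendar_year:.1f}")
```

The negative root is a valid solution of the equation but corresponds to a time before the data begins, so only the positive root answers the question.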

Solving Quadratic Equation with a Linear Term
Quadratic of Best Fit with Linear Term: y = 3.7x^2 + 38.9x + 587.6 (sd 22.4)
Let y = 3500 and solve the resulting quadratic equation with a linear term:
3.7x^2 + 38.9x + 587.6 = 3500
How do we solve such equations?

Completing the Square Method
Solve by converting to a perfect square and using the Square Root Method
x^2 + 4x - 5 = 0
Isolate the x terms:
x^2 + 4x = 5
Complete the square:
x^2 + 4x + 2^2 = 5 + 2^2
(x + 2)^2 = 9

Completing the Square Method
Square root and solve:
(x + 2)^2 = 9
x + 2 = 3 or x + 2 = -3
x = 1 or x = -5
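The same steps work for any quadratic with leading coefficient 1. The sketch below assumes the equation is written as x^2 + px + q = 0; `solve_by_completing_square` is a hypothetical helper that mirrors the slide's steps.

```python
import math


def solve_by_completing_square(p, q):
    """Solve x^2 + p*x + q = 0 by completing the square; return real roots."""
    half = p / 2                 # the number squared and added to both sides
    right_side = half ** 2 - q   # (x + half)^2 = right_side
    if right_side < 0:
        return ()                # no real solution
    root = math.sqrt(right_side)
    return (-half + root, -half - root)


# Slide example: x^2 + 4x - 5 = 0, so p = 4 and q = -5.
roots = solve_by_completing_square(4, -5)
print(roots)
```

For the slide's example, `half` is 2 and `right_side` is 9, matching the step (x + 2)^2 = 9.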

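Quadratic Formula
Complete the square on the general quadratic (with a ≠ 0) to get a general solution:
$$ax^2 + bx + c = 0$$
$$ax^2 + bx = -c$$
$$x^2 + \frac{b}{a}x = -\frac{c}{a}$$
$$x^2 + \frac{b}{a}x + \left(\frac{b}{2a}\right)^2 = \left(\frac{b}{2a}\right)^2 - \frac{c}{a}$$
$$\left(x + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a^2}$$

Quadratic Formula
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$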

Use the Quadratic Formula to solve
3.7x^2 + 38.9x + 587.6 = 3500
3.7x^2 + 38.9x - 2912.4 = 0
So a = 3.7, b = 38.9, and c = -2912.4
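Plugging these coefficients into the quadratic formula can be sketched as follows. The helper name `quadratic_formula` is illustrative, and the calendar year conversion again assumes scaled year 0 is 1975.

```python
import math


def quadratic_formula(a, b, c):
    """Return the real roots of a*x^2 + b*x + c = 0 (larger root first)."""
    if a == 0:
        raise ValueError("not a quadratic: a must be nonzero")
    discriminant = b ** 2 - 4 * a * c
    if discriminant < 0:
        return ()  # no real roots
    root = math.sqrt(discriminant)
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))


a, b = 3.7, 38.9
c = 587.6 - 3500  # move 3500 to the left side: c = -2912.4

x_pos, x_neg = quadratic_formula(a, b, c)
calendar_year = 1975 + x_pos

print(f"x = {x_pos:.2f} or x = {x_neg:.2f}")
print(f"cost reaches $3500 around {calendar_year:.1f}")
```

Only the positive root is meaningful here, since the negative root corresponds to a year long before the data starts.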

Graphic Method
Graph the related function for the Higher Education problem.
Equation: 3.7x^2 + 38.9x - 2912.4 = 0
Related Function: f(x) = 3.7x^2 + 38.9x - 2912.4
Use Derive or a graphing calculator to generate a graph and zoom in until the zero is located with an error less than 0.01
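Zooming in on a graph is closely related to bisection: keep halving an interval that contains the zero until it is narrower than the allowed error. The sketch below uses bisection as a stand-in for the graphical zoom (it is not what Derive or a calculator does on screen). The constant -2912.4 is 587.6 - 3500, and the starting interval from 0 to 40 is an assumption chosen because f changes sign across it.

```python
def f(x):
    """Related function for the Higher Education problem."""
    return 3.7 * x ** 2 + 38.9 * x - 2912.4


def bisect(func, low, high, tolerance=0.01):
    """Narrow [low, high] around a sign change until it is shorter than tolerance."""
    if func(low) * func(high) > 0:
        raise ValueError("func must change sign on the interval")
    while high - low > tolerance:
        mid = (low + high) / 2
        if func(low) * func(mid) <= 0:
            high = mid  # the zero lies in the left half
        else:
            low = mid   # the zero lies in the right half
    return (low + high) / 2


root = bisect(f, 0, 40)
print(f"zero of f near x = {root:.2f}")
```

Because the final interval is no wider than 0.01, its midpoint is within 0.005 of the true zero, which satisfies the requested error bound.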

Graphic Solution

What is the approximation and error from this graph?

Least Squares Modeling The End
