Unit 4: Modeling Topic 6: Least Squares Method April 1, 2003.


1 Unit 4: Modeling Topic 6: Least Squares Method April 1, 2003

2 Mathematical Modeling – Least Squares (Section 2.3). Three modeling methods: Known Relationship – the underlying mathematical setting is known. Finite Differences – theoretical data or hard-science data with little scatter. Least Squares – modeling data with scatter.

3 Model: Global Warming. Global warming is partly the result of burning fuels, which increases the amount of carbon dioxide in the air. Cars are one of the major sources of fuel consumption. Let's examine the number of cars in the U.S. (in millions) as one variable of global warming.
Year   Cars (millions)
1940   27.5
1950   40.3
1960   61.7
1970   89.3
1980   121.6
1990   150.5

4 Linear or Curvilinear? Numeric Method – use the Finite Differences method to determine whether the data has a nearly linear trend. Is the first difference nearly constant?
Year   Cars (millions)
1940   27.5
1950   40.3
1960   61.7
1970   89.3
1980   121.6
1990   150.5

5 Linear or Curvilinear? Numeric Method – use the Finite Differences method to determine whether the data has a nearly linear trend. Is the first difference nearly constant? Solution: Nearly so after 1960.
Year   Cars (millions)   1st finite difference
1940   27.5
1950   40.3              12.8
1960   61.7              21.4
1970   89.3              27.6
1980   121.6             32.3
1990   150.5             28.9
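The first-difference check above can be sketched in a few lines of Python. This is an illustration only, not part of the original slides; the helper name `first_differences` is hypothetical.

```python
# Cars in the U.S. (millions), 1940-1990 in 10-year steps.
years = [1940, 1950, 1960, 1970, 1980, 1990]
cars = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

def first_differences(values):
    """Return the differences between consecutive values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

# Round to one decimal place to hide floating-point noise.
diffs = [round(d, 1) for d in first_differences(cars)]
print(diffs)  # [12.8, 21.4, 27.6, 32.3, 28.9]
```

A nearly constant list of differences would suggest a linear trend; here the early differences grow before leveling off.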

6 Linear or Curvilinear? Graphic Method – plot the data to see the trend. Estimate a line of best fit. What point should the line pass through?
Year (scaled, 1 = 1940)   Cars (millions)
1                         27.5
2                         40.3
3                         61.7
4                         89.3
5                         121.6
6                         150.5

7 Eyeball line of fit

8 Linear or Curvilinear? Graphic Method – plot the data to see the trend. Estimate a line of best fit. What point should the line pass through? Solution: the mean point $(\bar{x}, \bar{y}) = (3.5, 81.8)$. Estimate the trend with an approximate slope of $m = 30$, which gives the eyeball line $\hat{y} = 30x - 23.2$.
Year (scaled, 1 = 1940)   Cars (millions)
1                         27.5
2                         40.3
3                         61.7
4                         89.3
5                         121.6
6                         150.5
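A short sketch, assuming the scaled years 1 to 6, of how the mean point and the eyeball line follow from the data. The slope 30 is the eyeballed estimate from the slide; forcing the line through the mean point then fixes the intercept.

```python
# Scaled years (1 = 1940, ..., 6 = 1990) and cars in millions.
x = [1, 2, 3, 4, 5, 6]
y = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

# Mean point: the eyeball line is drawn through it.
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

m = 30                           # slope estimated by eye from the plot
b = round(y_bar, 1) - m * x_bar  # intercept so the line passes through (3.5, 81.8)
print(f"mean point ({x_bar}, {y_bar:.1f}); eyeball line y = {m}x + ({b:.1f})")
```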

9 Error in Model Error = observed - predicted

10 Error: The error, or residual, of a model is the vertical distance between the actual (observed) value and the predicted (fitted) value. Error = observed − predicted. Actual value: $(x, y)$. Predicted value: $(x, \hat{y})$. So Error $= y - \hat{y}$.

11 Finding Total Error Why not just add the errors to get total error?

12 Error Cancellation

13 Finding Total Error: Why not just add the errors to get the total error? Solution: positive and negative errors cancel out. A total near zero can hide large individual errors, so it gives a false sense of how well the model fits.
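To see the cancellation concretely, the sketch below (an illustration, not from the slides) computes residuals for several lines that all pass through the mean point of the car data. Their plain sums are all zero, because both $\sum (y_i - \bar{y})$ and $\sum (x_i - \bar{x})$ are zero, even for a horizontal line and a steeply decreasing one; the sums of absolute errors differ widely. The helper `residuals` is hypothetical.

```python
# Scaled years (1 = 1940) and cars in millions.
x = [1, 2, 3, 4, 5, 6]
y = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

def residuals(slope):
    """Residuals y - y_hat for the line with this slope through (x_bar, y_bar)."""
    return [yi - (y_bar + slope * (xi - x_bar)) for xi, yi in zip(x, y)]

for slope in (30, 0, -100):
    r = residuals(slope)
    # The plain sum is zero (up to rounding) for every slope; the absolute sum is not.
    print(f"slope {slope:>5}: sum of errors {sum(r):+.6f}, sum of |errors| {sum(abs(e) for e in r):.1f}")
```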

14 How do we find total error? Make the errors positive, either by taking the absolute value of each error or by squaring each error, then sum the positive errors to find the total error. The best-fit line makes this sum of positive errors as small as possible.

15 Question: Why square the deviations instead of taking absolute deviations? Gauss–Markov Theorem: among all linear unbiased methods of fitting a line to data, the least squares method gives estimates with minimum variance (a measure of uncertainty), provided the errors have mean zero, equal variance, and are uncorrelated.

16 Least Squares Method vs. Absolute Value Method: The Least Squares Method always yields a unique best-fit line (as long as the x-values are not all equal), while the Absolute Value Method may not. Example: Use the NCTM Applet for Least Squares to approximate the line of best fit for the following set of points: (0,2), (0,4), (6,3), (6,5), (3,3.5).
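The applet result can be checked by hand. No line can have a total absolute error below 4 for these points: the two points at x = 0 are 2 apart, so together they contribute at least 2, and likewise the two points at x = 6. Every line through (0, a) and (6, 7 − a) with 2 ≤ a ≤ 4 also passes through (3, 3.5) and reaches exactly 4, so the Absolute Value Method has infinitely many best lines, while least squares gives the single line y = 3 + x/6. A minimal sketch (helper names `sum_abs` and `sum_sq` are hypothetical):

```python
# Five points from the NCTM applet example.
points = [(0, 2), (0, 4), (6, 3), (6, 5), (3, 3.5)]

def sum_abs(a, b):
    """Total absolute error for the line y = a + b*x."""
    return sum(abs(y - (a + b * x)) for x, y in points)

def sum_sq(a, b):
    """Total squared error for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in points)

# Least squares line from the closed form b = Sxy / Sxx, a = y_bar - b * x_bar.
n = len(points)
x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
s_xx = sum((x - x_bar) ** 2 for x, _ in points)
b_ls = s_xy / s_xx
a_ls = y_bar - b_ls * x_bar

# Several lines through (0, a) and (6, 7 - a): all tie for the smallest absolute error.
for a in (2, 3, 4):
    b = (7 - 2 * a) / 6
    print(f"y = {a} + {b:.3f}x: abs error {sum_abs(a, b):.2f}, squared error {sum_sq(a, b):.2f}")

print(f"least squares: y = {a_ls:.2f} + {b_ls:.3f}x, squared error {sum_sq(a_ls, b_ls):.2f}")
```

The tied lines have squared errors of 8, 4, and 8, so squaring breaks the tie in favor of one line.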

17 Absolute Value Method Pts: (0,2),(0,4),(6,3),(6,5),(3,3.5)

18

19

20 Least Squares Method Pts: (0,2),(0,4),(6,3),(6,5),(3,3.5)

21 Outliers: An outlier is a point that lies much farther from the line of best fit than most other points. The least squares method is more sensitive to outliers than the absolute value method, because squaring magnifies large errors.
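The slides do not list the points used in the outlier example, so the sketch below uses hypothetical data: four points on y = x and one outlier at (5, 10). The absolute value fit is found by brute force, relying on the known property that some best absolute value line passes through two of the data points (given at least two distinct x-values). Least squares is pulled to y = 2x − 2, while the absolute value fit stays on y = x. Function names are hypothetical.

```python
from itertools import combinations

# Hypothetical data: four points on y = x plus one outlier at (5, 10).
points = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 10)]

def least_squares(pts):
    """Closed-form least squares line; returns (intercept, slope)."""
    n = len(pts)
    x_bar = sum(x for x, _ in pts) / n
    y_bar = sum(y for _, y in pts) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pts)
    s_xx = sum((x - x_bar) ** 2 for x, _ in pts)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

def least_absolute(pts):
    """Brute-force absolute value fit over lines through pairs of data points."""
    best = None
    for (x1, y1), (x2, y2) in combinations(pts, 2):
        if x1 == x2:
            continue  # a vertical line is not a function of x
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        total = sum(abs(y - (a + b * x)) for x, y in pts)
        if best is None or total < best[0]:
            best = (total, a, b)
    return best[1], best[2]  # (intercept, slope)

print(least_squares(points))   # (-2.0, 2.0)
print(least_absolute(points))  # (0.0, 1.0)
```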

22 Absolute Value Method Pts: 4 near line, one outlier

23 Least Squares Method Pts: 4 near line, one outlier

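24 Mean Squared Error: For a set of data $(x_i, y_i)$ with $n$ elements modeled by a line with predicted values $\hat{y}_i$, $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$.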

25 Standard Deviation: a measure of error found by taking the square root of the MSE, which gives a measure of error in the same units as the original data. So the standard deviation is $\text{SD} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$.

26 Mean Squared Error: Calculate the MSE for the Global Warming data when modeled by the eyeball line of best fit.
Year (scaled)   Cars observed   Cars predicted
1               27.5            6.8
2               40.3            36.8
3               61.7            66.8
4               89.3            96.8
5               121.6           126.8
6               150.5           156.8

27 Mean Squared Error: Calculate the MSE for the Global Warming data when modeled by the eyeball line of best fit. Solution: MSE ≈ 98.3.
Year (scaled)   Cars observed   Cars predicted
1               27.5            6.8
2               40.3            36.8
3               61.7            66.8
4               89.3            96.8
5               121.6           126.8
6               150.5           156.8
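A sketch of the MSE calculation for the eyeball line. The predicted values come from $\hat{y} = 30x - 23.2$, the line with slope 30 through (3.5, 81.8), which reproduces the table's predicted column; the square root gives the standard deviation defined earlier. The helper name `mse` is hypothetical.

```python
import math

x = [1, 2, 3, 4, 5, 6]  # scaled years, 1 = 1940
observed = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]
predicted = [30 * xi - 23.2 for xi in x]  # eyeball line through (3.5, 81.8), slope 30

def mse(obs, pred):
    """Mean squared error: the average of (observed - predicted)^2."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)

error = mse(observed, predicted)
print(round(error, 1), round(math.sqrt(error), 2))  # 98.3 9.91
```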

28 Some History - Gauss Gauss came up with a mathematical model for fitting a line to data when he was 17 years old (in 1794)! However, he did not publish his findings until 1809.

29 Some History - Legendre Legendre published the first explicit account of the method of least squares in 1805, 4 years before Gauss published his. But in Gauss’s publication, he referred to his earlier (1794) work, which created a controversy about who had first discovered the method. Today both Gauss and Legendre are given credit for discovering the method of least squares independently.

30 Line of Best Fit: Least Squares Fit Method – an algebraic method of finding the best-fit line. This method gives the line with the smallest possible mean squared error (and therefore the smallest possible standard deviation).

31 Verification of Least Squares Fit Method: Minimize the MSE, $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$.

32 Verification of LS Method: $n$ is a fixed constant, so minimize the numerator $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the predicted value and $y_i$ is the actual observed value for $x_i$.

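33 Verification of LS Method: Substitute $\hat{y}_i = a + b x_i$ and square: $\sum_{i=1}^{n}(y_i - a - b x_i)^2 = \sum_{i=1}^{n}\left(y_i^2 + a^2 + b^2 x_i^2 - 2a y_i - 2b x_i y_i + 2ab x_i\right)$.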

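34 Verification of LS Method: Now expand the summation, which requires some summation rules. Distribute summation: $\sum_{i=1}^{n}(u_i + v_i) = \sum_{i=1}^{n} u_i + \sum_{i=1}^{n} v_i$ and $\sum_{i=1}^{n} c\,u_i = c\sum_{i=1}^{n} u_i$. Summation of a constant: $\sum_{i=1}^{n} c = nc$. Summation and mean: $\sum_{i=1}^{n} x_i = n\bar{x}$ and $\sum_{i=1}^{n} y_i = n\bar{y}$.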

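35 Verification of LS Method: $\sum_{i=1}^{n} y_i^2 + n a^2 + b^2\sum_{i=1}^{n} x_i^2 - 2an\bar{y} - 2b\sum_{i=1}^{n} x_i y_i + 2abn\bar{x}$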

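36 Now, the line of best fit passes through $(\bar{x}, \bar{y})$. Substitute $a = \bar{y} - b\bar{x}$ and simplify: $\sum_{i=1}^{n}(y_i - \bar{y})^2 - 2b\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) + b^2\sum_{i=1}^{n}(x_i - \bar{x})^2$.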

37 Verification of LS Method: The left expression, $\sum_{i=1}^{n}(y_i - \bar{y})^2$, is a constant value since the data are known, so we ignore it when minimizing.

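38 Verification of LS Method: Complete the square on the remaining expression. Writing $S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2$ and $S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$, we get $b^2 S_{xx} - 2b S_{xy} = S_{xx}\left(b - \frac{S_{xy}}{S_{xx}}\right)^2 - \frac{S_{xy}^2}{S_{xx}}$. Since $S_{xx} > 0$, both terms are nonnegative. The subtracted term does not depend on $b$, so minimizing the first term minimizes the entire expression.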

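39 Verification of LS Method: The entire expression is minimal when the squared term is zero. Thus $b = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$ and $a = \bar{y} - b\bar{x}$ in the line of best fit $\hat{y} = a + bx$.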

40 Line of Best Fit Derive’s calculated line of best fit is y = 25.3 x – 6.8 Mean Squared Error is reduced from 98.3 for the eyeballed line of fit to 34.6 for the line of best fit.
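The closed-form result can be checked against Derive's line. The sketch below applies $b = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2$ and $a = \bar{y} - b\bar{x}$ to the scaled car data; the helper name `least_squares_line` is hypothetical.

```python
x = [1, 2, 3, 4, 5, 6]  # scaled years, 1 = 1940
y = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

def least_squares_line(xs, ys):
    """Closed-form fit of y_hat = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

a, b = least_squares_line(x, y)
mse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / len(x)
print(round(b, 1), round(a, 1), round(mse, 1))  # 25.3 -6.8 34.6
```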

41 Line of Best Fit (blue)

42 Polynomial of Best Fit: Is the line of best fit a better model than a quadratic or cubic polynomial? Examine the line of best fit with the data. Is the data curvilinear?

43 Polynomial of Best Fit: Is the line of best fit a better model than a quadratic or cubic polynomial? Examine the line of best fit with the data. Is the data curvilinear? Solution: the data start above the line of best fit, then go below, and finish back above it, an indication that the data are curvilinear. Extension of the Linear Method to Other Polynomial Functions.

44 Polynomial of Best Fit: Derive's calculated quadratic of best fit for the Global Warming data reduces the MSE from 34.6 for the line of best fit to about 4.08 for the quadratic of best fit (the standard deviation drops from about 5.89 to about 2.02).
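A minimal pure-Python sketch that fits polynomials by solving the least squares normal equations with Gaussian elimination. The helper `polyfit` here is hypothetical (it is not NumPy's `numpy.polyfit`) and is intended only for small degrees. For degree 1 it reproduces the MSE of 34.6; for degree 2 it reports an MSE of about 4.08 and a standard deviation (square root of the MSE) of about 2.02, with the quadratic $\hat{y} \approx 2.22x^2 + 9.82x + 13.85$.

```python
import math

x = [1, 2, 3, 4, 5, 6]  # scaled years, 1 = 1940
y = [27.5, 40.3, 61.7, 89.3, 121.6, 150.5]

def polyfit(xs, ys, degree):
    """Least squares polynomial fit via the normal equations.
    Returns [c0, c1, ..., c_degree] for c0 + c1*x + ... + c_degree*x**degree."""
    size = degree + 1
    # Normal equations: sum_k (sum_i x_i^(j+k)) c_k = sum_i x_i^j y_i
    A = [[sum(xi ** (j + k) for xi in xs) for k in range(size)] for j in range(size)]
    rhs = [sum(xi ** j * yi for xi, yi in zip(xs, ys)) for j in range(size)]
    # Forward elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, size):
            factor = A[row][col] / A[col][col]
            for k in range(col, size):
                A[row][k] -= factor * A[col][k]
            rhs[row] -= factor * rhs[col]
    # Back substitution.
    coeffs = [0.0] * size
    for row in reversed(range(size)):
        tail = sum(A[row][k] * coeffs[k] for k in range(row + 1, size))
        coeffs[row] = (rhs[row] - tail) / A[row][row]
    return coeffs

def mse(coeffs):
    """Mean squared error of the polynomial with these coefficients on the car data."""
    return sum((yi - sum(c * xi ** k for k, c in enumerate(coeffs))) ** 2
               for xi, yi in zip(x, y)) / len(x)

for degree in (1, 2):
    c = polyfit(x, y, degree)
    print(degree, [round(ci, 2) for ci in c], round(mse(c), 2), round(math.sqrt(mse(c)), 2))
```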

45 Polynomial of Best Fit

46 Model: Higher Ed. Cost. Find a model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S. What is the predicted cost in 1990? In 2005? When will the cost reach $3500?
Year   Cost
1975   599
1980   840
1985   1386
1988   1726
1989   1846
1990   2006

47 Higher Ed. Cost: Is the data linear or curvilinear? Scale the data (years since 1975), then use Derive or Grapher to fit:
Linear: $C(y) = 95.7y + 491.4$ (sd 80.5)
Quadratic with no linear term: $C(y) = 6.1y^2 + 671.7$ (sd 56.0)
Year   Scaled year   Cost
1975   0             599
1980   5             840
1985   10            1386
1988   13            1726
1989   14            1846
1990   15            2006
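A sketch reproducing both Higher Ed. fits, with the standard deviation computed as the square root of the MSE (dividing by n), which matches the sd values quoted above. The model with no linear term is an ordinary straight-line fit in the variable $u = y^2$, so one helper handles both; the names `fit_line` and `sd` are hypothetical.

```python
import math

years_since_1975 = [0, 5, 10, 13, 14, 15]
cost = [599, 840, 1386, 1726, 1846, 2006]

def fit_line(us, vs):
    """Least squares fit of v = a + b*u; returns (a, b)."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    s_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    s_uu = sum((u - u_bar) ** 2 for u in us)
    b = s_uv / s_uu
    return v_bar - b * u_bar, b

def sd(pred):
    """Standard deviation as used in the slides: square root of the MSE."""
    return math.sqrt(sum((c - p) ** 2 for c, p in zip(cost, pred)) / len(cost))

# Linear model C(y) = b1*y + a1.
a1, b1 = fit_line(years_since_1975, cost)
# No-linear-term quadratic C(y) = b2*y^2 + a2, fitted as a line in u = y^2.
a2, b2 = fit_line([t ** 2 for t in years_since_1975], cost)

print(round(b1, 1), round(a1, 1), round(sd([a1 + b1 * t for t in years_since_1975]), 1))
print(round(b2, 1), round(a2, 1), round(sd([a2 + b2 * t ** 2 for t in years_since_1975]), 1))
```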

48 Scatter plot & Models

49 Predicting Cost: The model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S. is $C(y) = 6.1y^2 + 671.7$, where $y$ is the number of years since 1975. What is the predicted cost in 1990? What is the error in the prediction?
Year   Cost
1975   599
1980   840
1985   1386
1988   1726
1989   1846
1990   2006

50 Predicting Cost: The model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S. is $C(y) = 6.1y^2 + 671.7$. Predicted cost in 1990, which is scaled year $y = 15$: $C(15) = 2044.2$. Error in the prediction: Error $= 2006 - 2044.2 = -38.2$.
Year   Cost
1975   599
1980   840
1985   1386
1988   1726
1989   1846
1990   2006
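The prediction and its error as a quick check; the function name `C` simply mirrors the slide's notation.

```python
def C(y):
    """Slide model: average cost, y = years since 1975."""
    return 6.1 * y ** 2 + 671.7

predicted = C(15)         # 1990 is scaled year 15
error = 2006 - predicted  # error = observed - predicted
print(round(predicted, 1), round(error, 1))  # 2044.2 -38.2
```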

51 Predicting Year: The model for the average cost of tuition and fees per semester for public 4-year colleges in the U.S. is $C(y) = 6.1y^2 + 671.7$. When will the cost reach \$3500? Solve the quadratic equation $3500 = 6.1y^2 + 671.7$.
Year   Cost
1975   599
1980   840
1985   1386
1988   1726
1989   1846
1990   2006

52 Quadratic Equations: Methods of solving quadratic equations: Factoring Method, Square Root Method, Completing the Square, Quadratic Formula.

53 Square Root Method: Let $y = 3500$ in the model $y = 6.1x^2 + 671.7$ (here $x$ is the scaled year and $y$ is the cost). The resulting quadratic equation has no linear term: $6.1x^2 + 671.7 = 3500$. How can we parallel the method of solving linear equations to solve this quadratic equation?

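54 Square Root Method: Solving a quadratic with no linear term. Isolate the square term: $6.1x^2 + 671.7 = 3500$, so $6.1x^2 + 671.7 - 671.7 = 3500 - 671.7$, so $6.1x^2 = 2828.3$, so $x^2 = 2828.3/6.1 \approx 463.66$. Take the square root to find $x \approx \pm 21.5$; only the positive root is meaningful, so the cost reaches that level about 21.5 years after 1975, around 1996 to 1997.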
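The square root steps above, sketched in Python under the same model; only the positive root is kept because x counts years after 1975.

```python
import math

a, c, target = 6.1, 671.7, 3500  # model y = a*x^2 + c; x = years since 1975

x_squared = (target - c) / a     # isolate the square term: 2828.3 / 6.1
x = math.sqrt(x_squared)         # positive root only; negative years make no sense here
print(f"x^2 = {x_squared:.2f}, x = {x:.2f}, year about {1975 + x:.1f}")
```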

55 Solving a Quadratic Equation with a Linear Term: Quadratic of best fit with a linear term: $y = 3.7x^2 + 38.9x + 587.6$ (sd 22.4). Let $y = 3500$ and solve the resulting quadratic equation with a linear term: $3.7x^2 + 38.9x + 587.6 = 3500$. How do we solve such equations?

56 Completing the Square Method: Solve by converting to a perfect square and using the Square Root Method. $x^2 + 4x - 5 = 0$. Isolate the $x$ terms: $x^2 + 4x = 5$. Complete the square by adding $(4/2)^2 = 2^2$ to both sides: $x^2 + 4x + 2^2 = 5 + 2^2$, so $(x+2)^2 = 9$.

57 Completing the Square Method: Take the square root and solve: $(x+2)^2 = 9$, so $x + 2 = 3$ or $x + 2 = -3$, giving $x = 1$ or $x = -5$.

58 Quadratic Formula: Complete the square on the general quadratic to get a general solution: $ax^2 + bx + c = 0$, so $ax^2 + bx = -c$.

59 Quadratic Formula
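For reference, the completing-the-square steps applied to the general quadratic (assuming $a \neq 0$) run as follows:

```latex
\begin{aligned}
ax^2 + bx &= -c \\
x^2 + \frac{b}{a}x &= -\frac{c}{a} \\
x^2 + \frac{b}{a}x + \left(\frac{b}{2a}\right)^2 &= \left(\frac{b}{2a}\right)^2 - \frac{c}{a} \\
\left(x + \frac{b}{2a}\right)^2 &= \frac{b^2 - 4ac}{4a^2} \\
x + \frac{b}{2a} &= \pm\frac{\sqrt{b^2 - 4ac}}{2a} \\
x &= \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\end{aligned}
```

Real solutions exist only when $b^2 - 4ac \geq 0$.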

60 Use the Quadratic Formula to solve $3.7x^2 + 38.9x + 587.6 = 3500$, that is, $3.7x^2 + 38.9x - 2912.4 = 0$. So $a = 3.7$, $b = 38.9$, and $c = -2912.4$.
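Applying the formula with these coefficients, as a sketch (the helper name `solve_quadratic` is hypothetical; it assumes $a \neq 0$ and returns only real roots). The positive root, about 23.29, corresponds to roughly 1998; the negative root has no meaning for this model.

```python
import math

def solve_quadratic(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0 via the quadratic formula (assumes a != 0)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real roots
    root = math.sqrt(disc)
    return [(-b + root) / (2 * a), (-b - root) / (2 * a)]

roots = solve_quadratic(3.7, 38.9, -2912.4)
print([round(r, 2) for r in roots])  # [23.29, -33.8]
```

The same helper reproduces the completing-the-square example: `solve_quadratic(1, 4, -5)` returns `[1.0, -5.0]`.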

61 Graphic Method: Graph the related function for the Higher Education problem. Equation: $3.7x^2 + 38.9x - 2912.4 = 0$. Related function: $f(x) = 3.7x^2 + 38.9x - 2912.4$. Use Derive or a graphing calculator to graph $f$, then zoom in on the positive $x$-intercept until the error is less than 0.01.
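Zooming in on the graph repeatedly narrows an interval that contains the $x$-intercept; bisection does the same thing numerically. The sketch below swaps in bisection for the graphical zoom. The bracket [0, 50] is an assumption, chosen because $f$ changes sign there, and the helper name `bisect` is hypothetical (it is unrelated to Python's `bisect` module).

```python
def f(x):
    """Related function for the Higher Education problem."""
    return 3.7 * x ** 2 + 38.9 * x - 2912.4

def bisect(func, lo, hi, tol=0.01):
    """Halve [lo, hi] until it is shorter than tol; assumes func changes sign on it."""
    if func(lo) * func(hi) > 0:
        raise ValueError("func must change sign on [lo, hi]")
    while hi - lo >= tol:
        mid = (lo + hi) / 2
        if func(lo) * func(mid) <= 0:
            hi = mid  # root lies in the left half
        else:
            lo = mid  # root lies in the right half
    return (lo + hi) / 2  # within tol / 2 of the root

root = bisect(f, 0, 50)
print(round(root, 2))
```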

62 Graphic Solution

63 What is the approximation and error from this graph?

64 Least Squares Modeling The End

