~ Curve Fitting ~
Least Squares Regression
Copyright © 2006 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Curve Fitting

Fit the best curve to a discrete data set and obtain estimates for other data points. Two general approaches:
– The data exhibit a significant degree of scatter: find a single curve that represents the general trend of the data (regression).
– The data are very precise: pass a curve (or curves) exactly through each of the points (interpolation).
Simple Statistics

In the sciences, when several measurements are made of a particular quantity, additional insight can be gained by summarizing the data with one or more well-chosen statistics:
– Arithmetic mean: the sum of the individual data points (y_i) divided by the number of points, n.
– Standard deviation: the most common measure of spread for a sample; its square is the variance.
– Coefficient of variation: the ratio of the standard deviation to the mean, which quantifies the spread of the data relative to its magnitude (similar in spirit to relative error).
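To make these definitions concrete, here is a minimal Python sketch (not part of the original slides; the sample values are placeholders) that computes the mean, sample standard deviation, and coefficient of variation of a set of measurements:

```python
import math

def simple_stats(y):
    """Return the mean, sample standard deviation, and coefficient of variation (%)."""
    n = len(y)
    mean = sum(y) / n
    # Sample variance uses n - 1 in the denominator.
    variance = sum((yi - mean) ** 2 for yi in y) / (n - 1)
    std_dev = math.sqrt(variance)
    cv = 100.0 * std_dev / mean   # coefficient of variation, in percent
    return mean, std_dev, cv

# Placeholder measurements, for illustration only
y = [6.395, 6.435, 6.485, 6.495, 6.505, 6.515, 6.555, 6.625]
print(simple_stats(y))
```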
Linear Regression

Fitting a straight line to a set of paired observations (x_1, y_1), (x_2, y_2), ..., (x_n, y_n):

y_i = a_0 + a_1 x_i + e_i
e_i = y_i − a_0 − a_1 x_i

where y_i is the measured value, e_i is the error (residual), a_1 is the slope, and a_0 is the intercept. The line equation is y = a_0 + a_1 x, and e_i is the vertical distance between a measured point and the line.
Criteria For a "Best Fit"

The most commonly used strategy is to minimize the sum of the squares of the residuals between the measured y and the y calculated with the linear model:

S_r = Σ e_i^2 = Σ (y_i − a_0 − a_1 x_i)^2

This criterion yields a unique line for a given set of data. We need to compute a_0 and a_1 such that S_r is minimized.
Least-Squares Fit of a Straight Line

Setting the partial derivatives of S_r with respect to a_0 and a_1 to zero gives the normal equations, which can be solved simultaneously:

n a_0 + (Σ x_i) a_1 = Σ y_i
(Σ x_i) a_0 + (Σ x_i^2) a_1 = Σ x_i y_i

Solving for the coefficients:

a_1 = (n Σ x_i y_i − Σ x_i Σ y_i) / (n Σ x_i^2 − (Σ x_i)^2)
a_0 = ȳ − a_1 x̄

where x̄ and ȳ are the mean values of x and y.
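These formulas translate directly into code. Below is a minimal Python sketch (the helper name fit_line is my own, not from the slides) of the least-squares fit of a straight line:

```python
def fit_line(x, y):
    """Least-squares fit of y = a0 + a1*x; returns (a0, a1)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)

    # Slope from the normal equations, intercept from the mean values.
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a0 = sum_y / n - a1 * (sum_x / n)
    return a0, a1
```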
"Goodness" of Our Fit

S_r = sum of the squares of the residuals around the regression line.
S_t = total sum of the squares around the mean.
(S_t − S_r) quantifies the improvement, or error reduction, due to describing the data in terms of a straight line rather than as an average value:

r^2 = (S_t − S_r) / S_t

where r is the correlation coefficient and r^2 is the coefficient of determination.
– For a perfect fit, S_r = 0 and r = r^2 = 1: the line explains 100% of the variability of the data.
– For r = r^2 = 0, S_r = S_t and the fit represents no improvement over the mean.

(Figure: the spread of the data (a) around the mean and (b) around the best-fit line; note the improvement in the error due to linear regression.)
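A short sketch of how S_t, S_r, and r^2 can be computed, reusing the hypothetical fit_line helper from the previous sketch:

```python
def goodness_of_fit(x, y):
    """Return S_t, S_r, and the coefficient of determination r^2 for a straight-line fit."""
    a0, a1 = fit_line(x, y)   # fit_line: the straight-line helper sketched earlier
    y_mean = sum(y) / len(y)
    s_t = sum((yi - y_mean) ** 2 for yi in y)                      # spread around the mean
    s_r = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))    # spread around the line
    r2 = (s_t - s_r) / s_t
    return s_t, s_r, r2
```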
Example

Use linear regression to fit the following data to a straight line. (Table of x, y, x^2, and x·y values with their column sums omitted.)

Solution: setting up and solving the two normal equations gives a_0 = 4.56 and a_1 = 0.6, so the best-fit line is

y = 4.56 + 0.6 x
Example

Use linear regression to fit the following data to a straight line. (Table of x, y, x^2, and x·y values with their column sums omitted.)

Solution: solving the two normal equations gives a_0 = 5 and a_1 = 4, so the best-fit line is

y = 5 + 4x
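Because the tabulated values are not reproduced here, the quick check below uses placeholder data chosen to lie exactly on y = 5 + 4x; with the slide's actual data, numpy.polyfit(x, y, 1), which returns the coefficients from highest degree to lowest, would recover a_1 and a_0 in the same way:

```python
import numpy as np

# Placeholder data lying exactly on y = 5 + 4x (not the slide's tabulated values).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 5.0 + 4.0 * x

a1, a0 = np.polyfit(x, y, 1)   # degree-1 fit returns [slope, intercept]
print(a0, a1)                  # approximately 5.0 and 4.0
```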
Linearization of Nonlinear Relationships

(Figure: (a) data that are ill-suited for linear least-squares regression; (b) an indication that a parabola may be more suitable.)

Exponential equation:
y = α_1 e^(β_1 x)
Taking the natural logarithm linearizes it:
ln y = ln α_1 + β_1 x
so a plot of ln y versus x yields a straight line with slope β_1 and intercept ln α_1.
Linearization of Nonlinear Relationships (continued)

Power equation:
y = α_2 x^(β_2)
Taking the base-10 logarithm linearizes it:
log y = log α_2 + β_2 log x

Saturation-growth-rate equation:
y = α_3 x / (β_3 + x)
Inverting linearizes it:
1/y = 1/α_3 + (β_3/α_3)(1/x)
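As one concrete illustration of the linearization idea, here is a hedged Python sketch (the helper name fit_power is mine) that fits the power model by running the straight-line fit on the log-transformed data and transforming the intercept back:

```python
import math

def fit_power(x, y):
    """Fit y = a * x**b via linear regression on (log10 x, log10 y); returns (a, b)."""
    lx = [math.log10(xi) for xi in x]
    ly = [math.log10(yi) for yi in y]
    # fit_line is the straight-line least-squares helper sketched earlier.
    a0, a1 = fit_line(lx, ly)      # intercept = log10(a), slope = b
    return 10 ** a0, a1
```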
Polynomial Regression

Some engineering data are poorly represented by a straight line; a curve (polynomial) may be better suited to fit the data. The least-squares method can be extended to fit the data to higher-order polynomials. As an example, consider a second-order polynomial:

y = a_0 + a_1 x + a_2 x^2 + e

Minimizing the sum of the squared residuals with respect to a_0, a_1, and a_2 yields three normal equations:

n a_0 + (Σ x_i) a_1 + (Σ x_i^2) a_2 = Σ y_i
(Σ x_i) a_0 + (Σ x_i^2) a_1 + (Σ x_i^3) a_2 = Σ x_i y_i
(Σ x_i^2) a_0 + (Σ x_i^3) a_1 + (Σ x_i^4) a_2 = Σ x_i^2 y_i
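A minimal Python sketch (assuming numpy is available; the helper name fit_quadratic is mine) that assembles and solves this 3×3 normal-equation system:

```python
import numpy as np

def fit_quadratic(x, y):
    """Least-squares fit of y = a0 + a1*x + a2*x**2; returns (a0, a1, a2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    # Coefficient matrix and right-hand side of the normal equations.
    A = np.array([
        [n,            x.sum(),      (x**2).sum()],
        [x.sum(),      (x**2).sum(), (x**3).sum()],
        [(x**2).sum(), (x**3).sum(), (x**4).sum()],
    ])
    b = np.array([y.sum(), (x * y).sum(), (x**2 * y).sum()])

    a0, a1, a2 = np.linalg.solve(A, b)
    return a0, a1, a2
```

For a quick cross-check, np.polyfit(x, y, 2) returns the same coefficients in reverse order (a_2, a_1, a_0).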
Example: Polynomial Regression

Fit a second-order polynomial to the following data. (Table of x_i, y_i, x_i^2, x_i^3, x_i^4, x_i·y_i, and x_i^2·y_i values with their column sums omitted.)

Solution: solving the resulting system of three normal equations gives a_0 = 2.48, a_1 = 2.36, and a_2 = 1.86, so the best-fit parabola is

y = 2.48 + 2.36 x + 1.86 x^2