Multivariate Analysis: Regression
Regression: a mathematical method for determining the equation that best reproduces a data set.
Linear regression: regression applied with a linear model (a straight line).
Uses: prediction of new X, Y values; understanding data behavior; verification of hypotheses and physical laws.
The Linear Model
Y = mX + b
Y = dependent variable
X = independent variable
m = slope = ΔY/ΔX
b = y-intercept (the point where the line crosses the y-axis, at X = 0)
Example points from the figure: (X1, Y1) = (1, 2.4) and (X2, Y2) = (20, 10).
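As a small worked example, the slope and intercept of the line through the two points labeled on the slide can be computed directly from the definitions of m and b. The function name `line_through` is an illustrative choice, not something from the slides.

```python
def line_through(p1, p2):
    """Return (m, b) for the straight line Y = mX + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    m = (y2 - y1) / (x2 - x1)  # slope = delta Y / delta X
    b = y1 - m * x1            # solve Y = mX + b for b at the first point
    return m, b


# The two points shown on the slide.
m, b = line_through((1, 2.4), (20, 10))
print(f"m = {m:.3f}, b = {b:.3f}")
```

For these two points the slope works out to 7.6/19 = 0.4 and the intercept to 2.4 - 0.4 = 2.0.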
Fitting the data: finding the equation of the straight line that does the best job of reproducing the data.
Residual: the difference between the measured and the calculated Y-value for a data point.
Residuals: each residual represents the error in the fit at one data point. Positive and negative residuals cancel, so their sum tends toward zero and cannot be used to measure the overall error in the fit.
SSE: the sum of squared residuals (SSE) is used to represent the overall error in the fit. The line that minimizes the SSE is the best straight line for the data set.
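The sketch below illustrates why the plain sum of residuals is a poor measure of fit while the SSE is not. The data set and both candidate lines are made up for illustration: one line follows the data closely, the other is a flat line at the mean of Y.

```python
def residuals(xs, ys, m, b):
    """Measured Y minus calculated Y for each data point."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


def sse(xs, ys, m, b):
    """Sum of squared residuals for the line Y = mX + b."""
    return sum(r * r for r in residuals(xs, ys, m, b))


# Hypothetical data, roughly following Y = 2X.
xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]

candidates = {
    "close line (m=2, b=0)": (2.0, 0.0),
    "flat line  (m=0, b=5)": (0.0, 5.0),
}

for name, (m, b) in candidates.items():
    total = sum(residuals(xs, ys, m, b))
    print(f"{name}: sum of residuals = {total:.3f}, SSE = {sse(xs, ys, m, b):.3f}")
```

Both lines give a residual sum of essentially zero, yet the SSE is about 0.10 for the close line and about 18.9 for the flat one, so only the SSE separates a good fit from a bad one.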
Prediction: once the best-fit line has been determined, its equation can be used to predict Y for any given X, and X for any given Y (interpolation within the range of the data, extrapolation beyond it).
Example: y = 772.03x + 10810, where x is the percentage of a state's population with a college degree and y is the state's average income in dollars.
If 20% of a state's population has a college degree, the expected average income is y = 772.03(20) + 10810 ≈ $26,250.
If a state's average income is $30,000, what percentage of its population has a college degree? x = (30,000 - 10810)/772.03 ≈ 24.9%.
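The two calculations above can be written as a pair of small functions, one for each direction of prediction. The coefficients come from the slide; the function names are illustrative.

```python
# Fitted line from the slide: income = SLOPE * percent_with_degree + INTERCEPT
SLOPE = 772.03
INTERCEPT = 10810.0


def predict_income(percent_with_degree):
    """Predict average income (dollars) from % of population with a degree."""
    return SLOPE * percent_with_degree + INTERCEPT


def predict_percent(income):
    """Invert the line: predict % with a degree from average income."""
    return (income - INTERCEPT) / SLOPE


print(f"20% with a degree -> ${predict_income(20):,.2f}")
print(f"$30,000 income    -> {predict_percent(30_000):.1f}%")
```

Inverting the fitted line this way is what the slide does; it is not the same as refitting the data with X and Y swapped, which would generally give a slightly different line.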
Non-Linear Models
Power: y = p1 · x^p2
Polynomial: y = p1 + p2 · x + p3 · x^2 + p4 · x^3 + …
Exponential: y = p1 · e^(p2 · x)
Logarithmic: y = p1 · ln(x) + p2
User-defined
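One common way to fit the power model is to take logarithms, since ln(y) = ln(p1) + p2 · ln(x) is a straight line in ln(x). The sketch below assumes all x and y values are positive and uses synthetic, noise-free data generated from y = 2 · x^3; the function names are illustrative. This approach minimizes the SSE of ln(y) rather than of y itself, so with noisy data it can differ from a direct non-linear fit.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept for Y = mX + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_mean - m * x_mean


def fit_power(xs, ys):
    """Fit y = p1 * x**p2 via a straight-line fit of ln(y) against ln(x)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    slope, intercept = fit_line(log_x, log_y)
    return math.exp(intercept), slope  # p1 = e^intercept, p2 = slope


# Synthetic data generated from y = 2 * x**3 (no noise).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x ** 3 for x in xs]

p1, p2 = fit_power(xs, ys)
print(f"p1 = {p1:.4f}, p2 = {p2:.4f}")
```

The exponential model can be handled the same way by fitting ln(y) against x, and the logarithmic model by fitting y against ln(x).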
Method of Least Squares
First published by Legendre in 1805; Gauss published his own treatment in 1809 and developed it further in the early 1820s.
A mathematical process for finding the model parameters that minimize the SSE.
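For the straight-line model Y = mX + b, the minimization can be carried out in closed form. The derivation below is the standard textbook one rather than anything shown on the slides; X̄ and Ȳ denote the means of the measured X and Y values, and n is the number of data points.

```latex
\begin{align*}
\mathrm{SSE}(m, b) &= \sum_{i=1}^{n} \bigl(Y_i - (m X_i + b)\bigr)^2 \\
\frac{\partial\,\mathrm{SSE}}{\partial b} &= -2 \sum_{i=1}^{n} \bigl(Y_i - m X_i - b\bigr) = 0 \\
\frac{\partial\,\mathrm{SSE}}{\partial m} &= -2 \sum_{i=1}^{n} X_i \bigl(Y_i - m X_i - b\bigr) = 0 \\
m &= \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad b = \bar{Y} - m \bar{X}
\end{align*}
```

The first condition says that the residuals of the best-fit line sum to exactly zero, which is why the plain sum of residuals cannot be used to rank candidate lines.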
Excel Functions and Tools
SLOPE(): returns the slope of the best-fit line when passed X, Y data.
INTERCEPT(): returns the y-intercept of the best-fit line when passed X, Y data.
LINEST(): returns the slope and intercept when passed X, Y data.
TREND(): returns predicted values along a linear trend when passed X, Y data.
Trendline (from the Chart menu): adds the trendline to a chart and can display its equation and R-squared value for a set of X, Y data.