Linear Regression Model
In regression:
– x = independent (predictor) variable
– y = dependent (response) variable
– Regression line (prediction line): ŷ = a + bx
– A linear regression model is summarized by r, r² (explained variance), and the prediction line
Regression Line
Regression line (prediction line): ŷ = a + bx
– Slope: b = [Σxy – (Σx)(Σy)/n] / [Σx² – (Σx)²/n]
– Intercept: a = M_y – b·M_x
– M_y = mean of the y scores, b = the slope just computed, M_x = mean of the x scores
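The slope and intercept formulas above can be sketched in a few lines of Python. This is only an illustration: the function name fit_line_from_sums and its parameter names are made up for this example, and the inputs are the column totals given on the slides (n = 7, Σx = 25, Σy = 13, Σxy = -5, Σx² = 147).

```python
def fit_line_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for the prediction line y_hat = a + b*x.

    Hypothetical helper for illustration; it works from the column
    totals rather than from the raw data points.
    """
    # Slope: b = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: a = M_y - b * M_x, where M_y = Σy/n and M_x = Σx/n
    a = sum_y / n - b * (sum_x / n)
    return a, b


# Column totals taken from the slides.
a, b = fit_line_from_sums(n=7, sum_x=25, sum_y=13, sum_xy=-5, sum_x2=147)
print(f"b = {b:.2f}, a = {a:.2f}")  # b = -0.89, a = 5.04
```

Working from the totals mirrors the hand calculation; with the raw (x, y) pairs available, the totals would be computed first and passed in the same way.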
Data table (columns X, Y, XY, X², Y²; n = 7), column totals and means:
Σx = 25, Σy = 13, Σxy = -5, Σx² = 147, Σy² = 71
M_x = 3.57, M_y = 1.86
Regression Line
b = [-5 – (25)(13)/7] / [147 – (25)²/7]
b = [-5 – 325/7] / [147 – 625/7]
b = [-5 – 46.43] / [147 – 89.29] = -51.43 / 57.71 = -0.89
Regression Line
a = M_y – b·M_x, with M_y = 1.86, M_x = 3.57, b = -0.89
a = 1.86 – (-0.89)(3.57)
a = 1.86 – (-3.18)
a = 5.04
Regression Line
Regression line (prediction line): ŷ = a + bx
– a = 5.04
– b = -0.89
– The best line of fit for the data is ŷ = 5.04 – 0.89x
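Once a and b are known, the prediction line is simply evaluated at any x of interest. A minimal sketch, using the rounded values a = 5.04 and b = -0.89 from the worked slides; the function name predict and the x values tried below are illustrative assumptions, not data from the slides.

```python
def predict(x, a=5.04, b=-0.89):
    """Evaluate the prediction line y_hat = a + b*x (hypothetical helper)."""
    return a + b * x


# Illustrative x values only; they are not the data points from the slides.
for x in (0, 2, 5):
    print(f"x = {x}: y_hat = {predict(x):.2f}")
```

At x = 0 the prediction equals the intercept a, and each one-unit increase in x lowers ŷ by 0.89, which is what the negative slope means.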
Regression Line of Fit
The line of fit is plotted on a scatter plot of the data.
Coefficient of Determination (r²)
Measures the proportion of the variability in the DV (y) that is explained by the IV (x)
Basic Properties
– Obtained by squaring the value of r
– Values range from 0.00 to 1.00, or 0% to 100% (0.00 ≤ r² ≤ 1.00)
– 1.00 or 100% = a perfect model (explains all of the variation in the dependent variable)
– 0.00 or 0% = a model with no explanatory power (explains none of the variation in the dependent variable)
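Coefficient of Determination (r²)
r = [nΣxy – (Σx)(Σy)] / √{[nΣx² – (Σx)²] × [nΣy² – (Σy)²]}
r = [7(-5) – (25)(13)] / √{[7(147) – (25)²] × [7(71) – (13)²]}
r = -360 / √(404 × 328) = -360 / 364.02 = -0.989
r² = (-0.989)² = 0.978
The regression model can explain 97.8% of the variation in the y values.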
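The r and r² arithmetic above can be checked with a short Python sketch. The function name pearson_r_from_sums is an assumption made for this example; the inputs are again the column totals from the slides (n = 7, Σx = 25, Σy = 13, Σxy = -5, Σx² = 147, Σy² = 71).

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson r from column totals (hypothetical helper for illustration)."""
    # Numerator: nΣxy - (Σx)(Σy)
    numerator = n * sum_xy - sum_x * sum_y
    # Denominator: sqrt([nΣx² - (Σx)²] * [nΣy² - (Σy)²])
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = pearson_r_from_sums(n=7, sum_x=25, sum_y=13, sum_xy=-5, sum_x2=147, sum_y2=71)
r_squared = r ** 2
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")  # r = -0.989, r^2 = 0.978
```

The sign of r matches the sign of the slope b, while r² is always non-negative, which is why it can be read as a proportion of explained variance.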
Linear Regression
r = -0.989, r² = 0.978; the model explains 97.8% of the variance in y.
Regression
– Widely used for prediction (including forecasting and time-series data)
– Also used to understand and explore how the IVs (x) are related to the DVs (y)
– In controlled studies, can be used to infer causal relationships between the IVs and DVs
– In addition to r, r², and ŷ, a test of statistical significance is typically reported
Regression
Types of error that can cause problems with the relationships explored
– The chance that the data show a relationship between two variables when that relationship is purely coincidental (a false positive, or Type I error)
– The chance that a relationship that does exist fails to show up in the sample data, purely because of an unlucky random selection of data points (a false negative, or Type II error)
Residuals & Residual Error
Errors
– The difference between the observed (actual, y) value and the predicted (ŷ) value
Residual Error
– Residual = (observed – predicted)
– Residual = (y – ŷ)
– When plotted against x, the residuals of a well-specified model should scatter randomly around zero with no visible pattern; a systematic pattern (for example, a curve left over after fitting a straight line) means the form of the model does not match the data
– If the model and the data do not match, the model (variables included) should be adjusted
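The residual definition above (observed minus predicted) can be sketched as follows. The (x, y) pairs here are made-up illustration data, not the data set behind the slides, and the line ŷ = 5.04 – 0.89x is reused only so there is something to predict with.

```python
# Hypothetical observations, invented for this example only.
xs = [1, 2, 3, 4, 5]
ys = [4.0, 3.5, 2.2, 1.6, 0.5]

a, b = 5.04, -0.89  # rounded intercept and slope from the worked slides

# Predicted value for each x, then residual = observed - predicted.
predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

for x, y, y_hat, res in zip(xs, ys, predicted, residuals):
    print(f"x = {x}: y = {y:.2f}, y_hat = {y_hat:.2f}, residual = {res:+.2f}")
```

A positive residual means the point lies above the line and a negative one means it lies below. Plotting the residuals against x is the usual way to look for a leftover pattern.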
Regression
p-values (p) are statistics typically reported to show the “statistical significance” of other values
– Roughly, the probability that a relationship at least as strong as the one observed in the sample (e.g., between variables) would occur by pure chance
– That is, the probability of seeing such a relationship if, in the population from which the sample was drawn, no such relationship actually exists
Regression
p-values
– Results are said to be statistically significant when p is less than a preset threshold value (alpha, α)
– Typically α = .05, .01, .005, or .001 (that is, 5%, 1%, 0.5%, or 0.1%)
– We want the p-value to be equal to or lower than the threshold (p < α or p ≤ α), e.g., p ≤ .05, p < .01, p < .001
– Typically calculated and reported with the r value (e.g., r = 0.61, p < .01)
– Relationships with p over .05 (p > .05) are typically not viewed as statistically significant
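The slides do not say how p is computed. One way to make the "occurred by pure chance" idea concrete is an exact permutation test: shuffle the y values in every possible order and count how often the shuffled data give a correlation at least as strong as the observed one. This is a swapped-in technique for illustration, not necessarily the test behind the slides. The data and the function names pearson_r and permutation_p_value are hypothetical.

```python
import math
from itertools import permutations


def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys):
    """Two-sided exact permutation p-value for r (feasible for small n only)."""
    observed = abs(pearson_r(xs, ys))
    count = total = 0
    for shuffled in permutations(ys):  # 7! = 5040 orderings for n = 7
        total += 1
        # Small tolerance so the original ordering always counts as a tie.
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            count += 1
    return count / total


# Hypothetical data, invented for this example only.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [6.1, 5.2, 4.8, 3.1, 2.9, 1.4, 0.7]

alpha = 0.05  # preset significance threshold
p = permutation_p_value(xs, ys)
print(f"r = {pearson_r(xs, ys):.3f}, p = {p:.4f}, significant: {p <= alpha}")
```

Enumerating every ordering is practical only for very small samples. For larger n, statistical software typically reports p from a t distribution with n – 2 degrees of freedom.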