Chapter 7: Correlation, Bivariate Regression, and Multiple Regression
Pearson’s Product Moment Correlation
Correlation measures the association between two variables. It quantifies the extent to which the mean, variation, and direction of one variable are related to another variable. r ranges from −1 to +1. Correlation can be used for prediction, but correlation does not indicate the cause of a relationship.
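As an illustration (not from the slides), Pearson's r can be computed with SciPy; the data and variable names below are hypothetical:

```python
# Hypothetical example: computing Pearson's r and R^2.
from scipy import stats

# Made-up paired measurements, e.g., X = hours studied, Y = exam score.
x = [2, 4, 5, 7, 9, 10, 12, 14]
y = [50, 55, 60, 64, 70, 72, 78, 85]

r, p = stats.pearsonr(x, y)      # r in [-1, +1]; p tests H0: rho = 0
print(f"r = {r:.3f}, p = {p:.4f}")
print(f"R^2 = {r**2:.3f}")       # proportion of variance in Y associated with X
```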
Scatter Plot
A scatter plot gives a visual description of the relationship between two variables. The line of best fit is the line that minimizes the squared deviations from each data point up or down to the line.
Line of Best Fit Minimizes Squared Deviations from a Data Point to the Line
Always do a Scatter Plot to Check the Shape of the Relationship
Will a Linear Fit Work?
Will a Linear Fit Work? [Fitted linear equation and its R² shown on the slide.]
2nd Order Fit? [Fitted second-order equation and its R² shown on the slide.]
6th Order Fit? [Fitted sixth-order polynomial and its R² shown on the slide.]
Will a Linear Fit Work?
Linear Fit [Fitted linear equation and its R² shown on the slide.]
Correlation Formulas
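The slide's formulas are images; a standard definitional form of Pearson's r, consistent with the surrounding text (an assumption about exactly what the slide showed), is:

```latex
r = \frac{\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})}
         {\sqrt{\sum_{i=1}^{N}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{N}(Y_i-\bar{Y})^2}}
  = \frac{\mathrm{cov}(X,Y)}{s_X\,s_Y}
```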
Evaluating the Strength of a Correlation
For prediction, an absolute value of r below .7 may produce unacceptably large errors, especially if the SDs of either or both X and Y are large. As a general rule: an absolute value of r of .9 or greater is good; values between .5 and .9 are moderate to low; values of r below .5 give R² ≤ .25 (25% of the variance or less), are poor, and thus not useful for prediction.
Significant Correlation??
If N is large (N = 90), then a correlation of .205 is significant. ALWAYS THINK ABOUT R²: how much variance in Y is X accounting for? With r = .205, R² = .042, so X accounts for only 4.2% of the variance in Y. This will lead to poor predictions. A 95% confidence interval will also show how poor the prediction is.
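A sketch of the underlying arithmetic. The t-test for r is standard; whether p falls below .05 here depends on one- vs. two-tailed testing, which the slide does not state:

```python
# Sketch: testing whether r = .205 with N = 90 differs from zero.
from scipy import stats

r, N = 0.205, 90
df = N - 2
t = r * (df ** 0.5) / (1 - r**2) ** 0.5          # t = r*sqrt(N-2)/sqrt(1-r^2)
p_two_tailed = 2 * stats.t.sf(abs(t), df)
print(f"t({df}) = {t:.3f}, two-tailed p = {p_two_tailed:.4f}")  # ~0.053
print(f"one-tailed p = {p_two_tailed/2:.4f}")                   # ~0.026
print(f"R^2 = {r**2:.3f}")   # only ~4.2% of the variance in Y
```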
A Venn diagram shows R², the amount of variance in Y that is explained by X. In the example, R² = .64 (64% of the variance in Y is explained by X), leaving unexplained variance 1 − R² = .36 (36%).
The vertical distance (up or down) from a data point to the line of best fit is a RESIDUAL.
The line has the form Y = mX + b; in this example, Y = .72X + 13.
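A minimal sketch of fitting Y = mX + b by least squares and computing the residuals; the data here are hypothetical, not the slide's:

```python
# Sketch: least-squares line of best fit and its residuals (hypothetical data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([14.1, 14.9, 15.2, 16.0, 16.5, 17.4])

m, b = np.polyfit(x, y, deg=1)   # slope and intercept minimizing squared deviations
y_hat = m * x + b                # predicted values on the line
residuals = y - y_hat            # vertical distance from each point to the line
print(f"Y = {m:.2f}X + {b:.2f}")
print("residuals:", np.round(residuals, 3))
print("sum of squared residuals:", np.sum(residuals**2).round(4))
```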
Calculation of Regression Coefficients (b, C)
If |r| < .7, prediction will be poor. Large SDs adversely affect the accuracy of the prediction.
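The slide's coefficient formulas are images; the standard least-squares expressions for the slope b and constant C, matching the (b, C) notation in the title above (an assumption about the slide's exact form), are:

```latex
b = r\,\frac{s_Y}{s_X}, \qquad C = \bar{Y} - b\,\bar{X}, \qquad \hat{Y} = bX + C
```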
Standard Deviation of Residuals
Standard Error of Estimate (SEE)
The SEE is the SD of the prediction errors (residuals) when predicting Y from X.
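The slide's formula is an image; a common textbook form, consistent with "SEE is the SD of the residuals," is:

```latex
SEE = s_Y\sqrt{1 - r^2}
\quad\text{(some texts use the corrected form } SEE = s_Y\sqrt{(1-r^2)\tfrac{N-1}{N-2}}\text{)}
```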
The SEE is used to compute confidence intervals for the prediction equation.
Example of a 95% confidence interval.
Both r and SDY are critical to the accuracy of prediction. If SDY is small and r is big, prediction errors will be small. If SDY is big and r is small, prediction errors will be large. In the example, we are 95% sure the mean falls between 45.1 and 67.3.
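A sketch of the arithmetic behind such an interval. The predicted value and SEE below are hypothetical, chosen only so the result matches the slide's 45.1 to 67.3:

```python
# Sketch: approximate 95% confidence interval for a prediction.
y_hat = 56.2   # predicted Y from the regression equation (assumed value)
see = 5.66     # standard error of estimate (assumed value)

lower = y_hat - 1.96 * see   # ~95% of a normal distribution lies within +/-1.96 SD
upper = y_hat + 1.96 * see
print(f"95% CI: {lower:.1f} to {upper:.1f}")   # -> 45.1 to 67.3
```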
Multiple Regression
Multiple regression is used to predict one Y (dependent) variable from two or more X (independent) variables. Its advantages over bivariate regression: it provides a lower standard error of estimate, and it determines which variables contribute to the prediction and which do not.
b1, b2, b3, … bn are coefficients that weight the independent variables according to their relative contribution to the prediction of Y. X1, X2, X3, … Xn are the predictors (independent variables). C is a constant, similar to the Y intercept, so the prediction equation is Y′ = b1X1 + b2X2 + … + bnXn + C. Example: Body Fat = b1(Abdominal) + b2(Triceps) + b3(Thigh) + C.
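A minimal sketch of fitting such an equation by least squares; the data are made up, and the column names only mirror the body-fat example above:

```python
# Sketch: multiple regression Y' = b1*X1 + b2*X2 + b3*X3 + C (hypothetical data).
import numpy as np

# Columns: abdominal, triceps, thigh skinfolds (made-up values); y = body fat %.
X = np.array([[30.0, 12.0, 20.0],
              [25.0, 10.0, 18.0],
              [40.0, 15.0, 24.0],
              [35.0, 14.0, 22.0],
              [28.0, 11.0, 19.0],
              [45.0, 18.0, 27.0]])
y = np.array([18.0, 14.5, 25.0, 22.0, 16.0, 28.5])

A = np.column_stack([X, np.ones(len(y))])        # column of 1s for the constant C
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares solution
b1, b2, b3, C = coeffs
print(f"BodyFat = {b1:.3f}*Abdominal + {b2:.3f}*Triceps + {b3:.3f}*Thigh + {C:.3f}")
```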
List the variables in the order they enter the equation
X2 has the biggest overlap with Y (area C), so it enters first. X1 enters next, because area (A) is bigger than area (E); both A and E are unique, not common to C. X3 enters next, uniquely adding area (E). X4 is not related to Y, so it is NOT in the equation.
Ideal Relationship Between Predictors and Y
Each variable accounts for unique variance in Y, with very little overlap among the predictors. Order to enter: X1, X3, X4, X2, X5.
Regression Methods
Enter: forces all predictors (independent variables) into the equation in one step.
Forward: each step adds a new predictor; predictors enter based upon the unique variance in Y they explain.
Backward: starts with the full equation (all predictors) and removes them one at a time on each step, beginning with the predictor that adds the least.
Stepwise: each step adds a new predictor; on any step a predictor already in the equation can be removed if it correlates highly (high partial correlation) with the newly added predictor.
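The Forward method can be approximated outside SPSS with scikit-learn's SequentialFeatureSelector. This is a sketch of the idea, not a reproduction of SPSS's entry criteria (SPSS uses F-to-enter probabilities; this selector uses a cross-validated score):

```python
# Sketch: forward selection of predictors (analogous in spirit to SPSS's Forward method).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import SequentialFeatureSelector

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                               # four candidate predictors
y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=100)    # only X1 and X3 matter

selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="forward")
selector.fit(X, y)
print("selected predictors:", selector.get_support())  # expect columns 0 and 2
```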
Regression Methods in SPSS
Choose the desired regression method.
Regression Assumptions
Homoscedasticity: equal variance of Y at any value of X. The residuals are normally distributed around the line of best fit. X and Y are linearly related.
Tests for Normality
Use SPSS Descriptives → Explore. [Three example data sets, Set 1, Set 2, and Set 3, are shown on the slide.]
[SPSS Explore output for the data sets shown across several slides.]
The significance value of the normality test is not less than 0.05, so the data are normal.
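The slide's decision rule (p not less than .05 → treat the data as normal) can be illustrated with SciPy's Shapiro-Wilk test; the data here are hypothetical:

```python
# Sketch: Shapiro-Wilk normality test (hypothetical data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(loc=150, scale=15, size=30)   # made-up, normally distributed sample

stat, p = stats.shapiro(data)
print(f"W = {stat:.3f}, p = {p:.3f}")
print("normal" if p >= 0.05 else "not normal")  # p >= .05: do not reject normality
```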
Tests for Normality: Normal Probability Plot or Q-Q Plot
If the data are normal, the points cluster around a straight line.
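A quick way to draw such a plot outside SPSS: a sketch using SciPy and Matplotlib with hypothetical data:

```python
# Sketch: normal probability (Q-Q) plot with SciPy and Matplotlib.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)
data = rng.normal(loc=100, scale=10, size=50)   # hypothetical sample

stats.probplot(data, dist="norm", plot=plt)     # points near the line -> roughly normal
plt.title("Normal Q-Q plot")
plt.show()
```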
Tests for Normality: Boxplots
The bar is the median; the box extends from the 25th to the 75th percentile; the whiskers extend to the largest and smallest values within 1.5 box lengths. Outliers are labeled with an O; extreme values are labeled with a star.
Cntry15.Sav Example of Regression Assumptions [SPSS data and setup screenshots shown.]
Cntry15.Sav – Regression Statistics Settings
Cntry15.Sav – Regression Plot Settings
Cntry15.Sav – Regression Save Settings
Cntry15.Sav Example of Regression Assumptions
[SPSS output: stem-and-leaf plot of the standardized residuals.]
The distribution of the standardized residuals is normal; two scores fall somewhat outside.
Boxplot: no outliers (labeled O) and no extreme scores (labeled with a star).
Residual plot: the points should fall randomly in a band around 0 if the distribution is normal; in this distribution there is one extreme score.
Q-Q plot: the data are normal.
Regression Violations
[Examples of regression-assumption violations shown on the final slides.]