Copyright © 2010 Pearson Education, Inc.

Chapter Seventeen
Correlation and Regression
Chapter Outline
1) Overview
2) Product-Moment Correlation
3) Regression Analysis
4) Bivariate Regression
5) Multiple Regression
6) Multicollinearity
When variances are similar, t-tests are appropriate.
When variances are not similar, t-tests could be misleading. Correlation and regression analysis can help.
Product Moment Correlation
The product moment correlation, r, summarizes the strength of association between two metric (interval- or ratio-scaled) variables, say X and Y. For example, you can compute a correlation coefficient for Likert-scale items, but not for dichotomous items. It is an index used to determine whether a linear (straight-line) relationship exists between X and Y. Because it was originally proposed by Karl Pearson, it is also known as the Pearson correlation coefficient. It is also referred to as simple correlation, bivariate correlation, or simply the correlation coefficient.
Linear relationships
Product Moment Correlation
From a sample of n observations on X and Y, the product moment correlation, r, can be calculated as:

r = Σ (Xi − X̄)(Yi − Ȳ) / √[ Σ (Xi − X̄)² · Σ (Yi − Ȳ)² ]   (sums over i = 1 … n)

where X̄ = average of all x's and Ȳ = average of all y's. Don't worry, we can do this in SPSS…
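The formula can also be sketched in a few lines of Python (a minimal illustration, independent of SPSS; the function name is mine):

```python
import math

def pearson_r(xs, ys):
    """Product moment correlation between two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n          # average of all x's
    y_bar = sum(ys) / n          # average of all y's
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = math.sqrt(sum((x - x_bar) ** 2 for x in xs) *
                            sum((y - y_bar) ** 2 for y in ys))
    return numerator / denominator
```

For instance, a perfectly increasing linear relationship gives r = 1.0, and a perfectly decreasing one gives r = −1.0.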
Product Moment Correlation
r varies between −1.0 and +1.0. The correlation coefficient between two variables will be the same regardless of their underlying units of measurement. For example, correlating a 5-point scale with a 7-point scale is okay.
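That invariance is easy to demonstrate: applying any positive linear rescaling to one variable (as when converting between scale ranges) leaves r unchanged. A quick sketch with made-up responses:

```python
import math

def pearson_r(xs, ys):
    """Product moment correlation between two equal-length sequences."""
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    num = sum((x - xb) * (y - yb) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - xb) ** 2 for x in xs) *
                    sum((y - yb) ** 2 for y in ys))
    return num / den

# Hypothetical responses: a 5-point item and a second item,
# then the second item pushed through a positive linear rescaling.
five_point = [1, 2, 2, 4, 5, 3]
other_item = [2, 3, 1, 6, 7, 4]
rescaled = [1.5 * v - 0.5 for v in other_item]  # units change, r does not

r_original = pearson_r(five_point, other_item)
r_rescaled = pearson_r(five_point, rescaled)    # identical value
```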
Explaining Attitude Toward the City of Residence
Example: Table 17.1
Product Moment Correlation
The correlation coefficient may be calculated as follows, with X = duration of residence and Y = attitude toward the city.

STEP 1: GET THE AVERAGES OF X AND Y
X̄ = (10 + 12 + 12 + 4 + 12 + 6 + 8 + 2 + 18 + 9 + 17 + 2)/12 = 9.33
Ȳ = (6 + 9 + 8 + 3 + 10 + 4 + 5 + 2 + 11 + 9 + 10 + 2)/12 = 6.58

STEP 2: GET THE NUMERATOR
For each respondent, subtract the average of x from their x, subtract the average of y from their y, then multiply, then sum all values:
Σ (Xi − X̄)(Yi − Ȳ) = (10−9.33)(6−6.58) + (12−9.33)(9−6.58) + (12−9.33)(8−6.58) + (4−9.33)(3−6.58) + (12−9.33)(10−6.58) + (6−9.33)(4−6.58) + (8−9.33)(5−6.58) + (2−9.33)(2−6.58) + (18−9.33)(11−6.58) + (9−9.33)(9−6.58) + (17−9.33)(10−6.58) + (2−9.33)(2−6.58)
= 179.67
Product Moment Correlation
STEP 3: GET THE DENOMINATOR
Σ (Xi − X̄)² = (10−9.33)² + (12−9.33)² + (12−9.33)² + (4−9.33)² + (12−9.33)² + (6−9.33)² + (8−9.33)² + (2−9.33)² + (18−9.33)² + (9−9.33)² + (17−9.33)² + (2−9.33)² = 304.67
Σ (Yi − Ȳ)² = (6−6.58)² + (9−6.58)² + (8−6.58)² + (3−6.58)² + (10−6.58)² + (4−6.58)² + (5−6.58)² + (2−6.58)² + (11−6.58)² + (9−6.58)² + (10−6.58)² + (2−6.58)² = 120.92

STEP 4: COMPLETE THE FORMULA
r = 179.67 / √(304.67 × 120.92) = 179.67 / 191.94 = 0.9361
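The hand calculation can be checked in Python using the twelve duration/attitude pairs from Table 17.1:

```python
import math

# Table 17.1: duration of residence (X) and attitude toward the city (Y)
duration = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
attitude = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]

n = len(duration)
x_bar = sum(duration) / n                     # step 1: averages (9.33, 6.58)
y_bar = sum(attitude) / n
numerator = sum((x - x_bar) * (y - y_bar)     # step 2: cross-product sum
                for x, y in zip(duration, attitude))
denominator = math.sqrt(                      # step 3: spread of X times spread of Y
    sum((x - x_bar) ** 2 for x in duration) *
    sum((y - y_bar) ** 2 for y in attitude))
r = numerator / denominator                   # step 4: r ≈ 0.9361
```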
Interpretation of the Correlation Coefficient
The correlation coefficient ranges from −1 to 1. A value of 1 implies that a linear equation describes the relationship between X and Y perfectly, with all data points lying on a line for which Y increases as X increases. A value of −1 implies that all data points lie on a line for which Y decreases as X increases. A value of 0 implies that there is no linear correlation between the variables.
Positive and Negative Correlation
Interpretation of the Correlation Coefficient
As a rule of thumb, correlation values can be interpreted in the following manner:

Correlation   Negative          Positive
None          −0.09 to 0.0      0.0 to 0.09
Small         −0.3 to −0.09     0.09 to 0.3
Medium        −0.5 to −0.3      0.3 to 0.5
Strong        −1.0 to −0.5      0.5 to 1.0
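The rule of thumb above can be captured in a small helper function (a sketch; the function name and the exact handling of boundary values are my own):

```python
def describe_correlation(r):
    """Classify a correlation coefficient using the rule-of-thumb table."""
    size = abs(r)                # sign does not affect strength
    if size > 1.0:
        raise ValueError("r must lie between -1.0 and +1.0")
    if size <= 0.09:
        return "none"
    if size <= 0.3:
        return "small"
    if size <= 0.5:
        return "medium"
    return "strong"
```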
SPSS Windows: Correlations
1. Select ANALYZE from the SPSS menu bar.
2. Click CORRELATE and then BIVARIATE.
3. Move "variable x" into the VARIABLES box. Then move "variable y" into the VARIABLES box.
4. Check PEARSON under CORRELATION COEFFICIENTS.
5. Check ONE-TAILED under TEST OF SIGNIFICANCE.
6. Check FLAG SIGNIFICANT CORRELATIONS.
7. Click OK.
SPSS Example: Correlation
[SPSS output: Correlations table for Age, InternetUsage, and InternetShopping, showing the Pearson correlation, Sig. (1-tailed), and N = 20 for each pair]
Regression Analysis
Regression analysis examines associative relationships between a metric dependent variable and one or more independent variables in the following ways:
- Determine whether the independent variables explain a significant variation in the dependent variable: whether a relationship exists.
- Determine how much of the variation in the dependent variable can be explained by the independent variables: strength of the relationship.
- Determine the structure or form of the relationship: the mathematical equation relating the independent and dependent variables.
For example, does a change in age predict a change in Internet usage? Regression can answer this.
Statistics Associated with Bivariate Regression Analysis
Bivariate regression model. The basic regression equation is Yi = β0 + β1Xi + ei, where Y = dependent or criterion variable, X = independent or predictor variable, β0 = intercept of the line, β1 = slope of the line, and ei is the error term associated with the ith observation.
Coefficient of determination. The strength of association is measured by the coefficient of determination, r². It varies between 0 and 1 and signifies the proportion of the total variation in Y that is accounted for by the variation in X. Note: this is the correlation coefficient squared. Above 0.5 is good.
Estimated or predicted value. The estimated or predicted value of Yi is Ŷi = a + bx, where Ŷi is the predicted value of Yi, and a and b are estimators of β0 and β1, respectively.
Statistics Associated with Bivariate Regression Analysis
Regression coefficient. The estimated parameter b is usually referred to as the non-standardized regression coefficient.
Scattergram. A scatter diagram, or scattergram, is a plot of the values of two variables for all the cases or observations.
Standard error of estimate. This statistic, SEE, is the standard deviation of the actual Y values from the predicted Ŷ values.
Standard error. The standard deviation of b, SEb, is called the standard error.
Conducting Bivariate Regression Analysis
1. Plot the scatter diagram
2. Formulate the general model
3. Estimate standardized regression coefficients (b)
4. Test for significance (p-value)
5. Determine the strength of association (r-square)
6. Check prediction accuracy
Conducting Bivariate Regression Analysis
The Bivariate Regression Model
In the bivariate regression model, the general form of a straight line is:
Y = β0 + β1X
where
Y = dependent variable
X = independent (predictor) variable
β0 = intercept of the line
β1 = slope of the line
The regression procedure adds an error term:
Yi = β0 + β1Xi + ei
where ei is the error term associated with the ith observation.
Plot of Attitude with Duration
[Scatter plot: actual responses, Attitude Toward City vs. Duration of Residence]
Is there a pattern? And which line is most accurate?
Plot of Attitude with Duration
In order to determine the correct line, we use the least-squares procedure (or OLS regression). Essentially, this finds the line that minimizes the distance from the line to all the points. Least squares minimizes the sum of the squared vertical distances of all the points from the line. Once we find the line, a formula can be derived:
Attitude = 1.0793 + 0.5897 (duration of residence)
This means that attitude toward the city can be predicted by duration of residence.
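The least-squares line can be computed directly from the Table 17.1 data: the slope is b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and the intercept follows from the means, a = ȳ − b·x̄. A sketch:

```python
# Table 17.1 data: duration of residence (X) and attitude toward the city (Y)
duration = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
attitude = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]

n = len(duration)
x_bar = sum(duration) / n
y_bar = sum(attitude) / n

# Slope: cross-product sum over the spread of X; intercept from the means.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(duration, attitude)) / \
    sum((x - x_bar) ** 2 for x in duration)
a = y_bar - b * x_bar
# a ≈ 1.0793, b ≈ 0.5897, so Attitude ≈ 1.0793 + 0.5897 * duration

predicted_attitude_10yr = a + b * 10   # prediction for a 10-year resident
```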
SPSS Windows: Bivariate Regression
1. Select ANALYZE from the SPSS menu bar.
2. Click REGRESSION and then LINEAR.
3. Move "Variable y" into the DEPENDENT box.
4. Move "Variable x" into the INDEPENDENT(S) box.
5. Select ENTER in the METHOD box.
6. Click on STATISTICS and check ESTIMATES under REGRESSION COEFFICIENTS.
7. Check MODEL FIT.
8. Click CONTINUE.
9. Click OK.
SPSS Example: Bivariate Regression
[SPSS output: Model Summary (R, R Square, Adjusted R Square, Std. Error of the Estimate) and Coefficients table with unstandardized coefficients (B, Std. Error), standardized coefficients (Beta), t, and Sig. for (Constant) and InternetUsage]
Multiple Regression
The general form of the multiple regression model is as follows:
Y = β0 + β1X1 + β2X2 + β3X3 + … + βkXk + e
We will want to run multiple regression if we believe that multiple IVs will predict one DV. Perhaps this is a more appropriate formula:
Attitude = 0.33732 + 0.48108 (Duration of residence) + 0.28865 (Importance of weather)
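With two predictors, the least-squares coefficients can be found by solving the 2×2 normal equations in deviation form. A sketch using the Table 17.1 data (X1 = duration of residence, X2 = importance of weather):

```python
# Table 17.1 data
x1 = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]   # duration of residence
x2 = [3, 11, 4, 1, 11, 1, 7, 4, 8, 10, 8, 5]      # importance of weather
y  = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]      # attitude toward the city

n = len(y)
m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n

# Sums of squares and cross-products about the means.
s11 = sum((a - m1) ** 2 for a in x1)
s22 = sum((a - m2) ** 2 for a in x2)
s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))

# Cramer's rule on: s11*b1 + s12*b2 = s1y  and  s12*b1 + s22*b2 = s2y
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det
b2 = (s2y * s11 - s1y * s12) / det
b0 = my - b1 * m1 - b2 * m2
# b0 ≈ 0.3373, b1 ≈ 0.4811, b2 ≈ 0.2887
```

Solving the normal equations by hand only stays this simple for two predictors; with more IVs, a linear-algebra routine (or SPSS) does the same work.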
SPSS Windows: Multiple Regression
1. Select ANALYZE from the SPSS menu bar.
2. Click REGRESSION and then LINEAR.
3. Move "Variable y" into the DEPENDENT box.
4. Move "Variable x1, x2, x3 …" into the INDEPENDENT(S) box.
5. Select ENTER in the METHOD box.
6. Click on STATISTICS and check ESTIMATES under REGRESSION COEFFICIENTS.
7. Check MODEL FIT.
8. Click CONTINUE.
9. Click OK.
SPSS Example: Multiple Regression
[SPSS output: Model Summary (R, R Square, Adjusted R Square, Std. Error of the Estimate) and Coefficients table with unstandardized coefficients (B, Std. Error), standardized coefficients (Beta), t, and Sig. for (Constant), InternetUsage, and Age]
Multicollinearity
Multicollinearity arises when correlations among the predictors are very high. Multicollinearity can result in several problems, including:
- The regression coefficients may not be estimated precisely.
- The magnitudes, as well as the signs, of the partial regression coefficients may change.
- It becomes difficult to assess the relative importance of the independent variables in explaining the variation in the dependent variable.
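One common diagnostic, not shown in the slides, is the variance inflation factor (VIF); with only two predictors it reduces to 1/(1 − r²) of their pairwise correlation, and values well above the common rule-of-thumb threshold of 10 signal trouble. A sketch with hypothetical, nearly collinear predictors:

```python
import math

def pearson_r(xs, ys):
    """Product moment correlation between two equal-length sequences."""
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    num = sum((x - xb) * (y - yb) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - xb) ** 2 for x in xs) *
                    sum((y - yb) ** 2 for y in ys))
    return num / den

def vif_two_predictors(x1, x2):
    """Variance inflation factor for the two-predictor case: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical data: the second predictor is almost a copy of age,
# so the VIF is far above 10 and multicollinearity is flagged.
age   = [21, 25, 30, 35, 41, 46, 50, 55]
proxy = [22, 25, 31, 34, 42, 45, 51, 56]
```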
Multicollinearity
In our example, age was a significant predictor on its own, yet it contributes little in the multiple regression. That, plus the high correlations among the predictors, indicates that multicollinearity exists. Since age has a very low coefficient value, we can feel safe just dropping it from our final model.
[SPSS output: Coefficients table with unstandardized coefficients (B, Std. Error), standardized coefficients (Beta), t, and Sig. for (Constant) and Age]
Thank you! Questions??