Chapter Eighteen MEASURES OF ASSOCIATION
Parametric vs. Nonparametric Measures of Association
- Parametric measures of association are for continuous variables measured on an interval or ratio scale.
- Bivariate correlation (Pearson correlation) is a typical parametric measure of association.
- The correlation coefficient does not distinguish between independent and dependent variables.
- Nonparametric measures of association are for nominal or ordinal data.
Bivariate Correlation Analysis
- Pearson correlation coefficient: r symbolizes the coefficient's estimate of linear association based on sample data.
- Correlation coefficients reveal the magnitude and direction of relationships.
- The coefficient's sign (+ or -) signifies the direction of the relationship.
- Assumptions of r:
  - Linearity
  - Bivariate normal distribution
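To make the coefficient concrete, here is a minimal Python sketch of how r can be computed from paired sample data. The function name `pearson_r` and the small data set are illustrative assumptions, not taken from the chapter.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]   # hypothetical X values
y = [2, 4, 5, 4, 5]   # hypothetical Y values
r = pearson_r(x, y)

# The sign gives the direction, the absolute value the magnitude
print(round(r, 3))
```

Reversing the order of one variable flips the sign of r but leaves its magnitude unchanged, and swapping the roles of x and y gives the same value.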
Bivariate Correlation Analysis: Scatterplots
- Provide a means for visual inspection of data:
  - the direction of a relationship
  - the shape of a relationship
  - the magnitude of a relationship (with practice)
Interpretation of Coefficients
- A relationship does not imply causation:
  - Y could cause X.
  - X could cause Y.
  - X and Y could influence each other.
  - X and Y could be affected by a third variable.
- The statistical significance of r is tested with a t-value.
- Statistical significance does not imply that a relationship is practically meaningful.
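The gap between statistical and practical significance can be illustrated with the usual t statistic for r, t = r·sqrt((n - 2) / (1 - r²)) with n - 2 degrees of freedom. The sketch below uses two hypothetical cases; the function name `t_for_r` and the values of r and n are assumptions chosen for illustration. For reference, the two-tailed 5% critical value of t is about 3.18 with 3 degrees of freedom and roughly 1.96 with very many.

```python
import math

def t_for_r(r, n):
    """t statistic for testing H0: population correlation = 0 (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical case 1: a sizeable r from a very small sample (df = 3)
t_small = t_for_r(0.77, 5)

# Hypothetical case 2: a weak r from a large sample (df = 998)
t_large = t_for_r(0.10, 1000)
shared_variance = round(0.10 ** 2, 4)   # r squared: share of common variance

print(round(t_small, 2), round(t_large, 2), shared_variance)
```

Under these assumptions the strong correlation from five observations falls short of its critical value, while the weak correlation from a thousand observations exceeds its own, even though the two variables share only about 1% of their variance.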
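Interpretation of Coefficients
- Be careful about artifact correlations (correlations that appear when distinct subgroups are combined and give the impression of a single relationship).
- Coefficient of determination (r²): the amount of common variance in X and Y.
- An F-test is used for goodness of fit.
- A correlation matrix is used to display coefficients for more than two variables.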
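As a rough illustration of a correlation matrix and of r², the sketch below computes every pairwise Pearson coefficient for three hypothetical variables. The variable names X, Y, Z, the data, and the helper `pearson_r` are assumptions made for this example.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for three variables
data = {
    "X": [1, 2, 3, 4, 5],
    "Y": [2, 4, 5, 4, 5],
    "Z": [5, 4, 3, 2, 1],
}
names = list(data)

# Correlation matrix: entry [i][j] is r between variable i and variable j
matrix = [[pearson_r(data[a], data[b]) for b in names] for a in names]

# Coefficient of determination for X and Y: their share of common variance
r_squared_xy = matrix[0][1] ** 2

for name, row in zip(names, matrix):
    print(name, [round(v, 2) for v in row])
```

The matrix is symmetric with 1s on the diagonal, so only the entries below (or above) the diagonal need to be reported.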
Bivariate Linear Regression
- Establishes a linear relationship between an independent variable (X) and a dependent variable (Y).
- Uses the observed value of X to estimate or predict the corresponding Y value.
- Regression coefficients:
  - Slope: β1 = ΔY / ΔX
  - Intercept: β0 = Ȳ - β1 X̄
Bivariate Linear Regression
- Error term: the deviation of the ith observation from the regression line β0 + β1 Xi, i.e., εi = Yi - β0 - β1 Xi.
- Method of least squares:
  - The regression line is the line of best fit for the data.
  - The best fit is found by minimizing Σ εi² (the total squared error of estimate).
  - Technically, calculus (differentiation) is used to solve for β1 and β0.
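Setting the derivatives of Σ εi² with respect to β0 and β1 to zero gives the familiar closed-form solution: the slope is Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)² and the intercept is Ȳ minus the slope times X̄. A minimal Python sketch of that solution follows; the function names and the data are hypothetical.

```python
def least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared errors for y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1

def sum_squared_errors(x, y, b0, b1):
    """Total squared error of estimate for a candidate line."""
    return sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
best = sum_squared_errors(x, y, b0, b1)

# Nudging either coefficient away from the solution increases the squared error
worse_slope = sum_squared_errors(x, y, b0, b1 + 0.1)
worse_intercept = sum_squared_errors(x, y, b0 + 0.1, b1)

print(round(b0, 2), round(b1, 2), round(best, 2))
```

Comparing `best` with the two perturbed values is a quick sanity check that the fitted line is indeed the minimizer for this data set.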
Interpreting Linear Regression: Goodness of Fit
- t-test for individual coefficients. A zero slope (β1 = 0) can mean any of the following:
  - Y is completely unrelated to X and no systematic pattern is evident.
  - Y takes a constant value for every value of X.
  - The data are related, but by a nonlinear function.
- F-test for the model: the F value is related to r² (the coefficient of determination).
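In the bivariate case the link between F and r² can be written as F = r²(n - 2) / (1 - r²), and F equals the square of the t statistic for the slope. The sketch below checks both identities numerically on a small hypothetical data set; all variable names are assumptions made for illustration.

```python
import math

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((a - mean_x) ** 2 for a in x)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

sst = sum((b - mean_y) ** 2 for b in y)                   # total variation in Y
sse = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))   # unexplained variation
ssr = sst - sse                                           # explained variation

r_squared = ssr / sst
mse = sse / (n - 2)
f_value = (ssr / 1) / mse              # F with 1 and n - 2 degrees of freedom
t_slope = b1 / math.sqrt(mse / sxx)    # t for H0: slope = 0

f_from_r2 = r_squared * (n - 2) / (1 - r_squared)

print(round(r_squared, 2), round(f_value, 2), round(t_slope ** 2, 2), round(f_from_r2, 2))
```

With a single predictor, the t-test on the slope and the F-test on the model therefore lead to the same conclusion.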
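Interpreting Linear Regression: Residuals
- Residuals are what remains after the line is fitted; they are the estimated error terms.
- Standardized residuals are comparable to Z scores, with a mean of 0 and a standard deviation of 1.
- Confidence band vs. prediction band: the confidence band bounds the mean of Y at a given X, while the wider prediction band bounds an individual Y value.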
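The following sketch computes residuals, a simplified form of standardized residuals, and the standard errors that underlie the two bands. It assumes the simplest standardization (each residual divided by the standard error of estimate, with no leverage adjustment); statistical packages may use a slightly different definition. The data and names are hypothetical.

```python
import math

x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((a - mean_x) ** 2 for a in x)
b1 = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
b0 = mean_y - b1 * mean_x

# Residuals: observed Y minus the value predicted by the fitted line
residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))   # standard error of estimate

# Simplified standardization (assumption): residual divided by s
standardized = [e / s for e in residuals]

def band_standard_errors(x0):
    """Standard errors behind the confidence and prediction bands at x0."""
    se_mean = s * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)
    se_individual = s * math.sqrt(1 + 1 / n + (x0 - mean_x) ** 2 / sxx)
    return se_mean, se_individual

se_mean_at_3, se_ind_at_3 = band_standard_errors(3)   # at the mean of X
se_mean_at_5, se_ind_at_5 = band_standard_errors(5)   # at the edge of the data

print([round(z, 2) for z in standardized])
```

Both bands are narrowest at the mean of X and widen toward the extremes, and the prediction band is always the wider of the two because it adds the variability of an individual observation.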
Measures for Nominal Data
- When there is no relationship at all, the coefficient is 0.
- When there is complete dependency, the coefficient is 1 (unity).
- The measures listed next are used for nominal data.
Measures for Nominal Data
- Chi-square-based measures:
  - Phi
  - Cramér's V
  - Contingency coefficient C
- Proportional reduction in error (PRE):
  - Lambda
  - Tau (Goodman and Kruskal's)
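As a sketch of how these measures relate to one another, the code below computes the chi-square statistic for a contingency table and derives phi, Cramér's V, the contingency coefficient C, and lambda from it. It uses the standard textbook formulas; the function name, the dictionary keys, and the 2 x 2 table are hypothetical.

```python
import math

def nominal_measures(table):
    """Chi-square-based and PRE measures for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-square: sum of (observed - expected)^2 / expected
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(table), len(table[0]))   # smaller of number of rows and columns
    phi = math.sqrt(chi2 / n)
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    contingency_c = math.sqrt(chi2 / (chi2 + n))

    # Lambda (columns predicted from rows): proportional reduction in errors
    errors_without_rows = n - max(col_totals)
    errors_with_rows = sum(rt - max(row) for rt, row in zip(row_totals, table))
    lam = (errors_without_rows - errors_with_rows) / errors_without_rows

    return {"chi2": chi2, "phi": phi, "V": cramers_v, "C": contingency_c, "lambda": lam}

# Hypothetical 2 x 2 table: rows = group, columns = preference
table = [[30, 10],
         [10, 30]]
result = nominal_measures(table)

print({name: round(value, 3) for name, value in result.items()})
```

For a 2 x 2 table phi and Cramér's V coincide. Note that C has an upper limit below 1 that depends on the table size, so it is best compared only across tables of the same dimensions.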
Characteristics of Ordinal Data
- Concordant pair: the subject who ranks higher on one variable also ranks higher on the other variable.
- Discordant pair: the subject who ranks higher on one variable ranks lower on the other variable.
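Concordance and discordance are properties of pairs of subjects, which a short sketch can make explicit. The function name `count_pairs` and the two rankings are hypothetical; pairs tied on either variable are neither concordant nor discordant and are skipped here.

```python
from itertools import combinations

def count_pairs(x, y):
    """Count concordant and discordant pairs of subjects; tied pairs are skipped."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:      # both variables order the two subjects the same way
            concordant += 1
        elif direction < 0:    # the variables order the two subjects in opposite ways
            discordant += 1
    return concordant, discordant

rank_x = [1, 2, 3, 4]   # hypothetical ranks of four subjects on variable X
rank_y = [1, 3, 2, 4]   # hypothetical ranks of the same subjects on variable Y
c, d = count_pairs(rank_x, rank_y)

print(c, d)
```

Here only the second and third subjects are ordered differently by the two variables, so that pair is the single discordant one among the six possible pairs.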
Measures for Ordinal Data
- No assumption of a bivariate normal distribution.
- Most are based on concordant/discordant pairs.
- Values range from -1.0 to +1.0.
Measures for Ordinal Data
- The following measures are used:
  - Gamma
  - Somers' d
  - Spearman's rho
  - Kendall's tau-b
  - Kendall's tau-c
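Three of these measures differ only in how they treat tied pairs, which the sketch below tries to show. It assumes the common textbook definitions: gamma ignores all ties, Somers' d (with Y as the dependent variable) adds pairs tied only on Y to the denominator, and Kendall's tau-b adjusts for ties on both variables. The function name and the data are hypothetical.

```python
import math
from itertools import combinations

def ordinal_measures(x, y):
    """Gamma, Somers' d (Y dependent), and Kendall's tau-b from pair counts."""
    c = d = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx * dy > 0:
            c += 1                 # concordant pair
        elif dx * dy < 0:
            d += 1                 # discordant pair
        elif dx == 0 and dy != 0:
            tied_x_only += 1       # tied on X but not on Y
        elif dy == 0 and dx != 0:
            tied_y_only += 1       # tied on Y but not on X
    gamma = (c - d) / (c + d)
    somers_d = (c - d) / (c + d + tied_y_only)
    tau_b = (c - d) / math.sqrt((c + d + tied_x_only) * (c + d + tied_y_only))
    return gamma, somers_d, tau_b

x = [1, 1, 2, 2, 3]   # hypothetical ordinal scores with ties
y = [1, 2, 2, 3, 3]
gamma, somers_d, tau_b = ordinal_measures(x, y)

print(round(gamma, 2), round(somers_d, 2), round(tau_b, 2))
```

Because gamma discards tied pairs entirely, it tends to be the largest of the three in absolute value when the data contain many ties.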