Lecture 2- Alternate Correlation Procedures
EPSY 640 Texas A&M University
CORRELATION MEASURES FOR VARIOUS SCALES OF MEASUREMENT
Measures are listed by the scale of measurement of X crossed with the scale of measurement of Y:

X nominal level (dichotomous):
  with Y nominal (dichotomous): phi coefficient, Yule's Q, Goodman's Lambda, tetrachoric
  with Y nominal (polychotomous): Pearson Association, Tschuprow's T
  with Y ordinal: rank-biserial
  with Y interval or ratio: point-biserial, biserial, Nagelkerke R-square (logistic)

X nominal level (polychotomous):
  with Y nominal (polychotomous): Pearson Association, Tschuprow's T, Cramer's C
  with Y ordinal: reduce to dichotomous, or a Kruskal-Wallis based statistic
  with Y interval or ratio: R-square

X ordinal level:
  with Y ordinal: Spearman, Kendall's tau
  with Y interval or ratio: R-squared

X interval/ratio level:
  with Y interval or ratio: Pearson r
Dichotomous-Dichotomous Case- PHI COEFFICIENT
the phi coefficient can be written as

r_phi = (ad – bc) / [(a+c)(b+d)(a+b)(c+d)]^1/2

where a, b, c, d are the cell counts of the 2 x 2 table of GENDER (rows) by MINORITY STATUS (columns):

              MINORITY STATUS
GENDER           a     b
                 c     d
Dichotomous-Dichotomous Case- PHI COEFFICIENT
For the 2 x 2 table of Gender by Political affiliation:

                 Political affiliation    Row total
Gender               7        2               9
                     2       10              12
Column total         9       12              21

r_phi = (7x10 – 2x2) / [(7+2)(2+10)(7+2)(2+10)]^1/2
      = (70 – 4) / [9x12x9x12]^1/2
      = 66/108 = .611

Pearson r = .157/(.5071 x .5071) = .611
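The arithmetic is easy to verify in code. Below is a minimal Python sketch using the cell counts from the table above (the function name phi_coefficient is ours, not from the lecture):

```python
from math import sqrt

def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2x2 table with cells a, b (row 1) and c, d (row 2)."""
    return (a * d - b * c) / sqrt((a + c) * (b + d) * (a + b) * (c + d))

# Gender by political affiliation table from the slide
print(round(phi_coefficient(7, 2, 2, 10), 3))   # 0.611
```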
CHI-SQUARE:

χ² = Σ(i=1..I) Σ(j=1..J) n.j (n_ij/n – n_i./n)² / (n_i./n)

   = 9[7/21 – 9/21]²/(9/21) + 9[2/21 – 9/21]²/(9/21) + 12[2/21 – 12/21]²/(12/21) + 12[10/21 – 12/21]²/(12/21)
   = 4/21 + 49/21 + 100/21 + 4/21
   = 157/21 = 7.476

PEARSON ASSOCIATION:
P = {χ² / (χ² + n)}^1/2 = {7.476/28.476}^1/2 = .512

TSCHUPROW'S T:
T = {χ² / (n[(r–1)(c–1)]^1/2)}^1/2 = {7.476/(21 x [1 x 1]^1/2)}^1/2 = .597
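Taking the chi-square value above as given, the two association measures can be reproduced in a few lines of Python (a sketch; the variable names are ours):

```python
from math import sqrt

chi2, n = 7.476, 21      # chi-square and sample size from the slide
r, c = 2, 2              # rows and columns of the table

P = sqrt(chi2 / (chi2 + n))                     # Pearson contingency coefficient
T = sqrt(chi2 / (n * sqrt((r - 1) * (c - 1))))  # Tschuprow's T
print(round(P, 3), round(T, 3))                 # 0.512 0.597
```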
SPSS Crosstabs procedure
Select “Analyze / Descriptive Statistics / Crosstabs”
Select “Row” and “Column” variables for the two nominal variables
Under “Statistics”, select the options you want, such as “Chi Square” and the various “Nominal” association measures
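For readers working outside SPSS, a rough equivalent of the Crosstabs output can be obtained with pandas and scipy (a sketch with a small hypothetical data frame, not the lecture data):

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical data frame with two nominal variables
df = pd.DataFrame({"gender": ["F", "F", "M", "M", "F", "M"],
                   "party":  ["D", "R", "D", "R", "D", "D"]})

table = pd.crosstab(df["gender"], df["party"])    # the contingency table
chi2, p, dof, expected = chi2_contingency(table)  # Pearson chi-square test
print(table, chi2, p, sep="\n")
```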
TETRACHORIC ASSUMPTIONS- underlying normality of observed dichotomies
[Figure: a bivariate normal distribution cut by the two dichotomizing thresholds into four quadrants, labeled (0,0), (1,0), (0,1), and (1,1).]
n11 = 70    n12 = 20
n21 = 20    n22 = 100

ux = height of normal curve for proportion 90/210 = U(.4290). The z-score for .429 = –.18; U(.4290) (requires a stat table or the SPSS function Pdf.Norm) = .3637
uy = height of normal curve for proportion 90/210 = U(.4290) = .3637

rtet = [(70 x 100) – (20 x 20)] / [.3637 x .3637 x 210²]
     = 6600 / (.132 x 210²) ≈ 1.13, not a good estimate!

Table 3.4: Computation of tetrachoric correlation
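The approximation can be scripted directly. A minimal Python sketch, plugging in the cell counts and the table-lookup height u = .3637 used above:

```python
n11, n12, n21, n22 = 70, 20, 20, 100
n = n11 + n12 + n21 + n22           # 210

u_x = u_y = 0.3637                  # normal-curve height for p = 90/210, read from the table
r_tet = (n11 * n22 - n12 * n21) / (u_x * u_y * n ** 2)
print(round(r_tet, 2))              # about 1.13: above 1, so not a good estimate
```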
NORMAL DENSITY HEIGHTS u(y), y = 0.00 to 2.00

  y      u        y      u        y      u        y      u
 0.00  0.3989
 0.02  0.3989    0.52  0.3485    1.02  0.2371    1.52  0.1257
 0.04  0.3986    0.54  0.3448    1.04  0.2323    1.54  0.1219
 0.06  0.3982    0.56  0.3410    1.06  0.2275    1.56  0.1182
 0.08  0.3977    0.58  0.3372    1.08  0.2227    1.58  0.1145
 0.10  0.3970    0.60  0.3332    1.10  0.2179    1.60  0.1109
 0.12  0.3961    0.62  0.3292    1.12  0.2131    1.62  0.1074
 0.14  0.3951    0.64  0.3251    1.14  0.2083    1.64  0.1040
 0.16  0.3939    0.66  0.3209    1.16  0.2036    1.66  0.1006
 0.18  0.3925    0.68  0.3166    1.18  0.1989    1.68  0.0973
 0.20  0.3910    0.70  0.3123    1.20  0.1942    1.70  0.0940
 0.22  0.3894    0.72  0.3079    1.22  0.1895    1.72  0.0909
 0.24  0.3876    0.74  0.3034    1.24  0.1849    1.74  0.0878
 0.26  0.3857    0.76  0.2989    1.26  0.1804    1.76  0.0848
 0.28  0.3836    0.78  0.2943    1.28  0.1758    1.78  0.0818
 0.30  0.3814    0.80  0.2897    1.30  0.1714    1.80  0.0790
 0.32  0.3790    0.82  0.2850    1.32  0.1669    1.82  0.0761
 0.34  0.3765    0.84  0.2803    1.34  0.1626    1.84  0.0734
 0.36  0.3739    0.86  0.2756    1.36  0.1582    1.86  0.0707
 0.38  0.3712    0.88  0.2709    1.38  0.1539    1.88  0.0681
 0.40  0.3683    0.90  0.2661    1.40  0.1497    1.90  0.0656
 0.42  0.3653    0.92  0.2613    1.42  0.1456    1.92  0.0632
 0.44  0.3621    0.94  0.2565    1.44  0.1415    1.94  0.0608
 0.46  0.3589    0.96  0.2516    1.46  0.1374    1.96  0.0584
 0.48  0.3555    0.98  0.2468    1.48  0.1334    1.98  0.0562
 0.50  0.3521    1.00  0.2420    1.50  0.1295    2.00  0.0540
NORMAL DENSITY HEIGHTS u(y), continued, y = 2.02 to 3.98 (blank cells: the height is unchanged from the value above, to four decimal places)

  y      u        y      u        y      u        y      u
 2.02  0.0519    2.52  0.0167    3.02  0.0042    3.52  0.0008
 2.04  0.0498    2.54  0.0158    3.04  0.0039    3.54
 2.06  0.0478    2.56  0.0151    3.06  0.0037    3.56  0.0007
 2.08  0.0459    2.58  0.0143    3.08  0.0035    3.58
 2.10  0.0440    2.60  0.0136    3.10  0.0033    3.60  0.0006
 2.12  0.0422    2.62  0.0129    3.12  0.0031    3.62
 2.14  0.0404    2.64  0.0122    3.14  0.0029    3.64  0.0005
 2.16  0.0387    2.66  0.0116    3.16  0.0027    3.66
 2.18  0.0371    2.68  0.0110    3.18  0.0025    3.68
 2.20  0.0355    2.70  0.0104    3.20  0.0024    3.70  0.0004
 2.22  0.0339    2.72  0.0099    3.22  0.0022    3.72
 2.24  0.0325    2.74  0.0093    3.24  0.0021    3.74
 2.26  0.0310    2.76  0.0088    3.26  0.0020    3.76  0.0003
 2.28  0.0297    2.78  0.0084    3.28  0.0018    3.78
 2.30  0.0283    2.80  0.0079    3.30  0.0017    3.80
 2.32  0.0270    2.82  0.0075    3.32  0.0016    3.82
 2.34  0.0258    2.84  0.0071    3.34  0.0015    3.84
 2.36  0.0246    2.86  0.0067    3.36  0.0014    3.86  0.0002
 2.38  0.0235    2.88  0.0063    3.38  0.0013    3.88
 2.40  0.0224    2.90  0.0060    3.40  0.0012    3.90
 2.42  0.0213    2.92  0.0056    3.42             3.92
 2.44  0.0203    2.94  0.0053    3.44  0.0011    3.94
 2.46  0.0194    2.96  0.0050    3.46  0.0010    3.96
 2.48  0.0184    2.98  0.0047    3.48  0.0009    3.98  0.0001
 2.50  0.0175    3.00  0.0044    3.50
POINT-BISERIAL CORRELATION
Y = score on an interval measure (e.g., test score)
X = 0 or 1 (grouping; e.g., gender)

r_pb = [(Y1. – Y0.) / s_y] · [n1 n0 / (n(n – 1))]^1/2

where Y1. and Y0. are the means of the two groups.
Descriptive Statistics
Descriptive statistics (Mean, SD, N, Covariance) are reported for GENDER = 0 (boys), GENDER = 1 (girls), and Total (overall SD = 15.32, N = 15).

r_pb = [(Y1. – Y0.) / 15.32] · [(8 x 7) / (15 x 14)]^1/2 = .233

Table 3.5: Calculation of point-biserial correlation coefficient for First Grade reading comprehension of boys and girls
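Because the raw scores behind Table 3.5 are not reproduced here, the sketch below uses a small hypothetical data set simply to show the mechanics: it applies the formula from the previous slide and cross-checks it against scipy's built-in point-biserial routine.

```python
import numpy as np
from scipy.stats import pointbiserialr

# Hypothetical scores: x is the 0/1 grouping, y the interval-level measure
x = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
y = np.array([12, 15, 11, 14, 20, 22, 19, 24, 21])

n, n1, n0 = len(y), int(x.sum()), int((x == 0).sum())
s_y = y.std(ddof=1)                       # SD of all the scores
r_pb = (y[x == 1].mean() - y[x == 0].mean()) / s_y * np.sqrt(n1 * n0 / (n * (n - 1)))

print(round(r_pb, 3), round(pointbiserialr(x, y)[0], 3))   # the two values agree
```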
POINT BISERIAL CORRELATION
[Figure: interval-level scores Y plotted against the dichotomous grouping X (F and M), with each group's mean m marked.]
Dichotomous (Normal)-Interval Case
biserial correlation

r_bis = [(Y1. – Y0.) / s_y] · [n1 n0 / (u_x n²)],

where u_x = the height of the normal curve for the proportion n1/(n0 + n1)
r_bis = [(Y1. – Y0.) / s_y] · [n1 n0 / (u_x n²)]
      = [(Y1. – Y0.) / 15.32] · [(8 x 7) / (.3675 x 15²)]
      = .306
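A corresponding sketch for the biserial coefficient, again with hypothetical scores; here u_x is obtained from scipy as the normal-curve height at the z-score cutting off the proportion n1/n, rather than from the printed table.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical scores: x is the observed dichotomy, y the interval-level measure
x = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
y = np.array([12, 15, 11, 14, 20, 22, 19, 24, 21])

n, n1, n0 = len(y), int(x.sum()), int((x == 0).sum())
u_x = norm.pdf(norm.ppf(n1 / n))           # height of the normal curve at the cut point
s_y = y.std(ddof=1)

r_bis = (y[x == 1].mean() - y[x == 0].mean()) / s_y * (n1 * n0 / (u_x * n ** 2))
print(round(r_bis, 3))
```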
BISERIAL CORRELATION

[Figure: interval-level scores Y plotted against the dichotomy X (F and M groups), with each group's mean m marked.]
RANK-RANK DATA

1. DATA ARE INTERVAL OR RATIO, transformed to ranks because of an odd distribution
2. DATA ARE ORDINAL, NO INTERVAL INFORMATION AVAILABLE

USE SPEARMAN CORRELATION (the Pearson formula applied to the ranks; no ties assumed)
Rank distribution of real estate price per square foot in Manhattan
[Figure: ranks of price per square foot plotted against location, from the Battery to Central Park. The relative position of the ranks in the figure is only approximate, due to typeface limitations; all are ordered correctly.]

This results in paired ranks for location and for price.

Computation of the rank correlation coefficient:
r_rank = s_xy / (s_x s_y) = –.647
r_Spearman = (from SPSS, Version 13)

Table 3.7: Computation of rank correlation for Real Estate location in Manhattan with Price Per Square Foot
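Since the Manhattan ranks themselves are not reproduced above, here is a minimal sketch with hypothetical location and price values, showing that the Spearman coefficient is simply the Pearson formula applied to the ranks:

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Hypothetical raw values: location order along the island and price per square foot
location = np.array([1, 2, 3, 4, 5, 6, 7, 8])
price    = np.array([410, 380, 395, 300, 290, 310, 250, 260])

loc_rank, price_rank = rankdata(location), rankdata(price)
r_rank = np.corrcoef(loc_rank, price_rank)[0, 1]    # Pearson r computed on the ranks
print(round(r_rank, 3), round(spearmanr(location, price)[0], 3))   # identical when there are no ties
```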
Least squares estimation
The best estimate is the one for which the sum of squared differences between each score and the estimate is smallest among all possible linear unbiased estimates (BLUE, the best linear unbiased estimate).
Least squares estimation
The differences between the observed and predicted scores are called errors or disturbances. They represent in this case the part of the y score not predictable from x: e_i = y_i – b1·x_i – b0. The sum of squares for errors follows:

SS_e = Σ(i=1..n) e_i²
[Figure: scatterplot of y against x with the fitted regression line; the vertical deviations e of the points from the line are the errors, and SS_e = Σ e_i².]
Matrix representation of least squares estimation
We can represent the regression model in matrix form:

y = Xβ + e
Matrix representation of least squares estimation
y = Xβ + e

| y1 |   | 1  x1 |           | e1 |
| y2 | = | 1  x2 |  | β0 |  + | e2 |
| y3 |   | 1  x3 |  | β1 |    | e3 |
| y4 |   | 1  x4 |           | e4 |
Matrix representation of least squares estimation
y = Xb + e

The least squares criterion is satisfied by the following matrix equation:

b = (X′X)⁻¹X′y

The term X′ is called the transpose of the X matrix: it is the matrix turned on its side. When X′X is multiplied out, the result is a 2 x 2 matrix:

| n      Σx_i  |
| Σx_i   Σx_i² |

Note: all the information is here: sample size, mean (sum of scores), variance (squared scores).
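The normal-equation solution is easy to verify numerically. A minimal numpy sketch with hypothetical x and y (np.linalg.lstsq is used only as a cross-check):

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

X = np.column_stack([np.ones_like(x), x])        # design matrix: intercept column + x
b = np.linalg.inv(X.T @ X) @ X.T @ y             # b = (X'X)^{-1} X'y

print(X.T @ X)                                   # [[n, sum x], [sum x, sum x^2]]
print(b, np.linalg.lstsq(X, y, rcond=None)[0])   # same coefficients either way
```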
SUMS OF SQUARES computational equivalents
SS_e = (n – 2) s_e²
SS_reg = Σ(ŷ_i – ȳ)², where ŷ_i = b1·x_i + b0
SS_y = SS_reg + SS_e
SUMS OF SQUARES-Venn Diagram
[Venn diagram: the circles SSy and SSx overlap; the overlap is SSreg, and the part of SSy outside SSx is SSe.]
Fig. 8.3: Venn diagram for linear regression with one predictor and one outcome measure
SUMS OF SQUARES- ANOVA Table
SOURCE    df      Sum of Squares    Mean Square      F
x         1       SSreg             SSreg / 1        (SSreg / 1) / (SSe / (n–2))
e         n–2     SSe               SSe / (n–2)
Total     n–1     SSy               SSy / (n–1)

Table 8.1: Regression table for Sums of Squares
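The same partition can be computed directly from a fitted two-variable regression; a short sketch, continuing the hypothetical numpy data used earlier:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(y)

b1 = np.cov(x, y)[0, 1] / x.var(ddof=1)          # slope from covariance / variance
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SS_y   = ((y - y.mean()) ** 2).sum()
SS_reg = ((y_hat - y.mean()) ** 2).sum()
SS_e   = ((y - y_hat) ** 2).sum()                # SS_y = SS_reg + SS_e

F = (SS_reg / 1) / (SS_e / (n - 2))
print(SS_reg, SS_e, SS_y, F)
```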
Rupley and Willson (1997) studied the relationship between word recognition and reading comprehension for 200 six- and seven-year-olds, using a national sample of students that mirrored the U.S. census. The mean for Word Recognition was 100 (SD = 15); the mean for Reading Comprehension was 23.16. The regression analysis is reported in the table below:

Dep. Var.: Reading Comprehension

SOURCE              df    Sum of Squares    Mean Square    F    Prob.    R²
Word recognition
error
total

s_e = 6.71
Two variable linear regression: Which direction?
Regression equations:
  y = xb1·x + xb0   (regression of y on x)
  x = yb1·y + yb0   (regression of x on y)

Regression coefficients:
  xb1 = r_xy · s_y / s_x
  yb1 = r_xy · s_x / s_y
Two variable linear regression
y = b1·x + b0

If the correlation coefficient has been calculated, then b1 can be calculated from the equation above:

b1 = r_xy · s_y / s_x

The intercept, b0, follows by placing the means for x and y into the equation above and solving:

b0 = ȳ – [ r_xy · s_y / s_x ] · x̄
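In code the two steps are one line each (a sketch; the correlation, standard deviations, and means are hypothetical values assumed to be already computed):

```python
# Slope and intercept from the correlation, SDs, and means (hypothetical values)
r_xy, s_x, s_y = 0.65, 2.0, 3.0
x_bar, y_bar = 10.0, 25.0

b1 = r_xy * s_y / s_x            # b1 = r * sy / sx
b0 = y_bar - b1 * x_bar          # b0 = ybar - b1 * xbar
print(b1, b0)
```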
Two variable linear regression

yb1 = r_xy · s_x / s_y
xb1 = r_xy · s_y / s_x

Fig. 8.1: Slopes for two regression representations of Pearson correlation (y on x and x on y)
Three variable linear regression
y = b1·x1 + b2·x2 + b0

Two predictors: all variables may be correlated with each other. Exact equations exist to compute b1 and b2, but not for more than two predictors; beyond that, the matrix-form normal equations must be used. (A sketch of the two-predictor case follows below.)
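For the two-predictor case the exact equations can be written in terms of the pairwise correlations. A minimal sketch with hypothetical correlations and standard deviations, giving first the standardized and then the unstandardized coefficients:

```python
# Hypothetical correlations among y, x1, x2 and standard deviations
r_y1, r_y2, r_12 = 0.50, 0.40, 0.30
s_y, s_1, s_2 = 10.0, 2.0, 5.0

beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)   # standardized coefficient for x1
beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)   # standardized coefficient for x2

b1 = beta1 * s_y / s_1                           # unstandardized slopes
b2 = beta2 * s_y / s_2
print(beta1, beta2, b1, b2)
```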
Three variable linear regression
Path model representation: unstandardized

[Path diagram: x1 → y with weight b1, x2 → y with weight b2, a curved arrow linking x1 and x2 (their covariance, subscript 12), and the error e pointing to y.]
Three variable linear regression
Path model representation: standardized

[Path diagram: x1 → y with standardized weight β1, x2 → y with standardized weight β2, the correlation r12 between x1 and x2, and the error e pointing to y.]
SUMS OF SQUARES-Venn Diagram
[Venn diagram: SSy overlaps both SSx1 and SSx2; the combined overlap with SSy is SSreg, and the remainder of SSy is SSe.]
Fig. 8.3: Venn diagram for linear regression with two predictors and one outcome measure