
1 Correlation A bit about Pearson’s r

2 Questions
What does it mean when a correlation is positive? Negative?
What is the purpose of the Fisher r to z transformation?
What is range restriction? Range enhancement? What do they do to r?
Give an example in which data properly analyzed by ANOVA cannot be used to infer causality.
Why do we care about the sampling distribution of the correlation coefficient?
What is the effect of reliability on r?

3 Basic Ideas
Nominal vs. continuous IV
Degree (direction) & closeness (magnitude) of linear relations
Sign (+ or -) for direction; absolute value for magnitude
Pearson product-moment correlation coefficient

4 Illustrations: positive, negative, and zero correlations

5 Always Plot Your Data!

6 Simple Formulas
Use either N throughout or N-1 throughout (in both the SDs and the denominator); the result is the same as long as you are consistent.
Pearson's r is the average cross product of z scores: r = sum(zX * zY) / N.
"Product-moment": the product of standardized moments (deviations from the means).
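For concreteness, a minimal base-R sketch of this computation (the height/weight numbers here are made up for illustration):

x <- c(61, 62, 63, 65, 65, 68, 69, 70, 72, 75)            # hypothetical heights
y <- c(105, 120, 120, 160, 120, 145, 175, 160, 185, 210)  # hypothetical weights
zx <- (x - mean(x)) / sd(x)     # sd() uses N-1, so divide by N-1 below as well
zy <- (y - mean(y)) / sd(y)
sum(zx * zy) / (length(x) - 1)  # average cross product of z scores
cor(x, y)                       # identical: built-in Pearson r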

7 Graphic Representation
1. Conversion from raw scores to z scores.
2. Points & quadrants: positive & negative cross products.
3. Correlation is the average of the cross products; the sign & magnitude of r depend on where the points fall.
4. The products are at their maximum (average = 1) when the points fall on the line zX = zY.

8 r = 1.0
[Descriptive statistics table for Ht and Wt: N, Minimum, Maximum, Mean, Std. Deviation; Valid N (listwise) = 10. Plotted data fall on a straight line: r = 1.0.]

9 Starting from r = 1: leave X alone and add error to Y; r drops to .99.

10 From r = .99: add more error to Y; r drops to .91.

11 With 2 variables, the correlation equals the slope of the regression line when both variables are in z-score form.
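A quick base-R check of this claim, using simulated data:

set.seed(1)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)
cor(x, y)                         # Pearson r
coef(lm(scale(y) ~ scale(x)))[2]  # z-score slope: the same number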

12 Review What does it mean when a correlation is positive? Negative?

13 Sampling Distribution of r
The statistic is r; the parameter is ρ (rho). In general, r is slightly biased (its expected value is slightly closer to zero than ρ). The sampling variance is approximately:
var(r) ≈ (1 - ρ^2)^2 / (N - 1)
Sampling variance depends both on N and on ρ.
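A simulation sketch that checks the approximation (the choices ρ = .5 and N = 50 are arbitrary):

set.seed(42)
rho <- .5; N <- 50
rs <- replicate(10000, {
  x <- rnorm(N)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(N)  # population correlation = rho
  cor(x, y)
})
var(rs)                    # empirical sampling variance of r
(1 - rho^2)^2 / (N - 1)    # the approximation: about .0115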


15 Fisher’s r to z Transformation
r:  .10  .20  .30  .40  .50  .60  .70  .80  .90
z:  .10  .20  .31  .42  .55  .69  .87  1.10  1.47
The sampling distribution of z approaches normal as N increases; the transformation stretches the short tail of r's skewed distribution into a better (more nearly normal) shape.
The sampling variance of z, 1/(N - 3), does not depend on ρ.
The r-to-z function is the inverse hyperbolic tangent (atanh).
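In base R the transformation is atanh() and its inverse is tanh(); this reproduces the table above:

r <- seq(.10, .90, by = .10)
round(atanh(r), 2)     # .10 .20 .31 .42 .55 .69 .87 1.10 1.47
tanh(atanh(.54))       # back-transform recovers .54
1 / sqrt(200 - 3)      # SE of z for N = 200: depends only on N, not on rho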

16 Hypothesis test 1: t test of H0: ρ = 0.
t = r * sqrt(N - 2) / sqrt(1 - r^2), compared to t with (N - 2) df for significance.
Say r = .25, N = 100: t = .25 * sqrt(98) / sqrt(1 - .0625) = 2.56.
t(.05, 98) = 1.98, and 2.56 > 1.98, so p < .05.
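The same test in base R (cor.test() computes this automatically from raw data; here we work from r alone):

r <- .25; N <- 100
t_stat <- r * sqrt(N - 2) / sqrt(1 - r^2)  # 2.56
qt(.975, df = N - 2)                       # critical t(.05, 98) = 1.98
2 * pt(-abs(t_stat), df = N - 2)           # two-tailed p: about .01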

17 Hypothesis test 2: one-sample z test, where r is the sample value and ρ is the hypothesized population value:
z = (z_r - z_ρ) / sqrt(1/(N - 3))
Say N = 200, r = .54, and ρ = .30: z = (.604 - .310) / sqrt(1/197) = 4.13.
Compare to the unit normal: 4.13 > 1.96, so it is significant. Our sample was not drawn from a population in which ρ is .30.
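The same arithmetic in base R:

r <- .54; rho0 <- .30; N <- 200
(atanh(r) - atanh(rho0)) * sqrt(N - 3)   # z = 4.13
qnorm(.975)                              # two-tailed critical value: 1.96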

18 Hypothesis test 3: testing equality of correlations from 2 INDEPENDENT samples:
z = (z_1 - z_2) / sqrt(1/(N1 - 3) + 1/(N2 - 3))
Say N1 = 150, r1 = .63, N2 = 175, r2 = .70: z = (.741 - .867) / sqrt(1/147 + 1/172) ≈ -1.12, n.s.
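In base R:

r1 <- .63; n1 <- 150; r2 <- .70; n2 <- 175
(atanh(r1) - atanh(r2)) / sqrt(1/(n1 - 3) + 1/(n2 - 3))  # about -1.12, n.s.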

19 Hypothesis test 4: testing equality of any number (k) of independent correlations, where zbar = sum((n - 3) * z) / sum(n - 3) and Q = sum((n - 3) * (z - zbar)^2). Compare Q to chi-square with k - 1 df.

Study   r    n    z     (n-3)z   (z-zbar)^2   (n-3)(z-zbar)^2
1       .2   200  .20    39.94    .0441         8.69
2       .5   150  .55    80.75    .0196         2.88
3       .6    75  .69    49.91    .0784         5.64
sum          425         170.60                17.21 = Q

zbar = 170.60 / 416 = .41. Chi-square at .05 with 2 df = 5.99; Q = 17.21 > 5.99, so not all ρ are equal.
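A base-R sketch of the Q computation (it reproduces the table up to rounding; the slide's 17.21 uses z rounded to two decimals):

r <- c(.2, .5, .6); n <- c(200, 150, 75)
z <- atanh(r)
zbar <- sum((n - 3) * z) / sum(n - 3)   # .41
Q <- sum((n - 3) * (z - zbar)^2)        # about 17.1
qchisq(.95, df = length(r) - 1)         # critical value: 5.99
Q > qchisq(.95, df = length(r) - 1)     # TRUE: not all rho are equal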

20 Hypothesis test 5: dependent r
Hotelling-Williams test. Say N = 101, r12 = .4, r13 = .6, r23 = .3.
Compare the result to t(.05, 98) = 1.98. See my notes.
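A base-R sketch using one common form of Williams' t (the version given in Steiger, 1980); psych::r.test() implements the same test if you prefer a packaged version:

n <- 101; r12 <- .4; r13 <- .6; r23 <- .3
detR <- 1 - r12^2 - r13^2 - r23^2 + 2 * r12 * r13 * r23  # determinant of the 3x3 correlation matrix
rbar <- (r12 + r13) / 2
t_stat <- (r12 - r13) *
  sqrt(((n - 1) * (1 + r23)) /
       (2 * ((n - 1) / (n - 3)) * detR + rbar^2 * (1 - r23)^3))
t_stat   # about -2.10; |t| exceeds t(.05, 98) = 1.98, so significant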

21 Review
What is the purpose of the Fisher r to z transformation?
Test the hypothesis that ρ1 = ρ2, given that r1 = .50, N1 = 103, r2 = .60, N2 = 128, and the samples are independent.
Why do we care about the sampling distribution of the correlation coefficient?

22 Range Restriction

23 Range Enhancement
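The two slides above show scatterplots; this simulation sketch (arbitrary ρ = .5) shows both effects numerically:

set.seed(7)
x <- rnorm(1000)
y <- .5 * x + sqrt(1 - .25) * rnorm(1000)
cor(x, y)                                    # full range: about .50
keep <- x > median(x)
cor(x[keep], y[keep])                        # restriction: r shrinks
ends <- x < quantile(x, .1) | x > quantile(x, .9)
cor(x[ends], y[ends])                        # enhancement (extremes only): r grows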

24 Reliability
Reliability sets the ceiling for validity: measurement error attenuates correlations.
If the correlation between true scores is .7 and the reliabilities of X and Y are both .8, the observed correlation is .7 * sqrt(.8 * .8) = .7 * .8 = .56.
Disattenuated correlation: if our observed correlation is .56 and the reliabilities of both X and Y are .8, our estimate of the correlation between true scores is .56 / sqrt(.8 * .8) = .56 / .8 = .70.
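The same arithmetic as a two-line base-R check:

r_true <- .70; rxx <- .80; ryy <- .80
r_true * sqrt(rxx * ryy)   # attenuation: observed r = .56
.56 / sqrt(rxx * ryy)      # disattenuation: recovers .70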

25 Add Error to Y only
The correlation decreases.
The distribution of X does not change.
The distribution of Y becomes wider (increased variance).
The slope of Y on X remains constant: since b = r * SDy/SDx, the drop in r and the rise in SDy cancel out. This is not true for error in X, which attenuates the slope. A simulation sketch follows.
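The simulation sketch referenced above (the slope of 2 and the error SDs are arbitrary choices):

set.seed(3)
x <- rnorm(500)
y <- 2 * x + rnorm(500)
y_noisy <- y + rnorm(500, sd = 2)                # extra measurement error in Y
c(cor(x, y), cor(x, y_noisy))                    # r drops
c(coef(lm(y ~ x))[2], coef(lm(y_noisy ~ x))[2])  # raw slope stays near 2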

26 Review What is range restriction? Range enhancement? What do they do to r? What is the effect of reliability on r?

27 SAS Power Estimation

proc power;
  onecorr dist=fisherz
    corr = 0.35
    nullcorr = 0.2
    sides = 1
    ntotal = 100
    power = .;
run;

Computed Power: Actual alpha = .05, Power = .486.

proc power;
  onecorr
    corr = 0.35
    nullcorr = 0
    sides = 2
    ntotal = .
    power = .8;
run;

Computed N Total: Alpha = .05, Actual Power = .801, Ntotal = 61.

28 Power for Correlations
Sample sizes required for powerful conventional significance tests, for typical values of the correlation coefficient in psychology (power = .8, two tails, alpha = .05, null: ρ = 0):

rho   N required
.10   782
.15   346
.20   193
.25   123
.30    84
.35    61
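These N's come from SAS; a base-R approximation using the Fisher z transformation lands within one of each value:

rho <- c(.10, .15, .20, .25, .30, .35)
# N is roughly ((z_alpha/2 + z_power) / atanh(rho))^2 + 3
ceiling(((qnorm(.975) + qnorm(.80)) / atanh(rho))^2 + 3)
# 783 347 194 124 85 62 -- each within one of the exact values above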

29 Programs
Review the 'corrs' Excel program from the website: download the Excel file and work through the examples of tests for correlations.
Review the R program for computing correlations.

30 Exercises
Download Spector's data. Compute univariate statistics & the correlation matrix for 5 variables: Age, Autonomy, Work hours, Interpersonal conflict, Job satisfaction.
Problems:
Which pairs are significant? (Use the per-comparison, i.e., nominal, alpha.)
Is the absolute value of the correlation between conflict and job satisfaction significantly different from .5?
Is the correlation between age and conflict different from the correlation between age and job satisfaction?

