Correlation
I have two variables of practically “equal” standing (traditionally denoted X and Y); neither is treated as dependent on the other. I ask whether they are independent and, if they are correlated, how strongly.
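(Pearson) correlation coefficient
$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}$
If positive deviations from the mean in X go together with positive deviations in Y, and negative ones with negative ones, then the sum of the products of deviations is positive. r is a dimensionless number (the covariance standardized by the standard deviations of the two variables): -1 means a deterministic negative linear dependence, +1 a deterministic positive linear dependence.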
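The definition above can be sketched in a few lines of plain Python. The function name `pearson_r` and the small data set are only illustrative assumptions, not part of the lecture.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations: positive when deviations share a sign.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations standardize the result to the range -1..+1.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # points on an increasing line
print(pearson_r(x, [10, 8, 6, 4, 2]))  # points on a decreasing line
```

The two calls illustrate the deterministic cases: points lying exactly on an increasing line give a value at +1, points on a decreasing line a value at -1.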
We assume a linear relation, or a bivariate (two-dimensional) normal distribution.
Even here r ≈ 0, although the values are not independent. But note that Y does not have a normal distribution for a given X here.
[Figure: scatterplots of samples with r = +0.99, -0.99, -0.83, +0.83, -0.45 and +0.45]
Test of the null hypothesis H0: ρ = 0
r is an estimate of the population parameter ρ. The test again reduces to a t-test, and again both one- and two-tailed tests can be used. It is even possible to test the null hypothesis that ρ equals some non-zero value, but the procedure is more complicated.
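A minimal sketch of how the test reduces to a t-test, using the standard statistic t = r·sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom. The function name `correlation_t` and the values r = 0.45, n = 30 are hypothetical examples.

```python
import math

def correlation_t(r, n):
    """t statistic and degrees of freedom for testing H0: rho = 0."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df

# Hypothetical sample: r = 0.45 computed from n = 30 pairs.
t, df = correlation_t(r=0.45, n=30)
print(f"t = {t:.3f}, df = {df}")
# Compare |t| with the tabulated critical value of the t distribution
# for df degrees of freedom (one- or two-tailed, as appropriate).
```

The sketch stops at the test statistic because the Python standard library has no t distribution; the p-value would come from tables or a statistics package.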
There are also tabulated critical values of r (for different sample sizes).
Comparison with regression
The coefficient of determination in (simple linear) regression, R², is the square of the correlation coefficient computed from the same two variables. The probability level (p-value) of the significance test of independence is exactly the same for the regression as for the correlation coefficient.
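The identity R² = r² can be checked numerically. The sketch below fits a least-squares line by hand and compares the two quantities; the function names and the six data points are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual
    ss_tot = sum((yi - my) ** 2 for yi in y)                        # total
    return 1.0 - ss_res / ss_tot

# Made-up illustrative data.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 4.8, 6.5, 6.9]
print(r_squared(x, y), pearson_r(x, y) ** 2)  # the two agree up to rounding
```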
Only a manipulative experiment proves causality.
Power of the test
A regression is significant exactly when the correlation coefficient is significant. In both cases, the power of the test increases with the strength of the relation and with the number of observations. To estimate how many observations I need, I must have an idea of how tight the relation is (how high R² or ρ is in the population).
Power of the test: critical values of r
From the critical values of r I can find how many observations I need to have a ~50% chance of rejecting H0 at a given significance level (for a known ρ). More precise calculations are possible, but in any case I need to have an idea of what the correlation in the population is.
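One way to make such a calculation concrete is the Fisher z approximation, n ≈ ((z_α + z_power) / atanh(ρ))² + 3. This is a common textbook approximation swapped in here for illustration; the lecture does not say which method it has in mind, and the function name `n_for_power` and its defaults are assumptions.

```python
import math
from statistics import NormalDist

def n_for_power(rho, alpha=0.05, power=0.8):
    """Approximate sample size to reject H0: rho = 0 (two-tailed test).

    Uses the Fisher z approximation; rho must be non-zero and the result
    is only a rough guide, especially for small n.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)  # equals 0 for power = 0.5
    return math.ceil(((z_alpha + z_power) / math.atanh(rho)) ** 2 + 3)

# Assumed population correlations, for illustration only.
for rho in (0.3, 0.5, 0.8):
    print(rho, n_for_power(rho, power=0.5), n_for_power(rho, power=0.8))
```

Setting `power=0.5` corresponds to the ~50% chance read off from the table of critical values; a higher power requires more observations, and a weaker assumed ρ requires many more.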
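Coefficient of rank correlation (Spearman) [there is also Kendall's]
I replace the values of each variable with their ranks and compute the correlation coefficient from the ranks. For larger samples, the critical values for the ordinary (Pearson) correlation coefficient hold as well. We can use the formula $r_S = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$, where $d_i$ is the difference between the two ranks of observation i and n is the number of observations (the formula is exact only when there are no ties).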
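Both routes to the Spearman coefficient can be sketched as follows: the Pearson coefficient computed on ranks, and the shortcut formula with rank differences d. The helper names and the five data pairs are illustrative assumptions.

```python
import math

def ranks(values):
    """Ranks starting at 1; tied values get the average of their ranks."""
    srt = sorted(values)
    # First position of v plus (count - 1) / 2 is the mean rank of its group.
    return [srt.index(v) + 1 + (srt.count(v) - 1) / 2 for v in values]

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))

def spearman_from_d(x, y):
    """Shortcut formula with rank differences d; valid only without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Made-up data without ties.
x = [10, 20, 30, 40, 50]
y = [1.2, 0.9, 3.1, 2.8, 7.5]
print(spearman(x, y), spearman_from_d(x, y))  # both routes give 0.8 here
```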
But the Spearman coefficient will also be 0 in this case. We can say that the Pearson correlation coefficient is a measure of linear dependence, while the Spearman coefficient is a measure of monotonic dependence.
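The difference between linear and monotonic dependence can be illustrated with two artificial data sets: a symmetric parabola (dependent, but neither linear nor monotonic) and an exponential curve (monotonic, but not linear). The data and helper names are assumptions chosen for the illustration; the lost figure may have shown something different.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values get the average of their ranks."""
    srt = sorted(values)
    return [srt.index(v) + 1 + (srt.count(v) - 1) / 2 for v in values]

def spearman(x, y):
    return pearson_r(ranks(x), ranks(y))

# Symmetric parabola: Y is fully determined by X, yet both coefficients vanish.
x = [-3, -2, -1, 0, 1, 2, 3]
parabola = [v * v for v in x]
print(pearson_r(x, parabola), spearman(x, parabola))

# Monotonic but non-linear: Spearman reaches 1, Pearson stays below 1.
x2 = [1, 2, 3, 4, 5, 6, 7]
expo = [math.exp(v) for v in x2]
print(pearson_r(x2, expo), spearman(x2, expo))
```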
Another possibility is to use a permutation test
I randomly permute the values of the independent variable and count how many times the resulting dependence is as “nice” as the one in our data.
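A minimal sketch of such a permutation test, assuming that “nice” is measured by the absolute value of the Pearson coefficient (a two-tailed choice; the lecture does not fix the statistic). The function name, the number of permutations and the data are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_perm=9999, seed=0):
    """Share of random permutations of x with |r| at least as large as observed."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    observed = abs(pearson_r(x, y))
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between x and y
        if abs(pearson_r(shuffled, y)) >= observed:
            hits += 1
    # Count the observed arrangement itself as one of the permutations.
    return (hits + 1) / (n_perm + 1)

# Made-up data with a clear increasing trend.
x = list(range(1, 13))
y = [2.3, 1.9, 3.8, 4.1, 5.6, 5.2, 7.9, 7.1, 9.4, 10.2, 10.8, 12.5]
print(permutation_p_value(x, y))
```

The returned proportion plays the role of the p-value: if only a tiny fraction of random permutations produces a relation as strong as the observed one, H0 of independence is rejected.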