Sampling Distribution of Pearson Correlation
IMLMLIB
Some IML functions and a subroutine used below:
- The RANDNORMAL function
- The CORR function
- The QNTL subroutine
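As a quick orientation, here is a minimal sketch of how the three routines are called on one small sample. The seed 123, the sample size 5, and the quartile probabilities are arbitrary illustrative choices and are not part of the original example.

```sas
proc iml;
call randseed(123);                     /* seed the random number stream (illustrative value) */
mu    = {0 0};                          /* 1 x 2 mean vector */
Sigma = {1 0.3, 0.3 1};                 /* 2 x 2 covariance matrix */
X = RandNormal(5, mu, Sigma);           /* 5 x 2 matrix: one row per observation */
R = corr(X);                            /* 2 x 2 correlation matrix; R[1,2] is the Pearson r */
call qntl(q, X[,1], {0.25 0.5 0.75});   /* quartiles of the first column, returned in q */
print X, R, q;
quit;
```

RANDNORMAL and CORR are functions that return a matrix, while QNTL is a subroutine, so it is invoked with CALL and writes its result into its first argument.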
Generate random samples from a bivariate normal distribution and calculate the correlation for each sample
%let obs = 20;    /* size of each sample */
%let reps = 1000; /* number of samples */
proc iml;
call randseed(54321);                 /* seed for reproducibility */
mu = {0 0};                           /* mean vector */
print mu;
Sigma = {1 0.3, 0.3 1};               /* covariance matrix (unit variances, population correlation 0.3) */
print Sigma;
rho = j(&reps, 1);                    /* allocate vector for results */
do i = 1 to &reps;                    /* simulation loop */
   x = RandNormal(&obs, mu, Sigma);   /* simulated data: &obs x 2 matrix */
   /* corr returns a 2 x 2 matrix; element [1,2] is the Pearson correlation for the ith sample */
   rho[i] = corr(x)[1,2];
end;
print x;                              /* last simulated sample */
print (rho[1:5,]);                    /* first five sample correlations */
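For reference, the off-diagonal element corr(x)[1,2] is the usual sample Pearson coefficient. The standard formula is sketched below, where x_i and y_i denote the two columns of one simulated sample and n = &obs. The second expression shows the population correlation implied by Sigma, which is the value each sample r estimates.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\rho = \frac{\sigma_{12}}{\sigma_1 \sigma_2} = \frac{0.3}{\sqrt{1}\,\sqrt{1}} = 0.3
```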
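Compute quantiles and create a data set
call qntl(q, rho, {0.05 0.25 0.5 0.75 0.95});   /* percentiles of the simulated correlations */
print (q`)[colname={"P5" "P25" "Median" "P75" "P95"}];
create Corr var {"Rho"};                         /* write the vector rho to data set Corr */
append;
close Corr;
quit;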
Visualize the approximate sampling distribution
proc univariate data=Corr;
   label Rho = "Pearson Correlation Coefficient";
   histogram Rho / kernel;   /* histogram with a kernel density estimate overlaid */
   ods select Histogram;
run;
Find the percentage of negative correlations in the approximate sampling distribution.
proc sql;
   /* (Rho<0) evaluates to 1 or 0, so its sum counts the negative correlations */
   select sum(Rho<0)/count(*) as pctneg "Percent negative" format=percent6.1
   from Corr;
quit;
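The same proportion can also be computed directly in SAS/IML. This is a sketch that assumes the Corr data set (variable Rho) written in the quantile step already exists; the matrix names isNeg and pctNeg are illustrative.

```sas
proc iml;
use Corr;                          /* open the simulated correlations */
read all var {"Rho"};              /* creates the column vector Rho */
close Corr;
isNeg  = (Rho < 0);                /* 1 where the correlation is negative, else 0 */
pctNeg = sum(isNeg) / nrow(Rho);   /* proportion of negative correlations */
print pctNeg[format=percent6.1 label="Percent negative"];
quit;
```

Staying in IML is convenient when the rho vector is still in memory: the two lines that define isNeg and pctNeg can then be run right after the simulation loop, with no need to read the data set back in.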