Generating Correlated Random Variables Kriss Harris Senior Statistician
Why? I was producing graphs for a SAS Graphics Training Course that will be rolled out soon, and I wanted to control the correlation between the variables. 2
Previous Method 3 Use Excel to fill down and then generate another column that was fairly correlated
Generating Correlated Random Variables using the SAS Datastep data bivariate_final; mean1=0; *mean for y1; mean2=10; *mean for y2; sig1=2; *SD for y1; sig2=5; *SD for y2; rho=0.90; *Correlation between y1 and y2; do i = 1 to 100; r1 = rannor(1245); r2 = rannor(2923); y1 = mean1 + sig1*r1; y2 = mean2 + rho*sig2*r1+sqrt(sig2**2- sig2**2*rho**2)*r2; output; end; run; 4
Generating Correlated Random Variables using the SAS Datastep data bivariate_final; mean1=0; *mean for y1; mean2=10; *mean for y2; sig1=2; *SD for y1; sig2=5; *SD for y2; rho=0.90; *Correlation between y1 and y2; do i = 1 to 100; r1 = rannor(1245); r2 = rannor(2923); y1 = mean1 + sig1*r1; y2 = mean2 + rho*sig2*r1+sqrt(sig2**2- sig2**2*rho**2)*r2; output; end; run; 5
Generating Correlated Random Variables using the SAS Datastep data bivariate_final; mean1=0; *mean for y1; mean2=10; *mean for y2; sig1=2; *SD for y1; sig2=5; *SD for y2; rho=0.90; *Correlation between y1 and y2; do i = 1 to 100; r1 = rannor(1245); r2 = rannor(2923); y1 = mean1 + sig1*r1; y2 = mean2 + rho*sig2*r1+sqrt(sig2**2- sig2**2*rho**2)*r2; output; end; run; 6
Generating Correlated Random Variables using the SAS Datastep data bivariate_final; mean1=0; *mean for y1; mean2=10; *mean for y2; sig1=2; *SD for y1; sig2=5; *SD for y2; rho=0.90; *Correlation between y1 and y2; do i = 1 to 100; r1 = rannor(1245); r2 = rannor(2923); y1 = mean1 + sig1*r1; y2 = mean2 + rho*sig2*r1+sqrt(sig2**2- sig2**2*rho**2)*r2; output; end; run; 7
Generating Correlated Random Variables using the SAS Datastep data bivariate_final; mean1=0; *mean for y1; mean2=10; *mean for y2; sig1=2; *SD for y1; sig2=5; *SD for y2; rho=0.90; *Correlation between y1 and y2; do i = 1 to 100; r1 = rannor(1245); r2 = rannor(2923); y1 = mean1 + sig1*r1; y2 = mean2 + rho*sig2*r1+sqrt(sig2**2- sig2**2*rho**2)*r2; output; end; run; 8
Generating Correlated Random Variables using the SAS Datastep data bivariate_final; mean1=0; *mean for y1; mean2=10; *mean for y2; sig1=2; *SD for y1; sig2=5; *SD for y2; rho=0.90; *Correlation between y1 and y2; do i = 1 to 100; r1 = rannor(1245); r2 = rannor(2923); y1 = mean1 + sig1*r1; y2 = mean2 + rho*sig2*r1+sqrt(sig2**2- sig2**2*rho**2)*r2; output; end; run; 9
Y and x for different correlation coefficients 10
Generating Correlated Random Variables using Proc IML To generate more than 2 correlated random variables than it’s easier to use the Cholesky decomposition method in Proc IML. IML = Interactive Matrix Language 11
Generating Correlated Random Variables using Proc IML proc iml; use bivariate_final; read all var {r1} into x3; read all var {r2} into x4; read all var {mean1} into mean1; read all var {mean2} into mean2; x={ 4 9, 9 25}; /* C */ mattrib x rowname=(rows [1:2 ]) colname=(cols [1:2]); Cholesky_decomp = root(x); /* U */ matrix_con = x3||x4; mean = mean1||mean2; final_simulated = mean + matrix_con * Cholesky_decomp; /*RC*/ varnames = {y3 y4}; create Cholesky_correlation from final_simulated (|colname = varnames|); append from final_simulated; quit; 12 Use is similar to set. Reading in the simulated data and the means
Generating Correlated Random Variables using Proc IML proc iml; use bivariate_final; read all var {r1} into x3; read all var {r2} into x4; read all var {mean1} into mean1; read all var {mean2} into mean2; x={ 4 9, 9 25}; /* C */ mattrib x rowname=(rows [1:2 ]) colname=(cols [1:2]); Cholesky_decomp = root(x); /* U */ matrix_con = x3||x4; mean = mean1||mean2; final_simulated = mean + matrix_con * Cholesky_decomp; /*RC*/ varnames = {y3 y4}; create Cholesky_correlation from final_simulated (|colname = varnames|); append from final_simulated; quit; 13 Variance covariance matrix
Generating Correlated Random Variables using Proc IML proc iml; use bivariate_final; read all var {r1} into x3; read all var {r2} into x4; read all var {mean1} into mean1; read all var {mean2} into mean2; x={ 4 9, 9 25}; /* C */ mattrib x rowname=(rows [1:2 ]) colname=(cols [1:2]); Cholesky_decomp = root(x); /* U */ matrix_con = x3||x4; mean = mean1||mean2; final_simulated = mean + matrix_con * Cholesky_decomp; /*RC*/ varnames = {y3 y4}; create Cholesky_correlation from final_simulated (|colname = varnames|); append from final_simulated; quit; 14 Applying Cholesky’s decompositon
Generating Correlated Random Variables using Proc IML proc iml; use bivariate_final; read all var {r1} into x3; read all var {r2} into x4; read all var {mean1} into mean1; read all var {mean2} into mean2; x={ 4 9, 9 25}; /* C */ mattrib x rowname=(rows [1:2 ]) colname=(cols [1:2]); Cholesky_decomp = root(x); /* U */ matrix_con = x3||x4; mean = mean1||mean2; final_simulated = mean + matrix_con * Cholesky_decomp; /*RC*/ varnames = {y3 y4}; create Cholesky_correlation from final_simulated (|colname = varnames|); append from final_simulated; quit; 15 Concatenating the variables
Generating Correlated Random Variables using Proc IML proc iml; use bivariate_final; read all var {r1} into x3; read all var {r2} into x4; read all var {mean1} into mean1; read all var {mean2} into mean2; x={ 4 9, 9 25}; /* C */ mattrib x rowname=(rows [1:2 ]) colname=(cols [1:2]); Cholesky_decomp = root(x); /* U */ matrix_con = x3||x4; mean = mean1||mean2; final_simulated = mean + matrix_con * Cholesky_decomp; /*RC*/ varnames = {y3 y4}; create Cholesky_correlation from final_simulated (|colname = varnames|); append from final_simulated; quit; 16 Correlated Variables
Generating Correlated Random Variables using Proc IML proc iml; use bivariate_final; read all var {r1} into x3; read all var {r2} into x4; read all var {mean1} into mean1; read all var {mean2} into mean2; x={ 4 9, 9 25}; /* C */ mattrib x rowname=(rows [1:2 ]) colname=(cols [1:2]); Cholesky_decomp = root(x); /* U */ matrix_con = x3||x4; mean = mean1||mean2; final_simulated = mean + matrix_con * Cholesky_decomp; /*RC*/ varnames = {y3 y4}; create Cholesky_correlation from final_simulated (|colname = varnames|); append from final_simulated; quit; 17 Outputting the variables
References Generating Multivariate Normal Data by using Proc IML Generating Multivariate Normal Data by using Proc IML Lingling Han, University of Georgia, Athens, GA 18
Appendix Correlation Coefficient = 19
R Code - Generating Correlated Random Variables mean1 = 0 mean2 = 10 sig1 = 2 sig2 = 5 rho = 0.9 r1 = rnorm(100, 0, 1) r2 = rnorm(100, 0, 1) y1 = mean1 + sig1*r1; y2 = mean2 + rho*sig2*r1+sqrt(sig2**2-sig2**2*rho**2)*r2; 20
R Code - Generating Correlated Random Variables mean1 = 0 mean2 = 10 sig1 = 2 sig2 = 5 rho = 0.9 r1 = rnorm(100, 0, 1) r2 = rnorm(100, 0, 1) y1 = mean1 + sig1*r1 y2 = mean2 + rho*sig2*r1+sqrt(sig2**2-sig2**2*rho**2)*r2 21
R Code - Generating Correlated Random Variables using Matrices C = matrix(c(4, 9, 9, 25), nrow = 2, ncol = 2) cholc = chol(C) R = matrix(c(r1,r2), nrow = 100, ncol = 2, byrow = F) mean = matrix(c(mean1,mean2), nrow = 100, ncol = 2, byrow = T) RC = mean + R %*% cholc 22 Use previous values of r1 and r2