Generating Correlated Random Variables
Kriss Harris, Senior Statistician



Why?

I was producing graphs for a SAS Graphics Training Course that will be rolled out soon, and I wanted to control the correlation between the variables.

Previous Method

Use Excel to fill down, then generate another column that was fairly correlated.

Generating Correlated Random Variables using the SAS DATA Step

data bivariate_final;
  mean1 = 0;    * mean for y1;
  mean2 = 10;   * mean for y2;
  sig1 = 2;     * SD for y1;
  sig2 = 5;     * SD for y2;
  rho = 0.90;   * correlation between y1 and y2;
  do i = 1 to 100;
    r1 = rannor(1245);
    r2 = rannor(2923);
    y1 = mean1 + sig1*r1;
    y2 = mean2 + rho*sig2*r1 + sqrt(sig2**2 - sig2**2*rho**2)*r2;
    output;
  end;
run;
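The same conditional construction carries over directly outside SAS. Below is a minimal Python/NumPy sketch (not part of the original slides; variable names mirror the DATA step, and a large sample size is used so the empirical correlation lands close to rho):

```python
import numpy as np

rng = np.random.default_rng(1245)

mean1, mean2 = 0.0, 10.0   # means for y1 and y2
sig1, sig2 = 2.0, 5.0      # standard deviations for y1 and y2
rho = 0.90                 # target correlation between y1 and y2
n = 100_000                # large n so the sample correlation sits near rho

r1 = rng.standard_normal(n)
r2 = rng.standard_normal(n)

y1 = mean1 + sig1 * r1
# y2 mixes the shared noise r1 with fresh noise r2 so that corr(y1, y2) = rho
y2 = mean2 + rho * sig2 * r1 + np.sqrt(sig2**2 - sig2**2 * rho**2) * r2

print(np.corrcoef(y1, y2)[0, 1])
```

The key design point, same as in the DATA step: y2 reuses r1 (which drives y1) with weight rho*sig2, and the sqrt term scales the independent noise so that y2 still has standard deviation sig2.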

Y and X for Different Correlation Coefficients

[Figure: scatter plots of y against x for several values of the correlation coefficient]

Generating Correlated Random Variables using PROC IML

To generate more than two correlated random variables, it is easier to use the Cholesky decomposition method in PROC IML (IML = Interactive Matrix Language).

Generating Correlated Random Variables using PROC IML

proc iml;
  use bivariate_final;                  /* USE is similar to SET */
  read all var {r1} into x3;            /* read in the simulated data... */
  read all var {r2} into x4;
  read all var {mean1} into mean1;      /* ...and the means */
  read all var {mean2} into mean2;

  x = {4 9, 9 25};                      /* C: variance-covariance matrix */
  mattrib x rowname=(rows[1:2]) colname=(cols[1:2]);
  Cholesky_decomp = root(x);            /* U: apply the Cholesky decomposition */

  matrix_con = x3 || x4;                /* concatenate the variables */
  mean = mean1 || mean2;
  final_simulated = mean + matrix_con * Cholesky_decomp;  /* RC: correlated variables */

  varnames = {y3 y4};                   /* output the variables */
  create Cholesky_correlation from final_simulated [colname = varnames];
  append from final_simulated;
quit;
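The matrix recipe is not SAS-specific. Here is a hedged NumPy sketch of the same idea (not from the original slides). One caveat when translating: SAS's root() returns the upper-triangular factor U with C = U'U, whereas np.linalg.cholesky returns the lower-triangular factor L with C = LL', so the transpose is needed to reproduce the row-vector convention used in the IML code:

```python
import numpy as np

rng = np.random.default_rng(2923)

C = np.array([[4.0, 9.0],
              [9.0, 25.0]])      # variance-covariance matrix (as in the IML slide)
mean = np.array([0.0, 10.0])     # means for the two variables

L = np.linalg.cholesky(C)        # lower-triangular factor: L @ L.T == C

n = 100_000
Z = rng.standard_normal((n, 2))  # independent standard normals, one row per draw
Y = mean + Z @ L.T               # each row of Y has mean `mean` and covariance C

print(np.cov(Y, rowvar=False))
```

Because each row z of Z has identity covariance, z @ L.T has covariance L @ I @ L.T = C, which is exactly why the Cholesky factor produces the desired dependence structure; the same argument applies to RC = R * U in IML with the upper factor.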

References

Generating Multivariate Normal Data by Using PROC IML. Lingling Han, University of Georgia, Athens, GA.

Appendix

Correlation Coefficient = Cov(y1, y2) / (sig1 * sig2)
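This formula also connects the two approaches in the slides: the covariance matrix C = {4 9, 9 25} implies a correlation of 9 / (2 * 5) = 0.90, the same rho used in the DATA step. A small illustrative check (not part of the original slides):

```python
import numpy as np

C = np.array([[4.0, 9.0],
              [9.0, 25.0]])

sd = np.sqrt(np.diag(C))        # standard deviations: sqrt of the diagonal
corr = C / np.outer(sd, sd)     # convert covariance matrix to correlation matrix
print(corr[0, 1])               # the off-diagonal entry is the implied rho
```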

R Code - Generating Correlated Random Variables

mean1 = 0
mean2 = 10
sig1 = 2
sig2 = 5
rho = 0.9

r1 = rnorm(100, 0, 1)
r2 = rnorm(100, 0, 1)

y1 = mean1 + sig1*r1
y2 = mean2 + rho*sig2*r1 + sqrt(sig2^2 - sig2^2*rho^2)*r2

R Code - Generating Correlated Random Variables using Matrices

# Use the previous values of r1 and r2
C = matrix(c(4, 9, 9, 25), nrow = 2, ncol = 2)
cholc = chol(C)
R = matrix(c(r1, r2), nrow = 100, ncol = 2, byrow = FALSE)
mean = matrix(c(mean1, mean2), nrow = 100, ncol = 2, byrow = TRUE)
RC = mean + R %*% cholc