Wicklin, Rick. Simulating data with SAS. SAS Institute, 2013.


2 Wicklin, Rick. Statistical programming with SAS/IML software. SAS Institute, 2010.

3 Simulation in the real world: traffic flow, arrivals, climate, epidemics

4 Simulation in statistics: examine distributions, examine sampling distributions, examine theoretical results, compare models

5 Simulating a sample from univariate distributions in SAS: the DATA step

6 Everything starts with random uniform(0,1) numbers.
Pseudorandom number generators: in SAS, the RAND function generates realizations from many distributions.
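To make "everything starts with uniform(0,1)" concrete, here is a minimal sketch of the inverse-CDF method that turns uniform draws into standard exponential draws. It assumes only that RAND("uniform") returns values strictly between 0 and 1; the data set name INV_EXP is illustrative, and this is not necessarily how RAND itself generates exponential values.

```sas
%let seed=54321;
%let numobs=10000;

/* Inverse-CDF method: if U ~ uniform(0,1), then -log(1-U) ~ exponential(1),
   because the exponential CDF is F(x) = 1 - exp(-x). */
data inv_exp(drop=i u);
   call streaminit(&seed);
   do i=1 to &numobs;
      u=rand("uniform");   /* uniform(0,1) draw */
      x=-log(1-u);         /* invert F(x) = 1 - exp(-x) */
      output;
   end;
run;

/* The sample mean and standard deviation should both be close to 1 */
proc means data=inv_exp n mean std;
   var x;
run;
```

A standard exponential distribution has mean 1 and standard deviation 1, so PROC MEANS should report values near 1 for both.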

7 The RAND function
    %let obs=10;
    %let seed=54321;
    data tmp;
       call streaminit(&seed);   /* set the seed so the stream is reproducible */
       do i=1 to &obs;
          x=rand("uniform");
          output;
       end;
    run;
    proc print data=tmp;
    run;

8 From SAS Help: The RAND function uses the Mersenne-Twister random number generator (RNG) that was developed by Matsumoto and Nishimura (1998). The random number generator has a very long period (2^19937 - 1) and very good statistical properties. The period is a Mersenne prime.

9 Desired properties of random number generators
Repeatability: the same sequence should be produced from the same initial values (seeds).
Randomness: the generator should produce independent, uniformly distributed random variables (pass statistical tests for randomness).
Long period: a pseudorandom number sequence uses finite-precision arithmetic, so the sequence must repeat itself with a finite period. This period should be much longer than the number of random values needed for the simulation.
Period and randomness properties should not depend on the initial seeds.
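Repeatability can be checked directly: two DATA steps that call CALL STREAMINIT with the same positive seed should produce the same stream. This is a minimal sketch; the data set names RUN1, RUN2, and COMPARE and the seed 12345 are illustrative.

```sas
%let seed=12345;

/* First stream */
data run1(drop=i);
   call streaminit(&seed);
   do i=1 to 5;
      x=rand("uniform");
      output;
   end;
run;

/* Second stream with the same seed */
data run2(drop=i);
   call streaminit(&seed);
   do i=1 to 5;
      y=rand("uniform");
      output;
   end;
run;

/* One-to-one merge (no BY statement) pairs the i-th values of each run;
   SAME is 1 when the two values are identical */
data compare;
   merge run1 run2;
   same=(x=y);
run;

proc print data=compare;
run;
```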

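10 Randomness: produce independent, uniformly distributed random variables (pass statistical tests for randomness). A chi-square test checks whether the integers 1 to 10 occur equally often.
    data uniforms(drop=i);
       call streaminit(45321);
       /* NUMOBS: sample size; the value is missing from the original slide */
       do i=1 to &numobs;
          x=ceil(10*rand("uniform"));   /* integer from 1 to 10 */
          output;
       end;
    run;
    proc freq data=uniforms;
       tables x / chisq plots=none;
    run;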

11 Randomness: successive values should be independent, so the correlation between x and its lagged value should be near 0.
    data uniforms2;
       set uniforms;
       x1=lag(x);   /* previous value of x */
    run;
    proc corr data=uniforms2;
       var x x1;
    run;

12 From SAS Help: The Mersenne-Twister RNG algorithm has an extremely long period, but this does not imply that large random samples are devoid of duplicate values. The RAND function returns at most 2^32 distinct values (4,294,967,296). In a random uniform sample of size 10^5, the chance of drawing at least one duplicate is greater than 50%. The expected number of duplicates in a random uniform sample of size M is approximately M^2/2^33 when M is much less than 2^32. For example, you should expect about 116 duplicates in a random uniform sample of size M = 10^6. These results are consequences of the famous “birthday matching problem” in probability theory.
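The approximations quoted above can be reproduced with a few lines of DATA step arithmetic. This sketch uses the standard birthday-problem approximations, expected duplicates ≈ M^2/(2N) and P(at least one duplicate) ≈ 1 - exp(-M^2/(2N)) with N = 2^32; the data set name BIRTHDAY is illustrative.

```sas
/* Birthday-problem approximations for N = 2**32 possible RAND values */
data birthday;
   N=2**32;
   do M=100000, 1000000;
      expected_dups=M**2/(2*N);          /* approx. M**2 / 2**33 */
      prob_any_dup=1-exp(-M**2/(2*N));    /* approx. P(at least one duplicate) */
      output;
   end;
run;

proc print data=birthday;
   format N comma15.;
run;
```

For M = 10^5 the approximate probability of at least one duplicate is about 0.69, and for M = 10^6 the expected number of duplicates is about 116.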

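13 How many duplicates?
    data rand(drop=i);
       call streaminit(54321);
       /* NUMOBS: sample size; the value is missing from the original slide */
       do i=1 to &numobs;
          x=rand("Uniform");
          output;
       end;
    run;
    /* NODUPKEY keeps one record per value of x; DUPOUT= collects the duplicates */
    proc sort data=rand out=tmp nodupkey dupout=dups;
       by x;
    run;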

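14 How long before the first value repeats?
    %let seed=54321;   /* i=372,972,159 */
    data _null_;
       call streaminit(&seed);
       x0=rand("Uniform");    /* first value in the stream */
       do until (x=x0);       /* draw until that value appears again */
          i+1;
          x=rand("Uniform");
       end;
       put "i=" i;
    run;
Because RAND returns at most 2^32 distinct values, a repeated value does not mean that the generator has completed its period.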

15 Univariate distributions available with the RAND function
Bernoulli, beta, binomial, Cauchy, chi-square, Erlang, exponential, F, gamma, geometric, hypergeometric, lognormal, negative binomial, normal, Poisson, t, tabled, triangular, uniform, Weibull

16 Some Continuous Distributions

17 Generating a random sample from a univariate distribution in the data step – The standard normal
    %let seed=754313;
    %let numobs=10000;
    data normals;
       call streaminit(&seed);
       do i=1 to &numobs;
          x=rand("normal");
          output;
       end;
    run;
    ods select histogram goodnessoffit;
    proc univariate data=normals;
       var x;
       histogram x / normal;
    run;
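Normal values can also be built from uniforms. As an illustration only (RAND("normal") does not necessarily use this algorithm), the Box-Muller transform maps two independent uniform(0,1) values to two independent standard normal values. The data set name BOXMULLER and the variable names are illustrative.

```sas
%let seed=754313;
%let numobs=10000;

/* Box-Muller: two independent uniform(0,1) values U1, U2 give two
   independent standard normal values */
data boxmuller(keep=z);
   call streaminit(&seed);
   pi=constant("pi");
   do i=1 to &numobs/2;
      u1=rand("uniform");
      u2=rand("uniform");
      r=sqrt(-2*log(u1));
      z=r*cos(2*pi*u2); output;
      z=r*sin(2*pi*u2); output;
   end;
run;

ods select histogram goodnessoffit;
proc univariate data=boxmuller;
   var z;
   histogram z / normal;
run;
```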

18 Generating a random sample from a univariate distribution in the data step – chi-square (4 df)
    %let seed=754313;
    %let numobs=10000;
    data chisquares;
       call streaminit(&seed);
       do i=1 to &numobs;
          x=rand("chisq",4);
          output;
       end;
    run;
    ods select histogram goodnessoffit;
    proc univariate data=chisquares;
       var x;
       histogram x / gamma;
    run;

19 Generating a random sample from a univariate distribution in the data step – chi-square (4 df)
    %let seed=754313;
    %let numobs=10000;
    data chisquares;
       call streaminit(&seed);
       do i=1 to &numobs;
          x=rand("chisq",4);
          output;
       end;
    run;
    ods select histogram goodnessoffit;
    proc univariate data=chisquares;
       var x;
       histogram x / gamma(alpha=2,sigma=2);
    run;

20 Generating a random sample from a univariate distribution in the data step – chi-square (4 df)
    %let seed=754313;
    %let numobs=10000;
    data chisquares;
       call streaminit(&seed);
       do i=1 to &numobs;
          x=rand("chisq",4);
          output;
       end;
    run;
    ods select histogram goodnessoffit;
    proc univariate data=chisquares;
       var x;
       histogram x / gamma(alpha=2,sigma=2) kernel(color=red);
    run;
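The gamma(alpha=2, sigma=2) fit works because a chi-square distribution with 4 degrees of freedom is a gamma distribution with shape 2 and scale 2. The same distribution is the sum of four squared independent standard normal values; this sketch generates it that way (the data set name CHISQ_SUM is illustrative) and checks that the mean and variance are near 4 and 8.

```sas
%let seed=754313;
%let numobs=10000;

/* A chi-square variable with k degrees of freedom is the sum of k squared
   independent standard normals; here k=4 */
data chisq_sum(keep=x);
   call streaminit(&seed);
   do i=1 to &numobs;
      x=0;
      do j=1 to 4;
         x=x+rand("normal")**2;
      end;
      output;
   end;
run;

/* Chi-square(4) = gamma(shape 2, scale 2): mean 4, variance 8 */
proc means data=chisq_sum n mean var;
   var x;
run;
```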

21 Exponential
    %let seed=754313;
    %let numobs=10000;
    data exponentials;
       call streaminit(&seed);
       do i=1 to &numobs;
          x=rand("exponential");
          output;
       end;
    run;
    ods select histogram goodnessoffit;
    proc univariate data=exponentials;
       var x;
       histogram x / exponential kernel(color=red);
    run;

22 Exponential
    %let seed=754313;
    %let numobs=10000;
    data exponentials(drop=i);
       call streaminit(&seed);
       do i=1 to &numobs;
          x=5*rand("exponential");   /* scale 5: mean 5, standard deviation 5 */
          output;
       end;
    run;
    ods select histogram goodnessoffit;
    proc univariate data=exponentials;
       var x;
       histogram x / exponential kernel(color=red);
    run;
    proc means data=exponentials n std mean;
    run;

23 Weibull
    %let seed=754313;
    %let numobs=10000;
    data weibulls;
       call streaminit(&seed);
       do i=1 to &numobs;
          /* shape 1, scale 0.2: the same as an exponential with scale 0.2 */
          x=rand("weibull",1,.2);
          output;
       end;
    run;
    ods select histogram goodnessoffit;
    proc univariate data=weibulls;
       var x;
       histogram x / exponential kernel(color=red);
    run;
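Fitting an exponential curve to these Weibull draws makes sense because a Weibull distribution with shape 1 and scale b is an exponential distribution with scale b. The inverse-CDF sketch below makes that explicit, assuming RAND("weibull", a, b) uses a as the shape and b as the scale; the macro variables A and B and the data set name WEIBULL_INV are illustrative.

```sas
%let seed=754313;
%let numobs=10000;
%let a=1;    /* shape */
%let b=0.2;  /* scale */

/* Weibull CDF: F(x) = 1 - exp(-(x/b)**a), so x = b*(-log(1-u))**(1/a).
   With a=1 this reduces to an exponential with scale b (mean b). */
data weibull_inv(keep=x);
   call streaminit(&seed);
   do i=1 to &numobs;
      u=rand("uniform");
      x=&b*(-log(1-u))**(1/&a);
      output;
   end;
run;

proc means data=weibull_inv n mean std;
   var x;
run;
```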

24 Discrete Distributions

25 Bernoulli
    %let seed=754313;
    %let numobs=10000;
    data bernoullis;
       call streaminit(&seed);
       do i=1 to &numobs;
          x=rand("bernoulli",.25);
          output;
       end;
    run;
    proc freq data=bernoullis;
       tables x;
    run;
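A Bernoulli(p) draw is an indicator that a uniform(0,1) value falls below p. This sketch (illustrative data set name BERN_UNIF, p = 0.25 to match the slide) produces the same distribution as RAND("bernoulli", .25), though not the same individual values.

```sas
%let seed=754313;
%let numobs=10000;
%let p=0.25;

/* The comparison evaluates to 1 when U < p and to 0 otherwise */
data bern_unif(keep=x);
   call streaminit(&seed);
   do i=1 to &numobs;
      x=(rand("uniform") < &p);
      output;
   end;
run;

proc freq data=bern_unif;
   tables x;
run;
```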

26 Binomial
    %let seed=754313;
    %let numobs=10000;
    data binomials;
       call streaminit(&seed);
       do i=1 to &numobs;
          x=rand("binomial",.25,12);
          output;
       end;
    run;
    proc freq data=binomials;
       tables x;
    run;
    proc means data=binomials;
       var x;
    run;
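A binomial(n=12, p=0.25) value is the number of successes in 12 independent Bernoulli(0.25) trials, so its mean is 12(0.25) = 3 and its variance is 12(0.25)(0.75) = 2.25. Note that RAND("binomial", p, n) takes the success probability first and the number of trials second. This sketch builds the same distribution from Bernoulli draws; the data set name BINOM_SUM and the macro variables P and N are illustrative.

```sas
%let seed=754313;
%let numobs=10000;
%let p=0.25;   /* success probability */
%let n=12;     /* number of trials */

/* A binomial(n, p) value counts the successes in n Bernoulli(p) trials */
data binom_sum(keep=x);
   call streaminit(&seed);
   do i=1 to &numobs;
      x=0;
      do j=1 to &n;
         x=x+rand("bernoulli", &p);
      end;
      output;
   end;
run;

/* Mean should be near n*p = 3 and variance near n*p*(1-p) = 2.25 */
proc means data=binom_sum n mean var;
   var x;
run;
```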

27 Poisson
    %let seed=754313;
    %let numobs=10000;
    data poissons(drop=i);
       call streaminit(&seed);
       do i=1 to &numobs;
          x=rand("poisson",10);
          output;
       end;
    run;
    proc means data=poissons;
    run;
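For a Poisson(10) distribution both the mean and the variance are 10. The sketch below uses a textbook construction, counting how many arrivals of a rate-10 process (exponential interarrival times with mean 1/10) fall in the interval [0,1]; it is not necessarily how RAND("poisson") works internally, and the data set name POISSON_COUNT and macro variable LAMBDA are illustrative.

```sas
%let seed=754313;
%let numobs=10000;
%let lambda=10;

/* Count the arrivals of a rate-lambda process in [0,1];
   the count has a Poisson(lambda) distribution */
data poisson_count(keep=x);
   call streaminit(&seed);
   do i=1 to &numobs;
      x=0;
      t=rand("exponential")/&lambda;      /* time of the first arrival */
      do while (t <= 1);
         x=x+1;
         t=t+rand("exponential")/&lambda; /* add the next interarrival time */
      end;
      output;
   end;
run;

/* Mean and variance should both be near lambda = 10 */
proc means data=poisson_count n mean var;
   var x;
run;
```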

28 Tabled distribution
    %let seed=754313;
    %let numobs=10000;
    data tabular(drop=i);
       call streaminit(&seed);
       do i=1 to &numobs;
          x=rand("table",.2,.3,.1,.4);
          output;
       end;
    run;
    proc freq data=tabular;
       tables x;
    run;
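RAND("table", .2, .3, .1, .4) returns 1, 2, 3, or 4 with those probabilities. A discrete inverse-CDF sketch gives the same distribution: accumulate the probabilities and return the first category whose cumulative probability reaches the uniform draw. The array P and the data set name TABULAR_INV are illustrative, and RAND may implement the tabled distribution differently.

```sas
%let seed=754313;
%let numobs=10000;

/* P(X=1)=.2, P(X=2)=.3, P(X=3)=.1, P(X=4)=.4 */
data tabular_inv(keep=x);
   call streaminit(&seed);
   array p[4] _temporary_ (.2 .3 .1 .4);
   do i=1 to &numobs;
      u=rand("uniform");
      cum=0;
      do x=1 to 4;
         cum=cum+p[x];
         if u <= cum then leave;   /* first category whose cumulative prob >= u */
      end;
      if x > 4 then x=4;   /* guard against round-off in the cumulative sum */
      output;
   end;
run;

proc freq data=tabular_inv;
   tables x;
run;
```

The expected value is 1(.2) + 2(.3) + 3(.1) + 4(.4) = 2.7, which gives a quick check on either data set.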

