Wicklin, Rick. Simulating data with SAS. SAS Institute, 2013.



Wicklin, Rick. Statistical programming with SAS/IML software. SAS Institute, 2010.

SIMULATION: Real World
Traffic flow, arrivals, climate, epidemics, …

SIMULATION: Statistics
Examine distributions, examine sampling distributions, examine theoretical results, compare models, …

Simulating a sample from univariate distributions in SAS: the DATA step

Everything starts with random uniform(0,1) values produced by a pseudo-random number generator. In SAS, the RAND function generates realizations from many distributions.
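One way to see why uniform(0,1) values are the starting point is the inverse-transform method, sketched below for the standard exponential distribution. This is an illustration only, not how RAND generates exponentials internally; the data set name inv_exp and the seed are arbitrary choices made for this sketch.

```sas
/* Inverse-transform sketch: if U ~ uniform(0,1), then -log(U) ~ exponential(1). */
/* The data set name and seed are assumptions for illustration.                  */
data inv_exp(drop=i u);
   call streaminit(54321);
   do i = 1 to 10000;
      u = rand("uniform");   /* uniform draw on (0,1) */
      x = -log(u);           /* inverse exponential CDF applied to 1-u, which is also uniform */
      output;
   end;
run;

/* The sample mean and standard deviation should both be close to 1. */
proc means data=inv_exp n mean std;
   var x;
run;
```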

The RAND function
%let obs=10;
%let seed=54321;
data tmp;
   call streaminit(&seed);   /* set the seed so the stream is reproducible */
   do i=1 to &obs;
      x=rand("uniform");
      output;
   end;
run;
proc print data=tmp;
run;

From SAS Help: The RAND function uses the Mersenne-Twister random number generator (RNG) that was developed by Matsumoto and Nishimura (1998). The random number generator has a very long period (2^19937 - 1) and very good statistical properties. The period is a Mersenne prime.

Desired properties of random number generators
Repeatability: the same sequence should be produced with the same initial values (seeds).
Randomness: produce independent, uniformly distributed random variables (pass statistical tests for randomness).
Long period: a pseudo-random number sequence uses finite-precision arithmetic, so the sequence must repeat itself with a finite period. The period should be much longer than the number of random values needed for the simulation.
Period and randomness properties should not depend on the initial seeds.
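Repeatability can be checked directly. The sketch below generates two short streams from the same seed and compares them; the data set names run1 and run2 and the seed value are arbitrary choices for this illustration.

```sas
/* Two streams started from the same (arbitrary) seed */
data run1(drop=i);
   call streaminit(54321);
   do i = 1 to 5;
      x = rand("uniform");
      output;
   end;
run;

data run2(drop=i);
   call streaminit(54321);
   do i = 1 to 5;
      x = rand("uniform");
      output;
   end;
run;

/* PROC COMPARE reports any differences between the two data sets; */
/* with identical seeds, none are expected.                        */
proc compare base=run1 compare=run2;
run;
```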

Randomness: produce independent, uniformly distributed random variables (pass statistical tests for randomness).
data uniforms(drop=i);
   call streaminit(45321);
   do i=1 to 1000000;
      x=ceil(10*rand("uniform"));   /* integers 1 to 10, equally likely */
      output;
   end;
run;
/* chi-square test for equal proportions across the 10 values */
proc freq data=uniforms;
   tables x/chisq plots=none;
run;

Randomness: produce independent, uniformly distributed random variables (pass statistical tests for randomness).
data uniforms2;
   set uniforms;
   x1=lag(x);   /* previous value, for the lag-1 correlation */
run;
proc corr data=uniforms2;
run;

From SAS Help: The Mersenne-Twister RNG algorithm has an extremely long period, but this does not imply that large random samples are devoid of duplicate values. The RAND function returns at most 2^32 distinct values (4,294,967,296). In a random uniform sample of size 10^5, the chance of drawing at least one duplicate is greater than 50%. The expected number of duplicates in a random uniform sample of size M is approximately M^2/2^33 when M is much less than 2^32. For example, you should expect about 115 duplicates in a random uniform sample of size M=10^6. These results are consequences of the famous "birthday matching problem" in probability theory.
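The figures quoted from SAS Help can be reproduced with the usual birthday-problem approximation. The following is a sketch that assumes the N = 2^32 possible values are equally likely and that M is much smaller than N.

```latex
\begin{align*}
E[\text{duplicates}] &\approx \binom{M}{2}\frac{1}{N} \approx \frac{M^2}{2N} = \frac{M^2}{2^{33}},
  \qquad N = 2^{32},\\
P(\text{at least one duplicate}) &\approx 1 - \exp\!\left(-\frac{M^2}{2^{33}}\right).\\[4pt]
M = 10^5:&\quad \frac{10^{10}}{2^{33}} \approx 1.16,
  \qquad 1 - e^{-1.16} \approx 0.69 > 0.5,\\
M = 10^6:&\quad \frac{10^{12}}{2^{33}} \approx 116 .
\end{align*}
```

The second value agrees with the "about 115" quoted above, up to rounding.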

How many duplicates?
data rand(drop=i);
   call streaminit(54321);
   do i=1 to 1000000;
      x=rand("Uniform");
      output;
   end;
run;
/* NODUPKEYS drops repeated values of x; the log reports how many were deleted */
proc sort data=rand out=tmp nodupkeys;
   by x;
run;

How long before a repeat?
%let seed=54321;   /* i=372,972,159 */
data _null_;
   call streaminit(&seed);
   x0=rand("Uniform");
   do until (x=x0);   /* draw until the first value appears again */
      i+1;
      x=rand("Uniform");
   end;
   put "i=" i;
run;

Univariate distributions available with the RAND function:
Bernoulli, Beta, Binomial, Cauchy, Chi-Square, Erlang, Exponential, F, Gamma, Geometric, Hypergeometric, Lognormal, Negative Binomial, Normal, Poisson, T, Tabled, Triangular, Uniform, Weibull.

Some Continuous Distributions

Generating a random sample from a univariate distribution in the DATA step: the standard normal
%let seed=754313;
%let numobs=10000;
data normals;
   call streaminit(&seed);
   do i=1 to &numobs;
      x=rand("normal");
      output;
   end;
run;
ods select histogram goodnessoffit;
proc univariate data=normals;
   var x;
   histogram x/normal;
run;

Generating a random sample from a univariate distribution in the DATA step: chi-square (4 df)
%let seed=754313;
%let numobs=10000;
data chisquares;
   call streaminit(&seed);
   do i=1 to &numobs;
      x=rand("chisq",4);
      output;
   end;
run;
ods select histogram goodnessoffit;
/* fit a gamma density with estimated parameters */
proc univariate data=chisquares;
   var x;
   histogram x/gamma;
run;

Generating a random sample from a univariate distribution in the DATA step: chi-square (4 df), gamma parameters specified
%let seed=754313;
%let numobs=10000;
data chisquares;
   call streaminit(&seed);
   do i=1 to &numobs;
      x=rand("chisq",4);
      output;
   end;
run;
ods select histogram goodnessoffit;
/* test against the gamma with shape 2 and scale 2 */
proc univariate data=chisquares;
   var x;
   histogram x/gamma(alpha=2,sigma=2);
run;

Generating a random sample from a univariate distribution in the DATA step: chi-square (4 df), with kernel density overlay
%let seed=754313;
%let numobs=10000;
data chisquares;
   call streaminit(&seed);
   do i=1 to &numobs;
      x=rand("chisq",4);
      output;
   end;
run;
ods select histogram goodnessoffit;
proc univariate data=chisquares;
   var x;
   histogram x/gamma(alpha=2,sigma=2) kernel(color=red);
run;
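The choice of alpha=2 and sigma=2 in the gamma fit follows from the fact that a chi-square distribution with k degrees of freedom is a gamma distribution with shape k/2 and scale 2. A short check for k = 4:

```latex
\begin{align*}
f_{\chi^2_k}(x) &= \frac{x^{k/2-1} e^{-x/2}}{2^{k/2}\,\Gamma(k/2)}, \quad x > 0,
  & f_{\chi^2_4}(x) &= \frac{x\, e^{-x/2}}{4},\\
f_{\mathrm{Gamma}(\alpha,\sigma)}(x) &= \frac{x^{\alpha-1} e^{-x/\sigma}}{\sigma^{\alpha}\,\Gamma(\alpha)}, \quad x > 0,
  & f_{\mathrm{Gamma}(2,2)}(x) &= \frac{x\, e^{-x/2}}{4}.
\end{align*}
```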

Exponential
%let seed=754313;
%let numobs=10000;
data exponentials;
   call streaminit(&seed);
   do i=1 to &numobs;
      x=rand("exponential");   /* standard exponential, scale 1 */
      output;
   end;
run;
ods select histogram goodnessoffit;
proc univariate data=exponentials;
   var x;
   histogram x/exponential kernel(color=red);
run;

Exponential with scale 5
%let seed=754313;
%let numobs=10000;
data exponentials(drop=i);
   call streaminit(&seed);
   do i=1 to &numobs;
      x=5*rand("exponential");   /* multiply by the scale parameter */
      output;
   end;
run;
ods select histogram goodnessoffit;
proc univariate data=exponentials;
   var x;
   histogram x/exponential kernel(color=red);
run;
proc means data=exponentials n std mean;
run;

Weibull
%let seed=754313;
%let numobs=10000;
data exponentials;
   call streaminit(&seed);
   do i=1 to &numobs;
      x=rand("weibull",1,.2);   /* shape 1, scale 0.2; with shape 1 the Weibull is an exponential */
      output;
   end;
run;
ods select histogram goodnessoffit;
proc univariate data=exponentials;
   var x;
   histogram x/exponential kernel(color=red);
run;
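Multiplying a standard exponential variate by 5 gives an exponential with scale 5, so PROC MEANS should report a mean and a standard deviation that are both close to 5. A brief derivation, writing the scale parameter as σ:

```latex
\begin{align*}
X \sim \mathrm{Exp}(1),\quad Y = \sigma X
  &\;\Longrightarrow\; f_Y(y) = \frac{1}{\sigma}\, e^{-y/\sigma}, \quad y > 0,\\
E[Y] = \sigma\,E[X] = \sigma, &\qquad \mathrm{SD}(Y) = \sigma\,\mathrm{SD}(X) = \sigma,\\
\sigma = 5 &\;\Longrightarrow\; E[Y] = \mathrm{SD}(Y) = 5 .
\end{align*}
```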
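Fitting an exponential density to this Weibull sample is reasonable because the shape parameter is 1. Using the shape a and scale b parameterization of RAND("weibull", a, b):

```latex
\begin{align*}
f(x; a, b) &= \frac{a}{b}\left(\frac{x}{b}\right)^{a-1} \exp\!\left[-\left(\frac{x}{b}\right)^{a}\right], \quad x > 0,\\
f(x; 1, b) &= \frac{1}{b}\, e^{-x/b},
\end{align*}
```

which is the exponential density with scale b (here b = 0.2, so the sample mean should be close to 0.2).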

Discrete Distributions

Bernoulli
%let seed=754313;
%let numobs=10000;
data bernoullis;
   call streaminit(&seed);
   do i=1 to &numobs;
      x=rand("bernoulli",.25);   /* success probability p=0.25 */
      output;
   end;
run;
proc freq data=bernoullis;
   tables x;
run;

Binomial
%let seed=754313;
%let numobs=10000;
data binomials;
   call streaminit(&seed);
   do i=1 to &numobs;
      x=rand("binomial",.25,12);   /* p=0.25, n=12 trials */
      output;
   end;
run;
proc freq data=binomials;
   tables x;
run;
proc means data=binomials;
   var x;
run;

Poisson
%let seed=754313;
%let numobs=10000;
data poissons(drop=i);
   call streaminit(&seed);
   do i=1 to &numobs;
      x=rand("poisson",10);   /* mean 10 */
      output;
   end;
run;
proc means data=poissons;
run;

Tabled distribution
%let seed=754313;
%let numobs=10000;
data tabular(drop=i);
   call streaminit(&seed);
   do i=1 to &numobs;
      x=rand("table",.2,.3,.1,.4);   /* returns 1, 2, 3, or 4 with these probabilities */
      output;
   end;
run;
proc freq data=tabular;
run;
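The PROC MEANS output for the binomial sample can be compared with the theoretical moments for n = 12 and p = 0.25:

```latex
\begin{align*}
E[X] &= np = 12 \times 0.25 = 3,\\
\mathrm{Var}(X) &= np(1-p) = 12 \times 0.25 \times 0.75 = 2.25,\\
\mathrm{SD}(X) &= \sqrt{2.25} = 1.5 .
\end{align*}
```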
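A possible follow-up, not part of the original slides, is to test the observed frequencies against the probabilities passed to RAND. The sketch below regenerates the sample and uses the TESTP= option of PROC FREQ for a chi-square goodness-of-fit test; the data set name tab_check is an arbitrary choice for this illustration.

```sas
/* Regenerate the tabled sample so the sketch is self-contained. */
data tab_check(drop=i);
   call streaminit(754313);
   do i = 1 to 10000;
      x = rand("table", .2, .3, .1, .4);   /* values 1 to 4 */
      output;
   end;
run;

/* Chi-square goodness-of-fit test against the specified proportions. */
/* TESTP= lists one proportion per level of x, in sorted order.       */
proc freq data=tab_check;
   tables x / chisq testp=(0.2 0.3 0.1 0.4);
run;
```

A large p-value would be consistent with the sample following the specified table of probabilities.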