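Random Numbers and Simulation
- Generating truly random numbers with a deterministic computer algorithm is not possible.
- Instead, programs have been developed to generate pseudo-random numbers.
- These values are generated from deterministic algorithms: the same starting value (seed) always produces the same sequence.
© Fall 2011 John Grego and the University of South Carolina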
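To make "deterministic algorithm" concrete, here is a minimal sketch of a linear congruential generator (LCG). The class and function names are hypothetical. The constants are the widely published "Numerical Recipes" LCG parameters, used purely for illustration. R's default generator is the Mersenne Twister, not this LCG.

```python
class LCG:
    """Linear congruential generator: x_{k+1} = (a * x_k + c) mod m, scaled to [0, 1)."""

    # ASSUMPTION: illustrative "Numerical Recipes" constants, not R's generator.
    def __init__(self, seed: int, a: int = 1664525, c: int = 1013904223, m: int = 2**32):
        self.state = seed % m
        self.a, self.c, self.m = a, c, m

    def next_uniform(self) -> float:
        # Purely deterministic update of the internal state.
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m


def draws(seed: int, k: int) -> list[float]:
    """Return the first k pseudo-random uniforms produced from `seed`."""
    gen = LCG(seed)
    return [gen.next_uniform() for _ in range(k)]


first = draws(seed=2011, k=5)
again = draws(seed=2011, k=5)
print(first == again)  # True: identical seeds give identical "random" numbers
```

Reseeding reproduces the stream exactly. This is why simulation results can be replicated. In R, `set.seed()` plays the same role.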
Random Numbers
- Good pseudo-random deviates pass standard statistical tests for randomness.
- They appear to be independent and identically distributed.
- Random number generators for common distributions are available in R.
- Special techniques (covered in STAT 740) may be needed for other distributions.
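One widely used technique for turning uniform deviates into draws from another distribution is inverse-transform sampling. Whether it is among the STAT 740 techniques is an assumption. If $U \sim U(0,1)$ and $F$ is a continuous CDF, then $F^{-1}(U)$ has CDF $F$. For the exponential distribution, $F^{-1}(u) = -\ln(1-u)/\text{rate}$.

The sketch below is hypothetical (including the name `rexp_inverse`). Python's `random` module stands in for R generators such as `runif()` and `rexp()`.

```python
import math
import random


def rexp_inverse(n: int, rate: float, rng: random.Random) -> list[float]:
    """Exponential deviates via the inverse CDF: X = -ln(1 - U) / rate."""
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and the log is finite.
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]


rng = random.Random(740)  # arbitrary seed
sample = rexp_inverse(100_000, rate=1.0, rng=rng)
builtin = [rng.expovariate(1.0) for _ in range(100_000)]  # library generator for comparison

print(sum(sample) / len(sample))    # close to the true mean 1
print(sum(builtin) / len(builtin))  # also close to 1
```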
Monte Carlo Simulation
Some common uses of simulation:
- Modeling stochastic behavior
- Calculating definite integrals
- Approximating the sampling distribution of a statistic (e.g., the maximum of a random sample)
Use simulation when analytical results are difficult to obtain, unavailable, or unconfirmed.
Note: students are used to analytical results.
Modeling Stochastic Behavior
- Buffon's needle
- Random walk: observe $X_1, X_2, \ldots$, where $p = P(X_i = 1) = P(X_i = -1) = 0.5$, and study the partial sums $S_1, S_2, \ldots$, where $S_n = \sum_{i=1}^{n} X_i$.
- This is also called Gambler's ruin: each $X_i$ represents a 1-dollar bet with a return of 2 dollars for a win and 0 for a loss, i.e., a net gain of +1 or -1.
Note: Buffon's needle, why bother? The analytical result isn't that difficult.
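A minimal Buffon's needle sketch follows. It assumes parallel lines `spacing` apart and a needle of length `needle` with needle ≤ spacing. Both names and the function `buffon_pi` are hypothetical.

The distance from the needle's centre to the nearest line is $U[0, t/2]$, and the acute angle is $U[0, \pi/2]$. The needle crosses a line when the distance is at most $(l/2)\sin\theta$. This gives $P(\text{cross}) = 2l/(\pi t)$, which can be solved for $\pi$. Drawing the angle itself uses $\pi$, which underlines the "why bother?" remark.

```python
import math
import random


def buffon_pi(n: int, needle: float = 1.0, spacing: float = 1.0, seed: int = 1) -> float:
    """Estimate pi by dropping n needles; assumes needle <= spacing."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        d = rng.uniform(0.0, spacing / 2)        # centre-to-nearest-line distance
        theta = rng.uniform(0.0, math.pi / 2)    # acute angle with the lines
        if d <= (needle / 2) * math.sin(theta):  # the needle crosses a line
            hits += 1
    # P(cross) = 2 * needle / (pi * spacing), solved for pi using the observed hit rate
    return 2 * needle * n / (spacing * hits)


est = buffon_pi(200_000)
print(est)  # close to 3.14
```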
A Fair Game
- The properties of a fair game ($p = 0.5$) are much more interesting than those of an unfair game ($p \ne 0.5$).
- Some properties of this process are easy to anticipate, e.g., $E(S_n) = 0$ for a fair game.
Note: run the code (source Random_Walk.R).
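The following is not the course's Random_Walk.R. It is a separate, hypothetical Python sketch that checks the "easy" property by simulation. Since $E(X_i) = p - (1-p) = 2p - 1$, we have $E(S_n) = n(2p-1)$, which is 0 for a fair game. The values n = 50 and p = 0.45 are arbitrary choices.

```python
import random


def walk_endpoint(n: int, p: float, rng: random.Random) -> int:
    """Return S_n = X_1 + ... + X_n with P(X_i = 1) = p and P(X_i = -1) = 1 - p."""
    return sum(1 if rng.random() < p else -1 for _ in range(n))


def mean_endpoint(n: int, p: float, reps: int = 20_000, seed: int = 5) -> float:
    """Average S_n over `reps` independent walks."""
    rng = random.Random(seed)
    return sum(walk_endpoint(n, p, rng) for _ in range(reps)) / reps


n = 50
results = {}
for p in (0.5, 0.45):
    results[p] = mean_endpoint(n, p)
    print(p, results[p], n * (2 * p - 1))  # simulated vs exact E(S_n)
```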
Gambler's Ruin
Some properties are difficult to anticipate, but simulation can help estimate them:
- Expected number of returns to 0
- Expected length of a winning streak
- Probability of going broke, given an initial bank
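Below is a sketch of the last item, under an assumption the slide does not state: the gambler also stops upon reaching a target bank. Without an upper stopping point, a gambler in a fair game with a finite bank eventually goes broke with probability 1. With a target, the exact fair-game answer is $1 - \text{bank}/\text{target}$, which the simulation can be checked against. The function name and parameters are hypothetical.

```python
import random


def ruin_probability(bank: int, target: int, p: float = 0.5,
                     reps: int = 10_000, seed: int = 6) -> float:
    """Fraction of simulated games that reach 0 before reaching `target` (0 < bank < target)."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(reps):
        s = bank
        while 0 < s < target:                   # keep making 1-dollar bets
            s += 1 if rng.random() < p else -1  # win with probability p
        if s == 0:
            ruined += 1
    return ruined / reps


est = ruin_probability(bank=10, target=20)
print(est, 1 - 10 / 20)  # fair game: exact ruin probability is 1 - bank/target
```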
Calculating Definite Integrals
In statistics, we often have to calculate difficult definite integrals (posterior distributions, expected values):
$I = \int_a^b h(x)\,dx$
(here, x could be multidimensional)
Integral Examples
Example 1
Example 2
Hit-or-Miss Monte Carlo Example
Choose a constant c such that $c \ge h(x)$ over the entire region of integration (here, c = 4).
Hit-or-Miss Monte Carlo Simulation
- Generate n random uniform pairs $(X_i, Y_i)$, with the $X_i$ from $U[a,b]$ (here, $U[0,1]$) and the $Y_i$ from $U[0,c]$ (here, $U[0,4]$).
- Count the number of times (call this m) that $Y_i < h(X_i)$.
- Then $I_1 \approx c(b-a)\,m/n$, i.e., (height)(width)(proportion of points under the curve).
- This requires $0 \le h(x) \le c$ on $[a,b]$.
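The steps above can be sketched as follows. Example 1's integrand is not recoverable here, so the sketch uses a hypothetical $h(x) = 3x^2$. It is bounded by c = 4 on [0, 1], and its exact integral is 1. The function name `hit_or_miss` is also hypothetical.

```python
import random


def hit_or_miss(h, a: float, b: float, c: float, n: int, seed: int = 10) -> float:
    """Hit-or-Miss estimate of the integral of h over [a, b]; assumes 0 <= h(x) <= c."""
    rng = random.Random(seed)
    m = 0
    for _ in range(n):
        x = rng.uniform(a, b)    # X_i ~ U[a, b]
        y = rng.uniform(0.0, c)  # Y_i ~ U[0, c]
        if y < h(x):             # the point falls under the curve
            m += 1
    return c * (b - a) * m / n   # (height)(width)(proportion under curve)


def h(x: float) -> float:
    # Hypothetical integrand; its exact integral over [0, 1] is 1.
    return 3 * x**2


est = hit_or_miss(h, 0.0, 1.0, c=4.0, n=100_000)
print(est)  # close to 1
```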
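Classical Monte Carlo Integration
- Take n random uniform values $U_1, \ldots, U_n$ over $[a,b]$ and estimate I using
  $\hat{I} = \frac{b-a}{n}\sum_{i=1}^{n} h(U_i)$
- This method is straightforward, yet it is actually more efficient than Hit-or-Miss Monte Carlo. For the same n, its variance is no larger, because $0 \le h \le c$ implies $\int_a^b h^2\,dx \le c\int_a^b h\,dx$.
Note: b - a is the width. The sum of the h's divided by n is the sample average height, from a uniform random sample on the x axis.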
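A minimal sketch comparing the two estimators on the same hypothetical integrand $h(x) = 3x^2$ over [0, 1] (exact integral 1). All function names are hypothetical.

For this h, the per-draw variances are $(b-a)\int h^2 - I^2 = 9/5 - 1 = 0.8$ for the classical method and $c(b-a)I - I^2 = 4 - 1 = 3$ for Hit-or-Miss. The simulated variances should reflect that gap.

```python
import random
import statistics


def h(x: float) -> float:
    # Hypothetical integrand; its exact integral over [0, 1] is 1.
    return 3 * x**2


def classical_draws(h, a: float, b: float, n: int, rng: random.Random) -> list[float]:
    # Each draw is (b - a) * h(U_i); their average is the classical estimate.
    return [(b - a) * h(rng.uniform(a, b)) for _ in range(n)]


def hit_or_miss_draws(h, a: float, b: float, c: float, n: int,
                      rng: random.Random) -> list[float]:
    # Each draw is c(b - a) if Y_i < h(X_i), else 0; their average is the Hit-or-Miss estimate.
    out = []
    for _ in range(n):
        x, y = rng.uniform(a, b), rng.uniform(0.0, c)
        out.append(c * (b - a) if y < h(x) else 0.0)
    return out


rng = random.Random(11)
n = 100_000
cl = classical_draws(h, 0.0, 1.0, n, rng)
hm = hit_or_miss_draws(h, 0.0, 1.0, 4.0, n, rng)

est = statistics.fmean(cl)
var_classical = statistics.variance(cl) / n  # estimated variance of the classical estimate
var_hm = statistics.variance(hm) / n         # estimated variance of the Hit-or-Miss estimate
print(est, var_classical, var_hm)            # theory: per-draw variances 0.8 vs 3 for this h
```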
Expected Values
Suppose X is a random variable with density f. For some function h, find
$E[h(X)] = \int h(x) f(x)\,dx$
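Estimating Expected Values
For n random values $X_1, X_2, \ldots, X_n$ from the distribution of X (i.e., with density f),
$E[h(X)] = \int h(x) f(x)\,dx \approx \frac{1}{n}\sum_{i=1}^{n} h(X_i)$
Note: the first term integrates over the density of X (the population average); the second is the sample average.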
Examples
- Example 3: If X is a random variable with a N(10, 1) distribution, find $E(X^2)$.
- Example 4: If Y is a random variable with a Beta(5, 1) distribution, find $E(-\ln Y)$.
- There are more advanced methods of integration using simulation (e.g., importance sampling).
Note: $E(X^2) = \mathrm{Var}(X) + [E(X)]^2 = 1 + 100 = 101$.
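A sketch of both examples using the sample-average estimator. Python's `random.gauss` and `random.betavariate` stand in for R's `rnorm()` and `rbeta()`.

For Example 4, a known exact answer is available as a check. If Y ~ Beta(5, 1), then $P(-\ln Y > t) = P(Y < e^{-t}) = e^{-5t}$, so $-\ln Y$ is exponential with rate 5 and $E(-\ln Y) = 1/5$.

```python
import math
import random
import statistics

rng = random.Random(14)  # arbitrary seed
n = 100_000

# Example 3: X ~ N(10, 1); exact E(X^2) = 1 + 100 = 101
x = [rng.gauss(10, 1) for _ in range(n)]
e3 = statistics.fmean(v**2 for v in x)

# Example 4: Y ~ Beta(5, 1); -ln Y ~ Exponential(rate 5), so E(-ln Y) = 0.2
y = [rng.betavariate(5, 1) for _ in range(n)]
e4 = statistics.fmean(-math.log(v) for v in y)

print(e3, e4)  # close to 101 and 0.2
```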
Integration
- integrate() performs numerical integration for functions of a single variable (not using simulation techniques).
- adapt() in the adapt package performs multivariate numerical integration.
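For contrast with the simulation methods, here is a deterministic rule: composite Simpson's rule, which uses no random numbers. This is not what R's `integrate()` does. Its documentation describes adaptive quadrature based on the QUADPACK routines. The function name `simpson` is hypothetical.

```python
import math


def simpson(h, a: float, b: float, k: int = 100) -> float:
    """Composite Simpson's rule with k (even) subintervals; deterministic, no simulation."""
    if k % 2:
        raise ValueError("k must be even")
    width = (b - a) / k
    total = h(a) + h(b)
    for i in range(1, k):
        total += (4 if i % 2 else 2) * h(a + i * width)  # weights 4, 2, 4, ..., 4
    return total * width / 3


def phi(z: float) -> float:
    # Standard normal density.
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


result = simpson(phi, -1.0, 1.0)
print(result)  # about 0.6827, i.e. P(-1 < Z < 1)
```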
The Sampling Distribution of a Statistic
To perform inference (CIs, hypothesis tests) based on sample statistics, we need to know the sampling distributions of those statistics, at least approximately.
Example: $X_1, X_2, \ldots, X_n \sim$ iid $N(\mu, \sigma^2)$; then, for example, $\bar{X} \sim N(\mu, \sigma^2/n)$.
Approximating the Sampling Distribution of a Statistic
What if the data's distribution is not known?
- Large sample: Central Limit Theorem
- Small sample: normal theory, or nonparametric procedures based on permutation distributions
Simulating the Sampling Distribution of a Statistic
If the population distribution is known, we can approximate the sampling distribution by simulation:
1. Repeatedly (m times) generate random samples of size n from the population distribution.
2. Calculate a statistic (say, S) for each sample.
3. The empirical (observed) distribution of the m S-values approximates the true sampling distribution of S.
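The three steps can be sketched with a generic helper. The function name `simulate_statistic` is hypothetical.

The sketch is applied to the normal-mean case, where the answer is known: the simulated means should have mean μ and variance σ²/n. The values μ = 5, σ = 2 and n = 10 are arbitrary choices.

```python
import random
import statistics


def simulate_statistic(sampler, statistic, n: int, m: int, seed: int = 18) -> list[float]:
    """Draw m samples of size n using sampler(rng) and return the m values of the statistic."""
    rng = random.Random(seed)
    return [statistic([sampler(rng) for _ in range(n)]) for _ in range(m)]


mu, sigma, n = 5.0, 2.0, 10
means = simulate_statistic(lambda r: r.gauss(mu, sigma), statistics.fmean, n=n, m=20_000)

# Empirical distribution of the sample mean vs theory: mean mu, variance sigma^2 / n
print(statistics.fmean(means), statistics.variance(means), sigma**2 / n)
```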
Example
$X_1, X_2, X_3, X_4 \sim$ Expon(1). What is the sampling distribution of:
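The statistics this slide asks about are not recoverable here. The sketch therefore assumes one for illustration: the sample maximum, mentioned earlier as a typical statistic. It also assumes the four draws are independent.

For the maximum, exact results exist to check the simulation against. The CDF is $(1 - e^{-x})^4$ and the mean is $1 + 1/2 + 1/3 + 1/4 = 25/12$.

```python
import math
import random
import statistics

rng = random.Random(19)  # arbitrary seed
m = 50_000

# ASSUMPTION: the statistic is the maximum of 4 independent Expon(1) draws.
maxima = [max(rng.expovariate(1.0) for _ in range(4)) for _ in range(m)]

x0 = 2.0
empirical_cdf = sum(v <= x0 for v in maxima) / m  # proportion of simulated maxima <= 2
exact_cdf = (1 - math.exp(-x0)) ** 4              # exact P(max <= 2)

print(statistics.fmean(maxima), 25 / 12)  # simulated vs exact mean
print(empirical_cdf, exact_cdf)           # simulated vs exact CDF at x0
```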