#9 SIMULATION OUTPUT ANALYSIS
Systems, Fall 2000
Instructor: Peter M. Hahn
SIMULATION OUTPUT ANALYSIS
– Simulation is a computer-based statistical sampling experiment
– Appropriate statistical techniques must be employed for its design and analysis
– Virtually all simulation outputs are non-stationary and auto- (i.e., self-) correlated
– Techniques based on IID assumptions are not applicable and are grossly misleading
– Adequate computer time must be allocated
OUTPUT - A STOCHASTIC PROCESS
– A collection of RVs ordered over time, all defined on a common sample space
– Discrete-time stochastic process: $\{X_1, X_2, \ldots\}$
– Continuous-time stochastic process: $\{X(t), t \ge 0\}$
– The $\{\cdot\}$ means the process can be repeated, each time yielding a different set of realizations $x_i$ or $x(t)$
– Stationarity: statistics do not change with time
MEASURES OF DEPENDENCE
– For RVs $X_i$ and $X_j$ with means $\mu_i$ and $\mu_j$, the covariance is
  $C_{ij} = \mathrm{Cov}(X_i, X_j) = E[(X_i - \mu_i)(X_j - \mu_j)]$
– If $C_{ij} > 0$ (positive correlation), $X_i > \mu_i$ and $X_j > \mu_j$ tend to occur together, and $X_i < \mu_i$ and $X_j < \mu_j$ tend to occur together
– Negative correlation is vice versa
– A discrete-time stochastic process is covariance-stationary iff $\mu_i = \mu$ for all $i$, $\sigma_i^2 = \sigma^2$ for all $i$, and $\mathrm{Cov}(X_i, X_{i+j}) = C_{i,i+j} = C_j$ is independent of $i$ for all $j$
– Thus, the correlation $\rho_j = C_j / \sigma^2$ depends only on the lag $j$
MEASURES OF DEPENDENCE
– If $X_1, X_2, \ldots$ is the output of a simulation beginning at time 0, it is quite likely not covariance-stationary
– Later in the simulation, covariance-stationarity is more likely; we call the initial transient a warm-up period
– Generally, $\rho_j \to 0$ as $j \to \infty$
– In practice, $\rho_j \approx 0$ usually occurs within a reasonable number of samples
– The time for $\rho_j \to 0$ is independent of the warm-up
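To make the warm-up idea concrete, here is a minimal sketch (assuming Python with NumPy; the AR(1) process and all names are illustrative stand-ins, not from the lecture) of an autocorrelated output whose transient must be discarded:

    import numpy as np

    rng = np.random.default_rng(0)
    phi = 0.9                  # lag-j correlation is phi**j, which decays to 0
    n = 10_000
    x = np.empty(n)
    x[0] = 100.0               # deliberately start far from the steady-state mean 0
    for i in range(1, n):
        x[i] = phi * x[i - 1] + rng.normal()   # AR(1): correlated output
    y = x[200:]                # discard a warm-up period of 200 observations
    print(y.mean())            # close to 0 once the transient is removed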
IID SIMULATION OUTPUT OBSERVATIONS
– If the observations are $X_1, X_2, \ldots, X_n$, the sample mean is
  $\bar{X}(n) = \frac{1}{n}\sum_{i=1}^{n} X_i$
– The sample variance is
  $S^2(n) = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}(n)\right)^2$
– Proof of unbiasedness is to be done for homework
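A minimal sketch of these two estimators (assuming NumPy; the function name is mine):

    import numpy as np

    def sample_mean_var(x):
        """Return the sample mean Xbar(n) and unbiased sample variance S^2(n)."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        xbar = x.sum() / n
        s2 = ((x - xbar) ** 2).sum() / (n - 1)   # n-1, not n, makes S^2(n) unbiased
        return xbar, s2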
IID SIMULATION OUTPUT OBSERVATIONS
– $\bar{X}(n)$ is a sum of RVs, so it is itself an RV
– Thus, we are unsure of just how close $\bar{X}(n)$ is to $\mu$
– (Figure: a possible pdf of $\bar{X}(n)$, centered near $\mu$)
IID SIMULATION OUTPUT OBSERVATIONS
– $\mathrm{Var}[\bar{X}(n)]$ is clearly a measure of just how close $\bar{X}(n)$ is to $\mu$
– For IID observations, $\mathrm{Var}[\bar{X}(n)] = \sigma^2 / n$
IID SIMULATION OUTPUT OBSERVATIONS
– Thus, the larger the sample size $n$, the narrower the pdf of $\bar{X}(n)$ and the closer $\bar{X}(n)$ is likely to be to $\mu$
– An unbiased estimator of $\mathrm{Var}[\bar{X}(n)]$ is simply $S^2(n)/n$
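A quick empirical check of $\mathrm{Var}[\bar{X}(n)] = \sigma^2/n$ (a sketch assuming NumPy; the replication count of 5000 is an arbitrary choice of mine):

    import numpy as np

    rng = np.random.default_rng(1)
    sigma2 = 4.0
    for n in (10, 40, 160):
        # 5000 independent replications of the sample mean of n IID normals
        means = rng.normal(0.0, np.sqrt(sigma2), size=(5000, n)).mean(axis=1)
        print(n, means.var(ddof=1), sigma2 / n)   # empirical vs. theoretical

Note how quadrupling $n$ cuts the variance of $\bar{X}(n)$ by a factor of four.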
CORRELATED OUTPUT OBSERVATIONS
– Now suppose the $X_1, X_2, \ldots, X_n$ are instead correlated, as is often found in practice
– For a covariance-stationary process it has been proven that
  $E[S^2(n)] = \sigma^2\left[1 - \frac{2\sum_{j=1}^{n-1}(1 - j/n)\rho_j}{n-1}\right]$
  where $\rho_j$ is the correlation between observations $X_i$ and $X_{i+j}$
– Remember, $\rho_j \to 0$ as $j$ gets large
– If the $\rho_j$ are positive, then $E[S^2(n)] < \sigma^2$. Can you see why?
– We call this a negative bias: we may think our estimate of $\mu$ is good (i.e., that the variance is small), but it may not be
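The bias factor in brackets is easy to evaluate numerically; a sketch (plain Python; the function name is mine):

    def s2_bias_factor(rho, n):
        """E[S^2(n)] / sigma^2 for a covariance-stationary process,
        where rho[j-1] holds rho_j (the formula on this slide)."""
        s = sum((1 - j / n) * rho[j - 1] for j in range(1, n))
        return 1 - 2 * s / (n - 1)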
ASSESSING SAMPLE INDEPENDENCE
– Calculating the sample correlation:
  $\hat{\rho}_j = \frac{\hat{C}_j}{S^2(n)}, \qquad \hat{C}_j = \frac{1}{n-j}\sum_{i=1}^{n-j}\left(X_i - \bar{X}(n)\right)\left(X_{i+j} - \bar{X}(n)\right)$
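A direct transcription of this estimator (assuming NumPy; the name rho_hat is mine):

    import numpy as np

    def rho_hat(x, j):
        """Estimate rho_j = C_j / sigma^2 from one output sequence."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        xbar = x.mean()
        c_j = ((x[:n - j] - xbar) * (x[j:] - xbar)).sum() / (n - j)
        s2 = ((x - xbar) ** 2).sum() / (n - 1)
        return c_j / s2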
CORRELATED OUTPUT OBSERVATIONS
– It can also be shown that
  $\mathrm{Var}[\bar{X}(n)] = \frac{\sigma^2}{n}\left[1 + 2\sum_{j=1}^{n-1}(1 - j/n)\rho_j\right]$
– Prove this as Problem 3 in homework #9
– Thus, if one estimates $\mathrm{Var}[\bar{X}(n)]$ by $S^2(n)/n$, there would be two sources of error:
  – The bias in $S^2(n)$ as an estimator of $\sigma^2$
  – Neglect of the correlation terms in the above expression
– These two errors unfortunately do not cancel each other
– We illustrate this with the following example
EXAMPLE - CORRELATED OBSERVATIONS
– Given 10 delays from a covariance-stationary M/M/1 queue with utilization factor* $\rho = 0.9$
– The correlations $\rho_1, \rho_2, \ldots, \rho_9$ are known to be 0.99, 0.98, 0.97, 0.96, 0.95, 0.94, 0.935, 0.93, …, respectively
* $\rho$ = arrival rate / maximum service rate
EXAMPLE - CORRELATED OBSERVATIONS
– We would grossly underestimate the variance of the sample mean and not be aware that the simulation run was too short
– Be careful about estimating the $\rho_j$ from too few samples
– (Figure: estimated $\hat{\rho}_j$ from only 10 sample delays; L&K p. 253)
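Plugging the example's correlations into the two formulas above shows the size of the error (a sketch reusing s2_bias_factor from the earlier slide; $\rho_9$ did not survive in the slide text, so 0.925 below is an assumed placeholder, not a lecture value):

    rho = [0.99, 0.98, 0.97, 0.96, 0.95, 0.94, 0.935, 0.93,
           0.925]                     # rho_9 assumed -- missing from the slide
    n = 10
    print(s2_bias_factor(rho, n))     # ~0.04: E[S^2(n)] is a tiny fraction of sigma^2
    infl = 1 + 2 * sum((1 - j / n) * rho[j - 1] for j in range(1, n))
    print(infl / n)                   # ~0.97: the true Var[Xbar(n)] / sigma^2

Under the assumed $\rho_9$, the naive estimate $E[S^2(n)/n] \approx 0.004\sigma^2$ versus the true $\mathrm{Var}[\bar{X}(n)] \approx 0.97\sigma^2$: an underestimate by a factor of more than 200.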
CONFIDENCE INTERVALS
– We discuss how to construct a confidence interval for the mean of some important simulation output
– For this purpose we assume $X_1, X_2, \ldots, X_n$ are IID RVs with finite mean $\mu$ and finite variance $\sigma^2 > 0$
– To assure IID observations, it is necessary to make multiple simulation runs (as explained earlier)
– For large sample size (i.e., a sufficient number of simulation runs) we invoke the central limit theorem to calculate the confidence interval
– For less than a 'sufficient number' of samples we assume the samples have a normal distribution and use the t confidence interval
THE CENTRAL LIMIT THEOREM
– Let $Z_n = \dfrac{\bar{X}(n) - \mu}{\sqrt{\sigma^2/n}}$; as $n \to \infty$, the distribution of $Z_n$ converges to the standard normal
– Thus, for $n$ 'sufficiently large', $\bar{X}(n)$ is approximately distributed as a normal RV with mean $\mu$ and variance $\sigma^2/n$
THE CENTRAL LIMIT THEOREM
– But we really don't know $\sigma^2$; we have to estimate it using $S^2(n)$
– Fortunately, $S^2(n)$ converges to $\sigma^2$ as $n$ gets large
– The central limit theorem remains true if we replace $\sigma^2$ by $S^2(n)$ in the expression for $Z_n$
– The theorem now says: if $n$ is 'sufficiently large', the RV
  $t_n = \dfrac{\bar{X}(n) - \mu}{\sqrt{S^2(n)/n}}$
  is approximately distributed as a normal (Gaussian) RV of mean 0 and variance 1
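A small demonstration that replacing $\sigma^2$ with $S^2(n)$ still yields an approximately standard normal statistic (sketch assuming NumPy; the exponential population is just a convenient non-normal choice of mine):

    import numpy as np

    rng = np.random.default_rng(2)
    mu, n, reps = 1.0, 200, 5000
    x = rng.exponential(mu, size=(reps, n))    # decidedly non-normal samples
    xbar = x.mean(axis=1)
    s2 = x.var(axis=1, ddof=1)
    t = (xbar - mu) / np.sqrt(s2 / n)
    print(t.mean(), t.std(ddof=1))             # close to 0 and 1, per the CLT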
CONFIDENCE INTERVAL
– It follows, for large $n$, that
  $P\left(-z_{1-\alpha/2} \le \dfrac{\bar{X}(n) - \mu}{\sqrt{S^2(n)/n}} \le z_{1-\alpha/2}\right) \approx 1 - \alpha$
  where $z_{1-\alpha/2}$ (for $0 < \alpha < 1$) is the upper $1-\alpha/2$ critical point of a normal RV with mean 0 and $\sigma = 1$
– Table A.3 in B,C,N&N gives these values
CONFIDENCE INTERVAL
– (Figure: standard normal pdf showing the upper and lower critical points $\pm z_{1-\alpha/2}$)
CONFIDENCE INTERVAL
– Therefore, for large $n$, an approximate $100(1-\alpha)$ percent confidence interval for $\mu$ is given by
  $\bar{X}(n) \pm z_{1-\alpha/2}\sqrt{S^2(n)/n}$
– There is no guarantee that $\mu$ actually falls within these limits (only a probability)
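A sketch of this interval (assuming NumPy and SciPy; scipy.stats.norm.ppf supplies $z_{1-\alpha/2}$ in place of Table A.3, and the function name is mine):

    import numpy as np
    from scipy import stats

    def normal_ci(x, alpha=0.10):
        """Approximate 100(1-alpha)% CI for mu; valid only for large n."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        half = stats.norm.ppf(1 - alpha / 2) * np.sqrt(x.var(ddof=1) / n)
        return x.mean() - half, x.mean() + half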
SMALL SAMPLE CONFIDENCE INTERVAL
– What if $n$ is not 'sufficiently' large? If $n$ is too small, the interval given above will cover $\mu$ with probability less than $1-\alpha$
– So we use another model: we assume the $X_i$s are from a normal distribution and use the exact result that
  $t_n = \dfrac{\bar{X}(n) - \mu}{\sqrt{S^2(n)/n}}$
  has a t distribution with $n-1$ degrees of freedom
SMALL SAMPLE CONFIDENCE INTERVAL
– The statistic $t_n$ yields an exact $100(1-\alpha)$ percent confidence interval for $\mu$ given by
  $\bar{X}(n) \pm t_{n-1,1-\alpha/2}\sqrt{S^2(n)/n}$
– Table A.5 in B,C,N&N gives these values
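The same sketch with the t critical point (scipy.stats.t.ppf replaces Table A.5; function name mine):

    import numpy as np
    from scipy import stats

    def t_ci(x, alpha=0.10):
        """Exact 100(1-alpha)% CI for mu, assuming normally distributed X_i."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        half = stats.t.ppf(1 - alpha / 2, df=n - 1) * np.sqrt(x.var(ddof=1) / n)
        return x.mean() - half, x.mean() + half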
SMALL SAMPLE CONFIDENCE INTERVAL
– If we quadruple the number of samples $n$, the t confidence interval is approximately halved
– It is recommended that you use the t confidence interval for $n < 120$, as it is more conservative
– EXAMPLE: 10 simulation runs produce the following independently distributed output observations (assumed normally distributed with mean $\mu$):
  1.2, 1.5, 1.68, 1.89, 0.95, 1.49, 1.58, 1.55, 0.5, 1.09
– We wish a 90% confidence interval for $\mu$
t CONFIDENCE INTERVAL EXAMPLE
– $n = 10$, $\bar{X}(10) = 1.343$, $S^2(10) = 0.1675$
– $t_{9,0.95} = 1.833$ (Table A.5), so the half-width is $1.833\sqrt{0.1675/10} \approx 0.237$
– The 90% confidence interval for $\mu$ is $1.343 \pm 0.237$, i.e., approximately $[1.11, 1.58]$
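Checking the worked example with the t_ci sketch from the previous slide:

    delays = [1.2, 1.5, 1.68, 1.89, 0.95, 1.49, 1.58, 1.55, 0.5, 1.09]
    print(t_ci(delays, alpha=0.10))   # approximately (1.11, 1.58)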
HOMEWORK #9
– Do the proof from slide #14 in this presentation