Modeling and Simulation CS 313 Sample Statistics
SAMPLE STATISTICS Discrete-event simulations generate a lot of experimental data. To facilitate analysis, it is conventional to compress this data into a handful of meaningful statistics. We have already seen examples of this, where job averages and time averages were used to characterize the performance of a single-server service node. Each time a discrete-event simulation program is used to generate data, it is important to appreciate that this data is only a sample from a much larger population.
SAMPLE STATISTICS If the sample size is small, essentially all that can be done is to compute the sample mean and standard deviation. If the sample size is not small, a sample-data histogram can also be computed and then used to analyze the distribution of the data in the sample.
SAMPLE MEAN AND STANDARD DEVIATION How is data collected in a discrete-event simulation (DES)? There are two types of statistical analysis:
Within-the-run: e.g., the job averages and time averages used to characterize the performance of an SSQ system.
Between-the-run: simulate the system repeatedly, changing only the initial seed from run to run.
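SAMPLE MEAN AND STANDARD DEVIATION Definitions: Consider a sample $x_1, x_2, \ldots, x_n$ (continuous or discrete).
Sample Mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Sample Variance: $s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Sample Standard Deviation: $s = \sqrt{s^2}$
Coefficient of Variation: $s / \bar{x}$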
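The four statistics named above can be computed in a few lines. The following is a minimal sketch, assuming the 1/n form of the variance (the form for which the rms dispersion about the mean equals s); the function name `sample_statistics` and the data values are made up for illustration.

```python
import math

def sample_statistics(data):
    """Return (mean, variance, std_dev, cv) using the 1/n definitions."""
    n = len(data)
    if n == 0:
        raise ValueError("sample must contain at least one value")
    mean = sum(data) / n
    # Two-pass computation: the mean first, then squared deviations about it.
    variance = sum((x - mean) ** 2 for x in data) / n
    std_dev = math.sqrt(variance)
    # The coefficient of variation is undefined when the mean is zero.
    cv = std_dev / mean if mean != 0 else float("nan")
    return mean, variance, std_dev, cv

# Hypothetical sample, chosen so the arithmetic is easy to check by hand.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, variance, std_dev, cv = sample_statistics(sample)
print(mean, variance, std_dev, cv)  # 5.0 4.0 2.0 0.4
```

Note that Python's `statistics.pvariance` and `statistics.pstdev` also divide by n, whereas `statistics.variance` and `statistics.stdev` divide by n - 1.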
UNDERSTANDING THE STATISTICS Mean: a measure of central tendency. Variance and standard deviation: measures of dispersion about the mean. The sample standard deviation has the same units as the data and the sample mean; for example, if the data has units of seconds, then so do the sample mean and standard deviation. Although the sample variance is more amenable to mathematical manipulation (because it is free of the square root), the sample standard deviation is typically the preferred measure of dispersion because it has the same units as the data. The coefficient of variation (C.V.) is unitless, but a common shift in the data changes it. E.g., measure students' heights with the students on the floor and then on chairs: every value shifts by the same amount, so the standard deviation is unchanged while the mean, and hence the C.V., changes.
RELATING THE MEAN AND STANDARD DEVIATION The root-mean-square (rms) function $d(x) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - x)^2}$ measures dispersion about any value $x$.
Theorem 4.1.1: The sample mean $\bar{x}$ gives the smallest possible value of $d(x)$, and the standard deviation $s$ is that smallest value: $\min_x d(x) = d(\bar{x}) = s$.
RELATING THE MEAN AND STANDARD DEVIATION Example: Collect 50 observations. The sample mean is 1.095 and the sample standard deviation is 0.354. The smallest value of d(x) is s, attained at the sample mean, as shown in the figure.
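The theorem can be checked numerically. The sketch below assumes d(x) is the rms deviation of the sample about x, evaluates it on a grid of candidate values, and confirms that the minimum falls at the sample mean. The data values, the grid, and the name `rms_dispersion` are hypothetical.

```python
import math

def rms_dispersion(data, x):
    """Root-mean-square deviation of the sample about an arbitrary value x."""
    n = len(data)
    return math.sqrt(sum((xi - x) ** 2 for xi in data) / n)

# Hypothetical sample; any non-empty data set would do.
data = [0.6, 0.9, 1.1, 1.2, 1.7]
mean = sum(data) / len(data)
s = rms_dispersion(data, mean)  # d evaluated at the mean is the standard deviation

# Scan candidate values of x from 0.00 to 3.00 in steps of 0.01.
grid = [i / 100 for i in range(0, 301)]
best_x = min(grid, key=lambda x: rms_dispersion(data, x))

print(f"mean = {mean:.3f}, s = {s:.3f}, grid minimizer = {best_x:.2f}")
```

The result follows from the identity d(x)^2 = s^2 + (x - mean)^2, so any x other than the mean strictly increases d(x).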
LINEAR DATA TRANSFORMATION Often the output data generated by a simulation must be converted to different units (e.g., seconds). When the conversion is linear, the change in the sample statistics can be determined directly, without any need to re-process the converted data.
LINEAR DATA TRANSFORMATION
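As a sketch of why no re-processing is needed, assume the conversion has the linear form $x_i' = a x_i + b$ for constants $a$ and $b$ (symbols introduced here for illustration) and that the variance uses the 1/n form. The transformed statistics then follow directly from the original ones:

```latex
\begin{align*}
x_i' &= a x_i + b, \qquad i = 1, 2, \ldots, n \\
\bar{x}' &= \frac{1}{n} \sum_{i=1}^{n} (a x_i + b) = a \bar{x} + b \\
(s')^2 &= \frac{1}{n} \sum_{i=1}^{n} (x_i' - \bar{x}')^2
        = \frac{a^2}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 = a^2 s^2 \\
s' &= |a| \, s
\end{align*}
```

For a pure change of units, $b = 0$ and both the mean and the standard deviation are simply scaled by the conversion factor. For a pure shift, $a = 1$ and only the mean changes.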
NONLINEAR DATA TRANSFORMATION When data is used to generate a Boolean (1 or 0) outcome, a nonlinear data transformation is needed. The actual value of $x_i$ matters less than whether some effect occurs. E.g., consider the effect "it will rain tomorrow": how much rain falls is not important. Let $A$ be a fixed set and define $x_i' = 1$ if $x_i \in A$, and $x_i' = 0$ otherwise.
NONLINEAR DATA TRANSFORMATION
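A minimal sketch of the Boolean transformation, assuming each value is mapped to 1 when it falls in the set A and to 0 otherwise. The mean of the transformed sample is then the proportion p of values in A, and its standard deviation (1/n form) is sqrt(p(1 - p)). The rainfall figures and the name `indicator_statistics` are made up for illustration.

```python
import math

def indicator_statistics(data, in_set):
    """Map each x to 1 if in_set(x) is true, else 0; return (p, s) of the 0/1 sample."""
    n = len(data)
    indicators = [1 if in_set(x) else 0 for x in data]
    p = sum(indicators) / n  # sample mean of the indicators = proportion in A
    s = math.sqrt(sum((b - p) ** 2 for b in indicators) / n)
    return p, s

# Hypothetical daily rainfall amounts; A is the set of positive amounts ("it rained").
rainfall = [0.0, 0.0, 1.2, 0.0, 3.5, 0.4, 0.0, 0.0, 2.1, 0.0]
p, s = indicator_statistics(rainfall, lambda x: x > 0.0)
print(p, s, math.sqrt(p * (1 - p)))
```

Only membership in A survives the transformation, so the amounts 1.2 and 3.5 contribute identically.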
DISCRETE-DATA HISTOGRAMS
DISCRETE-DATA HISTOGRAMS Example 1:
DISCRETE-DATA HISTOGRAMS Example 2:
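HISTOGRAM MEAN AND STANDARD DEVIATION With $\hat{f}(x)$ denoting the relative frequency of the value $x$ in the sample (the height of the discrete-data histogram at $x$):
The discrete-data histogram mean is $\bar{x} = \sum_{x} x \hat{f}(x)$
The discrete-data histogram standard deviation is $s = \sqrt{\sum_{x} (x - \bar{x})^2 \hat{f}(x)}$
The discrete-data histogram variance is $s^2$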
HISTOGRAM MEAN AND STANDARD DEVIATION Example 4.2.3:
For the data in Example 4.2.1 (three dice):
For the data in Example 4.2.2 (balls placed in boxes):
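To illustrate the histogram mean and standard deviation on a three-dice experiment, the sketch below simulates rolls with Python's `random` module, builds the relative frequencies, and computes both statistics from the histogram alone. The seed, the sample size, and the function names are assumptions for illustration; this is not the data from Example 4.2.1.

```python
import math
import random
from collections import Counter

def discrete_histogram(data):
    """Return {value: relative frequency} for a discrete sample."""
    n = len(data)
    counts = Counter(data)
    return {x: counts[x] / n for x in sorted(counts)}

def histogram_mean_std(rel_freq):
    """Mean and standard deviation computed from relative frequencies only."""
    mean = sum(x * f for x, f in rel_freq.items())
    var = sum((x - mean) ** 2 * f for x, f in rel_freq.items())
    return mean, math.sqrt(var)

rng = random.Random(12345)  # fixed seed so the run is repeatable
rolls = [sum(rng.randint(1, 6) for _ in range(3)) for _ in range(1000)]

rel_freq = discrete_histogram(rolls)
hist_mean, hist_std = histogram_mean_std(rel_freq)
sample_mean = sum(rolls) / len(rolls)

print(f"histogram mean = {hist_mean:.3f}, histogram std = {hist_std:.3f}")
print(f"sample mean    = {sample_mean:.3f}")
```

For discrete data the histogram mean agrees with the sample mean up to floating-point round-off, because grouping equal values only reorders the terms of the sum.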
CONTINUOUS-DATA HISTOGRAMS
CONTINUOUS-DATA HISTOGRAMS Binning
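One common way to bin continuous data is sketched below, assuming k equal-width bins of width delta = (b - a) / k on the interval [a, b), with the histogram height in each bin taken as count / (n * delta) so that the bar areas are relative frequencies. The interval, the bin count, the uniform test data, and the name `continuous_histogram` are hypothetical choices.

```python
import random

def continuous_histogram(data, a, b, k):
    """Bin data into k equal-width bins on [a, b).

    Returns (midpoints, densities, outliers), where densities[j] is
    count_j / (n * delta) and outliers counts the values outside [a, b).
    """
    n = len(data)
    delta = (b - a) / k
    counts = [0] * k
    outliers = 0
    for x in data:
        if a <= x < b:
            # min() guards against round-off placing x in a nonexistent bin k.
            j = min(int((x - a) / delta), k - 1)
            counts[j] += 1
        else:
            outliers += 1
    midpoints = [a + (j + 0.5) * delta for j in range(k)]
    densities = [c / (n * delta) for c in counts]
    return midpoints, densities, outliers

rng = random.Random(2024)  # fixed seed so the run is repeatable
data = [rng.uniform(0.0, 2.0) for _ in range(1000)]
midpoints, densities, outliers = continuous_histogram(data, 0.0, 2.0, 20)

delta = (2.0 - 0.0) / 20
area = sum(f * delta for f in densities)  # total bar area
print(f"outliers = {outliers}, total area = {area:.3f}")
```

The total bar area equals the fraction of the sample that falls inside [a, b), so it is 1 when there are no outliers.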
CONTINUOUS-DATA HISTOGRAMS Example: buffon
HISTOGRAM PARAMETER GUIDELINES
CONTINUOUS-DATA HISTOGRAMS Example 4.3.2: Smooth, Noisy Histograms
Relative Frequency
Histogram Integrals
HISTOGRAM MEAN AND STANDARD DEVIATION
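For a continuous-data histogram, one common midpoint-based definition treats each bin as if all of its mass sat at the bin midpoint. The sketch below assumes that definition, with bin probability p_j = density_j * delta, and also checks that the histogram integrates to 1. The bin counts and the name `binned_mean_std` are made up for illustration.

```python
import math

def binned_mean_std(midpoints, densities, delta):
    """Midpoint-based histogram mean and standard deviation."""
    probs = [f * delta for f in densities]  # area of each bar
    mean = sum(m * p for m, p in zip(midpoints, probs))
    var = sum((m - mean) ** 2 * p for m, p in zip(midpoints, probs))
    return mean, math.sqrt(var)

# Hypothetical histogram: 4 bins of width 0.5 covering [0, 2).
delta = 0.5
midpoints = [0.25, 0.75, 1.25, 1.75]
counts = [10, 40, 30, 20]
n = sum(counts)
densities = [c / (n * delta) for c in counts]

area = sum(f * delta for f in densities)  # histogram integral over [0, 2)
mean, std = binned_mean_std(midpoints, densities, delta)
print(f"area = {area:.3f}, mean = {mean:.3f}, std = {std:.4f}")
```

Unlike the discrete case, these values generally differ slightly from the raw sample mean and standard deviation, because binning discards the position of each value within its bin.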