Discrete Event Simulation - 4
Statistics and Random Samples
12/2/2018
We want to study the behavior of some system. What we can do is observe the system and collect data about the observations. This gives us a sequence of values of some random variable: what can we say about the totality of values this random variable can take? Can we say something about its probability distribution?
The overall procedure:
1) Collect data.
2) Construct a (hypothesized) probability distribution, either by guessing a "well-known" one or by using some methodology for fitting curves to data points.
3) Test this hypothesis; hopefully the proposed probability distribution will pass the test.
4) Make any other hypotheses you wish and test them: estimate parameters (mean and variance for the normal distribution; mean for the Poisson; degrees of freedom for the Student's t and the F distributions: are these latter easily testable?) and test that your estimates satisfy some criteria.
Statistic: any function of the observations of a random variable that does not depend on unknown parameters. For example, the population mean and variance are not statistics: they are unknown parameters of the population, not functions of a set of observations. The sample mean and sample variance are statistics. Statistical inference lets us infer the population parameters from repeatable, independent observations of the population's behavior, i.e. from statistics.
How do we obtain statistics? Decide on a variable you wish to measure: for example (as in the text), the processing time (in ms) of a job on a particular computer system. Run an experiment to obtain a value for the variable. Repeat the experiment until you decide you have enough data (what is enough? This is a hard question). You must have reason to believe that the successive repetitions of the experiment are independent. This may be hard, too: if you run your copies of the job one immediately after the other, is your assumption of independence reasonable? (How about on successive days at exactly the same time?)
You have your data; now what?
1) If the distribution is discrete, create a table with the number of occurrences of each discrete value in the range. For example, if you are measuring the "number of occurrences of something per unit time", you might have:
[Figure: bar chart "Number of observed occurrences over 43 unit time trials"; x-axis: occurrences per unit time (1 to 12); bar heights, in order: 1, 3, 5, 6, 7, 6, 5, 4, 3, 2, 1]
2) If the distribution is continuous, break the range of values into adjacent, non-overlapping intervals and count the frequency of occurrence in each interval. The text book example of the job times would give rise to:

Occurrences for each time interval (100 observations in total)
Unit time interval:  19  20  21  22  23  24  25  26  27  28  29  30  31  32
Occurrences:          7  10  14  11   8   8   7   7   6   6   6   5   3   2
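Both tallies, discrete counts and interval counts, can be sketched in a few lines of Python. This is a minimal illustration, not the text book's procedure. The function names are hypothetical and the data values are made up. The continuous case assumes half-open intervals [start, start + width), which is one common convention.

```python
import math
from collections import Counter

def frequency_table(observations):
    """Discrete case: (value, count) for every integer value from the smallest
    to the largest observation, including values that never occurred."""
    counts = Counter(observations)  # a Counter returns 0 for missing keys
    return [(v, counts[v]) for v in range(min(observations), max(observations) + 1)]

def bin_counts(values, start, width):
    """Continuous case: counts in adjacent half-open intervals
    [start + k*width, start + (k+1)*width), from the first to the last
    non-empty interval (empty intervals in between are reported as 0)."""
    counts = Counter(math.floor((v - start) / width) for v in values)
    return [(start + k * width, counts[k]) for k in range(min(counts), max(counts) + 1)]

# Hypothetical data, for illustration only.
occurrences_per_trial = [3, 5, 4, 5, 6, 2, 5, 4, 7, 3, 5, 6]
job_times_ms = [19.4, 20.1, 21.7, 21.2, 22.9, 24.0, 25.5, 21.05, 32.3, 28.8]

for value, count in frequency_table(occurrences_per_trial):
    print(f"{value:3d}  {count}")

width = 1.0
for left, count in bin_counts(job_times_ms, start=19.0, width=width):
    print(f"[{left:5.1f}, {left + width:5.1f})  {count}")
```

With this convention, a value that lands exactly on a boundary (24.0 above) is counted in the interval to its right.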
We can graph this:
[Figure: histogram of the occurrences for each time interval]
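Without a plotting package, a rough plain-text histogram of the text book job-time frequencies (intervals labeled 19 to 32) gives the same picture. This is only a quick sketch; the function name is hypothetical, and a plotting library would normally be used instead.

```python
# Frequencies from the text book job-time example: interval label -> count.
labels = list(range(19, 33))
counts = [7, 10, 14, 11, 8, 8, 7, 7, 6, 6, 6, 5, 3, 2]

def text_histogram(labels, counts, symbol="#"):
    """Return one line per interval: the label, a bar of `symbol`s, and the count."""
    return [f"{lab:>3} | {symbol * c} {c}" for lab, c in zip(labels, counts)]

print("\n".join(text_histogram(labels, counts)))
print("total observations:", sum(counts))
```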
What next? The determination of "useful parameters". These are, usually, the sample mean and sample variance. There are two roughly equivalent ways to compute the sample mean. The first is

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

where $n$ is the sample size and $x_i$ is the value of the i-th item in the sample.
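The formula translates directly into code. This is a minimal sketch: the job times are hypothetical values, and the function name is illustrative only.

```python
def sample_mean(xs):
    """Sample mean: (1/n) * sum of x_i over the n observations."""
    if not xs:
        raise ValueError("sample must be non-empty")
    return sum(xs) / len(xs)

# Hypothetical job processing times in ms.
times = [21.3, 24.8, 19.6, 27.1, 22.0]
print(sample_mean(times))
```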
If the sample has already been grouped into ranges (for a continuous distribution), we can compute the sample mean by taking the midpoint of each range, multiplying it by the number of occurrences within that range, adding these products, and dividing by the total number of occurrences (= sample size). That is, $\bar{x} \approx \frac{1}{n}\sum_{j} m_j f_j$, where $m_j$ is the midpoint of the j-th range, $f_j$ is its number of occurrences and $n = \sum_j f_j$. This will generally give a slightly different value, since it treats every observation in a range as if it were at the midpoint and so ignores how the occurrences are distributed within each range.
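The midpoint method can be sketched with the frequencies from the text book job-time example. The text does not state the exact interval boundaries. This sketch assumes that each label k denotes the interval [k, k + 1) ms, so its midpoint is k + 0.5. Under that assumption the grouped estimate is 24.63 ms.

```python
def grouped_mean(midpoints, counts):
    """Estimate the sample mean from grouped data: sum(midpoint * count) / sum(count).
    Every observation in an interval is treated as if it sat at the midpoint."""
    n = sum(counts)
    return sum(m * c for m, c in zip(midpoints, counts)) / n

labels = range(19, 33)                 # interval labels from the example
counts = [7, 10, 14, 11, 8, 8, 7, 7, 6, 6, 6, 5, 3, 2]
midpoints = [k + 0.5 for k in labels]  # ASSUMPTION: interval k is [k, k + 1)
print(round(grouped_mean(midpoints, counts), 2))  # 24.63
```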
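One can also compute the sample variance. There is some disagreement as to whether the sum of the squares of the differences from the sample mean should be divided by the sample size $n$, giving $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, or by $n - 1$, giving $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$. The argument for $n - 1$ is that the differences are not independent: they are taken from $\bar{x}$, which is computed from the same values, so they always sum to zero. One can show that the expression with $n - 1$ is an unbiased estimator of the population variance. The expression with $n$ is biased: on average it slightly underestimates the variance, although for normally distributed data it is the maximum likelihood estimator. This might sway a user to the $n - 1$ choice. Just check which of the two a particular writer is using.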
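Both versions can be computed side by side. The sketch below uses hypothetical data and a hypothetical function name. It also checks the results against Python's standard `statistics` module, whose `pvariance` divides by n and whose `variance` divides by n - 1. For this data the two values are 4.0 and 32/7 (about 4.57).

```python
import statistics

def sample_variances(xs):
    """Return (divide_by_n, divide_by_n_minus_1) versions of the sample variance."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations from the mean
    return ss / n, ss / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical observations
biased, unbiased = sample_variances(data)
print(biased, unbiased)
# Standard-library equivalents: pvariance divides by n, variance by n - 1.
print(statistics.pvariance(data), statistics.variance(data))
```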
Hypothesis testing.
Hypothesis: an assumption about the population (or about the probability distribution of the population) being sampled.
Test of a hypothesis: a rule for deciding whether to accept or reject the hypothesis.
Test statistic: the sample statistic on which that decision is based.
Critical region: the set of values of the test statistic that result in rejection of the hypothesis.
H0 usually stands for the hypothesis being tested and is called the "null hypothesis"; H1 (or A in our text) stands for the alternative hypothesis.
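As a concrete instance of a test statistic and a critical region, here is a sketch of a two-sided test of H0: population mean = mu0. It assumes independent, normally distributed observations with a known standard deviation sigma, which makes it a z-test. These assumptions, the function name and the data are illustrative and are not taken from the text. `statistics.NormalDist` requires Python 3.8 or later.

```python
import math
from statistics import NormalDist

def z_test(xs, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0, assuming known sigma.
    Returns (z, critical_value, reject_h0)."""
    n = len(xs)
    xbar = sum(xs) / n
    z = (xbar - mu0) / (sigma / math.sqrt(n))     # the test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical region: |z| > z_crit
    return z, z_crit, abs(z) > z_crit

# Hypothetical job times (ms); H0: mean job time is 24 ms, sigma assumed to be 3 ms.
times = [21.3, 24.8, 19.6, 27.1, 22.0, 23.5, 25.9, 20.4]
z, z_crit, reject = z_test(times, mu0=24.0, sigma=3.0)
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```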
A classification of errors:

Test result     H0 true            H0 false
H0 accepted     Correct            Type II error (β)
H0 rejected     Type I error (α)   Correct

α = level of significance: the probability of making a Type I error, i.e. rejecting H0 when it is true.
β = the probability of making a Type II error, i.e. accepting H0 when it is false.
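Simulation can check what α means. Suppose H0 is actually true and the experiment is repeated many times. The fraction of wrong rejections (Type I errors) should then be close to α. This minimal sketch assumes normal data with known σ and a two-sided z-test. The parameter values and the function name are arbitrary, chosen for illustration.

```python
import math
import random
from statistics import NormalDist

def type_one_error_rate(alpha=0.05, mu0=24.0, sigma=3.0, n=10, runs=20000, seed=1):
    """Simulate `runs` experiments in which H0 (mean == mu0) is TRUE and return
    the fraction of experiments in which a two-sided z-test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(runs):
        xs = [rng.gauss(mu0, sigma) for _ in range(n)]  # data generated under H0
        z = (sum(xs) / n - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # rejecting a true H0 is a Type I error
    return rejections / runs

print(type_one_error_rate())  # should be close to 0.05
```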
As one might expect, for a given sample size, reducing the probability of a Type I error will increase the probability of a Type II error, and vice versa. One is then left with deciding how to balance the two. Some possibilities:
1) For a given sample and a given α, try to minimize β. There are usually several candidate tests (critical regions) to choose from.
2) For a given sample and a given β, try to minimize α. This amounts to treating Type II errors as more important than Type I errors.
3) Decide on α and β, and then determine the needed sample size, if possible.
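In the simplest case the third option, fixing α and β and solving for the sample size, has a closed form. The sketch assumes a two-sided z-test with known σ. It asks how many observations are needed to detect a true mean that differs from the hypothesized one by delta, with Type II error probability at most β. It uses the common approximation $n \ge \left(\frac{(z_{1-\alpha/2} + z_{1-\beta})\,\sigma}{\delta}\right)^2$, which ignores the very small chance of rejecting in the opposite tail. The function name and parameter values are illustrative.

```python
import math
from statistics import NormalDist

def required_sample_size(alpha, beta, sigma, delta):
    """Approximate n for a two-sided z-test with significance alpha and
    Type II error probability beta against a true mean shifted by delta."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(1 - beta)
    return math.ceil(((z_a + z_b) * sigma / delta) ** 2)

print(required_sample_size(alpha=0.05, beta=0.20, sigma=1.0, delta=0.5))  # 32
print(required_sample_size(alpha=0.01, beta=0.20, sigma=1.0, delta=0.5))  # 47
```

Tightening α from 0.05 to 0.01 while keeping β at 0.20 raises the required sample size from 32 to 47. This is the α/β trade-off above seen from the sample-size side.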