Discrete Event Simulation - 5

Some Examples of Test Distributions and their Uses 12/2/2018

An Example (6.1): We want to test the hypothesis that a service station has an average of 12 customers per hour: H0 : µ = 12. The alternative can take several different forms: A : µ = 14 gives a simple hypothesis, while A : µ ≠ 12, A : µ < 12, or A : µ > 12 give compound hypotheses. We will look at a couple of tests, and at the functions we must examine to reach our conclusions. We assume that the arrival distribution is normal with mean µ and variance σ².

What do we do?
1) Decide on the hypothesis: H0: µ = 12, A: µ ≠ 12.
2) Decide on the critical region: α = 0.05 (for example), split between rejection because the experiment returned values too large and rejection because it returned values too small.
3) Choose the statistic you will use: in this case a sample mean $\bar{x}$ obtained from a sample of size n. Since we are assuming a normal distribution, we can standardize: $Z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$.

Note that we are assuming that the true population variance (σ²) is known, which may or may not be possible.
4) The Central Limit Theorem states that the sample mean is approximately normally distributed with mean µ (same as the population mean) and standard deviation σ/√n, at least if n is large enough.
5) The critical region is depicted in the next figure: H0 will be rejected if $|Z| > 1.96$, i.e. if $Z < -1.96$ or $Z > 1.96$.

Standardized Normal Distribution. The sum of the areas of the tails is 0.05.

Assume that a sample of size n = 16 was collected, that σ = 2, and that the sample yielded a mean of 14 ($\bar{x} = 14$). Then $Z = (14 - 12)/(2/\sqrt{16}) = 2/0.5 = 4$, and we reject the null hypothesis, since Z = 4 > 1.96. Notice that the text appears to have another misprint: the denominator is NOT 1 but 1/2, and this has an effect on all current and later formulae and computations.
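The arithmetic of this example can be checked with a short Python sketch. It assumes the two-sided test at α = 0.05 set up earlier; the function name `z_test_two_sided` is made up for illustration, and `NormalDist` comes from the Python standard library rather than from the text.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_sided(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test with known population standard deviation sigma.

    Returns the standardized statistic, the critical value and the decision.
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))    # standardize the sample mean
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    return z, z_crit, abs(z) > z_crit


# Values from the example: n = 16, sigma = 2, sample mean 14, H0: mu = 12.
z, z_crit, reject = z_test_two_sided(sample_mean=14, mu0=12, sigma=2, n=16)
print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

The denominator σ/√n evaluates to 0.5 here, which is the point of the remark about the misprint.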

Distributions: µ = 12 and µ = 14. These are computed by choosing the range, in 0.1 increments, and evaluating NORMDIST(value, mean, stand.dev., FALSE), followed by charting the series.
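A rough Python counterpart of the NORMDIST series, offered as a sketch: `NormalDist.pdf` from the standard library plays the role of NORMDIST(..., FALSE). The plotting range 10 to 16 and the standard deviation 0.5 (σ/√n from the example) are assumptions; the slide does not state which values were charted. The helper name `pdf_series` is hypothetical.

```python
from statistics import NormalDist


def pdf_series(mean, sd, lo, hi, step=0.1):
    """Return x values from lo to hi in fixed increments and the normal density at each."""
    n_steps = round((hi - lo) / step)
    xs = [round(lo + i * step, 10) for i in range(n_steps + 1)]
    dist = NormalDist(mean, sd)
    return xs, [dist.pdf(x) for x in xs]


# Assumed chart settings: range 10..16, standard deviation of the sample mean 0.5.
xs, under_h0 = pdf_series(mean=12, sd=0.5, lo=10, hi=16)
_, under_alt = pdf_series(mean=14, sd=0.5, lo=10, hi=16)

for x, p0, p1 in list(zip(xs, under_h0, under_alt))[::10]:
    print(f"x = {x:4.1f}  pdf(mu=12) = {p0:.4f}  pdf(mu=14) = {p1:.4f}")
```

The two lists can be passed to any charting tool to reproduce the pair of bell curves.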

If the alternative hypothesis had been A: µ = 14, we can compute the probability of a Type II error by: $\beta = P(\bar{x} < 12 + 1.96 \cdot 0.5 \mid \mu = 14) = P(Z < (12.98 - 14)/0.5) = P(Z < -2.04) \approx 0.0207$, which does not appear very large in this case. Notice that the alternate mean, although only one population standard deviation removed from the test mean, is actually 4 sample-mean standard deviations removed, so you might expect little chance of error. If we subtract the LEFT tail, $P(\bar{x} < 12 - 1.96 \cdot 0.5 \mid \mu = 14)$, as we should, we will get an even smaller (but not by much) probability of error.
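The Type II error for the alternative µ = 14 can be evaluated numerically. This is a sketch under the example's assumptions (H0: µ = 12, σ = 2, n = 16, two-sided α = 0.05); the variable names are illustrative, not the textbook's.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_alt, sigma, n, alpha = 12, 14, 2, 16, 0.05

se = sigma / sqrt(n)                           # standard deviation of the sample mean
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
lower = mu0 - z_crit * se                      # acceptance region for the sample mean
upper = mu0 + z_crit * se

# Distribution of the sample mean when the alternative mu = 14 is true.
sampling_dist = NormalDist(mu_alt, se)

beta_right_only = sampling_dist.cdf(upper)     # ignores the left rejection tail
left_tail = sampling_dist.cdf(lower)           # mass that falls in the left rejection tail
beta = beta_right_only - left_tail             # P(accept H0 | mu = 14)

print(f"acceptance region for the sample mean: ({lower:.2f}, {upper:.2f})")
print(f"beta without left-tail correction: {beta_right_only:.4f}")
print(f"left tail: {left_tail:.2e}")
print(f"beta: {beta:.4f}")
```

The left-tail term is many orders of magnitude smaller than the main term, which is why subtracting it changes the result so little.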

Here we have the probability of a Type II error (the value of β) as a function of the actual mean µ: one can see that the largest such probabilities occur when the actual mean is close to 12, but not equal to 12.

And here are both the Operating Characteristic and the Power curves: the second is just (1 - β). The Operating Characteristic curve is bell-shaped, and the Power curve is its mirror image about the horizontal line at 0.5.
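Both curves can be tabulated with one small function. The sketch below assumes the same setting as the example (H0: µ = 12, σ = 2, n = 16, two-sided α = 0.05); `type_ii_error` is a hypothetical helper name and the grid of µ values is arbitrary.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu, mu0=12, sigma=2, n=16, alpha=0.05):
    """Probability that the two-sided z-test accepts H0: mean = mu0 when the true mean is mu."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    dist = NormalDist(mu, se)                  # sampling distribution of the mean
    return dist.cdf(mu0 + z_crit * se) - dist.cdf(mu0 - z_crit * se)


# Operating Characteristic (beta) and Power (1 - beta) on a coarse grid.
for mu in [10, 11, 11.5, 12, 12.5, 13, 14]:
    beta = type_ii_error(mu)
    print(f"mu = {mu:5.1f}  beta = {beta:.4f}  power = {1 - beta:.4f}")
```

At µ = 12 the function returns 1 - α, which is the probability of correctly accepting H0 rather than an error probability; this is why the text singles out means close to, but not equal to, 12.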

Student's t-test (and distribution). We can use this test to verify whether two normal populations have the same mean, to obtain confidence limits for a mean, and to obtain confidence limits for a regression coefficient. Let's look at the first application. Null Hypothesis: H0: µ1 - µ2 = 0. Alternative: A : µ1 - µ2 ≠ 0. We need to assume both populations have the same variance: σ1² = σ2².

One problem we need to solve is the determination of a formula for the variance of a difference: if x and y are normally and independently distributed, with population means µx and µy and population variances σx² and σy², then the difference of the sample means $\bar{x} - \bar{y}$ is normally distributed with mean $\mu_x - \mu_y$ and standard deviation $\sqrt{\sigma_x^2/n_x + \sigma_y^2/n_y}$, where nx and ny are the sample sizes. A proof can be obtained (easily) as an application of Moment Generating Function techniques.

Since all we are likely to have is a couple of sets of sample values, the population variance may not be known. Starting from the computed sample variances Sx² and Sy² for the two samples, one can also show that $S^2 = \frac{(n_x - 1) S_x^2 + (n_y - 1) S_y^2}{n_x + n_y - 2}$ is an unbiased estimate of the common variance σ², where S² is the pooled variance of the two samples. Combining the previous result and the current one, we can define the t-variable to be $t = \frac{\bar{x} - \bar{y} - (\mu_x - \mu_y)}{S \sqrt{1/n_x + 1/n_y}}$.

We can treat this statistic as a "standard normal" variable, compute the tails and compare the sample results to the critical values. Since t = 0.339… in Example 6.4 (see the Excel spreadsheet) and the critical value for 0.05 normal tails is 1.96, we cannot reject the null hypothesis. We can also try to deal with this as a candidate for the Student's t-test (as the textbook claims).

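In that case $t = \frac{\bar{x} - \bar{y} - (\mu_x - \mu_y)}{S \sqrt{1/n_x + 1/n_y}}$ must be reinterpreted as being of the form $t = u\sqrt{n}/v$, where u is a standard normal variable and v² is a χ² variable with n degrees of freedom. Start with: $u = \frac{\bar{x} - \bar{y} - (\mu_x - \mu_y)}{\sigma \sqrt{1/n_x + 1/n_y}}$, which possesses the properties expected of u above: it is standard normal.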

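Since $(n_x - 1) S_x^2/\sigma^2$ and $(n_y - 1) S_y^2/\sigma^2$ are χ²-variables with nx - 1 and ny - 1 degrees of freedom, their sum, $v^2 = \frac{(n_x - 1) S_x^2 + (n_y - 1) S_y^2}{\sigma^2}$, is a χ²-variable with nx + ny - 2 degrees of freedom. Now simplify the expression: $t = \frac{u \sqrt{n_x + n_y - 2}}{v} = \frac{\bar{x} - \bar{y} - (\mu_x - \mu_y)}{S \sqrt{1/n_x + 1/n_y}}$, in which the unknown σ cancels.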

The critical region for the (bilateral) test is: $|t| > t_{\alpha/2,\, n_x + n_y - 2}$. See Example6.4.xls
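The pooled two-sample statistic can be computed directly from two samples. The sketch below assumes equal population variances and tests H0: µx - µy = 0; the data are invented for illustration (they are not the values of Example 6.4), and the critical value 2.306 is the usual table entry for a two-sided 5% test with 8 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic with pooled variance, for H0: equal means."""
    nx, ny = len(x), len(y)
    dof = nx + ny - 2
    # Pooled variance: unbiased sample variances weighted by their degrees of freedom.
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / dof
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, dof


# Hypothetical samples, not taken from the textbook.
x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 5, 6]
t, dof = pooled_t(x, y)

T_CRIT = 2.306  # table value t(0.025, 8); look up the entry matching your own dof
print(f"t = {t:.3f} with {dof} degrees of freedom; reject H0: {abs(t) > T_CRIT}")
```

Note that `statistics.variance` already divides by n - 1, which matches the unbiased sample variances used in the pooled formula.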

If the observations in the two samples cannot be claimed independent, but the differences xi - yi can be so claimed, we can still test for equality of means. Basically, the sample of the differences Di = xi - yi is a random sample of size n from a normal population with mean µ = µx - µy and variance σ². The hypothesis: H0: µ = 0; A : µ ≠ 0. Degrees of freedom: n - 1 (we'll see why).

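The Sample Variance (for the Di's) is given by: $S_D^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (D_i - \bar{D})^2$, where $\bar{D} = \frac{1}{n} \sum_{i=1}^{n} D_i$ is the average difference, and the test statistic is $t = \frac{\bar{D}}{S_D/\sqrt{n}}$, with critical region $|t| > t_{\alpha/2,\, n - 1}$. The number of degrees of freedom is determined by the number of independent observations in S_D.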

For an example, see Example6.5.xls. In case one wished to test for a "one sided alternative", e.g., A: µ1 > µ2, one would use a "one tailed test": $t > t_{\alpha,\, n - 1}$ or $t < -t_{\alpha,\, n - 1}$, depending on the problem.
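The paired test reduces to a one-sample computation on the differences. The following sketch uses made-up paired data (not the contents of Example6.5.xls); the critical values 3.182 (two-sided) and 2.353 (one-sided) are the usual table entries for α = 0.05 with 3 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """t statistic for paired samples, computed from the differences D_i = x_i - y_i."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    # stdev uses the n - 1 divisor, matching the sample variance of the differences.
    t = mean(d) / (stdev(d) / sqrt(n))
    return t, n - 1


# Hypothetical paired observations.
x = [5, 7, 9, 11]
y = [4, 5, 8, 9]
t, dof = paired_t(x, y)

T_TWO_SIDED = 3.182  # table value t(0.025, 3)
T_ONE_SIDED = 2.353  # table value t(0.05, 3)
print(f"t = {t:.3f}, dof = {dof}")
print(f"two-sided test rejects: {abs(t) > T_TWO_SIDED}")
print(f"one-sided test (A: mean difference > 0) rejects: {t > T_ONE_SIDED}")
```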

Combination of Sample Variances for a single unbiased estimate of the population variance. Let s1², s2², …, sk² denote k sample variances based on samples of size n1, n2, …, nk. Then, if each sample variance is weighted by the size of the sample on which it is based, less 1 (since the formula for si² is already chosen to give an unbiased estimate), the proper weighted average to use for σ² is given by: $t = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2 + \dots + (n_k - 1) s_k^2}{a}$, where a is chosen to make this estimate unbiased, i.e., E[t] = σ².

Since E[si²] = σ² for each i, we have $E[t] = \sigma^2 (n_1 + \dots + n_k - k)/a$. The unbiased estimate requirement E[t] = σ² therefore implies that we must have a = n1 + … + nk - k.
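The weighted average with a = n1 + … + nk - k can be written in a few lines. This is a sketch with invented data; `pooled_variance` is an illustrative name.

```python
from statistics import variance


def pooled_variance(samples):
    """Combine k sample variances into one unbiased estimate of a common variance.

    Each unbiased sample variance is weighted by (n_i - 1); the divisor is
    n_1 + ... + n_k - k.
    """
    k = len(samples)
    total_n = sum(len(s) for s in samples)
    weighted_sum = sum((len(s) - 1) * variance(s) for s in samples)
    return weighted_sum / (total_n - k)


# Hypothetical samples of different sizes.
samples = [[1, 2, 3], [2, 4, 6, 8]]
estimate = pooled_variance(samples)
print(f"pooled variance estimate: {estimate:.4f}")
```

For k = 2 this is the same pooled variance that appears in the two-sample t-test.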

The F-Distribution. If we start with two random variables u and v possessing χ² distributions with n1 and n2 degrees of freedom, the quotient (u/n1)/(v/n2) possesses an F-Distribution with (n1, n2) degrees of freedom. Our text refers to the quotient of the two sample variances, $F = s_1^2/s_2^2$.

Recall that: $s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$. Recall further that: if x is normally distributed with mean 0 and variance 1, the sum of the squares of n random samples of x has a χ²-distribution with n degrees of freedom. The terms $(x_{ij} - \bar{x}_i)/\sigma_i$, j = 1..ni, i = 1, 2, can be argued to satisfy the hypothesis, except that one degree of freedom is lost by the use of the sample mean in place of the population mean.

The form of the F-distribution can then be reconciled by letting $u = (n_1 - 1) s_1^2/\sigma_1^2$, with a similar form for v; under the hypothesis of equal variances the quotient of u and v, each divided by its degrees of freedom, reduces to the ratio of the sample variances. If one looks at the tables in the back of the book, the only values of α considered are in the "high range", for right tail critical regions. How do we compute the "low range", for left tail or bilateral regions? Excel provides a function that works, but the tables are (often) incomplete.

Let F1 denote the value of F such that P{F < F1} = α, and let F2 denote the value such that P{F > F2} = α. If the sample falls outside of the interval (F1, F2) the null hypothesis will be rejected (note that our text has transposed the two values in the example and, apparently, in the discussion right before it: or there is a "transposition" of the meaning of "critical region"). Let F' = 1/F. Then F' has an F-distribution with (n2, n1) degrees of freedom. We have: $\alpha = P\{F < F_1\} = P\{1/F > 1/F_1\} = P\{F' > 1/F_1\}$.

Thus the left tail boundary of the (n1, n2) distribution can be computed as the reciprocal of the right tail boundary of the (n2, n1) distribution. From the table we have F2 = F(0.975),(9,7) = 4.82 = boundary of right tail, and F1 = F(0.025),(9,7) = 1/F(0.975),(7,9) = 1/4.20 = 0.238 = boundary of left tail. Since the computed value of F (in my version of the example, which has a missing value added) falls outside of the acceptance region (0.238, 4.82), we reject the null hypothesis.
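The reciprocal rule is easy to check numerically. In this sketch the two table values 4.82 and 4.20 are the ones quoted above; the function names and the sample F values passed to the region check are hypothetical, chosen only to exercise it.

```python
def lower_f_critical(upper_critical_swapped_dof):
    """Left-tail boundary for (n1, n2) from the right-tail boundary for (n2, n1)."""
    return 1.0 / upper_critical_swapped_dof


def outside_acceptance_region(f_value, f_low, f_high):
    """True when a two-sided F test rejects the null hypothesis of equal variances."""
    return f_value < f_low or f_value > f_high


F_UPPER_9_7 = 4.82   # table value: right-tail boundary for (9, 7), area 0.025
F_UPPER_7_9 = 4.20   # table value: right-tail boundary for (7, 9), area 0.025

f_low = lower_f_critical(F_UPPER_7_9)
print(f"acceptance region: ({f_low:.3f}, {F_UPPER_9_7})")

# Hypothetical computed F values, for illustration only.
for f_value in (0.2, 1.0, 5.0):
    print(f_value, outside_acceptance_region(f_value, f_low, F_UPPER_9_7))
```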

Ex. 6.7 changes the null hypothesis: H0: σ1² ≤ σ2²; A : σ1² > σ2². In this case all we need is the one-tailed version, with right tail boundary F(0.95),(9,7); again, we reject the null hypothesis. If we change it again, to H0: σ1² ≥ σ2²; A : σ1² < σ2², the one-tailed version uses the left tail boundary F1 = F(0.05),(9,7) = 1/F(0.95),(7,9) = 1/3.29 = 0.304. Since the computed F is greater than 0.304, we cannot reject the null hypothesis, and thus accept it.

The χ² Goodness of Fit Test. How do we determine whether a set of samples fits a hypothesized population distribution? This test provides a way. Here is the recipe:
1) Construct a frequency table of the observed values of the random variable. No interval should have fewer than 5 occurrences: if your initial set of intervals does not meet this criterion, coalesce adjacent intervals until it does. Let Oi denote the number of observations in interval i.

2) Use the assumption on the distribution to compute expected frequencies for each interval. Let Ei denote the expected frequency for interval i.
3) Calculate the quantity $(O_i - E_i)^2/E_i$ for each interval i = 1, 2, …, r.
4) Calculate the χ² statistic using the formula: $\chi^2 = \sum_{i=1}^{r} \frac{(O_i - E_i)^2}{E_i}$.

The parameter for the χ² statistic is "degrees of freedom". In this case the number of degrees of freedom is determined by two quantities: r = number of intervals; p = number of parameters of the hypothesized distribution that are estimated from the data (one for a Poisson, two for a normal, etc.). ν = r - p - 1 is the number of degrees of freedom for the relevant χ² distribution. See Ex. 6.8 and 6.9. Ex. 6.8: observe that the expected values in the first and last two intervals don't satisfy the "minimum size" requirements, so the resulting χ² value is very large and may not reflect reality. Coalesce those intervals.
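The recipe can be sketched in Python as follows. Two assumptions are made here: the minimum-size rule is applied to the expected counts (the usual convention, and the one the remark on Ex. 6.8 refers to), and the degrees of freedom use the standard count r - p - 1. The frequencies are invented and are not those of Ex. 6.8 or 6.9.

```python
def coalesce(observed, expected, minimum=5):
    """Merge adjacent intervals, left to right, until every expected count is >= minimum."""
    merged_obs, merged_exp = [], []
    obs_acc, exp_acc = 0.0, 0.0
    for o, e in zip(observed, expected):
        obs_acc += o
        exp_acc += e
        if exp_acc >= minimum:
            merged_obs.append(obs_acc)
            merged_exp.append(exp_acc)
            obs_acc, exp_acc = 0.0, 0.0
    if exp_acc > 0:
        if merged_exp:
            # A leftover tail that is still too small joins the last interval.
            merged_obs[-1] += obs_acc
            merged_exp[-1] += exp_acc
        else:
            merged_obs.append(obs_acc)
            merged_exp.append(exp_acc)
    return merged_obs, merged_exp


def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all intervals."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(r, p):
    """r intervals, p parameters estimated from the data."""
    return r - p - 1


# Hypothetical frequencies; the first interval is too small and gets merged.
obs, exp = coalesce([1, 9, 14, 6], [3, 7, 12, 8])
chi2 = chi_square_statistic(obs, exp)
dof = degrees_of_freedom(r=len(obs), p=1)   # p = 1, e.g. a fitted Poisson mean
print(f"intervals: {obs} vs {exp}, chi-square = {chi2:.4f}, dof = {dof}")
```

The statistic is then compared with the tabulated χ² critical value for the chosen α and the computed degrees of freedom.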

Ex. 6.9: the previous example dealt with a distribution with just ONE parameter, the Poisson Distribution. This example deals with a Normal Distribution: 2 parameters (mean and standard deviation).

The Kolmogorov-Smirnov Test. This is a non-parametric test; the derivation of its details is beyond the scope of this course. It helps decide whether a set of sample observations is consistent with a hypothesized probability density (or cumulative distribution function): thus the Null Hypothesis is simply that the data belong to a given population for which we know the distribution. It does NOT appear to be supported by Microsoft Excel. The test goes as follows:

1) Construct an empirical cumulative distribution function S(x) from a sample of N observations.
2) Let F(x) be the theoretical cumulative distribution function hypothesized by the Null Hypothesis.
3) For each of the N sample points x_i, compute $|F(x_i) - S(x_i)|$, and let $D = \max_i |F(x_i) - S(x_i)|$.

4) Go to the table, choose the value of α for the critical region, and if the value of D is greater than the tabulated critical value for α, reject the null hypothesis. See the examples (including Ex. 6.11). In all cases note that some discrepancies in the values of the distributions keep showing up. What is this due to? Where are the discrepancies coming from? They would not make any difference in acceptance or rejection of hypotheses for which the computed statistics fall "well" within or without the critical regions. They would certainly make a difference in the "close call" cases.
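Steps 1 to 3 can be sketched in Python. Because S(x) is a step function, this version checks the gap on both sides of each jump, which is the usual way to obtain the maximum distance; that detail, the function names, and the uniform test distribution are assumptions made for illustration and are not taken from the text.

```python
def ks_statistic(sample, cdf):
    """Largest vertical distance between the empirical CDF of the sample and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # S jumps from (i - 1)/n to i/n at x, so compare F(x) with both levels.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1], used as the hypothesized F."""
    return min(max(x, 0.0), 1.0)


# Hypothetical sample of N = 3 observations.
d = ks_statistic([0.7, 0.1, 0.4], uniform_cdf)
print(f"D = {d:.4f}")
```

The resulting D is then compared with the tabulated critical value for the chosen α and sample size N, as in step 4.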
