381 Goodness of Fit Tests QSCI 381 – Lecture 40 (Larson and Farber, Sect 10.1)
Multinomial Experiments
A multinomial experiment is a probability experiment consisting of a fixed number of independent trials in which there are more than two possible outcomes for each trial. The probability of each outcome is fixed, and each outcome is classified into categories.
Examples of multinomial experiments include:
You sample 100 animals from a population. The categories could be age, length, or maturity state.
You sample 1000 poppies in a field. The categories could be colour.
You sample 20 animals and calculate the frequency of each genetic haplotype.
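A multinomial experiment can be simulated in a few lines, which may help make the definition concrete. The sketch below is illustrative only: the function name `simulate_multinomial` and the colour probabilities for the poppy example are hypothetical and do not come from the lecture.

```python
import random
from collections import Counter


def simulate_multinomial(n_trials, probabilities, seed=None):
    """Run n_trials independent trials and count the outcomes per category.

    probabilities: dict mapping category name -> fixed probability
    (the values are assumed to sum to 1).
    """
    rng = random.Random(seed)
    categories = list(probabilities)
    weights = [probabilities[c] for c in categories]
    # Each draw is one independent trial with the same fixed probabilities.
    draws = rng.choices(categories, weights=weights, k=n_trials)
    counts = Counter(draws)
    return {c: counts.get(c, 0) for c in categories}


# Hypothetical colour probabilities for 1000 poppies (NOT lecture data).
poppy_probs = {"red": 0.6, "orange": 0.3, "white": 0.1}
poppy_counts = simulate_multinomial(1000, poppy_probs, seed=1)
print(poppy_counts)
```

The counts always add up to the number of trials, and every trial lands in exactly one category.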
Goodness-of-fit Tests
A chi-square goodness-of-fit test is used to test whether an observed frequency distribution fits an expected distribution. We need to specify a null and an alternative hypothesis. Generally the null hypothesis is that the observed frequency distribution (the data) fits the expected distribution. The alternative hypothesis is that it does not.
Example-I
We expect that a "healthy" marine mammal population should consist of an equal number of males and females, and that 60% of the population should be mature. We sample 150 animals and assess the fraction in each of four categories: Mature Female, Mature Male, Immature Female, Immature Male.
Observed and Expected Frequencies
The observed frequency O of a category is the frequency for the category observed in the data.
The expected frequency E of a category is the calculated frequency for the category. Expected frequencies are obtained by assuming the specified (or hypothesized) distribution is correct.
The expected frequency for the i-th category is:
$E_i = n p_i$
where n is the number of trials and p_i is the assumed probability for the i-th category.
Observed and Expected Frequencies (Example)
                      Mature Female    Mature Male      Immature Female   Immature Male
Observed frequency
Assumed probability   0.3              0.3              0.2               0.2
Expected frequency    45 (150 x 0.3)   45 (150 x 0.3)   30 (150 x 0.2)    30 (150 x 0.2)
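The expected frequencies in this example can be reproduced with a short calculation. This is a minimal sketch that assumes sex and maturity are independent, which is what the probabilities 0.3 and 0.2 on the slide imply; the function and variable names are illustrative.

```python
def expected_frequencies(n, probabilities):
    """Expected frequency for each category: E_i = n * p_i."""
    return [n * p for p in probabilities]


n = 150          # animals sampled
p_mature = 0.6   # 60% of a "healthy" population is mature
p_female = 0.5   # equal numbers of males and females

categories = ["Mature Female", "Mature Male", "Immature Female", "Immature Male"]
# Assumption: sex and maturity are independent, so probabilities multiply.
probs = [
    p_mature * p_female,
    p_mature * (1 - p_female),
    (1 - p_mature) * p_female,
    (1 - p_mature) * (1 - p_female),
]

expected = expected_frequencies(n, probs)
for name, p, e in zip(categories, probs, expected):
    print(f"{name:16s} p = {p:.1f}  E = {e:.0f}")
```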
The Chi-square goodness-of-fit Test-I
IF:
1. the observed frequencies are obtained from a random sample, and
2. the expected frequencies are all greater than or equal to 5 (pool categories if this is not the case),
then the sampling distribution for the goodness-of-fit test is approximately a chi-square distribution with k-1 degrees of freedom, where k is the number of categories.
The test statistic is:
$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$
where O_i and E_i are the observed and expected frequencies for the i-th category.
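The test statistic is simple to compute once the observed and expected frequencies are in hand. The sketch below also enforces the "expected frequency of at least 5" condition. The observed counts are hypothetical, chosen only to sum to 150; they are not the lecture's data.

```python
def chi_square_statistic(observed, expected, min_expected=5):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e < min_expected for e in expected):
        # The chi-square approximation is unreliable here; pool categories first.
        raise ValueError("an expected frequency is below 5; pool categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = [45, 45, 30, 30]               # from the marine mammal example
observed_hypothetical = [50, 40, 35, 25]  # HYPOTHETICAL counts, not lecture data

chi2 = chi_square_statistic(observed_hypothetical, expected)
df = len(expected) - 1                    # k - 1 degrees of freedom
print(f"chi-square = {chi2:.3f}, d.f. = {df}")
```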
The Chi-square goodness-of-fit Test-II
1. Identify the claim and state the null and alternative hypotheses.
2. Specify the level of significance, α.
3. Determine the degrees of freedom, d.f. = k-1.
4. Find the critical value of the chi-square distribution and hence define the rejection region for the test.
5. Calculate the test statistic.
6. Check whether or not the value of the test statistic is in the rejection region.
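The steps above can be sketched as a single function. It takes the critical value as an input (looked up in a table or a spreadsheet) and treats the rejection region as the right tail, i.e. values of the statistic larger than the critical value. The habitat probabilities and counts in the usage example are hypothetical; only n = 137 and the critical value 9.49 (α = 0.05, d.f. = 4) are taken from the lecture's habitat example.

```python
def goodness_of_fit_test(observed, probabilities, critical_value):
    """Carry out steps 3, 5 and 6 for a chi-square goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in probabilities]     # E_i = n * p_i
    df = len(observed) - 1                        # step 3: d.f. = k - 1
    statistic = sum((o - e) ** 2 / e              # step 5: test statistic
                    for o, e in zip(observed, expected))
    reject = statistic > critical_value           # step 6: in rejection region?
    return {"df": df, "expected": expected,
            "statistic": statistic, "reject": reject}


# HYPOTHETICAL inputs: five habitats assumed equally likely, made-up counts.
habitat_probs = [0.2, 0.2, 0.2, 0.2, 0.2]
habitat_counts = [30, 25, 28, 27, 27]             # sums to n = 137

result = goodness_of_fit_test(habitat_counts, habitat_probs, critical_value=9.49)
print(f"d.f. = {result['df']}, chi-square = {result['statistic']:.3f}, "
      f"reject H0: {result['reject']}")
```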
Example (Test using α=0.01)
H0: the distribution of animals between sex and maturity classes equals that expected for a healthy population.
The degrees of freedom = k-1 = 3.
The critical value of the chi-square distribution is 11.345 (CHIINV(0.01,3)).
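For readers without a spreadsheet, the critical value can be approximated with the standard library alone. The sketch below is one possible approach, not the lecture's method: it uses the closed-form right-tail probability of the chi-square distribution for whole-number degrees of freedom and then bisects to find the value whose right-tail area equals α, which is the quantity CHIINV(α, d.f.) returns. The function names are illustrative.

```python
import math


def chi_square_sf(x, df):
    """Right-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def chi_square_critical_value(alpha, df, tol=1e-10):
    """Value c with P(X > c) = alpha, found by bisection (0 < alpha < 1)."""
    lo, hi = 0.0, 1.0
    while chi_square_sf(hi, df) > alpha:   # expand until the tail is small enough
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi_square_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


critical = chi_square_critical_value(0.01, 3)
print(f"critical value (alpha = 0.01, d.f. = 3): {critical:.3f}")
```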
Example (Test using α=0.01)
                     Mature Female   Mature Male   Immature Female   Immature Male
Observed frequency
Expected frequency   45              45            30                30
The test statistic lies in the rejection region, so we reject the null hypothesis at the 1% level of significance.
Example-A-1 (α=0.05)
The probability of a particular bird species utilizing each of five habitats is known. We collect data for a different species (n=137) and wish to assess whether the two species differ in their habitat requirements.
Habitat type    Expected p    Observed
Example-A-2 (α=0.05)
Habitat type    Observed frequency    Expected frequency
With five habitats, d.f. = k-1 = 4 and the critical value is 9.49. The test statistic does not exceed it, so we fail to reject the null hypothesis.
Testing for Normality
We can use the chi-square test in some cases to assess whether a variable is normally distributed. The null and alternative hypotheses are:
H0: The variable has a normal distribution.
Ha: The variable does not have a normal distribution.
Example
Class boundaries    Frequency
Can we assume that these data are normal (assume α=0.05)?
Calculating the Test Statistic
Column headings: Class boundaries, Observed frequency O, Cumulative normal, Expected p, Expected frequency E (sub-headings: Lower, Upper, Difference)
x_i is the mid-point of each class.
$E_i = p_i \times 149$
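The expected frequencies for a normality test can be sketched as follows. Several things here are assumptions rather than the lecture's stated method: the mean and standard deviation are estimated from the class mid-points x_i weighted by the observed frequencies (using the n-1 sample formula), the expected probability of a class is taken as the difference between the cumulative normal at its upper and lower boundaries, and the class boundaries and counts are hypothetical values chosen only so that they sum to 149.

```python
import math
from statistics import NormalDist


def grouped_mean_sd(boundaries, counts):
    """Estimate mean and sample sd from grouped data using class mid-points."""
    midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries[:-1], boundaries[1:])]
    n = sum(counts)
    mean = sum(f * x for f, x in zip(counts, midpoints)) / n
    var = sum(f * (x - mean) ** 2 for f, x in zip(counts, midpoints)) / (n - 1)
    return mean, math.sqrt(var)


def normal_expected_frequencies(boundaries, counts):
    """One row per class: (lower, upper, observed O, expected p, expected E)."""
    n = sum(counts)
    mean, sd = grouped_mean_sd(boundaries, counts)
    dist = NormalDist(mean, sd)
    rows = []
    for lo, hi, obs in zip(boundaries[:-1], boundaries[1:], counts):
        p = dist.cdf(hi) - dist.cdf(lo)   # cumulative normal: upper minus lower
        rows.append((lo, hi, obs, p, n * p))   # E_i = p_i * n
    return rows


# HYPOTHETICAL class boundaries and counts (n = 149), not the lecture's data.
hypothetical_boundaries = [10, 20, 30, 40, 50, 60]
hypothetical_counts = [12, 35, 55, 34, 13]

rows = normal_expected_frequencies(hypothetical_boundaries, hypothetical_counts)
for lo, hi, obs, p, e in rows:
    print(f"{lo:>3}-{hi:<3}  O = {obs:3d}  p = {p:.4f}  E = {e:6.2f}")
```

Because the classes cover only a finite range, the expected probabilities sum to slightly less than 1. Classes with an expected frequency below 5 would need to be pooled before the chi-square statistic is computed.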