Chi-squared tests
Goodness of fit: Does the actual frequency distribution of some data agree with an assumption?
Test of independence: Are two characteristics of a set of data dependent or not?
Test of homogeneity: Do different populations have the same characteristic in the same proportion?
(page 574 of text)
Goodness-of-Fit Test
Test the hypothesis that an observed frequency distribution fits some claimed distribution.
Example: The colors in a bag of M&M’s are equally distributed, so every color has the same proportion.
Another example: The distribution of colors in a bag of M&M’s is the following:
  30% Brown
  10% Green
  20% Red
  10% Blue
  20% Yellow
  10% Orange
The name of this test comes from the idea that we are testing how well an observed frequency distribution fits some specified theoretical distribution.
Hypothesis Testing
Categories have equal frequencies:
  H0: p1 = p2 = p3 = . . . = pk
  H1: at least one of the probabilities is different
Categories have unequal frequencies:
  H0: p1, p2, p3, . . . , pk are as claimed
  H1: at least one of the above proportions is not the claimed value
Examples on pages 578-579 of text.
Expected Frequencies
Claim: The colors are equally distributed.

                          Brown  Yellow  Red  Orange  Green  Blue
Observed frequency (O)       33      26   21       8      7     5
Expected frequency (E)

First, we collect our sample (the observed frequencies).
Next, we calculate the expected frequencies:
If all expected frequencies are equal: $E = n/k$, where $n$ is the total number of observations and $k$ is the number of categories.
If the expected frequencies are not all equal: $E = np$. Each category’s expected frequency is the product of the total and the category’s probability.
(page 577 of text)
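The two rules for expected frequencies can be sketched in Python as follows. This is a minimal illustration rather than anything from the text: the function names `expected_equal` and `expected_unequal` are made up for this example, and the unequal percentages are the ones Mars is said to claim in the M&M example.

```python
# Observed color counts from the slide: Brown, Yellow, Red, Orange, Green, Blue.
observed = [33, 26, 21, 8, 7, 5]
n = sum(observed)   # total number of candies in the sample
k = len(observed)   # number of categories (colors)


def expected_equal(n, k):
    """Expected count per category when all categories are claimed equal: E = n / k."""
    return [n / k] * k


def expected_unequal(n, probs):
    """Expected count per category when each has its own claimed probability: E = n * p."""
    return [n * p for p in probs]


equal_E = expected_equal(n, k)
unequal_E = expected_unequal(n, [0.30, 0.20, 0.20, 0.10, 0.10, 0.10])

print("Equal claim:  ", [round(e, 2) for e in equal_E])
print("Unequal claim:", [round(e, 2) for e in unequal_E])
```

With 100 candies and 6 colors, the equal claim gives about 16.67 per color, while the unequal claim gives 30, 20, 20, 10, 10, and 10.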
Test Statistics and Critical Values

                          Brown  Yellow  Red  Orange  Green  Blue
Observed frequency (O)       33      26   21       8      7     5
Expected frequency (E)
(O - E)^2 / E

Test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$
This formula measures the discrepancies between the O and E values: the larger the differences between observed and expected frequencies, the larger the test statistic.
The chi-square distribution was first presented in Chapter 6, Section 6-6, page 343.
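As a rough sketch of how the test statistic is computed, the Python below sums (O - E)^2 / E over the six colors for the claim that all colors are equally likely. The helper name `chi_square_statistic` is an assumption made for this example.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed color counts: Brown, Yellow, Red, Orange, Green, Blue.
observed = [33, 26, 21, 8, 7, 5]

# Equal-frequency claim: every color is expected n / k times.
n, k = sum(observed), len(observed)
expected = [n / k] * k

chi2 = chi_square_statistic(observed, expected)
print(f"chi-square = {chi2:.2f}")  # chi-square = 40.64
```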
Critical Values
Critical values are found in the chi-square table.
Heading: alpha (the significance level)
Degrees of freedom: number of categories minus 1
Goodness-of-fit hypothesis tests are always right-tailed.
[Figure: chi-square distribution with the critical value marked; "Fail to Reject" region to the left, "Reject" region in the right tail]
Note: Discuss with students that the test does not identify where the significant discrepancies are. However, the figure on the following slide helps to visualize where they might be.
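The table lookup and the right-tailed decision rule can be sketched like this. The dictionary holds standard chi-square table values for alpha = 0.05 only, and the function name `decide` and the two test statistics passed to it are hypothetical values chosen purely for illustration.

```python
# Right-tail critical values of the chi-square distribution at alpha = 0.05,
# keyed by degrees of freedom (standard table values, three decimals).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def decide(chi2, num_categories):
    """Right-tailed test: reject H0 only if chi2 exceeds the critical value."""
    df = num_categories - 1  # degrees of freedom = categories minus 1
    critical = CRITICAL_05[df]
    decision = "reject H0" if chi2 > critical else "fail to reject H0"
    return df, critical, decision


# Two made-up test statistics for a 6-category problem (df = 5).
small = decide(4.2, 6)
large = decide(15.0, 6)
print(small)
print(large)
```

Because the test is right-tailed, only a large test statistic, one beyond the critical value, leads to rejecting H0.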
Uneven Distribution Example
Mars claims that its M&M candies are distributed with the color percentages of 30% brown, 20% yellow, 20% red, 10% orange, 10% green, and 10% blue. At the 0.05 significance level, test the claim that the color distribution is as claimed by Mars, Inc.
H0 (and claim): p1 = 0.30, p2 = 0.20, p3 = 0.20, p4 = 0.10, p5 = 0.10, p6 = 0.10
H1: At least one of the proportions is different from the claimed value.
Uneven Distribution: Test Statistic and Critical Value

                          Brown  Yellow  Red  Orange  Green  Blue
Observed frequency (O)       33      26   21       8      7     5
Expected frequency (E)
(O - E)^2 / E

Test statistic:
Critical value:
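One way to work through this table is sketched below in Python, using the observed counts and the percentages claimed by Mars. The critical value 11.070 is the standard chi-square table entry for 5 degrees of freedom at alpha = 0.05. The variable names are illustrative.

```python
colors = ["Brown", "Yellow", "Red", "Orange", "Green", "Blue"]
observed = [33, 26, 21, 8, 7, 5]
claimed = [0.30, 0.20, 0.20, 0.10, 0.10, 0.10]  # proportions under H0

n = sum(observed)
expected = [n * p for p in claimed]                                  # E = n * p
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)                                            # test statistic

df = len(observed) - 1   # 6 categories - 1 = 5
critical = 11.070        # chi-square table, df = 5, alpha = 0.05

print(f"{'Color':<8}{'O':>4}{'E':>7}{'(O-E)^2/E':>11}")
for color, o, e, c in zip(colors, observed, expected, contributions):
    print(f"{color:<8}{o:>4}{e:>7.1f}{c:>11.2f}")
print(f"Test statistic: {chi2:.2f}, critical value: {critical}")
print("Reject H0" if chi2 > critical else "Fail to reject H0")
```

The test statistic works out to about 5.95, which is below 11.070, so we fail to reject H0: the sample does not give sufficient evidence against the claimed color distribution.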
Locating the test statistic
[Figure: chi-square distribution with α = 0.05 in the right tail; "Fail to Reject" region to the left of the critical value, "Reject" region to the right]
For example
Even distribution: The population of South students is evenly distributed among the four classes.
Uneven distribution: The population of South students is distributed as follows:
  15% Lincroft
  15% River Plaza
  20% Nut Swamp
  20% Navesink
  20% Leonardo
  5% Village
  5% Private
Live example

School        Count        Class       Count
Leonardo         20        Frosh          29
Lincroft         18        Sophomore      24
Navesink         19        Junior
Nut Swamp        21        Senior         23
River Plaza      10
Private           7
Village           5
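As a sketch of how the school counts could be tested, the Python below assumes these counts are meant to be compared with the uneven distribution claimed for South students (15% Lincroft, 15% River Plaza, 20% Nut Swamp, 20% Navesink, 20% Leonardo, 5% Village, 5% Private) at alpha = 0.05. That pairing, and the function name `goodness_of_fit`, are assumptions made for this example. The class counts are left out because the Junior count is not given.

```python
# Right-tail chi-square critical values at alpha = 0.05 (standard table values).
CRITICAL_05 = {3: 7.815, 5: 11.070, 6: 12.592}


def goodness_of_fit(observed, claimed_pct):
    """Return (chi2, df, critical value, reject?) for a right-tailed test at alpha = 0.05."""
    n = sum(observed.values())
    chi2 = 0.0
    for category, o in observed.items():
        e = n * claimed_pct[category] / 100   # expected count: E = n * p
        chi2 += (o - e) ** 2 / e
    df = len(observed) - 1                    # categories minus 1
    critical = CRITICAL_05[df]
    return chi2, df, critical, chi2 > critical


schools = {"Lincroft": 18, "River Plaza": 10, "Nut Swamp": 21, "Navesink": 19,
           "Leonardo": 20, "Village": 5, "Private": 7}
claimed = {"Lincroft": 15, "River Plaza": 15, "Nut Swamp": 20, "Navesink": 20,
           "Leonardo": 20, "Village": 5, "Private": 5}

chi2, df, critical, reject = goodness_of_fit(schools, claimed)
print(f"chi-square = {chi2:.3f}, df = {df}, critical value = {critical}")
print("Reject H0" if reject else "Fail to reject H0")
```

Under these assumptions the test statistic is about 3.167 with 6 degrees of freedom, well below 12.592, so the claimed distribution would not be rejected.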
Your Turn
Claim: students’ favorite month is equally distributed throughout the year

Month   Frequency
1
2
3
4
5
6
7
8
9
10
11
12
Your Turn, again
Claim: the distribution of students’ favorite colors is 25% green, 25% blue, and 10% for each of the remaining colors

Color    Frequency
red              8
green           10
blue            13
orange           2
purple           7
black
yellow           4