Copyright © 2010, 2007, 2004 Pearson Education, Inc Chapter 11 Goodness of Fit Test (section 11.2)
Preview We focus on the analysis of categorical (qualitative or attribute) data that can be separated into different categories (often called cells). Hypothesis test: the observed counts agree with some claimed distribution. The test uses the χ² (chi-square) distribution.
Example
There are four blood types: A, B, AB, and O. A sample of 100 patients had their blood type identified. Determine what percentage of people have each blood type.
Blood type   Observed frequency
A            42
B            9
AB           6
O            43
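As a minimal sketch of this calculation (the variable names are illustrative and not from the slides), each percentage is the category count divided by the sample size:

```python
# Observed blood-type counts from the sample of 100 patients.
observed = {"A": 42, "B": 9, "AB": 6, "O": 43}

n = sum(observed.values())  # total sample size

# Percentage of the sample that falls in each category.
percentages = {blood_type: 100 * count / n for blood_type, count in observed.items()}

for blood_type, pct in percentages.items():
    print(f"{blood_type}: {pct:.1f}%")
```

Because the sample size here is exactly 100, each percentage equals the corresponding count.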
Definition A goodness-of-fit test is used to test the hypothesis that an observed frequency distribution fits (or conforms to) some claimed distribution.
Example The claimed frequency distribution of blood types is as follows: 40% type A, 11% type B, 4% type AB, 45% type O.
Source:
Goodness-of-Fit Test: How it works We compare each category's observed frequency with the frequency we would expect under the claimed distribution. To determine whether the fit is good, we look at how much the two differ. In essence, the test uses the relative variance between the sample and the claimed distribution: if the claimed distribution fits, this variance should be small.
Goodness-of-Fit Test: Requirements
1. The data are randomly selected.
2. The sample data consist of frequency counts for each of the different categories.
3. For each category, the expected frequency is at least 5.
Goodness-of-Fit Test: Notation
O: observed frequency of an outcome
E: expected frequency of an outcome
k: number of different categories (or outcomes)
n: total number of trials
p_1, …, p_k: relative frequencies of each category under the claimed distribution
Goodness-of-Fit Test Statistic
$\chi^2 = \sum \frac{(O - E)^2}{E}$
This is the relative variance between the expected frequencies and the observed frequencies. It follows a χ² distribution with k − 1 degrees of freedom.
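The statistic can be sketched in a few lines of plain Python. The function name `chi_square_statistic` is an illustrative choice, and the data are the blood-type counts and claimed percentages from the earlier slides (with n = 100, the expected counts are 40, 11, 4, and 45):

```python
def chi_square_statistic(observed, expected):
    """Return the sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Blood-type sample, in the order A, B, AB, O.
observed = [42, 9, 6, 43]
# Expected counts under the claimed 40%, 11%, 4%, 45% with n = 100.
expected = [40, 11, 4, 45]

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1  # k - 1 degrees of freedom
print(f"chi-square = {chi2:.3f}, df = {df}")
```

A small value means the observed counts sit close to the expected ones. A large value means they differ by more than the claimed distribution would suggest.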
Expected Frequencies
If expected frequencies are not all equal: $E = np$ for each category, where p is the claimed relative frequency of that category.
If expected frequencies are all equal: $E = n/k$.
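Both cases can be sketched with one small helper. The function name and its parameters are hypothetical, chosen only for illustration. The first call uses the claimed blood-type proportions, and the second uses a sample size of 100 picked purely as an example:

```python
def expected_frequencies(n, proportions=None, k=None):
    """Expected counts: n * p_i for claimed proportions, or n / k when all are equal."""
    if proportions is not None:
        return [n * p for p in proportions]
    if k is None:
        raise ValueError("provide either proportions or k")
    return [n / k] * k

# Unequal case: claimed blood-type proportions (A, B, AB, O) with n = 100.
unequal = expected_frequencies(100, proportions=[0.40, 0.11, 0.04, 0.45])
print(unequal)

# Equal case: 10 equally likely categories with n = 100 (illustrative n).
equal = expected_frequencies(100, k=10)
print(equal)
```

In either case the expected counts add up to n, which is a quick sanity check before computing the test statistic.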
Goodness-of-Fit Test: Step-by-step guide
1. Write what you know
2. State the claim
3. State the null and alternative hypotheses (H0 and H1)
4. Draw a diagram (Note: only use a right-tailed test for Goodness-of-Fit)
5. Calculate the test statistic and critical value
6. Make an initial conclusion (reject or fail to reject H0)
7. Make a final conclusion
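Steps 5 and 6 can be sketched as a lookup and a comparison. This is an illustration under assumptions: the significance level α = 0.05 is assumed (the slides do not fix one), the critical values are the standard right-tail χ² table entries for that level, and the function name and the two sample statistics are hypothetical.

```python
# Right-tail critical values of the chi-square distribution at alpha = 0.05,
# keyed by degrees of freedom (standard table values).
CRITICAL_VALUES_05 = {
    1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
    6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307,
}

def initial_conclusion(chi2, k):
    """Compare the test statistic with the critical value for k - 1 degrees of freedom."""
    df = k - 1
    critical = CRITICAL_VALUES_05[df]
    # Right-tailed test: only large values of chi-square count against H0.
    if chi2 > critical:
        return "reject H0"
    return "fail to reject H0"

print(initial_conclusion(chi2=10.2, k=4))  # hypothetical statistic
print(initial_conclusion(chi2=2.5, k=4))   # hypothetical statistic
```

The test is right-tailed because the statistic is a sum of squared differences: close agreement pushes it toward 0, so only large values are evidence against the claimed distribution.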
Figure 11-2
Example 1
There are four blood types: A, B, AB, and O. A sample of 100 patients had their blood type identified. Determine whether the claimed frequency distribution fits these data.
Blood type   Observed frequency   Claimed frequency
A            42                   40%
B            9                    11%
AB           6                    4%
O            43                   45%
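Worked by hand, and assuming a significance level of α = 0.05 (the slides do not state one), the expected counts are E = np = 40, 11, 4, and 45, and the test statistic is:

```latex
\chi^2 = \frac{(42-40)^2}{40} + \frac{(9-11)^2}{11} + \frac{(6-4)^2}{4} + \frac{(43-45)^2}{45}
       = 0.100 + 0.364 + 1.000 + 0.089 \approx 1.553
```

With k − 1 = 3 degrees of freedom, the right-tail critical value at α = 0.05 is 7.815. Since 1.553 < 7.815, we fail to reject H0: there is not sufficient evidence that the sample departs from the claimed distribution. Note that the expected count for type AB is 4, slightly below the minimum of 5 listed in the requirements, so this result is best read as an illustration.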
Example 1: Using StatCrunch (see video tutorial for details)
Example 1 [Figure: chart with categories A, B, AB, O]
Example 2 To check whether collected data have been accurately recorded and not made up, one can check whether the last digit of each value (between 0 and 9) is uniformly distributed. A set of weights was recorded, and the frequency of each last digit was calculated. Test whether these data are uniformly distributed.
Example 2: Using StatCrunch
Last digit   Frequency
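The same test can be sketched in plain Python. The digit counts below are HYPOTHETICAL placeholders, not the slide's data, and the significance level α = 0.05 is likewise an assumption. Under the uniform claim, every digit has the same expected frequency n/k.

```python
# HYPOTHETICAL last-digit counts for digits 0..9 (placeholder values, not the slide's data).
observed = [12, 8, 10, 9, 11, 10, 13, 7, 9, 11]

n = sum(observed)   # total number of recorded weights
k = len(observed)   # ten possible last digits
expected = n / k    # uniform claim: every digit is equally likely

chi2 = sum((o - expected) ** 2 / expected for o in observed)
df = k - 1

# Right-tail critical value for df = 9 at alpha = 0.05 (assumed significance level).
critical = 16.919

print(f"chi-square = {chi2:.3f}, df = {df}")
print("reject H0" if chi2 > critical else "fail to reject H0")
```

With these placeholder counts the statistic falls well below the critical value, so the uniform claim would not be rejected. Real data with a heavily over-represented digit would give a much larger statistic.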
Example
Last digit   Frequency