The Analysis of Categorical Data and Goodness of Fit Tests

Chapter 12 The Analysis of Categorical Data and Goodness of Fit Tests

Section 12.1 Chi-Square Tests for Univariate Data

M&M Candies
Suppose we wanted to determine whether the proportions of the different colors in a large bag of M&M candies match the proportions the company claims for its candies. We could record the color of each candy in the bag. This would be univariate, categorical data. How many categories for color would there be? There are six colors, so k = 6. (The letter k is used to denote the number of categories of a categorical variable.)

M&M Candies Continued . . .
We could count how many candies of each color are in the bag. A one-way frequency table is used to display the observed counts for the k categories:

Red   Blue   Green   Yellow   Orange   Brown
 23     28      21       19       22      25

A goodness-of-fit test will allow us to determine whether these observed counts are consistent with what we expect to have.

Notation
k = number of categories of a categorical variable
p1 = true proportion for category 1
p2 = true proportion for category 2
. . .
pk = true proportion for category k
(Note: p1 + p2 + . . . + pk = 1)

Hypotheses
H0: p1 = hypothesized proportion for category 1
    . . .
    pk = hypothesized proportion for category k
Ha: H0 is not true, so at least one of the true category proportions differs from the corresponding hypothesized value.

Expected Counts
For each category, the expected count is the product of the total number of observations, n, and the hypothesized proportion for that category:
expected count = n × (hypothesized proportion)

Expected Counts Example
Consider the sample of faculty from a large university system, and recall that the newsperson wanted to test whether each of the groups occurred with equal frequency.

Category               Frequency   Hypothesized Proportion   Expected Count
Full Professor             22               0.2                   30.8
Associate Professor        31               0.2                   30.8
Assistant Professor        25               0.2                   30.8
Instructor                 35               0.2                   30.8
Adjunct/Part Time          41               0.2                   30.8
Total                     154               1                    154

Each expected count is 154(0.2) = 30.8.
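The expected-count rule can be sketched in a few lines of Python. This is our own minimal illustration, not part of the original slides; the function name expected_counts is hypothetical, and the data are the faculty frequencies from the table above.

```python
import math


def expected_counts(observed, proportions):
    """Hypothetical helper: expected count = n * (hypothesized proportion)."""
    if len(observed) != len(proportions):
        raise ValueError("need one hypothesized proportion per category")
    if not math.isclose(sum(proportions), 1.0):
        raise ValueError("hypothesized proportions must sum to 1")
    n = sum(observed)  # total number of observations
    return [n * p for p in proportions]


# Faculty sample: full, associate, assistant, instructor, adjunct/part time
faculty = [22, 31, 25, 35, 41]
expected = expected_counts(faculty, [0.2] * 5)
print([round(e, 1) for e in expected])  # [30.8, 30.8, 30.8, 30.8, 30.8]
```

The check that the proportions sum to 1 mirrors the note p1 + p2 + . . . + pk = 1 in the notation slide.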

Goodness-of-Fit Test Procedure
The goodness-of-fit test is used to analyze univariate categorical data from a single sample.
Hypotheses:
H0: p1 = hypothesized proportion for Category 1
    . . .
    pk = hypothesized proportion for Category k
Ha: H0 is not true
Test Statistic: The goodness-of-fit statistic, denoted by X² (read "chi-squared"), is a quantitative measure of the extent to which the observed counts differ from those expected when H0 is true:
X² = Σ over all categories of (observed count − expected count)² / expected count
The X² value can never be negative.

Goodness-of-Fit Test Procedure Continued . . .
P-values: When H0 is true and all expected counts are at least 5, X² has approximately a chi-square distribution with df = k − 1. Therefore, the P-value associated with the computed test statistic value is the area to the right of X² under the df = k − 1 chi-square curve.
Assumptions:
Observed cell counts are based on a random sample.
The sample size is large enough as long as every expected cell count is at least 5.
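The full procedure (statistic, df = k − 1, and upper-tail area) can be sketched in Python as below. This is an illustrative sketch under our own assumptions: the names chi2_sf and goodness_of_fit are hypothetical, and instead of a table or a library such as SciPy (where scipy.stats.chi2.sf is the usual choice) it uses the closed forms of the chi-square upper-tail area that exist for integer df, so it needs only the standard library.

```python
import math


def chi2_sf(x, df):
    """Area to the right of x under the chi-square curve with integer df.

    Uses closed forms of the regularized upper incomplete gamma function
    Q(df/2, x/2), which exist when df/2 is an integer or a half-integer.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y**i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: Q(m + 1/2, y) = erfc(sqrt(y)) + sum_{i=1}^{m} y**(i - 1/2) * exp(-y) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp((i - 0.5) * math.log(y) - y - math.lgamma(i + 0.5))
    return total


def goodness_of_fit(observed, proportions):
    """Hypothetical helper returning (X2, df, P-value, large_sample)."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    large_sample = all(e >= 5 for e in expected)  # every expected count >= 5
    return x2, df, chi2_sf(x2, df), large_sample


# Faculty data from the expected-counts example, H0: all proportions 0.2
x2, df, p, ok = goodness_of_fit([22, 31, 25, 35, 41], [0.2] * 5)
print(f"X2 = {x2:.2f}, df = {df}, P-value = {p:.3f}, expected counts >= 5: {ok}")
```

For the faculty data this prints X2 = 7.56, df = 4, P-value = 0.109, which agrees with the table-based statement in the faculty example that the P-value is greater than 0.10. As a sanity check, chi2_sf(7.779, 4) and chi2_sf(2.706, 1) both come out very close to 0.10.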

Facts About Chi-Square Distributions
Different df have different curves.
Chi-square curves are skewed right.
As df increases, the chi-square curve shifts toward the right and becomes more like a normal curve.
(The accompanying figure shows chi-square curves for df = 3, df = 5, and df = 10.)

[Figure: Chi-square distributions]

[Table: Upper-tail Areas for Chi-square Distributions]

Faculty Distribution Example
Consider the newsperson's desire to determine whether the faculty of a large university system were equally distributed among the five categories. Let us test this hypothesis at a significance level of 0.05. Let p1, p2, p3, p4, and p5 denote the proportions of all faculty in this university system who are full professors, associate professors, assistant professors, instructors, and adjunct/part time, respectively.
H0: p1 = 0.2, p2 = 0.2, p3 = 0.2, p4 = 0.2, p5 = 0.2
Ha: H0 is not true

Assumptions: As we saw in an earlier slide, the expected counts were all 30.8, which is greater than 5. Although we do not know for sure how the sample was obtained, for the purposes of this example we shall assume the selection procedure generated a random sample.
Significance level: α = 0.05
Test statistic: X² = Σ (observed count − expected count)² / expected count

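Calculation:
X² = (22 − 30.8)²/30.8 + (31 − 30.8)²/30.8 + (25 − 30.8)²/30.8 + (35 − 30.8)²/30.8 + (41 − 30.8)²/30.8
   = 2.514 + 0.001 + 1.092 + 0.573 + 3.378
   = 7.56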

P-value: The P-value is based on a chi-square distribution with df = 5 − 1 = 4. The computed value of X², 7.56, is smaller than 7.77, the smallest value of X² in the table for df = 4, so the P-value is greater than 0.10.
Conclusion: Since the P-value > 0.05 = α, H0 cannot be rejected. There is insufficient evidence to refute the claim that the proportion of faculty in each of the different categories is the same.

Lunar Phases
A common urban legend is that more babies than expected are born during certain phases of the lunar cycle, especially near the full moon. The table below shows the number of days in each of the eight lunar phases and the number of births in each phase for 24 lunar cycles. There are eight phases, so k = 8.

Lunar Phase        Number of Days   Number of Births
New Moon                 24               7,680
Waxing Crescent         152              48,442
First Quarter            24               7,579
Waxing Gibbous          149              47,814
Full Moon                24               7,711
Waning Gibbous          150              47,595
Last Quarter             24               7,733
Waning Crescent         152              48,230

Lunar Phases Continued . . .
Let:
p1 = proportion of births that occur during the new moon
p2 = proportion of births that occur during the waxing crescent moon
p3 = proportion of births that occur during the first quarter moon
p4 = proportion of births that occur during the waxing gibbous moon
p5 = proportion of births that occur during the full moon
p6 = proportion of births that occur during the waning gibbous moon
p7 = proportion of births that occur during the last quarter moon
p8 = proportion of births that occur during the waning crescent moon
There is a total of 699 days in the 24 lunar cycles. If there is no relationship between the number of births and lunar phase, then the expected proportions equal the number of days in each phase out of the total number of days (for example, 24/699 ≈ .0343 for the new moon). The hypothesis statements would be:
H0: p1 = .0343, p2 = .2175, p3 = .0343, p4 = .2132, p5 = .0343, p6 = .2146, p7 = .0343, p8 = .2175
Ha: H0 is not true

Lunar Phases Continued . . .
H0: p1 = .0343, p2 = .2175, p3 = .0343, p4 = .2132, p5 = .0343, p6 = .2146, p7 = .0343, p8 = .2175
Ha: H0 is not true
There is a total of 222,784 births in the sample. If there is no relationship between the number of births and lunar phase, then the expected count for each category would equal n × (hypothesized proportion).

Lunar Phase        Observed Number of Births
New Moon                   7,680
Waxing Crescent           48,442
First Quarter              7,579
Waxing Gibbous            47,814
Full Moon                  7,711
Waning Gibbous            47,595
Last Quarter               7,733
Waning Crescent           48,230

Lunar Phases Continued . . .
H0: p1 = .0343, p2 = .2175, p3 = .0343, p4 = .2132, p5 = .0343, p6 = .2146, p7 = .0343, p8 = .2175
Ha: H0 is not true
α = .05
Test Statistic: X² = Σ (observed − expected)² / expected, with df = 8 − 1 = 7
P-value: The X² test statistic is smaller than the smallest entry in the df = 7 column of Appendix Table 8, so the P-value > .10.
Since the P-value > α, we fail to reject H0. There is not sufficient evidence to conclude that lunar phases and number of births are related.
What type of error could we have potentially made with this decision? A Type II error.
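The lunar-phase calculation can be sketched in Python as follows. This is our own illustration, not from the original slides: it derives each hypothesized proportion as days/699 without rounding, so its X² may differ slightly from a hand calculation that uses the rounded proportions .0343, .2175, and so on.

```python
# (phase, number of days, observed number of births) over 24 lunar cycles
phases = [
    ("New Moon", 24, 7680),
    ("Waxing Crescent", 152, 48442),
    ("First Quarter", 24, 7579),
    ("Waxing Gibbous", 149, 47814),
    ("Full Moon", 24, 7711),
    ("Waning Gibbous", 150, 47595),
    ("Last Quarter", 24, 7733),
    ("Waning Crescent", 152, 48230),
]

total_days = sum(days for _, days, _ in phases)   # 699 days
n = sum(births for _, _, births in phases)        # 222,784 births

x2 = 0.0
for name, days, births in phases:
    p0 = days / total_days        # hypothesized proportion under H0 (unrounded)
    expected = n * p0             # expected births if phase is unrelated to births
    x2 += (births - expected) ** 2 / expected
    print(f"{name:16s} observed {births:6d}  expected {expected:9.1f}")

df = len(phases) - 1
print(f"X2 = {x2:.2f} with df = {df}")
```

The resulting X² is about 6.3, well below 12.02 (the df = 7 chi-square value with upper-tail area 0.10), which is consistent with the conclusion that the P-value is greater than .10.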

Section 12.2 Tests for Homogeneity and Independence in a Two-way Table

Tests for Homogeneity and Independence in a Two-Way Table
Data resulting from observations made on two different categorical variables can be summarized using a tabular format. For example, consider the student data set giving information from a sample of 79 students taking elementary statistics. The table is on the next slide.

This is an example of a two-way frequency table, or contingency table. The numbers in the 6 interior cells (not including the totals) are the observed cell counts.

Marginal totals are obtained by adding the observed cell counts in each row and also in each column.

                        Contacts   Glasses   None   Row Marginal Total
Female                      5          9       11           25
Male                        5         22       27           54
Column Marginal Total      10         31       38           79

The sum of the column marginal totals (or the row marginal totals) is called the grand total.

Tests for Homogeneity in a Two-Way Table
Typically, with a two-way table used to test homogeneity, the rows indicate different populations and the columns indicate different categories, or vice versa. For a test of homogeneity, the central question is whether the category proportions are the same for all of the populations.

When the rows indicate the populations, the expected count for a cell is the overall proportion (over all populations) in that category times the number in that population. To illustrate:
10/79 = overall proportion of students using contacts
54 = total number of male students
(10/79)(54) ≈ 6.84 = expected number of males who use contacts as their primary vision correction

The expected count for each cell, which represents what would be expected if there were no difference between the groups under study, can be found easily by using the following formula:
expected cell count = (row marginal total)(column marginal total) / grand total

Calculate the expected count for each cell.

                        Contacts   Glasses   None   Row Marginal Totals
Female                      5          9       11            25
Male                        5         22       27            54
Column Marginal Totals     10         31       38            79

Expected counts:
Female:   (25)(10)/79    (25)(31)/79    (25)(38)/79
Male:     (54)(10)/79    (54)(31)/79    (54)(38)/79

Expected counts are in parentheses.

                        Contacts     Glasses       None          Row Marginal Totals
Female                   5 (3.16)     9 (9.81)     11 (12.03)           25
Male                     5 (6.84)    22 (21.19)    27 (25.97)           54
Column Marginal Totals  10           31            38                   79
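The row-total × column-total ÷ grand-total rule is easy to check in code. Below is a minimal Python sketch of our own (the helper name expected_counts is hypothetical) applied to the vision-correction table.

```python
def expected_counts(table):
    """Hypothetical helper: (row total)(column total) / grand total for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Female, Male; columns: Contacts, Glasses, None
vision = [[5, 9, 11],
          [5, 22, 27]]

for label, row in zip(["Female", "Male"], expected_counts(vision)):
    print(label, " ".join(f"{e:6.2f}" for e in row))
```

Rounded to two decimals, this reproduces the parenthesized values 3.16, 9.81, 12.03 and 6.84, 21.19, 25.97. Note that the Female/Contacts expected count is below 5, which matters when this table is tested later.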

X² Test for Homogeneity
The X² test for homogeneity is used to analyze univariate categorical data from 2 or more independent samples.
Null Hypothesis: H0: The true category proportions are the same for all the populations or treatments.
Alternative Hypothesis: Ha: The true category proportions are not all the same for all the populations or treatments.
Test Statistic: X² = Σ over all cells of (observed cell count − expected cell count)² / expected cell count

X² Test for Homogeneity Continued
Expected Counts (assuming H0 is true): expected cell count = (row marginal total)(column marginal total) / grand total
P-value: When H0 is true and all expected counts are at least 5, X² has approximately a chi-square distribution with df = (number of rows − 1)(number of columns − 1). The P-value associated with the computed test statistic value is the area to the right of X² under the appropriate chi-square curve.

X² Test for Homogeneity Continued
Assumptions: Data are from independently chosen random samples or from subjects who were assigned at random to treatment groups. The sample size is large: all expected cell counts are at least 5. If some expected counts are less than 5, rows or columns of the table may be combined to achieve a table with satisfactory expected counts.

Soccer Players
A study was conducted to determine whether collegiate soccer players had an increased risk of concussions compared with other athletes or students. The two-way frequency table below (also called a contingency table) displays the number of previous concussions for students in independently selected random samples of 91 soccer players, 96 non-soccer athletes, and 53 non-athletes. This is univariate categorical data (number of concussions) from 3 independent samples.

                              Number of Concussions
                         0      1      2      3 or more    Total
Soccer Players          45     25     11         10          91
Non-Soccer Athletes     68     15      8          5          96
Non-Athletes            45      5      3          0          53
Total                  158     45     22         15         240

The interior values are the observed counts, the values in the Total row and column are the marginal totals, and 240 is the grand total.
If there were no difference between these 3 populations with regard to the number of concussions, how many soccer players would you expect to have no concussions? We would expect (158/240)(91) ≈ 59.9.

Soccer Players Continued . . .
State the hypotheses.
H0: The proportions in each response category (number of concussions) are the same for all three groups.
Ha: The category proportions are not all the same for all three groups.
df = (number of rows − 1)(number of columns − 1) = (3 − 1)(4 − 1) = (2)(3) = 6
To find df, count the number of rows and columns, not including the totals! Another way to find df: cover one row and one column, then count the number of cells left (not including totals).

Soccer Players Continued
Expected counts are shown in parentheses next to the observed counts.

                              Number of Concussions
                         0            1            2           3 or more    Total
Soccer Players          45 (59.9)    25 (17.1)    11 (8.3)    10 (5.7)       91
Non-Soccer Athletes     68 (63.2)    15 (18.0)     8 (8.8)     5 (6.0)       96
Non-Athletes            45 (34.9)     5 (10.0)     3 (4.9)     0 (3.3)       53
Total                  158           45           22          15            240

Notice that NOT all the expected counts are at least 5. So combine the column for 2 concussions and the column for 3 or more concussions:

                              Number of Concussions
                         0            1            2 or more    Total
Soccer Players          45 (59.9)    25 (17.1)    21 (14.0)       91
Non-Soccer Athletes     68 (63.2)    15 (18.0)    13 (14.8)       96
Non-Athletes            45 (34.9)     5 (10.0)     3 (8.2)        53
Total                  158           45           37             240

This combined table has df = (2)(2) = 4.
Test Statistic: X² = Σ (observed − expected)² / expected ≈ 20.6
P-value < α = .05
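The combining step and the resulting statistic can be sketched in Python. This is our own illustration with hypothetical helper names (expected_counts, chi_square); it merges the last two columns exactly as the slide does and then recomputes X² and df.

```python
def expected_counts(table):
    """(row total)(column total) / grand total for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(table):
    """Hypothetical helper returning (X2, df) for a two-way table of observed counts."""
    expected = expected_counts(table)
    x2 = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(table, expected)
             for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df


# Rows: soccer players, non-soccer athletes, non-athletes
# Columns: 0, 1, 2, 3 or more previous concussions
concussions = [[45, 25, 11, 10],
               [68, 15, 8, 5],
               [45, 5, 3, 0]]

smallest = min(min(row) for row in expected_counts(concussions))
print(f"smallest expected count before combining: {smallest:.2f}")

# Combine the "2" and "3 or more" columns so every expected count is at least 5
combined = [row[:2] + [sum(row[2:])] for row in concussions]
x2, df = chi_square(combined)
print(f"combined table: X2 = {x2:.2f}, df = {df}")
```

The combined table gives X² of about 20.6 with df = 4, far beyond 9.49 (the df = 4 value with upper-tail area 0.05), so the P-value is well below .05.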

Soccer Players Continued . . .
Since the P-value < α, we reject H0. There is strong evidence to suggest that the category proportions for the number of concussions are not the same for the groups.
We can look at the chi-square contributions: which cells of the combined table have the greatest contributions to the value of the X² statistic? The three soccer-player cells (each contributing about 3.5 to 3.7) and the non-athlete cell for 2 or more concussions (about 3.3) have the largest contributions. Soccer players had fewer students with no concussions and more with one or more concussions than expected, while non-athletes had fewer with 2 or more concussions than expected.
Is that all I can say, that there is a difference in proportions for the groups?
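Ranking the per-cell contributions makes the "which cells matter most" question concrete. This is a short Python sketch of our own using the combined table; the labels are our shorthand for the table's rows and columns.

```python
rows = ["Soccer Players", "Non-Soccer Athletes", "Non-Athletes"]
cols = ["0", "1", "2 or more"]
observed = [[45, 25, 21],
            [68, 15, 13],
            [45, 5, 3]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand_total = sum(row_totals)

contributions = []
for i, row_label in enumerate(rows):
    for j, col_label in enumerate(cols):
        e = row_totals[i] * col_totals[j] / grand_total   # expected count under H0
        o = observed[i][j]
        contributions.append((row_label, col_label, o, e, (o - e) ** 2 / e))

# Largest contributions first
contributions.sort(key=lambda c: c[4], reverse=True)
for row_label, col_label, o, e, c in contributions[:4]:
    direction = "more" if o > e else "fewer"
    print(f"{row_label:20s} {col_label:10s} contribution {c:5.2f} ({direction} than expected)")
```

The contributions sum to the X² value for the combined table, so this is just X² broken down cell by cell; the sign of observed − expected tells the direction of each departure.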

Drug Example
The following data came from a clinical trial of a drug regimen used in treating a type of cancer, lymphocytic lymphoma. A sample of 273 patients was randomly divided into two groups, with one group of patients receiving cytoxan plus prednisone (CP) and the other receiving BCNU plus prednisone (BP). The responses to treatment were graded on a qualitative scale. The two-way table summary of the results is on the following slide.

Drug Example
Set up and perform an appropriate hypothesis test at the 0.05 level of significance.

                        Complete   Partial    No       Progression   Row Marginal
                        Response   Response   Change                 Total
BP                         26         51        21          40           138
CP                         31         59        11          34           135
Column Marginal Total      57        110        32          74           273

Drug Example
Assumptions: Patients were randomly assigned to the two treatment groups. All expected cell counts are at least 5.

                        Complete     Partial      No           Progression   Row Marginal
                        Response     Response     Change                     Total
BP                      26 (28.81)   51 (55.60)   21 (16.18)   40 (37.41)        138
CP                      31 (28.19)   59 (54.40)   11 (15.82)   34 (36.59)        135
Column Marginal Total   57           110          32           74                273

Drug Example
Hypotheses:
H0: The true response-to-treatment proportions are the same for both treatments (homogeneity of populations).
Ha: The true response-to-treatment proportions are not all the same for both treatments.
Significance level: α = 0.05
Test statistic: X² = Σ (observed − expected)² / expected

Drug Example
Calculations: X² = Σ (observed − expected)² / expected ≈ 4.60
The two-way table for this example has 2 rows and 4 columns, so the appropriate df is (2 − 1)(4 − 1) = 3. Since 4.60 < 6.25, the P-value > 0.10 > α = 0.05, so H0 is not rejected. There is insufficient evidence to conclude that the response rates are different for the two treatments.
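The drug-trial calculation can be reproduced with a short Python sketch of our own (chi_square is a hypothetical helper name). It compares X² with 6.25, the df = 3 table value the slide uses for an upper-tail area of 0.10.

```python
def chi_square(table):
    """Hypothetical helper returning (X2, df) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count under H0
            x2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df


# Columns: complete response, partial response, no change, progression
drug = [[26, 51, 21, 40],   # BP
        [31, 59, 11, 34]]   # CP

x2, df = chi_square(drug)
print(f"X2 = {x2:.2f}, df = {df}")
print("P-value > 0.10" if x2 < 6.25 else "P-value <= 0.10")
```

This gives X2 = 4.60 with df = 3, matching the hand calculation. The two largest contributions come from the "No Change" column, where BP had more and CP fewer patients than expected.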

Shopping Example A student decided to study the shoppers in Wegman’s, a local supermarket, to see if males and females exhibited the same behavior patterns with regard to the device used to carry items. He observed 57 shoppers (presumably randomly) and obtained the results that are summarized in the table on the next slide.

Shopping Example
Determine whether the carrying device proportions are the same for both genders using a 0.05 level of significance.

                        Cart   Basket   Nothing   Row Marginal Total
Male                      9      21         5            35
Female                    7       7         8            22
Column Marginal Total    16      28        13            57

Shopping Example
Using Minitab, we get the following output:
Chi-Square Test: Basket, Cart, Nothing
Expected counts are printed below observed counts
[table of observed and expected counts not reproduced]
Chi-Sq = 5.251, DF = 2, P-Value = 0.072

Shopping Example
Hypotheses:
H0: The true proportions of the device used are the same for both genders.
Ha: The true proportions of the device used are not the same for both genders.
Significance level: α = 0.05
Test statistic: X² = Σ (observed − expected)² / expected = 5.251

Shopping Example
Conclusion: P-value = 0.072 > α = 0.05
Since the P-value > α, we fail to reject H0. There is insufficient evidence to support a claim that males and females differ in their proportionate use of carrying devices at Wegman's supermarket.
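The Minitab numbers can be checked without statistical software. This sketch is our own: it computes X² from the table and uses the fact that, for df = 2, the chi-square upper-tail area has the simple closed form e^(−x/2).

```python
import math

# Columns: cart, basket, nothing
shoppers = [[9, 21, 5],   # male
            [7, 7, 8]]    # female

row_totals = [sum(r) for r in shoppers]
col_totals = [sum(c) for c in zip(*shoppers)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

x2 = sum((o - e) ** 2 / e
         for obs_row, exp_row in zip(shoppers, expected)
         for o, e in zip(obs_row, exp_row))
df = (len(shoppers) - 1) * (len(shoppers[0]) - 1)
# For df = 2 the chi-square upper-tail area is exactly exp(-x / 2)
p_value = math.exp(-x2 / 2)
print(f"X2 = {x2:.3f}, df = {df}, P-value = {p_value:.3f}")
```

This prints X2 = 5.251, df = 2, P-value = 0.072, the same values Minitab reports. The smallest expected count (Female/Nothing) is about 5.02, so the large-sample condition is just met.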

X² Test for Independence
The X² test for independence is used to analyze bivariate categorical data from a single sample.
Null Hypothesis: H0: The two variables are independent.
Alternative Hypothesis: Ha: The two variables are not independent.
Test Statistic: X² = Σ over all cells of (observed cell count − expected cell count)² / expected cell count

X² Test for Independence Continued
Expected Counts (assuming H0 is true): expected cell count = (row marginal total)(column marginal total) / grand total
P-value: When H0 is true and the assumptions for the X² test are satisfied, X² has approximately a chi-square distribution with df = (number of rows − 1)(number of columns − 1). The P-value associated with the computed test statistic value is the area to the right of X² under the appropriate chi-square curve.

X² Test for Independence Continued
Assumptions: The observed counts are based on data from a random sample. The sample size is large: all expected cell counts are at least 5. If some expected counts are less than 5, rows or columns of the table may be combined to achieve a table with satisfactory expected counts.

Body Art
The paper "Contemporary College Students and Body Piercing" (Journal of Adolescent Health, 2004) described a survey of 450 undergraduate students at a state university in the southwestern region of the United States. Each student in the sample was classified according to class standing (freshman, sophomore, junior, senior) and body art category (body piercing only, tattoos only, both tattoos and body piercing, no body art). Is there evidence of an association between class standing and response to the body art question? Use α = .01.

             Body Piercing   Tattoos   Both Body Piercing   No Body
             Only            Only      and Tattoos          Art
Freshman         61              7           14                86
Sophomore        43             11           10                64
Junior           20              9            7                43
Senior           21             17           23                54

State the hypotheses.

Body Art Continued
H0: Class standing and body art category are independent.
Ha: Class standing and body art category are not independent.
Assuming H0 is true, what are the expected counts? How many degrees of freedom does this two-way table have?

             Body Piercing   Tattoos     Both Body Piercing   No Body
             Only            Only        and Tattoos          Art
Freshman     61 (49.7)        7 (15.1)   14 (18.5)            86 (84.7)
Sophomore    43 (37.9)       11 (11.5)   10 (14.1)            64 (64.5)
Junior       20 (23.4)        9 (7.1)     7 (8.7)             43 (39.8)
Senior       21 (34.0)       17 (10.3)   23 (12.7)            54 (58.0)

df = (4 − 1)(4 − 1) = 9

Body Art Continued
Test Statistic: X² = Σ (observed − expected)² / expected ≈ 25.5, with df = 9
P-value < α = .01

Body Art Continued
Since the P-value < α, we reject H0. There is sufficient evidence to suggest that class standing and body art category are associated.
Which cell contributes the most to the X² test statistic? The cell for seniors having both body piercing and tattoos (observed 23, expected 12.7) contributes the most to the X² statistic.

Vision Correction Example
Consider the two categorical variables, gender and principal form of vision correction, for the sample of students used earlier in this presentation.

                        Contacts   Glasses   None   Row Marginal Total
Female                      5          9       11           25
Male                        5         22       27           54
Column Marginal Total      10         31       38           79

We shall now test whether gender and principal form of vision correction are independent.

Assumptions: The sample of students was randomly and independently chosen. Are all expected cell counts at least 5?

                        Contacts     Glasses       None          Row Marginal Total
Female                   5 (3.16)     9 (9.81)     11 (12.03)           25
Male                     5 (6.84)    22 (21.19)    27 (25.97)           54
Column Marginal Total   10           31            38                   79

Assumptions: Notice that the expected count is less than 5 in the cell corresponding to Female and Contacts. So we should combine the columns for Contacts and Glasses to get:

                        Contacts or Glasses   None          Row Marginal Total
Female                  14 (12.97)            11 (12.03)           25
Male                    27 (28.03)            27 (25.97)           54
Column Marginal Total   41                    38                   79

Hypotheses:
H0: Gender and principal method of vision correction are independent.
Ha: Gender and principal method of vision correction are not independent.
Significance level: We have not chosen one, so we shall consider whether the result would be significant at any practical significance level.
Test statistic: X² = Σ (observed − expected)² / expected

Calculations: X² = Σ (observed − expected)² / expected ≈ 0.246
The contingency table for this example has 2 rows and 2 columns, so the appropriate df is (2 − 1)(2 − 1) = 1. Since 0.246 < 2.70, the P-value is substantially greater than 0.10, and H0 would not be rejected for any reasonable significance level. There is not sufficient evidence to conclude that gender and vision correction are related. (That is, for all practical purposes, one would find it reasonable to assume that gender and principal form of vision correction are independent.)

Minitab would provide the following output if the frequency table was input as shown:
Chi-Square Test: Contacts or Glasses, None
Expected counts are printed below observed counts
[table of observed and expected counts not reproduced]
Chi-Sq = 0.246, DF = 1, P-Value = 0.620
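As with the shopping example, the Minitab values can be checked with the standard library alone. In this sketch of our own, the P-value uses the fact that for df = 1 the statistic behaves like the square of a standard normal variable, so the upper-tail area is erfc(√(x/2)).

```python
import math

# Columns: contacts or glasses, none
vision = [[14, 11],   # female
          [27, 27]]   # male

row_totals = [sum(r) for r in vision]
col_totals = [sum(c) for c in zip(*vision)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

x2 = sum((o - e) ** 2 / e
         for obs_row, exp_row in zip(vision, expected)
         for o, e in zip(obs_row, exp_row))
# For df = 1, X2 is like Z**2 for a standard normal Z, so the upper-tail
# area is P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(x2 / 2))
print(f"X2 = {x2:.3f}, df = 1, P-value = {p_value:.3f}")
```

This prints X2 = 0.246, df = 1, P-value = 0.620, agreeing with the Minitab output.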
