The Analysis of Categorical Data and Goodness of Fit Tests

Presentation transcript:

Please pick up an M&M activity sheet, form a group of 2-3, and choose a bag of M&Ms.

Chapter 12: The Analysis of Categorical Data and Goodness of Fit Tests

Suppose we wanted to determine whether the proportions of the different colors in a large bag of M&M candies match the proportions that the company claims are in its candies. We could record the color of each candy in the bag. This would be univariate, categorical data. How many categories for color would there be? There are six colors, so k = 6, where k denotes the number of categories for a categorical variable.

M&M Candies Continued . . .
We could count how many candies of each color are in the bag. A one-way frequency table is used to display the observed counts for the k categories.

Color    Red    Blue    Green    Yellow    Orange    Brown
Count    23     28      21       19        22        25

A goodness-of-fit test will allow us to determine whether these observed counts are consistent with what we expect to have.

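Goodness-of-Fit Test Procedure
The goodness-of-fit test is used to analyze univariate categorical data from a single sample.
Null Hypothesis: H0: p1 = hypothesized proportion for Category 1, . . ., pk = hypothesized proportion for Category k
Alternative Hypothesis: Ha: H0 is not true
Test Statistic: The goodness-of-fit statistic, denoted by X² (read "chi-squared"), is a quantitative measure of the extent to which the observed counts differ from those expected when H0 is true:
$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
The expected count for a category is the sample size n times the hypothesized proportion for that category. The X² value can never be negative.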
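The statistic above can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and because the transcript does not give the company's claimed color proportions, the example assumes equal proportions (1/6 each) purely to have numbers to plug in.

```python
def goodness_of_fit_statistic(observed, proportions):
    """Return (X2, expected counts) for a one-way frequency table."""
    n = sum(observed)
    # Expected count = sample size times hypothesized proportion.
    expected = [n * p for p in proportions]
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return x2, expected


# Observed M&M counts: Red, Blue, Green, Yellow, Orange, Brown.
observed = [23, 28, 21, 19, 22, 25]
# ASSUMPTION: equal proportions, for illustration only (not the company's claim).
proportions = [1 / 6] * 6

x2, expected = goodness_of_fit_statistic(observed, proportions)
print("expected counts:", [round(e, 1) for e in expected])
print(f"X2 = {x2:.3f}, df = {len(observed) - 1}")
```

Every term in the sum is a squared difference divided by a positive expected count, which is why the statistic can never be negative.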

Goodness-of-Fit Test Procedure Continued . . .
P-values: When H0 is true and all expected counts are at least 5, X² has approximately a chi-square distribution with df = k – 1. Therefore, the P-value associated with the computed test statistic value is the area to the right of X² under the df = k – 1 chi-square curve.
Assumptions: (1) Observed cell counts are based on a random sample. (2) The sample size is large enough, which holds as long as every expected cell count is at least 5.
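In class the right-tail area is read from a chi-square table, but it can also be computed directly. The sketch below uses only the standard library and the closed-form expressions for the upper tail of a chi-square distribution with whole-number df; the function name is hypothetical and not something from the slides.

```python
import math


def chi_square_right_tail(x, df):
    """Area to the right of x under the chi-square curve with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^i / i! for i = 0 .. df/2 - 1.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite sum of x^(j-1) / (1*3*...*(2j-1)).
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    tail = math.erfc(math.sqrt(half))
    return tail + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


# 3.84 is the familiar 5% cutoff for df = 1, so this prints a value near .05.
print(round(chi_square_right_tail(3.84, 1), 3))
```

A small P-value means the observed counts would be unusual if H0 were true; a large one means they are consistent with H0.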

Facts About χ² Distributions
Different df have different curves.
χ² curves are skewed right.
As df increases, the χ² curve shifts toward the right and becomes more like a normal curve.
[Figure: χ² density curves for df = 3, df = 5, and df = 10]
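These facts can be checked numerically using standard properties of the chi-square distribution that are not stated on the slides: its mean is df, its standard deviation is the square root of 2·df, and its skewness is the square root of 8/df. The helper name below is hypothetical.

```python
import math


def chi_square_shape(df):
    """Return (mean, standard deviation, skewness) of a chi-square curve."""
    return df, math.sqrt(2 * df), math.sqrt(8 / df)


for df in (3, 5, 10):
    mean, sd, skew = chi_square_shape(df)
    print(f"df = {df:2d}: mean = {mean}, sd = {sd:.2f}, skewness = {skew:.2f}")
```

The mean grows with df (the curve shifts right) while the skewness shrinks toward 0 (the curve looks more like a normal curve).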

A common urban legend is that more babies than expected are born during certain phases of the lunar cycle, especially near the full moon. The table below shows the number of days in the eight lunar phases with the number of births in each phase for 24 lunar cycles. There are eight phases, so k = 8.

Lunar Phase        Number of Days    Number of Births
New Moon           24                7,680
Waxing Crescent    152               48,442
First Quarter      24                7,579
Waxing Gibbous     149               47,814
Full Moon          24                7,711
Waning Gibbous     150               47,595
Last Quarter       24                7,733
Waning Crescent    152               48,230

Lunar Phases Continued . . .
There is a total of 699 days in the 24 lunar cycles. If there is no relationship between the number of births and lunar phase, then the expected proportions equal the number of days in each phase out of the total number of days. For New Moon, the proportion of days is 24/699 = .0343, and the expected number of births is .0343 × 222,784 = 7641.49, where 222,784 is the total number of births.

Lunar Phase        Number of Days    Number of Births    Proportion of Days    Expected Number of Births
New Moon           24                7,680               .0343                 7641.49
Waxing Crescent    152               48,442              .217                  48433.24
First Quarter      24                7,579               .0343                 7641.49
Waxing Gibbous     149               47,814              .213                  47452.27
Full Moon          24                7,711               .0343                 7641.49
Waning Gibbous     150               47,595              .215                  47809.44
Last Quarter       24                7,733               .0343                 7641.49
Waning Crescent    152               48,230              .217                  48433.24
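The expected births can be recomputed from the day counts. The sketch below uses unrounded proportions, so its values differ a little from the slide's, which appear to round each proportion first (for example, .0343 × 222,784 = 7641.49, whereas 24/699 × 222,784 is about 7649.24). The variable names are illustrative.

```python
# Days and births per lunar phase, in the order listed in the table.
days = {
    "New Moon": 24, "Waxing Crescent": 152, "First Quarter": 24,
    "Waxing Gibbous": 149, "Full Moon": 24, "Waning Gibbous": 150,
    "Last Quarter": 24, "Waning Crescent": 152,
}
births = {
    "New Moon": 7680, "Waxing Crescent": 48442, "First Quarter": 7579,
    "Waxing Gibbous": 47814, "Full Moon": 7711, "Waning Gibbous": 47595,
    "Last Quarter": 7733, "Waning Crescent": 48230,
}

total_days = sum(days.values())
total_births = sum(births.values())

# Under H0, each phase gets births in proportion to its share of the days.
expected = {phase: total_births * d / total_days for phase, d in days.items()}

for phase, e in expected.items():
    print(f"{phase:16s} proportion = {days[phase] / total_days:.4f}  expected = {e:.2f}")
```

The expected counts always add up to the total number of births, which makes a handy check on the arithmetic.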

Lunar Phases Continued . . .
H0: p1 = .0343, p2 = .2175, p3 = .0343, p4 = .2132, p5 = .0343, p6 = .2146, p7 = .0343, p8 = .2175
Ha: H0 is not true
α = .05
Test Statistic: X² is the sum of (observed count – expected count)² / expected count over the eight phases, with df = 8 – 1 = 7.
P-value: The X² test statistic is smaller than the smallest entry in the df = 7 column of Appendix Table 8, so P-value > .10.
Since the P-value > α, we fail to reject H0. There is not sufficient evidence to conclude that lunar phases and number of births are related.
What type of error could we have potentially made with this decision? Type II
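A rough check of this test in Python, using unrounded proportions (so the statistic, roughly 6.3 here, may differ slightly from a value computed with the rounded proportions in H0). The cutoff 12.02 is the usual upper-tail .10 critical value for df = 7 from a standard chi-square table; treating it as the smallest entry in Appendix Table 8 is an assumption.

```python
# Days and births per lunar phase, in table order.
days = [24, 152, 24, 149, 24, 150, 24, 152]
births = [7680, 48442, 7579, 47814, 7711, 47595, 7733, 48230]

total_days = sum(days)
n = sum(births)

# Expected births under H0: n times each phase's share of the days.
expected = [n * d / total_days for d in days]
x2 = sum((o - e) ** 2 / e for o, e in zip(births, expected))
df = len(days) - 1

# ASSUMPTION: standard-table critical value for right-tail area .10, df = 7.
CRITICAL_10_DF7 = 12.02
p_value_above_10 = x2 < CRITICAL_10_DF7

print(f"X2 = {x2:.2f}, df = {df}")
print("P-value > .10" if p_value_above_10 else "P-value <= .10")
```

Because the statistic falls below the cutoff, the right-tail area exceeds .10, which is larger than α = .05, so H0 is not rejected.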

Get your quizzes and homework from your folder. Have your practice test out.

A study was conducted to determine if collegiate soccer players had an increased risk of concussions over other athletes or students. The two-way frequency table below (also called a contingency table) displays the number of previous concussions for students in independently selected random samples of 91 soccer players, 96 non-soccer athletes, and 53 non-athletes.

                      Number of Concussions
                      0      1      2      3 or more    Total
Soccer Players        45     25     11     10           91
Non-Soccer Players    68     15     8      5            96
Non-Athletes          45     5      3      0            53
Total                 158    45     22     15           240

The values in the body of the table are the observed counts. The row and column totals are the marginal totals, and 240 is the grand total. This is univariate categorical data (number of concussions) from 3 independent samples.
If there were no difference between these 3 populations in regard to the number of concussions, how many soccer players would you expect to have no concussions? We would expect (158/240)(91) ≈ 59.9.

X² Test for Homogeneity
The χ² Test for Homogeneity is used to analyze univariate categorical data from 2 or more independent samples.
Null Hypothesis: H0: the true category proportions are the same for all the populations or treatments
Alternative Hypothesis: Ha: the true category proportions are not all the same for all the populations or treatments
Test Statistic:
$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

X² Test for Homogeneity Continued . . .
Expected Counts (assuming H0 is true):
$$\text{expected cell count} = \frac{(\text{row marginal total})(\text{column marginal total})}{\text{grand total}}$$
P-value: When H0 is true and all expected counts are at least 5, X² has approximately a chi-square distribution with df = (number of rows – 1)(number of columns – 1). The P-value associated with the computed test statistic value is the area to the right of X² under the appropriate chi-square curve.
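The expected-count rule (row total times column total, divided by the grand total) is easy to sketch for the concussion data. The function name is hypothetical, and the counts are the observed values from the soccer table.

```python
def expected_counts(table):
    """Expected counts for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: soccer players, non-soccer athletes, non-athletes.
# Columns: 0, 1, 2, 3 or more concussions.
observed = [
    [45, 25, 11, 10],
    [68, 15, 8, 5],
    [45, 5, 3, 0],
]

expected = expected_counts(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

for row in expected:
    print([round(e, 1) for e in row])
print("df =", df)
```

The first entry, (91)(158)/240 ≈ 59.9, is the expected number of soccer players with no concussions.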

X² Test for Homogeneity Continued . . .
Assumptions: (1) Data are from independently chosen random samples or from subjects who were assigned at random to treatment groups. (2) The sample size is large: all expected cell counts are at least 5. If some expected counts are less than 5, rows or columns of the table may be combined to achieve a table with satisfactory expected counts.

Soccer Players Continued . . .
State the hypotheses for the concussion data (3 groups by 4 concussion categories).
H0: Proportions in each response category (number of concussions) are the same for all three groups
Ha: Category proportions are not all the same for all three groups
df = (number of rows – 1)(number of columns – 1) = (2)(3) = 6
To find df, count the number of rows and columns, not including the totals! Another way to find df: cover one row and one column, then count the number of cells left (not including totals).

Soccer Players Continued . . .
Expected counts are shown in parentheses next to the observed counts.

                      Number of Concussions
                      0            1            2           3 or more    Total
Soccer Players        45 (59.9)    25 (17.1)    11 (8.3)    10 (5.7)     91
Non-Soccer Players    68 (63.2)    15 (18.0)    8 (8.8)     5 (6.0)      96
Non-Athletes          45 (34.9)    5 (10.0)     3 (4.9)     0 (3.3)      53
Total                 158          45           22          15           240

Notice that NOT all the expected counts are at least 5. So combine the column for 2 concussions and the column for 3 or more concussions.

                      Number of Concussions
                      0            1            2 or more    Total
Soccer Players        45 (59.9)    25 (17.1)    21 (14.0)    91
Non-Soccer Players    68 (63.2)    15 (18.0)    13 (14.8)    96
Non-Athletes          45 (34.9)    5 (10.0)     3 (8.2)      53
Total                 158          45           37           240

This combined table has df = (2)(2) = 4.
Test Statistic: X² is the sum of (observed count – expected count)² / expected count over the 9 cells of the combined table.
P-value < .001
α = .05
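Combining the last two columns and recomputing the statistic can be sketched as follows. The helper names are illustrative, and 18.47 is the usual upper-tail .001 critical value for df = 4 from a standard chi-square table (assumed here, not taken from the slides).

```python
def expected_counts(table):
    """Expected counts: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def combine_last_columns(table, k):
    """Merge the last k columns of each row into a single column."""
    return [row[:-k] + [sum(row[-k:])] for row in table]


observed = [
    [45, 25, 11, 10],   # soccer players
    [68, 15, 8, 5],     # non-soccer athletes
    [45, 5, 3, 0],      # non-athletes
]

# Some expected counts in the original table fall below 5.
small_cells = [e for row in expected_counts(observed) for e in row if e < 5]

combined = combine_last_columns(observed, 2)
expected = expected_counts(combined)
x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(combined, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(combined) - 1) * (len(combined[0]) - 1)

# ASSUMPTION: standard-table critical value for right-tail area .001, df = 4.
CRITICAL_001_DF4 = 18.47
print(f"X2 = {x2:.2f}, df = {df}, P-value < .001: {x2 > CRITICAL_001_DF4}")
```

After combining, every expected count is at least 5, so the chi-square approximation is reasonable.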

Soccer Players Continued . . .

                      Number of Concussions
                      0            1            2 or more    Total
Soccer Players        45 (59.9)    25 (17.1)    21 (14.0)    91
Non-Soccer Players    68 (63.2)    15 (18.0)    13 (14.8)    96
Non-Athletes          45 (34.9)    5 (10.0)     3 (8.2)      53
Total                 158          45           37           240

Since the P-value < α, we reject H0. There is strong evidence to suggest that the category proportions for the number of concussions are not the same for the 3 groups.
Is that all I can say, that there is a difference in proportions for the groups? We can look at the chi-square contributions: which of the cells above have the greatest contributions to the value of the X² statistic? The slide marks the cells that had the largest contributions to the X² test statistic.
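Each cell's contribution is its own (observed – expected)² / expected term. A minimal sketch for the combined concussion table, with illustrative labels and names, ranks the cells from largest to smallest contribution.

```python
rows = ["Soccer Players", "Non-Soccer Players", "Non-Athletes"]
cols = ["0", "1", "2 or more"]
observed = [
    [45, 25, 21],
    [68, 15, 13],
    [45, 5, 3],
]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand_total = sum(row_totals)

# Contribution of each cell to the X2 statistic.
contributions = {}
for i, row_name in enumerate(rows):
    for j, col_name in enumerate(cols):
        e = row_totals[i] * col_totals[j] / grand_total
        contributions[(row_name, col_name)] = (observed[i][j] - e) ** 2 / e

ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
for (row_name, col_name), value in ranked:
    print(f"{row_name:20s} {col_name:10s} {value:.2f}")
```

With these counts, the largest terms come from the soccer players and the non-athletes, while the non-soccer athletes sit close to their expected counts.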

X² Test for Independence
The χ² Test for Independence is used to analyze bivariate categorical data from a single sample.
Null Hypothesis: H0: The two variables are independent
Alternative Hypothesis: Ha: The two variables are not independent
Test Statistic:
$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

X² Test for Independence Continued . . .
Expected Counts (assuming H0 is true):
$$\text{expected cell count} = \frac{(\text{row marginal total})(\text{column marginal total})}{\text{grand total}}$$
P-value: When H0 is true and the assumptions for the X² test are satisfied, X² has approximately a chi-square distribution with df = (number of rows – 1)(number of columns – 1). The P-value associated with the computed test statistic value is the area to the right of X² under the appropriate chi-square curve.

X² Test for Independence Continued . . .
Assumptions: (1) The observed counts are based on data from a random sample. (2) The sample size is large: all expected cell counts are at least 5. If some expected counts are less than 5, rows or columns of the table may be combined to achieve a table with satisfactory expected counts.

The paper “Contemporary College Students and Body Piercing” (Journal of Adolescent Health, 2004) described a survey of 450 undergraduate students at a state university in the southwestern region of the United States. Each student in the sample was classified according to class standing (freshman, sophomore, junior, senior) and body art category (body piercing only, tattoos only, both tattoos and body piercing, no body art). Is there evidence that there is an association between class standing and response to the body art question? Use α = .01. State the hypotheses.

             Body Piercing Only    Tattoos Only    Both Body Piercing and Tattoos    No Body Art
Freshman     61                    7               14                                86
Sophomore    43                    11              10                                64
Junior       20                    9               7                                 43
Senior       21                    17              23                                54

Body Art Continued . . .
H0: class standing and body art category are independent
Ha: class standing and body art category are not independent
Assuming H0 is true, what are the expected counts? They are shown in parentheses next to the observed counts.

             Body Piercing Only    Tattoos Only    Both Body Piercing and Tattoos    No Body Art
Freshman     61 (49.7)             7 (15.1)        14 (18.5)                         86 (84.7)
Sophomore    43 (37.9)             11 (11.5)       10 (14.1)                         64 (64.5)
Junior       20 (23.4)             9 (7.1)         7 (8.7)                           43 (39.8)
Senior       21 (34.0)             17 (10.3)       23 (12.7)                         54 (58.0)

How many degrees of freedom does this two-way table have? df = (4 – 1)(4 – 1) = 9
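The expected counts and df for the body art table follow the same row-total-times-column-total rule. The sketch below uses illustrative names; note that the counts as transcribed sum to 490, and that total is what reproduces the expected counts shown in parentheses.

```python
# Rows: Freshman, Sophomore, Junior, Senior.
# Columns: piercing only, tattoos only, both, no body art.
observed = [
    [61, 7, 14, 86],
    [43, 11, 10, 64],
    [20, 9, 7, 43],
    [21, 17, 23, 54],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)   # sum of the transcribed counts

# Expected count = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
df = (len(observed) - 1) * (len(observed[0]) - 1)

for row in expected:
    print([round(e, 1) for e in row])
print("df =", df)
```

All 16 expected counts are at least 5, so no rows or columns need to be combined.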

Body Art Continued . . .
Test Statistic: X² is the sum of (observed count – expected count)² / expected count over the 16 cells of the table, with df = 9.
P-value < .001
α = .01

Body Art Continued . . .
Since the P-value < α, we reject H0. There is sufficient evidence to suggest that class standing and the body art category are associated.
Which cell contributes the most to the X² test statistic? Seniors having both body piercing and tattoos contribute the most to the X² statistic.
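A quick sketch of the whole body art test, including the search for the largest contribution. Names are illustrative, and 27.88 is the usual upper-tail .001 critical value for df = 9 from a standard chi-square table (an assumption, not a value from the slides).

```python
standings = ["Freshman", "Sophomore", "Junior", "Senior"]
categories = ["Body Piercing Only", "Tattoos Only",
              "Both Body Piercing and Tattoos", "No Body Art"]
observed = [
    [61, 7, 14, 86],
    [43, 11, 10, 64],
    [20, 9, 7, 43],
    [21, 17, 23, 54],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Per-cell contributions (observed - expected)^2 / expected.
contributions = {}
for i, standing in enumerate(standings):
    for j, category in enumerate(categories):
        e = row_totals[i] * col_totals[j] / grand_total
        contributions[(standing, category)] = (observed[i][j] - e) ** 2 / e

x2 = sum(contributions.values())
largest_cell = max(contributions, key=contributions.get)

# ASSUMPTION: standard-table critical value for right-tail area .001, df = 9.
CRITICAL_001_DF9 = 27.88
print(f"X2 = {x2:.2f}, P-value < .001: {x2 > CRITICAL_001_DF9}")
print("largest contribution:", largest_cell)
```

The largest term belongs to seniors with both body piercing and tattoos, where the observed count (23) is well above the expected count (12.7).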