1 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 12 The Analysis of Categorical Data and Goodness-of-Fit Tests.

Presentation transcript:

1 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 12 The Analysis of Categorical Data and Goodness-of-Fit Tests

2 Univariate Categorical Data: Univariate categorical data are best summarized in a one-way frequency table. For example, consider the following sample of faculty status for faculty in a large university system.

3 Univariate Categorical Data: A local newsperson might be interested in testing hypotheses about the proportion of the population that falls in each of the categories. For example, the newsperson might want to test whether the five categories occur with equal frequency throughout the whole university system. To deal with this type of question we need to establish some notation.

4 Notation: k = number of categories of a categorical variable; $\pi_1$ = true proportion for category 1; $\pi_2$ = true proportion for category 2; ...; $\pi_k$ = true proportion for category k (note: $\pi_1 + \pi_2 + \cdots + \pi_k = 1$).

5 Hypotheses: $H_0$: $\pi_1$ = hypothesized proportion for category 1, $\pi_2$ = hypothesized proportion for category 2, ..., $\pi_k$ = hypothesized proportion for category k. $H_a$: $H_0$ is not true, so at least one of the true category proportions differs from the corresponding hypothesized value.

6 Expected Counts: For each category, the expected count is the product of the total number of observations and the hypothesized proportion for that category.

7 Expected Counts - Example: Consider the sample of faculty from a large university system, and recall that the newsperson wanted to test whether each of the groups occurred with equal frequency.
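The expected-count rule can be sketched in a few lines of Python. This is only an illustration: the function name `expected_counts` is hypothetical, and the sample size of 154 is an assumption inferred from the expected count of 30.8 per category quoted later in the example (30.8 × 5), not a figure stated on the slides.

```python
def expected_counts(n_total, hypothesized_props):
    """Expected count per category = total observations x hypothesized proportion."""
    if abs(sum(hypothesized_props) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return [n_total * p for p in hypothesized_props]

# Five faculty categories, each hypothesized to hold 20% of the population.
# n_total = 154 is an inferred, assumed sample size (see the note above).
expected = expected_counts(154, [0.2] * 5)
print([round(e, 1) for e in expected])
```

Each category receives the same expected count because the hypothesized proportions are equal.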

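8 Goodness-of-fit statistic, $\chi^2$: The goodness-of-fit statistic, $\chi^2$, results from first computing the quantity $\frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$ for each cell. The value of the $\chi^2$ statistic is the sum of these terms: $\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$.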
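A minimal sketch of this computation follows. The observed counts are hypothetical values chosen only to sum to 154 (the slide's actual frequency table is not reproduced in this transcript), and the function name is likewise an assumption.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical observed counts for five categories (illustration only).
observed = [22, 31, 40, 35, 26]
# Equal expected counts, as in the faculty example.
expected = [30.8] * 5

x2 = chi_square_statistic(observed, expected)
print(round(x2, 2))
```

Cells whose observed count is far from the expected count contribute the largest terms, so a large statistic signals a poor fit to the hypothesized proportions.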

9 Chi-square distributions

10 Upper-tail Areas for Chi-square Distributions

11 Goodness-of-Fit Test Procedure: Hypotheses: $H_0$: $\pi_1$ = hypothesized proportion for category 1, $\pi_2$ = hypothesized proportion for category 2, ..., $\pi_k$ = hypothesized proportion for category k. $H_a$: $H_0$ is not true. Test statistic: $\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

12 Goodness-of-Fit Test Procedure: P-values: When $H_0$ is true and all expected counts are at least 5, $\chi^2$ has approximately a chi-square distribution with df = k - 1. Therefore, the P-value associated with the computed test statistic value is the area to the right of $\chi^2$ under the df = k - 1 chi-square curve.

13 Goodness-of-Fit Test Procedure: Assumptions: 1. Observed cell counts are based on a random sample. 2. The sample size is large. The sample size is large enough for the chi-square test to be appropriate as long as every expected count is at least 5.

14 Example: Consider the newsperson's desire to determine whether the faculty of a large university system are equally distributed across the five categories. Let us test this hypothesis at a significance level of 0.05. Let $\pi_1$, $\pi_2$, $\pi_3$, $\pi_4$, and $\pi_5$ denote the proportions of all faculty in this university system that are full professors, associate professors, assistant professors, instructors, and adjunct/part time, respectively. $H_0$: $\pi_1 = 0.2$, $\pi_2 = 0.2$, $\pi_3 = 0.2$, $\pi_4 = 0.2$, $\pi_5 = 0.2$. $H_a$: $H_0$ is not true.

15 Example: Significance level: $\alpha$ = 0.05. Assumptions: As we saw in an earlier slide, the expected counts were all 30.8, which is greater than 5. Although we do not know for sure how the sample was obtained, for the purposes of this example we shall assume the selection procedure generated a random sample. Test statistic: $\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

16 Example: Calculation:

17 Example: P-value: The P-value is based on a chi-square distribution with df = 5 - 1 = 4. The computed value of $\chi^2$, 7.56, is smaller than 7.77, the lowest value of $\chi^2$ in the table for df = 4, so the P-value is greater than 0.10. Conclusion: Since the P-value > 0.05 = $\alpha$, $H_0$ cannot be rejected. There is insufficient evidence to refute the claim that the proportion of faculty in each of the different categories is the same.
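The table lookup in this example can be cross-checked numerically. The sketch below computes the upper-tail area of a chi-square distribution for integer df using only the Python standard library; the function name `chi_square_upper_tail` is hypothetical, and the closed-form series used here is a standard textbook identity rather than anything taken from the slides. A statistics library would normally be used instead.

```python
import math

def chi_square_upper_tail(x2, df):
    """Area to the right of x2 under the chi-square curve with integer df >= 1."""
    if x2 <= 0:
        return 1.0
    half = x2 / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series
    root = math.sqrt(x2)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x2 / (2 * r - 1)
        total += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(math.sqrt(half)) + 2.0 * density * total

# Values from the faculty example: chi-square = 7.56 with df = 5 - 1 = 4.
p_value = chi_square_upper_tail(7.56, 4)
print(round(p_value, 3))
```

For the example's values the area comes out at roughly 0.109, which agrees with the conclusion that the P-value exceeds 0.05.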

18 Tests for Homogeneity and Independence in a Two-Way Table: Data resulting from observations made on two different categorical variables can be summarized using a tabular format. For example, consider the student data set obtained from a sample of 79 students taking elementary statistics. The table is on the next slide.

19 Tests for Homogeneity and Independence in a Two-Way Table: This is an example of a two-way frequency table, or contingency table. The numbers in the 6 cells with clear backgrounds are the observed cell counts.

20 Tests for Homogeneity and Independence in a Two-Way Table: Marginal totals are obtained by adding the observed cell counts in each row and also in each column. The sum of the column marginal totals (or the row marginal totals) is called the grand total.
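As a small illustration of marginal totals, the sketch below uses a hypothetical 2 × 3 table. The cell counts are invented for illustration; they were chosen only so that the totals match the figures the slides do state (79 students, 54 of them male). The actual cell counts are not reproduced in this transcript.

```python
# Hypothetical observed counts; columns are Contacts, Glasses, None.
columns = ["Contacts", "Glasses", "None"]
table = {
    "Male": [7, 11, 36],
    "Female": [3, 6, 16],
}

# Row marginal totals: add across each row.
row_totals = {group: sum(counts) for group, counts in table.items()}
# Column marginal totals: add down each column.
col_totals = [sum(col) for col in zip(*table.values())]
# Grand total: the sum of either set of marginal totals.
grand_total = sum(row_totals.values())

print(row_totals, dict(zip(columns, col_totals)), grand_total)
```

Summing the row totals and summing the column totals must give the same grand total, which is a quick check on the arithmetic.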

21 Tests for Homogeneity in a Two-Way Table: Typically, with a two-way table used to test homogeneity, the rows indicate different populations and the columns indicate different categories, or vice versa. For a test of homogeneity, the central question is whether the category proportions are the same for all of the populations.

22 Tests for Homogeneity in a Two-Way Table: When the rows indicate the populations, the expected count for a cell is simply the overall proportion (over all populations) that fall in the category times the number in the population. To illustrate: 54 = total number of male students, so the expected number of males that use contacts as primary vision correction = 54 × (overall proportion of students using contacts).

23 Tests for Homogeneity in a Two-Way Table: The expected values for each cell represent what would be expected if there is no difference between the groups under study. They can be found easily by using the following formula: $\text{expected cell count} = \frac{(\text{row marginal total})(\text{column marginal total})}{\text{grand total}}$
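The expected-count rule for a two-way table (row marginal total times column marginal total, divided by the grand total) can be sketched as below. The table is hypothetical: its counts were invented so that the margins match the 79 students and 54 males mentioned in the slides, and the function name `expected_table` is an assumption.

```python
def expected_table(observed):
    """Expected count for each cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical counts: rows are Male, Female; columns are Contacts, Glasses, None.
observed = [
    [7, 11, 36],
    [3, 6, 16],
]
expected = expected_table(observed)

for row in expected:
    print([round(e, 2) for e in row])
```

The expected counts in each row add up to that row's observed total, since the rule only redistributes each population across the categories in the overall proportions.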

24 Tests for Homogeneity in a Two-Way Table

25 Tests for Homogeneity in a Two-Way Table: Expected counts are in parentheses.

26 Comparing Two or More Populations Using the $\chi^2$ Statistic: Hypotheses: $H_0$: The true category proportions are the same for all of the populations (homogeneity of populations). $H_a$: The true category proportions are not all the same for all of the populations.

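27 Comparing Two or More Populations Using the $\chi^2$ Statistic: The expected cell counts are estimated from the sample data (assuming that $H_0$ is true) using the formula $\text{expected cell count} = \frac{(\text{row marginal total})(\text{column marginal total})}{\text{grand total}}$. Test statistic: $\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$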

28 Comparing Two or More Populations Using the $\chi^2$ Statistic: P-value: When $H_0$ is true, $\chi^2$ has approximately a chi-square distribution with df = (number of rows - 1)(number of columns - 1). The P-value associated with the computed test statistic value is the area to the right of $\chi^2$ under the chi-square curve with the appropriate df.
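Putting the pieces together, a sketch of the two-way computation might look like the following. The 2 × 3 table is hypothetical, the function name is an assumption, and the P-value line relies on the fact that for df = 2 specifically the upper-tail area has the closed form exp(-χ²/2); other df values need a chi-square table or a statistics library.

```python
import math

def two_way_chi_square(observed):
    """Return (chi-square statistic, df) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df

# Hypothetical samples from two populations, three response categories.
observed = [
    [20, 30, 50],
    [30, 30, 40],
]
x2, df = two_way_chi_square(observed)

# Closed form valid only because df == 2 here.
p_value = math.exp(-x2 / 2.0)
print(round(x2, 3), df, round(p_value, 3))
```

With these made-up counts the P-value is well above 0.05, so the hypothesis of homogeneity would not be rejected at that level.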

29 Comparing Two or More Populations Using the $\chi^2$ Statistic: Assumptions: 1. The data consist of independently chosen random samples. 2. The sample size is large: all expected counts are at least 5. If some expected counts are less than 5, rows or columns of the table may be combined to achieve a table with satisfactory expected counts.

30 Example: The following data come from a clinical trial of a drug regimen used in treating a type of cancer, lymphocytic lymphomas.* Patients (273) were randomly divided into two groups, with one group of patients receiving cytoxan plus prednisone (CP) and the other receiving BCNU plus prednisone (BP). The responses to treatment were graded on a qualitative scale. The two-way table summary of the results is on the following slide. * Ezdinli, E., S., Berard, C. W., et al. (1976) Comparison of intensive versus moderate chemotherapy of lymphocytic lymphomas: a progress report. Cancer, 38,

31 Example: Set up and perform an appropriate hypothesis test at the 0.05 level of significance.

32 Example: Hypotheses: $H_0$: The true response to treatment proportions are the same for both treatments (homogeneity of populations). $H_a$: The true response to treatment proportions are not all the same for both treatments. Significance level: $\alpha$ = 0.05. Test statistic: $\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

33 Example: Assumptions: All expected cell counts are at least 5, and samples were chosen independently, so the $\chi^2$ test is appropriate.

34 Example: Calculations: The two-way table for this example has 2 rows and 4 columns, so the appropriate df is (2-1)(4-1) = 3. Since the P-value > $\alpha$ = 0.05, $H_0$ is not rejected. There is insufficient evidence to conclude that the response rates are different for the two treatments.

35 Comparing Two or More Populations Using the $\chi^2$ Statistic: P-value: When $H_0$ is true, $\chi^2$ has approximately a chi-square distribution with df = (number of rows - 1)(number of columns - 1). The P-value associated with the computed test statistic value is the area to the right of $\chi^2$ under the chi-square curve with the appropriate df.

36 Example: A student decided to study the shoppers in Wegman's, a local supermarket, to see if males and females exhibited the same behavior patterns with regard to the device used to carry items. He observed 57 shoppers (presumably randomly) and obtained the results that are summarized in the table on the next slide.

37 Example: Determine if the carrying device proportions are the same for both genders using a 0.05 level of significance.

38 Example: Hypotheses: $H_0$: The true proportions of the device used are the same for both genders. $H_a$: The true proportions of the device used are not the same for both genders. Significance level: $\alpha$ = 0.05. Test statistic: $\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

39 Example: Using Minitab, we get the following output. Chi-Square Test: Basket, Cart, Nothing (expected counts are printed below observed counts; columns Basket, Cart, Nothing, Total). DF = 2, P-Value = 0.072

40 Example: We draw the following conclusion. With a P-value of 0.072, there is insufficient evidence at the 0.05 significance level to support a claim that males and females are not the same in terms of proportionate use of carrying devices at Wegman's supermarket.

41 $\chi^2$ Test for Independence: The $\chi^2$ test statistic and procedures can also be used to investigate the association between two categorical variables in a single population. Hypotheses: $H_0$: The two variables are independent. $H_a$: The two variables are not independent.

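42 $\chi^2$ Test for Independence: The expected cell counts are estimated from the sample data (assuming that $H_0$ is true) using the formula $\text{expected cell count} = \frac{(\text{row marginal total})(\text{column marginal total})}{\text{grand total}}$. Test statistic: $\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$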

43 $\chi^2$ Test for Independence: P-value: When $H_0$ is true, $\chi^2$ has approximately a chi-square distribution with df = (number of rows - 1)(number of columns - 1). The P-value associated with the computed test statistic value is the area to the right of $\chi^2$ under the chi-square curve with the appropriate df.

44 $\chi^2$ Test for Independence: Assumptions: 1. The observed counts are from a random sample. 2. The sample size is large: all expected counts are at least 5. If some expected counts are less than 5, rows or columns of the table may be combined to achieve a table with satisfactory expected counts.

45 Example: Consider the two categorical variables, gender and principal form of vision correction, for the sample of students used earlier in this presentation. We shall now test whether gender and the principal form of vision correction are independent.

46 Example: Hypotheses: $H_0$: Gender and principal method of vision correction are independent. $H_a$: Gender and principal method of vision correction are not independent. Significance level: We have not chosen one, so we shall look at the practical significance level. Test statistic: $\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}}$

47 Example: Assumptions: We are assuming that the sample of students was randomly chosen. All expected cell counts are at least 5, and samples were chosen independently, so the $\chi^2$ test is appropriate.

48 Example: Assumptions: Notice that the expected count is less than 5 in the cell corresponding to Female and Contacts, so we should combine the columns for Contacts and Glasses to get a table with satisfactory expected counts.
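The column-combining step can be sketched as follows. The counts are hypothetical: they were invented so that the margins match the 79 students and 54 males mentioned in the slides and so that the Female/Contacts expected count falls below 5, as the slide describes. The helper names are assumptions as well.

```python
def min_expected(observed):
    """Smallest expected count (row total * column total / grand total) in the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return min(r * c / grand_total for r in row_totals for c in col_totals)

def combine_columns(observed, i, j):
    """Return a new table in which column j has been merged into column i."""
    merged = []
    for row in observed:
        new_row = [value for k, value in enumerate(row) if k != j]
        target = i if i < j else i - 1  # index of column i once j is removed
        new_row[target] += row[j]
        merged.append(new_row)
    return merged

# Hypothetical counts: rows are Male, Female; columns are Contacts, Glasses, None.
observed = [
    [7, 11, 36],
    [3, 6, 16],
]
print(round(min_expected(observed), 2))   # below 5 in the Female/Contacts cell

# Merge Glasses (column 1) into Contacts (column 0).
combined = combine_columns(observed, 0, 1)
print(combined, round(min_expected(combined), 2))
```

After the merge the table is 2 × 2, so the degrees of freedom drop from 2 to 1, which is the price paid for meeting the expected-count condition.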

49 Example: Calculations: The contingency table for this example has 2 rows and 2 columns, so the appropriate df is (2-1)(2-1) = 1. Since the computed $\chi^2$ value is less than 2.70, the P-value is substantially greater than 0.10. $H_0$ would not be rejected for any reasonable significance level. There is not sufficient evidence to conclude that gender and vision correction are related. (I.e., for all practical purposes, one would find it reasonable to assume that gender and need for vision correction are independent.)
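For df = 1 the upper-tail area has a simple closed form, because a chi-square variable with one degree of freedom is the square of a standard normal variable. The sketch below uses that identity to check the table value of 2.70 quoted above; the function name is hypothetical.

```python
import math

def upper_tail_df1(x2):
    """Area to the right of x2 under the chi-square curve with df = 1."""
    # P(Z^2 > x2) = P(|Z| > sqrt(x2)) = erfc(sqrt(x2 / 2)) for standard normal Z
    return math.erfc(math.sqrt(x2 / 2.0))

# Upper-tail area at the smallest tabled value for df = 1.
p_at_table_minimum = upper_tail_df1(2.70)
print(round(p_at_table_minimum, 3))
```

The area at 2.70 is roughly 0.10, and because the upper-tail area shrinks as the statistic grows, any statistic smaller than 2.70 has a P-value above that.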

50 Example: Minitab would provide the following output if the frequency table was input as shown. Chi-Square Test: Contacts or Glasses, None (expected counts are printed below observed counts; columns Contacts or Glasses, None, Total). DF = 1, P-Value = 0.620