Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model is called a goodness-of-fit.

Slides:



Advertisements
Similar presentations
Chapter 26 Comparing Counts
Advertisements

1 Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. Analysis of Categorical Data Goodness-of-Fit Tests.
Testing Hypotheses About Proportions Chapter 20. Hypotheses Hypotheses are working models that we adopt temporarily. Our starting hypothesis is called.
© 2010 Pearson Prentice Hall. All rights reserved Least Squares Regression Models.
© 2010 Pearson Prentice Hall. All rights reserved The Chi-Square Goodness-of-Fit Test.
© 2010 Pearson Prentice Hall. All rights reserved The Chi-Square Test of Independence.
1-1 Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 25, Slide 1 Chapter 25 Comparing Counts.
Chapter 26: Comparing Counts
Chapter 11 Chi-Square Procedures 11.1 Chi-Square Goodness of Fit.
Chapter 26: Comparing Counts. To analyze categorical data, we construct two-way tables and examine the counts of percents of the explanatory and response.
Inferences About Process Quality
Chapter 13 Chi-Square Tests. The chi-square test for Goodness of Fit allows us to determine whether a specified population distribution seems valid. The.
+ Quantitative Statistics: Chi-Square ScWk 242 – Session 7 Slides.
Testing Hypotheses About Proportions
Copyright © 2012 Pearson Education. All rights reserved Copyright © 2012 Pearson Education. All rights reserved. Chapter 15 Inference for Counts:
 Involves testing a hypothesis.  There is no single parameter to estimate.  Considers all categories to give an overall idea of whether the observed.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Chapter 26 Comparing Counts.
Chapter 26: Comparing Counts AP Statistics. Comparing Counts In this chapter, we will be performing hypothesis tests on categorical data In previous chapters,
Copyright © 2010 Pearson Education, Inc. Warm Up- Good Morning! If all the values of a data set are the same, all of the following must equal zero except.
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on Categorical Data 12.
Comparing Two Proportions
Hypothesis Testing for Proportions
Chapter 11 Chi-Square Procedures 11.3 Chi-Square Test for Independence; Homogeneity of Proportions.
CHAPTER 26: COMPARING COUNTS OF CATEGORICAL DATA To test claims and make inferences about counts for categorical variables Objective:
Chapter 26 Chi-Square Testing
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
Chapter 26: Comparing counts of categorical data
Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 20 Testing Hypotheses About Proportions.
Chi-Square Procedures Chi-Square Test for Goodness of Fit, Independence of Variables, and Homogeneity of Proportions.
1-1 Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 25, Slide 1 Chapter 26 Comparing Counts.
Slide 26-1 Copyright © 2004 Pearson Education, Inc.
Copyright © 2010 Pearson Education, Inc. Chapter 22 Comparing Two Proportions.
Chapter 20 Testing Hypothesis about proportions
Fitting probability models to frequency data. Review - proportions Data: discrete nominal variable with two states (“success” and “failure”) You can do.
Copyright © 2010 Pearson Education, Inc. Slide
Comparing Counts.  A test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model is called a.
© Copyright McGraw-Hill CHAPTER 11 Other Chi-Square Tests.
Chapter 13 Inference for Counts: Chi-Square Tests © 2011 Pearson Education, Inc. 1 Business Statistics: A First Course.
Copyright © 2006 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
Copyright © 2010 Pearson Education, Inc. Warm Up- Good Morning! If all the values of a data set are the same, all of the following must equal zero except.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 11 Analyzing the Association Between Categorical Variables Section 11.2 Testing Categorical.
Chapter 13- Inference For Tables: Chi-square Procedures Section Test for goodness of fit Section Inference for Two-Way tables Presented By:
Chapter 14 – 1 Chi-Square Chi-Square as a Statistical Test Statistical Independence Hypothesis Testing with Chi-Square The Assumptions Stating the Research.
Comparing Counts Chapter 26. Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted.
Chapter 11 Chi-Square Procedures 11.1 Chi-Square Goodness of Fit.
Slide 20-1 Copyright © 2004 Pearson Education, Inc.
Objectives (BPS chapter 12) General rules of probability 1. Independence : Two events A and B are independent if the probability that one event occurs.
Statistics 26 Comparing Counts. Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted.
Chapter 11: Categorical Data n Chi-square goodness of fit test allows us to examine a single distribution of a categorical variable in a population. n.
Comparing Observed Distributions A test comparing the distribution of counts for two or more groups on the same categorical variable is called a chi-square.
Chapter 26 Comparing Counts. Objectives Chi-Square Model Chi-Square Statistic Knowing when and how to use the Chi- Square Tests; Goodness of Fit Test.
Copyright © 2009 Pearson Education, Inc. Chapter 26 Comparing Counts.
Comparing Counts Chi Square Tests Independence.
CHAPTER 26 Comparing Counts.
Testing Hypotheses About Proportions
Hypothesis Testing Review
Chapter 25 Comparing Counts.
Comparing Two Proportions
Testing Hypotheses about Proportions
Comparing Two Proportions
Chapter 11: Inference for Distributions of Categorical Data
Overview and Chi-Square
15.1 Goodness-of-Fit Tests
When You See (This), You Think (That)
Testing Hypotheses About Proportions
Paired Samples and Blocks
Analyzing the Association Between Categorical Variables
Chapter 26 Comparing Counts.
Chapter 26 Comparing Counts Copyright © 2009 Pearson Education, Inc.
Chapter 26 Comparing Counts.
Presentation transcript:

Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model is called a goodness-of-fit test. As usual, there are assumptions and conditions to consider…

Assumptions and Conditions Counted Data Condition: Check that the data are counts for the categories of a categorical variable. Independence Assumption: –Randomization Condition: The individuals who have been counted and whose counts are available for analysis should be a random sample from some population.

Assumptions and Conditions (cont.) Sample Size Assumption: We must have enough data for the methods to work. –Expected Cell Frequency Condition: We should expect to see at least 5 individuals in each cell. This is similar to the condition that np and nq be at least 10 when we tested proportions.

Calculations Since we want to examine how well the observed data reflect what would be expected, it is natural to look at the differences between the observed and expected counts (Obs – Exp). These differences are actually residuals, so we know that adding all of the differences will result in a sum of 0. That’s not very helpful.

Calculations (cont.) We’ll handle the residuals as we did in regression, by squaring them. To get an idea of the relative sizes of the differences, we will divide these squared quantities by the expected values.

Calculations (cont.) The test statistic, called the chi-square (or chi-squared) statistic, is found by adding up the sum of the squares of the deviations between the observed and expected counts divided by the expected counts:

Calculations (cont.) The chi-square models are actually a family of distributions indexed by degrees of freedom (much like the t-distribution). The number of degrees of freedom for a goodness-of-fit test is n – 1, where n is the number of categories.

One-Sided or Two-Sided? The chi-square statistic is used only for testing hypotheses, not for constructing confidence intervals. If the observed counts don’t match the expected, the statistic will be large—it can’t be “too small.” So the chi-square test is always one- sided. –If the calculated value is large enough, we’ll reject the null hypothesis.

One-Sided or Two-Sided? (cont.) The mechanics may work like a one-sided test, but the interpretation of a chi-square test is in some ways many-sided. There are many ways the null hypothesis could be wrong. There’s no direction to the rejection of the null model—all we know is that it doesn’t fit.

The Chi-Square Calculation 1.Find the expected values: –Every model gives a hypothesized proportion for each cell. –The expected value is the product of the total number of observations times this proportion. 2.Compute the residuals: Once you have expected values for each cell, find the residuals, Observed – Expected. 3.Square the residuals.

The Chi-Square Calculation (cont.) 4.Compute the components. Now find the components for each cell. 5.Find the sum of the components (that’s the chi-square statistic).

The Chi-Square Calculation (cont.) 6.Find the degrees of freedom. It’s equal to the number of cells minus one. 7.Test the hypothesis.  Use your chi-square statistic to find the P-value. (Remember, you’ll always have a one-sided test.)  Large chi-square values mean lots of deviation from the hypothesized model, so they give small P- values.

But I Believe the Model… Goodness-of-fit tests are likely to be performed by people who have a theory of what the proportions should be, and who believe their theory to be true. Unfortunately, the only null hypothesis available for a goodness-of-fit test is that the theory is true. As we know, the hypothesis testing procedure allows us only to reject or fail to reject the null.

But I Believe the Model… (cont.) We can never confirm that a theory is in fact true. At best, we can point out only that the data are consistent with the proposed theory. –Remember, it’s that idea of “not guilty” versus “innocent.”

Example Test the claim that the responses occur with percentages of 15%, 20%, 25%, 25%, and 15% respectively. Response A B C D E Frequency Observed Expected Hypothesis H o : The distribution of the observed equals the expected distribution H a : The distribution of the observed does not equals the expected distribution

Example Test the claim that the responses occur with percentages of 15%, 20%, 25%, 25%, and 15% respectively. Response A B C D E Frequency Observed Expected Check Assumptions/Conditions SRS is assumed We are working with counts All expected counts  5

Example Test the claim that the responses occur with percentages of 15%, 20%, 25%, 25%, and 15% respectively. Response A B C D E Frequency Observed Expected Calculate Test

Example Test the claim that the responses occur with percentages of 15%, 20%, 25%, 25%, and 15% respectively. Response A B C D E Frequency Observed Expected Calculate Test

4. Conclusion  = 0.05 Since the p-value is greater than alpha, we fail to reject the claim that the observed distribution is equal to the expected distribution. Example Test the claim that the responses occur with percentages of 15%, 20%, 25%, 25%, and 15% respectively. Response A B C D E Frequency Observed Expected

Example In studying the occurrence of genetic characteristics, the following sample data were obtained. Are they uniformly distributed? CharacteristicABCDEF Frequency Observed Expected38 1. Hypothesis H o : The distribution of the genetic characteristics is uniformly distributed H a : The distribution of the genetic characteristics is not uniformly distributed

2. Check Assumptions/Conditions SRS is assumed We are working with counts All expected counts  5 Example In studying the occurrence of genetic characteristics, the following sample data were obtained. Are they uniformly distributed? CharacteristicABCDEF Frequency Observed Expected38

3. Calculate Test Example In studying the occurrence of genetic characteristics, the following sample data were obtained. Are they uniformly distributed? CharacteristicABCDEF Frequency Observed Expected38

4. Conclusion  = 0.05 Since the p-value is greater than alpha, we fail to reject the claim that the distribution of the genetic characteristics is uniformly distributed. Example In studying the occurrence of genetic characteristics, the following sample data were obtained. Are they uniformly distributed? CharacteristicABCDEF Frequency Observed Expected38