Chapter 12 Analysis of count data.

Example: Mendelian inheritance According to Mendel's theory, the four pea plant phenotypes (round/yellow, wrinkled/yellow, round/green, wrinkled/green) should appear in the ratio 9:3:3:1. Thus, Mendel's model specifies the probabilities of the four groups as p_ry = 9/16 = 0.5625, p_wy = 3/16 = 0.1875, p_rg = 3/16 = 0.1875, and p_wg = 1/16 = 0.0625, where "r", "w", "y", and "g" in the subscripts denote round, wrinkled, yellow, and green, respectively.

Example: Mendelian inheritance In his experiment, Mendel examined 556 pea plants, and if we believe Mendel's model to be true (i.e., we look at the distribution under H0), then we would expect (556)(9/16) = 312.75 plants with round, yellow peas. Doing the same calculation for every group, we can summarize the observed and expected counts in a table:

Phenotype   Round/yellow   Wrinkled/yellow   Round/green   Wrinkled/green
Observed    315            101               108           32
Expected    312.75         104.25            104.25        34.75
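The expected counts are easy to reproduce; a minimal Python sketch using the counts from the table above (Mendel's classic data, which also reproduce the test statistic X² = 0.470 used later in this chapter):

import numpy as np

# Null-hypothesis probabilities from the 9:3:3:1 ratio
p0 = np.array([9, 3, 3, 1]) / 16            # 0.5625, 0.1875, 0.1875, 0.0625

# Observed counts: ry, wy, rg, wg (total 556)
observed = np.array([315, 101, 108, 32])
n = observed.sum()

expected = n * p0                           # 312.75, 104.25, 104.25, 34.75
print(expected)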

Goodness of fit Assume that we have classified n individuals into k groups. We wish to test a null hypothesis H0 that completely specifies the probabilities of each of the k categories: H0: p_1 = p_01, ..., p_k = p_0k. Here p_i represents the probability that a randomly chosen individual belongs to category i, and p_01, ..., p_0k are the corresponding pre-specified probabilities under the null hypothesis.

Chi-square test To test a hypothesis for tabular data, we can use the chi-square statistic X² = Σ (o_i − e_i)² / e_i, where o_i and e_i denote the observed and expected number of observations in category i and the summation is over all k categories. The numerator of the i-th term is the squared difference between the observed and expected counts in category i, so the term is close to zero if the two nearly coincide and becomes large if there is a large discrepancy between the two.
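A minimal sketch of this statistic in Python (the helper name chi_square_statistic is ours):

import numpy as np

def chi_square_statistic(observed, expected):
    """X^2 = sum over categories of (o_i - e_i)^2 / e_i."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return np.sum((observed - expected) ** 2 / expected)

# For the Mendel data above this gives approximately 0.470
print(chi_square_statistic([315, 101, 108, 32], [312.75, 104.25, 104.25, 34.75]))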

Degrees of freedom It turns out that, if H0 is true, the chi-square test statistic X² approximately follows a χ²(df) distribution, where df denotes the number of degrees of freedom (here df = k − 1, the number of categories minus one). The critical value for X² at significance level α is the value such that the area under the χ²(df) density to the right of it is exactly α. The p-value is the probability, computed under the null hypothesis, of observing a value of X² that is more extreme (i.e., further away from zero) than the one actually observed.
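Both the critical value and the p-value can be obtained from the χ² distribution with scipy; a short sketch, using df = 3 and the observed statistic 0.470 from the Mendel example as inputs:

from scipy.stats import chi2

df = 3                     # degrees of freedom
alpha = 0.05

critical_value = chi2.ppf(1 - alpha, df)   # about 7.81: reject H0 if X^2 exceeds this
p_value = chi2.sf(0.470, df)               # right-tail probability, about 0.93

print(critical_value, p_value)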

Chi-square distribution The left panel shows the density of the χ²(r) distribution for r = 1 (solid), r = 5 (dashed), and r = 10 (dotted). The right panel shows the density of the χ²(5) distribution; the 95% quantile is 11.07, and the gray area to the left of this value has area 0.95.
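The quoted 95% quantile of the χ²(5) distribution can be checked directly:

from scipy.stats import chi2

print(chi2.ppf(0.95, 5))   # about 11.07, the 95% quantile of the chi-square(5) distribution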

Example: Mendelian inheritance The X² test statistic becomes X² = (315 − 312.75)²/312.75 + (101 − 104.25)²/104.25 + (108 − 104.25)²/104.25 + (32 − 34.75)²/34.75 = 0.470. We compare this value to a χ²(3) distribution because we have 4 categories, which results in 3 degrees of freedom. The critical value for a χ²(3) distribution when testing at a significance level of 0.05 is 7.81 (see, for example, the statistical tables on p. 399). Since X² = 0.470 is less than 7.81, we fail to reject the null hypothesis. In other words, these data do not contradict Mendel's hypothesis.
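The complete goodness-of-fit test can also be run in one call with scipy.stats.chisquare; a sketch using the same observed counts as above:

import numpy as np
from scipy.stats import chisquare

observed = np.array([315, 101, 108, 32])                 # total 556
expected = observed.sum() * np.array([9, 3, 3, 1]) / 16  # counts implied by H0

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(stat, p_value)   # roughly X^2 = 0.470 and p = 0.93, so H0 is not rejected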

Contingency table A typical situation is that observations are classified or tabulated according to two categorical variables rather than a single categorical variable. In the simplest case both variables have two categories, and the counts can be arranged in a 2 × 2 table:

              Response 1   Response 2   Total
Population 1  n_11         n_12         n_1
Population 2  n_21         n_22         n_2

This setup extends to more general r × k contingency tables, where we have two categorical variables, one with r categories and the other with k categories.
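In code, such a table is conveniently stored as a two-dimensional array with populations as rows and responses as columns; a small sketch with made-up counts:

import numpy as np

# Rows: population 1 and 2; columns: response 1 and 2 (illustrative counts only)
table = np.array([[30, 70],
                  [45, 55]])

row_totals = table.sum(axis=1)   # n_1, n_2
col_totals = table.sum(axis=0)   # column totals
n = table.sum()                  # grand total
print(row_totals, col_totals, n)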

Test for homogeneity For the first population, we can look at the probability of observing response 1. If p_11 is the probability of observing response 1 for an individual from population 1, then we can estimate it by the relative frequency p̂_11 = n_11 / n_1. We can estimate the corresponding probability for population 2 by p̂_21 = n_21 / n_2. Then it is natural to be interested in the test for homogeneity, i.e., the null hypothesis H0: p_11 = p_21 that the probability of response 1 is the same in the two populations.

Test for homogeneity If we assume that the probability of observing response 1 is the same for both populations, then we can use the data from both populations to estimate that probability. Our combined estimate of the probability of observing response 1 is p̂ = (n_11 + n_21) / (n_1 + n_2). There are n_1 observations in population 1, so under H0 we would expect n_1 · p̂ of them to result in response 1, and we would expect n_1 · (1 − p̂) of them to result in response 2.

Test for homogeneity In general, the expected number of observations in cell (i,j) of a contingency table is e_ij = (total of row i) · (total of column j) / n. Now that we have the expected number of observations for each cell under H0, we can carry out a test of H0 using the same type of chi-square test statistic, X² = Σ (n_ij − e_ij)² / e_ij, where the sum is over all cells and the statistic is compared to a χ² distribution with (r − 1)(k − 1) degrees of freedom.
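A sketch of these two steps in Python (the helper names expected_counts and chi_square_table are ours):

import numpy as np

def expected_counts(table):
    """e_ij = (row i total) * (column j total) / n."""
    table = np.asarray(table, dtype=float)
    row_totals = table.sum(axis=1)
    col_totals = table.sum(axis=0)
    return np.outer(row_totals, col_totals) / table.sum()

def chi_square_table(table):
    """X^2 = sum over all cells of (observed - expected)^2 / expected."""
    table = np.asarray(table, dtype=float)
    e = expected_counts(table)
    return np.sum((table - e) ** 2 / e)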

Example: Avadex Treatment A study examined whether the fungicide Avadex increases the risk of pulmonary cancer in mice. Sixteen male mice were continuously fed small concentrations of Avadex (the treatment population), while 79 male mice were given the usual diet (the control population). After 85 weeks, all animals were examined for tumors, with the results shown below:

            Tumor   No tumor   Total
Treatment   4       12         16
Control     5       74         79
Total       9       86         95

Example: Avadex Treatment If we let p_t and p_c denote the probabilities that a mouse from the treatment population and the control population, respectively, develops a tumor, then we can estimate those probabilities by p̂_t = 4/16 = 0.25 and p̂_c = 5/79 ≈ 0.06. The 95% confidence interval for p_t − p_c becomes approximately 0.19 ± 0.22 = (−0.03, 0.41). Since zero is (just barely) contained in this interval, we fail to reject the null hypothesis H0: p_t = p_c at the 5% level.
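A sketch of these estimates and the usual large-sample (Wald) 95% confidence interval for p_t − p_c, using the counts from the table above; the Wald form of the interval is our assumption about how the interval was computed:

import numpy as np

# Counts: 4 of 16 treated mice and 5 of 79 control mice developed tumors
xt, nt = 4, 16
xc, nc = 5, 79

pt_hat = xt / nt                  # about 0.25
pc_hat = xc / nc                  # about 0.06
diff = pt_hat - pc_hat

se = np.sqrt(pt_hat * (1 - pt_hat) / nt + pc_hat * (1 - pc_hat) / nc)
ci = (diff - 1.96 * se, diff + 1.96 * se)   # roughly (-0.03, 0.41): zero is barely inside
print(diff, ci)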

Example: Avadex Treatment Under the hypothesis H0: p_t = p_c, the probability of developing a tumor does not depend on the population (treatment group). The combined estimate of the probability of observing a tumor is p̂ = 9/95 ≈ 0.0947. Then we would expect 0.0947 · 16 = 1.52 of the mice from the treatment group and 0.0947 · 79 = 7.48 of the mice from the control group to develop tumors. In the same way we can calculate all four expected cell counts:

            Tumor   No tumor   Total
Treatment   1.52    14.48      16
Control     7.48    71.52      79

Example: Avadex Treatment When testing H0, we calculate the test statistic X² = (4 − 1.52)²/1.52 + (12 − 14.48)²/14.48 + (5 − 7.48)²/7.48 + (74 − 71.52)²/71.52 ≈ 5.4. Comparing this value to a χ²(1) distribution, we obtain a p-value of 0.020 and reject the null hypothesis. Thus, we find that the relative frequencies of tumors are different in the two populations. Furthermore, we found p̂_t = 0.25 > p̂_c = 0.06, so we are able to conclude that Avadex appears to increase the cancer rate in mice.
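The same test can be run with scipy.stats.chi2_contingency; note that correction=False is needed here because the function otherwise applies Yates' continuity correction to 2 × 2 tables, which the calculation above does not use:

import numpy as np
from scipy.stats import chi2_contingency

# Rows: treatment, control; columns: tumor, no tumor
table = np.array([[4, 12],
                  [5, 74]])

stat, p_value, df, expected = chi2_contingency(table, correction=False)
print(stat, p_value)   # roughly X^2 = 5.4 and p = 0.02
print(expected)        # expected counts, about [[1.52, 14.48], [7.48, 71.52]]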

Example: Avadex Treatment We conclude that the result is significant and that Avadex appears to increase the cancer rate in mice. Note, however, that this conclusion based on the chi-square p-value differs from the earlier conclusion based on the 95% confidence interval for p_t − p_c. The two methods give different results because the normal and χ² approximations are not identical for small sample sizes. Still, zero was only barely inside the 95% confidence interval, and the p-value for the chi-square test statistic was 0.020, so the two results are really not that different. Whenever we get a p-value close to 0.05, or a confidence interval where zero is barely inside or outside, we should be careful with our conclusions.