The Practice of Statistics in the Life Sciences Fourth Edition


1 The Practice of Statistics in the Life Sciences Fourth Edition
Chapter 21: The chi-square test for goodness of fit
Copyright © 2018 W. H. Freeman and Company

2 Objectives (PSLS Chapter 21)
The chi-square test for goodness of fit:
Hypotheses for goodness of fit
Expected counts and chi-square statistic
The chi-square test
Interpreting significant chi-square results
Conditions for the chi-square test

3 Goodness of fit hypotheses
The chi-square test can be used for a categorical variable (one SRS) with any number of levels.
The null hypothesis can be that all population proportions are equal (uniform hypothesis). Example: are hospital births uniformly distributed over the days of the week? H0: p1 = p2 = p3 = p4 = p5 = p6 = p7 = 1/7
Or it can be that the proportions are equal to some specific values, as long as the sum of all the population proportions in H0 equals 1. Example: when crossing homozygote parents expressing two co-dominant phenotypes A and B, we would expect in F2 H0: pA = ¼, pAB = ½, pB = ¼, where AB is an intermediate phenotype.

4 Idea of the chi-square test
The chi-square test is used when the data are categorical. It measures how different the observed data are from what we would expect if H0 were true.
The comparison is between the observed sample proportions (one SRS of 700 births) and the expected proportions under H0: p1 = p2 = p3 = p4 = p5 = p6 = p7 = 1/7.
Again, we want to know whether the differences in sample proportions are likely to have occurred by chance, just because of the random sampling.
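A minimal sketch of the expected side of this comparison, assuming Python and day labels chosen only for illustration (the slide gives just the sample size of 700 and the hypothesized proportion 1/7; the observed counts are not reproduced in the transcript):

```python
from fractions import Fraction

n_births = 700  # sample size stated on the slide
# Day labels are an assumption made for readability.
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Under H0 every day has the same population proportion, 1/7.
p0 = {day: Fraction(1, 7) for day in days}
assert sum(p0.values()) == 1  # the proportions in H0 must sum to 1

# Expected count for each day = sample size * hypothesized proportion.
expected = {day: n_births * p for day, p in p0.items()}
print(expected["Mon"])  # 100
```

Exact fractions are used here so that the "sum to 1" check does not depend on floating-point rounding.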

5 The chi-square statistic
The chi-square statistic (X2) compares observed and expected counts. Observed counts are the actual number of observations of each type. Expected counts are the number of observations that we would expect to see of each type if the null hypothesis were true.
$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
(calculated for each type separately and then summed)
Large values of X2 represent strong deviations from the expected distribution under H0, and will tend to be statistically significant.

6 The chi-square distributions
The chi-square distributions are a family of distributions that take only positive values, are skewed to the right, and are each described by a specific number of degrees of freedom. Published tables and software give the upper-tail area for critical values of many chi-square distributions.
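The upper-tail areas that a printed table lists can also be computed directly. The sketch below is one way to do it with only the Python standard library. The function name `chi2_upper_tail` is hypothetical, and the method (a recurrence on the regularized upper incomplete gamma function, started from the closed forms for 1 and 2 degrees of freedom) is an implementation choice, not something the slides prescribe.

```python
import math

def chi2_upper_tail(x, df):
    """Upper-tail area P(chi-square with df degrees of freedom >= x), for x >= 0."""
    if x < 0 or df < 1:
        raise ValueError("need x >= 0 and an integer df >= 1")
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # df = 2: tail area is exp(-x/2)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # df = 1: tail area is erfc(sqrt(x/2))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1),
    # applied until a reaches df / 2.
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

for df, critical_value in [(1, 3.84), (2, 5.99), (4, 9.49)]:
    print(df, round(chi2_upper_tail(critical_value, df), 3))
```

Each of the three critical values used above is the familiar 5% cutoff for its degrees of freedom, so the printed areas should all be close to 0.05.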

7 Table D

8 Chi-square test for goodness of fit
The chi-square statistic for goodness of fit with k proportions measures how much observed counts differ from expected counts. It follows the chi-square distribution with k − 1 degrees of freedom and has the formula:
$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$
The P-value is the upper-tail area under the chi-square distribution with df = k − 1.

9 Goodness of fit example (1 of 4)
Aphids evade predators (ladybugs) by dropping off the leaf. An experiment examined the mechanism of aphid drops. “When dropped upside-down from delicate tweezers, live aphids landed on their ventral side in 95% of the trials (19 out of 20). In contrast, dead aphids landed on their ventral side in 52.2% of the trials (12 out of 23).” Is there evidence (at significance level 5%) that live aphids’ landing right side up (on their ventral side) is not a chance event? We test H0: p_ventral = p_dorsal = 0.5 against Ha: H0 is not true.

10 Goodness of fit example (2 of 4)
Landing side   Observed count   Expected count   X2 contribution
Ventral        19               20(0.5) = 10     (19 − 10)²/10 = 8.1
Dorsal         1                20(0.5) = 10     (1 − 10)²/10 = 8.1
Expected counts are large enough (both are greater than 5.0).
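The arithmetic in this table can be reproduced in a few lines. This is only a sketch in Python; the variable names are illustrative, while the counts and the null proportion 0.5 come from the example.

```python
observed = [19, 1]   # live aphids landing on the ventral side, and not
p0 = [0.5, 0.5]      # H0: the landing side is a chance event
n = sum(observed)    # 20 trials

# Expected count = n * hypothesized proportion, one per category.
expected = [n * p for p in p0]

# Each category contributes (observed - expected)^2 / expected.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
x2 = sum(contributions)

print(expected, contributions, x2)
```

The two contributions are equal here because the two categories are complementary and have the same expected count.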

11 Goodness of fit example (3 of 4)
X2 = 8.1 + 8.1 = 16.2, with df = 2 − 1 = 1. From Table D, we find X2 > 12.12, so P < 0.0005 (software gives a P-value of about 0.00006), highly significant. We reject H0. We have found very strong evidence that the righting behavior of live aphids is not chance (P < 0.0005).

12 Goodness of fit example (4 of 4)
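Note: the one-sample z test for a proportion from Chapter 19 gives exactly the same P-value. The test statistic is:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.95 - 0.5}{\sqrt{0.5(1 - 0.5)/20}} = 4.02$$
and P(|z| > 4.02) ≈ 0.00006. Note that z² = 16.2 = X2.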
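The agreement between the two tests can be checked numerically. The sketch below assumes a two-sided z test and uses the standard-library identity that both tail areas reduce to the complementary error function; the variable names are illustrative.

```python
import math

n, successes, p0 = 20, 19, 0.5

# One-sample z statistic for a proportion.
p_hat = successes / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_z = math.erfc(abs(z) / math.sqrt(2))       # two-sided normal tail area

# Chi-square goodness-of-fit statistic for the same data (df = 1).
expected = n * p0
x2 = (successes - expected) ** 2 / expected + ((n - successes) - expected) ** 2 / expected
p_chi2 = math.erfc(math.sqrt(x2 / 2))        # upper tail of chi-square with df = 1

print(round(z, 2), round(z ** 2, 2), round(x2, 2))
```

Because X2 equals z squared when there are only two categories, the two P-values coincide up to floating-point rounding.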

13 Interpreting significant chi-square results
The individual values summed in the X2 statistic are the chi-square components (or contributions). They give a measure of the relative disparity between observed and expected counts for each condition. When the test is statistically significant, the largest components indicate which condition(s) are most different from what H0 predicts. Compare the observed and expected counts to interpret the findings. You can also compare the actual proportions qualitatively in a graph.
Landing side   Observed count   Expected count   Chi-square contribution
Ventral        19               20(0.5) = 10     (19 − 10)²/10 = 8.1
Dorsal         1                20(0.5) = 10     (1 − 10)²/10 = 8.1
Aphids land right side up (ventral side) far more often (95%) than chance alone would predict (50%).

14 Lack of significance: Avoid a logical fallacy
A non-significant P-value is not conclusive: H0 could be true, or not. This is particularly relevant in the chi-square goodness of fit test where we are often interested in the hypothesis that the data fit a particular model. A significant P-value suggests that the data do not follow that model. But finding a non-significant P-value is NOT a validation of the null hypothesis and does NOT suggest that the data do follow the hypothesized model. It only shows that the data are not inconsistent with the model.

15 Lack of significance example (1 of 4)
The Frizzle fowl is a chicken with curled feathers. A genetic crossing between Frizzle and Leghorn (straight-feathered) varieties produced chickens in F1 all with slightly frizzled feathers. The most likely genetic model is that of a single gene locus with two co-dominant alleles producing a 1:2:1 ratio in F2. Are the data consistent with such a model?

16 Lack of significance example (2 of 4)
Category   Observed   Test Proportion   Expected   Contribution to Chi-Sq
F          23         0.25              23.25      0.0027
SF         50         0.50              46.50      0.2634
S          20         0.25              23.25      0.4543

17 Lack of significance example (3 of 4)
N    DF   Chi-Sq   P-Value
93   2    0.72     0.698
Failing to reject the null hypothesis is not proof that the null hypothesis is true, just that it is one possibility.

18 Lack of significance example (4 of 4)
Degrees of freedom = k − 1 = 2, and X2 = 0.72, so P > 0.25 (Table D). Software gives P = 0.698. This is not statistically significant and we fail to reject H0. The observed data are consistent with a single gene locus with two co-dominant alleles (1:2:1 genetic model). The small observed deviations from the model could have arisen from the random sampling process alone. Failing to reject H0 does not prove that the null hypothesis is true; it only shows that the null hypothesis is plausible.
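The software output for this example can be reproduced from the observed counts and the 1:2:1 model. The sketch below is illustrative Python, not the Minitab procedure itself; it relies on the fact that with exactly 2 degrees of freedom the chi-square upper-tail area has the closed form exp(−X2/2).

```python
import math

observed = {"F": 23, "SF": 50, "S": 20}       # counts from the example
model = {"F": 0.25, "SF": 0.50, "S": 0.25}    # 1:2:1 genetic model under H0

n = sum(observed.values())
expected = {k: n * model[k] for k in observed}
contributions = {k: (observed[k] - expected[k]) ** 2 / expected[k] for k in observed}

x2 = sum(contributions.values())
df = len(observed) - 1
assert df == 2                   # the closed form below is valid only for df = 2
p_value = math.exp(-x2 / 2)      # upper-tail area of chi-square with 2 df

print(n, df, round(x2, 2), round(p_value, 3))
```

The largest contribution comes from the S category, but all three are small, which is why the test statistic stays far below any usual critical value.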

19 Conditions for the goodness of fit test
The chi-square test for goodness of fit is used when we have a single SRS from a population and the variable is categorical with k mutually exclusive levels.
We can safely use the chi-square test when:
all expected counts have values ≥ 1.0
no more than 20% of the k expected counts have values < 5.0
In other words, for every 5 levels, the test can accommodate 1 small-ish expected count (at least 1.0 but smaller than 5.0).
There are techniques to use if the expected counts are too low: you can either collapse some levels of the categorical variable, or you can use an “exact test” (not covered in this textbook).
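The two numerical conditions are easy to check mechanically. The function below is a sketch with a hypothetical name; it encodes only the expected-count rules and says nothing about whether the data are an SRS.

```python
def expected_counts_ok(expected_counts):
    """Return True if all expected counts are >= 1.0 and
    no more than 20% of them are below 5.0."""
    counts = list(expected_counts)
    if not counts:
        raise ValueError("need at least one expected count")
    all_at_least_one = all(e >= 1.0 for e in counts)
    n_small = sum(1 for e in counts if e < 5.0)
    return all_at_least_one and n_small <= 0.20 * len(counts)

print(expected_counts_ok([10, 10]))                # aphid example
print(expected_counts_ok([23.25, 46.5, 23.25]))    # Frizzle fowl example
print(expected_counts_ok([4.0, 10, 10]))           # 1 of 3 counts is below 5
```

In the last call one small count out of three exceeds the 20% allowance, which is the situation where collapsing levels or an exact test would be considered.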

20 The Practice of Statistics in the Life Sciences Fourth Edition
Chapter 22: Chi-square test for two-way tables

21 Objectives
The chi-square test for two-way tables:
Two-way tables
Hypotheses for two-way tables
Expected counts in a two-way table
The chi-square test
Conditions for the chi-square test
The chi-square test and the two-sample z test
Simpson’s paradox

22 Two-way tables (1 of 2)
An experiment has a two-way, or block, design if two categorical factors are studied with several levels of each factor. Two-way tables organize data about two categorical variables with any number of levels/treatments obtained from a two-way, or block, design. The name “two-way table” refers to the fact that there are two ways to group or summarize the data.

23 Two-way tables (2 of 2)
High school students were asked whether they smoke and whether their parents smoke.
First factor: parent smoking status
Second factor: student smoking status

24 Marginal distribution (1 of 4)
The marginal distributions (in the “margins” of the table) summarize each factor separately.

25 Marginal distribution (2 of 4)

26 Marginal distribution (3 of 4)
With two factors, there are two marginal distributions.

27 Marginal distribution (4 of 4)

28 The conditional distribution (1 of 2)
The cells of the two-way table represent the intersection of a given level of one factor with a given level of the other factor. They can be used to compute the conditional distributions.

29 The conditional distribution (2 of 2)

30 Hypotheses for two-way tables
A two-way table has r rows and c columns.
H0: There is no association between the row and column variables. This can also be stated as, “The row and column variables are independent.”
Ha: There is an association/relationship between the two variables.
We will compare actual counts from the sample data with the counts expected under the null hypothesis of no relationship.

31 Expected counts in a two-way table
A two-way table has r rows and c columns. H0 states that there is no association between the row and column variables (factors) in the table. The expected count in any cell of a two-way table when H0 is true is:
$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$
Compute an expected count for each cell INSIDE the table (excluding the margin totals). Do not round the expected counts to whole numbers: they are only theoretical expectations, so they do not need to be integers.
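The expected-count formula follows from the independence statement in H0. A short derivation, writing n for the table total (a symbol introduced here for brevity): if the row and column variables are independent, the probability of landing in a given cell is the product of the row probability and the column probability, each estimated by its marginal proportion.

$$\text{expected count} = n \times \frac{\text{row total}}{n} \times \frac{\text{column total}}{n} = \frac{\text{row total} \times \text{column total}}{n}$$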

32 Expected counts example (1 of 4)
Cocaine produces short-term feelings of physical and mental well-being. To maintain the effect, the drug may have to be taken more frequently and at higher doses. After stopping use, users will feel tired, sleepy, and depressed. A study compares the rates of successful rehabilitation for cocaine addicts following one of three treatment options:
antidepressant treatment (desipramine)
standard treatment (lithium)
placebo (“sugar pill”)

33 Expected counts example (2 of 4)

34 Expected counts example (3 of 4)

35 Expected counts example (4 of 4)

36 Conditions for the chi-square test (1 of 3)
The chi-square test for two-way tables looks for evidence of association between two categorical variables (factors) in sample data. The samples can be drawn either:
By randomly selecting SRSs from different populations (or from a population subjected to different treatments). Examples: girls vaccinated for HPV or not among 8th-graders and 12th-graders; remission or no remission for different treatments.

37 Conditions for the chi-square test (2 of 3)
Or by taking one SRS and classifying the individuals according to two categorical variables (factors). Example: obesity and ethnicity among high school students.

38 Conditions for the chi-square test (3 of 3)
We can safely use the chi-square test when:
very few (no more than 1 in 5) expected counts are < 5.0
all expected counts are ≥ 1.0
Note: If one factor has many levels and too many expected counts are too low, you might be able to “collapse” some of the levels (regroup them) and thus have large enough expected counts.

39 The chi-square test for two-way tables (1 of 2)
H0: there is no association between the row and column variables
Ha: H0 is not true
The X2 statistic is summed over all r × c cells in the table:
$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

40 The chi-square test for two-way tables (2 of 2)
When H0 is true, the X2 statistic approximately follows a chi-square distribution with (r − 1)(c − 1) degrees of freedom. Use Table D or technology to obtain the P-value: P(chi-square variable ≥ calculated X2).

41 Table D

42 Chi-square statistic example (1 of 3)
Table of counts: “actual/expected,” with three rows and two columns: df = (3 − 1)(2 − 1) = 2

43 Chi-square statistic example (2 of 3)
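We compute the X2 statistic by summing the contributions of the six cells of the table:
$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} = 10.74$$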

44 Chi-square statistic example (3 of 3)
Using Table D with df = 2: 10.60 < X2 < 11.98, so 0.005 > P > 0.0025. The P-value is very small (software gives P ≈ 0.0047) and we reject H0. There is a significant relationship between treatment type (desipramine, lithium, placebo) and outcome (relapse or not).
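With 2 degrees of freedom the P-value can be checked without a table, because the chi-square upper-tail area then has the closed form exp(−X2/2). The sketch below uses the X2 = 10.74 reported in the example; the two bracketing critical values are the standard df = 2 entries for tail areas 0.005 and 0.0025, stated here as an assumption about what the printed table contains.

```python
import math

x2, df = 10.74, 2
assert df == 2                   # the closed form below holds only for df = 2
p_value = math.exp(-x2 / 2)      # upper-tail area of chi-square with 2 df

# Assumed table entries for df = 2: (critical value, upper-tail area).
lower = (10.60, 0.005)
upper = (11.98, 0.0025)

print(round(p_value, 4))
print(lower[0] < x2 < upper[0], upper[1] < p_value < lower[1])
```

Both comparisons print True, which is the bracketing argument made with the table: a statistic between two critical values has a P-value between the two corresponding tail areas.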

45 The two-sample z test (1 of 2)
The chi-square test for a 2 × 2 table is equivalent to the two-sided two-sample z test from Chapter 20. A 2014 Pew Research Center survey of a random sample of American adults found that 455 of the 549 participants who had chosen to ignore online harassment felt that it was an effective way to deal with the issue. Of the 368 participants who had chosen to respond to online harassment, 276 felt that it had been effective. The 2 × 2 table is on the following slide:

46 The two-sample z test (2 of 2)
Minitab output for the chi-square test gave X2 ≈ 8.46 with P ≈ 0.004. TI-83 output for the 2-PropZTest gave z ≈ 2.91 with P ≈ 0.0036, which rounds to 0.004. Here z² = X2, and the chi-square test and the two-sided z test will always give identical P-values.
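The equivalence can be verified from the survey counts alone. The following sketch is illustrative Python using only the standard library, not the Minitab or TI-83 procedure; it assumes a two-sided z test with the pooled proportion.

```python
import math

# Rows are the strategy chosen; columns are (felt effective, did not).
table = [
    [455, 549 - 455],   # ignored the harassment
    [276, 368 - 276],   # responded to the harassment
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Chi-square statistic: expected count = row total * column total / table total.
x2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        x2 += (observed - expected) ** 2 / expected

# Two-sample z statistic using the pooled proportion.
p1 = table[0][0] / row_totals[0]
p2 = table[1][0] / row_totals[1]
pooled = col_totals[0] / n
z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / row_totals[0] + 1 / row_totals[1]))

p_chi2 = math.erfc(math.sqrt(x2 / 2))    # upper tail of chi-square with df = 1
p_z = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail area

print(round(x2, 2), round(z, 2), round(p_chi2, 4), round(p_z, 4))
```

The identity holds only for the two-sided alternative; a one-sided z test has no chi-square counterpart because X2 discards the sign of the difference.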

47 Interpreting the chi-square output (1 of 2)
When the X2 test is statistically significant: The largest components indicate which condition(s) are most different from H0. You can also compare the observed and expected counts, or compare the computed proportions in a graph.

48 Interpreting the chi-square output (2 of 2)
The largest X2 component, 4.41, is for desipramine/no relapse. Desipramine has the highest success rate (see graph).

49 Parental smoking example (1 of 2)
Here is a computer output for a chi-square test performed on the data from a random sample of high school students (rows are parental smoking habits; columns are the students’ smoking habits). What does it tell you?

50 Parental smoking example (2 of 2)
Sample size? N = 5375
Hypotheses? H0: there is no association between parental smoking habits and student smoking habits; Ha: H0 is not true.
Are the data okay for a chi-square test? The data are an SRS and the expected counts are all greater than 5, so a chi-square test is okay.
Interpretation? There is strong evidence of an association between parental and student smoking habits (P < 0.001). In particular, students with no parents smoking are less likely to smoke, whereas students with both parents smoking are more likely to smoke, than we would expect if there were no relationship.

51 Physician-assisted suicide example (1 of 5)
A 2013 Gallup study investigated how phrasing affects the opinions of Americans regarding physician-assisted suicide. Telephone interviews were conducted with a random sample of 1,535 national adults. Using random assignment, 719 heard the question in Form A (“end the patient’s life by some painless means”) and 816 the one in Form B (“assist the patient to commit suicide”).

52 Physician-assisted suicide example (2 of 5)
The chi-square test statistic for these data is X2 =
Conclude using α = 0.05.
A. There is significant evidence of a relationship between opinions and question wording.
B. We failed to find significant evidence of a relationship between opinions and question wording.
C. The test assumptions are not met.
Answer: A. With such a large test statistic, the P-value is very small (software gives P = 3E − 13) and highly significant: we reject the null hypothesis of no relationship between phrasing and opinions. This is a comparative randomized experiment and the expected counts are all greater than 5.0 (smallest is 719 × 55/1535 = 25.76). So the test assumptions are met.

53 Physician-assisted suicide example (3 of 5)
This is a comparative randomized experiment; therefore, a causal conclusion is appropriate.

54 Physician-assisted suicide example (4 of 5)
We found that phrasing significantly influences opinions about physician-assisted suicide (software gives P = 3E − 13). Specifically, the phrasing of “painless means” resulted in a substantially higher approval (70% in favor) than the phrasing of “commit suicide” (51% in favor). This is a comparative randomized experiment; therefore, a causal conclusion is appropriate.

55 Physician-assisted suicide example (5 of 5)

56 Caution with categorical data (1 of 2)
Beware of lurking variables! An association that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson’s paradox.
Kidney stones: A study compared the success rates of two different procedures for removing kidney stones: open surgery and percutaneous nephrolithotomy (PCNL), a minimally invasive technique.

57 Caution with categorical data (2 of 2)
Can you think of a possible lurking variable here?

58 Simpson’s Paradox example (1 of 3)
The procedures are not chosen randomly by surgeons! In fact, the minimally invasive procedure is most likely used for smaller stones (with a good chance of success), whereas open surgery is likely used for more problematic conditions.

59 Simpson’s Paradox example (2 of 3)

60 Simpson’s Paradox example (3 of 3)
For both small stones and large stones, open surgery has a lower failure rate, yet in the combined data it appears to fail more often. This is Simpson’s paradox. The more challenging cases with large stones tend to be treated more often with open surgery, making it appear as if the procedure were less reliable overall.
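The reversal can be made concrete with a small computation. The counts below are NOT in this transcript: they are the figures commonly quoted for the classic kidney stone study (successes out of patients treated) and are an assumption that should be checked against the slide tables. The variable names are illustrative.

```python
# ASSUMED counts (successes, patients), as commonly quoted for this study;
# they do not appear in the transcript.
results = {
    "open surgery": {"small": (81, 87), "large": (192, 263)},
    "PCNL": {"small": (234, 270), "large": (55, 80)},
}

# Success rate within each stone-size group.
by_size = {
    proc: {size: s / t for size, (s, t) in groups.items()}
    for proc, groups in results.items()
}

# Success rate after combining the two groups for each procedure.
overall = {
    proc: sum(s for s, _ in groups.values()) / sum(t for _, t in groups.values())
    for proc, groups in results.items()
}

for proc in results:
    print(proc, {k: round(v, 3) for k, v in by_size[proc].items()}, round(overall[proc], 3))
```

With these counts open surgery has the higher success rate within each stone size but the lower success rate overall, because most of its patients fall in the harder large-stone group.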

