Inference for Tables: Chi-Square Procedures (2 more chapters to go!)


Sometimes we want to examine the distribution of proportions in a single population. The chi-square test for goodness of fit allows us to determine whether a specified population distribution seems valid. We can compare two or more population proportions using a chi-square test for homogeneity of populations. In doing so, we will organize our data in a two-way table. It is also possible to use the information provided in a two-way table to determine whether the distribution of one variable has been influenced by another variable. The chi-square test of association/independence helps us decide this issue.

Does your zodiac sign determine how successful you will be in later life? Fortune magazine collected the zodiac signs of 256 heads of the largest 400 companies. Here are the numbers of births for each sign:

Sign          Births
Aries         23
Taurus        20
Gemini        18
Cancer        [missing]
Leo           [missing]
Virgo         19
Libra         [missing]
Scorpio       21
Sagittarius   [missing]
Capricorn     22
Aquarius      24
Pisces        29

We can see some variation in the number of births per sign, and there are more Pisces, but is it enough to claim that successful people are more likely to be born under some signs than others? How closely do the observed numbers of births per sign fit the simple “null” model in which births are spread uniformly across the signs?

Goodness-of-fit: We have specified a model for the distribution and want to know whether it fits. There is no single parameter to estimate, so a confidence interval wouldn’t make any sense.

M&M’s: To see if you got a fair share of blue M&M’s, you could perform a significance test on the proportion of blue (like we have been doing). You could then perform significance tests on each of the other colors. This would be inefficient, and it would not tell us how likely it is that six sample proportions differ from the values stated by the company as much as our sample does.

Chi-square Test for Goodness of Fit: Compare the color distribution of each bag with the distribution stated by the company.

Goodness of Fit
H0: the actual population proportions are equal to the hypothesized proportions.
Ha: at least one of the actual population proportions differs from its hypothesized value.
To determine whether the distribution is different, calculate the quantity

$$\chi^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

where the sum runs over all categories. The degrees of freedom are the number of categories minus 1.

Properties of the chi-square distribution: The chi-square distributions are a family of distributions that take only positive values and are skewed to the right. A specific chi-square distribution is specified by a single parameter, the degrees of freedom (for a goodness-of-fit test, the number of categories minus 1).

Chi-square density curve: The total area under a chi-square curve is equal to 1. The curve begins at 0 on the horizontal axis, increases to a peak (for df greater than 2), and then approaches the horizontal axis asymptotically. The curve is skewed to the right. As the number of degrees of freedom increases, the curve becomes more and more symmetrical and looks more like a normal curve.

Density curves for three members of the chi-square family of distributions. As the degrees of freedom increase, the density curve becomes less skewed and larger values become more probable.
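The shape statements above can be made precise with a few standard facts about the chi-square distribution. These are not taken from the slides; the symbol k for the degrees of freedom is introduced here only for illustration.

```latex
\[
E\left[\chi^2_k\right] = k, \qquad
\operatorname{Var}\left[\chi^2_k\right] = 2k, \qquad
\text{skewness} = \sqrt{8/k}
\]
```

Because the mean equals k, larger values become more probable as the degrees of freedom increase. Because the skewness shrinks toward 0 as k grows, the curve looks more and more like a normal curve. For example, the skewness is about 1.63 when k = 3 and about 0.85 when k = 11.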

Hypotheses used for this test: Follow the same 4-step process. Remember, when no significance level is given, use 0.05. Conditions:
Random: the data come from a random sample or a randomized experiment.
Expected counts: all individual expected counts are at least 1, and no more than 20% of the expected counts are less than 5.
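The expected-count conditions above can be checked mechanically. The sketch below is one possible way to do it; the function name `expected_counts_ok` and the second list of counts are hypothetical and are not part of the slides.

```python
def expected_counts_ok(expected):
    """Check the two expected-count conditions for a chi-square test.

    Condition 1: every expected count is at least 1.
    Condition 2: no more than 20% of the expected counts are below 5.
    """
    all_at_least_one = all(e >= 1 for e in expected)
    share_below_five = sum(1 for e in expected if e < 5) / len(expected)
    return all_at_least_one and share_below_five <= 0.20


# Zodiac example: 256 executives spread evenly over 12 signs.
zodiac_expected = [256 / 12] * 12
print(expected_counts_ok(zodiac_expected))

# Hypothetical counts (not from the slides): 2 of 5 categories fall below 5.
hypothetical_expected = [40.0, 30.0, 20.0, 4.0, 3.0]
print(expected_counts_ok(hypothetical_expected))
```

The Random condition cannot be checked from the counts; it depends on how the data were collected.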

Example 13.2: Biologists wish to mate two fruit flies, each having genetic makeup RrCc, indicating that it has one dominant gene (R) and one recessive gene (r) for eye color, along with one dominant (C) and one recessive (c) gene for wing type. Each offspring will receive one gene for each of the two traits from both parents. The following table, often called a Punnett square, shows the possible combinations of genes received by the offspring.

                        Parent 2 passes on
Parent 1 passes on      RC          Rc          rC          rc
RC                      RRCC (x)    RRCc (x)    RrCC (x)    RrCc (x)
Rc                      RRCc (x)    RRcc (y)    RrCc (x)    Rrcc (y)
rC                      RrCC (x)    RrCc (x)    rrCC (z)    rrCc (z)
rc                      RrCc (x)    Rrcc (y)    rrCc (z)    rrcc (w)

Any offspring receiving an R gene will have red eyes, and any offspring receiving a C gene will have straight wings. So based on this Punnett square, the biologists predict a ratio of 9 red-eyed, straight wing (x): 3 red-eyed, curly wing (y): 3 white-eyed, straight (z): 1 white-eyed, curly (w) offspring. In order to test their hypothesis, the biologists mate the fruit flies. Of 200 offspring, 101 had red eyes and straight wings, 42 had red eyes and curly wings, 49 had white eyes and straight wings, and 10 had white eyes and curly wings. Do these data differ significantly from what the biologists have predicted?

Step 1: State. The biologists are interested in the proportion of offspring that fall into each genetic category for the population of all fruit flies that would result from crossing two parents with genetic makeup RrCc.
H0: $p_{\text{red,straight}} = 0.5625$, $p_{\text{red,curly}} = 0.1875$, $p_{\text{white,straight}} = 0.1875$, $p_{\text{white,curly}} = 0.0625$
Ha: at least one of these proportions is incorrect.

Step 2: Plan/Conditions. We will use a chi-square goodness-of-fit test, provided all conditions are met.
Random: the fruit flies were selected randomly.
Expected counts:
red-eyed, straight-wing: 200 × 0.5625 = 112.5
red-eyed, curly-wing: 200 × 0.1875 = 37.5
white-eyed, straight-wing: 200 × 0.1875 = 37.5
white-eyed, curly-wing: 200 × 0.0625 = 12.5
Since all expected counts are greater than 5, we can continue.
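The hypothesized proportions and expected counts above follow directly from the predicted 9:3:3:1 ratio and the 200 offspring. A minimal sketch of that arithmetic, with variable names chosen here for illustration:

```python
from fractions import Fraction

# Predicted ratio from the Punnett square (16 equally likely cells).
ratio = {
    "red, straight": 9,
    "red, curly": 3,
    "white, straight": 3,
    "white, curly": 1,
}
n_offspring = 200
total_parts = sum(ratio.values())

# Hypothesized proportion for each category, kept exact with Fraction.
proportions = {name: Fraction(parts, total_parts) for name, parts in ratio.items()}

# Expected count = sample size times hypothesized proportion.
expected = {name: float(p * n_offspring) for name, p in proportions.items()}

for name in ratio:
    print(f"{name}: p = {float(proportions[name])}, expected = {expected[name]}")
```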

Step 3: Do.

$$\chi^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

$$\chi^2 = \frac{(101-112.5)^2}{112.5} + \frac{(42-37.5)^2}{37.5} + \frac{(49-37.5)^2}{37.5} + \frac{(10-12.5)^2}{12.5}$$

$$\chi^2 = 1.1756 + 0.54 + 3.5267 + 0.5 = 5.742$$

Using Table E with df = 3, we get a P-value between 0.10 and 0.15. Using our calculator: go to DISTR, down to option 8, χ2cdf(χ2, 999, df) = χ2cdf(5.742, 999, 3) = 0.1248. Or try STAT, TESTS, and go down to D: χ2GOF-Test.
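For readers without the calculator, the same computation can be sketched in plain Python. The helper `chi2_sf` is a hypothetical name introduced here. It uses the standard closed-form upper-tail area for a chi-square variable with whole-number degrees of freedom, which plays the role of χ2cdf(χ2, 999, df).

```python
import math


def chi2_sf(x, df):
    """Upper-tail area P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^j / j!.
        term = math.exp(-half)
        total = term
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) plus a finite sum with half-integer gammas.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) * math.exp(-half) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (j + 0.5)
    return total


observed = [101, 42, 49, 10]
expected = [112.5, 37.5, 37.5, 12.5]

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(chi_square, df)

print(f"chi-square = {chi_square:.3f}, df = {df}, P-value = {p_value:.4f}")
```

The printed values should agree with the hand calculation above (about 5.742 and 0.1248). If SciPy is installed, `scipy.stats.chisquare(observed, f_exp=expected)` is an alternative that returns the statistic and P-value in one call.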

Step 4: Conclude. The P-value of 0.1248 indicates that the probability of obtaining a sample of 200 fruit fly offspring in which the proportions differ from the hypothesized values by at least as much as the ones in our sample is over 12%, assuming that the null hypothesis is true. Since this P-value is greater than 0.05, we fail to reject H0: there is not enough evidence to reject the biologists’ predicted distribution.

Follow-up analysis: When a test does give evidence that the distribution differs significantly from the hypothesized one, look at the individual components of $\chi^2$ to see where the largest differences have occurred.
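To illustrate what looking at the individual components involves, the sketch below lists each category's contribution for the fruit fly data from Example 13.2. That test was not significant, so this only shows the mechanics; the variable names are chosen here for illustration.

```python
categories = ["red, straight", "red, curly", "white, straight", "white, curly"]
observed = [101, 42, 49, 10]
expected = [112.5, 37.5, 37.5, 12.5]

# Each component is one term of the chi-square sum.
components = {
    name: (o - e) ** 2 / e
    for name, o, e in zip(categories, observed, expected)
}

# List the components from largest to smallest contribution.
for name, value in sorted(components.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.4f}")

largest = max(components, key=components.get)
print(f"Largest contribution: {largest}")
```

Here the white-eyed, straight-wing category contributes the most, because 49 offspring were observed where 37.5 were expected.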

Going back to the zodiac problem, I want to know whether births of successful people are uniformly distributed across the signs of the zodiac or not.
H0: births are uniformly distributed over zodiac signs.
Ha: births are not uniformly distributed over zodiac signs.
Use a chi-square goodness-of-fit test if conditions are met.

Check conditions.
Random: this is a convenience sample of executives, but there’s no reason to suspect bias.
Expected counts: 256 × 1/12 ≈ 21.3 for each sign. These are all greater than 5, so this condition is satisfied.
df = 12 - 1 = 11

$$\chi^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} = 5.094$$

P-value = χ2cdf(5.094, 999, 11) = 0.9265. The P-value of 0.9265 says that if the zodiac signs of executives were in fact distributed uniformly, an observed chi-square value of 5.09 or higher would occur about 93% of the time. This certainly isn’t unusual, so I fail to reject the null hypothesis and conclude that these data show virtually no evidence of nonuniform distribution of zodiac signs among executives.
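The phrase "would occur about 93% of the time" can be checked by simulation. The sketch below is not the method used in the slides. It assigns 256 births to 12 signs uniformly at random many times and counts how often the chi-square statistic is at least 5.094. The function name, the number of repetitions, and the seed are arbitrary choices made here.

```python
import random


def simulate_tail_proportion(n_people=256, n_signs=12, observed_stat=5.094,
                             n_sims=2000, seed=1):
    """Estimate P(chi-square >= observed_stat) under the uniform null model."""
    rng = random.Random(seed)
    expected = n_people / n_signs
    at_least_as_large = 0
    for _sim in range(n_sims):
        # Assign each person a sign uniformly at random and tally the counts.
        counts = [0] * n_signs
        for _person in range(n_people):
            counts[rng.randrange(n_signs)] += 1
        # Chi-square statistic for this simulated sample.
        stat = sum((c - expected) ** 2 / expected for c in counts)
        if stat >= observed_stat:
            at_least_as_large += 1
    return at_least_as_large / n_sims


proportion = simulate_tail_proportion()
print(f"Simulated proportion of chi-square values >= 5.094: {proportion:.3f}")
```

The simulated proportion varies a little with the seed and the number of repetitions, but it should land close to the P-value of 0.9265 computed above.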